# Pandas 1 - Questions
© Advanced Analytics, Amir Ben Haim, 2024

## DataFrame basics

### A few of the fundamental routines

#### Exercise 1

Import pandas under the name `pd`.

In [1]:
import pandas as pd

#### Exercise 2

Consider the following Python dictionary `data` and Python list `labels`:

``` python
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```


The data describes animals and their trips to a vet.

Create a DataFrame `df` from the dictionary `data`, using `labels` as its index.

In [2]:
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)

#### Exercise 3

Display a summary of the basic information about this DataFrame and its data.

In [3]:
df.describe()

|  | age | visits |
|---|---|---|
| count | 8.0 | 10.0 |
| mean | 3.4375 | 1.9 |
| std | 2.007797 | 0.875595 |
| min | 0.5 | 1.0 |
| 25% | 2.375 | 1.0 |
| 50% | 3.0 | 2.0 |
| 75% | 4.625 | 2.75 |
| max | 7.0 | 3.0 |
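
`describe()` summarizes only the numeric columns. As a complementary sketch, `info()` reports every column's dtype and non-null count, which is another reasonable reading of "basic information". The block rebuilds `df` from Exercise 2 so that it runs on its own; the name `non_null` is introduced here for illustration.

```python
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# Prints the index type, each column's dtype and non-null count, and memory usage
df.info()

# The same non-null counts as a Series (illustrative name)
non_null = df.count()
```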


#### Exercise 4

Return the first 3 rows of the DataFrame `df`.

In [11]:
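# Four equivalent ways to get the first three rows
df.head(3)
df[0:3]            # positional slice
df.iloc[0:3]       # positional slice, end exclusive
df.loc["a":"c"]    # label slice, end inclusive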

|  | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| b | cat | 3.0 | 3 | yes |
| c | snake | 0.5 | 2 | no |


#### Exercise 5

Select just the 'animal' and 'age' columns from the DataFrame `df`.

In [14]:
# Three ways to select the two columns
pd.DataFrame({"animal": df.animal, "age": df.age})  # builds a new DataFrame

df[["animal", "age"]]

df.loc[:, ["animal", "age"]]

|  | animal | age |
|---|---|---|
| a | cat | 2.5 |
| b | cat | 3.0 |
| c | snake | 0.5 |
| d | dog | NaN |
| e | dog | 5.0 |
| f | cat | 2.0 |
| g | snake | 4.5 |
| h | cat | NaN |
| i | dog | 7.0 |
| j | dog | 3.0 |


#### Exercise 6

Select the data in rows `[3, 4, 8]` *and* in columns `['animal', 'age']`.

In [38]:
df.iloc[[3,4,8]][['animal', 'age']]

|  | animal | age |
|---|---|---|
| d | dog | NaN |
| e | dog | 5.0 |
| i | dog | 7.0 |
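
The chained form `df.iloc[...][...]` works for reading, but the same selection can be made in a single indexing step. The sketch below is one possible way: it translates the row positions to labels with `df.index` and passes both axes to `.loc`. It rebuilds `df` from Exercise 2 so that it runs on its own; `subset` is an illustrative name.

```python
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# df.index[[3, 4, 8]] turns row positions into labels ('d', 'e', 'i'),
# so rows and columns are selected together in one .loc call
subset = df.loc[df.index[[3, 4, 8]], ['animal', 'age']]
```

A single `.loc` call is also the safer pattern if the selection is later used for assignment, where chained indexing may act on a copy.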


#### Exercise 7

Select only the rows where the number of visits is greater than 2.

In [42]:
df[df.visits > 2]

|  | animal | age | visits | priority |
|---|---|---|---|---|
| b | cat | 3.0 | 3 | yes |
| d | dog | NaN | 3 | yes |
| f | cat | 2.0 | 3 | no |


#### Exercise 8

Select the rows where the age is missing, i.e. is `NaN`.

In [48]:
df[df.age.isna()]

|  | animal | age | visits | priority |
|---|---|---|---|---|
| d | dog | NaN | 3 | yes |
| h | cat | NaN | 1 | yes |


#### Exercise 9

Select the rows where the animal is a cat *and* the age is less than 3.

In [72]:
df[(df.age < 3) & (df.animal == "cat")]

|  | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| f | cat | 2.0 | 3 | no |


#### Exercise 10

Select the rows where the age is between 2 and 4 (inclusive).
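
One possible approach, sketched here rather than taken from the original notebook, uses `Series.between`, which includes both endpoints by default. The block rebuilds `df` from Exercise 2 so that it runs on its own; `in_range` is an illustrative name.

```python
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# between() is inclusive on both ends by default; NaN ages evaluate to False
in_range = df[df['age'].between(2, 4)]
```

An equivalent boolean mask is `df[(df['age'] >= 2) & (df['age'] <= 4)]`.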

|  | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| b | cat | 3.0 | 3 | yes |
| f | cat | 2.0 | 3 | no |
| j | dog | 3.0 | 1 | no |


#### Exercise 11

Change the age in row 'f' to 1.5.
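
A minimal sketch of one way to do this, assuming `df` is built as in Exercise 2 (it is rebuilt here so the block runs on its own): assign through `.loc` with the row label and the column name.

```python
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# Label-based scalar assignment: row 'f', column 'age'
df.loc['f', 'age'] = 1.5
```

For a single cell, `df.at['f', 'age'] = 1.5` does the same thing. Chained assignment such as `df['age']['f'] = 1.5` is best avoided, because it may modify a temporary copy.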

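|  | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| b | cat | 3.0 | 3 | yes |
| c | snake | 0.5 | 2 | no |
| d | dog | NaN | 3 | yes |
| e | dog | 5.0 | 2 | no |
| f | cat | 1.5 | 3 | no |
| g | snake | 4.5 | 1 | no |
| h | cat | NaN | 1 | yes |
| i | dog | 7.0 | 2 | no |
| j | dog | 3.0 | 1 | no |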


#### Exercise 12

Calculate the sum of all visits (the total number of visits).
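
One possible approach is to call `sum()` on the 'visits' column. The sketch rebuilds `df` from Exercise 2 so that it runs on its own; `total_visits` is an illustrative name.

```python
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# Sum over the single column, not the whole DataFrame
total_visits = df['visits'].sum()
```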

19

#### Exercise 13

Count the number of each type of animal in `df`.
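
A sketch of one way to get these counts is `Series.value_counts`, which tallies each distinct value and sorts by frequency. The block rebuilds `df` from Exercise 2 so that it runs on its own; `counts` is an illustrative name.

```python
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# One row per distinct animal, most frequent first
counts = df['animal'].value_counts()
```

Depending on the pandas version, the printed `Name:` line of the result may read `count` rather than `animal`; the counts themselves are the same.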

cat      4
dog      4
snake    2
Name: animal, dtype: int64

#### Exercise 14

Sort `df` first by the values in the 'age' column in *descending* order, then by the values in the 'visits' column in *ascending* order.
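
One possible approach passes both column names to `sort_values` together with a matching list of sort directions. The sketch rebuilds `df` from Exercise 2 and re-applies the Exercise 11 update (age of 'f' set to 1.5) so that it runs on its own; `sorted_df` is an illustrative name.

```python
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))
df.loc['f', 'age'] = 1.5  # state after Exercise 11

# 'age' descending, ties broken by 'visits' ascending;
# rows with a missing age go last (the default na_position)
sorted_df = df.sort_values(by=['age', 'visits'], ascending=[False, True])
```

`sort_values` returns a new DataFrame and leaves `df` itself unchanged.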

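|  | animal | age | visits | priority |
|---|---|---|---|---|
| i | dog | 7.0 | 2 | no |
| e | dog | 5.0 | 2 | no |
| g | snake | 4.5 | 1 | no |
| j | dog | 3.0 | 1 | no |
| b | cat | 3.0 | 3 | yes |
| a | cat | 2.5 | 1 | yes |
| f | cat | 1.5 | 3 | no |
| c | snake | 0.5 | 2 | no |
| h | cat | NaN | 1 | yes |
| d | dog | NaN | 3 | yes |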


#### Exercise 15

Return all rows with <b><u>NOT NULL</u></b> values in the 'age' column.
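
A minimal sketch of one way to do this is to filter with `notna()`, the complement of the `isna()` mask from Exercise 8. The block rebuilds `df` from Exercise 2 and re-applies the Exercise 11 update so that it runs on its own; `not_null` is an illustrative name.

```python
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, None, 5, 2, 4.5, None, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))
df.loc['f', 'age'] = 1.5  # state after Exercise 11

# Keep only the rows whose 'age' is present
not_null = df[df['age'].notna()]
```

`df.dropna(subset=['age'])` returns the same rows.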

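|  | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| b | cat | 3.0 | 3 | yes |
| c | snake | 0.5 | 2 | no |
| e | dog | 5.0 | 2 | no |
| f | cat | 1.5 | 3 | no |
| g | snake | 4.5 | 1 | no |
| i | dog | 7.0 | 2 | no |
| j | dog | 3.0 | 1 | no |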
