## Importing pandas

### Getting started and checking your pandas setup

**1.** Import pandas under the alias `pd`.

In [257]:
import numpy as np
import pandas as pd

**2.** Print the version of pandas that has been imported.

In [258]:
pd.__version__
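
The version string lives in the `__version__` attribute, while `pd._version` is an internal module (which is why a module path is printed when that name is evaluated). A minimal sketch, assuming only that pandas is installed:

```python
import pandas as pd

# __version__ is a plain string; _version is an internal module, not the version
version = pd.__version__
print("pandas version:", version)
```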

**3.** Look up the built-in help for any pandas function.

In [259]:
# In Jupyter, press Shift+Tab inside a call to show its docstring; help() works anywhere
help(pd.DataFrame.head)

## DataFrame basics

### A few of the fundamental routines for selecting, sorting, adding and aggregating data in DataFrames


Consider the following Python dictionary `data` and Python list `labels`:

``` python
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```
**4.** Create a DataFrame `df` from this dictionary `data` which has the index `labels`.

In [260]:
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

In [261]:
df = pd.DataFrame(data,index=labels)
df

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,2.0,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no
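
After building the DataFrame it can help to confirm that the index and columns came out as intended. The sketch below rebuilds `df` from the same `data` and `labels` and reads its shape, index and column names; the variable names `shape`, `index_labels` and `column_names` are only illustrative.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)

shape = df.shape                  # (number of rows, number of columns)
index_labels = list(df.index)     # row labels taken from `labels`
column_names = list(df.columns)   # column names taken from the dict keys
print(shape, index_labels, column_names)
```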


**5.** Display a summary of the basic information about this DataFrame and its data (*hint: there is a single method that can be called on the DataFrame*).

In [262]:
df.describe(include='all')

Unnamed: 0,animal,age,visits,priority
count,10,8.0,10.0,10
unique,3,,,2
top,cat,,,no
freq,4,,,6
mean,,3.4375,1.9,
std,,2.007797,0.875595,
min,,0.5,1.0,
25%,,2.375,1.0,
50%,,3.0,2.0,
75%,,4.625,2.75,
max,,7.0,3.0,
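
`describe` reports statistics, whereas `info` reports structure (dtypes and non-null counts), so either could be what the hint has in mind. A short sketch of both on the same data, with `numeric_summary` as an illustrative name:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# info() prints dtypes and non-null counts to stdout and returns None
df.info()

# describe() without arguments summarises only the numeric columns
numeric_summary = df.describe()
print(numeric_summary)
```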


**6.** Return the first 3 rows of the DataFrame `df`.

In [263]:
df.head(3)

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no


**7.** Select just the 'animal' and 'age' columns from the DataFrame `df`.

In [264]:
df[['animal','age']]

Unnamed: 0,animal,age
a,cat,2.5
b,cat,3.0
c,snake,0.5
d,dog,
e,dog,5.0
f,cat,2.0
g,snake,4.5
h,cat,
i,dog,7.0
j,dog,3.0


**8.** Select the data in rows `[3, 4, 8]` (zero-based positions) *and* in columns `['animal', 'age']`.

In [265]:
df.loc[df.index[[3, 4, 8]], ['animal', 'age']]

Unnamed: 0,animal,age
d,dog,
e,dog,5.0
i,dog,7.0


In [266]:
df.iloc[[3,4,8],:].loc[:,['animal','age']]

Unnamed: 0,animal,age
d,dog,
e,dog,5.0
i,dog,7.0
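
Row positions are zero-based, so positions 3, 4 and 8 correspond to the labels `d`, `e` and `i`. The sketch below, with illustrative variable names, shows a positional and a label-based route to the same selection:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# iloc selects rows by position, then the columns are picked by name
by_position = df.iloc[[3, 4, 8]][['animal', 'age']]

# df.index[[3, 4, 8]] translates positions into labels so loc can do both at once
by_label = df.loc[df.index[[3, 4, 8]], ['animal', 'age']]

print(by_position)
```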


**9.** Select only the rows where the number of visits is greater than 3.

In [267]:
df[df['visits']>3]

Unnamed: 0,animal,age,visits,priority
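
The empty result is expected, since no animal in this data has more than 3 visits. As a quick sanity check, the sketch below compares the strict filter with `>=`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

more_than_three = df[df['visits'] > 3]    # strict comparison: no rows match
at_least_three = df[df['visits'] >= 3]    # inclusive comparison: three rows match

print(more_than_three.empty)
print(at_least_three)
```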


**10.** Check for missing values in the data.

In [268]:
df.isnull().sum()

animal      0
age         2
visits      0
priority    0
dtype: int64
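
The per-column counts show that only `age` has gaps. To see which rows are affected, the same mask can be used as a filter; a small sketch with illustrative names:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# Rows where the age is missing (NaN)
missing_age = df[df['age'].isna()]

# Rows with a missing value in any column
any_missing = df[df.isna().any(axis=1)]

print(missing_age)
```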

**11.** Select the rows where the animal is a cat *and* the age is less than 3.

In [269]:
df[(df['animal']=='cat') & (df['age']<3)]

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
f,cat,2.0,3,no


**12.** Select the rows where the age is between 2 and 4 (inclusive).

In [270]:
df[(df['age']>=2) & (df['age']<=4)]

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
f,cat,2.0,3,no
j,dog,3.0,1,no
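
An equivalent and slightly shorter form uses `Series.between`, which includes both endpoints by default and treats missing ages as not matching. A sketch on the original data (before row `f` is modified):

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# between(2, 4) is True where 2 <= age <= 4; NaN ages evaluate to False
in_range = df[df['age'].between(2, 4)]
print(in_range)
```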


**13.** Change the age in row 'f' to 1.5.

In [271]:
df.loc['f', 'age'] = 1.5
df.head(10)

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,1.5,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no
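
For a single cell, `at` is a scalar-only alternative to `loc`. A minimal sketch of the same update:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# at reads or writes exactly one cell, addressed by row label and column name
df.at['f', 'age'] = 1.5
print(df.at['f', 'age'])
```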


**14.** Calculate the sum of all visits in `df` (i.e. find the total number of visits).

In [272]:
df['visits'].sum()

19

**15.** Calculate the mean age for each different animal in `df`. Explore the groupby function.

In [273]:
df.groupby('animal')['age'].mean()

animal
cat      2.333333
dog      5.000000
snake    2.500000
Name: age, dtype: float64
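
A per-animal mean needs `groupby`; calling `mean()` on the whole column gives a single overall figure instead. The sketch below groups by `animal` and also counts the non-missing ages behind each mean, assuming row `f` has already been set to 1.5 as in step 13; `summary` is an illustrative name.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))
df.loc['f', 'age'] = 1.5   # same change as in step 13

# One row per animal; NaN ages are skipped by both mean and count
summary = df.groupby('animal')['age'].agg(['mean', 'count'])
print(summary)
```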

**16.** Append a new row 'k' to `df` with your choice of values for each column. Then delete that row to return the original DataFrame.

In [274]:
df.loc['k'] = ['hedgehog', 4, 3, 'yes']
df

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,1.5,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no
k,hedgehog,4.0,3,yes


In [275]:
df=df.drop('k',axis=0)
df

Unnamed: 0,animal,age,visits,priority
a,cat,2.5,1,yes
b,cat,3.0,3,yes
c,snake,0.5,2,no
d,dog,,3,yes
e,dog,5.0,2,no
f,cat,1.5,3,no
g,snake,4.5,1,no
h,cat,,1,yes
i,dog,7.0,2,no
j,dog,3.0,1,no
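
Another way to append a row is to build a one-row DataFrame and combine it with `pd.concat`, which leaves the original untouched. This is only a sketch; the row values and the names `new_row`, `extended` and `restored` are illustrative.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# A one-row frame with the same columns and the new label 'k'
new_row = pd.DataFrame({'animal': ['hedgehog'], 'age': [4.0],
                        'visits': [3], 'priority': ['yes']}, index=['k'])

extended = pd.concat([df, new_row])   # returns a new frame with 11 rows
restored = extended.drop(index='k')   # dropping 'k' gives back the original rows

print(extended.tail(2))
```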


**17.** Count the number of each type of animal in `df`.

In [276]:
df['animal'].value_counts()

cat      4
dog      4
snake    2
Name: animal, dtype: int64
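
If proportions are more useful than raw counts, `value_counts` accepts `normalize=True`. A short sketch, with `shares` as an illustrative name:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# normalize=True divides each count by the total number of non-missing entries
shares = df['animal'].value_counts(normalize=True)
print(shares)
```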

**18.** Sort `df` first by the values in the 'age' column in *descending* order, then by the values in the 'visits' column in *ascending* order (so row `i` should be first, and row `d` should be last).

In [277]:
df = df.sort_values(by=['age', 'visits'], ascending=[False, True])
df

Unnamed: 0,animal,age,visits,priority
i,dog,7.0,2,no
e,dog,5.0,2,no
g,snake,4.5,1,no
j,dog,3.0,1,no
b,cat,3.0,3,yes
a,cat,2.5,1,yes
f,cat,1.5,3,no
c,snake,0.5,2,no
h,cat,,1,yes
d,dog,,3,yes
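
Passing both columns to a single `sort_values` call makes the tie-breaking explicit, whereas chaining two separate sorts relies on the second sort being stable, which the default algorithm does not promise. The sketch below also shows `na_position`, which controls where the missing ages end up; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# age descending, ties broken by visits ascending; NaN ages go last by default
ordered = df.sort_values(by=['age', 'visits'], ascending=[False, True])

# Same keys, but rows with a missing age are placed first
nan_first = df.sort_values(by=['age', 'visits'], ascending=[False, True],
                           na_position='first')

print(ordered)
```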


**19.** The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean values: 'yes' should be `True` and 'no' should be `False`.

In [278]:
df['priority'] = df['priority'].map({'yes': True, 'no': False})
df

Unnamed: 0,animal,age,visits,priority
i,dog,7.0,2,False
e,dog,5.0,2,False
g,snake,4.5,1,False
j,dog,3.0,1,False
b,cat,3.0,3,True
a,cat,2.5,1,True
f,cat,1.5,3,False
c,snake,0.5,2,False
h,cat,,1,True
d,dog,,3,True
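
Because the column holds only two values, a direct comparison gives the same booleans as `map`. One difference worth knowing: `map` turns any value missing from the dictionary into NaN, while the comparison simply yields `False`. A sketch with illustrative names:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# Element-wise comparison produces a boolean Series directly
as_bool = df['priority'] == 'yes'

# Dictionary mapping, as used in the cell above
mapped = df['priority'].map({'yes': True, 'no': False})

print(as_bool)
```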


**20.** In the 'animal' column, change the 'snake' entries to 'python'.

In [279]:
df=df.replace({'animal':{'snake':'python'}})
df

Unnamed: 0,animal,age,visits,priority
i,dog,7.0,2,False
e,dog,5.0,2,False
g,python,4.5,1,False
j,dog,3.0,1,False
b,cat,3.0,3,True
a,cat,2.5,1,True
f,cat,1.5,3,False
c,python,0.5,2,False
h,cat,,1,True
d,dog,,3,True
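
The replacement can also be limited to the one column by calling `replace` on the Series. It returns a new Series, so the original column changes only if the result is assigned back. A sketch on the original data, with `renamed` as an illustrative name:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
df = pd.DataFrame(data, index=list('abcdefghij'))

# replace on a Series swaps matching values and returns a new Series
renamed = df['animal'].replace('snake', 'python')

# Assign back with df['animal'] = renamed to make the change stick
print(renamed.value_counts())
```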
