# 101 Pandas Exercises for Data Analysis

## Index
#### 1. How to import pandas and check the version?
#### 2. How to create a series from a list, numpy array and dict?
#### 3. How to convert the index of a series into a column of a dataframe?
#### 4. How to combine many series to form a dataframe?
#### 5. How to assign a name to the series’ index?
#### 6. How to get the items of series A not present in series B?
#### 7. How to get the items not common to both series A and series B?
#### 8. How to get the minimum, 25th percentile, median, 75th, and max of a numeric series?
#### 9. How to get frequency counts of unique items of a series?
#### 10. How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’?


## 1. How to import pandas and check the version?

In [87]:
import pandas as pd
pd.__version__

'0.23.4'

In [88]:
pd.show_versions(as_json=True)

{'system': {'commit': None, 'python': '3.7.0.final.0', 'python-bits': 64, 'OS': 'Windows', 'OS-release': '7', 'machine': 'AMD64', 'processor': 'Intel64 Family 6 Model 37 Stepping 2, GenuineIntel', 'byteorder': 'little', 'LC_ALL': 'None', 'LANG': 'None', 'LOCALE': 'None.None'}, 'dependencies': {'pandas': '0.23.4', 'pytest': '3.8.0', 'pip': '19.0.3', 'setuptools': '40.2.0', 'Cython': '0.28.5', 'numpy': '1.15.1', 'scipy': '1.1.0', 'pyarrow': None, 'xarray': '0.11.3', 'IPython': '6.5.0', 'sphinx': '1.7.9', 'patsy': '0.5.0', 'dateutil': '2.7.3', 'pytz': '2018.5', 'blosc': None, 'bottleneck': '1.2.1', 'tables': '3.4.4', 'numexpr': '2.6.8', 'feather': None, 'matplotlib': '2.2.3', 'openpyxl': '2.5.6', 'xlrd': '1.1.0', 'xlwt': '1.3.0', 'xlsxwriter': '1.1.0', 'lxml': '4.2.5', 'bs4': '4.6.3', 'html5lib': '1.0.1', 'sqlalchemy': '1.2.11', 'pymysql': None, 'psycopg2': None, 'jinja2': '2.10', 's3fs': None, 'fastparquet': None, 'pandas_gbq': None, 'pandas_datareader': '0.7.0'}}


## 2. How to create a series from a list, numpy array and dict?

In [89]:
import numpy as np
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(1,27,1)
mydict = dict(zip(mylist, myarr))
print(mylist)
print(myarr)
print(mydict)

['a', 'b', 'c', 'e', 'd', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26]
{'a': 1, 'b': 2, 'c': 3, 'e': 4, 'd': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}


In [90]:
ser1 = pd.Series(mylist)
ser2 = pd.Series(myarr)
ser3 = pd.Series(mydict)
print(ser3.head())

a    1
b    2
c    3
e    4
d    5
dtype: int64


## 3. How to convert the index of a series into a column of a dataframe?

In [91]:
ser = pd.Series(mydict)
ser.head()

a    1
b    2
c    3
e    4
d    5
dtype: int64

In [92]:
df1 = pd.DataFrame(ser).reset_index()
df1.head()

  index  0
0     a  1
1     b  2
2     c  3
3     e  4
4     d  5
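The default column names (`index` and `0`) are not very descriptive. A minimal sketch of one way to name both columns in a single step, using a small stand-in series; the labels `letter` and `position` are illustrative and not part of the original exercise:

```python
import pandas as pd

# Small stand-in for the series built from `mydict` above.
ser = pd.Series({'a': 1, 'b': 2, 'c': 3})

# rename_axis names the index; reset_index(name=...) names the value column.
df = ser.rename_axis('letter').reset_index(name='position')
print(df)
```

Calling `reset_index` directly on the series also avoids the intermediate `pd.DataFrame(ser)` step.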


## 4. How to combine many series to form a dataframe?

In [93]:
# Solution 1
df2 = pd.concat([ser1, ser2], axis = 1)
df2.head()

   0  1
0  a  1
1  b  2
2  c  3
3  e  4
4  d  5


In [94]:
# Solution 2
df3 = pd.DataFrame({'col1': ser1, 'col2': ser2})
print(df3.head())

  col1  col2
0    a     1
1    b     2
2    c     3
3    e     4
4    d     5


## 5. How to assign a name to the series’ index?

In [95]:
ser4 = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser4.head()

0    a
1    b
2    c
3    e
4    d
dtype: object

In [96]:
# Solution (this sets the name of the series, shown as `Name:` in the output)
ser4.name = 'alphabets'
ser4.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabets, dtype: object
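
The series name and the index name are separate attributes. The sketch below sets both on a short stand-in series so the difference is visible; the label `position` is an illustrative choice, not from the original:

```python
import pandas as pd

ser = pd.Series(list('abc'))

ser.name = 'alphabets'        # names the series (the "Name:" field in the repr)
ser.index.name = 'position'   # names the index itself

print(ser)
```

`ser.rename_axis('position')` does the same for the index but returns a new series instead of modifying `ser` in place.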

## 6. How to get the items of series A not present in series B?

In [97]:
# From serA, remove the items present in serB.

serA = pd.Series([1, 2, 3, 4, 5])
serB = pd.Series([4, 5, 6, 7, 8])

In [98]:
# Solution
serA[~serA.isin(serB)]

0    1
1    2
2    3
dtype: int64
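
As an alternative to the boolean mask, NumPy's set routines can compute the same difference. This is a sketch rather than a drop-in replacement: `np.setdiff1d` returns sorted unique values, so the original index labels and any duplicates are lost.

```python
import numpy as np
import pandas as pd

serA = pd.Series([1, 2, 3, 4, 5])
serB = pd.Series([4, 5, 6, 7, 8])

# Sorted unique values of serA that do not appear in serB.
only_in_a = pd.Series(np.setdiff1d(serA, serB))
print(only_in_a.tolist())
```

Prefer the `isin` mask when the original index or repeated values matter.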

## 7. How to get the items not common to both series A and series B?

In [99]:
# Get all items of serA and serB that are not common to both.

serA = pd.Series([1, 2, 3, 4, 5])
serB = pd.Series([4, 5, 6, 7, 8])

In [100]:
# Solution
ser_u = pd.Series(np.union1d(serA, serB))      # union
ser_u

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
dtype: int64

In [101]:
ser_i = pd.Series(np.intersect1d(serA, serB))  # intersect
ser_i


0    4
1    5
dtype: int64

In [102]:
ser_u[~ser_u.isin(ser_i)]

0    1
1    2
2    3
5    6
6    7
7    8
dtype: int64
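
The union-minus-intersection steps compute the symmetric difference of the two series. NumPy exposes this as a single call, `np.setxor1d`; a minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

serA = pd.Series([1, 2, 3, 4, 5])
serB = pd.Series([4, 5, 6, 7, 8])

# Sorted unique values that appear in exactly one of the two series.
ser_xor = pd.Series(np.setxor1d(serA, serB))
print(ser_xor.tolist())
```

Unlike the masked `ser_u`, the result here gets a fresh 0-based index.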

## 8. How to get the minimum, 25th percentile, median, 75th, and max of a numeric series?

In [103]:
ser1 = pd.Series(np.random.normal(10, 5, 25))
ser1

0     15.345382
1     10.892570
2     15.067081
3     14.062397
4      2.570016
5     17.859567
6     13.729090
7     22.363467
8      9.228003
9     12.353193
10    19.248645
11    15.668455
12     7.221409
13     9.786501
14     4.961234
15    12.539056
16    12.768738
17    18.415628
18    14.260554
19    -1.952173
20    17.426132
21     8.089652
22    11.386978
23     4.503454
24     3.844052
dtype: float64

In [104]:
# Solution
np.percentile(ser1, q=[0, 25, 50, 75, 100])

array([-1.95217346,  8.08965155, 12.53905644, 15.34538218, 22.36346718])

In [105]:
ser2 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
ser2

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64

In [106]:
# Solution
np.percentile(ser2, q=[0, 25, 50, 75, 100])

array([ 1.  ,  3.25,  5.5 ,  7.75, 10.  ])
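
The same five-number summary is available without leaving pandas. The sketch below uses `Series.quantile`, which takes fractions in [0, 1] rather than percentages and, by default, interpolates linearly like `np.percentile`:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Quantiles are given as fractions; the result is a Series indexed by them.
five_num = ser.quantile([0, 0.25, 0.5, 0.75, 1])
print(five_num)
```

`ser.describe()` reports the same values along with the count, mean and standard deviation.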

## 9. How to get frequency counts of unique items of a series?

In [107]:
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=20)))
ser

0     e
1     e
2     d
3     e
4     e
5     d
6     f
7     g
8     d
9     e
10    e
11    a
12    h
13    a
14    g
15    a
16    b
17    h
18    f
19    c
dtype: object

In [108]:
# Solution
ser.value_counts()

e    6
a    3
d    3
h    2
f    2
g    2
b    1
c    1
dtype: int64

In [109]:
ser.unique()

array(['e', 'd', 'f', 'g', 'a', 'h', 'b', 'c'], dtype=object)
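
When relative frequencies are more useful than raw counts, `value_counts` accepts `normalize=True`. A short sketch on a fixed series, so the numbers are reproducible:

```python
import pandas as pd

ser = pd.Series(list('aabbbc'))

counts = ser.value_counts()                 # absolute counts, most frequent first
shares = ser.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
```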

## 10. How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’?

In [110]:
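# Note: this builds a RandomState object but does not seed np.random,
# so the values drawn below vary between runs.
np.random.RandomState(100)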
ser = pd.Series(np.random.randint(1, 5, [12]))
ser

0     1
1     1
2     3
3     2
4     1
5     3
6     2
7     1
8     4
9     1
10    3
11    2
dtype: int32

In [None]:
# Solution
print("Top 2 Freq:", ser.value_counts().head(2))


In [80]:
ser.value_counts().index[:2]

Index([3, 1], dtype='object')

In [82]:
~ser.isin(ser.value_counts().index[:2])

0     False
1     False
2     False
3     False
4     False
5      True
6      True
7     False
8     False
9      True
10    False
11    False
dtype: bool

In [79]:
ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
ser

0         1
1         1
2         3
3         1
4         3
5     Other
6     Other
7         3
8         3
9     Other
10        3
11        3
dtype: object
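
The in-place assignment above overwrites `ser`, so re-running the cells gives different results. A non-mutating sketch using `Series.where`, on a fixed series chosen here (not from the original) so that the two most frequent values are unambiguous:

```python
import pandas as pd

# Fixed data: 1 occurs four times, 3 three times, 2 twice, 4 once.
ser = pd.Series([1, 1, 1, 1, 3, 3, 3, 2, 2, 4])

# value_counts sorts by frequency, so the first two index labels are the top 2.
top2 = ser.value_counts().index[:2]

# where keeps values where the condition holds and substitutes 'Other' elsewhere.
result = ser.where(ser.isin(top2), 'Other')
print(result.tolist())
```

Because `where` returns a new series, `ser` keeps its integer dtype and the cell can be re-run safely.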