# 101 Pandas Exercises for Data Analysis

Exercises taken from [Machine Learning +](https://www.machinelearningplus.com/python/101-pandas-exercises-python/)

1. How to import pandas and check the version?

In [1]:
import pandas as pd
print(pd.__version__)

1.3.3


2. How to create a series from a list, numpy array and dict?

Create a pandas series from each of the items below: a list, a numpy array and a dictionary.

`import numpy as np
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))`

In [2]:
import numpy as np

mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

In [3]:
ser1 = pd.Series(mylist)
ser1.head(5)

0    a
1    b
2    c
3    e
4    d
dtype: object

In [4]:
ser2 = pd.Series(myarr)
ser2.head()

0    0
1    1
2    2
3    3
4    4
dtype: int32

In [5]:
ser3 = pd.Series(mydict)
ser3.head()

a    0
b    1
c    2
e    3
d    4
dtype: int32

3. How to convert the index of a series into a column of a dataframe?

Convert the series `ser3` into a dataframe with its index as another column on the dataframe.

In [6]:
pd.DataFrame(ser3).reset_index().head()

  index  0
0     a  0
1     b  1
2     c  2
3     e  3
4     d  4
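The result above carries the default column names `index` and `0`. A minimal sketch of one way to name both columns in the same step, assuming the labels `letter` and `number` (hypothetical names, not part of the exercise) and a shortened stand-in for `ser3`:

```python
import numpy as np
import pandas as pd

# Small stand-in for ser3 so the snippet runs on its own.
ser3 = pd.Series(np.arange(5), index=list('abced'))

# rename_axis names the index; reset_index(name=...) names the value column.
df = ser3.rename_axis('letter').reset_index(name='number')
print(df)
```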


4. How to combine many series to form a dataframe?

Combine ser1 and ser2 to form a dataframe.

In [7]:
pd.DataFrame({'letters': ser1, 'numbers': ser2}).head()

  letters  numbers
0       a        0
1       b        1
2       c        2
3       e        3
4       d        4


5. How to assign a name to a series?

Give a name to the series `ser1` calling it ‘alphabets’.

In [8]:
ser1.name = 'alphabets'
ser1.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabets, dtype: object
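Setting `ser1.name` names the series itself, which is a separate attribute from the name of its index. A short sketch of the difference, using a three-item stand-in series and the made-up index name `position`:

```python
import pandas as pd

ser1 = pd.Series(list('abc'))

ser1.name = 'alphabets'       # name of the series (becomes the column name in a dataframe)
ser1.index.name = 'position'  # name of the index; 'position' is a made-up label

print(ser1.name)
print(ser1.index.name)
```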

6. How to get the items of series A not present in series B?

From ser1 remove items present in ser2.

In [9]:
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

ser1[~ser1.isin(ser2)]

0    1
1    2
2    3
dtype: int64

8. How to get the minimum, 25th percentile, median, 75th, and max of a numeric series?

In [10]:
np.random.seed(0)
ser = pd.Series(np.random.normal(10, 5, 25))

print("Min:", ser.min())
print("25th percentile:", ser.quantile(.25))
print("Median:", ser.median())
print("75th percentile:", ser.quantile(.75))
print("Max:", ser.max()) 

Min: -2.7649490791703926
25th percentile: 9.48390574103221
Median: 12.052992509691862
75th percentile: 14.893689920528697
Max: 21.348773119938038
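The five statistics can also be requested in a single call, since `Series.quantile` accepts a list of quantiles. A minimal sketch on the same seeded data (the variable name `summary` is introduced here for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
ser = pd.Series(np.random.normal(10, 5, 25))

# One call returns all five statistics, indexed by the requested quantile.
summary = ser.quantile([0, 0.25, 0.5, 0.75, 1])
print(summary)
```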


9. How to get frequency counts of unique items of a series?

Calculate the frequency counts of each unique value in ser.

In [11]:
np.random.seed(0)
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))

ser.value_counts()

h    5
a    5
f    4
d    4
b    4
e    3
g    3
c    2
dtype: int64

10. How to keep only the top 2 most frequent values as is and replace everything else with ‘Other’?

From ser, keep the top 2 most frequent items as is and replace everything else with ‘Other’.

In [12]:
np.random.seed(0)
ser = pd.Series(np.random.randint(1, 5, [12]))
ser

0     1
1     4
2     2
3     1
4     4
5     4
6     4
7     4
8     2
9     4
10    2
11    3
dtype: int32

In [13]:
keep = ser.value_counts().nlargest(2, keep='all').index
keep

Int64Index([4, 2], dtype='int64')

In [14]:
ser.where((ser == keep[0]) | (ser == keep[1]), other='Other')

0     Other
1         4
2         2
3     Other
4         4
5         4
6         4
7         4
8         2
9         4
10        2
11    Other
dtype: object
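Note that `nlargest(2, keep='all')` can return more than two labels when counts tie at the cutoff. The sketch below uses made-up data in which that happens and selects the values with `isin`, which works for any number of labels:

```python
import pandas as pd

# Made-up data in which 2 and 3 tie for second place.
ser = pd.Series([1, 2, 2, 3, 3, 4, 4, 4])

# keep='all' returns every label tied at the cutoff, so three labels here.
keep = ser.value_counts().nlargest(2, keep='all').index

# isin covers however many labels were kept.
result = ser.where(ser.isin(keep), other='Other')
print(result.tolist())
```

If exactly two labels are required regardless of ties, dropping `keep='all'` makes `nlargest` return only the first two.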

11. How to bin a numeric series to 10 groups of equal size?

Bin the series ser into deciles (10 bins holding an equal number of values) and replace each value with the name of its bin.

In [15]:
np.random.seed(0)
ser = pd.Series(np.random.random(20))
ser.head()

0    0.548814
1    0.715189
2    0.602763
3    0.544883
4    0.423655
dtype: float64

In [16]:
pd.qcut(ser, q=10, labels=['1st', '2nd', '3rd', '4th', '5th',
                           '6th', '7th', '8th', '9th', '10th']).head()

0    5th
1    7th
2    6th
3    4th
4    3rd
dtype: category
Categories (10, object): ['1st' < '2nd' < '3rd' < '4th' ... '7th' < '8th' < '9th' < '10th']
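When the bin number is enough, the list of label strings can be skipped. A minimal sketch on the same seeded data, using `labels=False` so that `pd.qcut` returns integer bin codes (the name `deciles` is introduced here):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
ser = pd.Series(np.random.random(20))

# labels=False returns the bin number (0 = lowest decile, 9 = highest).
deciles = pd.qcut(ser, q=10, labels=False)
print(deciles.head())
```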

12. How to convert a numpy array to a dataframe of given shape?

Reshape the series ser into a dataframe with 7 rows and 5 columns.

In [17]:
np.random.seed(0)
ser = pd.Series(np.random.randint(1, 10, 35))
ser.head()

0    6
1    1
2    4
3    4
4    8
dtype: int32

In [18]:
pd.DataFrame(ser.to_numpy().reshape((7,5)))

   0  1  2  3  4
0  6  1  4  4  8
1  4  6  3  5  8
2  7  9  9  2  7
3  8  8  9  2  6
4  9  5  4  1  4
5  6  1  3  4  9
6  2  4  4  4  8


13. How to find the positions of numbers that are multiples of 3 from a series?

Find the positions of numbers that are multiples of 3 from ser.

In [19]:
np.random.seed(0)
ser = pd.Series(np.random.randint(1, 10, 7))
ser

0    6
1    1
2    4
3    4
4    8
5    4
6    6
dtype: int32

In [20]:
ser[ser % 3 == 0].index

Int64Index([0, 6], dtype='int64')
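The `.index` approach returns index labels, which coincide with positions only because ser has the default 0..n-1 index. A sketch of the difference, assuming a made-up letter index on the same values and using `np.flatnonzero` to get integer positions:

```python
import numpy as np
import pandas as pd

# Same values as above, but with letter labels (made up for illustration).
ser = pd.Series([6, 1, 4, 4, 8, 4, 6], index=list('abcdefg'))

mask = (ser % 3 == 0).to_numpy()

labels = ser[mask].index            # index labels of the multiples of 3
positions = np.flatnonzero(mask)    # integer positions of the multiples of 3
print(list(labels), positions.tolist())
```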

14. How to extract items at given positions from a series?

From ser, extract the items at positions in list pos.

In [21]:
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

ser.iloc[pos]

0     a
4     e
8     i
14    o
20    u
dtype: object

15. How to stack two series vertically and horizontally?

Stack ser1 and ser2 vertically and horizontally (to form a dataframe).

In [22]:
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

In [23]:
# Horizontal
pd.concat([ser1, ser2], axis = 1)

   0  1
0  0  a
1  1  b
2  2  c
3  3  d
4  4  e


In [24]:
# Vertical
pd.concat([ser1, ser2])

0    0
1    1
2    2
3    3
4    4
0    a
1    b
2    c
3    d
4    e
dtype: object
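The vertical stack keeps each series' original labels, so the labels 0 to 4 appear twice. If unique labels are wanted, one option is `ignore_index=True`, sketched here (the name `stacked` is introduced for illustration):

```python
import pandas as pd

ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

# ignore_index=True discards the original labels and renumbers from 0.
stacked = pd.concat([ser1, ser2], ignore_index=True)
print(stacked.index.tolist())
```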

16. How to get the positions of items of series A in another series B?

Get the positions of items of ser2 in ser1 as a list.

In [25]:
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

In [26]:
# Not used: this returns the positions in ser1's order ([0, 4, 5, 8]),
# so they do not line up with the order of the values in ser2
# ser1[ser1.isin(ser2)].index.to_list()

In [27]:
[pd.Index(ser1).get_loc(i) for i in ser2]

[5, 4, 0, 8]
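The same lookup can be done without a Python loop. A minimal sketch using `Index.get_indexer`, assuming the values of ser1 are unique (the method raises an error on a non-unique index):

```python
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

# get_indexer looks up all values in one call; a value missing from ser1 would yield -1.
positions = pd.Index(ser1).get_indexer(ser2).tolist()
print(positions)
```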

17. How to compute the mean squared error on a truth and predicted series?

Compute the mean squared error of truth and pred series.

In [28]:
np.random.seed(0)
truth = pd.Series(range(10))
pred = pd.Series(range(10)) + np.random.random(10)

print(truth)
print('\n')
print(pred)

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64


0    0.548814
1    1.715189
2    2.602763
3    3.544883
4    4.423655
5    5.645894
6    6.437587
7    7.891773
8    8.963663
9    9.383442
dtype: float64


In [29]:
np.mean((truth - pred)**2)

0.41319910235287544
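For reference, the expression above matches the usual definition of mean squared error. Writing $y_i$ for the values of `truth`, $\hat{y}_i$ for the values of `pred`, and $n$ for their common length (symbols introduced here for illustration):

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$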