# 101 Pandas Exercises for Data Analysis

## Index
#### 11. How to bin a numeric series to 10 groups of equal size?
#### 12. How to convert a numpy array to a dataframe of given shape? (L1)
#### 13. How to find the positions of numbers that are multiples of 3 from a series?
#### 14. How to extract items at given positions from a series?
#### 15. How to stack two series vertically and horizontally?
#### 16. How to get the positions of items of series B in another series A?
#### 17. How to compute the mean squared error on a truth and predicted series?
#### 18. How to convert the first character of each element in a series to uppercase?
#### 19. How to calculate the number of characters in each word in a series?
#### 20. How to compute difference of differences between consecutive numbers of a series?


## 11. How to bin a numeric series to 10 groups of equal size?

In [2]:
import pandas as pd
import numpy as np
ser = pd.Series(np.random.random(20))
ser

0     0.408811
1     0.485231
2     0.385842
3     0.877897
4     0.188152
5     0.263332
6     0.702705
7     0.836753
8     0.905422
9     0.901401
10    0.823866
11    0.128312
12    0.193088
13    0.019329
14    0.471292
15    0.768870
16    0.777923
17    0.773290
18    0.646396
19    0.468905
dtype: float64

In [4]:
# Solution
pd.qcut(ser, q=[0, .10, .20, .3, .4, .5, .6, .7, .8, .9, 1], 
        labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th'])

0      4th
1      5th
2      3rd
3      9th
4      2nd
5      3rd
6      6th
7      9th
8     10th
9     10th
10     8th
11     1st
12     2nd
13     1st
14     5th
15     7th
16     8th
17     7th
18     6th
19     4th
dtype: category
Categories (10, object): [1st < 2nd < 3rd < 4th ... 7th < 8th < 9th < 10th]
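
As a rough cross-check of the "equal size" claim, the sketch below passes `q=10` instead of the explicit quantile list and counts the members of each bin. The seeded generator `rng` and the names `deciles` and `counts` are assumptions made for this illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed, used only so the run is repeatable
ser = pd.Series(rng.random(20))

# q=10 requests deciles; labels=False returns integer bin codes 0..9
deciles = pd.qcut(ser, q=10, labels=False)

# 20 distinct values split into 10 quantile bins -> 2 values per bin
counts = deciles.value_counts().sort_index()
print(counts.tolist())
```

With repeated values the bin edges can coincide, in which case `pd.qcut` raises unless `duplicates='drop'` is passed.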

## 12. How to convert a numpy array to a dataframe of given shape? (L1)

In [5]:
# Reshape the series ser into a dataframe with 7 rows and 5 columns

ser = pd.Series(np.random.randint(1, 10, 35))
ser

0     1
1     5
2     2
3     6
4     3
5     6
6     5
7     9
8     8
9     6
10    3
11    6
12    2
13    9
14    2
15    8
16    4
17    1
18    2
19    4
20    8
21    8
22    9
23    5
24    4
25    2
26    4
27    6
28    6
29    9
30    8
31    9
32    6
33    3
34    4
dtype: int32

In [6]:
# Solution
df = pd.DataFrame(ser.values.reshape(7,5))
print(df)

   0  1  2  3  4
0  1  5  2  6  3
1  6  5  9  8  6
2  3  6  2  9  2
3  8  4  1  2  4
4  8  8  9  5  4
5  2  4  6  6  9
6  8  9  6  3  4
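
A small variation, assuming the same 35-element layout: `-1` lets NumPy infer one dimension, and `order="F"` fills column by column instead of row by row. The names `df_rows` and `df_cols` are made up for this sketch.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(35))

# -1 asks NumPy to infer the row count: 35 values / 5 columns = 7 rows
df_rows = pd.DataFrame(ser.to_numpy().reshape(-1, 5))

# order="F" fills the target shape column by column
df_cols = pd.DataFrame(ser.to_numpy().reshape((7, 5), order="F"))

print(df_rows.shape)              # (7, 5)
print(df_rows.iloc[0].tolist())   # [0, 1, 2, 3, 4]
print(df_cols.iloc[0].tolist())   # [0, 7, 14, 21, 28]
```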


## 13. How to find the positions of numbers that are multiples of 3 from a series?

In [7]:
ser = pd.Series(np.random.randint(1, 10, 7))
ser

0    2
1    9
2    7
3    1
4    6
5    4
6    3
dtype: int32

In [9]:
# Solution

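# Convert the boolean mask to a NumPy array first; recent pandas versions
# can raise an error when np.argwhere is called on a Series directly
np.argwhere((ser % 3 == 0).to_numpy())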

array([[1],
       [4],
       [6]], dtype=int64)
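
If a flat list of positions is more convenient than a column vector, `np.flatnonzero` is one possible alternative. This is a sketch on the same values as above; `mask` and `positions` are names chosen for the example.

```python
import numpy as np
import pandas as pd

ser = pd.Series([2, 9, 7, 1, 6, 4, 3])

# Boolean mask as a plain NumPy array
mask = (ser % 3 == 0).to_numpy()

# flatnonzero returns the positions of True values as a 1-D array
positions = np.flatnonzero(mask)
print(positions.tolist())        # [1, 4, 6]

# The matching values, selected by position
print(ser.iloc[positions].tolist())   # [9, 6, 3]
```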

## 14. How to extract items at given positions from a series?

In [10]:
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

In [11]:
# Solution
ser.take(pos)

0     a
4     e
8     i
14    o
20    u
dtype: object
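
`take` works on integer positions, not index labels. The sketch below uses a deliberately non-default index (an assumption for illustration) to show that `take` and `iloc` agree, while the labels carried along differ from the positions.

```python
import pandas as pd

# Index labels 10..50 differ from positions 0..4 on purpose
ser = pd.Series(list("abcde"), index=[10, 20, 30, 40, 50])
pos = [0, 2, 4]

by_take = ser.take(pos)   # positional selection
by_iloc = ser.iloc[pos]   # also positional

print(by_take.tolist())          # ['a', 'c', 'e']
print(by_take.index.tolist())    # [10, 30, 50]
```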

## 15. How to stack two series vertically and horizontally (to form a dataframe)?

In [13]:
ser1 = pd.Series(range(5))
ser1

0    0
1    1
2    2
3    3
4    4
dtype: int64

In [14]:
ser2 = pd.Series(list('abcde'))
ser2

0    a
1    b
2    c
3    d
4    e
dtype: object

In [15]:
# Solution
# Vertical (Series.append was removed in pandas 2.0; pd.concat replaces it)
pd.concat([ser1, ser2])


0    0
1    1
2    2
3    3
4    4
0    a
1    b
2    c
3    d
4    e
dtype: object

In [18]:
pd.concat([ser1, ser2]).reset_index(drop=True)

0    0
1    1
2    2
3    3
4    4
5    a
6    b
7    c
8    d
9    e
dtype: object

In [16]:
# Horizontal
df = pd.concat([ser1, ser2], axis=1)
print(df)

   0  1
0  0  a
1  1  b
2  2  c
3  3  d
4  4  e
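
Two optional arguments of `pd.concat` may be handy here: `ignore_index=True` renumbers the vertical result in one step, and `keys` names the columns of the horizontal result. The column names `num` and `letter` are invented for this sketch.

```python
import pandas as pd

ser1 = pd.Series(range(5))
ser2 = pd.Series(list("abcde"))

# Vertical stack with a fresh 0..9 index
stacked = pd.concat([ser1, ser2], ignore_index=True)
print(stacked.index.tolist())

# Horizontal stack with explicit column names
df = pd.concat([ser1, ser2], axis=1, keys=["num", "letter"])
print(df.columns.tolist())   # ['num', 'letter']
```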


## 16. How to get the positions of items of series B in another series A?

In [19]:
# Get the positions of items of ser2 in ser1 as a list.

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

In [20]:
# Solution 1
[np.where(i == ser1)[0].tolist()[0] for i in ser2]


[5, 4, 0, 8]

In [21]:
# Solution 2
[pd.Index(ser1).get_loc(i) for i in ser2]

[5, 4, 0, 8]
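
A vectorized option, sketched under the assumption that `ser1` has no duplicate values: `Index.get_indexer` looks up all targets at once and returns `-1` for values that are absent. The value `99` is added here only to show that behaviour.

```python
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13, 99])   # 99 is not present in ser1

# One call for all lookups; requires unique values in ser1
positions = pd.Index(ser1).get_indexer(ser2.to_numpy())
print(positions.tolist())   # [5, 4, 0, 8, -1]
```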

## 17. How to compute the mean squared error on a truth and predicted series?

In [28]:
truth = pd.Series(range(10))
pred = pd.Series(range(10)) + np.random.random(10)
pred

0    0.753362
1    1.200920
2    2.745821
3    3.494867
4    4.301511
5    5.829115
6    6.880006
7    7.840176
8    8.798994
9    9.367325
dtype: float64

In [29]:
# Solution
np.mean((truth-pred)**2)


0.4441029340142345
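
Because `pred` above is random, the result changes on every run. A small hand-checkable case may help confirm the formula; the four values below are made up for the illustration.

```python
import pandas as pd

truth = pd.Series([0.0, 1.0, 2.0, 3.0])
pred = pd.Series([0.5, 1.5, 2.0, 2.0])

# Errors: -0.5, -0.5, 0.0, 1.0 -> squares: 0.25, 0.25, 0.0, 1.0
mse = ((truth - pred) ** 2).mean()
print(mse)   # 0.375

# Root mean squared error, in the same units as the data
rmse = mse ** 0.5
```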

## 18. How to convert the first character of each element in a series to uppercase?

In [30]:
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

In [33]:
# Solution 1
ser.map(lambda x: x.title())

0     How
1      To
2    Kick
3    Ass?
dtype: object

In [34]:
# Solution 2
ser.map(lambda x: x[0].upper() + x[1:])

0     How
1      To
2    Kick
3    Ass?
dtype: object

In [36]:
# Solution 3
pd.Series([i.title() for i in ser])

0     How
1      To
2    Kick
3    Ass?
dtype: object

In [35]:
# Solution 4
ser.str.title()

0     How
1      To
2    Kick
3    Ass?
dtype: object
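
The solutions above agree on single lowercase words, but they are not equivalent in general: `title` capitalizes every word and lowercases the rest, whereas slicing touches only the first character. A short comparison on two made-up strings:

```python
import pandas as pd

ser = pd.Series(["hello world", "iPhone"])

titled = ser.str.title()             # every word capitalized, rest lowercased
capitalized = ser.str.capitalize()   # first character upper, rest lowercased
first_only = ser.str[:1].str.upper() + ser.str[1:]   # only the first character changes

print(titled.tolist())        # ['Hello World', 'Iphone']
print(capitalized.tolist())   # ['Hello world', 'Iphone']
print(first_only.tolist())    # ['Hello world', 'IPhone']
```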

## 19. How to calculate the number of characters in each word in a series?

In [37]:
ser = pd.Series(['how', 'to', 'kick', 'ass?'])
ser

0     how
1      to
2    kick
3    ass?
dtype: object

In [38]:
# Solution
ser.map(len)

0    3
1    2
2    4
3    4
dtype: int64

In [39]:
# Alternative: vectorized string method
ser.str.len()

0    3
1    2
2    4
3    4
dtype: int64
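
One practical difference between the two approaches shows up with missing data: `str.len` propagates the missing value, while calling `len` on `None` fails with a `TypeError`. A minimal sketch with one missing entry added for illustration:

```python
import pandas as pd

ser = pd.Series(["how", None, "kick"])

# The missing entry stays missing instead of raising an error
lengths = ser.str.len()
print(lengths.isna().tolist())     # [False, True, False]
print(lengths.dropna().tolist())
```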

## 20. How to compute difference of differences between consecutive numbers of a series?

In [40]:
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])
ser

0     1
1     3
2     6
3    10
4    15
5    21
6    27
7    35
dtype: int64

In [41]:
# Solution
print(ser.diff().tolist())


[nan, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 8.0]


In [42]:
print(ser.diff().diff().tolist())

[nan, nan, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0]
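
`Series.diff` pads with `NaN`, which also turns the integers into floats. If the padding is not wanted, `np.diff` with `n=2` is one possible alternative that returns only the defined values; `second` is a name chosen for this sketch.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

# n=2 applies the difference twice; the result is two elements shorter
second = np.diff(ser.to_numpy(), n=2)
print(second.tolist())   # [1, 1, 1, 1, 0, 2]
```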
