Welcome back, guys! This is part 2 of our series of Pandas exercises. I am very excited about this post because we will be introducing the DataFrame, the most used Pandas data structure. I hope you enjoy it.

Without further ado, let's get started.

We will start by importing Pandas and NumPy.

In [4]:
import pandas as pd
import numpy as np

### Ex 26: How to get the mean of a series grouped by another series?

Q: Compute the mean of weights of each fruit.

In [36]:
fruits = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))
weights = pd.Series(np.linspace(1, 10, 10))

#### Desired output

In [42]:
# Keep in mind that your values will differ from mine, and the random draw might select only 2 of the 3 fruits.

![Pandas_ex26](/blog/assets/post_cont_image/pandas_ex26.png)

#### Solution

In [39]:
fruits_weights = pd.concat({"fruits":fruits,"weights":weights},axis=1)

In [40]:
fruits_weights.groupby(by="fruits").mean()

| fruits | weights |
|--------|---------|
| apple  | 5.4     |
| banana | 6.5     |
| carrot | 2.0     |


We concatenate the two series horizontally (by setting axis=1) into a DataFrame using the concat function, then group that DataFrame by the name of the fruit. After grouping, we get the mean weight of each fruit using the mean function.
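As an aside, the same grouping can be done without building a DataFrame first, because a series can be grouped directly by another series that shares its index. The snippet below is a minimal sketch of that alternative, not the solution used above; the seeded generator `rng` and the name `mean_by_fruit` are my own choices so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, only so that reruns give the same fruits)
rng = np.random.default_rng(0)
fruits = pd.Series(rng.choice(["apple", "banana", "carrot"], 10))
weights = pd.Series(np.linspace(1, 10, 10))

# Group the weights by the labels in fruits; both series share the default index
mean_by_fruit = weights.groupby(fruits).mean()
print(mean_by_fruit)
```

Both approaches should give the same means; the concat version is handy when you want to keep the combined DataFrame around for later steps.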

### Ex 27: How to compute the Euclidean distance between two series?

Q: Compute the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) between series (points) p and q, once using a packaged function and once without.

Euclidean distance formula:

![Pandas_ex27](/blog/assets/post_cont_image/pandas_ex27.png)

In [43]:
p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

#### Desired output

In [44]:
# 18.165

#### Solution

#### 1st Method using a built-in function

In [45]:
np.linalg.norm(p-q)

18.16590212458495

We can get the Euclidean distance by calling NumPy's linalg.norm function and passing in the difference between the two series.

#### 2nd Method without using a built-in function

In [49]:
sum((p - q)**2)**.5

18.16590212458495

Using the Euclidean formula provided, we can compute the distance with plain operators. We first subtract the corresponding elements of the two series, square each difference, sum the squares, and finally take the square root.
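To see that the two methods agree, here is a small sketch that spells out each step of the formula and compares it with linalg.norm. The intermediate names (`diff`, `dist_norm`, `dist_manual`) are mine and only there for readability.

```python
import numpy as np
import pandas as pd

p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

# Element-wise difference between the two points
diff = p - q

# Method 1: the L2 norm of the difference vector
dist_norm = np.linalg.norm(diff.to_numpy())

# Method 2: square, sum, then take the square root
dist_manual = np.sqrt((diff ** 2).sum())

print(dist_norm, dist_manual)
```

The squared differences add up to 330, so both values are the square root of 330, roughly 18.166.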

### Ex 28: How to find all the local maxima (or peaks) in a numeric series?

Q: Get the positions of peaks (values surrounded by smaller values on both sides) in ser.

In [50]:
ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])

#### Desired output

In [52]:
# array([1, 5, 7])

#### Solution

In [53]:
from scipy.signal import argrelextrema

argrelextrema(ser.values, np.greater)

(array([1, 5, 7]),)

To calculate the relative extrema of the series, we use the argrelextrema function from SciPy (Scientific Python), a Python library built on top of NumPy and used for mathematics, science, and engineering.

We pass the function the values of the series and a comparator. Since we are looking for the maxima, the comparator in this case is np.greater.
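If SciPy is not available, the peaks can also be found with NumPy alone. The following is a rough sketch of one common trick, not the method used above: take the sign of consecutive differences and look for places where it flips from rising (+1) to falling (-1). It assumes strict peaks with no flat plateaus, and the names `signs` and `peaks` are mine.

```python
import numpy as np
import pandas as pd

ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])

# +1 where the series rises, -1 where it falls
signs = np.sign(np.diff(ser.to_numpy()))

# A change of -2 means a rise immediately followed by a fall, i.e. a peak.
# The +1 shifts the position from the difference array back to the series.
peaks = np.where(np.diff(signs) == -2)[0] + 1
print(peaks)  # [1 5 7]
```

Like argrelextrema with its default settings, this never reports the first or last element as a peak, since those have only one neighbour.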

### Ex 29: How to replace the spaces in a string with the least frequent character?

Q: Replace the spaces in ser with the least frequent character.

In [114]:
ser = pd.Series(list('dbc deb abed gagde'))

#### Desired output

In [111]:
# least frequent element is c

# ['d',
#  'b',
#  'c',
#  'c',
#  'd',
#  'e',
#  'b',
#  'c',
#  'a',
#  'b',
#  'e',
#  'd',
#  'c',
#  'g',
#  'a',
#  'g',
#  'd',
#  'e']

#### Solution

In [101]:
from collections import Counter

least_common_char = Counter(ser.replace(" ","")).most_common()[-1][0]

In [116]:
Counter(ser.replace(" ","")).most_common()

[('d', 4), ('b', 3), ('', 3), ('e', 3), ('a', 2), ('g', 2), ('c', 1)]

In [102]:
least_common_char

'c'

In [109]:
ser.replace(" ",least_common_char).tolist()

['d',
 'b',
 'c',
 'c',
 'd',
 'e',
 'b',
 'c',
 'a',
 'b',
 'e',
 'd',
 'c',
 'g',
 'a',
 'g',
 'd',
 'e']

To replace the white space with the least common character in the series, we first need to find that character.

To find it, we use the Counter class from the collections module. We first replace each " " value with "" (the empty string still shows up in the counts, which is harmless here because it is not the rarest entry), then call most_common on the resulting Counter. This returns a list of tuples with all characters and their counts in decreasing order. We use -1 to target the last tuple and 0 to get the character in that tuple.

Now that we have the least common character, we can replace every instance of white space with it.
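A slightly more defensive variant is to drop the spaces before counting, so that neither " " nor "" can ever be picked as the least frequent character. This is a sketch under that assumption rather than the solution shown above; `non_space` and `result` are names I introduced for illustration.

```python
from collections import Counter

import pandas as pd

ser = pd.Series(list("dbc deb abed gagde"))

# Keep only the real characters so spaces never enter the count
non_space = ser[ser != " "]

# most_common() sorts by count in decreasing order; the last tuple is the rarest
least_common_char = Counter(non_space).most_common()[-1][0]

# Replace every space in the original series with that character
result = ser.replace(" ", least_common_char)
print(least_common_char)
print("".join(result))
```

For this input the rarest character is c, which appears only once, so the three spaces all become c.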

### Ex 30: How to create a TimeSeries starting ‘2000-01-01’ and 10 weekends (Saturdays) and have random numbers as values?

Q: Create a time series of 10 consecutive weekends (Saturdays) starting from ‘2000-01-01’, with random numbers as values.

In [120]:
ser = pd.Series(["2000-01-01"])
ser

0    2000-01-01
dtype: object

#### Desired output

In [None]:
# values can be random
2000-01-01    4
2000-01-08    1
2000-01-15    8
2000-01-22    4
2000-01-29    4
2000-02-05    2
2000-02-12    4
2000-02-19    9
2000-02-26    6
2000-03-04    6

#### Solution

In [134]:
pd.Series(np.random.randint(1,high=10,size=10),pd.date_range("2000-01-01",periods=10,freq="W-SAT"))

2000-01-01    1
2000-01-08    9
2000-01-15    1
2000-01-22    3
2000-01-29    8
2000-02-05    7
2000-02-12    3
2000-02-19    8
2000-02-26    4
2000-03-04    7
Freq: W-SAT, dtype: int64
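The solution relies on the date_range function with freq="W-SAT", which generates weekly dates anchored on Saturdays. Since 2000-01-01 is itself a Saturday, it becomes the first entry, and periods=10 gives the ten dates we need. Below is a minimal sketch that builds the same kind of series and checks that every date really is a Saturday; the seed 42 and the names `dates` and `ts` are my own choices, so the values will differ from the ones above.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, only for reproducibility)
rng = np.random.default_rng(42)

# Ten weekly dates anchored on Saturday, starting from 2000-01-01
dates = pd.date_range("2000-01-01", periods=10, freq="W-SAT")

# Random integers from 1 to 9 as values, indexed by the dates
ts = pd.Series(rng.integers(1, 10, size=10), index=dates)

# dayofweek counts Monday as 0, so Saturday is 5
print((ts.index.dayofweek == 5).all())
print(ts)
```

Swapping the anchor, for example to freq="W-SUN", would give Sundays instead.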