# 04 Using Pandas

### Applying functions to a DataFrame

### Overview
<span>
    <table>
        <tr><td>Build a dataset</td></tr>
        <tr><td>Call functions on a DataFrame</td></tr>
        <tr><td>Using Apply and Lambdas</td></tr>
    </table>
</span>

## Setup

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

# turn on data table rendering
pd.set_option('display.notebook_repr_html', True)


## Build a DataFrame from a toy dataset

In [2]:
# Constructing a beer sales DataFrame
df = pd.DataFrame({'Billy Beer': [13884, 23008, 17883, 24435, 49938],
                   'Lucky Lager': [34565, 83938, 59437, 28843, 48285],
                   'Triple Bock': [39987, 35512, 23542, 37729, 36647]})
df

|   | Billy Beer | Lucky Lager | Triple Bock |
|---|---|---|---|
| 0 | 13884 | 34565 | 39987 |
| 1 | 23008 | 83938 | 35512 |
| 2 | 17883 | 59437 | 23542 |
| 3 | 24435 | 28843 | 37729 |
| 4 | 49938 | 48285 | 36647 |


In [3]:
# Quick insights / descriptive statistics
df.describe()

|   | Billy Beer | Lucky Lager | Triple Bock |
|---|---|---|---|
| count | 5.0 | 5.0 | 5.0 |
| mean | 25829.6 | 51013.6 | 34683.4 |
| std | 14115.302841 | 21934.601587 | 6443.542294 |
| min | 13884.0 | 28843.0 | 23542.0 |
| 25% | 17883.0 | 34565.0 | 35512.0 |
| 50% | 23008.0 | 48285.0 | 36647.0 |
| 75% | 24435.0 | 59437.0 | 37729.0 |
| max | 49938.0 | 83938.0 | 39987.0 |


## Call functions on a DataFrame

In [4]:
# Computing the mean sales for each brand
df.mean()

Billy Beer     25829.6
Lucky Lager    51013.6
Triple Bock    34683.4
dtype: float64

In [5]:
# Calculate the 75th percentile (third quartile)
df.quantile(q=.75)

Billy Beer     24435.0
Lucky Lager    59437.0
Triple Bock    37729.0
Name: 0.75, dtype: float64
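
`quantile` also accepts a list of probabilities, in which case it returns one row per requested quantile. A small sketch on the same toy data, rebuilt here so the block runs on its own (the variable name `quartiles` is illustrative and not part of the original notebook):

```python
import pandas as pd

# Same toy beer sales data as above
df = pd.DataFrame({'Billy Beer': [13884, 23008, 17883, 24435, 49938],
                   'Lucky Lager': [34565, 83938, 59437, 28843, 48285],
                   'Triple Bock': [39987, 35512, 23542, 37729, 36647]})

# One row per requested quantile, one column per brand
quartiles = df.quantile(q=[.25, .5, .75])
print(quartiles)

# The 0.5 row holds the median of each column
print(quartiles.loc[0.5])
```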

In [6]:
# Calculate the sample standard deviation
df.std()

Billy Beer     14115.302841
Lucky Lager    21934.601587
Triple Bock     6443.542294
dtype: float64

In [7]:
# Calculate the population standard deviation
df.std(ddof=0)

Billy Beer     12625.110670
Lucky Lager    19618.904084
Triple Bock     5763.279434
dtype: float64
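
The two results differ only in the divisor: `ddof=1` divides the sum of squared deviations by n - 1, while `ddof=0` divides by n. A minimal sketch of that relationship on the same toy data (the names `n`, `sample_std`, `population_std` and `rescaled` are illustrative):

```python
import numpy as np
import pandas as pd

# Same toy beer sales data as above
df = pd.DataFrame({'Billy Beer': [13884, 23008, 17883, 24435, 49938],
                   'Lucky Lager': [34565, 83938, 59437, 28843, 48285],
                   'Triple Bock': [39987, 35512, 23542, 37729, 36647]})

n = len(df)                        # number of observations per column
sample_std = df.std()              # pandas defaults to ddof=1
population_std = df.std(ddof=0)    # divide by n instead of n - 1

# Rescaling the sample value by sqrt((n - 1) / n) gives the population value
rescaled = sample_std * np.sqrt((n - 1) / n)
print(np.allclose(rescaled, population_std))
```

Note that NumPy's own `np.std` defaults to `ddof=0`, so it matches `df.std(ddof=0)` rather than `df.std()` when called on a plain array.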

## Using Apply and Lambda expressions

In [8]:
# Same as calling .mean() on the DataFrame
df.apply(np.mean)

Billy Beer     25829.6
Lucky Lager    51013.6
Triple Bock    34683.4
dtype: float64

In [9]:
# Same as df.apply(np.mean) in the pandas version used here;
# newer pandas versions may return a single overall mean instead
np.mean(df)

Billy Beer     25829.6
Lucky Lager    51013.6
Triple Bock    34683.4
dtype: float64
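
`apply` works column by column by default. Passing `axis` explicitly makes the direction clear and does not depend on how a particular pandas version treats `np.mean(df)`. A short sketch on the same toy data (the names `column_means` and `row_totals` are illustrative):

```python
import numpy as np
import pandas as pd

# Same toy beer sales data as above
df = pd.DataFrame({'Billy Beer': [13884, 23008, 17883, 24435, 49938],
                   'Lucky Lager': [34565, 83938, 59437, 28843, 48285],
                   'Triple Bock': [39987, 35512, 23542, 37729, 36647]})

# axis=0 (the default): the function receives one column at a time
column_means = df.apply(np.mean, axis=0)

# axis=1: the function receives one row at a time
row_totals = df.apply(np.sum, axis=1)

print(column_means)
print(row_totals)
```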

In [10]:
# Specify a function to apply to the DataFrame
def zscore(series):
    result = (series - series.mean()) / series.std()
    return result

# Apply the zscore function to each column of the DataFrame
df.apply(zscore)

|   | Billy Beer | Lucky Lager | Triple Bock |
|---|---|---|---|
| 0 | -0.846287 | -0.749893 | 0.823088 |
| 1 | -0.199897 | 1.501026 | 0.128594 |
| 2 | -0.562978 | 0.384023 | -1.72908 |
| 3 | -0.098801 | -1.010759 | 0.472659 |
| 4 | 1.707962 | -0.124397 | 0.304739 |
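
A quick sanity check on the standardized values: each column should end up with a mean of about 0 and a sample standard deviation of 1. The sketch below also shows a vectorized form that gives the same values without `apply` (the names `z` and `z_vectorized` are illustrative):

```python
import numpy as np
import pandas as pd

# Same toy beer sales data as above
df = pd.DataFrame({'Billy Beer': [13884, 23008, 17883, 24435, 49938],
                   'Lucky Lager': [34565, 83938, 59437, 28843, 48285],
                   'Triple Bock': [39987, 35512, 23542, 37729, 36647]})

def zscore(series):
    # Same definition as above: uses the sample standard deviation (ddof=1)
    return (series - series.mean()) / series.std()

z = df.apply(zscore)

# Vectorized equivalent: pandas aligns the per-column means and
# standard deviations with the DataFrame's column labels
z_vectorized = (df - df.mean()) / df.std()

print(np.allclose(z, z_vectorized))   # both approaches agree
print(np.allclose(z.mean(), 0))       # column means are ~0
print(np.allclose(z.std(), 1))        # column sample std devs are 1
```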


In [11]:
# The same values as scipy.stats.zscore with
# delta degrees of freedom (ddof) set to 1
stats.zscore(df, ddof=1)

|   | Billy Beer | Lucky Lager | Triple Bock |
|---|---|---|---|
| 0 | -0.846287 | -0.749893 | 0.823088 |
| 1 | -0.199897 | 1.501026 | 0.128594 |
| 2 | -0.562978 | 0.384023 | -1.72908 |
| 3 | -0.098801 | -1.010759 | 0.472659 |
| 4 | 1.707962 | -0.124397 | 0.304739 |
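
The `ddof=1` argument matters because `stats.zscore` defaults to `ddof=0`, the population standard deviation, whereas pandas' `std()` defaults to `ddof=1`. A sketch comparing the two on the underlying NumPy array, assuming SciPy is installed (the names `values`, `z_population` and `z_sample` are illustrative):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same toy beer sales data as above
df = pd.DataFrame({'Billy Beer': [13884, 23008, 17883, 24435, 49938],
                   'Lucky Lager': [34565, 83938, 59437, 28843, 48285],
                   'Triple Bock': [39987, 35512, 23542, 37729, 36647]})

values = df.to_numpy()

# SciPy default: population standard deviation (ddof=0)
z_population = stats.zscore(values, axis=0)

# Matches the hand-written zscore function, which uses ddof=1
z_sample = stats.zscore(values, axis=0, ddof=1)

expected_population = ((df - df.mean()) / df.std(ddof=0)).to_numpy()
expected_sample = ((df - df.mean()) / df.std(ddof=1)).to_numpy()

print(np.allclose(z_population, expected_population))
print(np.allclose(z_sample, expected_sample))
```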


In [13]:
# Calculate the interquartile range with a lambda expression
inter_quartile_range = df.apply(lambda x: x.quantile(q=.75) - x.quantile(q=.25))
inter_quartile_range

Billy Beer      6552.0
Lucky Lager    24872.0
Triple Bock     2217.0
dtype: float64
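
Because `quantile` already works column by column, the same interquartile range can be computed without `apply` by subtracting two quantile results. A minimal sketch comparing both forms (the names `iqr_lambda` and `iqr_direct` are illustrative):

```python
import numpy as np
import pandas as pd

# Same toy beer sales data as above
df = pd.DataFrame({'Billy Beer': [13884, 23008, 17883, 24435, 49938],
                   'Lucky Lager': [34565, 83938, 59437, 28843, 48285],
                   'Triple Bock': [39987, 35512, 23542, 37729, 36647]})

# Lambda form: the function is called once per column
iqr_lambda = df.apply(lambda x: x.quantile(q=.75) - x.quantile(q=.25))

# Direct form: subtract two Series of per-column quantiles
iqr_direct = df.quantile(q=.75) - df.quantile(q=.25)

print(iqr_direct)
print(np.allclose(iqr_lambda, iqr_direct))
```

The lambda form is still useful when the per-column calculation has no built-in DataFrame method.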

### Done!

#### Next: _Matplotlib_