# Summary Functions and Maps

In [10]:
import pandas as pd

wines_df = pd.read_csv("./wines.csv")
# Strip stray whitespace from the column names before using them.
wines_df.columns = wines_df.columns.str.strip()
wines_df.head()

## Summary Functions
pandas provides many summary functions. The most common include:
- `mean()` - Returns the mean of each column
- `corr()` - Returns the correlation between columns in a DataFrame
- `count()` - Returns the number of non-null values in each DataFrame column
- `max()` - Returns the highest value in each column
- `min()` - Returns the lowest value in each column
- `median()` - Returns the median of each column
- `std()` - Returns the standard deviation of each column
- `var()` - Returns the variance of each column
- `sum()` - Returns the sum of all the values in each column
- `describe()` - Returns a statistical summary for each column
- `value_counts()` - Returns the number of times each value occurs in a column

... and more.
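
As a rough illustration of a few of the functions listed above, the sketch below builds a small stand-in DataFrame (`sample_df` and its values are assumptions for illustration, not the real `wines.csv`) and calls several summaries on it. Note that the numeric summaries skip missing values by default.

```python
import pandas as pd

# Hypothetical stand-in for the wines data; values are illustrative only.
sample_df = pd.DataFrame(
    {
        "country": ["USA", "Canada", "Brazil", None, "USA"],
        "price": [55.0, None, 33.0, 73.0, 37.0],
    }
)

# Numeric summaries skip missing values (NaN) by default.
mean_price = sample_df["price"].mean()        # average of the non-null prices
non_null_prices = sample_df["price"].count()  # number of non-null prices
max_price = sample_df["price"].max()          # highest non-null price

# value_counts() tallies each distinct value; missing values are dropped by default.
country_counts = sample_df["country"].value_counts()

# describe() bundles count, mean, std, min, quartiles, and max for a numeric column.
price_summary = sample_df["price"].describe()

print(mean_price, non_null_prices, max_price)
print(country_counts)
print(price_summary)
```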

In [11]:
wines_df.price = wines_df.price.astype(float)
wines_df.price.mean()

49.5

## Maps

There are several forms of the map function in pandas. They are:
- `apply()` - Calls a function on the values of a Series, or on the rows or columns of a DataFrame, and returns the results.
  - On a Series, `apply()` calls the function once per value, one cell at a time.
  - On a DataFrame, `apply()` calls the function once per column by default (`axis='index'`), or once per row with `axis='columns'`. The function receives a whole row or column as a Series, not a single cell.
- `map()` - Substitutes each value in a Series with another value, using a function or a dictionary.
- `applymap()` - Substitutes each value in a DataFrame with another value, one cell at a time. In pandas 2.1 and later, `applymap()` is deprecated in favor of `DataFrame.map()`.
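
To make the "substituting" idea concrete, here is a minimal sketch of `Series.map()` used with a dictionary and with a function. The `countries` Series and the `country_codes` mapping are made up for this example.

```python
import pandas as pd

# Hypothetical country column; the mapping below is illustrative only.
countries = pd.Series(["USA", "Canada", "Brazil", "USA"])

# map() with a dict substitutes each value by lookup;
# values missing from the dict become NaN.
country_codes = {"USA": "US", "Canada": "CA"}
codes = countries.map(country_codes)

# map() with a function calls it once per value, like apply() on a Series.
lengths = countries.map(len)

print(codes.tolist())
print(lengths.tolist())
```

In this sketch "Brazil" has no entry in the dictionary, so its code comes back as a missing value rather than raising an error.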


### `apply` on a Series

In [19]:
def quadruple_price(cell_price):
    # Called once per value in the Series.
    return cell_price * 4

tmp_df = wines_df.copy()
# apply() returns a new Series; tmp_df.price itself is not modified.
tmp_df.price.apply(quadruple_price)

0    220.0
1      NaN
2    132.0
3    292.0
4    148.0
Name: price, dtype: float64
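
If the function needs an extra parameter, `Series.apply()` can forward it through `args` or as a keyword argument. The sketch below assumes a hypothetical `scale_price` helper and a small made-up `prices` Series.

```python
import pandas as pd

# Hypothetical prices, including one missing value.
prices = pd.Series([55.0, None, 33.0], name="price")

def scale_price(cell_price, factor):
    # Called once per value; NaN stays NaN after multiplication.
    return cell_price * factor

# Extra positional arguments go in `args`; keyword arguments are passed through as-is.
scaled = prices.apply(scale_price, args=(4,))
scaled_kw = prices.apply(scale_price, factor=4)

print(scaled)
```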

### `apply` on a DataFrame

In [20]:
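def modify_dataframe(row_series):
    # With axis='columns', each call receives one row as a Series.
    row_series["price"] = row_series["price"] * 2
    row_series["comments"] = row_series["comments"] + "!"
    return row_series

tmp_df = wines_df.copy()
# apply() returns a new DataFrame built from the returned rows.
tmp_df.apply(modify_dataframe, axis='columns')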

Unnamed: 0.1,Unnamed: 0,country,price,comments
0,0,USA,110.0,comments 1!
1,1,Canada,,comments 2!
2,2,Brazil,66.0,comments 3!
3,3,,146.0,comments 4!
4,4,USA,74.0,comments 5!
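
The `axis` argument decides what the function receives. A minimal sketch on a made-up numeric frame (`scores` is an assumption for illustration) shows the difference between the default column-wise behavior and the row-wise call used above.

```python
import pandas as pd

# Hypothetical numeric frame used only to show how `axis` changes the input.
scores = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default, also spelled axis='index'): the function receives
# each column as a Series, so the result has one entry per column.
column_totals = scores.apply(lambda col: col.sum())

# axis=1 (also spelled axis='columns'): the function receives each row
# as a Series, so the result has one entry per row.
row_totals = scores.apply(lambda row: row["a"] + row["b"], axis="columns")

print(column_totals.to_dict())
print(row_totals.tolist())
```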


### `map` on a Series

In [16]:
def double_price(cell_price) -> float:
    # Reject anything that is not a float (NaN is a float, so it passes through).
    if not isinstance(cell_price, float):
        raise TypeError("Price must be a float.")
    return cell_price * 2

tmp_df = wines_df.copy()
# map() is applied twice here, so each price ends up multiplied by 4.
tmp_df.price = tmp_df.price.map(double_price)
tmp_df.price = tmp_df.price.map(double_price)
tmp_df.price

0    220.0
1      NaN
2    132.0
3    292.0
4    148.0
Name: price, dtype: float64

### `applymap` on a DataFrame

In [17]:
def triple_price(cell):
    # applymap() passes every cell, not just prices, so numbers are
    # multiplied by 3 and strings are repeated three times.
    return cell * 3

tmp_df = wines_df.copy()
tmp_df.applymap(triple_price)

Unnamed: 0.1,Unnamed: 0,country,price,comments
0,0,USA USA USA,165.0,comments 1 comments 1 comments 1
1,3,Canada Canada Canada,,comments 2 comments 2 comments 2
2,6,Brazil Brazil Brazil,99.0,comments 3 comments 3 comments 3
3,9,NaN NaN NaN,219.0,comments 4 comments 4 comments 4
4,12,USA USA USA,111.0,comments 5 comments 5 comments 5
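
Because the element-wise call touches every cell, text columns get repeated as well. One way to avoid that, sketched below under the assumption that only numeric columns should change, is to select the numeric columns first. The frame `df` and the helper `triple` are hypothetical. The sketch also assumes pandas 2.1 or later exposes `DataFrame.map()` as the replacement for `applymap()`, and falls back to `applymap()` on older versions.

```python
import pandas as pd

# Hypothetical frame mixing text and numeric columns.
df = pd.DataFrame({"country": ["USA", "Brazil"], "price": [55.0, 33.0]})

def triple(value):
    return value * 3

# Restrict the element-wise call to numeric columns so text is left untouched.
numeric_cols = df.select_dtypes(include="number").columns
numeric_part = df[numeric_cols]

# Assumption: pandas 2.1+ provides DataFrame.map and deprecates applymap;
# older versions only have applymap.
elementwise = numeric_part.map if hasattr(numeric_part, "map") else numeric_part.applymap

tripled = df.copy()
tripled[numeric_cols] = elementwise(triple)

print(tripled)
```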
