## Multi-level grouping and broadcasting grouped aggregations

### What you will learn in this session

- Calculating grouped summary statistics when grouping by more than one column
- Broadcasting grouped summary statistics back onto the original rows to create new columns

### Examples from the intro

#### Import pandas and the dataset

In [1]:
import pandas as pd

vc_sales = pd.read_csv('console_sales.csv')

#### Multi-level grouping and aggregating

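Passing a list of column names to `.groupby()` groups by every combination of those columns. Here we calculate the mean of the `sold_year` column for each company within each decade.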

In [24]:
vc_sales.groupby(['company', 'decade'])['sold_year'].mean()

company    decade
Microsoft  2000      7.386598e+06
           2010      6.462089e+06
Nintendo   2000      1.624794e+07
           2010      7.101272e+06
Sony       2000      9.799716e+06
           2010      6.926928e+06
Name: sold_year, dtype: float64
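The result above is a Series whose index has two levels, one per grouping column. The following is a minimal sketch of how such a result can be queried and reshaped. It uses a small made-up DataFrame (`toy` and its values are assumptions for illustration, not the contents of `console_sales.csv`).

```python
import pandas as pd

# Hypothetical toy data, not the console_sales.csv file used above.
toy = pd.DataFrame({
    'company': ['A', 'A', 'A', 'B', 'B', 'B'],
    'decade': [2000, 2000, 2010, 2000, 2010, 2010],
    'sold_year': [10, 20, 30, 40, 50, 70],
})

# Grouping by two columns gives a Series indexed by (company, decade) pairs.
grouped = toy.groupby(['company', 'decade'])['sold_year'].mean()

# A single group is selected with a tuple of keys.
a_2000 = grouped.loc[('A', 2000)]

# reset_index() turns the index levels back into ordinary columns.
flat = grouped.reset_index()

# unstack() moves the inner level (decade) into the columns.
wide = grouped.unstack()

print(grouped)
print(flat)
print(wide)
```

`reset_index()` is usually the more convenient form for further filtering or merging, while `unstack()` gives a compact company-by-decade layout for reading.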

#### Broadcasting/creating new column based on grouped aggregation

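Unlike a regular aggregation, which returns one value per group, `.transform()` returns a result with the same length and index as the original DataFrame. Each group's aggregated value is repeated on every row that belongs to that group.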

In [2]:
vc_sales.groupby(['platform'])['sold_year'].transform('sum')

0     84892507
1     84892507
2     84892507
3     84892507
4     84892507
        ...   
76    15939902
77    15939902
78    15939902
79    15939902
80    15939902
Name: sold_year, Length: 81, dtype: int64
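To make the difference between aggregating and broadcasting concrete, here is a small sketch on made-up data (the `toy` frame and its values are assumptions for illustration). It compares `.sum()` with `.transform('sum')` on the same grouping.

```python
import pandas as pd

# Hypothetical toy data: two platforms with three and two rows.
toy = pd.DataFrame({
    'platform': ['P1', 'P1', 'P1', 'P2', 'P2'],
    'sold_year': [1, 2, 3, 10, 20],
})

by_platform = toy.groupby('platform')['sold_year']

# Aggregation: one value per group, indexed by platform.
totals = by_platform.sum()

# Broadcasting: one value per original row, indexed like toy.
broadcast = by_platform.transform('sum')

print(totals)
print(broadcast)
```

Because `broadcast` shares its index with `toy`, it can be assigned directly as a new column without any merge.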

Assigning the result of `.transform()` to a new column stores each platform's total sales on every row for that platform.

In [25]:
vc_sales['total_platform_sales'] = vc_sales.groupby(['platform'])['sold_year'].transform('sum')

vc_sales.head()

| | company | platform | year | sold_year | sold_total | sold_year_pct_change | decade | total_platform_sales |
|---|---|---|---|---|---|---|---|---|
| 0 | Microsoft | X360 | 2005 | 1178267 | 1178267 | NaN | 2000 | 84892507 |
| 1 | Microsoft | X360 | 2006 | 6801532 | 7979799 | 577.24879 | 2000 | 84892507 |
| 2 | Microsoft | X360 | 2007 | 7879552 | 15859351 | 98.74374 | 2000 | 84892507 |
| 3 | Microsoft | X360 | 2008 | 10913123 | 26772474 | 68.811914 | 2000 | 84892507 |
| 4 | Microsoft | X360 | 2009 | 10160518 | 36932992 | 37.95136 | 2000 | 84892507 |
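A broadcast column is most useful as an input to row-level calculations. As one possible illustration, the sketch below computes each year's share of its platform's total. The data is made up, and the column name `share_of_platform` is a hypothetical choice, not a column in the original dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the console sales table.
toy = pd.DataFrame({
    'platform': ['P1', 'P1', 'P2', 'P2'],
    'year': [2005, 2006, 2005, 2006],
    'sold_year': [25, 75, 10, 30],
})

# Broadcast each platform's total onto its rows, as in the cell above.
toy['total_platform_sales'] = toy.groupby('platform')['sold_year'].transform('sum')

# Row-level calculation using the broadcast value
# ('share_of_platform' is an illustrative name).
toy['share_of_platform'] = toy['sold_year'] / toy['total_platform_sales']

print(toy)
```

Within each platform the shares sum to 1, which is a quick sanity check that the grouping column was chosen correctly.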


### Task 1 - Multi-level grouping

1. Run the code cell below to load the Titanic dataset.
2. Calculate the average survival rate for each combination of `sex` and `pclass`.
3. Calculate the mean, standard deviation, minimum and maximum of `age` for each combination of `pclass` and `sex`.
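The intro examples compute one statistic at a time. Several statistics can be requested in a single call by passing a list of function names to `.agg()`. The sketch below shows the pattern on made-up data (the `toy` frame and its `team`, `role` and `score` columns are assumptions for illustration, not the Titanic columns), so it is a hint rather than a solution.

```python
import pandas as pd

# Hypothetical toy data with two grouping columns and one numeric column.
toy = pd.DataFrame({
    'team': ['x', 'x', 'y', 'y'],
    'role': ['a', 'a', 'b', 'b'],
    'score': [1.0, 3.0, 5.0, 9.0],
})

# A list of function names produces one output column per statistic.
summary = toy.groupby(['team', 'role'])['score'].agg(['mean', 'std', 'min', 'max'])

print(summary)
```

The result is a DataFrame with one row per group and one column per requested statistic.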

In [1]:
import pandas as pd
import seaborn as sns

# Load the Titanic example dataset that ships with seaborn (fetched online on first use)
titanic = sns.load_dataset('titanic')

titanic.head()

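| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |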


In [1]:
# Calculate the average survival rate for each level of sex and pclass combined.



In [3]:
# Calculate the mean, standard deviation, min and max of age for each level of pclass and sex.



### Task 2 - Calculating new columns based on a grouping

1. Calculate a new column called 'pclass_survival' for the mean survival rate for each level of pclass.
2. Calculate a new column called 'pclass_sex_survival' for the mean survival rate for each level of pclass and sex combined.

In [5]:
# Calculate a new column called 'pclass_survival' for the mean survival rate for each level of pclass.



In [7]:
# Calculate a new column called 'pclass_sex_survival' for the mean survival rate for each level of pclass and sex combined.

