# Groupby Function

> 1. `groupby()` returns a `DataFrameGroupBy` object rather than a regular DataFrame.
> 2. Two kinds of operations on a GroupBy object are covered here:
>> 1) Aggregation
>> 2) Filtering (using filter() or apply())

### 1. Loading and Selecting Data

In [1]:
import pandas as pd

data = pd.read_csv('../00_data/deaths-temperature-gasparrini.csv')
data = data.iloc[:, 1:]  # drop the first column, keep the rest
data.head()

  Code  Year  Extreme cold  Moderate cold  Moderate heat  Extreme heat
0  AUS  2015          0.67           5.82           0.14          0.32
1  BRA  2015          0.49           2.34           0.48          0.22
2  CAN  2015          0.25           4.21           0.27          0.26
3  CHN  2015          1.06           9.31           0.24          0.40
4  ITA  2015          0.85           8.51           0.94          0.67


### 2. GroupBy Operation

In [2]:
grouped_df = data.groupby('Code')
print(grouped_df)

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002576D1B0950>
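
Printing a GroupBy object only shows its type and memory address. The sketch below shows a few ways to look inside one. It uses a small hand-made frame named `toy` (a hypothetical stand-in that borrows column names from the CSV above, not the real file), and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy frame; the real notebook reads these columns from a CSV.
toy = pd.DataFrame({
    "Code": ["AUS", "ITA", "ITA", "ESP", "ESP", "ESP"],
    "Year": [2015] * 6,
    "Moderate heat": [0.14, 0.94, 0.94, 0.54, 0.54, 0.54],
})

toy_grouped = toy.groupby("Code")

# Number of distinct groups
n_groups = toy_grouped.ngroups

# Mapping from each group key to the row labels that belong to it
group_labels = {code: list(idx) for code, idx in toy_grouped.groups.items()}

# A single group, returned as an ordinary DataFrame
ita_rows = toy_grouped.get_group("ITA")

# Iterating over the GroupBy object yields (key, sub-DataFrame) pairs
for code, frame in toy_grouped:
    print(code, len(frame))
```

`get_group()` and iteration are handy for checking that the grouping key splits the data the way you expect before running an aggregation.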


### 3. Aggregation 
> -> Most aggregation functions return a DataFrame indexed by the group key, so any DataFrame operation can be applied to the result. `size()` is the exception: it returns a Series.  
> -> Commonly used aggregation functions:
> 1. sum()
> 2. mean()
> 3. median()
> 4. size()
> 5. count()
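
The functions in this list can also be combined in a single call through `agg()`. The following is a minimal sketch using named aggregation on a hypothetical toy frame; the output column names (`cold_total`, `heat_mean`, `rows`) are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame with two of the CSV's column names.
toy = pd.DataFrame({
    "Code": ["AUS", "ITA", "ITA", "ESP", "ESP", "ESP"],
    "Moderate cold": [5.82, 8.51, 8.51, 4.75, 4.75, 4.75],
    "Moderate heat": [0.14, 0.94, 0.94, 0.54, 0.54, 0.54],
})

# Named aggregation: new_column=(source_column, aggregation_name)
summary = toy.groupby("Code").agg(
    cold_total=("Moderate cold", "sum"),
    heat_mean=("Moderate heat", "mean"),
    rows=("Moderate heat", "count"),
)
print(summary)
```

This applies a different function to each column and gives the results readable names, which avoids running `sum()`, `mean()`, and `count()` separately and joining the outputs afterwards.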

In [3]:
print(grouped_df.sum())

      Year  Extreme cold  Moderate cold  Moderate heat  Extreme heat
Code                                                                
AUS   2015          0.67           5.82           0.14          0.32
BRA   2015          0.49           2.34           0.48          0.22
CAN   4030          0.50           8.42           0.54          0.52
CHN   4030          2.12          18.62           0.48          0.80
ESP   6045          2.13          14.25           1.62          1.56
GBR   2015          0.86           7.62           0.07          0.22
ITA   4030          1.70          17.02           1.88          1.34
JPN   6045          2.31          27.12           0.39          0.54
KOR   6045          1.05          19.74           0.30          0.63
SWE   6045          0.81          10.32           0.09          0.45
THA   4030          0.88           4.34           0.94          0.56
TWN   6045          2.13           9.57           1.80          0.75
USA   2015          0.45           5.15           0.14          0.21

In [4]:
print(grouped_df.mean())

        Year  Extreme cold  Moderate cold  Moderate heat  Extreme heat
Code                                                                  
AUS   2015.0          0.67           5.82           0.14          0.32
BRA   2015.0          0.49           2.34           0.48          0.22
CAN   2015.0          0.25           4.21           0.27          0.26
CHN   2015.0          1.06           9.31           0.24          0.40
ESP   2015.0          0.71           4.75           0.54          0.52
GBR   2015.0          0.86           7.62           0.07          0.22
ITA   2015.0          0.85           8.51           0.94          0.67
JPN   2015.0          0.77           9.04           0.13          0.18
KOR   2015.0          0.35           6.58           0.10          0.21
SWE   2015.0          0.27           3.44           0.03          0.15
THA   2015.0          0.44           2.17           0.47          0.28
TWN   2015.0          0.71           3.19           0.60          0.25
USA   2015.0          0.45           5.15           0.14          0.21

In [5]:
print(grouped_df.median())

        Year  Extreme cold  Moderate cold  Moderate heat  Extreme heat
Code                                                                  
AUS   2015.0          0.67           5.82           0.14          0.32
BRA   2015.0          0.49           2.34           0.48          0.22
CAN   2015.0          0.25           4.21           0.27          0.26
CHN   2015.0          1.06           9.31           0.24          0.40
ESP   2015.0          0.71           4.75           0.54          0.52
GBR   2015.0          0.86           7.62           0.07          0.22
ITA   2015.0          0.85           8.51           0.94          0.67
JPN   2015.0          0.77           9.04           0.13          0.18
KOR   2015.0          0.35           6.58           0.10          0.21
SWE   2015.0          0.27           3.44           0.03          0.15
THA   2015.0          0.44           2.17           0.47          0.28
TWN   2015.0          0.71           3.19           0.60          0.25
USA   2015.0          0.45           5.15           0.14          0.21

In [6]:
print(grouped_df.size())

Code
AUS    1
BRA    1
CAN    2
CHN    2
ESP    3
GBR    1
ITA    2
JPN    3
KOR    3
SWE    3
THA    2
TWN    3
USA    1
dtype: int64


In [7]:
print(grouped_df.count())

      Year  Extreme cold  Moderate cold  Moderate heat  Extreme heat
Code                                                                
AUS      1             1              1              1             1
BRA      1             1              1              1             1
CAN      2             2              2              2             2
CHN      2             2              2              2             2
ESP      3             3              3              3             3
GBR      1             1              1              1             1
ITA      2             2              2              2             2
JPN      3             3              3              3             3
KOR      3             3              3              3             3
SWE      3             3              3              3             3
THA      2             2              2              2             2
TWN      3             3              3              3             3
USA      1             1              1              1             1
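
In the outputs above, `size()` and `count()` report the same numbers because the data has no missing values. The sketch below shows where they differ, using a hypothetical toy frame in which two values have been deliberately set to NaN (these gaps are invented for illustration and are not in the CSV).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with NaN values added on purpose.
toy = pd.DataFrame({
    "Code": ["ITA", "ITA", "ESP", "ESP", "ESP"],
    "Extreme heat": [0.67, np.nan, 0.52, 0.52, np.nan],
})

toy_grouped = toy.groupby("Code")

# size(): one number per group, counting every row (NaN included) -> Series
sizes = toy_grouped.size()

# count(): non-null values per column within each group -> DataFrame
counts = toy_grouped.count()

print(sizes)
print(counts)
```

Use `size()` when you want the number of rows per group, and `count()` when you want to know how many usable values each column has.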

### 3.1. Operations on the Aggregated DataFrame:

In [8]:
grouped_sum_df = grouped_df.sum()
print(grouped_sum_df['Moderate cold'])

Code
AUS     5.82
BRA     2.34
CAN     8.42
CHN    18.62
ESP    14.25
GBR     7.62
ITA    17.02
JPN    27.12
KOR    19.74
SWE    10.32
THA     4.34
TWN     9.57
USA     5.15
Name: Moderate cold, dtype: float64


In [9]:
data2 = grouped_sum_df[grouped_sum_df['Moderate cold'] > 5]
print(data2)

      Year  Extreme cold  Moderate cold  Moderate heat  Extreme heat
Code                                                                
AUS   2015          0.67           5.82           0.14          0.32
CAN   4030          0.50           8.42           0.54          0.52
CHN   4030          2.12          18.62           0.48          0.80
ESP   6045          2.13          14.25           1.62          1.56
GBR   2015          0.86           7.62           0.07          0.22
ITA   4030          1.70          17.02           1.88          1.34
JPN   6045          2.31          27.12           0.39          0.54
KOR   6045          1.05          19.74           0.30          0.63
SWE   6045          0.81          10.32           0.09          0.45
TWN   6045          2.13           9.57           1.80          0.75
USA   2015          0.45           5.15           0.14          0.21
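
Note that `sum()` also adds up the `Year` column (for example 6045 for three rows of 2015), which is rarely meaningful. One way around this, sketched here on a hypothetical toy frame, is to select the columns of interest before aggregating and then chain further DataFrame operations on the result.

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative.
toy = pd.DataFrame({
    "Code": ["AUS", "ITA", "ITA", "ESP", "ESP", "ESP"],
    "Year": [2015] * 6,
    "Moderate cold": [5.82, 8.51, 8.51, 4.75, 4.75, 4.75],
})

cold_sum = (
    toy.groupby("Code")[["Moderate cold"]]          # aggregate only this column
    .sum()
    .sort_values("Moderate cold", ascending=False)  # largest total first
    .reset_index()                                  # turn 'Code' back into a column
)
print(cold_sum)
```

Selecting with a list (`[["Moderate cold"]]`) keeps the result a DataFrame; a single string (`["Moderate cold"]`) would give a Series instead.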


### 4. Filtering the Grouped DataFrame:

In [10]:
print(grouped_df.filter(lambda x: x['Moderate heat'].sum() > .9)['Moderate heat'])


4     0.94
7     0.54
9     0.60
10    0.47
15    0.54
17    0.60
20    0.94
23    0.54
25    0.60
26    0.47
Name: Moderate heat, dtype: float64
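
`filter()` keeps or drops whole groups and returns the original rows, which is why the output above is indexed by row number rather than by `Code`. The same selection can be written as a boolean mask built with `transform()`. This is an alternative technique, not the one used in the cell above, and it is sketched here on a hypothetical toy frame.

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative.
toy = pd.DataFrame({
    "Code": ["AUS", "ITA", "ITA", "ESP", "ESP", "ESP"],
    "Moderate heat": [0.14, 0.94, 0.94, 0.54, 0.54, 0.54],
})
grouped = toy.groupby("Code")

# Approach from the notebook: the lambda returns one True/False per group
kept_by_filter = grouped.filter(lambda g: g["Moderate heat"].sum() > 0.9)

# Alternative: broadcast each group's total back onto its rows, then mask
group_total = grouped["Moderate heat"].transform("sum")
kept_by_mask = toy[group_total > 0.9]

print(kept_by_filter)
print(kept_by_mask)
```

Both approaches drop the AUS row (group total 0.14) and keep the ITA and ESP rows. The `transform()` version avoids calling a Python lambda per group, which may matter on larger data.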
