# Ex - GroupBy

### Introduction:

GroupBy can be summarized as Split-Apply-Combine.

Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Check out this [diagram](http://i.imgur.com/yjNkiwL.png).

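
To make the split-apply-combine idea concrete, here is a minimal sketch that performs the three stages by hand on a toy DataFrame and compares the result with the `groupby` one-liner. The frame `toy` and its values are hypothetical and not taken from the drinks dataset.

```python
import pandas as pd

# Toy data (hypothetical values, not from the drinks dataset)
toy = pd.DataFrame({
    "group": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})

# Split: one sub-frame per distinct key
pieces = {key: part for key, part in toy.groupby("group")}

# Apply: reduce each piece to a single number
means = {key: part["value"].mean() for key, part in pieces.items()}

# Combine: gather the per-group results into one Series
manual = pd.Series(means, name="value").rename_axis("group")

# The one-liner runs the same three stages in a single call
auto = toy.groupby("group")["value"].mean()

print(manual)
print(auto)
```

Both Series should hold the same per-group means; the manual version only exists to show where each stage happens.
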
### Step 1. Import the necessary libraries

In [None]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://github.com/thieu1995/csv-files/blob/main/data/pandas/drinks.csv).

In [None]:
data = pd.read_csv('https://raw.githubusercontent.com/thieu1995/csv-files/main/data/pandas/drinks.csv')
data

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF
...,...,...,...,...,...,...
188,Venezuela,333,100,3,7.7,SA
189,Vietnam,111,2,1,2.0,AS
190,Yemen,6,0,0,0.1,AS
191,Zambia,32,19,4,2.5,AF


### Step 3. Assign it to a variable called drinks.

In [None]:
drinks = data  # second name for the same DataFrame object, not a copy

### Step 4. Which continent drinks the most beer on average?

In [None]:
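# Mean beer servings per continent
beer = drinks.groupby('continent')['beer_servings'].mean()
# idxmax returns the index label (the continent code) of the largest mean
highest_beer = beer.idxmax()
print(highest_beer)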

EU
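
`idxmax` returns only the winning label. If the full ranking is of interest, sorting the grouped means is one option. The sketch below rebuilds the per-continent beer means from the values printed in the Step 6 output of this notebook, so it runs without downloading the file; the names `ranking` and `top_two` are illustrative.

```python
import pandas as pd

# Per-continent beer means copied from the Step 6 output of this notebook
beer = pd.Series(
    {"AF": 61.471698, "AS": 37.045455, "EU": 193.777778,
     "OC": 89.687500, "SA": 175.083333},
    name="beer_servings",
)

top_label = beer.idxmax()                    # label of the largest mean only
ranking = beer.sort_values(ascending=False)  # full ordering, largest first
top_two = beer.nlargest(2)                   # two largest means with their labels

print(top_label)
print(ranking)
```
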


### Step 5. For each continent print the statistics for wine consumption.

In [None]:
wine_continent = drinks.groupby('continent')['wine_servings'].describe()
print(wine_continent)

           count        mean        std  min   25%    50%     75%    max
continent                                                               
AF          53.0   16.264151  38.846419  0.0   1.0    2.0   13.00  233.0
AS          44.0    9.068182  21.667034  0.0   0.0    1.0    8.00  123.0
EU          45.0  142.222222  97.421738  0.0  59.0  128.0  195.00  370.0
OC          16.0   35.625000  64.555790  0.0   1.0    8.5   23.25  212.0
SA          12.0   62.416667  88.620189  1.0   3.0   12.0   98.50  221.0
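
The counts above sum to 170, while the row index shown in Step 2 runs past 190, so some rows apparently have no continent. A likely cause, stated here as an assumption about this file, is that the continent code `NA` (North America) is read as a missing value, because `read_csv` treats the string `NA` as NaN by default and `groupby` drops NaN keys. The sketch below reproduces the effect on a two-row stand-in with hypothetical values.

```python
import io
import pandas as pd

# Two-row stand-in for the real file (hypothetical values)
csv_text = "country,wine_servings,continent\nCanada,100,NA\nFrance,370,EU\n"

# Default parsing: the string "NA" becomes NaN, and groupby drops NaN keys
default = pd.read_csv(io.StringIO(csv_text))
default_groups = default.groupby("continent").size().index.tolist()

# keep_default_na=False turns off the built-in NaN markers;
# na_values=[""] keeps only the empty string as a missing-value marker
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])
kept_groups = kept.groupby("continent").size().index.tolist()

print(default_groups)
print(kept_groups)
```

If the assumption holds for the real file, passing the same two arguments to the `read_csv` call in Step 2 would add an `NA` row to each grouped result.
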


### Step 6. Print the mean alcohol consumption per continent for every consumption column

In [None]:
mean_alcohol_consumption = drinks.groupby('continent').agg({
    'beer_servings': 'mean',
    'spirit_servings': 'mean',
    'wine_servings': 'mean',
    'total_litres_of_pure_alcohol': 'mean'
})
print(mean_alcohol_consumption)

           beer_servings  spirit_servings  wine_servings  \
continent                                                  
AF             61.471698        16.339623      16.264151   
AS             37.045455        60.840909       9.068182   
EU            193.777778       132.555556     142.222222   
OC             89.687500        58.437500      35.625000   
SA            175.083333       114.750000      62.416667   

           total_litres_of_pure_alcohol  
continent                                
AF                             3.007547  
AS                             2.170455  
EU                             8.617778  
OC                             3.381250  
SA                             6.308333  
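
When every column gets the same reduction, selecting the columns and calling the reduction once is a shorter alternative to the dict form. The sketch below checks on a toy frame (hypothetical values, same column names as the dataset) that both spellings give the same result.

```python
import pandas as pd

# Toy frame with the dataset's column names (values are hypothetical)
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["EU", "EU", "AS", "AS"],
    "beer_servings": [200, 100, 10, 30],
    "spirit_servings": [100, 50, 60, 20],
    "wine_servings": [300, 100, 0, 2],
    "total_litres_of_pure_alcohol": [10.0, 6.0, 1.0, 2.0],
})

cols = ["beer_servings", "spirit_servings", "wine_servings",
        "total_litres_of_pure_alcohol"]

# Dict form: one entry per column, all mapped to the same function
by_dict = toy.groupby("continent").agg({c: "mean" for c in cols})

# Column selection followed by a single reduction
by_select = toy.groupby("continent")[cols].mean()

print(by_select)
```

Calling `.mean(numeric_only=True)` on the whole grouped frame would also work, but it would include any other numeric column, such as the `Unnamed: 0` index column visible in Step 2.
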


### Step 7. Print the median alcohol consumption per continent for every consumption column

In [None]:
median_alcohol_consumption = drinks.groupby('continent').agg({
    'beer_servings': 'median',
    'spirit_servings': 'median',
    'wine_servings': 'median',
    'total_litres_of_pure_alcohol': 'median'
})
print(median_alcohol_consumption)

           beer_servings  spirit_servings  wine_servings  \
continent                                                  
AF                  32.0              3.0            2.0   
AS                  17.5             16.0            1.0   
EU                 219.0            122.0          128.0   
OC                  52.5             37.0            8.5   
SA                 162.5            108.5           12.0   

           total_litres_of_pure_alcohol  
continent                                
AF                                 2.30  
AS                                 1.20  
EU                                10.00  
OC                                 1.75  
SA                                 6.85  
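
Means and medians can also be computed in one pass by handing `agg` a list of function names, which makes skew easier to spot: a mean far above the median points to a few very large values. This is a minimal sketch on hypothetical data; the `gap` column is an illustrative addition.

```python
import pandas as pd

# Hypothetical values chosen so that one group is strongly skewed
toy = pd.DataFrame({
    "continent": ["OC", "OC", "OC", "SA", "SA"],
    "wine_servings": [0, 10, 200, 5, 15],
})

# A list of names yields one output column per function
both = toy.groupby("continent")["wine_servings"].agg(["mean", "median"])

# Difference between mean and median as a rough skew indicator
both["gap"] = both["mean"] - both["median"]

print(both)
```
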


### Step 8. Print the mean, min and max values for spirit consumption.
#### This time output a DataFrame

In [None]:
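# Aggregate over the whole frame (no grouping): one row per statistic
spirit_consumption_stats = drinks.agg({
    'spirit_servings': ['mean', 'min', 'max']
})

# Transpose so the statistics become columns, then capitalize their names
spirit_consumption_stats = spirit_consumption_stats.T.rename(
    columns={
        'mean': 'Mean',
        'min': 'Min',
        'max': 'Max'
    }
)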

print(spirit_consumption_stats)

                      Mean  Min    Max
spirit_servings  80.994819  0.0  438.0
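
The cell above summarizes spirit consumption over all countries at once. Since this worksheet is about GroupBy, a per-continent breakdown may also be of interest. One way to get it as a DataFrame, sketched here on hypothetical data, is named aggregation, where each keyword becomes an output column.

```python
import pandas as pd

# Hypothetical values; only the column names match the dataset
toy = pd.DataFrame({
    "continent": ["EU", "EU", "AS", "AS", "AS"],
    "spirit_servings": [100, 200, 0, 30, 60],
})

# Named aggregation: keyword = output column name, value = function name
per_continent = toy.groupby("continent")["spirit_servings"].agg(
    Mean="mean", Min="min", Max="max"
)

print(per_continent)
```

The result has one row per continent and the columns `Mean`, `Min` and `Max`, so no transpose or rename step is needed.
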
