# Ex - GroupBy

### Introduction:

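GroupBy can be summarized as Split-Apply-Combine: split the data into groups, apply a function to each group independently, and combine the results into a new object.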

Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Check out this [Diagram](http://i.imgur.com/yjNkiwL.png)  
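
As a rough illustration of the three stages, the sketch below performs the split, apply and combine steps by hand on a tiny frame and compares the result with the equivalent one-line `groupby` call. The frame `df` and its values are hypothetical and are not taken from the drinks dataset.

```python
import pandas as pd

# Tiny hypothetical frame; the numbers are made up for illustration.
df = pd.DataFrame({
    "continent": ["EU", "AF", "EU", "AS", "AF"],
    "beer_servings": [100, 20, 200, 30, 40],
})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs, sorted by key.
pieces = {key: group for key, group in df.groupby("continent")}

# Apply: reduce each piece to a single number.
applied = {key: group["beer_servings"].mean() for key, group in pieces.items()}

# Combine: collect the per-group results into one Series indexed by group key.
manual = pd.Series(applied, name="beer_servings")
manual.index.name = "continent"

# The built-in groupby does all three steps in one expression.
builtin = df.groupby("continent")["beer_servings"].mean()

print(manual)
print(builtin)
```

Both Series should hold the same per-continent means; the manual version just makes each stage explicit.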
### Step 1. Import the necessary libraries

```python
import pandas as pd
```

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv). 

### Step 3. Assign it to a variable called drinks.

```python
drinks = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv')
drinks.head()
```

| | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0 | 0 | 0 | 0.0 | AS |
| 1 | Albania | 89 | 132 | 54 | 4.9 | EU |
| 2 | Algeria | 25 | 0 | 14 | 0.7 | AF |
| 3 | Andorra | 245 | 138 | 312 | 12.4 | EU |
| 4 | Angola | 217 | 57 | 45 | 5.9 | AF |
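
One thing worth checking before grouping: `read_csv` treats the string `NA` as a missing value by default. If the `continent` column uses `NA` as the label for North America (an assumption about this file worth verifying), those rows are read as `NaN` and `groupby` silently drops them, since it excludes `NaN` keys by default. The sketch below reproduces the effect on a small inline CSV; the rows and numbers are illustrative only.

```python
import io

import pandas as pd

# Tiny inline CSV with the same column layout idea (values are illustrative).
csv_text = """country,beer_servings,continent
Canada,240,NA
France,127,EU
Mexico,238,NA
"""

# Default parsing: "NA" is one of pandas' default missing-value markers.
default = pd.read_csv(io.StringIO(csv_text))
print(default["continent"].isna().sum())  # 2

# Disable the default markers and treat only empty fields as missing,
# so "NA" survives as an ordinary continent label.
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])
na_aware = kept.groupby("continent")["beer_servings"].mean()
print(na_aware)
```

The same two keyword arguments could be passed to the `read_csv` call in Step 3 if North America should appear in the grouped results.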


### Step 4. Which continent drinks the most beer on average?

```python
drinks.groupby('continent').beer_servings.mean()

# Excerpt from the groupby docstring of an older pandas release
# (the `squeeze` argument was removed in pandas 2.0).
# Signature: drinks.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)
# Docstring:
# Group series using mapper (dict or key function, apply given function
# to group, return result as series) or by a series of columns.

# Parameters
# ----------
# by : mapping function / list of functions, dict, Series, or tuple /
#     list of column names.
#     Called on each element of the object index to determine the groups.
#     If a dict or Series is passed, the Series or dict VALUES will be
#     used to determine the groups
# axis : int, default 0
# level : int, level name, or sequence of such, default None
#     If the axis is a MultiIndex (hierarchical), group by a particular
#     level or levels
# as_index : boolean, default True
#     For aggregated output, return object with group labels as the
#     index. Only relevant for DataFrame input. as_index=False is
#     effectively "SQL-style" grouped output
# sort : boolean, default True
#     Sort group keys. Get better performance by turning this off.
#     Note this does not influence the order of observations within each
#     group.  groupby preserves the order of rows within each group.
# group_keys : boolean, default True
#     When calling apply, add group keys to index to identify pieces
# squeeze : boolean, default False
#     reduce the dimensionality of the return type if possible,
#     otherwise return a consistent type

# Examples
# --------
# DataFrame results

# >>> data.groupby(func, axis=0).mean()
# >>> data.groupby(['col1', 'col2'])['col3'].mean()

# DataFrame with hierarchical index

# >>> data.groupby(['col1', 'col2']).mean()
```


```
continent
AF     61.471698
AS     37.045455
EU    193.777778
OC     89.687500
SA    175.083333
Name: beer_servings, dtype: float64
```
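
The Series above still has to be read by eye to answer the question. A minimal sketch of picking the winner programmatically, using `idxmax` and `sort_values`, plus the `as_index=False` ("SQL-style") option described in the docstring notes; `drinks_demo` and its numbers are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the drinks data (made-up numbers).
drinks_demo = pd.DataFrame({
    "continent": ["AF", "EU", "EU", "SA", "AF"],
    "beer_servings": [10, 200, 180, 150, 30],
})

beer_means = drinks_demo.groupby("continent")["beer_servings"].mean()

# idxmax returns the index label (here, the continent) of the largest mean.
top = beer_means.idxmax()
print(top)  # EU

# sort_values ranks every continent instead of returning only the top one.
print(beer_means.sort_values(ascending=False))

# as_index=False keeps the group key as an ordinary column (flat, SQL-style output).
flat = drinks_demo.groupby("continent", as_index=False)["beer_servings"].mean()
print(flat)
```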

### Step 5. For each continent, print the summary statistics for wine consumption.

```python
drinks.groupby('continent').wine_servings.describe()
```

```
continent
AF         count     53.000000
           mean      16.264151
           std       38.846419
           min        0.000000
           25%        1.000000
           50%        2.000000
           75%       13.000000
           max      233.000000
AS         count     44.000000
           mean       9.068182
           std       21.667034
           min        0.000000
           25%        0.000000
           50%        1.000000
           75%        8.000000
           max      123.000000
EU         count     45.000000
           mean     142.222222
           std       97.421738
           min        0.000000
           25%       59.000000
           50%      128.000000
           75%      195.000000
           max      370.000000
OC         count     16.000000
           mean      35.625000
           std       64.555790
           min        0.000000
           25%        1.000000
           50%        8.500000
           75%       23.250000
           max      21
```
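
In recent pandas releases, `describe()` on a grouped column returns a DataFrame with one row per group and one column per statistic, rather than the stacked layout shown above (which appears to come from an older release). A small sketch on hypothetical data showing that shape and how to pull out just the statistics of interest:

```python
import pandas as pd

# Hypothetical wine_servings values for two continents.
wine_demo = pd.DataFrame({
    "continent": ["AF", "AF", "EU", "EU", "EU"],
    "wine_servings": [0, 4, 100, 200, 300],
})

# One row per continent, one column per statistic.
stats = wine_demo.groupby("continent")["wine_servings"].describe()
print(stats)

# Keep only the statistics of interest.
print(stats[["count", "mean", "max"]])

# All statistics for a single continent, as a Series.
print(stats.loc["EU"])
```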

### Step 6. Print the mean alcohol consumption per continent for every column

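```python
# numeric_only=True skips the non-numeric 'country' column (required in pandas >= 2.0)
drinks.groupby('continent').mean(numeric_only=True)
```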

| continent | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
|---|---|---|---|---|
| AF | 61.471698 | 16.339623 | 16.264151 | 3.007547 |
| AS | 37.045455 | 60.840909 | 9.068182 | 2.170455 |
| EU | 193.777778 | 132.555556 | 142.222222 | 8.617778 |
| OC | 89.6875 | 58.4375 | 35.625 | 3.38125 |
| SA | 175.083333 | 114.75 | 62.416667 | 6.308333 |
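
Since pandas 2.0, grouped reductions such as `mean()` no longer drop non-numeric columns like `country` on their own; calling them on a frame with text columns raises a `TypeError`. A sketch of two ways to get the table above in a recent version, on a hypothetical `demo` frame: pass `numeric_only=True`, or select the numeric columns explicitly, which makes the intent clearer.

```python
import pandas as pd

# Hypothetical frame with one text column, like 'country' in the drinks data.
demo = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "beer_servings": [10, 30, 200, 100],
    "wine_servings": [1, 3, 50, 150],
    "continent": ["AF", "AF", "EU", "EU"],
})

# Option 1: let pandas skip non-numeric columns.
means_a = demo.groupby("continent").mean(numeric_only=True)

# Option 2: choose the columns to average explicitly.
numeric_cols = ["beer_servings", "wine_servings"]
means_b = demo.groupby("continent")[numeric_cols].mean()

print(means_b)
```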


### Step 7. Print the median alcohol consumption per continent for every column

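```python
# numeric_only=True skips the non-numeric 'country' column (required in pandas >= 2.0)
drinks.groupby('continent').median(numeric_only=True)
```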

| continent | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
|---|---|---|---|---|
| AF | 32.0 | 3.0 | 2.0 | 2.3 |
| AS | 17.5 | 16.0 | 1.0 | 1.2 |
| EU | 219.0 | 122.0 | 128.0 | 10.0 |
| OC | 52.5 | 37.0 | 8.5 | 1.75 |
| SA | 162.5 | 108.5 | 12.0 | 6.85 |
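
Comparing the mean and median tables is informative: for AF, mean beer servings are about 61.5 while the median is 32.0, which suggests a few high-consumption countries pull the average up. A sketch that puts both statistics side by side in one frame, using hypothetical numbers:

```python
import pandas as pd

# Hypothetical beer_servings values; AS has one large outlier.
demo = pd.DataFrame({
    "continent": ["AS", "AS", "AS", "EU", "EU", "EU"],
    "beer_servings": [0, 10, 200, 150, 220, 250],
})

summary = demo.groupby("continent")["beer_servings"].agg(["mean", "median"])

# A mean well above the median hints at a right-skewed distribution.
summary["mean_minus_median"] = summary["mean"] - summary["median"]
print(summary)
```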


### Step 8. Print the mean, min and max values for spirit consumption.
#### This time output a DataFrame

```python
drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])

# Aggregation: computing a summary statistic (or statistics) about each group. Some examples:

# Compute group sums or means
# Compute group sizes / counts

# DataFrameGroupBy.agg(arg, *args, **kwargs)
# Aggregate using input function or dict of {column -> function}

# Parameters:
# arg : function or dict
# Function to use for aggregating groups. If a function, must either work when passed a DataFrame or when passed to
# DataFrame.apply. If passed a dict, the keys must be DataFrame column names.
# Accepted combinations are:
# string cythonized function name
# function
# list of functions
# dict of columns -> functions
# nested dict of names -> dicts of functions (deprecated, and removed in pandas 1.0)

# Once the GroupBy object has been created, several methods are available to perform a computation on the grouped data.

# An obvious one is aggregation via the aggregate method, or equivalently agg.

# See http://pandas.pydata.org/pandas-docs/stable/groupby.html#aggregation for more information about the agg() method
```

| continent | mean | min | max |
|---|---|---|---|
| AF | 16.339623 | 0 | 152 |
| AS | 60.840909 | 0 | 326 |
| EU | 132.555556 | 0 | 373 |
| OC | 58.4375 | 0 | 254 |
| SA | 114.75 | 25 | 302 |
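
The nested-dict renaming form listed in the `agg` notes is no longer supported; since pandas 0.25 the usual way to name output columns is named aggregation, `output_name=(input_column, function)`. The sketch below shows that form and the plain "dict of columns -> functions" form on a hypothetical `demo` frame.

```python
import pandas as pd

# Hypothetical values for two continents.
demo = pd.DataFrame({
    "continent": ["SA", "SA", "OC", "OC"],
    "spirit_servings": [25, 302, 0, 254],
    "wine_servings": [10, 20, 5, 15],
})

# Named aggregation: each keyword becomes an output column.
named = demo.groupby("continent").agg(
    spirit_mean=("spirit_servings", "mean"),
    spirit_min=("spirit_servings", "min"),
    spirit_max=("spirit_servings", "max"),
)
print(named)

# Dict form: different functions per column; the result has MultiIndex columns.
per_column = demo.groupby("continent").agg(
    {"spirit_servings": ["mean", "max"], "wine_servings": "sum"}
)
print(per_column)
```

Named aggregation gives flat, self-describing column names, which is often easier to work with than the MultiIndex columns produced by the dict form.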
