In [1]:
import pandas as pd
from pandas import DataFrame
from pandas import Series
import numpy as np

In [2]:
df = DataFrame(
    {
        'name': ['John', 'Merry', 'Anna', 'Bob', 'Alice'],
        'age': [20, 21, 20, 22, 21],
        'point': [9, 10, 8, 9, 8],
        'subject': ['math', 'physics', 'math', 'math', 'physics']
    }
)

In [3]:
df

Unnamed: 0,age,name,point,subject
0,20,John,9,math
1,21,Merry,10,physics
2,20,Anna,8,math
3,22,Bob,9,math
4,21,Alice,8,physics


**QUESTION 1**: How can I split the data by subject, so that the math rows go into one DataFrame and the physics rows into another?

First, let's take a quick look at **GROUP BY** in SQL:

```SQL
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;
```

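As we can see, a SQL **GROUP BY** query typically pairs the grouping column with an "aggregate_function(column_name)". **groupby** in pandas follows the same idea: the rows are split into groups, and a function is then applied to each group.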
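
To make the comparison concrete, here is a minimal sketch of a pandas counterpart to the SQL template above. It assumes the same sample data as `df`; the name `mean_point` and the choice of `mean` as the aggregate are only for illustration.

```python
import pandas as pd

# Same sample data as the df defined at the top of the notebook.
df = pd.DataFrame(
    {
        'name': ['John', 'Merry', 'Anna', 'Bob', 'Alice'],
        'age': [20, 21, 20, 22, 21],
        'point': [9, 10, 8, 9, 8],
        'subject': ['math', 'physics', 'math', 'math', 'physics'],
    }
)

# Roughly equivalent to:
#   SELECT subject, AVG(point) FROM df GROUP BY subject;
mean_point = df.groupby('subject')['point'].mean()
print(mean_point)
```

The result is a Series indexed by `subject`, with one aggregated value per group.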

Before that, let's look at which parameters **groupby** accepts:

```
Signature: df.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
Docstring:
Group series using mapper (dict or key function, apply given function
to group, return result as series) or by a series of columns.

Parameters
----------
by : mapping function / list of functions, dict, Series, or tuple /
    list of column names.
    Called on each element of the object index to determine the groups.
    If a dict or Series is passed, the Series or dict VALUES will be
    used to determine the groups
axis : int, default 0
level : int, level name, or sequence of such, default None
    If the axis is a MultiIndex (hierarchical), group by a particular
    level or levels
as_index : boolean, default True
    For aggregated output, return object with group labels as the
    index. Only relevant for DataFrame input. as_index=False is
    effectively "SQL-style" grouped output
sort : boolean, default True
    Sort group keys. Get better performance by turning this off.
    Note this does not influence the order of observations within each
    group.  groupby preserves the order of rows within each group.
group_keys : boolean, default True
    When calling apply, add group keys to index to identify pieces
squeeze : boolean, default False
    reduce the dimensionality of the return type if possible,
    otherwise return a consistent type

Examples
--------
DataFrame results

>>> data.groupby(func, axis=0).mean()
>>> data.groupby(['col1', 'col2'])['col3'].mean()

DataFrame with hierarchical index

>>> data.groupby(['col1', 'col2']).mean()

Returns
-------
GroupBy object
File:      c:\program files\anaconda2\lib\site-packages\pandas\core\generic.py
Type:      instancemethod
```
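
Two of the parameters in the docstring, `as_index` and `sort`, are easiest to understand by example. The following is a small sketch on the same sample data; the variable names `indexed`, `flat` and `unsorted_keys` are introduced here only for illustration.

```python
import pandas as pd

# Same sample data as the df defined at the top of the notebook.
df = pd.DataFrame(
    {
        'name': ['John', 'Merry', 'Anna', 'Bob', 'Alice'],
        'age': [20, 21, 20, 22, 21],
        'point': [9, 10, 8, 9, 8],
        'subject': ['math', 'physics', 'math', 'math', 'physics'],
    }
)

# Default (as_index=True): the group labels become the index of the result.
indexed = df.groupby('subject')['point'].sum()

# as_index=False: the group labels stay in an ordinary column ("SQL-style").
flat = df.groupby('subject', as_index=False)['point'].sum()

# sort=False: groups appear in order of first occurrence, not sorted by key.
unsorted_keys = df.groupby('point', sort=False)['name'].count()

print(indexed)
print(flat)
print(unsorted_keys)
```

With `sort=False` the keys come out in the order they first appear in `point` rather than in ascending order.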

To answer question 1, we have two tasks:
1. Group our dataset by subject
2. Get a DataFrame from each of the groups

For task 1, we pass a Series to the **by** parameter to group by one column:

In [20]:
grouped = df.groupby(by=df['subject'])
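
The `grouped` object does not display any rows by itself, so it can help to inspect it before extracting anything. A minimal sketch, assuming the same sample data; `ngroups` and `groups` are attributes of the pandas GroupBy object, while `math_labels` is just an illustrative name.

```python
import pandas as pd

# Same sample data as the df defined at the top of the notebook.
df = pd.DataFrame(
    {
        'name': ['John', 'Merry', 'Anna', 'Bob', 'Alice'],
        'age': [20, 21, 20, 22, 21],
        'point': [9, 10, 8, 9, 8],
        'subject': ['math', 'physics', 'math', 'math', 'physics'],
    }
)

grouped = df.groupby(by=df['subject'])

# Number of distinct groups found in the 'subject' column.
print(grouped.ngroups)

# Mapping from each group key to the row labels that belong to it.
math_labels = list(grouped.groups['math'])
print(math_labels)
```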

For task 2, we call the **get_group** method to get the rows of one group as a DataFrame:

In [21]:
df_math = grouped.get_group(name='math')
df_math

Unnamed: 0,age,name,point,subject
0,20,John,9,math
2,20,Anna,8,math
3,22,Bob,9,math


In [22]:
df_physics = grouped.get_group(name='physics')
df_physics

Unnamed: 0,age,name,point,subject
1,21,Merry,10,physics
4,21,Alice,8,physics
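
When there are many subjects, calling **get_group** once per subject becomes tedious. One possible alternative, sketched here under the assumption that a dictionary of DataFrames is acceptable, is to iterate over the groups. The name `frames` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Same sample data as the df defined at the top of the notebook.
df = pd.DataFrame(
    {
        'name': ['John', 'Merry', 'Anna', 'Bob', 'Alice'],
        'age': [20, 21, 20, 22, 21],
        'point': [9, 10, 8, 9, 8],
        'subject': ['math', 'physics', 'math', 'math', 'physics'],
    }
)

# Iterating over a GroupBy object yields (group key, sub-DataFrame) pairs.
frames = {subject: frame for subject, frame in df.groupby('subject')}

print(frames['math'])
print(frames['physics'])
```

Each value in `frames` holds the same rows that **get_group** returns for that key.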


**QUESTION 2**: I want to count how many people got 9 points, how many got 10 points, and so on for every other point value.

```
Input: df
Output: a DataFrame containing 2 columns: point and the quantity of each point
```

We have two tasks:
1. Group our dataset by point
2. Count elements in each group

In [26]:
point_grouped = df.groupby(by=df['point'])
point_grouped.count()

point,age,name,subject
8,2,2,2
9,2,2,2
10,1,1,1
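
Note that `count()` returns the number of non-null values in every remaining column, so the result has three columns rather than the two described in the expected output. One way to get exactly `point` and `quantity`, sketched here as a suggestion rather than the only approach, is to use `size()` and turn the result back into a DataFrame. The column name `quantity` comes from the output description above.

```python
import pandas as pd

# Same sample data as the df defined at the top of the notebook.
df = pd.DataFrame(
    {
        'name': ['John', 'Merry', 'Anna', 'Bob', 'Alice'],
        'age': [20, 21, 20, 22, 21],
        'point': [9, 10, 8, 9, 8],
        'subject': ['math', 'physics', 'math', 'math', 'physics'],
    }
)

# size() counts the rows in each group and returns a Series indexed by point;
# reset_index(name=...) converts it into a two-column DataFrame.
quantity = df.groupby('point').size().reset_index(name='quantity')
print(quantity)
```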
