![avater](9-1.png)  
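Grouping keys can take several forms, and they need not all be of the same type:
+ A list or array whose length matches the axis being grouped
+ A value indicating a column name in the DataFrame
+ A dict or Series giving the correspondence between values on the grouped axis and the group names
+ A function to be invoked on the axis index or on the individual labels in the index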
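
As a rough illustration of the four key forms listed above, the sketch below groups one small, made-up frame in each way. The frame `sales`, its values, and the mapping `label_to_team` are hypothetical and only serve the example.

```python
import pandas as pd

# Small deterministic frame; the names and values are made up for illustration.
sales = pd.DataFrame(
    {"region": ["east", "west", "east", "west"],
     "amount": [10, 20, 30, 40]},
    index=["ann", "bob", "cy", "dee"],
)

# 1. A list (or array) with one entry per row
sum_by_list = sales["amount"].groupby(["x", "y", "x", "y"]).sum()

# 2. A column name
sum_by_name = sales.groupby("region")["amount"].sum()

# 3. A dict (or Series) mapping index labels to group names
label_to_team = {"ann": "t1", "bob": "t1", "cy": "t2", "dee": "t2"}
sum_by_dict = sales["amount"].groupby(label_to_team).sum()

# 4. A function called on each index label (here: the label's length)
sum_by_func = sales["amount"].groupby(len).sum()

print(sum_by_list, sum_by_name, sum_by_dict, sum_by_func, sep="\n\n")
```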

In [None]:
import pandas as pd
import numpy as np

In [None]:
# pandas dataframe pd.DataFrame()
df = pd.DataFrame({'key1':['a','a','b','b','a'],
                  'key2':['one','two','one','two','one'],
                  'data1':np.random.randn(5),
                  'data2':np.random.randn(5)})
df 

In [None]:
# pandas series s.groupby()
# pandas GroupBy g.mean()
grouped = df.loc[:,'data1'].groupby(df.loc[:,'key1'])
grouped.mean()

In [None]:
means = df.loc[:,'data1'].groupby([df.loc[:,'key1'],df.loc[:,'key2']]).mean()
means

In [None]:
# pandas series s.unstack()
means.unstack()

In [None]:
# numpy Array creation routines np.array()
states = np.array(['Ohio','California','California','Ohio','Ohio'])
years = np.array([2005,2005,2006,2005,2006])
df.loc[:,'data1'].groupby([states,years]).mean()

In [None]:
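# Non-numeric columns (here key2) are "nuisance columns" and are left out of the result.
# numeric_only=True makes that explicit; pandas 2.0+ raises a TypeError without it.
df.groupby('key1').mean(numeric_only=True)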

In [None]:
df.groupby(['key1','key2']).mean()

In [None]:
# pandas GroupBy g.size()
# Any missing values in the group keys are excluded from the result
df.groupby(['key1','key2']).size()
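
The frame above has no missing keys, so the exclusion is not visible there. A minimal sketch with a made-up frame `obs` shows the default behaviour, and also the `dropna=False` option, which assumes pandas 1.1 or newer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing group key.
obs = pd.DataFrame({"key": ["a", "b", np.nan, "a"], "val": [1, 2, 3, 4]})

# Default: the row whose key is NaN is dropped from the result.
default_sizes = obs.groupby("key").size()

# dropna=False keeps NaN as a group of its own.
with_nan_sizes = obs.groupby("key", dropna=False).size()

print(default_sizes)
print(with_nan_sizes)
```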

# Iterating over groups
pandas GroupBy g.\_\_iter\_\_()  
pandas dataframe d.groupby()

In [None]:
for name,group in df.groupby('key1'):
    print(name)
    print(group)

In [None]:
for (k1,k2), group in df.groupby(['key1','key2']):
    print(k1,k2)
    print(group)

In [None]:
pieces = dict(list(df.groupby('key1')))
pieces['b']
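
Building a dict of all pieces is convenient, but when only one group is needed, `get_group` may be a lighter alternative. The sketch below uses a made-up frame `letters` rather than the random `df` so the result is predictable.

```python
import pandas as pd

# Hypothetical frame with fixed values.
letters = pd.DataFrame({"key1": ["a", "a", "b", "b", "a"],
                        "value": [1, 2, 3, 4, 5]})
letter_groups = letters.groupby("key1")

# Fetch a single group without materialising every piece.
piece_b = letter_groups.get_group("b")

# Iteration yields (name, sub-frame) pairs, in sorted key order by default.
group_names = [name for name, _ in letter_groups]

print(piece_b)
print(group_names)
```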

In [None]:
#pandas dataframe d.dtypes
df.dtypes

In [None]:
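# Note: axis=1 in groupby is deprecated since pandas 2.1, so this cell may warn or fail on newer versions
grouped = df.groupby(df.dtypes,axis=1)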
dict(list(grouped))

# Selecting a column or subset of columns

In [None]:
# Select the column first, then group it by the key column
df.loc[:,'data1'].groupby(df.loc[:,'key1'])

In [None]:
# Syntactic sugar: index the GroupBy object with a column name
# pandas GroupBy DataFrameGroupBy[] 
df.groupby('key1')['data1']

In [None]:
# pandas GroupBy DataFrameGroupBy[[]] 
df.groupby(['key1','key2'])[['data2']].mean()

In [None]:
s_grouped=df.groupby(['key1','key2'])['data2']
s_grouped

In [None]:
s_grouped.mean()
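
To check that indexing a GroupBy object really is shorthand for grouping the selected column, the sketch below compares both spellings on a made-up frame `toy`. It also shows that a list of column names yields a DataFrame while a single name yields a Series.

```python
import pandas as pd

# Hypothetical frame with fixed values.
toy = pd.DataFrame({"key1": ["a", "a", "b"],
                    "data1": [1.0, 3.0, 5.0],
                    "data2": [2.0, 4.0, 6.0]})

via_sugar = toy.groupby("key1")["data1"].mean()         # SeriesGroupBy -> Series
via_series = toy["data1"].groupby(toy["key1"]).mean()   # same computation, long form
as_frame = toy.groupby("key1")[["data2"]].mean()        # list of names -> DataFrame

print(via_sugar.equals(via_series))
print(type(as_frame).__name__)
```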

# Grouping with dicts or Series

In [None]:
# pandas dataframe d.iloc[]
people = pd.DataFrame(np.random.randn(5,5),
                     columns=['a','b','c','d','e'],
                     index=['Joe','Steve','Wes','Jim','Travis'])
# .at only accepts a single row/column label, so use positional .iloc to set a block to NaN
people.iloc[2:3,[1,2]]=np.nan
people

In [None]:
# pandas dataframe d.groupby() 
# pandas GroupBy g.sum()
# Suppose the group correspondence of the columns is known and we want to sum the columns by group:
# just pass the dict to groupby
mapping={'a':'red','b':'red','c':'blue','d':'blue','e':'red','f':'orange'}
by_column=people.groupby(mapping,axis=1)
by_column.sum()
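
Because `axis=1` in `groupby` is deprecated since pandas 2.1, one possible alternative is to transpose, group the rows, and transpose back. The sketch below assumes a small made-up frame `scores` and mapping `color_of`; the extra key `'z'` plays the role of `'f'` above and shows that unused mapping keys are harmless.

```python
import pandas as pd

# Hypothetical frame with fixed values.
scores = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=["a", "b", "c"],
    index=["Joe", "Wes"],
)

# 'z' matches no column; unused keys in the mapping are simply ignored.
color_of = {"a": "red", "b": "red", "c": "blue", "z": "orange"}

# Transpose so the columns become the row index, group and sum, then transpose back.
color_totals = scores.T.groupby(color_of).sum().T

print(color_totals)
```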

In [None]:
# If a Series is used as the grouping key, pandas checks it to make sure its index is aligned with the grouping axis
map_series = pd.Series(mapping)
map_series

In [None]:
# pandas GroupBy g.count()
people.groupby(map_series,axis=1).count()

# Grouping with functions

In [None]:
# Any function passed as a group key is called once per index value, and its return values are used as the group names
people.groupby(len).sum()
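
Since `people` holds random numbers, the effect of `len` is easier to follow on fixed values. The sketch below uses a made-up frame `staff` with the same index labels; the group names are the name lengths 3, 5 and 6.

```python
import pandas as pd

# Hypothetical frame indexed by the same names as `people`.
staff = pd.DataFrame({"hours": [1, 2, 3, 4, 5]},
                     index=["Joe", "Steve", "Wes", "Jim", "Travis"])

# len is called on each index label; rows with equal name length share a group.
hours_by_name_length = staff.groupby(len).sum()

print(hours_by_name_length)
```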

In [None]:
# Mixing functions with arrays, lists, dicts, or Series is not a problem, because everything is converted to arrays internally
# pandas GroupBy g.min()
key_list = ['one','one','one','two','two']
people.groupby([len,key_list]).min()
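
A deterministic version of the mixed-key case might look like the following. The frame `staff` and the list `bucket_labels` are made up for the example; the result is indexed by (name length, label) pairs.

```python
import pandas as pd

# Hypothetical frame and labels with fixed values.
staff = pd.DataFrame({"hours": [1, 2, 3, 4, 5]},
                     index=["Joe", "Steve", "Wes", "Jim", "Travis"])
bucket_labels = ["one", "one", "one", "two", "two"]

# Each row is keyed by (len(index label), bucket label).
mixed_min = staff.groupby([len, bucket_labels]).min()

print(mixed_min)
```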

# Grouping by index levels
pandas Index pd.MultiIndex.from_arrays()  
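A key convenience of hierarchically indexed data is the ability to aggregate by one of the index levels. To do so, pass the level number or name via the `level` keyword.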

In [None]:
columns = pd.MultiIndex.from_arrays([['US','US','US','JP','JP'],
                                    [1,3,5,1,3]],names=['cty','tenor'])
hier_df = pd.DataFrame(np.random.randn(4,5),columns=columns)
hier_df

In [None]:
hier_df.groupby(level='cty',axis=1).count()
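
The `level` keyword works the same way on a row MultiIndex, which also avoids `axis=1` (deprecated in `groupby` since pandas 2.1). The sketch below assumes a made-up Series `rates` carrying the same two levels, and addresses a level once by name and once by number.

```python
import pandas as pd

# Same level layout as above, but on the row index and with fixed values.
idx = pd.MultiIndex.from_arrays([["US", "US", "US", "JP", "JP"],
                                 [1, 3, 5, 1, 3]],
                                names=["cty", "tenor"])
rates = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

count_by_cty = rates.groupby(level="cty").count()   # level given by name
mean_by_tenor = rates.groupby(level=1).mean()       # level given by number

print(count_by_cty)
print(mean_by_tenor)
```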