# R and Python Hand in Hand: Grouping and Sorting Data

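This post shows how to group and sort tabular data in Python and R. Grouping means splitting the rows into groups and applying the same calculation to each group. Sorting means arranging the rows according to a given rule.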

## Python

```python
# Load modules
import pandas as pd
import numpy as np
import matplotlib.pylab as plt

# Load the data
edu = pd.read_csv('G:/Py/introduction-datascience-python-book-master/files/ch02/educ_figdp_1_Data.csv',
                  na_values=':', usecols=['TIME', 'GEO', 'Value'])   # na_values=':' treats the ':' symbol as a missing value
```
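To see what `na_values` and `usecols` do without the original file, here is a minimal sketch on a small in-memory CSV. The rows and the extra `UNIT` column are made up for illustration and are not taken from the Eurostat file.

```python
import io

import pandas as pd

# Hypothetical CSV text in the same spirit as the Eurostat file:
# ':' marks a missing value, and UNIT is an extra column we do not need
raw = io.StringIO(
    "TIME,GEO,UNIT,Value\n"
    "2000,Spain,PC,:\n"
    "2001,Spain,PC,4.2\n"
    "2002,Spain,PC,4.3\n"
)

# usecols keeps only the listed columns; na_values=':' turns ':' into NaN,
# so the Value column can be parsed as floats
edu_demo = pd.read_csv(raw, na_values=':', usecols=['TIME', 'GEO', 'Value'])

print(edu_demo)
print(edu_demo['Value'].isna().sum())  # number of missing values
```

Without `na_values=':'`, the `':'` entry would stay a string and the whole `Value` column would be read as text rather than numbers.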

The data is available at: https://github.com/DataScienceUB/introduction-datascience-python-book

### Sorting

```python
# Sort by Value from largest to smallest; ascending sets the direction,
# inplace=True modifies edu itself instead of returning a sorted copy
edu.sort_values(by='Value', ascending=False, inplace=True)
edu.head()
```

|     | TIME | GEO     | Value |
|-----|------|---------|-------|
| 130 | 2010 | Denmark | 8.81  |
| 131 | 2011 | Denmark | 8.75  |
| 129 | 2009 | Denmark | 8.74  |
| 121 | 2001 | Denmark | 8.44  |
| 122 | 2002 | Denmark | 8.44  |
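`sort_values` also accepts a list of columns, with one `ascending` flag per column, and by default places missing values last. A minimal sketch on made-up data (the `df` below is hypothetical, not `edu`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same columns as edu; the values are made up
df = pd.DataFrame({
    'TIME': [2010, 2011, 2010, 2011, 2010],
    'GEO': ['Spain', 'Spain', 'Denmark', 'Denmark', 'Austria'],
    'Value': [1.0, 2.0, 3.0, 4.0, np.nan],
})

# Single key, largest first; NaN goes last by default (na_position='last')
by_value = df.sort_values(by='Value', ascending=False)
print(by_value)

# Two keys: GEO alphabetically, then Value from largest to smallest
# within each country
by_geo_value = df.sort_values(by=['GEO', 'Value'], ascending=[True, False])
print(by_geo_value)
```

Passing `na_position='first'` instead would move the NaN rows to the top.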


```python
# Restore the original order by sorting on the row index
edu.sort_index(axis=0, ascending=True, inplace=True)
edu.head()
```

|   | TIME | GEO                           | Value |
|---|------|-------------------------------|-------|
| 0 | 2000 | European Union (28 countries) | NaN   |
| 1 | 2001 | European Union (28 countries) | NaN   |
| 2 | 2002 | European Union (28 countries) | 5.00  |
| 3 | 2003 | European Union (28 countries) | 5.03  |
| 4 | 2004 | European Union (28 countries) | 4.95  |


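This also shows that however we sort, each row keeps its original index label (its original row number). Because those labels are preserved, we can always restore the original order with `sort_index`.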
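The round trip can be checked directly. A small sketch on made-up data: after `sort_values` every row keeps its index label, `sort_index` restores the original order, and passing `ignore_index=True` (available in pandas 1.0 and later) renumbers the rows and discards the original labels.

```python
import numpy as np
import pandas as pd

# Made-up data; the default index 0..4 plays the role of the original row numbers
df = pd.DataFrame({'GEO': ['A', 'B', 'C', 'D', 'E'],
                   'Value': [3.0, np.nan, 5.0, 1.0, 4.0]})

# sort_values reorders the rows, but each row keeps its index label
shuffled = df.sort_values(by='Value', ascending=False)
print(shuffled.index.tolist())

# sort_index puts the rows back in their original order
restored = shuffled.sort_index()
print(restored.equals(df))

# With ignore_index=True the labels are renumbered 0..n-1,
# so sort_index can no longer recover the original order
renumbered = df.sort_values(by='Value', ascending=False, ignore_index=True)
print(renumbered.index.tolist())
```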

### Grouping

```python
# Group by GEO and compute the mean Value within each group
group = edu[['GEO', 'Value']].groupby('GEO').mean()
group.head()
```

| GEO            | Value    |
|----------------|----------|
| Austria        | 5.618333 |
| Belgium        | 6.189091 |
| Bulgaria       | 4.093333 |
| Cyprus         | 7.023333 |
| Czech Republic | 4.168333 |
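Grouping is not limited to one statistic, and the grouped result can itself be sorted. A sketch on made-up data, which also shows that `mean` and `count` skip missing values while `size` counts every row:

```python
import numpy as np
import pandas as pd

# Made-up data: two countries, one missing value
df = pd.DataFrame({
    'GEO': ['Austria', 'Austria', 'Austria', 'Belgium', 'Belgium'],
    'Value': [5.0, 6.0, np.nan, 6.0, 7.0],
})

# Several statistics per group at once; mean and count ignore NaN
stats = df.groupby('GEO')['Value'].agg(['mean', 'count'])
# size counts all rows in each group, NaN included
stats['rows'] = df.groupby('GEO').size()
print(stats)

# Grouping and sorting combined: countries ranked by their mean Value
ranking = df.groupby('GEO')['Value'].mean().sort_values(ascending=False)
print(ranking)
```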


### Pivot tables

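```python
# Keep only the years after 2005, then reshape: one row per country (GEO),
# one column per year (TIME), each cell holding the Value
filtered_data = edu[edu['TIME'] > 2005]
pivedu = pd.pivot_table(filtered_data, values='Value',
                        index=['GEO'], columns=['TIME'])
pivedu.head()
```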

| GEO / TIME     | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|----------------|------|------|------|------|------|------|
| Austria        | 5.40 | 5.33 | 5.47 | 5.98 | 5.91 | 5.80 |
| Belgium        | 5.98 | 6.00 | 6.43 | 6.57 | 6.58 | 6.55 |
| Bulgaria       | 4.04 | 3.88 | 4.44 | 4.58 | 4.10 | 3.82 |
| Cyprus         | 7.02 | 6.95 | 7.45 | 7.98 | 7.92 | 7.87 |
| Czech Republic | 4.42 | 4.05 | 3.92 | 4.36 | 4.25 | 4.51 |
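`pivot_table` aggregates with `aggfunc='mean'` by default, so when a (GEO, TIME) pair appears more than once its values are averaged; when it appears only once, the mean is simply that value. A sketch on made-up data showing that the same table can be built with `groupby` followed by `unstack`:

```python
import pandas as pd

# Made-up long-format data; note the duplicated (Austria, 2006) pair
df = pd.DataFrame({
    'TIME': [2006, 2006, 2007, 2006, 2007],
    'GEO': ['Austria', 'Austria', 'Austria', 'Belgium', 'Belgium'],
    'Value': [5.0, 6.0, 5.5, 6.0, 6.5],
})

# pivot_table averages duplicates by default (aggfunc='mean')
wide = pd.pivot_table(df, values='Value', index=['GEO'], columns=['TIME'])
print(wide)

# The same table built by grouping on both keys and moving TIME to the columns
wide2 = df.groupby(['GEO', 'TIME'])['Value'].mean().unstack('TIME')
print(wide.equals(wide2))
```

A different summary can be requested through the `aggfunc` parameter, for example `aggfunc='max'`.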


```python
# Select two countries (rows) and two years (columns) by label
pivedu.loc[['Spain', 'Portugal'], [2006, 2011]]
```

| GEO / TIME | 2006 | 2011 |
|------------|------|------|
| Spain      | 4.26 | 4.82 |
| Portugal   | 5.07 | 5.27 |
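`.loc` selects purely by label. A sketch on a made-up table shaped like `pivedu`: the rows come back in the order listed rather than the table's own order, and because the year columns are integers (as `TIME` is when read from the CSV), a string label such as `'2011'` does not match.

```python
import pandas as pd

# Made-up wide table shaped like pivedu: countries as rows, years as integer columns
wide = pd.DataFrame(
    {2006: [1.0, 2.0, 3.0], 2011: [1.5, 2.5, 3.5]},
    index=pd.Index(['Austria', 'Portugal', 'Spain'], name='GEO'),
)
wide.columns.name = 'TIME'

# Rows come back in the order of the list, not in the table's alphabetical order
subset = wide.loc[['Spain', 'Portugal'], [2006, 2011]]
print(subset)

# One label on each axis returns a single value
print(wide.loc['Spain', 2011])

# The year labels are integers, so the string '2011' is not found
try:
    wide.loc['Spain', '2011']
except KeyError:
    print("KeyError: year labels are integers, not strings")
```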
