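# Renaming & Combining
*Congratulations on making it to the final lesson!*<p>
This lesson covers how to rename columns or index labels you are not happy with, and how to combine multiple DataFrames and Series.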

In [1]:
import pandas as pd
pd.set_option("display.max_rows", 5)
# Load the wine reviews dataset
wine_reviews = pd.read_csv("winemag-data-50k-v2.csv", index_col=0)
wine_reviews

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...,...
49998,Italy,The beautiful thing about this wine is the ric...,Le Rive,90,45.0,Veneto,Soave Classico,,,,Suavia 2008 Le Rive (Soave Classico),Garganega,Suavia
49999,US,This is a particularly fine vintage for the po...,Dr. Wolfe's Family Red,90,15.0,Washington,Washington,Washington Other,Paul Gregutt,@paulgwine,Thurston Wolfe 2009 Dr. Wolfe's Family Red Red...,Red Blend,Thurston Wolfe


## Renaming
The first function to cover is **rename()**, which changes the names of index labels or columns.<p>
For example, let's change the points column to score:

In [2]:
wine_reviews.rename(columns={'points': 'score'})

Unnamed: 0,country,description,designation,score,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...,...
49998,Italy,The beautiful thing about this wine is the ric...,Le Rive,90,45.0,Veneto,Soave Classico,,,,Suavia 2008 Le Rive (Soave Classico),Garganega,Suavia
49999,US,This is a particularly fine vintage for the po...,Dr. Wolfe's Family Red,90,15.0,Washington,Washington,Washington Other,Paul Gregutt,@paulgwine,Thurston Wolfe 2009 Dr. Wolfe's Family Red Red...,Red Blend,Thurston Wolfe


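**rename()** lets you change names by passing the *index* or *columns* keyword argument. It accepts many input formats, but a Python dict is usually the most convenient. For example, let's rename a few specific index labels: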

In [3]:
wine_reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'})

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
firstEntry,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
secondEntry,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...,...
49998,Italy,The beautiful thing about this wine is the ric...,Le Rive,90,45.0,Veneto,Soave Classico,,,,Suavia 2008 Le Rive (Soave Classico),Garganega,Suavia
49999,US,This is a particularly fine vintage for the po...,Dr. Wolfe's Family Red,90,15.0,Washington,Washington,Washington Other,Paul Gregutt,@paulgwine,Thurston Wolfe 2009 Dr. Wolfe's Family Red Red...,Red Blend,Thurston Wolfe
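Since **rename()** accepts more than dicts, a small sketch may help. It uses a tiny hypothetical DataFrame (`demo`, not part of the wine dataset) to show a dict mapping next to a function mapping, and to show that the original object is left unchanged:

```python
import pandas as pd

# Hypothetical stand-in for wine_reviews, for illustration only
demo = pd.DataFrame({"points": [87, 90], "price": [15.0, 45.0]})

# A dict renames only the labels it lists
renamed = demo.rename(columns={"points": "score"})

# A function is applied to every label on the chosen axis
upper = demo.rename(columns=str.upper)

# rename() returns a new DataFrame; demo itself is not modified
print(list(demo.columns))     # ['points', 'price']
print(list(renamed.columns))  # ['score', 'price']
print(list(upper.columns))    # ['POINTS', 'PRICE']
```

To keep the change, assign the result back to a variable, as done with `renamed` here.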


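In everyday work you will probably rename columns far more often than index values. When you do want to change the index, **set_index()** is usually more convenient than renaming labels one by one. For example, let's use the country as the index: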

In [4]:
wine_reviews.set_index('country')

country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...
Italy,The beautiful thing about this wine is the ric...,Le Rive,90,45.0,Veneto,Soave Classico,,,,Suavia 2008 Le Rive (Soave Classico),Garganega,Suavia
US,This is a particularly fine vintage for the po...,Dr. Wolfe's Family Red,90,15.0,Washington,Washington,Washington Other,Paul Gregutt,@paulgwine,Thurston Wolfe 2009 Dr. Wolfe's Family Red Red...,Red Blend,Thurston Wolfe


If you want to get back to the original layout, just use **reset_index()**, which we already met in lesson four:

In [5]:
wine_reviews.set_index('country').reset_index()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...,...
49998,Italy,The beautiful thing about this wine is the ric...,Le Rive,90,45.0,Veneto,Soave Classico,,,,Suavia 2008 Le Rive (Soave Classico),Garganega,Suavia
49999,US,This is a particularly fine vintage for the po...,Dr. Wolfe's Family Red,90,15.0,Washington,Washington,Washington Other,Paul Gregutt,@paulgwine,Thurston Wolfe 2009 Dr. Wolfe's Family Red Red...,Red Blend,Thurston Wolfe
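The round trip can be checked on a small scale. The following is a minimal sketch on a hypothetical two-row DataFrame (`demo` is made up for illustration): **set_index()** moves a column into the index, and **reset_index()** moves it back into the columns.

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only
demo = pd.DataFrame({"country": ["Italy", "US"], "points": [87, 90]})

# 'country' leaves the columns and becomes the index
by_country = demo.set_index("country")
print(by_country.index.name)     # country
print(list(by_country.columns))  # ['points']

# reset_index() turns the index back into an ordinary column
restored = by_country.reset_index()
print(list(restored.columns))    # ['country', 'points']
print(list(restored.index))      # [0, 1]
```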


#### Both the row index and the column index can have a name attribute
This is where **rename_axis()** comes in:

In [6]:
wine_reviews.rename_axis("wines", axis='rows').rename_axis("fields", axis='columns') # similar to giving the row and column headers a name in Excel

fields,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
wines,,,,,,,,,,,,,
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...,...
49998,Italy,The beautiful thing about this wine is the ric...,Le Rive,90,45.0,Veneto,Soave Classico,,,,Suavia 2008 Le Rive (Soave Classico),Garganega,Suavia
49999,US,This is a particularly fine vintage for the po...,Dr. Wolfe's Family Red,90,15.0,Washington,Washington,Washington Other,Paul Gregutt,@paulgwine,Thurston Wolfe 2009 Dr. Wolfe's Family Red Red...,Red Blend,Thurston Wolfe
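The axis names set by **rename_axis()** end up in the `name` attribute of the index and of the columns. A minimal sketch on a hypothetical DataFrame (`demo` is an assumption, not the wine data) makes that visible:

```python
import pandas as pd

# Hypothetical frame, for illustration only
demo = pd.DataFrame({"points": [87, 90], "price": [15.0, 45.0]})

named = (
    demo.rename_axis("wines", axis="rows")
        .rename_axis("fields", axis="columns")
)

# The names live on the axes, not on any individual label
print(named.index.name)    # wines
print(named.columns.name)  # fields

# The original frame has no axis names
print(demo.index.name)     # None
```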


## Combining
When working with data tables, we sometimes need to combine multiple DataFrames or Series. Pandas provides three core functions for this:<p>
**concat()**, **join()** and **merge()** <p>
Let's first build two small datasets:

In [7]:
pd.set_option("display.max_rows", 10)

df1 = pd.DataFrame({'name':['Lee', 'Jay', 'Casp'],
                   'age':[23, 25, 26],
                   'math':[78, 65, 80]})
df1

Unnamed: 0,name,age,math
0,Lee,23,78
1,Jay,25,65
2,Casp,26,80


In [8]:
df2 = pd.DataFrame({'name':['Dustin', 'Floa', 'Helen'],
                   'age':[22, 24, 21],
                   'math':[81, 77, 90]})
df2

Unnamed: 0,name,age,math
0,Dustin,22,81
1,Floa,24,77
2,Helen,21,90


### concat()
**concat()** is the simplest way to combine data: it glues the objects together along an axis:

In [9]:
df3 = pd.concat([df1, df2])
df3

Unnamed: 0,name,age,math
0,Lee,23,78
1,Jay,25,65
2,Casp,26,80
0,Dustin,22,81
1,Floa,24,77
2,Helen,21,90


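Because the column names are identical, the rows were simply stacked. But notice that the index still carries the original labels 0, 1, 2, 0, 1, 2. You probably already know how to reset the index here!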

In [10]:
df3.reset_index()

Unnamed: 0,index,name,age,math
0,0,Lee,23,78
1,1,Jay,25,65
2,2,Casp,26,80
3,0,Dustin,22,81
4,1,Floa,24,77
5,2,Helen,21,90


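Whoops, there is still a small problem: an extra column named index has appeared, because **reset_index()** keeps the old index as a column by default.<p>We just need to get rid of it, and there are two handy ways. The first is to pass the argument directly, **reset_index(drop=True)** (the default is False):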

In [11]:
df3.reset_index(drop=True)

Unnamed: 0,name,age,math
0,Lee,23,78
1,Jay,25,65
2,Casp,26,80
3,Dustin,22,81
4,Floa,24,77
5,Helen,21,90


You can also use the DataFrame's **drop()** method to discard specific rows or columns:<p>
Remember to pass axis=1 (columns) when dropping a column; the default is axis=0 (rows).

In [12]:
df4 = df3.reset_index().drop('index', axis=1)
df4

Unnamed: 0,name,age,math
0,Lee,23,78
1,Jay,25,65
2,Casp,26,80
3,Dustin,22,81
4,Floa,24,77
5,Helen,21,90
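As a possible shortcut, `pd.concat` also has an `ignore_index` parameter that renumbers the result while concatenating, so no separate reset step is needed. A minimal sketch, rebuilding df1 and df2 with the same values as above so the block runs on its own:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Lee", "Jay", "Casp"],
                    "age": [23, 25, 26],
                    "math": [78, 65, 80]})
df2 = pd.DataFrame({"name": ["Dustin", "Floa", "Helen"],
                    "age": [22, 24, 21],
                    "math": [81, 77, 90]})

# ignore_index=True discards the old labels and numbers the rows 0..n-1
combined = pd.concat([df1, df2], ignore_index=True)

print(list(combined.index))    # [0, 1, 2, 3, 4, 5]
print(list(combined.columns))  # ['name', 'age', 'math']
```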


Now that we have seen how to **delete/drop** unwanted rows or columns, let's also note how to **add** new ones.<p>
A common way to add a row is the **loc** accessor we learned earlier:

In [13]:
df4.loc[6] = ['Martin',19,53] # Give the index label of the new row; the number of values must match the number of columns in the DataFrame, otherwise an error is raised.
df4

Unnamed: 0,name,age,math
0,Lee,23,78
1,Jay,25,65
2,Casp,26,80
3,Dustin,22,81
4,Floa,24,77
5,Helen,21,90
6,Martin,19,53


To add a column, the most direct way is **df[ ]**; the **loc** accessor works as well:

In [14]:
df4['art'] = 50 # Every row in the new column gets the same value
df4.loc[:, 'english'] = [30,40,50,60,70,80,90] # If the value is a list, its length must equal the length of the DataFrame (number of rows)
df4

Unnamed: 0,name,age,math,art,english
0,Lee,23,78,50,30
1,Jay,25,65,50,40
2,Casp,26,80,50,50
3,Dustin,22,81,50,60
4,Floa,24,77,50,70
5,Helen,21,90,50,80
6,Martin,19,53,50,90
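The length rule for list values can be seen in a short sketch. It uses a hypothetical three-row DataFrame (`df`, made up here) and also shows `DataFrame.assign`, an alternative that returns a new DataFrame with the extra column:

```python
import pandas as pd

# Hypothetical three-row frame, for illustration only
df = pd.DataFrame({"name": ["Lee", "Jay", "Casp"], "math": [78, 65, 80]})

# Two values for three rows: pandas refuses the assignment
raised = False
try:
    df["art"] = [50, 60]
except ValueError as err:
    raised = True
    print("ValueError:", err)

# assign() builds a new DataFrame that includes the added column
with_art = df.assign(art=[50, 60, 70])
print(list(with_art.columns))  # ['name', 'math', 'art']
```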


### join()
A slightly more involved way of combining is **join()**. It connects two different DataFrame objects on their shared row index:

In [15]:
df1

Unnamed: 0,name,age,math
0,Lee,23,78
1,Jay,25,65
2,Casp,26,80


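Let's create one more dataset. **Note: join() requires the two DataFrames to have different column names. If any column names overlap, rename them first (or pass the lsuffix/rsuffix parameters).**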

In [16]:
df5 = pd.DataFrame({'color':['black', 'white', 'yellow'],
                   'music':[64, 69, 72]})
df5

Unnamed: 0,color,music
0,black,64
1,white,69
2,yellow,72


In [17]:
df1.join(df5)

Unnamed: 0,name,age,math,color,music
0,Lee,23,78,black,64
1,Jay,25,65,white,69
2,Casp,26,80,yellow,72


This attaches the columns of one DataFrame to another.
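To illustrate the column-name requirement, here is a minimal sketch with two hypothetical DataFrames (`left` and `right`, not used elsewhere in this lesson) that share a name column. Without suffixes the join fails; with `lsuffix` and `rsuffix` the overlapping columns are kept apart:

```python
import pandas as pd

# Hypothetical frames that share the column 'name'
left = pd.DataFrame({"name": ["Lee", "Jay"], "math": [78, 65]})
right = pd.DataFrame({"name": ["Lee", "Jay"], "music": [64, 69]})

# Overlapping column names and no suffixes: join() raises a ValueError
raised = False
try:
    left.join(right)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Suffixes are appended to the overlapping columns only
joined = left.join(right, lsuffix="_left", rsuffix="_right")
print(list(joined.columns))  # ['name_left', 'math', 'name_right', 'music']
```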

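### merge()
**merge()** also connects two DataFrames, but by default it matches rows on the columns the two DataFrames have in common, so they need at least one shared column.<p>
Let's rework df5 from above by adding a name column: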

In [18]:
df5['name'] = ['Lee', 'Jay', 'Jack']
df5

Unnamed: 0,color,music,name
0,black,64,Lee
1,white,69,Jay
2,yellow,72,Jack


In [19]:
pd.merge(df1,df5)

Unnamed: 0,name,age,math,color,music
0,Lee,23,78,black,64
1,Jay,25,65,white,69


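As you can see, the two tables were connected by name. df5 contains a name, Jack, that does not appear in df1 (and df1's Casp does not appear in df5), so only the two shared rows remain.<p>
The reason lies in the default parameters of merge (the exact defaults can differ between pandas versions):<p>
  **merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False)**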

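**merge()** has a parameter called **how** (the join type), which defaults to an inner join; an inner join keeps only the intersection. **join()** has the same parameter, but there it defaults to a left join, which keeps the index of the left DataFrame. Let's try switching **merge** to a left join: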

In [20]:
pd.merge(df1, df5, how='left')

Unnamed: 0,name,age,math,color,music
0,Lee,23,78,black,64.0
1,Jay,25,65,white,69.0
2,Casp,26,80,,


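All rows of df1 are kept; where df5 has no matching name, NaN is filled in automatically (which is also why music is now displayed as a float).<p>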
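To see which side each row came from, one option is an outer join together with the `indicator` parameter listed in the signature above. The following is a minimal sketch that rebuilds df1 and df5 with the same values so it runs on its own; naming the key explicitly with `on="name"` is a choice made here for clarity:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Lee", "Jay", "Casp"],
                    "age": [23, 25, 26],
                    "math": [78, 65, 80]})
df5 = pd.DataFrame({"color": ["black", "white", "yellow"],
                    "music": [64, 69, 72],
                    "name": ["Lee", "Jay", "Jack"]})

# how="outer" keeps the union of names; indicator=True adds a '_merge' column
outer = pd.merge(df1, df5, on="name", how="outer", indicator=True)

# Map each name to its origin: 'both', 'left_only' or 'right_only'
origin = dict(zip(outer["name"], outer["_merge"].astype(str)))
print(origin)
```

Lee and Jay are marked `both`, Casp `left_only`, and Jack `right_only`.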
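#### For more on the parameters of **merge** and **join** and how they compare, see [Pandas DataFrame joins: a comparison of several join methods](https://www.jianshu.com/p/8344df71b2b3) (article in Chinese)
#### or consult the official documentation; we won't go into further detail in this introduction~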

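# ---------- A not-so-fancy divider ----------
*First of all, thank you for reading this far. Our short introductory tour of Pandas ends here ~ Now it's time to put what you have learned to the test!* <p>

**I have prepared __4__ exercises. They may call for things you haven't learned yet, but don't worry: the relevant hints are written into each exercise. Good luck!!**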