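## 0. Introduction to pandas
To quote the official description: "pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language." We won't elaborate further here; let's go straight to a worked example that shows what pandas can do. First, import the relevant libraries: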

In [1]:
# Import the relevant libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import datetime
import re

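## 1. Reading files
This notebook reads its data from CSV files. Three commonly used reader functions are:
- df = pd.read_csv('file.csv')
- df = pd.read_json('file.json') *# also accepts JSON text; newer pandas versions expect it wrapped in a file-like object such as `io.StringIO`*
- df = pd.read_excel('file.xls', sheet_name=[0, 1]) *# reads several sheets and returns a dict of DataFrames*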
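
The reader functions listed above can be tried without any files on disk. The following is a minimal sketch, assuming small made-up CSV and JSON snippets (`csv_text` and `json_text` are hypothetical stand-ins for `file.csv` and `file.json`) wrapped in `io.StringIO`:

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for file.csv and file.json
csv_text = "name,year\nBatman,1939\nSuperman,1986\n"
json_text = '[{"name": "Batman", "year": 1939}, {"name": "Superman", "year": 1986}]'

# read_csv takes a path or any file-like object as its first argument
df_csv = pd.read_csv(io.StringIO(csv_text))

# Literal JSON text is wrapped in a file-like object before being passed in
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```

`read_excel` is left out of the sketch because it needs an actual workbook and an Excel engine to be installed.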

In [2]:
# Load the datasets
dc = pd.read_csv('data/dc.csv')
marvel = pd.read_csv('data/marvel.csv')

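## 2. Inspecting a DataFrame
After loading the datasets, we can inspect a DataFrame with the following attributes and methods:
- df.info() *# summary of the DataFrame: columns, non-null counts, dtypes*
- df.describe() *# descriptive statistics*
- df.columns *# column names*
- df.index *# index*
- df.head() *# first five rows of the DataFrame*
- df.tail() *# last five rows of the DataFrame*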
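
As a quick illustration of the inspection calls listed above, here is a minimal sketch on a made-up frame (`toy` and its values are hypothetical, not taken from the comic datasets):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G"],
    "APPEARANCES": [10.0, np.nan, 3.0, 7.0, 1.0, np.nan, 5.0],
})

toy.info()               # prints dtypes and non-null counts per column
stats = toy.describe()   # summary statistics of the numeric columns
first = toy.head()       # first 5 rows by default
last_two = toy.tail(2)   # pass n to change the number of rows

print(stats)
print(list(toy.columns), toy.index)
```

Note that `columns` and `index` are attributes rather than methods, so they are accessed without parentheses.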

In [3]:
dc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6896 entries, 0 to 6895
Data columns (total 13 columns):
page_id             6896 non-null int64
name                6896 non-null object
urlslug             6896 non-null object
ID                  4883 non-null object
ALIGN               6295 non-null object
EYE                 3268 non-null object
HAIR                4622 non-null object
SEX                 6771 non-null object
GSM                 64 non-null object
ALIVE               6893 non-null object
APPEARANCES         6541 non-null float64
FIRST APPEARANCE    6827 non-null object
YEAR                6827 non-null float64
dtypes: float64(2), int64(1), object(10)
memory usage: 700.5+ KB


In [4]:
marvel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16376 entries, 0 to 16375
Data columns (total 13 columns):
page_id             16376 non-null int64
name                16376 non-null object
urlslug             16376 non-null object
ID                  12606 non-null object
ALIGN               13564 non-null object
EYE                 6609 non-null object
HAIR                12112 non-null object
SEX                 15522 non-null object
GSM                 90 non-null object
ALIVE               16373 non-null object
APPEARANCES         15280 non-null float64
FIRST APPEARANCE    15561 non-null object
Year                15561 non-null float64
dtypes: float64(2), int64(1), object(10)
memory usage: 1.6+ MB


In [5]:
dc.columns

Index(['page_id', 'name', 'urlslug', 'ID', 'ALIGN', 'EYE', 'HAIR', 'SEX',
       'GSM', 'ALIVE', 'APPEARANCES', 'FIRST APPEARANCE', 'YEAR'],
      dtype='object')

In [6]:
marvel.columns

Index(['page_id', 'name', 'urlslug', 'ID', 'ALIGN', 'EYE', 'HAIR', 'SEX',
       'GSM', 'ALIVE', 'APPEARANCES', 'FIRST APPEARANCE', 'Year'],
      dtype='object')

In [7]:
# Unify the column names ('Year' in marvel vs. 'YEAR' in dc)
marvel.rename(columns={'Year':'YEAR'}, inplace = True)

In [8]:
dc.index

RangeIndex(start=0, stop=6896, step=1)

In [9]:
dc.head()

Unnamed: 0,page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR
0,1422,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,3093.0,"1939, May",1939.0
1,23387,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,2496.0,"1986, October",1986.0
2,1458,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Secret Identity,Good Characters,Brown Eyes,Brown Hair,Male Characters,,Living Characters,1565.0,"1959, October",1959.0
3,1659,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Public Identity,Good Characters,Brown Eyes,White Hair,Male Characters,,Living Characters,1316.0,"1987, February",1987.0
4,1576,Richard Grayson (New Earth),\/wiki\/Richard_Grayson_(New_Earth),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,1237.0,"1940, April",1940.0


In [10]:
marvel.tail()

Unnamed: 0,page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR
16371,657508,Ru'ach (Earth-616),\/Ru%27ach_(Earth-616),No Dual Identity,Bad Characters,Green Eyes,No Hair,Male Characters,,Living Characters,,,
16372,665474,Thane (Thanos' son) (Earth-616),\/Thane_(Thanos%27_son)_(Earth-616),No Dual Identity,Good Characters,Blue Eyes,Bald,Male Characters,,Living Characters,,,
16373,695217,Tinkerer (Skrull) (Earth-616),\/Tinkerer_(Skrull)_(Earth-616),Secret Identity,Bad Characters,Black Eyes,Bald,Male Characters,,Living Characters,,,
16374,708811,TK421 (Spiderling) (Earth-616),\/TK421_(Spiderling)_(Earth-616),Secret Identity,Neutral Characters,,,Male Characters,,Living Characters,,,
16375,673702,Yologarch (Earth-616),\/Yologarch_(Earth-616),,Bad Characters,,,,,Living Characters,,,


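## 3. Handling missing values
In the previous step we saw that quite a few values in the DataFrames are missing (shown as `NaN`). The **dropna()** method removes rows that contain missing data, but here we want to keep those rows, so we use **fillna()** to fill in the gaps instead. Note that `fillna()` returns a new object by default: the calls below display the filled column without modifying `marvel` or `dc`. To keep the result, assign it back, e.g. `dc['EYE'] = dc['EYE'].fillna('UnKnown')`.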

In [11]:
# Fill missing values in the EYE column of the marvel dataset with 'UnKnown'
marvel['EYE'].fillna('UnKnown')

0         Hazel Eyes
1          Blue Eyes
2          Blue Eyes
3          Blue Eyes
4          Blue Eyes
5          Blue Eyes
6         Brown Eyes
7         Brown Eyes
8         Brown Eyes
9          Blue Eyes
10         Blue Eyes
11         Blue Eyes
12        Green Eyes
13         Blue Eyes
14         Blue Eyes
15         Blue Eyes
16         Grey Eyes
17        Green Eyes
18         Blue Eyes
19        Brown Eyes
20         Blue Eyes
21         Blue Eyes
22         Blue Eyes
23         Blue Eyes
24        Green Eyes
25        Brown Eyes
26         Blue Eyes
27        Green Eyes
28        Green Eyes
29       Yellow Eyes
            ...     
16346        UnKnown
16347        UnKnown
16348        UnKnown
16349     White Eyes
16350        UnKnown
16351        UnKnown
16352        UnKnown
16353        UnKnown
16354        UnKnown
16355        UnKnown
16356        UnKnown
16357        UnKnown
16358        UnKnown
16359     Black Eyes
16360     Black Eyes
16361        UnKnown
16362       R

In [12]:
dc['EYE'].fillna('UnKnown')

0        Blue Eyes
1        Blue Eyes
2       Brown Eyes
3       Brown Eyes
4        Blue Eyes
5        Blue Eyes
6        Blue Eyes
7        Blue Eyes
8        Blue Eyes
9        Blue Eyes
10       Blue Eyes
11       Blue Eyes
12       Blue Eyes
13       Blue Eyes
14       Blue Eyes
15       Blue Eyes
16       Blue Eyes
17      Green Eyes
18      Brown Eyes
19      Green Eyes
20      Green Eyes
21       Blue Eyes
22       Blue Eyes
23      Green Eyes
24       Blue Eyes
25      Brown Eyes
26       Blue Eyes
27       Blue Eyes
28      Green Eyes
29      Brown Eyes
           ...    
6866     Blue Eyes
6867       UnKnown
6868       UnKnown
6869       UnKnown
6870       UnKnown
6871       UnKnown
6872       UnKnown
6873     Blue Eyes
6874       UnKnown
6875       UnKnown
6876    Green Eyes
6877      Red Eyes
6878       UnKnown
6879     Blue Eyes
6880       UnKnown
6881    Green Eyes
6882    Brown Eyes
6883     Blue Eyes
6884    Black Eyes
6885    Green Eyes
6886       UnKnown
6887       U

In [13]:
dc.head()

Unnamed: 0,page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR
0,1422,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,3093.0,"1939, May",1939.0
1,23387,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,2496.0,"1986, October",1986.0
2,1458,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Secret Identity,Good Characters,Brown Eyes,Brown Hair,Male Characters,,Living Characters,1565.0,"1959, October",1959.0
3,1659,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Public Identity,Good Characters,Brown Eyes,White Hair,Male Characters,,Living Characters,1316.0,"1987, February",1987.0
4,1576,Richard Grayson (New Earth),\/wiki\/Richard_Grayson_(New_Earth),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,1237.0,"1940, April",1940.0
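
The difference between `dropna()` and `fillna()`, and the fact that both return new objects rather than changing the original, can be sketched on a small made-up frame (`toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing eye colour
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "EYE": ["Blue Eyes", np.nan, "Green Eyes"],
})

dropped = toy.dropna()                 # new frame without the rows containing NaN
filled = toy["EYE"].fillna("UnKnown")  # new Series; toy itself is unchanged

print(toy["EYE"].isna().sum())  # the original still has its missing value

# Assign the result back to make the fill permanent
toy["EYE"] = toy["EYE"].fillna("UnKnown")
print(toy["EYE"].isna().sum())
```

Assigning the result back is usually a safer habit than relying on `inplace=True`, since it makes the modification explicit.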


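## 4. Adding and inserting rows and columns
Next we add a `COMPANY` column to both DataFrames and show how to insert rows and columns. Inserting a row is slightly more involved: the DataFrame has to be split first and then concatenated back together around the new row.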

In [14]:
marvel['COMPANY'] = 'Marvel'

In [15]:
dc['COMPANY'] = 'DC'

In [16]:
# Check that the column was added
dc.head()

Unnamed: 0,page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR,COMPANY
0,1422,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,3093.0,"1939, May",1939.0,DC
1,23387,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,2496.0,"1986, October",1986.0,DC
2,1458,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Secret Identity,Good Characters,Brown Eyes,Brown Hair,Male Characters,,Living Characters,1565.0,"1959, October",1959.0,DC
3,1659,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Public Identity,Good Characters,Brown Eyes,White Hair,Male Characters,,Living Characters,1316.0,"1987, February",1987.0,DC
4,1576,Richard Grayson (New Earth),\/wiki\/Richard_Grayson_(New_Earth),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,1237.0,"1940, April",1940.0,DC


In [17]:
# Pop the page_id column out of the DataFrame
page_id = dc.pop('page_id')

In [18]:
# Check that the column was removed
dc.head()

Unnamed: 0,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR,COMPANY
0,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,3093.0,"1939, May",1939.0,DC
1,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,2496.0,"1986, October",1986.0,DC
2,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Secret Identity,Good Characters,Brown Eyes,Brown Hair,Male Characters,,Living Characters,1565.0,"1959, October",1959.0,DC
3,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Public Identity,Good Characters,Brown Eyes,White Hair,Male Characters,,Living Characters,1316.0,"1987, February",1987.0,DC
4,Richard Grayson (New Earth),\/wiki\/Richard_Grayson_(New_Earth),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,1237.0,"1940, April",1940.0,DC


In [19]:
# Re-insert the page_id column at position 0
dc.insert(0, 'page_id', page_id)

In [20]:
# Check that the column was inserted
dc.head()

Unnamed: 0,page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR,COMPANY
0,1422,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,3093.0,"1939, May",1939.0,DC
1,23387,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,2496.0,"1986, October",1986.0,DC
2,1458,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Secret Identity,Good Characters,Brown Eyes,Brown Hair,Male Characters,,Living Characters,1565.0,"1959, October",1959.0,DC
3,1659,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Public Identity,Good Characters,Brown Eyes,White Hair,Male Characters,,Living Characters,1316.0,"1987, February",1987.0,DC
4,1576,Richard Grayson (New Earth),\/wiki\/Richard_Grayson_(New_Earth),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,1237.0,"1940, April",1940.0,DC


In [21]:
# Build the row to insert
# Note: '2333.0' and '1995' are strings, so APPEARANCES and YEAR become object columns after concatenation
insertRow = pd.DataFrame([[42, 'Gaius', 'Unknown', 'Secret Identity', 'Good Characters','Blake Eyes', 'Black Hair', 'Male Characters', 'Unknown', 'Living Characters', '2333.0', '1995, August', '1995', 'Unknown']],
                         columns=['page_id', 'name', 'urlslug', 'ID', 'ALIGN', 'EYE', 'HAIR', 'SEX', 'GSM', 'ALIVE', 'APPEARANCES', 'FIRST APPEARANCE', 'YEAR', 'COMPANY'])
# Split dc into the rows above and below the insertion point (.loc slices by label, end inclusive)
above = dc.loc[:2]
below = dc.loc[3:]
# Concatenate above, the new row and below (DataFrame.append was removed in pandas 2.0)
dc = pd.concat([above, insertRow, below], ignore_index=True)
dc.head()

Unnamed: 0,page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR,COMPANY
0,1422,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,3093.0,"1939, May",1939,DC
1,23387,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,2496.0,"1986, October",1986,DC
2,1458,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Secret Identity,Good Characters,Brown Eyes,Brown Hair,Male Characters,,Living Characters,1565.0,"1959, October",1959,DC
3,42,Gaius,Unknown,Secret Identity,Good Characters,Blake Eyes,Black Hair,Male Characters,Unknown,Living Characters,2333.0,"1995, August",1995,Unknown
4,1659,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Public Identity,Good Characters,Brown Eyes,White Hair,Male Characters,,Living Characters,1316.0,"1987, February",1987,DC
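
The column and row operations of this section can be condensed into a small sketch. `insert_row` below is a hypothetical helper, not a pandas function, and `toy` is made-up data; unlike the cell above it slices by position with `.iloc` (end exclusive) instead of by label with `.loc`:

```python
import pandas as pd


def insert_row(df, position, row):
    """Return a new frame with `row` (a dict) placed at integer `position`.

    Hypothetical helper for illustration; not part of pandas.
    """
    new = pd.DataFrame([row], columns=df.columns)
    above = df.iloc[:position]   # rows before the insertion point
    below = df.iloc[position:]   # rows from the insertion point onward
    return pd.concat([above, new, below], ignore_index=True)


toy = pd.DataFrame({"name": ["A", "B", "C"], "YEAR": [1939.0, 1986.0, 1959.0]})
toy["COMPANY"] = "DC"          # a scalar is broadcast to every row

year = toy.pop("YEAR")         # removes the column and returns it as a Series
toy.insert(0, "YEAR", year)    # puts it back as the first column

result = insert_row(toy, 1, {"YEAR": 1995.0, "name": "X", "COMPANY": "Unknown"})
print(result)
```

Passing the new values with their proper numeric types keeps `YEAR` as a float column; passing them as strings would turn the whole column into `object` once the pieces are concatenated.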


## 5. Merging DataFrames
DataFrames are combined with **concat()**: `pd.concat(list)`, where `list` contains the DataFrames to stack.

In [22]:
comic = pd.concat([dc, marvel], ignore_index=True)

In [23]:
comic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23273 entries, 0 to 23272
Data columns (total 14 columns):
page_id             23273 non-null int64
name                23273 non-null object
urlslug             23273 non-null object
ID                  17490 non-null object
ALIGN               19860 non-null object
EYE                 9878 non-null object
HAIR                16735 non-null object
SEX                 22294 non-null object
GSM                 155 non-null object
ALIVE               23267 non-null object
APPEARANCES         21822 non-null object
FIRST APPEARANCE    22389 non-null object
YEAR                22389 non-null object
COMPANY             23273 non-null object
dtypes: int64(1), object(13)
memory usage: 2.5+ MB


In [24]:
comic.head()

Unnamed: 0,page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR,COMPANY
0,1422,Batman (Bruce Wayne),\/wiki\/Batman_(Bruce_Wayne),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,3093.0,"1939, May",1939,DC
1,23387,Superman (Clark Kent),\/wiki\/Superman_(Clark_Kent),Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,2496.0,"1986, October",1986,DC
2,1458,Green Lantern (Hal Jordan),\/wiki\/Green_Lantern_(Hal_Jordan),Secret Identity,Good Characters,Brown Eyes,Brown Hair,Male Characters,,Living Characters,1565.0,"1959, October",1959,DC
3,42,Gaius,Unknown,Secret Identity,Good Characters,Blake Eyes,Black Hair,Male Characters,Unknown,Living Characters,2333.0,"1995, August",1995,Unknown
4,1659,James Gordon (New Earth),\/wiki\/James_Gordon_(New_Earth),Public Identity,Good Characters,Brown Eyes,White Hair,Male Characters,,Living Characters,1316.0,"1987, February",1987,DC
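
Because `pd.concat` aligns columns by name, it matters that the column names were unified earlier. A minimal sketch with made-up frames (`left` and `right` are hypothetical) shows what would happen otherwise:

```python
import pandas as pd

# Hypothetical toy frames; the second one spells the year column differently
left = pd.DataFrame({"name": ["A", "B"], "YEAR": [1939.0, 1986.0]})
right = pd.DataFrame({"name": ["C"], "Year": [1963.0]})

# 'YEAR' and 'Year' are treated as different columns and padded with NaN
mismatched = pd.concat([left, right], ignore_index=True)

# Renaming first gives a single YEAR column
aligned = pd.concat(
    [left, right.rename(columns={"Year": "YEAR"})],
    ignore_index=True,
)

print(mismatched)
print(aligned)
```

`ignore_index=True` discards the original row labels and numbers the combined rows from 0, which avoids duplicate index values in the result.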


## 6. Exporting data
Export the data with **to_csv()**:

In [25]:
comic.to_csv('data/comic_characters.csv')
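
By default `to_csv()` also writes the row index as an unnamed first column, and a file written this way gains an `Unnamed: 0` column when it is read back. The sketch below, on a made-up frame (`toy` is hypothetical) and writing to strings rather than files, compares the default with `index=False`:

```python
import io

import pandas as pd

toy = pd.DataFrame({"name": ["A", "B"], "COMPANY": ["DC", "Marvel"]})

# Without a path, to_csv returns the CSV text instead of writing a file
with_index = toy.to_csv()                 # index written as an unnamed column
without_index = toy.to_csv(index=False)   # row labels left out

reread_with = pd.read_csv(io.StringIO(with_index))
reread_without = pd.read_csv(io.StringIO(without_index))

print(list(reread_with.columns))
print(list(reread_without.columns))
```

Whether to keep the index depends on whether it carries information; for a plain `RangeIndex` like the one in `comic`, `index=False` is a reasonable choice.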