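# Series and DataFrame

## Learning Objectives
+ Master the common attributes and methods of Series
+ Master the common attributes and methods of DataFrame
+ Learn how to set the row and column labels of a DataFrame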

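## 1. Series in Detail
A Series is the pandas container for one-dimensional data.

### 1.1 Creating a Series
1. The simplest way to create a Series is to pass in a Python list

+ If every element is an integer, the resulting dtype is int64
+ If every element is a string, the resulting dtype is object
+ If the elements are of mixed types, the resulting dtype is also object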
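
As a quick check of the rules above, the following minimal sketch prints the inferred dtype in each case. The values are made up for illustration. It also covers a case the list leaves out: mixing integers with floats gives float64 rather than object.

```python
import pandas as pd

# All integers: pandas infers int64
int_s = pd.Series([50, 42])

# Integers mixed with floats: everything is upcast to float64
float_s = pd.Series([50, 42.5])

# Numbers mixed with strings: pandas falls back to the generic object dtype
mixed_s = pd.Series(['banana', 42])

print(int_s.dtype, float_s.dtype, mixed_s.dtype)
```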

In [3]:
import pandas as pd

s = pd.Series(['banana', 42])
print(s)
print(type(s))

s = pd.Series(['banana', 'apple'])
print(s)
print(type(s))

s = pd.Series([50, 42])
print(s)
print(type(s))

0    banana
1        42
dtype: object
<class 'pandas.core.series.Series'>
0    banana
1     apple
dtype: object
<class 'pandas.core.series.Series'>
0    50
1    42
dtype: int64
<class 'pandas.core.series.Series'>


2. When creating a Series, you can also specify the row labels with the index parameter

In [4]:
s = pd.Series(['smart', 18], index=['name', 'age'])
print(s)
print(type(s))

name    smart
age        18
dtype: object
<class 'pandas.core.series.Series'>


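### 1.2 Common Series Operations
#### Common attributes and methods:

|Attribute or method|Description|
|----|----|
|s.shape|Shape of the Series|
|s.size|Number of elements in the Series|
|s.index|Row labels of the Series|
|s.values|Element values of the Series|
|s.keys()|Row labels of the Series; same result as s.index|
|s.loc[row label]|Get an element of the Series by its row label|
|s.iloc[row position]|Get an element of the Series by its row position|
|s.dtypes|Data type of the Series elements|

#### Examples:
1. Load the scientists.csv dataset and get the data in the Age column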

In [5]:
scientists = pd.read_csv('./data/scientists.csv')
scientists

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|0|Rosaline Franklin|1920-07-25|1958-04-16|37|Chemist|
|1|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|2|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|3|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|4|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|5|John Snow|1813-03-15|1858-06-16|45|Physician|
|6|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|7|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


In [14]:
# Get the data in the Age column
age_series = scientists['Age']
print(age_series)
print(type(age_series))

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64
<class 'pandas.core.series.Series'>


2. Demonstration of the common attributes and methods

In [15]:
age_series.shape
age_series.size
age_series.index
age_series.values
age_series.keys()
age_series.loc[1]
age_series.iloc[1]
age_series.dtypes

dtype('int64')
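
A notebook cell displays only the value of its last expression, which is why only the dtype appears above. The sketch below prints each attribute in turn. It uses an inline copy of the Age values (the name `ages` is introduced here for illustration) so that it runs without scientists.csv.

```python
import pandas as pd

# Inline copy of the Age column so the sketch runs without scientists.csv
ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name='Age')

print(ages.shape)     # a one-element tuple: (number of rows,)
print(ages.size)      # number of elements
print(ages.index)     # row labels (a RangeIndex here)
print(ages.values)    # underlying NumPy array
print(ages.loc[1])    # element with row label 1
print(ages.iloc[-1])  # element at the last position
print(ages.dtype)     # element type
```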

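#### Common statistical methods:

|Method|Description|
|----|----|
|s.mean()|Mean of the Series elements|
|s.max()|Maximum of the Series elements|
|s.min()|Minimum of the Series elements|
|s.std()|Standard deviation of the Series elements|
|s.value_counts()|Number of occurrences of each distinct value in the Series|
|s.count()|Number of non-null (non-NaN) elements in the Series|
|s.describe()|Summary statistics of the Series elements|

#### Examples:

1. Demonstration of mean, max, min, and std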

In [16]:
# Compute the mean age
age_series.mean()

59.125

In [17]:
# Compute the maximum age
age_series.max()

90

In [18]:
# Compute the minimum age
age_series.min()

37

In [19]:
# Compute the standard deviation of the ages
age_series.std()

18.325918413937288

2. Demonstration of value_counts

In [20]:
# Get the Occupation column
occupation_series = scientists['Occupation']
print(occupation_series)
occupation_series.value_counts()

0               Chemist
1          Statistician
2                 Nurse
3               Chemist
4             Biologist
5             Physician
6    Computer Scientist
7         Mathematician
Name: Occupation, dtype: object


Chemist               2
Biologist             1
Physician             1
Statistician          1
Mathematician         1
Nurse                 1
Computer Scientist    1
Name: Occupation, dtype: int64

3. Demonstration of count

In [21]:
# Count the non-null elements in the Born column
scientists['Born'].count()

8

In [22]:
scientists['Born'].size

8
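
In the scientists data, count and size agree because no value is missing. The following minimal sketch uses hypothetical values with one missing entry to show where the two differ.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; the second value is missing
ages = pd.Series([37.0, np.nan, 90.0])

print(ages.size)     # 3: every element, missing or not
print(ages.count())  # 2: only the non-missing elements
```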

4. Demonstration of describe

In [24]:
# age_series holds numeric data
age_series.describe()

# occupation_series holds non-numeric data
occupation_series.describe()

count           8
unique          7
top       Chemist
freq            2
Name: Occupation, dtype: object

Series methods (for reference):
![Series methods](./pic/chapter03-01.png)

### 1.3 Boolean Indexing
A Series supports boolean indexing: indexing with a list of booleans returns the elements at the positions where the value is True.

In [25]:
age_series

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64

In [26]:
bool_values = [False, True, True, True, False, False, False, True]
age_series[bool_values]

1    61
2    90
3    66
7    77
Name: Age, dtype: int64

#### Application: select the ages in age_series that are greater than the mean

In [28]:
# Application: select the ages in age_series that are greater than the mean
age_series[age_series>age_series.mean()]

1    61
2    90
3    66
7    77
Name: Age, dtype: int64

In [27]:
age_series>age_series.mean()

0    False
1     True
2     True
3     True
4    False
5    False
6    False
7     True
Name: Age, dtype: bool
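
The two steps above (build a boolean mask, then index with it) can be put together in one self-contained sketch. The name `ages` and the inline values are a stand-in for the Age column so the code runs without the CSV file.

```python
import pandas as pd

ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name='Age')

# Comparing a Series with a scalar yields a boolean Series with the same index
mask = ages > ages.mean()

# Indexing with the mask keeps only the rows where the mask is True
older = ages[mask]

print(mask.sum())      # True counts as 1, so this is the number of matches
print(older.tolist())
```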

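### 1.4 Series Arithmetic

|Case|Description|
|----|----|
|Series and a numeric scalar|The operation is applied to each element of the Series in turn and a new Series is returned|
|Series and another Series|Elements with the same row label are combined; a label that appears in only one of the two Series yields NaN; a new Series is returned|

#### Series and a numeric scalar: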

In [29]:
age_series

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64

In [30]:
# Addition
age_series + 100

0    137
1    161
2    190
3    166
4    156
5    145
6    141
7    177
Name: Age, dtype: int64

In [31]:
# Multiplication
age_series * 2

0     74
1    122
2    180
3    132
4    112
5     90
6     82
7    154
Name: Age, dtype: int64

#### Series and another Series:

In [33]:
# Multiplication
age_series * age_series
# Addition (listed last because a notebook cell displays only its final expression)
age_series + age_series

0     74
1    122
2    180
3    132
4    112
5     90
6     82
7    154
Name: Age, dtype: int64

In [34]:
# Create a new Series
new_series = pd.Series([1, 100])
new_series 

0      1
1    100
dtype: int64

In [35]:
age_series

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64

>Note: some row labels in age_series have no matching row label in new_series

In [36]:
# Add the two Series
age_series + new_series

0     38.0
1    161.0
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
dtype: float64

In [37]:
# Multiply the two Series
age_series * new_series

0      37.0
1    6100.0
2       NaN
3       NaN
4       NaN
5       NaN
6       NaN
7       NaN
dtype: float64
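
Alignment is by row label, not by position. The sketch below makes that visible with hypothetical string labels, and also shows the method form `Series.add` with `fill_value`, which is one way to avoid NaN when a label exists on only one side. Treat it as an illustration rather than a recommended default.

```python
import pandas as pd

a = pd.Series([37, 61, 90], index=['x', 'y', 'z'])
b = pd.Series([1, 100], index=['y', 'w'])

# Operator form: labels present in only one Series produce NaN
plain = a + b

# Method form: fill_value substitutes 0 for a label missing on one side
filled = a.add(b, fill_value=0)

print(plain)
print(filled)
```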

## 2. DataFrame in Detail
### 2.1 Creating a DataFrame
1. A DataFrame can be created from a dictionary

In [38]:
peoples = pd.DataFrame({
    'Name': ['Smart', 'David'],
    'Occupation': ['Teacher', 'IT Engineer'],
    'Age': [18, 30]
})
peoples

| |Name|Occupation|Age|
|----|----|----|----|
|0|Smart|Teacher|18|
|1|David|IT Engineer|30|


2. When creating a DataFrame, you can use the columns parameter to specify the column order and the index parameter to specify the row labels

In [39]:
peoples = pd.DataFrame({
    'Occupation': ['Teacher', 'IT Engineer'],
    'Age': [18, 30]
}, columns=['Age', 'Occupation'], index=['Smart', 'David'])
peoples

| |Age|Occupation|
|----|----|----|
|Smart|18|Teacher|
|David|30|IT Engineer|


3. A DataFrame can also be created from a nested list, with the columns parameter specifying the column labels and the index parameter specifying the row labels

In [40]:
peoples = pd.DataFrame([
    ['Teacher', 18],
    ['IT Engineer', 30]
], columns=['Occupation', 'Age'], index=['Smart', 'David'])
peoples

| |Occupation|Age|
|----|----|----|
|Smart|Teacher|18|
|David|IT Engineer|30|


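### 2.2 Common DataFrame Operations

#### Common attributes and methods:

|Attribute or method|Description|
|----|----|
|df.shape|Shape of the DataFrame|
|df.size|Total number of elements in the DataFrame|
|df.ndim|Number of dimensions of the DataFrame|
|len(df)|Number of rows in the DataFrame|
|df.index|Row labels of the DataFrame|
|df.columns|Column labels of the DataFrame|
|df.dtypes|Data type of each column of the DataFrame|
|df.info()|Structure of each column of the DataFrame|
|df.head(n)|First n rows of the DataFrame; n defaults to 5|
|df.tail(n)|Last n rows of the DataFrame; n defaults to 5|

1. Demonstration of the common attributes and methods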

In [43]:
scientists
# type(scientists)

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|0|Rosaline Franklin|1920-07-25|1958-04-16|37|Chemist|
|1|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|2|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|3|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|4|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|5|John Snow|1813-03-15|1858-06-16|45|Physician|
|6|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|7|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


In [47]:
scientists.shape
scientists.size
scientists.ndim
len(scientists)

8

In [49]:
scientists.index
scientists.columns

Index(['Name', 'Born', 'Died', 'Age', 'Occupation'], dtype='object')
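
The cells above show only the last expression of each cell. As a minimal sketch, the following prints every attribute explicitly, using a small hypothetical frame (`df`, with three made-up rows) in place of scientists.csv.

```python
import pandas as pd

# Small hypothetical frame standing in for scientists.csv
df = pd.DataFrame({
    'Name': ['Marie Curie', 'Alan Turing', 'John Snow'],
    'Age': [66, 41, 45],
})

print(df.shape)          # (rows, columns)
print(df.size)           # rows * columns
print(df.ndim)           # a DataFrame is always two-dimensional
print(len(df))           # number of rows
print(list(df.columns))  # column labels as a plain list
```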

In [51]:
scientists.dtypes
scientists.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Name        8 non-null      object
 1   Born        8 non-null      object
 2   Died        8 non-null      object
 3   Age         8 non-null      int64 
 4   Occupation  8 non-null      object
dtypes: int64(1), object(4)
memory usage: 448.0+ bytes


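> Correspondence between common pandas and Python data types:

|pandas type|Python type|Description|
|----|----|----|
|object|str|String|
|int64|int|Integer|
|float64|float|Floating-point number|
|datetime64|datetime|Date and time; in Python this requires importing the datetime module|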
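
The Born and Died columns are loaded as strings, as the info() output shows. One way to obtain the datetime64 type listed in the table is `pd.to_datetime`. The sketch below applies it to two dates copied from the Born column; the exact resolution of the resulting dtype may vary between pandas versions.

```python
import pandas as pd

# Two dates copied from the Born column shown earlier
born = pd.Series(['1920-07-25', '1876-06-13'])

# pd.to_datetime parses the strings into a datetime64 Series
born_dt = pd.to_datetime(born)

print(born_dt.dtype)             # a datetime64 dtype
print(born_dt.dt.year.tolist())  # the .dt accessor exposes date components
```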

In [53]:
scientists.head()
scientists.tail()

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|3|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|4|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|5|John Snow|1813-03-15|1858-06-16|45|Physician|
|6|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|7|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


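#### Common statistical methods:

|Method|Description|
|----|----|
|df.max()|Maximum of each column of the DataFrame|
|df.min()|Minimum of each column of the DataFrame|
|df.count()|Number of non-null (non-NaN) elements in each column of the DataFrame|
|df.describe()|Summary statistics of each column of the DataFrame|

1. Demonstration of max, min, and count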

In [57]:
scientists.max()
scientists.min()
scientists.count()

Name          8
Born          8
Died          8
Age           8
Occupation    8
dtype: int64

2. Demonstration of describe

In [58]:
scientists.describe()

| |Age|
|----|----|
|count|8.0|
|mean|59.125|
|std|18.325918|
|min|37.0|
|25%|44.0|
|50%|58.5|
|75%|68.75|
|max|90.0|


> Note: by default describe reports statistics only for numeric columns; pass the include parameter to report statistics for non-numeric columns

In [59]:
import numpy as np
scientists.describe(include=[np.object_])

| |Name|Born|Died|Occupation|
|----|----|----|----|----|
|count|8|8|8|8|
|unique|8|8|8|7|
|top|Rosaline Franklin|1777-04-30|1858-06-16|Chemist|
|freq|1|1|1|2|
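
The include parameter also accepts the string `'all'`, which reports numeric and non-numeric columns in one table. The following is a minimal sketch on a small hypothetical frame; statistics that do not apply to a column appear as NaN in the combined result.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [37, 61, 66],
    'Occupation': ['Chemist', 'Statistician', 'Chemist'],
})

numeric_only = df.describe()             # default: numeric columns only
everything = df.describe(include='all')  # numeric and non-numeric together

print(list(numeric_only.columns))
print(list(everything.columns))
print(everything.loc['top', 'Occupation'])  # most frequent occupation
```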


### 2.3 Boolean Indexing
A DataFrame supports boolean indexing: indexing with a list of booleans returns the rows at the positions where the value is True.

In [60]:
scientists

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|0|Rosaline Franklin|1920-07-25|1958-04-16|37|Chemist|
|1|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|2|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|3|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|4|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|5|John Snow|1813-03-15|1858-06-16|45|Physician|
|6|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|7|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


In [61]:
bool_values = [False, True, True, True, False, False, False, True]
scientists[bool_values]

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|1|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|2|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|3|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|7|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


#### Application: get the scientists whose Age is greater than the mean

In [62]:
scientists['Age'] > scientists['Age'].mean()

0    False
1     True
2     True
3     True
4    False
5    False
6    False
7     True
Name: Age, dtype: bool

In [63]:
# Application: get the scientists whose Age is greater than the mean
scientists[scientists['Age'] > scientists['Age'].mean()]

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|1|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|2|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|3|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|7|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|
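
A boolean mask can also be passed to `.loc`, which additionally lets you pick columns in the same step. This is a sketch on a small hypothetical frame, not part of the original walkthrough.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Marie Curie', 'John Snow', 'Johann Gauss'],
    'Age': [66, 45, 77],
    'Occupation': ['Chemist', 'Physician', 'Mathematician'],
})

mask = df['Age'] > df['Age'].mean()

# .loc accepts the boolean mask for rows and a list of labels for columns
selected = df.loc[mask, ['Name', 'Age']]
print(selected)
```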


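### 2.4 DataFrame Arithmetic

|Case|Description|
|----|----|
|DataFrame and a numeric scalar|The operation is applied to each element of the DataFrame in turn and a new DataFrame is returned|
|DataFrame and another DataFrame|Elements with the same row label and column label are combined; a row or column label that appears in only one of the two DataFrames yields NaN; a new DataFrame is returned|

#### DataFrame and a numeric scalar: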

In [64]:
# DataFrame and a numeric scalar
scientists * 2

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|0|Rosaline FranklinRosaline Franklin|1920-07-251920-07-25|1958-04-161958-04-16|74|ChemistChemist|
|1|William GossetWilliam Gosset|1876-06-131876-06-13|1937-10-161937-10-16|122|StatisticianStatistician|
|2|Florence NightingaleFlorence Nightingale|1820-05-121820-05-12|1910-08-131910-08-13|180|NurseNurse|
|3|Marie CurieMarie Curie|1867-11-071867-11-07|1934-07-041934-07-04|132|ChemistChemist|
|4|Rachel CarsonRachel Carson|1907-05-271907-05-27|1964-04-141964-04-14|112|BiologistBiologist|
|5|John SnowJohn Snow|1813-03-151813-03-15|1858-06-161858-06-16|90|PhysicianPhysician|
|6|Alan TuringAlan Turing|1912-06-231912-06-23|1954-06-071954-06-07|82|Computer ScientistComputer Scientist|
|7|Johann GaussJohann Gauss|1777-04-301777-04-30|1855-02-231855-02-23|154|MathematicianMathematician|


#### DataFrame and another DataFrame:

In [65]:
# DataFrame and another DataFrame
scientists + scientists

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|0|Rosaline FranklinRosaline Franklin|1920-07-251920-07-25|1958-04-161958-04-16|74|ChemistChemist|
|1|William GossetWilliam Gosset|1876-06-131876-06-13|1937-10-161937-10-16|122|StatisticianStatistician|
|2|Florence NightingaleFlorence Nightingale|1820-05-121820-05-12|1910-08-131910-08-13|180|NurseNurse|
|3|Marie CurieMarie Curie|1867-11-071867-11-07|1934-07-041934-07-04|132|ChemistChemist|
|4|Rachel CarsonRachel Carson|1907-05-271907-05-27|1964-04-141964-04-14|112|BiologistBiologist|
|5|John SnowJohn Snow|1813-03-151813-03-15|1858-06-161858-06-16|90|PhysicianPhysician|
|6|Alan TuringAlan Turing|1912-06-231912-06-23|1954-06-071954-06-07|82|Computer ScientistComputer Scientist|
|7|Johann GaussJohann Gauss|1777-04-301777-04-30|1855-02-231855-02-23|154|MathematicianMathematician|


In [67]:
# DataFrame and another DataFrame that contains only the first four rows
scientists + scientists[:4]

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|0|Rosaline FranklinRosaline Franklin|1920-07-251920-07-25|1958-04-161958-04-16|74.0|ChemistChemist|
|1|William GossetWilliam Gosset|1876-06-131876-06-13|1937-10-161937-10-16|122.0|StatisticianStatistician|
|2|Florence NightingaleFlorence Nightingale|1820-05-121820-05-12|1910-08-131910-08-13|180.0|NurseNurse|
|3|Marie CurieMarie Curie|1867-11-071867-11-07|1934-07-041934-07-04|132.0|ChemistChemist|
|4|NaN|NaN|NaN|NaN|NaN|
|5|NaN|NaN|NaN|NaN|NaN|
|6|NaN|NaN|NaN|NaN|NaN|
|7|NaN|NaN|NaN|NaN|NaN|
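
The example above differs only in its rows. As a minimal sketch of alignment on both axes, the two hypothetical numeric frames below (`left` and `right`) share exactly one row label and one column label, so only one cell of the result has a value.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['r1', 'r2'])
right = pd.DataFrame({'a': [10, 20], 'c': [30, 40]}, index=['r2', 'r3'])

# Only the cell (r2, a) exists in both frames; every other cell becomes NaN
total = left + right

print(total)
```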


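### 2.5 Row and Column Label Operations
#### 2.5.1 Setting a column as the row labels after loading

>When a data file is loaded without specifying row labels, pandas automatically adds row labels starting from 0;
>the df.set_index('column name') method then sets the values of the given column as the row labels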

In [68]:
scientists = pd.read_csv('./data/scientists.csv')
scientists

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|0|Rosaline Franklin|1920-07-25|1958-04-16|37|Chemist|
|1|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|2|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|3|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|4|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|5|John Snow|1813-03-15|1858-06-16|45|Physician|
|6|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|7|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


In [69]:
# Use the values of the Name column as the row labels
scientists_df = scientists.set_index('Name')
scientists_df

|Name|Born|Died|Age|Occupation|
|----|----|----|----|----|
|Rosaline Franklin|1920-07-25|1958-04-16|37|Chemist|
|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|John Snow|1813-03-15|1858-06-16|45|Physician|
|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


After the row labels have been set, the reset_index method resets them to the default:

In [70]:
# Note: reset_index returns a new DataFrame
scientists_df.reset_index()

| |Name|Born|Died|Age|Occupation|
|----|----|----|----|----|----|
|0|Rosaline Franklin|1920-07-25|1958-04-16|37|Chemist|
|1|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|2|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|3|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|4|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|5|John Snow|1813-03-15|1858-06-16|45|Physician|
|6|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|7|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|
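
Both set_index and reset_index return new frames and leave the frame they are called on unchanged. The sketch below checks this on a small hypothetical frame and also shows the `drop=True` option of reset_index, which discards the old index instead of turning it back into a column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Marie Curie', 'Alan Turing'], 'Age': [66, 41]})

# set_index returns a new frame; df itself keeps the default 0..n-1 index
by_name = df.set_index('Name')

# reset_index() moves the index back into a column;
# drop=True discards the old index instead of keeping it
restored = by_name.reset_index()
dropped = by_name.reset_index(drop=True)

print(list(df.index), list(by_name.index))
print(list(restored.columns), list(dropped.columns))
```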


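#### 2.5.2 Setting a column as the row labels while loading
>When loading a data file, the index_col parameter specifies which column to use as the row labels; it accepts either a column name or a column position

1. Load scientists.csv with the Name column as the row labels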

In [72]:
pd.read_csv('./data/scientists.csv', index_col='Name')
# or, equivalently, by position (Name is the first column in the file)
pd.read_csv('./data/scientists.csv', index_col=0)

|Name|Born|Died|Age|Occupation|
|----|----|----|----|----|
|Rosaline Franklin|1920-07-25|1958-04-16|37|Chemist|
|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|John Snow|1813-03-15|1858-06-16|45|Physician|
|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|
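
To try index_col without the data file, the following sketch reads a CSV from an in-memory string via `io.StringIO`. The two-row CSV text is a made-up stand-in whose layout is an assumption for illustration.

```python
import io
import pandas as pd

# Inline stand-in for a CSV file; the layout is an assumption for illustration
csv_text = "Name,Born,Age\nMarie Curie,1867-11-07,66\nAlan Turing,1912-06-23,41\n"

by_label = pd.read_csv(io.StringIO(csv_text), index_col='Name')
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Name is the first column, so both calls build the same frame
print(by_label.equals(by_position))
print(by_label.loc['Alan Turing', 'Age'])
```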


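#### 2.5.3 Modifying row and column labels after loading

|Approach|Description|
|----|----|
|df.rename(index={'old row label': 'new row label', ...}, columns={'old column label': 'new column label', ...})|Modifies the specified row and column labels; rename returns a new DataFrame|
|df.index = ['new row label 1', 'new row label 2', ...] df.columns = ['new column label 1', 'new column label 2', ...]|Replaces the row and column labels, modifying the original DataFrame directly|

1. Load the scientists.csv dataset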

In [73]:
scientists = pd.read_csv('./data/scientists.csv', index_col='Name')
scientists

|Name|Born|Died|Age|Occupation|
|----|----|----|----|----|
|Rosaline Franklin|1920-07-25|1958-04-16|37|Chemist|
|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|John Snow|1813-03-15|1858-06-16|45|Physician|
|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


2. Use rename to modify row and column labels

In [74]:
index_name = {'Rosaline Franklin': 'rosaline franklin', 'John Snow': 'john snow'}
columns_name = {'Born': 'born', 'Age': 'age'}
# Note: rename returns a new DataFrame; scientists itself is not modified
scientists.rename(index=index_name, columns=columns_name)

|Name|born|Died|age|Occupation|
|----|----|----|----|----|
|rosaline franklin|1920-07-25|1958-04-16|37|Chemist|
|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|john snow|1813-03-15|1858-06-16|45|Physician|
|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


3. Use df.index and df.columns to modify the row labels and the column labels respectively

In [75]:
scientists

|Name|Born|Died|Age|Occupation|
|----|----|----|----|----|
|Rosaline Franklin|1920-07-25|1958-04-16|37|Chemist|
|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|John Snow|1813-03-15|1858-06-16|45|Physician|
|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|


In [76]:
# Replace the row labels (one entry is required for every row)
scientists.index = ['rosaline franklin', 'William Gosset', 'Florence Nightingale',
       'Marie Curie', 'Rachel Carson', 'john snow', 'Alan Turing',
       'Johann Gauss']
# Replace the column labels (one entry is required for every column)
scientists.columns = ['born', 'Died', 'age', 'Occupation']
scientists

| |born|Died|age|Occupation|
|----|----|----|----|----|
|rosaline franklin|1920-07-25|1958-04-16|37|Chemist|
|William Gosset|1876-06-13|1937-10-16|61|Statistician|
|Florence Nightingale|1820-05-12|1910-08-13|90|Nurse|
|Marie Curie|1867-11-07|1934-07-04|66|Chemist|
|Rachel Carson|1907-05-27|1964-04-14|56|Biologist|
|john snow|1813-03-15|1858-06-16|45|Physician|
|Alan Turing|1912-06-23|1954-06-07|41|Computer Scientist|
|Johann Gauss|1777-04-30|1855-02-23|77|Mathematician|
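
The two approaches differ in more than whether they modify the frame in place. As a hedged sketch on a one-row hypothetical frame: rename also accepts a function that is applied to every label, while direct assignment must supply exactly one label per column and raises ValueError otherwise.

```python
import pandas as pd

df = pd.DataFrame({'Born': ['1867-11-07'], 'Age': [66]}, index=['Marie Curie'])

# rename also accepts a function, applied to every label; df is left unchanged
lowered = df.rename(columns=str.lower)

# Direct assignment replaces all labels at once and modifies df in place;
# the new list must have exactly one entry per column
try:
    df.columns = ['only_one_label']
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(list(lowered.columns), list(df.columns), mismatch_raised)
```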


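# Summary
+ Master the common attributes and methods of Series
+ Master the common attributes and methods of DataFrame
+ Learn how to set the row and column labels of a DataFrame
    + set_index, reset_index
    + rename
    + df.index, df.columns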