# Exploratory Data Analysis with Python

## 1. Topic
Exploratory data analysis with Python
    
## 2. Preparation
Make sure every teaching computer has the same version of Anaconda installed (this lesson uses Anaconda3-2023.07-2-Windows-x86_64).
    
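## 3. Learning Objectives
3.1 Understand the basic concepts of exploratory data analysis (EDA) in Python.  
3.2 Become proficient at performing exploratory data analysis with Python.  
3.3 Learn to use the Pandas library to import data and inspect its summary statistics.  
3.4 Learn to visualize summary statistics with libraries such as Matplotlib and Seaborn.  
3.5 Gain a solid understanding of real-world applications of data analysis with Python.  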
    
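## 4. Key Points
4.1 Pandas data structures (DataFrame) and data-processing methods.  
4.2 Array operations and matrix computation with NumPy.  
4.3 Plotting with Matplotlib and Seaborn.  
4.4 The basic workflow and methods of exploratory data analysis.  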
    
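## 5. Difficult Points
5.1 Understanding and using Pandas data structures.  
5.2 Choosing and applying the right methods for inspecting summary statistics.  
5.3 Visualizing summary statistics.  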

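## 6. Implementation Steps
### Step 1: Launch Jupyter Notebook
+ Type "cmd" in the Windows search bar to open a Command Prompt window. If the `jupyter` command is not recognized there, open "Anaconda Prompt" from the Start menu instead, because the Anaconda installer does not add itself to `PATH` by default.
+ Enter "jupyter notebook" and press Enter; Jupyter Notebook starts and opens in the default web browser.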
    
### Step 2: Create a New Notebook
+ In the Jupyter web interface, click the "New" button in the upper-right corner.
+ Select the "Python 3" kernel to create a new Python 3 notebook.
    
### Step 3: Import the Required Libraries

```python
import pandas as pd
import numpy as np
```
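Section 2 asks for the same Anaconda version on every machine. A quick way to confirm this from inside the notebook is to print the library versions right after importing them; the sketch below only uses each package's standard `__version__` attribute.

```python
import pandas as pd
import numpy as np

# Print the installed versions so every teaching machine can be compared
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```

If two machines print different versions, results such as display formatting may differ slightly between them.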

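### Step 4: Import the Dataset

`pd.read_csv` resolves the relative path `'train_chinese.csv'` against the notebook's working directory, so keep the file in the same folder as the notebook or pass its full path.

```python
# Import a CSV file with Pandas
data_csv = pd.read_csv('train_chinese.csv')
```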
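When `train_chinese.csv` is not at hand, `read_csv` can also read from an in-memory text buffer. The sketch below is a hypothetical three-row excerpt that copies the headers and three of the rows shown in Step 5 into a string and reads it through `io.StringIO`; the name `sample_df` is illustrative only. The `encoding="gbk"` hint in the comment is an assumption for the case where the real file was saved in a Windows Chinese encoding.

```python
import io
import pandas as pd

# Hypothetical excerpt: same 12 headers as train_chinese.csv, three rows copied from Step 5
csv_text = """乘客ID,是否幸存,仓位等级,姓名,性别,年龄,兄弟姐妹个数,父母子女个数,船票信息,票价,客舱,登船港口
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
"""

# StringIO makes the string behave like an open text file
sample_df = pd.read_csv(io.StringIO(csv_text))

# For the real file, the call is pd.read_csv('train_chinese.csv');
# if it raises UnicodeDecodeError, one option (assumption) is encoding="gbk"
print(sample_df.shape)  # (3, 12)
```

Empty fields such as Moran's age are read as `NaN`, the same way missing values appear in the full dataset.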

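### Step 5: Inspect the Data
5.1 View basic information about the data

```python
# View the first 10 rows of the data
data_csv.head(10)
```

The leftmost column of the output (0 to 9) is the DataFrame's default row index, which is separate from the 乘客ID column.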

| | 乘客ID | 是否幸存 | 仓位等级 | 姓名 | 性别 | 年龄 | 兄弟姐妹个数 | 父母子女个数 | 船票信息 | 票价 | 客舱 | 登船港口 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.075 | NaN | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
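`head()` is one of three quick ways to peek at rows. The sketch below uses a small hypothetical DataFrame (`df`, three columns copied from the first five rows above) to contrast `head()`, `tail()` and `sample()`; on the real data, replace `df` with `data_csv`.

```python
import pandas as pd

# Hypothetical mini-dataset: first five rows from the table above, three columns only
df = pd.DataFrame({
    "乘客ID": [1, 2, 3, 4, 5],
    "是否幸存": [0, 1, 1, 1, 0],
    "年龄": [22.0, 38.0, 26.0, 35.0, 35.0],
})

print(df.head(2))  # first 2 rows
print(df.tail(2))  # last 2 rows

# random rows; fixing random_state makes the draw reproducible across runs
print(df.sample(n=2, random_state=0))
```

`sample()` is useful when the first rows are not representative, for example when the file is sorted.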


```python
# View the column names
data_csv.columns
```

```
Index(['乘客ID', '是否幸存', '仓位等级', '姓名', '性别', '年龄', '兄弟姐妹个数', '父母子女个数', '船票信息',
       '票价', '客舱', '登船港口'],
      dtype='object')
```
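The column names are in Chinese: 乘客ID (passenger ID), 是否幸存 (survived), 仓位等级 (ticket class), 姓名 (name), 性别 (sex), 年龄 (age), 兄弟姐妹个数 (number of siblings), 父母子女个数 (number of parents/children), 船票信息 (ticket), 票价 (fare), 客舱 (cabin), 登船港口 (port of embarkation). If English names are preferred, `rename(columns=...)` can map them. The English names in the sketch below are an assumption that follows the common Titanic-dataset naming, not something stored in `train_chinese.csv`; an empty frame with the same headers stands in for `data_csv`.

```python
import pandas as pd

# Hypothetical mapping; the English names are an illustrative assumption
column_map = {
    "乘客ID": "PassengerId",
    "是否幸存": "Survived",
    "仓位等级": "Pclass",
    "姓名": "Name",
    "性别": "Sex",
    "年龄": "Age",
    "兄弟姐妹个数": "SibSp",
    "父母子女个数": "Parch",
    "船票信息": "Ticket",
    "票价": "Fare",
    "客舱": "Cabin",
    "登船港口": "Embarked",
}

# Empty stand-in for data_csv with the same 12 headers
df = pd.DataFrame(columns=list(column_map))

# rename() returns a new DataFrame; df keeps its original headers
df_en = df.rename(columns=column_map)
print(df_en.columns.tolist())
```

The rest of this lesson keeps the original Chinese names, so renaming is optional.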

```python
# View the dimensions of the data (number of rows and columns)
data_csv.shape
```

```
(891, 12)
```
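`shape` is a plain `(rows, columns)` tuple, so it can be unpacked directly; for this dataset that means 891 passengers and 12 columns, or 891 × 12 = 10,692 cells. The sketch below shows the same attributes on a small hypothetical DataFrame.

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame (None becomes NaN in a float column)
df = pd.DataFrame({"年龄": [22.0, 38.0, None], "票价": [7.25, 71.2833, 8.4583]})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)      # 3 2
print(len(df) == n_rows)   # True: len() counts rows
print(df.size)             # 6: rows * columns, missing cells included
```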

```python
# View data types and non-null counts (fewer non-null values than rows means missing values)
data_csv.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   乘客ID    891 non-null    int64  
 1   是否幸存    891 non-null    int64  
 2   仓位等级    891 non-null    int64  
 3   姓名      891 non-null    object 
 4   性别      891 non-null    object 
 5   年龄      714 non-null    float64
 6   兄弟姐妹个数  891 non-null    int64  
 7   父母子女个数  891 non-null    int64  
 8   船票信息    891 non-null    object 
 9   票价      891 non-null    float64
 10  客舱      204 non-null    object 
 11  登船港口    889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
```
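Reading the non-null counts against the 891 rows shows three columns with gaps: 年龄 is missing 891 − 714 = 177 values, 客舱 is missing 891 − 204 = 687, and 登船港口 is missing 891 − 889 = 2. `isnull()` computes these counts directly. The sketch below applies it to a hypothetical six-row sample (the first six rows from Step 5, restricted to those three columns); on the real data, replace `df` with `data_csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: first six rows from Step 5, only the columns that can have gaps
df = pd.DataFrame({
    "年龄": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
    "客舱": [np.nan, "C85", np.nan, "C123", np.nan, np.nan],
    "登船港口": ["S", "C", "S", "S", "S", "Q"],
})

missing_count = df.isnull().sum()            # number of missing cells per column
missing_ratio = df.isnull().mean().round(3)  # share of missing cells per column
print(pd.DataFrame({"missing": missing_count, "ratio": missing_ratio}))
```

The ratio is often more useful than the raw count when deciding whether to fill a column or drop it.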


5.2 View descriptive statistics

```python
# Descriptive statistics (by default only numeric columns are summarized)
data_csv.describe()
```

| | 乘客ID | 是否幸存 | 仓位等级 | 年龄 | 兄弟姐妹个数 | 父母子女个数 | 票价 |
|---|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 891.0 | 714.0 | 891.0 | 891.0 | 891.0 |
| mean | 446.0 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.0 | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 |
| 25% | 223.5 | 0.0 | 2.0 | 20.125 | 0.0 | 0.0 | 7.9104 |
| 50% | 446.0 | 0.0 | 3.0 | 28.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 668.5 | 1.0 | 3.0 | 38.0 | 1.0 | 0.0 | 31.0 |
| max | 891.0 | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 |
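Two details of the table above are worth spelling out. Because 是否幸存 is coded 0/1, its mean (0.383838) is the survival rate: 342 of the 891 passengers survived. And because `describe()` skips text columns by default, 性别, 登船港口 and the other object columns need a separate summary. The sketch below shows both on a hypothetical DataFrame built from the first 10 rows in Step 5 (three columns only).

```python
import pandas as pd

# Hypothetical sample: first 10 rows from Step 5, three columns only
df = pd.DataFrame({
    "是否幸存": [0, 1, 1, 1, 0, 0, 0, 0, 1, 1],
    "性别": ["male", "female", "female", "female", "male",
             "male", "male", "male", "female", "female"],
    "登船港口": ["S", "C", "S", "S", "S", "Q", "S", "S", "S", "C"],
})

# For a 0/1 column, the mean is the share of 1s, i.e. the survival rate
print(df["是否幸存"].mean())  # 0.5 for these 10 rows

# With only text columns selected, describe() reports count, unique, top and freq
summary = df[["性别", "登船港口"]].describe()
print(summary)

# value_counts() lists how often each category occurs
print(df["登船港口"].value_counts())
```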


### Step 6: Filter and Select Data
6.1 Select columns

```python
# Select a single column (the result is a Series)
label = data_csv['是否幸存']
label
```

```
0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: 是否幸存, Length: 891, dtype: int64
```
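Single brackets with one column name return a Series, as in the output above; passing a list of names inside the brackets returns a DataFrame instead, which is how several columns are selected at once. The sketch below contrasts the two on a hypothetical three-row DataFrame; the name `features` is illustrative only.

```python
import pandas as pd

# Hypothetical sample: first three rows from Step 5, three columns only
df = pd.DataFrame({
    "是否幸存": [0, 1, 1],
    "性别": ["male", "female", "female"],
    "年龄": [22.0, 38.0, 26.0],
})

label = df["是否幸存"]           # one name -> Series
features = df[["性别", "年龄"]]  # list of names -> DataFrame

print(type(label).__name__, label.shape)        # Series (3,)
print(type(features).__name__, features.shape)  # DataFrame (3, 2)
```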

6.2 Filter rows by condition

```python
# Select rows where age is greater than 50 or less than 15
filtered_data = data_csv[(data_csv['年龄'] > 50) | (data_csv['年龄'] < 15)]
filtered_data
```

| | 乘客ID | 是否幸存 | 仓位等级 | 姓名 | 性别 | 年龄 | 兄弟姐妹个数 | 父母子女个数 | 船票信息 | 票价 | 客舱 | 登船港口 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
| 11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 | 0 | 0 | 113783 | 26.5500 | C103 | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 851 | 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 | 0 | 0 | 347060 | 7.7750 | NaN | S |
| 852 | 853 | 0 | 3 | Boulos, Miss. Nourelain | female | 9.0 | 1 | 1 | 2678 | 15.2458 | NaN | C |
| 857 | 858 | 1 | 1 | Daly, Mr. Peter Denis | male | 51.0 | 0 | 0 | 113055 | 26.5500 | E17 | S |
| 869 | 870 | 1 | 3 | Johnson, Master. Harold Theodor | male | 4.0 | 1 | 1 | 347742 | 11.1333 | NaN | S |
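Two behaviours of boolean filtering explain the result above. The filtered rows keep their original index labels (6, 7, 9, ...), and passengers with a missing age, such as Moran at index 5, never appear, because any comparison with `NaN` evaluates to `False`. The sketch below reproduces this on a hypothetical five-row sample taken from the rows shown earlier, and also shows `.loc` (row condition plus column list) and `&` for AND conditions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: five passengers from the tables above, three columns only
df = pd.DataFrame({
    "乘客ID": [6, 7, 8, 10, 12],
    "年龄": [np.nan, 54.0, 2.0, 14.0, 58.0],
    "性别": ["male", "male", "male", "female", "female"],
})

# Same condition as in 6.2; each comparison needs its own parentheses
mask = (df["年龄"] > 50) | (df["年龄"] < 15)

# Passenger 6 (age NaN) drops out because NaN comparisons are False
print(df[mask]["乘客ID"].tolist())  # [7, 8, 10, 12]

# .loc combines a row condition with a column selection
print(df.loc[mask, ["乘客ID", "年龄"]])

# & means AND: passengers younger than 15 who are also female
young_female = df[(df["年龄"] < 15) & (df["性别"] == "female")]
print(young_female["乘客ID"].tolist())  # [10]
```

Python's `and`/`or` keywords do not work element-wise on a Series, which is why `|` and `&` are used here.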
