# Pokemon Data Analysis

<ul>
<li><a href="#intro">I. Introduction</a></li>
<li><a href="#wrangling">II. Data Wrangling</a></li>
<li><a href="#eda">III. Exploratory Data Analysis</a></li>
<li><a href="#conclusions">IV. Statistical Conclusions</a></li>
</ul>

<a id='intro'></a>
## I. Introduction

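> **The Pokemon dataset contains the names, types, abilities, and other information for Pokemon from Generation 1 through Generation 7.  
This project uses the dataset to explore the following questions:**  
1. The proportion of dual-type Pokemon
2. The most common Pokemon classification
3. The most common type
4. The Pokemon with the highest base stat total in each generation
5. The male-to-female ratio of Pokemon
6. The average height and weight of Pokemon
7. The Pokemon with the highest HP, Attack, Defense, Special Attack, and Special Defense, respectively
8. The proportion of legendary Pokemon
9. The Pokemon with the lowest and highest base happiness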

> **Field descriptions for the Pokemon dataset**  
The dataset contains 801 Pokemon records.  

Field|Description|Data type
-----:|-----:|------:
pokedex | Pokedex number | int64
name | Name | object
classfication | Classification | object
type1 | Primary type | object
type2 | Secondary type | object
abilities | Abilities | object
percentage_male | Gender ratio (chance of being male, in percent) | float64
height_m | Height (m) | float64
weight_kg | Weight (kg) | float64
base_total | Base stat total | int64
hp | HP | int64
attack | Attack | int64
defense | Defense | int64
sp_attack | Special Attack | int64
sp_defense | Special Defense | int64
speed | Speed | int64
base_happiness | Base happiness | int64
capture_rate | Capture rate with a standard Poke Ball | int64
generation | Generation | int64
is_legendary | Whether the Pokemon is legendary | int64
base_egg_steps | Steps required to hatch an egg | int64
experience_growth | Experience required to reach level 100 | int64
against_bug | Type matchup: Bug | float64
against_dark | Type matchup: Dark | float64
against_dragon | Type matchup: Dragon | float64
against_electric | Type matchup: Electric | float64
against_fairy | Type matchup: Fairy | float64
against_fight | Type matchup: Fighting | float64
against_fire | Type matchup: Fire | float64
against_flying | Type matchup: Flying | float64
against_ghost | Type matchup: Ghost | float64
against_grass | Type matchup: Grass | float64
against_ground | Type matchup: Ground | float64
against_ice | Type matchup: Ice | float64
against_normal | Type matchup: Normal | float64
against_poison | Type matchup: Poison | float64
against_psychic | Type matchup: Psychic | float64
against_rock | Type matchup: Rock | float64
against_steel | Type matchup: Steel | float64
against_water | Type matchup: Water | float64
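
The 18 `against_*` columns share a prefix, so they can be selected as a group instead of being listed by hand. The following is a minimal sketch, not part of the original notebook: `sample` is a one-row stand-in for the full table, built from a few of the Bulbasaur values shown in the `head()` output, and it assumes a multiplier above 1 means the Pokemon takes extra damage from that attacking type.

```python
import pandas as pd

# One-row stand-in for the full dataset (values copied from the head() output).
sample = pd.DataFrame({
    "name": ["Bulbasaur"],
    "hp": [45],
    "against_fire": [2.0],
    "against_grass": [0.25],
    "against_water": [0.5],
})

# Collect every type-matchup column by its shared prefix.
prefix = "against_"
against_cols = [c for c in sample.columns if c.startswith(prefix)]

# Index by name and strip the prefix so the columns read as plain type names.
matchups = sample.set_index("name")[against_cols]
matchups.columns = [c[len(prefix):] for c in against_cols]

# Assumption: a multiplier greater than 1 marks a weakness.
row = matchups.loc["Bulbasaur"]
weak_to = sorted(row[row > 1].index)
print(weak_to)  # ['fire']
```

The same selection should work unchanged on the full frame, where it would pick up all 18 matchup columns.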

<a id='wrangling'></a>
## II. Data Wrangling

In [22]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Show up to 100 columns and 10000 rows when displaying a DataFrame
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 10000)

In [44]:
df = pd.read_csv('pokemon.csv', encoding='ISO-8859-1')

In [50]:
df.head(3)

Unnamed: 0,pokedex,name,classfication,type1,type2,abilities,percentage_male,height_m,weight_kg,base_total,hp,attack,defense,sp_attack,sp_defense,speed,base_happiness,capture_rate,generation,is_legendary,base_egg_steps,experience_growth,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,against_grass,against_ground,against_ice,against_normal,against_poison,against_psychic,against_rock,against_steel,against_water
0,1,Bulbasaur,Seed Pokemon,grass,poison,"['Overgrow', 'Chlorophyll']",88.1,0.7,6.9,318,45,49,49,65,65,45,70,45,1,0,5120,1059860,1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,0.25,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.5
1,2,Ivysaur,Seed Pokemon,grass,poison,"['Overgrow', 'Chlorophyll']",88.1,1.0,13.0,405,60,62,63,80,80,60,70,45,1,0,5120,1059860,1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,0.25,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.5
2,3,Venusaur,Seed Pokemon,grass,poison,"['Overgrow', 'Chlorophyll']",88.1,2.0,100.0,625,80,100,123,122,120,80,70,45,1,0,5120,1059860,1.0,1.0,1.0,0.5,0.5,0.5,2.0,2.0,1.0,0.25,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.5


In [49]:
df.describe()

Unnamed: 0,pokedex,percentage_male,height_m,weight_kg,base_total,hp,attack,defense,sp_attack,sp_defense,speed,base_happiness,capture_rate,generation,is_legendary,base_egg_steps,experience_growth,against_bug,against_dark,against_dragon,against_electric,against_fairy,against_fight,against_fire,against_flying,against_ghost,against_grass,against_ground,against_ice,against_normal,against_poison,against_psychic,against_rock,against_steel,against_water
count,801.0,703.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0,801.0
mean,401.0,55.155761,1.155556,60.941199,428.377029,68.958801,77.857678,73.008739,71.305868,70.911361,66.334582,65.362047,98.675406,3.690387,0.087391,7191.011236,1054996.0,0.996255,1.057116,0.968789,1.07397,1.068976,1.065543,1.135456,1.192884,0.985019,1.03402,1.098002,1.208177,0.887016,0.975343,1.005306,1.250312,0.983458,1.058365
std,231.373075,20.261623,1.069952,108.514597,119.203577,26.576015,32.15882,30.769159,32.353826,27.942501,28.907662,19.598948,76.248866,1.93042,0.282583,6558.220422,160255.8,0.597248,0.438142,0.353058,0.654962,0.522167,0.717251,0.691853,0.604488,0.558256,0.788896,0.738818,0.735356,0.266106,0.549375,0.495183,0.697148,0.500117,0.606562
min,1.0,0.0,0.1,0.1,180.0,1.0,5.0,5.0,10.0,20.0,5.0,0.0,3.0,1.0,0.0,1280.0,600000.0,0.25,0.25,0.0,0.0,0.25,0.0,0.25,0.25,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.25,0.25,0.25
25%,201.0,50.0,0.6,9.0,320.0,50.0,55.0,50.0,45.0,50.0,45.0,70.0,45.0,2.0,0.0,5120.0,1000000.0,0.5,1.0,1.0,0.5,1.0,0.5,0.5,1.0,1.0,0.5,1.0,0.5,1.0,0.5,1.0,1.0,0.5,0.5
50%,401.0,50.0,1.0,27.3,435.0,65.0,75.0,70.0,65.0,66.0,65.0,70.0,60.0,4.0,0.0,5120.0,1000000.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
75%,601.0,50.0,1.5,63.0,505.0,80.0,100.0,90.0,91.0,90.0,85.0,70.0,170.0,5.0,0.0,6400.0,1059860.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,1.0,1.0
max,801.0,100.0,14.5,999.9,780.0,255.0,185.0,230.0,194.0,230.0,180.0,140.0,255.0,7.0,1.0,30720.0,1640000.0,4.0,4.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,1.0,4.0,4.0,4.0,4.0,4.0


In [46]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 801 entries, 0 to 800
Data columns (total 40 columns):
pokedex              801 non-null int64
name                 801 non-null object
classfication        801 non-null object
type1                801 non-null object
type2                417 non-null object
abilities            801 non-null object
percentage_male      703 non-null float64
height_m             801 non-null float64
weight_kg            801 non-null float64
base_total           801 non-null int64
hp                   801 non-null int64
attack               801 non-null int64
defense              801 non-null int64
sp_attack            801 non-null int64
sp_defense           801 non-null int64
speed                801 non-null int64
base_happiness       801 non-null int64
capture_rate         801 non-null int64
generation           801 non-null int64
is_legendary         801 non-null int64
base_egg_steps       801 non-null int64
experience_growth    801 non-null int64
agai

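Some Pokemon have only one type, so `type2` has many missing values (417 of 801 entries are non-null).  
Some Pokemon are genderless, so `percentage_male` also has missing values (703 of 801 entries are non-null).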
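
These gaps can be measured directly rather than read off the `df.info()` listing. The following is a minimal sketch, not part of the original notebook: the helper name `missing_summary` is hypothetical, and `sample` is a small illustrative frame standing in for `df`, not rows read from `pokemon.csv`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and shares for columns with at least one gap."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "share": counts / len(frame),
    })
    return summary[summary["missing"] > 0]


# Illustrative stand-in for df: two single-type and two genderless Pokemon.
sample = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Magnemite", "Staryu"],
    "type1": ["grass", "fire", "electric", "water"],
    "type2": ["poison", np.nan, "steel", np.nan],
    "percentage_male": [88.1, 88.1, np.nan, np.nan],
})

summary = missing_summary(sample)
print(summary)

# A Pokemon is dual-type exactly when type2 is present.
dual_type_share = sample["type2"].notnull().mean()
print(f"dual-type share: {dual_type_share:.2f}")
```

On the full dataset, the `df.info()` counts above imply 801 - 417 = 384 missing `type2` values and 801 - 703 = 98 missing `percentage_male` values.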

<a id='eda'></a>
## III. Exploratory Data Analysis

<a id='conclusions'></a>
## IV. Statistical Conclusions