<a href="https://colab.research.google.com/github/eduardoscovino/Pokemon-analysis/blob/main/Pokemon_Analysis_With_Stats.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('drive/MyDrive/Kaggle/Pokemon.csv')
df.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


## The variables of the dataset

* `Name:` Name of the pokemon
* `Type 1:` The primary type of the pokemon, e.g. Grass, Water, Fire
* `Type 2:` The secondary type, for pokemon that have more than one type
* `Total:` The overall strength of the pokemon, which is the sum of its attributes (the six columns that follow)
* `HP:` Hit points (or health), which is how much damage a pokemon can take before it is out of combat
* `Attack:` Power of normal attacks
* `Defense:` Resistance against normal attacks
* `Sp. Atk:` Power of special attacks
* `Sp. Def:` Resistance against special attacks
* `Speed:` Determines which pokemon attacks first in each round
* `Generation:` The generation the pokemon belongs to
* `Legendary:` A boolean indicating whether the pokemon is legendary
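
As a quick sanity check on the description of `Total`, the sketch below rebuilds three rows from the `df.head()` preview by hand and confirms that `Total` equals the sum of the six stat columns. The names `sample`, `stat_cols`, `recomputed`, and `totals_match` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Three rows copied by hand from the df.head() preview (not read from the CSV).
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "Total": [318, 405, 309],
    "HP": [45, 60, 39],
    "Attack": [49, 62, 52],
    "Defense": [49, 63, 43],
    "Sp. Atk": [65, 80, 60],
    "Sp. Def": [65, 80, 50],
    "Speed": [45, 60, 65],
})

# The six attribute columns that are expected to add up to Total.
stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Row-wise sum of the attributes, compared against the stored Total.
recomputed = sample[stat_cols].sum(axis=1)
totals_match = bool((recomputed == sample["Total"]).all())
print(totals_match)
```

The same comparison could be run on the full dataframe to see whether any row deviates from this rule.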

## Data Cleaning

I prefer working with uppercase column names, so let's convert them now.

In [None]:
df.columns = df.columns.str.upper()
df.head()

Unnamed: 0,#,NAME,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


In [None]:
df.loc[df.NAME.str.contains('Mega')].head()

Unnamed: 0,#,NAME,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
12,9,BlastoiseMega Blastoise,Water,,630,79,103,120,135,115,78,1,False
19,15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False


If we look at the pokemon whose names contain 'Mega', we see that the base name is repeated in front of 'Mega' (for example, 'VenusaurMega Venusaur'). That prefix is redundant.

So let's strip everything before 'Mega' from those names.

In [None]:
df.NAME = df.NAME.str.replace(r".*(?=Mega)", "", regex=True)  # drop the text that precedes 'Mega'
df.head()

Unnamed: 0,#,NAME,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,Mega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
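
To make the pattern easier to follow, here is a minimal sketch of the same lookahead regex applied to a handful of names. The `names` series is made up for illustration. `.*(?=Mega)` matches everything up to, but not including, 'Mega', so only the prefix is removed.

```python
import pandas as pd

# Illustrative names: two Mega forms, one ordinary name,
# and one name that merely starts with the letters 'Mega'.
names = pd.Series([
    "VenusaurMega Venusaur",
    "CharizardMega Charizard X",
    "Bulbasaur",
    "Meganium",
])

# The lookahead (?=Mega) asserts that 'Mega' follows without consuming it,
# so the replacement deletes only the text in front of it.
cleaned = names.str.replace(r".*(?=Mega)", "", regex=True)
print(cleaned.tolist())
```

Names without 'Mega' are left untouched, and a name that already starts with 'Mega' has nothing in front of it to remove. Note that `str.contains('Mega')` would still select such a name, so a stricter pattern (for instance one that requires a space after 'Mega') may be worth considering when filtering for Mega forms.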


## Treatment for null values

In [None]:
print(df.isnull().sum().sort_values(ascending=False) / df.shape[0])

TYPE 2        0.4825
#             0.0000
NAME          0.0000
TYPE 1        0.0000
TOTAL         0.0000
HP            0.0000
ATTACK        0.0000
DEFENSE       0.0000
SP. ATK       0.0000
SP. DEF       0.0000
SPEED         0.0000
GENERATION    0.0000
LEGENDARY     0.0000
dtype: float64


Only `TYPE 2` has null values (about 48% of the rows), which makes sense: these are the pokemon that don't have a second type. In this case, let's replace the null values with the placeholder string "Type 1".

In [None]:
df['TYPE 2'] = df['TYPE 2'].fillna('Type 1')  # assign back instead of calling inplace on a column selection
df.loc[df['TYPE 2'] == 'Type 1'].head()

Unnamed: 0,#,NAME,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
4,4,Charmander,Fire,Type 1,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,Type 1,405,58,64,58,80,65,80,1,False
9,7,Squirtle,Water,Type 1,314,44,48,65,50,64,43,1,False
10,8,Wartortle,Water,Type 1,405,59,63,80,65,80,58,1,False
11,9,Blastoise,Water,Type 1,530,79,83,100,85,105,78,1,False
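
The null-handling steps can also be sketched on a toy frame. This is only an illustration under assumed data: `toy`, `null_share`, and `remaining_nulls` are hypothetical names, and `isna().mean()` is used as a shorter equivalent of dividing the null counts by the number of rows.

```python
import pandas as pd

# Toy data: two of the four rows have no second type.
toy = pd.DataFrame({
    "TYPE 1": ["Grass", "Fire", "Water", "Bug"],
    "TYPE 2": ["Poison", None, None, "Poison"],
})

# Fraction of nulls per column; the mean of a boolean mask is the share of True values.
null_share = toy.isna().mean().sort_values(ascending=False)

# Fill the missing second types with a placeholder and assign the result back.
toy["TYPE 2"] = toy["TYPE 2"].fillna("Type 1")
remaining_nulls = int(toy["TYPE 2"].isna().sum())

print(null_share)
print(remaining_nulls)
```

Assigning the result back to the column avoids relying on `inplace=True` on a column selection, which recent pandas versions warn about.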


## First impressions of the dataset

In [None]:
print('{} rows'.format(df.shape[0]))
print('{} pokemons'.format(df.NAME.unique().size))

800 rows
800 pokemons


Since we have exactly one pokemon per row, it makes sense to use the `NAME` column as the index of the dataframe. In addition, the `#` column isn't useful for this analysis.

In [None]:
df.set_index('NAME', inplace=True)
df.head()

NAME,#,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
Bulbasaur,1,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,2,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Grass,Poison,525,80,82,83,100,100,80,1,False
Mega Venusaur,3,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmander,4,Fire,Type 1,309,39,52,43,60,50,65,1,False
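
Before promoting a column to the index, it can help to confirm that its values are unique. The sketch below shows one way to do that on a small, made-up frame; `toy`, `indexed`, and the other names are hypothetical. `verify_integrity=True` asks `set_index` to raise an error if the new index contains duplicates.

```python
import pandas as pd

# Small illustrative frame with one pokemon per row.
toy = pd.DataFrame({
    "NAME": ["Bulbasaur", "Ivysaur", "Mega Venusaur"],
    "TOTAL": [318, 405, 625],
})

# True only if no name appears more than once.
names_are_unique = toy["NAME"].is_unique

# verify_integrity=True raises a ValueError on duplicate index labels.
indexed = toy.set_index("NAME", verify_integrity=True)

# With names as the index, rows can be looked up by label.
bulbasaur_total = int(indexed.loc["Bulbasaur", "TOTAL"])
print(names_are_unique, bulbasaur_total)
```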


In [None]:
df = df.drop(columns=['#'])  # we don't need this column anymore

In [None]:
df_summary = df.describe()  # basic summary statistics
df_summary

Unnamed: 0,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


In [None]:
df.loc[df.LEGENDARY & (df.GENERATION == 1)]  # LEGENDARY is already boolean, so no comparison with True is needed

NAME,TYPE 1,TYPE 2,TOTAL,HP,ATTACK,DEFENSE,SP. ATK,SP. DEF,SPEED,GENERATION,LEGENDARY
Articuno,Ice,Flying,580,90,85,100,95,125,85,1,True
Zapdos,Electric,Flying,580,90,90,85,125,90,100,1,True
Moltres,Fire,Flying,580,90,100,90,125,85,90,1,True
Mewtwo,Psychic,Type 1,680,106,110,90,154,90,130,1,True
Mega Mewtwo X,Psychic,Fighting,780,106,190,100,154,100,130,1,True
Mega Mewtwo Y,Psychic,Type 1,780,106,150,70,194,120,140,1,True
