# Pokédex Dataset

Goals of the study:

* How do the height and weight of a Pokémon correlate with its various base stats?
* What are the general distributions for the various Pokémon segments?
* What factors influence the Experience Growth and Egg Steps? Are these quantities correlated?
* Which type is the strongest overall? Which is the weakest?

In [None]:
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

from sklearn.preprocessing import LabelEncoder, MinMaxScaler


In [None]:
df = pd.read_csv('/kaggle/input/pokedex/pokemon.csv', encoding='utf-16-le')
df.head()

In [None]:
df.info()

Some Pokémon have no secondary type, and some others have no percent_male/percent_female values. For the latter we will assume a 50% split between male and female.

In [None]:
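# Assign the results back instead of using inplace=True on a column,
# which does not reliably modify the DataFrame in recent pandas versions.
df['secondary_type'] = df['secondary_type'].fillna('None')
# Some entries are stored as '50*'; treat them, and missing values, as an even split.
df['percent_male'] = df['percent_male'].replace('50*', '50.00').fillna('50.00')
df['percent_female'] = df['percent_female'].replace('50*', '50.00').fillna('50.00')
# One entry lists two capture rates; keep the first one.
df['capture_rate'] = df['capture_rate'].replace('30 (Meteorite)255 (Core)', '30')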


S_type_list = df['secondary_type'].unique().tolist()
P_type_list = df['primary_type'].unique().tolist()
Class_list = df['classification'].unique().tolist()
Gen_list = df['gen'].unique().tolist()

df['gen'] = LabelEncoder().fit_transform(df['gen'])
df['Primary_type'] = LabelEncoder().fit_transform(df['primary_type'])
df['Secondary_type'] = LabelEncoder().fit_transform(df['secondary_type'])
df['classification'] = LabelEncoder().fit_transform(df['classification'])
# Missing values were already filled above, so no second fillna is needed after the casts.
df['percent_male'] = df['percent_male'].astype(float)
df['percent_female'] = df['percent_female'].astype(float)
df['capture_rate'] = df['capture_rate'].astype(int)

S_type_list = pd.DataFrame(sorted(list(zip(S_type_list,df['secondary_type'].unique())),key=lambda x: x[1]))
P_type_list = pd.DataFrame(sorted(list(zip(P_type_list,df['primary_type'].unique())),key=lambda x: x[1]))
Class_list = pd.DataFrame(sorted(list(zip(Class_list,df['classification'].unique())),key=lambda x: x[1]))
Gen_list = pd.DataFrame(sorted(list(zip(Gen_list,df['gen'].unique())),key=lambda x: x[1]))
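The lookup tables above pair each label with its code by relying on the order in which `unique()` returns values. One alternative, shown here only as a minimal sketch on a toy Series (the values are illustrative, not taken from the dataset), is to keep the fitted encoder and read the labels back from its `classes_` attribute, which is indexed by the integer code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for a column such as df['primary_type']; values are illustrative only.
types = pd.Series(['grass', 'fire', 'water', 'fire', 'bug'])

encoder = LabelEncoder()
codes = encoder.fit_transform(types)

# classes_ holds the original labels, positioned by their integer code.
lookup = pd.DataFrame({'code': range(len(encoder.classes_)),
                       'label': encoder.classes_})
print(lookup)

# inverse_transform maps codes back to labels when needed.
print(encoder.inverse_transform([2]))
```

Keeping one encoder per column avoids rebuilding the mapping by hand and makes it possible to decode values later.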

## Visualization

In [None]:
Base_stats = ['Primary_type','primary_type','Secondary_type','secondary_type','classification','percent_male','percent_female',
              'height_m','weight_kg','capture_rate','base_egg_steps','hp','attack','defense',
              'sp_attack','sp_defense','speed','is_sublegendary','is_legendary','is_mythical']

df_BS = df[Base_stats]
df_BS.head()

### Correlations

In [None]:
plt.figure(figsize=(14,12))

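# df_BS still contains the text columns primary_type and secondary_type;
# numeric_only=True keeps them out of the correlation matrix.
heatmap = sns.heatmap(df_BS.corr(numeric_only=True), vmin=-1, vmax=1, annot=True, cmap='viridis')

heatmap.set_title('Base Stats Correlation Heatmap', fontdict={'fontsize':12}, pad=12)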
plt.show()

The heatmap answers the first question: *How do the height and weight of a Pokémon correlate with its various base stats?*

*Height* is correlated to:
* Weight, strongly.
* Base Egg Steps, moderately.
* hp, strongly.
* attack, moderately.
* defense, moderately.
* sp_attack, moderately.
* sp_defense, moderately.
* speed, moderately.
* sublegendary, weakly.
* legendary, moderately.

*Weight* is correlated to:
* Height, strongly.
* Base Egg Steps, strongly.
* hp, strongly.
* attack, moderately.
* defense, strongly.
* sp_attack, moderately.
* sp_defense, moderately.
* speed, moderately.
* sublegendary, moderately.
* legendary, moderately.

Experience Growth is not in the data set, so the question *What factors influence the Experience Growth and Egg Steps? Are these quantities correlated?* cannot be answered in full. However, *Base Egg Steps* is correlated to:
* Height, moderately.
* Weight, strongly.
* hp, moderately.
* attack, moderately.
* defense, moderately.
* sp_attack, strongly.
* sp_defense, moderately.
* speed, moderately.
* sublegendary, strongly.
* legendary, strongly.
* mythical, strongly.
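The labels "weakly", "moderately" and "strongly" in the lists above are read off the heatmap. A small helper can make that reading reproducible. The sketch below is illustrative only: the cut-offs of 0.3 and 0.5 are assumptions, not thresholds stated in this notebook, and the toy frame stands in for `df_BS`.

```python
import pandas as pd

def strength(r, weak=0.3, strong=0.5):
    """Label a correlation coefficient by its absolute size (assumed cut-offs)."""
    size = abs(r)
    if size < weak:
        return 'weak'
    if size < strong:
        return 'moderate'
    return 'strong'

def describe_correlations(frame, target):
    """Correlate `target` with every other numeric column and label each value."""
    corr = frame.select_dtypes(include='number').corr()[target].drop(target)
    return pd.DataFrame({'r': corr, 'strength': corr.map(strength)})

# Toy numbers standing in for df_BS; they are not taken from the dataset.
toy = pd.DataFrame({
    'height_m':  [0.3, 0.7, 1.0, 1.7, 2.0],
    'weight_kg': [2.0, 8.5, 20.0, 90.0, 100.0],
    'speed':     [90, 45, 60, 80, 50],
})
result = describe_correlations(toy, 'height_m')
print(result)
```

On the real data, `describe_correlations(df_BS, 'height_m')` would produce one labelled row per numeric column, which can then be compared against the lists above.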

### Distributions

The histograms below address the question: *What are the general distributions for the various Pokémon segments?*

In [None]:
fig = px.histogram(df, x='primary_type', color='primary_type',
                   title='Primary Type distribution')

# The axis is categorical, so Plotly already labels each bar with its type name.
# Overriding ticktext would mislabel the bars once they are re-sorted by count.
fig.update_layout(xaxis=dict(title='Primary Type'),
                  showlegend=False)
fig.update_xaxes(categoryorder='total descending')
fig.show()

In [None]:
fig = px.histogram(df, x='secondary_type', color = 'secondary_type',                   
                   title = 'Secondary Type distribution')

fig.update_layout(xaxis=dict(tickmode ='array',
                            title = 'Secondary Type'),
                 showlegend=False)
fig.update_xaxes(categoryorder='total descending')
fig.show()

In [None]:
fig = px.histogram(df, x='gen', color = 'gen',                   
                   title = 'Generation distribution')

fig.update_layout(xaxis=dict(tickmode ='array',
                             tickvals = np.arange(0,len(df.gen.unique())),
                            ticktext = Gen_list[0].values,
                            title = 'Generation'),
                 showlegend=False)
fig.show()

In [None]:
# One histogram per column: (column, plot title, x-axis title, colour column or None).
hist_specs = [
    ('height_m', 'Height distribution', 'm', None),
    ('weight_kg', 'Weight distribution', 'kg', None),
    ('capture_rate', 'Capture Rate distribution', 'Capture Rate', None),
    ('base_egg_steps', 'Base Egg Step distribution', 'Steps', None),
    ('attack', 'Attack distribution', 'Damage', None),
    ('sp_attack', 'Special Attack distribution', 'Damage', None),
    ('defense', 'Defense distribution', 'Damage Block', None),
    ('sp_defense', 'Special Defense distribution', 'Damage Block', None),
    ('speed', 'Speed distribution', 'Speed', None),
    ('is_sublegendary', 'Sublegendary distribution by generation', 'Sub Legendary', 'gen'),
    ('is_legendary', 'Legendary distribution by generation', 'Legendary', 'gen'),
    ('is_mythical', 'Mythical distribution by generation', 'Mythical', 'gen'),
]

for column, title, x_title, color in hist_specs:
    fig = px.histogram(df, x=column, color=color, title=title)
    # Numeric axes keep Plotly's automatic ticks; only the axis title is set.
    fig.update_layout(xaxis=dict(title=x_title), showlegend=False)
    fig.show()

## Searching for the best

Based on the combat stats, we compute a mean combat value for each Pokémon to build a ranking. Grouping the Pokémon by primary type then shows the best type overall.

This answers the last question posed: *Which type is the strongest overall? Which is the weakest?*

In [None]:
against_mean = df[[x for x in df.columns.values.tolist() if x not in Base_stats]]
against_mean = against_mean.drop(columns=['national_number','english_name','japanese_name','gen','abilities'])

df['damage_taken_perc']=np.mean(against_mean,axis=1)
df.head()

We are looking for the type with the lowest Damage Taken Percentage.

In [None]:
fig = px.histogram(df, x='damage_taken_perc', color='primary_type',
                   title='Mean Damage Taken Percentage')

fig.update_layout(xaxis=dict(title='Damage taken %'))

fig.show()

The best type by percentage of damage taken from other types is 'steel', followed by 'water' and 'dark'.
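A stacked histogram makes it hard to read off a ranking by type. One way to check it numerically is to average `damage_taken_perc` per primary type and sort the result. The following is a minimal sketch on invented toy rows; the multipliers are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Toy rows; the values are invented for illustration.
toy = pd.DataFrame({
    'primary_type': ['steel', 'steel', 'water', 'grass', 'grass'],
    'damage_taken_perc': [0.80, 0.90, 1.00, 1.20, 1.10],
})

# Mean damage taken per type, lowest (most resistant) first.
type_damage = (toy.groupby('primary_type')['damage_taken_perc']
                  .mean()
                  .sort_values())
print(type_damage)
```

Applied to the real `df`, the same `groupby` call would list every primary type from most to least resistant.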

Let's define an overall combat_stats score as:

damage done + damage blocked + speed + hp,

where norm() is min-max scaling to the range [0, 1]:

norm(sp_attack + attack) + (1 + (1 - damage_taken_perc))·norm(sp_defense + defense) + norm(speed) + norm(hp)
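To make the formula concrete, here is a small worked example on three invented Pokémon. It is a sketch under stated assumptions: `min_max` is a hand-written helper that performs the same scaling as `MinMaxScaler` on one column, and all numbers are made up for illustration.

```python
import pandas as pd

def min_max(series):
    """Rescale a Series linearly to the range [0, 1]."""
    return (series - series.min()) / (series.max() - series.min())

# Three invented Pokémon; none of these numbers come from the dataset.
toy = pd.DataFrame({
    'attack': [50, 100, 150], 'sp_attack': [50, 100, 150],
    'defense': [40, 80, 120], 'sp_defense': [40, 80, 120],
    'speed': [30, 60, 90], 'hp': [20, 70, 120],
    'damage_taken_perc': [1.0, 0.8, 1.2],
})

offence = min_max(toy['attack'] + toy['sp_attack'])
# (1 + (1 - damage_taken_perc)) simplifies to (2 - damage_taken_perc).
defence = (2 - toy['damage_taken_perc']) * min_max(toy['defense'] + toy['sp_defense'])
toy['combat_stats'] = offence + defence + min_max(toy['speed']) + min_max(toy['hp'])
print(toy['combat_stats'])
```

The second Pokémon takes less damage than average (0.8), so its normalised defence of 0.5 is weighted by 1.2, while the third takes more (1.2) and has its defence of 1.0 weighted by 0.8.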

In [None]:
attack_tot = df.attack + df.sp_attack
defense_tot = df.defense + df.sp_defense

def min_max(values):
    # MinMaxScaler expects a 2D array; flatten the result back to 1D.
    return MinMaxScaler().fit_transform(np.asarray(values, dtype=float).reshape(-1, 1)).ravel()

df_CS = pd.DataFrame({'English_name': df.english_name,
                      'P_type': df.primary_type,
                      'S_type': df.secondary_type})
df_CS['Attack'] = min_max(attack_tot)
# Weight defence by (1 + (1 - damage_taken_perc)), as in the formula above,
# so that taking less damage gives a higher score.
df_CS['Effective_Defense'] = min_max(defense_tot) * (2 - df.damage_taken_perc.values)
df_CS['Speed'] = min_max(df.speed)
df_CS['Hp'] = min_max(df.hp)
# Sum the four score columns by name (the positional slice [2:6] picked up S_type and skipped Hp).
df_CS['Combat_Stats'] = df_CS[['Attack', 'Effective_Defense', 'Speed', 'Hp']].sum(axis=1)

# Tag each Pokémon with its rarity class; the first matching condition wins.
df_CS['Leg/Myth'] = np.select(
    [df.is_sublegendary == 1, df.is_legendary == 1, df.is_mythical == 1],
    ['Sub_Legendary', 'Legendary', 'Mythical'],
    default='Normal')



In [None]:
df_CS = df_CS.sort_values('Combat_Stats',ascending=False, ignore_index=True)

In [None]:
print('Top ten Pokémon overall')
df_CS.head(10)

In [None]:
print('Ten worst Pokémon overall')
df_CS.tail(10)

In [None]:
fig = px.scatter(df_CS, x='Combat_Stats', y='Leg/Myth', color = 'P_type',
                 hover_name='English_name', hover_data=['S_type'],
                title='Combat stats by type and Legendary/Mythical status')


fig.show()

In [None]:
Combat_mean=[]
for i in P_type_list[0]:
    Combat_mean.append(np.mean(df_CS.Combat_Stats[df_CS.P_type.isin([i])]))
    
Combat_type=pd.DataFrame(list(zip(P_type_list[0],Combat_mean)))

fig = px.bar(x=Combat_type[1], y=Combat_type[0],color=Combat_type[0],orientation='h')

fig.update_layout(xaxis=dict(tickmode ='array',
                             title = 'Combat Stats'),
                  yaxis=dict(tickmode ='array',
                             title = 'Primary Type'),
                  title='Ranking of types',
                 showlegend=True)
fig.update_yaxes(categoryorder='total ascending')

fig.show()

#### Best types overall are:
* Dragon
* Electric
* Fire
* Steel
* Psychic

#### Worst types overall are:
* Bug
* Normal
* Grass
* Fairy
* Poison