<div align='center'><font size="5" color='#353B47'>A lot of Plotly Visualizations...</font></div>
<div align='center'><font size="4" color="#353B47">... Gotta catch'em all !!!</font></div>
<br>
<div align='center'><img src="https://i.ya-webdesign.com/images/pokemon-yellow-png-8.png" width="500"></div>
<br>
<hr>

<div align='justify'><font size='3'>The purpose of this notebook is to show my approach to analyzing Pokémon from the first to the eighth generation through interactive and (I hope) insightful visualizations. The study is based entirely on this <a href="https://www.kaggle.com/mariotormo/complete-pokemon-dataset-updated-090420">dataset</a>.</font></div>

## <div id="summary">Summary</div>

**<font size="2"><a href="#part1">EDA</a></font>**
**<br>&nbsp;&nbsp;&nbsp;&nbsp;<font size="2"><a href="#chap1">1. Setup & Cleaning</a></font>**
**<br>&nbsp;&nbsp;&nbsp;&nbsp;<font size="2"><a href="#chap2">2. Pokemon Stats</a></font>**
**<br>&nbsp;&nbsp;&nbsp;&nbsp;<font size="2"><a href="#chap3">3. Barplot and pie charts</a></font>**
**<br>&nbsp;&nbsp;&nbsp;&nbsp;<font size="2"><a href="#chap4">4. Main characteristics</a></font>**
**<br>&nbsp;&nbsp;&nbsp;&nbsp;<font size="2"><a href="#chap5">5. The Legendaries</a></font>**
**<br>&nbsp;&nbsp;&nbsp;&nbsp;<font size="2"><a href="#chap6">6. The best generation</a></font>**
**<br><br><font size="2"><a href="#part2">Modeling</a></font>**
**<br>&nbsp;&nbsp;&nbsp;&nbsp;<font size="2"><a href="#part2_chap1">1. Preprocessing</a></font>**
**<br>&nbsp;&nbsp;&nbsp;&nbsp;<font size="2"><a href="#part2_chap2">2. Random Forest</a></font>**
**<br>&nbsp;&nbsp;&nbsp;&nbsp;<font size="2"><a href="#part2_chap3">3. Feature Importance</a></font>**

-----

# EDA

# <div id="chap1">1. Setup & Cleaning</div>

In [None]:
import os
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from plotly.subplots import make_subplots
import plotly.express as px
import plotly.graph_objects as go
from plotly import tools
from plotly.offline import iplot, init_notebook_mode

from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

init_notebook_mode()

<div align='justify'><font size='3'>First, I drop the columns that I won't use in this study. They could still be very useful elsewhere, especially for analyzing the affinities between types (the against_* columns).</font></div>

In [None]:
filepath = '../input/'
pokemon = pd.read_csv(filepath + 'complete-pokemon-dataset-updated-090420/pokedex_(Update_05.20).csv').drop('Unnamed: 0', axis = 1)

columns_to_drop = ['japanese_name', 'german_name', 'against_normal', 'against_fire',
                  'against_water', 'against_electric', 'against_grass', 'against_ice',
                  'against_fight', 'against_poison', 'against_ground', 'against_flying',
                  'against_psychic', 'against_bug', 'against_rock', 'against_ghost',
                  'against_dragon', 'against_dark', 'against_steel', 'against_fairy']

pokemon = pokemon.drop(columns_to_drop, axis = 1)

In [None]:
pokemon.info()

<div align='justify'><font size='3'>The purpose of this study is to show the characteristics of Pokémon in their simplest form. I therefore remove the alternate forms (Mega, Dynamax-style and Alolan forms): they are not useful here and are highly likely to bias the study.</font></div>

In [None]:
# Select the row labels of Mega, Dynamax-style ('max' in the name) and Alolan forms
mega_pokemons = pokemon.index[pokemon['name'].str.contains('Mega ', regex=False)].tolist()
dinamax_pokemons = pokemon.index[pokemon['name'].str.contains('max', regex=False)].tolist()
alolan_pokemons = pokemon.index[pokemon['name'].str.contains('Alolan', regex=False)].tolist()

# Concatenate
to_delete = np.concatenate((mega_pokemons, 
                            dinamax_pokemons, 
                            alolan_pokemons))

# Remove
pokemon = pokemon.drop(to_delete, axis=0)
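
The same filtering can be sketched with a single boolean mask instead of three index lists. This is only an illustration on a toy DataFrame (the names in `toy` are picked by hand, and `is_variant` / `base_forms` are hypothetical names that do not exist in the notebook):

```python
import pandas as pd

# Toy stand-in for the `name` column of the real dataset
toy = pd.DataFrame({"name": ["Venusaur", "Mega Venusaur", "Alolan Raichu",
                             "Eternatus Eternamax", "Pikachu"]})

# One mask for the three patterns; with regex=True, "|" means "or"
is_variant = toy["name"].str.contains("Mega |max|Alolan", regex=True)

# Keep only the rows that match none of the patterns
base_forms = toy[~is_variant]
print(base_forms["name"].tolist())
```

Note the trailing space in `"Mega "`: it keeps a name such as Meganium, which starts with the same letters, out of the mask.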

In [None]:
# Check columns that have NAs
pokemon.isnull().sum()[pokemon.isnull().sum() > 0]

In [None]:
# Delete the temporary index lists, they are no longer needed
del(mega_pokemons, 
    dinamax_pokemons, 
    alolan_pokemons, 
    to_delete)

**<font size="2"><a href="#summary">Back to summary</a></font>**

-----

# <div id="chap2">2. Pokemon Stats</div>

<font color="blue" size="4">What is the most powerful pokemon?</font>

In [None]:
fig = px.histogram(pokemon, x="total_points",
                   marginal="box",
                   hover_data=pokemon.columns)

fig.update_layout(
    title="Total points distribution")

fig.show()

In [None]:
# Get index and print row of pokemon having highest total_points
highest_tot_points_idx = pokemon['total_points'].idxmax()
pokemon.loc[highest_tot_points_idx,:]

<div align='justify'><font size='3'>Primal Kyogre has the highest total points of all the selected Pokémon.</font></div>

<font color="blue" size="4">Minimums and Maximums</font>

In [None]:
def find_min_and_max(column_name):
    '''
    Print the names of the pokemon with the lowest and highest value of column_name
    and return their row labels as (max_index, min_index)
    column_name: str
    '''
    
    # Find max
    max_index = pokemon[column_name].idxmax()
    max_pokemon = pokemon.loc[max_index, 'name']
    
    # Find min
    min_index = pokemon[column_name].idxmin()
    min_pokemon = pokemon.loc[min_index, 'name']
    
    print(f'Pokemon with min {column_name}: {min_pokemon}\nPokemon with max {column_name}: {max_pokemon}\n')
    return max_index, min_index

In [None]:
# Create dict for min and max values of selected columns
columns = ['attack', 'defense', 'sp_attack', 'sp_defense', 'hp', 'speed', 'catch_rate']
min_dict = {}
max_dict = {}
min_pok = {}
max_pok = {}

for colm in columns:
    max_index, min_index = find_min_and_max(colm)
    max_dict[colm] = pokemon.loc[max_index, colm]
    min_dict[colm] = pokemon.loc[min_index, colm]
    max_pok[colm] = pokemon.loc[max_index, 'name']
    min_pok[colm] = pokemon.loc[min_index, 'name']

In [None]:
fig = go.Figure([go.Bar(x=columns, 
                        y=list(max_dict.values()), 
                        hovertext=[f"{columns[i]}, {list(max_dict.values())[i]}, {list(max_pok.values())[i]}" for i in range(len(columns)) ], 
                        name="Highest")])

fig.add_trace(go.Bar(x=columns, 
                     y=list(min_dict.values()), 
                     hovertext=[f"{columns[i]}, {list(min_dict.values())[i]}, {list(min_pok.values())[i]}" for i in range(len(columns)) ], 
                     name='Lowest'))

fig.update_layout(
    title="Highest vs Lowest barplot")

fig.show()

**<font size="2"><a href="#summary">Back to summary</a></font>**

-----

# <div id="chap3">3. Barplot and pie charts</div>

<div align='justify'><font size='3'>Bar plots showing the distribution of the type_1 and type_2 columns. I first check that type_1 and type_2 are never equal on the same row.</font></div>

In [None]:
def are_row_wise_different(data, col1, col2):
    '''
    Check that two columns never hold the same value on the same row
    data: dataframe
    col1: str
    col2: str
    '''
    
    if (data[col1] == data[col2]).sum() == 0:
        return f'{col1} and {col2} are row wise different'
    else:
        return f'at least one row has same value for {col1} and {col2}'
    
# Check that type_1 and type_2 never match on the same row
print(are_row_wise_different(pokemon, "type_1", 'type_2'))

In [None]:
# Another way to verify the row-wise condition
assert sum(pokemon["type_1"] == pokemon["type_2"])==0, 'at least one row has same value for type_1 and type_2'
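
One detail worth keeping in mind for this check: many pokemon have no second type, and a missing value never compares equal to anything, so those rows can never trigger the assertion. A small sketch on made-up rows (`toy` and `same_type` are illustrative names only):

```python
import numpy as np
import pandas as pd

# Hand-made rows: one dual type, one single type (NaN), one repeated type
toy = pd.DataFrame({"type_1": ["Grass", "Fire", "Water"],
                    "type_2": ["Poison", np.nan, "Water"]})

# Element-wise comparison; NaN compared with a string gives False
same_type = toy["type_1"].eq(toy["type_2"])

print(same_type.tolist())  # only the last row repeats its type
print(same_type.any())     # True as soon as one row repeats its type
```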

<font color="blue" size="4">Plot Pokemon Type via a Barplot</font>

In [None]:
graph_1 = pokemon.groupby('type_1').count().sort_values(by = 'name')
# Take the labels from the sorted frame so that they stay aligned with the counts
index_graph_1 = graph_1.index

graph_2 = pokemon.groupby('type_2').count().sort_values(by = 'name')
index_graph_2 = graph_2.index
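
Counting rows per category can also be sketched with `value_counts`, which keeps labels and counts in one Series so that sorting moves both together. The data below is a toy example, and `type_counts`, `labels` and `values` are names I introduce only for this illustration:

```python
import pandas as pd

# Toy first-type column
toy = pd.DataFrame({"type_1": ["Water", "Bug", "Water", "Fire", "Water", "Bug"]})

# Labels live in the index, counts in the values: sorting keeps them paired
type_counts = toy["type_1"].value_counts().sort_values()

labels = type_counts.index.tolist()
values = type_counts.tolist()
print(list(zip(labels, values)))
```

`labels` and `values` can then be passed as `x` and `y` to a bar trace.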

In [None]:
fig = go.Figure(
    data=[go.Bar(x = index_graph_1, 
                 y=graph_1['name'])],
    layout_title_text="First type distribution",
)

fig.show()

fig = go.Figure(
    data=[go.Bar(x = index_graph_2, 
                 y=graph_2['name'],
                 marker_color = 'mediumpurple')],
    layout_title_text="Second type distribution"
)

fig.show()

<font color="blue" size="4">Same information can be displayed via a Pie chart</font>

In [None]:
fig = make_subplots(rows=1, 
                    cols=2, 
                    specs=[[{'type':'domain'}, 
                            {'type':'domain'}]])

fig.add_trace(go.Pie(labels=index_graph_1, 
                     values=graph_1['name'], 
                     name='Pie chart of first type'),
              1, 1)

fig.add_trace(go.Pie(labels=index_graph_2, 
                     values=graph_2['name'], 
                     name='Pie chart of second type'),
              1, 2)

# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.3, 
                  hoverinfo="label+percent+name")

fig.update_layout(
    title_text="Pie charts of First and Second Type")

fig.show()

<div align='justify'><font size='3'>For the first type, more than a third of the Pokémon are Rock, Steel or Water. For the secondary type, more than a third are Rock, Psychic or Water. The Water and Rock types are largely predominant in the distribution of Pokémon types over the 8 generations.</font></div>

**<font size="2"><a href="#summary">Back to summary</a></font>**

-----

# <div id="chap4">4. Main characteristics</div>


<font color="blue" size="4">Radar Charts (Spider Charts)</font>

<div align='justify'><font size='3'>A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point. They are used to plot one or more groups of values over multiple common variables.</font></div>

In [None]:
# Select data
columns = ['attack', 'hp', 'defense', 'height_m', 'weight_kg', 'sp_attack', 'sp_defense', 'speed']
df = pokemon[columns].copy()

# Normalize data for better readability
normalized_df=(df-df.min())/(df.max()-df.min())
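
The line above is min-max scaling: each column is mapped to [0, 1] with x' = (x - min) / (max - min). A worked example on made-up numbers (the values in `toy` are not taken from the dataset):

```python
import pandas as pd

# Two toy columns with very different ranges
toy = pd.DataFrame({"attack": [50, 100, 150], "speed": [10, 30, 90]})

# Column-wise min-max scaling, same expression as in the notebook
scaled = (toy - toy.min()) / (toy.max() - toy.min())

print(scaled)
```

After scaling, the smallest value of each column is 0 and the largest is 1, which is what lets all axes of the radar chart share the same range. A column with a single constant value would give 0/0, hence NaN.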

In [None]:
def radar_chart(pokemon_1_index, pokemon_2_index):
    '''
    Plot a radar chart comparing two pokemon
    pokemon_1_index: int, row label of the first pokemon in 'normalized_df'
    pokemon_2_index: int, row label of the second pokemon in 'normalized_df'
    '''
    
    fig = go.Figure()

    fig.add_trace(go.Scatterpolar(
          r=normalized_df.loc[pokemon_1_index,:].tolist(),
          theta=columns,
          fill='toself',
          name=pokemon.loc[pokemon_1_index,'name']
    ))
    
    fig.add_trace(go.Scatterpolar(
          r=normalized_df.loc[pokemon_2_index,:].tolist(),
          theta=columns,
          fill='toself',
          name=pokemon.loc[pokemon_2_index,'name']
    ))

    fig.update_layout(
      polar=dict(
        radialaxis=dict(
          visible=True,
          range=[0, 1]
        )),
      showlegend=True
    )
    
    fig.update_layout(
        title="Radar Chart: "+pokemon.loc[pokemon_1_index,'name']+" VS "+pokemon.loc[pokemon_2_index,'name'])
    
    fig.show()

In [None]:
radar_chart(pokemon_1_index = 100, pokemon_2_index = 97)

<font color="blue" size="4">Impact of height, weight and speed</font>

In [None]:
def cat_total_points(row):
    '''
    Bin total_points into three categories
    '''
    
    if row.total_points < 300:
        return 'Weakest'
    elif 300 <= row.total_points < 600:
        return 'Intermediate'
    else:
        return 'Strong'

# Create bins on total_points column
pokemon['cat_total_points'] = pokemon.apply(cat_total_points, axis='columns')
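
For larger tables, the same three bins can be sketched without a row-by-row `apply`, for instance with `np.select`, which returns the choice of the first condition that holds. The thresholds are the ones used above; the data is a toy example:

```python
import numpy as np
import pandas as pd

# Toy totals, including the two boundary values 300 and 600
toy = pd.DataFrame({"total_points": [180, 300, 599, 600, 720]})

# Conditions are tested in order; rows matching none get the default
conditions = [toy["total_points"] < 300, toy["total_points"] < 600]
choices = ["Weakest", "Intermediate"]
toy["cat_total_points"] = np.select(conditions, choices, default="Strong")

print(toy)
```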

In [None]:
fig = go.Figure()
fig.add_trace(go.Box(y=pokemon.loc[pokemon['cat_total_points']=='Weakest','height_m'], 
                     name='Weakest',
                marker_color = 'indianred'))
fig.add_trace(go.Box(y=pokemon.loc[pokemon['cat_total_points']=='Intermediate','height_m'], 
                     name='Intermediate',
                marker_color = 'lightseagreen'))
fig.add_trace(go.Box(y=pokemon.loc[pokemon['cat_total_points']=='Strong','height_m'], 
                     name='Strong',
                marker_color = 'mediumpurple'))
fig.update_traces(boxpoints='all', jitter=0)
fig.update_layout(
    title="Height distribution")
fig.show()

fig = go.Figure()
fig.add_trace(go.Box(y=pokemon.loc[pokemon['cat_total_points']=='Weakest','weight_kg'], 
                     name='Weakest',
                marker_color = 'indianred'))
fig.add_trace(go.Box(y=pokemon.loc[pokemon['cat_total_points']=='Intermediate','weight_kg'], 
                     name='Intermediate',
                marker_color = 'lightseagreen'))
fig.add_trace(go.Box(y=pokemon.loc[pokemon['cat_total_points']=='Strong','weight_kg'], 
                     name='Strong',
                marker_color = 'mediumpurple'))
fig.update_traces(boxpoints='all', jitter=0)
fig.update_layout(
    title="Weight distribution")
fig.show()

fig = go.Figure()
fig.add_trace(go.Box(y=pokemon.loc[pokemon['cat_total_points']=='Weakest','speed'], 
                     name='Weakest',
                marker_color = 'indianred'))
fig.add_trace(go.Box(y=pokemon.loc[pokemon['cat_total_points']=='Intermediate','speed'], 
                     name='Intermediate',
                marker_color = 'lightseagreen'))
fig.add_trace(go.Box(y=pokemon.loc[pokemon['cat_total_points']=='Strong','speed'], 
                     name='Strong',
                marker_color = 'mediumpurple'))
fig.update_traces(boxpoints='all', jitter=0)
fig.update_layout(
    title="Speed distribution")

fig.show()

* <font color="blue">Height_m</font> : above 2m, a pokemon can't be in the Weakest category. The median is 0.4m for the Weakest category, 1m for the Intermediate category and 2.2m for the Strong category.
* <font color="blue">Weight_kg</font> : median of 6kg for the Weakest category, 30kg for the Intermediate category and 195kg for the Strong one.
* <font color="blue">Speed</font> : median base speed of 43 for the Weakest category, 65 for the Intermediate category and 97 for the Strong one (speed is a base stat, not a value in km/h).

On the whole, stronger pokemon turn out to be taller, heavier and faster than the others.
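
The medians quoted above are read from the box plots. They could also be computed directly with a `groupby`; here is a sketch on invented heights (the numbers below are not the dataset's, and `medians` is a name used only here):

```python
import pandas as pd

# Invented heights for two of the three categories
toy = pd.DataFrame({
    "cat_total_points": ["Weakest", "Weakest", "Strong", "Strong", "Strong"],
    "height_m": [0.2, 0.6, 1.5, 2.5, 6.0],
})

# One median per category
medians = toy.groupby("cat_total_points")["height_m"].median()
print(medians)
```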

In [None]:
for colm in ['weight_kg', 'height_m', 'speed']:
    find_min_and_max(colm)

<font size="4">Lightest pokemon : Gastly</font>

<img src="https://static.pokemonpets.com/images/monsters-images-300-300/92-Gastly.png">

<font size="4">Heaviest pokemon : Cosmoem</font>

<img src="https://vignette.wikia.nocookie.net/nintendo/images/5/55/Cosmoem.png/revision/latest/top-crop/width/360/height/450?cb=20161220220724&path-prefix=en" width=300>

<font size="4">Smallest pokemon : Joltik</font>

<img src="https://static.pokemonpets.com/images/monsters-images-800-800/2595-Shiny-Joltik.png" width=300>

<font size="4">Biggest pokemon : Eternatus</font>

<img src="https://www.pokepedia.fr/images/1/15/Sprite_890_Infinimax_HOME.png" width=500>

<font size="4">Slowest pokemon : Shuckle</font>

<img src="https://static.pokemonpets.com/images/monsters-images-800-800/213-Shuckle.png" width=350>

<font size="4">Fastest pokemon : Deoxys Speed Forme</font>

<img src="https://images.gameinfo.io/pokemon/256/386-14.png" width = 400>

<font color="blue" size="4">Two dimension density plot</font>

In [None]:
columns = ['height_m', 'weight_kg', 'total_points']

fig = px.density_contour(pokemon[columns], 
                         x=np.log(pokemon['height_m']), 
                         y=np.log(pokemon['weight_kg']), 
                         marginal_x="histogram", 
                         marginal_y="histogram")

fig.update_layout(
    title="Two dimension density plot",
    xaxis_title="log(height_m)",
    yaxis_title="log(weight_kg)")

fig.show()

<font color="blue" size="4">Ternary plot</font>

In [None]:
data_to_consider = pokemon[['hp','attack','defense','total_points','cat_total_points']].copy()
data_to_consider = data_to_consider.dropna(axis=0)

In [None]:
fig = px.scatter_ternary(data_to_consider, 
                         a="hp", 
                         b="attack", 
                         c="defense",
                         color="cat_total_points", 
                         size="total_points", 
                         size_max=15)

fig.update_layout(
    title="Ternary plot")

fig.show()

<div align='justify'><font size='3'>Pokémon in the Strong category are gathered in the middle of the ternary plot, which means that their hp, attack and defense are quite balanced.</font></div>
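
To see why balanced stats land in the middle: a ternary plot places each point according to the share of each of the three components in their sum. A small sketch, where `ternary_shares` is a helper I define only for this illustration and the stat values are invented:

```python
def ternary_shares(hp, attack, defense):
    """Return the share of each stat in the sum of the three."""
    total = hp + attack + defense
    return (hp / total, attack / total, defense / total)

# Balanced stats: each share is one third, i.e. the centre of the triangle
balanced = ternary_shares(100, 100, 100)

# Lopsided stats: the point is pulled towards the defense corner
lopsided = ternary_shares(20, 10, 230)

print(balanced)
print(lopsided)
```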

**<font size="2"><a href="#summary">Back to summary</a></font>**

-------

# <div id="chap5">5. The Legendaries</div>

In [None]:
fig = make_subplots(rows=2, cols=2)

fig.add_trace(go.Box(y=pokemon.loc[pokemon['status']=='Normal', 'hp'], 
                     name='Not legendary',
                     marker_color = 'indianred'), 
              row=1, 
              col=1)

fig.add_trace(go.Box(y=pokemon.loc[pokemon['status']=='Legendary', 'hp'], 
                     name = 'Legendary',
                     marker_color = 'lightseagreen'), 
              row=1, 
              col=1)

fig.update_layout(title="Health Point, Attack, Defense, Total points Boxplots")
fig.update_traces(quartilemethod="exclusive") # or "inclusive", or "linear" by default

fig.add_trace(go.Box(y=pokemon.loc[pokemon['status']=='Normal', 'attack'],
                     marker_color = 'indianred',
                        
                     showlegend=False), 
              row=1, 
              col=2)
fig.add_trace(go.Box(y=pokemon.loc[pokemon['status']=='Legendary', 'attack'],
                     marker_color = 'lightseagreen', 
                     showlegend=False), 
              row=1, 
              col=2)
fig.update_traces(quartilemethod="exclusive")

fig.add_trace(go.Box(y=pokemon.loc[pokemon['status']=='Normal', 'defense'],
                     marker_color = 'indianred', 
                     showlegend=False), 
              row=2, 
              col=1)
fig.add_trace(go.Box(y=pokemon.loc[pokemon['status']=='Legendary', 'defense'],
                     marker_color = 'lightseagreen', 
                     showlegend=False), 
              row=2, 
              col=1)
fig.update_traces(quartilemethod="exclusive")

fig.add_trace(go.Box(y=pokemon.loc[pokemon['status']=='Normal', 'total_points'],
                     marker_color = 'indianred', 
                     showlegend=False), 
              row=2, 
              col=2)
fig.add_trace(go.Box(y=pokemon.loc[pokemon['status']=='Legendary', 'total_points'],
                     marker_color = 'lightseagreen', 
                     showlegend=False), 
              row=2, 
              col=2)
fig.update_traces(quartilemethod="exclusive")


fig.show()

Comparing medians, Legendary pokemon have:
* <font color="blue">More hp</font> : from 65 to 100
* <font color="blue">More attack</font> : from 70 to 120
* <font color="blue">More defense</font> : from 70 to 100
* <font color="blue">More total points</font> : from 420 to 680

<font color="blue" size="4">Sub Legendary, Legendary and Mythical categories</font>

In [None]:
fig = px.violin(pokemon, 
                y="hp", 
                color="status", 
                box=True, 
                points="all",
          hover_data=pokemon.columns)
fig.update_layout(title="Pokemon status VS hp")
fig.show()

fig = px.violin(pokemon, 
                y="defense", 
                color="status", 
                box=True, 
                points="all",
          hover_data=pokemon.columns)
fig.update_layout(title="Pokemon status VS defense")
fig.show()

fig = px.violin(pokemon, 
                y="attack", 
                color="status", 
                box=True, 
                points="all",
          hover_data=pokemon.columns)

fig.update_layout(title="Pokemon status VS attack")
fig.show()

<font color="blue" size="4">Special pokemon count per generation</font>

In [None]:
legendary = pokemon[(pokemon['status']=='Legendary')].groupby('generation').count()['name']
sub_legendary = pokemon[(pokemon['status']=='Sub Legendary')].groupby('generation').count()['name']
mythical = pokemon[(pokemon['status']=='Mythical')].groupby('generation').count()['name']

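special_pokemons = pd.concat([sub_legendary, legendary, mythical], axis = 1)
special_pokemons.columns = ['Sublegendaries', 'Legendaries', 'Mythicals']
# A generation with no pokemon of a given status gets NaN after concat: count it as 0
special_pokemons = special_pokemons.fillna(0).astype(int)
special_pokemons['Total'] = special_pokemons['Sublegendaries'] + special_pokemons['Legendaries'] + special_pokemons['Mythicals']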
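
A table of counts per generation and status can also be sketched with `pd.crosstab`, which writes 0 for a combination that never occurs. This matters for the total, because adding a number to a missing value gives a missing value. The rows below are a toy example, not the real counts:

```python
import pandas as pd

# Toy rows: generation 6 only has a Legendary entry
toy = pd.DataFrame({
    "generation": [1, 1, 1, 2, 2, 6],
    "status": ["Sub Legendary", "Legendary", "Mythical",
               "Sub Legendary", "Legendary", "Legendary"],
})

# Rows are generations, columns are statuses, absent pairs are counted as 0
counts = pd.crosstab(toy["generation"], toy["status"])
counts["Total"] = counts.sum(axis=1)

print(counts)
```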

In [None]:
gen = special_pokemons.index.tolist()

fig = go.Figure(data=[
    go.Bar(name='Sublegendaries', x=gen, y=special_pokemons['Sublegendaries']),
    go.Bar(name='Legendaries', x=gen, y=special_pokemons['Legendaries']),
    go.Bar(name='Mythicals', x=gen, y=special_pokemons['Mythicals']),
    go.Bar(name='Total', x=gen, y=special_pokemons['Total'])
])

fig.update_layout(barmode='group', title = 'Special pokemon count per generation',
                  xaxis_title="Generation",
                  yaxis_title="Count")

fig.show()

<div align='justify'><font size='3'>The seventh generation released 30 special pokemon, 50% more than the fifth generation, which ranks second in number of special pokemon!</font></div>

<font color="blue" size="4">Special pokemons VS regular pokemons</font>

In [None]:
# Flag every pokemon whose status is not 'Normal' (sub legendary, legendary or mythical) as special
pokemon['is_special'] = 0
pokemon.loc[pokemon['status'] != 'Normal', 'is_special'] = 1

In [None]:
columns = ['hp', 'attack', 'defense', 'sp_attack', 'sp_defense', 'speed','weight_kg','height_m', 'status']

fig = px.scatter_matrix(pokemon[columns],
    dimensions=['hp', 'attack', 'defense', 'sp_attack', 'sp_defense', 'speed','weight_kg','height_m'],
    color="status",
    width=1400, 
    height=1400)
fig.update_layout(title='Scatter matrix')
fig.show()

<div align='justify'><font size='3'>This scatterplot matrix shows that, in each plot, special pokemon are on the whole gathered in the top right area, meaning that their stats are higher.</font></div>

<font color="blue" size="4">Special pokemon funnel</font>

<div align='justify'><font size='3'>For better readability, I create categorized columns.</font></div>

In [None]:
pokemon['total_points_cat'] = 100
pokemon.loc[(pokemon['total_points'] > 100) & (pokemon['total_points'] <= 200), 'total_points_cat'] = 200
pokemon.loc[(pokemon['total_points'] > 200) & (pokemon['total_points'] <= 300), 'total_points_cat'] = 300
pokemon.loc[(pokemon['total_points'] > 300) & (pokemon['total_points'] <= 400), 'total_points_cat'] = 400
pokemon.loc[(pokemon['total_points'] > 400) & (pokemon['total_points'] <= 500), 'total_points_cat'] = 500
pokemon.loc[(pokemon['total_points'] > 500) & (pokemon['total_points'] <= 600), 'total_points_cat'] = 600
pokemon.loc[(pokemon['total_points'] > 600) & (pokemon['total_points'] <= 700), 'total_points_cat'] = 700
pokemon.loc[(pokemon['total_points'] > 700) & (pokemon['total_points'] <= 800), 'total_points_cat'] = 800
pokemon.loc[(pokemon['total_points'] > 800), 'total_points_cat'] = 900

pokemon['hp_cat'] = 50
pokemon.loc[(pokemon['hp'] > 50) & (pokemon['hp'] <= 75), 'hp_cat'] = 75
pokemon.loc[(pokemon['hp'] > 75) & (pokemon['hp'] <= 100), 'hp_cat'] = 100
pokemon.loc[(pokemon['hp'] > 100) & (pokemon['hp'] <= 130), 'hp_cat'] = 130
pokemon.loc[(pokemon['hp'] > 130), 'hp_cat'] = 150

pokemon['attack_cat'] = 50
pokemon.loc[(pokemon['attack'] > 50) & (pokemon['attack'] <= 75), 'attack_cat'] = 75
pokemon.loc[(pokemon['attack'] > 75) & (pokemon['attack'] <= 100), 'attack_cat'] = 100
pokemon.loc[(pokemon['attack'] > 100) & (pokemon['attack'] <= 150), 'attack_cat'] = 150
pokemon.loc[(pokemon['attack'] > 150), 'attack_cat'] = 160

pokemon['defense_cat'] = 50
pokemon.loc[(pokemon['defense'] > 50) & (pokemon['defense'] <= 75), 'defense_cat'] = 75
pokemon.loc[(pokemon['defense'] > 75) & (pokemon['defense'] <= 100), 'defense_cat'] = 100
pokemon.loc[(pokemon['defense'] > 100) & (pokemon['defense'] <= 150), 'defense_cat'] = 150
pokemon.loc[(pokemon['defense'] > 150), 'defense_cat'] = 175
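
Each block of `.loc` assignments above labels a value with the upper bound of its interval. As a sketch, the hp binning could be written in one call with `pd.cut`, whose intervals are closed on the right by default, like the `<=` tests above. `edges` and `labels` are names I introduce for the illustration, and the hp values are a toy example:

```python
import numpy as np
import pandas as pd

# Toy hp values, including the boundaries 50, 75, 100 and 130
toy = pd.DataFrame({"hp": [35, 50, 51, 75, 100, 101, 130, 255]})

# Intervals: (-inf, 50], (50, 75], (75, 100], (100, 130], (130, inf)
edges = [-np.inf, 50, 75, 100, 130, np.inf]
labels = [50, 75, 100, 130, 150]

# pd.cut returns a categorical; cast back to int to match the notebook's columns
toy["hp_cat"] = pd.cut(toy["hp"], bins=edges, labels=labels).astype(int)

print(toy)
```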

In [None]:
columns = ['status', 'hp_cat','attack_cat','defense_cat','total_points_cat']

data = pokemon[columns].copy()
fig = px.parallel_categories(data, color="total_points_cat", color_continuous_scale=px.colors.sequential.Inferno)
fig.update_layout(title='Funnel for special pokemons through main categorised characteristics')
fig.show()

**<font size="2"><a href="#summary">Back to summary</a></font>**

-----

# <div id="chap6">6. The best generation</div>

In [None]:
# Select the numeric columns before averaging: recent pandas versions raise on text columns
per_gen_pokemon = pokemon.groupby('generation')[['total_points','hp','attack','defense']].mean()

In [None]:
fig = go.Figure()
fig.add_trace(go.Bar(x=per_gen_pokemon.index, y=per_gen_pokemon['hp'],
                    name='Health Points'))
fig.add_trace(go.Bar(x=per_gen_pokemon.index, y=per_gen_pokemon['defense'],
                    name='Defense'))
fig.add_trace(go.Bar(x=per_gen_pokemon.index, y=per_gen_pokemon['attack'],
                    name='Attack'))
fig.add_trace(go.Bar(x=per_gen_pokemon.index, y=per_gen_pokemon['total_points'],
                    name='Total Points'))
fig.update_layout(barmode='stack', title = 'Stacked barplot of characteristic stats aggregated by generation')
fig.show()

<div align='justify'><font size='3'>If base points are the primary factor in deciding whether a pokemon is powerful or weak, we can say that the seventh generation is the most powerful one. This may be explained by the fact that it has 30 special pokemon. Let's compare the total_points rank of each generation with its number of special pokemon:

Total_points rank:
* 1st: Gen 7
* 2nd: Gen 4
* 3rd: Gen 6
* 4th: Gen 5
* 5th: Gen 8
* 6th: Gen 3 
* 7th: Gen 1 
* 8th: Gen 2

Special pokemons count:
* 1st: Gen 7
* 2nd: Gen 5
* 3rd: Gen 4
* 4th: Gen 3
* 5th: Gen 2
* 6th: Gen 1 
* 7-8th: Gen 6, 8</font></div>

<div align='justify'><font size='3'>The average total_points of a generation can't be explained by its number of special pokemon alone. In the next chapter I will use an ML algorithm to fit a function that estimates total_points from the existing columns and the features I created in this EDA.</font></div>

**<font size="2"><a href="#summary">Back to summary</a></font>**

-------

# Modeling

# <div id="part2_chap1">1. Preprocessing</div>

In [None]:
pokemon.head()

In [None]:
# Number of missing values in each column of training data
pokemon.isnull().sum()

In [None]:
pokemon = pokemon.drop(['type_2','ability_2','ability_hidden','catch_rate','base_friendship','base_experience','egg_type_2','percentage_male'], axis = 1)
pokemon = pokemon.dropna()

In [None]:
# All categorical columns
object_cols = [col for col in pokemon.columns if pokemon[col].dtype == "object"]

object_unique = list(map(lambda col: pokemon[col].unique(), object_cols))
d = dict(zip(object_cols, object_unique))

In [None]:
# Columns that will be one-hot encoded
low_cardinality_cols = [col for col in object_cols if len(pokemon[col].unique()) < 10]

# Columns that will be dropped from the dataset
high_cardinality_cols = list(set(object_cols)-set(low_cardinality_cols))

print('Categorical columns that will be one-hot encoded:', low_cardinality_cols)
print('\nCategorical columns that will be dropped from the dataset:', high_cardinality_cols)

In [None]:
pokemon['growth_rate'].value_counts()

In [None]:
le = LabelEncoder()
le.fit(['Medium Slow',"Slow",'Fluctuating', "Medium Fast", 'Fast','Erratic'])
pokemon.growth_rate = le.transform(pokemon.growth_rate)

pokemon['is_legendary'] = 0
pokemon['is_sub_legendary'] = 0
pokemon['is_mythical'] = 0
pokemon.loc[pokemon['status']=='Legendary', 'is_legendary'] = 1
pokemon.loc[pokemon['status']=='Sub Legendary', 'is_sub_legendary'] = 1
pokemon.loc[pokemon['status']=='Mythical', 'is_mythical'] = 1
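
A remark on the encoding above: `LabelEncoder` sorts its classes, so the integer codes follow alphabetical order rather than any meaning of the growth rates. If an ordinal meaning is wanted, an explicit mapping is one option. In the sketch below, `assumed_order` is purely my assumption (roughly fastest to slowest levelling) and should be checked before being used:

```python
import pandas as pd

labels = ["Medium Slow", "Slow", "Fluctuating", "Medium Fast", "Fast", "Erratic"]

# What an alphabetical encoding amounts to: codes follow the sorted labels
alphabetical = {label: code for code, label in enumerate(sorted(labels))}

# ASSUMPTION: a hand-written ordering, not taken from the dataset
assumed_order = ["Erratic", "Fast", "Medium Fast", "Medium Slow", "Slow", "Fluctuating"]
explicit = {label: code for code, label in enumerate(assumed_order)}

# Apply the explicit mapping to a toy column
growth = pd.Series(["Slow", "Fast", "Medium Fast", "Erratic"])
encoded = growth.map(explicit)

print(alphabetical)
print(encoded.tolist())
```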

In [None]:
pokemon = pokemon.drop(['pokedex_number','name','status','hp_cat', 'attack_cat','cat_total_points','is_special','total_points_cat','ability_1', 'type_1', 'egg_type_1', 'species'], axis = 1)

In [None]:
pokemon.columns

**<font size="2"><a href="#summary">Back to summary</a></font>**

-----

# <div id="part2_chap2">2. Random Forest</div>

In [None]:
# Shuffle the rows (fixed seed so that the results are reproducible)
pokemon = pokemon.sample(frac=1, random_state=0)

# Split data and target
X = pokemon.drop('total_points', axis = 1)
y = pokemon['total_points']

In [None]:
# Split into train and validation set
train_X, val_X, train_y, val_y = train_test_split(X, y,random_state = 0)

In [None]:
model_rf = RandomForestRegressor(oob_score = True,
                                 random_state=0)

model_rf.fit(train_X, train_y)
preds = model_rf.predict(val_X)

print(mean_absolute_error(val_y, preds))

In [None]:
print('R^2 Training Score: {:.2f} \nOOB Score: {:.2f} \nR^2 Validation Score: {:.2f}'.format(model_rf.score(train_X, train_y), 
                                                                                             model_rf.oob_score_,
                                                                                             model_rf.score(val_X, val_y)))
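
For readers who want to see what the two metrics measure: the mean absolute error is the average of |y - prediction|, and R^2 (what `score` returns for a regressor) is 1 minus the ratio between the squared errors of the model and those of a constant prediction at the mean. A worked example on invented numbers, computed by hand with NumPy:

```python
import numpy as np

# Invented targets and predictions
y_true = np.array([300.0, 400.0, 500.0, 600.0])
y_pred = np.array([310.0, 390.0, 520.0, 580.0])

# Mean absolute error: average size of the errors
mae = np.mean(np.abs(y_true - y_pred))

# R^2: 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae, r2)
```

Here the errors are 10, 10, 20 and 20, so the MAE is 15, and R^2 is 1 - 1000/50000 = 0.98.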

**<font size="2"><a href="#summary">Back to summary</a></font>**

------

# <div id="part2_chap3">3. Feature Importance</div>


In [None]:
features = train_X.columns
importances = model_rf.feature_importances_
indices = np.argsort(importances)

fig = go.Figure(go.Bar(
            x=importances[indices],
            orientation='h'))

fig.update_layout(
    yaxis = dict(
        tickmode = 'array',
        tickvals = list(range(len(indices))),
        ticktext = [features[i] for i in indices]
    )
)

fig.update_layout(title="Feature importance")
fig.show()

**<font size="2"><a href="#summary">Back to summary</a></font>**

# References

* https://www.kaggle.com/mariotormo/complete-pokemon-dataset-updated-090420#pokedex_(Update.04.20).csv : Complete Pokemon Dataset (Updated 09.04.20)
* https://www.pokebip.com/ : Main French Pokemon news website

<hr>
<br>
<div align='justify'><font color="#353B47" size="4">Thank you for taking the time to read this notebook. I hope that it answered your questions or satisfied your curiosity, and that it was easy to follow. <u>Any constructive comments are welcome</u>. They help me progress and motivate me to share better quality content. I am above all a passionate person who tries to advance my own knowledge and that of others. If you liked it, feel free to <u>upvote and share my work.</u> </font></div>
<br>
<div align='center'><font color="#353B47" size="3">Thank you and may passion guide you.</font></div>