## 0\. Read Me
The goal of the data analysis in this notebook is to identify which Pokemon types are the best from a statistical point of view, both as attackers and as defenders.

Using CSV inputs containing Pokemon IDs and types, plus a type chart, we will:
1. Load the data into DataFrames with homogeneous formats
2. Perform a quick rundown of the best / worst monotype attackers to define the metrics used afterwards
3. Feed the type matrix with every possible combination of types

Then, we will analyze all dual and monotype combinations, producing rankings for:
1. Best type to attack  
2. Best type(s) to defend  
3. Best STAB to attack  
4. Best movepools of 3 / 4 types to hit the most types

Finally, since types are not evenly distributed among Pokemon, we will:
1. Study the distribution of types (most common and missing types)
2. Recompute the best attacking types, weighted by the types of every available Pokemon (up to Sword/Shield)

In [651]:
import numpy as np 
import pandas as pd 
import warnings
warnings.filterwarnings('ignore')

## 1\. Load data
We use the type-name format of the type chart (capitalized, e.g. `Fire`) everywhere.  
The per-Pokemon data was downloaded from the CSV files available at https://github.com/veekun/pokedex  
The type chart was downloaded from the following GitHub repository: https://github.com/zonination/pokemon-chart
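The rest of the notebook assumes `type_chart.csv` has attacking types as rows (index named `Attacking`) and defending types as columns. A minimal sketch of that layout, using a hypothetical three-type excerpt rather than the real file (the multipliers are the in-game values):

```python
import io

import pandas as pd

# Hypothetical excerpt of type_chart.csv: rows = attacking type, columns = defending type
csv_excerpt = """Attacking,Fire,Water,Grass
Fire,0.5,0.5,2
Water,2,0.5,0.5
Grass,0.5,2,0.5
"""

chart = pd.read_csv(io.StringIO(csv_excerpt), index_col=0)

# A matchup is read as chart.loc[attacker, defender]
print(chart.loc['Water', 'Fire'])  # Water hitting Fire: super effective
print(chart.index.name)            # the first header cell becomes the index name
```

This row/column convention is why the notebook averages along rows to rank attackers and along columns to rank defenders.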

In [652]:
type_chart = pd.read_csv('type_chart.csv', index_col=0)

In [653]:
# Background color of each type, used by the table styler
type_colors = {
    'Normal': '#A8A77A', 'Fire': '#EE8130', 'Water': '#6390F0', 'Electric': '#F7D02C',
    'Grass': '#7AC74C', 'Ice': '#96D9D6', 'Fighting': '#C22E28', 'Poison': '#A33EA1',
    'Ground': '#E2BF65', 'Flying': '#A98FF3', 'Psychic': '#F95587', 'Bug': '#A6B91A',
    'Rock': '#B6A136', 'Ghost': '#735797', 'Dragon': '#6F35FC', 'Dark': '#705746',
    'Steel': '#B7B7CE', 'Fairy': '#D685AD'}

In [655]:
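# Use one folder name consistently: 'veekun_data' vs 'veekun_Data' fails on case-sensitive file systems
poke_id = pd.read_csv('veekun_data/pokemon_types.csv')
pokemons = pd.read_csv('veekun_data/pokemon.csv', index_col=0)
type_ids = pd.read_csv('veekun_data/types.csv', index_col=0)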
print(type_ids.identifier.unique())
poke_id.head()

['normal' 'fighting' 'flying' 'poison' 'ground' 'rock' 'bug' 'ghost'
 'steel' 'fire' 'water' 'grass' 'electric' 'psychic' 'ice' 'dragon' 'dark'
 'fairy' 'unknown' 'shadow']


| | pokemon_id | type_id | slot |
|---|---|---|---|
| 0 | 1 | 12 | 1 |
| 1 | 1 | 4 | 2 |
| 2 | 2 | 12 | 1 |
| 3 | 2 | 4 | 2 |
| 4 | 3 | 12 | 1 |


### 1.1 Quality Checks

In [654]:
type_order = np.sort(type_chart.columns)
# Check that every type in type_colors also appears in the type chart
pd.Series(list(type_colors)).isin(type_order).all()

True

In [656]:
# Check for different forms
pokemons[pokemons.identifier.str.contains('meowth')]

| id | identifier | species_id | height | weight | base_experience | order | is_default |
|---|---|---|---|---|---|---|---|
| 52 | meowth | 52 | 4 | 42 | 58 | 88 | 1 |
| 10107 | meowth-alola | 52 | 4 | 42 | 58 | 89 | 0 |
| 10161 | meowth-galar | 52 | 4 | 75 | 58 | 90 | 0 |


In [657]:
# Capitalize type names to match the type chart format ('fire' -> 'Fire')
type_ids['identifier'] = type_ids.identifier.str.capitalize()
# List the type names that do not appear in the type chart
type_ids[~type_ids.identifier.isin(type_order)]

| id | identifier | generation_id | damage_class_id |
|---|---|---|---|
| 10001 | Unknown | 2 | |
| 10002 | Shadow | 3 | |


In [658]:
# Prefix the slot-2 type with ' / ' to build dual-type labels. As seen later, this is not optimal:
#   it can create twins such as 'Water / Ground' and 'Ground / Water'
poke_id['type_label'] = poke_id.slot.map({1: '', 2: ' / '}) + poke_id.type_id.map(type_ids.identifier)
poke_id['pokemon'] = poke_id.pokemon_id.map(pokemons.identifier)
# Concatenate the labels of each Pokemon in slot order
pokedex = (poke_id.sort_values(['pokemon_id', 'slot'])
           .groupby(['pokemon_id', 'pokemon'])['type_label'].agg(''.join)
           .to_frame())
pokedex.reset_index(level='pokemon', inplace=True)
pokedex.head()

| pokemon_id | pokemon | type_label |
|---|---|---|
| 1 | bulbasaur | Grass / Poison |
| 2 | ivysaur | Grass / Poison |
| 3 | venusaur | Grass / Poison |
| 4 | charmander | Fire |
| 5 | charmeleon | Fire |
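The label-building step above concatenates one string per slot. A toy sketch, with hypothetical rows that mirror the veekun `pokemon_types.csv` columns (type names already resolved), shows how the slot controls the ` / ` separator and why the result depends on slot order:

```python
import pandas as pd

# Toy version of pokemon_types.csv joined with type names (illustrative values)
poke_types = pd.DataFrame({
    'pokemon_id': [1, 1, 2, 3, 3],
    'type_name': ['Grass', 'Poison', 'Fire', 'Water', 'Ground'],
    'slot': [1, 2, 1, 1, 2],
})

# Slot 1 gets no prefix, slot 2 is prefixed with the ' / ' separator
prefix = poke_types.slot.map({1: '', 2: ' / '})
poke_types['type_label'] = prefix + poke_types.type_name

# Concatenate the labels of each Pokemon in slot order
labels = (poke_types.sort_values(['pokemon_id', 'slot'])
          .groupby('pokemon_id')['type_label'].agg(''.join))
print(labels.to_dict())
```

Pokemon 3 comes out as `Water / Ground`, while the dual-type chart built in section 2 only has the alphabetical `Ground / Water` column: this is the twin problem handled in the "Correcting double types" step.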


### 1.2 Quick Analysis on Monotypes

In [726]:
print('Best overall types against monotypes by average multiplier')
type_chart.mean(axis=1).sort_values(ascending=False).head()

Best overall types against monotypes by average multiplier


Attacking
Ground    1.166667
Rock      1.138889
Fire      1.111111
Ice       1.111111
Flying    1.083333
dtype: float64

In [660]:
print('Types with the most super effective hits against monotypes')
(type_chart > 1).sum(axis=1).sort_values(ascending=False).head()

Types with the most super effective hits against monotypes


Attacking
Ground      5
Fighting    5
Rock        4
Ice         4
Fire        4
dtype: int64

In [661]:
print('Types with the most at least neutral hits against monotypes')
(type_chart >= 1).sum(axis=1).sort_values(ascending=False).head()

Types with the most at least neutral hits against monotypes


Attacking
Dragon    16
Ghost     16
Normal    15
Ground    15
Dark      15
dtype: int64
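For reference, the three rankings above can be written as follows, with $M_{a,d}$ the multiplier of attacking type $a$ against defending type $d$, and $D$ the set of defending types (here the 18 monotypes). The second line works through Ground, which is super effective against 5 types, neutral against 10, not very effective against 2 and has no effect on Flying:

```latex
\bar{m}(a) = \frac{1}{|D|} \sum_{d \in D} M_{a,d}, \qquad
\mathrm{SE}(a) = \sum_{d \in D} \mathbf{1}\left[M_{a,d} > 1\right], \qquad
\mathrm{N}(a) = \sum_{d \in D} \mathbf{1}\left[M_{a,d} \geq 1\right]

\bar{m}(\text{Ground}) = \frac{5 \cdot 2 + 10 \cdot 1 + 2 \cdot 0.5 + 1 \cdot 0}{18} = \frac{21}{18} \approx 1.1667
```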

## 2\. Deep Analysis on all type combinations

### 2.1 Creating Dual Types and Styler

In [662]:
dual_types_chart = type_chart[type_order].copy()

for n, type_1 in enumerate(type_order):
    for type_2 in type_order[n+1:]:
        dual_types_chart[type_1 + ' / ' + type_2] = type_chart[type_1] * type_chart[type_2]
        
dual_types_chart.head()

| Attacking | Bug | Dark | Dragon | Electric | Fairy | Fighting | Fire | Flying | Ghost | Grass | ... | Poison / Psychic | Poison / Rock | Poison / Steel | Poison / Water | Psychic / Rock | Psychic / Steel | Psychic / Water | Rock / Steel | Rock / Water | Steel / Water |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | ... | 1.0 | 0.5 | 0.5 | 1.0 | 0.5 | 0.5 | 1.0 | 0.25 | 0.5 | 0.5 |
| Fire | 2.0 | 1.0 | 0.5 | 1.0 | 1.0 | 1.0 | 0.5 | 1.0 | 1.0 | 2.0 | ... | 1.0 | 0.5 | 2.0 | 0.5 | 0.5 | 2.0 | 0.5 | 1.0 | 0.25 | 1.0 |
| Water | 1.0 | 1.0 | 0.5 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 0.5 | ... | 1.0 | 2.0 | 1.0 | 0.5 | 2.0 | 1.0 | 0.5 | 2.0 | 1.0 | 0.5 |
| Electric | 1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 0.5 | ... | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 | 2.0 |
| Grass | 0.5 | 1.0 | 0.5 | 1.0 | 1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 0.5 | ... | 0.5 | 1.0 | 0.25 | 1.0 | 2.0 | 0.5 | 2.0 | 1.0 | 4.0 | 1.0 |
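Each dual-type column is the element-wise product of its two monotype columns, since the game multiplies the two defensive multipliers. A small sketch on a hypothetical four-by-four excerpt (real in-game matchups, but not the notebook's `type_chart`) reproduces the construction with `itertools.combinations` and shows the 4x and 0x cases:

```python
from itertools import combinations

import pandas as pd

# Excerpt of the type chart: rows = attacking type, columns = defending type
chart = pd.DataFrame(
    {
        'Fire':   [1.0, 2.0, 0.5, 2.0],
        'Flying': [2.0, 0.0, 1.0, 2.0],
        'Steel':  [1.0, 2.0, 2.0, 0.5],
        'Water':  [2.0, 1.0, 0.5, 1.0],
    },
    index=pd.Index(['Electric', 'Ground', 'Fire', 'Rock'], name='Attacking'),
)

combined = chart.copy()
# Columns are sorted, so every pair is labelled in alphabetical order
for type_1, type_2 in combinations(sorted(chart.columns), 2):
    combined[type_1 + ' / ' + type_2] = chart[type_1] * chart[type_2]

print(combined.loc['Electric', 'Flying / Water'])  # 2 x 2: a 4x weakness
print(combined.loc['Ground', 'Flying / Steel'])    # 0 x 2: the Flying immunity wins
```

With 4 monotypes this yields 4 + 6 = 10 columns; with the full chart, 18 + 153 = 171.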


In [663]:
dual_types = dual_types_chart.columns

In [664]:
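# Helpers that return a DataFrame Styler with a unified, type-colored style

def color_matrix(value):
    """Return a CSS background for cells holding a type name, '' otherwise."""
    if isinstance(value, str) and value in dual_types:
        if ' / ' in value:
            first, second = value.split(' / ')
            # Split the cell diagonally between the colors of the two types
            return f'background: linear-gradient(110deg, {type_colors[first]} 50%, {type_colors[second]} 50%)'
        return 'background-color: ' + type_colors[value]
    return ''


def pokemon_styler(data):
    # Display every column whose name contains 'Ratio' as a percentage
    format_maker = {col: '{:.2%}' for col in data.columns if 'Ratio' in col}
    # Note: Styler.applymap is renamed Styler.map in pandas >= 2.1
    return (data.style.applymap(color_matrix)
            .set_properties(**{'text-align': 'center'})
            .format(format_maker))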

### 2.2 Best attackers

In [665]:
# Rank the 18 attacking types against all 171 mono and dual defending types
mean_mult = dual_types_chart.mean(axis=1).sort_values(ascending=False)
se_hits = (dual_types_chart > 1).sum(axis=1).sort_values(ascending=False)
neutral_hits = (dual_types_chart >= 1).sum(axis=1).sort_values(ascending=False)

Attackers_review = pd.DataFrame(index=range(1, len(type_order) + 1))
Attackers_review['Overall best attackers'] = mean_mult.index
Attackers_review['Average multiplier'] = ['x' + str(k)[:6] for k in mean_mult]
Attackers_review['Top super effective hits'] = se_hits.index
Attackers_review['Ratio of super effective hits'] = (se_hits / len(dual_types)).to_list()
Attackers_review['Top at least neutral hits'] = neutral_hits.index
Attackers_review['Ratio of at least neutral hits'] = (neutral_hits / len(dual_types)).to_list()

In [666]:
pokemon_styler(Attackers_review.iloc[:10])

| | Overall best attackers | Average multiplier | Top super effective hits | Ratio of super effective hits | Top at least neutral hits | Ratio of at least neutral hits |
|---|---|---|---|---|---|---|
| 1 | Ground | x1.3230 | Ground | 38.01% | Ghost | 80.70% |
| 2 | Rock | x1.2675 | Rock | 31.58% | Dragon | 80.12% |
| 3 | Ice | x1.2076 | Fighting | 29.24% | Rock | 77.19% |
| 4 | Fire | x1.2076 | Fire | 29.24% | Ground | 76.02% |
| 5 | Flying | x1.1535 | Ice | 29.24% | Flying | 75.44% |
| 6 | Fairy | x1.1535 | Flying | 24.56% | Fairy | 75.44% |
| 7 | Water | x1.1535 | Fairy | 24.56% | Water | 75.44% |
| 8 | Fighting | x1.1432 | Water | 24.56% | Dark | 73.68% |
| 9 | Steel | x1.0964 | Steel | 22.81% | Psychic | 72.51% |
| 10 | Dark | x1.0453 | Ghost | 18.13% | Ice | 70.76% |


### 2.3 Best defenders

In [667]:
Defensers_review = pd.DataFrame(index=range(1, len(dual_types) + 1))
Defensers_review['Overall best defenders'] = dual_types_chart.mean().sort_values(ascending=True).index
Defensers_review['Average multiplier'] = ['x' + str(k)[:6] for k in dual_types_chart.mean().sort_values(ascending=True).to_list()]
# Tie-breaker: among types with the same number of weaknesses, each 4x weakness adds 0.01 (ranks lower)
Defensers_review['Least number of weaknesses'] = ((dual_types_chart > 1).sum() + (dual_types_chart > 2).sum() / 100).sort_values().index
Defensers_review['Number of weaknesses'] = (dual_types_chart > 1).sum().sort_values().to_list()
# Tie-breaker: among types with the same number of resistances, each immunity adds 0.1
#   and each 4x resistance adds 0.01 (ranks higher)
Defensers_review['Highest number of resistances'] = ((dual_types_chart < 1).sum() + (dual_types_chart == 0.25).sum() / 100 + (dual_types_chart == 0).sum() / 10).sort_values(ascending=False).index
Defensers_review['Number of resistances'] = (dual_types_chart < 1).sum().sort_values(ascending=False).to_list()

In [668]:
pokemon_styler(Defensers_review.iloc[:15])

| | Overall best defenders | Average multiplier | Least number of weaknesses | Number of weaknesses | Highest number of resistances | Number of resistances |
|---|---|---|---|---|---|---|
| 1 | Fairy / Steel | x0.7361 | Electric | 1 | Ghost / Steel | 12 |
| 2 | Flying / Steel | x0.75 | Dark / Poison | 1 | Normal / Steel | 12 |
| 3 | Dragon / Steel | x0.7916 | Dark / Ghost | 1 | Electric / Steel | 12 |
| 4 | Ghost / Steel | x0.7916 | Normal | 1 | Fairy / Steel | 11 |
| 5 | Steel / Water | x0.8055 | Ghost / Normal | 1 | Dark / Steel | 11 |
| 6 | Ghost / Normal | x0.8333 | Ground / Water | 1 | Steel / Water | 11 |
| 7 | Steel | x0.8333 | Bug / Steel | 1 | Steel | 11 |
| 8 | Dark / Ghost | x0.8611 | Water | 2 | Flying / Steel | 10 |
| 9 | Fighting / Steel | x0.8611 | Fairy / Normal | 2 | Ground / Steel | 10 |
| 10 | Poison / Steel | x0.875 | Electric / Water | 2 | Fire / Steel | 10 |
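The defensive rankings sort on a composite key: the integer part is the number of weaknesses (or resistances), and small fractional bonuses only break ties, because they stay below 1 for every type combination. A toy sketch of the weakness key on three hypothetical defensive profiles:

```python
import pandas as pd

# Hypothetical profiles: multiplier each defender takes from five attacking types
profiles = pd.DataFrame({
    'Defender A': [2.0, 2.0, 1.0, 0.5, 1.0],  # two 2x weaknesses
    'Defender B': [4.0, 2.0, 1.0, 1.0, 0.5],  # two weaknesses, one of them 4x
    'Defender C': [2.0, 1.0, 1.0, 1.0, 1.0],  # a single weakness
})

weaknesses = (profiles > 1).sum()
# Each 4x weakness adds 0.01: enough to break a tie, never enough to beat a whole weakness
key = weaknesses + (profiles > 2).sum() / 100

ranking = key.sort_values().index.to_list()
print(ranking)
```

Defender C ranks first with one weakness; A and B both have two, and B's 4x weakness pushes it behind A.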


### 2.4 Best STABs
In Pokemon, STAB ("same-type attack bonus") applies to a move that shares a type with the Pokemon using it.  
Here, we use the term for a combination of at most two attacking types, where the more effective of the two is used against each defending type.
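Concretely, a two-type attacker's multiplier against a defender is the larger of its two individual multipliers, which is what `np.maximum` computes in the STAB cell. A sketch on a hypothetical excerpt for the Ground / Ice combination (in-game multipliers):

```python
import numpy as np
import pandas as pd

# Excerpt: multipliers of Ground and Ice moves against a few defending types
chart = pd.DataFrame(
    {
        'Flying': [0.0, 2.0],
        'Fire':   [2.0, 0.5],
        'Water':  [1.0, 0.5],
        'Grass':  [0.5, 2.0],
    },
    index=['Ground', 'Ice'],
)

# The attacker picks whichever of its two types hits each defender harder
stab = np.maximum(chart.loc['Ground'], chart.loc['Ice'])
print(stab.to_dict())
```

Each type covers the other's gaps: Ice handles Flying and Grass, Ground handles Fire, which is why Ground / Ice tops the super effective STAB ranking.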

In [684]:
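# Offensive matrix: rows are attacking combos (1 or 2 types), columns are defending types.
# A dual-type attacker uses whichever of its two types hits each defender harder.
dual_types_offense = dual_types_chart.T.copy()

for dt in dual_types:
    if ' / ' in dt:
        atk_1, atk_2 = dt.split(' / ')
        dual_types_offense[dt] = np.maximum(dual_types_chart.loc[atk_1], dual_types_chart.loc[atk_2])

dual_types_offense = dual_types_offense.T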

In [685]:
best_stab = pd.DataFrame(index = range(1, len(dual_types)+1))

best_stab['Top super effective hits'] = (dual_types_offense>1).T.sum().sort_values(ascending = False).index
best_stab['Ratio of super effective hits'] =((dual_types_offense>1).T.sum().sort_values(ascending = False) /  len(dual_types)).to_list()
best_stab['Top at least neutral hits'] = (dual_types_offense>=1).T.sum().sort_values(ascending = False).index
best_stab['Ratio of at least neutral hits'] = ((dual_types_offense>=1).T.sum().sort_values(ascending = False) / len(dual_types)).to_list()

In [686]:
pokemon_styler(best_stab.iloc[:10])

| | Top super effective hits | Ratio of super effective hits | Top at least neutral hits | Ratio of at least neutral hits |
|---|---|---|---|---|
| 1 | Ground / Ice | 63.74% | Fighting / Ghost | 99.42% |
| 2 | Flying / Ground | 61.40% | Ground / Ice | 98.83% |
| 3 | Ground / Rock | 60.82% | Ground / Rock | 98.83% |
| 4 | Fire / Ground | 59.65% | Dark / Fairy | 98.25% |
| 5 | Fairy / Ground | 59.06% | Flying / Ground | 98.25% |
| 6 | Fighting / Rock | 53.80% | Fairy / Ground | 98.25% |
| 7 | Fighting / Ice | 53.22% | Electric / Ice | 98.25% |
| 8 | Bug / Ground | 53.22% | Fairy / Ghost | 98.25% |
| 9 | Ground / Steel | 53.22% | Dragon / Ground | 97.66% |
| 10 | Fairy / Rock | 52.63% | Fairy / Water | 97.66% |


### 2.5 Best coverage

In [728]:
from itertools import combinations

# For a movepool of 3 attacks, count how many combinations of 3 attacking types
#   hit every mono and dual type with at least neutral damage
num, den = 0, 0
for combo in combinations(type_order, 3):
    # Best multiplier the three moves achieve against each defending type
    best = dual_types_chart.loc[list(combo)].max()
    num += int((best >= 1).all())
    den += 1
print(num, 'out of', den, 'groups of 3 attacking types can hit every type combination with at least neutral damage')

97 out of 816 groups of 3 attacking types can hit every type combination with at least neutral damage


In [729]:
from itertools import combinations

# For a movepool of 4 attacks, compute the share of all (mono and dual) types
#   that each combination can hit with super effective damage
coverage_df = []
for combo in combinations(type_order, 4):
    # Best multiplier the four moves achieve against each defending type
    best = dual_types_chart.loc[list(combo)].max()
    coverage_df.append([*combo, (best > 1).sum() / len(dual_types)])

coverage_df = pd.DataFrame(coverage_df, columns=['Type_1', 'Type_2', 'Type_3', 'Type_4', 'Ratio of super effective hits'])
coverage_df.sort_values('Ratio of super effective hits', ascending=False, inplace=True)

In [730]:
pokemon_styler( coverage_df.iloc[:20])

| | Type_1 | Type_2 | Type_3 | Type_4 | Ratio of super effective hits |
|---|---|---|---|---|---|
| 2292 | Fairy | Ground | Ice | Rock | 87.72% |
| 1710 | Electric | Fairy | Fire | Ground | 86.55% |
| 2156 | Fairy | Fire | Ground | Rock | 85.96% |
| 2152 | Fairy | Fire | Ground | Ice | 85.96% |
| 2798 | Flying | Ground | Ice | Steel | 85.38% |
| 2265 | Fairy | Grass | Ground | Rock | 85.38% |
| 2237 | Fairy | Ghost | Ground | Rock | 84.80% |
| 370 | Bug | Fairy | Ground | Rock | 84.80% |
| 930 | Dark | Fairy | Ground | Rock | 84.80% |
| 2512 | Fighting | Ground | Ice | Rock | 84.21% |
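The coverage cells enumerate C(18, 3) = 816 triples and C(18, 4) = 3,060 quadruples one at a time. An alternative sketch, not the notebook's method, vectorizes the search with NumPy fancy indexing: build an array of row-index combinations, gather the matching rows, and take the maximum along the movepool axis. The helper `coverage_ratios` and the 4 x 3 matrix are hypothetical; with the real data one would pass `dual_types_chart.to_numpy()` and `list(dual_types_chart.index)`:

```python
from itertools import combinations

import numpy as np


def coverage_ratios(chart, names, k):
    """Map every k-type movepool to the share of defenders it hits super effectively.

    chart: 2D array, rows = attacking types (in the order of `names`),
    columns = defending types.
    """
    combos = np.array(list(combinations(range(len(names)), k)))  # shape (n_combos, k)
    # chart[combos] has shape (n_combos, k, n_defenders); keep the best move per defender
    best = chart[combos].max(axis=1)
    ratios = (best > 1).mean(axis=1)
    labels = [tuple(names[i] for i in row) for row in combos]
    return dict(zip(labels, ratios))


# Hypothetical excerpt: attackers in rows, defenders (Fire, Flying, Water) in columns
names = ['Electric', 'Ground', 'Fire', 'Rock']
chart = np.array([
    [1.0, 2.0, 2.0],  # Electric
    [2.0, 0.0, 1.0],  # Ground
    [0.5, 1.0, 0.5],  # Fire
    [2.0, 2.0, 1.0],  # Rock
])

ratios = coverage_ratios(chart, names, k=2)
for pool, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(pool, f'{ratio:.2%}')
```

The trade-off is memory: for 4-type movepools on the full chart the gathered array holds 3,060 x 4 x 171 float64 values, roughly 17 MB, which is still small.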


## 3\. Adding Pokemon Data
In this part, we take into account how types are distributed across the Pokedex and recompute some of the results above.

### 3.1 Correcting double types

In [675]:
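# Map every dual-type label, in either order, to the alphabetical form used in dual_types
mirror_dual_types = {}
for n, t1 in enumerate(type_order):
    for t2 in type_order[n+1:]:
        mirror_dual_types[t2 + ' / ' + t1] = t1 + ' / ' + t2
# Canonical labels (monotypes and alphabetical pairs) map to themselves
for tt in dual_types:
    mirror_dual_types[tt] = tt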
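The mirror dictionary maps both orderings of a pair to the alphabetical one. An equivalent alternative, not what the notebook uses, canonicalizes a label by sorting its parts; it relies on the same convention as the chart, namely alphabetical type names joined with ` / `. The function name `canonical_type` is hypothetical:

```python
def canonical_type(label, sep=' / '):
    """Return the label with its types in alphabetical order."""
    return sep.join(sorted(label.split(sep)))


print(canonical_type('Water / Ground'))
print(canonical_type('Fire'))
```

It could be applied with `pokedex.type_label.map(canonical_type)`. The dictionary version has one advantage: an unexpected type name raises a `KeyError` instead of passing through silently.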

In [676]:
pokedex['Final_Type'] = pokedex.type_label.apply(lambda t: mirror_dual_types[t]) 

In [731]:
# Example: list the type combinations that no Pokemon in the Pokedex has yet
pokemon_styler(pd.DataFrame([k for k in dual_types if k not in pokedex.Final_Type.unique()], columns=['Missing Types']))

| | Missing Types |
|---|---|
| 0 | Bug / Dark |
| 1 | Bug / Dragon |
| 2 | Bug / Normal |
| 3 | Electric / Fighting |
| 4 | Fairy / Fighting |
| 5 | Fairy / Fire |
| 6 | Fairy / Ground |
| 7 | Fighting / Ground |
| 8 | Fire / Grass |
| 9 | Ghost / Normal |


### 3.2 Creating weighted DataFrames for the type chart
We build the weighted DataFrames by multiplying each defending-type column of the type chart by the share of Pokemon having that type (types no Pokemon has get a weight of 0).

In [734]:
(pokedex.Final_Type.value_counts()/len(pokedex))

Water             0.065934
Normal            0.065934
Electric          0.043956
Psychic           0.041209
Grass             0.040293
                    ...   
Fighting / Ice    0.000916
Fire / Water      0.000916
Steel / Water     0.000916
Grass / Ground    0.000916
Fire / Ice        0.000916
Name: Final_Type, Length: 154, dtype: float64

In [722]:
# for average multiplier
dual_types_weighted = dual_types_chart.multiply(pokedex.Final_Type.value_counts()).fillna(0)[dual_types]/len(pokedex)
# for super effective
dual_types_weighted_SE = (dual_types_chart>1).multiply(pokedex.Final_Type.value_counts()).fillna(0)[dual_types]/len(pokedex)
# for at least neutral 
dual_types_weighted_ALN = (dual_types_chart>=1).multiply(pokedex.Final_Type.value_counts()).fillna(0)[dual_types]/len(pokedex)
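In formula form, with $n_d$ the number of Pokemon whose type is $d$ and $N$ the Pokedex size, the weighted average multiplier of attacker $a$ is $\sum_d \frac{n_d}{N} M_{a,d}$, i.e. the expected multiplier against a Pokemon drawn uniformly from the Pokedex. A toy sketch with hypothetical counts (in-game multipliers) shows how the weighting can separate two attackers that tie in theory:

```python
import pandas as pd

# Excerpt: multipliers of two attackers (rows) against three defending types (columns)
chart = pd.DataFrame(
    {'Fire': [0.5, 0.5], 'Grass': [2.0, 0.5], 'Water': [0.5, 2.0]},
    index=['Fire', 'Grass'],
)

# Hypothetical Pokedex: number of Pokemon of each defending type
counts = pd.Series({'Fire': 2, 'Grass': 1, 'Water': 5})
shares = counts / counts.sum()

unweighted = chart.mean(axis=1)
# Multiply each defending column by its share, then sum across defenders
weighted = chart.mul(shares, axis='columns').sum(axis=1)
print(unweighted.to_dict())
print(weighted.to_dict())
```

Both attackers average x1.0 unweighted, but the Water-heavy distribution lifts Grass to x1.4375 and drops Fire to x0.6875; the same effect reshuffles the rankings below.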

### 3.3 Best attackers, weighted by type distribution

In [723]:
Attackers_review_weighted = pd.DataFrame(index = range(1, len(type_order)+1))
Attackers_review_weighted['Overall best attackers'] = dual_types_weighted.T.sum().sort_values(ascending = False).index
Attackers_review_weighted['Average multiplier'] = [ 'x' +str(k)[:6] for k in dual_types_weighted.T.sum().sort_values(ascending = False).to_list()]
Attackers_review_weighted['Top super effective hits'] = dual_types_weighted_SE.T.sum().sort_values(ascending = False).index
Attackers_review_weighted['Ratio of super effective hits'] =dual_types_weighted_SE.T.sum().sort_values(ascending = False).to_list()
Attackers_review_weighted['Top at least neutral hits'] = dual_types_weighted_ALN.T.sum().sort_values(ascending = False).index
Attackers_review_weighted['Ratio of at least neutral hits'] =dual_types_weighted_ALN.T.sum().sort_values(ascending = False).to_list()

In [724]:
pokemon_styler(Attackers_review_weighted)

| | Overall best attackers | Average multiplier | Top super effective hits | Ratio of super effective hits | Top at least neutral hits | Ratio of at least neutral hits |
|---|---|---|---|---|---|---|
| 1 | Rock | x1.2431 | Ice | 28.57% | Dragon | 87.73% |
| 2 | Ice | x1.2140 | Rock | 27.29% | Rock | 83.52% |
| 3 | Flying | x1.1531 | Ground | 26.92% | Dark | 82.88% |
| 4 | Ground | x1.1341 | Fire | 24.91% | Ghost | 82.42% |
| 5 | Fire | x1.1334 | Fighting | 24.91% | Fairy | 81.68% |
| 6 | Fairy | x1.0952 | Flying | 22.71% | Flying | 81.14% |
| 7 | Fighting | x1.0721 | Electric | 21.79% | Normal | 79.76% |
| 8 | Dark | x1.0693 | Grass | 18.22% | Psychic | 77.66% |
| 9 | Water | x1.0675 | Water | 17.95% | Ground | 75.64% |
| 10 | Electric | x1.0519 | Bug | 17.67% | Ice | 73.08% |


In [719]:
# Unweighted results, for comparison
pokemon_styler(Attackers_review)

| | Overall best attackers | Average multiplier | Top super effective hits | Ratio of super effective hits | Top at least neutral hits | Ratio of at least neutral hits |
|---|---|---|---|---|---|---|
| 1 | Ground | x1.3230 | Ground | 38.01% | Ghost | 80.70% |
| 2 | Rock | x1.2675 | Rock | 31.58% | Dragon | 80.12% |
| 3 | Ice | x1.2076 | Fighting | 29.24% | Rock | 77.19% |
| 4 | Fire | x1.2076 | Fire | 29.24% | Ground | 76.02% |
| 5 | Flying | x1.1535 | Ice | 29.24% | Flying | 75.44% |
| 6 | Fairy | x1.1535 | Flying | 24.56% | Fairy | 75.44% |
| 7 | Water | x1.1535 | Fairy | 24.56% | Water | 75.44% |
| 8 | Fighting | x1.1432 | Water | 24.56% | Dark | 73.68% |
| 9 | Steel | x1.0964 | Steel | 22.81% | Psychic | 72.51% |
| 10 | Dark | x1.0453 | Ghost | 18.13% | Ice | 70.76% |


## Conclusion 
This quick analysis taught me, a veteran Pokemon player, a few fun facts, such as:
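- No four-type movepool can hit every type combination with super effective damage: the best one (Fairy / Ground / Ice / Rock) reaches 87.72%
- One Pokemon, Marshadow (Fighting / Ghost), can hit almost every other Pokemon with at least neutral damage, the only exception being Hisuian Zoruark (Normal / Ghost)
- Because types are very unevenly distributed, some types are better attackers in practice than in theory (Rock and Ice overtake Ground for average multiplier), and others are worse.  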
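The Marshadow exception follows from two immunities: Normal is immune to Ghost and Ghost is immune to Fighting, so a Normal / Ghost defender blocks both of Marshadow's STABs. A quick arithmetic check with the relevant in-game multipliers:

```python
# Multipliers of Marshadow's two STAB types against each half of a Normal / Ghost defender
fighting = {'Normal': 2.0, 'Ghost': 0.0}
ghost = {'Normal': 0.0, 'Ghost': 2.0}

# Dual-type defence multiplies the two halves; the attacker then uses its better STAB
fighting_vs_dual = fighting['Normal'] * fighting['Ghost']
ghost_vs_dual = ghost['Normal'] * ghost['Ghost']
best = max(fighting_vs_dual, ghost_vs_dual)
print(best)
```

Against the other 170 type combinations, at least one of the two STABs is neutral or better, which gives the 99.42% in the STAB table.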

As for the data scientist in me, it let me experiment with pandas styling features I had not used before.