# Matching

## Is it possible to match every Pokémon in a logical way?


Let's import the Python libraries we need.

In [None]:
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns

Now that we're set up, let's load the Pokémon CSV from the Kaggle input directory into a pandas DataFrame and look at the data.

In [None]:
df = pd.read_csv('../input/pokemon.csv')
df.head(n=10)  # show the first 10 rows of the table

## Columns

In [None]:
df.info()

There are 801 entries (rows) in this dataset, one per Pokémon.

And there are 41 columns:

* **pokedex_number:** the entry number of the Pokémon in the National Pokédex
* **name:** English name of the Pokémon
* **japanese_name:** original Japanese name of the Pokémon
* **generation:** the numbered generation in which the Pokémon was first introduced
* **type1:** primary type of the Pokémon
* **type2:** secondary type of the Pokémon, null about half of the time because some Pokémon have only one type
* **against_*:** 18 columns, one per type, giving the damage multiplier taken from attacks of that type
* **hp:** base HP (hit points)
* **attack:** base attack
* **defense:** base defense, used against normal attacks in battles
* **sp_attack:** base special attack
* **sp_defense:** base special defense, used against special attacks in battles
* **speed:** base speed, used to determine which Pokémon attacks first in battles
* **base_total:** sum of hp, attack, defense, sp_attack, sp_defense and speed
* **abilities:** an array of all the possible passive abilities a Pokémon can have when obtained
* **classfication:** the Pokédex classification of each Pokémon (the column name is misspelled in the dataset)
* **height_m:** height in meters; nulls are simply missing values
* **weight_kg:** weight in kilograms; nulls are simply missing values
* **percentage_male:** the percentage chance that a Pokémon is male; null if the Pokémon is genderless
* **capture_rate:** the catch rate, which drives the chance of capturing the Pokémon with a Poké Ball
* **base_egg_steps:** the base number of steps required to hatch an egg of this Pokémon
* **base_happiness:** base happiness when you obtain the Pokémon
* **experience_growth:** experience growth; the higher the value, the longer it takes to reach the maximum level
* **is_legendary:** boolean, true if the Pokémon is legendary
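To back up the notes on null values above, a per-column count of missing entries is a quick check. The sketch below runs on a tiny made-up frame (`toy`, with illustrative values only) instead of the real CSV. On the notebook's `df`, the equivalent call would be `df.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type2": ["poison", None, "flying", None],
    "height_m": [0.7, 1.0, np.nan, 0.5],
    "percentage_male": [88.1, 50.0, np.nan, np.nan],
})

# Count missing values per column and keep only the columns that have any.
missing = toy.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

Columns without any null (here `name`) drop out of the summary, which keeps the output short on a 41-column table.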


## Type

Let's check the distribution of Pokémon by primary type.

In [None]:
df.type1.value_counts().plot(kind='pie', title="Pokemon per type", autopct="%1.1f%%", figsize=(12,12), legend=False)

### Weight, height and sex per Type

In [None]:
columns = [
    "type1",
    "weight_kg",
    "height_m",
    "percentage_male"
]

# Work on an explicit copy so that adding a column never touches df
attr_df = df.loc[:, columns].copy()
# Convert height from meters to centimeters
attr_df["height_cm"] = attr_df["height_m"] * 100
attr_df = attr_df.drop(["height_m"], axis=1)
# Mean weight, male percentage and height per primary type
mean_grouped_type = attr_df.groupby(["type1"]).mean()
mean_grouped_type.plot(kind="bar", figsize=(12, 12))

Type doesn't seem to influence the male percentage of a Pokémon much. However, it does seem to influence weight.

## Stats

In [None]:
df_stats = df.loc[:, ["hp", "attack", "defense", "sp_attack", "sp_defense", "speed", "base_total"]]
df_stats.describe()

In [None]:
df_stats.plot(subplots=True, figsize=(12, 14))

In [None]:
df.loc[:,["base_total", "is_legendary"]].plot(subplots=True, figsize=(12, 6))

A high stat total coincides most of the time with the Pokémon being legendary.

Let's check this.

In [None]:
p_df = df.loc[:, ["name", "type1", "type2", "is_legendary", "pokedex_number", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed", "base_total", "height_m", "weight_kg"]]
p_df.nlargest(15, "base_total")

We can indeed see that the Pokémon with the highest base stats are almost all legendary.
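The observation can also be put into a number instead of being read off the table. The helper below, `legendary_share_in_top`, is a hypothetical name and not part of the original notebook, and the frame `toy` is made up. The sketch only assumes the `base_total` and `is_legendary` columns described earlier.

```python
import pandas as pd

def legendary_share_in_top(frame, n, stat="base_total", flag="is_legendary"):
    """Fraction of legendary rows among the n rows with the largest `stat`."""
    top = frame.nlargest(n, stat)
    return top[flag].astype(bool).mean()

# Made-up values, only to show the call; not taken from the dataset.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "base_total": [680, 600, 540, 405, 318],
    "is_legendary": [1, 1, 0, 0, 0],
})

share = legendary_share_in_top(toy, 3)
print(f"{share:.0%} of the top 3 are legendary")
```

On the real data the call would presumably be `legendary_share_in_top(df, 15)`, matching the `nlargest(15, "base_total")` used in the cell above.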


## Stats distribution by Type 1

In [None]:
plt.subplots(figsize = (15,5))
plt.title("Attack by Type1")
sns.boxplot(x="type1", y="attack", data=df)
plt.ylim(0,200)
plt.show()

In [None]:
plt.subplots(figsize = (15,5))
plt.title("Defense by Type1")
sns.boxplot(x="type1", y="defense", data=df)
plt.ylim(0,200)
plt.show()

In [None]:
plt.subplots(figsize = (15,5))
plt.title("Special attack by Type1")
sns.boxplot(x="type1", y="sp_attack", data=df)
plt.ylim(0,200)
plt.show()

Type seems to strongly influence the base stats of a Pokémon.

For example:
* Steel types are usually the most defensive
* Dragon, Ground and Fighting types usually have the highest attack
* Psychic, Dragon and Electric types tend to be oriented toward special attack
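The boxplots can be complemented by a small table of per-type medians, which makes it easier to name the top type for each stat. This is a minimal sketch on made-up numbers (the frame `toy` is not from the dataset). On the notebook's `df` the same `groupby` would use the real `type1`, `attack`, `defense` and `sp_attack` columns.

```python
import pandas as pd

# Made-up stats for three types, only to illustrate the aggregation.
toy = pd.DataFrame({
    "type1": ["steel", "steel", "dragon", "dragon", "bug", "bug"],
    "attack": [80, 90, 120, 110, 50, 60],
    "defense": [140, 120, 90, 80, 50, 40],
})

# Median of each stat per primary type (the center line of each box in a boxplot).
medians = toy.groupby("type1")[["attack", "defense"]].median()
# For each stat, the type with the highest median.
best_type = medians.idxmax()
print(medians)
print(best_type)
```

The median is used here because it is the statistic the boxplot displays, and it is less sensitive to a few extreme Pokémon than the mean.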

### Attack / Defense scatter plot

In [None]:
p_types = [
  ["fire", "#EE8130"],
  ["water", "#6390F0"]
]

ax = None
for p_type in p_types:
    sub_df = df.loc[df["type1"] == p_type[0]]
    ax = sub_df.plot(x="defense", y="attack", figsize=(12,12), s=70, kind="scatter", ax=ax, color=p_type[1], alpha=0.5)

This graph shows that Fire Pokémon tend to be more offensive and less defensive than Water Pokémon.

## Correlation between the attributes

In [None]:
plt.figure(figsize=(10,6))
sns.heatmap(p_df.select_dtypes(include="number").corr(), annot=True)  # correlation matrix of the numeric columns only (name and types are text), shown as a heatmap
plt.show()

## Matching

Let's build our graph. First, let's define a function that computes a matching score for two Pokémon, using the observations we made. For now it is a placeholder that gives every pair the same score.

In [None]:
# column used to identify a Pokémon ("name" or "pokedex_number")
ident = "name"

def calc_score(pok1, pok2):
    # placeholder: every pair currently gets the maximum score
    return 1.0
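One way such a score could use the earlier observations (type influences stats, legendary Pokémon have high totals) is sketched below. Everything in it is an assumption made for illustration: the name `calc_score_sketch`, the choice of `type1` and `base_total` as inputs, and the 50/50 weighting are not the notebook's method. Note also that the notebook calls `calc_score` with identifier values, whereas this sketch expects full rows (dicts or pandas rows).

```python
def calc_score_sketch(pok1, pok2, type_weight=0.5):
    """Hypothetical score in [0, 1]; inputs and weights are arbitrary assumptions."""
    # 1.0 if both Pokémon share a primary type, else 0.0.
    same_type = 1.0 if pok1["type1"] == pok2["type1"] else 0.0
    # Relative closeness of base_total: 1.0 when equal, lower as they diverge.
    a, b = pok1["base_total"], pok2["base_total"]
    closeness = 1.0 - abs(a - b) / max(a, b)
    return type_weight * same_type + (1.0 - type_weight) * closeness

# Made-up rows, only to show the call.
p1 = {"name": "A", "type1": "fire", "base_total": 400}
p2 = {"name": "B", "type1": "fire", "base_total": 500}
p3 = {"name": "C", "type1": "water", "base_total": 400}

print(calc_score_sketch(p1, p2))
print(calc_score_sketch(p1, p3))
```

The score is symmetric in its two arguments, which suits an undirected graph, and it stays between 0 and 1 so a threshold such as 0.60 remains meaningful.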

Graph building:

In [None]:
match_percent = 0.15
sample_size = 151
graph = []

# fill missing weight and height with 1
df["weight_kg"] = df["weight_kg"].fillna(1)
df["height_m"] = df["height_m"].fillna(1)

sample = df.head(sample_size)

for index, row in sample.iterrows():
    matches = []
    weight_diff = row.weight_kg * match_percent  # currently unused
    height_diff = row.height_m * match_percent
    # keep only Pokémon whose height is within 'match_percent' of this one
    in_range = (row.height_m - height_diff < sample.height_m) & (sample.height_m < row.height_m + height_diff)
    sub_df = sample.loc[in_range]
    for sub_index, sub_row in sub_df.iterrows():
        score = calc_score(row[ident], sub_row[ident])
        if score > 0.60 and row[ident] != sub_row[ident]:
            matches.append((sub_row[ident], score))
    graph.append((row[ident], matches))

print(graph[0])
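One detail of the height filter is worth spelling out: the window is relative to the reference row, so the test is not symmetric. A shorter Pokémon can fall inside a taller one's window while the taller one falls outside the shorter one's. The helper below (`in_height_window`, a hypothetical name) restates the same comparison in plain Python with made-up heights.

```python
def in_height_window(ref_height, other_height, match_percent=0.15):
    """True if other_height lies strictly within +/- match_percent of ref_height."""
    diff = ref_height * match_percent
    return ref_height - diff < other_height < ref_height + diff

# Made-up heights in meters.
a, b = 1.0, 0.86
a_accepts_b = in_height_window(a, b)  # window of a is (0.85, 1.15)
b_accepts_a = in_height_window(b, a)  # window of b is (0.731, 0.989)
print(a_accepts_b, b_accepts_a)
```

Because the matches are later added to an undirected `nx.Graph`, an edge appears as soon as either direction passes the filter, so the resulting graph is slightly more permissive than a single window suggests.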

In [None]:
G = nx.Graph()

for pok in graph:
    G.add_node(pok[0]) # ident
    for match in pok[1]:
        G.add_edge(pok[0], match[0])

plt.figure(figsize=(15, 15))
nx.draw(G, with_labels=True, pos=nx.circular_layout(G), node_size=50, font_size=10, width=0.5, alpha=0.3)
plt.show()