# Which Pokemon is the best?
Using this dataset I would like to try and work out which Pokemon is the best, once and for all.
To begin, the first step is to import the data and then examine it.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

import warnings
warnings.filterwarnings(action='ignore')

In [None]:
data = pd.read_csv("../input/pokemon/Pokemon.csv")
print(data.head())

In [None]:
print(data.describe())

In [None]:
data.info()  # info() prints its own summary and returns None, so no print() is needed

For the most part, the data looks good. However, many values are missing in the Type 2 column. I'm guessing this is because not every Pokemon has a second type. As the purpose of this investigation is to find the best Pokemon, this shouldn't be an issue. If I wished to do machine learning with this data, the gaps would be a problem and I would need to tackle them. It is also worth noting that some Pokemon share a number. This is because there are different versions of the same Pokemon (e.g. mega evolutions).
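
Both observations can be checked directly. The sketch below uses a small made-up frame in place of Pokemon.csv (the names and values are placeholders, not rows from the dataset), so it runs without the Kaggle input directory.

```python
import pandas as pd

# Toy stand-in for Pokemon.csv; the rows and values are illustrative only.
toy = pd.DataFrame({
    "#": [1, 2, 3, 3],
    "Name": ["A", "B", "C", "Mega C"],
    "Type 1": ["Grass", "Fire", "Water", "Water"],
    "Type 2": ["Poison", None, None, "Dragon"],
})

# Count missing values per column; here only Type 2 has any.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows whose number appears more than once (keep=False marks every copy).
shared_numbers = toy[toy.duplicated(subset="#", keep=False)]
print(shared_numbers[["#", "Name"]])
```

The same two calls should work unchanged on `data`.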

## How to find the best Pokemon?
A quick way to find the best Pokemon would be to look at which Pokemon has both high attack and high defense. So I shall do that first.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10,10))
# Pass x and y by keyword; recent seaborn releases no longer accept them positionally
ax = sns.scatterplot(x=data["Attack"], y=data["Defense"], color='b')
plt.show()

I can see that there appear to be two Pokemon that have high defense and moderately high attack. I'd like to see which Pokemon these are.

In [None]:
data = data.sort_values("Attack", ascending=False)
print(data[data["Defense"] > 200])

Using the metrics of attack and defense, the strongest Pokemon would be Mega Aggron, with Mega Steelix a close second. Another important metric is a Pokemon's health. I shall plot this against Defense to see if there are any trends here.

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10,10))
# Pass x and y by keyword; recent seaborn releases no longer accept them positionally
ax = sns.scatterplot(x=data["Defense"], y=data["HP"], color='b')
plt.show()

I was expecting to find some Pokemon in the top right corner of this plot, with high defense and high HP. Since this didn't go how I thought it would, I wish to explore the other parameters in this data set. The easiest way to do this is to make a box and whisker plot.

In [None]:
sns.boxplot(data=data)

The Pokemon number and Total columns seem to have a much larger range than the other features, whilst Generation number and Legendary status have much smaller ranges. These columns are not individual stats and they stretch the scale of the plot, so I shall remove them. If this were a machine learning project, I would keep the Legendary column as I think this would be a key factor.

In [None]:
pkmn = data.drop(['#', 'Total', 'Generation', 'Legendary'], axis=1)
sns.boxplot(data=pkmn)

With those columns removed, the data is now much clearer to see. However, this plot does not take into account Pokemon type, so I shall make a swarm plot that expresses this. I need to prepare the data with the pandas function melt, which reshapes it from wide to long form: each stat column becomes rows of (Stat, value) pairs.

In [None]:
pkmn = pd.melt(pkmn, id_vars=["Name", "Type 1", "Type 2"], var_name="Stat")
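
To make the reshaping concrete, here is a minimal sketch of the same `pd.melt` call on a two-row toy frame. The names and values are placeholders, not rows from the dataset.

```python
import pandas as pd

# Toy wide-form frame; names and values are illustrative only.
wide = pd.DataFrame({
    "Name": ["A", "B"],
    "Type 1": ["Grass", "Fire"],
    "Type 2": ["Poison", None],
    "HP": [45, 39],
    "Attack": [49, 52],
})

# Each stat column becomes rows of (Stat, value); the id columns are repeated.
long_form = pd.melt(wide, id_vars=["Name", "Type 1", "Type 2"], var_name="Stat")
print(long_form)
```

Two Pokemon with two stats each give four rows, which is the long shape `sns.swarmplot` needs in order to put "Stat" on the x axis.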

In [None]:
sns.set_style("whitegrid")
with sns.color_palette([
    "#8ED752", "#F95643", "#53AFFE", "#C3D221", "#BBBDAF",
    "#AD5CA2", "#F8E64E", "#F0CA42", "#F9AEFE", "#A35449",
    "#FB61B4", "#CDBD72", "#7673DA", "#66EBFF", "#8B76FF",
    "#8E6856", "#C3C1D7", "#75A4F9"], n_colors=18, desat=.9):
    plt.figure(figsize=(12,10))
    plt.ylim(0, 275)
    sns.swarmplot(x="Stat", y="value", data=pkmn, hue="Type 1", dodge=True, size=7)
    plt.legend(bbox_to_anchor=(1, 1), loc=2, borderaxespad=0.)

The graph makes it much easier to see a breakdown of all the parameters. However, it does not help answer the question of which Pokemon is best. I think the best way to do this is to introduce a new column that averages all of the other stats. For now this will just be a plain average, but I think a weighted average could do a better job of finding the best Pokemon. I wish to explore whether this has any effect by introducing a second new column that gives speed half the weight of the other stats.

In [None]:
data["Average of stats"] = (data["HP"] + data["Attack"] + data["Defense"] + data["Sp. Atk"] + data["Sp. Def"] + data["Speed"])/6
# Second column: Speed counts half as much as the other stats, to see if this changes anything
data["Average of stats_2"] = (data["HP"] + data["Attack"] + data["Defense"] + data["Sp. Atk"] + data["Sp. Def"] + (data["Speed"]*0.5))/6
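
The second column above still divides by 6 even though its weights sum to 5.5. That does not affect the ranking, because every row is scaled by the same constant, but the values are not a true weighted average. A minimal sketch of a normalised version follows, assuming a hypothetical `weights` series and made-up toy stats.

```python
import pandas as pd

# Toy stats; names and values are illustrative only.
toy = pd.DataFrame({
    "Name": ["Fast", "Bulky"],
    "HP": [60, 100], "Attack": [100, 100], "Defense": [60, 100],
    "Sp. Atk": [100, 100], "Sp. Def": [60, 100], "Speed": [180, 50],
})

# Hypothetical weights: every stat counts fully except Speed, which counts half.
weights = pd.Series({"HP": 1.0, "Attack": 1.0, "Defense": 1.0,
                     "Sp. Atk": 1.0, "Sp. Def": 1.0, "Speed": 0.5})

stats = toy[weights.index]
toy["Plain average"] = stats.mean(axis=1)
# Multiply each column by its weight, sum across the row, divide by the total weight.
toy["Weighted average"] = stats.mul(weights, axis=1).sum(axis=1) / weights.sum()
print(toy[["Name", "Plain average", "Weighted average"]])
```

In this toy frame the fast Pokemon leads on the plain average and the bulky one leads once Speed is down-weighted, which is the kind of change the second column is meant to reveal.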

In [None]:
plt.figure(figsize=(10,10))
# Pass x and y by keyword; recent seaborn releases no longer accept them positionally
sns.scatterplot(x=data["#"], y=data["Average of stats"], color='b')
plt.show()

Plotting the average against Pokemon number shows no correlation. However, it can be seen that there are several Pokemon with a higher average than others. If I sort the data by the average column and print the head, I should get the top 5 Pokemon.

In [None]:
data = data.sort_values("Average of stats", ascending=False)
print(data.head())

In [None]:
# Repeat the ranking with the speed-weighted average
data = data.sort_values("Average of stats_2", ascending=False)
print(data.head())
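
Because `head()` always returns exactly five rows, it can hide or cut off ties. One way to list every Pokemon sharing the top score is `nlargest` with `keep="all"`. The sketch below uses made-up scores, not values from the dataset.

```python
import pandas as pd

# Toy scores; names and values are illustrative only.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Average of stats": [130.0, 120.0, 130.0, 130.0],
})

# keep="all" returns every row tied for the top score, not just the first one.
top = toy.nlargest(1, "Average of stats", keep="all")
print(top)

# Equivalent check: compare every row with the maximum.
is_top = toy["Average of stats"] == toy["Average of stats"].max()
print(toy.loc[is_top, "Name"].tolist())
```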

# Conclusion
With my new metric that averages all of the other stats, I find it is a three-way tie for most powerful Pokemon! They are: **Mega Mewtwo X, Mega Mewtwo Y and Mega Rayquaza.** I guess when Pokemon are in their Mega form they become more powerful. Giving speed a lower weight in the average also gave an interesting result and changes which Pokemon is the most powerful. In this case, it's a two-way tie between **Primal Groudon** and **Primal Kyogre**. This shows that weighting the average is worth investigating further.
As a final note, it seems legendary Pokemon truly are the most powerful, since the top 5 Pokemon in both situations are legendaries.