<h2>Pokemon Dream Team</h2>
Everyone loves Pokemon (at least millennials do). <br/>
Growing up, I used to be a huge nerd and played Red, Green, Yellow, Gold and Silver, Crystal, and Emerald. <br/>
I take my Pokemon seriously (Gold and Silver was the best in my humble opinion). <br/> 
**Question: What would be your dream Pokemon team given six Poke Balls?** <br/>
<img src="https://i.ibb.co/k3XP38J/non-tech-woman-in-tech.png" alt="non-tech-woman-in-tech" border="0" width="80%" hspace="10"><br/>
This would be my dream team now, but I wonder if analyzing the Pokemon directory list would make me change my mind. <br/>
<br/>
**Three Sub-Questions:** <br/>
* Is the pokemon with the biggest Total Points the 'best' pokemon?
* Are legendary pokemons better than non-legendary pokemons? 
* Are there any correlations between Attack-Defense, Attack-Sp.Atk, Attack-Speed, etc.? <br/>


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

pokemon = pd.read_csv("../input/pokemon/Pokemon.csv")
pokemon.head(10) 

In [None]:
pokemon['Generation'].describe()

According to this, the data set covers Generations 1 through 6. I am only interested in the First Generation of pokemons, so we will filter out the rest from now on. 
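`describe()` treats `Generation` as a plain number, so it reports the minimum and maximum but not how many rows each generation holds. A minimal sketch of a more direct check is below. It uses a small made-up frame (`toy`) rather than the real CSV, so the counts are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the real data set: only the columns we need here.
toy = pd.DataFrame({
    "Name": ["Bulbasaur", "Pikachu", "Chikorita", "Treecko", "Mew"],
    "Generation": [1, 1, 2, 3, 1],
})

# Rows per generation, ordered by generation number.
gen_counts = toy["Generation"].value_counts().sort_index()
print(gen_counts)

# Keep only the first generation; .copy() gives an independent frame.
toy_first_gen = toy.loc[toy["Generation"] == 1].copy()
print(len(toy_first_gen))
```

Running the same `value_counts()` call on the real `pokemon` frame would show the population size of each generation at a glance.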

In [None]:
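# .copy() makes first_gen an independent frame, so adding a column to it later does not trigger SettingWithCopyWarning
first_gen = pokemon.loc[pokemon['Generation']==1].copy()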
first_gen.describe() 

There are 166 First Generation rows in this data set (Mega forms are listed as separate rows), so our population size is 166 from now on.


<h3>Q1: Is the pokemon with the biggest Total Points the 'best' pokemon? </h3>
Column 'Total' is the sum of HP, Attack, Defense, Sp.Atk, Sp.Def, and Speed. <br/>
It would make sense that the pokemon with the highest Total would be the 'best', but how would it compare to the runners-up? <br/>
It's possible that Total is misleading: a pokemon with incredible Speed but terrible Attack could have a higher Total than a well-rounded, beastly pokemon. <br/>
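To make this concern concrete, here is a small sketch with two invented stat lines (the names `Sprinter` and `Allrounder` and all numbers are made up for illustration). Both have the same Total, yet one of them gets there almost entirely through Speed.

```python
import pandas as pd

stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Two hypothetical pokemon with identical Totals but very different profiles.
toy = pd.DataFrame(
    [[40, 30, 40, 60, 50, 180],
     [70, 65, 65, 70, 65, 65]],
    index=["Sprinter", "Allrounder"],
    columns=stat_cols,
)

# Total is simply the row-wise sum of the six stats.
toy["Total"] = toy[stat_cols].sum(axis=1)

# The weakest stat and the spread show what Total alone hides.
toy["Weakest"] = toy[stat_cols].min(axis=1)
toy["Spread"] = toy[stat_cols].max(axis=1) - toy[stat_cols].min(axis=1)
print(toy[["Total", "Weakest", "Spread"]])
```

This is why the comparison below looks at each stat separately instead of ranking by Total alone.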
<h5>Methodology: </h5>
&emsp;(1) Find the pokemon with the biggest Total Points <br/>
&emsp;(2) Find pokemons with the highest HP, Attack, Defense, Sp.Atk, Sp.Def, and Speed<br/>
&emsp;(3) Find the runners-up to the highest Total<br/>
&emsp;(4) Compare the pokemons based on all their specifications<br/>
&emsp;(5) Make a conclusion about whether the pokemon with the biggest Total Points is the 'best' pokemon. 

In [None]:
# Find the pokemon with the biggest Total Points
# (print it explicitly: a notebook cell only displays its last expression)
print(first_gen.loc[first_gen.Total == first_gen.Total.max()])

# Both Mewtwo X and Y have 780, so we will keep both rows.
# We will take the Top Five by Total and include them in the list for comparison.
result = first_gen.sort_values(by=['Total'], ascending=False).head(5)

In [None]:
# Find pokemons with the highest HP, Attack, Defense, Sp. Atk, Sp. Def, and Speed
# (DataFrame.append was removed in pandas 2.0, so we collect the rows and use pd.concat)
stat_cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
best_per_stat = [first_gen.loc[first_gen[col] == first_gen[col].max()] for col in stat_cols]
result = pd.concat([result] + best_per_stat)
result.reset_index(drop=True)

In [None]:
# Compare the pokemons based on all their specifications
# We will delete any duplicates
result = result.drop_duplicates(subset="Name") 
result

In [None]:
fig, axes = plt.subplots(figsize=(15,9), ncols=3, nrows=2)
sns.scatterplot(x="HP", y="Total", data=result, hue="Legendary", ax=axes[0][0])
sns.scatterplot(x="Attack", y="Total", data=result, hue="Legendary", ax=axes[0][1])
sns.scatterplot(x="Defense", y="Total", data=result, hue="Legendary", ax=axes[0][2])
sns.scatterplot(x="Sp. Atk", y="Total", data=result, hue="Legendary", ax=axes[1][0])
sns.scatterplot(x="Sp. Def", y="Total", data=result, hue="Legendary", ax=axes[1][1])
sns.scatterplot(x="Speed", y="Total", data=result, hue="Legendary", ax=axes[1][2])


plt.subplots_adjust(
    wspace  =  0.3, 
    hspace  =  0.3
)

# The highest Total (Mewtwo group) is the only legendary pokemon in the list, so it is highlighted for comparison.

<h3> Q1 Conclusion</h3>
Are pokemons with the highest Total points the best? <br/>
The sample size is small (n=10), but we compared them to the best of the best in each of the six categories. <br/>
We can tell that the Mewtwo family (#150) generally has high points in every category except Defense, with particularly high Attack scores. <br/>
**Therefore, we can infer that Mewtwo and its two Mega Evolutions (X & Y) are worthy of being called the "Best".**
<figure>
    <img src="https://games-b26f.kxcdn.com/wp-content/uploads/2017/03/3.3-picture-770x470.jpg" width="70%" >
    <figcaption style="text-align:center">Photo from: Dylan Siegler, 'Mega Mewtwo X and Y Become Available in Pokémon Sun and Moon'</figcaption>
</figure>

<h3>Q2: Are legendary pokemons better than non-legendary pokemons? </h3>
When playing the Pokemon games, it almost seemed like the purpose of playing was to capture the 'legendary' pokemon in the end. <br/>
There was no question about it growing up, but today I wonder if legendary pokemons are really the best. <br/>

<h5>Methodology: </h5>
&emsp;(1) Identify how many legendary pokemons exist in the list. <br/>
&emsp;(2) Try to find common traits among the legendary pokemons. <br/>
&emsp;(3) Try to find the unique traits compared to non-legendary pokemons. <br/>
&emsp;(4) Identify any outliers within the data set and their significance. <br/> 
&emsp;(5) Make a conclusion about whether legendary pokemons are stronger than, or otherwise distinct from, non-legendary pokemons.

In [None]:
legend = first_gen.loc[first_gen.Legendary == True] 
# Out of 166 First Generation pokemons, 6 are Legendary.
legend

In [None]:
# Let's compare the Legendary pokemons with the average pokemons
fig, axes = plt.subplots(figsize=(15,10), ncols=3, nrows=2)
sns.boxplot(x='Legendary', y='HP', data=first_gen, palette='magma', ax=axes[0][0])
sns.boxplot(x='Legendary', y='Attack', data=first_gen, palette='magma', ax=axes[0][1])
sns.boxplot(x='Legendary', y='Defense', data=first_gen, palette='magma', ax=axes[0][2])
sns.boxplot(x='Legendary', y='Sp. Atk', data=first_gen, palette='magma', ax=axes[1][0])
sns.boxplot(x='Legendary', y='Sp. Def', data=first_gen, palette='magma', ax=axes[1][1])
sns.boxplot(x='Legendary', y='Speed', data=first_gen, palette='magma', ax=axes[1][2])

In [None]:
fig, axes = plt.subplots(figsize=(15,18), ncols=2, nrows=3)
sns.violinplot(x=first_gen["Legendary"], y=first_gen["HP"], ax=axes[0][0], palette='BrBG')
sns.violinplot(x=first_gen["Legendary"], y=first_gen["Attack"], ax=axes[0][1], palette='BrBG')
sns.violinplot(x=first_gen["Legendary"], y=first_gen["Defense"], ax=axes[1][0], palette='BrBG')
sns.violinplot(x=first_gen["Legendary"], y=first_gen["Sp. Atk"], ax=axes[1][1], palette='BrBG')
sns.violinplot(x=first_gen["Legendary"], y=first_gen["Sp. Def"], ax=axes[2][0], palette='BrBG')
sns.violinplot(x=first_gen["Legendary"], y=first_gen["Speed"], ax=axes[2][1], palette='BrBG')

In [None]:
fig, axes = plt.subplots(figsize=(15,9), ncols=3, nrows=2)
sns.swarmplot(y="HP", data=first_gen, ax=axes[0][0], dodge=True)
sns.swarmplot(y="HP", data=legend, ax=axes[0][0], color='orange', dodge=True)
sns.swarmplot(y="Attack", data=first_gen, ax=axes[0][1], dodge=True)
sns.swarmplot(y="Attack", data=legend, ax=axes[0][1], color='orange', dodge=True)
sns.swarmplot(y="Defense", data=first_gen, ax=axes[0][2], dodge=True)
sns.swarmplot(y="Defense", data=legend, ax=axes[0][2], color='orange', dodge=True)
sns.swarmplot(y="Sp. Atk", data=first_gen, ax=axes[1][0], dodge=True)
sns.swarmplot(y="Sp. Atk", data=legend, ax=axes[1][0], color='orange', dodge=True)
sns.swarmplot(y="Sp. Def", data=first_gen, ax=axes[1][1], dodge=True)
sns.swarmplot(y="Sp. Def", data=legend, ax=axes[1][1], color='orange', dodge=True)
sns.swarmplot(y="Speed", data=first_gen, ax=axes[1][2], dodge=True)
sns.swarmplot(y="Speed", data=legend, ax=axes[1][2], color='orange', dodge=True)
 
plt.subplots_adjust(
    wspace  =  0.3, 
    hspace  =  0.3
) 

<h3>Q2 Conclusion</h3>
From the boxplots, we can tell that Legendary pokemons have a higher median for all 6 skills than non-legendary pokemons (the line inside each box is the median, not the mean). Their first quartiles are also higher. <br/>
From the violin plots, we can see that the widest parts of the Legendary distributions, where most values sit, also lie higher than those of non-legendary pokemons.<br/>
In the swarm plots, individual Legendary pokemons are highlighted with orange points. Some non-legendary pokemons clearly have higher values in individual categories (e.g. Defense). <br/>
**In conclusion, we can infer that being legendary does not guarantee the highest points in any single skill, but as a group legendary pokemons sit in a higher range of skill points than non-legendary pokemons.** <br/>
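The group-level claim can also be checked numerically. The sketch below uses a tiny invented frame (the values are not taken from the data set) and compares the median of each stat for legendary and non-legendary rows, alongside the per-group maximum.

```python
import pandas as pd

# Hypothetical sample; all values are invented for illustration.
toy = pd.DataFrame({
    "Legendary": [False, False, False, False, True, True],
    "Attack":    [50, 60, 70, 130, 90, 110],
    "Defense":   [40, 55, 65, 180, 85, 90],
})

# Median per group: one row for False, one row for True.
medians = toy.groupby("Legendary")[["Attack", "Defense"]].median()
print(medians)

# Maximum per group: a non-legendary can still top a single stat.
maxima = toy.groupby("Legendary")[["Attack", "Defense"]].max()
print(maxima)
```

In this toy sample the legendary group has the higher medians while the single highest Defense belongs to a non-legendary row, which is the same pattern described above.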
<img src="https://i.ibb.co/NNGP2zt/non-tech-woman-in-tech-1.png" alt="non-tech-woman-in-tech-1" width="80%">

<h3>Q3: Are there any correlations between Attack-Defense, Attack-Sp.Atk, Attack-Speed, etc.?</h3>
In the human world, many attributes are correlated. For example, weight and height are highly correlated. <br/>
In the Pokemon world, there are six categories: HP/Attack/Defense/Sp.Atk/Sp.Def/Speed.<br/>
Are there any correlations? If so, how strong or weak are the correlations?

<h5>Methodology: </h5>
&emsp;(1) Decide on which categories to use for correlation analysis.<br/>
&emsp;(2) Plot each pair of categories and see whether any correlation is visible.<br/>
&emsp;(3) If there are correlations, calculate how strong or weak they are.<br/>
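Step (3) can be sketched with `DataFrame.corr()`, which computes pairwise Pearson coefficients by default. The frame below is made up for illustration, so the printed numbers say nothing about the real data.

```python
import pandas as pd

# Hypothetical stats for five pokemon; values are invented.
toy = pd.DataFrame({
    "Attack":  [50, 60, 70, 80, 90],
    "Defense": [45, 65, 60, 85, 95],
    "Speed":   [90, 40, 75, 55, 60],
})

# Pairwise Pearson correlation (method="pearson" is the pandas default).
corr = toy.corr(method="pearson")
print(corr.round(2))

# Read off a single pair from the symmetric matrix.
r_attack_defense = corr.loc["Attack", "Defense"]
print(round(r_attack_defense, 2))
```

The matrix is symmetric with ones on the diagonal, so only the pairs below (or above) the diagonal need to be read.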

In [None]:
from scipy import stats

def corrfunc(x, y, **kws):
    r, _ = stats.pearsonr(x, y)
    ax = plt.gca()
    ax.annotate(u"\u03C1 = {:.2f}".format(r), #unicode code for lowercase rho (ρ)
                xy=(.1, .9), xycoords=ax.transAxes)

# Keep the numeric columns from Total through Speed for the pairplot
graph_me = first_gen.loc[:, 'Total':'Speed']
g = sns.pairplot(graph_me)
g.map_lower(corrfunc)
g.map_upper(corrfunc)
plt.show()

<h3>Q3 Conclusion</h3>
From the pairplot above, we can infer that there are **strong** correlations between: <br/> 
* **Total - Attack**
* **Total - Sp. Atk**
* **Total - Sp. Def**


There are **moderate** correlations between: <br/>
* **Total - HP**
* **Total - Defense**
* **Total - Speed**
* **HP - Sp. Def**
* **Sp. Atk - Sp. Def**

There are **weak** correlations between: 
* **HP - Attack**
* **Attack - Defense**
* **Attack - Sp. Def**
* **Attack - Speed**
* **Sp. Atk - Speed**
* **Sp. Atk - Sp. Def**

Total turns out to be a good indicator of how strong or weak a pokemon is (partly by construction, since Total is the sum of the six categories). <br/>
This incidentally answers our Q1 question, which was "Are pokemons with the highest Total points the best?" Great! <br/>
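The strong/moderate/weak labels depend on where the cut-offs are drawn, and they are not stated above. The sketch below assumes |r| ≥ 0.7 for strong and |r| ≥ 0.4 for moderate. Both thresholds and the helper name `label_strength` are assumptions chosen for illustration, not values taken from the analysis.

```python
def label_strength(r, strong=0.7, moderate=0.4):
    """Bucket a correlation coefficient by its absolute value.

    The default cut-offs are assumptions chosen for illustration.
    """
    size = abs(r)
    if size >= strong:
        return "strong"
    if size >= moderate:
        return "moderate"
    return "weak"


# Example coefficients (invented, not read from the pairplot).
for r in (0.82, -0.55, 0.12):
    print(r, label_strength(r))
```

Using the absolute value means a strongly negative coefficient is labelled the same way as a strongly positive one.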

In [None]:
# list of my original dream team pokemons
dream_team = ['Pikachu', 'Eevee', 'Mew', 'Charmander', 'Squirtle', 'Gyarados']

# i will also include the 'evolved' forms of my dream team pokemons to show their potential strengths
# Mew - no further evolution
# Gyarados - no further evolution
# Eevee - Vaporeon, Jolteon, and Flareon
# Pikachu - (final evolution) Raichu
# Squirtle - (final evolution) Blastoise
# Charmander - (final evolution) Charizard 
dream_evolve = ['Vaporeon', 'Jolteon', 'Flareon', 'Raichu', 'Blastoise',
                'Charizard', 'CharizardMega Charizard X', 'CharizardMega Charizard Y', 
                'BlastoiseMega Blastoise']

# add the evolved forms to the team list
dream_team.extend(dream_evolve)

first_gen.loc[first_gen['Name'].isin(dream_team), 'Team'] = True
team_list = first_gen.loc[first_gen.Name.isin(dream_team)]
team_list.sort_values(by=['Total'], ascending=False)  

In [None]:
# we want to compare how our pokemons are in comparison to the average stats
first_gen.describe() 
# i think it would make sense that we would want pokemons with Total (1) above mean and (2) above 75%.

In [None]:
first_gen.sort_values(by=['Total'], ascending=False).head(10) 

<h3>Conclusion</h3>
Only **two of the six** pokemons on my original dream team passed the "above 75%" test: Mew & Gyarados. <br/> 
However, based on the list sorted by Total points, the evolved forms of the rest of my pokemons rank high. In other words, they are powerful once they evolve, so they have potential. The exception is Raichu (evolved Pikachu), which still didn't pass the "above 75%" test. <br/> 
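The "above 75%" test can be written down explicitly with `quantile(0.75)`. This is a minimal sketch on invented Totals (the names `A` through `H`, the `team` list, and the numbers are placeholders), not the actual data.

```python
import pandas as pd

# Hypothetical Totals for eight pokemon.
toy = pd.DataFrame({
    "Name":  ["A", "B", "C", "D", "E", "F", "G", "H"],
    "Total": [300, 320, 400, 420, 480, 500, 540, 600],
})
team = ["B", "G", "H"]

# 75th percentile of Total across the whole population.
cutoff = toy["Total"].quantile(0.75)

# Team members whose Total is strictly above the cutoff.
on_team = toy["Name"].isin(team)
passed = toy.loc[on_team & (toy["Total"] > cutoff), "Name"].tolist()
print(cutoff, passed)
```

The same two lines, applied to `first_gen` and `dream_team`, would list which team members clear the 75% line without reading it off the `describe()` table by eye.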

These are the changes I would make to my dream team: 
* Since Mewtwo is the strongest pokemon, I would swap Mew for Mewtwo. <br/>
* Since Raichu is not strong enough, I would swap Pikachu for Dratini (final evolution: Dragonite)

Taking into consideration everything I learned from this data analysis, this is my final dream team! <br/>
<img src="https://i.ibb.co/ncNctkB/non-tech-woman-in-tech-2.png" width="80%">
And with love and care, they will evolve into this even more amazing set :) <br/>
<img src="https://i.ibb.co/BBBymv0/non-tech-woman-in-tech-4.png" width="80%">
