Hey guys, here's a small visualization of some pokemon ideas. 
Some of the concepts I explore are: 
- Pseudo-Legendary pokemon
- The difference made by a primary or a secondary type
- Types with the best offensive and defensive stats

I'll probably post an update soon with some other ideas I've been thinking about.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from subprocess import check_output
from IPython.display import display

from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


pkmn = pd.read_csv("../input/Pokemon.csv")
dex = 721

Pokemon has some interesting details. For example, Golem is a Rock/Ground pokemon, while Rhydon is a Ground/Rock pokemon. These two have exactly the same strengths (their Rock and Ground moves are super effective against fire, ice, flying, bug, electric, poison, rock and steel) and weaknesses (water, grass, ice, fighting, ground and steel).
What's the advantage of having one versus the other? Is it better to be water/fighting than fighting/water?
Let's find out using the totals.
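Why the order can't matter for matchups: against a dual-type pokemon, an attack's multiplier is the product of its multipliers against each of the two types, and multiplication doesn't care about order. A minimal sketch, using a small hand-copied excerpt of the type chart (only six attacking types against Rock and Ground; the names `CHART` and `matchup` are mine, not from the dataset):

```python
# Excerpt of the type chart: CHART[attacking type][defending type] = damage multiplier.
# Only a few attacking types are included, against Rock and Ground.
CHART = {
    "Water":    {"Rock": 2.0, "Ground": 2.0},
    "Grass":    {"Rock": 2.0, "Ground": 2.0},
    "Ice":      {"Rock": 1.0, "Ground": 2.0},
    "Fighting": {"Rock": 2.0, "Ground": 1.0},
    "Electric": {"Rock": 1.0, "Ground": 0.0},
    "Fire":     {"Rock": 0.5, "Ground": 1.0},
}


def matchup(attack_type, type1, type2):
    """Multiplier an attack of attack_type deals to a type1/type2 pokemon."""
    return CHART[attack_type][type1] * CHART[attack_type][type2]


for attack in CHART:
    golem = matchup(attack, "Rock", "Ground")   # Rock/Ground, like Golem
    rhydon = matchup(attack, "Ground", "Rock")  # Ground/Rock, like Rhydon
    assert golem == rhydon  # same product either way
    print(f"{attack:>8}: x{golem}")
```

So any difference between a Rock/Ground and a Ground/Rock pokemon has to come from their stats, which is what the totals measure.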

In [None]:
# What difference does it make whether a type is the primary (Type 1) or secondary (Type 2) type?
# For example: how do pokemon with Type 1 = Fighting compare with pokemon with Type 2 = Fighting?

val = pkmn['Type 1'].unique()

# Running sums of Total, and counts, per type for each type slot
avg1 = [0]*len(val)
avg2 = [0]*len(val)
cnt1 = [0]*len(val)
cnt2 = [0]*len(val)
for ind, mytype in enumerate(val):
    # Start at row 0 so the first pokemon in the file is counted too
    for k in range(len(pkmn)):
        if pkmn['Type 1'][k] == mytype:
            cnt1[ind] += 1
            avg1[ind] += pkmn['Total'][k]
        elif pkmn['Type 2'][k] == mytype:
            cnt2[ind] += 1
            avg2[ind] += pkmn['Total'][k]

# Turn the sums into averages and take the difference (Type 1 minus Type 2)
diff = [0]*len(val)
for i in range(len(val)):
    avg1[i] /= cnt1[i]
    avg2[i] /= cnt2[i]
    diff[i] = avg1[i] - avg2[i]

type1v2 = pd.DataFrame({'Type': val,
                        'First-type count': cnt1,
                        'Average Total Type1': avg1,
                        'Second-type count': cnt2,
                        'Average Total Type2': avg2,
                        'Difference': diff})

display(type1v2)

Looking at the Difference column, we can see that Fighting and Ice have the biggest gap between being the first and the second type: pokemon with these types average noticeably higher totals when it is their second type. Flying is the type that does best as the primary type of a pokemon!
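For comparison, the same table can be built without explicit loops using pandas `groupby`. A minimal sketch on a few made-up rows (the function name `primary_vs_secondary` and the toy data are mine, not part of the dataset); as in the table above, a negative Difference means the type's pokemon average a higher total when the type is secondary:

```python
import pandas as pd


def primary_vs_secondary(df):
    """Count and mean Total per type, split by whether the type is Type 1 or Type 2."""
    first = df.groupby("Type 1")["Total"].agg(["count", "mean"])
    # Rows with no Type 2 (NaN/None) are dropped by groupby
    second = df.groupby("Type 2")["Total"].agg(["count", "mean"])
    out = first.join(second, how="outer", lsuffix=" Type1", rsuffix=" Type2")
    out["Difference"] = out["mean Type1"] - out["mean Type2"]
    return out


# Toy rows with made-up names and totals, only to show the mechanics
toy = pd.DataFrame({
    "Name": ["MonA", "MonB", "MonC", "MonD"],
    "Type 1": ["Fighting", "Water", "Fighting", "Ice"],
    "Type 2": ["Water", "Fighting", None, "Fighting"],
    "Total": [500, 520, 400, 480],
})
print(primary_vs_secondary(toy))
```

On the real data, `primary_vs_secondary(pkmn)` should closely match the loop-based table. One design difference: a type that never appears in one slot gets NaN here instead of a division-by-zero error.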

Alright, now something along similar lines: which types have the best offensive and defensive stats?
Here we treat HP, Defense and Sp. Def as defensive stats, and Attack, Sp. Atk and Speed as offensive stats.
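The loop below adds up these three stats for every pokemon that has a given type in either slot. A vectorized sketch of the same idea, assuming the column names of this csv (the helper `type_off_def` and the toy rows are mine): reshaping with `melt` gives one row per (pokemon, type slot), so a dual-type pokemon counts once for each of its types.

```python
import pandas as pd

OFFENSIVE = ["Attack", "Sp. Atk", "Speed"]
DEFENSIVE = ["HP", "Defense", "Sp. Def"]


def type_off_def(df):
    """Average offensive and defensive stat sums per type, counting both type slots."""
    scored = df.assign(Offensive=df[OFFENSIVE].sum(axis=1),
                       Defensive=df[DEFENSIVE].sum(axis=1))
    # One row per (pokemon, type slot)
    long = scored.melt(id_vars=["Offensive", "Defensive"],
                       value_vars=["Type 1", "Type 2"], value_name="Type")
    long = long.dropna(subset=["Type"])  # single-type pokemon have no Type 2
    return long.groupby("Type")[["Offensive", "Defensive"]].mean()


# Made-up rows, not real pokemon stats
toy = pd.DataFrame({
    "Type 1": ["Steel", "Dragon", "Rock"],
    "Type 2": ["Rock", None, "Dragon"],
    "HP": [50, 90, 70],
    "Attack": [60, 110, 100],
    "Defense": [120, 80, 100],
    "Sp. Atk": [40, 100, 70],
    "Sp. Def": [80, 80, 70],
    "Speed": [30, 90, 50],
})
print(type_off_def(toy))
```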

In [None]:
# Sum the offensive and defensive stats of every pokemon that has mytype in either slot
off_stats = [0]*len(val)
def_stats = [0]*len(val)
type_cnt = [0]*len(val)
for ind, mytype in enumerate(val):
    # Start at row 0 so the first pokemon in the file is counted too
    for k in range(len(pkmn)):
        if pkmn['Type 1'][k] == mytype or pkmn['Type 2'][k] == mytype:
            type_cnt[ind] += 1
            def_stats[ind] += pkmn['HP'][k] + pkmn['Defense'][k] + pkmn['Sp. Def'][k]
            off_stats[ind] += pkmn['Attack'][k] + pkmn['Sp. Atk'][k] + pkmn['Speed'][k]

# Average over the number of pokemon counted for each type
for x in range(len(val)):
    off_stats[x] /= type_cnt[x]
    def_stats[x] /= type_cnt[x]

offVsdef = pd.DataFrame({'Type': val,
                         'Offensive Stat Avg.': off_stats,
                         'Defensive Stat Avg.': def_stats})

display(offVsdef)
    

Looking through this list, we see the following - 
Top 3 offensive types:
1. Dragon
2. Fire
3. Dark

Top 3 defensive types:
1. Steel
2. Dragon
3. Rock

Some of these results are predictable, since rock and steel are heavily favored for their reputation as defensive types. The dragon type is widely regarded as one of the strongest, and this shows in both its offensive and defensive stats.

Let's look at something interesting about dragon pokemon that helps explain this.


Let's digress a little - what are the average stats of legendary pokemon?

In [None]:
legend_pkmn = pkmn[pkmn['Legendary'] == True]

# Average Total over all legendary pokemon
legend_avg = legend_pkmn['Total'].mean()
print(legend_avg)

Alright, so legendary pokemon have an average total of about 637.
Now we'll take a look at non-legendary pokemon whose totals reach at least 90% of this value.

In [None]:
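# Non-legendary pokemon whose Total is at least 90% of the legendary average
pseudo_legendary = pkmn[(pkmn['Legendary'] == False) & (pkmn['Total'] >= 0.9*legend_avg)]

print(pseudo_legendary.shape)

# Print every name (a range starting at 1 would skip the first match)
for name in pseudo_legendary['Name']:
    print(name)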

That's quite a list: 50 pokemon have a total of more than 573. Unfortunately, several of these are Mega Evolutions, which are battle-only transformations, so we'll discount them.
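The whole selection (legendary average, 90% cutoff, dropping Megas) can be sketched as one function. This assumes, as the loop below does, that Mega forms carry "Mega" in their Name; I believe the csv writes them like "VenusaurMega Venusaur". Matching "Mega " with a trailing space avoids catching names such as Meganium. For this dataset that makes no difference, since Meganium's total is well below the cutoff. The helper name and toy rows are mine:

```python
import pandas as pd


def pseudo_legendary_candidates(df, fraction=0.9):
    """Names of non-legendary, non-Mega pokemon with Total >= fraction * legendary mean."""
    legendary = df["Legendary"].astype(bool)
    threshold = fraction * df.loc[legendary, "Total"].mean()
    strong = df[~legendary & (df["Total"] >= threshold)]
    # Trailing space: "XMega X" matches, "Megabyte"/"Meganium"-style names do not
    not_mega = ~strong["Name"].str.contains("Mega ", regex=False)
    return strong.loc[not_mega, "Name"].tolist(), threshold


# Made-up names and totals; the Mega row imitates the "XMega X" naming pattern
toy = pd.DataFrame({
    "Name": ["Legend1", "Legend2", "Alpha", "AlphaMega Alpha", "Megabyte", "Beta"],
    "Legendary": [True, True, False, False, False, False],
    "Total": [680, 600, 600, 700, 590, 500],
})
names, threshold = pseudo_legendary_candidates(toy)
print(round(threshold, 1), names)  # 576.0 ['Alpha', 'Megabyte']
```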

In [None]:
# Same list, without Mega Evolutions
for name in pseudo_legendary['Name']:
    if "Mega" not in name:
        print(name)

That's much better: we're now down to 17. Now, discarding the pokemon that are legendary or mythical in the games but aren't flagged as legendary in the csv file (Mew, Celebi, Manaphy, Cresselia, both Keldeo forms, both Meloetta forms, and Genesect), we are left with:
- Dragonite
- Tyranitar
- Slaking
- Salamence
- Metagross
- Garchomp
- Hydreigon
- Goodra

Apart from Slaking, these are the pokemon commonly called pseudo-legendary, for the terror they bring to any battlefield. And look at this: 5 of the 8 are dragons (Dragonite, Salamence, Garchomp, Hydreigon, Goodra). This is partly why dragon pokemon are so widely feared: some of their final stages are terrifying.

Add to this the fact that many dragon-type lines end in very strong final stages, and you can see why dragon posts such high offensive and defensive averages.
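A quick check of the 5-out-of-8 count. The typings are typed in by hand here rather than read from the csv, and `FINAL_LIST` is just an illustrative name:

```python
# Types of the eight pokemon in the list above
FINAL_LIST = {
    "Dragonite": ("Dragon", "Flying"),
    "Tyranitar": ("Rock", "Dark"),
    "Slaking": ("Normal", None),
    "Salamence": ("Dragon", "Flying"),
    "Metagross": ("Steel", "Psychic"),
    "Garchomp": ("Dragon", "Ground"),
    "Hydreigon": ("Dark", "Dragon"),
    "Goodra": ("Dragon", None),
}

# Dragon can be either the primary or the secondary type (Hydreigon is Dark/Dragon)
dragons = [name for name, types in FINAL_LIST.items() if "Dragon" in types]
share = len(dragons) / len(FINAL_LIST)
print(dragons)
print(f"{len(dragons)} of {len(FINAL_LIST)} ({share:.1%})")  # 5 of 8 (62.5%)
```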

Let's compare the stat distributions between the first few generations of pokemon and the newest few. There are six generations in this csv file, so let's split them into 1-2-3 and 4-5-6.

We'll drop some of the less useful columns first...

In [None]:
del pkmn['Total'], pkmn['#'], pkmn['Legendary']

Now we'll add some color and make a couple of boxplots to demonstrate (thanks to Andrew Gele's seaborn tutorial for some fantastic explanation).

In [None]:
sns.set_style("whitegrid")
with sns.color_palette([
    "#8ED752", "#F95643", "#53AFFE", "#C3D221", "#BBBDAF",
    "#AD5CA2", "#F8E64E", "#F0CA42", "#F9AEFE", "#A35449",
    "#FB61B4", "#CDBD72", "#7673DA", "#66EBFF", "#8B76FF",
    "#8E6856", "#C3C1D7", "#75A4F9"], n_colors=18, desat=.9):
    # Split the data into generations 1-3 and generations 4-6
    pkmn_thisgen = pkmn[pkmn['Generation'].isin([1, 2, 3])]
    pkmn_secondsetgen = pkmn[pkmn['Generation'].isin([4, 5, 6])]
    for title, data in [('Gen 1-3', pkmn_thisgen), ('Gen 4-6', pkmn_secondsetgen)]:
        plt.figure(figsize=(8,8))
        plt.ylim(0, 275)
        sns.boxplot(data=data)
        plt.title(title)
        # sns.plt no longer exists in current seaborn; call matplotlib directly
        plt.show()

From this, we can see that generations 1, 2 and 3 had a noticeably larger number of outliers than 4, 5 and 6.
While HP, Defense, Sp. Def and Speed stay about the same, there is a clear increase in the median Attack and Sp. Atk stats. Interesting.
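The line inside each box is the median, so one way to put a number on the Attack and Sp. Atk jump is to compare per-stat medians between the two eras. A sketch, assuming the stat column names of this csv (`median_shift` and the toy rows are mine); on the real data you would call it on `pkmn`:

```python
import pandas as pd

STATS = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]


def median_shift(df):
    """Median of each stat in generations 4-6 minus the median in generations 1-3."""
    era = df["Generation"].map(lambda g: "Gen 1-3" if g <= 3 else "Gen 4-6")
    medians = df.groupby(era)[STATS].median()
    return medians.loc["Gen 4-6"] - medians.loc["Gen 1-3"]


# Made-up stat lines, only to show the mechanics
toy = pd.DataFrame({
    "Generation": [1, 3, 4, 6],
    "HP": [50, 60, 50, 60],
    "Attack": [50, 70, 80, 100],
    "Defense": [50, 60, 50, 60],
    "Sp. Atk": [50, 70, 80, 100],
    "Sp. Def": [50, 60, 50, 60],
    "Speed": [50, 60, 50, 60],
})
print(median_shift(toy))
```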

A little digging reveals why:
Pokemon attacks are divided into 2 categories: physical and special. Physical damage scales with the attacker's Attack stat and is reduced by the target's Defense stat. The same applies to special attacks with Sp. Atk and Sp. Def.
In generations 1, 2 and 3, all of a type's moves were either physical or special, with no exceptions.
Physical types: Normal, Fighting, Flying, Ground, Rock, Bug, Ghost, Poison, Steel
Special types: Water, Grass, Fire, Ice, Electric, Psychic, Dragon, Dark
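To see how Attack and Defense enter the damage calculation, here is a simplified sketch of the core of the standard damage formula. It leaves out all modifiers (STAB, type effectiveness, critical hits, the random roll), and its integer rounding is simplified; the function and parameter names are illustrative:

```python
def base_damage(level, power, attack, defense):
    """Core damage before modifiers.

    For a physical move, attack/defense are the attacker's Attack and the
    target's Defense; for a special move, Sp. Atk and Sp. Def.
    """
    level_factor = (2 * level) // 5 + 2
    return (level_factor * power * attack // defense) // 50 + 2


print(base_damage(50, 80, 100, 100))  # 37
print(base_damage(50, 80, 200, 100))  # 72: doubling Attack roughly doubles damage
print(base_damage(50, 80, 100, 200))  # 19: doubling Defense roughly halves it
```

Because damage depends on the ratio of the attacking stat to the defending stat, which stat a move uses matters as much as the move's type.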

In generation 4, whether a move is physical or special began to be decided by the move itself rather than by its type. This introduced, for example, physical Water-type moves such as Waterfall and special Steel-type moves such as Flash Cannon.
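The two rules can be contrasted in a few lines. The type lists come from the text above; the per-move table holds only the two example moves, and all names are illustrative. Flash Cannon did not exist before generation 4, so its "type rule" result is only a what-if:

```python
# Generations 1-3: the category is fixed by the move's type
PHYSICAL_TYPES = {"Normal", "Fighting", "Flying", "Ground", "Rock",
                  "Bug", "Ghost", "Poison", "Steel"}
SPECIAL_TYPES = {"Water", "Grass", "Fire", "Ice", "Electric",
                 "Psychic", "Dragon", "Dark"}


def category_by_type(move_type):
    """Physical/special category under the generation 1-3 rule."""
    if move_type in PHYSICAL_TYPES:
        return "physical"
    if move_type in SPECIAL_TYPES:
        return "special"
    raise ValueError(f"no generation 1-3 category for type {move_type!r}")


# Generation 4 onward: each move carries its own category
MOVES = {
    "Waterfall": {"type": "Water", "category": "physical"},
    "Flash Cannon": {"type": "Steel", "category": "special"},
}

for name, move in MOVES.items():
    old = category_by_type(move["type"])
    print(f"{name}: {move['category']} from gen 4, type rule says {old}")
```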



How did stats change from one generation to the next?
We'll plot each generation individually and compare it with the previous ones.

In [None]:
with sns.color_palette([
    "#8ED752", "#F95643", "#53AFFE", "#C3D221", "#BBBDAF",
    "#AD5CA2", "#F8E64E", "#F0CA42", "#F9AEFE", "#A35449",
    "#FB61B4", "#CDBD72", "#7673DA", "#66EBFF", "#8B76FF",
    "#8E6856", "#C3C1D7", "#75A4F9"], n_colors=18, desat=.9):
    for i in range(1, 7):
        pkmn_thisgen = pkmn[pkmn['Generation'] == i]
        # New figure per generation, so every plot gets the same size and y-range
        plt.figure(figsize=(6,6))
        plt.ylim(0, 275)
        sns.boxplot(data=pkmn_thisgen)
        plt.title('Gen %d' % i)
        plt.show()

As we can see, Generation 4 saw a large increase in the upper quartile (the top of the box) in Attack, Defense, Sp. Atk and Sp. Def.
The only outliers for the Speed stat were all in generation 3: Deoxys in its Attack, Defense, and Speed Formes, Ninjask, and Mega Sceptile.
Generation 5 had the highest upper whiskers in Attack and Sp. Atk, and attacking stats like those of the Deoxys formes, which were firmly outliers in generation 3, would no longer stand out there, because generation 5 added a number of pokemon with similar attacking stats, such as Kyurem-Black and Kyurem-White.
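For reading these plots: the box spans the 25th to 75th percentiles, and with the default `whis=1.5` in seaborn's `boxplot` the whiskers reach at most 1.5 interquartile ranges beyond the box; anything further out is drawn as an outlier point. A sketch of that rule, so outliers can be listed instead of read off the plot (`boxplot_outliers` is my own helper name):

```python
import numpy as np


def boxplot_outliers(values, whis=1.5):
    """Values a default boxplot would draw as individual outlier points."""
    q1, q3 = np.percentile(values, [25, 75])  # edges of the box
    iqr = q3 - q1
    # Whiskers stop at the last data point inside these fences
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return [v for v in values if v < low or v > high]


# Made-up speeds: one value far above the rest
print(boxplot_outliers([40, 55, 60, 65, 70, 80, 180]))  # [180]
```

On the real data, something like `boxplot_outliers(pkmn[pkmn['Generation'] == 3]['Speed'].tolist())` would give the generation 3 Speed outliers as numbers.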

Thank you for all your time! Leave a comment or a suggestion and I'll definitely take a look.
I'd planned to classify pokemon into roles suggested by their dominant stats, such as revenge killers, suicide leads and tanks, but I haven't gotten around to it just yet.