## Fork of Pokemon analysis based on their Gen & Nature

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
df = pd.read_csv("../input/Pokemon.csv")

In [3]:
df.head()

In [4]:
#Number of pokemon in each generation
df.Generation.value_counts()

In [5]:
#print(df.Generation.count())
df.describe()

<h4>From just a look at the dataframe, it's clear that:
<li> A pokemon can have a single or a dual nature (one or two types).
<li> Every pokemon belongs to one of six generations, numbered 1 to 6.
<li> There are no null values in the dataframe, apart from Type 2, which is empty for single natured pokemon.
<li> Every pokemon has 6 attributes, namely HP, Attack, Defense, Sp. Atk, Sp. Def and Speed.
<li> Total is the sum of all 6 attributes.</h4>
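These observations can be checked in code rather than read off `describe()`. The following is a minimal sketch, assuming the column names of `Pokemon.csv`; `sample` is a made-up three-row stand-in for `df`, and its values are illustrative only, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the values are illustrative only.
sample = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Type 1': ['Water', 'Fire', 'Grass'],
    'Type 2': [np.nan, 'Flying', 'Poison'],
    'Total': [60, 120, 90],
    'HP': [10, 20, 15],
    'Attack': [10, 20, 15],
    'Defense': [10, 20, 15],
    'Sp. Atk': [10, 20, 15],
    'Sp. Def': [10, 20, 15],
    'Speed': [10, 20, 15],
})

statColumns = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# Number of missing values per column; only columns with at least one null are printed.
nullCounts = sample.isnull().sum()
print(nullCounts[nullCounts > 0])

# True when Total equals the sum of the six stats for every row.
totalMatches = (sample[statColumns].sum(axis=1) == sample['Total']).all()
print(totalMatches)
```

Running the same two checks on `df` would list every column that actually contains nulls and confirm whether `Total` really is the sum of the six stats.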

<h5>Since a pokemon has either a single or a dual nature, we can add a new column that records which one it is.</h5>

In [6]:
#Classifying pokemon with Single or Dual Type Nature
def getNumberOfTypes(x):
    # A pokemon without a second type ('Type 2' is null) is single natured
    numberOfTypes = 'Dual'
    if pd.isnull(x['Type 2']):
        numberOfTypes = 'Single'

    return numberOfTypes

df['Types'] = df.apply(getNumberOfTypes,axis=1)
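
The same column can also be built without a row-wise `apply`. This is a minimal sketch of that alternative, assuming the second type lives in a column named `Type 2`; `sample` is a made-up two-row frame used in place of `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the values are illustrative only.
sample = pd.DataFrame({
    'Type 1': ['Water', 'Fire'],
    'Type 2': [np.nan, 'Flying'],
})

# 'Single' where Type 2 is missing, 'Dual' otherwise, computed for the whole column at once.
sample['Types'] = np.where(sample['Type 2'].isnull(), 'Single', 'Dual')
print(sample)
```

On a frame of this size the difference is negligible; the vectorised form mainly avoids calling a Python function once per row.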

In [7]:
df.head()

In [8]:
#splitting the pokemon generationwise
genOnePokemon = df[df.Generation==1]
genTwoPokemon = df[df.Generation==2]
genThreePokemon = df[df.Generation==3]
genFourPokemon = df[df.Generation==4]
genFivePokemon = df[df.Generation==5]
genSixPokemon = df[df.Generation==6]


In [9]:
#Splitting the generation one pokemon on the basis of Single and Dual Nature
genOneDual = genOnePokemon[genOnePokemon.Types=='Dual']
genOneSingle = genOnePokemon[genOnePokemon.Types=='Single']

<h3>Now we have classified our gen I pokemon on the basis of their nature. Let's see the composition of each group.</h3>

In [10]:
genOneSinglePer = (genOneSingle['#'].count()/genOnePokemon['#'].count())*100

In [11]:
genOneDualPer = (genOneDual['#'].count()/genOnePokemon['#'].count())*100

In [12]:
# Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'Single', 'Dual'
sizes = [genOneSinglePer,genOneDualPer]
explode = (0, 0.1)  # only "explode" the 2nd slice (i.e. 'Dual')

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()

<h5>Here we can see that there is little difference between the number of Single natured and Dual natured pokemon.</h5>
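
The Single/Dual split can also be computed in one step with `value_counts(normalize=True)`. This is a small sketch on a made-up `Types` column (`sampleTypes` is hypothetical and does not reflect the real proportions):

```python
import pandas as pd

# Hypothetical 'Types' column; the real one comes from df['Types'].
sampleTypes = pd.Series(['Single', 'Dual', 'Single', 'Single'])

# Share of each category, expressed as a percentage of all rows.
shares = sampleTypes.value_counts(normalize=True) * 100
print(shares)
```

Because both percentages come from the same call, the labels cannot get out of step with the values they describe.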

<h3>We have seen the composition on the basis of Single/Dual. Let's see the composition on the basis of the type itself.</h3>

In [13]:
# Count gen I pokemon per primary type and convert the counts to percentages
typeOfPokemonsSeriesDf = genOnePokemon['Type 1'].value_counts().rename_axis('Type 1').reset_index(name='count')
typeOfPokemonsSeriesDf['typePercentage'] = (typeOfPokemonsSeriesDf['count'] / typeOfPokemonsSeriesDf['count'].sum())*100
def classifyPokemon(row):
    # Types that make up less than 4% of gen I are grouped under 'Others'
    if row['typePercentage'] < 4.0:
        return 'Others'
    return row['Type 1']
typeOfPokemonsSeriesDf['type'] = typeOfPokemonsSeriesDf.apply(classifyPokemon,axis=1)
typeOfPokemonsSeriesDf

<h5>Here, we are grouping the pokemon types by their share of gen I. If a type makes up less than 4% of all gen I pokemon, it is placed in the 'Others' category.</h5>
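
The 4% rule can be wrapped in a small reusable helper. The sketch below is one possible way to do it; `lumpRareTypes` and `primaryTypes` are hypothetical names, and the counts are made up for illustration.

```python
import pandas as pd

def lumpRareTypes(types, threshold=4.0):
    """Replace every category whose share is below `threshold` percent with 'Others'."""
    share = types.value_counts(normalize=True) * 100
    rare = share[share < threshold].index
    # Keep the original label unless it belongs to a rare category.
    return types.where(~types.isin(rare), 'Others')

# Hypothetical primary types: Ghost makes up 2% of the rows, below the 4% threshold.
primaryTypes = pd.Series(['Water'] * 30 + ['Fire'] * 19 + ['Ghost'] * 1)
lumped = lumpRareTypes(primaryTypes)
print(lumped.value_counts())
```

Applied to `genOnePokemon['Type 1']`, a helper like this would give the pie chart labels directly, without an intermediate percentage column.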

In [14]:
newTypeOfPokemonsSeriesDf = typeOfPokemonsSeriesDf.groupby('type').sum()
labelsForTypes = newTypeOfPokemonsSeriesDf.reset_index().type.tolist()
sizes = newTypeOfPokemonsSeriesDf['typePercentage'].tolist()
                
explode = [0]*(len(sizes)-1) + [0.1]  # only "explode" the last slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labelsForTypes, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()

<h4><i>From the pie chart displayed above, it's clear that water is the most common type among gen I pokemon. So let's start our EDA with water type pokemon only.</i></h4>

<h3>Let's check the total stats of dual natured water type pokemon and see which type combination is the strongest.</h3>

In [15]:
#Let's check the total stats of dual natured, water type pokemon
waterPlusSomeTypePokemons = genOneDual[genOneDual['Type 1']=='Water'].groupby(['Type 1','Type 2']) 
waterPlusSomeTypePokemons['Total'].mean().plot(kind='barh')
plt.subplots_adjust(hspace=10)
plt.ylabel("<----- Natures of water pokemon ----->")
plt.xlabel("<----- Mean of Total ----->")

<h5>From the graph, Water and Dark type pokemon appear to have the best attributes among the dual natured water type pokemon. But before reaching a final conclusion, let's check the number of observations for each type combination and see whether that holds up.</h5>

In [16]:
waterPlusSomeTypePokemons['Total'].count().plot(kind='bar')
plt.xlabel("<----- Natures of water pokemon ----->")
plt.ylabel("<----- No of observations ----->")

<h5>From the graph, we can see that there is only one observation each of (water, dark), (water, fighting) and (water, flying). So, to conclude anything about them we need more observations.</h5>
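
The mean and the number of observations can be placed side by side so that thinly supported groups are easy to spot. This is a minimal sketch on made-up data; `sample` and the cut-off of two observations are assumptions for illustration, not values from the dataset.

```python
import pandas as pd

# Hypothetical dual natured water pokemon; the Totals are made up.
sample = pd.DataFrame({
    'Type 2': ['Ice', 'Ice', 'Dark', 'Poison', 'Poison'],
    'Total': [500, 540, 600, 300, 400],
})

# One row per second type, with the mean Total and the number of pokemon behind it.
summary = sample.groupby('Type 2')['Total'].agg(['mean', 'count'])
print(summary)

# Keep only the groups backed by at least two observations (assumed cut-off).
reliable = summary[summary['count'] >= 2]
print(reliable)
```

In this toy data the Dark group has the highest mean but rests on a single pokemon, which is the same situation the bar charts above reveal.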

<h3>Since we have dual as well as single natured water pokemon, let's check which kind is more beneficial to catch.</h3>

In [17]:
#Dual natured water type vs Single natured water type?? Which ones are better?
#Line graph for the avg value of all the attributes.
# Average only the six stat columns so that non-numeric columns are never passed to mean()
statColumns = ['HP','Attack','Defense','Sp. Atk','Sp. Def','Speed']
waterTypePokemons = genOneSingle[genOneSingle['Type 1']=='Water'][statColumns].mean()

waterDualTypePokemons = genOneDual[genOneDual['Type 1']=='Water'][statColumns].mean()

waterTypePokemons.plot(kind='line',label='Single Type')
waterDualTypePokemons.plot(kind='line',label='Dual Type')
plt.legend()

<h5>From the line graph, we can conclude two points:
<li>Except for Speed, all the attributes of dual natured water type pokemon are better than those of pure water type pokemon.
<li>Defence is the best stat of the gen I water type pokemon.</h5>

<h3>We have seen that defence is the best stat for the gen I water type pokemon. Do all the generations follow this? Let's check.</h3>

In [18]:
#Do water type pokemon in every generation have defence as their best stat? Let's check.
#Line graph for the avg value of all the attributes.
# Average only the six stat columns so that non-numeric columns are never passed to mean()
statColumns = ['HP','Attack','Defense','Sp. Atk','Sp. Def','Speed']

waterTypePokemons1 = genOnePokemon[genOnePokemon['Type 1']=='Water'][statColumns].mean()
waterTypePokemons2 = genTwoPokemon[genTwoPokemon['Type 1']=='Water'][statColumns].mean()
waterTypePokemons3 = genThreePokemon[genThreePokemon['Type 1']=='Water'][statColumns].mean()
waterTypePokemons4 = genFourPokemon[genFourPokemon['Type 1']=='Water'][statColumns].mean()
waterTypePokemons5 = genFivePokemon[genFivePokemon['Type 1']=='Water'][statColumns].mean()
waterTypePokemons6 = genSixPokemon[genSixPokemon['Type 1']=='Water'][statColumns].mean()

fig = plt.figure(figsize=(16,10))
ax1 = fig.add_subplot(321)
waterTypePokemons1.plot(kind='line',label='gen1 Pokemon')
plt.legend()

ax2 = fig.add_subplot(322, sharey=ax1)
waterTypePokemons2.plot(kind='line',label='gen2 Pokemon')
plt.legend()

ax3 = fig.add_subplot(323,sharey=ax1)
waterTypePokemons3.plot(kind='line',label='gen3 Pokemon')
plt.legend()

ax4 = fig.add_subplot(324,sharey=ax1)
waterTypePokemons4.plot(kind='line',label='gen4 Pokemon')
plt.legend()

ax5 = fig.add_subplot(325,sharey=ax1)
waterTypePokemons5.plot(kind='line',label='gen5 Pokemon')
plt.legend()

ax6 = fig.add_subplot(326,sharey=ax1)
waterTypePokemons6.plot(kind='line',label='gen6 Pokemon')
plt.legend()

<h5>From the graph, it's clear that:
<li> Only gen I water type pokemon have defence as their best stat.
<li> Gen II and Gen IV water type pokemon have HP as their best stat, whereas Gen IV and Gen VI have higher special attack.
<li> Gen III water type pokemon lean more towards attack.</h5>

In [19]:
typesOfPokemon = genOnePokemon['Type 1'].value_counts().index
typeDict= {}

def getBestAttributeForType(typesOfPokemon):
    # For each primary type, average the six stats and keep the name of the highest one
    statColumns = ['HP','Attack','Defense','Sp. Atk','Sp. Def','Speed']
    for pokemonType in typesOfPokemon:
        pokemonTypeDf = genOnePokemon[genOnePokemon['Type 1']==pokemonType][statColumns].mean()
        typeDict[pokemonType] = pokemonTypeDf.sort_values(ascending=False).index[0]
    return typeDict

typeDict = getBestAttributeForType(typesOfPokemon)
#print(typeDict)

def returnBestAttribute(pokemonType):
    return typeDict.get(pokemonType)

# Work on a copy so the new column is not assigned to a slice of df
genOnePokemon = genOnePokemon.copy()
genOnePokemon['BestAttribute'] = genOnePokemon['Type 1'].apply(returnBestAttribute)

In [20]:
genOnePokemon.head()

In [21]:
temp = genOnePokemon.groupby(['Type 1'])['BestAttribute'].value_counts()
print(temp)

<h5>From the list above, we can say:
<li>Bug, Dragon, Fire, Ghost, Grass, Poison and Psychic types have better attacking stats.
<li>Ground, Rock, Ice and Water types have better defensive stats.
<li>Electric type pokemon have better speed stats.
<li>Normal and Fairy types have better health points.</h5>

In [22]:
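# Mean Speed of each gen I type, kept separate from typeDict (which holds the best attributes)
meanSpeedByType = {}
for pokemonType in typesOfPokemon:
    meanSpeedByType[pokemonType] = genOnePokemon[genOnePokemon['Type 1']==pokemonType]['Speed'].mean()

print(meanSpeedByType)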

Earlier, we saw that Speed is the best attribute of electric type pokemon. But that does not mean that they are the fastest type. It only means that, among their own attributes, Speed has the highest average and so makes up the largest share of their Total.
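
The difference between "best stat within a type" and "fastest type overall" can be made concrete with a toy example. The sketch below uses made-up numbers (`sample` is hypothetical and says nothing about the real dataset) to show that the two questions can have different answers.

```python
import pandas as pd

# Hypothetical pokemon; the stats are made up for illustration.
sample = pd.DataFrame({
    'Type 1': ['Electric', 'Electric', 'Psychic', 'Psychic'],
    'Attack': [50, 60, 120, 130],
    'Speed': [90, 100, 100, 110],
})

# Mean of each stat per type.
meanStats = sample.groupby('Type 1')[['Attack', 'Speed']].mean()

# Best stat within each type: compares the columns of one row.
bestStatPerType = meanStats.idxmax(axis=1)
print(bestStatPerType)

# Fastest type overall: compares the Speed column across rows.
fastestType = meanStats['Speed'].idxmax()
print(fastestType)
```

Here Speed is the best stat of the made-up Electric group, yet the Psychic group still has the higher mean Speed.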

## Thank you for following!!!