![](https://wallpapers.com/images/high/pokemon-mystery-dungeon-9qdfdeoy8xdw8cb6.jpg)

<h2> Details on DataSet </h2>
<h4 style="font-family: Arial; line-height: 30px;"> This dataset contains information on 801 Pokemon from all seven generations of Pokemon. The information in this dataset includes Base Stats, Performance against Other Types, Height, Weight, Classification, Egg Steps, Experience Points, Abilities, etc. The information was scraped from <a href="http://serebii.net/">http://serebii.net/</a></h4>
    
<h2><u>Work Flow</u></h2>
 
<ul>
    <li><a href="#1" style="color:black"><h3>Descriptive Statistics</h3></a> </li>
    <li><a href="#2" style="color:black"><h3>Data Preprocessing & Feature Engineering</h3></a> </li>
    <li><a href="#3" style="color:black"><h3>Data Analysis & Visualizations</h3></a> </li>
</ul>

<h2> From the Visualizations and Processed data we will try to answer the following questions: </h2>
<h4 style="font-family: Arial; line-height: 30px;">
  <ol>
     <li><a href="#4" style="color:black">How does the speed of Pokemon relate to various base factors? </a> </li>
     <li><a href="#5" style="color:black">What is the count of pokemon per generation?</a> </li>
     <li><a href="#6" style="color:black">How many types of pokemon are there in each generation?</a> </li>
     <li><a href="#7" style="color:black">Which type has the easiest pokemon to catch?</a> </li>
     <li><a href="#8" style="color:black">What are the most widespread types of pokemon in both primary and secondary types?</a> </li>
     <li><a href="#9" style="color:black">How does the height and weight of a Pokemon correlate with its various base stats?</a> </li>
     <li><a href="#10" style="color:black">Which type is the most likely to be a legendary Pokemon?</a> </li>
     <li><a href="#11" style="color:black">Can we find the strongest pokemon?</a> </li>
     <li><a href="#12" style="color:black"><b>Pokemon Classifier</b> - Legendary or Not? </a> </li>  
   </ol>
</h4>  

<h3> Also visit my blog on Medium : <a href = "https://medium.com/@Anirudh_Singh_Chauhan/data-analysis-on-pokemon-dataset-44cdc7d15e56">Data Analysis on Pokemon Dataset</a></h3>
<hr />

<h1> Let's Begin 🙋‍♂️  </h1>

In [None]:
import os
import numpy   as np 
import pandas  as pd 
import seaborn as sns
from matplotlib import pyplot as plt 
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings("ignore")

# List the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
data = pd.read_csv("/kaggle/input/pokemon/pokemon.csv")

# Show every row and column when displaying the frame
pd.set_option('display.max_rows', data.shape[0])
pd.set_option('display.max_columns', data.shape[1])
data.head()

<h1 id="1">◾ Descriptive Statistics  📈</h1>

<h4> • Let’s first check the shape of the data followed by the last 5 rows of our dataset. </h4>

In [None]:
data.shape

<p>Now we know that our dataset has <b>801 rows and 41 columns</b>. Damn!! that’s a lot of detail about a pokemon.</p>

In [None]:
data.tail()

<h4> • Let's print some random rows as well, so that we are not only looking at the start and end of the file and get a better understanding of our data.</h4>

In [None]:
data.sample(5).T

<h4 style="line-height: 30px;" > • From the random sample and from the last 5 rows of the dataset, we can see that there are some NaN values in the columns "percentage_male" and "type2". 
We can also see that the dataset holds float, integer, and string values.<br/> Let's check these details in more depth.</h4>

In [None]:
data.info()

The basic insight we draw from here is that we have <b>21 float type columns, 13 integer type columns and 7 object type columns</b>.
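
Rather than counting by eye from `data.info()`, the tally per dtype can also be computed directly. The snippet below is a minimal sketch on a made-up frame (`toy` and its values are illustrative, not taken from the dataset); the same call should work on `data`.

```python
import pandas as pd

# Hypothetical miniature frame; the values are made up for illustration.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "height_m": [0.7, 1.0, 2.0],
    "attack": [49, 62, 82],
    "is_legendary": [0, 0, 1],
})

# Count how many columns fall under each dtype.
dtype_counts = toy.dtypes.astype(str).value_counts()
print(dtype_counts)
```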

In [None]:
data.describe().T

<h4> • Let's have a check on some columns for how many unique elements they contain.</h4>

In [None]:
for i in ['generation','type1','type2']:
    print("{} => {}".format(i,data[i].unique()))

<h1 id='2'> ◾ Data Preprocessing & Feature Engineering 👨‍🏭 </h1>

<h4>Let's check for Null or NaN values in our dataset. I am sure there must be many...</h4>

In [None]:
data.isnull().sum()

<h4> Okay great !! </h4> 
<p>So we have a lot of missing data in the column <b>"type2", i.e. 384 null values.</b> <br /><br /> It is followed by <b>"percentage_male" with 98 null values, and "height_m" and "weight_kg" with 20 null values each</b>. </p>

<p>But as per our dataset description, the 98 pokemons which have null values in <b>"percentage_male" are actually genderless</b>, so we cannot drop them or replace them with a mode or median; we will just change the null values to something that indicates this property.</p> 

<p>Also, only some pokemons have a secondary type (type2), so we cannot fill its value with an arbitrary one. We will set it to something more meaningful.</p> 

<p>For the remaining columns, "height_m" and "weight_kg", the null values can be <b>replaced by the mean value</b> of all the heights and weights of pokemon for better analysis.</p>

In [None]:
data['type2'] = data['type2'].fillna('None')  # No secondary type is marked as 'None'
data['percentage_male'] = data['percentage_male'].fillna('None')  # Genderless pokemon are marked as 'None'
data['height_m'] = data['height_m'].fillna(data['height_m'].mean())  # Missing heights get the mean of the column
data['weight_kg'] = data['weight_kg'].fillna(data['weight_kg'].mean())  # Missing weights get the mean of the column

In [None]:
# Also I will put the name column in first, seems easier to identify the pokemon.
data.insert(0, 'name', data.pop('name'))

In [None]:
data.isnull().sum()

<p>One problem visible in <b>data.info() is that it reports 7 object type columns</b>, yet in the output of data.sample(), head() and tail() the column <b>"capture_rate" shows only numeric values</b> and is still counted as object type. </p>
<p>We will check this column separately to understand what is causing this.</p>

In [None]:
for i in data.capture_rate:
    print(i,end=", ")

<p>Okay!! Now we know the exact reason why capture_rate was treated as object type. </p> 
<p>One particular Pokemon has <b>2 capture rates in a single value (i.e. 30 (Meteorite)255 (Core))</b>. We will replace it with one single value, after which we can convert this column to integer type.</p>
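
Printing every value works for a column of this size, but the offending entries can also be located programmatically. The sketch below assumes a small made-up series (`rates` is hypothetical and only mimics the column) and uses `pd.to_numeric` with `errors="coerce"` to flag whatever cannot be parsed as a number.

```python
import pandas as pd

# Hypothetical sample mimicking a numeric column that was read as text.
rates = pd.Series(["45", "255", "30 (Meteorite)255 (Core)", "3"], name="capture_rate")

# Entries that cannot be parsed as numbers become NaN under errors="coerce".
parsed = pd.to_numeric(rates, errors="coerce")

# Keep only the original strings that failed to parse.
bad_entries = rates[parsed.isna()]
print(bad_entries.tolist())
```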

In [None]:
data[data["capture_rate"]== "30 (Meteorite)255 (Core)"][['name','capture_rate','type1','generation','classfication']]

<p>So, from this we now know that the pokemon is <b>"Minior", classified as the Meteor Pokémon</b>, and hence we will keep its Meteorite capture rate as the replacement value.</p>

In [None]:
# Keep the Meteorite capture rate, then convert the column to integers
data['capture_rate'] = data['capture_rate'].replace({'30 (Meteorite)255 (Core)': '30'})
data['capture_rate'] = data['capture_rate'].astype('int')
data['capture_rate'].dtype

<h5>Now let's get more clarity on our dataset. Let's remove some unwanted columns and also add a few if needed.</h5>

In [None]:
data.sample(5).T

<p>• We will remove 3 columns that I find unnecessary : <b>japanese_name, pokedex_number, and percentage_male</b></p>

<p>• We will also add some columns: the number of abilities, and the combined type built from type1 and type2.</p>
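
One thing to watch when counting abilities: if the "abilities" column stores each list as text (an assumption about how the CSV is read; the rows below are made up), `len()` on the raw value counts characters rather than abilities. A minimal sketch of the difference, using `ast.literal_eval` to parse the text first:

```python
import ast
import pandas as pd

# Hypothetical rows; assumes `abilities` holds a stringified Python list.
toy = pd.DataFrame({"abilities": ["['Overgrow', 'Chlorophyll']", "['Static']"]})

# len() on the raw string counts characters, not abilities.
char_lengths = toy["abilities"].str.len()

# Parsing the string into a real list first gives the ability count.
ability_counts = toy["abilities"].apply(lambda s: len(ast.literal_eval(s)))
print(char_lengths.tolist(), ability_counts.tolist())
```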

In [None]:
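import ast

# Removing 3 columns from our dataset
data.drop(columns=['japanese_name', 'pokedex_number', 'percentage_male'], inplace=True) 
# Adding the total number of abilities that a pokemon has. Each "abilities" entry is a
# stringified list, so it is parsed first; len() on the raw string would count characters.
data["tot_abilities"] = data["abilities"].apply(lambda s: len(ast.literal_eval(s)))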

In [None]:
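# Merging type1 and type2 into a new column => type.
# Note: null type2 values were filled with the string 'None' earlier, so a
# single-type pokemon ends up with a '<type1>_None' value here.
# Then renaming type1 to primary type and type2 to secondary type.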
data['type'] =  data.apply(lambda x: x['type1'] if pd.isnull(x['type2']) else f'{x["type1"]}_{x["type2"]}', axis=1)
data.rename(columns = {'type1':'primary type', 'type2':'secondary type'}, inplace = True)

# Checking the final shape of data before moving into visualizations
data.shape

<h1 id="3">◾ Data Analysis & Visualizations </h1>

In [None]:
data.corr(numeric_only=True)  # correlations between the numeric columns only

<h5>Let's separate the correlations into two main parts => </h5>

In [None]:
against=[]
rest=[]
for i in data.columns:
    if 'against' in i:
        against.append(i)
    else:
        rest.append(i)

In [None]:
fig,axes = plt.subplots(figsize=(14,8))
sns.heatmap(data[against].corr(),annot=True).set_title('Attack Stats')
plt.show()

In [None]:
fig,axes = plt.subplots(figsize=(14,8))
sns.heatmap(data[rest].corr(numeric_only=True),annot=True).set_title('General Stats')
plt.show()

<h3>Here in general stats we can see that: </h3>

<ul>
 <li><h4> Base total (base_total) has a good correlation with the attack and defence attributes. 
   For example,</h4>
    <ul>
         <li>base_total with attack: 0.73</li>
         <li>base_total with sp_attack: 0.74</li>
         <li>base_total with defense: 0.63</li>
         <li>base_total with sp_defense: 0.72</li>        
    </ul>
 </li>
 <li><h4> Whether a pokemon is legendary has a strong correlation with the pokemon's Egg Steps (i.e. 0.87).</h4></li>
 <li><h4> The weight of a pokemon is positively correlated with its height (i.e. 0.63). </h4></li>
</ul>
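
Reading individual cells off a large annotated heatmap is error-prone, so the correlations with one column can also be listed and sorted. This is a minimal sketch: `correlations_with` is a hypothetical helper and `toy` holds made-up stats, not rows from the dataset.

```python
import pandas as pd

# Hypothetical stats for five made-up pokemon.
toy = pd.DataFrame({
    "attack": [49, 62, 82, 100, 150],
    "defense": [49, 63, 83, 90, 70],
    "speed": [45, 60, 80, 100, 140],
    "base_total": [318, 405, 525, 600, 780],
    "name": ["a", "b", "c", "d", "e"],
})

def correlations_with(df, column):
    """Pearson correlation of every other numeric column with `column`, strongest first."""
    corr = df.corr(numeric_only=True)[column].drop(column)
    return corr.sort_values(key=abs, ascending=False)

ranked = correlations_with(toy, "base_total")
print(ranked)
```

Sorting by absolute value keeps strong negative correlations near the top as well.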

<h1 id="4"> 1. How does the speed of Pokemon relate to various base factors?</h1>

In [None]:
fig,axes = plt.subplots(2,2,figsize=(16,10),sharey=True)
# x and y are passed as keywords; recent seaborn versions no longer accept them positionally
sns.scatterplot(x=data['attack'],y=data['speed'],ax=axes[0,0])
axes[0,0].set_title("Speed V/S Attack")
sns.scatterplot(x=data['defense'],y=data['speed'],ax=axes[0,1])
axes[0,1].set_title("Speed V/S Defence")
sns.scatterplot(x=data['height_m'],y=data['speed'],ax=axes[1,0])
axes[1,0].set_title("Speed V/S Height")
sns.scatterplot(x=data['weight_kg'],y=data['speed'],ax=axes[1,1])
axes[1,1].set_title("Speed V/S Weight")
fig.suptitle("Speed Factor?", size=20)
plt.show()

<h3>Some insights that we can get from these graphs are: </h3>
<ul>
 <li> A pokemon's attack is only loosely related to its speed: some pokemons have moderate speed but good attack, while others are very fast yet weak in attack.</li>
 <li> Speed matters little for defense: in the Speed vs Defence plot, the pokemons with the highest defense tend to be the slower ones. </li>
 <li> Height appears related to speed: in the Speed V/S Height plot, the fastest pokemons are mostly those with small height values.</li>
 <li> Similarly for weight: heavy pokemons tend to be slower, and most pokemons are lightweight with moderate to high speed. </li>
</ul>

<h1 id="5"> 2. What is the count of pokemons per generation?</h1>

In [None]:
plt.figure(figsize=(12,6))
ax = sns.countplot(x='generation',data=data,order=data['generation'].value_counts().index,color='skyblue')
ax.set_title('Pokemons per Generation')
ax.set(xlabel='Generation',ylabel='Count')
for p in ax.patches:
    ax.annotate('{:.1f}'.format(p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))
plt.show()

<h4 style="line-height: 30px;" >So, from the above count plot we can see that Gen-5 has the most pokemons and Gen-6 the fewest. Also, each odd-numbered generation has more pokemons than the even-numbered generations next to it.</h4>

<h1 id="6"> 3. How many types of pokemon are there in each generation?</h1>

In [None]:
primary_type_generation_group = data.groupby(['generation', 'primary type'])['name'].count().to_frame().reset_index()
primary_type_generation_group.rename(columns={'name' : 'name_count'}, inplace=True)
primary_type_generation_group.head(20).T

In [None]:
primary_type_generation_dict  = {}
for generation in list(primary_type_generation_group['generation'].unique()):
    current_generation = []
    for p_type in primary_type_generation_group['primary type'].unique():
        try:
            current_generation.append(
                primary_type_generation_group.loc[(primary_type_generation_group['generation']==generation) 
                                        & (primary_type_generation_group['primary type'] == p_type)]['name_count'].values[0])
        except IndexError:
            current_generation.append(0)
    primary_type_generation_dict[f'generation {generation}'] = current_generation

p_type_by_generation = pd.DataFrame(primary_type_generation_dict, index= primary_type_generation_group['primary type'].unique())

In [None]:
fig,axes = plt.subplots(figsize=(16,8))
sns.heatmap(p_type_by_generation,annot=True).set_title('Pokemons Per Generation')
plt.show()

<h4 style="line-height: 30px;" > We can see that no generation contains every primary type; for example, generation 1 has no pokemon whose primary type is Flying, Steel or Dark.<br />
We can also see that only generations 5 & 6 have pokemons with Flying as their primary type.<br />
And the most common primary type in each generation is: </h4>
<ul>
 <li> Generations 1, 2 & 3: Water type.</li> 
 <li> Generation 4: Normal type.</li>
 <li> Generation 5: Bug type. </li>
 <li> Generation 6: Fairy type.</li>
 <li> Generation 7: Grass & Normal type.</li>
</ul>
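
The type-by-generation table behind this heatmap can also be built in a single call. The sketch below uses `pd.crosstab` on a made-up frame (`toy` is hypothetical; the column names follow the renamed columns used in this notebook) and is an alternative to the nested loop, not the method used above.

```python
import pandas as pd

# Hypothetical rows; column names follow the renamed columns in this notebook.
toy = pd.DataFrame({
    "generation": [1, 1, 1, 2, 2],
    "primary type": ["water", "water", "fire", "water", "dark"],
})

# Rows are types, columns are generations; missing combinations become 0.
type_by_generation = pd.crosstab(toy["primary type"], toy["generation"])
print(type_by_generation)
```

The result can be passed to `sns.heatmap` in the same way as `p_type_by_generation`.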

<h1 id="7"> 4. Which type has the easiest pokemon to catch?</h1>

In [None]:
plt.figure(figsize=(16,6))
ax = sns.boxplot(x='primary type',y='capture_rate', hue='is_legendary', data = data)

ax.set_xlabel(xlabel='Primary Type')
ax.set_ylabel(ylabel='Capture Rate')
ax.set_title('Pokémon Capture Rate by Primary Type', pad=40)

sns.despine(top=True, right=True)

handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, ['Non-legendary', 'Legendary'], loc=(1,1))

<h5 style="line-height: 30px;" >The easiest pokemon to capture is from the <b>"Fairy Type"</b> whereas the <b>hardest to capture is "Dragon Type"</b>. <br />Also it's pretty hard to capture the <b>"Fire Type" or "Rock Type"</b> pokemons. <br />
On the other hand, in Legendary pokemons, the <b>easiest pokemon to capture will be from the "Grass or Bug"</b> type. </h5>
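
To back the reading of the boxplot with numbers, the median capture rate per type can be computed directly. This is a minimal sketch on made-up rows (`toy` is hypothetical), restricted to non-legendary pokemon; a higher capture rate means an easier catch.

```python
import pandas as pd

# Hypothetical rows, not taken from the dataset.
toy = pd.DataFrame({
    "primary type": ["fairy", "fairy", "dragon", "dragon", "dragon"],
    "capture_rate": [190, 150, 45, 45, 3],
    "is_legendary": [0, 0, 0, 0, 1],
})

# Median capture rate of non-legendary pokemon per type, easiest first.
median_rate = (
    toy[toy["is_legendary"] == 0]
    .groupby("primary type")["capture_rate"]
    .median()
    .sort_values(ascending=False)
)
print(median_rate)
```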

<h1 id="8"> 5. What are the most widespread types of pokemon in both Primary type and Secondary type?</h1>

In [None]:
fig,axes = plt.subplots(2,2,figsize=(16,9))
ax = sns.countplot(x='primary type',data=data,order=data['primary type'].value_counts().iloc[:5].index, ax=axes[0,0])
ax.set_title('Primary Type')
for p in ax.patches:
    ax.annotate('{:.1f}'.format(p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))

ax = sns.countplot(x='secondary type',data=data,order=data['secondary type'].value_counts().iloc[:5].index, ax=axes[0,1])
ax.set_title('Secondary Type')
for p in ax.patches:
    ax.annotate('{:.1f}'.format(p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))

ax = sns.countplot(x='primary type',data=data,order=data['primary type'].value_counts(ascending=True).index[:5], ax=axes[1,0])
ax.set_title('Primary Type')
for p in ax.patches:
    ax.annotate('{:.1f}'.format(p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))

ax = sns.countplot(x='secondary type',data=data,order=data['secondary type'].value_counts(ascending=True).index[:5], ax=axes[1,1])
ax.set_title('Secondary Type')
for p in ax.patches:
    ax.annotate('{:.1f}'.format(p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))
plt.show()

<h3>The most widespread type of pokemon in:</h3>
<p> • <b>Primary Type</b> = Water type pokemons <br />
• <b>Secondary Type</b> = Flying type pokemons (the tallest bar in the secondary plot is "None", because most pokemons have no secondary type; Flying is the most common actual type)
</p>
<h3>The least widespread type of pokemon in:</h3>
<p>• <b>Primary Type</b> = Flying type pokemons <br />
• <b>Secondary Type</b> = Normal type pokemons 
</p>
<h4>Note that while Flying is the most common secondary type, it is the least common primary type.</h4> 
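
Because missing secondary types were filled with the string 'None', that placeholder takes part in the counts. A minimal sketch of leaving it out before counting, on a made-up series (`secondary` is hypothetical):

```python
import pandas as pd

# Hypothetical secondary types; 'None' marks pokemon without one.
secondary = pd.Series(
    ["None", "flying", "None", "poison", "flying", "None"], name="secondary type"
)

# Drop the placeholder before counting so only real types are ranked.
counts = secondary[secondary != "None"].value_counts()
print(counts)
```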

<h1 id="9"> 6. How do the height and weight of a Pokemon correlate with its various base stats?</h1>

In [None]:
f, axes = plt.subplots(1,4,figsize=(18,4),sharey=True)

sns.scatterplot(x=data['speed'],y=data['height_m'],ax=axes[0])
sns.scatterplot(x=data['attack'],y=data['height_m'],ax=axes[1])
sns.scatterplot(x=data['defense'],y=data['height_m'],ax=axes[2])
sns.scatterplot(x=data['base_total'],y=data['height_m'],ax=axes[3])

<h5 style="line-height: 30px;">
  <ul>
    <li>Height and speed are loosely related: the fastest pokemons are mostly short, and those with a large height value tend to be slow. There are also some short pokemons that are very slow, but most pokemons combine small height with moderate speed.</li>
    <li> The same holds for Height v/s Attack: here too, many of the pokemons with small height have high attack power.</li>
    <li> Height and defense show a similar pattern: apart from a few cases, most of the pokemons have small height and medium defense power.</li>
    <li> For base total, we can see that there are some pokemons with moderate height but a high base total.</li>
  </ul>
</h5>
<h4 style="line-height: 30px;" >So a large height value has its own perks, but most pokemons are short and still do well in the base stats that matter for them. </h4>


In [None]:
f, axes = plt.subplots(1,4,figsize=(18,4),sharey=True)
sns.scatterplot(x=data['speed'],y=data['weight_kg'],ax=axes[0])
sns.scatterplot(x=data['attack'],y=data['weight_kg'],ax=axes[1])
sns.scatterplot(x=data['defense'],y=data['weight_kg'],ax=axes[2])
sns.scatterplot(x=data['base_total'],y=data['weight_kg'],ax=axes[3])

<h5 style="line-height: 30px;" >
  <ul>
    <li> Pokemons with high weight tend to be slower and those with low weight faster, though there are some 4-5 cases where heavy pokemons have above-average speed too; possibly these are Flying types.</li>
    <li> In terms of attack, heavy weight is not a strong factor, as most of the pokemons with good attack are lightweight. Yet a few heavyweight pokemons have good attack power too; perhaps their weight helps them when attacking an opponent.</li>
    <li> In defense, a moderate weight seems helpful to some pokemons, perhaps because it helps them withstand more attacks.</li>
    <li> Weighing roughly 100-200 kg tends to go with a good base total, which signifies strength.</li>
  </ul>
</h5>
<h4 style="line-height: 30px;" >So being heavy can help a pokemon in its defence and somewhat in attack too. But most of the pokemons weighing 100-200 kg lead the rest across all stats.</h4>
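
Scatter plots with a few very heavy outliers are hard to judge by eye, so a rank correlation can complement them. The sketch below is an illustration on made-up numbers (`toy` is hypothetical): it ranks each column and then takes the Pearson correlation of the ranks, which is the Spearman rank correlation.

```python
import pandas as pd

# Hypothetical measurements for five made-up pokemon.
toy = pd.DataFrame({
    "weight_kg": [6.0, 8.5, 90.5, 210.0, 460.0],
    "speed": [90, 65, 100, 45, 30],
    "defense": [40, 43, 78, 110, 130],
})

# Pearson correlation of the ranks equals the Spearman rank correlation.
rank_corr = toy.rank().corr()["weight_kg"].drop("weight_kg")
print(rank_corr.round(2))
```

Ranks are far less sensitive to a handful of extreme weights than the raw values are.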

<h1 id="10"> 7. Which type is the most likely to be a legendary Pokemon?</h1>

In [None]:
legend = data[data['is_legendary']==1][['name','type','primary type','secondary type']]
legend.head()

<h5 style=" line-height: 30px;" >As the "secondary type" and "type" columns both contain None values, we will skip those entries: in the combined "type" column, a value containing None only signifies the primary type and is of little use for a correct analysis. </h5>

In [None]:
from collections import Counter

#for secondary
s = []
secondary = legend['secondary type']
for i in secondary:
    if 'None' in i: 
        pass
    else:
        s.append(i)
count_s = Counter(s)        

# for combined
c=[]
combined = legend['type']
for i in combined:
    if 'None' in i: 
        pass
    else:
        c.append(i)
count_c = Counter(c)

count_s = sorted(count_s.items(), key=lambda x: x[1],reverse=True)
count_c = sorted(count_c.items(), key=lambda x: x[1],reverse=True)

In [None]:
# separating the key, values we got from Counter() of both count_c(combined) and count_s(secondary).
v_s,k_s=[],[]
v_c,k_c=[],[]
for i in count_s:
    k_s.append(i[0])
    v_s.append(i[1])
for i in count_c:
    k_c.append(i[0])
    v_c.append(i[1])

In [None]:
fig,axes = plt.subplots(1,3,figsize=(18,4))
ax = sns.countplot(x='primary type',data=legend,order=legend['primary type'].value_counts().iloc[:5].index, ax=axes[0])
ax.set_title('Primary Type')
for p in ax.patches:
    ax.annotate('{:.1f}'.format(p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))

ax = sns.barplot(x=k_s[:5],y=v_s[:5], ax=axes[1])
ax.set_title('Secondary Type')
for p in ax.patches:
    ax.annotate('{:.1f}'.format(p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))
    
ax = sns.barplot(x=k_c[:8],y=v_c[:8], ax=axes[2])
ax.set_title('Combined')
locs, labels = plt.xticks()
plt.setp(labels, rotation=90)
for p in ax.patches:
    ax.annotate('{:.1f}'.format(p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))

<h3>From the above visualizations we can gather few insights as:</h3>
<h5 style="line-height: 30px;" >
 <ul>
    <li> If a pokemon has "Psychic" as its primary type, then there is a good chance of this pokemon being Legendary.</li>
    <li> From the secondary type perspective, if a pokemon has "Flying" as its secondary type, then it is more likely to be Legendary.</li>
    <li>If a pokemon has both a primary and a secondary type, then these are the combinations most often seen among legendary pokemons: 
       <ul>
        <li> Electric and Flying Type</li>
        <li> Fire and Flying Type</li>
        <li> Dragon and Psychic Type</li>
        <li> Psychic and Ghost Type</li>
        <li> Bug and Fighting Type</li>
       </ul>
    </li>
    <li> One more thing to note: although "Psychic" leads by far as a primary type and "Flying" as a secondary type, combining both types in one pokemon makes it less likely to be a legendary pokemon. </li>
  </ul>
</h5>
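
Counts of legendary pokemon per type favour types that are common overall, so it can help to look at the share of each type that is legendary as well. A minimal sketch on made-up rows (`toy` is hypothetical and the numbers are not from the dataset):

```python
import pandas as pd

# Hypothetical rows: 1 = legendary, 0 = not legendary.
toy = pd.DataFrame({
    "primary type": ["psychic", "psychic", "psychic",
                     "water", "water", "water", "water", "water"],
    "is_legendary": [1, 1, 0, 1, 0, 0, 0, 0],
})

# The sum gives the number of legendaries, the mean gives their share within the type.
summary = toy.groupby("primary type")["is_legendary"].agg(
    legendary_count="sum", legendary_share="mean"
)
summary = summary.sort_values("legendary_share", ascending=False)
print(summary)
```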

<h1 id="11"> 8. Can we find the strongest pokemon?</h1>

In [None]:
top10_pokemon_base_total = data.sort_values(by="base_total", ascending=False).reset_index()[:10]
plt.figure(figsize=(20,10))
ax = sns.barplot(x=top10_pokemon_base_total["name"], y=top10_pokemon_base_total["base_total"], orient='v')
ax.set_title("Which is the best pokémon?", size=20)
ax.set(xlabel="Name", ylabel="Base Total")
for p in ax.patches:
    ax.annotate('{:.1f}'.format( p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))

<h4 style="line-height: 30px;" > • From the above comparison of base totals, two pokemons tie for the top spot: "Mewtwo" and "Rayquaza". </h4>
<h4 style="line-height: 30px;" > • In terms of lore, however, the pokemon "Arceus" is widely regarded as the most powerful Legendary Pokémon. This Normal-type Mythical Pokémon is the creator of the universe and thus the closest that the Pokémon world has to a god. 
And according to our analysis in terms of base parameters, it's the 5th strongest Pokemon.
</h4>
<h4 style=" line-height: 30px;" > • On the other hand, one of the two strongest Pokemon by base total is the Psychic-type Pokémon Mewtwo, a man-made, genetically enhanced version of the Mythical Pokémon Mew. Mewtwo is one of the most formidable Pokémon ever to exist. It can use the powers typical of its type, as well as telekinesis and telepathy. </h4>

<h4 style="line-height: 30px;" >To determine the strongest Pokemon, we summed up the base statistics for each species. While our metric approaches Pokemon strength as objectively as possible, in the end it may not be meaningful to all players.</h4>
<h4 style="line-height: 30px;" >In practice, the player's selection of moves, attack and defense points during the battle is ultimately a bigger factor in determining success than the strength of the Pokemon chosen.
The developers have made Pokemon a game that is more than simply optimizing statistics and strength; it is a tactical and personal experience.</h4>

## Strongest Pokemon V/S God of Pokemons

<h4 style="line-height: 30px;" >We have a tie for the strongest Pokemon, so we will compare both of the top pokemons with the God of Pokemon: Arceus. 
<br />
We will compare their basic stats (attack, defense, etc.) on a Radar Chart, since this chart makes it easy to compare several properties of two or more items at once.</h4>
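
A radar chart is a line plot on polar axes, so each stat needs an angle and the first point has to be repeated at the end to close the polygon. The helper below is a minimal sketch of that preparation step (`close_radar` is a hypothetical name and the six values are made up, not taken from the dataset).

```python
import numpy as np

def close_radar(values):
    """Return (angles, values) with the first point repeated so the polygon closes."""
    values = list(values)
    # One evenly spaced angle per attribute around the circle.
    angles = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
    closed_angles = np.concatenate((angles, [angles[0]]))
    closed_values = values + values[:1]
    return closed_angles, closed_values

# Hypothetical six-stat profile.
angles, stats = close_radar([100, 120, 80, 90, 110, 95])
print(len(angles), len(stats))
```

The returned pair can be passed to `ax.plot` and `ax.fill` on a polar subplot.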

In [None]:
attributes=['attack', 'sp_attack', 'defense', 'sp_defense', 'hp', 'speed']

mewtwo= top10_pokemon_base_total[top10_pokemon_base_total['name'] == 'Mewtwo'][attributes].values.tolist()[0]
rayquaza = top10_pokemon_base_total[top10_pokemon_base_total['name'] == 'Rayquaza'][attributes].values.tolist()[0]
# God of Pokemons: Arceus
Arceus = top10_pokemon_base_total[top10_pokemon_base_total['name'] == 'Arceus'][attributes].values.tolist()[0]

angles=np.linspace(0,2*np.pi,len(attributes), endpoint=False)
angles=np.concatenate((angles,[angles[0]]))

attributes.append(attributes[0])
mewtwo.append(mewtwo[0])
rayquaza.append(rayquaza[0])
Arceus.append(Arceus[0])

In [None]:
fig=plt.figure(figsize=(20,10))
ax=fig.add_subplot(111, polar=True)

#Arceus Plot
ax.plot(angles,Arceus, 'o-', color='blue', linewidth=1, label='Arceus')
ax.fill(angles, Arceus, alpha=0.25, color='skyblue')

# Mewtwo Plot
ax.plot(angles, mewtwo, 'o-', color='darkred', linewidth=1, label='Mewtwo')
ax.fill(angles, mewtwo, alpha=0.25, color='darkred')

# Rayquaza Plot
ax.plot(angles,rayquaza, 'o-', color='indianred', linewidth=1, label='Rayquaza')
ax.fill(angles, rayquaza, alpha=0.25, color='indianred')

ax.set_thetagrids(angles[:-1] * 180/np.pi, attributes[:-1], fontsize=12)
plt.grid(True)

handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, ['Arceus', 'Mewtwo', 'Rayquaza'], loc=(0,0.99))

ax.set_title("Mewtwo & Rayquaza V/S God of Pokemons", pad=40)

<h4 style=" line-height: 30px;" >From the above plot we can see that the man-made pokemon Mewtwo owes its high base total, and its place at the top of the strength chart we plotted earlier, largely to its special attack.
</h4>
<h4 style="line-height: 30px;" >Though Mewtwo lacks a lot in defense, its special attack and speed keep it in the lead of the strength race.
On the other hand, Rayquaza leads only in its attack value, which is higher than that of the other two, and lacks in the other factors.
</h4>
<h4 style="line-height: 30px;" >So on comparing both of the strongest pokemons, Mewtwo and Rayquaza, <b>Mewtwo</b> is a clear winner as it leads in the other parameters and can be considered the <b>strongest pokemon based on the stats</b>.
</h4>
<h4 style="line-height: 30px;" >The God of pokemons Arceus has the same value for every parameter (120 each), which also makes it a highly stable and balanced pokemon.</h4>

<h1 id="12"> 9. Legendary or Not? Classifier </h1>

In [None]:
# For this we will be taking some selected features from our dataset.
classify_data= data[['attack','base_total','defense','hp','experience_growth','sp_attack','sp_defense','speed','tot_abilities','is_legendary']]
classify_data.sample(5)

In [None]:
key_features = classify_data.drop("is_legendary",axis=1)
target = classify_data["is_legendary"]

# Splitting the dataset into train and test sets.
X_train,X_test,Y_train,Y_test = train_test_split(key_features,target,test_size=0.20,random_state=0)

In [None]:
Gnb = GaussianNB()
Gnb.fit(X_train,Y_train)

Y_pred_Gnb = Gnb.predict(X_test)
score_Gnb = round(accuracy_score(Y_test,Y_pred_Gnb)*100,2)  # argument order: true labels first, then predictions
print("The accuracy score we achieved is: "+str(score_Gnb)+" %")

<h4 style="line-height: 30px;" >We could also feed in hand-picked values to test the model, but for ease of understanding I will use a few pokemon names from the same dataset as prediction examples. </h4>
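
Legendary pokemon are a small minority of the rows, so the accuracy score printed above is easier to interpret next to a baseline that always predicts the majority class. The sketch below computes such a baseline on made-up labels (`y_test` here is hypothetical, not the notebook's `Y_test`).

```python
import pandas as pd

# Hypothetical labels: 1 = legendary, 0 = not legendary.
y_test = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])

# Accuracy obtained by always predicting the most frequent class.
majority_class = y_test.mode()[0]
baseline_accuracy = (y_test == majority_class).mean()
print(f"Majority-class baseline accuracy: {baseline_accuracy:.0%}")
```

A classifier is only informative to the extent that it beats this baseline.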

In [None]:
for i in ['Arceus','Pikachu','Latios','Zoroark']:
    example = data[data['name']==i][['attack','base_total','defense','hp','experience_growth','sp_attack','sp_defense','speed','tot_abilities']]
    print("Pokemon:{} | Predicted:{} | Actual:{}".format(i,Gnb.predict(example),data[data['name']==i]['is_legendary'].values))
    print('--------------------------------------------')

<hr />
<h2> References : </h2>
<a href="https://www.kaggle.com/code/jaimetrickz/pok-dex-best/notebook">https://www.kaggle.com/code/jaimetrickz/pok-dex-best/notebook </a><br />
<a href="https://www.kaggle.com/code/shreekant009/pokemon-data-visualization/notebook">https://www.kaggle.com/code/shreekant009/pokemon-data-visualization/notebook </a><br />
<a href="https://www.kaggle.com/code/joaopdrg/discovering-the-best-pok-mon?scriptVersionId=96280460">https://www.kaggle.com/code/joaopdrg/discovering-the-best-pok-mon?scriptVersionId=96280460 </a>

<h3 style="line-height: 30px;">Connect to me and Follow me on :</h3> 

<h4>1. Linkedin: <a href="https://www.linkedin.com/in/anirudh-singh-chauhan">https://www.linkedin.com/in/anirudh-singh-chauhan </a> </h4>

<h4>2. Medium : <a href="https://medium.com/@Anirudh_Singh_Chauhan">https://medium.com/@Anirudh_Singh_Chauhan </a> </h4>
 
<h4>3. Github: <a href="https://github.com/Anirudh-Chauhan">https://github.com/Anirudh-Chauhan </a> </h4>

<h4>4. YouTube: <a href="https://www.youtube.com/c/AnirudhSinghChauhan">https://www.youtube.com/c/AnirudhSinghChauhan </a> </h4>


<h3 style="line-height: 30px;" >If you enjoyed reading this notebook, an upvote 👍 will motivate me to do more of this type of work. <br />
Also, if you have any feedback or suggestions, please let me know in the comment section.</h3>
<h2>Thank You for Reading 😇</h2>