# Exploratory Data Analysis on Pokemon Data

## Introduction 

In this project I will analyze data on Pokemon stats up to Generation 6 of the games.

The purpose of this project is to learn what each pokemon's stats look like at certain levels in the game and how they compare to those of other pokemon.

First, I will get an overview of the data: what it looks like, which columns it has, and which data types those columns hold.

Next, I will pre-process the data to check whether anything is wrong with it (e.g. duplicates, missing values).

The next step will be to perform the analysis: compare the stats of the pokemon to each other at certain levels, and see how the stats change with a specific Nature. Finally, I will create plots to see the difference in the growth of each pokemon based on their base stats, nature, IVs, and EVs.

The questions I want to answer are:

1. What do the stats of the starters from Generation 3 look like at levels 10, 50, and 100?
2. How do the stats grow from level 1 to level 100?
3. How does the growth of the starters from the same Generation compare when plotted on a graph?

 

In [1]:
# Importing the necessary libraries
import pandas as pd
import numpy as np
import plotly.express as px
from py

In [2]:
# Pulling the dataset and assigning it to a variable
data = pd.read_csv('pokemon.csv')

## Data Overview

In [3]:
data.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


The dataset has missing values in the `Type 2` column (only 414 of the 800 rows are non-null), but all the data types seem to be correct.

## Data Preprocessing
I will begin pre-processing the data, starting with the missing values.

In [5]:
# Checking the rows with missing values
data[data['Type 2'].isna()]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
10,8,Wartortle,Water,,405,59,63,80,65,80,58,1,False
11,9,Blastoise,Water,,530,79,83,100,85,105,78,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
775,705,Sliggoo,Dragon,,452,68,75,53,83,113,60,6,False
776,706,Goodra,Dragon,,600,90,100,70,110,150,80,6,False
788,712,Bergmite,Ice,,304,55,69,85,32,35,28,6,False
789,713,Avalugg,Ice,,514,95,117,184,44,46,28,6,False


All the missing values come from single-typed pokemon, and most of them seem to be early evolutions.
Since the column's data type is already object (string), I am going to replace the missing values with the string 'None'.

In [6]:
data['Type 2'] = data['Type 2'].fillna('None')
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      800 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


Now the missing values are taken care of.

In [7]:
# Checking for duplicates
data.duplicated().sum()

0

In [8]:
# Checking for implicit duplicates
data['#'].duplicated().sum()

79

There seem to be duplicates in the `#` (number) column.

In [9]:
data[data['#'].duplicated()]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
12,9,BlastoiseMega Blastoise,Water,,630,79,103,120,135,115,78,1,False
19,15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
785,711,GourgeistSmall Size,Ghost,Grass,494,55,85,122,58,75,99,6,False
786,711,GourgeistLarge Size,Ghost,Grass,494,75,95,122,58,75,69,6,False
787,711,GourgeistSuper Size,Ghost,Grass,494,85,100,122,58,75,54,6,False
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True


The duplicates apparently come from the same pokemon in different forms, each with its own stats. I have decided to keep them, as they are necessary and have identifiable names to tell them apart.
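One detail worth keeping in mind: by default `duplicated()` flags only the later occurrences of a value, so the 79 rows counted above are the alternate forms and do not include the base forms themselves. The sketch below illustrates this on a handful of rows copied from the preview; `sample`, `repeats_only`, and `all_forms` are names made up for this example and are not part of the notebook.

```python
import pandas as pd

# A few rows copied from the dataset preview, just enough to show the behaviour
sample = pd.DataFrame({
    '#': [3, 3, 4, 6, 6, 6],
    'Name': ['Venusaur', 'VenusaurMega Venusaur', 'Charmander',
             'Charizard', 'CharizardMega Charizard X', 'CharizardMega Charizard Y'],
    'Total': [525, 625, 309, 534, 634, 634],
})

# Default keep='first': only the repeated entries are flagged, the base forms are not
repeats_only = sample[sample['#'].duplicated()]

# keep=False: every row whose number appears more than once is flagged
all_forms = sample[sample['#'].duplicated(keep=False)]

print(len(repeats_only))  # 3
print(len(all_forms))     # 5
```

Using `keep=False` on the full dataset would list each base form next to its alternate forms, which makes it easier to confirm that the stats really differ between them.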

## Data Analysis

In [10]:
# Looking at a sample of the data again
data.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


First, I will make a list of the natures, grouped so that their effect can be applied when the stats are calculated.

In [63]:
# Creating the list of natures and grouping them by the stat they improve
# (one row per stat, in order: Attack, Defense, Sp. Atk, Sp. Def, Speed)

nature_list = [
    ['Hardy', 'Lonely', 'Adamant', 'Naughty', 'Brave'],
    ['Bold', 'Docile', 'Impish', 'Lax', 'Relaxed'],
    ['Modest', 'Mild', 'Bashful', 'Rash', 'Quiet'],
    ['Calm', 'Gentle', 'Careful', 'Quirky', 'Sassy'],
    ['Timid', 'Hasty', 'Jolly', 'Naive', 'Serious']
]
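The grid can be read in two directions: the row gives the stat a nature raises, and the position inside the row gives the stat it lowers, both in the order Attack, Defense, Sp. Atk, Sp. Def, Speed. The five natures on the diagonal (Hardy, Docile, Bashful, Quirky, Serious) are neutral. Below is a minimal sketch of a lookup helper, assuming the usual 10% raise and 10% drop; `STAT_ORDER` and `nature_modifiers` are hypothetical names introduced for this example.

```python
# Order of the stats along both axes of the grid (HP is never affected by natures)
STAT_ORDER = ['Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

nature_list = [
    ['Hardy', 'Lonely', 'Adamant', 'Naughty', 'Brave'],
    ['Bold', 'Docile', 'Impish', 'Lax', 'Relaxed'],
    ['Modest', 'Mild', 'Bashful', 'Rash', 'Quiet'],
    ['Calm', 'Gentle', 'Careful', 'Quirky', 'Sassy'],
    ['Timid', 'Hasty', 'Jolly', 'Naive', 'Serious']
]


def nature_modifiers(nature):
    """Return a {stat: multiplier} dict for the given nature (hypothetical helper)."""
    modifiers = {stat: 1.0 for stat in STAT_ORDER}
    for raised, row in enumerate(nature_list):
        if nature in row:
            lowered = row.index(nature)
            if raised != lowered:  # diagonal entries are the neutral natures
                modifiers[STAT_ORDER[raised]] = 1.1
                modifiers[STAT_ORDER[lowered]] = 0.9
            return modifiers
    # Unknown names (including the default 'None') leave every stat unchanged
    return modifiers


print(nature_modifiers('Adamant'))
```

For 'Adamant' this gives 1.1 for Attack and 0.9 for Sp. Atk, with every other stat left at 1.0.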



Next, I will create a function that calculates the stats of the pokemon based on their level.

In [11]:
# The function takes the pokemon's name, level, IVs, EVs, and nature
# and returns the pokemon's stats at the specified level

def stats(name, level, iv_hp=0, iv_atk=0, iv_def=0, iv_spatk=0, iv_spdef=0, iv_speed=0, 
          ev_hp=0, ev_atk=0, ev_def=0, ev_spatk=0, ev_spdef=0, ev_speed=0, nature='None'):
    
    """
    This function will make a row that shows the pokemon's stats at a given level
    """
    
    # Taking the row of the pokemon's name and base stats
    
    pokemon = data[data['Name'] == name]
    
    # Copying and inputting the formula from the website here and applying it to each specific stat
    
    hp = (((2 * pokemon['HP'] + iv_hp + (ev_hp / 4)) * level) / 100) + level + 10

    attack = (((2 *  pokemon['Attack'] + iv_atk + (ev_atk / 4)) * level) / 100) + 5

    defense = (((2 *  pokemon['Defense'] + iv_def + (ev_def / 4)) * level) / 100) + 5

    special_attack = (((2 *  pokemon['Sp. Atk'] + iv_spatk + (ev_spatk / 4)) * level) / 100) + 5

    special_defense = (((2 *  pokemon['Sp. Def'] + iv_spdef + (ev_spdef / 4)) * level) / 100) + 5

    speed = (((2 *  pokemon['Speed'] + iv_speed + (ev_speed / 4)) * level) / 100) + 5
    
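    # Calculating nature into the equation: nature_list[i][j] raises stat i by 10%
    # and lowers stat j by 10% (when i == j the nature is neutral); HP is never affected
    
    battle_stats = [attack, defense, special_attack, special_defense, speed]
    for raised, row in enumerate(nature_list):
        if nature in row and row.index(nature) != raised:
            lowered = row.index(nature)
            battle_stats[raised] = battle_stats[raised] * 1.1
            battle_stats[lowered] = battle_stats[lowered] * 0.9
    attack, defense, special_attack, special_defense, speed = battle_stats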

    # Creating the dictionary for the stats
    statdict = {
        'Name' : name,
        'Level': level,
        'HP': hp,
        'Attack': attack,
        'Defense': defense,
        'Sp. Atk': special_attack,
        'Sp. Def': special_defense,
        'Speed': speed
    }
    # Making the dictionary into a dataframe and dropping all the decimals on the stats
    stat_columns = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
    statframe = pd.DataFrame(statdict).astype({column: int for column in stat_columns})
    
    return statframe
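
The function above drops the decimals only once, at the very end. A minimal sketch of a stricter variant follows, assuming the games round down after `EV / 4`, again after the level scaling, and once more after the nature multiplier. `hp_stat` and `other_stat` are hypothetical helpers, and the nature is passed as a percentage (110, 100, or 90) so that everything stays in integer arithmetic.

```python
def hp_stat(base, level, iv=0, ev=0):
    """HP with every round-down step applied (hypothetical helper)."""
    return (2 * base + iv + ev // 4) * level // 100 + level + 10


def other_stat(base, level, iv=0, ev=0, nature_percent=100):
    """Any non-HP stat; nature_percent is 110, 100, or 90."""
    raw = (2 * base + iv + ev // 4) * level // 100 + 5
    return raw * nature_percent // 100


# Bulbasaur (base HP 45, base Attack 49) at level 50 with 74 HP EVs and 70 Attack EVs
print(hp_stat(45, 50, ev=74))     # 114
print(other_stat(49, 50, ev=70))  # 62
```

For these inputs both approaches agree. They can drift apart by a point when the EVs are not a multiple of 4 or when a non-neutral nature is involved, so the stricter version is the safer choice if exact in-game values matter.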
    

In [12]:
# Trying out the function with some pokemon

stats('Bulbasaur', 50, ev_atk=70, ev_hp=74)

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Bulbasaur,50,114,62,54,70,70,50


In [13]:
stats('Rayquaza', 35)

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
425,Rayquaza,35,118,110,68,110,68,71


In [14]:
stats('Froakie', 10, ev_speed=252)

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
724,Froakie,10,28,16,13,17,13,25


The function seems to work as expected.

In [15]:
# Putting together one pokemon's stats at multiple levels in one table

bulbasaur_10 = stats('Bulbasaur', 10)
bulbasaur_50 = stats('Bulbasaur', 50)
bulbasaur_100 = stats('Bulbasaur', 100)

# pd.concat stacks the rows in the order given; an outer merge may reorder them
bulbasaur_starter_table = pd.concat([bulbasaur_10, bulbasaur_50, bulbasaur_100], ignore_index=True)

bulbasaur_starter_table

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Bulbasaur,10,29,14,14,18,18,14
1,Bulbasaur,50,105,54,54,70,70,50
2,Bulbasaur,100,200,103,103,135,135,95


In [16]:
# I will make a function that uses the previous function to create a table with the stats for level 10, 50, and 100 on a
# single pokemon


def get_all(name):
    
    
    # Pulling the specified level's stats for the pokemon
    
    new_10 = stats(name, 10)
    new_50 = stats(name, 50)
    new_100 = stats(name, 100)
    
    # Stacking the three rows in level order
    merged = pd.concat([new_10, new_50, new_100], ignore_index=True)
    
    return merged
    
    

In [17]:
get_all('Charizard')

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Charizard,10,35,21,20,26,22,25
1,Charizard,50,138,89,83,114,90,105
2,Charizard,100,266,173,161,223,175,205


Now I'm ready to compare all the starters in one table.

## Starters comparison per Generation

I will create a table for each generation and see what the starters' stats look like.

In [18]:
# Getting the name of the starters from each generation

gen1_starters = ['Bulbasaur', 'Charmander', 'Squirtle', 'Pikachu', 'Eevee']

gen2_starters = ['Chikorita', 'Cyndaquil', 'Totodile']

gen3_starters = ['Treecko', 'Torchic', 'Mudkip']

gen4_starters = ['Turtwig', 'Chimchar', 'Piplup']

gen5_starters = ['Snivy', 'Tepig', 'Oshawott']

gen6_starters = ['Chespin', 'Fennekin', 'Froakie']

In [19]:
# Making a function that returns a table of all the starters' stats at levels 10, 50, and 100

def table_creator(generation: list):
    
    # Stacking each starter's table in the order the names were given
    table = pd.concat([get_all(name) for name in generation], ignore_index=True)
    
    return table


Now I'm going to test the function.

In [20]:
gen1_table = table_creator(gen1_starters)
gen1_table

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Bulbasaur,10,29,14,14,18,18,14
1,Bulbasaur,50,105,54,54,70,70,50
2,Bulbasaur,100,200,103,103,135,135,95
3,Charmander,10,27,15,13,17,15,18
4,Charmander,50,99,57,48,65,55,70
5,Charmander,100,188,109,91,125,105,135
6,Squirtle,10,28,14,18,15,17,13
7,Squirtle,50,104,53,70,55,69,48
8,Squirtle,100,198,101,135,105,133,91
9,Pikachu,10,27,16,13,15,15,23


It works as expected. I will pull up the rest of the starters' stats.

In [21]:
gen2_table = table_creator(gen2_starters)
gen2_table

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Chikorita,10,29,14,18,14,18,14
1,Chikorita,50,105,54,70,54,70,50
2,Chikorita,100,200,103,135,103,135,95
3,Cyndaquil,10,27,15,13,17,15,18
4,Cyndaquil,50,99,57,48,65,55,70
5,Cyndaquil,100,188,109,91,125,105,135
6,Totodile,10,30,18,17,13,14,13
7,Totodile,50,110,70,69,49,53,48
8,Totodile,100,210,135,133,93,101,91


In [22]:
gen3_table = table_creator(gen3_starters)
gen3_table

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Treecko,10,28,14,12,18,16,19
1,Treecko,50,100,50,40,70,60,75
2,Treecko,100,190,95,75,135,115,145
3,Torchic,10,29,17,13,19,15,14
4,Torchic,50,105,65,45,75,55,50
5,Torchic,100,200,125,85,145,105,95
6,Mudkip,10,30,19,15,15,15,13
7,Mudkip,50,110,75,55,55,55,45
8,Mudkip,100,210,145,105,105,105,85


In [23]:
gen4_table = table_creator(gen4_starters)
gen4_table

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Turtwig,10,31,18,17,14,16,11
1,Turtwig,50,115,73,69,50,60,36
2,Turtwig,100,220,141,133,95,115,67
3,Chimchar,10,28,16,13,16,13,17
4,Chimchar,50,104,63,49,63,49,66
5,Chimchar,100,198,121,93,121,93,127
6,Piplup,10,30,15,15,17,16,13
7,Piplup,50,113,56,58,66,61,45
8,Piplup,100,216,107,111,127,117,85


In [24]:
gen5_table = table_creator(gen5_starters)
gen5_table

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Snivy,10,29,14,16,14,16,17
1,Snivy,50,105,50,60,50,60,68
2,Snivy,100,200,95,115,95,115,131
3,Tepig,10,33,17,14,14,14,14
4,Tepig,50,125,68,50,50,50,50
5,Tepig,100,240,131,95,95,95,95
6,Oshawott,10,31,16,14,17,14,14
7,Oshawott,50,115,60,50,68,50,50
8,Oshawott,100,220,115,95,131,95,95


In [25]:
gen6_table = table_creator(gen6_starters)
gen6_table

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Chespin,10,31,17,18,14,14,12
1,Chespin,50,116,66,70,53,50,43
2,Chespin,100,222,127,135,101,95,81
3,Fennekin,10,28,14,13,17,17,17
4,Fennekin,50,100,50,45,67,65,65
5,Fennekin,100,190,95,85,129,125,125
6,Froakie,10,28,16,13,17,13,19
7,Froakie,50,101,61,45,67,49,76
8,Froakie,100,192,117,85,129,93,147


In [54]:
def growth(name):
    # Building one row per level from 1 to 100 and stacking them in level order
    new = pd.concat([stats(name, i) for i in range(1, 101)], ignore_index=True)
        
    return new
        

In [57]:
bulbasaur = growth('Bulbasaur')
bulbasaur

Unnamed: 0,Name,Level,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,Bulbasaur,1,11,5,5,6,6,5
1,Bulbasaur,2,13,6,6,7,7,6
2,Bulbasaur,3,15,7,7,8,8,7
3,Bulbasaur,4,17,8,8,10,10,8
4,Bulbasaur,5,19,9,9,11,11,9
...,...,...,...,...,...,...,...,...
95,Bulbasaur,96,192,99,99,129,129,91
96,Bulbasaur,97,194,100,100,131,131,92
97,Bulbasaur,98,196,101,101,132,132,93
98,Bulbasaur,99,198,102,102,133,133,94


In [60]:
fig = px.bar(bulbasaur, x='Level', y=['HP', 'Attack', 'Defense'])
fig.show()
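
To compare the starters of one generation on a single graph, one option is a long-form table with one row per pokemon and level, drawn as one line per pokemon. The sketch below does this for HP only, with zero IVs and EVs. The base HP values are read back from the level-100 rows of the Gen 3 table (HP = 2 * base + 110), and `GEN3_BASE_HP`, `hp_growth_columns`, and `plot_hp_growth` are hypothetical names introduced for this example.

```python
# Base HP read back from the level-100 rows of the Gen 3 table (HP = 2 * base + 110)
GEN3_BASE_HP = {'Treecko': 40, 'Torchic': 45, 'Mudkip': 50}


def hp_growth_columns(base_hp_by_name, max_level=100):
    """Build long-form columns (one entry per pokemon and level) with zero IVs and EVs."""
    columns = {'Name': [], 'Level': [], 'HP': []}
    for name, base in base_hp_by_name.items():
        for level in range(1, max_level + 1):
            columns['Name'].append(name)
            columns['Level'].append(level)
            columns['HP'].append(2 * base * level // 100 + level + 10)
    return columns


def plot_hp_growth(columns):
    """Draw one line per pokemon; plotly is imported here so the data step runs without it."""
    import plotly.express as px
    return px.line(columns, x='Level', y='HP', color='Name')


growth_columns = hp_growth_columns(GEN3_BASE_HP)
# In a notebook cell: plot_hp_growth(growth_columns).show()
```

The same layout should work for the other stats by adding one column per stat, or by stacking the output of `growth` for each starter and passing the resulting dataframe to `px.line` with `color='Name'`.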