# Pokemon DataSet

The dataset contains 800 rows and 13 columns. The main columns are:

* #: ID for each pokemon
* Name: Name of each pokemon
* Type 1: Each pokemon has a type, which determines its weakness/resistance to attacks
* Type 2: Some pokemon are dual type and have a second type
* Total: sum of all stats that come after this, a general guide to how strong a pokemon is
* HP: hit points, or health, defines how much damage a pokemon can withstand before fainting
* Attack: the base modifier for normal attacks (e.g. Scratch, Punch)
* Defense: the base damage resistance against normal attacks
* Sp. Atk: special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam)
* Sp. Def: special defense, the base damage resistance against special attacks
* Speed: determines which pokemon attacks first each round
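
The description of `Total` as the sum of the stats that follow it can be checked directly. The sketch below is a minimal check on three rows copied from the `df.head()` output in this notebook; the names `sample`, `stat_columns`, and `matches` are illustrative, and the same comparison could be run on the full `df` once it is loaded.

```python
import pandas as pd

# First three rows of the dataset, copied from the df.head() output in this notebook.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur"],
    "Total": [318, 405, 525],
    "HP": [45, 60, 80],
    "Attack": [49, 62, 82],
    "Defense": [49, 63, 83],
    "Sp. Atk": [65, 80, 100],
    "Sp. Def": [65, 80, 100],
    "Speed": [45, 60, 80],
})

# The six base stats that are expected to add up to Total.
stat_columns = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Row-wise sum of the stats compared with the Total column.
matches = sample[stat_columns].sum(axis=1) == sample["Total"]
print(matches.all())  # True for these three rows
```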



The objective is to answer the following questions on the basis of the provided dataset.

1. Which pokemon has the highest speed and which one has the lowest speed?
2. Does each pokemon belong to the low HP category or the high HP category?

### Import library / libraries

Pandas is used for data analysis and data manipulation.

In [1]:
import pandas as pd

### Load Dataset

Load the dataset, which is available as a .csv file.

In [2]:
df = pd.read_csv('Pokemon.csv')

In [3]:
# Check how many rows and columns the dataset has

df.shape

(800, 13)

In [4]:
# Show the first five rows of the dataset

df.head()

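| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |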


In [5]:
# Show the last five rows of the dataset

df.tail()

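| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
| 796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
| 797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
| 798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |
| 799 | 721 | Volcanion | Fire | Water | 600 | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |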


### Structure of the Data

The data type of each column affects how it can be manipulated, so we check the data types of all the features (columns).

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


### Handle missing values

If the dataset has missing values, any analysis or model built on it can be biased and its results can contain errors. Most datasets are not free from missing values, so we check every column for missing entries.

In [7]:
df.isnull().sum()

#               0
Name            0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64

Type 2 contains 386 missing values, which means that these 386 pokemon have only one type.
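
The notebook leaves these missing values as they are. If a later step needed a complete `Type 2` column, one option would be to fill the gaps with an explicit label. The sketch below is a minimal illustration on a two-row stand-in frame; the placeholder string `"None"` is an assumption and is not part of the dataset.

```python
import pandas as pd

# Tiny stand-in frame mirroring the 'Type 2' column; the real data comes from Pokemon.csv.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Type 1": ["Grass", "Fire"],
    "Type 2": ["Poison", None],
})

# "None" is an assumed placeholder label meaning "no second type".
sample["Type 2"] = sample["Type 2"].fillna("None")

# After filling, the column no longer reports missing values.
print(sample["Type 2"].isnull().sum())
```

Filling with a label keeps single-type pokemon in the data instead of dropping their rows, which matters here because almost half of the rows have no second type.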

## Add High_Low_Speed column in the same dataset.

The Speed feature determines which pokemon attacks first each round, so we categorize every pokemon as low or high speed relative to the mean Speed.

### 1. High_Low_Speed

In [8]:
# Mean of the Speed column

mean_speed = df.Speed.mean()
mean_speed

68.2775

In [9]:
# Define a function that returns the category (low speed / high speed) a pokemon belongs to

def high_low_speed(speed):
    if speed < mean_speed:
        return "Low Speed"
    else:
        return "High Speed"

In [10]:
# Add High_Low_Speed column in the dataset

df['High_Low_Speed'] = df['Speed'].apply(high_low_speed)

In [11]:
# Checking the updated Dataset

df

| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | High_Low_Speed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | Low Speed |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | Low Speed |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | High Speed |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | High Speed |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | Low Speed |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True | Low Speed |
| 796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True | High Speed |
| 797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True | High Speed |
| 798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True | High Speed |


## Add High_Low_HP column in the same dataset.

The HP feature gives a pokemon's hit points (its health): each attack it receives lowers its HP, and it faints when the HP runs out. We categorize every pokemon as low or high HP relative to the mean HP.

### 2. High_Low_HP

In [12]:
# Mean of the HP column

mean_hp = df.HP.mean()
mean_hp

69.25875

In [13]:
# Define a function that returns the category (low HP / high HP) a pokemon belongs to

def high_low_hp(hp):
    if hp < mean_hp:
        return "Low HP"
    else:
        return "High HP"

In [14]:
# Add High_Low_HP column in the dataset

df['High_Low_HP'] = df['HP'].apply(high_low_hp)
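
The Speed and HP steps follow the same pattern: compute the column mean, then label each row as below the mean or not. A minimal sketch of how the two near-identical functions could be folded into one vectorized helper is shown here. The helper name `add_mean_category` and the four-row `sample` frame (values copied from rows shown in this notebook) are illustrative assumptions, and the thresholds are the means of the sample, not of the full dataset.

```python
import pandas as pd


def add_mean_category(frame, column, label):
    """Add a 'High_Low_<label>' column comparing each value with the column mean.

    Values equal to the mean are labelled 'High', matching the else branch
    of the original high_low_speed / high_low_hp functions.
    """
    threshold = frame[column].mean()
    is_high = frame[column] >= threshold  # boolean Series, no Python-level loop
    frame[f"High_Low_{label}"] = is_high.map(
        {True: f"High {label}", False: f"Low {label}"}
    )
    return frame


# Stand-in rows; the real notebook would pass the full df instead.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "HoopaHoopa Unbound"],
    "HP": [45, 60, 80, 80],
    "Speed": [45, 60, 80, 80],
})

for column in ("Speed", "HP"):
    add_mean_category(sample, column, column)

print(sample[["Name", "High_Low_Speed", "High_Low_HP"]])
```

Computing the threshold inside the helper avoids relying on module-level variables such as `mean_speed` and `mean_hp`, so the labels cannot go stale if the frame changes between cells.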

In [15]:
# Final dataset which contains category of low/high speed Pokemons as well as low/high HP Pokemons

df

| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | High_Low_Speed | High_Low_HP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | Low Speed | Low HP |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | Low Speed | Low HP |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | High Speed | High HP |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | High Speed | High HP |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | Low Speed | Low HP |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True | Low Speed | Low HP |
| 796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True | High Speed | Low HP |
| 797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True | High Speed | High HP |
| 798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True | High Speed | High HP |


In [16]:
df['High_Low_Speed'].value_counts()

Low Speed     424
High Speed    376
Name: High_Low_Speed, dtype: int64

In [17]:
df['High_Low_HP'].value_counts()

Low HP     422
High HP    378
Name: High_Low_HP, dtype: int64

## Conclusion

1. We categorized the pokemon by Speed, using the column mean (68.2775) as the threshold: 424 are Low Speed and 376 are High Speed.

2. We categorized the pokemon by HP, using the column mean (69.25875) as the threshold: 422 are Low HP and 378 are High HP.
