### Analysis Questions to Answer with This Dataset

##### General Analysis Questions
- How do the average stats (HP, Attack, Defense, Sp. Atk, Sp. Def, and Speed) compare across different generations of Pokémon?
- Which Pokémon type (Type 1) has the highest average Attack stat? What about the lowest?
- Is there a significant difference in the average total stats between Legendary and non-Legendary Pokémon?
- What is the distribution of Pokémon types (Type 1 and Type 2) in the dataset? Which types are the most and least common?

##### Comparative and Relationship Questions
- Is there a correlation between a Pokémon's Speed and its Attack stat?
- How do the stats of Pokémon with a dual type (e.g., Fire/Flying) compare to those with a single type?
- Which Pokémon has the highest individual stat for each category (HP, Attack, Defense, etc.)?

##### Specific Analysis Questions
- How many Pokémon are present in each generation?
- What is the most common Pokémon type combination?
- How do the average stats of each Pokémon type (Type 1) compare to each other? For instance, do Water-type Pokémon generally have higher HP than Fire-type Pokémon?

### Importing Libraries and Data Processing

In [40]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
# Load the dataset
data = pd.read_csv("pokemon_data.csv")

In [44]:
data["Total_Stats"] = data["HP"] + data["Attack"] + data["Defense"] + data["Sp. Atk"] + data["Sp. Def"] + data["Speed"]

In [87]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   #            800 non-null    int64 
 1   Name         800 non-null    object
 2   Type 1       800 non-null    object
 3   Type 2       414 non-null    object
 4   HP           800 non-null    int64 
 5   Attack       800 non-null    int64 
 6   Defense      800 non-null    int64 
 7   Sp. Atk      800 non-null    int64 
 8   Sp. Def      800 non-null    int64 
 9   Speed        800 non-null    int64 
 10  Generation   800 non-null    int64 
 11  Legendary    800 non-null    bool  
 12  Total_Stats  800 non-null    int64 
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


In [6]:
data.head()

,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,39,52,43,60,50,65,1,False


In [41]:
data.describe()

,#,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Total_Stats
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375,435.1025
std,208.343798,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129,119.96304
min,1.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0,180.0
25%,184.75,50.0,55.0,50.0,49.75,50.0,45.0,2.0,330.0
50%,364.5,65.0,75.0,70.0,65.0,70.0,65.0,3.0,450.0
75%,539.25,80.0,100.0,90.0,95.0,90.0,90.0,5.0,515.0
max,721.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0,780.0


In [146]:
stats_cols = data[['HP','Attack','Defense','Sp. Atk','Sp. Def','Speed']]

### How do the average stats compare across different generations of Pokémon?

In [42]:
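# Average of each base stat per generation.
# stats_cols is a DataFrame, so select by its column names rather than by the frame itself.
avg_gen_stats = data.groupby("Generation")[list(stats_cols.columns)].mean()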
print(avg_gen_stats)

                   HP     Attack    Defense    Sp. Atk    Sp. Def      Speed
Generation                                                                  
1           65.819277  76.638554  70.861446  71.819277  69.090361  72.584337
2           71.207547  72.028302  73.386792  65.943396  73.905660  61.811321
3           66.543750  81.625000  74.100000  75.806250  71.225000  66.925000
4           73.082645  82.867769  78.132231  76.404959  77.190083  71.338843
5           71.787879  82.066667  72.327273  71.987879  68.739394  68.078788
6           68.268293  75.804878  76.682927  74.292683  74.890244  66.439024


### Which Pokémon type (Type 1) has the highest average Attack stat? What about the lowest?

In [53]:
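# Average of each base stat per primary type (select by column names, not by the DataFrame)
avg_type1_stats = data.groupby("Type 1")[list(stats_cols.columns)].mean()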
print(avg_type1_stats)

                 HP      Attack     Defense    Sp. Atk    Sp. Def       Speed
Type 1                                                                       
Bug       56.884058   70.971014   70.724638  53.869565  64.797101   61.681159
Dark      66.806452   88.387097   70.225806  74.645161  69.516129   76.161290
Dragon    83.312500  112.125000   86.375000  96.843750  88.843750   83.031250
Electric  59.795455   69.090909   66.295455  90.022727  73.704545   84.500000
Fairy     74.117647   61.529412   65.705882  78.529412  84.705882   48.588235
Fighting  69.851852   96.777778   65.925926  53.111111  64.703704   66.074074
Fire      69.903846   84.769231   67.769231  88.980769  72.211538   74.442308
Flying    70.750000   78.750000   66.250000  94.250000  72.500000  102.500000
Ghost     64.437500   73.781250   81.187500  79.343750  76.468750   64.343750
Grass     67.271429   73.214286   70.800000  77.500000  70.428571   61.928571
Ground    73.781250   95.750000   84.843750  56.468750  62.75000
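
The table above covers every stat, while the question only concerns Attack. A minimal sketch of reading off the two extremes with `idxmax()` and `idxmin()` follows. To stay self-contained it uses only the five rows shown by `data.head()`, so the printed answer describes that sample, not the full dataset.

```python
import pandas as pd

# Rows copied from the data.head() output (a sample, not the full dataset).
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Grass", "Grass", "Fire"],
    "Attack": [49, 62, 82, 100, 52],
})

# Mean Attack per primary type, sorted from highest to lowest.
avg_attack = sample.groupby("Type 1")["Attack"].mean().sort_values(ascending=False)

# idxmax/idxmin return the index label (the type name) of the extreme value.
highest_type = avg_attack.idxmax()
lowest_type = avg_attack.idxmin()

print(f"Highest average Attack: {highest_type} ({avg_attack.max():.2f})")
print(f"Lowest average Attack: {lowest_type} ({avg_attack.min():.2f})")
```

Swapping `sample` for `data` should give the answer for all 800 rows.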

### Is there a significant difference in the average total stats between Legendary and non-Legendary Pokémon?

In [52]:
# Average of each base stat for Legendary vs. non-Legendary Pokémon
avg_legend_stats = data.groupby("Legendary")[list(stats_cols.columns)].mean()
print(avg_legend_stats)

                  HP      Attack    Defense     Sp. Atk     Sp. Def  \
Legendary                                                             
False      67.182313   75.669388  71.559184   68.454422   68.892517   
True       92.738462  116.676923  99.661538  122.184615  105.938462   

                Speed  
Legendary              
False       65.455782  
True       100.184615  
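
The output above compares the individual stats, whereas the question asks about `Total_Stats` and whether the gap is significant. One possible way to approach that is Welch's t statistic, which does not assume equal variances in the two groups. The sketch below is an assumption about how this could be done, not part of the original analysis: `welch_t` is a hypothetical helper, and the `toy` values are placeholders chosen for illustration.

```python
import math
import pandas as pd

def welch_t(a: pd.Series, b: pd.Series) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    # Series.var() uses the sample variance (ddof=1) by default.
    standard_error = math.sqrt(a.var() / len(a) + b.var() / len(b))
    return (a.mean() - b.mean()) / standard_error

# Placeholder rows for illustration only; replace `toy` with the real `data`.
toy = pd.DataFrame({
    "Legendary": [True, True, True, True, False, False, False, False, False],
    "Total_Stats": [580, 580, 580, 680, 318, 405, 525, 625, 309],
})

legendary_totals = toy.loc[toy["Legendary"], "Total_Stats"]
regular_totals = toy.loc[~toy["Legendary"], "Total_Stats"]

# Group means answer "how big is the gap"; the t statistic scales it by the noise.
mean_by_group = toy.groupby("Legendary")["Total_Stats"].mean()
t_stat = welch_t(legendary_totals, regular_totals)

print(mean_by_group)
print(f"Welch t statistic: {t_stat:.2f}")
```

Turning the statistic into a p-value requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and reports the p-value directly.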


### What is the distribution of Pokémon types (Type 1 and Type 2) in the dataset? Which types are the most and least common?

In [67]:
data_type = data[["Type 1","Type 2"]]
data_type.value_counts()

Type 1    Type 2
Normal    Flying    24
Grass     Poison    15
Bug       Flying    14
          Poison    12
Ghost     Grass     10
                    ..
Fire      Rock       1
Ice       Ghost      1
Fire      Dragon     1
Fighting  Flying     1
Water     Steel      1
Name: count, Length: 136, dtype: int64
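
The output above counts type pairs, and in recent pandas versions `DataFrame.value_counts()` drops rows containing a missing value by default, so single-type Pokémon do not appear in it. For the distribution of each type column on its own, the two columns can be counted separately. A minimal sketch, again using only the `Type 1` and `Type 2` values visible in `data.head()`:

```python
import pandas as pd

# Type columns copied from the data.head() output (a sample, not the full dataset).
sample = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", "Poison", "Poison", None],
})

# Frequency of each primary and secondary type.
type1_counts = sample["Type 1"].value_counts()
type2_counts = sample["Type 2"].value_counts()  # missing values are skipped by default

# Overall frequency of a type in either slot; fill_value=0 covers types
# that appear in only one of the two columns.
combined_counts = (
    type1_counts.add(type2_counts, fill_value=0)
    .astype(int)
    .sort_values(ascending=False)
)

print(type1_counts)
print(type2_counts)
print(combined_counts)
```

The first and last entries of each sorted result are the most and least common types.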

### Is there a correlation between a Pokémon's Speed and its Attack stat?

In [71]:
data[['Attack','Speed']]

,Attack,Speed
0,49,45
1,62,60
2,82,80
3,100,80
4,52,65
...,...,...
795,100,50
796,160,110
797,110,70
798,160,80
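
Displaying the two columns does not yet quantify their relationship. A short sketch of computing the Pearson correlation coefficient with `Series.corr()` follows. It uses the five `Attack` and `Speed` pairs from the `data.head()` output, so the coefficient it prints describes only that sample.

```python
import pandas as pd

# Attack and Speed values copied from the data.head() output (a sample only).
sample = pd.DataFrame({
    "Attack": [49, 62, 82, 100, 52],
    "Speed": [45, 60, 80, 80, 65],
})

# Series.corr() computes the Pearson coefficient by default:
# +1 is a perfect positive linear relationship, 0 none, -1 perfect negative.
r = sample["Attack"].corr(sample["Speed"])
print(f"Pearson r between Attack and Speed: {r:.3f}")
```

Running the same call on `data["Attack"]` and `data["Speed"]` gives the coefficient for the full dataset; a scatter plot with `plt.scatter` is a useful visual check alongside it.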


### How do the stats of Pokémon with a dual type compare to those with a single type?

In [88]:
single_type = data[data["Type 2"].isna()]
multi_type = data[data["Type 2"].notna()]

In [97]:
single_type['Total_Stats'].min()

180

In [98]:
multi_type['Total_Stats'].min()

190
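
The minimum alone says little about how the two groups compare overall. One way to extend the comparison is to label each row as single or dual type and summarise `Total_Stats` per group. The sketch below assumes that approach; the `Typing` label is a name introduced here for illustration, and the rows are the five from `data.head()` with their totals computed by hand.

```python
import pandas as pd

# Rows from the data.head() output, with Total_Stats summed from the six base stats.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
    "Type 2": ["Poison", "Poison", "Poison", "Poison", None],
    "Total_Stats": [318, 405, 525, 625, 309],
})

# Label each row by whether it has a secondary type.
typing = (
    sample["Type 2"].notna()
    .map({True: "dual", False: "single"})
    .rename("Typing")
)

# Count, mean and median of the total stats for each group.
summary = sample.groupby(typing)["Total_Stats"].agg(["count", "mean", "median"])
print(summary)
```

The same `agg` call accepts a list of stat columns if a per-stat breakdown is wanted.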

### Which Pokémon has the highest individual stat for each category (HP, Attack, Defense, etc.)?


In [121]:
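# Highest value recorded for each base stat
max_stats = {}
for col in stats_cols.columns:
    max_stats[col] = stats_cols[col].max()

# List every Pokémon that reaches the maximum, so ties are reported too
for stat, value in max_stats.items():
    pokemon = data.loc[data[stat] == value, "Name"].values
    print(f"{stat} ({value}) {', '.join(pokemon)}")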

HP (255) Blissey
Attack (190) MewtwoMega Mewtwo X
Defense (230) SteelixMega Steelix, Shuckle, AggronMega Aggron
Sp. Atk (194) MewtwoMega Mewtwo Y
Sp. Def (230) Shuckle
Speed (180) DeoxysSpeed Forme


### How many Pokémon are present in each generation?

In [128]:
# Number of rows (Pokémon entries) per generation
pokemon_count = data.groupby("Generation")["#"].count()
pokemon_count

Generation
1    166
2    106
3    160
4    121
5    165
6     82
Name: #, dtype: int64

### What is the most common Pokémon type combination?

In [136]:
data["Type 1"].value_counts()

Type 1
Water       112
Normal       98
Grass        70
Bug          69
Psychic      57
Fire         52
Electric     44
Rock         44
Dragon       32
Ground       32
Ghost        32
Dark         31
Poison       28
Steel        27
Fighting     27
Ice          24
Fairy        17
Flying        4
Name: count, dtype: int64
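
The counts above cover `Type 1` only. For type combinations, both columns have to be counted together; the earlier pair count in this notebook puts Normal/Flying first with 24 entries among Pokémon that have two types. A sketch that also keeps single-type Pokémon in the ranking, by filling the missing `Type 2` with the placeholder string `"None"` (an assumption made here, not a value in the dataset):

```python
import pandas as pd

# Type columns copied from the data.head() output (a sample, not the full dataset).
sample = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", "Poison", "Poison", None],
})

# Replace missing secondary types so single-type rows are counted as a combination too.
combo_counts = sample[["Type 1", "Type 2"]].fillna("None").value_counts()

# The result is indexed by (Type 1, Type 2) pairs, so idxmax() returns a tuple.
top_combo = combo_counts.idxmax()
print(combo_counts)
print(f"Most common combination: {top_combo[0]} / {top_combo[1]}")
```

Dropping the `fillna` call restricts the ranking to dual-type Pokémon again.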

### How do the average stats of each Pokémon type (Type 1) compare to each other? For instance, do Water-type Pokémon generally have higher HP than Fire-type Pokémon?

In [148]:
type_data = data.groupby("Type 1")[stats_cols.columns].mean()
print(type_data)

                 HP      Attack     Defense    Sp. Atk    Sp. Def       Speed
Type 1                                                                       
Bug       56.884058   70.971014   70.724638  53.869565  64.797101   61.681159
Dark      66.806452   88.387097   70.225806  74.645161  69.516129   76.161290
Dragon    83.312500  112.125000   86.375000  96.843750  88.843750   83.031250
Electric  59.795455   69.090909   66.295455  90.022727  73.704545   84.500000
Fairy     74.117647   61.529412   65.705882  78.529412  84.705882   48.588235
Fighting  69.851852   96.777778   65.925926  53.111111  64.703704   66.074074
Fire      69.903846   84.769231   67.769231  88.980769  72.211538   74.442308
Flying    70.750000   78.750000   66.250000  94.250000  72.500000  102.500000
Ghost     64.437500   73.781250   81.187500  79.343750  76.468750   64.343750
Grass     67.271429   73.214286   70.800000  77.500000  70.428571   61.928571
Ground    73.781250   95.750000   84.843750  56.468750  62.75000

In [152]:
type_data.loc[["Water","Fire"]]

Type 1,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
Water,72.0625,74.151786,72.946429,74.8125,70.517857,65.964286
Fire,69.903846,84.769231,67.769231,88.980769,72.211538,74.442308
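
To make the Water versus Fire comparison explicit, the two rows can be subtracted. The sketch below rebuilds the two rows from the output above so that it runs on its own; in the notebook, `type_data.loc["Water"] - type_data.loc["Fire"]` would do the same without retyping the numbers.

```python
import pandas as pd

# Per-type means copied from the type_data.loc[["Water", "Fire"]] output above.
type_means = pd.DataFrame(
    {
        "HP": [72.0625, 69.903846],
        "Attack": [74.151786, 84.769231],
        "Defense": [72.946429, 67.769231],
        "Sp. Atk": [74.8125, 88.980769],
        "Sp. Def": [70.517857, 72.211538],
        "Speed": [65.964286, 74.442308],
    },
    index=pd.Index(["Water", "Fire"], name="Type 1"),
)

# Positive values mean Water has the higher average for that stat.
difference = type_means.loc["Water"] - type_means.loc["Fire"]
water_leads = difference[difference > 0].index.tolist()

print(difference.round(2))
print("Stats where Water averages higher:", water_leads)
```

On these averages Water-type Pokémon lead in HP (by about 2.16 points) and Defense, while Fire-type Pokémon lead in the other four stats. Averages alone do not show how much the two distributions overlap.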
