
#Analyzing Pokémon Strength and Types: Are Legendary Pokémon Truly Stronger?

Popularity of Pokémon
Pokémon has been a global phenomenon since its inception in the late 1990s. What started as a video game series has grown into a cultural icon, spanning television shows, movies, trading card games, and a vast array of merchandise. The franchise has captivated millions of fans worldwide, with each new generation of Pokémon bringing fresh excitement and discoveries.

With the immense popularity of Pokémon, understanding the attributes and strengths of different Pokémon has always been a topic of interest among fans and players alike. This analysis aims to shed light on some of the most intriguing questions: Are legendary Pokémon truly superior in terms of their overall statistics? Do certain types consistently outperform others? And how have Pokémon evolved over different generations?

Let's embark on this analytical journey to uncover the truths behind the power of Pokémon.

Business Questions:
1. What Is the Relationship Between "Legendary" Status and Total Pokémon Performance?
2. Are There Specific Types with Consistently Higher Stats?
3. How Are Pokémon Stats Distributed Across Generations?

##Data Source and Info

Source: ALBERTO BARRADAS on Kaggle
https://www.kaggle.com/datasets/abcsds/pokemon/data

This data set includes 721 Pokemon (800 rows once alternate forms such as Mega Evolutions are counted), including their number, name, first and second type, and basic stats: HP, Attack, Defense, Special Attack, Special Defense, and Speed. It has been of great use when teaching statistics to kids. With certain types you can also give a geeky introduction to machine learning.

These are the raw attributes that are used for calculating how much damage an attack will do in the games. This dataset is about the Pokemon games (NOT Pokemon cards or Pokemon Go).

The data as described by Myles O'Neill is:

* #: ID for each pokemon
* Name: Name of each pokemon
* Type 1: Each pokemon has a type, which determines its weakness/resistance to attacks
* Type 2: Some pokemon are dual type and have a second type
* Total: sum of all stats that come after this, a general guide to how strong a pokemon is
* HP: hit points, or health, defines how much damage a pokemon can withstand before fainting
* Attack: the base modifier for normal attacks (e.g. Scratch, Punch)
* Defense: the base damage resistance against normal attacks
* Sp. Atk: special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam)
* Sp. Def: special defense, the base damage resistance against special attacks
* Speed: determines which pokemon attacks first each round
* Generation: the generation in which the Pokémon was introduced (1 to 6)
* Legendary: whether the Pokémon is in the legendary category (True/False)
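
The description of `Total` as the sum of the six base stats can be checked directly. The sketch below is one way to do it, assuming the column names shown in the dataset preview; the helper `total_mismatches` and the three-row `sample` (copied from that preview) are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Base-stat columns as they are named in the dataset.
STAT_COLUMNS = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]


def total_mismatches(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose Total differs from the sum of the six base stats."""
    computed = frame[STAT_COLUMNS].sum(axis=1)
    return frame[computed != frame["Total"]]


# Three rows copied from the dataset preview, used only as a small demo.
sample = pd.DataFrame(
    [
        ["Bulbasaur", 318, 45, 49, 49, 65, 65, 45],
        ["Ivysaur", 405, 60, 62, 63, 80, 80, 60],
        ["Charmander", 309, 39, 52, 43, 60, 50, 65],
    ],
    columns=["Name", "Total"] + STAT_COLUMNS,
)

mismatches = total_mismatches(sample)
print(f"Rows where Total != sum of base stats: {len(mismatches)}")
```

Running `total_mismatches(df)` on the full DataFrame would list any rows where the stored `Total` disagrees with the recomputed sum.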




##Import Library and Load Data

In [None]:
import pandas as pd
import plotly.express as px

df = pd.read_csv('Pokemon.csv')

df

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


##Understanding the Data Structure

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


In [None]:
df.describe()

Unnamed: 0,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,208.343798,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.75,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,364.5,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,539.25,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


In [None]:
# Check for null or missing values
null_counts = df.isnull().sum()
print("Null values in each column:\n", null_counts)

# Check for duplicate rows
duplicate_rows = df.duplicated().sum()
print("Number of duplicate rows:", duplicate_rows)

Null values in each column:
 #               0
Name            0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
Number of duplicate rows: 0


Because not every Pokémon has a second type (414 of the 800 rows do), the 386 missing values in the Type 2 column are expected and do not indicate a data quality problem.
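
To make the single-type versus dual-type split explicit, a small helper can count both groups. This is a minimal sketch; `type2_coverage` is a hypothetical helper name, and `sample` is a five-row excerpt of the rows displayed earlier rather than the full dataset.

```python
import pandas as pd


def type2_coverage(frame: pd.DataFrame) -> dict:
    """Count Pokémon with and without a second type."""
    has_second = frame["Type 2"].notna()
    return {
        "dual_type": int(has_second.sum()),
        "single_type": int((~has_second).sum()),
        "dual_share": float(has_second.mean()),
    }


# Rows copied from the dataset preview; Charmander has no second type.
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Diancie"],
        "Type 1": ["Grass", "Grass", "Grass", "Fire", "Rock"],
        "Type 2": ["Poison", "Poison", "Poison", None, "Fairy"],
    }
)

coverage = type2_coverage(sample)
print(coverage)
```

Applied to the full DataFrame, the same call should reproduce the 414 dual-type and 386 single-type rows reported by `df.info()` and the null check.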

Add a "Legendary Status" Category

In [None]:
# Create a new column to categorize Legendary and Non-Legendary Pokémon
df['Legendary Status'] = df['Legendary'].apply(lambda x: 'Legendary' if x else 'Non-Legendary')

df

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Legendary Status
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,Non-Legendary
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,Non-Legendary
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,Non-Legendary
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,Non-Legendary
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,Non-Legendary
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True,Legendary
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True,Legendary
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True,Legendary
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True,Legendary


##Analysis: Relationship between Legendary Status and Total Performance

Objective: Understand whether legendary Pokémon are significantly stronger in terms of total statistics compared to non-legendary Pokémon.

Analysis: Compare the average and distribution of total values between legendary and non-legendary Pokémon.

Introduction:
Before diving deeper into the analysis, let's understand the basic concepts of legendary and non-legendary Pokémon. Legendary Pokémon are often considered stronger and rarer compared to regular Pokémon. But does the data support this assumption? Let's look at the distribution of total stats for legendary and non-legendary Pokémon.

In [None]:
# Create an interactive boxplot using Plotly
fig = px.box(df, x='Legendary Status', y='Total',
             title='Distribution of Total Stats for Legendary and Non-Legendary Pokémon',
             labels={'Total':'Total Stats', 'Legendary Status':'Legendary Status'})

# Show the interactive plot
fig.show()



####Visualization:
The box plot above shows the distribution of total stats for legendary and non-legendary Pokémon. The box for legendary Pokémon sits clearly higher than the box for non-legendary Pokémon, so their typical total stats are higher, supporting the belief that they are indeed stronger overall.
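
A box plot shows medians and quartiles rather than means, so a numeric summary is a useful companion to the chart. The sketch below groups by the `Legendary Status` column created earlier; `summarize_total` is an illustrative helper name, and `sample` holds six rows copied from the dataset preview instead of the full data.

```python
import pandas as pd


def summarize_total(frame: pd.DataFrame, by: str = "Legendary Status") -> pd.DataFrame:
    """Return count, mean, and median of Total for each group."""
    return frame.groupby(by)["Total"].agg(["count", "mean", "median"])


# Six rows copied from the dataset preview, used only as a small demo.
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander",
                 "Diancie", "HoopaHoopa Unbound"],
        "Total": [318, 405, 525, 309, 600, 680],
        "Legendary Status": ["Non-Legendary"] * 4 + ["Legendary"] * 2,
    }
)

summary = summarize_total(sample)
print(summary)
```

Calling `summarize_total(df)` on the full DataFrame gives the group sizes alongside the averages, which matters here because legendary Pokémon are a small minority of the rows.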

##Analysis: Statistics by Pokémon Type

Objective: Determine if certain Pokémon types have specific advantages in terms of statistics (such as HP, Attack, Defense, etc.).

Analysis: Compute the average of each statistic for each primary type (Type 1) and compare the averages across types.

Introduction:
Now we will analyze if certain Pokémon types have consistent advantages in their statistics. By examining the average statistics for each type, we can identify which types perform best in various categories.

In [None]:
# Define the numeric columns
numeric_columns = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total']

# Group by Type 1 and calculate the mean of each numeric statistic
type_stats = df.groupby('Type 1')[numeric_columns].mean().reset_index()

# Melt the DataFrame into long format so it is suitable for a grouped bar chart
type_stats_melted = type_stats.melt(id_vars=['Type 1'], value_vars=numeric_columns, var_name='Stat', value_name='Average')

# Create an interactive bar chart using Plotly
fig = px.bar(type_stats_melted, x='Stat', y='Average', color='Type 1', barmode='group',
             title='Average Stats for Each Pokémon Type', labels={'Average':'Average Value', 'Stat':'Stat'})

# Show the interactive plot
fig.show()

####Visualization:
The above chart illustrates the average statistics for each Pokémon type. From this, we can see which types have strengths in specific stats. For example, the 'Dragon' type tends to have high average total stats, indicating it is one of the stronger types.
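
With many types in one grouped bar chart, a sorted table can make the ranking easier to read. The following sketch ranks primary types by average `Total` and keeps the group size next to each average; `rank_types_by_total` is a hypothetical helper, and `sample` contains only a few rows copied from the dataset preview, so its ranking says nothing about the full data.

```python
import pandas as pd


def rank_types_by_total(frame: pd.DataFrame) -> pd.DataFrame:
    """Rank Type 1 groups by mean Total, keeping the group size for context."""
    return (
        frame.groupby("Type 1")["Total"]
        .agg(["count", "mean"])
        .sort_values("mean", ascending=False)
    )


# Rows copied from the dataset preview, used only as a small demo.
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander",
                 "Diancie", "HoopaHoopa Confined", "HoopaHoopa Unbound"],
        "Type 1": ["Grass", "Grass", "Grass", "Fire", "Rock", "Psychic", "Psychic"],
        "Total": [318, 405, 525, 309, 600, 600, 680],
    }
)

ranking = rank_types_by_total(sample)
print(ranking)
```

The `count` column is worth reading together with the mean: a type with few members can rank high because of a handful of strong Pokémon.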

##Analysis: Statistical Distribution by Pokémon Generation

Objective: Determine if newer generations of Pokémon show an increase in stats compared to earlier generations.

Analysis: Compare the average Total, HP, Attack, Defense, and other stats across generations to identify trends over time.

Introduction:
Finally, we will examine how Pokémon stats have evolved across different generations. Are newer generations stronger than the previous ones? By analyzing the distribution of stats across generations, we can uncover any trends in Pokémon performance over time.

In [None]:
# Define the numeric columns
numeric_columns = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total']

# Group by Generation and calculate the mean of each statistic
generation_stats = df.groupby('Generation')[numeric_columns].mean().reset_index()

# Melt the DataFrame to make it suitable for a bar chart
generation_stats_melted = generation_stats.melt(id_vars=['Generation'], value_vars=numeric_columns, var_name='Stat', value_name='Average')

# Create an interactive bar chart using Plotly
fig = px.bar(generation_stats_melted, x='Generation', y='Average', color='Stat', barmode='group',
             title='Average Stats for Each Pokémon Generation', labels={'Average':'Average Value', 'Generation':'Generation'})

# Show the interactive plot
fig.show()



####Visualization:
The bar chart above displays the average stats for each Pokémon generation. The averages do not rise steadily from one generation to the next; Generation 4 has the highest average total stats.
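
The highest and lowest generations can also be read off programmatically instead of from the chart. This is a minimal sketch assuming the `Generation` and `Total` columns from the dataset; `generation_totals` is an illustrative helper name, and `sample` uses rows copied from the dataset preview, which only cover Generations 1 and 6.

```python
import pandas as pd


def generation_totals(frame: pd.DataFrame) -> pd.Series:
    """Return the mean Total per generation, rounded to one decimal."""
    return frame.groupby("Generation")["Total"].mean().round(1)


# Rows copied from the dataset preview, used only as a small demo.
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander",
                 "Diancie", "HoopaHoopa Unbound"],
        "Generation": [1, 1, 1, 1, 6, 6],
        "Total": [318, 405, 525, 309, 600, 680],
    }
)

gen_means = generation_totals(sample)
print(gen_means)
print("Highest average Total: generation", gen_means.idxmax())
print("Lowest average Total: generation", gen_means.idxmin())
```

Running `generation_totals(df)` on the full DataFrame yields one average per generation, which can be compared against the figures quoted in the conclusion.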

#Conclusion

1. Relationship Between Legendary Status and Total Performance:
* Legendary Pokémon generally have higher average total stats compared to non-legendary Pokémon, supporting the notion that they are stronger.

2. Stats Based on Pokémon Types:
* Certain types, such as 'Dragon,' show higher average stats, indicating their overall strength.

3. Stats Distribution Across Generations:
* Average stats do not increase steadily from one generation to the next. Generation 2 has the lowest average total stats at 418.3 points, while Generation 4 has the highest at 459 points.