# Exercise 3

<img src="https://img.itch.zone/aW1nLzM0MzUxOTUuanBn/original/FuMxog.jpg" />

In this exercise, you will perform exploratory data analysis (EDA) on the popular Pokémon dataset, which contains detailed information about hundreds of Pokémon, including their stats, types, generation, and whether they are Legendary.

The Pokémon dataset is ideal for practicing EDA because it includes a mix of numerical features (such as Attack, Defense, Speed, HP) and categorical features (such as Type 1, Type 2, and Generation). This combination allows you to apply a wide range of EDA techniques, including summary statistics, data visualizations, grouping and aggregation, correlation analysis, and comparisons across categories.

Throughout the exercise, you will explore questions such as:
- Which Pokémon tend to have the highest or lowest stats?
- How do different Pokémon types compare in terms of strength or defense?
- Do Legendary Pokémon differ significantly from non-Legendary ones?
- How are various stats distributed across the entire dataset?
- Are there relationships or trade-offs between certain attributes (e.g., Attack vs. Defense)?

In [703]:
import kagglehub
import os
import pandas as pd

In [704]:
# Download latest version
path = kagglehub.dataset_download("abcsds/pokemon")
print("Path to dataset files:", path)

Using Colab cache for faster access to the 'pokemon' dataset.
Path to dataset files: /kaggle/input/pokemon


In [705]:
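# Confirm the download path is a directory before listing it
if os.path.isdir(path):
  print(True)

# Assumes the first entry in the dataset directory is the CSV file
contents = os.listdir(path)
mydataset = os.path.join(path, contents[0])

df = pd.read_csv(mydataset)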

True



## 1. Data Understanding (4 pts)

1. Display the first 10 rows.

In [659]:
df.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


2. Show dataset shape.

In [706]:
df.shape

(800, 13)

3. Show all columns and their data types.

In [707]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


4. Identify which columns contain missing values.

In [662]:
df.isnull().sum()

Unnamed: 0,0
#,0
Name,0
Type 1,0
Type 2,386
Total,0
HP,0
Attack,0
Defense,0
Sp. Atk,0
Sp. Def,0
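
To answer the question directly rather than scanning every row of the output, the counts can be filtered down to the columns that actually have gaps. The following is a minimal sketch on a small stand-in frame built from rows shown earlier; the names `sample`, `columns_with_missing`, and `missing_share` are illustrative and not part of the original notebook, where the same expressions would be applied to `df`.

```python
import pandas as pd

# Stand-in frame built from rows in the df.head(10) output; not the full dataset.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", None, "Flying", None],
    "HP": [45, 39, 78, 44],
})

missing_counts = sample.isnull().sum()

# Keep only the columns with at least one missing value.
columns_with_missing = missing_counts[missing_counts > 0]

# Fraction of rows that are missing, per column.
missing_share = sample.isnull().mean()

print(columns_with_missing)
print(missing_share["Type 2"])
```

On the full dataset, the output above shows `Type 2` as the only column with missing values (386 of 800 rows), which corresponds to Pokémon that have a single type.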


## 2. Summary Statistics (4 pts)

1. Generate `df.describe()`.

In [663]:
df.describe()

Unnamed: 0,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,208.343798,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.75,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,364.5,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,539.25,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


2. Get mean, median, and mode of Attack.

In [664]:
print(f"Mean Attack: {df['Attack'].mean()}")
print(f"Median Attack: {df['Attack'].median()}")
print(f"Mode Attack: {df['Attack'].mode().tolist()}")

Mean Attack: 79.00125
Median Attack: 75.0
Mode Attack: [100]


3. Compute 25th and 75th percentiles for HP.

In [665]:
hp_25th_percentile = df['HP'].quantile(0.25)
hp_75th_percentile = df['HP'].quantile(0.75)

print(f"25th percentile for HP: {hp_25th_percentile}")
print(f"75th percentile for HP: {hp_75th_percentile}")

25th percentile for HP: 50.0
75th percentile for HP: 80.0
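
The two percentiles together give the interquartile range (IQR), here 80 - 50 = 30. One common rule of thumb, which is an addition and not something the exercise asks for, flags values more than 1.5 × IQR beyond the quartiles as potential outliers. A sketch on a stand-in Series (the variable names are illustrative):

```python
import pandas as pd

# HP values from the ten rows shown by df.head(10); not the full dataset.
hp = pd.Series([45, 60, 80, 80, 39, 58, 78, 78, 78, 44], name="HP")

q1 = hp.quantile(0.25)
q3 = hp.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond each quartile (a rule of thumb, not a hard test).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = hp[(hp < lower_fence) | (hp > upper_fence)]

print(f"IQR: {iqr}, fences: ({lower_fence}, {upper_fence}), outliers: {len(outliers)}")
```

With the full-dataset quartiles reported above (50 and 80), the same arithmetic would put the fences at 5 and 125.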


4. Compute standard deviation and variance of Speed.

In [666]:
print(f"Standard Deviation of Speed: {df['Speed'].std()}")
print(f"Variance of Speed: {df['Speed'].var()}")

Standard Deviation of Speed: 29.06047371716149
Variance of Speed: 844.5111326658338
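
The two numbers are linked: the variance is the square of the standard deviation, and squaring the reported 29.06 gives roughly 844.51. Both pandas methods default to the sample estimator (`ddof=1`). A small sketch on stand-in data, with illustrative variable names:

```python
import pandas as pd

# Speed values from the ten rows shown by df.head(10); not the full dataset.
speed = pd.Series([45, 60, 80, 80, 65, 80, 100, 100, 100, 43], name="Speed")

sample_std = speed.std()            # ddof=1 by default (sample estimator)
sample_var = speed.var()            # ddof=1 by default
population_var = speed.var(ddof=0)  # divides by n instead of n - 1

print(sample_std ** 2, sample_var, population_var)
```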


## 3. Filtering & Selection (7 pts)

Select all Pokémon with Attack > 100.

In [667]:
df.query('Attack > 100')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
12,9,BlastoiseMega Blastoise,Water,,630,79,103,120,135,115,78,1,False
19,15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False
39,34,Nidoking,Poison,Ground,505,81,102,77,85,75,85,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
793,717,Yveltal,Dark,Flying,680,126,131,95,131,98,99,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


Select all Pokémon whose primary type (Type 1) is "Fire".

In [668]:
df.query('`Type 1` == "Fire"')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
42,37,Vulpix,Fire,,299,38,41,40,50,65,65,1,False
43,38,Ninetales,Fire,,505,73,76,75,81,100,100,1,False
63,58,Growlithe,Fire,,350,55,70,45,70,50,60,1,False
64,59,Arcanine,Fire,,555,90,110,80,100,80,95,1,False
83,77,Ponyta,Fire,,410,50,85,55,65,65,90,1,False


Select all Pokémon that are Legendary.

In [669]:
df.query('Legendary == True')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
156,144,Articuno,Ice,Flying,580,90,85,100,95,125,85,1,True
157,145,Zapdos,Electric,Flying,580,90,90,85,125,90,100,1,True
158,146,Moltres,Fire,Flying,580,90,100,90,125,85,90,1,True
162,150,Mewtwo,Psychic,,680,106,110,90,154,90,130,1,True
163,150,MewtwoMega Mewtwo X,Psychic,Fighting,780,106,190,100,154,100,130,1,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


Select all Pokémon that are Generation 1 AND Legendary.

In [670]:
df.query('Generation == 1 and Legendary == True')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
156,144,Articuno,Ice,Flying,580,90,85,100,95,125,85,1,True
157,145,Zapdos,Electric,Flying,580,90,90,85,125,90,100,1,True
158,146,Moltres,Fire,Flying,580,90,100,90,125,85,90,1,True
162,150,Mewtwo,Psychic,,680,106,110,90,154,90,130,1,True
163,150,MewtwoMega Mewtwo X,Psychic,Fighting,780,106,190,100,154,100,130,1,True
164,150,MewtwoMega Mewtwo Y,Psychic,,780,106,150,70,194,120,140,1,True


Select all Pokémon that are Water type OR Grass type.

In [671]:
df.query('`Type 1` == "Water" or `Type 1` == "Grass"')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
726,658,Greninja,Water,Dark,530,72,95,67,103,71,122,6,False
740,672,Skiddo,Grass,,350,66,65,48,62,57,52,6,False
741,673,Gogoat,Grass,,531,123,100,62,97,81,68,6,False
762,692,Clauncher,Water,,330,50,53,62,58,63,44,6,False


In [672]:
df.query('`Type 2` == "Water" or `Type 2` == "Grass"')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
51,46,Paras,Bug,Grass,285,35,70,55,45,55,25,1,False
52,47,Parasect,Bug,Grass,405,60,95,80,60,80,30,1,False
149,138,Omanyte,Rock,Water,355,35,40,100,90,55,35,1,False
150,139,Omastar,Rock,Water,495,70,60,125,115,70,55,1,False
151,140,Kabuto,Rock,Water,355,30,80,90,55,45,55,1,False
152,141,Kabutops,Rock,Water,495,60,115,105,65,70,80,1,False
271,251,Celebi,Psychic,Grass,600,100,100,100,100,100,100,2,False
293,270,Lotad,Water,Grass,220,40,30,30,40,50,30,3,False
294,271,Lombre,Water,Grass,340,60,50,50,60,70,50,3,False
295,272,Ludicolo,Water,Grass,480,80,70,70,90,100,70,3,False


In [673]:
# Match Water or Grass in either type slot (Type 1 or Type 2)
df.query('(`Type 1` == "Water" or `Type 1` == "Grass") or (`Type 2` == "Water" or `Type 2` == "Grass")')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
784,711,GourgeistAverage Size,Ghost,Grass,494,65,90,122,58,75,84,6,False
785,711,GourgeistSmall Size,Ghost,Grass,494,55,85,122,58,75,99,6,False
786,711,GourgeistLarge Size,Ghost,Grass,494,75,95,122,58,75,69,6,False
787,711,GourgeistSuper Size,Ghost,Grass,494,85,100,122,58,75,54,6,False


Select all Pokémon that are Fire type AND Attack > 120.

In [674]:
df.query('`Type 1` == "Fire" and Attack > 120')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
147,136,Flareon,Fire,,525,65,130,60,95,110,65,1,False
270,250,Ho-oh,Fire,Flying,680,106,130,90,110,154,90,2,True
279,257,BlazikenMega Blaziken,Fire,Fighting,630,80,160,80,130,80,100,3,False
559,500,Emboar,Fire,Fighting,528,110,123,65,100,65,65,5,False
615,555,DarmanitanStandard Mode,Fire,,480,105,140,55,30,55,95,5,False


In [675]:
df.query('`Type 2` == "Fire" and Attack > 120')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
424,383,GroudonPrimal Groudon,Ground,Fire,770,100,180,160,150,90,90,3,True


In [676]:
# Match Fire in either type slot, then apply the Attack threshold once
df.query('(`Type 1` == "Fire" or `Type 2` == "Fire") and Attack > 120')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
147,136,Flareon,Fire,,525,65,130,60,95,110,65,1,False
270,250,Ho-oh,Fire,Flying,680,106,130,90,110,154,90,2,True
279,257,BlazikenMega Blaziken,Fire,Fighting,630,80,160,80,130,80,100,3,False
424,383,GroudonPrimal Groudon,Ground,Fire,770,100,180,160,150,90,90,3,True
559,500,Emboar,Fire,Fighting,528,110,123,65,100,65,65,5,False
615,555,DarmanitanStandard Mode,Fire,,480,105,140,55,30,55,95,5,False
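
The same filter can also be written with boolean masks instead of `query`. In that form each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==` and `>` in Python. A sketch on a stand-in frame built from rows shown in the outputs above (`sample`, `is_fire`, and `strong_fire` are illustrative names):

```python
import pandas as pd

# Stand-in frame built from rows shown in the outputs above; not the full dataset.
sample = pd.DataFrame({
    "Name": ["Charizard", "CharizardMega Charizard X", "Flareon",
             "GroudonPrimal Groudon", "Squirtle"],
    "Type 1": ["Fire", "Fire", "Fire", "Ground", "Water"],
    "Type 2": ["Flying", "Dragon", None, "Fire", None],
    "Attack": [84, 130, 130, 180, 48],
})

# Fire in either slot; a missing Type 2 simply compares as False.
is_fire = (sample["Type 1"] == "Fire") | (sample["Type 2"] == "Fire")

# Parentheses around each comparison are required when combining with & or |.
strong_fire = sample[is_fire & (sample["Attack"] > 120)]

print(strong_fire["Name"].tolist())
```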


Select all Pokémon whose type is in this list:
`["Dragon", "Ghost", "Dark"]`.

In [677]:
df[df['Type 1'].isin(['Dragon', 'Ghost', 'Dark'])]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
99,92,Gastly,Ghost,Poison,310,30,35,30,100,35,80,1,False
100,93,Haunter,Ghost,Poison,405,45,50,45,115,55,95,1,False
101,94,Gengar,Ghost,Poison,500,60,65,60,130,75,110,1,False
102,94,GengarMega Gengar,Ghost,Poison,600,60,65,80,170,95,130,1,False
159,147,Dratini,Dragon,,300,41,64,45,50,50,50,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
785,711,GourgeistSmall Size,Ghost,Grass,494,55,85,122,58,75,99,6,False
786,711,GourgeistLarge Size,Ghost,Grass,494,75,95,122,58,75,69,6,False
787,711,GourgeistSuper Size,Ghost,Grass,494,85,100,122,58,75,54,6,False
793,717,Yveltal,Dark,Flying,680,126,131,95,131,98,99,6,True


In [678]:
# Match the listed types in either type slot (Type 1 or Type 2)
df[df['Type 1'].isin(['Dragon', 'Ghost', 'Dark']) | df['Type 2'].isin(['Dragon', 'Ghost', 'Dark'])]

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
99,92,Gastly,Ghost,Poison,310,30,35,30,100,35,80,1,False
100,93,Haunter,Ghost,Poison,405,45,50,45,115,55,95,1,False
101,94,Gengar,Ghost,Poison,500,60,65,60,130,75,110,1,False
102,94,GengarMega Gengar,Ghost,Poison,600,60,65,80,170,95,130,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
791,715,Noivern,Flying,Dragon,535,85,70,80,97,80,123,6,False
793,717,Yveltal,Dark,Flying,680,126,131,95,131,98,99,6,True
794,718,Zygarde50% Forme,Dragon,Ground,600,108,100,121,81,95,95,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True


## 4. Categorical Exploration (9 pts)

Find the number of Pokémon per primary type.

In [679]:
df['Type 1'].value_counts()

Unnamed: 0_level_0,count
Type 1,Unnamed: 1_level_1
Water,112
Normal,98
Grass,70
Bug,69
Psychic,57
Fire,52
Rock,44
Electric,44
Ground,32
Ghost,32


Find the number of Pokémon per generation.

In [680]:
df['Generation'].value_counts()

Unnamed: 0_level_0,count
Generation,Unnamed: 1_level_1
1,166
5,165
3,160
4,121
2,106
6,82


Which type appears the most? Which appears the least?

In [681]:
all_type_counts = df['Type 1'].value_counts().add(df['Type 2'].value_counts(), fill_value=0)
most_common_overall_type = all_type_counts.idxmax()
least_common_overall_type = all_type_counts.idxmin()

print(f"The most common type is: {most_common_overall_type} ({int(all_type_counts.max())} Pokémon)")
print(f"The least common type is: {least_common_overall_type} ({int(all_type_counts.min())} Pokémon)")

print("\nAll types by total count (top 10):\n", all_type_counts.nlargest(10))
print("\nAll types by total count (bottom 5):\n", all_type_counts.nsmallest(5))

The most common type is: Water (126 Pokémon)
The least common type is: Ice (38 Pokémon)

All types by total count (top 10):
 Water      126
Normal     102
Flying     101
Grass       95
Psychic     90
Bug         72
Ground      67
Fire        64
Poison      62
Rock        58
Name: count, dtype: int64

All types by total count (bottom 5):
 Ice       38
Fairy     40
Ghost     46
Steel     49
Dragon    50
Name: count, dtype: int64
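
An alternative way to get the combined counts, sketched here as an option rather than as the notebook's method, is to concatenate the two type columns into one Series and count once. `value_counts` drops missing values by default, so single-type Pokémon contribute only their primary type. The frame `sample` below is a stand-in, not the full dataset.

```python
import pandas as pd

# Stand-in frame; type pairs follow rows shown in the outputs above.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard", "Squirtle", "BeedrillMega Beedrill"],
    "Type 1": ["Grass", "Fire", "Fire", "Water", "Bug"],
    "Type 2": ["Poison", None, "Flying", None, "Poison"],
})

# Stack both columns into one Series, then count each type once per appearance.
all_types = pd.concat([sample["Type 1"], sample["Type 2"]], ignore_index=True)
combined_counts = all_types.value_counts()

print(combined_counts)
```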


In [682]:
# TYPE 1 ONLY

type_counts = df['Type 1'].value_counts()
most_common_type = type_counts.idxmax()
least_common_type = type_counts.idxmin()

print(f"most common type 1 is: {most_common_type} ({type_counts.max()} Pokémon)")
print(f"least common type 1 is: {least_common_type} ({type_counts.min()} Pokémon)")

most common type 1 is: Water (112 Pokémon)
least common type 1 is: Flying (4 Pokémon)


In [683]:
# TYPE 2 ONLY

type_counts = df['Type 2'].value_counts()
most_common_type = type_counts.idxmax()
least_common_type = type_counts.idxmin()

print(f"most common type 2 is: {most_common_type} ({type_counts.max()} Pokémon)")
print(f"least common type 2 is: {least_common_type} ({type_counts.min()} Pokémon)")

most common type 2 is: Flying (97 Pokémon)
least common type 2 is: Bug (3 Pokémon)


How many unique primary types (Type 1) exist?

In [684]:
print(f"no. of unique type 1: {df['Type 1'].nunique()}")

no. of unique type 1: 18


How many unique secondary types (Type 2) exist?

In [685]:
print(f"no. of unique type 2: {df['Type 2'].nunique()}")

no. of unique type 2: 18



Which primary types have the most dual-type combinations?

In [686]:
dual_type_counts = df.groupby('Type 1')['Type 2'].nunique().sort_values(ascending=False)
most_dual_type = dual_type_counts.index[0]
num_combinations = dual_type_counts.iloc[0]

print(f"The primary type with the most dual-type combinations is '{most_dual_type}' with {num_combinations} unique secondary types (Pokémon with no Type 2 are not counted).")
print("\nUnique secondary types per primary type (Pokémon with no Type 2 are not counted):\n", dual_type_counts)

The primary type with the most dual-type combinations is 'Water' with 14 unique secondary types (Pokémon with no Type 2 are not counted).

Unique secondary types per primary type (Pokémon with no Type 2 are not counted):
 Type 1
Water       14
Rock        12
Bug         11
Electric    10
Grass       10
Fire         9
Ground       9
Dark         8
Steel        8
Dragon       7
Normal       7
Poison       7
Psychic      7
Ghost        6
Ice          5
Fighting     4
Flying       1
Fairy        1
Name: Type 2, dtype: int64
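
Single-type Pokémon drop out of these counts because `nunique` ignores missing values by default (`dropna=True`). The sketch below, on a stand-in frame with illustrative names, shows the difference when missing values are counted as their own category.

```python
import pandas as pd

# Stand-in frame; type pairs follow rows shown in the outputs above.
sample = pd.DataFrame({
    "Name": ["Charmander", "Charizard", "CharizardMega Charizard X",
             "CharizardMega Charizard Y", "Squirtle", "Clauncher"],
    "Type 1": ["Fire", "Fire", "Fire", "Fire", "Water", "Water"],
    "Type 2": [None, "Flying", "Dragon", "Flying", None, None],
})

# Default: missing Type 2 values are not counted.
secondary_types = sample.groupby("Type 1")["Type 2"].nunique()

# dropna=False counts "no secondary type" as one extra category.
secondary_types_with_none = sample.groupby("Type 1")["Type 2"].nunique(dropna=False)

print(secondary_types)
print(secondary_types_with_none)
```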


Which type has the highest mean Attack?

In [687]:
mean_attack_by_type = df.groupby('Type 1')['Attack'].mean()
highest_attack_type = mean_attack_by_type.idxmax()
highest_mean_attack = mean_attack_by_type.max()

print(f"The type with the highest mean Attack is '{highest_attack_type}' with an average Attack of {highest_mean_attack:.2f}.")

The type with the highest mean Attack is 'Dragon' with an average Attack of 112.12.


Which type has the lowest mean Defense?

In [688]:
mean_defense_by_type = df.groupby('Type 1')['Defense'].mean()
lowest_defense_type = mean_defense_by_type.idxmin()
lowest_mean_defense = mean_defense_by_type.min()

print(f"lowest mean Defense: '{lowest_defense_type}' with an average Defense of {lowest_mean_defense:.2f}.")

lowest mean Defense: 'Normal' with an average Defense of 59.85.



Which generation has the highest average Speed?

In [689]:
mean_speed_by_generation = df.groupby('Generation')['Speed'].mean()
highest_speed_generation = mean_speed_by_generation.idxmax()
highest_mean_speed = mean_speed_by_generation.max()

print(f"highest average Speed: Generation {highest_speed_generation} with an average Speed of {highest_mean_speed:.2f}.")

highest average Speed: Generation 1 with an average Speed of 72.58.


## 5. Groupby & Aggregation (13 pts)

Compute the average Attack per primary type.

In [690]:
df.groupby('Type 1')['Attack'].mean().sort_values(ascending=False)

Unnamed: 0_level_0,Attack
Type 1,Unnamed: 1_level_1
Dragon,112.125
Fighting,96.777778
Ground,95.75
Rock,92.863636
Steel,92.703704
Dark,88.387097
Fire,84.769231
Flying,78.75
Poison,74.678571
Water,74.151786


Compute the maximum HP per generation.


In [691]:
df.groupby('Generation')['HP'].max()

Unnamed: 0_level_0,HP
Generation,Unnamed: 1_level_1
1,250
2,255
3,170
4,150
5,165
6,126


Compute the total number of Pokémon per primary type.

In [692]:
df['Type 1'].value_counts()

Unnamed: 0_level_0,count
Type 1,Unnamed: 1_level_1
Water,112
Normal,98
Grass,70
Bug,69
Psychic,57
Fire,52
Rock,44
Electric,44
Ground,32
Ghost,32


For each type, compute:

- mean Attack

- mean Defense

- mean Speed

In [693]:
df.groupby('Type 1')[['Attack', 'Defense', 'Speed']].mean()

Unnamed: 0_level_0,Attack,Defense,Speed
Type 1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Bug,70.971014,70.724638,61.681159
Dark,88.387097,70.225806,76.16129
Dragon,112.125,86.375,83.03125
Electric,69.090909,66.295455,84.5
Fairy,61.529412,65.705882,48.588235
Fighting,96.777778,65.925926,66.074074
Fire,84.769231,67.769231,74.442308
Flying,78.75,66.25,102.5
Ghost,73.78125,81.1875,64.34375
Grass,73.214286,70.8,61.928571


For each generation, compute:

- count

- mean Total

- number of Legendary Pokémon (hint: use sum on Boolean)

In [694]:
df.groupby('Generation').agg(
    count=('Name', 'size'),
    mean_total=('Total', 'mean'),
    num_legendary=('Legendary', 'sum')
)

Unnamed: 0_level_0,count,mean_total,num_legendary
Generation,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,166,426.813253,6
2,106,418.283019,5
3,160,436.225,18
4,121,459.016529,13
5,165,434.987879,15
6,82,436.378049,8


Which type combination (Type 1 + Type 2) has the highest average Attack?

In [695]:
df['Type_Combination'] = df['Type 1'] + ' - ' + df['Type 2'].fillna('None')
mean_attack_by_combination = df.groupby('Type_Combination')['Attack'].mean()
highest_attack_combination = mean_attack_by_combination.idxmax()
highest_mean_attack_comb = mean_attack_by_combination.max()

print(f"type combination with highest average Attack: '{highest_attack_combination}' with an average Attack of {highest_mean_attack_comb:.2f}.")
print("\nAll type combinations by average Attack (top 10):\n", mean_attack_by_combination.nlargest(10))

type combination with highest average Attack: 'Ground - Fire' with an average Attack of 180.00.

All type combinations by average Attack (top 10):
 Type_Combination
Ground - Fire         180.000000
Psychic - Dark        160.000000
Psychic - Fighting    160.000000
Bug - Fighting        155.000000
Dragon - Electric     150.000000
Rock - Dark           149.000000
Dragon - Ice          140.000000
Dragon - Flying       135.666667
Ground - Steel        135.000000
Normal - Fighting     132.000000
Name: Attack, dtype: float64
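
Round values such as 180.0 and 160.0 at the top of this ranking suggest groups with very few members, and the only Ground - Fire row shown in the earlier outputs is a single Pokémon. A group mean based on one row is just that row's value. One way to make the ranking more robust, sketched here under the assumption of an arbitrary threshold (`min_group_size` is hypothetical and not part of the exercise), is to compute the group size alongside the mean and filter on it.

```python
import pandas as pd

# Stand-in frame built from rows shown in the outputs above; not the full dataset.
sample = pd.DataFrame({
    "Type 1": ["Grass"] * 4 + ["Fire"] * 5 + ["Ground"],
    "Type 2": ["Poison"] * 4 + ["Flying", "Flying", "Flying", "Flying", "Dragon"] + ["Fire"],
    "Attack": [49, 62, 82, 100, 84, 104, 100, 130, 130, 180],
})

combo = sample["Type 1"] + " - " + sample["Type 2"].fillna("None")

# Mean Attack and number of rows for each type combination.
stats = sample.groupby(combo)["Attack"].agg(["mean", "count"])

min_group_size = 2  # assumption: arbitrary threshold chosen for illustration
reliable = stats[stats["count"] >= min_group_size].sort_values("mean", ascending=False)

print(reliable)
```

In this stand-in data the single-row combinations (Ground - Fire and Fire - Dragon) are filtered out, leaving Fire - Flying at the top.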


Which generation has the highest proportion of Legendary Pokémon?

In [696]:
gen_summary = df.groupby('Generation').agg(
    total_pokemon=('Name', 'size'),
    legendary_pokemon=('Legendary', 'sum')
)
gen_summary['proportion_legendary'] = gen_summary['legendary_pokemon'] / gen_summary['total_pokemon']
highest_prop_gen = gen_summary['proportion_legendary'].idxmax()
highest_prop_value = gen_summary['proportion_legendary'].max()

print(f"The generation with the highest proportion of Legendary Pokémon is Generation {highest_prop_gen} with a proportion of {highest_prop_value:.2f}.")
print("\nProportion of Legendary Pokémon per Generation:\n", gen_summary['proportion_legendary'].sort_values(ascending=False))

The generation with the highest proportion of Legendary Pokémon is Generation 3 with a proportion of 0.11.

Proportion of Legendary Pokémon per Generation:
 Generation
3    0.112500
4    0.107438
6    0.097561
5    0.090909
2    0.047170
1    0.036145
Name: proportion_legendary, dtype: float64
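
Because `True` counts as 1 and `False` as 0, the mean of a Boolean column is already the proportion of `True` values, so the same result can be obtained in one step. The sketch below rebuilds a stand-in frame from the per-generation counts reported in the earlier aggregation output; the names `counts`, `sample`, and `proportion` are illustrative.

```python
import pandas as pd

# (generation, total count, Legendary count) from the per-generation output above.
counts = [(1, 166, 6), (2, 106, 5), (3, 160, 18), (4, 121, 13), (5, 165, 15), (6, 82, 8)]

# Rebuild one row per Pokémon with only the two columns needed here.
rows = []
for generation, total, legendary in counts:
    rows += [(generation, True)] * legendary
    rows += [(generation, False)] * (total - legendary)
sample = pd.DataFrame(rows, columns=["Generation", "Legendary"])

# Mean of a Boolean column equals the share of True values.
proportion = sample.groupby("Generation")["Legendary"].mean()

print(proportion.sort_values(ascending=False))
```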



Which primary type has the largest variance in HP?

In [697]:
hp_variance_by_type = df.groupby('Type 1')['HP'].var()
largest_hp_variance_type = hp_variance_by_type.idxmax()
largest_hp_variance_value = hp_variance_by_type.max()

print(f"The primary type with the largest variance in HP is '{largest_hp_variance_type}' with a variance of {largest_hp_variance_value:.2f}.")
print("\nHP Variance per Primary Type:\n", hp_variance_by_type.sort_values(ascending=False))

The primary type with the largest variance in HP is 'Normal' with a variance of 1312.86.

HP Variance per Primary Type:
 Type 1
Normal      1312.861456
Ghost       1003.995968
Psychic      807.772556
Water        755.536599
Fighting     668.361823
Ground       658.563508
Dragon       566.221774
Fairy        556.360294
Ice          453.130435
Dark         444.294624
Rock         434.050740
Flying       428.250000
Poison       386.712963
Grass        380.896273
Fire         376.519985
Electric     299.515328
Bug          266.633419
Steel        257.410256
Name: HP, dtype: float64


Which primary type has the highest median Speed?

In [698]:
median_speed_by_type = df.groupby('Type 1')['Speed'].median()
highest_median_speed_type = median_speed_by_type.idxmax()
highest_median_speed_value = median_speed_by_type.max()

print(f"type 1 with the highest median Speed is '{highest_median_speed_type}' with a median Speed of {highest_median_speed_value:.2f}.")
print("\nMedian Speed per Primary Type:\n", median_speed_by_type.sort_values(ascending=False))

type 1 with the highest median Speed is 'Flying' with a median Speed of 116.00.

Median Speed per Primary Type:
 Type 1
Flying      116.0
Dragon       90.0
Electric     88.0
Psychic      80.0
Fire         78.5
Normal       71.0
Dark         70.0
Ground       65.0
Water        65.0
Poison       62.5
Ice          62.0
Ghost        60.5
Bug          60.0
Fighting     60.0
Grass        58.5
Rock         50.0
Steel        50.0
Fairy        45.0
Name: Speed, dtype: float64


Group Pokémon by whether they are Legendary or not. Compare:

- mean Total

- mean Attack

- mean Defense

- mean Speed

In [699]:
df.groupby('Legendary')[['Total', 'Attack', 'Defense', 'Speed']].mean()

Unnamed: 0_level_0,Total,Attack,Defense,Speed
Legendary,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
False,417.213605,75.669388,71.559184,65.455782
True,637.384615,116.676923,99.661538,100.184615


Show the top 5 strongest types by mean Total.

In [700]:
df.groupby('Type 1')['Total'].mean().nlargest(5)

Unnamed: 0_level_0,Total
Type 1,Unnamed: 1_level_1
Dragon,550.53125
Steel,487.703704
Flying,485.0
Psychic,475.947368
Fire,458.076923


Rank generations by their average Attack.

In [701]:
df.groupby('Generation')['Attack'].mean().sort_values(ascending=False)

Unnamed: 0_level_0,Attack
Generation,Unnamed: 1_level_1
4,82.867769
5,82.066667
3,81.625
1,76.638554
6,75.804878
2,72.028302


Show the top 10 fastest Pokémon using `nlargest(10, "Speed")`.

In [702]:
df.nlargest(10, 'Speed')

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Type_Combination
431,386,DeoxysSpeed Forme,Psychic,,600,50,95,90,95,90,180,3,True,Psychic - None
315,291,Ninjask,Bug,Flying,456,61,90,45,50,50,160,3,False,Bug - Flying
71,65,AlakazamMega Alakazam,Psychic,,590,55,50,65,175,95,150,1,False,Psychic - None
154,142,AerodactylMega Aerodactyl,Rock,Flying,615,80,135,85,70,95,150,1,False,Rock - Flying
428,386,DeoxysNormal Forme,Psychic,,600,50,150,50,150,50,150,3,True,Psychic - None
429,386,DeoxysAttack Forme,Psychic,,600,50,180,20,180,20,150,3,True,Psychic - None
19,15,BeedrillMega Beedrill,Bug,Poison,495,65,150,40,15,80,145,1,False,Bug - Poison
275,254,SceptileMega Sceptile,Grass,Dragon,630,70,110,75,145,85,145,3,False,Grass - Dragon
678,617,Accelgor,Bug,,495,80,70,40,100,60,145,5,False,Bug - None
109,101,Electrode,Electric,,480,60,50,70,80,80,140,1,False,Electric - None
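
Note that the cutoff falls on a tie: Electrode has Speed 140, and so does MewtwoMega Mewtwo Y in the Generation 1 Legendary output earlier. By default `nlargest` uses `keep="first"` and returns exactly `n` rows, so later rows tied with the last one are dropped. The sketch below uses a stand-in frame with only the rows named in this notebook; the full dataset may contain further ties.

```python
import pandas as pd

# Stand-in frame: the ten rows from the output above plus one tied row.
sample = pd.DataFrame({
    "Name": ["DeoxysSpeed Forme", "Ninjask", "AlakazamMega Alakazam",
             "AerodactylMega Aerodactyl", "DeoxysNormal Forme", "DeoxysAttack Forme",
             "BeedrillMega Beedrill", "SceptileMega Sceptile", "Accelgor",
             "Electrode", "MewtwoMega Mewtwo Y"],
    "Speed": [180, 160, 150, 150, 150, 150, 145, 145, 145, 140, 140],
})

# Default keep="first": exactly 10 rows, ties at the cutoff broken by row order.
top10 = sample.nlargest(10, "Speed")

# keep="all": every row tied with the 10th value is included.
top10_with_ties = sample.nlargest(10, "Speed", keep="all")

print(len(top10), len(top10_with_ties))
```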
