# Pokémon Generations Analysis

[Pokémon](https://en.wikipedia.org/wiki/Pok%C3%A9mon) is a Japanese multimedia franchise that started as a video game for the original Game Boy in 1996. The following report uses data made available by [Keith Galli](https://github.com/KeithGalli) that contains information about all pokémon from the first six generations. This report aims to answer the following questions:

* What are the most common and the rarest pokémon types in each generation?
* Which are the strongest and weakest pokémon generations?
* Which are the strongest and weakest pokémon in each generation?
* Can pokémon be grouped according to their characteristics?



## Setting up the data

We start the analysis by loading the data and getting some general information about it.


```python
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("https://raw.githubusercontent.com/KeithGalli/pandas/master/pokemon_data.csv")

df.describe(include='all')
```

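|  | # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 800.0 | 800 | 800 | 414 | 800.0 | 800.0 | 800.0 | 800.0 | 800.0 | 800.0 | 800.0 | 800 |
| unique |  | 800 | 18 | 18 |  |  |  |  |  |  |  | 2 |
| top |  | Quilava | Water | Flying |  |  |  |  |  |  |  | False |
| freq |  | 1 | 112 | 97 |  |  |  |  |  |  |  | 735 |
| mean | 362.81375 |  |  |  | 69.25875 | 79.00125 | 73.8425 | 72.82 | 71.9025 | 68.2775 | 3.32375 |  |
| std | 208.343798 |  |  |  | 25.534669 | 32.457366 | 31.183501 | 32.722294 | 27.828916 | 29.060474 | 1.66129 |  |
| min | 1.0 |  |  |  | 1.0 | 5.0 | 5.0 | 10.0 | 20.0 | 5.0 | 1.0 |  |
| 25% | 184.75 |  |  |  | 50.0 | 55.0 | 50.0 | 49.75 | 50.0 | 45.0 | 2.0 |  |
| 50% | 364.5 |  |  |  | 65.0 | 75.0 | 70.0 | 65.0 | 70.0 | 65.0 | 3.0 |  |
| 75% | 539.25 |  |  |  | 80.0 | 100.0 | 90.0 | 95.0 | 90.0 | 90.0 | 5.0 |  |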


The data has 800 rows and 12 columns with self-descriptive names. The **Name**, **Type 1** and **Type 2** columns contain text values, the **Legendary** column contains boolean values, and the remaining columns are numerical. Only 414 of the 800 rows have a **Type 2** value. Let us check the first rows of the dataframe.

```python
df.head()
```

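|  | # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire |  | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |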
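The column kinds listed before the table can also be checked programmatically. The sketch below rebuilds the five rows printed by `df.head()` as a small frame (an assumption made only so the snippet runs offline) and inspects each column's dtype and missing values; on the full data the same two calls would be applied to `df` directly.

```python
import numpy as np
import pandas as pd

# The five rows printed by df.head() above, rebuilt offline for illustration
sample = pd.DataFrame(
    {
        "#": [1, 2, 3, 3, 4],
        "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
        "Type 1": ["Grass", "Grass", "Grass", "Grass", "Fire"],
        "Type 2": ["Poison", "Poison", "Poison", "Poison", np.nan],
        "HP": [45, 60, 80, 80, 39],
        "Attack": [49, 62, 82, 100, 52],
        "Defense": [49, 63, 83, 123, 43],
        "Sp. Atk": [65, 80, 100, 122, 60],
        "Sp. Def": [65, 80, 100, 120, 50],
        "Speed": [45, 60, 80, 80, 65],
        "Generation": [1, 1, 1, 1, 1],
        "Legendary": [False, False, False, False, False],
    }
)

# Text columns show as object (or a string dtype in recent pandas),
# Legendary as bool and the stats as int64
print(sample.dtypes)

# Type 2 is the only column with gaps: single-type pokémon have no secondary type
print(sample.isna().sum())
```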


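The **#** column holds each pokémon's Pokédex number. The data also includes *Mega* forms, which are stronger versions of specific pokémon and share their Pokédex number (Venusaur and VenusaurMega Venusaur are both number 3). We remove them from the data. Every Mega form has a space right after "Mega" in its name, so we filter on `"Mega "` rather than `"Mega"`, which would also drop Meganium.

```python
# Drop the Mega forms; the trailing space keeps Meganium in the data
df = df[~df['Name'].str.contains("Mega ")].reset_index(drop=True)
df.head()
```

|  | # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 4 | Charmander | Fire |  | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 4 | 5 | Charmeleon | Fire |  | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |


Now that the *Mega* forms are gone, let us see how many pokémon are left and save the tidy data to a CSV file.

```python
print("Number of pokémon:", df['Name'].nunique())
# index=False avoids writing the row index as an extra unnamed column
df.to_csv("pokemon.csv", index=False)
```

```
Number of pokémon: 752
```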
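`Series.str.contains` performs a case-sensitive substring (regular expression) search by default, so the pattern `"Mega"` also matches Meganium, a regular Generation 2 pokémon, whereas `"Mega "` with a trailing space only matches names built like `VenusaurMega Venusaur`. The hand-picked names below (in the dataset's naming style, for illustration only) show the difference.

```python
import pandas as pd

# Hand-picked names in the dataset's naming style (illustrative only)
names = pd.Series(
    ["Bulbasaur", "VenusaurMega Venusaur", "Meganium", "Yanmega", "CharizardMega Charizard X"]
)

naive = names.str.contains("Mega")    # plain substring match, case-sensitive
strict = names.str.contains("Mega ")  # Mega forms always have a space after "Mega"

print(names[naive].tolist())   # → ['VenusaurMega Venusaur', 'Meganium', 'CharizardMega Charizard X']
print(names[strict].tolist())  # → ['VenusaurMega Venusaur', 'CharizardMega Charizard X']
```

Yanmega is kept by both patterns because its "mega" is lowercase.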

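By default, `DataFrame.to_csv` writes the row index as an extra first column, which comes back as an `Unnamed: 0` column when the file is read again. A round trip through an in-memory buffer shows the effect; the two-row frame is illustrative and not part of the dataset.

```python
import io

import pandas as pd

frame = pd.DataFrame({"Name": ["Bulbasaur", "Charmander"], "HP": [45, 39]})

# Default: the index is written as an extra, unnamed first column
with_index = pd.read_csv(io.StringIO(frame.to_csv()))
print(with_index.columns.tolist())  # → ['Unnamed: 0', 'Name', 'HP']

# index=False writes only the data columns, so the file reads back unchanged
without_index = pd.read_csv(io.StringIO(frame.to_csv(index=False)))
print(without_index.columns.tolist())  # → ['Name', 'HP']
```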

## Pokémon Types

Every pokémon has at least one and at most two types. Let us see how many distinct types there are across all generations.

```python
# Stack both type columns into a single Series
all_types = pd.concat([df['Type 1'], df['Type 2']])
# nunique() ignores missing values, while unique() keeps them
print("Number of distinct pokémon types:", all_types.nunique())
print("Pokémon types:", all_types.unique())
```

```
Number of distinct pokémon types: 18
Pokémon types: ['Grass' 'Fire' 'Water' 'Bug' 'Normal' 'Poison' 'Electric' 'Ground'
 'Fairy' 'Fighting' 'Psychic' 'Rock' 'Ghost' 'Ice' 'Dragon' 'Dark' 'Steel'
 'Flying' nan]
```


There are 18 distinct pokémon types. The `nan` value appears because not every pokémon has a second type. Let us count how many pokémon there are of each type.

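```python
# Assumption: a dual-type pokémon is counted under both of its types; groupby drops the NaN secondary types
df_pokemon_by_type = df.melt(value_vars=['Type 1', 'Type 2'], value_name='Type').groupby('Type').size().sort_values(ascending=False)
df_pokemon_by_type
```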
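The rule that every pokémon has one or two types can be turned into a quick sanity check. The sketch below rebuilds three rows from the outputs above (Bulbasaur, Charmander and Charmeleon) so it runs on its own; on the full data, `sample` would be replaced by `df`. It also counts how many pokémon have one type versus two.

```python
import numpy as np
import pandas as pd

# Three rows taken from the outputs above, rebuilt offline for illustration
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Charmander", "Charmeleon"],
        "Type 1": ["Grass", "Fire", "Fire"],
        "Type 2": ["Poison", np.nan, np.nan],
    }
)

# Every pokémon has a primary type
assert sample["Type 1"].notna().all()

# A secondary type, when present, differs from the primary one
has_second = sample["Type 2"].notna()
assert (sample.loc[has_second, "Type 2"] != sample.loc[has_second, "Type 1"]).all()

# Number of types per pokémon: 1 or 2
n_types = sample[["Type 1", "Type 2"]].notna().sum(axis=1)
print(n_types.value_counts().sort_index())
```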