# Visualization of Pokemon dataset using Pandas and Seaborn
![Banner](https://cdn.custom-cursor.com/collections/129/cover-pokemon-preview.png)

In [None]:
%pylab inline

In [None]:
import pandas as pd   
import seaborn as sns

In [None]:
plt.style.use('bmh')
plt.rcParams['figure.dpi'] = 100

Now we can load the dataset. Let's name the dataframe `pokedata` and take a look at the first and last 10 rows to get a general sense of the data.

In [None]:
pokedata = pd.read_csv('Pokemon_all.csv')

In [None]:
pokedata.head(10)   

In [None]:
pokedata.tail(10)

## Cleaning the dataset
If we look carefully at the rows shown above, we can see some problems in the dataset.

  - Some Pokemon have `NaN` values (null values) in the column `Type 2`
  - Some Pokemon have multiple forms and those forms are included in this dataset
  - In generation 7, the Pokemon type names don't start with a capital letter as they do in all earlier generations, so Pandas will count them as different types

We need to do some cleaning in the dataset before it is ready to use.
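Before cleaning, it can help to put numbers on these three problems. Here is a minimal sketch on a hypothetical toy dataframe (the rows below are made up for illustration and are not taken from `Pokemon_all.csv`); the same calls should work on `pokedata`, assuming it has the same column names.

```python
import pandas as pd

# Hypothetical toy rows that mimic the three problems; not the real dataset.
toy = pd.DataFrame({
    '#': [3, 3, 4, 722],
    'Name': ['Venusaur', 'Mega Venusaur', 'Charmander', 'Rowlet'],
    'Type 1': ['Grass', 'Grass', 'Fire', 'grass'],
    'Type 2': ['Poison', 'Poison', None, 'flying'],
})

# 1. Rows with a missing secondary type
missing_type2 = toy['Type 2'].isna().sum()

# 2. Rows that repeat an earlier '#' value (alternate forms)
repeated_rows = toy.duplicated('#').sum()

# 3. Type names that differ only by letter case
raw_types = toy['Type 1'].nunique()
normalized_types = toy['Type 1'].str.capitalize().nunique()

print(missing_type2, repeated_rows, raw_types, normalized_types)
```

If `raw_types` is larger than `normalized_types`, some types are being counted twice because of capitalization.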

Now let's capitalize only the first letter of the Pokemon type.

In [None]:
pokedata['Type 1'] = pokedata['Type 1'].str.capitalize()
pokedata['Type 2'] = pokedata['Type 2'].str.capitalize()
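To see what `str.capitalize()` actually does, here is a small sketch on a made-up series (the values are illustrative only). It uppercases the first letter, lowercases the rest, and leaves missing values untouched, so the `NaN` entries in `Type 2` survive this step.

```python
import numpy as np
import pandas as pd

# Made-up type names with mixed casing and one missing value
types = pd.Series(['grass', 'FIRE', 'Water', np.nan])

capitalized = types.str.capitalize()

# First letter uppercased, remaining letters lowercased, NaN kept as NaN
print(capitalized.tolist())
```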

Let's remove the duplicate Pokemon. The alternate forms share the same `#` value, so we keep only the first row for each `#`.

In [None]:
pokedata.drop_duplicates('#', keep='first', inplace=True)
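Note that `keep='first'` keeps whichever row appears first for each `#`, so this relies on the original form being listed before its alternate forms. A minimal sketch with hypothetical rows (the ordering here is an assumption about how the CSV is laid out):

```python
import pandas as pd

# Hypothetical rows: one Pokemon with two extra forms, plus another Pokemon
forms = pd.DataFrame({
    '#': [6, 6, 6, 7],
    'Name': ['Charizard', 'Mega Charizard X', 'Mega Charizard Y', 'Squirtle'],
})

# Keep only the first row seen for each '#'
deduped = forms.drop_duplicates(subset='#', keep='first')

print(deduped['Name'].tolist())
```

If the alternate forms came first in the file, sorting the dataframe before dropping duplicates would be needed.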

Some Pokemon don't have a secondary type, so they have `NaN` (null values) in the `Type 2` column. Let's fill in the null values in the `Type 2` column by replacing them with the string `None`.

In [None]:
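# assign the result back instead of using inplace=True on a column selection,
# which recent pandas versions warn about and may not apply to the dataframe
pokedata['Type 2'] = pokedata['Type 2'].fillna('None')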

Now set the `#` column as the index of `pokedata`.

In [None]:
pokedata.set_index('#', inplace=True)

Now let's take a look at the first and last 10 rows of the dataset one more time.

In [None]:
pokedata.head(10)  

In [None]:
pokedata.tail(10)

The data is much cleaner and ready to use, so we can move on to some analysis and visualization.

## Pokemon count in each generation
First, let's verify how many Pokemon there are in this dataset.

In [None]:
pokedata['Name'].count()

Now let's see how the Pokemon are distributed across generations.

In [None]:
sns.countplot(
    x='Generation', 
    data=pokedata,
);
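The bars are easy to compare, but reading exact numbers off a plot is awkward. As a sketch, the counts behind a plot like this can be printed with `value_counts()`; the toy `Generation` values below are made up, and on the real data the same call would be applied to `pokedata['Generation']`.

```python
import pandas as pd

# Made-up generation labels, for illustration only
toy = pd.DataFrame({'Generation': [1, 1, 1, 2, 2, 3]})

# Count rows per generation and order the result by generation number
per_generation = toy['Generation'].value_counts().sort_index()

print(per_generation)
```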

## Legendary Pokemon count
Legendary Pokemon are a group of incredibly rare and often very powerful Pokemon, generally featured prominently in the legends and myths of the Pokémon world.

We'll simplify the categorization and count the mythical Pokemon and the ultra beasts as legendary Pokemon. First, let's take a look at how rare they are, and then we can visualize the distribution between legendary and non-legendary Pokemon.

In [None]:
pokedata['Legendary'].value_counts()

Now let's see how they are distributed in each generation.

In [None]:
sns.countplot(
    x='Generation', 
    data=pokedata,
    hue='Legendary',
);

I initially thought that the number of legendary Pokemon would always correlate with the number of Pokemon in that generation, but it looks like that isn't the case. There doesn't seem to be any noticeable trend either.
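One way to check this beyond eyeballing the bars is to compute the share of legendary Pokemon per generation. This is a minimal sketch on made-up rows, assuming the `Legendary` column holds booleans (if it holds strings in the real CSV, it would need converting first).

```python
import pandas as pd

# Made-up rows; Legendary is assumed to be a boolean column
toy = pd.DataFrame({
    'Generation': [1, 1, 1, 1, 2, 2],
    'Legendary': [False, False, False, True, False, True],
})

grouped = toy.groupby('Generation')['Legendary']
summary = pd.DataFrame({
    'total': grouped.size(),     # Pokemon per generation
    'legendary': grouped.sum(),  # True counts as 1
    'share': grouped.mean(),     # fraction of legendary Pokemon
})

print(summary)
```

If the legendary count simply tracked the generation size, the `share` column would be roughly constant across generations.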

## Pokemon Type Distribution
There are 18 types of Pokemon in total as of generation 7. Some Pokemon have only one type, while others also have a secondary type. For example, Charmander is a Fire type, while Bulbasaur is both a Grass type and a Poison type.

First, let's take a look at all 18 types.

In [None]:
pokedata['Type 1'].unique()

Now let's see which primary and secondary types are the most common.

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
pokedata['Type 1'].value_counts().plot(kind='pie', autopct='%.1f%%', pctdistance=0.85, ax=ax1)
pokedata['Type 2'].value_counts().plot(kind='pie', autopct='%.1f%%', pctdistance=0.85, ax=ax2);

We can already see which types are the most and least common, but a pie chart is not ideal when there are this many slices, so let's use a bar plot instead.

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 7))
sns.countplot(
    y='Type 1',
    data=pokedata,
    order=pokedata['Type 1'].value_counts().index,
    color='green',
    ax=ax1,
).set_xlabel('# of Pokemon')

sns.countplot(
    y='Type 2',
    data=pokedata,
    order=pokedata['Type 2'].value_counts().index,
    color='purple',
    ax=ax2
).set_xlabel('# of Pokemon');

There is a lot of information that can be derived from the above charts. Some of the interesting things are:

  - Almost half of all Pokemon don't have a secondary type.
  - While Flying is the most common secondary type (ignoring `None`), it is the least common primary type. It kind of makes sense if you think about it: when you see Moltres, the first thing that comes to mind is Fire rather than Flying. Or when you see Dragonite, you'll always identify him as a Dragon-type creature rather than a Flying-type creature.
  - It's no surprise that Water, Normal, and Grass are the most common primary types, but I didn't expect Psychic-type Pokemon to be that common.
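Observations like these can also be checked numerically rather than read off the bars. Here is a rough sketch on a hypothetical toy dataframe (made-up rows, with `'None'` standing in for a missing secondary type as in the cleaning step) that computes the share of single-type Pokemon and puts the primary and secondary counts side by side.

```python
import pandas as pd

# Made-up rows; 'None' marks a Pokemon without a secondary type
toy = pd.DataFrame({
    'Type 1': ['Grass', 'Fire', 'Normal', 'Water'],
    'Type 2': ['Poison', 'None', 'Flying', 'None'],
})

# Fraction of Pokemon without a secondary type
single_type_share = (toy['Type 2'] == 'None').mean()

# Primary and secondary counts per type, aligned on the type name
counts = pd.DataFrame({
    'primary': toy['Type 1'].value_counts(),
    'secondary': toy['Type 2'].value_counts(),
}).fillna(0).astype(int)

print(single_type_share)
print(counts)
```

A type such as Flying that is common as a secondary type but rare as a primary type shows up as a row with a small `primary` value and a large `secondary` value.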

## Pokemon type combinations

We've already seen the most and least common types of Pokemon. It will also be interesting to see all the type combinations. Note that we will not include Pokemon that don't have a secondary type.

In [None]:
plt.subplots(figsize=(10, 10))
sns.heatmap(
    pokedata[pokedata['Type 2']!='None'].groupby(['Type 1', 'Type 2']).size().unstack(),
    linewidths=1,
    annot=True,
    cmap="Blues"
);
# workaround for a matplotlib bug (seen in 3.1.1) that cuts off the
# top/bottom rows of seaborn heatmaps; other versions may not need it
plt.ylim(*np.add(plt.ylim(), [0.5, -0.5]));
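The expression passed to the heatmap does a lot in one line, so here is a small sketch of what `groupby(...).size().unstack()` produces, using made-up rows. Each (`Type 1`, `Type 2`) pair is counted, and then `Type 2` is pivoted into columns; combinations that never occur become `NaN`, which is why some heatmap cells stay empty.

```python
import pandas as pd

# Made-up dual-type rows, for illustration only
toy = pd.DataFrame({
    'Type 1': ['Bug', 'Bug', 'Grass'],
    'Type 2': ['Flying', 'Poison', 'Poison'],
})

# One count per (Type 1, Type 2) pair, as a Series with a two-level index
pair_counts = toy.groupby(['Type 1', 'Type 2']).size()

# Pivot Type 2 into columns; missing combinations become NaN
matrix = pair_counts.unstack()

# Alternative: show missing combinations as 0 instead of NaN
matrix_filled = pair_counts.unstack(fill_value=0)

print(matrix)
```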

Normal/Flying, Grass/Poison, Bug/Flying and Bug/Poison are the top 4 combinations for dual-type Pokemon.
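Rather than scanning the heatmap for the darkest cells, the top combinations can be pulled out directly with `nlargest()`. This is a sketch on made-up rows, so the numbers below say nothing about the real dataset; on `pokedata` the same chain would be used with `nlargest(4)`.

```python
import pandas as pd

# Made-up rows; 'None' marks a Pokemon without a secondary type
toy = pd.DataFrame({
    'Type 1': ['Normal', 'Normal', 'Normal', 'Grass', 'Grass', 'Bug', 'Fire'],
    'Type 2': ['Flying', 'Flying', 'Flying', 'Poison', 'Poison', 'Flying', 'None'],
})

# Keep dual-type Pokemon only, count each pair, and take the two largest counts
dual = toy[toy['Type 2'] != 'None']
top_pairs = dual.groupby(['Type 1', 'Type 2']).size().nlargest(2)

print(top_pairs)
```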