In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

In this file, we'll look at the statistics of pokemon and make inferences about the following:
1. Which type is the most common? Which is the least common?
2. Which type is great offensively: Attack and Special Attack?
3. Which type is great defensively: Defense and Special Defense?
4. Which type has great speed?
5. Which is the most common secondary typing?
6. Which typing combination is the most common?

Because I've not played any of the gen 6 or 7 games, I'm going to restrict this analysis to just the first 5 generations. In the second edition of this analysis, I'll look exclusively at the gen 6 and 7 games, so that I can be ready with a strategy for gen 8. Muwhahahahah....!

Here's the workflow:
1. Import data
2. Select required subset of data.
3. Answer each question
Simple. Understandable. 

In [None]:
# import data
pokemon = pd.read_csv("../input/pokemon.csv", sep = ",", encoding = "ISO-8859-1")

In [None]:
# basic information about the data
pokemon.info()

Looks like we need to rename a variable and treat some missing values. Let's do that. 

In [None]:
# renaming the "#" column to "Id"; the index is left untouched
pokemon.rename(columns = {"#": "Id"}, inplace = True)

There's a missing value in the name field. Let's identify where this value is.

In [None]:
# Id of the missing name
pokemon['Id'][pokemon["Name"].isnull()]

Id is 63. This list is sorted in the order of the pokedex number. So, if we look at the pokemon above and below it, we should be able to identify which pokemon this is.

In [None]:
# rows 62 to 64, i.e. the pokemon just above and just below the missing name
pokemon.loc[pokemon["Id"].isin(list(range(62, 65))), :]
# the pokemon is primeape, mankey's evolved form

In [None]:
# treating the missing value in the Name field.
pokemon.loc[pokemon.Name.isnull(), "Name"] = "Primeape"
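
The same locate-and-fill pattern can be tried on a tiny frame before touching the real data. This is only a sketch: the `toy` frame and its three rows are typed in by hand for illustration and are not read from pokemon.csv.

```python
import numpy as np
import pandas as pd

# toy data: a hand-typed slice with one missing name (not the real file)
toy = pd.DataFrame({
    "Id": [62, 63, 64],
    "Name": ["Mankey", np.nan, "Growlithe"],
})

# boolean mask marking the rows where Name is missing
missing = toy["Name"].isnull()

# Ids of the rows with a missing name
missing_ids = toy.loc[missing, "Id"].tolist()

# fill the gap, then confirm nothing is left missing
toy.loc[missing, "Name"] = "Primeape"
remaining = int(toy["Name"].isnull().sum())
print(missing_ids, remaining)
```

Checking the null count again after the fill is a cheap way to confirm the assignment hit the row it was meant to.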

The next bit is pretty important. I've not played any of the gen 6 or 7 games. So, I'm not really interested in the battles between pokemon from those generations. (Well, I can't wait for the gen 8 games to come out. I'm waiting to get my hands on a Switch and start playing).

So, I want to restrict my analysis to those pokemon in gen 1 to gen 5. Let's filter out those pokemon.

To go along with that, I'm going to filter out any legendary pokemon too. I mean, they're cool and all that. The core series games wouldn't be as cool as they are without them. But, in battles, legendary pokemon are a letdown. Their stats are all too high (guess why they're called legendary...) and that makes the battles boring.

In this analysis, I don't want to consider any legendary pokemon.

The next thing is removing traits of gen 6. There were two key changes introduced in gen 6: the addition of a new type (fairy) and the introduction of Mega evolutions.

The fairy type was introduced to balance out the ultra powerful dragon type pokemon, and the mega evolutions gave a stat boost of almost 100 points to the pokemon that got the ability. As amazing as they are, they're pretty useless when it comes to this analysis. They just gotta go.

Here are the changes that I want to make:
1. Filter out gen 6.
2. Filter out the legendary pokemon.
3. Change the Fairy type of the pokemon introduced before gen 6 back to Normal as a primary type, and drop it as a secondary type.
4. Filter out the mega evolutions.

Let's get started.

In [None]:
# filtering out the gen 6 and 7 pokemon
analysis_set = pokemon[pokemon["Generation"] < 6]

In [None]:
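# filtering out the legendary pokemon; .copy() so that the type edits below work on an independent frame
analysis_set = analysis_set.loc[analysis_set["Legendary"] == False, :].copy()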

In [None]:
# changing the type fairy to normal
analysis_set.loc[analysis_set["Type 1"] == "Fairy", "Type 1"] = "Normal"
analysis_set.loc[analysis_set["Type 2"] == "Fairy", "Type 2"] = np.nan

In [None]:
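# filtering out the mega evolutions
# matching "Mega " with the trailing space keeps Meganium in; this assumes mega forms are named like "Mega Venusaur"
analysis_set = analysis_set[~analysis_set.Name.str.contains("Mega ", regex = False)]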
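
One thing to watch with substring filters like this: a plain "Mega" match also hits Meganium, which is a regular gen 2 pokemon. Here's a small sketch of the difference. The four names are typed in by hand for illustration, and the "Mega X" naming pattern is an assumption about how the file spells mega forms.

```python
import pandas as pd

# toy names for illustration only; the real file may spell mega forms differently
names = pd.Series(["Meganium", "Mega Venusaur", "Venusaur", "Yanmega"])

# a plain substring match also flags Meganium
loose = names.str.contains("Mega", regex=False)

# requiring the trailing space only flags names written like "Mega X"
strict = names.str.contains("Mega ", regex=False)

print(names[loose].tolist())
print(names[strict].tolist())
```

It's worth eyeballing the rows a filter removes before throwing them away.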

Now that we have the data ready, we can proceed to the analysis.

# 1. Which type is the most common? Which is the least common?
For this, we can use the pokemon dataset. It's already loaded and filtered, so instead of importing it again for just one analysis, we can get the job done right here with analysis_set.

In [None]:
# since we'll be reusing this groupby, let's store it.
type1_grp = analysis_set.groupby("Type 1")

In [None]:
# most common and least common types
type1_grp["Id"].count().sort_values(ascending = False)

The most common types seem to be water and normal. This is just based on type 1 though. Flying is the least common.
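
Since the count above only looks at type 1, here's a rough sketch of how both type slots could be counted together. The `toy` frame is made up for illustration; on the real data the same two columns of analysis_set would be stacked instead.

```python
import numpy as np
import pandas as pd

# toy data, made up for illustration
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Type 1": ["Normal", "Water", "Bug", "Normal"],
    "Type 2": ["Flying", np.nan, "Flying", np.nan],
})

# stack both type columns into one Series; missing Type 2 values are dropped
all_types = pd.concat([toy["Type 1"], toy["Type 2"]]).dropna()

# count how often each type appears in either slot
any_slot_counts = all_types.value_counts()
print(any_slot_counts)
```

Counted this way, a type that mostly shows up as a secondary type (like flying) can look a lot more common than the type 1 count suggests.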

# 2. Which type is great offensively: Attack and Special Attack?
Let's answer this in two parts, looking for the type with the greatest overall stat for both attack and special attack.

First it's attack.

In [None]:
# Attack, sorted by the mean first like the other stats
type1_grp["Attack"].agg(["min", "mean", "max"]).sort_values(by = ["mean", "max"], ascending = False)

That dragon pokemon with that monstrous attack, it must be Haxorus. If you don't know what that is, go take a look at it.

The type with the highest attack stat is rock. This is pretty intuitive as the type is super effective against four other types: bug, ice, fire and flying.
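
Rather than guessing which pokemon sits behind a max value, it can be looked up. A minimal sketch using idxmax on a grouped column. The names and numbers in `toy` are invented placeholders, not rows from the dataset.

```python
import pandas as pd

# toy data: names and numbers are invented
toy = pd.DataFrame({
    "Name": ["Alpha", "Beta", "Gamma", "Delta"],
    "Type 1": ["Dragon", "Dragon", "Rock", "Rock"],
    "Attack": [64, 147, 80, 165],
})

# index label of the row with the highest Attack inside each type
top_idx = toy.groupby("Type 1")["Attack"].idxmax()

# pull those rows to see the names behind the max column
top_attackers = toy.loc[top_idx, ["Type 1", "Name", "Attack"]]
print(top_attackers)
```

On the real data, the same idea would be something like analysis_set.loc[type1_grp["Attack"].idxmax()].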

In [None]:
# Special Attack
type1_grp["Sp. Atk"].agg(["min", "mean", "max"]).sort_values(["mean", "max"], ascending = False)

As expected, this list is topped by psychic: a super powerful type that has a lot of special moves. The second and third places are no surprise either, as all three types are known for their speed and amazing special attack stats.

# 3. Which type is the best defensively?
Here again, we'll look at the defense and special defense stats separately. Predictions: steel is the top for defense and special defense.

In [None]:
# defense
type1_grp["Defense"].agg(["min", "mean", "max"]).sort_values(["mean", "max"], ascending = False)

In [None]:
# special defense
type1_grp["Sp. Def"].agg(["min", "mean", "max"]).sort_values(["mean", "max"], ascending = False)

Our prediction was spot on for defense and off by 2 places for special defense. Turns out that the psychic type is pretty powerful defensively too.

That monster stat of 230 in both lists belongs to a bug pokemon called Shuckle. It's gotta be super awesome, right?

Nope. Here's why...

In [None]:
# shuckle
analysis_set.loc[analysis_set["Name"] == "Shuckle", :]

Sadly, its other stats suck. It can do nothing with attack stats of 10 and a practically nonexistent speed.

# 4. Which type has great speed?

In [None]:
# fastest type
type1_grp["Speed"].agg(["min", "mean", "max"]).sort_values(["mean", "max"], ascending = False)

Yep! It's electric alright! The fastest pokemon, though, seems to be a bug type. Let's look at which pokemon that is...

In [None]:
# the fastest pokemon
analysis_set.loc[analysis_set["Speed"] == analysis_set["Speed"].max(), :]

This pokemon, the evolved form of Nincada, is wicked. To top off its already amazing speed stat, it has an ability called Speed Boost which raises its speed at the end of every turn. Give this pokemon a few Swords Dances and it can try to become a sweeper. Still, it's a pipe dream with those defense stats...
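
To see the speed and defense trade-off side by side without hard-coding a stat value, nlargest is handy. A quick sketch on invented rows (the names and numbers are placeholders):

```python
import pandas as pd

# toy data: invented names and stats
toy = pd.DataFrame({
    "Name": ["Alpha", "Beta", "Gamma", "Delta"],
    "Speed": [160, 45, 150, 5],
    "Defense": [45, 230, 60, 230],
})

# the two fastest rows, with their defense shown alongside
fastest = toy.nlargest(2, "Speed")[["Name", "Speed", "Defense"]]
print(fastest)
```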

# 5. Which is the most common secondary type?

In [None]:
# most common secondary type
analysis_set.groupby("Type 2")["Id"].count().sort_values(ascending = False) # Flying it is...
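
Keep in mind that grouping on Type 2 leaves out every pokemon with no secondary type at all. A small sketch of how those could be counted too, on a made-up `toy` frame:

```python
import numpy as np
import pandas as pd

# toy data, made up for illustration
toy = pd.DataFrame({
    "Type 1": ["Normal", "Water", "Bug", "Fire"],
    "Type 2": ["Flying", np.nan, "Flying", np.nan],
})

# value_counts(dropna=False) keeps the missing values as their own bucket
with_missing = toy["Type 2"].value_counts(dropna=False)

# number of single-typed rows
single_typed = int(toy["Type 2"].isnull().sum())
print(with_missing)
print(single_typed)
```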

# 6. Which is the most common typing?
Typing is the combination of the first and second types. Let's look at which is the most common typing. Prediction: normal and flying, or grass and poison.

In [None]:
# most common typing
analysis_set.groupby(["Type 1", "Type 2"])["Id"].count().sort_values(ascending = False)

Our prediction was right. It is normal and flying. 
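
One caveat: grouping on both type columns only counts dual-typed pokemon, because rows with a missing Type 2 are dropped from the groups. Here's a sketch of one way to keep single types in the ranking. The `toy` frame and the "None" placeholder label are my own choices for illustration.

```python
import numpy as np
import pandas as pd

# toy data, made up for illustration
toy = pd.DataFrame({
    "Type 1": ["Normal", "Normal", "Water", "Water", "Water"],
    "Type 2": ["Flying", "Flying", np.nan, np.nan, np.nan],
})

# groupby drops NaN keys by default, so the Water-only rows vanish here
dual_only = toy.groupby(["Type 1", "Type 2"]).size()

# filling the gap with a placeholder keeps single-typed rows in the count
typing = toy["Type 1"] + "/" + toy["Type 2"].fillna("None")
typing_counts = typing.value_counts()
print(dual_only)
print(typing_counts)
```

Whether single types should count as a "typing" is a judgment call, but it's good to know which of the two the ranking is showing.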

The electric type's only weakness is ground. Let's see which pokemon has the ground and electric typing...

In [None]:
# the ground-electric pokemon
analysis_set[(analysis_set["Type 1"] == "Ground") & (analysis_set["Type 2"] == "Electric")]

Stunfisk. Cool. Even though the typing sounds pretty cool, it leaves Stunfisk weak to water, ice, grass and ground, while it's immune to electric. Still, mediocre stats at best.
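
The matchups of a dual typing come from multiplying the two single-type multipliers, which is easy to double-check in a few lines. This is only a sketch: the two dictionaries are hand-typed from the standard type chart and list just the non-neutral matchups, and the `combined` helper is a made-up name, not part of any library.

```python
# defensive multipliers for each type, listing only the non-neutral matchups
# (hand-typed from the standard type chart, so double-check before relying on them)
GROUND = {"Water": 2.0, "Grass": 2.0, "Ice": 2.0,
          "Poison": 0.5, "Rock": 0.5, "Electric": 0.0}
ELECTRIC = {"Ground": 2.0, "Electric": 0.5, "Flying": 0.5, "Steel": 0.5}


def combined(type_a, type_b):
    """Multiply the two charts; anything not listed counts as neutral (1.0)."""
    attackers = set(type_a) | set(type_b)
    return {atk: type_a.get(atk, 1.0) * type_b.get(atk, 1.0) for atk in attackers}


matchups = combined(GROUND, ELECTRIC)
weaknesses = sorted(t for t, m in matchups.items() if m > 1)
immunities = sorted(t for t, m in matchups.items() if m == 0)
print(weaknesses, immunities)
```

The ground half cancels the electric weakness out entirely (0 times anything is 0), while the electric half adds a ground weakness that a pure ground type wouldn't have.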

This is the end of this kernel. It's not an exhaustive analysis, but it is a starting point. The main aim was nothing more than to practice pandas, which I'm pretty new to.

However, it's also a starting point for thinking about team building. If you're new to Pokemon (which is pretty surprising), this might give you some direction on how to build teams. I'll keep updating this kernel to include more team-building strategies. If this helped you, or you enjoyed reading it, awesome! I'm glad you liked it. Thanks!