In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.

Let's look first at the data given by the dataset.

In [None]:
pokeTable = pd.read_csv('../input/Pokemon.csv')
pokeTable.head(10)

Next, let's fix the table so that it looks clean by removing the unnecessary names before 'Mega' and by removing the '#' column.

In [None]:
pokeTable = pokeTable.set_index('Name')

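# regex = True is required on newer pandas versions, where str.replace treats the pattern as a literal string by default
pokeTable.index = pokeTable.index.str.replace(r".*(?=Mega)", "", regex = True)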

pokeTable.head(10)
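
To see what the `.*(?=Mega)` pattern does, here is a minimal sketch using Python's built-in `re` module on a few sample names. The names in the list are illustrative stand-ins for entries in the 'Name' column, not values read from the file.

```python
import re

# Illustrative names in the style of the 'Name' column (assumed, not loaded from the CSV)
names = ["VenusaurMega Venusaur", "CharizardMega Charizard X", "Bulbasaur"]

# ".*" greedily consumes text, and the lookahead "(?=Mega)" stops the match
# right before "Mega" without consuming it. Names that do not contain "Mega"
# produce no match and are left unchanged.
cleaned = [re.sub(r".*(?=Mega)", "", name) for name in names]

print(cleaned)  # ['Mega Venusaur', 'Mega Charizard X', 'Bulbasaur']
```

Because the lookahead does not consume "Mega", only the repeated base name in front of it is removed.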

In [None]:
pokeTable = pokeTable.drop(['#'], axis = 1)

pokeTable.head(10)

Now that the table looks clean, let's begin with our research questions.

> **Research Question 1:** *What is the most common type of Pokemon?*

In [None]:
# counts how many pokemon have each type, using the Type 1 ordering for both series
type1 = pokeTable['Type 1'].value_counts()
type2 = pokeTable['Type 2'].value_counts().reindex(type1.index, fill_value = 0)

cords = np.arange(len(type1))

# width of each bar
width = .45

# random colors (RGBA) for Type 1 & Type 2
clr1 = np.random.rand(4)
clr2 = np.random.rand(4)

# grouped bar graph
hndl = [patches.Patch(color = clr1, label = 'Type 1'), patches.Patch(color = clr2, label = 'Type 2')]
plt.bar(cords, type1, width, color = clr1)
plt.bar(cords + width, type2, width, color = clr2)
# set the tick positions first, then the labels, so each label sits between its pair of bars
plt.gca().set_xticks(cords + width / 2)
plt.gca().set_xticklabels(type1.index)
plt.xticks(rotation = 90)
# plt.legend takes the patches through the 'handles' keyword
plt.legend(handles = hndl)

Based on the graph above, Water is the most common primary type (Type 1), followed by Normal. Flying, on the other hand, is the most common secondary type (Type 2).
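
The same answer can be read off without a plot. The sketch below uses a small made-up table (the `sample` DataFrame is an assumption for illustration, not the real data) to show how `value_counts` and `idxmax` give the most common type directly.

```python
import pandas as pd

# Hypothetical miniature table; the real data comes from Pokemon.csv
sample = pd.DataFrame({
    "Type 1": ["Water", "Water", "Normal", "Fire", "Water", "Normal"],
    "Type 2": ["Flying", None, "Flying", "Flying", "Ice", None],
})

# value_counts tallies each type; missing Type 2 values are skipped by default
type1_counts = sample["Type 1"].value_counts()
type2_counts = sample["Type 2"].value_counts()

# idxmax returns the label with the highest count
most_common_type1 = type1_counts.idxmax()
most_common_type2 = type2_counts.idxmax()

print(most_common_type1, type1_counts.max())
print(most_common_type2, type2_counts.max())
```

Running the same two calls on `pokeTable` should reproduce the tallest bar of each color in the graph.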

> **Research Question 2:** *What does a statistical analysis of the Pokemon types show?*

In [None]:
# This boxplot shows the distribution (median, quartiles, and range) of each base stat across all pokemon
pokeTableBP = pokeTable.drop(['Generation','Total'], axis = 1)
sns.boxplot(data = pokeTableBP)
plt.ylim(0,300)
plt.show()

In the boxplot above, we can see the median, the quartiles, and the overall range of each base stat for all Pokemon present in the .csv data. Legendary is not treated as a stat here, since it only indicates whether or not a Pokemon is legendary.
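
As a rough guide to reading these plots, the sketch below computes the numbers a boxplot draws, assuming the common default where whiskers extend to the furthest points within 1.5 times the interquartile range. The `values` array is made up for illustration.

```python
import numpy as np

# Hypothetical stat values, not taken from the dataset
values = np.array([20, 45, 50, 55, 60, 65, 70, 80, 190])

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme values that are still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)
print(whisker_low, whisker_high, outliers)
```

This is why the top whisker of a box is not always the true maximum: extreme values show up as separate points instead.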

In [None]:
# The boxplot explains the stats distribution of type 1 pokemon based on their attack
plt.subplots(figsize = (15,5))
plt.title('Attack Stats of Type 1 Pokemon')
sns.boxplot(x = 'Type 1', y = 'Attack', data = pokeTableBP)
plt.ylim(0,200)
plt.show()

In [None]:
# The boxplot explains the stats distribution of type 2 pokemon based on their attack
plt.subplots(figsize = (15,5))
plt.title('Attack Stats of Type 2 Pokemon')
sns.boxplot(x = 'Type 2', y = 'Attack', data = pokeTableBP)
plt.ylim(0,200)
plt.show()

Comparing the two plots, Dragon has the upper hand over all other types among Type 1 Pokemon, while Fighting leads among Type 2 Pokemon.
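
One way to back up a visual comparison like this is to rank the types by their median stat. The following is a minimal sketch on a made-up table (`sample` and its values are assumptions for illustration); swapping in `pokeTable` would rank the real data.

```python
import pandas as pd

# Hypothetical rows; the real table has many more Pokemon per type
sample = pd.DataFrame({
    "Type 1": ["Dragon", "Dragon", "Bug", "Bug", "Water", "Water"],
    "Attack": [120, 100, 45, 65, 70, 80],
})

# Median Attack per type, highest first (the median is the line inside each box)
median_attack = (
    sample.groupby("Type 1")["Attack"]
    .median()
    .sort_values(ascending=False)
)
top_type = median_attack.index[0]

print(median_attack)
print(top_type)
```

The same pattern works for 'Type 2' and for the Defense and Speed columns.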

In [None]:
# The boxplot explains the stats distribution of type 1 pokemon based on their defense
plt.subplots(figsize = (15,5))
plt.title('Defense Stats of Type 1 Pokemon')
sns.boxplot(x = 'Type 1', y = 'Defense', data = pokeTableBP)
plt.ylim(0,250)
plt.show()

In [None]:
# The boxplot explains the stats distribution of type 2 pokemon based on their defense
plt.subplots(figsize = (15,5))
plt.title('Defense Stats of Type 2 Pokemon')
sns.boxplot(x = 'Type 2', y = 'Defense', data = pokeTableBP)
plt.ylim(0,250)
plt.show()

Comparing the two plots for the Defense stat, Steel leads for Type 1, and it's a tie between Rock and Ground for Type 2. However, in terms of quantity, Ground leads, and in terms of quality, Rock leads.

In [None]:
# The boxplot explains the stats distribution of type 1 pokemon based on their speed
plt.subplots(figsize = (15,5))
plt.title('Speed Stats of Type 1 Pokemon')
sns.boxplot(x = 'Type 1', y = 'Speed', data = pokeTableBP)
plt.ylim(0,200)
plt.show()

In [None]:
# The boxplot explains the stats distribution of type 2 pokemon based on their speed
plt.subplots(figsize = (15,5))
plt.title('Speed Stats of Type 2 Pokemon')
sns.boxplot(x = 'Type 2', y = 'Speed', data = pokeTableBP)
plt.ylim(0,200)
plt.show()

Comparing the two plots for the Speed stat: for Type 1, Flying is the best quality-wise, while Psychic is the best quantity-wise. For Type 2, Flying leads in both quality and quantity.

After analyzing the Attack, Defense, and Speed stats of all Pokemon (the other stats, such as Sp. Atk and Sp. Def, are not covered here), another question remains.

> **Research Question 3:** *From the data given above, which generation is the strongest?*

In [None]:
plt.subplots(figsize = (20,10))
plt.title('Strongest Generation of Pokemon')
sns.violinplot(x = "Generation", y = "Total", data = pokeTable) # Like a boxplot, but it also shows the shape of the distribution
plt.show()

Looking at the generations of Pokemon, it's a tight battle between Generation 3 and Generation 1. However, if you carefully inspect the upper end of the plot, Generation 3 contains more powerful Pokemon than Generation 1, even though Generation 1 leads around the 500 Total level.
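
A numeric summary can complement the violin plot. The sketch below is only an illustration: the `sample` table is made up, and `threshold` is a hypothetical cut-off for "powerful" that is not defined anywhere in the dataset.

```python
import pandas as pd

# Hypothetical Totals for two generations, not taken from the dataset
sample = pd.DataFrame({
    "Generation": [1, 1, 1, 1, 3, 3, 3, 3],
    "Total": [500, 505, 495, 320, 600, 680, 300, 310],
})

threshold = 580  # assumed cut-off for a "powerful" Pokemon

# Per generation: the typical Total, the highest Total, and how many exceed the cut-off
summary = sample.groupby("Generation")["Total"].agg(
    median="median",
    maximum="max",
    strong_count=lambda totals: (totals > threshold).sum(),
)

print(summary)
```

In this toy data, Generation 1 has the higher median while Generation 3 has more entries above the cut-off, which is the kind of split the violin plot hints at.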

To summarize, let's look back at the answers to our **3 Research Questions**:

> *Legend:
> T1 = Type 1
> T2 = Type 2*

**1.  What is the most common type of Pokemon?**

**A1:** Water.

**2. What does a statistical analysis of the Pokemon types show?**

**A2:** We looked into 3 main stats (Attack, Defense, and Speed). For Attack, Dragon (T1) and Fighting (T2) lead. For Defense, Steel (T1) leads, and Rock (T2) and Ground (T2) are tied. Lastly, for Speed, Flying (T1) and Psychic (T1) lead in their respective categories (quality and quantity), and Flying (T2) leads for Type 2.

**3. From the statistics given for Research Question 2, which Generation of Pokemon is the strongest?**

**A3:** Generation 3.