# Plotly Tutorial

# Context
With machine learning growing in popularity, this is a good opportunity to share a wide database of the even more popular video game Pokémon by Nintendo, Game Freak, and Creatures, originally released in 1996.


Pokémon started as a Role Playing Game (RPG), but due to its increasing popularity, its owners ended up producing many TV series, manga comics, and so on, as well as other types of video-games (like the famous Pokémon Go!).


This dataset focuses on the stats and features of the Pokémon in the RPGs. As of 08/01/2017, seven generations of Pokémon have been published. This dataset does not include the seventh generation, because 1) I created the database before the seventh generation was released, and 2) this database is a modified and extended version of the database "721 Pokemon with stats" by Alberto Barradas (https://www.kaggle.com/abcsds/pokemon), which does not include the latest generation either.

# Content
This database includes 21 variables for each of the 721 Pokémon of the first six generations, plus the Pokémon ID and its name. These variables are briefly described below:

- Number. Pokémon ID in the Pokédex.
- Name. Name of the Pokémon.
- Type_1. Primary type.
- Type_2. Second type, in case the Pokémon has it.
- Total. Sum of all the base stats (Health Points, Attack, Defense, Special Attack, Special Defense, and Speed).
- HP. Base Health Points.
- Attack. Base Attack.
- Defense. Base Defense.
- Sp_Atk. Base Special Attack.
- Sp_Def. Base Special Defense.
- Speed. Base Speed.
- Generation. Number of the generation when the Pokémon was introduced.
- isLegendary. Boolean that indicates whether the Pokémon is Legendary or not.
- Color. Color of the Pokémon according to the Pokédex.
- hasGender. Boolean that indicates if the Pokémon can be classified as female or male.
- Pr_Male. If the Pokémon has a gender, the probability of its being male. The probability of being female is 1 minus this value.
- Egg_Group_1. Egg Group of the Pokémon.
- Egg_Group_2. Second Egg Group of the Pokémon, in case it has two.
- hasMegaEvolution. Boolean that indicates whether the Pokémon is able to Mega-evolve or not.
- Height_m. Height of the Pokémon, in meters.
- Weight_kg. Weight of the Pokémon, in kilograms.
- Catch_Rate. Catch Rate.
- Body_Style. Body Style of the Pokémon according to the Pokédex.
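
Two of the definitions above can be checked directly: `Total` should equal the sum of the six base stats, and the probability of being female is `1 - Pr_Male`. The following is a minimal sketch on three rows copied from the `df.sample(5)` output later in this notebook. The names `STAT_COLUMNS`, `total_matches`, and `Pr_Female` are illustrative and not part of the dataset.

```python
import pandas as pd

# Three rows copied from the df.sample(5) output in this notebook.
sample = pd.DataFrame({
    "Name": ["Patrat", "Lileep", "Starly"],
    "Total": [255, 355, 245],
    "HP": [45, 66, 40],
    "Attack": [55, 41, 55],
    "Defense": [39, 77, 30],
    "Sp_Atk": [35, 61, 30],
    "Sp_Def": [39, 87, 30],
    "Speed": [42, 23, 60],
    "Pr_Male": [0.5, 0.875, 0.5],
})

# The six base stats that Total is described as summing.
STAT_COLUMNS = ["HP", "Attack", "Defense", "Sp_Atk", "Sp_Def", "Speed"]

# True for every row whose Total equals the row-wise sum of the base stats.
total_matches = sample[STAT_COLUMNS].sum(axis=1) == sample["Total"]

# Hypothetical derived column: probability of being female.
sample["Pr_Female"] = 1 - sample["Pr_Male"]

print(total_matches.all())
print(sample[["Name", "Pr_Male", "Pr_Female"]])
```

On the full dataset the same check would run on `df` instead of `sample`. Rows where `Pr_Male` is missing (genderless Pokémon) would simply yield a missing `Pr_Female`.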

## Notes
Please note that many Pokémon are multi-form, and some of them can also Mega-evolve. I wanted to keep the structure of the dataset as simple and general as possible, and to keep the Number variable (the ID of the Pokémon) unique. Hence, for multi-form Pokémon, or those capable of Mega-evolving, I chose just one of the forms: the one I (and my brother) considered the standard and/or the most common. The specific choice for each of these Pokémon is shown below:

- Mega-Evolutions are not considered as Pokémon.
- Kyogre, Groudon. Primal forms not considered.
- Deoxys. Only normal form considered.
- Wormadam. Only plant form considered.
- Rotom. Only normal form considered, the one with types Electric and Ghost.
- Giratina. Origin form considered.
- Shaymin. Land form considered.
- Darmanitan. Standard mode considered.
- Tornadus, Thundurus, Landorus. Incarnate form considered.
- Kyurem. Normal form considered, not white or black forms.
- Meloetta. Aria form considered.
- Meowstic. Both female and male forms are equal in the considered variables.
- Aegislash. Shield form considered.
- Pumpkaboo, Gourgeist. Average size considered.
- Zygarde. 50% form considered.
- Hoopa. Confined form considered.

In [33]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns

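import plotly
import plotly.offline
# plotly.plotly was moved to the separate chart_studio package in plotly 4;
# it is not needed for the offline plots used in this notebook
import plotly.graph_objs as go
plotly.offline.init_notebook_mode(connected=True)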

In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

pd.options.display.max_columns = 25

In [3]:
df = pd.read_csv('pokemon_alopez247.csv')
df.sample(5)

Unnamed: 0,Number,Name,Type_1,Type_2,Total,HP,Attack,Defense,Sp_Atk,Sp_Def,Speed,Generation,isLegendary,Color,hasGender,Pr_Male,Egg_Group_1,Egg_Group_2,hasMegaEvolution,Height_m,Weight_kg,Catch_Rate,Body_Style
503,504,Patrat,Normal,,255,45,55,39,35,39,42,5,False,Brown,True,0.5,Field,,False,0.51,11.6,255,quadruped
438,439,Mime_Jr.,Psychic,Fairy,310,20,25,45,70,90,60,4,False,Pink,True,0.5,Undiscovered,,False,0.61,13.0,145,bipedal_tailless
665,666,Vivillon,Bug,Flying,411,80,52,50,90,50,89,6,False,Black,True,0.5,Bug,,False,1.19,17.0,45,four_wings
344,345,Lileep,Rock,Grass,355,66,41,77,61,87,23,3,False,Purple,True,0.875,Water_3,,False,0.99,23.8,45,head_base
395,396,Starly,Normal,Flying,245,40,55,30,30,30,60,4,False,Brown,True,0.5,Flying,,False,0.3,2.0,255,two_wings


In [21]:
# some simple data characteristics
# (print written as a function call so the cell also runs under Python 3)
print("{} different pokemon with {} different types from {} generations".format(
    df['Name'].nunique(), df['Type_1'].nunique(), df['Generation'].max()))
print("Among them there are {} legendaries".format(df['isLegendary'].sum()))

721 different pokemon with 18 different types from 6 generations
Among them there are 46 legendaries


In [20]:
# shows the dtype and non-null count of each column, i.e. which columns contain missing values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 721 entries, 0 to 720
Data columns (total 23 columns):
Number              721 non-null int64
Name                721 non-null object
Type_1              721 non-null object
Type_2              350 non-null object
Total               721 non-null int64
HP                  721 non-null int64
Attack              721 non-null int64
Defense             721 non-null int64
Sp_Atk              721 non-null int64
Sp_Def              721 non-null int64
Speed               721 non-null int64
Generation          721 non-null int64
isLegendary         721 non-null bool
Color               721 non-null object
hasGender           721 non-null bool
Pr_Male             644 non-null float64
Egg_Group_1         721 non-null object
Egg_Group_2         191 non-null object
hasMegaEvolution    721 non-null bool
Height_m            721 non-null float64
Weight_kg           721 non-null float64
Catch_Rate          721 non-null int64
Body_Style          721 non-
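
From the non-null counts above, `Type_2` is missing for 721 - 350 = 371 Pokémon, `Pr_Male` for 77, and `Egg_Group_2` for 530. A more direct way to get these numbers is to count missing values per column. The sketch below uses a small frame built from rows of the `df.sample(5)` output; the variable names `missing_counts` and `columns_with_missing` are illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in for df, built from rows shown in the df.sample(5) output.
sample = pd.DataFrame({
    "Name": ["Patrat", "Mime_Jr.", "Lileep"],
    "Type_1": ["Normal", "Psychic", "Rock"],
    "Type_2": [np.nan, "Fairy", "Grass"],
    "Egg_Group_2": [np.nan, np.nan, np.nan],
})

# Number of missing values in each column.
missing_counts = sample.isna().sum()

# Keep only the columns that actually have missing values.
columns_with_missing = missing_counts[missing_counts > 0]
print(columns_with_missing)
```

Running `df.isna().sum()` on the full dataset should report only the three columns named above, assuming the `df.info()` output is complete for the remaining columns.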

In [29]:
df.drop(['Number'], axis=1).describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Total,721.0,417.945908,109.663671,180.0,320.0,424.0,499.0,720.0
HP,721.0,68.380028,25.848272,1.0,50.0,65.0,80.0,255.0
Attack,721.0,75.01387,28.984475,5.0,53.0,74.0,95.0,165.0
Defense,721.0,70.808599,29.296558,5.0,50.0,65.0,85.0,230.0
Sp_Atk,721.0,68.737864,28.788005,10.0,45.0,65.0,90.0,154.0
Sp_Def,721.0,69.291262,27.01586,20.0,50.0,65.0,85.0,230.0
Speed,721.0,65.714286,27.27792,5.0,45.0,65.0,85.0,160.0
Generation,721.0,3.323162,1.669873,1.0,2.0,3.0,5.0,6.0
Pr_Male,644.0,0.553377,0.199969,0.0,0.5,0.5,0.5,1.0
Height_m,721.0,1.144979,1.044369,0.1,0.61,0.99,1.4,14.5


## Scatter Plot

In [46]:
legendary = go.Scatter(x = df[df['isLegendary']==True]['Attack'], y = df[df['isLegendary']==True]['HP'], name = 'Legendary',
                      mode='markers',marker=dict(size=6,color='red',line=dict(color='red',width=0.5),opacity=0.4))
noLegendary = go.Scatter(x = df[df['isLegendary']==False]['Attack'], y = df[df['isLegendary']==False]['HP'], name = 'Not Legendary',
                        mode='markers',marker=dict(size=6,color='blue',line=dict(color='blue',width=0.5),opacity=0.4))

data = [legendary, noLegendary]
layout = go.Layout(
    title = 'Attack vs HP',
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=30
    )
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig,filename='scatterplot');

In [55]:
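# trace names now match the plotted types; text supplies the labels drawn by mode='markers+text'
fire = go.Scatter(x = df[df['Type_1']=='Fire']['Attack'], y = df[df['Type_1']=='Fire']['HP'], name = 'Fire',
                      text = df[df['Type_1']=='Fire']['Name'], textposition='top center',
                      mode='markers+text',marker=dict(size=6,color='red',line=dict(color='red',width=0.5),opacity=0.4))
grass = go.Scatter(x = df[df['Type_1']=='Grass']['Attack'], y = df[df['Type_1']=='Grass']['HP'], name = 'Grass',
                        text = df[df['Type_1']=='Grass']['Name'], textposition='top center',
                        mode='markers+text',marker=dict(size=6,color='green',line=dict(color='green',width=0.5),opacity=0.4))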

data = [fire, grass]
layout = go.Layout(
    title = 'Attack vs HP',
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=30
    )
)
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig,filename='scatterplot');
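
Both scatter cells repeat the same filter-and-plot pattern for each group. One way to avoid the duplication, sketched here under the assumption that plain dictionaries are used as traces (Plotly accepts these in place of `go.Scatter` objects), is a small helper that builds one trace per group value. The function name `scatter_traces` and its parameters are hypothetical, and the data are rows from the `df.sample(5)` output.

```python
import pandas as pd

def scatter_traces(frame, group_col, x_col, y_col, colors):
    """Build one scatter trace (as a plain dict) per group value in `colors`."""
    traces = []
    for value, color in colors.items():
        # Rows belonging to the current group.
        subset = frame[frame[group_col] == value]
        traces.append(dict(
            type="scatter",
            mode="markers",
            name=str(value),
            x=subset[x_col].tolist(),
            y=subset[y_col].tolist(),
            marker=dict(size=6, color=color, opacity=0.4),
        ))
    return traces

# Small stand-in for df, built from rows shown in the df.sample(5) output.
sample = pd.DataFrame({
    "Type_1": ["Normal", "Psychic", "Bug", "Rock", "Normal"],
    "Attack": [55, 25, 52, 41, 55],
    "HP": [45, 20, 80, 66, 40],
})

traces = scatter_traces(sample, "Type_1", "Attack", "HP",
                        {"Normal": "blue", "Bug": "red"})
print([t["name"] for t in traces])
```

The resulting list can be passed as `data` to `go.Figure(data=traces, layout=layout)` in the same way as the hand-built traces above.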

## Histogram

In [74]:
types=go.Histogram(
    x=df['Type_1'], 
    opacity=0.75,
    # the original value '##AGD7E0' is not a valid hex color; a light blue is assumed here
    marker=dict(color='#A6D7E0')
)

data = [types]
layout = go.Layout(title='Types',
                  #bargroupgap=0,
                  #bargap=0.2
                  )
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig,filename='histogram');
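
Because `Type_1` is categorical, the histogram above amounts to counting how many Pokémon have each primary type. The sketch below shows the equivalent counts computed with pandas on a small series taken from the `df.sample(5)` output; the variable names are illustrative.

```python
import pandas as pd

# Primary types of the five Pokémon shown in the df.sample(5) output.
primary_types = pd.Series(
    ["Normal", "Psychic", "Bug", "Rock", "Normal"], name="Type_1")

# Occurrences of each type, sorted from most to least frequent.
type_counts = primary_types.value_counts()
print(type_counts)
```

Computing the counts explicitly makes it possible to control the bar order, for example by passing `type_counts.index` and `type_counts.values` to `go.Bar` instead of using `go.Histogram`.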

## Boxplots

In [112]:
types = ['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison']

# Attack values of the Pokémon of each primary type, in the same order as `types`
y_data = [df[df['Type_1']==t]['Attack'] for t in types]

In [127]:
traces = []
for t, y_d in zip(types,y_data):
    traces.append(go.Box(
        y=y_d,
        name=t,
        boxpoints='all',
        jitter=0.5,
        whiskerwidth=0.2,
        marker=dict(
            size=2
        ),
        boxmean='sd',
        line=dict(width=1)
    ))
    
layout = go.Layout(
    title='Attack Power by Several Types of Pokemon',
    yaxis=dict(
        autorange=True,
        showgrid=True,
        zeroline=True,
        dtick=10,
        gridcolor='rgb(255, 255, 255)',
        gridwidth=1,
        zerolinecolor='rgb(255, 255, 255)',
        zerolinewidth=2,
    ),
    margin=dict(
        l=40,
        r=30,
        b=40,
        t=40,
    ),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
    showlegend=False
)
fig = go.Figure(data=traces, layout=layout)
plotly.offline.iplot(fig)
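
The box plot draws the median, mean, and spread of Attack for each type. A numeric summary of the same kind can be computed with a group-by, sketched below. The Attack values are hypothetical toy numbers chosen for easy arithmetic, not values from the dataset. Note also that pandas computes the sample standard deviation (`ddof=1`) by default, which may differ slightly from what Plotly draws for `boxmean='sd'`.

```python
import pandas as pd

# Hypothetical toy values, NOT taken from the dataset.
toy = pd.DataFrame({
    "Type_1": ["Grass", "Grass", "Grass", "Fire", "Fire"],
    "Attack": [40, 60, 80, 50, 70],
})

# Per-type count, mean, median and sample standard deviation of Attack.
summary = toy.groupby("Type_1")["Attack"].agg(["count", "mean", "median", "std"])
print(summary)
```

On the full dataset, restricting `df` to the six types in `types` before grouping would give the numbers behind the six boxes.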

# Acknowledgements

As said at the beginning, this database was based on the Kaggle database "721 Pokemon with stats" by Alberto Barradas (https://www.kaggle.com/abcsds/pokemon). The other resources I mainly used are listed below:

- WikiDex (http://es.pokemon.wikia.com/wiki/WikiDex).
- Bulbapedia, the community-driven Pokémon encyclopedia (http://bulbapedia.bulbagarden.net/wiki/Main_Page).
- Smogon University (http://www.smogon.com/).

# Possible future work
This dataset can be used for different purposes, such as clustering Pokémon, finding relations or dependencies between the variables, and supervised classification, where the class could be the primary type or many of the other variables.

# Author
Asier López Zorrilla