<a href="https://colab.research.google.com/github/mblackstock/notebooks/blob/main/notebooks/Pokemon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Pokemon dataset and analysis.

We want to find out which Pokemon win the most!

References
*   https://www.kaggle.com/rounakbanik/pokemon
*   https://www.kaggle.com/mmetter/pokemon-data-analysis-tutorial
*   https://www.kaggle.com/rtatman/which-pokemon-win-the-most/notebook


In [2]:
!rm -rf notebooks
!git clone https://github.com/mblackstock/datasets.git

Cloning into 'notebooks'...
remote: Enumerating objects: 259, done.
remote: Counting objects: 100% (259/259), done.
remote: Compressing objects: 100% (249/249), done.
remote: Total 259 (delta 42), reused 171 (delta 3), pack-reused 0
Receiving objects: 100% (259/259), 15.64 MiB | 10.81 MiB/s, done.
Resolving deltas: 100% (42/42), done.


In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # data visualization

import seaborn as sns #data visualization
import random 

random.seed(1)
# Import the dataset
pokemon = pd.read_csv("../datasets/pokemon/pokemon.csv")
# Rename the "#" column to "Number": in attribute-style access (pokemon.#) the
# pound sign would start a comment, so a plain name is easier to work with.
# index=str also converts the row labels to strings, hence pokemon.loc['230'] below.
pokemon = pokemon.rename(index=str, columns={"#": "Number"})
combat = pd.read_csv("../datasets/pokemon/combats.csv")
pokemon.loc['230']

Number            231
Name          Shuckle
Type 1            Bug
Type 2           Rock
HP                 20
Attack             10
Defense           230
Sp. Atk            10
Sp. Def           230
Speed               5
Generation          2
Legendary       False
Name: 230, dtype: object
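
The row labelled '230' holds the Pokemon whose Number is 231, which is the off-by-one that matters later. As a minimal sketch of what the `rename` call does, the following uses a made-up three-row frame (`toy` and its rows are assumptions for illustration, not the real data):

```python
import pandas as pd

# Tiny stand-in for pokemon.csv (made-up rows, not the real data)
toy = pd.DataFrame({"#": [1, 2, 3], "Name": ["A", "B", "C"]})

# The "#" column is reachable with brackets even before renaming
first_numbers = toy["#"].tolist()

# Same call as in the notebook: index=str turns the row labels 0, 1, 2
# into the strings "0", "1", "2"; columns= renames "#" to "Number"
toy = toy.rename(index=str, columns={"#": "Number"})

row_by_label = toy.loc["1"]      # label lookup needs a string now
row_by_position = toy.iloc[1]    # positional lookup is unchanged

print(row_by_label["Number"], row_by_position["Number"])
```

Both lookups return the same row, and in that row Number is one more than the label.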

In [2]:
combat.head()

   First_pokemon  Second_pokemon  Winner
0            266             298     298
1            702             701     701
2            191             668     668
3            237             683     683
4            151             231     151
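
Before counting wins, it may be worth checking that every Winner is one of the two combatants. This is a small sketch on the five rows shown above; `combat_sample` is a hypothetical name, and the same expressions should work on the full `combat` frame:

```python
import pandas as pd

# First five rows of combats.csv as shown above
combat_sample = pd.DataFrame({
    "First_pokemon": [266, 702, 191, 237, 151],
    "Second_pokemon": [298, 701, 668, 683, 231],
    "Winner": [298, 701, 668, 683, 151],
})

# True where the winner matches either combatant
winner_is_combatant = (
    (combat_sample["Winner"] == combat_sample["First_pokemon"])
    | (combat_sample["Winner"] == combat_sample["Second_pokemon"])
)

# Share of battles won by the Pokemon listed first
first_win_share = (combat_sample["Winner"] == combat_sample["First_pokemon"]).mean()

print(winner_is_combatant.all(), first_win_share)
```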


In [3]:
print("Dimensions of Pokemon: " + str(pokemon.shape))
print("Dimensions of Combat: " + str(combat.shape))

Dimensions of Pokemon: (800, 12)
Dimensions of Combat: (50000, 3)


Let's look for missing data. Why are there missing Type 2 entries, and why is one Name missing? Many Pokemon have only one type, which would explain the empty Type 2 values.

In [4]:
pokemon.isnull().sum()

Number          0
Name            1
Type 1          0
Type 2        386
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
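
One possible way to handle the two kinds of gaps is sketched below on made-up rows. The frame `toy` and the placeholder label "None" are assumptions for illustration; the dataset itself does not define a label for single-type Pokemon.

```python
import pandas as pd
import numpy as np

# Made-up rows that mimic the two kinds of gaps seen above
toy = pd.DataFrame({
    "Name": ["Alpha", np.nan, "Gamma"],
    "Type 1": ["Bug", "Fighting", "Fire"],
    "Type 2": ["Rock", np.nan, np.nan],
})

# Rows whose Name is missing, so they can be inspected by hand
missing_name_rows = toy[toy["Name"].isnull()]

# Treat a missing Type 2 as "no second type" rather than unknown data.
# The label "None" is an assumption, not something the dataset defines.
toy["Type 2"] = toy["Type 2"].fillna("None")

print(len(missing_name_rows), toy["Type 2"].tolist())
```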

In [5]:
# count the wins of each pokemon
# (this is the numerator of the win percentage)
total_Wins = combat.Winner.value_counts()
total_Wins


163    152
438    136
154    136
428    134
314    133
      ... 
189      5
639      4
237      4
190      3
290      3
Name: Winner, Length: 783, dtype: int64

In [6]:
# get the number of wins for each pokemon
numberOfWins = combat.groupby('Winner').count()
numberOfWins

        First_pokemon  Second_pokemon
Winner
1                  37              37
2                  46              46
3                  89              89
4                  70              70
5                  55              55
...               ...             ...
796                39              39
797               116             116
798                60              60
799                89              89
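
The `value_counts()` result and the `groupby('Winner')` result hold the same win counts in different orders. A minimal sketch of that comparison on made-up battles (`toy_combat` is a hypothetical frame, not the real combats.csv):

```python
import pandas as pd

# Made-up battles (not the real combats.csv)
toy_combat = pd.DataFrame({
    "First_pokemon": [1, 2, 3, 1],
    "Second_pokemon": [2, 3, 1, 3],
    "Winner": [1, 3, 3, 1],
})

# Method 1: value_counts, sorted by Pokemon number instead of by count
wins_a = toy_combat["Winner"].value_counts().sort_index()

# Method 2: groupby + size gives one number per winner
wins_b = toy_combat.groupby("Winner").size()

print((wins_a.values == wins_b.values).all())
```

Using `size()` rather than `count()` returns a single Series, which avoids the duplicated columns seen in the table above.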


In [7]:
# the two win counts above (value_counts and groupby) agree
# now count how many battles each pokemon fought in each position
countByFirst = combat.groupby('First_pokemon').count()
countBySecond = combat.groupby('Second_pokemon').count()
countBySecond



                First_pokemon  Winner
Second_pokemon
1                          63      63
2                          66      66
3                          64      64
4                          63      63
5                          62      62
...                       ...     ...
796                        56      56
797                        67      67
798                        59      59
799                        69      69


In [8]:
print("Looking at the dimensions of our dataframes")
print("Count by first pokemon shape: " + str(countByFirst.shape))
print("Count by second pokemon shape: " + str(countBySecond.shape))
print("Total wins shape: " + str(total_Wins.shape))

Looking at the dimensions of our dataframes
Count by first pokemon shape: (784, 2)
Count by second pokemon shape: (784, 2)
Total wins shape: (783,)


total_Wins has 783 rows, one fewer than the 784 rows in each of the per-position counts, so one Pokemon that fought never won a battle.

In [9]:
# Pokemon that fought (countByFirst index) but never appear as a Winner.
# Subtract 1 because a row's position is one less than its value in the Number column.
find_losing_pokemon = np.setdiff1d(countByFirst.index.values, numberOfWins.index.values) - 1
losing_pokemon = pokemon.iloc[find_losing_pokemon[0]]  # look up the row by position
print(losing_pokemon)

Number            231
Name          Shuckle
Type 1            Bug
Type 2           Rock
HP                 20
Attack             10
Defense           230
Sp. Atk            10
Sp. Def           230
Speed               5
Generation          2
Legendary       False
Name: 230, dtype: object
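
To get from win counts to the stated goal of finding which Pokemon win the most, one option is to divide wins by battles fought. The sketch below does this on made-up battles; `toy_combat`, `battles`, `wins`, and `win_pct` are hypothetical names. Filling in zero wins with `reindex` also surfaces the never-winning Pokemon without the off-by-one lookup.

```python
import pandas as pd

# Made-up battles; Pokemon 2 fights twice and never wins
toy_combat = pd.DataFrame({
    "First_pokemon": [1, 2, 3, 1],
    "Second_pokemon": [2, 3, 1, 3],
    "Winner": [1, 3, 3, 1],
})

# Battles fought = appearances in either position
battles = pd.concat(
    [toy_combat["First_pokemon"], toy_combat["Second_pokemon"]]
).value_counts().sort_index()

# Wins, with 0 filled in for Pokemon that never appear as Winner
wins = toy_combat["Winner"].value_counts().reindex(battles.index, fill_value=0)

win_pct = wins / battles
never_won = win_pct[win_pct == 0].index.tolist()

print(win_pct.round(2).to_dict(), never_won)
```

On the real data the same steps would use `combat` in place of `toy_combat`, and the resulting index holds Pokemon numbers, which could then be matched against the Number column.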
