# Predicting the winner of Pokemon battles

**This project will collect statistics about the pokemon battles in 'combats.csv' and attempt to determine the winner of a pokemon battle using each pokemon's stats and type alone.  
The reader may play the following song to enhance the experience.  
https://www.youtube.com/watch?v=JuYeHPFR3f0**

In [1]:
import pandas as pd
import numpy as np
from IPython.display import display

battles = pd.read_csv("combats.csv")
pokemon = pd.read_csv("pokemon.csv")
#contains the tests for predicting the winner. Will be used once algorithm is completed
tests = pd.read_csv("tests.csv")

#Previewing the data
display(battles.head())
pokemon.head()

Unnamed: 0,First_pokemon,Second_pokemon,Winner
0,266,298,298
1,702,701,701
2,191,668,668
3,237,683,683
4,151,231,151


Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,4,Mega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,5,Charmander,Fire,,39,52,43,60,50,65,1,False


In [2]:
#Changing strings to lowercase
pokemon['Name'] = pokemon['Name'].str.lower()
pokemon['Type 1'] = pokemon['Type 1'].str.lower()
pokemon['Type 2'] = pokemon['Type 2'].str.lower()
pokemon.head()

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,bulbasaur,grass,poison,45,49,49,65,65,45,1,False
1,2,ivysaur,grass,poison,60,62,63,80,80,60,1,False
2,3,venusaur,grass,poison,80,82,83,100,100,80,1,False
3,4,mega venusaur,grass,poison,80,100,123,122,120,80,1,False
4,5,charmander,fire,,39,52,43,60,50,65,1,False


In [3]:
print(pokemon.columns)
print(pokemon.dtypes)

Index(['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk',
       'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')
#              int64
Name          object
Type 1        object
Type 2        object
HP             int64
Attack         int64
Defense        int64
Sp. Atk        int64
Sp. Def        int64
Speed          int64
Generation     int64
Legendary       bool
dtype: object


In [4]:
print(battles.columns)
print(battles.dtypes)

Index(['First_pokemon', 'Second_pokemon', 'Winner'], dtype='object')
First_pokemon     int64
Second_pokemon    int64
Winner            int64
dtype: object


In [5]:
#Checking for nulls
print(battles.isnull().any())

First_pokemon     False
Second_pokemon    False
Winner            False
dtype: bool


In [6]:
print(pokemon.isnull().any())

#             False
Name           True
Type 1        False
Type 2         True
HP            False
Attack        False
Defense       False
Sp. Atk       False
Sp. Def       False
Speed         False
Generation    False
Legendary     False
dtype: bool


**Many pokemon have only one type, so a null in the 'Type 2' column is understandable.  
A null in the 'Name' column will have to be fixed.**

In [7]:
null_names = pokemon[pokemon['Name'].isnull()]
print(null_names)

     # Name    Type 1 Type 2  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  \
62  63  NaN  fighting    NaN  65     105       60       60       70     95   

    Generation  Legendary  
62           1      False  


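**Luckily there is only one name to fix. Note that '#' is this dataset's own row ID, not the Pokedex number (Mega Venusaur is #4 and Charmander is #5 in the preview above), so the missing name has to be identified from its type and stats. A Fighting type with 65 HP, 105 Attack and 95 Speed is Primeape.**

In [8]:
pokemon.loc[62, 'Name'] = "primeape"  #lowercase, to match the other names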
print(pokemon.isnull().any())

#             False
Name          False
Type 1        False
Type 2         True
HP            False
Attack        False
Defense       False
Sp. Atk       False
Sp. Def       False
Speed         False
Generation    False
Legendary     False
dtype: bool


**Some points before we begin.**
1. **Pokemon can be weak to, strong against, or completely immune to certain attack types.**  
**e.g. Water type pokemon deal 2x damage to fire types and receive 0.5x damage from fire based attacks.**
2. **If a pokemon has two types and is hit by an attack that one of its types is weak to but the other resists, the multipliers cancel and the damage is 1x.**  
**e.g. A Water/Fire pokemon hit by a grass attack will take 1x damage.**
3. **According to the dataset's instructions, the pokemon in the left column (First_pokemon) attacks first.**
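
The dual-type rule in point 2 amounts to multiplying one modifier per defending type. A minimal sketch, assuming a small hand-written subset of matchups (the `EFFECTIVENESS` dict and `damage_multiplier` helper are illustrative names, not part of the dataset or of the notebook's code):

```python
# Toy subset of (attack type, defender type) modifiers.
# Any pair not listed is treated as neutral (1x). This is not the full chart.
EFFECTIVENESS = {
    ("grass", "water"): 2.0,
    ("grass", "fire"): 0.5,
    ("water", "fire"): 2.0,
    ("fire", "water"): 0.5,
    ("electric", "water"): 2.0,
    ("electric", "ground"): 0.0,  # immunity
}

def damage_multiplier(attack_type, defender_types):
    """Multiply the modifiers for every type the defender has."""
    multiplier = 1.0
    for defender_type in defender_types:
        multiplier *= EFFECTIVENESS.get((attack_type, defender_type), 1.0)
    return multiplier

print(damage_multiplier("grass", ["water", "fire"]))       # 2.0 * 0.5 = 1.0
print(damage_multiplier("water", ["fire"]))                # 2.0
print(damage_multiplier("electric", ["water", "ground"]))  # 2.0 * 0.0 = 0.0
```

An immunity on either type zeroes the whole product, which is why immunities need to be tracked separately from ordinary resistances.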

**In order to factor in pokemon type, we will have to construct an interaction table.  
The full table would have 7 columns:**

**type - The pokemon type  
atk_up - Types that the row type's attack will be effective against  
atk_down - Types that the row type's attack will be ineffective against  
def_up - Types that the row type's defense will be effective against  
def_down - Types that the row type's defense will be vulnerable to  
atk_immune - Types that the row type's attack cannot damage  
def_immune - Types that the row type will be immune to**

**Only the four attack-side columns (type, atk_up, atk_down, atk_immune) are filled in below; each defensive column is the mirror image of an attack column.**
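
The defensive columns do not need to be typed in by hand, because they mirror the attack columns: if type A's attack is effective against type B, then B's defense is vulnerable to A. A rough sketch of that inversion, using a three-row toy table and a hypothetical `invert` helper:

```python
from collections import defaultdict

# Toy attack-side table: attacker type -> types it hits for 2x damage.
atk_up = {
    "water": ["fire", "rock", "ground"],
    "grass": ["water", "ground", "rock"],
    "fire": ["grass", "ice", "bug", "steel"],
}

def invert(attack_table):
    """Turn 'attacker -> defenders' into 'defender -> attackers'."""
    inverted = defaultdict(list)
    for attacker, defenders in attack_table.items():
        for defender in defenders:
            inverted[defender].append(attacker)
    return dict(inverted)

# def_down: for each type, the attack types it is vulnerable to
def_down = invert(atk_up)
print(def_down["rock"])  # ['water', 'grass']
print(def_down["fire"])  # ['water']
```

The same helper applied to atk_down would give def_up, and applied to atk_immune would give def_immune.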

In [9]:
#Creating variables for types to eliminate the chance of spelling errors
grass = 'grass'
bug = 'bug'
dark = 'dark'
dragon = 'dragon'
electric = 'electric'
fairy = 'fairy'
fighting = 'fighting'
fire = 'fire' 
flying = 'flying'
ghost = 'ghost' 
ground = 'ground' 
ice = 'ice' 
normal = 'normal' 
poison = 'poison' 
psychic = 'psychic' 
rock = 'rock' 
steel = 'steel'
water = 'water'

In [10]:
all_types = np.sort(pokemon['Type 1'].unique())
all_types = [x.lower() for x in all_types]
interaction_cols = ['type','atk_up','atk_down','atk_immune']
interactions = pd.DataFrame(index=all_types, columns=interaction_cols)

def add_row(ptype, atk_up, atk_down, atk_immune):
    interactions.loc[ptype] = [ptype, atk_up, atk_down, atk_immune]
    return interactions

interactions = add_row(ptype = bug,
                       atk_up = grass+','+psychic+','+dark,
                       atk_down = fire+','+fighting+','+poison+','+flying+','+ghost+','+steel+','+fairy, 
                       atk_immune = '')
interactions = add_row(ptype= dark,
                       atk_up = ghost+','+psychic,
                       atk_down = fighting+','+dark+','+fairy, 
                       atk_immune = '')
interactions = add_row(ptype = dragon,
                       atk_up = dragon,
                       atk_down = steel, 
                       atk_immune = fairy)
interactions = add_row(ptype = electric,
                       atk_up = water+','+flying,
                       atk_down = electric+','+grass+','+dragon, 
                       atk_immune = ground)
interactions = add_row(ptype = fairy,
                       atk_up = fighting+','+dragon+','+dark,
                       atk_down = fire+','+poison+','+steel, 
                       atk_immune = '')
interactions = add_row(ptype = fighting,
                       atk_up = normal+','+ice+','+rock+','+dark+','+steel,
                       atk_down = poison+','+flying+','+psychic+','+bug+','+fairy, 
                       atk_immune = ghost)
interactions = add_row(ptype = fire,
                       atk_up = grass+','+ice+','+bug+','+steel,
                       atk_down = fire+','+water+','+rock+','+dragon, 
                       atk_immune = '')
interactions = add_row(ptype = flying,
                       atk_up = grass+','+fighting+','+bug,
                       atk_down = electric+','+steel+','+rock, 
                       atk_immune = '')
interactions = add_row(ptype = ghost,
                       atk_up = psychic+','+ghost,
                       atk_down = dark, 
                       atk_immune = normal)
interactions = add_row(ptype = grass,
                       atk_up = water+','+ground+','+rock,
                       atk_down = fire+','+grass+','+poison+','+flying+','+bug+','+dragon+','+steel, 
                       atk_immune = '')
interactions = add_row(ptype = ground,
                       atk_up = fire+','+electric+','+poison+','+rock+','+steel,
                       atk_down = grass+','+bug, 
                       atk_immune = flying)
interactions = add_row(ptype = ice,
                       atk_up = grass+','+ground+','+flying+','+dragon,
                       atk_down = fire+','+water+','+ice+','+steel, 
                       atk_immune = '')
interactions = add_row(ptype = normal,
                       atk_up = '',
                       atk_down = rock+','+steel, 
                       atk_immune = ghost)
interactions = add_row(ptype = poison,
                       atk_up = grass+','+fairy,
                       atk_down = poison+','+ground+','+rock+','+ghost, 
                       atk_immune = steel)
interactions = add_row(ptype = psychic,
                       atk_up = fighting+','+poison,
                       atk_down = psychic+','+steel, 
                       atk_immune = dark)
interactions = add_row(ptype = rock,
                       atk_up = fire+','+ice+','+flying+','+bug,
                       atk_down = fighting+','+ground+','+steel, 
                       atk_immune = '')
interactions = add_row(ptype = steel,
                       atk_up = ice+','+rock+','+fairy,
                       atk_down = fire+','+water+','+electric+','+steel, 
                       atk_immune = '')
interactions = add_row(ptype = water,
                       atk_up = fire+','+rock+','+ground,
                       atk_down = water+','+grass+','+dragon, 
                       atk_immune = '')

interactions

Unnamed: 0,type,atk_up,atk_down,atk_immune
bug,bug,"grass,psychic,dark","fire,fighting,poison,flying,ghost,steel,fairy",
dark,dark,"ghost,psychic","fighting,dark,fairy",
dragon,dragon,dragon,steel,fairy
electric,electric,"water,flying","electric,grass,dragon",ground
fairy,fairy,"fighting,dragon,dark","fire,poison,steel",
fighting,fighting,"normal,ice,rock,dark,steel","poison,flying,psychic,bug,fairy",ghost
fire,fire,"grass,ice,bug,steel","fire,water,rock,dragon",
flying,flying,"grass,fighting,bug","electric,steel,rock",
ghost,ghost,"psychic,ghost",dark,normal
grass,grass,"water,ground,rock","fire,grass,poison,flying,bug,dragon,steel",


#### Calculating statistics for pokemon battles
The pokemon on the left attacks first, so I would like to see what kind of effect this has on the final outcome

In [11]:
#Count the battles won by each side without looping over rows
left_win = (battles['Winner'] == battles['First_pokemon']).sum()
right_win = battles.shape[0] - left_win
print(left_win/battles.shape[0])
print(right_win/battles.shape[0])


0.47202
0.52798


**The result is skewed in the opposite direction from what was expected: the pokemon listed first wins only 47% of the time. Attacking second should not confer an advantage, so something other than the listed order must be driving the outcome.**

**To find out what determines the outcome of a match, we will need to compare each battle's winner to its loser and look at the differences.  
First, we will investigate differences between the winner's and loser's stats.**

In [12]:

#Battles won by the first pokemon: the second pokemon is the loser
left_win = battles[battles['First_pokemon'] == battles['Winner']]
left_win = left_win.drop('First_pokemon', axis=1)
left_win['order'] = 'first'
left_win.columns = ['loser','winner','order']
#Battles won by the second pokemon: the first pokemon is the loser
right_win = battles[battles['Second_pokemon'] == battles['Winner']]
right_win = right_win.drop('Second_pokemon', axis=1)
right_win['order'] = 'second'
right_win.columns = ['loser','winner','order']
combat_results = pd.concat([left_win, right_win])
#Attach stats twice: the loser's columns get the _x suffix, the winner's get _y
combat_merge = pd.merge(combat_results, pokemon, how='left', left_on='loser', right_on='#')
combat_merge = pd.merge(combat_merge, pokemon, how='left', left_on='winner', right_on='#')
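
As an alternative to splitting and concatenating, the same loser/winner/order table can be built in one pass with `np.where`, which also keeps the battles in their original order. A small sketch on three rows copied from the `battles` preview above:

```python
import numpy as np
import pandas as pd

# Three rows taken from the battles preview shown earlier
battles = pd.DataFrame({
    "First_pokemon":  [266, 702, 151],
    "Second_pokemon": [298, 701, 231],
    "Winner":         [298, 701, 151],
})

first_won = battles["First_pokemon"] == battles["Winner"]
combat_results = pd.DataFrame({
    # The loser is whichever pokemon is not the winner
    "loser": np.where(first_won, battles["Second_pokemon"], battles["First_pokemon"]),
    "winner": battles["Winner"],
    # Records whether the winner was listed first or second
    "order": np.where(first_won, "first", "second"),
})
print(combat_results)
```

The merges with `pokemon` would work unchanged on this frame, since the column names are the same.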


In [13]:
#NOTE: _y belongs to the winner column, _x belongs to the loser column
combat_merge['atk_delta'] = combat_merge['Attack_y'] - combat_merge['Attack_x']
combat_merge['def_delta'] = combat_merge['Defense_y'] - combat_merge['Defense_x'] 
combat_merge['hp_delta'] = combat_merge['HP_y'] - combat_merge['HP_x'] 
combat_merge['spatk_delta'] = combat_merge['Sp. Atk_y'] - combat_merge['Sp. Atk_x'] 
combat_merge['spdef_delta'] = combat_merge['Sp. Def_y'] - combat_merge['Sp. Def_x'] 
combat_merge['speed_delta'] = combat_merge['Speed_y'] - combat_merge['Speed_x'] 
combat_merge['total'] = (combat_merge['atk_delta'] + combat_merge['def_delta'] + 
                         combat_merge['hp_delta'] + combat_merge['spatk_delta'] + 
                         combat_merge['spdef_delta'] + combat_merge['speed_delta'])
#print(combat_merge[(combat_merge['winner']==657) & (combat_merge['loser']==752)])
combat_deltas = combat_merge[['loser','winner','order','atk_delta','def_delta','hp_delta','spatk_delta','spdef_delta',
                              'speed_delta','total']]
combat_deltas.head()

Unnamed: 0,loser,winner,order,atk_delta,def_delta,hp_delta,spatk_delta,spdef_delta,speed_delta,total
0,231,151,first,50,-105,50,105,-160,50,-10
1,752,657,first,-3,-100,-10,7,-100,5,-201
2,624,701,first,99,5,53,17,25,78,277
3,87,151,first,-15,15,-25,15,-10,25,5
4,462,269,first,70,100,30,1,70,5,276


In [14]:
print("% Winners having higher combined stats:", combat_deltas[combat_deltas['total']>0].shape[0]/combat_deltas.shape[0])
print("% Winners having lower combined stats:", combat_deltas[combat_deltas['total']<=0].shape[0]/combat_deltas.shape[0])

% Winners having higher combined stats: 0.69062
% Winners having lower combined stats: 0.30938


**69% of battles were won by the pokemon with the higher combined battle stats (ties are counted in the 'lower' group).  
Next, I will check how often the winner leads in each individual stat.**

In [15]:
print("% Winners having higher Attack:", combat_deltas[combat_deltas['atk_delta']>0].shape[0]/combat_deltas.shape[0])
print("% Winners having higher Defense:", combat_deltas[combat_deltas['def_delta']>0].shape[0]/combat_deltas.shape[0])
print("% Winners having higher HP:", combat_deltas[combat_deltas['hp_delta']>0].shape[0]/combat_deltas.shape[0])
print("% Winners having higher Sp. Atk:", combat_deltas[combat_deltas['spatk_delta']>0].shape[0]/combat_deltas.shape[0])
print("% Winners having higher Sp. Def:", combat_deltas[combat_deltas['spdef_delta']>0].shape[0]/combat_deltas.shape[0])
print("% Winners having higher Speed:", combat_deltas[combat_deltas['speed_delta']>0].shape[0]/combat_deltas.shape[0])

% Winners having higher Attack: 0.63856
% Winners having higher Defense: 0.53438
% Winners having higher HP: 0.58508
% Winners having higher Sp. Atk: 0.63184
% Winners having higher Sp. Def: 0.58896
% Winners having higher Speed: 0.91398


**Speed has by far the strongest association with victory: the winner was the faster pokemon in 91% of battles.  
This suggests the instructions for this data set were wrong, and that attack order is likely based on speed rather than on column position.  
In the pokemon games, a pokemon can faint in one attack, so it makes sense that speed would be a highly advantageous stat.**
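
Given how dominant speed is, a natural baseline for any later model is "always predict the faster pokemon". A minimal sketch on a made-up mini-dataset that reuses the notebook's column names; the values are hypothetical and the tie-breaking rule is an arbitrary assumption:

```python
import pandas as pd

# Hypothetical mini-dataset with the same column names as the notebook
pokemon = pd.DataFrame({"#": [1, 2, 3], "Speed": [45, 90, 60]})
battles = pd.DataFrame({
    "First_pokemon":  [1, 2, 3, 1],
    "Second_pokemon": [2, 3, 1, 3],
    "Winner":         [2, 2, 3, 1],
})

# Look up each combatant's speed by its '#' ID
speed = pokemon.set_index("#")["Speed"]
first_speed = battles["First_pokemon"].map(speed)
second_speed = battles["Second_pokemon"].map(speed)

# Predict the faster pokemon; ties go to the first pokemon (arbitrary choice)
predicted = battles["First_pokemon"].where(first_speed >= second_speed,
                                           battles["Second_pokemon"])
accuracy = (predicted == battles["Winner"]).mean()
print(accuracy)  # 0.75 on this toy data
```

On the real data this baseline should land close to the 91% figure above, so a model that also uses type information only adds value if it beats that number.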

## Placeholder code to analyze type advantages
**Next, we analyze type advantages**

In [95]:
#winner_loser_type = combat_merge[['winner','loser','Type 1_y','Type 2_y','Type 1_x','Type 2_x']]
#winner_loser_type.columns = ['winner','loser','type1_w','type2_w','type1_l','type2_l']
#winner_loser_type = winner_loser_type.assign(adv = 1)

#for index, row in winner_loser_type.head().iterrows():
#    if row['type1_l'] in interactions.loc[row['type1_w'],'atk_up']:
#        winner_loser_type.loc[index,'adv'] *= 2
#    else:
#        winner_loser_type.loc[index,'adv'] *= 0.5

#winner_loser_type.head()
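
One way the placeholder above could be fleshed out is sketched below. It differs from the commented-out draft in three ways: cells are split into sets so that membership is an exact match rather than a substring test, a neutral matchup stays at 1x instead of being halved, and a missing Type 2 is skipped. The three-row `interactions` table and the `matchups` frame are toy stand-ins, and only the winner's first type is considered.

```python
import pandas as pd

# Toy rows in the same comma-separated format as the interaction table
interactions = pd.DataFrame(
    {
        "atk_up": ["fire,rock,ground", "water,flying", "water,ground,rock"],
        "atk_down": ["water,grass,dragon", "electric,grass,dragon",
                     "fire,grass,poison,flying,bug,dragon,steel"],
        "atk_immune": ["", "ground", ""],
    },
    index=["water", "electric", "grass"],
)

def as_set(cell):
    """Split a comma-separated cell into a set of type names."""
    return set(cell.split(",")) if cell else set()

def type_multiplier(attacker_type, defender_types):
    """Multiplier for attacker_type's attack against all of the defender's types."""
    row = interactions.loc[attacker_type]
    multiplier = 1.0
    for defender_type in defender_types:
        if pd.isna(defender_type):  # single-type pokemon have no Type 2
            continue
        if defender_type in as_set(row["atk_immune"]):
            multiplier *= 0.0
        elif defender_type in as_set(row["atk_up"]):
            multiplier *= 2.0
        elif defender_type in as_set(row["atk_down"]):
            multiplier *= 0.5
    return multiplier

# Hypothetical winner/loser type pairs
matchups = pd.DataFrame({
    "type1_w": ["water", "electric", "grass"],
    "type1_l": ["fire", "water", "fire"],
    "type2_l": [float("nan"), "ground", "flying"],
})
matchups["adv"] = [
    type_multiplier(w, [l1, l2])
    for w, l1, l2 in zip(matchups["type1_w"], matchups["type1_l"], matchups["type2_l"])
]
print(matchups["adv"].tolist())  # [2.0, 0.0, 0.25]
```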