# Data Wrangling and Exploratory Data Analysis

In [1]:
import pandas as pd
import numpy as np

## The Showdown Dataset

Each record in our dataset describes a single player's side of a Pokemon battle:
who they are, the team they brought, and how the battle ended for them.

While a battle involves two players, each with their own features, we limit the scope of
our study to one player per record and base our analyses on that.

We do this because we mostly want to identify patterns in how players build their
teams given the meta, irrespective of the opposing team. In Pokemon Showdown, teams are built
independently before players challenge each other, not "on the fly".

In [2]:
showdown_df = pd.read_csv("./dataset/showdown/showdown.csv")
showdown_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Tag          5000 non-null   int64 
 1   Player       5000 non-null   object
 2   Elo          5000 non-null   object
 3   Pokemon 1    5000 non-null   object
 4   Pokemon 2    5000 non-null   object
 5   Pokemon 3    5000 non-null   object
 6   Pokemon 4    5000 non-null   object
 7   Pokemon 5    5000 non-null   object
 8   Pokemon 6    5000 non-null   object
 9   LeadPokemon  4987 non-null   object
 10  TurnCount    5000 non-null   int64 
 11  Result       5000 non-null   object
dtypes: int64(2), object(10)
memory usage: 468.9+ KB
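
The summary above lists `Elo` as `object` rather than a numeric dtype, which suggests that at least some of its values are stored as strings. Below is a minimal sketch of one way to inspect and convert such a column. It runs on a small hypothetical frame: the sample values and the names `sample_df`, `elo_numeric`, and `unparsed` are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of showdown_df; values are illustrative only.
sample_df = pd.DataFrame({"Elo": ["1660", "1532", "unknown", "1307"]})

# errors="coerce" turns anything that cannot be parsed as a number into NaN
elo_numeric = pd.to_numeric(sample_df["Elo"], errors="coerce")

# Rows that failed to parse, worth inspecting before deciding how to handle them
unparsed = sample_df[elo_numeric.isna()]
```

Looking at the unparsed rows first, rather than converting in place, makes it easier to tell whether the non-numeric entries are noise or carry meaning.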


### Forfeit matches

Some records report no `LeadPokemon`. These indicate that the match ended
prematurely, before the player sent out a Pokemon at all, likely because of a
disconnection or because one of the players forfeited.

Nevertheless, data from "battles" like these can still be used to identify
trends in team building among players of different Elo levels.

(There are likely analyses where we would have to drop these records, though, as we
previously mentioned.)
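
A missing `LeadPokemon` and a `TurnCount` of 0 are related but not necessarily identical conditions: the `info()` summary reports 4987 non-null `LeadPokemon` values out of 5000, while the `TurnCount == 0` filter used in this section displays 10 rows. A rough sketch of how the two conditions could be compared, on hypothetical toy rows (only the column names come from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; only the column names come from the dataset.
toy_df = pd.DataFrame({
    "LeadPokemon": ["Latios", np.nan, np.nan, "Scizor"],
    "TurnCount": [12, 0, 3, 25],
})

no_lead = toy_df["LeadPokemon"].isna()
zero_turns = toy_df["TurnCount"] == 0

# Cross-tabulate the two conditions to see where they agree and where they differ
overlap = pd.crosstab(no_lead, zero_turns)

# Rows with no lead Pokemon that nevertheless lasted at least one turn
no_lead_but_played = toy_df[no_lead & ~zero_turns]
```

Running the same comparison on `showdown_df` would show whether the remaining records without a `LeadPokemon` have a nonzero turn count.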

In [3]:
showdown_df[showdown_df["TurnCount"] == 0]

| | Tag | Player | Elo | Pokemon 1 | Pokemon 2 | Pokemon 3 | Pokemon 4 | Pokemon 5 | Pokemon 6 | LeadPokemon | TurnCount | Result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 69 | 101708749 | WeForneMon | 1660 | Latios | Lucario | Gengar | Gastrodon | Heatran | Skarmory | NaN | 0 | lose |
| 377 | 1082937854 | hgfjhds | 1532 | Tyranitar | Landorus-Therian | Rotom-Wash | Jirachi | Latios | Keldeo | NaN | 0 | lose |
| 798 | 1139401705 | Wally | 1653 | Garchomp | Rotom-Wash | Tyranitar | Scizor | Latios | Terrakion | NaN | 0 | win |
| 811 | 1141004806 | Wally | 1653 | Garchomp | Rotom-Wash | Tyranitar | Scizor | Latios | Terrakion | NaN | 0 | win |
| 1027 | 121892059 | ALnlef | 1307 | Gliscor | Politoed | Rotom-Wash | Ferrothorn | Tentacruel | Tornadus | NaN | 0 | lose |
| 1307 | 2033253771 | yusei38 | 1333 | Politoed | Skarmory | Espeon | Kyurem-Black | Blissey | Jolteon | NaN | 0 | win |
| 1530 | 2058219606 | yusei38 | 1333 | Politoed | Skarmory | Espeon | Kyurem-Black | Blissey | Jolteon | NaN | 0 | win |
| 1667 | 2073047615 | authentise_is_shit | 1507 | Politoed | Jellicent | Excadrill | Tornadus | Gastrodon | Ferrothorn | NaN | 0 | lose |
| 2283 | 2145224095 | rabian | 1345 | Jolteon | Ferrothorn | Politoed | Gyarados | Toxicroak | Tentacruel | NaN | 0 | lose |
| 4052 | 588117142 | Jack314 | 1314 | Breloom | Garchomp | Latios | Jirachi | Rotom-Wash | Tyranitar | NaN | 0 | lose |


### Ties

Matches do not always end in wins or losses; they can be ties, too.
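
To get a sense of how rare ties are relative to wins and losses, the outcomes can be tallied with `value_counts`. The sketch below uses a hypothetical toy series; the labels `win`, `lose`, and `tie` are the ones seen in the data, but the counts are made up for illustration.

```python
import pandas as pd

# Hypothetical toy outcomes; the labels match those seen in the Result column.
results = pd.Series(["win", "lose", "tie", "win", "lose", "win"], name="Result")

# Count each outcome, then express it as a share of all matches
counts = results.value_counts()
shares = results.value_counts(normalize=True)
```

On the real data, the same two calls on `showdown_df["Result"]` would give the actual distribution of outcomes.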

In [4]:
showdown_df[showdown_df['Result'] == 'tie']

| | Tag | Player | Elo | Pokemon 1 | Pokemon 2 | Pokemon 3 | Pokemon 4 | Pokemon 5 | Pokemon 6 | LeadPokemon | TurnCount | Result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 387 | 1083803355 | professorcaralho | 1390 | Politoed | Latios | Rotom-Wash | Breloom | Jirachi | Scizor | Rotom-Wash | 30 | tie |
| 796 | 113898835 | professorcaralho | 1390 | Politoed | Latios | Rotom-Wash | Breloom | Jirachi | Scizor | Rotom-Wash | 30 | tie |
| 4803 | 93999693 | Volta | 1329 | Dragonite | Keldeo | Scizor | Terrakion | Latios | Garchomp | Terrakion | 27 | tie |
| 4807 | 94150142 | Volta | 1329 | Dragonite | Keldeo | Scizor | Terrakion | Latios | Garchomp | Terrakion | 27 | tie |
