<p align="center">
  <img src="assets/pokemon-center.gif" alt="PKCenter" width="300"/>
</p>

In [2]:
# Checkpoint to begin loading the Showdown data set

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats as sp

gen5_types_df = pd.read_csv("./dataset/pokemon-fandom/pokemon_gen5_types_fixed.csv")
gen5_stats_df = pd.read_csv(r"dataset/bulbagarden/pokemon_gen5_stats.csv")
showdown_df = pd.read_csv("showdown.csv")
unique_pokemons_df = pd.read_csv("unique_pokemons.csv")
gen5ou_usage_df = pd.read_csv("gen5ou_usage.csv")

## Dugtrio Arena Trap Ban

In the previous section on the paired observation test, we noted that Dugtrio was one of the Pokemon whose usage dropped: from 4.07888% in 2015 to 0.07689% by 2025. The drop coincides with the Arena Trap ban, which was applied retroactively. Arena Trap was initially banned only in Generation 7 OU because of the availability of the Ground Z-Move, and the ban then cascaded to the OU tiers of the earlier generations. In this section, we test whether this drop in usage is significant by comparing Dugtrio's recorded usage% before the Arena Trap ban in April 2018 against its usage% from the ban onwards.

<p align="center">
<a href="https://www.smogon.com/forums/threads/bw-ou-dugtrio-vote-banned-reopened.3631624"> 
<img src="https://www.smogon.com/forums/attachments/giphy-gif.214306/" data-url="" class="bbImage" data-zoom-target="1" style="" alt="giphy.gif" title="giphy.gif" width="500" height="375" loading="lazy">
</a>
</p>



Beforehand, the usage of Dugtrio from November 2014 until December 2021 was recorded by the Python script `get_dugtrio_usage_stats.py`, which takes the usage figures from the text files downloaded from `smogon/stats`. These are stored in `dugtrio.csv`, which is loaded into a dataframe below.

In [38]:
dugtrio_df = pd.read_csv(r"dataset/smogon/dugtrio.csv")
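
The extraction script itself is not shown in this notebook. As a rough illustration only, the sketch below parses one row of a pipe-delimited usage table into the same columns that appear in `dugtrio.csv`. The row layout, the function name `parse_usage_row`, and the sample line are assumptions made for this example (the sample reuses the November 2014 values from the table below); the real `get_dugtrio_usage_stats.py` may work differently.

```python
# Hypothetical sketch: parse one pipe-delimited usage row into a dict.
# The layout " | Rank | Pokemon | Usage % | Raw | % | Real | % | " is an
# assumption about the downloaded text files, not taken from the script.

def parse_usage_row(line):
    """Return a dict for a data row, or None for headers and separators."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    # A data row has exactly seven cells and starts with a numeric rank.
    if len(cells) != 7 or not cells[0].isdigit():
        return None
    rank, name, usage, raw, raw_pct, real, real_pct = cells
    return {
        "Rank": int(rank),
        "Pokemon": name,
        "Usage%": usage,      # kept as text, e.g. "5.18433%"
        "Raw": int(raw),
        "Raw%": raw_pct,
        "Real": int(real),
        "Real%": real_pct,
    }


# Illustrative lines; only the data row should be parsed.
sample_lines = [
    " | Rank | Pokemon            | Usage %   | Raw    | %       | Real   | %       | ",
    " + ---- + ------------------ + --------- + ------ + ------- + ------ + ------- + ",
    " | 36   | Dugtrio            |  5.18433% | 90     |  5.184% | 75     |  5.234% | ",
]

rows = [parse_usage_row(line) for line in sample_lines]
dugtrio_rows = [row for row in rows if row and row["Pokemon"] == "Dugtrio"]
print(dugtrio_rows)
```

The percentage cells are left as strings here because the pre-processing step in this notebook is what strips the `%` sign and converts them to floats.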

### Pre-processing

To make the dataframe easier to transform and filter, we first split the date into separate `Year` and `Month` columns and cast them to integers so that we can perform integer operations such as comparisons. We also strip the `%` sign from the percentage columns and convert them to floats.

In [39]:
# Split "YYYY-MM" into separate Year and Month columns
dugtrio_df[['Year', 'Month']] = dugtrio_df['Date'].str.split('-', expand=True)

# Strip the trailing '%' and convert the percentage columns to floats
percent_cols = ['Usage%', 'Raw%', 'Real%']
for col in percent_cols:
    dugtrio_df[col] = dugtrio_df[col].str.replace('%', '', regex=False).astype(float)

# Put Year and Month first, then cast them to integers for comparisons
cols = ['Year', 'Month'] + [col for col in dugtrio_df.columns if col not in ['Year', 'Month']]
dugtrio_df['Year'] = dugtrio_df['Year'].astype(int)
dugtrio_df['Month'] = dugtrio_df['Month'].astype(int)

# Reorder the columns and drop the original Date column
dugtrio_df = dugtrio_df[cols]
dugtrio_df = dugtrio_df.drop(columns=['Date'])
dugtrio_df

Unnamed: 0,Year,Month,Rank,Pokemon,Usage%,Raw,Raw%,Real,Real%
0,2014,11,36,Dugtrio,5.18433,90,5.184,75,5.234
1,2014,12,32,Dugtrio,5.15342,262,5.153,199,4.793
2,2015,1,42,Dugtrio,4.07888,151,4.079,111,3.821
3,2015,2,44,Dugtrio,4.19901,119,4.199,89,3.982
4,2015,3,39,Dugtrio,4.32073,208,4.321,158,4.100
...,...,...,...,...,...,...,...,...,...
81,2021,8,197,Dugtrio,0.22362,269,0.224,190,0.205
82,2021,9,165,Dugtrio,0.31558,343,0.316,240,0.287
83,2021,10,202,Dugtrio,0.17242,168,0.172,128,0.170
84,2021,11,187,Dugtrio,0.20062,183,0.201,143,0.203


With the dates stored as integers, we can prepare the two additional dataframes we need: one for usage before the Arena Trap ban and one for usage after it. All rows before April 2018 go into `pre_ban_df`, while all rows from April 2018 onwards go into `post_ban_df`.

In [None]:
ban_year = 2018
ban_month = 4

pre_ban_df = dugtrio_df.loc[
    (dugtrio_df['Year'] < ban_year) |
    ((dugtrio_df['Year'] == ban_year) & (dugtrio_df['Month'] < ban_month))
]

pre_ban_df

Unnamed: 0,Year,Month,Rank,Pokemon,Usage%,Raw,Raw%,Real,Real%
0,2014,11,36,Dugtrio,5.18433,90,5.184,75,5.234
1,2014,12,32,Dugtrio,5.15342,262,5.153,199,4.793
2,2015,1,42,Dugtrio,4.07888,151,4.079,111,3.821
3,2015,2,44,Dugtrio,4.19901,119,4.199,89,3.982
4,2015,3,39,Dugtrio,4.32073,208,4.321,158,4.1
5,2015,4,26,Dugtrio,7.32824,192,7.323,145,6.965
6,2015,5,60,Dugtrio,2.34478,71,2.345,51,2.124
7,2015,6,32,Dugtrio,6.13532,214,6.135,156,5.761
8,2015,7,42,Dugtrio,4.04793,125,4.048,94,3.955
9,2015,8,44,Dugtrio,4.01575,102,4.016,80,4.109


In [41]:
post_ban_df = dugtrio_df.loc[
    (dugtrio_df['Year'] > ban_year) |
    ((dugtrio_df['Year'] == ban_year) & (dugtrio_df['Month'] >= ban_month))
]

post_ban_df

Unnamed: 0,Year,Month,Rank,Pokemon,Usage%,Raw,Raw%,Real,Real%
41,2018,4,111,Dugtrio,0.79251,188,0.793,141,0.757
42,2018,5,0,Dugtrio,0.0,0,0.0,0,0.0
43,2018,6,0,Dugtrio,0.0,0,0.0,0,0.0
44,2018,7,0,Dugtrio,0.0,0,0.0,0,0.0
45,2018,8,0,Dugtrio,0.0,0,0.0,0,0.0
46,2018,9,371,Dugtrio,0.01688,4,0.017,3,0.016
47,2018,10,190,Dugtrio,0.24108,52,0.241,44,0.26
48,2018,11,173,Dugtrio,0.29997,63,0.3,48,0.29
49,2018,12,173,Dugtrio,0.31742,67,0.317,49,0.297
50,2019,1,181,Dugtrio,0.32089,82,0.321,58,0.29
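
As a sanity check on the split, the sketch below rebuilds the same cutoff on a small toy dataframe using a single comparable key (`Year * 100 + Month`) and confirms that the two masks cover every row exactly once. The toy rows and the names `toy_df`, `period_key`, and `ban_key` are made up for illustration and are not part of the data set.

```python
import pandas as pd

# Toy Year/Month pairs around the cutoff (illustrative values only)
toy_df = pd.DataFrame({
    "Year":  [2017, 2018, 2018, 2018, 2019],
    "Month": [12,   3,    4,    5,    1],
})

ban_year, ban_month = 2018, 4

# Encode each period as YYYYMM so one comparison handles year and month
period_key = toy_df["Year"] * 100 + toy_df["Month"]
ban_key = ban_year * 100 + ban_month

pre_toy = toy_df.loc[period_key < ban_key]
post_toy = toy_df.loc[period_key >= ban_key]

# The two-condition mask used in the notebook, for comparison
notebook_pre_mask = (toy_df["Year"] < ban_year) | (
    (toy_df["Year"] == ban_year) & (toy_df["Month"] < ban_month)
)

# Both approaches should select the same rows, and the split should be
# a partition: no row lost, no row in both groups.
same_selection = bool((notebook_pre_mask == (period_key < ban_key)).all())
is_partition = (
    len(pre_toy) + len(post_toy) == len(toy_df)
    and pre_toy.index.intersection(post_toy.index).empty
)
print(same_selection, is_partition)
```

The same two checks could be run on `pre_ban_df` and `post_ban_df` against `dugtrio_df` to confirm that no month was dropped or counted twice.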


### Shapiro-Wilk

To decide whether we can use an unpaired t-test or need to resort to a non-parametric alternative, we first have to test whether our two groups are normally distributed. For that, we use the Shapiro-Wilk test of normality.

The hypothesis is as follows:
* $H_0$: The data is normally distributed
* $H_a$: The data is not normally distributed

We set a significance level of $\alpha = 0.05$. As such, we reject the null hypothesis if the $p$-value falls below this threshold.

In [42]:
data_pre_ban = pre_ban_df["Usage%"]

# Shapiro-Wilk test of normality on the pre-ban usage
stat, p = sp.shapiro(data_pre_ban)
print(f"Shapiro-Wilk test Pre-Ban: stat={stat}, p={p}")
if p > 0.05:
    print("Fail to reject H0: The group is consistent with a normal distribution")
else:
    print("Reject H0: The group is likely not normal\n")

data_post_ban = post_ban_df["Usage%"]

# Same test on the post-ban usage
stat, p = sp.shapiro(data_post_ban)
print(f"Shapiro-Wilk test Post-Ban: stat={stat}, p={p}")
if p > 0.05:
    print("Fail to reject H0: The group is consistent with a normal distribution")
else:
    print("Reject H0: The group is likely not normal")

Shapiro-Wilk test Pre-Ban: stat=0.9373146707496911, p=0.02544113742236956
Reject H0: The group is likely not normal

Shapiro-Wilk test Post-Ban: stat=0.965928489567733, p=0.20488294250577838
Fail to reject H0: The group is consistent with a normal distribution


From the Shapiro-Wilk test, we see that the pre-ban group is not normally distributed ($p \approx 0.025 < 0.05$), so the normality requirement of the unpaired t-test is not met. The appropriate test for the significance of the difference is therefore a non-parametric one such as the Mann-Whitney U-test.

### Mann-Whitney U-Test

The Mann-Whitney $U$ test is a non-parametric statistical test of the null hypothesis that randomly selected values $X$ and $Y$ from two populations have the same distribution. Like the unpaired t-test, it assumes that the two groups are independent. The statistic is computed as follows:

Let:
- $n_1$ = number of observations in **Group 1**
- $n_2$ = number of observations in **Group 2**
- $R_1$ = sum of ranks for **Group 1**, with ranks assigned over the pooled observations of both groups

The **U statistic** for each group is computed as follows:

$$
U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1
$$

$$
U_2 = n_1 n_2 - U_1
$$
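
To make the formulas above concrete, here is a minimal worked sketch in plain Python that ranks two small made-up groups, sums the ranks of Group 1, and computes $U_1$ and $U_2$. The helper names `average_ranks` and `mann_whitney_u` and the toy values are assumptions for illustration only. Note that, under this notation, $U_2 = R_1 - \frac{n_1 (n_1 + 1)}{2}$, which counts the pairs in which a Group 1 value beats a Group 2 value (ties count as one half); recent versions of SciPy appear to report this quantity for the first sample passed to `mannwhitneyu`.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_1, group_2):
    """Return (U1, U2) following the formulas in this section."""
    n1, n2 = len(group_1), len(group_2)
    ranks = average_ranks(list(group_1) + list(group_2))
    r1 = sum(ranks[:n1])  # rank sum of Group 1 in the pooled sample
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return u1, u2


# Made-up usage values: every Group 1 value exceeds every Group 2 value
group_1 = [5.2, 4.1, 7.3]
group_2 = [0.8, 0.0, 0.3, 0.2]

u1, u2 = mann_whitney_u(group_1, group_2)

# Cross-check: count the (a, b) pairs where a > b, with ties worth 0.5
pairs_won = sum(
    1.0 if a > b else 0.5 if a == b else 0.0
    for a in group_1
    for b in group_2
)
print(u1, u2, pairs_won)  # 0.0 12.0 12.0
```

Because the two toy groups are completely separated, $U_2$ reaches its maximum possible value $n_1 n_2 = 3 \times 4 = 12$ and $U_1$ is 0.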

For this analysis, we test whether Dugtrio's usage was greater before Arena Trap was banned than after the ban. We therefore perform a right-tailed test.

With Group $A$ as the pre-ban usage and Group $B$ as the post-ban usage, the hypotheses are as follows:
> **Null hypothesis $H_{0}$**
- Group $A \leq$ Group $B$


> **Right-tailed alternative hypothesis $H_{a}$**
- Group $A >$ Group $B$

We can run the Mann-Whitney U-test with the function `mannwhitneyu(x, y, alternative=...)` from `scipy.stats`, which returns the test statistic and the p-value. Passing `alternative="greater"` requests the right-tailed test.

In [43]:
# Right-tailed test: is pre-ban usage greater than post-ban usage?
stat, p = sp.mannwhitneyu(data_pre_ban, data_post_ban, alternative="greater")
print(f"Mann-Whitney U test: stat={stat}, p={p}")
if p > 0.05:
    print("Fail to reject H0: No evidence that usage of dugtrio before the Arena trap ban is greater than after the ban")
else:
    print("Reject H0: Usage of dugtrio before the Arena trap ban is greater than its usage after the ban")

Mann-Whitney U test: stat=1845.0, p=7.786728871721725e-16
Reject H0: Usage of dugtrio before the Arena trap ban is greater than its usage after the ban


<p align="center">
<a href="https://pokemondb.net/pokedex/dugtrio"><img src="https://img.pokemondb.net/sprites/black-white/anim/normal/dugtrio.gif" alt="Dugtrio" style="width: 200px;"></a>
</p>

### Results and Discussions

At a significance level of 0.05, we conclude that Dugtrio's usage was significantly higher before the Arena Trap ban than after it ($p \approx 7.8 \times 10^{-16}$). This suggests that taking away one of a Pokemon's defining attributes can sharply stifle its usage. Although bans do not happen often, they can heavily influence team building, since they change the potential matchups that players have to consider.