# COGS 108 - EDA Checkpoint

# Names

- Crystal Zhan
- Akil Selvan Rajendra Janarthanan 
- Kristen Prescaro
- Kristine Thipatima
- Ethan Dinh-Luong

<a id='research_question'></a>
# Research Question

How did the addition of the Fairy type Pokemon change competitive battling?

# Setup

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Data Cleaning

## Cleaning Informational Data (Pokedex, Pokemon Moves)

The list of moves is stored in a semi-structured JSON file, which we cleaned so that only the relevant information about each move remains.

In [2]:
moves = pd.read_json("Pokedex and Moves/data/moves.json")
moves.head()

| | 10000000voltthunderbolt | absorb | accelerock | acid | acidarmor | aciddownpour | acidspray | acrobatics | acupressure | aerialace | ... | workup | worryseed | wrap | wringout | xscissor | yawn | zapcannon | zenheadbutt | zingzap | zippyzap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| num | 719 | 71 | 709 | 51 | 151 | 628 | 491 | 512 | 367 | 332 | ... | 526 | 388 | 35 | 378 | 404 | 281 | 192 | 428 | 716 | 729 |
| accuracy | True | 100 | 100 | 100 | True | True | 100 | 100 | True | True | ... | True | 100 | 90 | 100 | 100 | True | 50 | 90 | 100 | 100 |
| basePower | 195 | 20 | 40 | 40 | 0 | 1 | 40 | 55 | 0 | 60 | ... | 0 | 0 | 15 | 0 | 80 | 0 | 120 | 80 | 80 | 80 |
| category | Special | Special | Physical | Special | Status | Physical | Special | Physical | Status | Physical | ... | Status | Status | Physical | Special | Physical | Status | Special | Physical | Physical | Physical |
| isNonstandard | Past | | | | | Past | | | | | ... | | | | Past | | | | | | LGPE |


As the output shows, the move names are the column labels and their attributes are the row labels. We want the reverse, with one row per move, so we transpose the table.

We only need each move's name and type, so we drop all the other columns.

Next, we reset the index so the rows are numbered, and rename the former index column to `move`.

After these steps, our move dataset is clean.

In [3]:
# switching the column and row names 
moves = moves.T

# removing all columns except move name & type
moves = moves.loc[:,["name", "type"]]

#resetting the index numbers
moves = moves.reset_index()

#renaming index to be called move 
moves.rename(columns = {"index":"move"}, inplace=True)
moves.head()

| | move | name | type |
|---|---|---|---|
| 0 | 10000000voltthunderbolt | 10,000,000 Volt Thunderbolt | Electric |
| 1 | absorb | Absorb | Grass |
| 2 | accelerock | Accelerock | Rock |
| 3 | acid | Acid | Poison |
| 4 | acidarmor | Acid Armor | Poison |
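The transpose, select, reset, and rename sequence can be tried on a small in-memory example before running it on the full file. The sketch below is only an illustration: `tidy_records`, `keep`, and `key_name` are hypothetical names, and the two toy entries are made up to resemble the move data.

```python
import pandas as pd


def tidy_records(wide, keep, key_name):
    """Turn a wide frame (one column per record) into one row per record.

    `tidy_records`, `keep`, and `key_name` are hypothetical names for this sketch.
    """
    return (
        wide.T                  # records become rows, attributes become columns
        .loc[:, keep]           # keep only the attributes of interest
        .reset_index()          # move the record keys out of the index
        .rename(columns={"index": key_name})
    )


# Toy data shaped like the move table: the outer keys are move ids.
toy_moves = pd.DataFrame({
    "absorb": {"name": "Absorb", "type": "Grass", "basePower": 20},
    "acid": {"name": "Acid", "type": "Poison", "basePower": 40},
})

tidy = tidy_records(toy_moves, keep=["name", "type"], key_name="move")
print(tidy)
```

A helper like this could be reused for any table that needs the same reshaping, for example with `keep=["name", "types"]` and `key_name="pokemon"`.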


We apply the same cleaning steps to the list of Pokemon.

There is one extra step: we remove any Mega or Gmax forms, as our Pokemon Showdown dataset doesn't include them, nor would they affect the usage of Pokemon based on type.

In [4]:
pokemon = pd.read_json("Pokedex and Moves/data/pokedex.json")

#takes column name and row name and flips them
pokemon = pokemon.T 
#keeps only the first 1155 rows
pokemon = pokemon.iloc[0:1155, :]

#deletes all columns except for name and types
pokemon = pokemon.loc[:,["name", "types"]]
pokemon = pokemon.reset_index()

#renames index to be pokemon
pokemon.rename(columns = {"index":"pokemon"}, inplace=True)

#drop Gmax forms (pokemon column contains "gmax") and Mega forms (name column contains "-Mega")
pokemon = pokemon[~pokemon.pokemon.str.contains("gmax")]
pokemon = pokemon[~pokemon.name.str.contains("-Mega")]

pokemon.head()


| | pokemon | name | types |
|---|---|---|---|
| 0 | bulbasaur | Bulbasaur | [Grass, Poison] |
| 1 | ivysaur | Ivysaur | [Grass, Poison] |
| 2 | venusaur | Venusaur | [Grass, Poison] |
| 5 | charmander | Charmander | [Fire] |
| 6 | charmeleon | Charmeleon | [Fire] |
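The form filter can be checked on a few toy rows. The sketch below assumes ids and display names of the shape `venusaurmega` / `Venusaur-Mega`; the four rows are made up for illustration and are not read from the Pokedex file.

```python
import pandas as pd

# Toy rows; the id/name patterns are an assumption about the Pokedex file.
toy = pd.DataFrame({
    "pokemon": ["venusaur", "venusaurmega", "venusaurgmax", "meganium"],
    "name": ["Venusaur", "Venusaur-Mega", "Venusaur-Gmax", "Meganium"],
})

# Plain substring matches (regex=False), one boolean mask per form.
is_gmax = toy["pokemon"].str.contains("gmax", regex=False)
is_mega = toy["name"].str.contains("-Mega", regex=False)

# Keep rows that are neither Gmax nor Mega.
base_forms = toy[~(is_gmax | is_mega)]
print(base_forms["name"].tolist())
```

Matching `"-Mega"` on the display name rather than `"mega"` on the id keeps Meganium, whose id also contains the substring `mega`.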


The next step for the Pokemon list is to split the `types` column into two columns, since some Pokemon have two types while others have only one. For those with one type, we set the second type to null.

In [5]:
type1 = []
type2 = []

#loop through the list of pokemon and puts their types in their own columns
#puts None if there's no secondary type
for x in pokemon["types"]:
    type1.append(x[0])
    if (len(x) == 2):
        type2.append(x[1])
    else:
        type2.append(None)

pokemon["types"] = type1
pokemon["type2"] = type2
pokemon.head()


| | pokemon | name | types | type2 |
|---|---|---|---|---|
| 0 | bulbasaur | Bulbasaur | Grass | Poison |
| 1 | ivysaur | Ivysaur | Grass | Poison |
| 2 | venusaur | Venusaur | Grass | Poison |
| 5 | charmander | Charmander | Fire | |
| 6 | charmeleon | Charmeleon | Fire | |
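The same split can also be done without an explicit loop. This is a sketch of one possible alternative on toy data, not the approach used above; the column name `type1` is an assumption made for the example.

```python
import pandas as pd

# Toy frame with a gap in the index, like the filtered Pokemon list.
toy = pd.DataFrame(
    {
        "pokemon": ["bulbasaur", "charmander"],
        "types": [["Grass", "Poison"], ["Fire"]],
    },
    index=[0, 5],
)

# Expand each list into columns 0 and 1; shorter lists are padded with a missing value.
# reindex guarantees both columns exist even if every Pokemon had a single type.
split = pd.DataFrame(toy["types"].tolist(), index=toy.index).reindex(columns=[0, 1])

toy["type1"] = split[0]
toy["type2"] = split[1]
print(toy[["pokemon", "type1", "type2"]])
```

Passing `index=toy.index` keeps the new columns aligned with the original rows, which matters here because the row labels are no longer consecutive after the Mega and Gmax forms were dropped.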


## Pokemon Showdown Battle Stats

The data provided by Pokemon Showdown comes as several semi-structured JSON files, which needed cleaning before they could be read into a usable format.

*The cleaning process shown below was repeated for all other JSON files.*

We downloaded the file from Pokemon Showdown and imported it into the notebook, then dropped the rows where `data` was NaN, since they are not needed for our analysis.

In [6]:
raw = pd.read_json("Pokemon Usage/September/raw/gen8/gen8ou-0.json")
df = raw[raw['data'].notna()]['data']
df

Mr. Mime-Galar    {'Moves': {'': 32.0, 'healingwish': 226.0, 'bl...
Eevee             {'Moves': {'': 197.0, 'rest': 7.0, 'mudslap': ...
Torracat          {'Moves': {'': 1.0, 'firespin': 20.0, 'leechli...
Poliwrath         {'Moves': {'': 58.0, 'counter': 48.0, 'liquida...
Emolga            {'Moves': {'': 2.0, 'eerieimpulse': 47.0, 'ris...
                                        ...                        
Shedinja          {'Moves': {'': 578.0, 'absorb': 11.0, 'falsesw...
Wishiwashi        {'Moves': {'': 67.0, 'liquidation': 393.0, 'be...
Sneasel           {'Moves': {'counter': 3.0, 'beatup': 9.0, 'bli...
Hitmontop         {'Moves': {'': 208.0, 'detect': 89.0, 'quickgu...
Kingdra           {'Moves': {'': 57.0, 'icywind': 32.0, 'liquida...
Name: data, Length: 440, dtype: object
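The NaN rows appear because the file mixes a metadata block with the per-Pokemon block, so each row label is filled in only one of the two columns. The sketch below imitates this with a made-up miniature of the file; the top-level key `info` is an assumption, and building the frame from a dict is only an approximation of what `pd.read_json` returns.

```python
import pandas as pd

# Made-up miniature of a usage file: one metadata block and one data block.
toy_json = {
    "info": {"metagame": "gen8ou", "cutoff": 0.0},  # "info" is an assumed key name
    "data": {
        "Eevee": {"usage": 0.001, "Moves": {"rest": 7.0, "mudslap": 3.0}},
        "Kingdra": {"usage": 0.03, "Moves": {"icywind": 32.0, "liquidation": 57.0}},
    },
}

# Outer keys become columns; the row labels are the union of the inner keys.
toy_raw = pd.DataFrame(toy_json)

# Metadata rows have no entry in the "data" column, so notna() drops them.
toy_data = toy_raw.loc[toy_raw["data"].notna(), "data"]
print(toy_data)
```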

To narrow down the data for our analysis, we kept only the data meeting the following criteria:
- Pokemon with at least 2% usage
- Each Pokemon's Top 6 Moves

Additionally, each dataframe includes 4 more columns identifying which JSON file the data originated from, denoted by **Gen**, **Format**, **Rating**, and **Recent** given in the first few rows of the JSON file.

In [9]:
### Dictionary to make the DataFrame
top_mons = {}

### For each Pokemon (index label) and its stats
for mon, stats in df.items():

    ### At least 2% Usage
    if stats['usage'] >= .02:

        ### Finds the Top 6 Moves (move names sorted by count, highest first)
        top_6 = sorted(stats['Moves'], key=stats['Moves'].get, reverse=True)[:6]
        
        ### Saves info to dictionary
        top_mons[mon] = [top_6, stats['usage']]

### Output DataFrame
cleaned = pd.DataFrame.from_dict(top_mons, orient = 'index').rename(columns = {0:"Moves", 1:"Usage"})

### Metadata rows: e.g. 'gen8ou' gives gen '8' and format 'ou'
metagame = raw.loc['metagame'].iloc[0]
gen = metagame[3]
format_name = metagame[4:]
rating = raw.loc["cutoff"].iloc[0]
cleaned["Gen"] = gen
cleaned["Format"] = format_name
cleaned["Min Rating"] = rating
cleaned.head()


| | Moves | Usage | Gen | Format | Min Rating |
|---|---|---|---|---|---|
| Landorus-Therian | [earthquake, uturn, stealthrock, knockoff, tox... | 0.304108 | 8 | ou | 0.0 |
| Blissey | [softboiled, seismictoss, toxic, teleport, thu... | 0.084829 | 8 | ou | 0.0 |
| Slowbro | [scald, teleport, slackoff, futuresight, icebe... | 0.057747 | 8 | ou | 0.0 |
| Crawdaunt | [aquajet, knockoff, crabhammer, swordsdance, c... | 0.028303 | 8 | ou | 0.0 |
| Urshifu-Rapid-Strike | [surgingstrikes, closecombat, aquajet, uturn, ... | 0.129478 | 8 | ou | 0.0 |
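Since the same cleaning is repeated for every file, the steps above could be wrapped in a function and applied in a loop. The following is a minimal sketch under stated assumptions: `summarise_usage` and its parameters are hypothetical names, and the two in-memory "files" with their usage numbers are made up so the example runs without the real JSON.

```python
import pandas as pd


def summarise_usage(data, min_usage=0.02, top_n=6):
    """Keep Pokemon at or above `min_usage` and list their `top_n` most-used moves.

    `summarise_usage` and its parameters are hypothetical names for this sketch.
    """
    names, records = [], []
    for mon, stats in data.items():
        if stats["usage"] >= min_usage:
            # Move names sorted by their count, highest first.
            ranked = sorted(stats["Moves"], key=stats["Moves"].get, reverse=True)
            names.append(mon)
            records.append({"Moves": ranked[:top_n], "Usage": stats["usage"]})
    return pd.DataFrame(records, index=names, columns=["Moves", "Usage"])


# Two made-up "files", keyed by a label that stands in for the metagame string.
toy_files = {
    "gen8ou": {
        "Blissey": {"usage": 0.08, "Moves": {"toxic": 5.0, "softboiled": 9.0}},
        "Eevee": {"usage": 0.001, "Moves": {"rest": 7.0}},
    },
    "gen7ou": {
        "Clefable": {"usage": 0.12, "Moves": {"moonblast": 8.0, "softboiled": 6.0}},
    },
}

frames = []
for metagame, data in toy_files.items():
    cleaned_part = summarise_usage(data)
    cleaned_part["Gen"] = metagame[3]       # fourth character of the metagame string
    cleaned_part["Format"] = metagame[4:]   # everything after the generation digit
    frames.append(cleaned_part)

combined = pd.concat(frames)
print(combined)
```

With the real files, the loop would iterate over the loaded JSON files instead of `toy_files`, and the rating cutoff could be added as another column in the same way.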


In [None]:
# load the cleaned usage tables and combine them into one dataframe
recentdf = pd.read_csv("Pokemon Usage/SeptemberData.csv", index_col=0).rename(columns={'index': "name"})
olddf = pd.read_csv("Pokemon Usage/OldData.csv", index_col=0).rename(columns={'index': "name"})
usagedf = recentdf.merge(olddf, how='outer')
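When `merge` is called without `on=`, pandas joins on every column the two frames share. The sketch below uses made-up rows to show what an outer merge does in that case; the optional `indicator=True` flag adds a `_merge` column recording which table each row came from.

```python
import pandas as pd

# Made-up rows standing in for the two usage tables.
recent = pd.DataFrame({"name": ["Blissey", "Clefable"], "Gen": [8, 8], "Usage": [0.08, 0.10]})
old = pd.DataFrame({"name": ["Clefable", "Garchomp"], "Gen": [5, 5], "Usage": [0.12, 0.20]})

# All three columns are shared, so rows match only if they are identical in every column.
merged = recent.merge(old, how="outer", indicator=True)
print(merged["_merge"].value_counts())
```

Because no row is identical across the two toy tables, the result simply contains all four rows. If stacking the rows is the intent, `pd.concat` might express it more directly.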


# Data Analysis & Results (EDA)

Carry out EDA on your dataset(s); Describe in this section

In [3]:
## YOUR CODE HERE
## FEEL FREE TO ADD MULTIPLE CELLS PER SECTION