# Exploratory data analysis
## Data 765 tutoring

[matplotlib](https://matplotlib.org/) is an ancient package at the nexus of plotting in Python. `matplotlib` implements the drawing primitives necessary for plots and provides full access to those objects, which makes it vast and powerful but sometimes unwieldy due to its lower-level nature. Other plotting libraries, such as [seaborn](https://seaborn.pydata.org/), which is itself built on `matplotlib`, sit a few layers higher and thus produce pleasing visualizations quickly.
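To make the "drawing primitives" point concrete, here is a minimal sketch using `matplotlib`'s object-oriented interface. `plt.subplots` hands back the `Figure` and `Axes` objects, and every bar drawn is a `Rectangle` patch you can inspect or restyle after the fact. The labels and counts are placeholder numbers, not values from the Pokédex data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Placeholder data for illustration only (not from the Pokédex dataset).
labels = ["Water", "Fire", "Grass"]
counts = [12, 8, 10]

# plt.subplots returns the Figure and Axes objects directly.
fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(labels, counts)

# Each bar is a Rectangle patch owned by the Axes, so we can modify it after drawing.
tallest = max(bars, key=lambda rect: rect.get_height())
tallest.set_edgecolor("black")
tallest.set_linewidth(2)

ax.set_ylabel("Count")
ax.set_title("Every element is an accessible object")
fig.tight_layout()
```

The same objects are what higher-level libraries create for you, which is why knowing them pays off when a default needs overriding.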

I personally grew to love `matplotlib` after working with it on my thesis for this program as well as in Data 790. Before those two courses, I disliked `matplotlib` and even `seaborn`, both of which I felt were archaic and difficult. Now that I have learned how to navigate the library, I prefer `matplotlib` and `seaborn` to R's `ggplot2`.

Libraries such as `ggplot2` and `seaborn` are excellent for reasonably aesthetic visualizations with a minimum amount of work. However, you will invariably need to drop down a few levels in order to produce more specialized visualizations. `matplotlib` is omnipresent and powerful. `ggplot2`, in my opinion, becomes increasingly convoluted whenever anything moderately complicated is required. The same could be said about R in general. Like `pandas` in relation to `NumPy`, `seaborn` et alia are intricately tied to `matplotlib`, so moving seamlessly between them vastly increases your productivity.
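Here is a minimal sketch of moving between the two, using synthetic data. The `attack`/`defense` column names are placeholders, not the Pokédex columns. `seaborn`'s axes-level functions such as `sns.scatterplot` accept an `ax` argument, draw onto that `matplotlib` `Axes`, and return it, so everything afterward is ordinary `matplotlib`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data; the column names are hypothetical, not the Pokédex columns.
rng = np.random.default_rng(765)
df = pd.DataFrame({"attack": rng.normal(80, 20, 100),
                   "defense": rng.normal(75, 25, 100)})

fig, ax = plt.subplots(figsize=(4, 4))
# seaborn draws onto the matplotlib Axes we pass in and returns that same Axes.
returned_ax = sns.scatterplot(data=df, x="attack", y="defense", ax=ax)

# From here on it is plain matplotlib: add a y = x reference line and relabel.
ax.axline((0, 0), slope=1, color="gray", linestyle="--")
ax.set_xlabel("Attack")
ax.set_ylabel("Defense")
fig.tight_layout()
```

Passing your own `ax` is the design choice that makes this work: `seaborn` handles the statistical defaults, while the figure stays a regular `matplotlib` figure you can keep editing.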

The official site lists [many of the third-party](https://matplotlib.org/mpl-third-party/) libraries built on `matplotlib`. I suggest checking it out to get a grasp on the landscape.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from pandas.api.types import CategoricalDtype

pokemon = pd.read_csv("../data/pokedex.csv")

# Drop unused columns
pokemon.drop(columns=["Unnamed: 0",
                      "german_name",
                      "japanese_name",
                      "abilities_number",
                      "type_number"],
            inplace=True)

# Type conversion
pokemon = pokemon.astype({"name": "string",
                          "status": "category",
                          "species": "string",
                          "type_1": "category",
                          "type_2": "category",
                          "ability_1": "category",
                          "ability_2": "category"
                         })

pokemon.generation = pokemon.generation.astype(CategoricalDtype(ordered=True))

# Tukey's rule functions from last session

def iqr(features):
    """Calculate the IQR for a Series or DataFrame.

    Parameters
    ----------
    features: Union[pd.Series, pd.DataFrame]
        Series or DataFrame for which to calculate IQR(s).

    Returns
    -------
    Union[float, pd.Series]
        Returns the IQR as a float or the per-column IQRs as a Series.
    """
    quantiles = features.quantile([.25, .75])
    return quantiles.loc[0.75] - quantiles.loc[0.25]

def iqr_rule(features):
    """Construct a boolean mask of outliers via Tukey's rule.

    Parameters
    ----------
    features: Union[pd.Series, pd.DataFrame]
        Series or DataFrame for which to construct the mask.

    Returns
    -------
    pd.Series
        Boolean mask that is True where a value (or, for a DataFrame,
        any value in the row) lies outside the Tukey fences.
    """
    fence = 1.5 * iqr(features)
    iqr_lower = features.quantile(.25) - fence
    iqr_upper = features.quantile(.75) + fence

    outliers = (features < iqr_lower) | (features > iqr_upper)

    # Series.any() has no "columns" axis, so only collapse a DataFrame row-wise.
    if isinstance(outliers, pd.DataFrame):
        outliers = outliers.any(axis="columns")
    return outliers

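The `generation` conversion above relies on an ordered categorical. A small sketch with made-up generation numbers (not rows from `pokedex.csv`) shows what `ordered=True` buys: with no explicit categories, pandas infers them from the data in sorted order, and comparisons, `min`, and `max` respect that order.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Made-up generation numbers for illustration; not rows from pokedex.csv.
generation = pd.Series([3, 1, 2, 1, 4])

# Same dtype construction as the notebook: categories inferred, order enforced.
generation = generation.astype(CategoricalDtype(ordered=True))

print(generation.cat.categories.tolist())
print(generation.min(), generation.max())
# Comparisons against a known category are allowed because the categorical is ordered.
print((generation > 2).tolist())
```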
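A worked example of Tukey's rule may help. The sketch below repeats the arithmetic of `iqr` and `iqr_rule` inline on a toy Series (values are invented, not Pokédex stats). With pandas' default linear interpolation, Q1 = 53.75 and Q3 = 66.25, so the IQR is 12.5, 1.5 × IQR is 18.75, and the fences are 35 and 85. Only 250 falls outside them.

```python
import pandas as pd

# Toy values (not from the Pokédex data); 250 is an obvious high outlier.
hp = pd.Series([45, 50, 55, 60, 60, 65, 70, 250])

# Quartiles with pandas' default linear interpolation.
q1, q3 = hp.quantile([.25, .75])
fence = 1.5 * (q3 - q1)
lower, upper = q1 - fence, q3 + fence

# Same comparison iqr_rule performs for a Series.
outliers = (hp < lower) | (hp > upper)
print(q1, q3, lower, upper)
print(hp[outliers].tolist())
```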
In [None]:
# These colors were copied wholesale from another project I worked on.
# https://github.com/joshuamegnauth54/Data790_PyVis/blob/main/misc/joshua_megnauth_pset2.ipynb

yellow = "#ffcb05"
blue = "#3d7dca"
dark_blue = "#003a70"
nintendo = "#e4000f"
sapphire = "#0f52ba"
ruby = "#e0115f"

# https://bulbapedia.bulbagarden.net/wiki/Category:Color_templates
poketype_colors = {"Water": "#6890F0",
                   "Fire": "#F08030",
                   "Grass": "#78C850",
                   "Electric": "#F8D030",
                   "Ice": "#98D8D8",
                   "Psychic": "#F85888",
                   "Dragon": "#7038F8",
                   "Dark": "#705848",
                   "Fairy": "#EE99AC",
                   "Normal": "#A8A878",
                   "Fighting": "#C03028",
                   "Flying": "#A890F0",
                   "Poison": "#A040A0",
                   "Ground": "#E0C068",
                   "Rock": "#B8A038",
                   "Bug": "#A8B820",
                   "Ghost": "#705898",
                   "Steel": "#B8B8D0",
                   "Unknown": "#68A090"
                   }

# Same link
stat_colors = {"Atk.": "#F08030",
               "Defense": "#F8D030",
               "Sp. Atk.": "#6890F0",
               "Sp. Def.": "#78C850",
               "Speed": "#F85888",
               "Hit Points": "#FF0000",
               "Total (Base)": "#D77AFF"
               }
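One way these dictionaries might be used is to look up a bar color per type so plots match the familiar type colors. This is a minimal sketch: it repeats a three-entry subset of `poketype_colors` so it runs on its own, and the counts are placeholders rather than values computed from the data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Subset of poketype_colors, repeated here so the sketch is standalone.
poketype_colors = {"Water": "#6890F0", "Fire": "#F08030", "Grass": "#78C850"}

# Placeholder counts, not computed from the Pokédex data.
type_counts = {"Water": 12, "Fire": 8, "Grass": 10}

types = list(type_counts)
fig, ax = plt.subplots(figsize=(4, 3))
# Look up each bar's color by its type name.
bars = ax.bar(types, [type_counts[t] for t in types],
              color=[poketype_colors[t] for t in types])
ax.set_ylabel("Count")
fig.tight_layout()
```

`seaborn` plotting functions also accept a dict for `palette`, mapping `hue` levels to colors, so the same dictionary can be passed there when `hue` is set to the type column.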