# Explore Fantasy Football Data <a id="return"></a>

This notebook explores the web-scraped data from the ESPN Raytonia Beach Fantasy Football League.
<br><br/>

**Notebook Sections:**
1. [Import Packages and Set User-Defined Fields](#section1)
2. [Read in Data](#section2)
3. [Explore Numerical Data](#section3)
4. [Explore Categorical Data](#section4)
5. [Explore Numerical Distributions](#section5)
6. [Explore Numerical Correlations](#section6)
<br><br/>

**Outputs:**
1. Web-Scraped Weekly Matchup Raw Data: **json**
<br><br/>

**Enhancements:**
1. Explore categorical relationships via the Chi-Square Test of Independence, Cramér's V, and Theil's U

## Import Packages <a id="section1"></a>

In [1]:
# import needed packages
# import numpy as np
import pandas as pd
import os
# import explore_util

# set pandas display options
pd.set_option('display.max_columns',500)
pd.set_option('display.max_rows',100)

# prints multiple outputs within same cell display
# from IPython.core.interactiveshell import InteractiveShell
# InteractiveShell.ast_node_interactivity = "all"

# run the 01-explore_ff_league_data_util.ipynb utility notebook
%run 01-explore_ff_league_data_util.ipynb

## Set User-Defined Fields

[Return to Top](#return)

In [2]:
# set print_flag to True to display output in this notebook, or to False to save output as csv
print_flag = False

# output directory used when print_flag is False
output_dir = '../data/data_exploration/'

# create the output directory if it doesn't already exist
os.makedirs(output_dir, exist_ok=True)

## Read in Data <a id="section2"></a>
[Return to Top](#return)

In [3]:
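# create function to read in data
def read_data(file_name, columns):
    """Read a feature-matrix csv and cast its descriptive columns to string."""

    # read csv
    df = pd.read_csv(file_name)

    # convert the descriptive columns to string; columns[:2] (year, week) stay numeric
    df[columns[2:]] = df[columns[2:]].astype('string')

    # print column names, non-null counts, and dtypes as a first look at the data
    df.info()

    return df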

In [4]:
# create list of player specific columns
player_columns = ['year'
                 ,'week'
                 ,'player'
                 ,'short_name'
                 ,'position_name'
                 ,'pro_team'
                 ,'pro_team_abv']

# run read_data function
rbwrte_df = read_data("rbwrte_feature_matrix.csv", player_columns)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7615 entries, 0 to 7614
Data columns (total 43 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   year                 7615 non-null   int64  
 1   week                 7615 non-null   int64  
 2   player               7615 non-null   string 
 3   short_name           7615 non-null   string 
 4   position_name        7615 non-null   string 
 5   pro_team             7615 non-null   string 
 6   pro_team_abv         7615 non-null   string 
 7   rush_att             7615 non-null   float64
 8   rush_yrd             7615 non-null   float64
 9   rush_td              7615 non-null   float64
 10  rush_2pt_con         7615 non-null   float64
 11  rec_tar              7615 non-null   float64
 12  receptions           7615 non-null   float64
 13  rec_yrd              7615 non-null   float64
 14  rec_td               7615 non-null   float64
 15  rec_2pt_con          7615 non-null   f

In [32]:
# run read_data function
qb_df = read_data("qb_feature_matrix.csv", player_columns)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1148 entries, 0 to 1147
Data columns (total 62 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   year                 1148 non-null   int64  
 1   week                 1148 non-null   int64  
 2   player               1148 non-null   string 
 3   short_name           1148 non-null   string 
 4   position_name        1148 non-null   string 
 5   pro_team             1148 non-null   string 
 6   pro_team_abv         1148 non-null   string 
 7   pass_comp            1148 non-null   float64
 8   pass_incomp          1148 non-null   float64
 9   pass_yrd             1148 non-null   float64
 10  air_yards            1148 non-null   float64
 11  pass_td              1148 non-null   float64
 12  rdz_td               1148 non-null   float64
 13  pass_2pt_con         1148 non-null   float64
 14  pass_int             1148 non-null   float64
 15  rush_att             1148 non-null   f

In [33]:
# run read_data function
def_df = read_data("def_feature_matrix.csv", player_columns)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 841 entries, 0 to 840
Data columns (total 94 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   year                       841 non-null    int64  
 1   week                       841 non-null    int64  
 2   player                     841 non-null    string 
 3   short_name                 841 non-null    string 
 4   position_name              841 non-null    string 
 5   pro_team                   841 non-null    string 
 6   pro_team_abv               841 non-null    string 
 7   def_pts_alw                841 non-null    float64
 8   def_tot_yrd_alw            841 non-null    float64
 9   def_st_int                 841 non-null    float64
 10  def_st_fum                 841 non-null    float64
 11  def_st_sack                841 non-null    float64
 12  def_st_safety              841 non-null    float64
 13  def_st_blk_kick            841 non-null    float64

In [35]:
# run read_data function
kr_df = read_data("kr_feature_matrix.csv", player_columns)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 572 entries, 0 to 571
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   year              572 non-null    int64  
 1   week              572 non-null    int64  
 2   player            572 non-null    string 
 3   short_name        572 non-null    string 
 4   position_name     572 non-null    string 
 5   pro_team          572 non-null    string 
 6   pro_team_abv      572 non-null    string 
 7   pat_con           572 non-null    float64
 8   pat_att           572 non-null    float64
 9   fg_con            572 non-null    float64
 10  fg_att            572 non-null    float64
 11  pat_perc          572 non-null    float64
 12  fg_perc           572 non-null    float64
 13  kick_perc         572 non-null    float64
 14  total_plays       572 non-null    float64
 15  total_yards       572 non-null    float64
 16  total_scores      572 non-null    float64
 1

## Explore Numerical Data <a id="section3"></a>

This section will review the following for each positional group:
* Row count
* Number of distinct values
* 5-number summary
* Mean
* Standard Deviation
* Sum of all values
* Percentage null
* Percentage zero
* Percentage positive
* Percentage negative
* Top n most frequent values
* First n values when sorted
* Last n values when sorted
<br><br/>

[Return to Top](#return)
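
As a rough illustration, the statistics listed above could be computed for a single numeric column along the following lines. This is only a sketch: `summarize_numeric` and the toy values are hypothetical and are not the `explore_util.explore_num_data` implementation, which is not shown in this notebook.

```python
import pandas as pd


def summarize_numeric(series: pd.Series, n: int = 3) -> dict:
    """Hypothetical helper: summary statistics for one numeric column."""
    non_null = series.dropna()
    ordered = non_null.sort_values()
    return {
        "row_count": len(series),
        "distinct": series.nunique(),
        # 5-number summary: min, first quartile, median, third quartile, max
        "min": non_null.min(),
        "q1": non_null.quantile(0.25),
        "median": non_null.median(),
        "q3": non_null.quantile(0.75),
        "max": non_null.max(),
        "mean": non_null.mean(),
        "std": non_null.std(),
        "sum": non_null.sum(),
        # percentages are taken over all rows, including nulls
        "pct_null": series.isna().mean() * 100,
        "pct_zero": (series == 0).mean() * 100,
        "pct_positive": (series > 0).mean() * 100,
        "pct_negative": (series < 0).mean() * 100,
        "top_n": series.value_counts().head(n).index.tolist(),
        "first_n": ordered.head(n).tolist(),
        "last_n": ordered.tail(n).tolist(),
    }


# toy values standing in for a column such as rush_yrd (not real league data)
toy = pd.Series([0.0, 12.0, -3.0, 0.0, 45.0, None, 12.0, 0.0])
summary = summarize_numeric(toy)
print(pd.Series(summary))
```

Applying such a helper to every numeric column (for example via `df.select_dtypes(include="number")`) would yield one summary row per column.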

#### Running Backs, Wide Receivers, and Tight Ends:

In [12]:
# run run_explore_func and explore_num_data from explore_util
explore_util.run_explore_func(rbwrte_df, explore_util.explore_num_data, 'num', 'rbwrte', print_flag, output_dir)

#### Quarterbacks:

In [13]:
# run run_explore_func and explore_num_data from explore_util
explore_util.run_explore_func(qb_df, explore_util.explore_num_data, 'num', 'qb', print_flag, output_dir)

#### Defenses:

In [14]:
# run run_explore_func and explore_num_data from explore_util
explore_util.run_explore_func(def_df, explore_util.explore_num_data, 'num', 'def', print_flag, output_dir)

#### Kickers:

In [15]:
# run run_explore_func and explore_num_data from explore_util
explore_util.run_explore_func(kr_df, explore_util.explore_num_data, 'num', 'kick', print_flag, output_dir)

## Explore Categorical Data <a id="section4"></a>

This section will review the following for each positional group:
* Row count
* Number of distinct values
* Minimum string length
* Maximum string length
* Percentage null
* Percentage empty
* Top n most frequent values
* First n values when sorted
* Last n values when sorted
<br><br/>

[Return to Top](#return)
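
The categorical checks above might look roughly like this for a single string column. The helper `summarize_categorical` and the toy values are hypothetical and do not reproduce `explore_util.explore_cat_data`; in particular, "empty" is assumed here to mean a zero-length string.

```python
import pandas as pd


def summarize_categorical(series: pd.Series, n: int = 3) -> dict:
    """Hypothetical helper: summary statistics for one string column."""
    non_null = series.dropna()
    lengths = non_null.str.len()
    ordered = non_null.sort_values()
    return {
        "row_count": len(series),
        "distinct": series.nunique(),
        "min_len": int(lengths.min()),
        "max_len": int(lengths.max()),
        # percentages are taken over all rows, including nulls
        "pct_null": series.isna().mean() * 100,
        # assumption: "empty" means a zero-length string
        "pct_empty": (non_null == "").sum() / len(series) * 100,
        "top_n": series.value_counts().head(n).index.tolist(),
        "first_n": ordered.head(n).tolist(),
        "last_n": ordered.tail(n).tolist(),
    }


# toy values standing in for a column such as pro_team_abv (not real league data)
toy = pd.Series(["KC", "BUF", "", "KC", None, "DAL"], dtype="string")
summary = summarize_categorical(toy)
print(pd.Series(summary))
```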

#### Running Backs, Wide Receivers, and Tight Ends:

In [16]:
# run run_explore_func and explore_cat_data from explore_util
explore_util.run_explore_func(rbwrte_df, explore_util.explore_cat_data, 'cat', 'rbwrte', print_flag, output_dir)

#### Quarterbacks:

In [17]:
# run run_explore_func and explore_cat_data from explore_util
explore_util.run_explore_func(qb_df, explore_util.explore_cat_data, 'cat', 'qb', print_flag, output_dir)

#### Defenses:

In [18]:
# run run_explore_func and explore_cat_data from explore_util
explore_util.run_explore_func(def_df, explore_util.explore_cat_data, 'cat', 'def', print_flag, output_dir)

#### Kickers:

In [19]:
# run run_explore_func and explore_cat_data from explore_util
explore_util.run_explore_func(kr_df, explore_util.explore_cat_data, 'cat', 'kick', print_flag, output_dir)

## Explore Numerical Distributions <a id="section5"></a>

This section will explore numerical distributions via histograms.
<br><br/>

[Return to Top](#return)
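
For orientation, a minimal sketch of plotting one histogram per numeric column with pandas and matplotlib follows. It is not the `explore_util.plot_hist` implementation; the toy frame, bin count, and figure size are assumptions chosen only for illustration.

```python
import matplotlib

# non-interactive backend so the sketch also runs as a plain script;
# inside a notebook this line can be dropped
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# toy values reusing two column names from the RB/WR/TE data (not real league data)
toy_df = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E", "F"],
    "rush_att": [0.0, 5.0, 12.0, 3.0, 20.0, 8.0],
    "rush_yrd": [0.0, 18.0, 55.0, 7.0, 101.0, 30.0],
})

# keep only numeric columns, then draw one histogram per column
numeric_df = toy_df.select_dtypes(include="number")
axes = numeric_df.hist(bins=5, figsize=(8, 3))
plt.tight_layout()
```

`DataFrame.hist` returns an array of matplotlib axes, one per numeric column, each titled with its column name.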

#### Running Backs, Wide Receivers, and Tight Ends:

In [20]:
# plot histograms for running backs, wide receivers, and tight ends
explore_util.plot_hist(rbwrte_df, 'rbwrte', output_dir)

#### Quarterbacks:

In [21]:
# plot histograms for quarterbacks
explore_util.plot_hist(qb_df, 'qb', output_dir)

#### Defenses:

In [22]:
# plot histograms for defenses
explore_util.plot_hist(def_df, 'def', output_dir)

#### Kickers:

In [23]:
# plot histograms for kickers
explore_util.plot_hist(kr_df, 'kick', output_dir)

## Explore Numerical Correlations <a id="section6"></a>

This section will explore numerical correlations via heatmaps.
<br><br/>

[Return to Top](#return)
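
A correlation heatmap of this kind can be sketched with `DataFrame.corr` and matplotlib as shown below. This is an illustrative assumption, not the `explore_util.corr_matrix` implementation; the toy values, colormap, and use of Pearson correlation (the pandas default) are choices made for the example.

```python
import matplotlib

# non-interactive backend so the sketch also runs as a plain script;
# inside a notebook this line can be dropped
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# toy values reusing three column names from the RB/WR/TE data (not real league data)
toy_df = pd.DataFrame({
    "rec_tar": [2.0, 5.0, 9.0, 4.0, 11.0],
    "receptions": [1.0, 3.0, 7.0, 2.0, 8.0],
    "rec_yrd": [8.0, 40.0, 95.0, 15.0, 120.0],
})

# pairwise Pearson correlation between numeric columns
corr = toy_df.corr()

# draw the matrix as a heatmap with a fixed -1 to 1 color scale
fig, ax = plt.subplots(figsize=(4, 3))
image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(image, ax=ax, label="Pearson r")
fig.tight_layout()
```

Fixing the color scale to the full correlation range keeps heatmaps comparable across positional groups.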

#### Running Backs, Wide Receivers, and Tight Ends:

In [24]:
# plot heatmaps for running backs, wide receivers, and tight ends
explore_util.corr_matrix(rbwrte_df, 'rbwrte', output_dir)

#### Quarterbacks:

In [25]:
# plot heatmaps for quarterbacks
explore_util.corr_matrix(qb_df, 'qb', output_dir)

#### Defenses:

In [26]:
# plot heatmaps for defenses
explore_util.corr_matrix(def_df, 'def', output_dir)

#### Kickers:

In [27]:
# plot heatmaps for kickers
explore_util.corr_matrix(kr_df, 'kick', output_dir)