# NHL Draft data from NHL Records API
# Feature extraction
This notebook presents the feature extraction process for NHL Draft data collected from the NHL Records API. The previous steps were data collection from the API and data cleanup.

### Data collection summary
The dataset was generated from the JSON response returned by the NHL Records API to a request for all draft records.  
For details, see notebook `notebooks/feature_extraction/nhl_api.ipynb`.

### Data cleanup summary
The previous steps covered data cleaning and feature extraction. A summary of each step is presented below; for details, see notebook `notebooks/feature_extraction/nhl_cleanup_extraction.ipynb`.

Cleanup summary:

* summarized positions
    * position labels corrected for consistency
    * C/RW, C/LW, C/W, F, _etc._ are mapped to C
    * L/RW, W are mapped to RW
    * players who can play center are assumed to be centers for the purposes of this analysis
    * universal (left/right) wingers are assumed to be right wingers
    * for details, see notebook `notebooks/feature_extraction/nhl_cleanup_extraction.ipynb`
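
The position rules above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code from the cleanup notebook: the function name `summarize_position` is hypothetical, and labels not covered by the rules (e.g., `LW`, `D`, `G`) are assumed to pass through unchanged.

```python
def summarize_position(pos: str) -> str:
    """Collapse a raw position label into a single position.

    Hypothetical helper illustrating the rules listed above.
    """
    label = pos.strip().upper()
    parts = label.split("/")

    # Anyone who can play center (C/RW, C/LW, C/W, ...) or is listed
    # as a generic forward (F) is treated as a center.
    if "C" in parts or label == "F":
        return "C"

    # Universal wingers (L/RW) and generic wingers (W) are treated
    # as right wingers.
    if label in {"L/RW", "W"}:
        return "RW"

    # Assumption: every other label is kept as-is.
    return label


for raw in ["C/RW", "C/W", "F", "L/RW", "W", "LW", "D"]:
    print(raw, "->", summarize_position(raw))
```

With pandas, a function like this could be applied to the `Pos` column via `Series.map`.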


## Description of features

### Features in the original dataset

General player info:
* `Overall`: overall pick number in the corresponding draft season
* `Team`: team by which the player was drafted
* `Player`: player name and _alias?_
* `Nat.`: nationality of the player
* `Pos`: position played
* `Age`: age when drafted
* `To`: _?_
* `Amateur Team`: team prior to draft

Skater-specific stats:
* `GP`: total games played _where?_
* `G`: total goals scored _where?_
* `A`: total assists _where?_
* `PTS`: total points scored _where?_
* `+/-`: plus/minus _where?_
* `PIM`: penalties in minutes _where?_

Goaltender-specific stats:
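* `GP.1`: total games played as a goaltender _where?_ (the `.1` suffix suggests a second `GP` column that was renamed on load)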
* `W`: wins _where?_
* `L`: losses _where?_
* `T/O`: ties and overtime losses _where?_
* `SV%`: save percentage _where?_
* `GAA`: goals against average _where?_


### Added features
New features added during feature extraction:

* `year`: int, year of NHL draft, extracted from .csv file names
* `num_teams`: int, number of teams in each draft year
* `round_ratio`: float, position of each pick measured in rounds: 
    * $\text{round ratio}=\frac{\text{Overall} - 1}{\text{number of teams}}$
    * the number of teams represents the number of picks per round
    * each overall pick number (e.g., 171), less 1, is divided by the number of picks per round to determine in which round each prospect was selected (and, via the fractional part, how late in the round)
    * 1 is subtracted from the pick number to ensure a proper boundary between rounds, so that the last pick of a round does not fall into the next round
    * for example, with 30 teams, pick #171 gives $\text{round ratio}=\frac{171 - 1}{30} = 5.67$
* `round`: int, round in which a prospect was selected
    * `round_ratio` is rounded down and 1 is added
    * $\text{round} = \text{int}(\text{round ratio}) + 1$
* `1st_round`: boolean, whether the prospect was selected in the $1^{st}$ round
    * binary indicator for $1^{st}$ round picks
    * True if `round` == 1, False otherwise
* `gpg`: float, average goals per game (`G` divided by `GP`)
* `apg`: float, average assists per game (`A` divided by `GP`)
* `ppg`: float, average points per game (`PTS` divided by `GP`)
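
The added features above can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's actual implementation: all function names and the example file name `nhl_draft_2015.csv` are hypothetical, and the per-game averages are assumed to be the stat total divided by `GP`.

```python
import math
import re
from typing import Optional


def year_from_filename(filename: str) -> Optional[int]:
    """Return the first 4-digit year (19xx or 20xx) in a file name, or None."""
    match = re.search(r"(19|20)\d{2}", filename)
    return int(match.group(0)) if match else None


def round_ratio(overall: int, num_teams: int) -> float:
    """(Overall - 1) / number of teams; subtracting 1 keeps the last
    pick of a round from spilling into the next round."""
    return (overall - 1) / num_teams


def draft_round(overall: int, num_teams: int) -> int:
    """Round down round_ratio and add 1."""
    return int(round_ratio(overall, num_teams)) + 1


def is_first_round(overall: int, num_teams: int) -> bool:
    """True if the pick falls in round 1, False otherwise."""
    return draft_round(overall, num_teams) == 1


def per_game(total: float, games_played: float) -> float:
    """Average per game; NaN when no games were played (assumed handling)."""
    return total / games_played if games_played else math.nan


# Hypothetical file name, used only to illustrate year extraction.
print(year_from_filename("nhl_draft_2015.csv"))
# Pick #171 in a 30-team draft, as in the example above.
print(round(round_ratio(171, 30), 2), draft_round(171, 30))
# Boundary check: pick #30 stays in round 1, pick #31 opens round 2.
print(draft_round(30, 30), draft_round(31, 30))
```

On a pandas DataFrame the same arithmetic can be vectorized, e.g. `(df["Overall"] - 1) / df["num_teams"]` for `round_ratio`, assuming those column names.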


## Preparations
### Import dependencies

### Load data