This repository has been archived by the owner on Apr 4, 2021. It is now read-only.

Epa model update #12

Merged
merged 73 commits into from
Oct 30, 2020

Conversation

saiemgilani
Owner

The main epa_wpa portion of the cfb_pbp_data function is redesigned to look like this:

g_ids = sort(unique(play_df$game_id))
play_df = purrr::map_dfr(g_ids,
                         function(x) {
                           play_df %>%
                             dplyr::filter(.data$game_id == x) %>%
                             penalty_detection() %>%
                             add_play_counts() %>%
                             clean_pbp_dat() %>%
                             clean_drive_dat() %>%
                             prep_epa_df_after() %>%
                             create_epa() %>%
                             # add_betting_cols(g_id = x, yr = year) %>%
                             # create_wpa_betting() %>%
                             create_wpa_naive()
                         })

Will edit to complete this PR

Remove tryCatch messages
The purrr::map_dfr function now chains the following functions:
 * penalty_detection()
 * add_play_counts()
 * clean_pbp_dat()
 * clean_drive_dat()
 * prep_epa_df_after()
 * add_betting_cols()
 * create_epa()
 * create_wpa_naive()
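As a rough illustration of why the chain runs once per game, here is a minimal base R sketch. The two step functions (`add_play_number`, `add_lag_yards`) and the toy columns are hypothetical stand-ins, not the package's real cleaning functions, and `lapply` plus `rbind` is used in place of `purrr::map_dfr` only to keep the example dependency-free.

```r
# Toy play-by-play: two games stacked in one data frame.
play_df <- data.frame(
  game_id      = c(2, 2, 1, 1, 1),
  yards_gained = c(5, 3, 10, -2, 7)
)

# Hypothetical stand-ins for the real per-game steps.
add_play_number <- function(df) {
  df$play_no <- seq_len(nrow(df))
  df
}
add_lag_yards <- function(df) {
  # Lag within the data frame passed in, so it never crosses a game boundary.
  df$lag_yards <- c(NA, head(df$yards_gained, -1))
  df
}

# Apply the chain to one game at a time, then row-bind the pieces.
g_ids <- sort(unique(play_df$game_id))
per_game <- lapply(g_ids, function(x) {
  game <- play_df[play_df$game_id == x, ]
  add_lag_yards(add_play_number(game))
})
result <- do.call(rbind, per_game)
rownames(result) <- NULL
print(result)
```

Because each step only ever sees one game, counters restart at 1 and the first play of every game gets an NA lag rather than a value from the previous game.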
Move most of the pred_df_before function into clean_pbp_dat and move the pred_df_after function outside of the create_epa function. The file now consists only of the main create_epa function and the epa_fg_prob update function.

Updated documentation and comments, and changed the process to include end of period/end of half play types (which separate quarters).

Removes both map functions from the create_epa function.

Switched from -9 to +8 for the FG model adjustments, per the discussion about using the un-adjusted FG yards_to_goal (done to improve model calibration/input consistency).

Now returns the actual probabilities of the predictions as well for pre/post play predictions

add more to the roxygen imports, turning them into importFrom's to hopefully reduce conflicts over time
mainly adding documentation, creating a few copies of variables to examine before/after transformations. Add Penalties to skips for play counts (game/drive/half)
helpers for all the pbp cleaning functions and documentation
version moved to 1.0.3
* switch epa_fg_prob to proper missed_fg_pred accounting.
* add return for fg_make_prob for field goal attempts
* add ep_before as lag_ep_after for play_type = "Timeout"
* modify home_EPA to calculate using .data$pos_team == .data$home instead of .data$offense_play
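The home_EPA change in the last bullet can be pictured with a small base R sketch. The toy data and the sign flip for plays where the away team has the ball are assumptions made for illustration; the actual column logic in the package may differ.

```r
# Toy plays from a single game where team "A" is the home team.
plays <- data.frame(
  home     = c("A", "A", "A"),
  pos_team = c("A", "B", "A"),
  EPA      = c(0.5, 1.2, -0.3)
)

# Assumed convention: EPA counts for the home team when it has possession,
# and against it (sign flipped) when the away team has possession.
plays$home_EPA <- ifelse(plays$pos_team == plays$home, plays$EPA, -plays$EPA)
print(plays)
```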
define a bunch of pos_team lag/lead columns. Mostly moving stuff around to define things at exactly the same time. Add punt return fumble and kickoff fumble play_type correction. not su
switch to using pos_team, do some end of period skippin
update models... this will get updated again in a bit, but for checking, should suffice
add some additional wrapping to account for the case of exactly one fg attempt (where the result saves as a 7 row dataframe, rather than 1 row x 7 col), and for the zero fg attempt case, where fg_make_prob is just set to NA
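The single-attempt shape problem described above is a general R pitfall and can be reproduced without any model. In this sketch the class labels and the helper `as_prob_df` are hypothetical; it only shows one way a length-7 probability vector ends up as 7 rows, and one way to force it back to 1 row x 7 columns.

```r
classes <- paste0("class_", 1:7)  # hypothetical outcome labels

# With a single attempt, the probabilities can arrive as a plain named vector.
one_attempt <- setNames(rep(1 / 7, 7), classes)

# Naive conversion: a vector becomes one column with 7 rows.
naive <- data.frame(one_attempt)

# Hypothetical wrapper: coerce a bare vector to a 1-row matrix first.
as_prob_df <- function(p) {
  if (is.null(dim(p))) {
    p <- matrix(p, nrow = 1, dimnames = list(NULL, names(p)))
  }
  as.data.frame(p)
}
fixed <- as_prob_df(one_attempt)

print(dim(naive))  # 7 rows, 1 column
print(dim(fixed))  # 1 row, 7 columns
```

The zero-attempt case never reaches the conversion at all; the caller would simply fill fg_make_prob with NA, as the commit message describes.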
ep_model calibration error : 0.01150
wp_model calibration error: 0.00787
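The source does not say how these calibration errors were computed. One common approach, sketched here purely as an assumption, bins the predicted probabilities, compares the mean prediction with the observed rate in each bin, and averages the absolute gaps weighted by bin size. The function name, bin width, and toy data are all hypothetical.

```r
# Assumed definition: bin-size-weighted mean absolute gap between
# predicted and observed rates. Not necessarily the metric used in this PR.
calibration_error <- function(pred, actual, bin_width = 0.05) {
  bin       <- round(pred / bin_width)       # integer bin index
  n         <- tapply(pred, bin, length)     # plays per bin
  mean_pred <- tapply(pred, bin, mean)       # average predicted probability
  mean_obs  <- tapply(actual, bin, mean)     # observed success rate
  sum(n * abs(mean_pred - mean_obs)) / sum(n)
}

# Toy check: the 0.5 bin is perfectly calibrated, the 1.0 bin is off by 0.5.
pred   <- c(0.5, 0.5, 1, 1)
actual <- c(1, 0, 1, 0)
err <- calibration_error(pred, actual)
print(err)  # 0.25
```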
1) def_td_play and off_td_play changed to offense_score_play and defense_score_play
2) Penalty (Safety), Punt Team Fumble Recovery/Touchdown, Kickoff Team Fumble Recovery/Touchdown, Punt (Safety) added as play types
3) fg_made added
4) new_drive_result renamed to drive_result_detailed, drive_result2 uses the same method of determining drives as drive_result_detailed, but labels them in similar style to the API
5) Catching uncategorized end of period plays `add_play_counts()`
6) Definition of change_of_pos_team changed to check `lead_play_type` == "End Period" instead of `play_type`. This matters because a change of possession at the end of a period would not otherwise register: for end of period plays, offense_play/defense_play is repeated until the next period's first event. Since the intent was always to check whether the next play is a period end, this is the appropriate transformation, analogous to the wpa_base_nxt and wpa_change_nxt definitions.
7) Add blocked field goal touchdowns play_text td check and fix.
8) Add better definitions for receives_2H_kickoff, pos_score_diff, pos_score_diff_start. Add additional lags for two prior for each of our heavily relied upon variables.
9) pos_team_timeouts_rem modified/renamed to pos_team_timeouts_rem_before/def_*
10) lead_down and lead_yards_to_goal used for some new_down and new_yardline conditions:
 * new_down penalty case, use lead_down
 * new_yardline for lead_yards_to_goal for the following play_types
   - "Blocked Punt", "Punt"
   - "Blocked Field Goal"
   - "Fumble Recovery (Opponent)"
   - "Field Goal Missed"
   - "Missed Field Goal Return"
   - "Fumble Recovery (Own)"
   - "Interception Return"
   - "Kickoff"
   - "Punt Team Fumble Recovery"
11) add new_pos_score_diff_start definition
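Item 10 above can be sketched in base R as follows. The play-type list is copied from that item; the toy data, the manual lead computation, and the fallback of `yards_to_goal - yards_gained` for all other plays are assumptions for illustration only.

```r
# Toy plays from one game, already in play order.
plays <- data.frame(
  play_type     = c("Rush", "Punt", "Rush"),
  yards_to_goal = c(60, 55, 80),
  yards_gained  = c(5, 0, 4)
)

# Lead within the game: the next play's starting field position.
plays$lead_yards_to_goal <- c(plays$yards_to_goal[-1], NA)

# Play types for which the next play's yards_to_goal is used directly.
use_lead <- c(
  "Blocked Punt", "Punt", "Blocked Field Goal",
  "Fumble Recovery (Opponent)", "Field Goal Missed",
  "Missed Field Goal Return", "Fumble Recovery (Own)",
  "Interception Return", "Kickoff", "Punt Team Fumble Recovery"
)

# Assumed fallback for every other play: current position minus the gain.
plays$new_yardline <- ifelse(
  plays$play_type %in% use_lead,
  plays$lead_yards_to_goal,
  plays$yards_to_goal - plays$yards_gained
)
print(plays)
```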
new models for wp and ep
1) switch fully to pos_score_diff_start
2) new kick needed to be (new_kick["adj_TimeSecsRem"] + 1)
1) structural shift to bring the join of the ep_before/ep_after variables to just prior to the EPA calculation. The reasoning behind this is that before they are joined, the "End Period" and "End of Half" play_types have been filtered out, allowing for much easier lag/lead conditions.
2) add play_type to both initial select statements and additionally selecting turnover for the ep_before calculations.
3) fully switch to pos_score_diff_start
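The reasoning in item 1 (drop the period markers first, then lag, then join back) can be shown with a toy example. Everything here is illustrative: the column values are made up, and taking ep_before as the previous real play's ep_after is a simplification used only to demonstrate the lag, not how the package derives ep_before in general.

```r
# Toy plays with a period marker between real plays.
plays <- data.frame(
  id_play   = 1:5,
  play_type = c("Rush", "Pass", "End Period", "Rush", "Pass"),
  ep_after  = c(1.2, 2.0, NA, 2.5, 3.1)
)

# Step 1: filter out the period markers.
real <- plays[!plays$play_type %in% c("End Period", "End of Half"), ]

# Step 2: a plain lag now skips the marker row automatically.
real$ep_before <- c(NA, head(real$ep_after, -1))

# Step 3: join back onto the full play list just before computing EPA.
out <- merge(plays, real[, c("id_play", "ep_before")],
             by = "id_play", all.x = TRUE)
out$EPA <- out$ep_after - out$ep_before
print(out)
```

Without the filter, the play after the marker would lag onto the marker's NA and need a special-case condition; with it, play 4 picks up play 2's value directly.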
new_yardline condition for `Goal_to_Go` was missing the subtraction of yards_gained, i.e. it was set to `yards_to_goal` rather than `yards_to_goal - yards_gained`
mhm, you'll see why
update docs/format
fixing the cumulative sums/running totals and docs
add arguments for san jose state. add_betting cols to pre-epa_wpa argument. i'm too tired to document this. it's everything and it's slow and you all will just have to deal. we're going to direct everyone to the data repo whenever possible
2 participants