Epa model update #12

saiemgilani · 2020-10-07T02:47:27Z

The main epa_wpa portion of the cfb_pbp_data function has been redesigned to look like this:

g_ids = sort(unique(play_df$game_id))
play_df = purrr::map_dfr(g_ids,
                         function(x) {
                           play_df %>%
                             dplyr::filter(.data$game_id == x) %>%
                             penalty_detection() %>%
                             add_play_counts() %>%
                             clean_pbp_dat() %>%
                             clean_drive_dat() %>%
                             prep_epa_df_after() %>%
                             create_epa() %>%
                             # add_betting_cols(g_id = x, yr = year) %>%
                             # create_wpa_betting() %>%
                             create_wpa_naive()
                         })
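The per-game pattern above can be illustrated with a small base R sketch. The step functions here (`step_add_play_number`, `step_flag_first_play`) are hypothetical stand-ins, not the package's cleaning functions, and `lapply` plus `rbind` is swapped in for `purrr::map_dfr` so the example runs without extra packages.

```r
# Toy play-by-play data for two games (made-up values).
play_df <- data.frame(
  game_id = c(2, 2, 1, 1, 1),
  yards_gained = c(5, 12, 3, 0, 7)
)

# Hypothetical stand-ins for the per-game cleaning steps.
step_add_play_number <- function(df) {
  df$game_play_number <- seq_len(nrow(df))
  df
}
step_flag_first_play <- function(df) {
  df$first_play <- df$game_play_number == 1
  df
}

# Process one game at a time, then row-bind the results.
g_ids <- sort(unique(play_df$game_id))
per_game <- lapply(g_ids, function(x) {
  game_df <- play_df[play_df$game_id == x, , drop = FALSE]
  step_flag_first_play(step_add_play_number(game_df))
})
result <- do.call(rbind, per_game)
rownames(result) <- NULL
```

Running each game separately means that counters and lag/lead columns never leak across game boundaries, at the cost of some speed.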

Will edit to complete this PR

Remove tryCatch messages

The purrr map_dfr function now chains the following functions:

* penalty_detection()
* add_play_counts()
* clean_pbp_dat()
* clean_drive_dat()
* prep_epa_df_after()
* add_betting_cols()
* create_epa()
* create_wpa_naive()

Move most of the pred_df_before function into clean_pbp_dat and move the pred_df_after function outside of the create_epa function, so the file now consists only of the main create_epa function and the epa_fg_prob update function.

* Update documentation and comments, and make an effort to change the process to include end of period/end of half play types (which separate quarters).
* Remove both map functions from the create_epa function.
* Switch from -9 to +8 for the FG model adjustments, per the discussion about switching to the un-adjusted FG yards_to_goal (done to improve model calibration/input consistency).
* Return the actual probabilities of the predictions as well, for pre/post play predictions.
* Turn more of the roxygen imports into importFrom's, to hopefully reduce conflicts eventually.

mainly adding documentation, creating a few copies of variables to examine before/after transformations. Add Penalties to skips for play counts (game/drive/half)

helpers for all the pbp cleaning functions and documentation

version moved to 1.0.3

* switch epa_fg_prob to proper missed_fg_pred accounting.
* add return for fg_make_prob for field goal attempts
* add ep_before as lag_ep_after for play_type = "Timeout"
* modify home_EPA to calculate using .data$pos_team = .data$home instead of .data$offense_play
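As a rough illustration of the home_EPA item, the sketch below assumes home_EPA is the play's EPA from the home team's point of view: unchanged when the team in possession is the home team, sign-flipped otherwise. The team names and EPA values are made up, and the exact expression used in the package may differ.

```r
# Made-up plays; column names follow the commit message.
plays <- data.frame(
  home = c("Team A", "Team A"),
  pos_team = c("Team A", "Team B"),
  EPA = c(0.75, 0.5)
)

# Assumed convention: flip the sign when the visitors have the ball.
plays$home_EPA <- ifelse(plays$pos_team == plays$home, plays$EPA, -plays$EPA)
```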

define a bunch of pos_team lag/lead columns. Mostly moving stuff around to define things at exactly the same time. Add punt return fumble and kickoff fumble play_type correction. not su

switch to using pos_team, do some end of period skippin

update models... this will get updated again in a bit, but for checking, should suffice

add some additional wrapping to handle the case of exactly one fg attempt (where the predictions save as a 7 row dataframe, rather than 1 row x 7 col), and the zero fg attempt case, which just adds the NA variable for fg_make_prob
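The one-attempt problem is a common R pitfall: when a single row of predictions is reduced to a plain vector, converting it to a data frame yields one column with seven rows. A minimal base R sketch with made-up probabilities, illustrative outcome names, and a placeholder play type, not the package's code:

```r
# Made-up class probabilities for one field goal attempt (seven outcomes).
# The outcome names are illustrative assumptions.
outcomes <- c("No_Score", "FG", "Opp_FG", "Opp_Safety", "Opp_TD", "Safety", "TD")
single_pred <- setNames(c(0.05, 0.60, 0.05, 0.01, 0.09, 0.01, 0.19), outcomes)

# Naive conversion: a length-7 vector becomes 7 rows x 1 column.
wrong_shape <- data.frame(single_pred)

# Wrapping in a one-row matrix first keeps the 1 row x 7 column shape.
right_shape <- as.data.frame(matrix(single_pred, nrow = 1,
                                    dimnames = list(NULL, outcomes)))

# Zero-attempt case: nothing to predict, so the column is all NA.
plays <- data.frame(play_type = c("Rush", "Pass Reception"))
fg_rows <- plays$play_type == "Field Goal Good"
plays$fg_make_prob <- NA_real_
if (any(fg_rows)) {
  plays$fg_make_prob[fg_rows] <- 0.5  # placeholder value, not a model output
}
```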

ep_model calibration error : 0.01150 wp_model calibration error: 0.00787

1) def_td_play and off_td_play changed to offense_score_play and defense_score_play
2) Penalty (Safety), Punt Team Fumble Recovery/Touchdown, Kickoff Team Fumble Recovery/Touchdown, Punt (Safety) added as play types
3) fg_made added
4) new_drive_result renamed to drive_result_detailed; drive_result2 uses the same method of determining drives as drive_result_detailed, but labels them in a style similar to the API
5) Catch uncategorized end of period plays in `add_play_counts()`
6) Definition of change_of_pos_team changed to check `lead_play_type` == "End Period" instead of `play_type`. This is important because a change of possession at the end of a period would not otherwise register: for end of period plays, offense_play/defense_play is repeated until the next period's first event. We were always trying to check for the lead_play_type being a period end, so this is the appropriate transformation, analogous to the wpa_base_nxt and wpa_change_nxt definitions.
7) Add a play_text td check and fix for blocked field goal touchdowns.
8) Add better definitions for receives_2H_kickoff, pos_score_diff, pos_score_diff_start. Add additional lags (two plays prior) for each of our heavily relied upon variables.
9) pos_team_timeouts_rem modified/renamed to pos_team_timeouts_rem_before/def_*
10) lead_down and lead_yards_to_goal used for some new_down and new_yardline conditions:
    * new_down penalty case: use lead_down
    * new_yardline uses lead_yards_to_goal for the following play_types:
        - "Blocked Punt", "Punt"
        - "Blocked Field Goal"
        - "Fumble Recovery (Opponent)"
        - "Field Goal Missed"
        - "Missed Field Goal Return"
        - "Fumble Recovery (Own)"
        - "Interception Return"
        - "Kickoff"
        - "Punt Team Fumble Recovery"
11) add new_pos_score_diff_start definition

new models for wp and ep

1) switch fully to pos_score_diff_start
2) new kick needed to be (new_kick["adj_TimeSecsRem"] + 1)

1) structural shift to bring the join of the ep_before/ep_after variables to just prior to the EPA calculation. The reasoning behind this is that before they are joined, the "End Period" and "End of Half" play_types have been filtered out, allowing for much easier lag/lead conditions.
2) add play_type to both initial select statements and additionally select turnover for the ep_before calculations.
3) fully switch to pos_score_diff_start
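The benefit of filtering before lagging can be seen in a small base R sketch. The data, the `play_id` key, and the `lag1` helper are made up for illustration; the package's own join is not shown in this PR text.

```r
# Made-up plays, including period markers that carry no EP value.
plays <- data.frame(
  play_id = 1:5,
  play_type = c("Rush", "End Period", "Pass Reception", "End of Half", "Kickoff"),
  ep_after = c(1.2, NA, 2.5, NA, 0.8)
)

# Hypothetical base R lag helper: shift values down by one row.
lag1 <- function(x) c(NA, x[-length(x)])

# Lagging over all rows: the play after a period marker sees NA.
plays$lag_ep_after_all <- lag1(plays$ep_after)

# Drop the period markers first, lag, then join back by play_id.
real_plays <- plays[!plays$play_type %in% c("End Period", "End of Half"),
                    c("play_id", "ep_after")]
real_plays$lag_ep_after <- lag1(real_plays$ep_after)
joined <- merge(plays, real_plays[, c("play_id", "lag_ep_after")],
                by = "play_id", all.x = TRUE)
```

With the markers removed first, play 3 picks up the EP of play 1 rather than the missing value on the "End Period" row, so no special-case lag conditions are needed.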

order select

new_yardline condition for `Goal_to_Go` was missing the subtraction for yards_gained. i.e. it was set to `yards_to_goal` rather than `yards_to_goal` - `yards_gained`
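A small worked example of the corrected condition, using made-up numbers and a simplified version of the rule (the full set of new_yardline conditions is not reproduced here): on a goal-to-go play from the 8 that gains 3 yards, the next yardline should be 5, not 8.

```r
# Made-up goal-to-go plays; column names follow the commit message.
plays <- data.frame(
  Goal_to_Go = c(TRUE, TRUE),
  yards_to_goal = c(8, 4),
  yards_gained = c(3, -2)
)

# Before the fix: yards gained were ignored on goal-to-go plays.
old_new_yardline <- ifelse(plays$Goal_to_Go, plays$yards_to_goal, NA)

# After the fix: subtract yards gained from the starting distance.
new_yardline <- ifelse(plays$Goal_to_Go,
                       plays$yards_to_goal - plays$yards_gained, NA)
```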

mhm, you'll see why

update docs/format

docs

fixing the cumulative sums/running totals and docs

add arguments for san jose state. add_betting cols to pre-epa_wpa argument. i'm too tired to document this. it's everything and it's slow and you all will just have to deal. we're going to direct everyone to the data repo whenever possible

saiemgilani added 30 commits October 5, 2020 17:40

Update cfb_betting_lines.R

6a36925

Remove tryCatch messages

Update cfb_betting_lines.R

12f1e4f

Update cfb_pbp_data.R

bb6e7a6

The purrr map_dfr function now chains the following functions:

* penalty_detection()
* add_play_counts()
* clean_pbp_dat()
* clean_drive_dat()
* prep_epa_df_after()
* add_betting_cols()
* create_epa()
* create_wpa_naive()

Delete prep_epa_df_before.Rd

fea1b07

Update cfb_betting_lines.R

301969f

Update DESCRIPTION

cab7889

update docs

4afa9c8

Update NAMESPACE

e0aadfa

Create helpers_calc_ep_wp.R

33719fb

update to a lead yards_to_goal for kickoffs

f52dcf0

switch to using data package

5830b0e

update docs

3d8a4a7

Merge branch 'master' into epa-model-update

fac2f89

Update cfb_pbp_data.R

920409b

Update cfb_pbp_data.R

aa7c0d4

mainly adding documentation, creating a few copies of variables to examine before/after transformations. Add Penalties to skips for play counts (game/drive/half)

documentation

e8bbb77

Create helpers_cfb_pbp_data.R

e091a6a

helpers for all the pbp cleaning functions and documentation

helper documentation

1f16d38

Update cfb_pbp_data.R

3fcc7cc

Update helpers_cfb_pbp_data.R

2cddd19

Update helpers_cfb_pbp_data.R

55839af

Update DESCRIPTION

d293ca7

version moved to 1.0.3

Update create_epa.R

3a99c40

* switch epa_fg_prob to proper missed_fg_pred accounting.
* add return for fg_make_prob for field goal attempts
* add ep_before as lag_ep_after for play_type = "Timeout"
* modify home_EPA to calculate using .data$pos_team = .data$home instead of .data$offense_play

Update cfb_pbp_data.R

beed5c4

define a bunch of pos_team lag/lead columns. Mostly moving stuff around to define things at exactly the same time. Add punt return fumble and kickoff fumble play_type correction. not su

Update create_wpa_naive.R

d979cb6

switch to using pos_team, do some end of period skippin

Update sysdata.rda

6740f61

update models... this will get updated again in a bit, but for checking, should suffice

Update documentation

8ddb78e

Update create_epa.R

59e831e

add some additional wrapping to handle the case of exactly one fg attempt (where the predictions save as a 7 row dataframe, rather than 1 row x 7 col), and the zero fg attempt case, which just adds the NA variable for fg_make_prob

Update sysdata.rda

bd77d12

ep_model calibration error : 0.01150 wp_model calibration error: 0.00787

saiemgilani added 28 commits October 16, 2020 11:22

update docs

9623cc3

update docs

1344bad

Update NAMESPACE

897ead0

Update helpers_calc_ep_wp.R

e3cd6d1

Update helpers_cfb_pbp_data.R

2210b15

Update sysdata.rda

9ccb2ec

new models for wp and ep

Update create_wpa_naive.R

4b9d6f6

1) switch fully to pos_score_diff_start
2) new kick needed to be (new_kick["adj_TimeSecsRem"] + 1)

Update cfb_pbp_data.R

062bdfe

order select

Update cfb_pbp_data.R

47fe853

new_yardline condition for `Goal_to_Go` was missing the subtraction for yards_gained. i.e. it was set to `yards_to_goal` rather than `yards_to_goal` - `yards_gained`

Create mj_fsu_ipad.jpg

1a9cf93

mhm, you'll see why

Update docs

233f80c

san jose st

6933ec1

Update create_epa.R

c642872

Update create_epa.R

2bb10c1

Update helpers_cfb_pbp_data.R

3b28adf

Update NAMESPACE

600d6df

update documentation mostly

a1a1f43

these now have their own helper files for space

fd70142

Update create_wpa_naive.R

e090786

update docs/format

Update create_wpa_betting.R

8b962d7

docs

Update create_epa.R

4a4373e

fixing the cumulative sums/running totals and docs

Update cfb_pbp_data.R

d243b43

add arguments for san jose state. add_betting cols to pre-epa_wpa argument. i'm too tired to document this. it's everything and it's slow and you all will just have to deal. we're going to direct everyone to the data repo whenever possible

incorrect exported function call

62a52aa

Create animated_wp.gif

12e240c

Update Animated-WP-Plotting.Rmd

dc3b3b4

Update cfb_betting_lines.R

7ac9328

saiemgilani merged commit 54b4959 into master Oct 30, 2020

saiemgilani mentioned this pull request Nov 10, 2020

No link provided in "intro tutorial" for R and R studio #13

Closed
