Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calculate_player_stats_def() no longer errors with small subsets of pbp #415

Merged
merged 3 commits into from Jul 20, 2023

Conversation

mrcaseb
Copy link
Member

@mrcaseb mrcaseb commented Jul 19, 2023

if the pbp subset is small, there could be several columns missing which results in dplyr errors.

This PR fixes that behavior by adding the relevant columns if they are missing

closes #410

@mrcaseb mrcaseb requested a review from tanho63 July 19, 2023 16:11
@mrcaseb
Copy link
Member Author

mrcaseb commented Jul 19, 2023

FYI: I checked the function with various random samples of 2021 pbp. I think it's robust now

pbp <- nflfastR::load_pbp(2021)

pbp %>% 
  dplyr::slice_sample(n = 1) %>% 
  calculate_player_stats_def()

@tanho63
Copy link
Member

tanho63 commented Jul 19, 2023

I solved the problem a bit differently here, what approach do we think is better? https://github.com/nflverse/ngsscrapR/blob/master/R/roster.R#L176

@mrcaseb
Copy link
Member Author

mrcaseb commented Jul 19, 2023

I like both! In terms of the playstats function I prefer having exact control over what I add and avoid additional rows or new NAs at the same time.

@tanho63
Copy link
Member

tanho63 commented Jul 19, 2023

I like the other approach for having better control over vector types (and they’re all length zero so no new rows).

maybe if this accepted named args of empty vectors eg

add_column_if_missing(df, forced_fumble_player = character(), n_fumble_forced = numeric()) ?

@mrcaseb
Copy link
Member Author

mrcaseb commented Jul 19, 2023

Yeah that's easy to implement

@mrcaseb
Copy link
Member Author

mrcaseb commented Jul 19, 2023

I wonder if it makes sense in this case tho. We would exclusively do integer as all of the relevant columns are only counts

@tanho63
Copy link
Member

tanho63 commented Jul 19, 2023

player is an ID that should be character?

@mrcaseb
Copy link
Member Author

mrcaseb commented Jul 19, 2023

I may have missed something. Gotta check later

@mrcaseb
Copy link
Member Author

mrcaseb commented Jul 19, 2023

Double checked and all relevant variables are actually integers

"solo_tackle", 
"tackle_with_assist", 
"tackle_for_loss", 
"assist_tackle",
"forced_fumble_player" # sounds like player ID but in fact is a counter,
"n_sack", 
"n_qb_hit", 
"sack_yards_sack",
"n_interception", 
"n_pass_defense", 
"return_yards_interception",
"fumble_recovery",
"n_penalty", 
"yards_penalty"

tanho63
tanho63 previously approved these changes Jul 19, 2023
@github-actions
Copy link

@mrcaseb mrcaseb merged commit 41b1510 into master Jul 20, 2023
7 checks passed
@mrcaseb mrcaseb deleted the fix-missing-cols-in-def-stats branch July 20, 2023 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants