/
tutorial.Rmd
181 lines (144 loc) · 8.3 KB
/
tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
---
title: "Using cfbplotR"
output: rmarkdown::html_vignette
author: "Jared Lee <br><a href='https://twitter.com/JaredDLee' target='blank'><img src='https://img.shields.io/twitter/follow/JaredDLee?color=blue&label=%40JaredDLee&logo=twitter&style=for-the-badge' alt='@JaredDLee' /></a> <a href='https://github.com/Kazink36' target='blank'><img src='https://img.shields.io/github/followers/Kazink36?color=eee&logo=Github&style=for-the-badge' alt='@Kazink36' /></a>"
vignette: >
%\VignetteIndexEntry{Using cfbplotR}
%\VignetteEngine{knitr::rmarkdown}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This is a quick tutorial on how to use cfbplotR to quickly and easily include college football team logos in your ggplot.
## Load and Process Data
First we need to load the necessary libraries. [`cfbfastR`](https://cfbfastR.sportsdataverse.org/) will get us the data we want to use for analysis. [`cfbplotR`](https://cfbplotR.sportsdataverse.org/) will help us easily plot the results. `tidyverse` will help us do the necessary data manipulation and of course includes `ggplot2` that we will use for plotting. You can use the commented out code to install these packages if you don't already have them.
```{r libraries, message=FALSE}
#remotes::install_github(repo = "sportsdataverse/cfbfastR")
#remotes::install_github(repo = "sportsdataverse/cfbplotR")
#install.packages(tidyverse)
library(cfbfastR)
library(cfbplotR)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(tidyr)
library(forcats)
```
This first chunk of code will pull the play-by-play data from the first week of the 2021 season using the `cfbfastR` data repo and create the advanced metrics like EPA that we will be plotting (this will take a second). We'll also pull in the general team info so we can filter down to just teams in a power 5 conference.
```{r pulldata}
pbp <- cfbfastR::load_cfb_pbp(2021) %>%
filter(week == 1)
team_info <- cfbfastR::cfbd_team_info(year = 2021)
team_info <- team_info %>%
dplyr::select("team" = "school", "conference", "mascot") %>%
dplyr::filter(conference %in% c("Pac-12","ACC","SEC","Big Ten","Big 12"))
```
Now we quickly roll up the EPA data and find the EPA per rush and EPA per pass for every team in week 1 and take a look at our plotting data.
```{r rollup}
team_plot_data <- pbp %>%
dplyr::group_by(team = offense_play) %>%
dplyr::summarize(rush_epa = mean(if_else(rush == 1,EPA,NA_real_),na.rm = TRUE),
n_rush = sum(rush),
pass_epa = mean(if_else(pass == 1,EPA,NA_real_),na.rm = TRUE),
n_pass = sum(pass)) %>%
dplyr::filter(team %in% team_info$team) %>%
dplyr::left_join(team_info,by = "team")
head(team_plot_data)
```
## Plotting with cfbplotR
Now that the data is prepped, we can being to use `cfbplotR`. First we'll plot all the teams with Passing EPA on the x-axis and Rushing EPA on the y-axis with lines showing the median value for each. It's important to set width or height in `geom_cfb_logos` to small values. The default of 1 will create extremely large logos.
```{r epa_plot}
ggplot(team_plot_data, aes(x = pass_epa, y = rush_epa)) +
geom_median_lines(aes(v_var = pass_epa, h_var = rush_epa)) +
geom_cfb_logos(aes(team = team), width = 0.075) +
labs(x = "EPA per Pass",y = "EPA per Rush") +
theme_bw()
```
This is still pretty messy because of the large number of teams. Let's try to focus in on the Pac-12 teams with a couple of handy tools. We're going to add two columns to our data: one for the color and one for the alpha. Then we just add those two columns as aesthetics to `geom_cfb_logos` to turn the logos of non-Pac-12 teams black and white and lower the alpha.
```{r gray}
team_plot_data %>%
dplyr::mutate(color = if_else(conference == "Pac-12",NA_character_,"b/w"),
alpha = if_else(conference == "Pac-12",1,.6)) %>%
ggplot(aes(x = pass_epa, y = rush_epa)) +
geom_median_lines(aes(v_var = pass_epa, h_var = rush_epa)) +
geom_cfb_logos(aes(team = team, alpha = alpha, color = color), width = 0.075) +
scale_alpha_identity() +
scale_color_identity() +
labs(x = "EPA per Pass",y = "EPA per Rush") +
theme_bw()
```
Finally let's make a bar chart showing the Pac-12 EPA per pass for each team. Because `cfbplotR` creates a custom geom for ggplot, we can use `annotate()` to place a log anywhere we'd like. `scale_color_cfb()` and `scale_fill_cfb()` let us automatically use a teams primary color on a plot. The `alt_colors` argument lets us pass through a vector of team names that we want to use an alternate color for. ~~`sacle_x_cfb()` and `scale_y_cfb()` change the axis labels that are team names into logos. Due to the way ggplot works, you have to add the corresponding theme function `theme_x_cfb()` or `theme_y_cfb()`.~~ `element_cfb_logo()` and `element_cfb_headshot()` can be used for the axis.text argument in the theme function for improved performance in using logos and headshots as axis labels.
```{r bar_chart}
team_plot_data %>%
dplyr::filter(conference == "Pac-12") %>%
dplyr::mutate(team = fct_reorder(team,pass_epa)) %>%
ggplot(aes(x = team, y = pass_epa)) +
#
geom_col(aes(fill = team, color = team),size = 1.5) +
annotate(GeomCFBlogo,x = "California",y = 0.2,team = "Pac-12",height = .35,alpha = .3) +
scale_fill_cfb(alpha = .8) +
scale_color_cfb(alt_colors = team_plot_data$team) +
#scale_x_cfb(size = 18) +
labs(x = "", y = "EPA per Pass") +
theme_bw() +
#theme_x_cfb()
theme(axis.text.x = element_cfb_logo())
```
`cfbplotR` also allows you to plot player headshots. Let's look at the top 10 rushing EPA players with more than 10 rushes for week 1.
```{r headshots}
player_plot_data <- pbp %>%
dplyr::filter(!is.na(rush_player_id)) %>%
dplyr::group_by(rush_player_id) %>%
dplyr::summarize(epa = mean(EPA, na.rm = TRUE),
player_name = first(rusher_player_name),
team = first(pos_team),
n = n()) %>%
dplyr::filter(n >= 10) %>%
dplyr::arrange(dplyr::desc(epa)) %>%
dplyr::slice(1:10)
player_plot_data %>%
dplyr::mutate(team_ordered = fct_reorder(team,epa)) %>%
ggplot(aes(y = team_ordered, x = epa)) +
geom_col(aes(color = team, fill = team)) +
geom_label(aes(label = player_name, x = epa / 2), alpha = .6) +
geom_cfb_headshots(aes(player_id = rush_player_id, x = epa + .1), height = .1) +
scale_color_cfb(alt_colors = valid_team_names()) +
scale_fill_cfb() +
labs(y = "", x = "EPA per Rush") +
#scale_y_cfb(size = 18) +
theme_minimal() +
theme(legend.position = "none",
panel.grid.major.y = element_blank()) +
#theme_y_cfb()
theme(axis.text.y = element_cfb_logo())
```
## Tables with cfbplotR
The `gt` package offers an easy way to create nice tables of data and the `gtExtras` package from Tom Mock provides a number of convenient functions for styling those tables. The `gt_fmt_cfb_logo()` and `gt_fmt_cfb_wordmark()` functions are slightly modified versions of `gtExtras::gt_image_rows()` to easily add team and conference logos or wordmarks based on names from `valid_team_names()`. The `gt_merge_stack_team_color()` function is a slightly modified version of `gtExtras::gt_merge_stack()` that merges two columns together and colors the text of the bottom row with the color of the team referenced in a third column. We can quickly make a table showing the top teams from week 1 by EPA per pass.
```{r fmt_cfb}
library(gt)
team_plot_data %>%
dplyr::transmute(.data$conference,
.data$team,
logo = team,
.data$mascot,
wordmark = team,
pass_epa = round(pass_epa, 2),
.data$n_pass,
rush_epa = round(rush_epa, 2),
.data$n_rush) %>%
dplyr::arrange(desc(pass_epa)) %>%
head(8) %>%
gt() %>%
gt_fmt_cfb_logo(columns = c("conference","logo")) %>%
gt_fmt_cfb_wordmark(columns = "wordmark") %>%
gt_merge_stack_team_color("team", "mascot", "team")
```
We can also use the `gt_fmt_cfb_headshot()` function to add headshots to a gt using the player_id or headshot_url available through `cfbfastR`.
```{r fmt_cfb_headshot}
player_plot_data %>%
tidyr::separate("player_name", into = c("first", "last"), extra = "merge") %>%
dplyr::select("team", "rush_player_id", "first", "last", "n", "epa") %>%
gt() %>%
gt_fmt_cfb_logo("team") %>%
gt_fmt_cfb_headshot("rush_player_id") %>%
gt_merge_stack_team_color("first", "last", "team")
```