NFL Data Sets

Here is detailed information on each data set.

Draft Picks

To import and join to nflscrapR (using the Ben Baldwin name field):

library(tidyverse)  # provides read_csv() and %>%
draft_picks <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/draft_picks.csv")
# we do a left join here because the names won't always match but don't want to lose any nflscrapR rows
plays <- plays %>%
  left_join(draft_picks,by=c("posteam"="team","name"="name"))

Note this will only match results for the player playing for the team that drafted them. You can remove "posteam"="team" from the join to change this, but the weaker the join, the greater the risk of false positives and duplicate rows for players whose names are identical once converted to the NFL play-by-play format, so be careful when you do this! I'd add some additional filtering if you can. For example, if doing a QB analysis, do a filter(position == 'QB') so you're less likely to join to a wrong player.
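As a rough sketch of that filtering advice, the snippet below runs a name-only join on small made-up data frames instead of the real files. The players, teams, and play_id column are hypothetical; only the position, team, posteam, and name fields come from the description above.

```r
library(dplyr)

# Toy stand-ins for the real data; every player and team here is made up.
draft_picks <- data.frame(
  season = c(2018, 2018),
  team = c("AAA", "BBB"),
  name = c("J.Smith", "J.Smith"),
  position = c("QB", "WR"),
  stringsAsFactors = FALSE
)
plays <- data.frame(
  play_id = 1:3,
  posteam = c("AAA", "CCC", "BBB"),
  name = c("J.Smith", "J.Smith", "T.Jones"),
  stringsAsFactors = FALSE
)

# Keep only quarterbacks before attempting the weaker, name-only join.
qb_picks <- draft_picks %>%
  filter(position == "QB")

# Drop team from the draft data since it is no longer a join key.
joined <- plays %>%
  left_join(qb_picks %>% select(-team), by = c("name" = "name"))

# If each name matched at most one draft pick, the row count is unchanged.
same_row_count <- nrow(joined) == nrow(plays)
```

Comparing nrow(joined) with nrow(plays) after any weakened join is a cheap way to spot duplicate rows before they distort an analysis.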

Data begins with the 2000 season, does not include picks from the supplemental draft, and comes from the excellent Pro Football Reference.

Columns:

  • season: The season in which the draft occurred. This is the season after the draft, not the one before it, so this would represent the rookie year for drafted players.
  • team: The team that drafted the player. This team may have had this pick originally, or traded for it.
  • round: The round of the draft this pick occurred in.
  • pick: The number of the pick.
  • full_name: The name of the selected player.
  • name: The name of the selected player in the same format as NFL play-by-play data. This will usually match nflscrapR name fields, but not for every player.
  • playerid: The ID of the selected player as used by Pro Football Reference. If NA, the player was not assigned an ID by Pro Football Reference, which normally indicates they never played in an NFL game.
  • side: The side of the ball the player plays on. Can be:
    • For offense, O
    • For defense, D
    • For special teams, S
    • If position is NA, this will be also.
  • category: The category of position the player plays in.
    • For offense, this can be OL, QB, RB, TE, or WR.
    • For defense, this can be DB, DL, ED (edge), or LB. Note that ED is only used from 2019 forward.
    • For special teams, this can be K, KR, LS, P, or ST (generic special teams).
    • If position is NA, this will be also.
  • position: The NFL position the selected player plays as reported by Pro Football Reference. If NA, Pro Football Reference did not record a position.

Draft Values

To import and join to draft pick data from above:

draft_values <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/draft_values.csv")
draft_picks <- draft_picks %>%
  inner_join(draft_values,by=c("pick"="pick"))

Columns:

It's worth noting that the Stuart scale attempts to measure how teams should value draft picks, while the Johnson and Hill scales attempt to measure how teams value draft picks in practice; these are different questions. The Hill version is based on more recent data. Also note that the systems use different numerical scales, so you should only compare values within a scale, not compare, say, a Stuart value to a Johnson value (the latter will essentially always be higher).
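To illustrate comparing values only within a scale, here is a minimal sketch that totals each side of a hypothetical trade separately under each scale. The numbers are made up for illustration and are not the real chart values, and the column names stuart and johnson are assumptions about the file layout.

```r
library(dplyr)

# Made-up values, NOT the real charts; column names are assumed.
draft_values <- data.frame(
  pick = c(1, 2, 3),
  stuart = c(30, 25, 20),
  johnson = c(3000, 2600, 2200)
)

# Hypothetical trade: side A gives up pick 1, side B gives up picks 2 and 3.
trade <- data.frame(
  side = c("A", "B", "B"),
  pick = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Sum each scale separately per side; never add or compare across columns.
totals <- trade %>%
  inner_join(draft_values, by = c("pick" = "pick")) %>%
  group_by(side) %>%
  summarize(stuart = sum(stuart), johnson = sum(johnson))
```

Read the result one column at a time: side A against side B under stuart, then again under johnson. A Stuart total and a Johnson total are not comparable with each other.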

Games

To import and join to nflscrapR data:

games <- read_csv("http://www.habitatring.com/games.csv")
plays <- plays %>%
  inner_join(games,by=c("game_id"="game_id","away_team"="away_team","home_team"="home_team"))

Data begins with the 2006 NFL season. Does not include preseason.

Columns:

Logos

To import and join to nflscrapR data (for the offense):

logos <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/logos.csv")
plays <- plays %>%
  inner_join(logos,by=c("posteam"="team"))

Columns:

  • team: The team.
  • url: URL of an image where a transparent team logo is located.

This is based off a version done by Michael Lopez, but includes a manual fix for the Tennessee Titans logo (which had a white rather than transparent background on Wikipedia for some reason) and also supports older team abbreviations (SD and STL).

Rosters

To import and join to nflscrapR (using the Ben Baldwin name field):

rosters <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/rosters.csv")
# we do a left join here because the names won't always match but don't want to lose any nflscrapR rows
plays <- plays %>%
  left_join(rosters,by=c("season"="season","posteam"="team","name"="name"))

Data begins with the 2006 NFL season and comes from Pro Football Reference.
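Because names won't always match, it can help to list the play-by-play names that found no roster row. The sketch below does this with anti_join on made-up data frames; the players and teams are hypothetical, and the join keys mirror the ones used above.

```r
library(dplyr)

# Toy stand-ins for the real data; every player and team here is made up.
rosters <- data.frame(
  season = c(2019, 2019),
  team = c("AAA", "BBB"),
  name = c("J.Smith", "T.Jones"),
  position = c("QB", "WR"),
  stringsAsFactors = FALSE
)
plays <- data.frame(
  season = c(2019, 2019, 2019),
  posteam = c("AAA", "BBB", "BBB"),
  name = c("J.Smith", "T.Jones", "Ty.Jones"),
  stringsAsFactors = FALSE
)

# anti_join keeps the plays rows that have no match in rosters.
unmatched <- plays %>%
  anti_join(rosters, by = c("season" = "season", "posteam" = "team", "name" = "name")) %>%
  distinct(season, posteam, name)
```

Reviewing unmatched shows which names need manual attention, without dropping any rows from plays.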

Columns:

  • season: The season the player was on the roster.
  • team: The team whose roster the player was on.
  • full_name: The name of the player.
  • name: The name of the player in the same format as NFL play-by-play data. This will usually match nflscrapR name fields, but not for every player.
  • playerid: The ID of the player as used by Pro Football Reference. If NA, the player was not assigned an ID by Pro Football Reference, which normally indicates they never played in an NFL game.
  • side: The side of the ball the player plays on. Can be:
    • For offense, O
    • For defense, D
    • For special teams, S
    • If position is NA, this will be also.
  • category: The category of position the player plays in.
    • For offense, this can be OL, QB, RB, TE, or WR.
    • For defense, this can be DB, DL, ED (edge), or LB. Note that ED is only used from 2019 forward.
    • For special teams, this can be K, KR, LS, P, or ST (generic special teams).
    • If position is NA, this will be also.
  • position: The NFL position the player plays as reported by Pro Football Reference. If NA, Pro Football Reference did not record a position.
  • games: Number of regular season games the player played in that season.
  • starts: Number of regular season games the player started that season.
  • years: Number of prior seasons of NFL experience the player has. Is 0 for rookies.
  • av: The player's Approximate Value that year, as defined by the Pro Football Reference metric.

Standings

To import and join to nflscrapR data (for the offense):

standings <- read_csv("http://www.habitatring.com/standings.csv")
plays <- plays %>%
  inner_join(standings,by=c("season"="season","posteam"="team"))

Data begins with the 2002 NFL season.

Columns:

  • season: The year of the NFL season. This represents the whole season, so regular season games that happen in January, as well as playoff games, will occur in the year after this number.
  • conf: The conference the team is in. This will be either AFC or NFC.
  • division: The division the team is in. This will be the value of conf followed by either East, North, South, or West.
  • team: The team.
  • wins: The number of games the team won in the regular season.
  • losses: The number of games the team lost in the regular season.
  • ties: The number of games the team tied in the regular season.
  • pct: The win rate of the team in the regular season. Equals (wins + 0.5 * ties) / (wins + losses + ties).
  • div_rank: This is where this team ranks compared to the other teams in the division based on regular season games only. Will be a number 1-4. If the teams have identical pct values, NFL tiebreakers are applied.
  • scored: The number of points the team has scored in regular season games.
  • allowed: The number of points the team has allowed to be scored on them in regular season games.
  • net: Net points scored in regular season games. Equals scored - allowed.
  • sov: As used in NFL tiebreakers, strength of victory, defined as the combined win rates for teams this team has beaten.
  • sos: As used in NFL tiebreakers, strength of schedule, defined as the combined win rates for teams this team has played.
  • seed: The seed earned by the team in its conference for playoff games. Is NA for teams which do not make the playoffs.
  • playoff: The outcome of the team's playoff run. Is NA for teams which do not make the playoffs, otherwise will be one of LostWC, LostDV, LostCC, LostSB, or WonSB.
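The pct and net definitions above can be reproduced directly from the other columns. This is a minimal sketch on made-up teams and records, useful as a sanity check rather than a replacement for the provided columns.

```r
library(dplyr)

# Toy records for two made-up teams.
standings <- data.frame(
  team = c("AAA", "BBB"),
  wins = c(10, 8),
  losses = c(6, 7),
  ties = c(0, 1),
  scored = c(400, 350),
  allowed = c(320, 360),
  stringsAsFactors = FALSE
)

# Recompute pct and net using the formulas given in the column list.
checked <- standings %>%
  mutate(
    pct = (wins + 0.5 * ties) / (wins + losses + ties),
    net = scored - allowed
  )
```

Note that a tie counts as half a win, so the 8-7-1 team ends up just above .500.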

Teams

To import:

teams <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/teams.csv")

This data set is designed to help scrape websites for additional NFL information.

Columns:
