[![Build Status](https://travis-ci.org/gVkWY8NJAa/ProFootballRef.svg?branch=master)](https://travis-ci.org/gVkWY8NJAa/ProFootballRef) [![Coverage Status](https://coveralls.io/repos/github/gVkWY8NJAa/ProFootballRef/badge.svg?branch=master)](https://coveralls.io/github/gVkWY8NJAa/ProFootballRef?branch=master)
# ProFootballRef

This is a Python toolkit that scrapes statistics from https://www.pro-football-reference.com/ and returns the resulting data as a Pandas DataFrame.

Please consider contributing the $20/yr to support the site. They do a great job: https://www.pro-football-reference.com/my/?do=ad_free_browsing

If you come across a url that is not parsed properly, please open an issue for the corresponding url.

## Contents
* [Installation](#installation)
* [Testing](#testing)
* [Find players](#find_players)
* [Player stats](#player_stats)
    * [Individual player stats](#career_player_stats)
    * [Multiple player stats](#multi_player_stats)
    * [Gamelog](#gamelog)
* [Team stats](#team_stats)
    * [Team offense stats](#team_offense)
    * [Team defense stats](#team_defense)

## Key Features
* Aggregate player data for each season.
* Ability to combine qualitative (height/weight) with quantitative (TDs).
* Multi-column headers have been simplified and closely match the canonical source.
* Scrape team stats for a given season.
* Player gamelog data available for a given season.
* Returned objects are Pandas DataFrames for ease of analysis.

<a id='installation'></a>
## Installation
```
git clone git@github.com:gVkWY8NJAa/ProFootballRef.git
cd ProFootballRef
pip install -r requirements.txt
```
<a id='testing'></a>
## Testing
```
cd <path/to/ProFootballRef>
python3.6 -m pytest tests/
```
<a id='find_players'></a>
## Find Players
---
Before we do anything, we need to gather a list of urls for various players to parse.

This is easily done by importing the **GetPositionLinks** module that resides in the 'LinkBuilder' directory:

```python
from profootballref.LinkBuilder import GetPositionLinks
```

The **GetPositionLinks** module contains a class called **Position**. 

To generate urls to parse, we'll call the **player_links** method from the **Position** class, and save the output to a list.
The **Position** class takes one of five possible arguments:
* passing
* receiving
* rushing
* kicking
* defense

The **player_links** method takes either a season, or a range of seasons, as integers:

```python
urls = GetPositionLinks.Position('passing').player_links(2017)
```

Or, to grab all players from a range of seasons:

```python
urls = GetPositionLinks.Position('passing').player_links(2015, 2017)
```

If a range of seasons is passed to **player_links**, any duplicate urls will be removed automatically.
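
The de-duplication matters because the same player usually appears in several seasons. As an illustration only (this is not the library's implementation, and `merge_season_links` is a hypothetical helper), combining per-season lists while keeping first-seen order can be sketched as follows. Which player appears in which season list here is invented for the example.

```python
from itertools import chain


def merge_season_links(*season_lists):
    """Flatten per-season url lists, dropping repeats but keeping first-seen order."""
    # dict.fromkeys preserves insertion order (guaranteed from Python 3.7,
    # and true in CPython 3.6), so the first occurrence of a url wins.
    return list(dict.fromkeys(chain.from_iterable(season_lists)))


# Hypothetical per-season results built from real player urls.
links_2016 = [
    "https://www.pro-football-reference.com/players/B/BradTo00.htm",
    "https://www.pro-football-reference.com/players/R/RivePh00.htm",
]
links_2017 = [
    "https://www.pro-football-reference.com/players/R/RivePh00.htm",
    "https://www.pro-football-reference.com/players/M/MannEl00.htm",
]

merged = merge_season_links(links_2016, links_2017)
print(len(merged))  # 3
```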

We'll look at the first five urls in the list:

```python
print(urls[:5])
```

```
['https://www.pro-football-reference.com/players/B/BradTo00.htm', 'https://www.pro-football-reference.com/players/R/RivePh00.htm', 'https://www.pro-football-reference.com/players/M/MannEl00.htm', 'https://www.pro-football-reference.com/players/S/StafMa00.htm', 'https://www.pro-football-reference.com/players/R/RoetBe00.htm']
```


<a id='player_stats'></a>
## Player stats 
---
The following code demonstrates how to return career statistics for a given player. This is the data found on the [player's page](https://www.pro-football-reference.com/players/B/BradTo00.htm).
<a id='career_player_stats'></a>
### Individual player stats
In this example, we will pass a player's url, as a string, from the list we [created previously](#find_players) to return that player's career stats for their position. If you do not yet have a list of urls for a given position, see the [Find Players](#find_players) section above.

This is easily done by importing the **PlayerParser** module that resides in the 'Parsers' directory:

```python
from profootballref.Parsers import PlayerParser
```

The **PlayerParser** module contains a class also called **PlayerParser**. 

To scrape a player's career stats, we'll call one of five methods from the **PlayerParser** class and save the output to a variable.

The available methods from the **PlayerParser** class are:
* passing
* receiving
* rushing
* kicking
* defense

Remember we're using the 'urls' data from what we did in the [Find Players](#find_players) section above.

```python
passing_df = PlayerParser.PlayerParser().passing(urls[0])
```
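
Because the five method names match the position strings accepted by **Position**, the method can also be selected from a string. This is a minimal sketch that assumes only the method names listed above. `career_stats` and `StubParser` are hypothetical: the stub stands in for `PlayerParser.PlayerParser()` so the example runs without network access.

```python
VALID_POSITIONS = ("passing", "receiving", "rushing", "kicking", "defense")


def career_stats(parser, position, url):
    """Call the parser method named after `position` for one player url."""
    if position not in VALID_POSITIONS:
        raise ValueError(f"unknown position {position!r}; expected one of {VALID_POSITIONS}")
    return getattr(parser, position)(url)


class StubParser:
    """Stand-in for PlayerParser.PlayerParser(); records the call instead of scraping."""

    def passing(self, url):
        return ("passing", url)


brady = "https://www.pro-football-reference.com/players/B/BradTo00.htm"
result = career_stats(StubParser(), "passing", brady)
print(result[0])  # passing
```

With the real library, the first argument would be `PlayerParser.PlayerParser()` and the return value would be a DataFrame.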

<a id='multi_player_stats'></a>
### Multiple player stats
**This can generate a ton of traffic to the website so use caution with how many players you retrieve at one time.**

```python
import random

import pandas as pd
from profootballref.LinkBuilder import GetPositionLinks
from profootballref.Parsers import PlayerParser

# Specify which position we want
position = 'passing'

# In this example we'll generate a list of urls across multiple seasons
links = GetPositionLinks.Position(position).player_links(2015, 2017)

# Initialize an empty DataFrame to store all the players
all_qb = pd.DataFrame()

# Scrape 10 players chosen at random from the list of links
for player in random.sample(links, 10):
    # pass the url to the position parser
    stats = PlayerParser.PlayerParser().passing(player)

    # concat the results with our catch-all DataFrame
    all_qb = pd.concat([all_qb, stats], axis=0)
```

```
https://www.pro-football-reference.com/players/R/RyanJo21.htm  is not a quarterback we can parse so we're skipping this player
https://www.pro-football-reference.com/players/A/AmenDa00.htm  is not a quarterback we can parse so we're skipping this player
https://www.pro-football-reference.com/players/G/GrayMa00.htm  is not a quarterback we can parse so we're skipping this player
https://www.pro-football-reference.com/players/C/CadeTr00.htm  is not a quarterback we can parse so we're skipping this player
https://www.pro-football-reference.com/players/M/McKiJe00.htm  is not a quarterback we can parse so we're skipping this player
```


Notice that in the above example, some links output "*url* is not a quarterback we can parse so we're skipping this player". This happens because players other than quarterbacks, such as kickers and RBs, have also attempted passes. These players are skipped entirely rather than parsed as if they were quarterbacks.

This is intended behavior of all position methods in the **PlayerParser** class.
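
Given the traffic warning above, it may be worth pausing between requests. The sketch below is one way to do that and is not part of the library. `scrape_politely` is a hypothetical helper, the two-second default delay is an arbitrary choice, and it assumes a skipped player comes back as `None` or as an empty result, which this README does not confirm. The `parse` argument would be something like `PlayerParser.PlayerParser().passing`; a stub is used here so the example runs offline.

```python
import time


def scrape_politely(urls, parse, delay_seconds=2.0, sleep=time.sleep):
    """Call parse(url) for each url, sleeping between requests.

    Returns the non-empty results in order. Results that are None or have
    zero length (assumed here to mean "player skipped") are dropped.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # pause before every request except the first
        stats = parse(url)
        if stats is None or len(stats) == 0:
            continue
        results.append(stats)
    return results


# Offline demonstration: a stub parser and a sleep function that only records.
pauses = []


def fake_parse(url):
    return None if url.endswith("skip") else [url]


frames = scrape_politely(["a", "b-skip", "c"], fake_parse, delay_seconds=2.0, sleep=pauses.append)
print(len(frames), pauses)  # 2 [2.0, 2.0]
```

With real DataFrames, the returned list could then be combined in one step with `pd.concat(frames, axis=0)`.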

In addition, RBs and WRs share the same DataFrame structure. This means that if a WR url is passed to the **rushing** method instead of **receiving**, the **rushing** method will call the correct position method for the player. This works for the WR, TE, FB, and RB positions, as they are interchangeable.

```python
all_qb.groupby(['Name']).sum()
```

| Name | Year | Age | Height | Weight | DOB_mo | DOB_day | DOB_yr | No. | Cmp | Att | ... | Pass_Y/A | AY/A | Y/C | Y/G | Sk | Sk_Yds | NY/A | ANY/A | Sk% | 4QC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| B.J. Daniels | 4029 | 53.0 | 142 | 434 | 20 | 48 | 3976 | 5.0 | 6 | 0.0 | ... | 9.0 | 0.0 | 12.0 | 0.3 | 9.0 | 8.0 | 3.0 | 24.0 | 0.0 | 0.0 |
| Blaine Gabbert | 16116 | 204.0 | 608 | 1880 | 80 | 120 | 15912 | 53.0 | 842 | 1498.0 | ... | 48.0 | 43.1 | 88.3 | 1215.7 | 138.0 | 916.0 | 40.5 | 36.4 | 59.4 | 7.0 |
| DeShone Kizer | 4035 | 43.0 | 152 | 470 | 2 | 6 | 3992 | 16.0 | 275 | 518.0 | ... | 10.6 | 6.8 | 20.7 | 255.2 | 42.0 | 264.0 | 8.43 | 4.97 | 16.1 | |
| Josh McCown | 32160 | 496.0 | 1216 | 3488 | 112 | 64 | 31664 | 203.0 | 1581 | 2628.0 | ... | 77.8 | 63.3 | 134.2 | 2084.2 | 235.0 | 1563.0 | 61.92 | 49.48 | 127.3 | 6.0 |
| Ryan Tannehill | 12088 | 160.0 | 456 | 1242 | 42 | 162 | 11928 | 102.0 | 1829 | 2911.0 | ... | 42.5 | 40.6 | 67.3 | 1376.6 | 248.0 | 1885.0 | 35.32 | 33.53 | 48.2 | 15.0 |
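
Note that `sum()` adds up every numeric column, so fields such as Year, Height, and DOB_yr in the output above are totals across seasons rather than meaningful values. One option, sketched here with invented toy rows rather than real stats, is to choose an aggregation per column with `agg`. The column names come from the output above; which columns to sum and which to keep as-is is an assumption about what you want from the result.

```python
import pandas as pd

# Toy data using the parser's column names (all values are invented).
toy = pd.DataFrame(
    {
        "Name": ["Player A", "Player A", "Player B"],
        "Year": [2016, 2017, 2017],
        "Height": [76, 76, 74],
        "Cmp": [100, 150, 200],
        "Att": [160, 240, 300],
    }
)

career = toy.groupby("Name").agg(
    {
        "Year": "count",    # number of seasons instead of a sum of years
        "Height": "first",  # constant per player, so keep a single value
        "Cmp": "sum",       # counting stats are safe to add
        "Att": "sum",
    }
).rename(columns={"Year": "Seasons"})

# Recompute rate stats from the summed totals instead of summing per-season rates.
career["Cmp%"] = (100 * career["Cmp"] / career["Att"]).round(1)
print(career)
```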


<a id='gamelog'></a>
### Gamelog
Gamelog stats can be obtained for one or more players for a given season. Descriptive information about the player, such as their alma mater, height, and weight, is also attached to the results.

```python
from profootballref.LinkBuilder import GetPositionLinks
from profootballref.Parsers import GamelogParser

# gather player urls for a given season
position = 'passing'
season = 2017

urls = GetPositionLinks.Position(position).player_links(season)

# view the first url as a string
urls[0]
```

```
'https://www.pro-football-reference.com/players/B/BradTo00.htm'
```

```python
# pass the url and the season to the passing method in the GameLog class.
GamelogParser.GameLog().passing(urls[0], season)
```

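| | Date | G# | Week | Age | Home | Opp | Result | GS | Cmp | Att | Cmp% | Yds | TD | Int | Rate | Sk | Yds.1 | Y/A | AY/A | Att.1 | Yds.2 | Y/A.1 | TD.1 | Fmb | FF | FR | Yds.3 | TD.2 | PF | PA | Name | Pos | Throws | Height | Weight | DOB_mo | DOB_day | DOB_yr | College |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2017-09-07 | 1.0 | 1.0 | 40.09589 | | KAN | L | True | 16 | 36 | 44.44 | 267 | 0 | 0 | 70.0 | 3 | 20 | 7.42 | 7.42 | 2 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 42 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |
| 1 | 2017-09-17 | 2.0 | 2.0 | 40.123288 | | NOR | W | True | 30 | 39 | 76.92 | 447 | 3 | 0 | 139.6 | 2 | 11 | 11.46 | 13.0 | 2 | 9 | 4.5 | 0 | 0 | 0 | 0 | 0 | 0 | 36 | 20 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |
| 2 | 2017-09-24 | 3.0 | 3.0 | 40.142466 | | HOU | W | True | 25 | 35 | 71.43 | 378 | 5 | 0 | 146.2 | 5 | 41 | 10.8 | 13.66 | 1 | 6 | 6.0 | 0 | 3 | 0 | 0 | 0 | 0 | 36 | 33 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |
| 3 | 2017-10-01 | 4.0 | 4.0 | 40.161644 | | CAR | L | True | 32 | 45 | 71.11 | 307 | 2 | 0 | 104.6 | 3 | 14 | 6.82 | 7.71 | 1 | 2 | 2.0 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 33 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |
| 4 | 2017-10-05 | 5.0 | 5.0 | 40.172603 | | TAM | W | True | 30 | 40 | 75.0 | 303 | 1 | 1 | 94.1 | 3 | 14 | 7.58 | 6.95 | 2 | 5 | 2.5 | 0 | 1 | 0 | 0 | 0 | 0 | 19 | 14 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |
| 5 | 2017-10-15 | 6.0 | 6.0 | 40.2 | | NYJ | W | True | 20 | 38 | 52.63 | 257 | 2 | 1 | 80.7 | 0 | 0 | 6.76 | 6.63 | 1 | -1 | -1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 17 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |
| 6 | 2017-10-22 | 7.0 | 7.0 | 40.021918 | | ATL | W | True | 21 | 29 | 72.41 | 249 | 2 | 0 | 121.2 | 2 | 8 | 8.59 | 9.97 | 5 | 5 | 1.0 | 0 | 1 | 0 | 1 | 0 | 0 | 23 | 7 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |
| 7 | 2017-10-29 | 8.0 | 8.0 | 40.238356 | | LAC | W | True | 32 | 47 | 68.09 | 333 | 1 | 0 | 95.4 | 3 | 16 | 7.09 | 7.51 | 1 | 2 | 2.0 | 0 | 0 | 0 | 0 | 0 | 0 | 21 | 13 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |
| 8 | 2017-11-12 | 9.0 | 10.0 | 40.276712 | | DEN | W | True | 25 | 34 | 73.53 | 266 | 3 | 0 | 125.4 | 1 | 6 | 7.82 | 9.59 | 1 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 41 | 16 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |
| 9 | 2017-11-19 | 10.0 | 11.0 | 40.29589 | | OAK | W | True | 30 | 37 | 81.08 | 340 | 3 | 0 | 132.0 | 1 | 8 | 9.19 | 10.81 | 0 | 0 | | 0 | 0 | 0 | 0 | 0 | 0 | 33 | 8 | Tom Brady | QB | Right | 76 | 225 | 8 | 3 | 1977 | Michigan |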


The gamelog positions you can choose from are:

* **passing()**
* **receiving()**
* **rushing()**
* **defense()**
* **kicking()**

Each of these parsers will return a Pandas DataFrame object.
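
Several passing gamelog columns are derived from others in the same row, which gives a cheap sanity check on parsed data. The sketch below recomputes Cmp%, Y/A, and AY/A for two of the Tom Brady games shown above. The AY/A formula used, (Yds + 20 * TD - 45 * Int) / Att, is the standard adjusted yards per attempt definition. The function name is hypothetical and not part of the library.

```python
def passing_rates(cmp, att, yds, td, ints):
    """Return (Cmp%, Y/A, AY/A), each rounded to two decimals."""
    cmp_pct = round(100 * cmp / att, 2)
    ypa = round(yds / att, 2)
    aypa = round((yds + 20 * td - 45 * ints) / att, 2)
    return cmp_pct, ypa, aypa


# Week 1 vs KAN: 16 of 36 for 267 yards, 0 TD, 0 Int
week1 = passing_rates(16, 36, 267, 0, 0)
print(week1)  # (44.44, 7.42, 7.42)

# Week 6 vs NYJ: 20 of 38 for 257 yards, 2 TD, 1 Int
week6 = passing_rates(20, 38, 257, 2, 1)
print(week6)  # (52.63, 6.76, 6.63)
```

Both results match the Cmp%, Y/A, and AY/A values in the corresponding gamelog rows.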

<a id='team_stats'></a>
## Team stats
---
<a id='team_offense'></a>
### Team offense stats
Simply pass a season (year) to the **offense()** method in the **TeamStats()** class

```python
from profootballref.Parsers import TeamStats

year = 2015
df = TeamStats.TeamStats().offense(year)
```

```python
df.head()
```

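| | Tm | G | PF | Yds | Ply | Y/P | TO | FL | 1stD | Pass_Cmp | Pass_Att | Pass_Yds | Pass_TD | Int | NY/A | Pass_1stD | Rush_Att | Rush_Yds | Rush_TD | Y/A | Rush_1stD | Pen | Pen_Yds | 1stPy | Sc% | TO% | EXP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Carolina Panthers | 16.0 | 500.0 | 5871.0 | 1060.0 | 5.5 | 19.0 | 9.0 | 357.0 | 300.0 | 501.0 | 3589.0 | 35.0 | 10.0 | 6.7 | 197.0 | 526.0 | 2282.0 | 19.0 | 4.3 | 136.0 | 103.0 | 887.0 | 24.0 | 42.9 | 9.6 | 125.65 |
| 1 | Arizona Cardinals | 16.0 | 489.0 | 6533.0 | 1041.0 | 6.3 | 24.0 | 11.0 | 373.0 | 353.0 | 562.0 | 4616.0 | 35.0 | 13.0 | 7.8 | 237.0 | 452.0 | 1917.0 | 16.0 | 4.2 | 92.0 | 94.0 | 758.0 | 44.0 | 42.5 | 11.8 | 161.96 |
| 2 | New England Patriots | 16.0 | 465.0 | 5991.0 | 1050.0 | 5.7 | 14.0 | 7.0 | 348.0 | 404.0 | 629.0 | 4587.0 | 36.0 | 7.0 | 6.9 | 230.0 | 383.0 | 1404.0 | 14.0 | 3.7 | 87.0 | 96.0 | 860.0 | 31.0 | 43.2 | 5.7 | 127.68 |
| 3 | Pittsburgh Steelers | 16.0 | 423.0 | 6327.0 | 1011.0 | 6.3 | 28.0 | 7.0 | 331.0 | 391.0 | 590.0 | 4603.0 | 26.0 | 21.0 | 7.4 | 207.0 | 388.0 | 1724.0 | 16.0 | 4.4 | 91.0 | 94.0 | 868.0 | 33.0 | 40.5 | 13.7 | 116.15 |
| 4 | Seattle Seahawks | 16.0 | 423.0 | 6058.0 | 1035.0 | 5.9 | 16.0 | 8.0 | 335.0 | 333.0 | 489.0 | 3790.0 | 34.0 | 8.0 | 7.1 | 190.0 | 500.0 | 2268.0 | 10.0 | 4.5 | 128.0 | 117.0 | 1007.0 | 17.0 | 42.0 | 8.6 | 132.31 |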


<a id='team_defense'></a>
### Team defense stats
Simply pass a season (year) to the **defense()** method in the **TeamStats()** class

```python
from profootballref.Parsers import TeamStats

year = 2015
df = TeamStats.TeamStats().defense(year)
```

```python
df.head()
```

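| | Tm | G | PF | Yds | Ply | Y/P | TO | FL | 1stD | Pass_Cmp | Pass_Att | Pass_Yds | Pass_TD | Int | NY/A | Pass_1stD | Rush_Att | Rush_Yds | Rush_TD | Y/A | Rush_1stD | Pen | Pen_Yds | 1stPy | Sc% | TO% | EXP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Seattle Seahawks | 16.0 | 277.0 | 4668.0 | 947.0 | 4.9 | 23.0 | 9.0 | 273.0 | 333.0 | 548.0 | 3364.0 | 14.0 | 14.0 | 5.8 | 175.0 | 362.0 | 1304.0 | 10.0 | 3.6 | 71.0 | 94.0 | 795.0 | 27.0 | 29.3 | 13.2 | 50.54 |
| 1 | Cincinnati Bengals | 16.0 | 279.0 | 5453.0 | 1032.0 | 5.3 | 28.0 | 7.0 | 307.0 | 415.0 | 646.0 | 3976.0 | 18.0 | 21.0 | 5.8 | 202.0 | 344.0 | 1477.0 | 8.0 | 4.3 | 74.0 | 116.0 | 1063.0 | 31.0 | 28.9 | 15.0 | 24.23 |
| 2 | Kansas City Chiefs | 16.0 | 287.0 | 5269.0 | 1037.0 | 5.1 | 29.0 | 7.0 | 313.0 | 349.0 | 607.0 | 3698.0 | 25.0 | 22.0 | 5.7 | 193.0 | 383.0 | 1571.0 | 7.0 | 4.1 | 86.0 | 110.0 | 941.0 | 34.0 | 27.3 | 15.3 | 69.97 |
| 3 | Denver Broncos | 16.0 | 296.0 | 4530.0 | 1033.0 | 4.4 | 27.0 | 13.0 | 289.0 | 344.0 | 573.0 | 3193.0 | 19.0 | 14.0 | 5.1 | 162.0 | 408.0 | 1337.0 | 10.0 | 3.3 | 81.0 | 104.0 | 773.0 | 46.0 | 26.9 | 11.9 | 146.71 |
| 4 | Minnesota Vikings | 16.0 | 302.0 | 5510.0 | 1015.0 | 5.4 | 22.0 | 9.0 | 318.0 | 359.0 | 561.0 | 3762.0 | 24.0 | 13.0 | 6.2 | 189.0 | 411.0 | 1748.0 | 7.0 | 4.3 | 94.0 | 109.0 | 875.0 | 35.0 | 33.3 | 11.9 | 3.87 |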
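
The offense and defense frames share the `Tm` column, so they can be joined to compare both sides of the ball. This is a minimal sketch, assuming that `PF` in the defense table means points allowed (this README does not say so explicitly). It uses a few rows from the tables above, trimmed to two columns, in place of live scraped data.

```python
import pandas as pd

# Stand-ins for TeamStats.TeamStats().offense(2015) and .defense(2015),
# reduced to the columns needed here. Values are copied from the tables above.
offense = pd.DataFrame({"Tm": ["Seattle Seahawks", "Carolina Panthers"], "PF": [423.0, 500.0]})
defense = pd.DataFrame({"Tm": ["Seattle Seahawks", "Denver Broncos"], "PF": [277.0, 296.0]})

# Inner join on team name; suffixes disambiguate the identically named columns.
both = offense.merge(defense, on="Tm", suffixes=("_off", "_def"))

# Point differential, under the assumption that defensive PF is points allowed.
both["Pt_Diff"] = both["PF_off"] - both["PF_def"]
print(both)
```

Only Seattle appears in both toy frames, so the inner join keeps a single row. With the full season data, every team would be present on both sides.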
