### Exploring Running Back Data (2015 - 2021) ###
#### Webscraping the Data ####
* [Football Reference](https://www.pro-football-reference.com/) - Main Website 
* [Rushing](https://www.pro-football-reference.com/years/2021/rushing.htm) - Page with the data to scrape

Install the requests library for downloading the pages

In [1]:
!pip install requests



In [2]:
years = list(range(2015, 2022))

In [3]:
url_start = "https://www.pro-football-reference.com/years/{}/rushing.htm"

In [4]:
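import os
import requests

# Make sure the output folder exists before writing into it
os.makedirs("RB", exist_ok=True)

for year in years:
    # Build the URL for this season and download the page
    url = url_start.format(year)
    data = requests.get(url)

    # Save the raw HTML so it can be parsed later without re-downloading
    with open("RB/{}.html".format(year), "w", encoding="utf8") as f:
        f.write(data.text)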

- Pages are saved into a folder called RB
- The for loop iterates over each year from 2015 through 2021
- Each year's page is saved as its own HTML file in the RB folder
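The download steps above could also be wrapped in a small helper that creates the folder, skips years already on disk, and pauses between requests. This is only a sketch: `download_pages`, its `fetch` parameter, and `delay_seconds` are hypothetical names introduced here, not part of the original notebook.

```python
import time
from pathlib import Path


def download_pages(years, url_template, out_dir, fetch, delay_seconds=0.0):
    """Save one HTML page per year into out_dir and return the written paths.

    `fetch` is any callable that takes a URL and returns the page text
    (hypothetical parameter, so the helper can run without network access).
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if missing

    written = []
    for year in years:
        target = out_path / "{}.html".format(year)
        if target.exists():
            # Skip years already on disk so reruns do not request them again.
            written.append(target)
            continue
        html = fetch(url_template.format(year))
        target.write_text(html, encoding="utf8")
        written.append(target)
        time.sleep(delay_seconds)  # pause between requests
    return written
```

With the notebook's variables, a call might look like `download_pages(years, url_start, "RB", lambda url: requests.get(url).text, delay_seconds=3)`. The three-second pause is an assumed value, not a documented limit of the site.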

Install Beautiful Soup for parsing the HTML

In [5]:
!pip install beautifulsoup4



Install pandas for exploring the data

In [6]:
!pip install pandas



In [7]:
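from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

# Read the saved page with the same encoding it was written in
with open("RB/2015.html", encoding="utf8") as f:
    page = f.read()

soup = BeautifulSoup(page, "html.parser")
# Drop the grouping header row so the table has a single header row
soup.find('tr', class_='over_header').decompose()
rb_table = soup.find(id="rushing")
# Wrap the HTML in StringIO; newer pandas versions deprecate passing a raw string
rb_2015 = pd.read_html(StringIO(str(rb_table)))[0]
rb_2015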


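| | Rk | Player | Tm | Age | Pos | G | GS | Att | Yds | TD | 1D | Lng | Y/A | Y/G | Fmb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Adrian Peterson*+ | MIN | 30 | RB | 16 | 16 | 327 | 1485 | 11 | 72 | 80 | 4.5 | 92.8 | 7 |
| 1 | 2 | Doug Martin*+ | TAM | 26 | RB | 16 | 16 | 288 | 1402 | 6 | 68 | 84 | 4.9 | 87.6 | 5 |
| 2 | 3 | Latavius Murray* | OAK | 25 | RB | 16 | 16 | 266 | 1066 | 6 | 49 | 54 | 4.0 | 66.6 | 4 |
| 3 | 4 | Devonta Freeman* | ATL | 23 | RB | 15 | 13 | 265 | 1056 | 11 | 71 | 39 | 4.0 | 70.4 | 3 |
| 4 | 5 | Frank Gore | IND | 32 | RB | 16 | 16 | 260 | 967 | 6 | 48 | 37 | 3.7 | 60.4 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 317 | 308 | Kasen Williams | SEA | 23 | | 2 | 0 | 1 | 5 | 0 | 0 | 5 | 5.0 | 2.5 | 0 |
| 318 | 309 | Glenn Winston | CLE | 26 | | 3 | 0 | 1 | -8 | 0 | 0 | -8 | -8.0 | -2.7 | 1 |
| 319 | 310 | Robert Woods | BUF | 23 | WR | 14 | 9 | 1 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 2 |
| 320 | 311 | Charles Woodson* | OAK | 39 | SS | 16 | 16 | 1 | -3 | 0 | 0 | -3 | -3.0 | -0.2 | 1 |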
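To see what the `decompose()` call in the parsing cell does, it may help to run it on a tiny table. The HTML below is hand-written for illustration and is not real site data; the class names `over_header` and `thead` are taken from the notebook's own code, and the player names and numbers are made up.

```python
from bs4 import BeautifulSoup

# Tiny hand-written table, not real site data.
html = """
<table id="rushing">
  <thead>
    <tr class="over_header"><th></th><th colspan="2">Rushing</th></tr>
    <tr><th>Player</th><th>Att</th><th>Yds</th></tr>
  </thead>
  <tbody>
    <tr><td>Back A</td><td>10</td><td>40</td></tr>
    <tr class="thead"><th>Player</th><th>Att</th><th>Yds</th></tr>
    <tr><td>Back B</td><td>5</td><td>25</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Drop the grouping row so only one header row remains.
over_header = soup.find("tr", class_="over_header")
if over_header is not None:
    over_header.decompose()

table = soup.find(id="rushing")

# The first remaining row now holds the real column names.
header_cells = [th.get_text() for th in table.find("tr").find_all("th")]

# Body rows still include the repeated header row with class "thead".
body_rows = [
    [cell.get_text() for cell in tr.find_all(["td", "th"])]
    for tr in table.find("tbody").find_all("tr")
]
```

In this sketch `header_cells` ends up with the three column names, while `body_rows` still contains a row of header labels between the two players. Rows like that account for the gap between the index and `Rk` values in the output above.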


In [8]:
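from io import StringIO

dfs = []
for year in years:
    with open("RB/{}.html".format(year), encoding="utf8") as f:
        page = f.read()
    soup = BeautifulSoup(page, "html.parser")
    # Drop the grouping header row above the column names
    soup.find('tr', class_='over_header').decompose()
    # Drop every header row repeated inside the table body, not just the first
    for row in soup.find_all('tr', class_='thead'):
        row.decompose()
    rb_table = soup.find(id="rushing")
    rb = pd.read_html(StringIO(str(rb_table)))[0]
    # Tag each row with its season so the years can be told apart after concat
    rb["Year"] = year

    dfs.append(rb)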

In [9]:
rbs = pd.concat(dfs)

Remove any repeated header rows left over from scraping (rows where the Player column contains the text "Player")

In [10]:
rbs = rbs[~rbs['Player'].str.contains('Player')]
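The filter above can be tried on a small hand-made frame. The sketch below uses invented rows and makes two assumptions that go beyond the notebook: it matches the header label exactly instead of using `str.contains`, and it converts the affected columns to numbers afterwards, since a column that held header text is stored as strings.

```python
import pandas as pd

# Hand-written rows standing in for the scraped table (not real data).
raw = pd.DataFrame(
    {
        "Player": ["Back A", "Player", "Back B"],
        "Att": ["10", "Att", "5"],
        "Yds": ["40", "Yds", "25"],
    }
)

# Keep only rows whose Player cell is not exactly the header label.
clean = raw[raw["Player"] != "Player"].copy()

# Columns that held header text are stored as strings; convert them to numbers.
for col in ["Att", "Yds"]:
    clean[col] = pd.to_numeric(clean[col])
```

An exact match is slightly stricter than `str.contains('Player')`, which would also drop any name that happens to include that word. Either approach removes the header rows.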

Export to CSV for further cleaning and analysis

In [11]:
rbs.to_csv("rbs.csv")
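One thing to watch with this export: `pd.concat` keeps each yearly frame's own index, and `to_csv` writes the index as an extra unnamed first column by default. A possible variation, sketched here on made-up frames, resets the index during concatenation and leaves it out of the file. The frame contents and variable names are illustrative assumptions.

```python
import io

import pandas as pd

# Two small made-up frames standing in for the yearly tables.
frames = [
    pd.DataFrame({"Player": ["Back A", "Back B"], "Year": 2015}),
    pd.DataFrame({"Player": ["Back C"], "Year": 2016}),
]

# ignore_index=True renumbers the rows 0..n-1 across all years.
combined = pd.concat(frames, ignore_index=True)

# Write to an in-memory buffer here; a file path works the same way.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)  # index=False omits the index column
csv_text = buffer.getvalue()

# Reading the text back gives only the original columns.
restored = pd.read_csv(io.StringIO(csv_text))
```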