# NFL Data Scraping Project
This notebook describes the web scraping process for NFL team statistics.

To start, we need to decide what information we need and where to find it. For this project I want to analyze how a team's regular season statistics relate to its performance during the regular season and in the playoffs.
- For the regular season statistics, I will use the information provided by the NFL here: **https://www.nfl.com/stats/team-stats/**
- For each team's win and loss information, I will use this link: **https://www.teamrankings.com/nfl/trends/win_trends/**

Let's start by gathering the win-loss data.

### Packages
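For this project I will use Selenium and Beautiful Soup for web scraping. The first cell below only needs requests to fetch the page, Beautiful Soup to parse it, and pandas to hold the result.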

In [33]:
from bs4 import BeautifulSoup
from requests_html import HTMLSession
from urllib.parse import urljoin
import requests as r
import pandas as pd

url = "https://www.teamrankings.com/nfl/trends/win_trends/?sc=is_regular_season"
response = r.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')  # the table holding each team's record
rows = table.find_all('tr')  # one <tr> per table row, including the header

# Now that we have the data as a list of rows, we can parse it to construct a data frame
data = []
for row in rows:
    cells = row.find_all(['td', 'th'])  # 'td' for regular cells, 'th' for header cells
    row_data = [cell.text.strip() for cell in cells]  # extract the text of each cell
    data.append(row_data)
columns = data[0]  # the first row is the header
df = pd.DataFrame(data[1:], columns=columns)
print(df)


             Team Win-Loss Record  Win %    MOV ATS +/-
0    Philadelphia           8-1-0  88.9%    6.3    +1.2
1     Kansas City           7-2-0  77.8%    7.2    +1.1
2         Detroit           7-2-0  77.8%    4.2    +1.5
3       Baltimore           7-3-0  70.0%   11.3    +6.3
4         Seattle           6-3-0  66.7%   -0.1    -2.0
5           Miami           6-3-0  66.7%    6.7    +3.2
6   San Francisco           6-3-0  66.7%   12.1    +5.4
7       Cleveland           6-3-0  66.7%    4.9    +4.7
8          Dallas           6-3-0  66.7%   11.6    +6.3
9      Pittsburgh           6-3-0  66.7%   -2.9    -2.1
10   Jacksonville           6-3-0  66.7%    0.7    -0.1
11      Minnesota           6-4-0  60.0%    2.4    +3.6
12     Cincinnati           5-4-0  55.6%   -1.1    -3.0
13        Houston           5-4-0  55.6%    2.8    +5.3
14        Buffalo           5-5-0  50.0%    7.8    +1.7
15   Indianapolis           5-5-0  50.0%   -0.6    +1.7
16      Las Vegas           5-5-0  50.0%   -3.3 
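
Every value in the printed table is still a string, so the columns need converting before any analysis. The following is a minimal sketch of how that could be done, not part of the original notebook. The helper names `parse_record`, `parse_percent`, and `parse_signed` are hypothetical, and the sketch assumes the record column is always in wins-losses-ties order.

```python
def parse_record(record):
    """Split a record string such as '8-1-0' into integers.

    Assumption: the three fields are wins, losses, and ties, in that order.
    """
    wins, losses, ties = (int(part) for part in record.split("-"))
    return wins, losses, ties


def parse_percent(text):
    """Convert a string such as '88.9%' into the float 88.9."""
    return float(text.rstrip("%"))


def parse_signed(text):
    """Convert a signed string such as '+1.2' or '-2.0' into a float."""
    return float(text)


# Sample row copied from the printed output above
row = ["Philadelphia", "8-1-0", "88.9%", "6.3", "+1.2"]
team, record, win_pct, mov, ats = row
print(team, parse_record(record), parse_percent(win_pct),
      parse_signed(mov), parse_signed(ats))
```

In the notebook, helpers like these could be applied one column at a time, for example with `df["Win %"].map(parse_percent)`.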

We have now parsed the data for one year. The next challenge is to change the filters to get postseason data, and then to change the year filter to get data for each year. Inspecting the HTML, we can see that there is a 'div class=filter' element which holds the different filters that we can change. When the year filter is changed, the URL changes. For example: *https://www.teamrankings.com/nfl/trends/win_trends/?sc=is_regular_season&range=yearly_2022*. Using the final parameter "&range=yearly_2022", we can adjust the URL to select the year to scrape.


In [35]:
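year = 2022
# Build the yearly URL in a new variable: appending to `url` in place
# would add the range parameter again each time this cell is re-run
yearly_url = f"{url}&range=yearly_{year}"
print(yearly_url)

https://www.teamrankings.com/nfl/trends/win_trends/?sc=is_regular_season&range=yearly_2022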
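
To get data for each year, the same URL pattern can be generated for a range of years. Here is a rough sketch, assuming the `sc` and `range` query parameters behave as described above. The function name `build_win_trends_url` and the 2020 to 2022 range are illustrative choices, not something taken from the site. The value of `sc` for postseason data is not shown here and would need to be read from the site's filter.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.teamrankings.com/nfl/trends/win_trends/"


def build_win_trends_url(year, season_filter="is_regular_season"):
    """Return the win-trends URL for one year (hypothetical helper).

    `season_filter` is the value of the `sc` parameter; only
    "is_regular_season" appears in this notebook so far.
    """
    query = urlencode({"sc": season_filter, "range": f"yearly_{year}"})
    return f"{BASE_URL}?{query}"


# Illustrative year range; adjust to the seasons you want to scrape
urls = {year: build_win_trends_url(year) for year in range(2020, 2023)}
for year, link in urls.items():
    print(year, link)
```

Building each URL from a fixed base, rather than appending to an existing string, means the cell gives the same result however many times it is run.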
