# NFL Betting Odds

**Author:** Jakob Malasig
**Last modified:** 12-17-2024 (created: 11-17-2024)

**Description:** I will be scraping NFL game results for every season from 2018 to the current one from FixtureDownload (fixturedownload.com), then exporting them as a CSV file for further analysis.

<br>

Import libraries:

In [5]:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
import time
import random

Helper function that scrolls through the page in randomized steps to mimic a human reader:

In [7]:
# function to scroll from the top to the bottom of the web page
def random_scroll(browser, total_wait_time = 5):
    # get the total height of the page
    total_height = browser.execute_script("return document.body.scrollHeight")
    
    # number of steps to scroll (you can adjust this number)
    scroll_steps = random.randint(3, 10) # randomize how many scroll steps we will use
    
    # calculate the height to scroll on each step
    scroll_increment = total_height // scroll_steps

    # calculate the total time available for scrolling each step
    time_per_step = total_wait_time / scroll_steps
    
    # random scrolling across time
    for step in range(scroll_steps):
        # scroll by the increment (dividing total height by number of steps)
        browser.execute_script(f"window.scrollBy(0, {scroll_increment});")
        
        # random wait time between scrolls to simulate varying speed
        time.sleep(random.uniform(0.5 * time_per_step, 1.5 * time_per_step))  # randomize the wait within a range
        
    # final scroll to make sure you are at the very bottom (in case it didn't exactly match)
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
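Because `random_scroll` needs a live browser, its timing logic is hard to check on its own. Below is a minimal sketch that pulls the same arithmetic into a hypothetical, browser-free helper `plan_scroll` (not part of the original notebook). It returns the planned `(pixels, seconds)` pairs instead of executing them.

```python
import random


def plan_scroll(total_height, total_wait_time=5, rng=None):
    """Return the (pixels, seconds) pairs random_scroll would use.

    Hypothetical helper: same arithmetic as random_scroll, but it only
    plans the scroll instead of driving a browser.
    """
    if rng is None:
        rng = random.Random()

    # 3 to 10 steps, each scrolling an equal share of the page height
    scroll_steps = rng.randint(3, 10)
    scroll_increment = total_height // scroll_steps

    # each pause is drawn between 0.5x and 1.5x the per-step time budget
    time_per_step = total_wait_time / scroll_steps
    return [
        (scroll_increment, rng.uniform(0.5 * time_per_step, 1.5 * time_per_step))
        for _ in range(scroll_steps)
    ]


# seeded generator so the plan is reproducible
plan = plan_scroll(6000, total_wait_time=10, rng=random.Random(0))
for pixels, seconds in plan:
    print(f"scroll {pixels}px, then wait {seconds:.2f}s")
```

Since each pause is randomized around the per-step budget, the total wait lands between half and one and a half times `total_wait_time` rather than exactly on it. The floor division can also leave a few pixels unscrolled, which is why `random_scroll` finishes with a `scrollTo` to the bottom.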

<br>

Setting up for scraping:

In [10]:
# setup chrome
chrome_options = Options()
service = Service(ChromeDriverManager().install())

# initialize webdriver with the downloaded driver and the chrome options
browser = webdriver.Chrome(service=service, options=chrome_options)

<br>

Scraping data:

In [13]:
# create empty lists to store the extracted data
dates = []
locations = []
home_teams = []
away_teams = []
results = []

# loop through seasons 2018-2024
# URL pattern: https://fixturedownload.com/results/nfl-2024
for i in range(18, 25):
    url = f"https://fixturedownload.com/results/nfl-20{i}"

    print(url)

    # navigate to the web page using the URL
    browser.get(url)
    browser.maximize_window()

    # scroll through the page over a random 2-20 second window before scraping
    total_wait_time = random.uniform(2, 20)
    random_scroll(browser, total_wait_time)
    
    # find the table with a specific style
    table = browser.find_element(By.XPATH, '//table[@style="width:100%;"]')

    # find all rows in the table
    rows = table.find_elements(By.TAG_NAME, "tr")

    # loop through each row to scrape the game data
    for row in rows:
        # Extract the columns (cells) in the row
        columns = row.find_elements(By.TAG_NAME, "td")
        
        if len(columns) > 0:  # Ensure the row has data
            # Extract the date, location, teams, and result
            date = columns[1].text  # Date is in the second column
            location = columns[2].text  # Location is in the third column
            home_team = columns[3].text  # Home team is in the fourth column
            away_team = columns[4].text  # Away team is in the fifth column
            result = columns[5].text  # Result is in the sixth column

            # Append to the lists
            dates.append(date)
            locations.append(location)
            home_teams.append(home_team)
            away_teams.append(away_team)
            results.append(result)
        
    # progress message to confirm the bot has finished this season
    print(f"20{i} season done")

# end the WebDriver session (closes the browser and stops chromedriver)
browser.quit()

https://fixturedownload.com/results/nfl-2018
2018 season done
https://fixturedownload.com/results/nfl-2019
2019 season done
https://fixturedownload.com/results/nfl-2020
2020 season done
https://fixturedownload.com/results/nfl-2021
2021 season done
https://fixturedownload.com/results/nfl-2022
2022 season done
https://fixturedownload.com/results/nfl-2023
2023 season done
https://fixturedownload.com/results/nfl-2024
2024 season done
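The five parallel lists work, but the row-parsing logic can only be exercised with a live browser. A minimal sketch of an alternative, assuming the same column positions as the loop above: a hypothetical `parse_game_rows` helper that takes the text of each row's `<td>` cells (for example `[td.text for td in row.find_elements(By.TAG_NAME, "td")]`) and builds one dictionary per game. `pd.DataFrame` accepts a list of dictionaries directly, so `pd.DataFrame(parse_game_rows(...))` could stand in for the five lists.

```python
def parse_game_rows(rows):
    """Build one record per game from rows of cell text.

    `rows` is a list of lists of strings, one inner list per table row.
    Hypothetical helper; column positions follow the scraping loop above,
    which also ignores the first cell (index 0).
    """
    games = []
    for cells in rows:
        # rows with no <td> cells (such as a header row) carry no game data
        if not cells:
            continue
        games.append({
            "date": cells[1],
            "location": cells[2],
            "home_team": cells[3],
            "away_team": cells[4],
            "result": cells[5],
        })
    return games


# one empty (header-like) row and one game row; "1" is a placeholder
# for the first cell, which the parser ignores
rows = [
    [],
    ["1", "06/09/2018 20:20", "Lincoln Financial Field",
     "Philadelphia Eagles", "Atlanta Falcons", "18 - 12"],
]
games = parse_game_rows(rows)
print(games[0]["home_team"])  # → Philadelphia Eagles
```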


In [14]:
print(len(dates))
print(len(locations))
print(len(home_teams))
print(len(away_teams))
print(len(results))

1919
1919
1919
1919
1919
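Printing the five lengths works as an eyeball check. A minimal sketch of the same check as code, using a hypothetical `check_equal_lengths` helper that raises a `ValueError` listing each list's length when they disagree, so the notebook stops before the DataFrame is built. (pandas would also refuse to build a DataFrame from lists of unequal length, but this message names the offending list.)

```python
def check_equal_lengths(**columns):
    """Return the shared length of the lists, or raise ValueError if they differ.

    Hypothetical helper for validating the scraped lists.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"scraped lists differ in length: {lengths}")
    # no lists passed at all counts as length 0
    return next(iter(lengths.values()), 0)


print(check_equal_lengths(dates=["a", "b"], results=["18 - 12", "21 - 21"]))  # → 2
```

In the notebook it would be called as `check_equal_lengths(dates=dates, locations=locations, home_teams=home_teams, away_teams=away_teams, results=results)`.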


<br>

Converting to dataframe:

In [17]:
# create a dataframe from data
nfl_data = pd.DataFrame({
    "date": dates,
    "location": locations,
    "home_team": home_teams,
    "away_team": away_teams,
    "result": results,
})

# display the data
display(nfl_data)

| | date | location | home_team | away_team | result |
|---|---|---|---|---|---|
| 0 | 06/09/2018 20:20 | Lincoln Financial Field | Philadelphia Eagles | Atlanta Falcons | 18 - 12 |
| 1 | 09/09/2018 13:00 | FirstEnergy Stadium | Cleveland Browns | Pittsburgh Steelers | 21 - 21 |
| 2 | 09/09/2018 13:00 | Lucas Oil Stadium | Indianapolis Colts | Cincinnati Bengals | 23 - 34 |
| 3 | 09/09/2018 13:00 | Hard Rock Stadium | Miami Dolphins | Tennessee Titans | 27 - 20 |
| 4 | 09/09/2018 13:00 | U.S. Bank Stadium | Minnesota Vikings | San Francisco 49ers | 24 - 16 |
| ... | ... | ... | ... | ... | ... |
| 1914 | 04/01/2025 00:00 | MetLife Stadium | New York Jets | Miami Dolphins | - |
| 1915 | 04/01/2025 00:00 | Lincoln Financial Field | Philadelphia Eagles | New York Giants | - |
| 1916 | 04/01/2025 00:00 | Acrisure Stadium | Pittsburgh Steelers | Cincinnati Bengals | - |
| 1917 | 04/01/2025 00:00 | Raymond James Stadium | Tampa Bay Buccaneers | New Orleans Saints | - |


<br>

Cleaning data:

In [20]:
# remove records that have no results yet (game not played yet)
# .copy() avoids pandas' SettingWithCopyWarning when the date column is modified below
nfl_data = nfl_data[nfl_data["result"] != "-"].copy()
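Games that have not been played yet show `-` in the `result` column, which is what the filter above removes. For later analysis, the remaining strings such as `18 - 12` will likely need to be split into numbers. A minimal sketch with a hypothetical `parse_result` helper, assuming the home team's score is listed first to match the `home_team`/`away_team` column order:

```python
def parse_result(result):
    """Split a result like "18 - 12" into (home_score, away_score).

    Hypothetical helper. Returns None for "-", which marks an unplayed game.
    Assumes the home score comes first.
    """
    if result.strip() == "-":
        return None
    # int() tolerates the spaces around each number
    home, away = (int(part) for part in result.split("-"))
    return home, away


print(parse_result("18 - 12"))  # → (18, 12)
print(parse_result("-"))        # → None
```

With pandas this could be applied column-wide as `nfl_data["result"].apply(parse_result)`.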

In [21]:
# remove the time from the date column (source is day/month/year, output is month/day/year)
nfl_data["date"] = pd.to_datetime(nfl_data["date"], format = "%d/%m/%Y %H:%M")
nfl_data["date"] = nfl_data["date"].dt.strftime("%m/%d/%Y")
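The source site writes dates day-first (`06/09/2018 20:20` is 6 September 2018), which is why the parsing format starts with `%d`. The same conversion on the first scraped value, sketched with the standard library's `datetime` instead of pandas, using a hypothetical `to_month_day_year` helper:

```python
from datetime import datetime


def to_month_day_year(raw):
    """Convert a day-first 'dd/mm/YYYY HH:MM' string to 'mm/dd/YYYY'.

    Hypothetical helper mirroring the pandas format strings above.
    """
    parsed = datetime.strptime(raw, "%d/%m/%Y %H:%M")
    return parsed.strftime("%m/%d/%Y")


print(to_month_day_year("06/09/2018 20:20"))  # → 09/06/2018
```

Note that `strftime` turns the column back into plain strings. If a real datetime column is preferred for sorting and filtering, `nfl_data["date"].dt.normalize()` drops the time of day while keeping the datetime type.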

In [22]:
display(nfl_data)

| | date | location | home_team | away_team | result |
|---|---|---|---|---|---|
| 0 | 09/06/2018 | Lincoln Financial Field | Philadelphia Eagles | Atlanta Falcons | 18 - 12 |
| 1 | 09/09/2018 | FirstEnergy Stadium | Cleveland Browns | Pittsburgh Steelers | 21 - 21 |
| 2 | 09/09/2018 | Lucas Oil Stadium | Indianapolis Colts | Cincinnati Bengals | 23 - 34 |
| 3 | 09/09/2018 | Hard Rock Stadium | Miami Dolphins | Tennessee Titans | 27 - 20 |
| 4 | 09/09/2018 | U.S. Bank Stadium | Minnesota Vikings | San Francisco 49ers | 24 - 16 |
| ... | ... | ... | ... | ... | ... |
| 1866 | 12/15/2024 | SoFi Stadium | Los Angeles Chargers | Tampa Bay Buccaneers | 17 - 40 |
| 1867 | 12/15/2024 | Lincoln Financial Field | Philadelphia Eagles | Pittsburgh Steelers | 27 - 13 |
| 1868 | 12/16/2024 | Lumen Field | Seattle Seahawks | Green Bay Packers | 13 - 30 |
| 1869 | 12/17/2024 | U.S. Bank Stadium | Minnesota Vikings | Chicago Bears | 30 - 12 |


<br>

Exporting the cleaned data to a new CSV file:

In [25]:
nfl_data.to_csv("../../data/clean/clean_fixture_download.csv", encoding = "UTF-8", index = False)
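A quick way to sanity-check the export, sketched on a one-row sample written to a temporary directory rather than the real `../../data/clean/` path: write with the same `to_csv` arguments, read the file back, and compare. Passing `index=False` is what keeps an extra `Unnamed: 0` column from appearing when the file is read back.

```python
import os
import tempfile

import pandas as pd

# one-row sample in the same shape as the cleaned data
sample = pd.DataFrame({
    "date": ["09/06/2018"],
    "location": ["Lincoln Financial Field"],
    "home_team": ["Philadelphia Eagles"],
    "away_team": ["Atlanta Falcons"],
    "result": ["18 - 12"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clean_fixture_download.csv")
    sample.to_csv(path, encoding="UTF-8", index=False)
    # dtype=str keeps every value as text, as it was written
    reloaded = pd.read_csv(path, dtype=str)

print(list(reloaded.columns))  # → ['date', 'location', 'home_team', 'away_team', 'result']
```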