## PGA DraftKings Notebook
Use the [PGA Tour website](https://www.pgatour.com/tournaments/schedule.html) to look up tournament info and fill out the first USER INPUT block below. (In the browser dev tools: Network tab, Fetch/XHR filter, Payload sub-tab.)

Looks like the tournament ID is also in the address bar.
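Because the ID sits in the URL, you can also pull it out programmatically. This sketch assumes the URL contains the ID in the same form as `R2025480` above: `R`, then a four-digit year, then a three-digit tournament number. The helper `parse_tournament_id` and the example URL path are hypothetical. Neither is part of `utils.db_utils` or a documented PGA Tour URL scheme.

```python
import re


def parse_tournament_id(url: str) -> dict:
    """Pull a tournament ID such as 'R2025480' out of a page URL.

    Assumes the ID is 'R' + 4-digit year + 3-digit tournament number, which is
    the shape of the IDs used in this notebook (hypothetical helper).
    """
    match = re.search(r"\b(R(\d{4})(\d{3}))\b", url)
    if match is None:
        raise ValueError(f"No tournament ID found in {url!r}")
    full_id, year, number = match.groups()
    # Keep the tournament number as a string so leading zeros (e.g. '033') survive
    return {"id": full_id, "year": int(year), "number": number}


# Hypothetical URL shape, for illustration only
info = parse_tournament_id("https://www.pgatour.com/tournaments/2025/truist-championship/R2025480")
print(info)  # {'id': 'R2025480', 'year': 2025, 'number': '480'}
```

The three-digit `number` appears to correspond to the `TOURN_ID` values in the historical-seasons output further down (e.g. `480`, `033`). That is why it is kept as a string.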

### User Input

In [3]:
# === USER INPUTS ===
# Old Tournament
old_tournament_name = "Truist Championship"
tournament_date = "5/11/2025"
old_course = "The Philadelphia Cricket Club"
tournament_id = "R2025480"

# New Tournament
new_tournament_name = "PGA Championship"
new_course = "Quail Hollow Club"


# === LIBRARIES AND VARIABLES ===
# Import necessary libraries
import requests
import pandas as pd
from datetime import datetime
import sqlite3 as sql
import numpy as np
from numpy import nan
import os
import importlib
import utils.db_utils
from utils.db_utils import TOURNAMENT_NAME_MAP, PLAYER_NAME_MAP

# ===============================

tournament_config = {
    "old": {
        "name": old_tournament_name,
        "date": tournament_date,
        "course": old_course,
        "id": tournament_id
    },
    "new": {
        "name": new_tournament_name,
        "course": new_course,
        "quoted_course": f'"{new_course}"',
        "quoted_name": f"'{new_tournament_name}'"
    }
}

### Update Database

#### Old Tournament

In [7]:
importlib.reload(utils.db_utils)  # Only needed if you're actively editing db_utils.py
from utils.db_utils import update_tournament_results

# Change these each year!!
season = 2025
year = 20250  # Unique GraphQL year distinguishing number in case of multiple per year

# Run the update
db_path = "data/golf.db"  # Or use os.path.join("data", "golf.db")
tournDf = update_tournament_results(tournament_config, db_path, season, year)

# Show just the most recent tournament added for confirmation
from sqlalchemy import create_engine

engine = create_engine(f"sqlite:///{db_path}")

query = f"""
SELECT *
FROM tournaments
WHERE TOURN_ID = '{tournament_config['old']['id']}'
  AND ENDING_DATE = '{datetime.strptime(tournament_config['old']['date'], '%m/%d/%Y').date()}'
"""

recent = pd.read_sql(query, engine)
engine.dispose()
recent.head()

📦 Fetching results for tournament ID R2025480 (Truist Championship), year: 20250
ℹ️ Tournament 'Truist Championship' already exists — no new data inserted.


|  | SEASON | ENDING_DATE | TOURN_ID | TOURNAMENT | COURSE | PLAYER | POS | FINAL_POS | ROUNDS:1 | ROUNDS:2 | ROUNDS:3 | ROUNDS:4 | OFFICIAL_MONEY | FEDEX_CUP_POINTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025 | 2025-05-11 | R2025480 | Truist Championship | The Philadelphia Cricket Club | Aaron Rai | T23 | 23 | -5 | +2 | +1 | -4 | $167,142.86 | 40.0 |
| 1 | 2025 | 2025-05-11 | R2025480 | Truist Championship | The Philadelphia Cricket Club | Adam Hadwin | T60 | 60 | 1 | -1 | +2 | -1 | $42,500.00 | 8.0 |
| 2 | 2025 | 2025-05-11 | R2025480 | Truist Championship | The Philadelphia Cricket Club | Adam Scott | T34 | 34 | -2 | E | E | -2 | $95,062.50 | 22.656 |
| 3 | 2025 | 2025-05-11 | R2025480 | Truist Championship | The Philadelphia Cricket Club | Akshay Bhatia | T46 | 46 | -7 | E | +1 | 4 | $53,600.00 | 14.3 |
| 4 | 2025 | 2025-05-11 | R2025480 | Truist Championship | The Philadelphia Cricket Club | Alex Noren | T51 | 51 | -3 | -2 | +2 | 2 | $47,000.00 | 12.0 |


#### Stats

In [10]:
importlib.reload(utils.db_utils)
from utils.db_utils import update_season_stats  # re-import after reload; names bound by an earlier `from ... import` still point to the old function

# Change these each year!!
statsYear = 2025

stats_df = update_season_stats(statsYear, db_path)
stats_df.head()

ℹ️ Stats for season 2025 already exist — no new rows added.


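|  | PLAYER | SGTTG_RANK | SG:TTG | SGOTT_RANK | SG:OTT | SGAPR_RANK | SG:APR | SGATG_RANK | SG:ATG | SGP_RANK | ... | SCRAMBLING_RANK | SCRAMBLING | OWGR_RANK | OWGR | SEASON | SGTTG | SGOTT | SGAPR | SGATG | SGP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|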


#### Odds
Not usually needed as part of the weekly routine.

**Manual Fix! Odds name cleanup (only needed when joins fail)**

If new names need to be added, update the `PLAYER_NAME_MAP` and `TOURNAMENT_NAME_MAP` dictionaries in db_utils.py first.
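The cleanup amounts to an exact-match lookup from variant spellings to canonical names, keeping the originals for auditing. The empty output below has `TOURNAMENT_ORIG` and `PLAYER_ORIG` columns, which suggests `clean_odds_names` does something similar. This is a hypothetical sketch: `apply_name_maps` and the sample map entry are illustrative, not the contents of `db_utils.py`.

```python
import pandas as pd


def apply_name_maps(odds: pd.DataFrame, tournament_map: dict, player_map: dict) -> pd.DataFrame:
    """Return the rows of `odds` that had a mapping entry, with canonical names and originals kept."""
    out = odds.copy()
    out["TOURNAMENT_ORIG"] = out["TOURNAMENT"]
    out["PLAYER_ORIG"] = out["PLAYER"]
    # Exact-match lookup; names that are not in a map pass through unchanged
    out["TOURNAMENT"] = out["TOURNAMENT_ORIG"].map(lambda name: tournament_map.get(name, name))
    out["PLAYER"] = out["PLAYER_ORIG"].map(lambda name: player_map.get(name, name))
    # Return only rows where at least one name had a mapping entry
    changed = out["TOURNAMENT_ORIG"].isin(list(tournament_map)) | out["PLAYER_ORIG"].isin(list(player_map))
    return out[changed]


# Illustrative data and map entry (the abbreviated name appears in the scraped odds below)
odds = pd.DataFrame({
    "SEASON": [2021, 2021],
    "TOURNAMENT": ["Sentry Tourn of Champions", "Sony Open in Hawaii"],
    "PLAYER": ["Phil Mickelson", "Si Woo Kim"],
})
cleaned = apply_name_maps(odds, {"Sentry Tourn of Champions": "Sentry Tournament of Champions"}, {})
print(cleaned[["TOURNAMENT", "TOURNAMENT_ORIG"]])
```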

In [6]:
importlib.reload(utils.db_utils)
from utils.db_utils import clean_odds_names, PLAYER_NAME_MAP, TOURNAMENT_NAME_MAP

db_path = "data/golf.db" 
updated_odds = clean_odds_names(db_path, TOURNAMENT_NAME_MAP, PLAYER_NAME_MAP)
updated_odds.head()

ℹ️ No odds rows required name cleanup.


|  | SEASON | TOURNAMENT | PLAYER | ODDS | VEGAS_ODDS | TOURNAMENT_ORIG | PLAYER_ORIG |
|---|---|---|---|---|---|---|---|


**Historical Odds Updates**

Only run this when loading a full season of odds at the start of each year, or when corrections need to be made. It loads the entire season into the database and cleans names using the dictionaries in db_utils.py.
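Loading a whole season means many incoming rows may already be in the table. A common way to insert only the new ones is an anti-join on the primary-key columns. This sketch is not necessarily what `import_historical_odds` does. It assumes the key is `SEASON`, `TOURNAMENT`, `ENDING_DATE`, `PLAYER`, the combination checked for duplicates later in this notebook. `new_rows_only` is a hypothetical helper.

```python
import pandas as pd

KEY_COLS = ["SEASON", "TOURNAMENT", "ENDING_DATE", "PLAYER"]  # assumed primary key


def new_rows_only(incoming: pd.DataFrame, existing: pd.DataFrame, key_cols=KEY_COLS) -> pd.DataFrame:
    """Return rows of `incoming` whose key is not already present in `existing`."""
    merged = incoming.merge(
        existing[key_cols].drop_duplicates(),  # dedupe so the merge cannot multiply rows
        on=key_cols,
        how="left",
        indicator=True,
    )
    # "left_only" marks keys found in `incoming` but not in `existing`
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


existing = pd.DataFrame({
    "SEASON": [2021], "TOURNAMENT": ["Safeway Open"],
    "ENDING_DATE": ["2020-09-13"], "PLAYER": ["Phil Mickelson"],
})
incoming = pd.DataFrame({
    "SEASON": [2021, 2021], "TOURNAMENT": ["Safeway Open", "Safeway Open"],
    "ENDING_DATE": ["2020-09-13", "2020-09-13"], "PLAYER": ["Phil Mickelson", "Si Woo Kim"],
    "ODDS": ["20/1", "20/1"],
})
to_insert = new_rows_only(incoming, existing)
print(to_insert["PLAYER"].tolist())  # ['Si Woo Kim']
```

Both frames need the same type for `ENDING_DATE` (all strings or all `datetime.date`). Otherwise equal dates will not match and every row will look new.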

In [53]:
importlib.reload(utils.db_utils)
from utils.db_utils import import_historical_odds

oddsYear = "2020-2021"    # URL segment
season = 2021             # PGA Tour season
db_path = "data/golf.db"

odds_df = import_historical_odds(oddsYear, season, db_path)
odds_df.head()

✅ Inserting 8841 new rows into odds table...


IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: odds.ENDING_DATE
[SQL: INSERT INTO odds ("SEASON", "TOURNAMENT", "ENDING_DATE", "PLAYER", "ODDS", "VEGAS_ODDS") VALUES (?, ?, ?, ?, ?, ?)]
[parameters: [(2021, 'Safeway Open', '2020-09-13', 'Phil Mickelson', '20/1', 20.0), (2021, 'Safeway Open', '2020-09-13', 'Si Woo Kim', '20/1', 20.0), (2021, 'Safeway Open', '2020-09-13', 'Brendan Steele', '20/1', 20.0), (2021, 'Safeway Open', '2020-09-13', 'Shane Lowry', '25/1', 25.0), (2021, 'Safeway Open', '2020-09-13', 'Sergio Garcia', '30/1', 30.0), (2021, 'Safeway Open', '2020-09-13', 'Jordan Spieth', '30/1', 30.0), (2021, 'Safeway Open', '2020-09-13', 'Erik van Rooyen', '35/1', 35.0), (2021, 'Safeway Open', '2020-09-13', 'Emiliano Grillo', '30/1', 30.0)  ... displaying 10 of 8841 total bound parameter sets ...  (2021, 'Golf Nippon Series JT Cup', '2021-12-05', 'Mikiya Akutsu', '80/1', 80.0), (2021, 'Golf Nippon Series JT Cup', '2021-12-05', 'Tomoyasu Sugiyama', '80/1', 80.0)]]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
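The `IntegrityError` above means at least one row reached the insert with a NULL `ENDING_DATE`. The rows printed in the message all have dates only because SQLAlchemy displays a sample (10 of 8841 parameter sets). Checking for missing key values before calling `to_sql` surfaces the offending rows directly. This is a minimal sketch: `find_null_key_rows` is a hypothetical helper, and the list of NOT NULL columns is assumed rather than read from `utils.schema`.

```python
import pandas as pd


def find_null_key_rows(df: pd.DataFrame, required_cols) -> pd.DataFrame:
    """Return the rows that would violate a NOT NULL constraint on `required_cols`."""
    missing = df[list(required_cols)].isna().any(axis=1)
    return df[missing]


# Illustrative data: the second row is missing its ENDING_DATE
odds = pd.DataFrame({
    "SEASON": [2021, 2021],
    "TOURNAMENT": ["Safeway Open", "Safeway Open"],
    "ENDING_DATE": ["2020-09-13", None],
    "PLAYER": ["Phil Mickelson", "Si Woo Kim"],
})
bad = find_null_key_rows(odds, ["SEASON", "TOURNAMENT", "ENDING_DATE", "PLAYER"])
print(bad)  # only the row with the missing ENDING_DATE is shown
```

In this pipeline, a NULL `ENDING_DATE` plausibly comes from a header whose date string could not be parsed (the notebook's `parse_ending_date` returns `None` in that case). Printing the `TOURNAMENT` values of the flagged rows is a quick way to find which format needs handling.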

## Update Database Structure (Temporary)
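The cells in this section drop and recreate tables, and the validation cell compares the result against a manual copy, `data/golf - Copy.db`. One way to make that copy from the notebook is Python's built-in `sqlite3.Connection.backup`. In this sketch, `backup_sqlite` and its timestamped file name are assumptions. Point the validation cell at whatever file it produces.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_sqlite(db_path: str, backup_dir: str = "data") -> Path:
    """Copy a SQLite database with the online backup API and return the backup path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_path = Path(backup_dir) / f"{Path(db_path).stem}_backup_{stamp}.db"
    backup_path.parent.mkdir(parents=True, exist_ok=True)
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    try:
        # Copies every page of the source database into the destination file
        src.backup(dst)
    finally:
        # `with sqlite3.connect(...)` would not close these, so close explicitly
        dst.close()
        src.close()
    return backup_path


# Example: backup_sqlite("data/golf.db") writes something like data/golf_backup_<timestamp>.db
```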

In [2]:
import pandas as pd
import sqlite3
from sqlalchemy import create_engine
# importlib.reload(utils.schema)
from utils.schema import stats_table, metadata

db_path = "data/golf.db"
engine = create_engine(f"sqlite:///{db_path}")

# Step 1: Load existing stats table
with sqlite3.connect(db_path) as conn:
    old_df = pd.read_sql("SELECT * FROM stats", conn)

# Rename old colon-based and malformed rank columns to match new schema
stat_column_renames = {
    "SG:TTG": "SGTTG",
    "SG:OTT": "SGOTT",
    "SG:APR": "SGAPR",
    "SG:ATG": "SGATG",
    "SG:P": "SGP",
    "PAR 3": "PAR_3",
    "PAR 4": "PAR_4",
    "PAR 5": "PAR_5",
    "PAR3_RANK": "PAR_3_RANK",
    "PAR4_RANK": "PAR_4_RANK",
    "PAR5_RANK": "PAR_5_RANK"
}

old_df = old_df.rename(columns=stat_column_renames)

# Step 2: Add missing columns if needed
required_columns = [col.name for col in stats_table.columns]
for col in required_columns:
    if col not in old_df.columns:
        old_df[col] = None

# Step 3: Deduplicate and clean
deduped_df = old_df.drop_duplicates(subset=["SEASON", "PLAYER"]).copy()

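# Assign the column directly: in pandas 2.x, .loc[:, col] = ... writes in place and may keep the old dtype
deduped_df["SEASON"] = deduped_df["SEASON"].astype(int)

# Step 4: Overwrite with new schema
with engine.begin() as conn:
    metadata.drop_all(conn, tables=[stats_table])
    metadata.create_all(conn)
    deduped_df.to_sql("stats", conn, index=False, if_exists="append")

engine.dispose()  # release the SQLite file handle, as the other migration cells do
print("✅ Migration complete: 'stats' table updated.")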


✅ Migration complete: 'stats' table updated.


In [3]:
import pandas as pd
import sqlite3
from sqlalchemy import create_engine
from utils.schema import tournaments_table, metadata

# Set up database path and SQLAlchemy engine
db_path = "data/golf.db"
engine = create_engine(f"sqlite:///{db_path}")

# Step 1: Load original data from the old tournaments table
with sqlite3.connect(db_path) as conn:
    old_df = pd.read_sql("SELECT * FROM tournaments", conn)

# Step 2: Ensure required columns exist
required_columns = [col.name for col in tournaments_table.columns]
for col in required_columns:
    if col not in old_df.columns:
        old_df[col] = None

# Step 3: Normalize types first so equivalent values (e.g. differently formatted dates) compare equal when deduplicating
try:
    old_df["SEASON"] = old_df["SEASON"].astype(int)
    old_df["FINAL_POS"] = pd.to_numeric(old_df["FINAL_POS"], errors="coerce")
    old_df["ENDING_DATE"] = pd.to_datetime(old_df["ENDING_DATE"]).dt.date
except Exception as e:
    print("❌ Data transformation failed:", e)
    engine.dispose()
    raise

# Step 4: Deduplicate using the new composite primary key
deduped_df = old_df.drop_duplicates(subset=["SEASON", "TOURNAMENT", "PLAYER", "ENDING_DATE"]).copy()

# Step 5: Only drop/create if transformation succeeds
with engine.begin() as conn:
    metadata.drop_all(conn, tables=[tournaments_table])
    metadata.create_all(conn)
    deduped_df.to_sql("tournaments", conn, index=False, if_exists="append")

# Step 6: Dispose the engine to release file lock
engine.dispose()

print("✅ Migration complete: Data inserted into refactored 'tournaments' table.")


✅ Migration complete: Data inserted into refactored 'tournaments' table.


In [4]:
import pandas as pd
import sqlite3
from sqlalchemy import create_engine
from utils.schema import odds_table, metadata

db_path = "data/golf.db"
engine = create_engine(f"sqlite:///{db_path}")

# Step 1: Load original odds data
with sqlite3.connect(db_path) as conn:
    old_df = pd.read_sql("SELECT * FROM odds", conn)

# Step 2: Confirm all necessary columns exist
required_columns = [col.name for col in odds_table.columns]
for col in required_columns:
    if col not in old_df.columns:
        old_df[col] = None

# Step 3: Drop rows where any primary key columns are missing
clean_df = old_df.dropna(subset=["SEASON", "TOURNAMENT", "PLAYER", "ODDS"]).copy()

# Optional: log dropped rows
dropped = old_df.shape[0] - clean_df.shape[0]
print(f"⚠️ Dropped {dropped} rows with NULLs in SEASON, TOURNAMENT, PLAYER, or ODDS.")

# Step 4: Deduplicate using new composite PK
deduped_df = clean_df.drop_duplicates(subset=["SEASON", "TOURNAMENT", "PLAYER", "ODDS"]).copy()

# Step 5: Ensure correct data types
deduped_df["SEASON"] = deduped_df["SEASON"].astype(int)
deduped_df["VEGAS_ODDS"] = pd.to_numeric(deduped_df["VEGAS_ODDS"], errors="coerce")

# Step 6: Drop and rebuild the odds table
with engine.begin() as conn:
    metadata.drop_all(conn, tables=[odds_table])
    metadata.create_all(conn)
    deduped_df.to_sql("odds", conn, index=False, if_exists="append")

engine.dispose()
print("✅ Migration complete: 'odds' table now keyed on ODDS string value.")



⚠️ Dropped 0 rows with NULLs in SEASON, TOURNAMENT, PLAYER, or ODDS.
✅ Migration complete: 'odds' table now keyed on ODDS string value.


In [5]:
import sqlite3
import pandas as pd

with sqlite3.connect("data/golf.db") as conn:
    new_df = pd.read_sql("SELECT SEASON, TOURNAMENT, PLAYER, ENDING_DATE FROM tournaments", conn)

with sqlite3.connect("data/golf - Copy.db") as conn:
    old_df = pd.read_sql("SELECT SEASON, TOURNAMENT, PLAYER, ENDING_DATE FROM tournaments", conn)

# Do an anti-join to find what got dropped
missing = pd.merge(old_df, new_df, how="left", indicator=True,
                   on=["SEASON", "TOURNAMENT", "PLAYER", "ENDING_DATE"])
missing = missing[missing["_merge"] == "left_only"]
print(f"⚠️ Rows lost during migration: {len(missing)}")
print(missing.head())

⚠️ Rows lost during migration: 0
Empty DataFrame
Columns: [SEASON, TOURNAMENT, PLAYER, ENDING_DATE, _merge]
Index: []


Rebuilding the odds table with an ENDING_DATE column

In [57]:
import pandas as pd
import numpy as np
import requests
import re
from datetime import datetime
from io import StringIO

# === USER INPUT ===
oddsYear = "2020-2021"    # URL segment
season = 2021        # PGA Tour season

url = f"http://golfodds.com/archives-{oddsYear}.html"
response = requests.get(url)
tables = pd.read_html(StringIO(response.text))
# raw_df = tables[5]  # the actual table of interest
# Take the first 2-column table with more than 50 rows that contains a few odds-like strings (e.g. "20/1")
raw_df = None
for tbl in tables:
    if tbl.shape[1] == 2 and tbl.shape[0] > 50:  # Rough filter
        sample = tbl.iloc[:, 1].astype(str).str.contains(r"\d+/\d+").sum()
        if sample > 5:
            raw_df = tbl
            break

if raw_df is None:
    raise ValueError("❌ Could not find valid odds table on the page.")

# === STEP 1: Initial clean-up ===
df = raw_df.dropna(how="all").reset_index(drop=True)
df.columns = ["PLAYER", "ODDS"]

# 🔧 Clean up non-breaking spaces and extra whitespace
df["PLAYER"] = (
    df["PLAYER"]
    .astype(str)
    .str.replace("\xa0", " ", regex=False)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

df.insert(loc=0, column="SEASON", value=season)
df.insert(loc=1, column="TOURNAMENT", value=np.nan)
df.insert(loc=2, column="ENDING_DATE", value=np.nan)

# === STEP 2: Helper function for parsing date strings ===
def parse_ending_date(text):
    """Return the last day of a tournament date string as a datetime.date, or None if unparseable."""
    # Normalize en dashes and non-breaking spaces (re and datetime are imported at the top of this cell)
    text = text.replace("\u2013", "-").replace("\xa0", " ")
    text = re.sub(r"\bSept(?!ember)\b", "Sep", text)

    # ✅ Fix typo: "Match" → "March" only when used in a date context
    text = re.sub(r"\bMatch(?=\s+\d{1,2}\s*[-–]\s*\d{1,2},\s*\d{4})", "March", text)

    # Pattern 1: "July 30 - August 2, 2015" or "Oct 29 - Nov 1, 2015"
    match = re.search(r"(\w+)\s\d+\s*-\s*(\w+)\s(\d+),\s(\d{4})", text)
    if match:
        month2, day2, year = match.group(2), match.group(3), match.group(4)
        for fmt in ["%B %d, %Y", "%b %d, %Y"]:
            try:
                return datetime.strptime(f"{month2} {day2}, {year}", fmt).date()
            except ValueError:
                continue

    # Pattern 2: "November 21-24, 2024" (spaces around the dash are tolerated, as in the header regex)
    match = re.search(r"(\w+)\s\d+\s*-\s*(\d+),\s(\d{4})", text)
    if match:
        month, day, year = match.group(1), match.group(2), match.group(3)
        for fmt in ["%B %d, %Y", "%b %d, %Y"]:
            try:
                return datetime.strptime(f"{month} {day}, {year}", fmt).date()
            except ValueError:
                continue

    # Pattern 3: "Sunday, October 20, 2019"
    try:
        return datetime.strptime(text.strip(), "%A, %B %d, %Y").date()
    except ValueError:
        pass

    # Pattern 4: "October 20, 2019"
    try:
        return datetime.strptime(text.strip(), "%B %d, %Y").date()
    except ValueError:
        pass

    return None

# === STEP 3: Iterate block by block ===
final_rows = []
i = 0
last_tourn_name = None
last_end_date = None

while i < len(df) - 4:
    player_i = str(df.loc[i, "PLAYER"])
    player_i2 = str(df.loc[i + 2, "PLAYER"])
    player_i3 = str(df.loc[i + 3, "PLAYER"]).lower()

    # Detect start of a new tournament block
    is_header = (
        pd.isna(df.loc[i, "ODDS"]) and
        pd.isna(df.loc[i + 1, "ODDS"]) and (
            re.search(r"\w+\s\d+\s*[-–]\s*(\w+\s)?\d+,\s\d{4}", player_i2) or
            re.search(r"(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),?\s+\w+\s\d{1,2},\s\d{4}", player_i2)
        )
    )

    if is_header:
        tourn_name = player_i.strip()
        end_date = parse_ending_date(player_i2)

        # Skip cancelled or empty blocks
        if "cancelled" in player_i3:
            print(f"⚠️ Skipping cancelled tournament: {tourn_name} — {end_date}")
            i += 4
            continue

        # Avoid duplicate block processing
        if tourn_name == last_tourn_name and end_date == last_end_date:
            i += 1
            continue

        print(f"📍 Detected: {tourn_name} — Ending: {end_date}")
        last_tourn_name = tourn_name
        last_end_date = end_date
        i += 4  # Skip header lines

        # Collect all player rows until the next header block (or the end of the table)
        while i < len(df):
            # Only look ahead when rows i+1 and i+2 exist, so the last rows of the table are still collected
            is_next_header = False
            if i + 2 < len(df):
                next_i2 = str(df.loc[i + 2, "PLAYER"])
                is_next_header = bool(
                    pd.isna(df.loc[i, "ODDS"]) and
                    pd.isna(df.loc[i + 1, "ODDS"]) and (
                        re.search(r"\w+\s\d+\s*[-–]\s*(\w+\s)?\d+,\s\d{4}", next_i2) or
                        re.search(r"(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),?\s+\w+\s\d{1,2},\s\d{4}", next_i2)
                    )
                )
            if is_next_header:
                break

            if pd.notna(df.loc[i, "ODDS"]):
                row = df.loc[i].copy()
                row["TOURNAMENT"] = tourn_name
                row["ENDING_DATE"] = end_date
                final_rows.append(row)
            i += 1
    else:
        i += 1

# === STEP 4: Create cleaned DataFrame ===
clean_df = pd.DataFrame(final_rows)

# ✅ Prevent crash if nothing was parsed
if clean_df.empty or "PLAYER" not in clean_df.columns:
    print(f"⚠️ No valid tournament blocks detected for season {season} ({oddsYear})")
    final_df = pd.DataFrame()  # Safe fallback
else:
    # Remove winner tag
    clean_df["PLAYER"] = clean_df["PLAYER"].str.replace(r"\s\*Winner\*", "", regex=True)

    # Convert fractional odds strings (e.g. "25/1") to a numeric ratio, vectorized
    odds_parts = (
        clean_df["ODDS"]
        .str.replace(",", "")
        .str.extract(r"(\d+)/(\d+)")
        .astype(float)
    )
    clean_df["VEGAS_ODDS"] = odds_parts[0] / odds_parts[1]

    # Final output in the odds-table column order (source index is dropped)
    final_df = clean_df[
        ["SEASON", "TOURNAMENT", "ENDING_DATE", "PLAYER", "ODDS", "VEGAS_ODDS"]
    ].reset_index(drop=True)

    # Drop non-standard team events (e.g., Presidents Cup, Ryder Cup)
    drop_terms = ["Presidents Cup", "Ryder Cup"]
    final_df = final_df[~final_df["TOURNAMENT"].str.contains("|".join(drop_terms), case=False, na=False)]

    display(final_df.head())



📍 Detected: Safeway Open — Ending: 2020-09-13
📍 Detected: US Open — Ending: 2020-09-20
📍 Detected: R & C Championship — Ending: 2020-09-27
📍 Detected: at Big Cedar Lodge - — Ending: 2020-09-22
📍 Detected: Sanderson Farms Champ — Ending: 2020-10-04
📍 Detected: Shriners H for C Open — Ending: 2020-10-11
📍 Detected: The CJ Cup — Ending: 2020-10-18
📍 Detected: ZOZO CHAMPIONSHIP — Ending: 2020-10-25
📍 Detected: Bermuda Championship — Ending: 2020-11-01
📍 Detected: Vivint Houston Open — Ending: 2020-11-08
📍 Detected: The Masters — Ending: 2020-11-15
📍 Detected: The RSM Classic — Ending: 2020-11-22
📍 Detected: Champions for Change — Ending: 2020-11-27
📍 Detected: Mayakoba Golf Classic — Ending: 2020-12-06
📍 Detected: QBE Shootout — Ending: 2020-12-13
📍 Detected: Sentry Tourn of Champions — Ending: 2021-01-10
📍 Detected: Sony Open in Hawaii — Ending: 2021-01-17
📍 Detected: The American Express — Ending: 2021-01-24
📍 Detected: Abu Dhabi HSBC Champ — Ending: 2021-01-24
📍 Detected: Farmers Insura

|  | SEASON | TOURNAMENT | ENDING_DATE | PLAYER | ODDS | VEGAS_ODDS |
|---|---|---|---|---|---|---|
| 0 | 2021 | Safeway Open | 2020-09-13 | Phil Mickelson | 20/1 | 20.0 |
| 1 | 2021 | Safeway Open | 2020-09-13 | Si Woo Kim | 20/1 | 20.0 |
| 2 | 2021 | Safeway Open | 2020-09-13 | Brendan Steele | 20/1 | 20.0 |
| 3 | 2021 | Safeway Open | 2020-09-13 | Shane Lowry | 25/1 | 25.0 |
| 4 | 2021 | Safeway Open | 2020-09-13 | Sergio Garcia | 30/1 | 30.0 |


In [58]:
from datetime import datetime, date
# ✅ Check for non-date types in ENDING_DATE
non_dates = final_df[~final_df["ENDING_DATE"].apply(lambda x: isinstance(x, date))]

print(f"🧪 Rows with invalid ENDING_DATE values: {len(non_dates)}")
display(non_dates.head(10))



🧪 Rows with invalid ENDING_DATE values: 0


|  | SEASON | TOURNAMENT | ENDING_DATE | PLAYER | ODDS | VEGAS_ODDS |
|---|---|---|---|---|---|---|


In [59]:
dupes = final_df.duplicated(subset=["SEASON", "TOURNAMENT", "ENDING_DATE", "PLAYER"], keep=False)

print(f"🚨 Duplicate primary keys in final_df: {dupes.sum()}")
display(final_df[dupes].sort_values(by=["SEASON", "TOURNAMENT", "PLAYER"]))

🚨 Duplicate primary keys in final_df: 0


|  | SEASON | TOURNAMENT | ENDING_DATE | PLAYER | ODDS | VEGAS_ODDS |
|---|---|---|---|---|---|---|


In [135]:
# Search for tournament headers containing the missing names
keywords = ["Quicken Loans", "Paul Lawrie Match Play", "Madeira Islands"]
for idx, row in df.iterrows():
    for keyword in keywords:
        if keyword.lower() in str(row["PLAYER"]).lower():
            print(f"{idx}: {repr(row['PLAYER'])} — ODDS: {row['ODDS']}")

1436: 'Madeira Islands Open' — ODDS: nan
3094: 'Quicken Loans National' — ODDS: nan
3148: 'Paul Lawrie Match Play' — ODDS: nan
3216: 'Madeira Islands Open' — ODDS: nan


In [136]:
# Display rows following the detected header
for keyword in keywords:
    match_idx = df[df["PLAYER"].str.contains(keyword, case=False, na=False)].index
    for idx in match_idx:
        print(f"\n=== Rows around index {idx} ===")
        display(df.loc[idx:idx + 3])



=== Rows around index 3094 ===


|  | SEASON | TOURNAMENT | ENDING_DATE | PLAYER | ODDS |
|---|---|---|---|---|---|
| 3094 | 2015 |  |  | Quicken Loans National |  |
| 3095 | 2015 |  |  | Gainesville, Virginia |  |
| 3096 | 2015 |  |  | July 30 - August 2, 2015 |  |
| 3097 | 2015 |  |  | Odds to Win: |  |



=== Rows around index 3148 ===


|  | SEASON | TOURNAMENT | ENDING_DATE | PLAYER | ODDS |
|---|---|---|---|---|---|
| 3148 | 2015 |  |  | Paul Lawrie Match Play |  |
| 3149 | 2015 |  |  | Aberdeen, Scotland |  |
| 3150 | 2015 |  |  | July 30 - August 2, 2015 |  |
| 3151 | 2015 |  |  | Odds to Win: |  |



=== Rows around index 1436 ===


|  | SEASON | TOURNAMENT | ENDING_DATE | PLAYER | ODDS |
|---|---|---|---|---|---|
| 1436 | 2015 |  |  | Madeira Islands Open |  |
| 1437 | 2015 |  |  | Madeira, Portugal |  |
| 1438 | 2015 |  |  | March 19-22, 2015 |  |
| 1439 | 2015 |  |  | Cancelled (weather) |  |



=== Rows around index 3216 ===


|  | SEASON | TOURNAMENT | ENDING_DATE | PLAYER | ODDS |
|---|---|---|---|---|---|
| 3216 | 2015 |  |  | Madeira Islands Open |  |
| 3217 | 2015 |  |  | Madeira, Portugal |  |
| 3218 | 2015 |  |  | July 30 - August 2, 2015 |  |
| 3219 | 2015 |  |  | Odds to Win: |  |


In [126]:
print(parse_ending_date("July 30 - August 2, 2015")) 

2015-08-02


In [80]:
import pandas as pd

# Set a range that includes the suspicious transition between tournaments
# Adjust these numbers based on where the anomaly seems to occur
start_idx = 9212
end_idx = 9296  # exclusive, like range()

print(f"🔍 Inspecting raw PLAYER values from index {start_idx} to {end_idx}...\n")

for i in range(start_idx, end_idx):
    raw_value = df.loc[i, "PLAYER"]
    # unicode_escape makes hidden characters such as \xa0 (non-breaking space) visible
    escaped_value = raw_value.encode("unicode_escape").decode("utf-8") if isinstance(raw_value, str) else raw_value
    print(f"{i}: {repr(raw_value)}  →  {escaped_value}")


🔍 Inspecting raw PLAYER values from index 9212 to 9296...

9212: 'Aldrich Potgieter'  →  Aldrich Potgieter
9213: 'Alfredo Garcia-Heredia'  →  Alfredo Garcia-Heredia
9214: 'Cameron John'  →  Cameron John
9215: 'Jack Senior'  →  Jack Senior
9216: 'Cristobal Del Solar'  →  Cristobal Del Solar
9217: 'Ryggs Johnston'  →  Ryggs Johnston
9218: 'Jacob\xa0Skov Olesen'  →  Jacob\xa0Skov Olesen
9219: 'Matthew Southgate'  →  Matthew Southgate
9220: 'Ricardo Gouveia'  →  Ricardo Gouveia
9221: 'Freddy Schott'  →  Freddy Schott
9222: 'Andrea Halvorsen'  →  Andrea Halvorsen
9223: 'Jannik de Bruyn'  →  Jannik de Bruyn
9224: 'Filippo\xa0Celli'  →  Filippo\xa0Celli
9225: 'Ivan Cantero'  →  Ivan Cantero
9226: 'Inhoi Hur'  →  Inhoi Hur
9227: 'Jeong Weon Ko'  →  Jeong Weon Ko
9228: 'Brett Coletta'  →  Brett Coletta
9229: 'Pierre Pineau'  →  Pierre Pineau
9230: 'Quinnton\xa0Croker'  →  Quinnton\xa0Croker
9231: 'Rafa Cabrera Bello'  →  Rafa Cabrera Bello
9232: 'Jack Buchanan'  →  Jack Buchanan
9233: 'Ashun Wu

In [51]:
from sqlalchemy import create_engine
from utils.schema import odds_table, metadata

# ⚠️ Destructive: drops the odds table and all of its rows, then recreates it empty from utils.schema
engine = create_engine("sqlite:///data/golf.db")
with engine.begin() as conn:
    metadata.drop_all(conn, tables=[odds_table])
    metadata.create_all(conn)
engine.dispose()
print("✅ Recreated 'odds' table with updated schema.")


✅ Recreated 'odds' table with updated schema.


## Historical Data

### Pull Relevant Seasons
Check when this course or tournament has been played historically.
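In other words, a past event counts as relevant if it was played at the new course or under the new tournament name, within the allowed seasons. Below is a pandas sketch of that filter. It assumes a one-row-per-event frame with `SEASON`, `COURSE`, `TOURNAMENT` and `ENDING_DATE`. `select_history` is hypothetical and may not match `get_combined_history_seasons` exactly. The first three sample rows come from the output below; the fourth is an unrelated event added for contrast.

```python
import pandas as pd


def select_history(events: pd.DataFrame, course: str, tournament: str, allowed_seasons) -> pd.DataFrame:
    """Keep events held at `course` OR named `tournament` within `allowed_seasons`."""
    same_course = events["COURSE"] == course
    same_name = events["TOURNAMENT"] == tournament
    in_window = events["SEASON"].isin(list(allowed_seasons))
    return events[(same_course | same_name) & in_window].sort_values(["SEASON", "ENDING_DATE"])


events = pd.DataFrame({
    "SEASON": [2016, 2016, 2017, 2016],
    "COURSE": ["Quail Hollow Club", "Baltusrol GC", "Quail Hollow Club", "TPC Sawgrass"],
    "TOURNAMENT": ["Wells Fargo Championship", "PGA Championship", "PGA Championship", "THE PLAYERS Championship"],
    "ENDING_DATE": ["2016-05-08", "2016-07-31", "2017-08-13", "2016-05-15"],
})
hist = select_history(events, "Quail Hollow Club", "PGA Championship", range(2016, 2025))
print(hist["ENDING_DATE"].tolist())  # ['2016-05-08', '2016-07-31', '2017-08-13']
```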

In [16]:
importlib.reload(utils.db_utils)
from utils.db_utils import get_combined_history_seasons

# === USER INPUT ===
seasons = list(range(2016, 2025))  # 2016 through 2024 (range end is exclusive); adjust as needed
db_path = "data/golf.db"

# Pull course and tournament from config
n_course = tournament_config["new"]["course"]
n_tourn = tournament_config["new"]["name"]

# Fetch relevant history
history_df = get_combined_history_seasons(db_path, course=n_course, tournament=n_tourn, allowed_seasons=seasons)
history_df.head(20)


ℹ️ Found 15 relevant tournaments from course or tournament name.


|  | SEASON | COURSE | TOURN_ID | TOURNAMENT | ENDING_DATE |
|---|---|---|---|---|---|
| 1088 | 2016 | Quail Hollow Club | 480 | Wells Fargo Championship | 2016-05-08 |
| 1556 | 2016 | Baltusrol GC | 033 | PGA Championship | 2016-07-31 |
| 1400 | 2017 | Quail Hollow Club | 033 | PGA Championship | 2017-08-13 |
| 311 | 2018 | Quail Hollow Club | 480 | Wells Fargo Championship | 2018-05-06 |
| 933 | 2018 | Bellerive CC | 033 | PGA Championship | 2018-08-12 |
| 155 | 2019 | Quail Hollow Club | 480 | Wells Fargo Championship | 2019-05-05 |
| 777 | 2019 | Bethpage State Park BK Course | 033 | PGA Championship | 2019-05-19 |
| 622 | 2020 | TPC Harding Park | 033 | PGA Championship | 2020-08-09 |
| 0 | 2021 | Quail Hollow Club | 480 | Wells Fargo Championship | 2021-05-09 |
| 467 | 2021 | Ocean Course at Kiawah Island | 033 | PGA Championship | 2021-05-23 |


### Cut Percentage and FedEx Points
Use a rolling window (9 months here, set by `window_months=9`) to compute each player's recent cut percentage and the FedEx Cup points accumulated over that window. This intentionally won't match the official PGA Tour stats, which reset every season, but it keeps the amount of data behind each snapshot consistent throughout the year. We also add a feature called form density (`form_density`): FedEx Cup points divided by total events played.
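Here is a minimal pandas sketch of the rolling-window idea and the form-density feature. It assumes a results frame with `PLAYER`, `ENDING_DATE`, `FINAL_POS` and `FEDEX_CUP_POINTS`. It treats a missing `FINAL_POS` as a missed cut and counts only events that ended before the target date. Those choices, the helper `cut_and_fedex_window` and the toy data are assumptions, not necessarily how `get_cut_and_fedex_history` works.

```python
import pandas as pd


def cut_and_fedex_window(results: pd.DataFrame, as_of: str, window_months: int = 9) -> pd.DataFrame:
    """Summarize each player's events that ended in the window before `as_of`."""
    end = pd.Timestamp(as_of)
    start = end - pd.DateOffset(months=window_months)
    dates = pd.to_datetime(results["ENDING_DATE"])
    recent = results[(dates >= start) & (dates < end)]  # exclude the target event itself

    summary = recent.groupby("PLAYER").agg(
        TOTAL_EVENTS=("ENDING_DATE", "count"),
        CUTS_MADE=("FINAL_POS", "count"),  # count() skips NaN, i.e. assumed missed cuts
        FEDEX_POINTS=("FEDEX_CUP_POINTS", "sum"),
    )
    summary["CUT_PERCENTAGE"] = (100 * summary["CUTS_MADE"] / summary["TOTAL_EVENTS"]).round(1)
    # Form density: FedEx Cup points per event played in the window
    summary["form_density"] = (summary["FEDEX_POINTS"] / summary["TOTAL_EVENTS"]).round(2)
    return summary.sort_values("FEDEX_POINTS", ascending=False)


# Toy data for illustration
results = pd.DataFrame({
    "PLAYER": ["A", "A", "A", "B", "B"],
    "ENDING_DATE": ["2024-01-14", "2024-03-10", "2024-04-14", "2024-02-04", "2024-03-10"],
    "FINAL_POS": [5, None, 12, 1, 30],
    "FEDEX_CUP_POINTS": [100.0, 0.0, 60.0, 500.0, 20.0],
})
print(cut_and_fedex_window(results, "2024-05-12"))
```

With the toy data, player A has 3 events and 2 cuts made (66.7%) with 160 points, giving a form density of 160 / 3 ≈ 53.33. The same arithmetic reproduces the Jason Day row below: 8 / 9 cuts ≈ 88.9% and 3408.83 / 9 ≈ 378.76.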

In [35]:
importlib.reload(utils.db_utils)
from utils.db_utils import get_cut_and_fedex_history

cuts = get_cut_and_fedex_history("data/golf.db", history_df, window_months=9)
# cuts["2024-05-12"].head(20)

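# Use a distinct loop variable so the raw odds `df` from the scraping cell is not overwritten
for end_date, cut_df in cuts.items():
    print(f"\n📆 {end_date} — {cut_df['TOURNAMENT'].iloc[0]} ({len(cut_df)} players)")
    display(cut_df.head(5))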


📆 2016-05-08 — Wells Fargo Championship (468 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 211 | Jason Day | 9 | 8 | 3408.83 | 88.9 | 378.76 | 2016-05-08 | Wells Fargo Championship |
| 245 | Jordan Spieth | 9 | 8 | 1554.73 | 88.9 | 172.75 | 2016-05-08 | Wells Fargo Championship |
| 122 | Daniel Berger | 12 | 7 | 1525.17 | 58.3 | 127.1 | 2016-05-08 | Wells Fargo Championship |
| 48 | Brandt Snedeker | 16 | 13 | 1379.4 | 81.2 | 86.21 | 2016-05-08 | Wells Fargo Championship |
| 72 | Bubba Watson | 8 | 8 | 1299.8 | 100.0 | 162.48 | 2016-05-08 | Wells Fargo Championship |



📆 2016-07-31 — PGA Championship (501 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 149 | Dustin Johnson | 13 | 13 | 2321.06 | 100.0 | 178.54 | 2016-07-31 | PGA Championship |
| 214 | Jason Day | 12 | 11 | 1708.75 | 91.7 | 142.4 | 2016-07-31 | PGA Championship |
| 247 | Jordan Spieth | 13 | 11 | 1697.4 | 84.6 | 130.57 | 2016-07-31 | PGA Championship |
| 51 | Brandt Snedeker | 17 | 12 | 1422.85 | 70.6 | 83.7 | 2016-07-31 | PGA Championship |
| 262 | Kevin Chappell | 17 | 12 | 1354.0 | 70.6 | 79.65 | 2016-07-31 | PGA Championship |



📆 2017-08-13 — PGA Championship (504 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 240 | Jordan Spieth | 15 | 12 | 2462.88 | 80.0 | 164.19 | 2017-08-13 | PGA Championship |
| 180 | Hideki Matsuyama | 12 | 11 | 1853.81 | 91.7 | 154.48 | 2017-08-13 | PGA Championship |
| 93 | Charley Hoffman | 23 | 18 | 1430.03 | 78.3 | 62.18 | 2017-08-13 | PGA Championship |
| 65 | Brian Harman | 21 | 15 | 1409.85 | 71.4 | 67.14 | 2017-08-13 | PGA Championship |
| 368 | Rickie Fowler | 11 | 10 | 1403.88 | 90.9 | 127.63 | 2017-08-13 | PGA Championship |



📆 2018-05-06 — Wells Fargo Championship (463 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 267 | Marc Leishman | 12 | 10 | 2734.72 | 83.3 | 227.89 | 2018-05-06 | Wells Fargo Championship |
| 334 | Rickie Fowler | 11 | 9 | 2043.17 | 81.8 | 185.74 | 2018-05-06 | Wells Fargo Championship |
| 236 | Justin Thomas | 10 | 10 | 1932.38 | 100.0 | 193.24 | 2018-05-06 | Wells Fargo Championship |
| 219 | Jon Rahm | 9 | 9 | 1518.89 | 100.0 | 168.77 | 2018-05-06 | Wells Fargo Championship |
| 185 | Jason Day | 8 | 8 | 1491.61 | 100.0 | 186.45 | 2018-05-06 | Wells Fargo Championship |



📆 2018-08-12 — PGA Championship (537 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 152 | Dustin Johnson | 11 | 10 | 1867.12 | 90.9 | 169.74 | 2018-08-12 | PGA Championship |
| 219 | Jason Day | 11 | 10 | 1576.98 | 90.9 | 143.36 | 2018-08-12 | PGA Championship |
| 270 | Justin Thomas | 13 | 12 | 1533.4 | 92.3 | 117.95 | 2018-08-12 | PGA Championship |
| 66 | Bryson DeChambeau | 18 | 16 | 1490.68 | 88.9 | 82.82 | 2018-08-12 | PGA Championship |
| 269 | Justin Rose | 10 | 10 | 1362.24 | 100.0 | 136.22 | 2018-08-12 | PGA Championship |



📆 2019-05-05 — Wells Fargo Championship (443 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 228 | Keegan Bradley | 10 | 9 | 2159.86 | 90.0 | 215.99 | 2019-05-05 | Wells Fargo Championship |
| 69 | Brooks Koepka | 9 | 8 | 1904.44 | 88.9 | 211.6 | 2019-05-05 | Wells Fargo Championship |
| 277 | Matt Kuchar | 13 | 12 | 1764.41 | 92.3 | 135.72 | 2019-05-05 | Wells Fargo Championship |
| 225 | Justin Thomas | 11 | 11 | 1748.77 | 100.0 | 158.98 | 2019-05-05 | Wells Fargo Championship |
| 431 | Xander Schauffele | 10 | 9 | 1528.5 | 90.0 | 152.85 | 2019-05-05 | Wells Fargo Championship |



📆 2019-05-19 — PGA Championship (416 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 212 | Keegan Bradley | 10 | 8 | 2147.25 | 80.0 | 214.72 | 2019-05-19 | PGA Championship |
| 259 | Matt Kuchar | 11 | 11 | 1705.08 | 100.0 | 155.01 | 2019-05-19 | PGA Championship |
| 404 | Xander Schauffele | 8 | 7 | 1504.25 | 87.5 | 188.03 | 2019-05-19 | PGA Championship |
| 207 | Justin Rose | 6 | 5 | 1498.95 | 83.3 | 249.82 | 2019-05-19 | PGA Championship |
| 315 | Rory McIlroy | 7 | 7 | 1482.76 | 100.0 | 211.82 | 2019-05-19 | PGA Championship |



📆 2020-08-09 — PGA Championship (353 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 342 | Webb Simpson | 8 | 6 | 1632.83 | 75.0 | 204.1 | 2020-08-09 | PGA Championship |
| 176 | Justin Thomas | 9 | 6 | 1381.0 | 66.7 | 153.44 | 2020-08-09 | PGA Championship |
| 84 | Daniel Berger | 9 | 8 | 1165.44 | 88.9 | 129.49 | 2020-08-09 | PGA Championship |
| 49 | Bryson DeChambeau | 9 | 8 | 1055.58 | 88.9 | 117.29 | 2020-08-09 | PGA Championship |
| 167 | Jon Rahm | 9 | 8 | 1031.5 | 88.9 | 114.61 | 2020-08-09 | PGA Championship |



📆 2021-05-09 — Wells Fargo Championship (457 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 202 | Jon Rahm | 12 | 12 | 2506.24 | 100.0 | 208.85 | 2021-05-09 | Wells Fargo Championship |
| 120 | Dustin Johnson | 11 | 10 | 2340.77 | 90.9 | 212.8 | 2021-05-09 | Wells Fargo Championship |
| 152 | Hideki Matsuyama | 17 | 14 | 1704.17 | 82.4 | 100.25 | 2021-05-09 | Wells Fargo Championship |
| 59 | Bryson DeChambeau | 10 | 9 | 1664.866 | 90.0 | 166.49 | 2021-05-09 | Wells Fargo Championship |
| 219 | Justin Thomas | 13 | 12 | 1648.813 | 92.3 | 126.83 | 2021-05-09 | Wells Fargo Championship |



📆 2021-05-23 — PGA Championship (442 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 195 | Jon Rahm | 13 | 12 | 2463.24 | 92.3 | 189.48 | 2021-05-23 | PGA Championship |
| 115 | Dustin Johnson | 10 | 9 | 2070.77 | 90.0 | 207.08 | 2021-05-23 | PGA Championship |
| 147 | Hideki Matsuyama | 17 | 14 | 1677.25 | 82.4 | 98.66 | 2021-05-23 | PGA Championship |
| 208 | Justin Thomas | 13 | 12 | 1656.753 | 92.3 | 127.44 | 2021-05-23 | PGA Championship |
| 54 | Bryson DeChambeau | 11 | 10 | 1633.066 | 90.9 | 148.46 | 2021-05-23 | PGA Championship |



📆 2022-05-22 — PGA Championship (444 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 321 | Patrick Cantlay | 9 | 8 | 2978.4 | 88.9 | 330.93 | 2022-05-22 | PGA Championship |
| 383 | Scottie Scheffler | 14 | 13 | 2426.4 | 92.9 | 173.31 | 2022-05-22 | PGA Championship |
| 402 | Sungjae Im | 15 | 13 | 1836.53 | 86.7 | 122.44 | 2022-05-22 | PGA Championship |
| 367 | Sam Burns | 14 | 9 | 1729.68 | 64.3 | 123.55 | 2022-05-22 | PGA Championship |
| 350 | Rory McIlroy | 8 | 7 | 1635.68 | 87.5 | 204.46 | 2022-05-22 | PGA Championship |



📆 2023-05-07 — Wells Fargo Championship (465 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 216 | Jon Rahm | 14 | 13 | 3623.603 | 92.9 | 258.83 | 2023-05-07 | Wells Fargo Championship |
| 317 | Patrick Cantlay | 11 | 10 | 2964.212 | 90.9 | 269.47 | 2023-05-07 | Wells Fargo Championship |
| 382 | Scottie Scheffler | 14 | 13 | 2417.4 | 92.9 | 172.67 | 2023-05-07 | Wells Fargo Championship |
| 447 | Will Zalatoris | 10 | 7 | 2276.898 | 70.0 | 227.69 | 2023-05-07 | Wells Fargo Championship |
| 286 | Max Homa | 14 | 13 | 1924.781 | 92.9 | 137.48 | 2023-05-07 | Wells Fargo Championship |



📆 2023-05-21 — PGA Championship (458 players)


|  | PLAYER | TOTAL EVENTS PLAYED | CUTS MADE | FEDEX CUP POINTS | CUT PERCENTAGE | form_density | ENDING_DATE | TOURNAMENT |
|---|---|---|---|---|---|---|---|---|
| 212 | Jon Rahm | 13 | 12 | 3275.033 | 92.3 | 251.93 | 2023-05-21 | PGA Championship |
| 313 | Patrick Cantlay | 11 | 10 | 2985.012 | 90.9 | 271.36 | 2023-05-21 | PGA Championship |
| 376 | Scottie Scheffler | 14 | 14 | 2517.4 | 100.0 | 179.81 | 2023-05-21 | PGA Championship |
| 281 | Max Homa | 14 | 13 | 1947.281 | 92.9 | 139.09 | 2023-05-21 | PGA Championship |
| 418 | Tony Finau | 15 | 14 | 1601.666 | 93.3 | 106.78 | 2023-05-21 | PGA Championship |



📆 2024-05-12 — Wells Fargo Championship (415 players)


,PLAYER,TOTAL EVENTS PLAYED,CUTS MADE,FEDEX CUP POINTS,CUT PERCENTAGE,form_density,ENDING_DATE,TOURNAMENT
348,Scottie Scheffler,13,13,4987.0,100.0,383.62,2024-05-12,Wells Fargo Championship
233,Ludvig Åberg,14,14,2314.017,100.0,165.29,2024-05-12,Wells Fargo Championship
405,Wyndham Clark,13,11,2105.971,84.6,162.0,2024-05-12,Wells Fargo Championship
406,Xander Schauffele,13,13,1989.833,100.0,153.06,2024-05-12,Wells Fargo Championship
333,Sahith Theegala,16,14,1947.2,87.5,121.7,2024-05-12,Wells Fargo Championship



📆 2024-05-19 — PGA Championship (415 players)


,PLAYER,TOTAL EVENTS PLAYED,CUTS MADE,FEDEX CUP POINTS,CUT PERCENTAGE,form_density,ENDING_DATE,TOURNAMENT
348,Scottie Scheffler,12,12,4895.0,100.0,407.92,2024-05-19,PGA Championship
233,Ludvig Åberg,14,14,2314.017,100.0,165.29,2024-05-19,PGA Championship
406,Xander Schauffele,13,13,2259.833,100.0,173.83,2024-05-19,PGA Championship
405,Wyndham Clark,13,11,2106.021,84.6,162.0,2024-05-19,PGA Championship
317,Rory McIlroy,10,10,1815.304,100.0,181.53,2024-05-19,PGA Championship
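The cell that builds these tables isn't shown here, but the derived columns can be checked by hand. The rows shown are consistent with CUT PERCENTAGE = 100 × CUTS MADE / TOTAL EVENTS PLAYED (one decimal) and form_density = FEDEX CUP POINTS / TOTAL EVENTS PLAYED (two decimals). For example, Jon Rahm's 2463.24 points over 13 events in the 2021-05-23 table gives 189.48. The helper below is a hypothetical recomputation inferred from the printed output, not the notebook's own code; the name `add_form_density` is made up for illustration.

```python
import pandas as pd


def add_form_density(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: recompute the derived columns shown above.

    Assumes one row per player with TOTAL EVENTS PLAYED, CUTS MADE and
    FEDEX CUP POINTS already aggregated over the lookback period.
    """
    out = df.copy()
    events = out["TOTAL EVENTS PLAYED"]
    # Percentage of starts in which the player made the cut
    out["CUT PERCENTAGE"] = (100 * out["CUTS MADE"] / events).round(1)
    # FedEx Cup points earned per event played
    out["form_density"] = (out["FEDEX CUP POINTS"] / events).round(2)
    # The tables above are ordered by total points, highest first
    return out.sort_values("FEDEX CUP POINTS", ascending=False, ignore_index=True)


# Three rows copied from the 2021-05-23 PGA Championship table
sample = pd.DataFrame({
    "PLAYER": ["Jon Rahm", "Dustin Johnson", "Hideki Matsuyama"],
    "TOTAL EVENTS PLAYED": [13, 10, 17],
    "CUTS MADE": [12, 9, 14],
    "FEDEX CUP POINTS": [2463.24, 2070.77, 1677.25],
})
print(add_form_density(sample))
```

Per-event points rewards efficiency rather than volume, so a player with few but strong starts (Rory McIlroy's eight events in the 2022 table) can outrank players with higher point totals.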


### Recent Form

In [37]:
importlib.reload(utils.db_utils)
from utils.db_utils import get_recent_avg_finish

rf = get_recent_avg_finish(db_path, history_df, window_months=9)  # db_path is set in the Update Database cell

# Preview the top five rows for each tournament date
for date, df in rf.items():
    print(f"\n📆 {date} — {df['TOURNAMENT'].iloc[0]} ({len(df)} players)")
    display(df.head(5))


📆 2016-05-08 — Wells Fargo Championship (468 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Lee McCoy,1,4.0,5.77,2016-05-08,Wells Fargo Championship
1,Cody Gribble,1,8.0,11.54,2016-05-08,Wells Fargo Championship
2,Henrik Stenson,7,11.6,5.58,2016-05-08,Wells Fargo Championship
3,Jon Rahm,2,12.5,11.38,2016-05-08,Wells Fargo Championship
4,Mackenzie Hughes,1,13.0,18.76,2016-05-08,Wells Fargo Championship



📆 2016-07-31 — PGA Championship (501 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Lee McCoy,1,4.0,5.77,2016-07-31,PGA Championship
1,Tyrrell Hatton,1,5.0,7.21,2016-07-31,PGA Championship
2,Jared du Toit,1,9.0,12.98,2016-07-31,PGA Championship
3,Dustin Johnson,13,10.5,3.98,2016-07-31,PGA Championship
4,Matthew Southgate,1,12.0,17.31,2016-07-31,PGA Championship



📆 2017-08-13 — PGA Championship (504 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Matthew Southgate,1,6.0,8.66,2017-08-13,PGA Championship
1,Keith Mitchell,1,11.0,15.87,2017-08-13,PGA Championship
2,Jonathan Byrd,2,13.0,11.83,2017-08-13,PGA Championship
3,Oscar Fraustro,1,13.0,18.76,2017-08-13,PGA Championship
4,Austin Connelly,1,14.0,20.2,2017-08-13,PGA Championship



📆 2018-05-06 — Wells Fargo Championship (463 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Jordan L Smith,1,9.0,12.98,2018-05-06,Wells Fargo Championship
1,Jason Day,8,11.6,5.28,2018-05-06,Wells Fargo Championship
2,Dustin Johnson,8,13.5,6.14,2018-05-06,Wells Fargo Championship
3,Justin Thomas,10,15.7,6.55,2018-05-06,Wells Fargo Championship
4,Daisuke Kataoka,1,18.0,25.97,2018-05-06,Wells Fargo Championship



📆 2018-08-12 — PGA Championship (537 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Eddie Pepperell,1,6.0,8.66,2018-08-12,PGA Championship
1,Chase Seiffert,1,9.0,12.98,2018-08-12,PGA Championship
2,Justin Rose,10,12.2,5.09,2018-08-12,PGA Championship
3,Sam Horsfield,1,14.0,20.2,2018-08-12,PGA Championship
4,Andres Romero,2,14.0,12.74,2018-08-12,PGA Championship



📆 2019-05-05 — Wells Fargo Championship (443 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Thomas Pieters,1,6.0,8.66,2019-05-05,Wells Fargo Championship
1,Jon Rahm,10,10.5,4.38,2019-05-05,Wells Fargo Championship
2,John Oda,1,11.0,15.87,2019-05-05,Wells Fargo Championship
3,Brett Stegmaier,1,11.0,15.87,2019-05-05,Wells Fargo Championship
4,Rory McIlroy,8,12.1,5.51,2019-05-05,Wells Fargo Championship



📆 2019-05-19 — PGA Championship (416 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Rory McIlroy,7,7.0,3.37,2019-05-19,PGA Championship
1,Jon Rahm,8,10.5,4.78,2019-05-19,PGA Championship
2,John Oda,1,11.0,15.87,2019-05-19,PGA Championship
3,Brett Stegmaier,1,11.0,15.87,2019-05-19,PGA Championship
4,Tiger Woods,4,13.0,8.08,2019-05-19,PGA Championship



📆 2020-08-09 — PGA Championship (353 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Patrick Cantlay,6,18.3,9.4,2020-08-09,PGA Championship
1,Tyrrell Hatton,4,19.3,11.99,2020-08-09,PGA Championship
2,Daniel Berger,9,20.1,8.73,2020-08-09,PGA Championship
3,Rory McIlroy,8,22.0,10.01,2020-08-09,PGA Championship
4,Bryson DeChambeau,9,22.1,9.6,2020-08-09,PGA Championship



📆 2021-05-09 — Wells Fargo Championship (457 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Jon Rahm,12,9.1,3.55,2021-05-09,Wells Fargo Championship
1,Xander Schauffele,12,16.1,6.28,2021-05-09,Wells Fargo Championship
2,Haotong Li,1,17.0,24.53,2021-05-09,Wells Fargo Championship
3,Justin Thomas,13,18.5,7.01,2021-05-09,Wells Fargo Championship
4,Dawie van der Walt,1,20.0,28.85,2021-05-09,Wells Fargo Championship



📆 2021-05-23 — PGA Championship (442 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Bud Cauley,1,14.0,20.2,2021-05-23,PGA Championship
1,Xander Schauffele,12,16.4,6.39,2021-05-23,PGA Championship
2,Jon Rahm,13,16.9,6.4,2021-05-23,PGA Championship
3,Justin Thomas,13,17.7,6.71,2021-05-23,PGA Championship
4,Joaquin Niemann,15,19.7,7.11,2021-05-23,PGA Championship



📆 2022-05-22 — PGA Championship (444 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Taylor Montgomery,1,11.0,15.87,2022-05-22,PGA Championship
1,Haotong Li,1,12.0,17.31,2022-05-22,PGA Championship
2,Justin Thomas,12,13.8,5.38,2022-05-22,PGA Championship
3,Martin Contini,1,16.0,23.08,2022-05-22,PGA Championship
4,Grant Hirschman,1,17.0,24.53,2022-05-22,PGA Championship



📆 2023-05-07 — Wells Fargo Championship (465 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Brooks Koepka,1,2.0,2.89,2023-05-07,Wells Fargo Championship
1,Phil Mickelson,1,2.0,2.89,2023-05-07,Wells Fargo Championship
2,Patrick Reed,1,4.0,5.77,2023-05-07,Wells Fargo Championship
3,Ryo Hisatsune,1,12.0,17.31,2023-05-07,Wells Fargo Championship
4,Joaquin Niemann,3,12.3,8.87,2023-05-07,Wells Fargo Championship



📆 2023-05-21 — PGA Championship (458 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Brooks Koepka,1,2.0,2.89,2023-05-21,PGA Championship
1,Phil Mickelson,1,2.0,2.89,2023-05-21,PGA Championship
2,Patrick Reed,1,4.0,5.77,2023-05-21,PGA Championship
3,Scottie Scheffler,14,8.9,3.29,2023-05-21,PGA Championship
4,Ryo Hisatsune,1,12.0,17.31,2023-05-21,PGA Championship



📆 2024-05-12 — Wells Fargo Championship (415 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Bryson DeChambeau,1,6.0,8.66,2024-05-12,Wells Fargo Championship
1,Cameron Smith,1,6.0,8.66,2024-05-12,Wells Fargo Championship
2,Scottie Scheffler,13,6.2,2.35,2024-05-12,Wells Fargo Championship
3,Patrick Reed,1,12.0,17.31,2024-05-12,Wells Fargo Championship
4,Brett White,1,13.0,18.76,2024-05-12,Wells Fargo Championship



📆 2024-05-19 — PGA Championship (415 players)


,PLAYER,TOTAL EVENTS PLAYED,RECENT FORM,adj_form,ENDING_DATE,TOURNAMENT
0,Scottie Scheffler,12,4.2,1.64,2024-05-19,PGA Championship
1,Bryson DeChambeau,1,6.0,8.66,2024-05-19,PGA Championship
2,Cameron Smith,1,6.0,8.66,2024-05-19,PGA Championship
3,Patrick Reed,1,12.0,17.31,2024-05-19,PGA Championship
4,Brett White,1,13.0,18.76,2024-05-19,PGA Championship
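`get_recent_avg_finish` lives in `utils.db_utils` and isn't shown here. The sketch below is a hypothetical stand-in that reproduces the shape of its output under stated assumptions: a long `results` table with one row per player per event (hypothetical columns `PLAYER`, `ENDING_DATE` and `FINISH`), and a trailing window of `window_months` before each tournament date. The `adj_form` step is inferred from the printed rows, which are consistent with RECENT FORM / ln(1 + TOTAL EVENTS PLAYED), applied to the already-rounded RECENT FORM. Lee McCoy's single 4th place becomes 4.0 / ln 2 ≈ 5.77, while Henrik Stenson's 11.6 over seven events becomes 11.6 / ln 8 ≈ 5.58. Because a lower finish is better, this favours form that is backed by more starts.

```python
import numpy as np
import pandas as pd


def recent_form(results: pd.DataFrame, cutoff: str, window_months: int = 9) -> pd.DataFrame:
    """Hypothetical stand-in for get_recent_avg_finish (not the real implementation).

    Assumes `results` has one row per player per event with columns
    PLAYER, ENDING_DATE (datetime64) and FINISH (numeric finishing position).
    """
    end = pd.Timestamp(cutoff)
    start = end - pd.DateOffset(months=window_months)
    # Assumption: only events that ended inside [start, end) count toward form
    window = results[(results["ENDING_DATE"] >= start) & (results["ENDING_DATE"] < end)]
    out = (
        window.groupby("PLAYER")["FINISH"]
        .agg(["count", "mean"])
        .rename(columns={"count": "TOTAL EVENTS PLAYED", "mean": "RECENT FORM"})
        .reset_index()
    )
    out["RECENT FORM"] = out["RECENT FORM"].round(1)
    # Inferred from the printed tables: divide by ln(1 + events played)
    out["adj_form"] = (out["RECENT FORM"] / np.log1p(out["TOTAL EVENTS PLAYED"])).round(2)
    # Lower average finish means better form, so the best players sort first
    return out.sort_values("RECENT FORM", ignore_index=True)


# Made-up example data; player "A" has one start outside the 9-month window
results = pd.DataFrame({
    "PLAYER": ["A", "A", "A", "B"],
    "ENDING_DATE": pd.to_datetime(["2023-03-01", "2024-01-14", "2024-03-10", "2024-04-14"]),
    "FINISH": [1, 10, 4, 20],
})
print(recent_form(results, "2024-05-19"))
```

One side effect of the log divisor worth keeping in mind: a single strong result from a player with one start (Brooks Koepka's 2.0 in the 2023 tables) still tops the raw RECENT FORM ranking, but its adjusted value (2.89) is pulled back toward players with longer records.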


### Course History

In [39]:
importlib.reload(utils.db_utils)
from utils.db_utils import get_course_history

# Filter history_df for only the course we're targeting
target_course = tournament_config["new"]["course"]
course_df = history_df[history_df["COURSE"] == target_course]
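course_hist = get_course_history(db_path, course_df)  # db_path is set in the Update Database cell

# Preview the first five players (alphabetical) for each date with course history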
for date, df in course_hist.items():
    if not df.empty:
        print(f"\n🏌️‍♂️ Course history for {df['TOURNAMENT'].iloc[0]} on {date}")
        display(df.head(5))


🏌️‍♂️ Course history for Wells Fargo Championship on 2016-05-08


,PLAYER,TOTAL EVENTS PLAYED,COURSE HISTORY,adj_ch,ENDING_DATE,COURSE,TOURNAMENT
0,Aaron Baddeley,1,55.0,79.35,2016-05-08,Quail Hollow Club,Wells Fargo Championship
1,Adam Hadwin,1,90.0,129.84,2016-05-08,Quail Hollow Club,Wells Fargo Championship
2,Adam Scott,1,90.0,129.84,2016-05-08,Quail Hollow Club,Wells Fargo Championship
3,Alex Cejka,1,58.0,83.68,2016-05-08,Quail Hollow Club,Wells Fargo Championship
4,Alex Prugh,1,90.0,129.84,2016-05-08,Quail Hollow Club,Wells Fargo Championship



🏌️‍♂️ Course history for PGA Championship on 2017-08-13


,PLAYER,TOTAL EVENTS PLAYED,COURSE HISTORY,adj_ch,ENDING_DATE,COURSE,TOURNAMENT
0,Aaron Baddeley,1,55.0,79.35,2017-08-13,Quail Hollow Club,PGA Championship
1,Adam Hadwin,2,75.5,68.72,2017-08-13,Quail Hollow Club,PGA Championship
2,Adam Scott,2,53.5,48.7,2017-08-13,Quail Hollow Club,PGA Championship
3,Alex Cejka,2,55.5,50.52,2017-08-13,Quail Hollow Club,PGA Championship
4,Alex Prugh,1,90.0,129.84,2017-08-13,Quail Hollow Club,PGA Championship



🏌️‍♂️ Course history for Wells Fargo Championship on 2018-05-06


,PLAYER,TOTAL EVENTS PLAYED,COURSE HISTORY,adj_ch,ENDING_DATE,COURSE,TOURNAMENT
0,Aaron Baddeley,1,55.0,79.35,2018-05-06,Quail Hollow Club,Wells Fargo Championship
1,Adam Hadwin,3,80.3,57.92,2018-05-06,Quail Hollow Club,Wells Fargo Championship
2,Adam Rainaud,1,90.0,129.84,2018-05-06,Quail Hollow Club,Wells Fargo Championship
3,Adam Scott,3,56.0,40.4,2018-05-06,Quail Hollow Club,Wells Fargo Championship
4,Alex Cejka,2,55.5,50.52,2018-05-06,Quail Hollow Club,Wells Fargo Championship



🏌️‍♂️ Course history for Wells Fargo Championship on 2019-05-05


,PLAYER,TOTAL EVENTS PLAYED,COURSE HISTORY,adj_ch,ENDING_DATE,COURSE,TOURNAMENT
0,Aaron Baddeley,2,72.5,65.99,2019-05-05,Quail Hollow Club,Wells Fargo Championship
1,Aaron Wise,1,2.0,2.89,2019-05-05,Quail Hollow Club,Wells Fargo Championship
2,Abraham Ancer,1,90.0,129.84,2019-05-05,Quail Hollow Club,Wells Fargo Championship
3,Adam Hadwin,4,64.3,39.95,2019-05-05,Quail Hollow Club,Wells Fargo Championship
4,Adam Rainaud,1,90.0,129.84,2019-05-05,Quail Hollow Club,Wells Fargo Championship



🏌️‍♂️ Course history for Wells Fargo Championship on 2021-05-09


,PLAYER,TOTAL EVENTS PLAYED,COURSE HISTORY,adj_ch,ENDING_DATE,COURSE,TOURNAMENT
0,Aaron Baddeley,2,72.5,65.99,2021-05-09,Quail Hollow Club,Wells Fargo Championship
1,Aaron Wise,2,10.0,9.1,2021-05-09,Quail Hollow Club,Wells Fargo Championship
2,Abraham Ancer,1,90.0,129.84,2021-05-09,Quail Hollow Club,Wells Fargo Championship
3,Adam Hadwin,5,59.0,32.93,2021-05-09,Quail Hollow Club,Wells Fargo Championship
4,Adam Long,1,45.0,64.92,2021-05-09,Quail Hollow Club,Wells Fargo Championship



🏌️‍♂️ Course history for Wells Fargo Championship on 2023-05-07


,PLAYER,TOTAL EVENTS PLAYED,COURSE HISTORY,adj_ch,ENDING_DATE,COURSE,TOURNAMENT
0,Aaron Baddeley,1,90.0,129.84,2023-05-07,Quail Hollow Club,Wells Fargo Championship
1,Aaron Wise,3,9.7,7.0,2023-05-07,Quail Hollow Club,Wells Fargo Championship
2,Abraham Ancer,2,46.0,41.87,2023-05-07,Quail Hollow Club,Wells Fargo Championship
3,Adam Hadwin,5,59.0,32.93,2023-05-07,Quail Hollow Club,Wells Fargo Championship
4,Adam Long,2,67.5,61.44,2023-05-07,Quail Hollow Club,Wells Fargo Championship



🏌️‍♂️ Course history for Wells Fargo Championship on 2024-05-12


,PLAYER,TOTAL EVENTS PLAYED,COURSE HISTORY,adj_ch,ENDING_DATE,COURSE,TOURNAMENT
0,Aaron Baddeley,1,90.0,129.84,2024-05-12,Quail Hollow Club,Wells Fargo Championship
1,Aaron Rai,1,90.0,129.84,2024-05-12,Quail Hollow Club,Wells Fargo Championship
2,Aaron Wise,3,9.7,7.0,2024-05-12,Quail Hollow Club,Wells Fargo Championship
3,Abraham Ancer,2,46.0,41.87,2024-05-12,Quail Hollow Club,Wells Fargo Championship
4,Adam Hadwin,5,64.8,36.17,2024-05-12,Quail Hollow Club,Wells Fargo Championship
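`get_course_history` is also defined in `utils.db_utils` and not shown. As a rough, hypothetical sketch: for each tournament date, average every player's finishes at the target course from earlier events, then apply the same ln(1 + events) divisor that `adj_form` appears to use. The printed `adj_ch` values fit that pattern (Adam Hadwin's 64.3 over four starts in the 2019-05-05 table gives 64.3 / ln 5 ≈ 39.95). Whether the real function uses every prior year or a rolling window isn't visible here; Aaron Baddeley drops from two starts in 2021 to one in 2023, which hints that older results age out, so the sketch takes an optional `lookback_years` parameter. The `FINISH` column and the `lookback_years` parameter are assumptions.

```python
import numpy as np
import pandas as pd


def course_history(results, course, cutoff, lookback_years=None):
    """Hypothetical stand-in for get_course_history (not the real implementation).

    Assumes one row per player per event with columns PLAYER, COURSE,
    ENDING_DATE (datetime64) and FINISH (numeric finishing position).
    """
    end = pd.Timestamp(cutoff)
    # Only earlier events played at the target course count
    mask = (results["COURSE"] == course) & (results["ENDING_DATE"] < end)
    if lookback_years is not None:
        # Optional rolling window so very old starts age out
        mask &= results["ENDING_DATE"] >= end - pd.DateOffset(years=lookback_years)
    out = (
        results[mask]
        .groupby("PLAYER")["FINISH"]  # groupby sorts players alphabetically
        .agg(["count", "mean"])
        .rename(columns={"count": "TOTAL EVENTS PLAYED", "mean": "COURSE HISTORY"})
        .reset_index()
    )
    out["COURSE HISTORY"] = out["COURSE HISTORY"].round(1)
    # Same volume adjustment the adj_form column appears to use
    out["adj_ch"] = (out["COURSE HISTORY"] / np.log1p(out["TOTAL EVENTS PLAYED"])).round(2)
    return out


# Made-up example data
results = pd.DataFrame({
    "PLAYER": ["X", "X", "Y", "X"],
    "COURSE": ["Quail Hollow Club", "Quail Hollow Club", "Quail Hollow Club", "Other Course"],
    "ENDING_DATE": pd.to_datetime(["2017-08-13", "2021-05-09", "2023-05-07", "2022-05-01"]),
    "FINISH": [20, 40, 5, 1],
})
print(course_history(results, "Quail Hollow Club", "2024-05-12"))
print(course_history(results, "Quail Hollow Club", "2024-05-12", lookback_years=5))
```

With `lookback_years=5`, player X's 2017 start falls outside the window, so X keeps a single start at this course instead of two.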
