## **An Introduction to Automated Sports Journalism**
# AI-Generated NFL Game Summaries & Fantasy Football Recommendations

In [1]:
%pip install openai==0.28
import pandas as pd
import numpy as np
import openai
import json
import os
from transformers import GPT2TokenizerFast
from google.colab import files
from sklearn.metrics import mean_absolute_error



In [11]:
# Set the API key used by the openai client (personal_key must be defined beforehand)
openai.api_key = personal_key

In [3]:
%%capture
from google.colab import drive
drive.mount('/content/drive')

import sys
sys.path.append('/content/drive/My Drive/')

# Part I: Generating NFL Game Summaries from Game Logs

## Stage I: Fine-Tuning GPT-4o mini on Recent NFL Facts
**Fine-tuning the model on current NFL players, teams, and general fantasy football rules and concepts.**

The base model (`gpt-4o-mini-2024-07-18`) already has extensive knowledge of the rules and intricacies of football from its prior training, but it likely lacks knowledge of newer or younger players and of recent team records and accomplishments, which will be important when writing game summaries later.
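The contents of `all_descriptions.jsonl` are not shown in this notebook. As a rough sketch, assuming each line follows the chat fine-tuning format (a `messages` list with system, user, and assistant turns), a player fact could be turned into a training record as follows. The `players` rows and the `player_record` helper are hypothetical; the answer wording is modelled on the Stage I responses printed later in this section.

```python
import json

SYSTEM_PROMPT = "You are a football knowledge assistant. Answer questions based on your training."

def player_record(name, position, team_name, team_abbr, entry_year, is_rookie):
    """Build one chat-format fine-tuning record for a player fact (hypothetical helper)."""
    status = "rookie" if is_rookie else "veteran"
    answer = (
        f"{name} is a {status} {position} for the {team_name} ({team_abbr}). "
        f"He entered the league in {entry_year}."
    )
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Tell me about {name}."},
            {"role": "assistant", "content": answer},
        ]
    }

# Illustrative rows only; the real roster source is not part of this notebook
players = [
    ("Justin Jefferson", "WR", "Minnesota Vikings", "MIN", 2020, False),
    ("Jayden Daniels", "QB", "Washington Commanders", "WAS", 2024, True),
]

# One JSON object per line, which is what a .jsonl training file contains
jsonl_lines = [json.dumps(player_record(*row)) for row in players]
print(jsonl_lines[0])
```

Keeping the system prompt identical between training records and the later test prompts is a design choice here, not something the notebook confirms.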

In [None]:
# Upload the training file
file_response = openai.File.create(
    file=open("/content/drive/My Drive/all_descriptions.jsonl", "rb"),
    purpose="fine-tune"
)

file_id = file_response['id']

In [None]:
# Initiate fine-tuning process
fine_tune_response = openai.FineTuningJob.create(
    training_file=file_id,
    model="gpt-4o-mini-2024-07-18"
)

fine_tune_id = fine_tune_response['id']

In [None]:
# Model ID of the fine-tuned model
fine_tuned_model = "ft:gpt-4o-mini-2024-07-18:personal::AbF9w0Ny"

In [None]:
# Example prompts for testing
test_prompts = [
    "Tell me about Justin Jefferson.",
    "Tell me about rookie Jayden Daniels.",
    "Based on their 2023 performance and results, are the Detroit Lions a good team?",
    "Explain the fantasy football waiver wire.",
    "Who is a better player based on their accolades: Josh Allen or Will Levis?",
    "Who are the reigning Super Bowl Champions?"
]

# Iterate over prompts and get responses
for prompt in test_prompts:
    response = openai.ChatCompletion.create(
        model=fine_tuned_model,
        messages=[
            {"role": "system", "content": "You are a football knowledge assistant. Answer questions based on your training."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=400,
        # A low temperature keeps the output focused and close to deterministic, rather than creative
        temperature=0.2
    )
    print(f"Prompt: {prompt}")
    print(f"Response: {response['choices'][0]['message']['content']}\n")

Prompt: Tell me about Justin Jefferson.
Response: Justin Jefferson is a veteran WR for the Minnesota Vikings (MIN). He entered the league in 2020.

Prompt: Tell me about rookie Jayden Daniels.
Response: Jayden Daniels is a rookie QB for the Washington Commanders (WAS). He entered the league in 2024.

Prompt: Based on their 2023 performance and results, are the Detroit Lions a good team?
Response: Yes, the Detroit Lions are a good team. They finished the 2023 season with a record of 12-5 and made the playoffs. Additionally, they won their division. Going into the 2024 season, PFF had them ranked 6th overall.

Prompt: Explain the fantasy football waiver wire.
Response: The fantasy football waiver wire is a system that allows managers to add players who are not currently on a roster in their league. Each week, there is a designated waiver period, typically running from Tuesday to Wednesday, during which managers can submit claims for players. If multiple managers put in claims for the sam

## Stage II: Fine-Tuning GPT-4o mini on Numeric Reasoning & Statistics Aggregation
Fine-tuning the model to properly aggregate and sum drive-level stats for teams and players to get full-game statlines.

N = 400 training examples.
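How the 400 examples were produced is not shown. The sketch below illustrates one way such an example could be assembled from per-drive numbers, using the sentence templates visible in the sample test record later in this stage. The `make_example` and `drive_sentence` helpers and the dictionary keys are assumptions for illustration, not the author's generator.

```python
FIELDS = ["plays", "rushing_yards", "passing_yards", "points",
          "touchdowns", "field_goals", "turnovers"]

def drive_sentence(team, d):
    """Render one drive in the same wording as the sample test record."""
    return (
        f"On a drive, {team} had {d['plays']} plays, {d['rushing_yards']} rushing yards, "
        f"{d['passing_yards']} passing yards, scored {d['points']} points, "
        f"with {d['touchdowns']} touchdowns, {d['field_goals']} field goals, "
        f"and committed {d['turnovers']} turnovers."
    )

def make_example(team, drives):
    """Return (user_prompt, assistant_answer) for a list of drive dicts."""
    # Sum every field across the drives to get the full-game statline
    totals = {field: sum(d[field] for d in drives) for field in FIELDS}
    user = "\n".join(drive_sentence(team, d) for d in drives)
    user += f"\nWhat was {team}'s final statline?"
    assistant = (
        f"{team}'s final statline: {totals['plays']} plays, "
        f"{totals['rushing_yards']} rushing yards, {totals['passing_yards']} passing yards, "
        f"{totals['points']} points, {totals['touchdowns']} touchdowns, "
        f"{totals['field_goals']} field goals, and {totals['turnovers']} turnovers."
    )
    return user, assistant

# The three MIN drives from the sample test record, one tuple per drive
raw_drives = [(9, 31, 0, 0, 0, 0, 0), (8, 1, 59, 7, 1, 0, 0), (13, 14, 30, 3, 0, 1, 0)]
drives_min = [dict(zip(FIELDS, values)) for values in raw_drives]

user_prompt, expected_answer = make_example("MIN", drives_min)
print(expected_answer)
```

Generating the target string programmatically from the same numbers that appear in the prompt guarantees that every training label is arithmetically correct.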

In [None]:
# Model ID of the fine-tuned model from Stage I
fine_tuned_model = "ft:gpt-4o-mini-2024-07-18:personal::AbF9w0Ny"

# Upload the training file
file_response = openai.File.create(
    file=open("/content/drive/My Drive/stat_reasoning_fine_tuning.jsonl", "rb"),
    purpose="fine-tune"
)

file_id = file_response['id']

In [None]:
# Initiate fine-tuning process
fine_tune_response = openai.FineTuningJob.create(
    training_file=file_id,
    model=fine_tuned_model
)

fine_tune_id = fine_tune_response['id']

In [None]:
job_id = fine_tune_response["id"]
status_response = openai.FineTuningJob.retrieve(id=job_id)
print(f"Status: {status_response['status']}")

Status: running
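The status check above is a single snapshot, and the fine-tuned model ID in the next cell was pasted in by hand once the job finished. A minimal polling sketch is shown here, assuming the job object exposes a `status` field that ends in `succeeded`, `failed`, or `cancelled`. The `wait_for_job` helper is hypothetical, and the retrieval call is passed in as a function so the demo can run without network access.

```python
import time

TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def wait_for_job(retrieve, job_id, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Call retrieve(job_id) repeatedly until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job["status"] in TERMINAL_STATES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")

# Offline demo: a fake retrieve function that reports 'running' twice, then 'succeeded'
_fake_states = iter(["running", "running", "succeeded"])

def fake_retrieve(job_id):
    return {"id": job_id, "status": next(_fake_states)}

# "ftjob-demo" is a placeholder ID, not a real job
demo_job = wait_for_job(fake_retrieve, "ftjob-demo", poll_seconds=0, sleep=lambda s: None)
print(demo_job["status"])
```

In the notebook, `retrieve` could be `lambda jid: openai.FineTuningJob.retrieve(id=jid)`, which is the same call used in the cell above.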


In [None]:
fine_tuned_model = "ft:gpt-4o-mini-2024-07-18:personal::AcHMvCoo"

In [None]:
# Load test data
test_data_path = "/content/drive/My Drive/stat_reasoning_test.jsonl"
with open(test_data_path, "r") as f:
    test_data = [json.loads(line) for line in f]

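# Sample test data structure
# Note: this reassignment replaces the data loaded above, so the evaluation below runs on this single example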
test_data = [
    {
        "messages": [
            {"role": "system", "content": "You are a sports statistician specializing in NFL football."},
            {"role": "user", "content": "On a drive, MIN had 9 plays, 31 rushing yards, 0 passing yards, scored 0 points, with 0 touchdowns, 0 field goals, and committed 0 turnovers.\nOn a drive, MIN had 8 plays, 1 rushing yards, 59 passing yards, scored 7 points, with 1 touchdowns, 0 field goals, and committed 0 turnovers.\nOn a drive, MIN had 13 plays, 14 rushing yards, 30 passing yards, scored 3 points, with 0 touchdowns, 1 field goals, and committed 0 turnovers.\nWhat was MIN's final statline?"},
            {"role": "assistant", "content": "MIN's final statline: 30 plays, 46 rushing yards, 89 passing yards, 10 points, 1 touchdowns, 1 field goals, and 0 turnovers."}
        ]
    }
]

# Function to evaluate the fine-tuned model
def evaluate_model(test_data, model_id=fine_tuned_model):
    results = []

    for example in test_data:
        # Extract messages
        messages = example["messages"]
        expected_completion = messages[-1]["content"]

        # Generate prediction
        response = openai.ChatCompletion.create(
            model=model_id,
            messages=messages[:-1],
            max_tokens=150,
            temperature=0.3,
        )

        # Extract the model's response
        predicted_completion = response["choices"][0]["message"]["content"].strip()

        # Record results
        results.append({
            "prompt": messages[-2]["content"],
            "expected": expected_completion,
            "predicted": predicted_completion,
        })

    # Convert results to a DataFrame for analysis
    return pd.DataFrame(results)

# Display results
results_df = evaluate_model(test_data, fine_tuned_model)
results_df.head()


Unnamed: 0,prompt,expected,predicted
0,"On a drive, MIN had 9 plays, 31 rushing yards,...","MIN's final statline: 30 plays, 46 rushing yar...","MIN's final statline: 30 plays, 46 rushing yar..."


In [None]:
# Compare predictions with expected completions
results_df["correct"] = results_df["expected"] == results_df["predicted"]
accuracy = results_df["correct"].mean()
print("Accuracy:", accuracy)

Accuracy: 1.0
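Exact string matching counts a prediction as wrong if it differs in wording even when every number is right, and it gives no partial credit when only one total is off. As a complementary check, the numbers can be pulled out of each statline and compared field by field. This is a sketch under the assumption that both strings list the same fields in the same order; `extract_numbers`, `field_accuracy`, and `mean_abs_error` are hypothetical helpers, and the "predicted" string below is a made-up example of a model error.

```python
import re

def extract_numbers(statline):
    """Return all integers in a statline, in order of appearance."""
    return [int(n) for n in re.findall(r"-?\d+", statline)]

def field_accuracy(expected, predicted):
    """Fraction of numeric fields that match; 0.0 if the field counts differ."""
    exp, pred = extract_numbers(expected), extract_numbers(predicted)
    if not exp or len(exp) != len(pred):
        return 0.0
    return sum(e == p for e, p in zip(exp, pred)) / len(exp)

def mean_abs_error(expected, predicted):
    """Mean absolute error across numeric fields; None if the field counts differ."""
    exp, pred = extract_numbers(expected), extract_numbers(predicted)
    if not exp or len(exp) != len(pred):
        return None
    return sum(abs(e - p) for e, p in zip(exp, pred)) / len(exp)

expected = ("MIN's final statline: 30 plays, 46 rushing yards, 89 passing yards, "
            "10 points, 1 touchdowns, 1 field goals, and 0 turnovers.")
# Hypothetical prediction with one wrong total (47 instead of 46 rushing yards)
predicted = expected.replace("46 rushing", "47 rushing")

acc = field_accuracy(expected, predicted)
mae = mean_abs_error(expected, predicted)
print(f"Field accuracy: {acc:.3f}, MAE: {mae:.3f}")
```

On the results table, the same functions could be applied row by row, for example with `results_df.apply(lambda r: field_accuracy(r["expected"], r["predicted"]), axis=1)`.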


## Stage III: Fine-Tuning GPT-4o mini to Write Game Summaries Based on Drive Logs
#### Data Pre-Processing

In [None]:
# Read in training data, which includes drive logs (inputs), and game summaries (targets)
drives = pd.read_csv('/content/drive/My Drive/game_drive_summaries.csv')
drives.head()

Unnamed: 0,game_id,drive_number,offense_team,combined_summary,game_summary,team_summary,player_summary,numeric_summary
0,1,0,Unknown Team,Play Log: Offense Team: Unknown Team. 7-H.Butk...,Headline: Chiefs hold off Ravens 27-20 when re...,"Game 1 Team Summary:\n- BAL: 20 points, 2 touc...",Game 1 Player Summary:\n- 1-X.Worthy: 21.0 rus...,Team Stats:\nGame 1 Team Summary:\n- BAL: 20 p...
1,1,1,BAL,Play Log: Offense Team: BAL. (Shotgun) 22-D.He...,Headline: Chiefs hold off Ravens 27-20 when re...,"Game 1 Team Summary:\n- BAL: 20 points, 2 touc...",Game 1 Player Summary:\n- 1-X.Worthy: 21.0 rus...,Team Stats:\nGame 1 Team Summary:\n- BAL: 20 p...
2,1,2,KC,Play Log: Offense Team: KC. (Shotgun) 15-P.Mah...,Headline: Chiefs hold off Ravens 27-20 when re...,"Game 1 Team Summary:\n- BAL: 20 points, 2 touc...",Game 1 Player Summary:\n- 1-X.Worthy: 21.0 rus...,Team Stats:\nGame 1 Team Summary:\n- BAL: 20 p...
3,1,3,BAL,Play Log: Offense Team: BAL. 8-L.Jackson pass ...,Headline: Chiefs hold off Ravens 27-20 when re...,"Game 1 Team Summary:\n- BAL: 20 points, 2 touc...",Game 1 Player Summary:\n- 1-X.Worthy: 21.0 rus...,Team Stats:\nGame 1 Team Summary:\n- BAL: 20 p...
4,1,4,KC,Play Log: Offense Team: KC. (Shotgun) 15-P.Mah...,Headline: Chiefs hold off Ravens 27-20 when re...,"Game 1 Team Summary:\n- BAL: 20 points, 2 touc...",Game 1 Player Summary:\n- 1-X.Worthy: 21.0 rus...,Team Stats:\nGame 1 Team Summary:\n- BAL: 20 p...


In [None]:
# Replace newline characters ("\n") with spaces in the game, numeric, and drive summaries
drives['game_summary'] = drives['game_summary'].str.replace("\n", " ", regex=False)
drives['numeric_summary'] = drives['numeric_summary'].str.replace("\n", " ", regex=False)
drives['drive_summary'] = drives['combined_summary'].str.replace("\n", " ", regex=False)


In [None]:
# Create one field of consecutive drive logs for each game
drives = drives.sort_values(by=['game_id', 'drive_number'])
grouped = drives.groupby(['game_id','game_summary','numeric_summary']).agg({
    'drive_summary': lambda x: " ".join(x),
}).reset_index()

# Rename columns for clarity
grouped.rename(columns={'drive_summary': 'drive_summaries'}, inplace=True)
grouped.head()

Unnamed: 0,game_id,game_summary,numeric_summary,drive_summaries
0,1,Headline: Chiefs hold off Ravens 27-20 when re...,Team Stats: Game 1 Team Summary: - BAL: 20 poi...,Play Log: Offense Team: Unknown Team. 7-H.Butk...
1,2,Headline: Montgomery’s 1-yard touchdown run in...,Team Stats: Game 2 Team Summary: - DET: 19 poi...,Play Log: Offense Team: DET. GAME. 39-J.Bates...
2,3,Headline: Barkley scores 3 TDs as Eagles beat ...,Team Stats: Game 3 Team Summary: - GB: 16 poin...,Play Log: Offense Team: PHI. GAME 10-B.Mann ki...
3,4,Headline: Darnold throws 2 TD passes and Van G...,Team Stats: Game 4 Team Summary: - MIN: 21 poi...,Play Log: Offense Team: Unknown Team. GAME 16-...
4,5,Headline: Cook scores 3 TDs to help Bills rout...,Team Stats: Game 5 Team Summary: - BUF: 17 poi...,Play Log: Offense Team: BUF. GAME 2-T.Bass kic...


In [None]:
# Prepare data for fine-tuning
fine_tune_data = []

for _, row in grouped.iterrows():
    # Create a prompt and completion pair
    prompt = (
        "You are a professional sports journalist, specializing in NFL game coverage. "
        "Your task is to write a detailed, factual, and engaging game summary based on the provided play-by-play logs, drive summaries, and team/player statistics. "
        "The summary should highlight key events, scoring plays, pivotal moments, standout player performances, and the flow of the game.\n\n"

        "The game details include:\n"
        "- A play-by-play log of offense teams and events within each drive.\n"
        "- A summary of the key statistics and outcomes for each drive.\n"
        "- Team and player statistics summing up the full game performance.\n\n"

        "Guidelines:\n"
        "- Use only the information explicitly provided in the play logs, drive summaries, and numeric summaries.\n"
        "- Do not add any details or speculate about events not mentioned.\n"
        "- Refer to players by their first initial and last name, as they appear in the logs.\n"
        "- Highlight critical plays, player contributions, and turning points in the game.\n"
        "- Focus on the progression of the game, including scoring sequences, shifts in momentum, and notable performances.\n"
        "- Write in a professional journalistic tone, following conventions used by major outlets like AP or ESPN.\n"
        "- Ensure the summary is readable, concise, and suitable for a general audience.\n"
        "- Only use numbers or statistics explicitly included in the game information provided, do not perform any other calculations.\n\n"

        "Output Requirements:\n"
        "- Include the final score in the headline or opening sentence.\n"
        "- Use paragraphs to organize the summary clearly.\n"
        "- Mention standout individual performances and their contribution to the game's outcome.\n"
        "- Discuss scoring sequences and the game's critical moments in chronological order.\n\n"

        f"Game Details:\n\n"
        f"{row['drive_summaries']}\n\n"
        f"Numeric Summary:\n{row['numeric_summary']}"
    )
    completion = row['game_summary']

    # Format the data for fine-tuning
    fine_tune_data.append({
        "messages": [
            {"role": "system", "content": "You are a professional sports journalist. Write concise, factual, and engaging football game summaries based on the drive summaries, play logs, and numeric summaries provided. Please only use the details provided. Do not make assumptions or invent details."},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion}
        ]
    })

# Save to JSONL format
output_file = "/content/drive/My Drive/drive_game_summaries_with_numeric.jsonl"
with open(output_file, "w") as f:
    for item in fine_tune_data:
        f.write(json.dumps(item) + "\n")

print(f"Data saved to {output_file}")


Data saved to /content/drive/My Drive/drive_game_summaries_with_numeric.jsonl
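Before uploading, it can be worth confirming that every line of the saved file is valid JSON with the expected system, user, and assistant turns, since a malformed record can cause the fine-tuning job to be rejected. The check below is a minimal sketch; `validate_chat_jsonl` is a hypothetical helper and the sample lines are toy data, not records from the notebook.

```python
import json

EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_chat_jsonl(lines):
    """Return a list of (line_number, problem) tuples for malformed records."""
    problems = []
    for i, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((i, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
            problems.append((i, "missing or malformed 'messages' list"))
            continue
        roles = [m.get("role") for m in messages]
        if roles != EXPECTED_ROLES:
            problems.append((i, f"unexpected roles: {roles}"))
        elif any(not isinstance(m.get("content"), str) or not m["content"].strip()
                 for m in messages):
            problems.append((i, "empty or non-string content"))
    return problems

# Toy lines: one valid record, one truncated line, one record with a missing turn
good = json.dumps({"messages": [
    {"role": "system", "content": "You are a professional sports journalist."},
    {"role": "user", "content": "Game Details: ..."},
    {"role": "assistant", "content": "Headline: ..."},
]})
truncated = '{"messages": ['
user_only = json.dumps({"messages": [{"role": "user", "content": "hi"}]})

problems = validate_chat_jsonl([good, truncated, user_only])
print(problems)
```

For the file written above, the lines could be read with `open(output_file)` and passed straight to the function.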


In [None]:
# Upload the fine-tuning dataset
file_response = openai.File.create(
    file=open("/content/drive/My Drive/drive_game_summaries_with_numeric.jsonl", "rb"),
    purpose="fine-tune"
)

file_id = file_response["id"]
print(f"File uploaded. File ID: {file_id}")


File uploaded. File ID: file-8FsF3Re5ZRSC3GoZJYAXxL


In [None]:
# Initiate the fine-tuning process on the previously fine-tuned model
fine_tune_response = openai.FineTuningJob.create(
    training_file=file_id,
    model=fine_tuned_model,
)
print(f"Fine-tuning job created. ID: {fine_tune_response['id']}")

Fine-tuning job created. ID: ftjob-uqstozbk9YAqtCACEwM1OqWx


In [None]:
# Check fine-tuning job status
job_id = fine_tune_response["id"]
status_response = openai.FineTuningJob.retrieve(id=job_id)
print(f"Status: {status_response['status']}")


Status: running


In [4]:
final_model = "ft:gpt-4o-mini-2024-07-18:personal::AcIKE4yF"

## Testing Model & Generating New Game Summaries

In [5]:
final_drives = pd.read_csv('/content/drive/My Drive/all_game_stats.csv')
final_drives.head()

Unnamed: 0,game_id,drive_number,offense_team,combined_summary,numeric_summary
0,2024_01_ARI_BUF,0,BUF,Play Log: Offense Team: BUF. GAME\n2-T.Bass ki...,Team Stats:\nGame 2024_01_ARI_BUF Team Summary...
1,2024_01_ARI_BUF,1,ARI,Play Log: Offense Team: ARI. 6-J.Conner up the...,Team Stats:\nGame 2024_01_ARI_BUF Team Summary...
2,2024_01_ARI_BUF,2,BUF,Play Log: Offense Team: BUF. 4-J.Cook left gua...,Team Stats:\nGame 2024_01_ARI_BUF Team Summary...
3,2024_01_ARI_BUF,3,ARI,Play Log: Offense Team: ARI. 6-J.Conner left e...,Team Stats:\nGame 2024_01_ARI_BUF Team Summary...
4,2024_01_ARI_BUF,4,BUF,Play Log: Offense Team: BUF. (Shotgun) 17-J.Al...,Team Stats:\nGame 2024_01_ARI_BUF Team Summary...


In [6]:
# Replace newline characters ("\n") with spaces in the drive and numeric summaries
final_drives['drive_summary'] = final_drives['combined_summary'].str.replace("\n", " ", regex=False)
final_drives['numeric_summary'] = final_drives['numeric_summary'].str.replace("\n", " ", regex=False)

# Create one field of consecutive drive logs for each game
final_drives = final_drives.sort_values(by=['game_id', 'drive_number'])
grouped = final_drives.groupby(['game_id','numeric_summary']).agg({
    'drive_summary': lambda x: " ".join(x),
}).reset_index()

# Rename columns for clarity
grouped.rename(columns={'drive_summary': 'drive_summaries'}, inplace=True)
grouped.head()

Unnamed: 0,game_id,numeric_summary,drive_summaries
0,2024_01_ARI_BUF,Team Stats: Game 2024_01_ARI_BUF Team Summary:...,Play Log: Offense Team: BUF. GAME 2-T.Bass kic...
1,2024_01_BAL_KC,Team Stats: Game 2024_01_BAL_KC Team Summary: ...,Play Log: Offense Team: Unknown Team. GAME 7-H...
2,2024_01_CAR_NO,Team Stats: Game 2024_01_CAR_NO Team Summary: ...,Play Log: Offense Team: CAR. GAME 4-E.Pineiro ...
3,2024_01_DAL_CLE,Team Stats: Game 2024_01_DAL_CLE Team Summary:...,Play Log: Offense Team: CLE. GAME 7-D.Hopkins ...
4,2024_01_DEN_SEA,Team Stats: Game 2024_01_DEN_SEA Team Summary:...,Play Log: Offense Team: DEN. GAME 3-W.Lutz kic...


In [16]:
# Select a single game for testing
game_id = "2024_14_GB_DET"

# Ensure the selected game_id exists in the dataset
if game_id in grouped['game_id'].values:
    # Get combined summaries for the selected game
    drive_summaries = "\n\n".join(
        grouped[grouped['game_id'] == game_id]['drive_summaries'].tolist()
    )
    numeric_summary = grouped[grouped['game_id'] == game_id]['numeric_summary'].iloc[0]

    # Create the prompt
    prompt = (
        "You are a professional sports journalist, specializing in NFL game coverage. "
        "Your task is to write a detailed, factual, and engaging game summary based on the provided drive summaries, play-by-play logs, and team/player statistics. "
        "The summary should highlight key events, scoring plays, pivotal moments, standout player performances, and the flow of the game.\n\n"

        "The game details include:\n"
        "- A play-by-play log of offense teams and events within each drive.\n"
        "- Note: These events occur in chronological order and include all relevant information for the individual plays and drives. No other information should be needed to create a summary of the action.\n"
        "- A summary of the key statistics and outcomes for each drive.\n"
        "- Team and player statistics summing up the full game performance.\n\n"

        "Guidelines:\n"
        "- Use only the information explicitly provided in the drive summaries, play logs, and numeric summaries.\n"
        "- Use numeric values exactly as they are presented. Do not perform any calculations or assumptions beyond summing the explicitly provided values.\n"
        "- Do not add any details or speculate about events not mentioned.\n"
        "- Refer to players by their first initial and last name, as they appear in the logs.\n"
        "- Highlight critical plays, player contributions, and turning points in the game.\n"
        "- Focus on the progression of the game, including scoring sequences, shifts in momentum, and notable performances.\n"
        "- Write in a professional journalistic tone, following conventions used by major outlets like AP or ESPN.\n"
        "- Ensure the summary is readable, concise, and suitable for a general audience.\n"
        "- Avoid speculating about player motivations, team strategies, or context outside the game details.\n"
        "- If you are uncertain about a sequence of events or a numeric total, omit it from the summary, do not try to fill gaps.\n"
        "- Only use numbers or statistics explicitly included in the game information provided, do not perform any other calculations.\n\n"

        "Output Requirements:\n"
        "- Start with a headline or opening sentence that includes the final score.\n"
        "- Use paragraphs to organize the summary clearly.\n"
        "- Mention standout individual performances and their contribution to the game's outcome.\n"
        "- Discuss scoring sequences and the game's critical moments in chronological order.\n\n"

        f"Game Details:\n{drive_summaries}\n\n"
        f"Numeric Summary:\n{numeric_summary}"
    )

    # Generate a response
    response = openai.ChatCompletion.create(
        model=final_model,
        messages=[
            {"role": "system", "content": "You are a professional sports journalist. Write concise, factual, and engaging football game summaries based on the drive summaries, play logs, and numeric summaries provided."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1000,
        temperature=0.05,
    )

    # Extract the model's response
    game_summary = response['choices'][0]['message']['content']
    print("Generated Game Summary:\n", game_summary)
else:
    print(f"Game ID {game_id} not found in the dataset.")


Generated Game Summary:
 Headline: Jared Goff throws for 263 yards and 3 TDs, Lions beat Packers 34-31 in wild finish. Summary: The Lions led 31-17 in the fourth quarter, but the Packers rallied to tie it with 2 TDs in a span of 3:09. Green Bay took a 31-31 tie on Jacobs’ 4-yard TD run with 11:56 left. The Lions then drove to the Green Bay 14, but Goff was sacked on third down and settled for a field goal. The Packers got the ball back with 4:10 left and drove to the Detroit 6, but Love’s TD pass to Jacobs was nullified by an offensive pass interference call on Christian Watson. Green Bay settled for a 32-yard field goal by Brandon McManus with 2:02 left. The Lions got the ball back and drove to the Green Bay 14, where they faced a fourth-and-1 with 28 seconds left. Detroit opted to kick the field goal, and Jake Bates made a 35-yarder to win it. Notable Performances: Goff completed 30 of 38 passes for 263 yards and three touchdowns. Montgomery ran for 50 yards and a score and caught fi

The summary is numerically accurate and captures the close score, the back-and-forth nature of the game, and DET's game-winning field goal as time expired.
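The test cell above is repeated for other games with only `game_id` changed. One way to avoid copying the prompt each time is to wrap the logic in a function, sketched below. Everything named here is hypothetical: `INSTRUCTIONS` stands in for the full guideline text from the cell above (abbreviated to keep the sketch short), `games` is a plain dictionary standing in for the `grouped` DataFrame, and `complete` is any callable that sends a prompt to a model and returns text.

```python
# Placeholder for the full instruction text used in the cell above
INSTRUCTIONS = (
    "You are a professional sports journalist, specializing in NFL game coverage. "
    "[task description, guidelines, and output requirements from the cell above]\n\n"
)

def build_prompt(drive_summaries, numeric_summary, instructions=INSTRUCTIONS):
    """Assemble the prompt in the same layout as the test cell."""
    return (
        f"{instructions}"
        f"Game Details:\n{drive_summaries}\n\n"
        f"Numeric Summary:\n{numeric_summary}"
    )

def summarize_game(game_id, games, complete):
    """Return a generated summary, or None if the game is not in `games`.

    games:    dict mapping game_id -> (drive_summaries, numeric_summary)
    complete: callable taking a prompt string and returning the model's text
    """
    if game_id not in games:
        return None
    drive_summaries, numeric_summary = games[game_id]
    return complete(build_prompt(drive_summaries, numeric_summary))

# Offline demo with toy text and a fake model that reports the prompt length
toy_games = {"2024_14_GB_DET": ("Play Log: toy drive text", "Team Stats: toy totals")}
fake_complete = lambda prompt: f"(fake summary for a {len(prompt)}-character prompt)"

demo_prompt = build_prompt(*toy_games["2024_14_GB_DET"])
demo_summary = summarize_game("2024_14_GB_DET", toy_games, fake_complete)
print(demo_summary)
```

In the notebook, `games` could be built with `dict(zip(grouped["game_id"], zip(grouped["drive_summaries"], grouped["numeric_summary"])))`, and `complete` could wrap the `openai.ChatCompletion.create` call with the same system message, `max_tokens`, and `temperature` as above.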

In [17]:
# Select a single game for testing
game_id = "2024_01_BAL_KC"

# Ensure the selected game_id exists in the dataset
if game_id in grouped['game_id'].values:
    # Get combined summaries for the selected game
    drive_summaries = "\n\n".join(
        grouped[grouped['game_id'] == game_id]['drive_summaries'].tolist()
    )
    numeric_summary = grouped[grouped['game_id'] == game_id]['numeric_summary'].iloc[0]

    # Create the prompt
    prompt = (
        "You are a professional sports journalist, specializing in NFL game coverage. "
        "Your task is to write a detailed, factual, and engaging game summary based on the provided drive summaries, play-by-play logs, and team/player statistics. "
        "The summary should highlight key events, scoring plays, pivotal moments, standout player performances, and the flow of the game.\n\n"

        "The game details include:\n"
        "- A play-by-play log of offense teams and events within each drive.\n"
        "- Note: These events occur in chronological order and include all relevant information for the individual plays and drives. No other information should be needed to create a summary of the action.\n"
        "- A summary of the key statistics and outcomes for each drive.\n"
        "- Team and player statistics summing up the full game performance.\n\n"

        "Guidelines:\n"
        "- Use only the information explicitly provided in the drive summaries, play logs, and numeric summaries.\n"
        "- Use numeric values exactly as they are presented. Do not perform any calculations or assumptions beyond summing the explicitly provided values.\n"
        "- Do not add any details or speculate about events not mentioned.\n"
        "- Refer to players by their first initial and last name, as they appear in the logs.\n"
        "- Highlight critical plays, player contributions, and turning points in the game.\n"
        "- Focus on the progression of the game, including scoring sequences, shifts in momentum, and notable performances.\n"
        "- Write in a professional journalistic tone, following conventions used by major outlets like AP or ESPN.\n"
        "- Ensure the summary is readable, concise, and suitable for a general audience.\n"
        "- Avoid speculating about player motivations, team strategies, or context outside the game details.\n"
        "- If you are uncertain about a sequence of events or a numeric total, omit it from the summary, do not try to fill gaps.\n"
        "- Only use numbers or statistics explicitly included in the game information provided, do not perform any other calculations.\n\n"

        "Output Requirements:\n"
        "- Start with a headline or opening sentence that includes the final score.\n"
        "- Use paragraphs to organize the summary clearly.\n"
        "- Mention standout individual performances and their contribution to the game's outcome.\n"
        "- Discuss scoring sequences and the game's critical moments in chronological order.\n\n"

        f"Game Details:\n{drive_summaries}\n\n"
        f"Numeric Summary:\n{numeric_summary}"
    )

    # Generate a response
    response = openai.ChatCompletion.create(
        model=final_model,
        messages=[
            {"role": "system", "content": "You are a professional sports journalist. Write concise, factual, and engaging football game summaries based on the drive summaries, play logs, and numeric summaries provided."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1000,
        temperature=0.05,
    )

    # Extract the model's response
    game_summary = response['choices'][0]['message']['content']
    print("Generated Game Summary:\n", game_summary)
else:
    print(f"Game ID {game_id} not found in the dataset.")


Generated Game Summary:
 Headline: Mahomes, Chiefs hold off Jackson, Ravens 27-20. Summary: The Chiefs took the lead for good on Worthy’s 35-yard touchdown run in the fourth quarter. Jackson had a chance to tie it in the final seconds, but his pass to Likely in the end zone was ruled incomplete after a review. Notable Performances: Mahomes threw for 279 yards and a touchdown and ran for 19 yards. Rice had six catches for 102 yards. Jackson ran for 122 yards and a touchdown and threw for 264 yards and a score. Injuries: Chiefs: LB Leo Chenal left the game in the fourth quarter but returned. Ravens: LB Kyle Van Noy left in the third quarter.


In [21]:
# Select a single game for testing
game_id = "2024_13_CHI_DET"

# Ensure the selected game_id exists in the dataset
if game_id in grouped['game_id'].values:
    # Get combined summaries for the selected game
    drive_summaries = "\n\n".join(
        grouped[grouped['game_id'] == game_id]['drive_summaries'].tolist()
    )
    numeric_summary = grouped[grouped['game_id'] == game_id]['numeric_summary'].iloc[0]

    # Create the prompt
    prompt = (
        "You are a professional sports journalist, specializing in NFL game coverage. "
        "Your task is to write a detailed, factual, and engaging game summary based on the provided drive summaries, play-by-play logs, and team/player statistics. "
        "The summary should highlight key events, scoring plays, pivotal moments, standout player performances, and the flow of the game.\n\n"

        "The game details include:\n"
        "- A play-by-play log of offense teams and events within each drive.\n"
        "- Note: These events occur in chronological order and include all relevant information for the individual plays and drives. No other information should be needed to create a summary of the action.\n"
        "- A summary of the key statistics and outcomes for each drive.\n"
        "- Team and player statistics summing up the full game performance.\n\n"

        "Guidelines:\n"
        "- Use only the information explicitly provided in the drive summaries, play logs, and numeric summaries.\n"
        "- Use numeric values exactly as they are presented. Do not perform any calculations or assumptions beyond summing the explicitly provided values.\n"
        "- Do not add any details or speculate about events not mentioned.\n"
        "- Refer to players by their first initial and last name, as they appear in the logs.\n"
        "- Highlight critical plays, player contributions, and turning points in the game.\n"
        "- Focus on the progression of the game, including scoring sequences, shifts in momentum, and notable performances.\n"
        "- Write in a professional journalistic tone, following conventions used by major outlets like AP or ESPN.\n"
        "- Ensure the summary is readable, concise, and suitable for a general audience.\n"
        "- Avoid speculating about player motivations, team strategies, or context outside the game details.\n"
        "- If you are uncertain about a sequence of events or a numeric total, omit it from the summary, do not try to fill gaps.\n"
        "- Only use numbers or statistics explicitly included in the game information provided, do not perform any other calculations.\n\n"

        "Output Requirements:\n"
        "- Start with a headline or opening sentence that includes the final score.\n"
        "- Use paragraphs to organize the summary clearly.\n"
        "- Mention standout individual performances and their contribution to the game's outcome.\n"
        "- Discuss scoring sequences and the game's critical moments in chronological order.\n\n"

        f"Game Details:\n{drive_summaries}\n\n"
        f"Numeric Summary:\n{numeric_summary}"
    )

    # Generate a response
    response = openai.ChatCompletion.create(
        model=final_model,
        messages=[
            {"role": "system", "content": "You are a professional sports journalist. Write concise, factual, and engaging football game summaries based on the drive summaries, play logs, and numeric summaries provided."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1000,
        temperature=0.05,
    )

    # Extract the model's response
    game_summary = response['choices'][0]['message']['content']
    print("Generated Game Summary:\n", game_summary)
else:
    print(f"Game ID {game_id} not found in the dataset.")


Generated Game Summary:
 Headline: Goff, LaPorta lead Lions to 23-20 win over Bears in NFC North showdown. Summary: The Lions took a 3-0 lead on their opening drive and never trailed. They led 20-7 at halftime and 23-14 in the fourth quarter. The Bears scored a touchdown on their first possession of the second half, but Detroit's defense held on fourth down at the 1 to preserve the lead. Chicago got the ball back with 3:14 left and drove to the Detroit 13, but Williams was sacked on fourth down to seal the win. Notable Performances: Jared Goff threw for 211 yards and two touchdowns. Sam LaPorta had two touchdown catches. Jayden Reed had a 30-yard reception. David Montgomery ran for 87 yards. Jahmyr Gibbs ran for 86 yards. Moore had six catches for 97 yards. Williams threw for 223 yards and two touchdowns. Injuries: Bears: WR D.J. Moore left in the second quarter but returned. DT Montez Sweat left in the fourth quarter. Lions: DT Levi Onwuzurike left in the fourth quarter.


This summary (2024_13_CHI_DET) is also numerically accurate, and it highlights the Bears' chance to tie or win the game at the end and their failure to do so.

The earlier Ravens-Chiefs summary (2024_01_BAL_KC) is brief, but it captures the general flow of the game and, most importantly, the Likely touchdown that could have forced overtime but was ultimately called back.
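The accuracy judgments above were made by reading each summary against the game data. A rough automated screen is to flag any number in a generated summary that never appears in the source text. The sketch below is only a first-pass filter under simple assumptions: `ungrounded_numbers` is a hypothetical helper, spelled-out numbers such as "three" are ignored, and a number that appears in the source is not proof that it is used correctly. The source string is toy data, not the real play log.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def ungrounded_numbers(summary, source):
    """Return numeric tokens in the summary whose value never appears in the source."""
    # Compare by value so that "263" in the summary matches "263.0" in the stats
    source_values = {float(n) for n in NUMBER.findall(source)}
    missing = []
    for token in NUMBER.findall(summary):
        if float(token) not in source_values and token not in missing:
            missing.append(token)
    return missing

# Toy source text; the real input would be drive_summaries + numeric_summary
toy_source = "DET: 34 points. 16-J.Goff: 263.0 passing yards, 3 passing touchdowns."
toy_summary = "Goff threw for 263 yards and 3 TDs as the Lions won 34-31."

flagged = ungrounded_numbers(toy_summary, toy_source)
print(flagged)
```

Here the opponent's score is flagged only because the toy source omits it; with the full game data as the source, a flagged number would be a candidate for manual review.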

# Part II: Using the Fine-Tuned Model's Summarizing Capabilities and In-Context Learning to Offer Fantasy Football Recommendations

## Creating a Knowledge Base for In-Context Learning
Generating Summaries for All 2024 Games to Date

In [18]:
# Generate summaries for all games this season
game_summaries = []

for game_id in grouped['game_id'].unique():
    # Get drive summaries and numeric summaries for the game
    drive_summaries = "\n\n".join(
        grouped[grouped['game_id'] == game_id]['drive_summaries'].tolist()
    )
    numeric_summary = grouped[grouped['game_id'] == game_id]['numeric_summary'].iloc[0]

    # Create the prompt
    prompt = (
        "You are a professional sports journalist, specializing in NFL game coverage. "
        "Your task is to write a detailed, factual, and engaging game summary based on the provided drive summaries, play-by-play logs, and team/player statistics. "
        "The summary should highlight key events, scoring plays, pivotal moments, standout player performances, and the flow of the game.\n\n"

        "The game details include:\n"
        "- A play-by-play log of offense teams and events within each drive.\n"
        "- Note: These events occur in chronological order and include all relevant information for the individual plays and drives. No other information should be needed to create a summary of the action.\n"
        "- A summary of the key statistics and outcomes for each drive.\n"
        "- Team and player statistics summing up the full game performance.\n\n"

        "Guidelines:\n"
        "- Use only the information explicitly provided in the drive summaries, play logs, and numeric summaries.\n"
        "- Use numeric values exactly as they are presented. Do not perform any calculations or assumptions beyond summing the explicitly provided values.\n"
        "- Do not add any details or speculate about events not mentioned.\n"
        "- Refer to players by their first initial and last name, as they appear in the logs.\n"
        "- Highlight critical plays, player contributions, and turning points in the game.\n"
        "- Focus on the progression of the game, including scoring sequences, shifts in momentum, and notable performances.\n"
        "- Write in a professional journalistic tone, following conventions used by major outlets like AP or ESPN.\n"
        "- Ensure the summary is readable, concise, and suitable for a general audience.\n"
        "- Avoid speculating about player motivations, team strategies, or context outside the game details.\n"
        "- If you are uncertain about a sequence of events or a numeric total, omit it from the summary, do not try to fill gaps.\n"
        "- Only use numbers or statistics explicitly included in the game information provided, do not perform any other calculations.\n\n"

        "Output Requirements:\n"
        "- Start with a headline or opening sentence that includes the final score.\n"
        "- Use paragraphs to organize the summary clearly.\n"
        "- Mention standout individual performances and their contribution to the game's outcome.\n"
        "- Discuss scoring sequences and the game's critical moments in chronological order.\n\n"

        f"Game Details:\n{drive_summaries}\n\n"
        f"Numeric Summary:\n{numeric_summary}"
    )

    # Generate a response
    response = openai.ChatCompletion.create(
        model="ft:gpt-4o-mini-2024-07-18:personal::AcIKE4yF",
        messages=[
            {"role": "system", "content": "You are a professional sports journalist. Write concise, factual, and engaging football game summaries based on the drive summaries and game statistics provided."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1000,
        temperature=0.1,
    )

    # Extract the game summary
    game_summary = response['choices'][0]['message']['content']
    game_summaries.append({
        "game_id": game_id,
        "game_summary": game_summary
    })

# Convert to a DataFrame for easier handling
game_summaries_df = pd.DataFrame(game_summaries)


In [19]:
game_summaries_df.head()

Unnamed: 0,game_id,game_summary
0,2024_01_ARI_BUF,"Headline: Allen throws 3 TD passes, runs for 2..."
1,2024_01_BAL_KC,Headline: Patrick Mahomes throws for 279 yards...
2,2024_01_CAR_NO,"Headline: Carr throws 3 TD passes, Saints beat..."
3,2024_01_DAL_CLE,"Headline: Dak Prescott, Tony Pollard lead Cowb..."
4,2024_01_DEN_SEA,Headline: Geno Smith runs for a TD and throws ...


In [20]:
# Create a dictionary to store summaries by game_id
game_knowledge_base = {row['game_id']: row['game_summary'] for _, row in game_summaries_df.iterrows()}
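
When this dictionary is interpolated into a prompt, the model sees its raw Python representation and every game of the season at once. One possible alternative, sketched below, renders the summaries as plain text ordered by week and optionally keeps only the most recent weeks. `format_recent_summaries` and `last_n_weeks` are hypothetical names, the sketch assumes the second field of `game_id` (as in `2024_01_ARI_BUF`) is the week number, and the toy entries are placeholders.

```python
def format_recent_summaries(knowledge_base, last_n_weeks=None):
    """Render {game_id: summary} as readable text, optionally keeping only the latest weeks."""
    def week_of(game_id):
        # Assumption: game_id looks like "2024_01_ARI_BUF", with the week in the second field
        return int(game_id.split("_")[1])

    if not knowledge_base:
        return ""
    latest_week = max(week_of(game_id) for game_id in knowledge_base)
    blocks = []
    for game_id in sorted(knowledge_base, key=lambda g: (week_of(g), g)):
        week = week_of(game_id)
        # Skip games older than the requested window
        if last_n_weeks is not None and week <= latest_week - last_n_weeks:
            continue
        blocks.append(f"[Week {week}] {game_id}: {knowledge_base[game_id]}")
    return "\n\n".join(blocks)

# Toy knowledge base with placeholder team codes and summaries
toy_kb = {
    "2024_05_AAA_BBB": "placeholder summary C",
    "2024_01_ARI_BUF": "placeholder summary A",
    "2024_04_CCC_DDD": "placeholder summary B",
}
print(format_recent_summaries(toy_kb, last_n_weeks=2))
```

Restricting the window is a trade-off: it shortens the prompt and emphasizes recent form, but it hides season-long context the model might otherwise use.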

In [30]:
# Add a fantasy football-related prompt
prompt = f"""
      You are a fantasy football assistant. Using the information from recent NFL games, answer the following question:

      Guidelines:
      1. Offensive Focus: Focus on offensive players, particularly quarterbacks (QBs), running backs (RBs), wide receivers (WRs), and tight ends (TEs).
      2. Scoring Criteria: Prioritize players who score the most fantasy points. Fantasy scoring typically rewards:
        - Touchdowns (rushing, passing, receiving).
        - Yardage gains (rushing yards, passing yards, receiving yards).
      3. Consistency Over Time: Consistency in recent performances is critical. Players with reliable production over the last few games are more valuable than those with erratic performances or those who had their best games weeks ago.
      4. Injury Risk: Devalue players with recent injuries, frequent game exits, or ongoing performance limitations due to health concerns.
      5. Upside Potential: Highlight players with increasing opportunities (e.g., higher snap counts, targets, or touches) or those stepping into larger roles due to team changes (e.g., injuries to teammates or lineup changes).
      6. Position-Specific Insights:
        - Quarterbacks: Look at total passing yards, touchdowns, and rushing contributions.
        - Running Backs: Focus on total yardage (rushing + receiving), goal-line opportunities, and snap count.
        - Wide Receivers/Tight Ends: Evaluate targets, receptions, total receiving yards, and red-zone usage.
      7. Additional Considerations:
        - Matchup Dependency: Players who consistently perform well regardless of opposing defenses are more valuable than those dependent on favorable matchups.
        - Trends: Highlight players showing upward trends (e.g., rookie breakouts, post-injury improvements, or new starting roles).
        - Team Context: Consider offensive scheme and team dynamics. High-scoring offenses typically produce better fantasy players.
      8. Question Responses:
        - If options are provided, only choose from the available options. Do not consider options outside the explicitly mentioned ones.
        - In one sentence, explain your reasoning for the response. If possible, use statistics and numeric data to back up your choice.

      Game Summaries:
      {game_knowledge_base}

      Question: The following wide receivers are available for waiver wire pickup: J.Chase, T.Boyd, M.Williams
      Based on performance, which receiver should I pick up?
      """

# Generate a response
response = openai.ChatCompletion.create(
    model="ft:gpt-4o-mini-2024-07-18:personal::AcIKE4yF",
    messages=[
        {"role": "system", "content": "You are a fantasy football assistant. Provide accurate, data-driven answers based on recent game summaries."},
        {"role": "user", "content": prompt},
    ],
    max_tokens=100,
    temperature=0.05,
)

# Extract the model's response
fantasy_advice = response['choices'][0]['message']['content']
print("Fantasy Football Advice:\n", fantasy_advice)


Fantasy Football Advice:
 You should pick up Ja'Marr Chase. He had a huge performance in Week 14, catching 11 passes for 264 yards and two touchdowns.


The expected response is **J.Chase**, the most talented and highest-producing receiver of the three options.
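
The prompt asks the model to choose only from the listed options, and that can be spot-checked programmatically. The sketch below is a crude illustration, not part of the original pipeline: `mentioned_options` is a hypothetical helper that matches on last names only, so a common surname could match a different player.

```python
def mentioned_options(response_text, options):
    """Return the options whose last name appears in the model's response."""
    text = response_text.lower()
    found = []
    for option in options:
        # Options use the "F.Lastname" log format; match on the last name only
        last_name = option.split(".")[-1].lower()
        if last_name in text:
            found.append(option)
    return found

options = ["J.Chase", "T.Boyd", "M.Williams"]
response_text = (
    "You should pick up Ja'Marr Chase. He had a huge performance in Week 14, "
    "catching 11 passes for 264 yards and two touchdowns."
)
print(mentioned_options(response_text, options))
```

If the returned list is empty, the model likely answered with a player outside the provided options and the response deserves a closer look.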

In [29]:
# Define a reusable prompt template
def generate_fantasy_prompt(question, game_knowledge_base):
    return f"""
    You are a fantasy football assistant. Using the information from recent NFL games, answer the following question:

    Guidelines:
    1. Offensive Focus: Focus on offensive players, particularly quarterbacks (QBs), running backs (RBs), wide receivers (WRs), and tight ends (TEs).
    2. Scoring Criteria: Prioritize players who score the most fantasy points. Fantasy scoring typically rewards:
      - Touchdowns (rushing, passing, receiving).
      - Yardage gains (rushing yards, passing yards, receiving yards).
    3. Consistency Over Time: Consistency in recent performances is critical. Players with reliable production over the last few games are more valuable than those with erratic performances or those who had their best games weeks ago.
    4. Injury Risk: Devalue players with recent injuries, frequent game exits, or ongoing performance limitations due to health concerns.
    5. Upside Potential: Highlight players with increasing opportunities (e.g., higher snap counts, targets, or touches) or those stepping into larger roles due to team changes (e.g., injuries to teammates or lineup changes).
    6. Position-Specific Insights:
      - Quarterbacks: Look at total passing yards, touchdowns, and rushing contributions.
      - Running Backs: Focus on total yardage (rushing + receiving), goal-line opportunities, and snap count.
      - Wide Receivers/Tight Ends: Evaluate targets, receptions, total receiving yards, and red-zone usage.
    7. Additional Considerations:
      - Matchup Dependency: Players who consistently perform well regardless of opposing defenses are more valuable than those dependent on favorable matchups.
      - Trends: Highlight players showing upward trends (e.g., rookie breakouts, post-injury improvements, or new starting roles).
      - Team Context: Consider offensive scheme and team dynamics. High-scoring offenses typically produce better fantasy players.
    8. Question Responses:
      - If options are provided, only choose from the available options. Do not consider options outside the explicitly mentioned ones.
      - In one sentence, explain your reasoning for the response. If possible, use statistics and numeric data to back up your choice.

    Game Summaries:
    {game_knowledge_base}

    Question: {question}
    """


In [31]:
# Example usage: open-ended draft question (reuses the existing game_knowledge_base)
question = "Assume every QB is available. Who should I draft to my fantasy team to give me the best chance of winning?"
prompt = generate_fantasy_prompt(question, game_knowledge_base)

# Generate a response
response = openai.ChatCompletion.create(
    model="ft:gpt-4o-mini-2024-07-18:personal::AcIKE4yF",
    messages=[
        {"role": "system", "content": "You are a fantasy football assistant. Provide accurate, data-driven answers based on recent game summaries."},
        {"role": "user", "content": prompt},
    ],
    max_tokens=100,
    temperature=0.05,
)

# Extract the model's response
fantasy_advice = response['choices'][0]['message']['content']
print("Fantasy Football Advice:\n", fantasy_advice)

Fantasy Football Advice:
 To give you the best chance of winning, you should draft Joe Burrow. He has been on fire lately, throwing for 421 yards and 4 touchdowns in a win over the Ravens. He also threw for 242 yards and 5 touchdowns in a win over the Raiders. He has a great matchup against the Commanders in Week 14.


In [32]:
# Example usage: defense selection question (reuses the existing game_knowledge_base)
question = "I am looking to choose a new defense for my fantasy team. I would like to take a defense that holds opponents to the fewest possible points. Which defense should I select? How many points did they allow in their most recent game?"
prompt = generate_fantasy_prompt(question, game_knowledge_base)

# Generate a response
response = openai.ChatCompletion.create(
    model="ft:gpt-4o-mini-2024-07-18:personal::AcIKE4yF",
    messages=[
        {"role": "system", "content": "You are a fantasy football assistant. Provide accurate, data-driven answers based on recent game summaries."},
        {"role": "user", "content": prompt},
    ],
    max_tokens=100,
    temperature=0.05,
)

# Extract the model's response
fantasy_advice = response['choices'][0]['message']['content']
print("Fantasy Football Advice:\n", fantasy_advice)

Fantasy Football Advice:
 You should select the Baltimore Ravens defense, as they hold opponents to the fewest points per game (15.8). In their most recent game, they allowed 16 points to the Pittsburgh Steelers.
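
The question cells above repeat the same `openai.ChatCompletion.create` call with identical settings. A small wrapper could remove that duplication. The sketch below is one possible shape, not the notebook's actual code: `ask_fantasy_question`, `SYSTEM_MESSAGE`, and `fake_completion` are hypothetical names, and the completion function is passed in as an argument so the example runs offline with a stub.

```python
SYSTEM_MESSAGE = (
    "You are a fantasy football assistant. Provide accurate, data-driven "
    "answers based on recent game summaries."
)

def ask_fantasy_question(question, knowledge_base, build_prompt, create_completion,
                         model="ft:gpt-4o-mini-2024-07-18:personal::AcIKE4yF"):
    """Build the prompt, call the chat endpoint, and return the answer text."""
    response = create_completion(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": build_prompt(question, knowledge_base)},
        ],
        max_tokens=100,
        temperature=0.05,
    )
    return response["choices"][0]["message"]["content"]

# Offline stand-ins so the sketch runs without an API key
calls = []

def fake_completion(**kwargs):
    # Records the request and returns a response shaped like the 0.28 chat API
    calls.append(kwargs)
    return {"choices": [{"message": {"content": "stub answer"}}]}

def toy_prompt(question, knowledge_base):
    return f"Game Summaries:\n{knowledge_base}\n\nQuestion: {question}"

answer = ask_fantasy_question(
    "Which QB should I draft?", {"game": "placeholder summary"}, toy_prompt, fake_completion
)
print(answer)
```

In the notebook itself, the call would presumably pass `generate_fantasy_prompt` and `openai.ChatCompletion.create` in place of the two stand-ins.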
