# F1RAG: Formula 1 Retrieval-Augmented Generation System

This notebook walks through the end-to-end workflow of the F1RAG system, which uses historical Formula 1 data to generate race summaries with retrieval-augmented generation (RAG). The workflow covers data processing, model training, evaluation, summary generation, and visualization.

## 1. Environment Setup

Ensure all dependencies are installed and the environment is ready. If running for the first time, install required packages and set up the data directory.

In [1]:
# Install dependencies (uncomment if needed)
# %pip install torch transformers datasets scikit-learn pandas matplotlib seaborn
# !python install_deps.py
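
Before moving on, it can help to confirm that the packages are actually importable in the current kernel. The following is a minimal sketch, not part of the F1RAG code base; the helper name `find_missing_modules` is hypothetical, and the module list simply mirrors the `%pip install` line above.

```python
import importlib.util

# Import names for the packages listed in the install cell above.
# Note that scikit-learn is imported as `sklearn`.
REQUIRED_MODULES = [
    "torch", "transformers", "datasets", "sklearn",
    "pandas", "matplotlib", "seaborn",
]

def find_missing_modules(module_names):
    """Return the names in `module_names` that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as `torch`.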

## 2. Data Preparation

Process and load the Formula 1 race data. If you have not yet processed the raw data, run the data processing script.

In [None]:
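import os
import sys

# Put the project root on the import path so `src` resolves as a package.
# Assumes the notebook runs from a directory directly below the project root.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

from src.data_processing import load_race_data, preprocess_race_data, extract_race_information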

# Load race data
file_path = 'data/processed/race_data.csv'
race_data = load_race_data(file_path)

# Preprocess race data
preprocessed_data = preprocess_race_data(race_data)

# Extract relevant race information
race_info = extract_race_information(preprocessed_data)
print(f"Loaded {len(race_info)} race entries.")
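
The helpers in `data_processing` are not shown in this notebook. As a rough illustration of what a load, preprocess, and extract pipeline might look like, here is a standard-library sketch. The column names (`race`, `driver`, `position`), the sample rows, and the function names are all hypothetical and are not taken from the real `race_data.csv` or from the project code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real race_data.csv schema is not shown here.
SAMPLE_CSV = """race,driver,position
Monaco GP,Driver A,1
Monaco GP,Driver B,2
Monza GP,Driver B,1
Monza GP,Driver A,
"""

def load_rows(handle):
    """Read CSV rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))

def clean_rows(rows):
    """Drop rows without a finishing position and cast position to int."""
    cleaned = []
    for row in rows:
        position = (row.get("position") or "").strip()
        if not position:
            continue  # e.g. an entry with no classified position
        cleaned.append({**row, "position": int(position)})
    return cleaned

def group_by_race(rows):
    """Map each race name to its results, ordered by finishing position."""
    races = defaultdict(list)
    for row in rows:
        races[row["race"]].append(
            {"driver": row["driver"], "position": row["position"]}
        )
    return {
        race: sorted(results, key=lambda r: r["position"])
        for race, results in races.items()
    }

rows = clean_rows(load_rows(io.StringIO(SAMPLE_CSV)))
race_info_sketch = group_by_race(rows)
print(f"Loaded {len(race_info_sketch)} race entries.")
```

The result is a dictionary keyed by race, which matches how `race_info.values()` is consumed later in the notebook.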


## 3. Model Setup

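Import the RAG helpers, build the training data, and set the training hyperparameters. The model and tokenizer themselves are returned by `train_rag_model` in the next step; the model is trained to generate summaries from the processed race data.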

In [None]:
from src.rag_model import train_rag_model, evaluate_rag_model, generate_race_summaries

# Prepare training data
training_data = [str(info) for info in race_info.values()]

# (Optional) Set model parameters
model_name = 'facebook/rag-sequence-nq'
epochs = 3
batch_size = 4
learning_rate = 1e-5
output_dir = 'models'
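
Calling `str(info)` turns each entry into its Python representation, which works but may be harder for a language model to read than plain text. As one possible alternative, the sketch below renders an entry as a short passage. It assumes each entry is a list of dicts with hypothetical `driver` and `position` keys; the real structure of `race_info` is not shown here, and `format_race_passage` is not part of the project code.

```python
def format_race_passage(race_name, results):
    """Render one race's results as a single plain-text passage."""
    parts = [f"P{r['position']}: {r['driver']}" for r in results]
    return f"{race_name}. Classified finishers: " + "; ".join(parts) + "."

# Hypothetical entry used only to demonstrate the formatting.
example_info = {
    "Monaco GP": [
        {"driver": "Driver A", "position": 1},
        {"driver": "Driver B", "position": 2},
    ],
}
passages = [
    format_race_passage(name, results) for name, results in example_info.items()
]
print(passages[0])
```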

## 4. Model Training

Train the RAG model on the Formula 1 race data. This step may take some time depending on hardware and dataset size.

In [None]:
# Train the RAG model
rag_model, tokenizer = train_rag_model(
    training_data,
    model_name=model_name,
    epochs=epochs,
    batch_size=batch_size,
    learning_rate=learning_rate,
    output_dir=output_dir
)
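
To get a rough sense of how long training will run, it helps to count the batches the loop has to process. The sketch below assumes one update per batch, no gradient accumulation, and a partial final batch in each epoch; whether `train_rag_model` behaves this way is an assumption, and `count_training_steps` is a hypothetical helper.

```python
import math

def count_training_steps(num_examples, batch_size, epochs):
    """Total batches processed, assuming one update per batch."""
    if num_examples < 0 or batch_size <= 0 or epochs < 0:
        raise ValueError("num_examples and epochs must be >= 0, batch_size > 0")
    return math.ceil(num_examples / batch_size) * epochs

# With batch_size=4 and epochs=3 as configured above,
# and a hypothetical dataset of 1,000 entries:
print(count_training_steps(1000, batch_size=4, epochs=3))  # 750
```

Multiplying the step count by a measured time per step on the target hardware gives a first estimate of the total training time.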

## 5. Model Evaluation

Evaluate the trained model. The cell below scores the model on the training data, which tends to overstate performance; where a held-out validation set is available, pass that instead.

In [None]:
# Evaluate the RAG model
accuracy = evaluate_rag_model(rag_model, training_data)
print(f"Model accuracy: {accuracy:.2f}")
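
A held-out set can be carved out of the data before training. The following is a minimal sketch of a reproducible split using only the standard library; `split_train_validation`, the 20% fraction, and the seed are illustrative choices rather than project settings.

```python
import random

def split_train_validation(items, validation_fraction=0.2, seed=42):
    """Shuffle a copy of `items` reproducibly and split off a validation subset."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation example when any data is present.
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Placeholder entries standing in for the real training_data list.
examples = [f"race entry {i}" for i in range(10)]
train_split, validation_split = split_train_validation(examples)
print(len(train_split), len(validation_split))
```

The training cell would then receive `train_split`, and `evaluate_rag_model` would receive `validation_split`, assuming the function accepts any list of strings in the same form as `training_data`.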

## 6. Generate Race Summaries

Use the trained model to generate natural language summaries for Formula 1 races.

In [None]:
# Generate race summaries
race_summaries = generate_race_summaries(rag_model, training_data)
print(race_summaries[0])
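
In a RAG setup, generation is conditioned on passages retrieved for the input. The toy sketch below illustrates only that retrieval step, using simple token overlap as a stand-in for the dense retriever that models such as `facebook/rag-sequence-nq` rely on. It is not how `generate_race_summaries` is implemented; the passages, the prompt layout, and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, passages, top_k=2):
    """Rank passages by token overlap with the query, highest first."""
    query_tokens = Counter(tokenize(query))
    scored = []
    for index, passage in enumerate(passages):
        overlap = sum((query_tokens & Counter(tokenize(passage))).values())
        scored.append((overlap, index))
    # Sort by descending overlap; ties keep the original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [passages[index] for overlap, index in scored[:top_k] if overlap > 0]

# Hypothetical passages standing in for the processed race entries.
toy_passages = [
    "Monaco GP. Classified finishers: P1: Driver A; P2: Driver B.",
    "Monza GP. Classified finishers: P1: Driver B.",
    "Silverstone GP. Classified finishers: P1: Driver C.",
]
context = retrieve("Who won the Monaco GP?", toy_passages, top_k=1)
prompt = "Context: " + " ".join(context) + "\nTask: summarize the race."
print(prompt)
```

Only the Monaco passage is selected here, so a generator given this prompt would see the relevant race and none of the others.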

## 7. Visualization

Visualize the generated race summaries and explore the results interactively.

In [None]:
from src.visualization import visualize_race_summaries, create_user_interface

# Visualize race summaries
visualize_race_summaries(race_summaries)

# Create user interface for exploring race summaries
create_user_interface(race_summaries)

---

**End of F1RAG System Notebook**