# 🏎️ Formula 1 Driver Mood Analysis with Vision Language Models

Welcome to the fast lane! In this notebook, we'll use cutting-edge AI to analyze the moods of F1 2025 drivers from their official photos.

## What We'll Do:

* 🖼️ **Download** images of all 20 F1 2025 drivers
* 🤖 **Analyze** their moods using Claude Sonnet 4, a Vision Language Model served on Databricks
* 🔍 **Search** for drivers by mood using Vector Search (semantic search)
* 🎯 **Discover** which drivers look confident, happy, or intensely focused

Let's get started! 🏁

## 🔧 Step 1: Install Required Libraries

First, we need to install the necessary Python packages for working with Databricks Vector Search, LangChain, and image processing.

In [0]:
%pip install -U --quiet databricks-sdk==0.49.0 "databricks-langchain>=0.4.0" databricks-agents mlflow[databricks] databricks-vectorsearch==0.55 langchain==0.3.25 langchain_core==0.3.59 bs4==0.0.2 markdownify==0.14.1 pydantic==2.10.1
dbutils.library.restartPython()

## 📦 Step 2: Create Catalog, Schema, and Volume

Now we'll create the Unity Catalog objects to store our F1 driver images. Don't worry - if they already exist, these commands won't break anything!

In [0]:
# Configuration - Update these values for your environment
catalog_name = "formula1"
schema_name = "default"
volume_name = "driver_images"
table_name = "driver_images_table"

# Derived paths
volume_path = f"/Volumes/{catalog_name}/{schema_name}/{volume_name}/"
full_table_name = f"{catalog_name}.{schema_name}.{table_name}"

print(f"Catalog: {catalog_name}")
print(f"Schema: {schema_name}")
print(f"Volume: {volume_name}")
print(f"Volume Path: {volume_path}")
print(f"Table: {full_table_name}")

In [0]:
spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog_name}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog_name}.{schema_name}")
spark.sql(f"CREATE VOLUME IF NOT EXISTS {catalog_name}.{schema_name}.{volume_name}")

In [0]:
# Optional: run this cell only if you want to delete everything in the volume and start fresh
dbutils.fs.rm(volume_path, True)

## 📸 Step 3: Download F1 2025 Driver Images

Time to grab all 20 driver photos from GitHub! We'll compress them to save space while keeping great quality.

**Fun fact:** We're downloading images of champions like Max Verstappen, Lewis Hamilton, and rising stars like Kimi Antonelli! 🏎️

In [0]:
import requests
from PIL import Image
from io import BytesIO

image_filenames = [
    "Alexander_Albon_23.png",
    "Carlos_Sainz_55.png",
    "Charles_Leclerc_16.png",
    "Esteban_Ocon_31.png",
    "Fernando_Alonso_14.png",
    "Gabriel_Bortoleto_5.png",
    "George_Russell_63.png",
    "Isack_Hadjar_6.png",
    "Jack_Doohan_7.png",
    "Kimi_Antonelli_12.png",
    "Lance_Stroll_18.png",
    "Lando_Norris_4.png",
    "Lewis_Hamilton_44.png",
    "Liam_Lawson_30.png",
    "Max_Verstappen_1.png",
    "Nico_Hulkenberg_27.png",
    "Oliver_Bearman_87.png",
    "Oscar_Piastri_81.png",
    "Pierre_Gasly_10.png",
    "Yuki_Tsunoda_22.png"
]

base_url = "https://raw.githubusercontent.com/toUpperCase78/formula1-datasets/be28da6b5a94315dd5fc8c3fc5f240fdccf6f723/F1%202025%20Season%20Drivers/"

for filename in image_filenames:
    url = f"{base_url}{filename}"
    # A timeout keeps the cell from hanging if the download stalls
    response = requests.get(url, timeout=30)
    
    if response.status_code == 200:
        # Load image into PIL
        img = Image.open(BytesIO(response.content))
        
        # Resize to smaller dimensions (adjust as needed)
        # This keeps aspect ratio and makes max dimension 800px
        max_size = (800, 800)
        img.thumbnail(max_size, Image.Resampling.LANCZOS)
        
        # Save as JPEG with compression (quality 85 is good balance)
        output_filename = filename.replace('.png', '.jpg')
        img.convert('RGB').save(
            f"{volume_path}{output_filename}", 
            'JPEG', 
            quality=85, 
            optimize=True
        )
        print(f"Saved compressed: {output_filename}")
    else:
        print(f"Failed to download: {filename}")
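
The resize step relies on `Image.thumbnail`, which shrinks an image to fit inside a bounding box while keeping its aspect ratio, and never enlarges it. The sketch below reproduces that sizing arithmetic in plain Python so the effect of `max_size` is easy to predict. `fit_within` is a hypothetical helper written for illustration, the sample sizes are made up, and Pillow's own rounding may differ by a pixel in edge cases.

```python
def fit_within(width, height, max_width=800, max_height=800):
    """Return (new_width, new_height) scaled to fit the box, keeping aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height

# Hypothetical source sizes, chosen only to illustrate the three cases.
print(fit_within(1600, 1200))  # landscape: width is the limiting side
print(fit_within(1000, 2000))  # portrait: height is the limiting side
print(fit_within(640, 480))    # already small enough: unchanged
```

Only the longer side ends up at 800 pixels. The shorter side shrinks by the same factor.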

In [0]:
import base64
import os

# Display images in a grid
# Sort the file names so the grid order is stable between runs
thumb_files = sorted(f for f in os.listdir(volume_path) if f.endswith('.jpg'))
grid_html = "<table><tr>"

for idx, img_file in enumerate(thumb_files):
    img_path = f"{volume_path}{img_file}"
    if os.path.exists(img_path):
        # Read image and convert to base64
        with open(img_path, 'rb') as f:
            img_data = base64.b64encode(f.read()).decode()
        
        grid_html += f"<td><img src='data:image/jpeg;base64,{img_data}' width='120'><br>{img_file.replace('.jpg','')}</td>"
        if (idx + 1) % 5 == 0:
            grid_html += "</tr><tr>"

grid_html += "</tr></table>"

displayHTML(grid_html)

## 🤖 Step 4: Analyze Driver Moods with AI

Here's where the magic happens! We'll use **Claude Sonnet 4** (a Vision Language Model) to analyze each driver's mood from their photo.

The AI will look at facial expressions, body language, and overall vibe to describe how each driver appears. This creates a table with:
* 🖼️ Image path
* 💬 AI-generated mood description

In [0]:
spark.sql(f"""
CREATE OR REPLACE TABLE {full_table_name}
TBLPROPERTIES (delta.enableChangeDataFeed = true)
AS SELECT
  ai_query(
    'databricks-claude-sonnet-4',
    'Please describe the mood of the person in the image',
    files => files.content
  ) AS enriched_caption,
  files.path
FROM READ_FILES('{volume_path}', format => 'binaryFile') AS files
""")

## 👀 Step 5: View the Results

Let's see what the AI thinks about our drivers' moods!

In [0]:
display(spark.table(full_table_name))

## 🔍 Step 6: Set Up Vector Search

Now for the cool part! We'll create a **Vector Search** system that lets us search for drivers by mood using natural language.

**What's Vector Search?** It converts text (here, the mood descriptions) into numeric vectors called embeddings and compares them by similarity. Instead of exact keyword matching, a query finds descriptions with a similar *meaning*!
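
To make the idea concrete, here is a minimal sketch of similarity-based ranking in plain Python. The three-dimensional vectors and captions are invented for illustration. A real embedding model produces vectors with many more dimensions, and Vector Search handles the indexing and ranking for you.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings with axes (confidence, joy, intensity).
# These numbers are made up; they do not come from any model.
toy_captions = {
    "calm, self-assured expression": [0.9, 0.3, 0.2],
    "broad smile, relaxed and cheerful": [0.3, 0.9, 0.1],
    "stern gaze, deep concentration": [0.4, 0.1, 0.9],
}

def rank_by_similarity(query_vector, embeddings):
    """Return (caption, score) pairs sorted from most to least similar."""
    scored = [
        (caption, cosine_similarity(query_vector, vector))
        for caption, vector in embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# A query vector that leans heavily toward "joy".
query = [0.2, 1.0, 0.1]
ranked = rank_by_similarity(query, toy_captions)
for caption, score in ranked:
    print(f"{score:.3f}  {caption}")
```

The cheerful caption ranks first even though the query shares no words with it. That is the behavior semantic search provides.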

First, let's initialize the Vector Search client:

In [0]:
from databricks.vector_search.client import VectorSearchClient
vsc = VectorSearchClient(disable_notice=True)

In [0]:
import time

# Create unique names with timestamp
timestamp = str(int(time.time()))
endpoint_name = f'f1_drivers_endpoint_{timestamp}'
index_name = f'{catalog_name}.{schema_name}.f1_drivers_index_{timestamp}'

print(f"Creating new endpoint: {endpoint_name}")
print(f"Creating new index: {index_name}")

# Create the vector search endpoint
try:
    endpoint = vsc.create_endpoint(
        name=endpoint_name, 
        endpoint_type='STANDARD'
    )
    print(f"✅ Successfully created endpoint: {endpoint_name}")
except Exception as e:
    print(f"❌ Error creating endpoint: {e}")
    raise

# Wait a moment for endpoint to be ready
print("Waiting for endpoint to be ready...")
time.sleep(10)

# Create the vector search index
try:
    index = vsc.create_delta_sync_index(
        endpoint_name=endpoint_name,
        index_name=index_name,
        source_table_name=full_table_name,
        pipeline_type="TRIGGERED",
        primary_key="path",
        embedding_source_column='enriched_caption',
        embedding_model_endpoint_name='databricks-gte-large-en'
    )
    print(f"✅ Successfully created index: {index_name}")
except Exception as e:
    print(f"❌ Error creating index: {e}")
    raise

print("\nüéâ Vector search setup complete!")
print(f"Endpoint: {endpoint_name}")
print(f"Index: {index_name}")
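
Creating the index returns before the embeddings have been computed, so queries issued right away can fail. One way to handle this is to poll until the index reports that it is ready. The sketch below is a generic helper, not part of the Vector Search client. `wait_until` and `is_ready` are hypothetical names, and the stand-in check only simulates an index that becomes ready after a few polls.

```python
import time

def wait_until(is_ready, timeout_seconds=900, poll_seconds=15,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    `sleep` and `clock` are injectable so the helper is easy to test.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)

# Stand-in readiness check that succeeds on the third poll.
attempts = {"count": 0}

def fake_index_ready():
    attempts["count"] += 1
    return attempts["count"] >= 3

ready = wait_until(fake_index_ready, timeout_seconds=5, poll_seconds=0.01)
print(f"ready={ready} after {attempts['count']} checks")
```

In the notebook, `is_ready` would wrap whatever status call your client version exposes for the index. The 15-minute default timeout is an arbitrary choice.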

## 🏁 Step 7: Test Semantic Search!

Time to put our Vector Search to the test! We'll search for drivers based on their mood descriptions.

**How it works:** You describe a mood in natural language, and Vector Search returns the drivers whose AI-generated mood descriptions are closest in meaning, even if the exact words never appear in the description!

In [0]:
# Construct the index name dynamically
index_name = f"{catalog_name}.{schema_name}.f1_drivers_index_{timestamp}"

print(f"Testing vector search with index: {index_name}")
print("Searching for: Confident and determined drivers...\n")

try:
    # Test semantic search for confident drivers
    results = spark.sql(f"""
    SELECT 
        search_score,
        REGEXP_EXTRACT(path, r'([^/]+)\\.jpg$', 1) as driver_name,
        enriched_caption
    FROM VECTOR_SEARCH(
        index => '{index_name}',
        query_text => 'confident and determined professional athlete',
        num_results => 5
    )
    ORDER BY search_score DESC
    """)
    
    print("✅ Vector search is working!")
    print("🏆 Top 5 drivers with confident mood:\n")
    display(results)
    
except Exception as e:
    print(f"⚠️ Vector search not ready yet: {e}")
    print("The index is still building embeddings. This typically takes 5-10 minutes.")
    print("\n🔍 Showing text-based search as fallback:\n")
    
    confident_drivers = spark.sql(f"""
    SELECT 
        REGEXP_EXTRACT(path, r'([^/]+)\\.jpg$', 1) as driver_name,
        enriched_caption
    FROM {full_table_name}
    WHERE LOWER(enriched_caption) LIKE '%confident%'
    ORDER BY driver_name
    """)
    display(confident_drivers)
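
The queries in this step use `REGEXP_EXTRACT` with the pattern `([^/]+)\.jpg$` to turn a full volume path into a short driver label. The same pattern can be checked in plain Python, as in the sketch below. `driver_label` and `split_label` are hypothetical helper names, and the example path follows the catalog, schema, and volume names configured earlier.

```python
import re

# Same pattern as the SQL: capture everything after the last "/" up to ".jpg".
DRIVER_PATTERN = re.compile(r"([^/]+)\.jpg$")

def driver_label(path):
    """Return the file stem (e.g. 'Max_Verstappen_1'), or '' if nothing matches."""
    match = DRIVER_PATTERN.search(path)
    return match.group(1) if match else ""

def split_label(label):
    """Split 'First_Last_12' into ('First Last', 12)."""
    name, _, number = label.rpartition("_")
    return name.replace("_", " "), int(number)

path = "/Volumes/formula1/default/driver_images/Max_Verstappen_1.jpg"
label = driver_label(path)
print(label)
print(split_label(label))
```

Splitting the label into a name and a car number is an optional extra that makes the result tables easier to read.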

## 🎉 Search for Celebratory Drivers

Let's find drivers who look like they're ready to pop champagne on the podium!

In [0]:
try:
    results2 = spark.sql(f"""
    SELECT 
        search_score,
        REGEXP_EXTRACT(path, r'([^/]+)\\.jpg$', 1) as driver_name,
        enriched_caption
    FROM VECTOR_SEARCH(
        index => '{index_name}',
        query_text => 'happy celebrating victory triumph',
        num_results => 3
    )
    ORDER BY search_score DESC
    """)
    
    print("üéâ Top 3 drivers with celebratory mood:\n")
    display(results2)
    
except Exception as e:
    print(f"⚠️ Vector search not ready yet: {e}")

## 🎯 Search for Focused Drivers

Who's got that intense, race-day concentration?

In [0]:
try:
    results3 = spark.sql(f"""
    SELECT 
        search_score,
        REGEXP_EXTRACT(path, r'([^/]+)\\.jpg$', 1) as driver_name,
        enriched_caption
    FROM VECTOR_SEARCH(
        index => '{index_name}',
        query_text => 'serious focused intense concentration',
        num_results => 3
    )
    ORDER BY search_score DESC
    """)
    
    print("üéØ Top 3 drivers with serious/focused mood:\n")
    display(results3)
    
except Exception as e:
    print(f"⚠️ Vector search not ready yet: {e}")

## 🏆 Bonus: Meet the Champion!

Let's take a closer look at Max Verstappen - the reigning world champion! We'll show his photo and what the AI thinks about his mood.

In [0]:
# Get Max Verstappen's AI-generated mood caption
max_caption = spark.sql(f"""
    SELECT enriched_caption
    FROM {full_table_name}
    WHERE path LIKE '%Max_Verstappen_1%'
""").collect()

if max_caption:
    print("\nüèéÔ∏è Max Verstappen - 2024 World Champion")
    print("="*60)
    print(f"\nüí¨ AI Mood Analysis:")
    print(f"{max_caption[0]['enriched_caption']}")
    print("\n" + "="*60)
    print("\nüñºÔ∏è Photo:")

# Display the image
from IPython.display import Image as IPImage, display as ip_display
import os

# Read the image file
image_path = f"{volume_path}Max_Verstappen_1.jpg"
if os.path.exists(image_path):
    ip_display(IPImage(filename=image_path, width=400))
else:
    print(f"Image not found at: {image_path}")