# 🎬 CymbalFlix Discover - Database Setup

Welcome to the data engineering portion of CymbalFlix Discover! In this notebook, you'll set up your AlloyDB database with everything needed for an AI-powered movie discovery application.

## What We're Building

By the end of this notebook, your database will contain:

| Table | Records | Purpose |
|-------|---------|--------|
| `movies` | ~9,700 | Core catalog with AI-searchable summaries and vector embeddings |
| `genres` | 20 | Genre lookup table |
| `movie_genres` | ~21,000 | Many-to-many junction for movie genres |
| `users` | 610 | User profiles extracted from ratings data |
| `ratings` | 100,836 | Historical ratings for analytics |
| `tags` | 3,683 | User-generated tags for semantic analysis |
| `watchlist` | 0 | Ready for user watchlist operations |

## AlloyDB Extensions We'll Enable

- **`vector`** - PostgreSQL vector data type for embeddings
- **`alloydb_scann`** - Google's ScaNN index for lightning-fast vector search
- **`google_ml_integration`** - Direct Vertex AI access from SQL

## Security: IAM Authentication

Notice something missing? **No database passwords!** We're using IAM authentication, which means:
- Your Google Cloud identity is your database identity
- No passwords to manage, rotate, or accidentally commit to Git
- The Auth Proxy handles secure token exchange automatically

Let's get started! 🚀

---
## Step 1: Configure Your Environment

First, let's set up the configuration for your specific AlloyDB cluster. You'll need your project ID from the lab instructions.

In [None]:
# =============================================================================
# CONFIGURATION - Update these values with your environment details
# =============================================================================

# Your Google Cloud Project ID (from the lab instructions)
PROJECT_ID = "YOUR_PROJECT_ID"  # TODO: Replace with your project ID

# AlloyDB cluster details (from Terraform outputs)
REGION = "us-central1"
CLUSTER_ID = "cymbalflix-cluster"

# The database we'll create for CymbalFlix
DB_NAME = "cymbalflix"

# GCS bucket with our MovieLens data
DATA_BUCKET = "gs://class-demo/ml-latest-small"

print(f"✅ Configuration set for project: {PROJECT_ID}")
print(f"✅ AlloyDB cluster: {CLUSTER_ID} in {REGION}")
print("\n🔐 Using IAM authentication (no password required!)")

---
## Step 2: Get Your IAM Identity

With IAM authentication, your Google Cloud identity becomes your database identity. Let's see who you are!

In [None]:
import subprocess
import json

# Get the current authenticated user's email
result = subprocess.run(
    ["gcloud", "auth", "list", "--filter=status:ACTIVE", "--format=json"],
    capture_output=True, text=True
)
accounts = json.loads(result.stdout)

if accounts:
    DB_USER = accounts[0]['account']
    print(f"✅ Authenticated as: {DB_USER}")
    print("\n   This identity will be used for database connections.")
    print("   No password needed - IAM handles authentication!")
else:
    print("❌ No active gcloud account found. Please authenticate.")

---
## Step 3: Install and Start the AlloyDB Auth Proxy

The AlloyDB Auth Proxy provides secure, IAM-based authentication to your database. With the `--auto-iam-authn` flag, it automatically:

1. Gets your current Google Cloud identity
2. Generates a short-lived access token
3. Authenticates to AlloyDB on your behalf

**Why Auth Proxy?**
- No passwords to manage
- Automatic token rotation
- Encrypted connections
- Works the same in development and production
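
When the proxy is launched as a background process, a readiness check is usually more dependable than a fixed delay. The sketch below polls a local port until it accepts a TCP connection. `wait_for_port` and its parameters are hypothetical names introduced for illustration; they are not part of the Auth Proxy or this lab.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False if the timeout expires.

    Hypothetical helper: it only proves that something is listening on the port,
    not that IAM authentication against AlloyDB will succeed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the listener is up; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or connection refused): wait and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("127.0.0.1", 5432)` could be called right after the proxy process is started, with the database connection attempted only if it returns `True`.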

In [None]:
# Download and install the AlloyDB Auth Proxy
!curl -o alloydb-auth-proxy https://storage.googleapis.com/alloydb-auth-proxy/v1.11.2/alloydb-auth-proxy.linux.amd64
!chmod +x alloydb-auth-proxy

print("\n✅ AlloyDB Auth Proxy installed!")

In [None]:
import subprocess
import time
import os

# Build the instance connection name
INSTANCE_CONNECTION = f"projects/{PROJECT_ID}/locations/{REGION}/clusters/{CLUSTER_ID}/instances/cymbalflix-primary"

print(f"🔗 Connecting to: {INSTANCE_CONNECTION}")
print(f"🔐 Using IAM authentication for: {DB_USER}")
print("\n⏳ Starting Auth Proxy in background...")

# Start the proxy with IAM authentication enabled
proxy_process = subprocess.Popen(
    [
        "./alloydb-auth-proxy", 
        INSTANCE_CONNECTION, 
        "--port=5432", 
        "--address=127.0.0.1",
        "--auto-iam-authn"  # This enables automatic IAM authentication!
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)

# Give it a moment to start up
time.sleep(5)

# Check if it's running
if proxy_process.poll() is None:
    print("✅ Auth Proxy is running with IAM authentication!")
    print("   Connection available at: localhost:5432")
    print(f"   Authenticating as: {DB_USER}")
else:
    print("❌ Auth Proxy failed to start. Check the error output below:")
    stdout, stderr = proxy_process.communicate()
    print(stderr.decode())

---
## Step 4: Install Python Dependencies and Connect

We'll use `psycopg2` for PostgreSQL connectivity and `pandas` for data manipulation. These are the same tools you'd use with any PostgreSQL database: AlloyDB is 100% PostgreSQL compatible!

**Note on IAM auth:** When using the Auth Proxy with `--auto-iam-authn`, we don't need to provide a password. The proxy handles authentication automatically using your Google Cloud identity.
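
The database user name passed to the driver has to match the IAM principal. As a minimal sketch, assuming AlloyDB's documented convention that a user account logs in with its full email address while a service account drops the `.gserviceaccount.com` suffix, a helper such as the hypothetical `iam_db_username` below can normalize either form. The example addresses are made up.

```python
SERVICE_ACCOUNT_SUFFIX = ".gserviceaccount.com"


def iam_db_username(principal_email: str) -> str:
    """Map an IAM principal's email to the database user name.

    Assumption: user accounts keep the full address, while service accounts
    drop the trailing ".gserviceaccount.com".
    """
    email = principal_email.strip()
    if email.endswith(SERVICE_ACCOUNT_SUFFIX):
        return email[: -len(SERVICE_ACCOUNT_SUFFIX)]
    return email
```

In this lab the active account is a user account, so the value is used unchanged. The helper only matters if the same notebook is later run under a service account.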

In [None]:
# Install required packages
!pip install -q psycopg2-binary pandas google-cloud-storage

print("✅ Dependencies installed!")

In [None]:
import psycopg2
from psycopg2.extras import execute_values
import pandas as pd
from google.cloud import storage
import io
import re
import json

def get_connection(dbname="postgres"):
    """
    Create a connection to AlloyDB via the Auth Proxy.
    
    With --auto-iam-authn enabled on the proxy, we don't need a password!
    The proxy automatically generates and uses IAM-based credentials.
    """
    return psycopg2.connect(
        host="127.0.0.1",
        port=5432,
        user=DB_USER,
        dbname=dbname,
        # No password needed! Auth Proxy handles IAM authentication.
        # This is more secure than managing database passwords.
    )

# Test the connection
try:
    conn = get_connection()
    with conn.cursor() as cur:
        cur.execute("SELECT version();")
        version = cur.fetchone()[0]
        cur.execute("SELECT current_user;")
        current_user = cur.fetchone()[0]
    conn.close()
    print("✅ Successfully connected to AlloyDB!")
    print(f"\n🔐 Connected as: {current_user}")
    print(f"\n📊 Database version:\n{version}")
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print("\n🔍 Troubleshooting tips:")
    print("   1. Make sure your AlloyDB cluster is fully created (check Cloud Console)")
    print("   2. Verify the Auth Proxy is running (re-run the previous cell)")
    print("   3. Check that PROJECT_ID and CLUSTER_ID are correct")
    print("   4. Ensure your IAM user was created by Terraform (check terraform output)")

---
## Step 5: Create the CymbalFlix Database and Enable Extensions

Now we'll create our dedicated database and enable the AlloyDB extensions that power our AI features:

- **`vector`**: Adds the VECTOR data type for storing embeddings
- **`alloydb_scann`**: Enables ScaNN indexes for fast similarity search
- **`google_ml_integration`**: Connects AlloyDB directly to Vertex AI
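
After enabling extensions, it can help to confirm which ones are actually installed. The sketch below is one way to do that: it compares the required names against the `extname` values returned by `SELECT extname FROM pg_extension`. `missing_extensions` is a hypothetical helper, and the database call is shown only as a comment so the block runs without a connection.

```python
REQUIRED_EXTENSIONS = ("vector", "alloydb_scann", "google_ml_integration")


def missing_extensions(installed_names, required=REQUIRED_EXTENSIONS):
    """Return the required extension names that are absent, in the order they were required."""
    installed = set(installed_names)
    return [name for name in required if name not in installed]


# With a live connection, the installed names could be fetched like this:
#   cur.execute("SELECT extname FROM pg_extension")
#   installed_names = [row[0] for row in cur.fetchall()]
installed_names = ["plpgsql", "vector"]  # illustrative values only
print(missing_extensions(installed_names))
```

An empty list means every required extension is present. Anything else names the extensions to investigate before continuing.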

In [None]:
# Create the cymbalflix database
conn = get_connection("postgres")
conn.autocommit = True  # Required for CREATE DATABASE

with conn.cursor() as cur:
    # Check if database exists
    cur.execute("SELECT 1 FROM pg_database WHERE datname = %s", (DB_NAME,))
    exists = cur.fetchone()
    
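    if not exists:
        # Quote the name as an identifier instead of interpolating it into the SQL string
        from psycopg2 import sql
        cur.execute(sql.SQL("CREATE DATABASE {}").format(sql.Identifier(DB_NAME)))
        print(f"✅ Created database: {DB_NAME}")
    else:
        print(f"ℹ️  Database {DB_NAME} already exists")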

conn.close()

In [None]:
# Enable AlloyDB extensions
conn = get_connection(DB_NAME)
conn.autocommit = True

extensions = [
    ("vector", "Vector data type for embeddings"),
    ("alloydb_scann", "ScaNN index for vector similarity search"),
    ("google_ml_integration", "Vertex AI integration for AI SQL functions")
]

with conn.cursor() as cur:
    for ext_name, description in extensions:
        try:
            cur.execute(f"CREATE EXTENSION IF NOT EXISTS {ext_name}")
            print(f"✅ Enabled: {ext_name}")
            print(f"   └─ {description}")
        except Exception as e:
            print(f"⚠️  Could not enable {ext_name}: {e}")

conn.close()
print("\n🎉 AlloyDB is ready for AI-powered operations!")

---
## Step 6: Create the Database Schema

Our schema is designed for both transactional operations (watchlists, ratings) and analytical queries (trending movies, genre analysis). The `movies` table includes a `summary_embedding` column that stores 1536-dimensional vectors for semantic search.

**Schema Highlights:**
- Normalized genre data with a junction table
- Vector column for semantic similarity search
- Proper foreign keys for data integrity
- Timestamps for temporal analysis

In [None]:
# Define our database schema
schema_sql = """
-- Core movie catalog with vector embeddings for semantic search
CREATE TABLE IF NOT EXISTS movies (
    movie_id INTEGER PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    year INTEGER,
    summary TEXT,
    summary_embedding VECTOR(1536)
);

-- Genre lookup table
CREATE TABLE IF NOT EXISTS genres (
    genre_id SERIAL PRIMARY KEY,
    genre_name VARCHAR(50) UNIQUE NOT NULL
);

-- Many-to-many junction table for movie genres
CREATE TABLE IF NOT EXISTS movie_genres (
    movie_id INTEGER REFERENCES movies(movie_id) ON DELETE CASCADE,
    genre_id INTEGER REFERENCES genres(genre_id) ON DELETE CASCADE,
    PRIMARY KEY (movie_id, genre_id)
);

-- User profiles (extracted from ratings data)
CREATE TABLE IF NOT EXISTS users (
    user_id INTEGER PRIMARY KEY,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Historical ratings for analytics
CREATE TABLE IF NOT EXISTS ratings (
    rating_id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id) ON DELETE CASCADE,
    movie_id INTEGER REFERENCES movies(movie_id) ON DELETE CASCADE,
    rating NUMERIC(2,1) NOT NULL CHECK (rating >= 0.5 AND rating <= 5.0),
    rated_at TIMESTAMP
);

-- User watchlists (for transactional operations)
CREATE TABLE IF NOT EXISTS watchlist (
    user_id INTEGER REFERENCES users(user_id) ON DELETE CASCADE,
    movie_id INTEGER REFERENCES movies(movie_id) ON DELETE CASCADE,
    added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, movie_id)
);

-- User-generated tags for semantic analysis
CREATE TABLE IF NOT EXISTS tags (
    tag_id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id) ON DELETE CASCADE,
    movie_id INTEGER REFERENCES movies(movie_id) ON DELETE CASCADE,
    tag_text VARCHAR(255) NOT NULL,
    tagged_at TIMESTAMP
);
"""

# Execute the schema
conn = get_connection(DB_NAME)
with conn.cursor() as cur:
    cur.execute(schema_sql)
conn.commit()
conn.close()

print("✅ Database schema created!")
print("\n📋 Tables created:")
print("   • movies (with vector embedding column)")
print("   • genres")
print("   • movie_genres (junction table)")
print("   • users")
print("   • ratings")
print("   • watchlist")
print("   • tags")

---
## Step 7: Load Data from Google Cloud Storage

Now comes the fun part: loading our MovieLens data! We'll load data from GCS and transform it as we go:

1. **Movies**: Extract year from title, e.g., "Toy Story (1995)" → title="Toy Story", year=1995
2. **Summaries**: AI-generated movie descriptions (we'll merge these into movies)
3. **Embeddings**: Pre-computed 1536-dimensional vectors from Gemini
4. **Genres**: Parse pipe-delimited genres into a normalized structure
5. **Users**: Extract unique user IDs from ratings
6. **Ratings & Tags**: Load with timestamp conversion

In [None]:
def load_csv_from_gcs(bucket_name, blob_name):
    """Load a CSV file from GCS into a pandas DataFrame."""
    # Parse the bucket path
    if bucket_name.startswith("gs://"):
        bucket_name = bucket_name[5:]
    
    # Handle bucket/path format
    if "/" in bucket_name:
        parts = bucket_name.split("/", 1)
        bucket_name = parts[0]
        blob_name = f"{parts[1]}/{blob_name}"
    
    client = storage.Client(project=PROJECT_ID)
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    
    content = blob.download_as_text()
    return pd.read_csv(io.StringIO(content))

print("✅ GCS loader function ready!")

### 7.1 Load and Transform Movies

The MovieLens dataset stores the year in the title (e.g., "Jumanji (1995)"). We'll extract it into a separate column for better analytics.
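
To make the title rule concrete, here is a small standalone sketch of the same idea: a trailing `(YYYY)` is split off, and titles without one are left alone. `split_title_and_year` is a hypothetical name used only in this example.

```python
import re

# A four-digit year in parentheses at the very end of the title, with optional whitespace around it.
YEAR_SUFFIX = re.compile(r"\s*\((\d{4})\)\s*$")


def split_title_and_year(raw_title: str):
    """Return (clean_title, year); year is None when no trailing (YYYY) is present."""
    match = YEAR_SUFFIX.search(raw_title)
    if match is None:
        return raw_title.strip(), None
    return raw_title[: match.start()].strip(), int(match.group(1))


for sample in ["Toy Story (1995)", "Jumanji (1995) ", "Untitled Short"]:
    print(split_title_and_year(sample))
```

Only a year at the end of the string is removed, so parenthesized alternate titles earlier in the string are kept as part of the title.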

In [None]:
# Load movies from GCS
print("📥 Loading movies.csv from GCS...")
movies_df = load_csv_from_gcs(DATA_BUCKET, "movies.csv")
print(f"   Loaded {len(movies_df)} movies")

# Extract year from title using regex
# Pattern matches " (YYYY)" at the end of the title
def extract_year_and_clean_title(title):
    match = re.search(r'\s*\((\d{4})\)\s*$', str(title))
    if match:
        year = int(match.group(1))
        clean_title = re.sub(r'\s*\(\d{4}\)\s*$', '', title).strip()
        return clean_title, year
    return title, None

# Apply the transformation
movies_df[['clean_title', 'year']] = movies_df['title'].apply(
    lambda x: pd.Series(extract_year_and_clean_title(x))
)
movies_df['title'] = movies_df['clean_title']
movies_df = movies_df.drop(columns=['clean_title'])

# Store genres for later processing
movies_with_genres = movies_df[['movieId', 'genres']].copy()

print("\n✅ Movies processed!")
print("\n📊 Sample data:")
movies_df[['movieId', 'title', 'year']].head()

### 7.2 Load and Merge Summaries

The summaries were generated using an AI model to provide rich, searchable descriptions of each movie. These descriptions enable semantic search: finding movies based on meaning, not just keywords.

In [None]:
# Load summaries
print("📥 Loading summaries.csv from GCS...")
summaries_df = load_csv_from_gcs(DATA_BUCKET, "summaries.csv")
print(f"   Loaded {len(summaries_df)} summaries")

# Merge summaries into movies
movies_df = movies_df.merge(summaries_df, on='movieId', how='left')

print("\n✅ Summaries merged!")
print(f"\n📝 Sample summary for '{movies_df.iloc[0]['title']}':")
print(f"   {movies_df.iloc[0]['summary'][:200]}..." if pd.notna(movies_df.iloc[0]['summary']) else "   (No summary available)")

### 7.3 Load and Merge Embeddings

The embeddings are 1536-dimensional vectors generated by the Gemini embedding model. Each vector captures the semantic meaning of a movie's summary, enabling similarity search.

**Why pre-computed embeddings?**
- Generating embeddings for ~10,000 movies takes time
- Pre-computing allows us to focus on AlloyDB features
- In production, you'd typically compute embeddings when content is added

In [None]:
# Load embeddings
print("📥 Loading embeddings.csv from GCS...")
embeddings_df = load_csv_from_gcs(DATA_BUCKET, "embeddings.csv")
print(f"   Loaded {len(embeddings_df)} embeddings")

# The embedding column contains JSON arrays as strings
# We'll parse them when inserting into the database

# Merge embeddings into movies
movies_df = movies_df.merge(embeddings_df, on='movieId', how='left')

print("\n✅ Embeddings merged!")
print("\n🔢 Embedding dimensions: 1536")
print(f"   First few values of first embedding: {movies_df.iloc[0]['embedding'][:50]}..." if pd.notna(movies_df.iloc[0]['embedding']) else "   (No embedding)")

### 7.4 Insert Movies into AlloyDB

Now we'll insert our prepared movie data into AlloyDB. Note how we handle the vector embeddings: each one is passed as a JSON-style array string, which the `vector` type accepts as text input and converts when the row is inserted.
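
Because the database only sees a string, a malformed or wrong-sized embedding would surface as an insert error deep inside a batch. As a hedged sketch, a pre-check like the hypothetical `to_vector_literal` below parses the JSON array, verifies the dimension against the `VECTOR(1536)` column, and emits the bracketed text form that the `vector` type reads.

```python
import json

EMBEDDING_DIM = 1536  # matches the VECTOR(1536) column in the schema


def to_vector_literal(raw: str, expected_dim: int = EMBEDDING_DIM) -> str:
    """Validate a JSON array string and return it as a '[v1,v2,...]' vector literal."""
    values = json.loads(raw)
    if not isinstance(values, list):
        raise ValueError("embedding must be a JSON array")
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    # float() rejects non-numeric entries; repr() keeps full float precision.
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Small illustrative vector; real rows would use the default 1536 dimensions.
print(to_vector_literal("[0.5, -1, 2e-3]", expected_dim=3))
```

This check is optional for the lab data, which is assumed to be well-formed. It is more useful when embeddings arrive from an external source.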

In [None]:
# Prepare movies data for insertion
conn = get_connection(DB_NAME)

# Insert movies in batches for better performance
insert_sql = """
    INSERT INTO movies (movie_id, title, year, summary, summary_embedding)
    VALUES %s
    ON CONFLICT (movie_id) DO UPDATE SET
        title = EXCLUDED.title,
        year = EXCLUDED.year,
        summary = EXCLUDED.summary,
        summary_embedding = EXCLUDED.summary_embedding
"""

# Prepare the data
movies_data = []
for _, row in movies_df.iterrows():
    # Keep the embedding as its original string (e.g. "[0.1, 0.2, ...]");
    # PostgreSQL casts it to VECTOR on insert. Missing embeddings become NULL.
    embedding = row['embedding'] if pd.notna(row.get('embedding')) else None
    
    movies_data.append((
        int(row['movieId']),
        row['title'],
        int(row['year']) if pd.notna(row['year']) else None,
        row.get('summary') if pd.notna(row.get('summary')) else None,
        embedding
    ))

print(f"📤 Inserting {len(movies_data)} movies into AlloyDB...")

with conn.cursor() as cur:
    execute_values(cur, insert_sql, movies_data, page_size=500)

conn.commit()
conn.close()

print("✅ Movies inserted successfully!")

### 7.5 Process and Load Genres

MovieLens stores genres as pipe-delimited strings (e.g., "Action|Comedy|Sci-Fi"). We'll normalize this into a proper relational structure with a genres lookup table and a junction table.
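
The normalization can be illustrated without a database. In this sketch, `normalize_genres` is a hypothetical helper that turns pipe-delimited strings into a lookup dictionary and junction rows. It assigns its own 1-based IDs in alphabetical order, whereas in the notebook the IDs come from the `SERIAL` column.

```python
NO_GENRES = "(no genres listed)"


def normalize_genres(rows):
    """Split (movie_id, 'A|B|C') rows into a genre lookup and (movie_id, genre_id) pairs."""
    rows = list(rows)  # iterated twice below
    names = sorted(
        {genre for _, genres in rows if genres and genres != NO_GENRES for genre in genres.split("|")}
    )
    genre_ids = {name: idx for idx, name in enumerate(names, start=1)}
    junction = [
        (movie_id, genre_ids[genre])
        for movie_id, genres in rows
        if genres and genres != NO_GENRES
        for genre in genres.split("|")
    ]
    return genre_ids, junction


sample = [
    (1, "Adventure|Animation|Children"),
    (2, "Adventure|Children"),
    (3, NO_GENRES),
]
genre_ids, junction = normalize_genres(sample)
print(genre_ids)
print(junction)
```

Movie 3 produces no junction rows because its only value is the "no genres" placeholder, which matches how the loader treats it.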

In [None]:
# Extract unique genres
all_genres = set()
for genres_str in movies_with_genres['genres']:
    if pd.notna(genres_str) and genres_str != '(no genres listed)':
        all_genres.update(genres_str.split('|'))

print(f"🎬 Found {len(all_genres)} unique genres:")
print(f"   {', '.join(sorted(all_genres))}")

# Insert genres
conn = get_connection(DB_NAME)

with conn.cursor() as cur:
    for genre in sorted(all_genres):
        cur.execute(
            "INSERT INTO genres (genre_name) VALUES (%s) ON CONFLICT (genre_name) DO NOTHING",
            (genre,)
        )

conn.commit()

# Get genre IDs for the junction table
with conn.cursor() as cur:
    cur.execute("SELECT genre_id, genre_name FROM genres")
    genre_lookup = {name: id for id, name in cur.fetchall()}

print("\n✅ Genres inserted!")

In [None]:
# Create movie_genres junction records
movie_genres_data = []
for _, row in movies_with_genres.iterrows():
    if pd.notna(row['genres']) and row['genres'] != '(no genres listed)':
        movie_id = int(row['movieId'])
        for genre in row['genres'].split('|'):
            if genre in genre_lookup:
                movie_genres_data.append((movie_id, genre_lookup[genre]))

print(f"📤 Creating {len(movie_genres_data)} movie-genre associations...")

with conn.cursor() as cur:
    execute_values(
        cur,
        "INSERT INTO movie_genres (movie_id, genre_id) VALUES %s ON CONFLICT DO NOTHING",
        movie_genres_data,
        page_size=1000
    )

conn.commit()
conn.close()

print("✅ Movie-genre associations created!")

### 7.6 Load Ratings and Extract Users

The ratings dataset contains over 100,000 ratings from 610 users. We'll first extract unique users, then load the ratings with their timestamps.
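
Unix timestamps count seconds from 1970-01-01 UTC, so a conversion that does not depend on the machine's local time zone is the safer choice for a `TIMESTAMP` column. The following is a minimal sketch of that conversion. `unix_to_utc_naive` is a hypothetical helper, and the sample value is illustrative.

```python
from datetime import datetime, timezone


def unix_to_utc_naive(ts) -> datetime:
    """Convert Unix seconds to a naive datetime expressed in UTC."""
    # Convert with an explicit UTC zone, then drop tzinfo so the value fits
    # a TIMESTAMP (without time zone) column unchanged.
    return datetime.fromtimestamp(float(ts), tz=timezone.utc).replace(tzinfo=None)


print(unix_to_utc_naive(964982703))
```

Stripping `tzinfo` is a design choice for plain `TIMESTAMP` columns. With a `TIMESTAMPTZ` column, keeping the timezone-aware value would be the more natural fit.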

In [None]:
# Load ratings
print("📥 Loading ratings.csv from GCS...")
ratings_df = load_csv_from_gcs(DATA_BUCKET, "ratings.csv")
print(f"   Loaded {len(ratings_df)} ratings")

# Extract unique users
unique_users = ratings_df['userId'].unique()
print(f"\n👥 Found {len(unique_users)} unique users")

# Insert users
conn = get_connection(DB_NAME)

user_data = [(int(uid),) for uid in unique_users]
with conn.cursor() as cur:
    execute_values(
        cur,
        "INSERT INTO users (user_id) VALUES %s ON CONFLICT DO NOTHING",
        user_data
    )

conn.commit()
print("✅ Users inserted!")

In [None]:
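# Convert timestamps and prepare ratings data
from datetime import datetime, timezone

ratings_data = []
for _, row in ratings_df.iterrows():
    # Convert the Unix timestamp to a naive UTC datetime so the stored value
    # does not depend on the notebook machine's local time zone
    rated_at = datetime.fromtimestamp(row['timestamp'], tz=timezone.utc).replace(tzinfo=None)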
    ratings_data.append((
        int(row['userId']),
        int(row['movieId']),
        float(row['rating']),
        rated_at
    ))

print(f"📤 Inserting {len(ratings_data)} ratings...")

with conn.cursor() as cur:
    execute_values(
        cur,
        "INSERT INTO ratings (user_id, movie_id, rating, rated_at) VALUES %s",
        ratings_data,
        page_size=5000
    )

conn.commit()
conn.close()

print("✅ Ratings inserted!")

### 7.7 Load Tags

Tags are user-generated labels for movies. These are great for demonstrating AlloyDB's AI SQL functions later: we can use AI to understand tag meanings and find semantically similar movies.

In [None]:
# Load tags
print("📥 Loading tags.csv from GCS...")
tags_df = load_csv_from_gcs(DATA_BUCKET, "tags.csv")
print(f"   Loaded {len(tags_df)} tags")

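# Prepare tags data (timestamps are converted to naive UTC, independent of the local time zone)
from datetime import datetime, timezone

tags_data = []
for _, row in tags_df.iterrows():
    tagged_at = datetime.fromtimestamp(row['timestamp'], tz=timezone.utc).replace(tzinfo=None)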
    tags_data.append((
        int(row['userId']),
        int(row['movieId']),
        str(row['tag']),
        tagged_at
    ))

print(f"📤 Inserting {len(tags_data)} tags...")

conn = get_connection(DB_NAME)
with conn.cursor() as cur:
    execute_values(
        cur,
        "INSERT INTO tags (user_id, movie_id, tag_text, tagged_at) VALUES %s",
        tags_data,
        page_size=1000
    )

conn.commit()
conn.close()

print("✅ Tags inserted!")

---
## Step 8: Verify Your Data

Let's make sure everything loaded correctly with some verification queries.
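
Counts are easier to trust when they are compared against expectations rather than read by eye. As a sketch, the hypothetical `check_counts` helper below reports only the tables whose row count differs from the exact figures listed in the table at the top of this notebook. The approximate figures for `movies` and `movie_genres` are left out on purpose.

```python
# Exact counts taken from the "What We're Building" table.
EXPECTED_COUNTS = {"users": 610, "ratings": 100836, "tags": 3683, "watchlist": 0}


def check_counts(actual, expected=EXPECTED_COUNTS):
    """Return {table: (actual, expected)} for every table whose count does not match."""
    return {
        table: (actual.get(table), want)
        for table, want in expected.items()
        if actual.get(table) != want
    }


# Illustrative input; in practice these would come from SELECT COUNT(*) queries.
print(check_counts({"users": 610, "ratings": 100000, "tags": 3683, "watchlist": 0}))
```

An empty dictionary means all checked tables match. A table missing from `actual` is reported with `None` as its count.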

In [None]:
# Verification queries
conn = get_connection(DB_NAME)

verification_queries = [
    ("movies", "SELECT COUNT(*) FROM movies"),
    ("movies with summaries", "SELECT COUNT(*) FROM movies WHERE summary IS NOT NULL"),
    ("movies with embeddings", "SELECT COUNT(*) FROM movies WHERE summary_embedding IS NOT NULL"),
    ("genres", "SELECT COUNT(*) FROM genres"),
    ("movie_genres", "SELECT COUNT(*) FROM movie_genres"),
    ("users", "SELECT COUNT(*) FROM users"),
    ("ratings", "SELECT COUNT(*) FROM ratings"),
    ("tags", "SELECT COUNT(*) FROM tags"),
]

print("📊 Data Verification Report")
print("=" * 50)

with conn.cursor() as cur:
    for name, query in verification_queries:
        cur.execute(query)
        count = cur.fetchone()[0]
        print(f"   {name}: {count:,}")

conn.close()
print("=" * 50)

In [None]:
# Sample query: Top-rated movies with their genres
sample_query = """
SELECT 
    m.title,
    m.year,
    ROUND(AVG(r.rating), 2) as avg_rating,
    -- DISTINCT is needed because the genre join repeats each rating once per genre
    COUNT(DISTINCT r.rating_id) as num_ratings,
    STRING_AGG(DISTINCT g.genre_name, ', ' ORDER BY g.genre_name) as genres
FROM movies m
JOIN ratings r ON m.movie_id = r.movie_id
JOIN movie_genres mg ON m.movie_id = mg.movie_id
JOIN genres g ON mg.genre_id = g.genre_id
GROUP BY m.movie_id, m.title, m.year
HAVING COUNT(DISTINCT r.rating_id) >= 50
ORDER BY avg_rating DESC, num_ratings DESC
LIMIT 10;
"""

conn = get_connection(DB_NAME)
result_df = pd.read_sql(sample_query, conn)
conn.close()

print("\n🏆 Top 10 Highest-Rated Movies (minimum 50 ratings):")
result_df

---
## 🎉 Congratulations!

Your CymbalFlix database is ready! Here's what you've accomplished:

✅ Connected to AlloyDB using **IAM authentication** (no passwords!)  
✅ Created a dedicated database with AI extensions enabled  
✅ Built a normalized schema for movies, users, ratings, and tags  
✅ Loaded and transformed MovieLens data  
✅ Added AI-generated summaries and vector embeddings  

### Security Note

Notice how we never handled a database password? That's IAM authentication in action:
- Your Google Cloud identity IS your database identity
- The Auth Proxy automatically manages secure token exchange
- No credentials to rotate, leak, or manage

This is the **production-ready** way to handle database authentication in Google Cloud.

### What's Next?

In the upcoming lab modules, you'll:

1. **Create a ScaNN Index** - Enable lightning-fast vector similarity search
2. **Run Semantic Searches** - Find movies by meaning, not just keywords
3. **Use AI SQL Functions** - Apply Gemini intelligence directly in your queries
4. **Enable the Columnar Engine** - Accelerate analytical queries by up to 100x

Your database is now ready to power an AI-driven movie discovery experience! 🎬🤖

In [None]:
# Cleanup: Stop the Auth Proxy when you're done
# Uncomment the line below if you want to stop the proxy
# proxy_process.terminate()
# print("Auth Proxy stopped.")