# QELM Training on Google Colab

This notebook trains the embedding-based QELM system on GPU.

**What it does:**
1. Clones the code from GitHub
2. Installs dependencies
3. Mounts Google Drive for checkpoints
4. Trains Stage 1 (supervised pretraining)
5. Trains the two-tower recommender
6. Tests the full system

**Prerequisites:**
- Push your code to GitHub
- Set OpenAI API key (for question generation)
- Use GPU runtime (Runtime → Change runtime type → GPU)
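
The prerequisites can also be checked from Python before any training cell runs. The sketch below uses only the standard library. The helper name `check_prerequisites` is hypothetical and not part of the QELM repository, and finding `nvidia-smi` on the PATH is only a rough proxy for a GPU runtime.

```python
import os
import shutil


def check_prerequisites(env=None):
    """Return a dict of basic environment checks (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # nvidia-smi on PATH is only a rough proxy for a GPU runtime.
        "gpu_tool_found": shutil.which("nvidia-smi") is not None,
        # Only checks that the variable is present and non-empty.
        "openai_key_set": bool(env.get("OPENAI_API_KEY")),
    }


status = check_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

The API key is set in a later Setup cell, so `openai_key_set` is expected to report `missing` until that cell has run.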

## 1. Setup

In [None]:
# Check GPU availability
!nvidia-smi

### Clone Code

**For private repos**, you have 3 options:
1. Make the repo public temporarily
2. Use a Personal Access Token (see cell below for instructions)
3. Upload code to Google Drive and copy from there

**For public repos**, just run the clone command directly.

In [None]:
# Clone repository
# Option 1: Public repo (easiest)
!git clone https://github.com/makarovaalexa-brch/qelm-crs.git

# Option 2: Private repo with personal access token
# Create token at: https://github.com/settings/tokens (select 'repo' scope)
# Then use: !git clone https://YOUR_TOKEN@github.com/makarovaalexa-brch/qelm-crs.git

# Option 3: Upload files from Google Drive instead
# Uncomment if you've already uploaded the code to Drive:
# !cp -r /content/drive/MyDrive/qelm-crs /content/

# Change to directory
%cd qelm-crs

In [None]:
# Install dependencies
!pip install -q torch torchvision torchaudio
!pip install -q sentence-transformers
!pip install -q openai
!pip install -q pandas numpy scikit-learn tqdm
!pip install -q python-dotenv

In [None]:
# Mount Google Drive for checkpoints
from google.colab import drive
drive.mount('/content/drive')

# Create checkpoint directory
!mkdir -p /content/drive/MyDrive/qelm_checkpoints

In [None]:
# Set OpenAI API key
import os
from getpass import getpass

# Enter your OpenAI API key when prompted
api_key = getpass('Enter OpenAI API Key: ')
os.environ['OPENAI_API_KEY'] = api_key

## 2. Prepare Data

### Download MovieLens Dataset

We'll use a filtered subset for faster training:
- Top 1000 most-rated movies
- Users with 50+ ratings on those movies (active users)

In [None]:
# Download MovieLens 25M dataset
!wget -nc https://files.grouplens.org/datasets/movielens/ml-25m.zip
!unzip -n ml-25m.zip -d data/

# Filter dataset
import pandas as pd

# Load full dataset
ratings = pd.read_csv('data/ml-25m/ratings.csv')
movies = pd.read_csv('data/ml-25m/movies.csv')

print(f"Original: {len(ratings)} ratings, {len(movies)} movies, {ratings['userId'].nunique()} users")

# Filter: Top 1000 most-rated movies
movie_counts = ratings['movieId'].value_counts()
top_movies = movie_counts.head(1000).index
movies_filtered = movies[movies['movieId'].isin(top_movies)]

# Filter: Users with 50+ ratings on these top movies
ratings_filtered = ratings[ratings['movieId'].isin(top_movies)]
user_counts = ratings_filtered['userId'].value_counts()
active_users = user_counts[user_counts >= 50].index
ratings_filtered = ratings_filtered[ratings_filtered['userId'].isin(active_users)]

print(f"Filtered: {len(ratings_filtered)} ratings, {len(movies_filtered)} movies, {len(active_users)} users")

# Save filtered data
movies_filtered.to_csv('data/ml-25m/movies_filtered.csv', index=False)
ratings_filtered.to_csv('data/ml-25m/ratings_filtered.csv', index=False)

print(f"\n✓ Saved filtered dataset to data/ml-25m/")

In [None]:
# Create sample Reddit data (or scrape real data)
%cd /content/qelm-crs

# Option 1: Use 5 sample posts (fast, for testing pipeline)
!python src/qelm/data/reddit_scraper.py --sample --output-dir data/reddit

# Option 2: Scrape real Reddit data (100 posts per subreddit, ~500 total)
# Uncomment for actual training:
# !python src/qelm/data/reddit_scraper.py --max-posts 100 --output-dir data/reddit

In [None]:
# Verify data
import json

with open('data/reddit/sample_conversations.json', 'r') as f:
    data = json.load(f)
    
print(f"Loaded {len(data)} sample conversation pairs")
print("\nExample:")
print(json.dumps(data[0], indent=2))

## 3. Train Stage 1 (Supervised Pretraining)

This teaches the RL actor to predict embeddings for **what to ask next**.

**Key insight:** 
- Input: User's preference ("I love Inception")
- Target: Concepts from helpful response ("Have you seen Interstellar?")
- Model learns: User preference → Related concepts to explore

This is NOT just echoing the user - it's learning to ask exploratory follow-up questions!
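
The summary later in this notebook states that Stage 1 is trained with an MSE loss on conversation pairs. The sketch below shows that objective on toy 3-dimensional vectors in plain Python. The real system works with 384-dimensional SentenceBERT embeddings; the vectors and the `mse_loss` helper here are made up for illustration and are not the repository's training code.

```python
def mse_loss(predicted, target):
    """Mean squared error between two equal-length vectors."""
    if len(predicted) != len(target):
        raise ValueError("vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Toy stand-ins: the actor's output for "I love Inception" and the
# embedding of the concepts in the helpful reply ("Have you seen Interstellar?").
predicted_embedding = [0.2, 0.7, 0.1]
target_embedding = [0.3, 0.9, 0.0]

loss = mse_loss(predicted_embedding, target_embedding)
print(f"MSE loss: {loss:.4f}")
```

A lower loss means the predicted embedding sits closer to the concepts found in the helpful response, which is what "learning what to ask next" amounts to at this stage.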

In [None]:
# Import modules
import sys
sys.path.append('/content/qelm-crs/src')

from qelm.models.embedding_qelm import SentenceBERTEmbeddingSpace, EmbeddingActorCritic
from qelm.training.stage1_supervised import Stage1Trainer
from sentence_transformers import SentenceTransformer

print("Imports successful!")

In [None]:
# Initialize components
print("Initializing SentenceBERT embedding space...")
embedding_space = SentenceBERTEmbeddingSpace(movielens_data_path=None)

print("\nInitializing RL actor...")
rl_agent = EmbeddingActorCritic(
    state_dim=384,  # SentenceBERT
    embedding_dim=384  # SentenceBERT
)

print("\nInitializing encoder...")
encoder = SentenceTransformer('all-MiniLM-L6-v2')

print("\n✓ Initialization complete")

In [None]:
# Create trainer
trainer = Stage1Trainer(
    rl_agent=rl_agent,
    embedding_space=embedding_space,
    reddit_data_path='data/reddit',
    encoder=encoder
)

In [None]:
# Train Stage 1
trainer.train(
    epochs=20,  # More epochs on GPU
    batch_size=64,  # Larger batch on GPU
    learning_rate=0.001
)

In [None]:
# Evaluate Stage 1
trainer.evaluate(num_samples=10)

In [None]:
# Save checkpoint to Google Drive
import torch
from pathlib import Path

checkpoint_dir = Path('/content/drive/MyDrive/qelm_checkpoints')
checkpoint_path = checkpoint_dir / 'stage1_final.pt'

torch.save({
    'actor_state_dict': rl_agent.actor.state_dict(),
    'train_losses': trainer.train_losses,
}, checkpoint_path)

print(f"✓ Saved checkpoint to: {checkpoint_path}")

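## 4. Train Two-Tower Recommender

In [None]:
# Initialize recommender with REAL MovieLens data
# NOTE: MovieCatalog, TwoTowerRecommender, and RecommenderTrainer must already be
# imported from the qelm package; this notebook does not show that import.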
movie_catalog = MovieCatalog(movielens_data_path='data/ml-25m')

recommender = TwoTowerRecommender(
    state_dim=384,
    embedding_dim=128
)

rec_trainer = RecommenderTrainer(recommender, movie_catalog, encoder)

print(f"✓ Loaded {len(movie_catalog.movies)} movies")

In [None]:
# Create training data from MovieLens ratings
# For each user, create a "conversation" describing their preferences

def create_user_conversations(ratings_df, movies_df, num_users=100):
    """
    Convert user ratings into conversation-like descriptions.
    
    For each user:
    - Sample their top-rated movies
    - Create a text description of their preferences
    - Return (conversation_text, liked_movie_ids)
    """
    conversations = []
    liked_movies = []
    
    # Take the first num_users user IDs (deterministic, not a random sample)
    sampled_users = ratings_df['userId'].unique()[:num_users]
    
    for user_id in sampled_users:
        user_ratings = ratings_df[ratings_df['userId'] == user_id]
        
        # Get movies they rated highly (>= 4.0)
        high_ratings = user_ratings[user_ratings['rating'] >= 4.0]
        
        if len(high_ratings) < 3:
            continue
        
        # Sample 3-5 movies they liked
        sample_size = min(5, len(high_ratings))
        sampled = high_ratings.sample(sample_size)
        
        # Get movie titles
        movie_ids = sampled['movieId'].tolist()
        movie_titles = []
        for mid in movie_ids[:3]:  # Use first 3 for description
            movie_row = movies_df[movies_df['movieId'] == mid]
            if len(movie_row) > 0:
                title = movie_row.iloc[0]['title']
                movie_titles.append(title.split('(')[0].strip())
        
        # Create conversation text
        if len(movie_titles) >= 2:
            conversation = f"I really enjoyed {movie_titles[0]} and {movie_titles[1]}"
            if len(movie_titles) > 2:
                conversation += f", as well as {movie_titles[2]}"
            
            conversations.append(conversation)
            liked_movies.append(movie_ids)
    
    return conversations, liked_movies

# Load filtered ratings
ratings_df = pd.read_csv('data/ml-25m/ratings_filtered.csv')
movies_df = pd.read_csv('data/ml-25m/movies_filtered.csv')

# Generate training data
sample_conversations, sample_liked_movies = create_user_conversations(
    ratings_df, 
    movies_df, 
    num_users=200  # Use 200 users for training
)

print(f"Created {len(sample_conversations)} training examples from real user data")
print(f"\nExample conversation:")
print(f"  Text: {sample_conversations[0]}")
print(f"  Liked movies: {sample_liked_movies[0][:5]}")

In [None]:
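# Alternative: initialize recommender with built-in sample data.
# Skip this cell to keep the real MovieLens setup above, since it
# overwrites movie_catalog, recommender, and rec_trainer.
movie_catalog = MovieCatalog(movielens_data_path=None)  # Use sample data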

recommender = TwoTowerRecommender(
    state_dim=384,
    embedding_dim=128
)

rec_trainer = RecommenderTrainer(recommender, movie_catalog, encoder)

In [None]:
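# Alternative: hand-written sample training data.
# Skip this cell to keep the MovieLens-derived examples above, since it
# overwrites sample_conversations and sample_liked_movies.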
sample_conversations = [
    "I love action movies with great cinematography like The Dark Knight",
    "I enjoy mind-bending sci-fi films like Inception and Interstellar",
    "I prefer dark crime dramas with great dialogue like Pulp Fiction",
    "I like sci-fi movies with philosophical themes like The Matrix",
    "I want intense space exploration films like Interstellar",
] * 10  # Repeat for more training data

sample_liked_movies = [
    [1],  # The Dark Knight
    [2, 5],  # Inception, Interstellar
    [3],  # Pulp Fiction
    [4],  # The Matrix
    [5],  # Interstellar
] * 10

print(f"Training on {len(sample_conversations)} examples")

In [None]:
# Train recommender
rec_trainer.train(
    conversations=sample_conversations,
    liked_movies=sample_liked_movies,
    epochs=20,
    batch_size=32
)

In [None]:
# Save recommender checkpoint
rec_checkpoint_path = checkpoint_dir / 'recommender_final.pt'

torch.save({
    'user_tower': recommender.user_tower.state_dict(),
    'item_tower': recommender.item_tower.state_dict(),
}, rec_checkpoint_path)

print(f"✓ Saved recommender to: {rec_checkpoint_path}")

## 5. Test Full System

**What we've trained so far (all SUPERVISED, no RL yet):**

1. **RL Actor Network** (Stage 1):
   - Trained with MSE loss on conversation pairs
   - Predicts embeddings for next concepts to explore
   - NOT using reinforcement learning yet (that's Stage 3)

2. **Two-Tower Recommender**:
   - Trained with BPR loss on user preferences
   - Maps conversation → movie recommendations

**NOT trained yet:**
- RL Critic (value function) - only needed for Stage 3 RL
- User simulator - exists but not used yet
- Stage 3 end-to-end RL with rewards
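
For reference, BPR (Bayesian Personalized Ranking) loss rewards the model for scoring a liked item above a non-liked one: for one pair it is `-log(sigmoid(pos_score - neg_score))`. The sketch below computes it for a single pair in plain Python. The function name and the example scores are illustrative, and how the repository samples negatives or batches pairs is not shown in this notebook.

```python
import math


def bpr_loss(pos_score, neg_score):
    """BPR loss for one (liked, not-liked) pair: -log(sigmoid(pos - neg))."""
    diff = pos_score - neg_score
    # softplus(-diff) equals -log(sigmoid(diff)); this form avoids overflow.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The liked movie scored higher than the other one: small loss.
print(f"correct order: {bpr_loss(2.0, 0.5):.4f}")
# The liked movie scored lower: larger loss.
print(f"wrong order:   {bpr_loss(0.5, 2.0):.4f}")
```

The loss depends only on the score difference, so it trains the ranking of items rather than their absolute scores.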

In [None]:
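# Simulate a conversation with REAL user simulator
# NOTE: this cell uses `qelm`, which is created in the "Test full QELM system"
# cell below; run that cell first.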
print("\n" + "="*60)
print("QELM CONVERSATION DEMO (with User Simulator)")
print("="*60)

# Initialize user simulator with real MovieLens data
from qelm.agents.user_simulator import MovieLensLLMSimulator

user_sim = MovieLensLLMSimulator(
    movielens_data_path='data/ml-25m',
    model_name='gpt-4o-mini',  # Use GPT-4o-mini for simulation
    min_ratings=50
)

# Sample a user profile
user_profile = user_sim.sample_user()
print(f"\n👤 Simulated User Profile:")
print(f"   Liked: {', '.join(user_profile['liked_movies'][:3])}")
print(f"   Genres: {', '.join([g[0] for g in user_profile['preferred_genres']])}")

# System opens with an initial question; the simulated user answers it
initial_question = "Hi! What kind of movies do you enjoy?"
response1 = user_sim.simulate_response(initial_question, user_profile)
qelm.process_user_response(response1)
print(f"\n🤖 QELM: {initial_question}")
print(f"👤 User: {response1}")

# System asks first question based on user input
question1 = qelm.select_next_question(explore=False, verbose=True)
print(f"\n🤖 QELM: {question1}")

# User responds with simulator
response2 = user_sim.simulate_response(question1, user_profile)
qelm.process_user_response(response2)
print(f"👤 User: {response2}")

# System asks second question
question2 = qelm.select_next_question(explore=False, verbose=True)
print(f"\n🤖 QELM: {question2}")

# User responds again
response3 = user_sim.simulate_response(question2, user_profile)
qelm.process_user_response(response3)
print(f"👤 User: {response3}")

# Get final recommendations based on full conversation
conv_state = qelm.encode_conversation_state()
final_recs = rec_trainer.recommend(conv_state, top_k=10)

print(f"\n\n📽️ FINAL RECOMMENDATIONS:")
for i, (movie_id, title, score) in enumerate(final_recs, 1):
    print(f"{i}. {title}: {score:.3f}")

# Check if any user's liked movies are in recommendations
user_liked_titles = set(user_profile['liked_movies'])
recommended_titles = set([title for _, title, _ in final_recs])
hits = user_liked_titles & recommended_titles

if hits:
    print(f"\n✅ SUCCESS: Recommended {len(hits)} movies user actually likes!")
    print(f"   Hits: {', '.join(list(hits)[:3])}")
else:
    print("\n⚠️  No direct hits in the top-10 recommendations")
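
The hit check in the cell above looks at one simulated user. To compare runs, the same idea can be expressed as two small metrics: whether any liked title appears in the top k, and what fraction of liked titles appear there. The sketch below is a plain-Python illustration; the function names and toy titles are hypothetical and not part of the QELM code.

```python
def hit_at_k(recommended, liked, k):
    """True if at least one liked item appears in the top-k recommendations."""
    return bool(set(recommended[:k]) & set(liked))


def recall_at_k(recommended, liked, k):
    """Fraction of liked items that appear in the top-k recommendations."""
    liked = set(liked)
    if not liked:
        return 0.0
    return len(set(recommended[:k]) & liked) / len(liked)


# Toy ranked list (best first) and a toy set of liked titles.
recommended = ["Interstellar", "The Matrix", "Memento", "Heat"]
liked = {"Inception", "Memento", "Heat"}

print(hit_at_k(recommended, liked, k=2))
print(round(recall_at_k(recommended, liked, k=4), 3))
```

Both metrics compare titles by exact string match, as the cell above does, so differently formatted titles for the same movie would count as misses.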

In [None]:
# Test full QELM system
from qelm.models.embedding_qelm import EmbeddingQLEM

# Note: This will use GPT for question generation (requires API key)
qelm = EmbeddingQLEM(movielens_data_path=None)

# Load trained RL weights
qelm.rl_agent.actor.load_state_dict(
    torch.load(checkpoint_path, weights_only=False)['actor_state_dict']
)

print("✓ Loaded trained RL actor")

In [None]:
# Simulate a conversation
print("\n" + "="*60)
print("QELM CONVERSATION DEMO")
print("="*60)

# User starts the conversation
response1 = "I love Christopher Nolan films, especially Inception and Interstellar"
qelm.process_user_response(response1)
print(f"\n👤 User: {response1}")

# System asks first question based on user input
question1 = qelm.select_next_question(explore=False, verbose=True)
print(f"\n🤖 QELM: {question1}")

# User responds
response2 = "I also enjoy psychological thrillers with complex narratives"
qelm.process_user_response(response2)
print(f"\n👤 User: {response2}")

# System asks second question
question2 = qelm.select_next_question(explore=False, verbose=True)
print(f"\n🤖 QELM: {question2}")

# Get recommendations based on full conversation
conv_state = qelm.encode_conversation_state()
final_recs = rec_trainer.recommend(conv_state, top_k=5)

print(f"\n\n📽️ RECOMMENDATIONS:")
for i, (movie_id, title, score) in enumerate(final_recs, 1):
    print(f"{i}. {title}")
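
The internals of `rec_trainer.recommend` are not shown in this notebook. In a typical two-tower setup, the user tower and the item tower map into a shared space and items are ranked by a similarity score such as the dot product. The sketch below illustrates only that ranking step on toy 2-dimensional vectors. The function name, titles, and vectors are hypothetical, and this is not the repository's implementation.

```python
def top_k_by_dot_product(user_vec, item_vecs, k):
    """Rank items by dot product with the user vector; return top-k (title, score)."""
    scores = [
        (title, sum(u * v for u, v in zip(user_vec, vec)))
        for title, vec in item_vecs.items()
    ]
    # Sort by descending score; ties are broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Toy embeddings standing in for the outputs of the two towers.
user_vec = [1.0, 0.5]
item_vecs = {
    "Inception": [0.9, 0.4],
    "Pulp Fiction": [0.1, 0.9],
    "The Matrix": [0.8, 0.2],
}

for title, score in top_k_by_dot_product(user_vec, item_vecs, k=2):
    print(f"{title}: {score:.3f}")
```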

## 6. Download Checkpoints (Optional)

In [None]:
# Download checkpoints to local machine
from google.colab import files

# Zip checkpoints
!zip -r qelm_checkpoints.zip /content/drive/MyDrive/qelm_checkpoints/

# Download
files.download('qelm_checkpoints.zip')

## Summary

**What we trained:**
1. ✅ Stage 1: RL actor to predict semantic embeddings
2. ✅ Two-Tower Recommender: Dialogue state → Movie recommendations

**Checkpoints saved to:**
- `/content/drive/MyDrive/qelm_checkpoints/stage1_final.pt`
- `/content/drive/MyDrive/qelm_checkpoints/recommender_final.pt`

**Next steps:**
- Train the recommender on more real MovieLens data (the cells above use at most 200 users, or hand-written samples)
- Scrape more Reddit conversations
- Train Stage 3 (end-to-end RL with reward)