## Fine-Tuning an LLM for Movie Recommendations

#### Goals
- Improve the model's relevance in generating recommendations.
- Address challenges of domain-specific language and sparse user data.
- Create a deployable recommendation pipeline.

#### Requirements

- A pre-trained LLM (e.g., GPT-3, GPT-4, or an open-source model like GPT-J). This notebook uses `amd/AMD-OLMo-1B-SFT`.
- A fine-tuning framework such as Hugging Face Transformers.
- A movie dataset (e.g., MovieLens or IMDb).
- Compute resources (GPU recommended).

In [20]:
# %pip install --upgrade datasets
# %pip install datasets accelerate
# %pip install protobuf==3.20.*
# %pip install peft==0.10.0
# %pip install transformers==4.37.2
# %pip install --upgrade "transformers>=4.37.0"
# %pip install rapidfuzz
# %pip show peft transformers datasets

In [21]:
import yaml
import os
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'

# Read the YAML file
with open('./../../../Curify/curify_api.yaml', 'r') as yaml_file:
    data = yaml.safe_load(yaml_file)

#### Step 1: Preprocess MovieLens Data and Format Input-Output Pairs

Use a dataset containing:

- Movie descriptions (e.g., summaries, genres, metadata).
- User-item interaction data (e.g., ratings, reviews, watch history).

Input-Output Pair Formatting:

- Input: A prompt with user context, such as:

`User preferences: [list of liked movies]. Recommend 3 movies based on their taste.`

- Output: A list of relevant recommendations or a response explaining the recommendations.
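
The pair format above can be sketched as a small helper. This is only an illustration, not the notebook's pipeline: `format_pair` and the sample titles are hypothetical, and the semicolon-separated response is an assumption chosen so the output is easy to parse.

```python
def format_pair(liked_titles, target_titles, n_recommend=3):
    """Build one prompt/response training pair (hypothetical helper)."""
    prompt = (
        f"User preferences: {', '.join(liked_titles)}. "
        f"Recommend {n_recommend} movies based on their taste."
    )
    # Semicolons separate the targets because some titles contain commas.
    response = "; ".join(target_titles[:n_recommend])
    return {"prompt": prompt, "response": response}


pair = format_pair(
    ["Toy Story (1995)", "Heat (1995)"],
    ["Jumanji (1995)", "Casino (1995)", "Apollo 13 (1995)", "Babe (1995)"],
)
print(pair["prompt"])
print(pair["response"])
```

The fine-tuning cell further down follows the same idea but also includes disliked movies and orders interactions by timestamp.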

In [22]:
# # Define download URL and target directory
# url="https://files.grouplens.org/datasets/movielens/ml-1m.zip"  # ml-1m matches the folder read later in this notebook
# movielens_zip="./../Data/movielens.zip"
# target_dir="./../Data/"

# # Ensure the target directory exists
# mkdir -p $target_dir

# # Download the dataset
# wget -q $url -O $movielens_zip

# # Unzip the dataset to the target directory
# unzip -q $movielens_zip -d $target_dir

# echo "Dataset downloaded and extracted to $target_dir"

In [23]:
def prepare_finetuning_data(data, user_id, n_history=5, n_target=3):
    """
    Prepares prompt-response pairs for fine-tuning a movie recommendation LLM.
    - Uses a few liked/disliked movies as history.
    - Predicts the next liked movies.
    """
    user_data = data[data['userId'] == user_id].sort_values(by='timestamp')

    # Ensure enough data for history + target
    if len(user_data) < (n_history + n_target):
        return None

    # Split: the first (oldest) n_history interactions form the history; the rest is the future
    past = user_data.iloc[:n_history]
    future = user_data.iloc[n_history:]

    liked = past[past['rating'] >= 4]['title'].tolist()
    disliked = past[past['rating'] <= 2]['title'].tolist()
    targets = future[future['rating'] >= 4]

    if targets.empty:
        return None

    # Prompt construction
    parts = ["You are a helpful movie recommendation assistant."]
    if liked:
        parts.append("The user liked: " + ", ".join(liked) + ".")
    if disliked:
        parts.append("The user disliked: " + ", ".join(disliked) + ".")

    parts.append("Recommend new movies the user might enjoy next. Don't repeat any from the history. Output only movie titles separated by semicolons (`;`).")

    prompt = " ".join(parts)
    response = "; ".join(targets['title'].tolist())

    return {
        "user_id": user_id,
        "prompt": prompt,
        "history": parts[1:-1],  # optional: useful for debugging
        "response": response,
        "liked_movies": targets['movieId'].tolist()
    }


#### Step 2: LoRA (Parameter-Efficient Fine-Tuning)

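LoRA (Low-Rank Adaptation) reduces memory and compute by freezing the pre-trained weights and training only small low-rank matrices injected into selected layers (here, the `q_proj` and `v_proj` attention projections).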
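
To give a rough sense of the savings: LoRA leaves a weight matrix `W` of shape `d_out x d_in` frozen and learns an update `B @ A`, where `A` is `r x d_in` and `B` is `d_out x r`. The sketch below only counts parameters. The 2048 x 2048 projection size is an assumption for illustration, not a figure taken from the model used here, and `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters in the two low-rank factors A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


# Hypothetical 2048 x 2048 projection; r=8 matches the rank used in this notebook.
d_in, d_out, r = 2048, 2048, 8
full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, r)
print(f"full weight: {full:,} parameters")
print(f"LoRA factors: {lora:,} parameters ({100 * lora / full:.2f}% of full)")
```

Under these assumptions the trainable factors are well under 1% of the size of the matrix they adapt, which is where the reduced memory footprint comes from.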


In [24]:
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments
)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import Dataset
import torch
import time

def train_lora_model_olmo(train_samples, output_dir="./lora_finetuned_model", eval_samples=None, base_model_name="amd/AMD-OLMo-1B-SFT"):
    """
    Train a LoRA-adapted OLMo causal LM model on prompt-response movie recommendation data.

    Args:
        train_samples (list): List of samples, each with 'prompt', 'response'.
        output_dir (str): Directory to save the model.
        eval_samples (list): Optional evaluation samples.
        base_model_name (str): Hugging Face model ID.

    Returns:
        model: The trained model.
    """
    try:
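        # Load tokenizer and model
        tokenizer = AutoTokenizer.from_pretrained(base_model_name)
        if tokenizer.pad_token is None:
            # padding="max_length" below needs a pad token; reuse EOS if none is defined
            tokenizer.pad_token = tokenizer.eos_token
        model = AutoModelForCausalLM.from_pretrained(base_model_name)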

        # Apply LoRA configuration
        lora_config = LoraConfig(
            task_type=TaskType.CAUSAL_LM,
            r=8,
            lora_alpha=16,
            lora_dropout=0.1,
            target_modules=["q_proj", "v_proj"]  # Adjust if needed
        )
        model = get_peft_model(model, lora_config)

        # Tokenization function (causal LM style)
        def tokenize_function(example):
            full_text = example["prompt"] + "\n" + example["response"]
            # Return plain lists; the collator below builds the labels from
            # input_ids and masks padding positions with -100
            return tokenizer(
                full_text,
                truncation=True,
                padding="max_length",
                max_length=512
            )

        # Convert to HuggingFace Dataset
        train_dataset = Dataset.from_list(train_samples)
        train_dataset = train_dataset.map(tokenize_function, remove_columns=["prompt", "history", "response"])

        if eval_samples:
            eval_dataset = Dataset.from_list(eval_samples)
            eval_dataset = eval_dataset.map(tokenize_function, remove_columns=["prompt", "history", "response"])
        else:
            split = train_dataset.train_test_split(test_size=0.1)
            train_dataset = split["train"]
            eval_dataset = split["test"]

        # Collator for Causal LM
        data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

        # Training arguments
        training_args = TrainingArguments(
            output_dir=output_dir,
            per_device_train_batch_size=4,
            per_device_eval_batch_size=4,
            num_train_epochs=3,
            evaluation_strategy="epoch",
            save_strategy="epoch",
            logging_dir="./logs",
            save_total_limit=2,
            report_to="none"
        )

        # Trainer
        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            tokenizer=tokenizer,
            data_collator=data_collator
        )

        # Training loop
        start_time = time.time()
        trainer.train()
        print(f"Training completed in {time.time() - start_time:.2f} seconds.")

        trainer.save_model(output_dir)
        print(f"Model saved to {output_dir}")

        return model

    except Exception as e:
        print(f"An error occurred during training: {e}")
        raise  # Re-raise so callers never receive a non-model return value


#### Step 3: Analyze and Compare

- Compare fine-tuning, zero-shot, and few-shot prompting using ranking metrics such as precision@K, recall@K, and NDCG@K.
- Training time for 100 examples: about 293.6 seconds.
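
To make precision@K and recall@K concrete, here is a small worked example for a single user. The movie IDs are made up and `precision_recall_at_k` is a hypothetical helper; the evaluation cell below computes the same quantities per user and averages them.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one user."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


recommended = [10, 20, 30, 40, 50]  # ranked movie IDs (made up)
relevant = {20, 50, 60}             # movies the user actually liked (made up)
p, r = precision_recall_at_k(recommended, relevant, k=5)
print(f"precision@5={p:.2f}, recall@5={r:.2f}")
```

Two of the five recommendations are relevant, so precision@5 is 2/5; two of the three liked movies were retrieved, so recall@5 is 2/3.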

In [25]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from rapidfuzz import process
from sklearn.metrics import ndcg_score
import numpy as np
import re

def map_title_to_id(title, movie_lookup, threshold=80):
    result = process.extractOne(title, movie_lookup.keys())
    if result is None:
        return None
    match, score, _ = result  # Correct unpacking for rapidfuzz
    return movie_lookup[match] if score >= threshold else None

def build_prompt_with_few_shots(few_shot_examples, target_history, num_recommend=20):
    """
    Constructs a prompt with few-shot examples and target user history.
    """
    prompt = "You are a movie recommendation assistant.\n\n"
    prompt += "Here are a few examples:\n\n"

    # Add few-shot examples; 'history' is a list of sentences, so join it into text
    for example in few_shot_examples:
        prompt += " ".join(example["history"]) + "\n"
        prompt += "Recommended movies: " + example["response"] + "\n\n"

    # Add target history and instruction
    prompt += " ".join(target_history) + "\n"
    prompt += f"Based on this history, recommend exactly {num_recommend} movies the user will like.\n"
    prompt += "Do NOT repeat any movies from the user's history.\n"
    prompt += "Output only the movie titles, separated by semicolons `;`, ordered by relevance.\n"
    prompt += "Recommended movies:"

    return prompt

def extract_recommended_titles(generated_text, num_recommend=20):
    """
    Extract movie titles from generated LLM output with robustness to varied separators and formats.
    Handles:
      - Delimiters: newline (\n), semicolon (;), comma (,)
      - Missing parentheses around year: e.g., 'Movie Title 1998' → 'Movie Title (1998)'
    """
    # Step 1: Cut text after 'Recommended movies:'
    match = re.search(r"(Recommended movies:|recommend the following movies:)", generated_text, re.IGNORECASE)
    text_after = generated_text[match.end():].strip() if match else generated_text.strip()

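    # Step 2: Split on newlines and semicolons; fall back to commas only when
    # neither is present, because MovieLens titles such as "Matrix, The (1999)"
    # contain commas themselves
    if ";" in text_after or "\n" in text_after:
        raw_titles = re.split(r"[;\n]", text_after)
    else:
        raw_titles = text_after.split(",")

    # Step 3: Clean whitespace and drop empty entries
    raw_titles = [t.strip() for t in raw_titles if t.strip()]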

    # Step 4: Clean titles and fix missing parentheses for year
    cleaned_titles = []
    for title in raw_titles:
        # Match ending year (e.g., "Movie Title 1998")
        match = re.match(r"^(.*?)(?:\s+(\d{4}))?$", title)
        if match:
            title_name = match.group(1).strip()
            year = match.group(2)
            if year and not title_name.endswith(f"({year})"):
                title_name = f"{title_name} ({year})"
            cleaned_titles.append(title_name)

    # Step 5: Deduplicate while preserving order
    seen = set()
    final_titles = []
    for title in cleaned_titles:
        if title not in seen:
            final_titles.append(title)
            seen.add(title)

    return final_titles[:num_recommend]

def generate_recommendations(model, movie_lookup, eval_samples, num_recommend=20, is_fine_tuned=True, few_shot_samples=None):
    tokenizer = AutoTokenizer.from_pretrained(model.config._name_or_path)
    user_recommendations = {}

    for sample in eval_samples:
        # Build prompt
        if few_shot_samples:
            input_text = build_prompt_with_few_shots(few_shot_samples, sample['history'], num_recommend)
        else:
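            input_text = sample['prompt'] + "\nRecommended movies:"

        # Generate
        inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, top_k=50, top_p=0.95)
        # Decode only the newly generated tokens so titles from the prompt
        # (history and few-shot examples) are not parsed as recommendations
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        generated_text = tokenizer.decode(new_tokens, skip_special_tokens=True)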

        # Extract titles
        generated_titles = extract_recommended_titles(generated_text, num_recommend)

        print("\n[INPUT PROMPT]\n", input_text)
        print("[GENERATED TEXT]\n", generated_text)
        print("[GENERATED TITLES]\n", generated_titles)

        # Fuzzy match to movie IDs
        recommended_movies = []
        for title in generated_titles:
            movie_id = map_title_to_id(title, movie_lookup)
            if movie_id:
                recommended_movies.append(movie_id)

        user_recommendations[sample["user_id"]] = recommended_movies

    return user_recommendations

# Step 2: Evaluate Recommendations at Multiple k's
def evaluate_recommendations(user_recommendations, eval_samples, movie_lookup, ks=[5, 10, 20]):
    results = {k: {"precision": [], "recall": [], "ndcg": []} for k in ks}

    for sample in eval_samples:
        user_id = sample["user_id"]
        if user_id not in user_recommendations:
            continue

        recommended_movies = user_recommendations[user_id]
        ground_truth_movies = sample["liked_movies"]

        if not ground_truth_movies:
            continue

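        for k in ks:
            rec_at_k = recommended_movies[:k]
            relevance = [1 if movie in ground_truth_movies else 0 for movie in rec_at_k]
            # Pad to length k so relevance lines up with predicted_scores
            # when fewer than k titles were matched
            relevance += [0] * (k - len(relevance))

            precision = sum(relevance) / k
            recall = sum(relevance) / len(ground_truth_movies)

            # Scores that decrease with rank encode the recommendation order
            predicted_scores = list(range(k, 0, -1))
            try:
                ndcg = ndcg_score([relevance], [predicted_scores], k=k)
            except ValueError:
                ndcg = 0.0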

            results[k]["precision"].append(precision)
            results[k]["recall"].append(recall)
            results[k]["ndcg"].append(ndcg)

    # Aggregate averages
    aggregated = {
        f"@{k}": {
            "Precision": round(np.mean(results[k]["precision"]), 4),
            "Recall": round(np.mean(results[k]["recall"]), 4),
            "NDCG": round(np.mean(results[k]["ndcg"]), 4)
        }
        for k in ks
    }

    return aggregated
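
The evaluation function above relies on scikit-learn's `ndcg_score`. For intuition, the sketch below computes a binary-relevance NDCG@k by hand, using the standard `1 / log2(rank + 1)` discount and taking the ideal ranking to be the same relevance list sorted in descending order. `ndcg_at_k` is a hypothetical helper for illustration and is not meant as a drop-in replacement for the library call.

```python
import math


def ndcg_at_k(relevance, k):
    """Binary-relevance NDCG@k; the ideal ranking is the same list sorted descending."""
    rel = relevance[:k]
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2)
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rel))
    ideal = sorted(rel, reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


print(ndcg_at_k([1, 0, 0, 0, 0], k=5))  # single hit at rank 1
print(ndcg_at_k([0, 0, 0, 0, 1], k=5))  # single hit at rank 5
```

A hit at the top of the list scores higher than the same hit at the bottom, which is the ranking sensitivity that precision and recall lack.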


In [26]:
import pandas as pd
import json
import time
import random

movielens_folder = "./../Data/ml-1m/"

# Read ratings.dat with the correct delimiter and encoding
ratings = pd.read_csv(
    f"{movielens_folder}ratings.dat", 
    sep="::", 
    engine="python", 
    header=None, 
    names=["userId", "movieId", "rating", "timestamp"], 
    encoding="ISO-8859-1"
)

# Read movies.dat with the correct encoding
movies = pd.read_csv(
    f"{movielens_folder}movies.dat", 
    sep="::", 
    engine="python", 
    header=None, 
    names=["movieId", "title", "genres"], 
    encoding="ISO-8859-1"
)

movie_lookup = movies.set_index('title')['movieId'].to_dict()

# Merge ratings with movie metadata
data = ratings.merge(movies, on="movieId")

# Convert timestamp to datetime format
data["timestamp"] = pd.to_datetime(data["timestamp"], unit="s")

# Example usage
user_samples = [prepare_finetuning_data(data, user_id) for user_id in data['userId'].unique()]
user_samples = [sample for sample in user_samples if sample]
random.shuffle(user_samples)

# Define number of samples
n_train = 500
n_test = 300

# Split into train and eval
train_samples = user_samples[:n_train]
eval_samples = user_samples[n_train:n_train + n_test]

fewshot_samples = random.sample(train_samples, 5)

# Track fine-tuning time
start_finetune = time.time()
fine_tuned_model = train_lora_model_olmo(train_samples, eval_samples=eval_samples)
end_finetune = time.time()
finetune_time = end_finetune - start_finetune

# Load the non-fine-tuned model
non_fine_tuned_model = AutoModelForCausalLM.from_pretrained("amd/AMD-OLMo-1B-SFT")

# Track evaluation time
start_eval = time.time()
reco_FT = generate_recommendations(fine_tuned_model, movie_lookup, eval_samples, is_fine_tuned=True)
reco_zeroshot = generate_recommendations(non_fine_tuned_model, movie_lookup, eval_samples, is_fine_tuned=False)
reco_fewshot = generate_recommendations(non_fine_tuned_model, movie_lookup, eval_samples, is_fine_tuned=False, few_shot_samples=fewshot_samples)

eval_FT = evaluate_recommendations(reco_FT, eval_samples, movie_lookup)
eval_zeroshot = evaluate_recommendations(reco_zeroshot, eval_samples, movie_lookup)
eval_fewshot = evaluate_recommendations(reco_fewshot, eval_samples, movie_lookup)
end_eval = time.time()
eval_time = end_eval - start_eval

# Serialize results with time metrics
results_combined = {
    "Fine-tuning": {
        "metrics": eval_FT,
        "finetune_time_seconds": finetune_time
    },
    "Zeroshot": {
        "metrics": eval_zeroshot
    },
    "Fewshot": {
        "metrics": eval_fewshot
    },
    "Evaluation_time_seconds": eval_time
}
# Save to JSON file
with open("fine_tuning.json", "w") as f:
    json.dump(results_combined, f, indent=2, default=str)



| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | No log        | 1.246373        |


In [None]:
results_combined

