# Stream Scouter

This notebook demonstrates a recommendation system built on Sentence Transformers. We load a dataset of Netflix titles and their descriptions, compute an embedding for each description, and then, given a user query, return the titles whose descriptions have the highest cosine similarity to the query.

**Libraries used:**
- **torch**: Runs the model and selects a CUDA GPU when one is available.
- **pandas & numpy**: For data manipulation and ranking similarity scores.
- **sentence_transformers**: To obtain pre-trained models that compute sentence embeddings.
- **rich**: For displaying the recommendations in a colorful and formatted table.

Follow the cells to see how the data is loaded and processed and how recommendations are generated.

## Installing Requirements

In [None]:
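# Install the CUDA 12.6 build of PyTorch first; the second command then reuses it
# instead of pulling the default torch build from PyPI as a dependency
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
!pip install pandas numpy sentence_transformers rich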

In [None]:
# Import necessary libraries
import torch
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer, util
from rich.console import Console
from rich.table import Table
from rich import box

# Determine whether to use a GPU (CUDA) if available or fall back to CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
console = Console()
console.print(f"[bold green]Using device: {device}[/bold green]")

## Loading and Preparing the Data

We load the `netflix_titles.csv` file into a pandas DataFrame. Because recommendations are based on descriptions, we drop any rows that have no description and reset the index so that index labels match row positions again.

In [None]:
# Load the Netflix dataset and remove entries with missing descriptions
df = pd.read_csv('netflix_titles.csv')
df = df.dropna(subset=['description']).reset_index(drop=True)
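To see what the cleaning step does, here is a minimal sketch on a toy DataFrame. The column names mirror the dataset, but the rows are made up for illustration and are not taken from `netflix_titles.csv`.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the real dataset
toy_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "description": ["first", None, "third", None],
})

# dropna keeps the original index labels of the surviving rows
dropped = toy_df.dropna(subset=["description"])
print(dropped.index.tolist())

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels
cleaned = dropped.reset_index(drop=True)
print(cleaned.index.tolist())
```

After the reset, the label of each row equals its position, which is also the position of its embedding in the tensor computed in the next section.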

## Loading the Pre-trained Model and Computing Embeddings

We use the multilingual model `paraphrase-multilingual-MiniLM-L12-v2` from Sentence Transformers. This model maps text in many languages into a shared embedding space, so a query does not have to be written in the same language as the descriptions. We then compute an embedding for every movie/TV show description in the dataset.

In [None]:
# Load the pre-trained Sentence Transformer model on the specified device
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2', device=device)

# Extract descriptions from the DataFrame and compute embeddings
descriptions = df['description'].tolist()
desc_embeddings = model.encode(descriptions, convert_to_tensor=True, device=device)

## Recommendation Function

The `recommend_movie` function takes a user query and returns the `top_n` movies/TV shows whose description embeddings have the highest cosine similarity to the query embedding.

**Steps:**
1. Compute the embedding for the query.
2. Compute cosine similarity scores between the query and all movie descriptions.
3. Identify the indices corresponding to the top recommendations.
4. Return a DataFrame with the recommended titles and additional information.
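Steps 2 and 3 can be sketched with plain NumPy on toy vectors. This is an illustration of the idea, not the notebook's implementation: the function `cosine_similarity` and the three-dimensional "embeddings" below are hypothetical stand-ins for `util.cos_sim` and real model output.

```python
import numpy as np

def cosine_similarity(query_vec, matrix):
    """Cosine similarity between one vector and each row of a matrix."""
    query_unit = query_vec / np.linalg.norm(query_vec)
    row_units = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    # The dot product of unit vectors is the cosine of the angle between them
    return row_units @ query_unit

# Toy 3-dimensional "embeddings" (made-up values, not model output)
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])
toy_query = np.array([1.0, 0.2, 0.0])

scores = cosine_similarity(toy_query, toy_embeddings)
ranking = np.argsort(-scores)  # indices ordered from best to worst match
print(scores.round(3), ranking)
```

Row 0 points almost in the same direction as the query, so it ranks first; row 1 is nearly orthogonal to it and ranks last.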

In [None]:
def recommend_movie(query, model, desc_embeddings, df, top_n=5):
    # Compute embedding for the query
    query_embedding = model.encode(query, convert_to_tensor=True, device=device)
    
    # Calculate cosine similarity between the query and all movie descriptions
    cosine_scores = util.cos_sim(query_embedding, desc_embeddings)[0]
    
    # Move scores to CPU and convert to a numpy array for further processing
    cosine_scores_cpu = cosine_scores.cpu().numpy()
    
    # Get indices of the top_n most similar descriptions (clamped to the dataset size)
    top_n = min(top_n, len(cosine_scores_cpu))
    top_indices = np.argpartition(-cosine_scores_cpu, range(top_n))[:top_n]
    top_indices = top_indices[np.argsort(-cosine_scores_cpu[top_indices])]
    
    # Prepare a DataFrame with the top recommendations and compute match percentage
    recommended_df = df.iloc[top_indices][['title', 'description', 'listed_in', 'release_year']].copy()
    recommended_df['match_percentage'] = cosine_scores_cpu[top_indices] * 100
    return recommended_df
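The top-`n` selection can be checked in isolation. The sketch below, using made-up scores, compares `np.argpartition` with a list of `kth` positions against a full `np.argsort`; both should yield the same best-first indices when there are no ties.

```python
import numpy as np

# Hypothetical similarity scores for six items
scores = np.array([0.12, 0.87, 0.45, 0.91, 0.30, 0.66])
top_n = 3

# Negating the scores turns "largest first" into "smallest first".
# With kth=[0, 1, 2], NumPy puts the elements at those positions into
# their final sorted places without fully sorting the rest of the array.
partial = np.argpartition(-scores, list(range(top_n)))[:top_n]

# Reference result: sort everything, then keep the first top_n indices
full = np.argsort(-scores)[:top_n]

print(partial, full)
```

For a dataset of a few thousand titles either approach is fast; partial selection mainly pays off when the score array is very large and `top_n` is small.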

## Displaying Recommendations

The `display_recommendations` function uses the `rich` library to render the recommendations as a styled table. Each row shows the movie/TV show title, description, genre (`listed_in`), release year, and match percentage (the cosine similarity multiplied by 100).

In [None]:
def display_recommendations(recommendations: pd.DataFrame):
    table = Table(
        title="[bold bright_blue]Recommended Movies/TV Shows[/bold bright_blue]",
        title_style="bold underline",
        box=box.DOUBLE_EDGE,
        border_style="bright_green",
        show_lines=True,
        padding=(0, 1)
    )

    # Define table columns with styling
    table.add_column("Name", style="bold cyan", no_wrap=False)
    table.add_column("Description", style="green", no_wrap=False, overflow="fold", justify="left")
    table.add_column("Listed in", style="magenta", no_wrap=False)
    table.add_column("Release Year", style="yellow", no_wrap=True)
    table.add_column("Match", style="bright_red", no_wrap=True)

    # Add each recommendation as a row in the table
    for _, row in recommendations.iterrows():
        table.add_row(
            f"[bold]{row['title']}[/bold]",
            row['description'],
            row['listed_in'],
            str(row['release_year']),
            f"{row['match_percentage']:.2f}%"
        )

    console.print(table)

## Interactive Query Loop

The final cell runs an interactive loop in which the user enters a query (e.g., a movie description or keywords) and receives the top recommendations ranked by semantic similarity. Type `exit` to stop the loop.

In [None]:
# Interactive loop to receive user queries and display recommendations
while True:
    # Strip surrounding whitespace so that inputs like " exit " still stop the loop
    query = input("Enter query (or 'exit' to stop): ").strip()
    if query.lower() == 'exit':
        break
    # Retrieve top 3 recommendations for the query
    recommendations = recommend_movie(query, model, desc_embeddings, df, top_n=3)
    display_recommendations(recommendations)
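For scripted runs or tests, where `input()` is inconvenient, the same loop logic can be driven from a list of queries. The sketch below is one possible variant; `is_exit_command` and `run_queries` are hypothetical helpers that are not part of the notebook, and `handle_query` stands in for whatever you want to do with each query (for example, calling `recommend_movie` and `display_recommendations`).

```python
def is_exit_command(text: str) -> bool:
    """Return True when the input means 'stop', ignoring case and whitespace."""
    return text.strip().lower() == "exit"

def run_queries(queries, handle_query):
    """Pass each query to handle_query until an exit command appears.

    Blank inputs are skipped. Returns the number of queries handled.
    """
    handled = 0
    for raw in queries:
        if is_exit_command(raw):
            break
        query = raw.strip()
        if not query:
            continue
        handle_query(query)
        handled += 1
    return handled

# Example run with a stand-in handler that only records what it received
seen = []
count = run_queries(["space heist", "   ", "  Exit ", "never reached"], seen.append)
print(count, seen)
```

Separating the stop condition from the I/O keeps the loop easy to test without a live kernel prompt.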