##### Copyright 2025 Google LLC.

In [1]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Movie Recommendation System with Gemini API and Qdrant

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/qdrant/Movie_Recommendation.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

## Overview

The [Gemini API](https://ai.google.dev/models/gemini) provides access to a family of generative AI models for generating content and solving problems. These models are designed and trained to handle both text and images as input.

[Qdrant](https://qdrant.tech/) is an open-source vector similarity search engine designed for efficient and scalable semantic search. It offers a simple yet powerful API to store and search high-dimensional vectors, supports filtering with metadata (payloads), and integrates easily into production systems. Qdrant can be self-hosted or accessed via its managed cloud service, making it quick to set up and ideal for a wide range of AI applications that rely on semantic understanding and retrieval.

In this notebook, you'll learn how to build a movie recommendation system by running a similarity search over movie metadata with the Gemini API and Qdrant.


<!-- Community Contributor Badge -->
<table>
  <tr>
    <!-- Author Avatar Cell -->
    <td bgcolor="#d7e6ff">
      <a href="https://github.com/andycandy" target="_blank" title="View Anand Roy's profile on GitHub">
        <img src="https://github.com/andycandy.png?size=100"
             alt="andycandy's GitHub avatar"
             width="100"
             height="100">
      </a>
    </td>
    <!-- Text Content Cell -->
    <td bgcolor="#d7e6ff">
      <h2><font color='black'>This notebook was contributed by <a href="https://github.com/andycandy" target="_blank"><font color='#217bfe'><strong>Anand Roy</strong></font></a>.</font></h2>
      <h5><font color='black'><a href="https://www.linkedin.com/in/anand-roy-61a2b529b"><font color="#078efb">LinkedIn</font></a> - See <a href="https://github.com/andycandy" target="_blank"><font color="#078efb"><strong>Anand's</strong></font></a> other notebooks <a href="https://github.com/search?q=repo%3Agoogle-gemini%2Fcookbook%20%22Anand%20Roy%22&type=code" target="_blank"><font color="#078efb">here</font></a>.</font></h5><br>
      <!-- Footer -->
      <font color='black'><small><em>Have a cool Gemini example? Feel free to <a href="https://github.com/google-gemini/cookbook/blob/main/CONTRIBUTING.md" target="_blank"><font color="#078efb">share it too</font></a>!</em></small></font>
    </td>
  </tr>
</table>

## Setup

First, you must install the packages and set the necessary environment variables.

### Installation

Install Google's Python client SDK for the Gemini API, `google-genai`. Next, install Qdrant's Python client SDK, `qdrant-client`.

In [2]:
%pip install -q "google-genai>=1.0.0"
%pip install -q protobuf==4.25.1 qdrant-client[fastembed]

### Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [3]:
from google.colab import userdata
from google import genai

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
client = genai.Client(api_key=GOOGLE_API_KEY)

## Building the Movie Vector Index
This section covers preparing the movie dataset, generating embeddings using Gemini, and indexing them in Qdrant for similarity search.

### 1. Load the Dataset from Kaggle

Begin by loading the dataset from Kaggle using the `kagglehub` library. The dataset used in this notebook is the [TMDB Movie Dataset 2024](https://www.kaggle.com/datasets/asaniczka/tmdb-movies-dataset-2023-930k-movies), which contains more than one million movie entries.

In [4]:
import kagglehub
from kagglehub import KaggleDatasetAdapter

file_path = "TMDB_movie_dataset_v11.csv"

df = kagglehub.load_dataset(
  KaggleDatasetAdapter.PANDAS,
  "asaniczka/tmdb-movies-dataset-2023-930k-movies",
  file_path,
)


Downloading from https://www.kaggle.com/api/v1/datasets/download/asaniczka/tmdb-movies-dataset-2023-930k-movies?dataset_version_number=596&file_name=TMDB_movie_dataset_v11.csv...


100%|██████████| 533M/533M [00:10<00:00, 51.0MB/s]


### 2. Inspect the Dataset Structure

Since the dataset is large, inspecting it helps you identify useful fields and filter out irrelevant data early on.

In [5]:
print("\nDataset Columns:")
print(df.columns)

print("\nMissing Values per Column:")
print(df.isnull().sum())

print(f"\nNumber of rows: {len(df)}")
print(f"Number of unique IDs: {df['id'].nunique()}")


Dataset Columns:
Index(['id', 'title', 'vote_average', 'vote_count', 'status', 'release_date',
       'revenue', 'runtime', 'adult', 'backdrop_path', 'budget', 'homepage',
       'imdb_id', 'original_language', 'original_title', 'overview',
       'popularity', 'poster_path', 'tagline', 'genres',
       'production_companies', 'production_countries', 'spoken_languages',
       'keywords'],
      dtype='object')

Missing Values per Column:
id                            0
title                        13
vote_average                  0
vote_count                    0
status                        0
release_date             231604
revenue                       0
runtime                       0
adult                         0
backdrop_path            916971
budget                        0
homepage                1107810
imdb_id                  611252
original_language             0
original_title               13
overview                 264156
popularity                    0
poster_path 

### 3. Filter and Clean the Dataset

This step filters the dataset to keep only metadata useful for semantic search: `id`, `title`, `overview`, `genres`, `keywords`, `tagline`, and `release_date`. These fields provide enough context to generate meaningful embeddings.

Entries (rows) missing a `title` or lacking both `overview` and `genres` are removed, as they don’t have enough descriptive data for accurate recommendations.
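
The filtering rule can be restated in plain Python on a few made-up rows. This is only a sketch: the `rows` list and the `keep_row` helper are hypothetical and not part of the notebook's pipeline, and `None` stands in for a missing value.

```python
# Hypothetical rows for illustration only; None marks a missing value.
rows = [
    {"title": "A", "overview": "Some plot", "genres": None},
    {"title": "B", "overview": None, "genres": "Drama"},
    {"title": "C", "overview": None, "genres": None},
    {"title": None, "overview": "Plot without a title", "genres": "Comedy"},
]


def keep_row(row: dict) -> bool:
    """Keep a row if it has a title and at least one of overview or genres."""
    has_title = row["title"] is not None
    has_description = row["overview"] is not None or row["genres"] is not None
    return has_title and has_description


kept_titles = [row["title"] for row in rows if keep_row(row)]
print(kept_titles)
```

Rows "A" and "B" survive because each has a title plus one descriptive field; "C" has neither overview nor genres, and the last row has no title.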


In [6]:
import pandas as pd
import numpy as np
print(f"Original rows: {len(df)}")

columns_to_keep = ['id', 'title', 'overview', 'genres', 'keywords', 'tagline', 'release_date']

df_relevant = df[columns_to_keep].copy()

print(f"Rows before dropping missing title: {len(df_relevant)}")
df_relevant.dropna(subset=['title'], inplace=True)
df_relevant = df_relevant[~(df_relevant['genres'].isna() & df_relevant['overview'].isna())]
print(f"Rows after dropping missing title and dropping missing (genres and overview): {len(df_relevant)}")

# Fill missing text columns with empty strings
text_cols_to_fill = ['overview', 'genres', 'keywords', 'tagline']
for col in text_cols_to_fill:
    df_relevant[col] = df_relevant[col].fillna('')


# Extract release year from the release_date string
def get_year(date_str):
    if pd.isna(date_str) or not isinstance(date_str, str) or len(date_str) < 4:
        return None
    try:
        return int(date_str[:4])
    except (ValueError, TypeError):
        return None

df_relevant['release_year'] = df_relevant['release_date'].apply(get_year)

print("\nSample data after cleaning (keeping missing overviews):")
print(df_relevant[['id', 'title', 'overview', 'genres', 'keywords', 'tagline', 'release_year']].head())

Original rows: 1237355
Rows before dropping missing title: 1237355
Rows after dropping missing title and dropping missing (genres and overview): 1097135

Sample data after cleaning (keeping missing overviews):
       id            title                                           overview  \
0   27205        Inception  Cobb, a skilled thief who commits corporate es...   
1  157336     Interstellar  The adventures of a group of explorers who mak...   
2     155  The Dark Knight  Batman raises the stakes in his war on crime. ...   
3   19995           Avatar  In the 22nd century, a paraplegic Marine is di...   
4   24428     The Avengers  When an unexpected enemy emerges and threatens...   

                                        genres  \
0           Action, Science Fiction, Adventure   
1            Adventure, Drama, Science Fiction   
2               Drama, Action, Crime, Thriller   
3  Action, Adventure, Fantasy, Science Fiction   
4           Science Fiction, Action, Adventure   

  

### 4. Prepare Text for Embedding

This step prepares the movie metadata for embedding by combining relevant fields into a single structured text string. This representation includes the title, overview, genres, keywords, tagline, and release year (if available). The output is stored in a new column called `text_for_embedding`.

Embeddings are numerical vector representations of text that capture semantic meaning and relationships. These vectors can be used for tasks like similarity search and clustering.
Learn more about [text embeddings](https://ai.google.dev/gemini-api/docs/embeddings) and explore the [Gemini embedding notebook](../../quickstarts/Embeddings.ipynb).
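
To make "semantic similarity between vectors" concrete, here is a minimal sketch of cosine similarity in plain Python. The three toy vectors are invented for illustration and are far smaller than real embeddings; the helper name `cosine_similarity` is likewise an assumption, not something the Gemini SDK provides.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (hypothetical values, not real embeddings).
space_movie = [0.9, 0.1, 0.0]
sci_fi_movie = [0.8, 0.2, 0.1]
rom_com = [0.0, 0.2, 0.9]

print(cosine_similarity(space_movie, sci_fi_movie))
print(cosine_similarity(space_movie, rom_com))
```

The first score is much closer to 1 than the second, which is the property a similarity search relies on: texts with related meaning should map to vectors that point in similar directions.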

In [7]:
def create_embedding_text(row):
    """Combines available movie metadata into a single string for embedding."""
    # Title is always present, so it can be included directly
    title_str = f"Title: {row['title']}"
    overview_str = f"Overview: {row['overview']}" if row['overview'] else ""
    year_str = f"Release Year: {int(row['release_year'])}" if pd.notna(row['release_year']) else ""
    genre_str = f"Genres: {row['genres']}" if row['genres'] else ""
    keywords_str = f"Keywords: {row['keywords']}" if row['keywords'] else ""
    tagline_str = f"Tagline: {row['tagline']}" if row['tagline'] else ""

    parts = [
        title_str,
        overview_str,
        year_str,
        genre_str,
        keywords_str,
        tagline_str
    ]
    return "\n".join(part for part in parts if part)

df_relevant['text_for_embedding'] = df_relevant.apply(create_embedding_text, axis=1)

# Use this to inspect how movie data has been transformed for embedding
print(df_relevant[['id', 'title', 'text_for_embedding']].head())

       id            title                                 text_for_embedding
0   27205        Inception  Title: Inception\nOverview: Cobb, a skilled th...
1  157336     Interstellar  Title: Interstellar\nOverview: The adventures ...
2     155  The Dark Knight  Title: The Dark Knight\nOverview: Batman raise...
3   19995           Avatar  Title: Avatar\nOverview: In the 22nd century, ...
4   24428     The Avengers  Title: The Avengers\nOverview: When an unexpec...


### 5. Sample a Subset for Development

To keep the notebook quick to run and light on resources, this step samples 5,000 movies from the cleaned data instead of using the full dataset. If the cleaned dataset contains 5,000 or fewer entries, all of them are used.
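
The sampling logic can be sketched with the standard library alone. The helper `sample_ids` and the list of ids are hypothetical; the fixed seed plays the same role as `random_state=42` in the pandas call, making the sample repeatable across runs.

```python
import random


def sample_ids(ids: list[int], sample_size: int, seed: int = 42) -> list[int]:
    """Return a reproducible random sample, or all ids if there are too few."""
    if len(ids) <= sample_size:
        return list(ids)
    return random.Random(seed).sample(ids, sample_size)


all_ids = list(range(1, 101))  # hypothetical movie ids
first = sample_ids(all_ids, 10)
second = sample_ids(all_ids, 10)
print(first == second)  # same seed, same sample
print(len(sample_ids(all_ids, 500)))  # fewer than 500 ids, so all 100 are kept
```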

In [8]:
SAMPLE_SIZE = 5000

if len(df_relevant) > SAMPLE_SIZE:
    print(f"\nTaking a random sample of {SAMPLE_SIZE} movies for development.")
    df_sample = df_relevant.sample(n=SAMPLE_SIZE, random_state=42)
else:
    print(f"\nCleaned dataset size ({len(df_relevant)}) is smaller than or equal to SAMPLE_SIZE. Using the full cleaned dataset.")
    df_sample = df_relevant

print(f"Working with {len(df_sample)} movies for the next steps.")
print(df_sample[['id', 'title', 'release_year']].head())

columns_for_payload = ['title', 'overview', 'genres', 'keywords', 'tagline', 'release_year']
columns_final = ['id', 'text_for_embedding'] + columns_for_payload
df_sample = df_sample[columns_final]

print("\nFinal sample DataFrame structure for embedding/indexing:")
print(df_sample.info())



Taking a random sample of 5000 movies for development.
Working with 5000 movies for the next steps.
              id                             title  release_year
1022852   913650                             SERYO        2015.0
1081047   990808  Ang Galing-galing Mo, Mrs. Jones        1980.0
191884    422235                             Hedda        2016.0
1007202   535478                          Deducked        2018.0
393540   1238198                 Songs of Paradise           NaN

Final sample DataFrame structure for embedding/indexing:
<class 'pandas.core.frame.DataFrame'>
Index: 5000 entries, 1022852 to 604641
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   id                  5000 non-null   int64  
 1   text_for_embedding  5000 non-null   object 
 2   title               5000 non-null   object 
 3   overview            5000 non-null   object 
 4   genres              5000 non-null   object 

### 6. Initialize Qdrant for Vector Indexing


With the data prepared, the next step is to set up **Qdrant**, a vector similarity search engine optimized for storing and querying high-dimensional vectors. It supports fast indexing, filtering, and similarity search across millions of vectors.

Qdrant can run:

* Locally as a standalone service
* In the cloud for production deployments
* Or entirely **in-memory** for fast, temporary use during development

In this notebook, Qdrant is initialized using in-memory mode by passing `":memory:"` to the client. This stores data only in RAM, meaning it **will not persist after the session ends**. This is suitable for experimentation but not for saving results long-term.

You also configure the following:

* `COLLECTION_NAME`: The name of the Qdrant collection to store movie vectors
* `VECTOR_SIZE`: Set to `768` to match the dimensionality of the text embeddings generated by the Gemini `embedding-001` model; adjust it if you select an embedding model with a different output size
* `DISTANCE_METRIC`: Set to **cosine distance**, which is ideal for measuring semantic similarity between embedding vectors
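
Because the collection is created with a fixed vector size, a quick length check before uploading can surface a mismatch between the embedding model and the collection early. The sketch below is illustrative only: `find_dimension_mismatches` is a hypothetical helper, and a small `EXPECTED_SIZE` stands in for `VECTOR_SIZE` to keep the example readable.

```python
def find_dimension_mismatches(vectors: list[list[float]], expected_size: int) -> list[int]:
    """Return the positions of vectors whose length differs from expected_size."""
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_size]


EXPECTED_SIZE = 4  # stands in for VECTOR_SIZE; small here to keep the example readable
vectors = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7],  # too short: would not fit a 4-dimensional collection
    [0.9, 0.8, 0.7, 0.6],
]
bad_positions = find_dimension_mismatches(vectors, EXPECTED_SIZE)
print(bad_positions)
```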

In [9]:
from qdrant_client import QdrantClient, models
import time

COLLECTION_NAME = "tmdb_movies_sample"

VECTOR_SIZE = 768
DISTANCE_METRIC = models.Distance.COSINE


# Initialize Qdrant client using in-memory storage
qdrant_client = QdrantClient(":memory:")

### 7. Define Batch Embedding Function

This step defines the `get_embeddings_batch` function, which uses the Gemini embedding model (`embedding-001`) to generate text embeddings for batches of movie data and retries failed requests automatically.
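
For readers unfamiliar with retry logic, the idea can be sketched by hand as exponential backoff. This is a simplified illustration and not how `google.api_core.retry` is implemented; `call_with_backoff` and `flaky_embed` are hypothetical names, and the delays are kept tiny so the example runs quickly.

```python
import time


def call_with_backoff(func, max_attempts: int = 4, base_delay: float = 0.01,
                      retry_on: tuple = (ConnectionError, TimeoutError)):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky call that fails twice before succeeding.
calls = {"count": 0}


def flaky_embed():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return [0.1, 0.2, 0.3]


result = call_with_backoff(flaky_embed)
print(result, "after", calls["count"], "attempts")
```

Only the listed exception types are retried, so a permanent error such as an invalid request fails immediately rather than wasting time on retries.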


In [10]:
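import time
from google.api_core import exceptions, retry
from google.genai import errors

MODEL_FOR_EMBEDDING = "embedding-001" # @param ["embedding-001", "text-embedding-004","gemini-embedding-exp-03-07"] {"allow-input":true, isTemplate: true}

BATCH_SIZE = 100
QDRANT_BATCH_SIZE = 768


def is_retriable(e: Exception) -> bool:
    """Retry only on rate-limit (429) and service-unavailable (503) API errors."""
    return isinstance(e, errors.APIError) and e.code in {429, 503}


# The retry decorator wraps the raw API call so that exceptions reach it;
# catching them inside the decorated function would prevent any retry.
@retry.Retry(predicate=is_retriable, timeout=3000)
def embed_with_retry(texts: list[str], task_type: str):
    """Calls the Gemini embedding endpoint, retrying on retriable errors."""
    return client.models.embed_content(
        model=MODEL_FOR_EMBEDDING,
        contents=texts,
        config={
            "task_type": task_type,
        }
    )


def get_embeddings_batch(texts: list[str], task_type="RETRIEVAL_DOCUMENT") -> list | None:
    """
    Generates embeddings for a batch of texts using Gemini API with retry.

    Args:
        texts: A list of strings to embed.
        task_type: The task type for the embedding model.

    Returns:
        A list of embedding objects (each exposes its vector as `.values`),
        or None if retries are exhausted or a non-retryable error occurs.
    """
    if not texts:
        return []
    try:
        response = embed_with_retry(texts, task_type)
        return response.embeddings
    except exceptions.RetryError as e:
        print(f"Embedding batch failed after retries: {e}")
        return None
    except Exception as e:
        print(f"An unexpected error occurred during embedding: {e}")
        return None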

# Example of what an embedding looks like
sample_embedding = get_embeddings_batch(["Example movie about space and survival"])[0]
print("Example embedding vector:", sample_embedding.values[:10])


Example embedding vector: [0.04476204, -0.021633359, -0.08440986, -0.0115498435, 0.06313622, 0.011256916, 0.0036218397, 0.018719628, 0.012621079, 0.03685386]


### 8. Create a Collection in Qdrant


A collection in Qdrant is like a table in a database: it stores vectors along with optional metadata (payload). Each collection has its own configuration, including vector size and similarity metric.

In [11]:
# Delete any existing collection so that re-running the notebook starts from a clean state

try:
    qdrant_client.delete_collection(collection_name=COLLECTION_NAME)
    print(f"Existing collection '{COLLECTION_NAME}' deleted.")
except Exception as e:
    print(f"Error deleting collection (it might not exist): {e}")

try:
    qdrant_client.create_collection(
        collection_name=COLLECTION_NAME,
        vectors_config=models.VectorParams(
            size=VECTOR_SIZE,
            distance=DISTANCE_METRIC
        )
    )
    print(f"Collection '{COLLECTION_NAME}' created successfully.")
except Exception as e:
    print(f"Error creating collection: {e}")

Existing collection 'tmdb_movies_sample' deleted.
Collection 'tmdb_movies_sample' created successfully.


### 9. Create Payloads for Metadata Storage


In Qdrant, besides storing vector embeddings, you can attach additional information called payload to each vector. This metadata helps in filtering or retrieving relevant results based on attributes like title, genres, or release year.
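
Conceptually, a payload filter narrows results by metadata in addition to vector similarity. The sketch below applies such a condition in plain Python to made-up search hits. It illustrates the idea only and does not use Qdrant's filter API; the `hits` data and the `matches` helper are hypothetical.

```python
# Hypothetical search hits: each has a similarity score and a payload dict.
hits = [
    {"score": 0.71, "payload": {"title": "Movie A", "genres": "Comedy", "release_year": 2019}},
    {"score": 0.69, "payload": {"title": "Movie B", "genres": "Drama", "release_year": 1998}},
    {"score": 0.64, "payload": {"title": "Movie C", "genres": "Comedy, Drama", "release_year": None}},
    {"score": 0.60, "payload": {"title": "Movie D", "genres": "Comedy", "release_year": 2021}},
]


def matches(payload: dict, genre: str, min_year: int) -> bool:
    """True if the payload lists the genre and was released in or after min_year."""
    year = payload.get("release_year")
    genres = [g.strip() for g in payload.get("genres", "").split(",")]
    return genre in genres and year is not None and year >= min_year


filtered = [h["payload"]["title"] for h in hits if matches(h["payload"], "Comedy", 2010)]
print(filtered)
```

"Movie C" is excluded because its release year is missing, which shows why storing missing values explicitly as `None` in the payload matters when filtering.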

The `create_payload` function prepares the payload by extracting specified columns from each movie record, handling missing values and ensuring data types are compatible with Qdrant.

In [12]:
payload_columns = [
    'title', 'overview', 'genres', 'keywords', 'tagline', 'release_year'
]

def create_payload(row, payload_columns):
    payload = {}
    for col in payload_columns:
        value = row[col]
        if pd.isna(value):
            payload[col] = None
        elif isinstance(value, (np.int64, np.int32)):
            payload[col] = int(value)
        elif isinstance(value, (np.float64, np.float32)):
            payload[col] = float(value)
        else:
            payload[col] = value
    return payload

### 10. Batch Embedding and Indexing to Qdrant


This step processes the sampled movies dataset in batches to generate vector embeddings using the Gemini API and upload (upsert) these embeddings along with their metadata payloads to the Qdrant collection.

**Key points of this process:**

* The dataset is divided into batches of size `BATCH_SIZE` for embedding generation to stay within API limits.
* Each batch's text data is sent to the Gemini embedding API with retries handled in the embedding function.
* For every successfully embedded batch, the code prepares **points** (each containing an ID, vector embedding, and metadata payload) to be uploaded to Qdrant.
* Points are buffered and uploaded in chunks of size `QDRANT_BATCH_SIZE` to optimize performance.
* Batches that fail to embed and chunks that fail to upsert are logged and skipped, so a single failure does not halt the entire run.
* At the end, any remaining points in the buffer are uploaded.
* Summary statistics of processed and upserted items are printed.
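
The two-level batching described above can be simulated without any API calls. In this sketch, `iter_batches` and `index_in_chunks` are hypothetical helpers, and appending to a list stands in for both the embedding request and the Qdrant upsert, so only the flush pattern is shown.

```python
def iter_batches(items: list, batch_size: int):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def index_in_chunks(items: list, embed_batch_size: int, upsert_chunk_size: int) -> list[int]:
    """Simulate the loop: buffer per-batch results and flush in larger chunks.

    Returns the size of each flushed chunk so the flush pattern is visible.
    """
    buffer, flushed_sizes = [], []
    for batch in iter_batches(items, embed_batch_size):
        buffer.extend(batch)  # stands in for "embed the batch, build points"
        if len(buffer) >= upsert_chunk_size:
            flushed_sizes.append(len(buffer))  # stands in for an upsert call
            buffer = []
    if buffer:  # flush whatever is left after the last batch
        flushed_sizes.append(len(buffer))
    return flushed_sizes


flushes = index_in_chunks(list(range(5000)), embed_batch_size=100, upsert_chunk_size=768)
print(flushes)
```

With 5,000 items this prints `[800, 800, 800, 800, 800, 800, 200]`. Because the buffer grows in steps of 100, the threshold of 768 is first crossed at 800 points, so each upload carries 800 points rather than exactly 768.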


In [13]:
from tqdm import tqdm

print(f"Starting batch embedding and indexing process for {len(df_sample)} movies...")
print(f"Using Gemini Batch Size: {BATCH_SIZE}, Qdrant Upsert Batch Size: {QDRANT_BATCH_SIZE}")

points_to_upsert_buffer = []
total_processed = 0
total_failed_embedding = 0
total_upserted = 0

num_batches = (len(df_sample) + BATCH_SIZE - 1) // BATCH_SIZE

for i in tqdm(range(0, len(df_sample), BATCH_SIZE), total=num_batches, desc="Processing Batches"):
    batch_df = df_sample.iloc[i : i + BATCH_SIZE]
    batch_texts = batch_df['text_for_embedding'].tolist()
    batch_ids = batch_df['id'].tolist()

    if not batch_texts:
        continue

    # Generate embeddings for the current batch of movie texts
    batch_embeddings = get_embeddings_batch(batch_texts, task_type="RETRIEVAL_DOCUMENT")

    # Check if embeddings were successfully generated and correspond to batch size
    if batch_embeddings and len(batch_embeddings) == len(batch_texts):
        for j in range(len(batch_ids)):
            item_id = batch_ids[j]
            item_embedding = batch_embeddings[j]
            row_data = batch_df.iloc[j]

            # Prepare metadata payload for this movie
            payload = create_payload(row_data, payload_columns)

            # Create a Qdrant PointStruct with id, embedding vector, and payload
            point = models.PointStruct(
                id=int(item_id),
                vector=item_embedding.values,
                payload=payload
            )
            points_to_upsert_buffer.append(point)

        total_processed += len(batch_ids)

    else:
        print(f"Failed to get embeddings for batch starting at index {i}. Skipping {len(batch_ids)} items.")
        total_failed_embedding += len(batch_ids)
        continue

    # Upload buffered points to Qdrant if buffer reached batch size or end of data
    if len(points_to_upsert_buffer) >= QDRANT_BATCH_SIZE or (i + BATCH_SIZE >= len(df_sample)):
        if points_to_upsert_buffer:
            try:
                qdrant_client.upsert(
                    collection_name=COLLECTION_NAME,
                    points=points_to_upsert_buffer,
                    wait=False
                )
                total_upserted += len(points_to_upsert_buffer)
                points_to_upsert_buffer = []
            except Exception as e:
                # The failed chunk is dropped, not retried; pause briefly
                # to avoid hammering the service after an error
                print(f"Error upserting chunk to Qdrant: {e}")
                points_to_upsert_buffer = []
                time.sleep(5)

# Upload any remaining points left in buffer after loop completion
if points_to_upsert_buffer:
    print(f"Upserting final remaining chunk of {len(points_to_upsert_buffer)} points.")
    try:
        qdrant_client.upsert(
            collection_name=COLLECTION_NAME,
            points=points_to_upsert_buffer,
            wait=True
        )
        total_upserted += len(points_to_upsert_buffer)
        points_to_upsert_buffer = []
    except Exception as e:
        print(f"Error upserting final chunk: {e}")

print("Batch embedding and indexing finished.")
print(f"Total items processed (attempted embedding): {total_processed}")
print(f"Total points successfully prepared for upsert: {total_upserted}")


Starting batch embedding and indexing process for 5000 movies...
Using Gemini Batch Size: 100, Qdrant Upsert Batch Size: 768


Processing Batches: 100%|██████████| 50/50 [00:52<00:00,  1.04s/it]

Batch embedding and indexing finished.
Total items processed (attempted embedding): 5000
Total points successfully prepared for upsert: 5000





In [14]:
# Waiting for collection to settle
time.sleep(5)

try:
    count = qdrant_client.count(collection_name=COLLECTION_NAME, exact=True)
    print(f"\nVerification: Collection '{COLLECTION_NAME}' now contains {count.count} points.") # it should print 5000

except Exception as e:
    print(f"Error verifying collection count: {e}")


Verification: Collection 'tmdb_movies_sample' now contains 5000 points.


### 11. Search and Recommend Similar Movies Using Vector Embeddings


With all movie vectors indexed in Qdrant, you can now perform semantic search. This allows you to take any user query (such as a phrase, movie description, or concept), convert it into an embedding using the same Gemini model, and retrieve the most similar movie vectors from the collection using cosine similarity.

This `recommend_movies` function demonstrates how to:

* Generate an embedding from your input query using the Gemini API.
* Perform a similarity search using Qdrant’s `search()` method.
* Retrieve the top `k` most similar movie entries, including their metadata and similarity scores.

This is the final step where the vector database functions as a recommendation engine.
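
What a similarity search computes can be sketched as an exhaustive top-k scan in plain Python. This is a conceptual illustration with invented 2-dimensional vectors; a vector database typically replaces the full scan with an index so that it scales to large collections. The helpers `normalize` and `top_k` are hypothetical names.

```python
import heapq
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[float, str]]:
    """Return the k (score, title) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), title)
        for title, vec in index.items()
    )
    return heapq.nlargest(k, scored)


# Toy 2-dimensional "embeddings" (hypothetical values, not produced by Gemini).
toy_index = {
    "Space Survival": [1.0, 0.1],
    "Robot Uprising": [0.8, 0.6],
    "Wedding Comedy": [0.0, 1.0],
}
results = top_k([1.0, 0.0], toy_index, k=2)
for score, title in results:
    print(f"{score:.4f}  {title}")
```

The query vector points along the first axis, so "Space Survival" ranks first and "Robot Uprising" second, while "Wedding Comedy" falls outside the top two.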


In [15]:
def recommend_movies(query_text, top_k=5):
    """
    Finds movies similar to the query_text using the Qdrant index.

    Args:
        query_text (str): The user's query (e.g., movie title, description, theme).
        top_k (int): The maximum number of recommendations to return.

    Returns:
        list: A list of dictionaries, where each dictionary contains the
              payload (movie details) and similarity score of a recommended movie.
              Returns an empty list if query embedding fails or no results found.
    """
    print(f"Searching for recommendations based on: '{query_text}'")
    # Generate embedding for the user query using Gemini
    query_embeddings = get_embeddings_batch([query_text], task_type="RETRIEVAL_QUERY")

    # Check for failure before indexing into the result
    if not query_embeddings:
        print("Error: Could not generate embedding for the query.")
        return []
    query_embedding = query_embeddings[0].values

    try:
        # Perform a semantic search on Qdrant using the query vector.
        # Note: recent qdrant-client releases deprecate `search()` in favor of `query_points()`.
        search_result = qdrant_client.search(
            collection_name=COLLECTION_NAME,
            query_vector=query_embedding,
            limit=top_k,
            with_payload=True
        )

        recommendations = []
        if search_result:
            print(f"Found {len(search_result)} potential recommendations:")
            for hit in search_result:
                recommendation = {
                    "id": hit.id,
                    "score": hit.score,
                    "payload": hit.payload
                }
                recommendations.append(recommendation)
        else:
            print("No recommendations found matching the query.")

        return recommendations

    except Exception as e:
        print(f"Error during Qdrant search: {e}")
        return []

## Try Out Your Movie Recommender
You can now try querying your movie recommender by describing a theme, genre, or concept in natural language. The system will return the most semantically similar movies from your dataset based on vector similarity search using Gemini-generated embeddings.

In [16]:
query = """
  I want to watch something with my girlfriends that’s
  both funny and teaches something.
"""
recommendations = recommend_movies(query, top_k=5)

if recommendations:
    print("\n--- Recommendations ---")
    for rec in recommendations:
        print(f"  - Score: {rec['score']:.4f}")
        print(f"    Title: {rec['payload'].get('title', 'N/A')}")
        print(f"    Genre: {rec['payload'].get('genres', 'N/A')}")
        print(f"    Year: {rec['payload'].get('release_year', 'N/A')}")
        print("-" * 10)

Searching for recommendations based on: '
  I want to watch something with my girlfriends that’s 
  both funny and teaches something.
'
Found 5 potential recommendations:

--- Recommendations ---
  - Score: 0.6326
    Title: #pregnancytestroulette
    Genre: Comedy, Drama
    Year: 2022.0
----------
  - Score: 0.6244
    Title: Locas y atrapadas
    Genre: Comedy
    Year: 2014.0
----------
  - Score: 0.6222
    Title: Amigas de Sorte
    Genre: Comedy
    Year: 2021.0
----------
  - Score: 0.6206
    Title: The Perfect Secret
    Genre: Comedy, Drama
    Year: 2019.0
----------
  - Score: 0.6192
    Title: YOLO
    Genre: Drama, Music
    Year: 2013.0
----------


## Next Steps

This notebook demonstrated how to build a movie recommendation system by combining the Gemini API’s embedding capabilities with Qdrant’s vector search.

### Useful API References

For more detailed understanding and to explore advanced features, refer to the official documentation:

* **[Gemini API Embeddings Documentation](https://ai.google.dev/gemini-api/docs/embeddings)**: Learn how to generate text embeddings, understand model parameters, and use them effectively for semantic search and similarity tasks.

* **[Qdrant Python Client Docs](https://python-client.qdrant.tech/)**: Understand how to manage collections, insert and search vectors, configure indexing, and interact with the vector database.

### Related Examples

To explore more use cases and get additional inspiration, check out these related notebooks:

* **[Similarity Search using Qdrant](../examples/qdrant/Qdrant_similarity_search.ipynb)**: A focused example on building semantic search systems with embeddings and Qdrant.

* **[Google GenAI SDK Overview](../../quickstarts/Get_started.ipynb)**: Walks you through installing and setting up the SDK, text and multimodal prompting, token counting, safety filters, multi-turn chat, function calling, file uploads, context caching, and more.

* **[Text Embeddings with Gemini API](../../quickstarts/Embeddings.ipynb)**: Focuses on generating and working with text embeddings using the Gemini API, ideal for building vector-based search and recommendation systems.
