As more and more content becomes freely available, deciding what to watch next has turned into something of a full-time job. The streaming services have their own recommendation algorithms, but they work in silos.

In this article we are going to build a bare-bones recommendation system using Qdrant vector search. We will use semantic similarity to find movie and TV series recommendations.

_Disclaimer: I am no expert in AI, recommendation systems or vector databases. This post is an experiment to move towards that point in this high dimensional vector space._

If you would like to follow along, you can use the notebook from this GitHub repo.


# Installing ipywidgets

__You can skip this section if you are running on Google Colab.__

I had to restart the Jupyter kernel to get ipywidgets to work. It's better to do that now than to do it later and lose all your progress.

In [None]:
# Install
%pip install ipywidgets

Let's see if `ipywidgets` is available. You should see a slider when you run the next cell. If it does not show up right away, restart the Jupyter kernel and try again.

In [None]:
import ipywidgets as widgets
widgets.IntSlider()

# Fetch the data from TMDB

First, we need some data on movies and TV series. [TMDB provides an API](https://developer.themoviedb.org/reference/intro/getting-started) that is perfect for this.

We need an API key to access the data from TMDB. If you would like to follow along, [head over to TMDB now](https://developer.themoviedb.org/v4/reference/intro/getting-started) to create an API key.

In [9]:
# Update this with your api key
import os
os.environ["TMDB_API_KEY"] = ""

We are going to fetch the top-rated movies and the top-rated TV series from TMDB. We have to fetch the data from two different endpoints for that.
- https://api.themoviedb.org/3/movie/top_rated
- https://api.themoviedb.org/3/tv/top_rated

Like most APIs, the data provided by TMDB is paginated. Let's write a function to fetch the data for a specific page.

In [10]:
import requests
import json

def fetch_titles_for_page(title_type, page):
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {os.getenv('TMDB_API_KEY')}",
    }

    url = f"https://api.themoviedb.org/3/{title_type}/top_rated?language=en-US&page={page}"
    response = requests.get(url, headers=headers)
    # Fail loudly on HTTP errors (for example a missing or invalid API key)
    response.raise_for_status()
    response_json = response.json()
    return {
        "titles": response_json["results"],
        "total_pages": response_json["total_pages"],
    }

Let's try this for a page of TV series.

In [100]:
import pprint
response = fetch_titles_for_page("tv", 1)
print(len(response["titles"]), response["total_pages"])
pprint.pp(response["titles"][0])

20 102
{'adult': False,
 'backdrop_path': '/9faGSFi5jam6pDWGNd0p8JcJgXQ.jpg',
 'genre_ids': [18, 80],
 'id': 1396,
 'origin_country': ['US'],
 'original_language': 'en',
 'original_name': 'Breaking Bad',
 'overview': 'Walter White, a New Mexico chemistry teacher, is diagnosed with '
             'Stage III cancer and given a prognosis of only two years left to '
             'live. He becomes filled with a sense of fearlessness and an '
             "unrelenting desire to secure his family's financial future at "
             'any cost as he enters the dangerous world of drugs and crime.',
 'popularity': 547.234,
 'poster_path': '/ztkUQFLlC19CCMYHW9o1zWhJRNq.jpg',
 'first_air_date': '2008-01-20',
 'name': 'Breaking Bad',
 'vote_average': 8.913,
 'vote_count': 14017}


Now we are ready to write a function that takes a title type, fetches the data from all the pages and writes it to a JSONL file. We are storing the data as JSON Lines so that we can read and write individual titles without deserializing all the titles every time.
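To make the JSON Lines idea concrete, here is a minimal sketch of appending and reading records one line at a time. The helper names `append_jsonl` and `iter_jsonl` and the sample records are made up for illustration; the notebook itself writes to `data/top-titles.jsonl`.

```python
import json
import tempfile
from pathlib import Path

def append_jsonl(path, record):
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

def iter_jsonl(path):
    """Yield records one at a time without loading the whole file."""
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                yield json.loads(line)

# Usage with a throwaway file and hypothetical sample records
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.jsonl"
    append_jsonl(sample_path, {"id": 1, "name": "First"})
    append_jsonl(sample_path, {"id": 2, "name": "Second"})
    records = list(iter_jsonl(sample_path))

print(records)
```

Because every line is a complete JSON document, appending a new title never requires rewriting what is already in the file.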

Let's create a data directory in our root folder. You can do this in Google Colab using the file explorer or using the code below.

In [102]:
from pathlib import Path
Path("./data").mkdir(exist_ok=True)

Let's do a quick check to see if our directory has been created in the correct spot

In [103]:
with open("./data/test.jsonl", "a") as fh:
  json.dump([1, 2, 3], fh)
  fh.write("\n")

We are now ready to fetch all the titles. Let's write the function for that.

In [110]:
import time

def fetch_top_rated_titles(title_type):
  page = 1
  total_pages = 1 # We don't know the total number of pages yet, so we start with 1

  while page <= total_pages:
    response = fetch_titles_for_page(title_type, page)
    if len(response["titles"]) != 0:
      total_pages = response["total_pages"]
      with open("data/top-titles.jsonl", "a") as fh:
        for title in response["titles"]:
          # TMDB has different field names for movies and tv series
          # We will use a common name for both
          if title.get("title") is not None:
            title["name"] = title.get("title")
            del title["title"]          

          if title.get("original_title") is not None:
            title["original_name"] = title.get("original_title")
            del title["original_title"]
          
          # We are parsing the release year from the date.
          # An empty date string would make int() fail, so we check for truthiness.
          if title.get("release_date"):
            title["release_year"] = int(title.get("release_date").split("-")[0])
          elif title.get("first_air_date"):
            title["release_year"] = int(title.get("first_air_date").split("-")[0])
            
          title["type"] = title_type
          json.dump(title, fh)
          fh.write("\n")
    # Always advance, even if a page came back empty, so the loop cannot spin forever
    page += 1
    time.sleep(0.1)
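The field normalization inside the loop above can also be written as a small pure function, which makes it easy to check without calling the API. This is only a sketch: `parse_release_year` and `normalize_title` are hypothetical helper names, and the sample records are trimmed-down stand-ins for real TMDB responses.

```python
def parse_release_year(date_string):
    """Return the year from a 'YYYY-MM-DD' string, or None if it is missing or malformed."""
    if not date_string:
        return None
    year = date_string.split("-")[0]
    return int(year) if year.isdigit() else None

def normalize_title(raw, title_type):
    """Return a copy of a TMDB record that uses the TV-style field names."""
    title = dict(raw)  # copy so the caller's dict is left untouched
    if title.get("title") is not None:
        title["name"] = title.pop("title")
    if title.get("original_title") is not None:
        title["original_name"] = title.pop("original_title")
    year = parse_release_year(title.get("release_date") or title.get("first_air_date"))
    if year is not None:
        title["release_year"] = year
    title["type"] = title_type
    return title

movie = normalize_title(
    {"id": 278, "title": "The Shawshank Redemption", "release_date": "1994-09-23"}, "movie"
)
show = normalize_title({"id": 1396, "name": "Breaking Bad", "first_air_date": ""}, "tv")
print(movie)
print(show)
```

A record with an empty date simply ends up without a `release_year` key instead of raising an error.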

In [111]:
# Fetch all top rated movies
# This would take a few minutes as we are fetching close to 10,000 movies
fetch_top_rated_titles(title_type="movie")

In [112]:
# Fetch all top rated tv series
fetch_top_rated_titles(title_type="tv")

We have saved all the movies and all the TV series in our JSONL file. Let's write a function that makes it easier to read them whenever needed.

__Note: If you are running this in a cloud environment, it is best to download the JSONL file to your local machine; otherwise you might lose it if the session ends.__

In [11]:
import json

# We will store all the titles in a cache to avoid loading them every time we need a title
_titles = []
def load_titles_from_disk():
    global _titles

    # If we have already loaded the titles from disk, return them directly
    if len(_titles) > 0:
        return _titles

    # Load the titles from disk and save them to the cache
    file_path = "./data/top-titles.jsonl"
    with open(file_path, "r") as fh:
        for line in fh:
            title = json.loads(line)
            _titles.append(title)

    return _titles
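As an aside, the same load-once behaviour can be had from the standard library. The sketch below uses `functools.lru_cache` keyed on the file path; `load_jsonl_cached` is a hypothetical name and the temporary file only exists to keep the example self-contained. The rest of this notebook keeps using `load_titles_from_disk()`.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=None)
def load_jsonl_cached(file_path):
    """Read a JSONL file once per path; later calls reuse the cached tuple."""
    with open(file_path, "r", encoding="utf-8") as fh:
        return tuple(json.loads(line) for line in fh if line.strip())

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "titles.jsonl")
    Path(path).write_text('{"id": 1}\n{"id": 2}\n', encoding="utf-8")
    first = load_jsonl_cached(path)
    second = load_jsonl_cached(path)  # served from the cache, no disk read

cache_info = load_jsonl_cached.cache_info()
print(len(first), cache_info.hits, cache_info.misses)
```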

Now that we have all the titles ready and loaded, let's do a (shallow) dive into Qdrant.

# Qdrant Vector DB
Qdrant is an open-source vector database purpose-built for performance. Qdrant is written in Rust, a language all new performance-oriented things should be built in.

## What exactly is a vector database?
A vector database allows us to find approximate nearest neighbors of a given vector. Along with this core capability it allows us to filter the vectors by all sorts of metadata.

## But, what is a vector?
A vector is just an array of floats. Vectors are popular now mainly because of embeddings. Embeddings represent data in such a way that semantically similar things end up closer to each other in the vector space.

We can create embeddings out of all sorts of data like text, images, sounds, etc.
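To get a feel for "closer in the vector space", here is a toy sketch that compares vectors with cosine similarity, one common similarity measure. The three-dimensional vectors and their labels are invented for illustration; real embeddings have hundreds of dimensions, and a vector database uses an approximate index rather than this brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings"
toy_vectors = {
    "prison drama": [0.9, 0.1, 0.0],
    "crime saga": [0.8, 0.3, 0.1],
    "animated fantasy": [0.1, 0.2, 0.9],
}

def nearest(query_name, vectors):
    """Return the (name, score) of the most similar other vector."""
    query = vectors[query_name]
    others = [
        (name, cosine_similarity(query, vec))
        for name, vec in vectors.items()
        if name != query_name
    ]
    return max(others, key=lambda pair: pair[1])

closest_name, closest_score = nearest("prison drama", toy_vectors)
print(closest_name, round(closest_score, 3))
```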

Let's install the Qdrant Python client. The Python client has a fully functional local mode that would work great for our purpose. We will also install the Fastembed library from Qdrant. This will allow us to create embeddings for our titles (movies and TV series) without installing any additional dependencies.



In [None]:
%pip install "qdrant-client[fastembed]"

# Data Exploration
A quick look at the data shows we have a few fields of interest

In [12]:
import pprint

titles = load_titles_from_disk()
pprint.pp(titles[0])

{'adult': False,
 'backdrop_path': '/avedvodAZUcwqevBfm8p4G2NziQ.jpg',
 'genre_ids': [18, 80],
 'id': 278,
 'original_language': 'en',
 'overview': 'Imprisoned in the 1940s for the double murder of his wife and '
             'her lover, upstanding banker Andy Dufresne begins a new life at '
             'the Shawshank prison, where he puts his accounting skills to '
             'work for an amoral warden. During his long stretch in prison, '
             'Dufresne comes to be admired by the other inmates -- including '
             'an older prisoner named Red -- for his integrity and '
             'unquenchable sense of hope.',
 'popularity': 162.491,
 'poster_path': '/9cqNxx0GxF0bflZmeSMuL5tnGzr.jpg',
 'release_date': '1994-09-23',
 'video': False,
 'vote_average': 8.705,
 'vote_count': 26660,
 'name': 'The Shawshank Redemption',
 'original_name': 'The Shawshank Redemption',
 'release_year': 1994,
 'type': 'movie'}


The `overview` field tells us what the title is about. The `name` field tells us, well, the name of the title.

We will format our titles in the following structure:

{name} is a {title_type} released in {release_year} with the following overview - {overview}

Let's create a function for that.

In [13]:
def get_formatted_title(title):
  formatted_string = ""
  title_type = title.get("type") if title.get("type") == "movie" else "tv series"
  # Single quotes inside the f-strings keep this valid on Python versions before 3.12
  if title.get("name") is not None:
    formatted_string += f"{title.get('name')} is a {title_type}"
  if title.get("release_year") is not None:
    formatted_string += f" released in {title.get('release_year')}"
  if title.get("overview") is not None:
    formatted_string += f" with the following overview - {title.get('overview')}"

  return formatted_string

In [119]:
get_formatted_title(titles[0])

'The Shawshank Redemption is a movie released in 1994 with the following overview - Imprisoned in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts his accounting skills to work for an amoral warden. During his long stretch in prison, Dufresne comes to be admired by the other inmates -- including an older prisoner named Red -- for his integrity and unquenchable sense of hope.'

In [120]:
get_formatted_title(titles[-1])

'Velma is a tv series released in 2023 with the following overview - Jinkies! This raucous reimagining of the Scooby-Doo franchise unravels the mysterious origins of Mystery, Inc. – as seen through the eyes of the gang’s beloved bespectacled detective Velma.'

We perform some basic sanity checks and return the formatted string. We will now create vector embeddings for these strings and store them in a Qdrant collection.

## Qdrant + Fastembed

Qdrant along with Fastembed provides a super quick way to embed our documents. We can start by creating a client. We will be using Qdrant Cloud. Qdrant's free tier should be enough for our little experiment, and we can easily upgrade if we want to.

If you don't have one already, head over to [https://cloud.qdrant.io/](https://cloud.qdrant.io/) to create your free cluster.

Once you have created a new cluster, we will need the API key and the cluster URL from the dashboard.

![Cluster Url](./images/cluster-url.png "Cluster Url")

![API Key](./images/api-key.png "API Key")

In [14]:
import os
os.environ["QDRANT_API_KEY"] = ""
os.environ["QDRANT_CLUSTER_URL"] = ""

In [15]:
from qdrant_client import QdrantClient

_client = None
def get_client():
    global _client
    if _client is not None:
        return _client
    _client = QdrantClient(
        url=os.getenv("QDRANT_CLUSTER_URL"), 
        api_key=os.getenv("QDRANT_API_KEY"))
    return _client

Fastembed provides multiple ways to manage embeddings. You can interact directly with the embedding generator or use a helper method like `client.add()`.

Let's try the direct method first.

In [16]:
from fastembed import TextEmbedding

# This will trigger the default model download and initialization
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")

Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

The model BAAI/bge-small-en-v1.5 is ready to use.


In [17]:
documents = [get_formatted_title(titles[i]) for i in range(2)]
embeddings_generator = embedding_model.embed(documents)
embeddings_list = list(embeddings_generator)
print("Embedding length - ", len(embeddings_list[0]))
print("First 5 dimensions of vector 0 - ", embeddings_list[0][:5])
print("First 5 dimensions of vector 1 - ", embeddings_list[1][:5])

Embedding length -  384
First 5 dimensions of vector 0 -  [-0.05396069  0.03002024 -0.07872865  0.00048775  0.02766631]
First 5 dimensions of vector 1 -  [-0.03963627  0.06695076 -0.03377489  0.04049916 -0.01512619]


We are also going to store some metadata along with our embeddings. The `client.add()` helper method takes care of managing the embedding model for us. We can just pass in the documents along with some optional metadata and ids and be done with it.

Let's try that now. 
First we will write a function that separates out the formatted strings, the metadata and the ids for all the titles. If your data does not have ids, Qdrant can generate UUIDs automatically.

In [19]:
def get_metadata_from_title(title):
    return {
        "name": title.get("name"),
        "type": title.get("type"),
        "vote_average": title.get("vote_average"),
        "vote_count": title.get("vote_count"),
        "release_year": title.get("release_year"),
        "backdrop_path": title.get("backdrop_path"),
        "poster_path": title.get("poster_path"), 
    }

def get_prepared_titles(titles):
    docs = []
    metadata = []
    ids = []
    for title in titles:
        if title.get("name") and title.get("overview"):
            docs.append(get_formatted_title(title))
            metadata.append(get_metadata_from_title(title))
            ids.append(title["id"])
    return docs, metadata, ids
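One thing to keep an eye on: as far as I can tell, TMDB numbers movies and TV series separately, so the same integer id could belong to both a movie and a series, and the later one would overwrite the earlier one in the collection. Qdrant point ids can be unsigned integers or UUIDs, so one possible way around this is to derive a deterministic UUID from the type and the id. The sketch below is an assumption-laden illustration, not something the rest of this notebook uses; `TITLE_NAMESPACE` and `make_point_id` are hypothetical names.

```python
import uuid

# Hypothetical namespace for this project; any fixed UUID works as long as it never changes.
TITLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://www.themoviedb.org/")

def make_point_id(title_type, tmdb_id):
    """Build a deterministic UUID string from the title type and its TMDB id."""
    return str(uuid.uuid5(TITLE_NAMESPACE, f"{title_type}:{tmdb_id}"))

movie_point_id = make_point_id("movie", 1396)
tv_point_id = make_point_id("tv", 1396)
print(movie_point_id)
print(tv_point_id)
```

The same input always produces the same UUID, so re-running the embedding step would update existing points rather than duplicate them. Switching to these ids would also mean storing the original TMDB id in the metadata so it can be looked up later.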

Next we will use `client.add()` to embed and store our documents.

In [20]:
COLLECTION_NAME = "titles_v1" 

def embed_titles(limit=None):
    titles = load_titles_from_disk()
    docs, metadata, ids = get_prepared_titles(titles[:limit] if limit else titles)
    client = get_client()
    client.add(
        collection_name=COLLECTION_NAME, documents=docs, metadata=metadata, ids=ids
    )

    return client.count(COLLECTION_NAME)


Let's embed 10 titles first.

In [21]:
embed_titles(limit=10)

CountResult(count=10)

Let's try to find a movie and see if the embedding worked.

In [22]:
titles = load_titles_from_disk()
client = get_client()
client.query(
    collection_name=COLLECTION_NAME,
    query_text=get_formatted_title(titles[1]),
    limit=5
)

[QueryResponse(id=238, embedding=None, sparse_embedding=None, metadata={'document': 'The Godfather is a movie released in 1972 with the following overview - Spanning the years 1945 to 1955, a chronicle of the fictional Italian-American Corleone crime family. When organized crime family patriarch, Vito Corleone barely survives an attempt on his life, his youngest son, Michael steps in to take care of the would-be killers, launching a campaign of bloody revenge.', 'name': 'The Godfather', 'type': 'movie', 'vote_average': 8.69, 'vote_count': 20222, 'release_year': 1972, 'backdrop_path': '/tmU7GeKVybMWFButWEGl2M4GeiP.jpg', 'poster_path': '/3bhkrj58Vtu7enYsRolD1fZdja1.jpg'}, document='The Godfather is a movie released in 1972 with the following overview - Spanning the years 1945 to 1955, a chronicle of the fictional Italian-American Corleone crime family. When organized crime family patriarch, Vito Corleone barely survives an attempt on his life, his youngest son, Michael steps in to take c

Let's embed all the titles now. This will take a while.

In [23]:
# We are embedding ~11k documents. This could take up to 10 minutes.
embed_titles()

CountResult(count=11367)

In [24]:
client = get_client()
client.count(collection_name=COLLECTION_NAME)

CountResult(count=11367)

Now that we have our vectors in the collection, we need to create a few payload indexes so that we can filter and sort our data using the payload.

We need to create indexes on the following fields. Different types of indexes support different types of query operations. You can read more about them [here](https://qdrant.tech/documentation/concepts/indexing/#payload-index).

- Field: vote_average, schema: float
- Field: type, schema: keyword
- Field: release_year, schema: integer

In [33]:
client = get_client()

indexes = [
    {"field": "vote_average", "schema": "float"},
    {"field": "type", "schema": "keyword"},
    {"field": "release_year", "schema": "integer"},
    
]

for index in indexes:
    client.create_payload_index(
        collection_name=COLLECTION_NAME,
        field_name=index["field"],
        field_schema=index["schema"],
    )

Now that we have embedded all the titles, let's create a simple interface for our recommendation system. Initially we do not know anything about the user, so we will ask for their opinion on the most popular (highest rated) titles.

We will use the `QdrantClient.scroll()` method for that

In [34]:
from typing import List
from qdrant_client.http.models import (
    Direction,
    Filter,
    FieldCondition,
    HasIdCondition,
    MatchValue,
    OrderBy,
    Record,
    Range
)

def get_popular_titles(collection_name, ignore_ids=[], release_year_cutoff=1900, limit=10) -> List[Record]:
    client = get_client()
    titles = []
    for title_type in ["movie", "tv"]:
        response = client.scroll(
            collection_name=collection_name,
            order_by=OrderBy(
                key="vote_average",
                direction=Direction.DESC,
            ),
            scroll_filter=Filter(
                must=[
                    FieldCondition(key="type", match=MatchValue(value=title_type)),
                    FieldCondition(key="release_year", range=Range(gte=release_year_cutoff))
                ],
                must_not=[
                    HasIdCondition(has_id=ignore_ids),
                ],
            ),
            limit=limit//2,
        )

        titles.extend(response[0])
    return titles

In [35]:
get_popular_titles(collection_name=COLLECTION_NAME, release_year_cutoff=1990, limit=20)

[Record(id=278, payload={'document': 'The Shawshank Redemption is a movie released in 1994 with the following overview - Imprisoned in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts his accounting skills to work for an amoral warden. During his long stretch in prison, Dufresne comes to be admired by the other inmates -- including an older prisoner named Red -- for his integrity and unquenchable sense of hope.', 'name': 'The Shawshank Redemption', 'type': 'movie', 'vote_average': 8.705, 'vote_count': 26660, 'release_year': 1994, 'backdrop_path': '/avedvodAZUcwqevBfm8p4G2NziQ.jpg', 'poster_path': '/9cqNxx0GxF0bflZmeSMuL5tnGzr.jpg'}, vector=None, shard_key=None, order_value=8.705),
 Record(id=129, payload={'document': 'Spirited Away is a movie released in 2001 with the following overview - A young girl, Chihiro, becomes trapped in a strange new world of spirits. When her parents undergo a 

We are fetching the 10 top-rated movies and the 10 top-rated TV series, 20 titles in total. Once we have some likes and dislikes, we will use Qdrant's recommend API to get more directed results.
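To make the filter and ordering in `get_popular_titles` concrete, here is a plain-Python sketch of the same selection over a handful of in-memory dicts. The function `top_rated` and the sample data are invented for illustration; Qdrant does this work on the server using the payload indexes we created.

```python
def top_rated(titles, title_type, release_year_cutoff=1900, ignore_ids=(), limit=5):
    """Plain-Python equivalent of the scroll filter and ordering."""
    ignored = set(ignore_ids)
    matching = [
        t for t in titles
        if t["type"] == title_type                    # must: type matches
        and t.get("release_year") is not None
        and t["release_year"] >= release_year_cutoff  # must: release_year >= cutoff
        and t["id"] not in ignored                    # must_not: id is in the ignore list
    ]
    # order_by vote_average, descending, then keep the first `limit` results
    return sorted(matching, key=lambda t: t["vote_average"], reverse=True)[:limit]

# Made-up sample payloads
sample = [
    {"id": 1, "type": "movie", "release_year": 1994, "vote_average": 8.7},
    {"id": 2, "type": "movie", "release_year": 1972, "vote_average": 8.69},
    {"id": 3, "type": "tv", "release_year": 2008, "vote_average": 8.9},
    {"id": 4, "type": "movie", "release_year": 2001, "vote_average": 8.5},
]

all_recent_movie_ids = [t["id"] for t in top_rated(sample, "movie", release_year_cutoff=1990)]
unseen_movie_ids = [
    t["id"] for t in top_rated(sample, "movie", release_year_cutoff=1990, ignore_ids=[4])
]
print(all_recent_movie_ids, unseen_movie_ids)
```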

Let's now define a class that can store the user's choices.

In [80]:
class User:
    def __init__(self):
        self._likes = set()
        self._dislikes = set()
        self._not_watched = set()
        self._recommendations = set()
    
    def discard(self, id):
        self._likes.discard(id)
        self._dislikes.discard(id)
        self._not_watched.discard(id)
        self._recommendations.discard(id)

    def like(self, id):
        self.discard(id)
        self._likes.add(id)
    
    def dislike(self, id):
        self.discard(id)
        self._dislikes.add(id)
    
    def not_watched(self, id, recommended=False):
        self.discard(id)
        self._not_watched.add(id)
        if recommended:
            self._recommendations.add(id)
    
    def get_likes(self): 
        return list(self._likes)

    def get_dislikes(self): 
        return list(self._dislikes)

    def get_not_watched(self): 
        return list(self._not_watched)
    
    def get_recommendations(self): 
        return list(self._recommendations)
        
    def __repr__(self):
        return ("User::\n" +
            f"Likes: [{', '.join(map(str, self._likes))}] \n" +
            f"Dislikes: [{', '.join(map(str, self._dislikes))}] \n" +
            f"Not Watched: [{', '.join(map(str, self._not_watched))}] \n" +
            f"Recommendations: [{', '.join(map(str, self._recommendations))}]>")            


Now we are all set to create a widget that can ask for the user's preferences. We will be using ipywidgets for this purpose. We will create a widget and attach event listeners to the buttons.


In [72]:
import ipywidgets as widgets

def create_prompt_widget_for_title(title, user: User, recommended: bool):
    # Create the UI
    base_img_url = "https://image.tmdb.org/t/p/w300"
    poster = widgets.Image.from_url(url=f'{base_img_url}{title["poster_path"]}', width=300)
    released = f'({title.get("release_year")})' if title.get("release_year") is not None else ''
    name = widgets.HTML(value=f'<h1>{title["name"]} {released}</h1>')
    tmdb_rating = widgets.HTML(value=f'<b>TMDB Rating: {title["vote_average"]} ({title["vote_count"]} ratings)</b>')
    overview = widgets.HTML(value=title["overview"])
    notice = widgets.HTML(value="This widget will disappear once you click on any of the buttons")
    
    # Create the action buttons
    like_button = widgets.Button(
        description='Like',
        disabled=False,
        button_style='success',
        tooltip='Like',
        icon='check'
    )
    dislike_button = widgets.Button(
        description='Dislike',
        disabled=False,
        button_style='danger',
        tooltip='Dislike',
        icon='cross'
    )
    not_watched_button = widgets.Button(
        description='Not Watched',
        disabled=False,
        button_style='',
        tooltip='Not Watched',
        icon='check',
    )
    buttons = widgets.HBox([like_button, not_watched_button, dislike_button])
    details = widgets.VBox([name, tmdb_rating, overview, buttons, notice], layout=widgets.Layout(padding='20px'))

    widget = widgets.HBox([poster, details], layout=widgets.Layout(padding='5px', margin='10px'))

    # Setup the event handlers for the action buttons
    def like_handler(_):
        user.like(title["id"])
        widget.close()
    
    def dislike_handler(_):
        user.dislike(title["id"])
        widget.close()
    
    def not_watched_handler(_):
        user.not_watched(title["id"], recommended)
        widget.close()
    
    # Wire 'em up.
    like_button.on_click(like_handler)
    dislike_button.on_click(dislike_handler)
    not_watched_button.on_click(not_watched_handler)

    return widget


Here is what a prompt would look like

In [None]:
titles = load_titles_from_disk()
title = titles[0]
user = User()
create_prompt_widget_for_title(title, user=user, recommended=True)

Click a button to save your preference. The object of the User class will hold our preferences for future use.

## Preview Image for non interactive environments
![Prompt Widget](./images/prompt-widget.png "Prompt widget")

In [83]:
# The user object holds our preferences

user

User::
Likes: [] 
Dislikes: [] 
Not Watched: [] 
Recommendations: []>

We can now create a function to learn the user's likes and dislikes. We will fetch 10 top-rated titles (five movies and five TV series) and ask for our user's opinion on them.

In [42]:
def run_recommendation_loop(user: User):
    results = get_popular_titles(COLLECTION_NAME, user.get_likes() + user.get_dislikes() + user.get_not_watched(), release_year_cutoff=1995, limit=10)
    for result in results:
        title = {
            "id": result.id,
            "overview": result.payload["document"],
            **result.payload
        }
        widget = create_prompt_widget_for_title(title, recommended=False, user=user)
        display(widget)
            
    return user

In [None]:
user = User()

user = run_recommendation_loop(user)

We can see that the choices have been saved for our user.

In [44]:
user

User::
Likes: [497, 1396, 94605] 
Dislikes: [94954, 209867, 19404, 667257, 37854] 
Not Watched: [129, 496243] 
Recommendations: []>

We know a bit about the user now. We can now create a function to get recommendations using Qdrant's `client.recommend()` API.

## Get Recommendations

In [49]:
from qdrant_client.http.models import RecommendStrategy, Range

def get_recommended_titles(collection_name, positive_ids=[], negative_ids=[], ignore_ids=[], release_year_cutoff=1900, limit=10):
    # Fetch the shared client here instead of relying on a global `client` variable
    client = get_client()
    return client.recommend(
        collection_name=collection_name,
        positive=positive_ids,
        negative=negative_ids,
        query_filter=Filter(
            must=[
                    FieldCondition(key="vote_average", range=Range(gte=8)), 
                    FieldCondition(key="release_year", range=Range(gte=release_year_cutoff))
                ],
            must_not=[
                HasIdCondition(has_id=positive_ids + negative_ids + ignore_ids),
            ],
        ),
        strategy=RecommendStrategy.AVERAGE_VECTOR,
        # We need to pass in the name of the vector field. This name is defined in the fastembed config and
        # is chosen on the basis of our embedding model.
        using="fast-bge-small-en",
        limit=limit,
    )
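For intuition about `RecommendStrategy.AVERAGE_VECTOR`: as I understand the Qdrant documentation, the liked vectors and the disliked vectors are each averaged, and the search vector is then `avg_positive + avg_positive - avg_negative` (or just `avg_positive` when there are no dislikes). The sketch below reproduces that arithmetic on tiny made-up vectors; the function names are hypothetical and this is not Qdrant's actual implementation.

```python
def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]

def average_vector_query(positive, negative):
    """Combine liked and disliked vectors into a single search vector."""
    avg_pos = mean_vector(positive)
    if not negative:
        return avg_pos
    avg_neg = mean_vector(negative)
    # avg_positive + (avg_positive - avg_negative)
    return [2 * p - n for p, n in zip(avg_pos, avg_neg)]

# Made-up 2-dimensional vectors standing in for title embeddings
liked = [[1.0, 0.0], [0.8, 0.2]]
disliked = [[0.0, 1.0]]
query_vector = average_vector_query(liked, disliked)
print(query_vector)
```

The result is pushed toward what the liked titles have in common and away from the disliked ones, and the nearest neighbors of that vector become the recommendations.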

We can update our recommendation loop function to choose between popular and recommended titles.

In [53]:
def run_recommendation_loop(user: User):
    recommended = False
    release_year_cutoff = 1990
    if len(user.get_likes()) < 10:
        # If we have fewer than 10 likes, don't try to get recommendations. Just get the most popular titles
        results = get_popular_titles(
            collection_name=COLLECTION_NAME, 
            ignore_ids=user.get_likes() + user.get_dislikes() + user.get_not_watched(), 
            release_year_cutoff=release_year_cutoff, 
            limit=10)
    else:   
        recommended = True
        results = get_recommended_titles(
            collection_name=COLLECTION_NAME, 
            positive_ids=user.get_likes(), 
            negative_ids=user.get_dislikes(), 
            ignore_ids=user.get_not_watched(), 
            release_year_cutoff=release_year_cutoff,
            limit=10)
    for result in results:
        title = {
            "id": result.id,
            "overview": result.payload["document"],
            **result.payload
        }
        # Create and display prompt widgets for all titles
        widget = create_prompt_widget_for_title(title, recommended=recommended, user=user)
        display(widget)
            
    return user

In [84]:
user = User()

user = run_recommendation_loop(user)

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/9cqNxx0G...', format='url', width='300'), VBox(ch…

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/39wmItIW...', format='url', width='300'), VBox(ch…

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/lfRkUr7D...', format='url', width='300'), VBox(ch…

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/7IiTTglo...', format='url', width='300'), VBox(ch…

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/8VG8fDNi...', format='url', width='300'), VBox(ch…

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/ztkUQFLl...', format='url', width='300'), VBox(ch…

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/rXojaQcx...', format='url', width='300'), VBox(ch…

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/dqZENchT...', format='url', width='300'), VBox(ch…

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/fqldf2t8...', format='url', width='300'), VBox(ch…

HBox(children=(Image(value=b'https://image.tmdb.org/t/p/w300/e3NBGiAi...', format='url', width='300'), VBox(ch…

Let's run the recommendation loop a few more times to collect more preferences.

In [None]:
user = run_recommendation_loop(user)

Now that we have collected enough preferences let's create a widget that will list out all the recommendations.

In [95]:
def create_display_widget_for_title(title, width):
    base_img_url = "https://image.tmdb.org/t/p/w300"

    html = f'<b>{title["name"]}</b><br />'
    html += f'<b>TMDB Rating: {title["vote_average"]} ({title["vote_count"]} ratings)</b>'
    if title.get("release_year") is not None:
        html += f'<br />Released: {title.get("release_year")}'
    html = f'<div style="padding: 10px; background-color: #eee; width: {width}">{html}</div>'
    poster = widgets.Image.from_url(url=f'{base_img_url}{title["poster_path"]}', width=300)
    details = widgets.HTML(value=html)

    widget = widgets.VBox([poster, details])
    return widget

def find_by_title_id(title_id):
    titles = load_titles_from_disk()
    return next(title for title in titles if title["id"] == title_id)

def show_recommendations(user: User):
    width = "300px"
    recommendations = user.get_recommendations()
    if len(recommendations) == 0:
        print("We don't have any recommendations yet. Run the loop a few times to get some recommendations.")
    title_widgets = [create_display_widget_for_title(find_by_title_id(title), width) for title in recommendations]
    return widgets.GridBox(title_widgets, layout=widgets.Layout(grid_template_columns=f"repeat(3, {width})", flex="flex", justify_content="center", grid_gap="10px", padding="20px"))
    

In [None]:
# Call the show_recommendations function to list all the recommendations so far.
show_recommendations(user)

## Preview Image for non interactive environments
![Display Widget](./images/display-widget.png "Display widget")

# Next steps

Right now we are using the default and most lightweight model to generate our embeddings. The results we are getting are okay but could certainly be a lot better. The next step would be to try out some of the other available models and see which one gives the best results. Another thing to do would be to use a movie reviews dataset to get better recommendations.

But that is something for another day.

# Summary
In this notebook we learned about
- Vector Embeddings
- [Qdrant DB](https://qdrant.tech/)
- [Fastembed](https://qdrant.github.io/fastembed/)
- [QdrantClient.add()](https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/#using-fastembed-with-qdrant-for-vector-search)
- [QdrantClient.scroll()](https://qdrant.tech/documentation/concepts/points/#scroll-points)
- [QdrantClient.recommend()](https://qdrant.tech/documentation/concepts/explore/#recommendation-api)
- [TMDB API](https://developer.themoviedb.org/reference/intro/getting-started)
- [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/)

# Footnotes
- We could have installed and imported all the libraries at the top of the file. But in my opinion, doing it just in time makes more sense for Jupyter notebooks, although it does increase the work if you want to move the code to a proper project.