In [27]:
import redis
import csv
import numpy as np
from sentence_transformers import SentenceTransformer
from redis.commands.search.query import Query
from redis.commands.search.field import TextField, TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
import openai
import tiktoken

REDIS_HOST="127.0.0.1"
REDIS_PORT=6380
VSS_MINIMUM_SCORE=2

In [28]:
conn = redis.Redis(host=REDIS_HOST, port=REDIS_PORT, decode_responses=True)
print(conn.ping())

True


First, we import the data. For this example, I have chosen a movie dataset that includes an overview, a rating, and additional information that may be useful for searching and filtering the results further. The Kaggle [IMDB movies dataset](https://www.kaggle.com/datasets/ashpalsingh1525/imdb-movies-dataset) is in CSV format and ready to import.

In [29]:
def load():
    # Store each CSV row as a JSON document under the key moviebot:movie:<n>
    with open("imdb_movies.csv", encoding='utf-8', newline='') as csvf:
        for cnt, row in enumerate(csv.DictReader(csvf)):
            conn.json().set(f'moviebot:movie:{cnt}', '$', row)
        print("Data was loaded")
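
One detail about the loader is worth knowing: `csv.DictReader` yields every value as a string, so fields such as `score` end up in the JSON document as text rather than as numbers. The following is a minimal sketch with an in-memory CSV. The sample row is made up for illustration; only the column names come from the dataset used in this notebook.

```python
import csv
import io

# Hypothetical sample data; the real file is imdb_movies.csv.
SAMPLE = 'names,score,genre\nExample Movie,69.0,"Drama, Comedy"\n'


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
print(rows[0])
print(type(rows[0]["score"]).__name__)  # the score is a str, not a float
```

If numeric filtering on the score is needed later, converting the value with `float(row["score"])` before calling `conn.json().set(...)` would be one option.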

Next, we create the secondary index on the desired fields and, most importantly, on the `overview_embedding` field, which will store the vector embedding (created in the next step). Read the [documentation](https://redis.io/docs/interact/search-and-query/basic-constructs/field-and-type-options/#vector-fields) to learn more about the supported distance metrics, indexing methods, and other parameters.

In [30]:
def create_index():
    indexes = conn.execute_command("FT._LIST")
    if "movie_idx" not in indexes:
        index_def = IndexDefinition(prefix=["moviebot:movie:"], index_type=IndexType.JSON)
        schema = (TextField("$.crew", as_name="crew"),
                  TextField("$.overview", as_name="overview"),
                  TagField("$.genre", as_name="genre"),
                  TagField("$.names", as_name="names"),
                  VectorField("$.overview_embedding", "HNSW", {"TYPE": "FLOAT32", "DIM": 384, "DISTANCE_METRIC": "L2"}, as_name="embedding"))
        conn.ft('movie_idx').create_index(schema, definition=index_def)
        print("The index has been created")
    else:
        print("The index exists")
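
The `DIM` and `TYPE` values in the schema have to match what the embedding model produces. As a rough illustration that uses only the standard library (it is not part of the notebook, and the helper name `to_float32_bytes` is hypothetical), a 384-dimensional FLOAT32 vector packs to 384 × 4 = 1536 bytes, which is the size of the blob passed as `$vec` at query time.

```python
from array import array

DIM = 384  # same value as "DIM" in the index schema


def to_float32_bytes(vector, dim=DIM):
    """Pack a sequence of floats as float32 bytes in native byte order.

    Raises ValueError when the vector length does not match the index dimension.
    """
    if len(vector) != dim:
        raise ValueError(f"expected {dim} values, got {len(vector)}")
    return array("f", vector).tobytes()


blob = to_float32_bytes([0.0] * DIM)
print(len(blob))  # 4 bytes per float32 value
```

A length check of this kind can catch a mismatch between the model and the schema before the data reaches the index.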

The data has been imported and the index created, so we can proceed to generate the vector embeddings. For this task, we will use a free embedding model, the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) Sentence-Transformers model, which maps sentences and paragraphs to a 384-dimensional dense vector space. Note that the text we embed should contain all the information we may want to match later during the semantic search. For this purpose, we extract and concatenate the movie name, the overview, the genre, the crew, and the score.

In [31]:
def create_embeddings():
    model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
    for key in conn.scan_iter(match='moviebot:movie:*'):
        print(f"creating the embedding for {key}")
        result = conn.json().get(key, "$.names", "$.overview", "$.crew", "$.score", "$.genre")
        movie = f"movie title is: {result['$.names'][0]}\n"
        movie += f"movie genre is: {result['$.genre'][0]}\n"
        movie += f"movie crew is: {result['$.crew'][0]}\n"
        movie += f"movie score is: {result['$.score'][0]}\n"
        movie += f"movie overview is: {result['$.overview'][0]}\n"
        conn.json().set(key, "$.overview_embedding", model.encode(movie).astype(np.float32).tolist())
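
The same five-line text layout is built here and again when the prompt context is assembled. A small helper could keep the two in sync. The sketch below is an assumption for illustration: the name `movie_to_text` does not appear in the notebook, and the sample values are made up.

```python
def movie_to_text(names, genre, crew, score, overview):
    """Return the text block that is embedded for one movie."""
    return (
        f"movie title is: {names}\n"
        f"movie genre is: {genre}\n"
        f"movie crew is: {crew}\n"
        f"movie score is: {score}\n"
        f"movie overview is: {overview}\n"
    )


text = movie_to_text(
    names="Example Movie",
    genre="Drama, Comedy",
    crew="Jane Doe",
    score="69.0",
    overview="A made-up overview used only for this example.",
)
print(text)
```

Embedding the labelled fields together means a question about an actor, a genre, or a plot element can match the same vector.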

The following function takes the user's question and performs a semantic search in the database to retrieve the information used to construct the prompt. Specifically, we retrieve up to three of the most relevant movies (those whose embeddings lie within `VSS_MINIMUM_SCORE` of the question's embedding) and construct a prompt so that ChatGPT can formulate an answer from the provided context. This is probably the most delicate part, and you can change it at will to instruct ChatGPT as desired.

In [32]:
def get_prompt(model, query):
    context = ""
    prompt = ""
    q = Query("@embedding:[VECTOR_RANGE $radius $vec]=>{$YIELD_DISTANCE_AS: score}") \
        .sort_by("score", asc=True) \
        .return_fields("overview", "names", "score", "$.crew", "$.genre", "$.score") \
        .paging(0, 3) \
        .dialect(2)

    # Find all vectors within VSS_MINIMUM_SCORE of the query vector
    query_params = {
        "radius": VSS_MINIMUM_SCORE,
        "vec": model.encode(query).astype(np.float32).tobytes()
    }

    res = conn.ft("movie_idx").search(q, query_params)

    if (res is not None) and len(res.docs):
        for x in res.docs:
            # print("the score is: " + str(x['score']))
            movie = f"movie title is: {x['names']}\n"
            movie += f"movie genre is: {x['$.genre']}\n"
            movie += f"movie crew is: {x['$.crew']}\n"
            movie += f"movie score is: {x['$.score']}\n"
            movie += f"movie overview is: {x['overview']}\n"
            context += movie + "\n"

    if len(context) > 0:
        prompt = '''Use the provided information to answer the search query the user has sent.
            The information in the database provides three movies, choose the one or the ones that fit most.
            If you can't answer the user's question, say "Sorry, I am unable to answer the question, try to refine your question". Do not guess. You must deduce the answer exclusively from the information provided. 
            The answer must be formatted in markdown or HTML.
            Do not make things up. Do not add personal opinions. Do not add any disclaimer.

            Search query: 

            {}

            Information in the database: 

            {}
            '''.format(query, context)

    return prompt
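
To make the query logic easier to follow, here is a conceptual sketch of what the range query asks for: keep the vectors within `$radius` of the query vector, sort them by distance, and return the first three. It uses plain Python and a brute-force scan, whereas Redis answers the query through the HNSW index and may report distances on a different scale. The function names and the toy two-dimensional vectors are assumptions for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(query_vec, docs, radius, limit=3):
    """Return up to `limit` (key, distance) pairs within `radius`, nearest first.

    `docs` maps a document key to its vector.
    """
    hits = [(key, l2_distance(query_vec, vec)) for key, vec in docs.items()]
    hits = [(key, dist) for key, dist in hits if dist <= radius]
    hits.sort(key=lambda pair: pair[1])
    return hits[:limit]


# Toy data: "c" is too far away and is filtered out by the radius.
docs = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 4.0], "d": [0.5, 0.5]}
result = range_search([0.0, 0.0], docs, radius=2)
print([key for key, _ in result])
```

A smaller radius returns fewer but closer matches, and an empty result is what leads `get_prompt` to return an empty prompt.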

The following function is the heart of the interaction with ChatGPT. Here we provide the prompt that was built earlier and forward the answer to the user.

In [33]:
def getOpenAIGPT35(prompt):
    # Define the system message
    system_msg = 'You are a smart and knowledgeable AI assistant with expertise in all kinds of movies. You are a very friendly and helpful AI. You are empowered to recommend movies based on the provided context. Do NOT make anything up. Do NOT engage in topics that are not about movies.'

    # print("tokens: " + str(num_tokens_from_string(prompt, "cl100k_base")))

    try:
        response = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613",
                                                stream=False,
                                                messages=[{"role": "system", "content": system_msg},
                                                          {"role": "user", "content": prompt}])
        return response["choices"][0]["message"]["content"]
    except openai.error.OpenAIError as e:
        # Handle the error here
        if "context window is too large" in str(e):
            print("Error: Maximum context length exceeded. Please shorten your input.")
            return "Maximum context length exceeded"
        else:
            print("An unexpected error occurred:", e)
            return "An unexpected error occurred"


def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

This is an infinite loop that will get questions from the input field and compute the answer. We are not using streaming for simplicity, so the answer may take a few seconds to compute and show.

In [34]:
def render():
    model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
    # React to user input
    while True:
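        question = input("Ask a question")
        prompt = get_prompt(model, question)
        if not prompt:
            # No movie fell within the search radius, so there is no context to send
            print("Sorry, I am unable to answer the question, try to refine your question")
        else:
            print(getOpenAIGPT35(prompt))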
        print("--------------------------------")
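
Because the loop never terminates, the only way out is to interrupt the kernel. A possible variant, sketched here under the assumption that an empty line, `quit`, or `exit` should end the session, takes the answering function and the I/O functions as parameters so it can be tried without Redis or OpenAI. The name `chat_loop` and its parameters are hypothetical.

```python
def chat_loop(answer, read=input, write=print):
    """Answer questions until the user enters an empty line, 'quit', or 'exit'.

    Returns the number of questions answered.
    """
    answered = 0
    while True:
        question = read("Ask a question: ").strip()
        if question.lower() in {"", "quit", "exit"}:
            break
        write(answer(question))
        write("--------------------------------")
        answered += 1
    return answered


# Scripted demonstration: canned inputs and a stand-in answer function.
scripted = iter(["Suggest a movie", "quit"])
output = []
count = chat_loop(
    answer=lambda q: f"You asked: {q}",
    read=lambda _prompt: next(scripted),
    write=output.append,
)
print(count, output)
```

In the notebook, `answer` could be a function that calls `get_prompt` followed by `getOpenAIGPT35`.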

In [None]:
load()
create_index()
create_embeddings()
render()

Data was loaded
The index has been created
creating the embedding for moviebot:movie:9309
creating the embedding for moviebot:movie:3770
creating the embedding for moviebot:movie:817
creating the embedding for moviebot:movie:432
creating the embedding for moviebot:movie:4070
creating the embedding for moviebot:movie:3836
creating the embedding for moviebot:movie:5160
creating the embedding for moviebot:movie:2557
creating the embedding for moviebot:movie:6939
creating the embedding for moviebot:movie:4378
creating the embedding for moviebot:movie:4967
creating the embedding for moviebot:movie:2511
creating the embedding for moviebot:movie:1187
creating the embedding for moviebot:movie:3353
creating the embedding for moviebot:movie:2242
creating the embedding for moviebot:movie:3704
creating the embedding for moviebot:movie:663
creating the embedding for moviebot:movie:6166
creating the embedding for moviebot:movie:8413
creating the embedding for moviebot:movie:8488
creating the embeddi

Ask a question Suggest a movie with Natalie Portman


Based on the information provided, I would recommend the movie "Look Both Ways" as it features Natalie Portman in the cast. 

Title: Look Both Ways
Genre: Romance, Drama, Comedy
Crew: Natalie Portman
Score: 69.0
Overview: On the eve of her college graduation, Natalie's life diverges into two parallel realities: one in which she becomes pregnant and must navigate motherhood as a young adult in her Texas hometown, the other in which she moves to LA to pursue her career. In both journeys throughout her twenties, Natalie experiences life-changing love, devastating heartbreak, and rediscovers herself.

I hope you enjoy watching "Look Both Ways"!
--------------------------------


Ask a question Do you have any movie about super heros?


Based on the provided information in the database, I recommend the following movies about superheroes:

1. Once Upon a Time: The Super Heroes:
   - Genre: Documentary, TV Movie
   - Overview: The historical saga of American superheroes. Born in the period between the Great Depression and World War II, these mutant human beings with superhuman powers colonized various forms of media and became a national industry in the United States.
   - Score: 65.0

2. How I Became a Superhero:
   - Genre: Science Fiction, Adventure, Action, Comedy, Thriller
   - Overview: Set in Paris in the year 2020, superheroes have assimilated into society and discover a new drug that grants superpowers to ordinary people. The investigation of this case becomes complicated as past conflicts resurface.
   - Score: 61.0

3. Mystery Men:
   - Genre: Adventure, Fantasy, Action, Comedy, Science Fiction
   - Overview: When the hero Captain Amazing is kidnapped, a group of average, everyday superheroes assembles to sav