# 04 - AI Orchestration with Azure Cognitive Search

In this lab, we take a deeper dive into the Azure Cognitive Search vector store and the different ways to interact with it.

## Create Azure Cognitive Search Vector Store in Azure

First, we need to create an Azure Cognitive Search service in Azure, which will act as a vector store. We'll use the Azure CLI to do this.

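**NOTE:** Replace **`<INITIALS>`** with your initials to make the name unique. If the placeholder is left as-is, the shell may interpret the angle brackets as redirection and the `az search service create` command will fail.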

In [56]:
RESOURCE_GROUP="azure-cognitive-search-rg"
LOCATION="westeurope"
NAME="acs-vectorstore-<INITIALS>"
!az group create --name $RESOURCE_GROUP --location $LOCATION
!az search service create -g $RESOURCE_GROUP -n $NAME -l $LOCATION --sku Basic --partition-count 1 --replica-count 1

{
  "id": "/subscriptions/5c97e433-957b-489e-9b03-9227c1dbdf0b/resourceGroups/azure-cognitive-search-rg",
  "location": "westeurope",
  "managedBy": null,
  "name": "azure-cognitive-search-rg",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}


The system cannot find the file specified.


Next, update the following values in the `.env` file with the Azure Cognitive Search **service name**, **endpoint**, and **admin key**. You can find them in the Azure Portal or with the Azure CLI. The index name is set directly in the notebook code (`movies-index`).

```
AZURE_SEARCH_SERVICE_NAME = "<YOUR AZURE COGNITIVE SEARCH SERVICE NAME - e.g. cognitive-search-service>"
AZURE_SEARCH_ENDPOINT = "<YOUR AZURE COGNITIVE SEARCH ENDPOINT NAME - e.g. https://cognitive-search-service.search.windows.net>"
AZURE_SEARCH_KEY = "<YOUR AZURE COGNITIVE SEARCH ADMIN API KEY - e.g. cognitive-search-admin-api-key>"
```
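Before running the rest of the notebook, it can help to confirm that the search settings are actually present. The following is a minimal sketch, assuming the key names that the notebook reads with `os.getenv`; `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names, not part of any SDK. Adjust the tuple if your `.env` uses different keys.

```python
import os

# Assumed key names: these are the ones the notebook code reads via os.getenv.
REQUIRED_SETTINGS = (
    "AZURE_SEARCH_SERVICE_NAME",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_KEY",
)


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are absent or blank in `env` (hypothetical helper)."""
    return [name for name in required if not (env.get(name) or "").strip()]


# Example with an in-memory mapping; pass os.environ to check the real environment.
sample_env = {
    "AZURE_SEARCH_SERVICE_NAME": "cognitive-search-service",
    "AZURE_SEARCH_ENDPOINT": "https://cognitive-search-service.search.windows.net",
    "AZURE_SEARCH_KEY": "",
}
print(missing_settings(sample_env))
```

Calling `missing_settings(os.environ)` right after `load_dotenv()` would surface a missing key before any Azure call is made.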

## Setup Azure OpenAI

We'll start as usual by defining our Azure OpenAI service API key and endpoint details and specifying the model deployment we want to use. Then we'll initiate a connection to the Azure OpenAI service.

**NOTE**: As with previous labs, we'll use the values from the `.env` file in the root of this repository.

In [1]:
import os
from dotenv import load_dotenv

# Load environment variables
if load_dotenv():
    print("Found OpenAI Base Endpoint: " + os.getenv("OPENAI_API_BASE"))
else:
    print("No file .env found")
openai_api_type = os.getenv("OPENAI_API_TYPE")
openai_api_key = os.getenv("OPENAI_API_KEY")
openai_api_base = os.getenv("OPENAI_API_BASE")
openai_api_version = os.getenv("OPENAI_API_VERSION")
deployment_name = os.getenv("OPENAI_DEPLOYMENT_NAME")
embedding_name = os.getenv("OPENAI_EMBEDDING_DEPLOYMENT")  # assumed key name (typo fixed); match your .env
acs_service_name = os.getenv("AZURE_SEARCH_SERVICE_NAME")
acs_endpoint_name = os.getenv("AZURE_SEARCH_ENDPOINT")
acs_index_name = "movies-index"
acs_api_key = os.getenv("AZURE_SEARCH_KEY")

Found OpenAI Base Endpoint: https://trefoil.openai.azure.com/


First, we will load the data from the `movies.csv` file using the LangChain CSV document loader.

In [2]:
from langchain.document_loaders.csv_loader import CSVLoader

# Movie Fields in CSV
# id,original_language,original_title,popularity,release_date,vote_average,vote_count,genre,overview,revenue,runtime,tagline
loader = CSVLoader(file_path='../data/movies/movies.csv', source_column='original_title', encoding='utf-8', 
                   csv_args={'delimiter':',', 
                             'fieldnames': ['id', 'original_language', 'original_title', 'popularity', 
                                            'release_date', 'vote_average', 'vote_count', 'genre', 
                                            'overview', 'revenue', 'runtime', 'tagline'
                                            ]
                            }
                    )
data = loader.load()
data = data[1:51] # skip the header row (index 0) and keep the first 50 movies; widen the slice to load more
print('Loaded %s movies' % len(data))
data[0]

Loaded 50 movies


Document(page_content="id: 381284.0\noriginal_language: en\noriginal_title: Hidden Figures\npopularity: 49.802\nrelease_date: 2016-12-10\nvote_average: 8.1\nvote_count: 7310.0\ngenre: ['Drama', 'History']\noverview: The untold story of Katherine G. Johnson, Dorothy Vaughan and Mary Jackson – brilliant African-American women working at NASA and serving as the brains behind one of the greatest operations in history – the launch of astronaut John Glenn into orbit. The visionary trio crossed all gender and race lines to inspire generations to dream big.\nrevenue: 230698791.0\nruntime: 127.0\ntagline: Meet the women you don't know, behind the mission you do.", metadata={'source': 'Hidden Figures', 'row': 1})
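To make the shape of `page_content` less mysterious, here is a small standard-library sketch that approximates how one CSV row turns into the `name: value` text shown above. It is not the CSVLoader implementation; `row_to_page_content`, the shortened `FIELDNAMES` list, and the in-memory sample are assumptions for illustration.

```python
import csv
import io

FIELDNAMES = ["id", "original_title", "genre", "tagline"]  # shortened for illustration


def row_to_page_content(row):
    """Join the columns as 'name: value' lines, similar to the Document shown above."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# A tiny in-memory CSV with a header row, standing in for movies.csv.
sample_csv = io.StringIO(
    "id,original_title,genre,tagline\n"
    "381284.0,Hidden Figures,\"['Drama', 'History']\",Meet the women you don't know\n"
)
rows = list(csv.DictReader(sample_csv, fieldnames=FIELDNAMES))

# Because fieldnames are passed explicitly, the header line is read as data (row 0),
# which is why the notebook slices the loaded documents starting at index 1.
header_row, first_movie = rows[0], rows[1]
print(row_to_page_content(first_movie))
```

The same effect explains the `'row': 1` in the metadata of the first movie: row 0 is the header line.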

Next, we will create Azure OpenAI embedding and completion instances so that we can build the vector representation of the movies and start asking our questions.

In [4]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import AzureChatOpenAI

# Create an Embeddings Instance of Azure OpenAI

embeddings = OpenAIEmbeddings(
    openai_api_base = openai_api_base,
    openai_api_version = openai_api_version,
    deployment_name ="text-embedding-ada-002",
    openai_api_key = openai_api_key,
    openai_api_type = openai_api_type,
    embedding_ctx_length=8191,
    chunk_size=1000,
    max_retries=6)

# Create a Completion Instance of Azure OpenAI
llm = AzureChatOpenAI(
    openai_api_base= openai_api_base,
    openai_api_version= openai_api_version,
    deployment_name="gpt-35-turbo-16k",
    temperature=0,
    openai_api_key= openai_api_key,
    openai_api_type = openai_api_type,
    max_retries=6,
    max_tokens=4000
)


print('Completed creation of embedding and completion instances.')

Completed creation of embedding and completion instances.


## Load Movies into Azure Cognitive Search

Next, we'll create the Azure Cognitive Search index, embed the movies loaded from the CSV file, and upload the data into the newly created index. Depending on the number of movies loaded and on rate limiting, generating the embeddings might take a while, so be patient.

In [10]:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SemanticConfiguration,
    PrioritizedFields,
    SemanticField,
    SemanticSettings,
    VectorSearch,
    HnswVectorSearchAlgorithmConfiguration,
)

# Let's Create the Azure Cognitive Search Index
index_client = SearchIndexClient(
    acs_endpoint_name,
    AzureKeyCredential(acs_api_key)
)
# Movie Fields in CSV
# id,original_language,original_title,popularity,release_date,vote_average,vote_count,genre,overview,revenue,runtime,tagline

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True),
    SearchableField(name="title", type=SearchFieldDataType.String),
    SearchableField(name="tagline", type=SearchFieldDataType.String),
    SimpleField(name="popularity", type=SearchFieldDataType.Double, sortable=True),  # numeric field: sortable, not full-text searchable
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchField(name="content_vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), searchable=True, vector_search_dimensions=1536, vector_search_configuration="my-vector-config"),
]

# Configure Vector Search Configuration
vector_search = VectorSearch(
    algorithm_configurations=[
        HnswVectorSearchAlgorithmConfiguration(
            name="my-vector-config",
            kind="hnsw",
            parameters={
                "m": 4,
                "efConstruction": 400,
                "efSearch": 500,
                "metric": "cosine"
            }
        )
    ]
)

# Configure Semantic Configuration
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=PrioritizedFields(
        title_field=SemanticField(field_name="title"),
        prioritized_keywords_fields=[SemanticField(field_name="title"), SemanticField(field_name="tagline")],
        prioritized_content_fields=[SemanticField(field_name="content")]
    )
)

# Create the semantic settings with the configuration
semantic_settings = SemanticSettings(configurations=[semantic_config])

# Create the search index with the desired vector search and semantic configurations
index = SearchIndex(
    name=acs_index_name,
    fields=fields,
    vector_search=vector_search,
    semantic_settings=semantic_settings
)
result = index_client.create_or_update_index(index)
print(f'The {result.name} index was created.')

The movies-index index was created.


Next we will create the document structure needed to upload the data into the Azure Cognitive Search index.

In [5]:
# Now that the index is created, let's load the documents into it.

import uuid

# Let's take a quick look at the data structure of the CSVLoader
print(data[0])
print(data[0].metadata['source'])
print("----------")

# Generate Document Embeddings for page_content field in the movies CSVLoader dataset using Azure OpenAI
items = []
for movie in data:
    content = movie.page_content
    items.append({
        "id": str(uuid.uuid4()),
        "title": movie.metadata["source"],
        "content": content,
        "content_vector": embeddings.embed_query(content),
    })

# Print out a sample item to validate the updated data structure.
# It should have the id, title, content, and content_vector values.
print(items[0])
print(f"Movie Count: {len(items)}")

page_content="id: 381284.0\noriginal_language: en\noriginal_title: Hidden Figures\npopularity: 49.802\nrelease_date: 2016-12-10\nvote_average: 8.1\nvote_count: 7310.0\ngenre: ['Drama', 'History']\noverview: The untold story of Katherine G. Johnson, Dorothy Vaughan and Mary Jackson – brilliant African-American women working at NASA and serving as the brains behind one of the greatest operations in history – the launch of astronaut John Glenn into orbit. The visionary trio crossed all gender and race lines to inspire generations to dream big.\nrevenue: 230698791.0\nruntime: 127.0\ntagline: Meet the women you don't know, behind the mission you do." metadata={'source': 'Hidden Figures', 'row': 1}
Hidden Figures
----------
{'id': 'aa3a6018-6e3c-4074-9a7e-c30dfb4a58ff', 'title': 'Hidden Figures', 'content': "id: 381284.0\noriginal_language: en\noriginal_title: Hidden Figures\npopularity: 49.802\nrelease_date: 2016-12-10\nvote_average: 8.1\nvote_count: 7310.0\ngenre: ['Drama', 'History']\nove
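The loop above sends one embedding request per movie. If rate limiting becomes a problem with larger datasets, one option is to embed in batches and back off between retries. The following is a minimal sketch under stated assumptions: `embed_in_batches` and `fake_embed` are hypothetical, and with LangChain you would presumably pass `embeddings.embed_documents` as the `embed_batch` callable.

```python
import time


def embed_in_batches(texts, embed_batch, batch_size=16, max_retries=3,
                     base_delay=1.0, sleep=time.sleep):
    """Embed `texts` in batches, retrying a failed batch with exponential backoff."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_batch(batch))
                break
            except Exception:  # real code would catch the client's rate-limit error only
                if attempt == max_retries:
                    raise
                sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ... by default
    return vectors


def fake_embed(batch):
    """Stand-in for an embedding call: one 1-dimensional vector per text."""
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(vectors)
```

The order of the returned vectors matches the order of the input texts, so they can be zipped back onto the movie documents.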

Next we will upload the movie documents in the newly created structure to the Azure Cognitive Search index.

In [12]:
# Upload movies to Azure Cognitive Search index.
from azure.search.documents.models import Vector
from azure.search.documents import SearchClient

# Insert Text and Embeddings into the Azure Cognitive Search index created.
search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)
result = search_client.upload_documents(items)
print("Successfully added documents to Azure Cognitive Search index.")
print(f"Uploaded {len(items)} documents")

Successfully added documents to Azure Cognitive Search index.
Uploaded 50 documents


## Vector Store Searching using Azure Cognitive Search

Now that we have the movies loaded into Azure Cognitive Search, let's do some different types of searches using the Azure Cognitive Search SDK.

In [None]:
# First, let's do a plain vanilla text search, no vectors or embeddings.
query = "What are the best 80s movies I should look at?"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# Execute the search
results = list(search_client.search(
    search_text=query,
    include_total_count=True,
    top=5
))

# Print count of total results.
print(f"Returned {len(results)} results using only text-based search.")
print("----------")
# Iterate over Results
# Index Fields - id, title, tagline, popularity, content, content_vector
for result in results:
    print("Movie: {}".format(result["content"]))
    print("----------")

In [None]:
# Now let's do a vector search that uses the embeddings we created and inserted into content_vector field in the index.
query = "What are the best 80s movies I should look at?"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=5,
    fields="content_vector"
)

# Execute the search
results = list(search_client.search(
    search_text="",
    include_total_count=True,
    vectors=[vector],
    select=["id", "content", "title"],
))

# Print count of total results.
print(f"Returned {len(results)} results using only vector-based search.")
print("----------")
# Iterate over results and print out the content.
for result in results:
    print(result["title"])
    print("----------")

Did that return what you expected? Probably not. Let's dig deeper to see why.

Let's do the same search again, but this time let's return the **Search Score** so we can see the value returned by the cosine similarity vector store calculation.

In [None]:
# Try again, but this time let's add the relevance score to maybe see why
query = "What are the best 80s movies I should look at?"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=5,
    fields="content_vector"
)

# Execute the search
results = list(search_client.search(
    search_text="",
    include_total_count=True,
    vectors=[vector],
    select=["id", "content", "title"],
))

# Print count of total results.
print(f"Returned {len(results)} results using vector search.")
print("----------")
# Iterate over results and print out the id, title, and search score.
for result in results:
    print(f"Id: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print("----------")

The Search Score shows how closely each result's vector matches the query vector, and the results are ranked by it. The lower the score, the farther apart the two vectors are. Let's change the search term and see if we can get a higher Search Score, which means a closer match in vector space.

In [None]:
# Now try a more specific query and compare the search scores with the previous run.
query = "Who are the actors in the movie Hidden Figures?"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=5,
    fields="content_vector"
)

# Execute the search
results = list(search_client.search(
    search_text="",
    include_total_count=True,
    vectors=[vector],
    select=["id", "content", "title"],
))

# Print count of total results.
print(f"Returned {len(results)} results using vector search.")
print("----------")
# Iterate over results and print out the id, title, and search score.
for result in results:
    print(f"Id: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print("----------")
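To make the Search Score more concrete, here is a small worked sketch of cosine similarity in plain Python. The mapping from similarity to score is an assumption based on my reading of the Azure documentation for the `cosine` metric (score = 1 / (1 + cosine distance)); the helper names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_to_search_score(similarity):
    """Assumed mapping: score = 1 / (1 + cosine_distance), with distance = 1 - similarity."""
    return 1.0 / (1.0 + (1.0 - similarity))


query_vec = [1.0, 0.0]
close_vec = [0.9, 0.1]   # points almost the same way as the query
far_vec = [0.0, 1.0]     # orthogonal to the query

for name, vec in [("close", close_vec), ("far", far_vec)]:
    sim = cosine_similarity(query_vec, vec)
    print(name, round(sim, 4), round(cosine_to_search_score(sim), 4))
```

Under this assumed mapping, identical directions give a score of 1.0 and orthogonal vectors give 0.5, so scores bunch together in a fairly narrow band and small differences between them can still matter.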

**NOTE:** As the results show, different inputs return different results, depending on what data is in the vector store. The higher the score, the higher the likelihood of a match.

## Hybrid Searching using Azure Cognitive Search

What is Hybrid Search? It combines full-text and vector queries in a single search request. The search is implemented at the field level, which means you can build queries that include both vector fields and searchable text fields. The queries execute in parallel and the results are merged into a single response. Optionally, you can add semantic search, currently in preview, for even more accuracy, with L2 reranking that uses the same language models that power Bing.

**NOTE:** Hybrid Search is a key value proposition of Azure Cognitive Search in comparison to vector only data stores. Click [Hybrid Search](https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview) for more details.
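How the two result lists get merged is worth a quick illustration. To my understanding, the Azure documentation describes Reciprocal Rank Fusion (RRF) for this step; the sketch below shows the general technique, not the service's actual implementation. The function name, the constant `k=60`, and the document ids are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked id lists; each hit contributes 1 / (k + rank) to its id."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from the text query and the vector query.
text_ranking = ["movie-a", "movie-b", "movie-c"]
vector_ranking = ["movie-b", "movie-d", "movie-a"]

fused = reciprocal_rank_fusion([text_ranking, vector_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents that rank well in both lists rise to the top. Fused scores of this kind are small numbers, so a hybrid score is not directly comparable with a vector-only score.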

In [None]:
# Hybrid Search
# Let's try our original query again using Hybrid Search (i.e. a combination of text and vector search)
query = "What are the best 80s movies I should look at?"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=5,
    fields="content_vector"
)

# Notice we also fill in the search_text parameter with the query.
results = list(search_client.search(
    search_text=query,
    include_total_count=True,
    top=10,
    vectors=[vector],
    select=["id", "content", "title"],
))

# Print count of total results.
print(f"Returned {len(results)} results using hybrid search.")
print("----------")
# Iterate over results and print out the id, title, and search score.
for result in results:
    print(f"Id: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Hybrid Search Score: {result['@search.score']}")
    print("----------")

In [None]:
# Hybrid Search
# Let's try our more specific query again to see the difference in the score returned.
query = "Who are the actors in the movie Hidden Figures?"

search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)

# You can see here that we are getting the embedding representation of the query.
vector = Vector(
    value=embeddings.embed_query(query),
    k=5,
    fields="content_vector"
)

# -----
# Notice we also fill in the search_text parameter with the query along with the vector.
# -----
results = list(search_client.search(
    search_text=query,
    include_total_count=True,
    top=10,
    vectors=[vector],
    select=["id", "content", "title"],
))

# Print count of total results.
print(f"Returned {len(results)} results using hybrid search.")
print("----------")
# Iterate over results and print out the id and search score.
for result in results:  
    print(f"Id: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Hybrid Search Score: {result['@search.score']}")
    print("----------")

## Bringing it All Together with Retrieval Augmented Generation (RAG) + Langchain (LC)

Now that we have our vector store set up and data loaded, we are ready to implement the RAG pattern using AI Orchestration. At a high level, the following steps are required:

1. Ask the question
2. Create a prompt template with inputs
3. Get the embedding representation of the question
4. Use the embedded version of the question to search Azure Cognitive Search (i.e. the vector store)
5. Inject the results of the search into the prompt template and execute the prompt to get the completion
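
The steps above can be sketched end to end with stand-in functions, which makes the data flow visible without calling any service. Everything named here (`answer_question`, `fake_embed`, `fake_search`, `fake_complete`) is hypothetical; in the notebook the real embedding, search, and completion calls take their place.

```python
PROMPT_TEMPLATE = """Question: {original_question}

Do not use any other data.
Only use the movie data below when responding.
{search_results}
"""


def answer_question(question, embed, search, complete, k=5):
    """Run the RAG steps with injected embed/search/complete callables."""
    question_vector = embed(question)                       # step 3: embed the question
    hits = search(question_vector, k)                       # step 4: query the vector store
    context = "\n\n".join(hit["content"] for hit in hits)   # flatten hits into plain text
    prompt = PROMPT_TEMPLATE.format(                        # steps 2 and 5: fill the template
        original_question=question, search_results=context
    )
    return complete(prompt)                                 # step 5: get the completion


# Stand-ins so the sketch runs without Azure credentials.
def fake_embed(text):
    return [float(len(text))]


def fake_search(vector, k):
    return [{"title": "Example Movie", "content": "overview: A crew sails across the ocean."}][:k]


def fake_complete(prompt):
    return prompt  # echo the prompt so the injected context is visible


print(answer_question("List the movies about ships on the water.",
                      fake_embed, fake_search, fake_complete))
```

Passing the callables in as arguments is a design choice that keeps the flow testable; it also makes it easy to swap vector search for hybrid search later.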

In [19]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
# Create an Embeddings Instance of Azure OpenAI

embeddings = OpenAIEmbeddings(
    openai_api_base = openai_api_base,
    openai_api_version = openai_api_version,
    deployment_name ="text-embedding-ada-002",
    openai_api_key = openai_api_key,
    openai_api_type = openai_api_type,
    embedding_ctx_length=8191,
    chunk_size=1000,
    max_retries=6)

# Create a Completion Instance of Azure OpenAI

llm = AzureChatOpenAI(
    openai_api_base= openai_api_base,
    openai_api_version= openai_api_version,
    deployment_name="gpt-35-turbo-16k",
    temperature=0,
    openai_api_key= openai_api_key,
    openai_api_type = openai_api_type,
    max_retries=6,
    max_tokens=4000
)



In [21]:
from langchain.schema import HumanMessage, SystemMessage, AIMessage 
llm(
    [
        SystemMessage(content="You are a nice AI bot that helps a user figure out what to eat in one short sentence"),
        HumanMessage(content="I like tomatoes, what should I eat?")
    ]
)

AIMessage(content='You could try a Caprese salad with fresh tomatoes, mozzarella, and basil.')

## Ask questions using a simple chain 

In [22]:

# Ask the question
question = "List the movies about ships on the water."

# Create a prompt template with variables, note the curly braces
from langchain.prompts import PromptTemplate
prompt = PromptTemplate(
    input_variables=["original_question","search_results"],
    template="""
    Question: {original_question}

    Do not use any other data.
    Only use the movie data below when responding.
    {search_results}
    """,
)

# Get Embedding for the original question
question_embedded=embeddings.embed_query(question)

# Search Vector Store
search_client = SearchClient(
    acs_endpoint_name,
    acs_index_name,
    AzureKeyCredential(acs_api_key)
)
vector = Vector(
    value=question_embedded,
    k=5,
    fields="content_vector"
)
results = list(search_client.search(
    search_text="",
    include_total_count=True,
    vectors=[vector],
    select=["title"],
))

# Build the Prompt and Execute against the Azure OpenAI to get the completion
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
response = chain.run({"original_question": question, "search_results": results})
print(response)



> Entering new LLMChain chain...
Prompt after formatting:
    Question: List the movies about ships on the water.

    Do not use any other data.
    Only use the movie data below when responding.
    [{'title': 'Pirates of the Caribbean: Tales of the Code – Wedlocked', '@search.score': 0.83367515, '@search.reranker_score': None, '@search.highlights': None, '@search.captions': None}, {'title': 'Pirates of the Caribbean: Tales of the Code – Wedlocked', '@search.score': 0.83367515, '@search.reranker_score': None, '@search.highlights': None, '@search.captions': None}, {'title': 'Броненосец Потёмкин', '@search.score': 0.8313832, '@search.reranker_score': None, '@search.highlights': None, '@search.captions': None}, {'title': 'Броненосец Потёмкин', '@search.score': 0.8313832, '@search.reranker_score': None, '@search.highlights': None, '@search.captions': None}, {'title': 'Under Siege 2: Dark Territory', '@search.score': 0.8253132, '@search.reranker_score': None, '@sea

## Using a conversational chain 

#### Conversational Retrieval Chain

In [23]:
# Connect to Azure Cognitive Search
from langchain.vectorstores import AzureSearch
acs = AzureSearch(azure_search_endpoint= acs_endpoint_name,
                 azure_search_key= acs_api_key,
                 index_name = acs_index_name,
                 embedding_function=embeddings.embed_query)

In [24]:
retriever = acs.as_retriever()
testdocs = retriever.get_relevant_documents(query="List the movies about ships on the water.")
len(testdocs)

4

In [25]:
retriever = acs.as_retriever(search_kwargs={"k": 5})

In [26]:
retriever.search_type

'similarity'

In [27]:
retriever.search_kwargs

{'k': 5}

In [28]:
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

# Adapt if needed
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:""")

qa = ConversationalRetrievalChain.from_llm(llm=llm,
                                           retriever=retriever,
                                           condense_question_prompt=CONDENSE_QUESTION_PROMPT,
                                           return_source_documents=True,
                                           verbose=False)

In [29]:
chat_history = []
query = "List the movies about ships on the water?"
result = qa({"question": query, "chat_history": chat_history})

print("Question:", query)
print("Answer:", result["answer"])


Question: List the movies about ships on the water?
Answer: 1. Броненосец Потёмкин (1925) - A dramatized account of a great Russian naval mutiny and a resultant public demonstration, showing support, which brought on a police massacre. This film had an incredible impact on the development of cinema and is a masterful example of montage editing.

2. Pirates of the Caribbean: Tales of the Code – Wedlocked (2011) - This short film serves as a prequel to The Curse of the Black Pearl and explains the events leading up to Jack Sparrow's boat, the Jolly Mon, sinking. It follows the story of two wenches, Scarlett and Giselle, who find themselves in an auction led by the Auctioneer after realizing they were both engaged to Jack Sparrow.

3. Under Siege 2: Dark Territory (1995) - A passenger train has been hijacked by an electronics expert and turned into an untraceable command center for a weapons satellite. Former Navy SEAL Casey Ryback is the only one who can stop him and prevent the destruct
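
The `chat_history` list is what lets a follow-up question refer back to earlier answers. As a rough sketch of what the condense step does with it, the snippet below fills the same prompt text by hand. It assumes the history is kept as `(question, answer)` tuples and rendered with `Human:` / `Assistant:` labels; `format_chat_history`, the shortened answer text, and the follow-up question are hypothetical.

```python
CONDENSE_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""


def format_chat_history(chat_history):
    """Render (question, answer) tuples as alternating Human/Assistant lines (assumed format)."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


chat_history = []
# After each turn, append the question and the answer that came back.
chat_history.append((
    "List the movies about ships on the water?",
    "1. Броненосец Потёмкин (1925) ...",  # shortened placeholder, not the full answer
))

follow_up = "Which of those is the oldest?"
rendered = CONDENSE_TEMPLATE.format(
    chat_history=format_chat_history(chat_history), question=follow_up
)
print(rendered)
```

In the notebook, the equivalent would presumably be to append `(query, result["answer"])` to `chat_history` and pass the list along with the next question.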

#### Using the RetrievalQA Chain

In [30]:
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(llm=llm, 
                                  chain_type="stuff",
                                  retriever=acs.as_retriever(),
                                  return_source_documents=True,
                                  verbose=False)


In [31]:
llm_response = qa_chain(query)

In [32]:
print(llm_response)

{'query': 'List the movies about ships on the water?', 'result': "1. Броненосец Потёмкин (1925) - A dramatized account of a great Russian naval mutiny and a resultant public demonstration, showing support, which brought on a police massacre. The film had an incredible impact on the development of cinema and is a masterful example of montage editing.\n\n2. Pirates of the Caribbean: Tales of the Code – Wedlocked (2011) - This short film serves as a prequel to The Curse of the Black Pearl and explains the events leading up to Jack Sparrow's boat sinking and the wenches' anger towards him. It is set on the water and features ships and pirates.\n\nPlease note that these are just two examples and there may be other movies about ships on the water.", 'source_documents': [Document(page_content="id: 643.0\noriginal_language: ru\noriginal_title: Броненосец Потёмкин\npopularity: 8.956\nrelease_date: 1925-12-24\nvote_average: 7.7\nvote_count: 800.0\ngenre: ['Drama', 'History']\noverview: A dramati

In [46]:
print(llm_response['source_documents'])

[Document(page_content="id: 643.0\noriginal_language: ru\noriginal_title: Броненосец Потёмкин\npopularity: 8.956\nrelease_date: 1925-12-24\nvote_average: 7.7\nvote_count: 800.0\ngenre: ['Drama', 'History']\noverview: A dramatized account of a great Russian naval mutiny and a resultant public demonstration, showing support, which brought on a police massacre. The film had an incredible impact on the development of cinema and is a masterful example of montage editing.\nrevenue: 45100.0\nruntime: 75.0\ntagline: Revolution is the only lawful, equal, effectual war. It was in Russia that this war was declared and begun.", metadata={'id': '9bcc115c-5a28-4005-8684-445a92088b86', 'tagline': None, 'popularity': None, 'title': 'Броненосец Потёмкин', '@search.score': 0.027756938710808754, '@search.reranker_score': None, '@search.highlights': None, '@search.captions': None}), Document(page_content="id: 230652.0\noriginal_language: en\noriginal_title: Pirates of the Caribbean: Tales of the Code – We

Next, we wrap the answer in a more readable format.

In [49]:
## Cite sources
def process_llm_response(llm_response):
    print(llm_response['result'])
    print('\n\nSources:')
    # Documents from this index carry the movie name under 'title' (there is no 'source' key).
    #for source in llm_response["source_documents"]:
    #    print(source.metadata['title'])

In [50]:
# full example
query = "List the movies about ships on the water?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

1. Броненосец Потёмкин (1925) - A dramatized account of a great Russian naval mutiny and a resultant public demonstration, showing support, which brought on a police massacre. The film had an incredible impact on the development of cinema and is a masterful example of montage editing.

2. Pirates of the Caribbean: Tales of the Code – Wedlocked (2011) - This short film serves as a prequel to The Curse of the Black Pearl and explains the events leading up to Jack Sparrow's boat sinking and the wenches' anger towards him. It is set on the water and features ships and pirates.

Please note that these are just two examples and there may be other movies about ships on the water.


Sources:


In [51]:
llm_response["source_documents"]

[Document(page_content="id: 643.0\noriginal_language: ru\noriginal_title: Броненосец Потёмкин\npopularity: 8.956\nrelease_date: 1925-12-24\nvote_average: 7.7\nvote_count: 800.0\ngenre: ['Drama', 'History']\noverview: A dramatized account of a great Russian naval mutiny and a resultant public demonstration, showing support, which brought on a police massacre. The film had an incredible impact on the development of cinema and is a masterful example of montage editing.\nrevenue: 45100.0\nruntime: 75.0\ntagline: Revolution is the only lawful, equal, effectual war. It was in Russia that this war was declared and begun.", metadata={'id': '9bcc115c-5a28-4005-8684-445a92088b86', 'tagline': None, 'popularity': None, 'title': 'Броненосец Потёмкин', '@search.score': 0.027756938710808754, '@search.reranker_score': None, '@search.highlights': None, '@search.captions': None}),
 Document(page_content="id: 230652.0\noriginal_language: en\noriginal_title: Pirates of the Caribbean: Tales of the Code – W
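
The `Sources:` section printed earlier is empty, and the metadata shown above suggests why: documents returned from this index carry a `title` key rather than `source`. The following is a minimal sketch of citing sources from that metadata, with duplicates removed. `SourceDoc` is a hypothetical stand-in for the returned document objects, and `unique_source_titles` is an illustrative helper.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDoc:
    """Stand-in for a returned document: only the fields this sketch needs."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_source_titles(source_documents):
    """Collect titles from document metadata, keeping first-seen order and dropping repeats."""
    seen, titles = set(), []
    for doc in source_documents:
        title = doc.metadata.get("title")
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


docs = [
    SourceDoc("...", {"title": "Броненосец Потёмкин"}),
    SourceDoc("...", {"title": "Броненосец Потёмкин"}),  # same movie indexed twice
    SourceDoc("...", {"title": "Under Siege 2: Dark Territory"}),
]
print(unique_source_titles(docs))
```

The same helper should work on `llm_response["source_documents"]`, since it only relies on each document exposing a `metadata` dictionary.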