# Queries with and without Azure OpenAI

So far, you have your Search Engine loaded **from two different data sources into two different indexes**. In this notebook we are going to try some example queries, and then use the Azure OpenAI service to see if we can get a good answer to the user's query.

The idea is that a user can ask a question about the dialogues of the TV show FRIENDS (first data source/index) or about Covid (second data source/index), and the engine will respond accordingly.

This **Multi-Index** demo mimics the scenario where a company loads multiple types of documents about completely different topics, and the search engine must respond with the most relevant results.

## Set up variables

In [1]:
import os
import urllib
import requests
import random
import json
from collections import OrderedDict
from IPython.display import display, HTML, Markdown
from typing import List
from operator import itemgetter

# LangChain Imports needed
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

from langchain_core.output_parsers import StrOutputParser
from langchain_core.retrievers import BaseRetriever
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.runnables import ConfigurableField


# Our own libraries needed
from common.prompts import DOCSEARCH_PROMPT_TEXT
from common.utils import get_search_results

from dotenv import load_dotenv
load_dotenv("credentials.env")

True

In [2]:
# Setup the Payloads header
headers = {'Content-Type': 'application/json','api-key': os.environ['AZURE_SEARCH_KEY']}
params = {'api-version': os.environ['AZURE_SEARCH_API_VERSION']}

## Multi-Index Search queries

In [3]:
# Text-based Indexes that we are going to query (from Notebook 01 and 02)
index1_name = "srch-index-files"
index2_name = "srch-index-csv"
indexes = [index2_name, index1_name]

Try questions that you think might be answered or addressed in the dialogues of Friends, or by medical publications about COVID from 2020-2021. Try comparing the results with the public version of ChatGPT.<br>

**Example Questions you can ask**:
- Is Chandler ever jealous of Richard?
- Who is Mindy?
- What happened between Ross and Rachel in Vegas?
- What are some examples of reinforcement learning in virus spread?
- What are the main risk factors for Covid-19?
- What medicine reduces inflammation in the lungs?
- Why doesn't Covid affect kids as much as adults?
- Does chloroquine really work against Covid?
- Who won the 1994 soccer world cup? (This question should yield no answer if the system is correctly grounded.)

In [4]:
QUESTION = "Is Chandler ever jealous of Richard?"

### Search on both indexes individually and aggregate results

#### **Note**: 
In order to standardize the indexes, **there must be 6 mandatory fields present in each index**: `id, title, name, location, chunk, chunkVector`. This is so that each document can be treated the same way throughout the code. Also, **all indexes must have a semantic configuration**.
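As a quick pre-flight check, the sketch below verifies that an index definition contains the six mandatory fields and at least one semantic configuration. It assumes the JSON shape returned by the Azure AI Search REST call `GET {endpoint}/indexes/{index-name}`: a `fields` list of objects with a `name`, and a `semantic.configurations` list. Verify that shape against the API version you use. The function name `check_index_definition` is hypothetical and is not part of `common/utils.py`.

```python
from typing import Dict, List

# The six fields this notebook treats as mandatory in every index
REQUIRED_FIELDS = {"id", "title", "name", "location", "chunk", "chunkVector"}


def check_index_definition(index_def: Dict) -> List[str]:
    """Return a list of problems found in an index definition (empty list = OK)."""
    problems = []
    field_names = {f["name"] for f in index_def.get("fields", [])}
    missing = sorted(REQUIRED_FIELDS - field_names)
    if missing:
        problems.append("missing fields: " + ", ".join(missing))
    # "semantic" may be absent or null when no semantic configuration exists
    semantic = index_def.get("semantic") or {}
    if not semantic.get("configurations"):
        problems.append("no semantic configuration")
    return problems


example = {"fields": [{"name": "id"}, {"name": "chunk"}], "semantic": None}
print(check_index_definition(example))
```

Against a live service, you could pass it the parsed response of `requests.get(os.environ['AZURE_SEARCH_ENDPOINT'] + "/indexes/" + index, headers=headers, params=params).json()`.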

We are going to use Semantic Hybrid Queries for optimal results: vector search plus keyword search, with semantic ranking over the merged result set. Per the documentation:
> Hybrid search combines text (keyword) and vector queries in a single search request. All subqueries in the request execute in parallel. The results are merged and reordered by new search scores, using Reciprocal Rank Fusion (RRF) to return a unified result set. In many cases, per benchmark tests, hybrid queries with semantic ranking return the most relevant results.
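To make the fusion step concrete, here is a minimal, self-contained sketch of Reciprocal Rank Fusion over two ranked lists of document ids, one from the keyword query and one from the vector query. Each document scores `1 / (k + rank)` for every list it appears in, and those scores are summed across lists. The constant `k = 60` is a commonly used default. Treat it as an assumption rather than the exact value or implementation inside Azure AI Search, which also applies semantic re-ranking on top of the fused list.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score means a better position.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # sorted() is stable, so ties keep the order in which ids were first seen
    return sorted(scores, key=lambda d: scores[d], reverse=True)


keyword_ranking = ["a", "b", "c"]  # ids from the text (keyword) subquery
vector_ranking = ["c", "a", "d"]   # ids from the vector subquery
print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))
```

Document `a` comes out on top because it ranks well in both lists, even though `c` is first in the vector list. This property makes hybrid search robust when either query type alone misses a relevant chunk.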

In [9]:
agg_search_results = dict()

# Whenever you use semantic ranking with vectors, make sure k is set to 50. 
# Semantic ranker uses up to 50 matches as input. Specifying less than 50 deprives the semantic ranking models of necessary inputs.
k = 50 

for index in indexes:
    search_payload = {
        "search": QUESTION, # Text query
        "select": "id, title, name, location, chunk",
        "count":"true",
        "top": k,
        "queryType": "semantic",
        "semanticConfiguration": "my-semantic-config",
        "captions": "extractive",
        "answers": "extractive",
        "vectorQueries": [  # Vector query
            {
                "text": QUESTION, 
                "fields": "chunkVector", 
                "kind": "text", 
                "k": k
            }
        ],
        "debug": "all",
    }

    r = requests.post(os.environ['AZURE_SEARCH_ENDPOINT'] + "/indexes/" + index + "/docs/search",
                     data=json.dumps(search_payload), headers=headers, params=params)
    print(r.status_code)

    search_results = r.json()
    agg_search_results[index]=search_results
    print("Index:", index, "Results Found: {}, Results Returned: {}".format(search_results['@odata.count'], len(search_results['value'])))

200
Index: srch-index-csv Results Found: 2801, Results Returned: 50
200
Index: srch-index-files Results Found: 2896, Results Returned: 50


#### **Important Note**: 
You may encounter errors (502) when attempting to search IF the indexer is still processing documents. This happens because the indexer uses the embedding model heavily and can hit its TPM (tokens-per-minute) quota. If you experience search errors, try again, or wait until indexing is complete, which may take several hours.
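If you hit these transient errors often, a small retry loop with exponential backoff can save you from re-running cells by hand. The sketch below is hypothetical: `call_with_retry` is not part of this repo. It retries on 502 and, as an assumption, also on 429 and 503, which are other common throttling or unavailability codes. The `sleep` parameter is injectable so the function can be tested without actually waiting.

```python
import time
from typing import Any, Callable, Iterable


def call_with_retry(send: Callable[[], Any],
                    retry_statuses: Iterable[int] = (429, 502, 503),
                    max_attempts: int = 5,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() until its response has a non-retryable status code.

    Waits base_delay * 2**attempt seconds between attempts (exponential backoff)
    and returns the last response, even if it is still an error.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retry_on = set(retry_statuses)
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in retry_on:
            return response
        if attempt < max_attempts - 1:  # no point sleeping after the final attempt
            sleep(base_delay * (2 ** attempt))
    return response
```

In the search loop above, you could wrap the request as `r = call_with_retry(lambda: requests.post(os.environ['AZURE_SEARCH_ENDPOINT'] + "/indexes/" + index + "/docs/search", data=json.dumps(search_payload), headers=headers, params=params))`.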

In [10]:
# Uncomment the line below to inspect the raw search responses
# agg_search_results

### Display the top results (from both searches) based on the score

In [11]:
display(HTML('<h4>Top Answers</h4>'))

for index,search_results in agg_search_results.items():

    for result in search_results['@search.answers']:
        if result['score'] > 0.5: # Show answers that are at least 50% of the max possible score=1
            display(HTML('<h5>' + 'Answer - score: ' + str(round(result['score'],2)) + '</h5>'))
            display(HTML(result['text']))

            
print("\n\n")
display(HTML('<h4>Top Results</h4>'))

content = dict()
ordered_content = OrderedDict()


for index,search_results in agg_search_results.items():
    for result in search_results['value']:
        if result['@search.rerankerScore'] > 1: # Show results that are at least 25% of the max possible score (4)
            content[result['id']]={
                                    "title": result['title'],
                                    "chunk": result['chunk'], 
                                    "name": result['name'], 
                                    "location": result['location'] ,
                                    "caption": result['@search.captions'][0]['text'],
                                    "score": result['@search.rerankerScore'],
                                    "index": index
                                    }
    
# After the results have been filtered, sort them by score and add them to an ordered dict
for id in sorted(content, key= lambda x: content[x]["score"], reverse=True):
    ordered_content[id] = content[id]
    url = str(ordered_content[id]['location']) + os.environ['BLOB_SAS_TOKEN']
    title = str(ordered_content[id]['title']) if (ordered_content[id]['title']) else ordered_content[id]['name']
    score = str(round(ordered_content[id]['score'],2))
    display(HTML('<h5><a href="'+ url + '">' + ordered_content[id]['location'] + '</a> - score: '+ score + '</h5>'))
    display(HTML(ordered_content[id]['caption']))






### Comments on Query results

As seen above, the semantic re-ranking feature of Azure AI Search is decent. It (sometimes) gives answers, and it also returns the top results with the corresponding file and the paragraph where the answer is likely located.

Let's see if we can make this better with Azure OpenAI.

# Using Azure OpenAI

To use OpenAI to get a better answer to our question, the thought process is simple: let's **give the question and the content of the documents from the search results to the GPT model as context, and let it provide a better response**. This is what RAG (Retrieval Augmented Generation) is about.

Now, before we do this, we need to understand a few things first:

1) Chaining and Prompt Engineering
2) Embeddings

We will use a library called **LangChain** that wraps a lot of boilerplate code.
LangChain does a lot of the prompt engineering for us under the hood; for more information, see [here](https://python.langchain.com/docs/introduction/).

In [12]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

**Important Note**: Starting now, we will utilize Azure OpenAI models. Please ensure that you have deployed the following models within the Azure OpenAI portal:

- text-embedding-3-large
- gpt-4o
- gpt-4o-mini

Reference for Azure OpenAI models (regions, limits, dimensions, etc): [HERE](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)

## A gentle intro to chaining LLMs and prompt engineering

Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step.

Azure OpenAI is a type of LLM (provider) that you can use but there are others like Cohere, Huggingface, etc.

Chains can be simple (i.e. Generic) or specialized (i.e. Utility).

* Generic: A single LLM is the simplest chain. It takes an input prompt and the name of the LLM, and then uses the LLM for text generation (i.e. output for the prompt).

Here’s an example:

In [13]:
COMPLETION_TOKENS = 2000
llm = AzureChatOpenAI(deployment_name=os.environ["GPT4oMINI_DEPLOYMENT_NAME"], 
                      temperature=0, 
                      max_tokens=COMPLETION_TOKENS)

In [14]:
output_parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that gives thorough responses to users."),
    ("user", "{input}. Give your response in {language}")
])

The `|` symbol is similar to a Unix pipe operator: it chains the different components together, feeding the output of one component as the input to the next.
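The toy below mimics that idea in plain Python, so you can see the data flow without calling a model. It is not LangChain's implementation. In LangChain, `|` on Runnables builds a `RunnableSequence`, which also supports batching, streaming, and async. `Step` and the three fake stages are hypothetical stand-ins for `prompt`, `llm`, and `output_parser`.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b returns a new Step that runs a, then feeds its output into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# Three toy stages mirroring prompt | llm | output_parser
fake_prompt = Step(lambda d: f"{d['input']}. Give your response in {d['language']}")
fake_llm = Step(lambda text: {"content": text.upper()})  # pretend model: just upper-cases
fake_parser = Step(lambda msg: msg["content"])           # pull the string out of the "message"

toy_chain = fake_prompt | fake_llm | fake_parser
print(toy_chain.invoke({"input": "hi", "language": "English"}))
```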

In [15]:
chain = prompt | llm | output_parser

In [16]:
%%time
display(Markdown(chain.invoke({"input": QUESTION, "language": "English"})))

Yes, Chandler does experience feelings of jealousy towards Richard in the TV show "Friends." This jealousy primarily arises during the earlier seasons when Monica starts dating Richard, who is significantly older than her. Chandler, being one of Monica's close friends, feels protective of her and is concerned about the age difference and the potential for Richard to hurt her.

In particular, Chandler's jealousy is highlighted in Season 2, Episode 24 ("The One with Barry and Mindy"), when he expresses discomfort with Monica's relationship with Richard. He worries about how serious their relationship is and whether Richard is the right person for her. Chandler's jealousy is more about his concern for Monica's well-being and his protective instincts as a friend rather than a romantic rivalry.

Overall, while Chandler's jealousy is not a central theme, it does surface in moments throughout the series, reflecting his loyalty to Monica and his feelings about her relationship with Richard.

CPU times: user 30.5 ms, sys: 242 μs, total: 30.7 ms
Wall time: 1.93 s


**Note**: this is the first time you use OpenAI in this Accelerator, so if you get a "Resource not found" error, it is most likely because the name of your OpenAI model deployment differs from the environment variable set above, `os.environ["GPT4oMINI_DEPLOYMENT_NAME"]`.

Great! Now you know how to create a simple prompt and use a chain to answer a general question using ChatGPT's knowledge.

It is important to note that we rarely use generic chains as standalone chains. More often, they are used as building blocks for Utility chains (as we will see next). Also note that we are NOT yet using our documents or the results of Azure AI Search, only the knowledge the model acquired from its training data.

**The second type of Chains are Utility:**

* Utility: These are specialized chains, composed of many building blocks, that help solve a specific task. For example, LangChain supports some end-to-end chains (such as `create_retrieval_chain` for document Q&A, as well as chains for summarization, etc.).

In this workshop we will build our own chain, to dig deeper and solve our use case of enhancing the results of Azure AI Search.


But before building the utility chain we need, let's first review the concepts of embeddings, vector search, and RAG.

## Embeddings and Vector Search

From the Azure OpenAI documentation ([HERE](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=python)): an embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information-dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with the semantic similarity between the two inputs in their original format. For example, if two texts are similar, their vector representations should also be similar.
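Cosine similarity is a common way to measure how close two embeddings are. The sketch below computes it in plain Python for made-up 3-dimensional vectors. Real embeddings are far longer (text-embedding-3-large returns 3,072 dimensions by default), and the values here are invented purely to show that related texts score higher.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy "embeddings" for three pieces of text
covid_risk = [0.9, 0.1, 0.0]        # "main risk factors for Covid-19"
covid_symptoms = [0.8, 0.2, 0.1]    # "common Covid-19 symptoms"
friends_dialogue = [0.0, 0.2, 0.9]  # "Chandler talks to Monica"

print(round(cosine_similarity(covid_risk, covid_symptoms), 3))
print(round(cosine_similarity(covid_risk, friends_dialogue), 3))
```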

### Why Do We Need Vectors?

Vectors are essential for several reasons:

- **Semantic Richness**: They convert the semantic meaning of text into mathematical vectors, capturing nuances that simple keyword searches miss. This makes them incredibly powerful for understanding and processing language.
- **Human-like Searching**: Searching using vector distances mimics the human approach to finding information based on context and meaning, rather than relying solely on exact word matches.
- **Efficiency in Scale**: Vector representations allow for efficient handling and searching of large datasets. By reducing complex text to numerical vectors, algorithms can quickly sift through vast amounts of information.

### Understanding LLM Tokens' Context Limitation

Large Language Models (LLMs) like GPT come with a token limit for each input, which poses a challenge when dealing with lengthy documents or extensive data sets. This limitation restricts the model's ability to understand and generate responses based on the full context of the information provided. It becomes crucial, therefore, to devise strategies that can effectively manage and circumvent this limitation to leverage the full power of LLMs.

To address this challenge, the solution incorporates several key steps:

1. **Segmenting Documents**: Breaking down large documents into smaller, manageable segments.
2. **Vectorization of Chunks**: Converting these segments into vectors, making them compatible with vector-based search techniques.
3. **Hybrid Search**: Employing both vector and text search methods to pinpoint the most relevant segments in relation to the query.
4. **Optimal Context Provision**: Presenting the LLM with the most pertinent segments, ensuring a balance between detail and brevity to stay within token limits.
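Here is a minimal, character-based sketch of step 1, splitting text into overlapping windows. In this solution Azure AI Search performs the chunking for you during indexing, so this only illustrates the idea. The window and overlap sizes are arbitrary assumptions, and production splitters often cut on token or sentence boundaries instead. The overlap helps preserve context that straddles a chunk boundary.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```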


Our ultimate goal is to rely solely on vector indexes and hybrid searches (vector + text). While it is possible to manually code parsers with OCR for various file types and develop a scheduler to synchronize data with the index, there is a more efficient alternative: **Azure AI Search has automated chunking strategies and vectorization**.

It's important to note that **document segmentation and vectorization have already been completed in Azure AI Search**, as seen in the `ordered_content` dictionary. This pre-processing step simplifies subsequent operations, ensuring rapid response times and adherence to the token limits of the chosen OpenAI model.


So really, our only job now is to make sure that the results from the Azure AI Search queries fit within the LLM's context window, and then let it do its magic.
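One simple way to respect the context limit is to take the score-ordered chunks and stop adding them once an estimated token budget is reached. The sketch below uses the rough rule of thumb of about 4 characters per token for English text. That ratio is an assumption; for exact counts you would use a real tokenizer (for example, the `tiktoken` package). The function name `fit_to_budget` is hypothetical.

```python
from typing import List


def fit_to_budget(chunks: List[str], max_tokens: int, chars_per_token: float = 4.0) -> List[str]:
    """Keep chunks in order until the estimated token count would exceed max_tokens.

    Assumes `chunks` is already sorted best-first, so truncation drops the weakest ones.
    """
    selected, used = [], 0
    for chunk in chunks:
        cost = int(len(chunk) / chars_per_token) + 1  # rough estimate, not a real tokenizer
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected
```

With this notebook's results, a call might look like `fit_to_budget([v["chunk"] for v in ordered_results.values()], max_tokens=8000)`, where 8000 is an arbitrary example budget.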

In [17]:
index1_name = "srch-index-files"
index2_name = "srch-index-csv"
indexes = [index1_name, index2_name]

In order not to duplicate code, we have moved much of the code used above into functions. These functions live in the `common/utils.py` and `common/prompts.py` files, so we can reuse them in the app that we will build later.

`get_search_results()` performs the multi-index search and returns the combined documents/chunks, ordered by score.
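We don't reproduce `common/utils.py` here, but based on the filtering and sorting cell earlier in this notebook, the merge step plausibly looks like the sketch below. It keeps results above a reranker threshold, tags each with its index, sorts by score across indexes, optionally cuts to `top`, and keys the output by document id. (The `CustomRetriever` later iterates the result with `.items()`, which suggests a dict.) `merge_results` is a hypothetical name, and the real helper may differ, for example in how it builds SAS-token URLs.

```python
from typing import Dict, Optional


def merge_results(responses: Dict[str, dict], reranker_threshold: float = 1.0,
                  top: Optional[int] = None) -> Dict[str, dict]:
    """Merge raw /docs/search responses from several indexes, best reranker score first.

    `responses` maps index name -> parsed JSON body of a search call.
    """
    merged = []
    for index_name, body in responses.items():
        for doc in body.get("value", []):
            score = doc.get("@search.rerankerScore") or 0
            if score > reranker_threshold:
                merged.append({"id": doc["id"], "chunk": doc.get("chunk"),
                               "location": doc.get("location"), "score": score,
                               "index": index_name})
    merged.sort(key=lambda d: d["score"], reverse=True)
    if top is not None:
        merged = merged[:top]
    # Key by document id; dicts preserve insertion order, so the score order is kept
    return {d["id"]: d for d in merged}
```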

In [18]:
k = 50  # play with this parameter and see the quality of the final answer
ordered_results = get_search_results(QUESTION, indexes, k=k, reranker_threshold=1)
print("Number of results:",len(ordered_results))

Number of results: 50


In [19]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

Now let's create a Prompt Template that will ground the response only in the chunks retrieved by our hybrid AI Search.

In [20]:
template = """Answer the question thoroughly, based **ONLY** on the following context:
{context}

Important: Assume you know nothing about the subject; base your answer only on the context above.

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

In [21]:
%%time 
# Creation of our custom chain
chain = prompt | llm | output_parser

try:
    display(Markdown(chain.invoke({"question": QUESTION, "context": ordered_results})))
except Exception as e:
    print(e)

Yes, Chandler Bing does exhibit jealousy towards Richard Burke in the context provided. In one exchange, Chandler expresses his discomfort and jealousy when he imagines Monica being with Richard, particularly when he pictures her in a compromising situation. He makes comments that indicate he is bothered by Richard's lingering feelings for Monica and the fact that Richard keeps a tape of Monica, suggesting that he feels threatened by Richard's past relationship with her. Additionally, Chandler's reactions and comments throughout the dialogues indicate that he is concerned about Richard's presence in Monica's life and how it affects their relationship.

CPU times: user 14.7 ms, sys: 3.91 ms, total: 18.6 ms
Wall time: 4.13 s


# Improving the Prompt and adding citations

We can see above that the answer given by GPT-4o-mini has no citations or references. **How do we know if the answer is grounded in the context or not?**

Let's see if this can be improved with Prompt Engineering.<br>
In `common/prompts.py` we created a prompt called `DOCSEARCH_PROMPT_TEXT`; check it out!
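The chains in this notebook pass the retriever's output straight into `{context}`, so the model sees the Python string form of a list of dicts. If you want citation numbers to be unambiguous, one option is to render the documents as numbered sources first. The hypothetical helper below uses the `source` and `page_content` keys produced by the `CustomRetriever` defined below. In LCEL it could be appended as `itemgetter("question") | retriever | format_context`, because a plain function piped after a Runnable is wrapped automatically. Whether this helps depends on what `DOCSEARCH_PROMPT_TEXT` expects, so treat it as an experiment.

```python
from typing import Dict, List


def format_context(docs: List[Dict]) -> str:
    """Render retrieved docs as numbered sources that the model can cite as [n]."""
    parts = []
    for n, doc in enumerate(docs, start=1):
        parts.append(f"[{n}] source: {doc['source']}\n{doc['page_content']}")
    return "\n\n".join(parts)
```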

**Let's also create a custom Retriever class** so we can plug it in easily within the chain building. 
Note: we can also use the Azure AI Search retriever class [HERE](https://python.langchain.com/docs/integrations/vectorstores/azuresearch), however we want to create a custom Retriever for the following reasons:

1) We want to do multi-index searches in one call
2) Easier to teach complex concepts of LangChain in this notebook
3) We want to use the REST API vs the Python Azure Search SDK

In [22]:
class CustomRetriever(BaseRetriever):
    
    topK : int
    reranker_threshold : int
    indexes: List
    sas_token: str = None
    search_filter: str = None
    
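    # LangChain passes run_manager (callbacks/tracing) to retrievers that declare it
    def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun) -> List[dict]: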
        
        ordered_results = get_search_results(query, self.indexes, k=self.topK, 
                                             reranker_threshold=self.reranker_threshold, 
                                             sas_token=self.sas_token, search_filter=self.search_filter)
        top_docs = []
        for key,value in ordered_results.items():
            location = value["location"] if value["location"] is not None else ""
            document = {"source": location,
                        "score": value["score"],
                        "page_content": value["chunk"]}
            top_docs.append(document)

        return top_docs

In [23]:
# Create the retriever
retriever = CustomRetriever(indexes=indexes, topK=k, reranker_threshold=1, sas_token=os.environ['BLOB_SAS_TOKEN'])

In [24]:
# Test retriever
results = retriever.invoke(QUESTION)
len(results)

50

In [25]:
# We can now create a dynamically configurable llm object that can change the model at runtime
dynamic_llm = AzureChatOpenAI(deployment_name=os.environ["GPT4oMINI_DEPLOYMENT_NAME"], 
                              temperature=0.5, max_tokens=COMPLETION_TOKENS).configurable_alternatives(
    # This gives this field an id
    # When configuring the end runnable, we can then use this id to configure this field
    ConfigurableField(id="model"),
    # This sets a default_key.
    # If we specify this key, the default LLM (initialized above) will be used
    default_key="gpt4omini",
    # This adds a new option, with name `gpt4o`
    gpt4o=AzureChatOpenAI(deployment_name=os.environ["GPT4o_DEPLOYMENT_NAME"], 
                         temperature=0.5, max_tokens=COMPLETION_TOKENS),
    # You can add more configuration options here
)

In [26]:
# Define prompt template
DOCSEARCH_PROMPT = ChatPromptTemplate.from_messages(
    [
        ("system", DOCSEARCH_PROMPT_TEXT + "\n\nCONTEXT:\n{context}\n\n"),
        ("human", "{question}"),
    ]
)

In [27]:
# Declaration of the chain with the dynamic llm and the new prompt
configurable_chain = (
    {
        "context": itemgetter("question") | retriever, # Passes the question to the retriever; the results are assigned to context
        "question": itemgetter("question")
    }
    | DOCSEARCH_PROMPT  # Passes the input variables above to the prompt template
    | dynamic_llm   # Passes the finished prompt to the LLM
    | StrOutputParser()  # Converts the LLM's output message into a plain string
)

In [28]:
%%time

try:
    display(Markdown(configurable_chain.with_config(configurable={"model": "gpt4omini"}).invoke({"question": QUESTION})))
except Exception as e:
    print(e)

Yes, Chandler Bing does express feelings of jealousy towards Richard Burke in various interactions. For instance, in one conversation, Chandler confronts Monica about Richard keeping a tape of her, suggesting that Richard is not over her and implying that he should feel bad for Richard instead of being jealous [[1]](https://blobstorageixqo5iaqmpzwc.blob.core.windows.net/friends/s09/e07/c11.txt?sv=2022-11-02&ss=b&srt=sco&sp=rltfx&se=2026-01-02T09:04:19Z&st=2025-01-02T01:04:19Z&spr=https&sig=q%2FjY9R25rdc%2BIH1iiq1uPIBm82xECsN9d%2B2ftdM1SJI%3D). 

Additionally, when Monica runs into Richard and has lunch with him, she chooses not to tell Chandler because she believes it would freak him out, especially since it is close to their anniversary [[2]](https://blobstorageixqo5iaqmpzwc.blob.core.windows.net/friends/s05/e23/c02.txt?sv=2022-11-02&ss=b&srt=sco&sp=rltfx&se=2026-01-02T09:04:19Z&st=2025-01-02T01:04:19Z&spr=https&sig=q%2FjY9R25rdc%2BIH1iiq1uPIBm82xECsN9d%2B2ftdM1SJI%3D). 

In another instance, Chandler directly expresses his jealousy when he learns that Monica had lunch with Richard, as he feels insecure about his relationship with her and how Richard could potentially still have feelings for her [[3]](https://blobstorageixqo5iaqmpzwc.blob.core.windows.net/friends/s06/e25/c11.txt?sv=2022-11-02&ss=b&srt=sco&sp=rltfx&se=2026-01-02T09:04:19Z&st=2025-01-02T01:04:19Z&spr=https&sig=q%2FjY9R25rdc%2BIH1iiq1uPIBm82xECsN9d%2B2ftdM1SJI%3D). 

Overall, Chandler's jealousy is a recurring theme in his interactions with Monica regarding Richard.

CPU times: user 37.7 ms, sys: 8.83 ms, total: 46.5 ms
Wall time: 8.16 s


As seen above, we were able to improve the quality and breadth of the answer and add citations with prompt engineering alone!
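Citations also enable a cheap, automatic groundedness check: every URL the model cites should belong to a chunk that was actually retrieved. The sketch below extracts `[[n]](url)` citations (the format seen in the answers above) with a regular expression and reports any URL whose path is not among the retrieved sources. It compares URLs without their query strings, on the assumption that SAS tokens may or may not be appended. `ungrounded_citations` is a hypothetical helper.

```python
import re
from typing import List, Set

# Matches markdown citations of the form [[n]](url), capturing the number and the URL
CITATION_RE = re.compile(r"\[\[(\d+)\]\]\((\S+?)\)")


def ungrounded_citations(answer: str, retrieved_sources: Set[str]) -> List[str]:
    """Return cited URLs that do not match any retrieved source (query strings ignored)."""
    allowed = {s.split("?", 1)[0] for s in retrieved_sources}
    bad = []
    for _num, url in CITATION_RE.findall(answer):
        if url.split("?", 1)[0] not in allowed:
            bad.append(url)
    return bad
```

In the notebook, the allowed set could be built as `{doc["source"] for doc in retriever.invoke(QUESTION)}`. An empty result means every citation points at a retrieved chunk, though not necessarily that the cited chunk supports the claim.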

#### Let's try again, but with GPT-4o

In [29]:
%%time
try:
    display(Markdown(configurable_chain.with_config(configurable={"model": "gpt4o"}).invoke({"question": QUESTION})))
except Exception as e:
    print(e)

Yes, Chandler does experience jealousy towards Richard. In one instance, Chandler expresses his jealousy when he discovers that Monica had lunch with Richard. Monica tries to assure Chandler that it was nothing, but Chandler is clearly bothered by the situation, even though he tries to hide it by saying he's not mad [[3]](https://blobstorageixqo5iaqmpzwc.blob.core.windows.net/friends/s05/e23/c03.txt?sv=2022-11-02&ss=b&srt=sco&sp=rltfx&se=2026-01-02T09:04:19Z&st=2025-01-02T01:04:19Z&spr=https&sig=q%2FjY9R25rdc%2BIH1iiq1uPIBm82xECsN9d%2B2ftdM1SJI%3D).

Additionally, Chandler's jealousy is evident when he confronts Richard after Monica goes to see him. Chandler accuses Richard of making Monica "think" about their relationship, which clearly upsets him. Chandler even reveals to Richard that he was planning to propose to Monica, showing how serious he is about their relationship and how threatened he feels by Richard's presence [[11]](https://blobstorageixqo5iaqmpzwc.blob.core.windows.net/friends/s06/e25/c11.txt?sv=2022-11-02&ss=b&srt=sco&sp=rltfx&se=2026-01-02T09:04:19Z&st=2025-01-02T01:04:19Z&spr=https&sig=q%2FjY9R25rdc%2BIH1iiq1uPIBm82xECsN9d%2B2ftdM1SJI%3D).

CPU times: user 31.7 ms, sys: 10.3 ms, total: 42 ms
Wall time: 21 s


**Answers from GPT-4o-mini and GPT-4o can vary every time you run them, and they are correct most of the time.**

However, if you try many times, you will see that GPT-4o provides better answers, is better at following instructions and citing sources, and is less prone to hallucinate.

## Adding Streaming to improve user experience and performance

One way to make the response feel faster is to stream the answer, so the user can see the response as it is being generated. To do this, we simply call the `stream` method instead of `invoke`. More on streaming and callbacks in later notebooks, but for now, this is one simple way to do it:
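If you also need the complete answer after streaming (for logging, or to run a citation check), you can accumulate the chunks while printing them. Below, `fake_stream` is a stand-in generator, so the example runs without Azure. In the notebook you would pass `configurable_chain.with_config(configurable={"model": "gpt4o"}).stream({"question": QUESTION})` instead. Both helper names are hypothetical.

```python
from typing import Iterable, Iterator


def fake_stream(text: str, size: int = 5) -> Iterator[str]:
    """Stand-in for chain.stream(...): yields the answer a few characters at a time."""
    for i in range(0, len(text), size):
        yield text[i:i + size]


def print_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as soon as it arrives, then return the full answer."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the streamed line
    return "".join(parts)


full_answer = print_and_collect(fake_stream("Yes, Chandler is jealous of Richard."))
```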

In [30]:
for chunk in configurable_chain.with_config(configurable={"model": "gpt4o"}).stream({"question": QUESTION}):
    print(chunk, end="", flush=True)

Yes, Chandler does show signs of jealousy towards Richard in several instances. For example, Chandler expresses insecurity about Monica's past with Richard when he mentions that Richard is not over Monica and keeps a tape of her, which makes Chandler feel inadequate and jealous [[1]](https://blobstorageixqo5iaqmpzwc.blob.core.windows.net/friends/s09/e07/c11.txt?sv=2022-11-02&ss=b&srt=sco&sp=rltfx&se=2026-01-02T09:04:19Z&st=2025-01-02T01:04:19Z&spr=https&sig=q%2FjY9R25rdc%2BIH1iiq1uPIBm82xECsN9d%2B2ftdM1SJI%3D). Additionally, Chandler becomes concerned when Monica runs into Richard and has lunch with him, even though she assures Phoebe that she felt nothing for Richard anymore [[2]](https://blobstorageixqo5iaqmpzwc.blob.core.windows.net/friends/s05/e23/c02.txt?sv=2022-11-02&ss=b&srt=sco&sp=rltfx&se=2026-01-02T09:04:19Z&st=2025-01-02T01:04:19Z&spr=https&sig=q%2FjY9R25rdc%2BIH1iiq1uPIBm82xECsN9d%2B2ftdM1SJI%3D). Lastly, Chandler confronts Richard when Monica is contemplating their relatio

## Testing Groundedness

Let’s ask the same question again, but this time for Season 1, where we know Richard doesn’t appear.

In [31]:
search_filter = "substringof('/s01/', location)"
retriever = CustomRetriever(indexes=indexes, topK=k, reranker_threshold=1, 
                            sas_token=os.environ['BLOB_SAS_TOKEN'],
                            search_filter=search_filter)
configurable_chain = (
    {
        "context": itemgetter("question") | retriever, # Passes the question to the retriever; the results are assigned to context
        "question": itemgetter("question")
    }
    | DOCSEARCH_PROMPT  # Passes the input variables above to the prompt template
    | dynamic_llm   # Passes the finished prompt to the LLM
    | StrOutputParser()  # Converts the LLM's output message into a plain string
)

In [32]:
%%time

try:
    display(Markdown(configurable_chain.with_config(configurable={"model": "gpt4o"}).invoke({"question": QUESTION})))
except Exception as e:
    print(e)

Empty Search Response


The tools did not provide relevant information. I cannot answer this from prior knowledge.

CPU times: user 32.2 ms, sys: 4.02 ms, total: 36.2 ms
Wall time: 1.27 s


**Perfect!** Even though the model knows the answer from its training data, it grounds its answer only in the context retrieved from Azure AI Search.

# Summary
##### By using Azure OpenAI, the answers to user questions are way better than just taking the results from Azure AI Search. So the summary is:
- Utilizing Azure AI Search, we conduct a multi-index hybrid search that identifies the top chunks of documents from each index.
- Subsequently, Azure OpenAI utilizes these extracted chunks as context, comprehends the content, and employs it to deliver optimal answers.
- Best of two worlds!

##### Important observations on this notebook:

1) Answers with GPT-4o-mini and GPT-4o are both correct, but GPT-4o seems to have more breadth and depth in its answers.
2) Both models provide good and diverse citations in the right format.
3) Streaming the answers improves the user experience big time!
4) We achieved a good level of groundedness using prompt engineering.

# NEXT
In the next notebook, we are going to see how we can treat complex and large documents separately, also using Vector Search.