# Queries with and without Azure OpenAI

So far, your search engine has been loaded **from two different data sources into two different indexes**. In this notebook we will try some example queries and then use the Azure OpenAI service to see whether we can get a good answer to the user's query.

The idea is that a user can ask a question about Computer Science (first data source/index) or about Covid (second data source/index), and the engine will respond accordingly.
This **Multi-Index** demo mimics the scenario where a company loads many types of documents about completely different topics, and the search engine must respond with the most relevant results.

## Set up variables

In [1]:
import os
import urllib
import requests
import random
import json
from collections import OrderedDict
from IPython.display import display, HTML, Markdown
from typing import List
from operator import itemgetter

# LangChain Imports needed
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.retrievers import BaseRetriever
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.runnables import ConfigurableField


# Our own libraries needed
from common.prompts import DOCSEARCH_PROMPT
from common.utils import get_search_results

from dotenv import load_dotenv
load_dotenv("credentials.env")

True

In [2]:
# Setup the Payloads header
headers = {'Content-Type': 'application/json','api-key': os.environ['AZURE_SEARCH_KEY']}
params = {'api-version': os.environ['AZURE_SEARCH_API_VERSION']}

## Multi-Index Search queries

In [3]:
# Text-based Indexes that we are going to query (from Notebook 01 and 02)
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
indexes = [index2_name, index1_name]

Try questions that you think might be answered or addressed by computer science papers from 2020-2021 or by medical publications about COVID from 2020-2021. Try comparing the results with the open version of ChatGPT.<br>
The idea is that answers generated with Azure OpenAI only use the information contained in these publications.

**Example Questions you can ask**:
- What is CLP?
- How do Markov chains work?
- What are some examples of reinforcement learning?
- What are the main risk factors for Covid-19?
- What medicine reduces inflammation in the lungs?
- Why doesn't Covid affect kids as much as adults?
- Does chloroquine really work against Covid?
- Who won the 1994 soccer World Cup? (This question should yield no answer if the system is correctly grounded.)

In [4]:
QUESTION = "What medicine reduces inflamation in the lungs?"

### Search on both indexes individually and aggregate results

#### **Note**: 
In order to standardize the indexes, **there must be 6 mandatory fields present in each index**: `id, title, name, location, chunk, chunkVector`. This lets every document be treated the same way throughout the code. Also, **all indexes must have a semantic configuration** (the queries below reference one named `my-semantic-config`).
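If you bring your own index, a quick pre-flight check can catch a missing field before the queries fail. The sketch below is a hypothetical helper that inspects an index definition shaped like the JSON returned by the REST `GET /indexes/{name}` call; it assumes a recent API version, where semantic settings sit under `semantic.configurations`, and the example definition is hand-written, not fetched from a service.

```python
from typing import Dict, List

# The six fields this notebook expects in every index
MANDATORY_FIELDS = ["id", "title", "name", "location", "chunk", "chunkVector"]


def check_index_definition(index_def: Dict) -> List[str]:
    """Return a list of problems; an empty list means the index looks compatible."""
    problems = []
    field_names = {field["name"] for field in index_def.get("fields", [])}
    for required in MANDATORY_FIELDS:
        if required not in field_names:
            problems.append(f"missing field: {required}")
    # Assumes a recent REST API version, where semantic settings sit under "semantic"
    semantic = index_def.get("semantic") or {}
    if not semantic.get("configurations"):
        problems.append("no semantic configuration")
    return problems


# Hand-written example definition (not fetched from a real service)
example = {
    "name": "cogsrch-index-files",
    "fields": [{"name": name} for name in MANDATORY_FIELDS],
    "semantic": {"configurations": [{"name": "my-semantic-config"}]},
}
print(check_index_definition(example))                      # → []
print(check_index_definition({"fields": [{"name": "id"}]}))  # five missing fields + no semantic config
```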

We are going to use Hybrid Queries: Text + Vector Search combined for optimal results!
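In a hybrid query, Azure AI Search runs the text query and the vector query separately and merges the two ranked lists with Reciprocal Rank Fusion (RRF); with `queryType: semantic`, the semantic ranker then re-scores the top fused results. The sketch below shows only the RRF idea in plain Python. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here, and the document ids are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document ids into one list of (id, fused_score).

    Each document earns 1 / (k + rank) from every list it appears in
    (ranks start at 1), and those contributions are summed.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


text_results = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword (BM25) ranking
vector_results = ["doc_c", "doc_a", "doc_d"]  # hypothetical vector ranking
fused = reciprocal_rank_fusion([text_results, vector_results])
print([doc_id for doc_id, _ in fused])  # → ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists (`doc_a`, `doc_c`) rise to the top, which is why hybrid search tends to be more robust than either query alone.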

In [5]:
agg_search_results = dict()
k = 10

for index in indexes:
    search_payload = {
        "search": QUESTION, # Text query
        "select": "id, title, name, location, chunk",
        "queryType": "semantic",
        "vectorQueries": [{"text": QUESTION, "fields": "chunkVector", "kind": "text", "k": k}], # Vector query
        "semanticConfiguration": "my-semantic-config",
        "captions": "extractive",
        "answers": "extractive",
        "count":"true",
        "top": k
    }

    r = requests.post(os.environ['AZURE_SEARCH_ENDPOINT'] + "/indexes/" + index + "/docs/search",
                     data=json.dumps(search_payload), headers=headers, params=params)
    print(r.status_code)

    search_results = r.json()
    agg_search_results[index]=search_results
    print("Index:", index, "Results Found: {}, Results Returned: {}".format(search_results['@odata.count'], len(search_results['value'])))

200
Index: cogsrch-index-csv Results Found: 943, Results Returned: 10
200
Index: cogsrch-index-files Results Found: 207, Results Returned: 10


In [6]:
# agg_search_results

### Display the top results (from both searches) based on the score

In [6]:
display(HTML('<h4>Top Answers</h4>'))

for index,search_results in agg_search_results.items():

    for result in search_results['@search.answers']:
        if result['score'] > 0.5: # Show answers that are at least 50% of the max possible score=1
            display(HTML('<h5>' + 'Answer - score: ' + str(round(result['score'],2)) + '</h5>'))
            display(HTML(result['text']))

            
print("\n\n")
display(HTML('<h4>Top Results</h4>'))

content = dict()
ordered_content = OrderedDict()


for index,search_results in agg_search_results.items():
    for result in search_results['value']:
        if result['@search.rerankerScore'] > 1: # Show results that are at least 25% of the max possible score=4
            content[result['id']]={
                                    "title": result['title'],
                                    "chunk": result['chunk'], 
                                    "name": result['name'], 
                                    "location": result['location'] ,
                                    "caption": result['@search.captions'][0]['text'],
                                    "score": result['@search.rerankerScore'],
                                    "index": index
                                    }
    
# After the results have been filtered, sort them by score and add them to an ordered dict
for doc_id in sorted(content, key=lambda x: content[x]["score"], reverse=True):
    ordered_content[doc_id] = content[doc_id]
    url = str(ordered_content[doc_id]['location']) + os.environ['BLOB_SAS_TOKEN']
    title = str(ordered_content[doc_id]['title']) if (ordered_content[doc_id]['title']) else ordered_content[doc_id]['name']
    score = str(round(ordered_content[doc_id]['score'],2))
    display(HTML('<h5><a href="'+ url + '">' + title + '</a> - score: '+ score + '</h5>'))
    display(HTML(ordered_content[doc_id]['caption']))






### Comments on Query results

As seen above, the semantic re-ranking feature of the Azure AI Search service works well. It sometimes returns direct answers, and it also returns the top results with the corresponding file and the passage where the answer is likely located.

Let's see if we can make this better with Azure OpenAI.

# Using Azure OpenAI

To use OpenAI to get a better answer to our question, the thought process is simple: **give the answers and the content of the documents from the search results to the GPT model as context, and let it provide a better response**. This is what RAG (Retrieval-Augmented Generation) is about.

Now, before we do this, we need to understand a few things first:

1) Chaining and Prompt Engineering
2) Embeddings

We will use a library called **LangChain** that wraps a lot of boilerplate code.
LangChain does a lot of the prompt engineering for us under the hood; for more information, see [here](https://python.langchain.com/en/latest/index.html).

In [7]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

**Important Note**: Starting now, we will utilize OpenAI models. Please ensure that you have deployed the following models within the Azure OpenAI portal:

- text-embedding-ada-002 (or newer)
- gpt-35-turbo (1106 or newer)
- gpt-4-turbo (1106 or newer)

Reference for Azure OpenAI models (regions, limits, dimensions, etc): [HERE](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)

## A gentle intro to chaining LLMs and prompt engineering

Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step.

Azure OpenAI is one LLM provider you can use, but there are others, such as Cohere, Hugging Face, etc.

Chains can be simple (i.e. Generic) or specialized (i.e. Utility).

* Generic — A single LLM is the simplest chain. It takes an input prompt and the name of the LLM and then uses the LLM for text generation (i.e. output for the prompt).

Here’s an example:

In [8]:
COMPLETION_TOKENS = 2000
llm = AzureChatOpenAI(deployment_name=os.environ["GPT35_DEPLOYMENT_NAME"], 
                      temperature=0, 
                      max_tokens=COMPLETION_TOKENS)

In [9]:
output_parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that give thorough responses to users."),
    ("user", "{input}. Give your response in {language}")
])

The `|` symbol is similar to the Unix pipe operator: it chains the components together, feeding the output of one component as the input to the next.
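To see why `prompt | llm | output_parser` works, here is a toy re-creation of the idea: an object whose `__or__` method returns a new object that runs the left side first and passes its result to the right side. The `Step` class and the three fake stages are hypothetical; LangChain's real runnables (a `RunnableSequence` is what `|` builds) also handle batching, streaming, async calls, and configuration.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Composing two steps yields a new step that feeds self's output into other
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Hypothetical stages mirroring prompt | llm | output_parser
fake_prompt = Step(lambda inputs: f"{inputs['input']}. Give your response in {inputs['language']}")
fake_llm = Step(lambda prompt_text: {"content": prompt_text.upper()})  # pretends to be a model
fake_parser = Step(lambda message: message["content"])                 # extracts the string

toy_chain = fake_prompt | fake_llm | fake_parser
print(toy_chain.invoke({"input": "hello", "language": "English"}))
# → HELLO. GIVE YOUR RESPONSE IN ENGLISH
```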

In [10]:
chain = prompt | llm | output_parser

In [11]:
%%time
display(Markdown(chain.invoke({"input": QUESTION, "language": "Spanish"})))

El medicamento que reduce la inflamación en los pulmones se llama corticoesteroide inhalado. Este tipo de medicamento se utiliza comúnmente para tratar enfermedades respiratorias crónicas como el asma y la enfermedad pulmonar obstructiva crónica (EPOC). Los corticoesteroides inhalados ayudan a reducir la inflamación en las vías respiratorias y pueden aliviar los síntomas como la dificultad para respirar y la opresión en el pecho. Es importante seguir las indicaciones de su médico y utilizar el medicamento según lo prescrito para obtener los mejores resultados.

CPU times: total: 31.2 ms
Wall time: 4.99 s


**Note**: this is the first time you use OpenAI in this Accelerator, so if you get a "Resource not found" error, it is most likely because the name of your OpenAI model deployment differs from the environment variable set above, `os.environ["GPT35_DEPLOYMENT_NAME"]`.

Great! Now you know how to create a simple prompt and use a chain to answer a general question using ChatGPT's knowledge.

It is important to note that we rarely use generic chains as standalone chains. More often they are used as building blocks for Utility chains (as we will see next). Also note that we are NOT using our documents or the Azure AI Search results yet, just the knowledge ChatGPT acquired from the data it was trained on.

**The second type of Chains are Utility:**

* Utility — These are specialized chains, comprised of many building blocks to help solve a specific task. For example, LangChain supports some end-to-end chains (such as `create_retrieval_chain` for QnA Doc retrieval, Summarization, etc).

In this workshop we will build our own chain, both to dig deeper and to solve our use case of enhancing the results of Azure AI Search.


But before dealing with the utility chain needed, let's first review the concept of Embeddings and Vector Search and RAG. 

## Embeddings and Vector Search

From the Azure OpenAI documentation ([HERE](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=python)): an embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information-dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with the semantic similarity between the two inputs in their original format. For example, if two texts are similar, their vector representations should also be similar.
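To make "distance correlates with similarity" concrete, here is cosine similarity, a common metric for comparing embeddings, computed on made-up 3-dimensional vectors. Real `text-embedding-ada-002` embeddings have 1,536 dimensions; the numbers below were not produced by any model and only illustrate the arithmetic.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings" for three short texts
covid_treatment = [0.9, 0.1, 0.0]
lung_inflammation = [0.8, 0.2, 0.1]
markov_chains = [0.0, 0.1, 0.95]

print(round(cosine_similarity(covid_treatment, lung_inflammation), 3))  # close to 1: related topics
print(round(cosine_similarity(covid_treatment, markov_chains), 3))      # close to 0: unrelated topics
```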

### Why Do We Need Vectors?

Vectors are essential for several reasons:

- **Semantic Richness**: They convert the semantic meaning of text into mathematical vectors, capturing nuances that simple keyword searches miss. This makes them incredibly powerful for understanding and processing language.
- **Human-like Searching**: Searching using vector distances mimics the human approach to finding information based on context and meaning, rather than relying solely on exact word matches.
- **Efficiency in Scale**: Vector representations allow for efficient handling and searching of large datasets. By reducing complex text to numerical vectors, algorithms can quickly sift through vast amounts of information.

### Understanding LLM Tokens' Context Limitation

Large Language Models (LLMs) like GPT come with a token limit for each input, which poses a challenge when dealing with lengthy documents or extensive data sets. This limitation restricts the model's ability to understand and generate responses based on the full context of the information provided. It becomes crucial, therefore, to devise strategies that can effectively manage and circumvent this limitation to leverage the full power of LLMs.

To address this challenge, the solution incorporates several key steps:

1. **Segmenting Documents**: Breaking down large documents into smaller, manageable segments.
2. **Vectorization of Chunks**: Converting these segments into vectors, making them compatible with vector-based search techniques.
3. **Hybrid Search**: Employing both vector and text search methods to pinpoint the most relevant segments in relation to the query.
4. **Optimal Context Provision**: Presenting the LLM with the most pertinent segments, ensuring a balance between detail and brevity to stay within token limits.
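Step 1 can be sketched as a fixed-size splitter with overlap, so that text near a chunk boundary appears in both neighboring chunks. This is a hypothetical, character-based illustration; Azure AI Search performs its own chunking inside the indexer (for example with the Text Split skill), with its own settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into windows of at most chunk_size characters, overlapping by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # → ['abcd', 'defg', 'ghij']
```

Each chunk would then be embedded (step 2) and stored alongside its text, which is what the `chunk` and `chunkVector` fields hold in our indexes.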


Our ultimate goal is to rely solely on vector indexes and hybrid searches (vector + text). While it is possible to manually code parsers with OCR for various file types and develop a scheduler to synchronize data with the index, there is a more efficient alternative: **Azure AI Search has automated chunking strategies and vectorization**.

It's important to note that **document segmentation and vectorization have already been completed in Azure AI Search**, as seen in the `ordered_content` dictionary. This pre-processing step simplifies subsequent operations, ensuring rapid response times and adherence to the token limits of the chosen OpenAI model.


So really, our only job now is to make sure that the results from the Azure AI Search queries fit within the LLM's context window, and then let it do its magic.
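One way to make sure the retrieved chunks fit is to give them a token budget (roughly the model's context window minus `COMPLETION_TOKENS` and the prompt text) and greedily keep the highest-scoring chunks that fit. `pack_context` and `approx_tokens` are hypothetical helpers; the sketch assumes chunk dicts with `chunk` and `score` keys like `ordered_content` above, and it uses the rough "about 4 characters per English token" heuristic instead of a real tokenizer such as `tiktoken`.

```python
from typing import Dict, List


def approx_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); use a real tokenizer for accurate counts
    return max(1, len(text) // 4)


def pack_context(chunks: List[Dict], max_context_tokens: int) -> List[Dict]:
    """Greedily keep the highest-scoring chunks whose combined size fits the budget."""
    selected, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        cost = approx_tokens(chunk["chunk"])
        if used + cost > max_context_tokens:
            continue  # skip a chunk that would overflow; a smaller one may still fit
        selected.append(chunk)
        used += cost
    return selected


# Made-up chunks of ~100, ~500 and ~200 tokens
chunks = [
    {"chunk": "a" * 400, "score": 3.1},
    {"chunk": "b" * 2000, "score": 2.5},
    {"chunk": "c" * 800, "score": 1.7},
]
print([c["score"] for c in pack_context(chunks, max_context_tokens=400)])  # → [3.1, 1.7]
```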

In [12]:
index_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
indexes = [index1_name, index2_name]

To avoid duplicating code, we have moved much of the code used above into functions, located in the `common/utils.py` and `common/prompts.py` files. This way we can reuse these functions in the app that we will build later.

`get_search_results()` performs the multi-index search and returns the combined, ordered list of documents/chunks.
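The body of `get_search_results()` is not shown here, but the merge step from the earlier cells can be packaged as a function. The sketch below is a hypothetical helper (not the code in `common/utils.py`) that takes raw per-index REST responses, keeps results above a reranker threshold, and returns the overall top `k` by score across all indexes. The sample responses are made up.

```python
from collections import OrderedDict
from typing import Dict


def merge_index_results(agg_search_results: Dict[str, Dict], reranker_threshold: float = 1,
                        k: int = 10) -> "OrderedDict[str, Dict]":
    """Hypothetical helper: merge per-index search responses into one score-ordered dict."""
    content = {}
    for index, search_results in agg_search_results.items():
        for result in search_results.get("value", []):
            score = result.get("@search.rerankerScore") or 0
            if score > reranker_threshold:  # drop weak semantic matches
                content[result["id"]] = {
                    "title": result.get("title"),
                    "chunk": result.get("chunk"),
                    "location": result.get("location"),
                    "score": score,
                    "index": index,
                }
    # Sort across ALL indexes by reranker score, then keep the overall top k
    ranked = sorted(content.items(), key=lambda item: item[1]["score"], reverse=True)
    return OrderedDict(ranked[:k])


# Made-up responses in the shape of the REST API's "value" list
fake_responses = {
    "cogsrch-index-csv": {"value": [
        {"id": "c1", "title": "Covid paper", "chunk": "chunk text", "location": "https://example.org/c1",
         "@search.rerankerScore": 2.9},
        {"id": "c2", "title": "Weak match", "chunk": "chunk text", "location": "https://example.org/c2",
         "@search.rerankerScore": 0.4},
    ]},
    "cogsrch-index-files": {"value": [
        {"id": "f1", "title": "CS paper", "chunk": "chunk text", "location": "https://example.org/f1",
         "@search.rerankerScore": 3.2},
    ]},
}
merged = merge_index_results(fake_responses, reranker_threshold=1, k=10)
print(list(merged))  # → ['f1', 'c1']
```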

In [14]:
k = 10  # play with this parameter and see the quality of the final answer
ordered_results = get_search_results(QUESTION, indexes, k=k, reranker_threshold=1)
print("Number of results:",len(ordered_results))

Number of results: 10


In [15]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

Now let's create a Prompt Template that grounds the response only in the chunks retrieved by our hybrid AI Search.

In [15]:
template = """Answer the question thoroughly, based **ONLY** on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

In [16]:
%%time 
# Creation of our custom chain
chain = prompt | llm | output_parser

try:
    display(Markdown(chain.invoke({"question": QUESTION, "context": ordered_results})))
except Exception as e:
    print(e)

Based on the given context, the medicine that reduces inflammation in the lungs is fluticasone propionate and salmeterol. In the study titled "Combined fluticasone propionate and salmeterol reduces RSV infection more effectively than either of them alone in allergen-sensitized mice," it is mentioned that the combination of fluticasone propionate and salmeterol was found to be more effective in decreasing airway hyperreactivity and inflammation in the lungs compared to either of them alone. This combination treatment showed significant decreases in the percentage of eosinophils and neutrophils in bronchoalveolar lavage fluid and in lung pathology. Therefore, fluticasone propionate and salmeterol are the medicines that reduce inflammation in the lungs.

CPU times: total: 0 ns
Wall time: 4.48 s


### From GPT-3.5 to GPT-4

Now let's see how the response changes if we switch to GPT-4.


In [17]:
llm_2 = AzureChatOpenAI(deployment_name=os.environ["GPT4_DEPLOYMENT_NAME"], temperature=0.5, max_tokens=COMPLETION_TOKENS)
chain = prompt | llm_2 | output_parser

In [18]:
%%time
try:
    display(Markdown(chain.invoke({"question": QUESTION, "context": ordered_results})))
except Exception as e:
    print(e)

Based on the provided context, several medicines and treatments have been mentioned that reduce inflammation in the lungs:

1. **Heme oxygenase-1 (HO-1) and Carbon Monoxide (CO)**: HO-1, an inducible stress protein, has anti-inflammatory effects that limit tissue damage in response to proinflammatory stimuli. Administration of CO at low concentrations can substitute for HO-1 with respect to anti-inflammatory effects.

2. **Fluticasone Propionate and Salmeterol (FPS)**: This combination is more effective in reducing airway hyperreactivity and inflammation in allergen-sensitized, RSV-infected mice compared to either fluticasone propionate or salmeterol alone.

3. **Human Embryonic Stem Cell-Derived Progenitor Cells**: Specifically, the ACE+ subset of these cells has anti-inflammatory properties that ameliorate sepsis-induced lung inflammation and reduce mortality.

4. **Complementary and Alternative Medicines**: Certain natural agents like Biochanin A (found in red clover) and Chinese herbs such as Angelica sinensis (Dang Gui) and Salvia miltiorrhiza (Danshen) have anti-inflammatory properties and have shown potential in reducing inflammation in the context of influenza.

5. **Natural Herbal Medicine (NHM)**: In a study, NHM showed potential benefits as a supplementary treatment for SARS or SARS-like infectious diseases, reducing the duration before showing improvement in chest radiographic scores.

These treatments and medicines have shown effectiveness in reducing lung inflammation in various contexts and models.

CPU times: total: 15.6 ms
Wall time: 5.15 s


#### As we can see, the model selection MATTERS!

We will dive deeper into this later, but for now, **look at the difference between GPT3.5 and GPT4, in quality and in response time**.

# Improving the Prompt and adding citations

We saw above that the answer given by GPT3.5 was much simpler than GPT4's, even though the prompt asks the model to "Answer the question thoroughly". We also saw that there are no citations or references. **How do we know whether the answer is grounded in the context or not?**

Let's see if Prompt Engineering can improve both issues.<br>
In `common/prompts.py` we created a prompt called `DOCSEARCH_PROMPT`; check it out!
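`DOCSEARCH_PROMPT` itself is not reproduced here, but a common technique for getting citations is to number each retrieved chunk together with its source URL and instruct the model to cite those numbers. The sketch below illustrates that technique in plain Python. `format_numbered_context`, `CITATION_INSTRUCTIONS`, and the sample documents are hypothetical, with `source`/`content` standing in for a `Document`'s `metadata["source"]` and `page_content`.

```python
from typing import Dict, List

# Hypothetical instruction text, NOT the wording of DOCSEARCH_PROMPT
CITATION_INSTRUCTIONS = (
    "Answer ONLY from the sources below. After each fact, cite its source as "
    "[[number]](url). If the sources do not contain the answer, say you don't know."
)


def format_numbered_context(docs: List[Dict[str, str]]) -> str:
    """Render retrieved chunks as numbered sources that the model can cite."""
    blocks = []
    for number, doc in enumerate(docs, start=1):
        blocks.append(f"Source [{number}]: {doc['source']}\nContent: {doc['content']}")
    return "\n\n".join(blocks)


# Made-up retrieved chunks
docs = [
    {"source": "https://example.org/paper-1", "content": "Text of the first retrieved chunk."},
    {"source": "https://example.org/paper-2", "content": "Text of the second retrieved chunk."},
]
question = "What medicine reduces inflammation in the lungs?"
prompt_text = f"{CITATION_INSTRUCTIONS}\n\n{format_numbered_context(docs)}\n\nQuestion: {question}"
print(prompt_text)
```

Numbering the sources gives the model a short, unambiguous handle to cite, and the URL next to each number lets it produce links like the `[[1]](...)` citations seen in the answers below.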

**Let's also create a custom Retriever class** so we can easily plug it into the chain.
Note: we can also use the Azure AI Search retriever class [HERE](https://python.langchain.com/docs/integrations/vectorstores/azuresearch), however we want to create a custom Retriever for the following reasons:

1) We want to do multi-index searches in one call
2) Easier to teach complex concepts of LangChain in this notebook
3) We want to use the REST API vs the Python Azure Search SDK

In [19]:
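from typing import List, Optional

class CustomRetriever(BaseRetriever):
    """Retriever that runs a multi-index hybrid search through get_search_results()."""

    topK: int
    reranker_threshold: float
    indexes: List[str]
    sas_token: Optional[str] = None

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Multi-index search, filtered by reranker score and ordered by score
        ordered_results = get_search_results(query, self.indexes, k=self.topK,
                                             reranker_threshold=self.reranker_threshold,
                                             sas_token=self.sas_token)
        top_docs = []
        for value in ordered_results.values():
            # Some documents have no location; fall back to an empty source
            location = value["location"] if value["location"] is not None else ""
            top_docs.append(Document(page_content=value["chunk"],
                                     metadata={"source": location, "score": value["score"]}))

        return top_docs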

In [20]:
# Create the retriever
retriever = CustomRetriever(indexes=indexes, topK=k, reranker_threshold=1, sas_token=os.environ['BLOB_SAS_TOKEN'])

In [21]:
# Test retriever
results = retriever.invoke(QUESTION)
len(results)

10

In [22]:
# We can now create a dynamically configurable llm object that can change the model at runtime
dynamic_llm = AzureChatOpenAI(deployment_name=os.environ["GPT35_DEPLOYMENT_NAME"], 
                              temperature=0.5, max_tokens=COMPLETION_TOKENS).configurable_alternatives(
    # This gives this field an id
    # When configuring the end runnable, we can then use this id to configure this field
    ConfigurableField(id="model"),
    # This sets a default_key.
    # If we specify this key, the default LLM (initialized above) will be used
    default_key="gpt35",
    # This adds a new option, with name `gpt4`
    gpt4=AzureChatOpenAI(deployment_name=os.environ["GPT4_DEPLOYMENT_NAME"], 
                         temperature=0.5, max_tokens=COMPLETION_TOKENS),
    # You can add more configuration options here
)
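Conceptually, `configurable_alternatives` keeps several implementations behind one id (`model`) and picks one per call from the runtime config, falling back to `default_key`. The toy class below mimics only that routing with plain callables; `ModelSwitch` is hypothetical and ignores everything else LangChain's implementation does.

```python
from typing import Callable, Dict, Optional


class ModelSwitch:
    """Toy runtime model selection: pick an implementation by key, with a default."""

    def __init__(self, default_key: str, **alternatives: Callable[[str], str]):
        if default_key not in alternatives:
            raise ValueError(f"default_key {default_key!r} is not among the alternatives")
        self.default_key = default_key
        self.alternatives = alternatives

    def invoke(self, prompt: str, config: Optional[Dict[str, str]] = None) -> str:
        # Read the "model" key from the config, like configurable={"model": "gpt4"}
        key = (config or {}).get("model", self.default_key)
        return self.alternatives[key](prompt)


switch = ModelSwitch(
    default_key="gpt35",
    gpt35=lambda p: f"[gpt35] {p}",  # stand-ins for real model calls
    gpt4=lambda p: f"[gpt4] {p}",
)
print(switch.invoke("hi"))                            # → [gpt35] hi
print(switch.invoke("hi", config={"model": "gpt4"}))  # → [gpt4] hi
```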

In [23]:
# Declaration of the chain with the dynamic llm and the new prompt
configurable_chain = (
    {
        "context": itemgetter("question") | retriever, # Passes the question to the retriever and the results are assign to context
        "question": itemgetter("question")
    }
    | DOCSEARCH_PROMPT  # Passes the input variables above to the prompt template
    | dynamic_llm   # Passes the finished prompt to the LLM
    | StrOutputParser()  # Converts the LLM's message output to a plain string
)

In [24]:
%%time

try:
    display(Markdown(configurable_chain.with_config(configurable={"model": "gpt35"}).invoke({"question": QUESTION})))
except Exception as e:
    print(e)

There are several medicines that have been shown to reduce inflammation in the lungs. One such medicine is fluticasone propionate (FP), which is a corticosteroid. In a study conducted on mice with influenza-induced lung inflammation, treatment with FP alone resulted in a decrease in eosinophils (a type of white blood cell associated with inflammation) in the lungs [[7]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892358/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D). Another medicine that has anti-inflammatory properties is gemfibrozil, which is a synthetic ligand of peroxisome proliferator-activated receptors (PPAR) alpha. Gemfibrozil has been shown to significantly reduce mortality associated with influenza infections in mice [[7]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892358/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

In addition to these medicines, natural herbal medicines (NHM) have also been studied for their potential anti-inflammatory effects on lung inflammation. Angelica sinensis (Dang Gui) and Salvia miltiorrhiza (Danshen), two Chinese herbs, have been shown to protect mice during lethal experimental sepsis by inhibiting the production of the inflammatory cytokine High Mobility Group Box 1 protein (HMGB1) [[6]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2822957/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

It is important to note that the effectiveness of these medicines may vary depending on the specific condition and individual response. It is always recommended to consult with a healthcare professional for appropriate diagnosis and treatment options for lung inflammation.

CPU times: total: 0 ns
Wall time: 16.6 s


As seen above, we were able to improve the quality and breadth of the answer and add citations with prompt engineering alone!

#### Let's try again, but with GPT-4

In [25]:
%%time
try:
    display(Markdown(configurable_chain.with_config(configurable={"model": "gpt4"}).invoke({"question": QUESTION})))
except Exception as e:
    print(e)

Several medicines and treatments have been shown to reduce inflammation in the lungs:

1. **Heme Oxygenase-1 (HO-1):** HO-1 is an inducible stress protein that provides cytoprotection against oxidative stress and has anti-inflammatory effects. It limits tissue damage in response to proinflammatory stimuli and prevents allograft rejection after transplantation. The anti-inflammatory and anti-apoptotic effects of HO-1 are mediated by its enzymatic reaction products, including carbon monoxide (CO) [[1]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC193681/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

2. **Fluticasone Propionate (FP) and Salmeterol (Sal):** These are used either alone or in combination to treat airway inflammation. The combination of FP and Sal (FPS) is more effective in reducing airway hyperreactivity and inflammation compared to either drug alone. FPS treatment decreases the percentage of eosinophils and neutrophils in bronchoalveolar lavage fluid and reduces RSV titers in the lungs [[2]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1488829/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

3. **Human Embryonic Stem Cells:** Differentiated under mesoderm-inducing conditions, these cells have shown therapeutic properties in reducing lung inflammation and edema in sepsis-induced lung injury in mice. They also reduce the production of proinflammatory cytokines and increase the production of anti-inflammatory cytokines like interleukin-10 [[3]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3069906/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

4. **Natural Herbal Medicine (NHM):** Certain herbal medicines have been used for thousands of years to control infectious diseases and have shown potential benefits in reducing lung inflammation. For instance, a study indicated that NHM could be beneficial as a supplementary treatment for SARS or SARS-like infectious diseases [[4]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2529389/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

These treatments highlight the diverse approaches available for reducing lung inflammation, from pharmaceutical agents to stem cell therapy and natural herbal medicines.

CPU times: total: 31.2 ms
Wall time: 14.7 s


#### As you can see, the answer from GPT4 is richer and includes all the relevant chunks. GPT3.5 tends to focus on the first and last chunks only

## Adding Streaming to improve user experience and performance

It is obvious by now that **GPT4 answers are of better quality than GPT3.5's**. None are incorrect, but GPT4 is better at understanding the context, following the prompt instructions, and giving a comprehensive answer.

One way to make GPT4 feel faster is to stream the answer, so the user can see the response as it is being generated. To do this, we simply call the `stream` method instead of `invoke`. More on Streaming and Callbacks in later notebooks, but for now, this is one simple way to do it:
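Streaming does not make the model finish sooner; it lets the UI show the first tokens almost immediately. On the consumer side it is just a loop over an iterator, as in the cell below. This sketch uses a hypothetical `fake_stream` generator in place of `chain.stream()` and also keeps the pieces, so the full answer is still available after the loop.

```python
import time
from typing import Iterator


def fake_stream(answer: str, chunk_size: int = 8, delay_s: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for chain.stream(): yields the answer a few characters at a time."""
    for start in range(0, len(answer), chunk_size):
        time.sleep(delay_s)  # a real model produces tokens over time
        yield answer[start:start + chunk_size]


pieces = []
for chunk in fake_stream("Streaming shows partial text while the model is still generating."):
    print(chunk, end="", flush=True)  # render each piece immediately
    pieces.append(chunk)              # keep the pieces in case the full answer is needed later
full_answer = "".join(pieces)
```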

In [26]:
for chunk in configurable_chain.with_config(configurable={"model": "gpt4"}).stream({"question": QUESTION}):
    print(chunk, end="", flush=True)

Various medicines and treatments have been documented to reduce inflammation in the lungs:

1. **Heme Oxygenase-1 (HO-1)**: This inducible stress protein has anti-inflammatory effects and can limit tissue damage in response to proinflammatory stimuli. HO-1 catalyzes the conversion of heme to its metabolites, and its protective effects are likely due to these reaction products. Remarkably, low concentrations of carbon monoxide (CO) can substitute for HO-1 with respect to anti-inflammatory effects [[1]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC193681/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

2. **Fluticasone Propionate (FP) and Salmeterol (Sal)**: These medications, especially when used in combination (FPS), have been shown to reduce airway hyperreactivity and inflammation in allergen-sensitized, RSV-infected mice. FPS treatment significantly decreases eosinophils a

# Using Other models from the Azure AI Catalog (optional)

Besides OpenAI models, Azure AI Studio provides a wide selection of models from an extensive catalog featuring providers such as Mistral, Cohere, Meta, HuggingFace, Nvidia, Databricks, and Microsoft's own offerings.

Several models are available as serverless, pay-as-you-go endpoints. Examples include Mistral-Large and Cohere Command R+.

To use the following code, visit https://ai.azure.com and deploy [Mistral-Large](https://ai.azure.com/explore/models/Mistral-large/version/1/registry/azureml-mistral) and [Cohere-Command-R-plus](https://ai.azure.com/explore/models/Cohere-command-r-plus/version/3/registry/azureml-cohere) on a pay-as-you-go basis, then copy the endpoint URLs and keys into the cells below.

In [28]:
from langchain_mistralai import ChatMistralAI
from langchain_cohere.chat_models import ChatCohere

In [29]:
os.environ["MISTRAL_API_ENDPOINT"] = "https://<YOUR_MISTRAL_DEPLOYMENT_NAME>.<YOUR_REGION>.inference.ai.azure.com"
os.environ["MISTRAL_API_KEY"] = "ENTER HERE YOUR MISTRAL ENDPOINT API KEY"
os.environ["COHERE_API_ENDPOINT"] = "https://<YOUR_COHERE_DEPLOYMENT_NAME>.<YOUR_REGION>..inference.ai.azure.com"
os.environ["COHERE_API_KEY"] = "ENTER HERE YOUR COHERE ENDPOINT API KEY"

In [32]:
llm_mistral = ChatMistralAI(endpoint=os.environ["MISTRAL_API_ENDPOINT"],
                            api_key=os.environ["MISTRAL_API_KEY"],
                            temperature=0, max_tokens=COMPLETION_TOKENS,
                            streaming=True)

llm_cohere = ChatCohere(
    base_url=os.environ["COHERE_API_ENDPOINT"],
    cohere_api_key=os.environ["COHERE_API_KEY"],
    streaming=True,
    max_tokens=COMPLETION_TOKENS,
    temperature=0
)

Let's call the `invoke` method and see if they work.

In [35]:
llm_mistral.invoke("who are you?").content

'I am a language model trained by the Mistral AI team. I am designed to understand and generate human-like text based on the input I receive. I can assist with a wide range of tasks, from answering questions and providing information to helping with brainstorming and creativity.'

In [None]:
llm_cohere.invoke("who are you?").content

Great! Now let's build the same RAG chain that we used before with the Azure OpenAI models.

In [38]:
mistral_chain = (
    {
        "context": itemgetter("question") | retriever, # Passes the question to the retriever and the results are assign to context
        "question": itemgetter("question")
    }
    | DOCSEARCH_PROMPT  # Passes the input variables above to the prompt template
    | llm_mistral   # Passes the finished prompt to the LLM
    | StrOutputParser()  # Converts the LLM's message output to a plain string
)

cohere_chain = (
    {
        "context": itemgetter("question") | retriever, # Passes the question to the retriever and the results are assign to context
        "question": itemgetter("question")
    }
    | DOCSEARCH_PROMPT  # Passes the input variables above to the prompt template
    | llm_cohere   # Passes the finished prompt to the LLM
    | StrOutputParser()  # Converts the model's chat message output into a plain string
)
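`mistral_chain` and `cohere_chain` are identical except for the LLM in the third step. The sketch below is a toy, stdlib-only model of how this `|` composition flows. The dict fans the same input out to each key, and each later step receives the previous step's output. `Step`, `as_step`, and `build_rag_chain` are hypothetical names used for illustration. This is not LangChain's actual `Runnable` implementation, and the retriever, prompt, LLM, and parser below are fakes.

```python
from operator import itemgetter
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: Any) -> "Step":
        other = as_step(other)
        # Run self first, then feed its output into `other`
        return Step(lambda value: other.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Step":
        # Lets a plain dict or callable sit on the left of `|`, as in the chains above
        return as_step(other) | self


def as_step(obj: Any) -> Step:
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # Fan-out: every branch receives the same input; results keep their keys
        branches = {key: as_step(val) for key, val in obj.items()}
        return Step(lambda value: {k: b.invoke(value) for k, b in branches.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot coerce {obj!r} into a Step")


# Hypothetical factory: the only thing that changes between chains is the llm argument
def build_rag_chain(retriever, prompt, llm, parser):
    return (
        {"context": itemgetter("question") | retriever,
         "question": itemgetter("question")}
        | prompt
        | llm
        | parser
    )


# Fake components so the sketch runs without any service
fake_retriever = Step(lambda q: [f"doc about {q}"])
fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda text: {"content": f"Answer based on -> {text.splitlines()[0]}"})
fake_parser = Step(lambda msg: msg["content"])

chain = build_rag_chain(fake_retriever, fake_prompt, fake_llm, fake_parser)
print(chain.invoke({"question": "lung inflammation"}))
```

The body of `build_rag_chain` is the same expression used in the cells above, so the same factory shape should also work with the real objects, for example `build_rag_chain(retriever, DOCSEARCH_PROMPT, llm_mistral, StrOutputParser())`. This avoids copy-pasting the chain once per model.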

Since both models also support streaming, we can use LangChain's `stream` method as well.

In [45]:
for chunk in mistral_chain.stream({"question": QUESTION}):
    print(chunk, end="", flush=True)

Several documents suggest different medications that can reduce inflammation in the lungs.

1. Chloroquine has been found to inhibit the lipopolysaccharide-induced release of pro-inflammatory cytokines such as TNF-α, IL-6, CCL2, and CCL3 in human lung parenchymal explants [[1]](https://doi.org/10.1093/cid/ciaa546;%20https://www.ncbi.nlm.nih.gov/pubmed/32382733/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2025-05-07T22:20:04Z&st=2024-05-07T14:20:04Z&spr=https&sig=iy2PycZwWefYLWwqHjIzwkA6TjEBJTKr65Cd9yuv4HA%3D).

2. Interleukin-22 (IL-22) is a promising cytokine that protects the lung during infection. A pro-IL-22 environment reduces pulmonary inflammation during H1N1 infection and protects the lung by promoting tight junction formation. Administering recombinant IL-22 in vivo reduces inflammation and fluid leak into the lung [[2]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6917921/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2025-05-07T22:20:04Z&st=2024-05-07T14:20:04Z&spr=https&sig=iy2PycZwWefYLWwqH

In [46]:
for chunk in cohere_chain.stream({"question": QUESTION}):
    print(chunk, end="", flush=True)

Several medications have been mentioned that can help reduce inflammation in the lungs:

- Chloroquine: Inhibits the release of inflammatory cytokines such as TNF-α, IL-6, CCL2, and CCL3. [[1]](https://doi.org/10.1093/cid/ciaa546; https://www.ncbi.nlm.nih.gov/pubmed/32382733/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2025-05-07T22:20:04Z&st=2024-05-07T14:20:04Z&spr=https&sig=iy2PycZwWefYLWwqHjIzwkA6TjEBJTKr65Cd9yuv4HA%3D)
- Interleukin-22 (IL-22): A cytokine that protects the lungs during infection and can reduce pulmonary inflammation. [[2]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6917921/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2025-05-07T22:20:04Z&st=2024-05-07T14:20:04Z&spr=https&sig=iy2PycZwWefYLWwqHjIzwkA6TjEBJTKr65Cd9yuv4HA%3D)
- Low doses of radiation therapy (RT): Increases IL-10 secretion, decreases IFNγ and IL-6 production, thereby reducing lung inflammation. [[3]](https://doi.org/10.1101/2020.05.11.077651?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2025-05-07T22:20:04Z&st=2024-05-07T14:
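The streaming loops above print each chunk as soon as it arrives, so the reader sees text well before the full answer is complete. To compare models on speed, one option is to record the time to the first chunk separately from the total time. `consume_stream` and `fake_stream` below are hypothetical helpers. `fake_stream` simulates a model with a fixed per-chunk delay so that the sketch runs without any endpoint.

```python
import time
from typing import Iterable, Iterator, Tuple


def consume_stream(chunks: Iterable[str], echo: bool = True) -> Tuple[str, float, float]:
    """Print chunks as they arrive; return (full_text, time_to_first_chunk, total_time)."""
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start  # latency the user actually perceives
        parts.append(chunk)
        if echo:
            print(chunk, end="", flush=True)
    total = time.perf_counter() - start
    # For an empty stream, fall back to the total time
    return "".join(parts), (first if first is not None else total), total


def fake_stream(text: str, delay: float) -> Iterator[str]:
    """Simulated model output (hypothetical): one word per chunk after a fixed delay."""
    for i, word in enumerate(text.split(" ")):
        time.sleep(delay)
        yield word if i == 0 else " " + word


text, ttfc, total = consume_stream(fake_stream("Streaming shows the first words early", delay=0.02))
print(f"\nfirst chunk after {ttfc:.2f}s, full answer after {total:.2f}s")
```

With the real chains, the same helper could wrap the `stream` call, for example `consume_stream(mistral_chain.stream({"question": QUESTION}))`. This assumes that the chain yields strings, as it does when it ends with `StrOutputParser`.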

In these examples, the two models, Mistral Large and Command R+, captured the context more effectively than GPT-3.5 and produced better responses. In terms of speed and breadth of response, they were comparable to GPT-4.

As we will see in the next notebooks, building a multi-agent bot architecture takes more than context understanding (it also needs capabilities such as parallel function calling). For now, it is useful to compare the available options in terms of:
- Context understanding
- Speed of response
- Quality of response
- Price
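For the **Price** criterion, a per-request estimate is the prompt tokens times the input price plus the completion tokens times the output price, with each price quoted per 1,000 tokens. In a RAG setup the prompt carries the retrieved chunks, so prompt tokens typically outnumber completion tokens and the input price weighs heavily. The prices below are placeholders, not actual Azure rates. `ModelPricing` and `request_cost` are hypothetical names, and the token counts are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    name: str
    input_per_1k: float   # price per 1,000 prompt tokens (placeholder value)
    output_per_1k: float  # price per 1,000 completion tokens (placeholder value)


def request_cost(pricing: ModelPricing, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one call = prompt share + completion share."""
    return ((prompt_tokens / 1000) * pricing.input_per_1k
            + (completion_tokens / 1000) * pricing.output_per_1k)


# Placeholder prices for two hypothetical models; replace with current published pricing
candidates = [
    ModelPricing("model-a (placeholder)", input_per_1k=0.5, output_per_1k=1.5),
    ModelPricing("model-b (placeholder)", input_per_1k=0.05, output_per_1k=0.15),
]

# Illustrative RAG call: large retrieved context, short answer
for m in candidates:
    print(f"{m.name}: {request_cost(m, prompt_tokens=4000, completion_tokens=500):.4f} per request")
```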

Check out these Model Benchmark charts provided by Azure AI Studio: https://ai.azure.com/explore/benchmarks

# Summary
##### By using Azure OpenAI, the answers to user questions are much better than using just the results from Azure AI Search. In summary:
- Utilizing Azure AI Search, we conduct a multi-index hybrid search that identifies the top chunks of documents from each index.
- Azure OpenAI then uses these extracted chunks as context, understands their content, and draws on it to produce the best possible answer.
- Best of two worlds!
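The retrieval half of this summary can be sketched in three steps: keep the top chunks from each index, merge them, and number them as context for the LLM. This is a hypothetical illustration, not the implementation of `get_search_results` from `common.utils`. `merge_top_chunks`, `format_context`, the result field names, and the toy scores are all assumptions. The sketch also assumes that the scores from the two indexes are on a comparable scale, which depends on how the search was scored.

```python
from typing import Dict, List


def merge_top_chunks(results_by_index: Dict[str, List[dict]],
                     k_per_index: int, k_total: int) -> List[dict]:
    """Keep the top-k chunks from each index, then re-rank the union by score."""
    merged = []
    for index_name, results in results_by_index.items():
        top = sorted(results, key=lambda r: r["score"], reverse=True)[:k_per_index]
        # Remember which index each chunk came from
        merged.extend({**r, "index": index_name} for r in top)
    merged.sort(key=lambda r: r["score"], reverse=True)
    return merged[:k_total]


def format_context(chunks: List[dict]) -> str:
    """Number each chunk so that the LLM can refer to it by position."""
    return "\n\n".join(
        f"[{i}] ({c['index']}) {c['chunk']}\nSource: {c['location']}"
        for i, c in enumerate(chunks, start=1)
    )


# Toy results: field names and scores are illustrative, not real search output
results_by_index = {
    "cogsrch-index-csv": [
        {"score": 3.1, "chunk": "toy COVID chunk A", "location": "https://example.org/a"},
        {"score": 1.2, "chunk": "toy COVID chunk B", "location": "https://example.org/b"},
    ],
    "cogsrch-index-files": [
        {"score": 0.8, "chunk": "toy CS chunk C", "location": "https://example.org/c"},
    ],
}

top = merge_top_chunks(results_by_index, k_per_index=1, k_total=2)
print(format_context(top))
```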

##### Important observations on this notebook:

1) Answers with GPT-3.5 are lower quality but fast
2) Answers with GPT-3.5 sometimes fail to provide citations in the right format
3) Answers with GPT-4 are great quality but slower
4) Answers with GPT-4 always provide good and diverse citations in the right format
5) Models like Mistral Large or Cohere Command R+ provide, so far, an experience similar to GPT-4
6) Streaming the answers significantly improves the user experience!
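Observations 2 and 4 concern citation format. The answers above cite sources as `[[n]](url)`, and a small regex check can count well-formed citations, distinct sources, and leftover numeric markers that lack a link. This is a hypothetical helper. It assumes that `[[n]](url)` is the intended format, based on the outputs above and not on `DOCSEARCH_PROMPT` itself. Its regex also cuts off URLs that contain a closing parenthesis.

```python
import re
from typing import Dict, List, Tuple

# Well-formed citation as it appears in the answers above: [[n]](url)
CITATION_RE = re.compile(r"\[\[(\d+)\]\]\(([^)]+)\)")
# Any leftover numeric marker, such as [2] or [[3]] without a link
BARE_MARKER_RE = re.compile(r"\[\d+\]")


def extract_citations(answer: str) -> List[Tuple[int, str]]:
    return [(int(n), url) for n, url in CITATION_RE.findall(answer)]


def citation_report(answer: str) -> Dict[str, int]:
    cites = extract_citations(answer)
    # Remove well-formed citations first, then count whatever numeric markers remain
    leftovers = BARE_MARKER_RE.findall(CITATION_RE.sub("", answer))
    return {
        "well_formed": len(cites),
        "distinct_sources": len({url for _, url in cites}),
        "malformed": len(leftovers),
    }


sample = "Drug A helps [[1]](https://example.org/a). Drug B may help [2]."
print(citation_report(sample))
```

Running `citation_report` over the full text of each model's answers gives a rough, automatable version of observations 2 and 4.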

# NEXT
In the next notebook, we are going to see how we can treat complex and large documents separately, also using Vector Search.