## Extending the capabilities of our model

An LLM is a very capable tool, but only to the extent of the knowledge it has been trained on. After all, you only know what you know, right? But what if you need to ask about something that is not in the training data? Or about something that is not in the training data itself, but is related to it?

There are different ways to solve this problem, depending on the resources you have and the time or money you can spend on it. Here are a few options:

- Fully retrain the model to include the information you need. For an LLM, this is only possible for the handful of companies in the world that can afford thousands of GPUs running for weeks.
- Fine-tune the model with the new information. This requires far fewer resources and can usually be done in minutes or hours, depending on the size of the model. However, because fine-tuning does not fully retrain the model, the new information may not be completely integrated into its answers. Fine-tuning excels at teaching a model a specific context or vocabulary, and is somewhat less effective at injecting new knowledge. You also have to fine-tune and redeploy the model every time you want to add more information.
- Put the new information in a database, retrieve the parts relevant to each query, and add them to the query as context before sending it to the LLM. This technique is called **Retrieval Augmented Generation, or RAG**. Its appeal is that you don't have to retrain or fine-tune the model to benefit from the new knowledge, which you can easily update at any time.
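
To make the RAG idea concrete, here is a minimal, self-contained sketch of the "retrieve, then augment" flow. It is only an illustration: the word-overlap scoring stands in for the vector similarity search a real vector database performs, and every name in it (`score`, `retrieve`, `build_prompt`, `knowledge_base`) is hypothetical rather than part of this workshop's code.

```python
# Toy illustration of the RAG flow: pick the most relevant passages,
# then prepend them to the question as context.
# Assumption: word overlap replaces real embedding-based similarity.

def score(query: str, passage: str) -> int:
    """Count how many distinct query words also appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words)

def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, passages: list[str], top_k: int = 2) -> str:
    """Build the augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, top_k))
    return f"References:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base standing in for the indexed documents.
knowledge_base = [
    "A red traffic signal light means stop.",
    "A green traffic signal light means go.",
    "Parking is not allowed next to a red curb.",
]

question = "What does a red traffic signal light mean?"
prompt = build_prompt(question, knowledge_base, top_k=1)
print(prompt)
```

The model itself is never modified in this flow: updating the knowledge only means changing the contents of `knowledge_base`.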

We have already prepared a Vector Database using [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search), where we have stored the content of the [California Driver's Handbook](https://www.dmv.ca.gov/portal/handbook/california-driver-handbook/).

In this Notebook, we are going to use RAG to **make some queries about a Claim** and see how this new knowledge can help without having to modify our LLM.

### Library imports

First, we will import the libraries we need. They are already installed on our workbench image, so there is no need to run `pip install`.

In [None]:
import json
import os
import warnings
from os import listdir
from os.path import isfile, join
from langchain.chains import LLMChain, RetrievalQA
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
from langchain_community.vectorstores.azure_cosmos_db import (
    AzureCosmosDBVectorSearch,
    CosmosDBSimilarityType,
    CosmosDBVectorSearchType
)
from langchain_community.retrievers import AzureAISearchRetriever

from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import ClientAuthenticationError
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

warnings.filterwarnings("ignore")

if load_dotenv():
    # f-strings print "None" for a missing variable instead of raising a TypeError
    print(f"Found Azure OpenAI endpoint: {os.getenv('AZURE_OPENAI_ENDPOINT')}")
    print(f"Found Azure AI Search instance: {os.getenv('AZURE_AI_SEARCH_SERVICE_NAME')}")
else:
    print(".env file not found")

### LangChain elements

Again, we are going to use LangChain to define our task pipeline.

First, the **LLM** where we will send our queries.

In [None]:
if not os.getenv("AZURE_OPENAI_API_KEY"):
    credential = DefaultAzureCredential()
    access_token = credential.get_token("https://cognitiveservices.azure.com/.default")
    os.environ["AZURE_OPENAI_AD_TOKEN"] = access_token.token

llm = AzureChatOpenAI(
    azure_deployment = os.getenv("AZURE_DEPLOYMENT"),
    max_tokens=512,
    temperature=0.01,
    top_p=0.92,
    presence_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

Then comes the connection to the **vector database** where we have prepared and stored the California Driver's Handbook.

In [None]:
# First we define the embeddings that we used to process the Handbook
embeddings = AzureOpenAIEmbeddings(
    azure_deployment = os.getenv("AZURE_EMBEDDING")
)

azure_ai_search_service_name = os.getenv("AZURE_AI_SEARCH_SERVICE_NAME")
azure_ai_search_index_name = os.getenv("AZURE_AI_SEARCH_INDEX_NAME")

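if not os.getenv("AZURE_AI_SEARCH_API_KEY"):
    # Create the credential here as well: the LLM cell only defines it
    # when AZURE_OPENAI_API_KEY is missing.
    credential = DefaultAzureCredential()
    search_access_token = credential.get_token("https://search.azure.com/.default")
    os.environ["AZURE_AI_SEARCH_AD_TOKEN"] = search_access_token.token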

retriever = AzureAISearchRetriever(
    content_key="chunk", top_k=5, index_name=azure_ai_search_index_name
)

We will now define the **template** used to build our query. Note that this template now contains a **References** section. That's where the documents returned from the vector database will be injected.

In [8]:
template = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant named "Parasol Assistant".
You will be given a claim summary, references to provide you with information, and a question.
You must answer the question based as much as possible on this claim with the help of the references.
Always answer as helpfully as possible, while being safe.
Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, please don't share false information.
<</SYS>>

Claim Summary:
{claim}

References:
{{context}}

Question: {{question}} [/INST]
"""
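
The template mixes single braces (`{claim}`) and double braces (`{{context}}`, `{{question}}`) so that it can be filled in two passes. The short sketch below shows how Python's `str.format` handles this, using a shortened stand-in template (`demo_template`) and made-up values rather than the real prompt.

```python
# Shortened, hypothetical stand-in for the real template.
demo_template = (
    "Claim Summary:\n{claim}\n\n"
    "References:\n{{context}}\n\n"
    "Question: {{question}}"
)

# First pass: {claim} is replaced, and each {{...}} collapses to {...}.
stage_one = demo_template.format(claim="Example claim text")
print(stage_one)

# Second pass: the remaining {context} and {question} placeholders are
# filled, either by another .format() call or by a LangChain prompt.
stage_two = stage_one.format(
    context="Example reference", question="Example question?"
)
print(stage_two)
```

One caveat with this approach: if the claim text itself contains curly braces, the second pass treats them as placeholders, so such text would need to be escaped first.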

We are now ready to query the model!

In the `claims` folder we have JSON files with examples of claims that could be received. We are going to read the first claim and ask a question related to it.

In [9]:
# Read the claim and put its content in the "claim" variable

filename = 'claims/claim1.json'

# Opening JSON file
with open(filename, 'r') as file:
    data = json.load(file)
claim = data["content"]

### First test, no additional knowledge

Let's start with a first query about the claim, but without help from our vector database.

> Note: You may see a warning message about `LLMChain` being deprecated. You can safely ignore it.

In [None]:
# Create and send our query.

query = "Was Daniel allowed to pass at the red light?"

# Quick hack to reuse the same template with a different type of query.
prompt_template = template.format(claim=claim).format(context="", question=query)
prompt = PromptTemplate.from_template(prompt_template)
conversation = LLMChain(
            llm=llm,
            prompt=prompt,
            verbose=False
        )
resp = conversation.predict(input="")

We can see that the answer is valid. Here the model is using its general understanding of traffic regulation.

### Second test, with added knowledge

We will use the same prompt and query, but this time the model will have access to some references from the California Driver's Handbook.

In [None]:
# Create and send our query.

query = "Was Daniel allowed to pass at the red light?"

prompt_template = template.format(claim=claim)
prompt = PromptTemplate.from_template(prompt_template)
rag_chain = RetrievalQA.from_chain_type(
            llm,
            retriever=retriever,
            chain_type_kwargs={"prompt": prompt},
            return_source_documents=True,
        )
resp = rag_chain.invoke({"query": query})

That is pretty neat! Now the model refers directly to a source stating that **a red traffic signal light means "STOP."**

But where did we get this information from? We can look into the sources associated with the answers from the vector database.

In [None]:
import re

def extract_pages_number(input_string):
    # Use regular expression to find the "pages_" followed by digits
    match = re.search(r'pages_(\d+)', input_string)
    if match:
        return match.group(1)
    return None

def format_sources(input_list):
    sources = []

    # Define fixed widths for the columns
    page_number_width = 4
    page_content_width = 90

    # Add header row
    header = f"| {'Page':<{page_number_width}} | {'Page content snippet':<{page_content_width}} |"
    sources.append(header)
    sources.append('-' * len(header))  # Add a separator line

    for item in input_list:
        # Fall back to a placeholder: formatting None with a width spec raises a TypeError
        pages_number = extract_pages_number(item.metadata["chunk_id"]) or "n/a"
        page_content_preview = item.page_content.replace('\r', '').replace('\n', '')[:80] + "..."
        # Format the string with fixed column widths
        formatted_string = f"| {pages_number:<{page_number_width}} | {page_content_preview:<{page_content_width}} |"
        sources.append(formatted_string)
    return sources

results = format_sources(resp['source_documents'])

for line in results:
    print(line)
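
To see how the page number is recovered from a chunk identifier, here is a small standalone sketch of the same regular expression. The sample identifiers are hypothetical: the actual `chunk_id` format depends on how the index was built.

```python
import re

# Hypothetical chunk identifiers, for illustration only.
sample_ids = ["handbook_pages_12", "handbook_pages_7_part_2", "no_page_info"]

pages = {}
for chunk_id in sample_ids:
    # Capture the digits that immediately follow "pages_"
    match = re.search(r"pages_(\d+)", chunk_id)
    pages[chunk_id] = match.group(1) if match else None

for chunk_id, page in pages.items():
    print(f"{chunk_id!r} -> {page!r}")
```

Identifiers without a `pages_<number>` part yield `None`, which is why the table code needs a fallback value before formatting.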


That's it! We now know how to complement our LLM with some external knowledge!