<a href="https://colab.research.google.com/github/napsugark/LLM_Course/blob/main/02_LLM_Learning_Path_3_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **Retrieval Augmented Generation**

Link to old version: https://colab.research.google.com/drive/1JWVqwwrlUQrABqz4b5r4IY85kz1itvlr#scrollTo=_OnPDWm577oq

# Concepts to know


At the end of this module you should have an understanding of the following concepts:

- Methods of LLM adaptation (Fine Tuning, RAG, In-context learning)
- Components of a RAG system
- Context
- Vector Database
- Retriever Systems (BM25, vectors, SQL, Graph, Ensemble)
- Metadata filtering
- Chunking
- Query Rewriting (e.g. HyDE)



# Materials


### Mandatory:
LLM Adaptation Overview:
- https://ai.meta.com/blog/adapting-large-language-models-llms/ - DONE
- https://ai.meta.com/blog/when-to-fine-tune-llms-vs-other-techniques/ - DONE

Retrieval Augmented Generation:
- https://www.datacamp.com/blog/what-is-retrieval-augmented-generation-rag - DONE
- Course - RAG from Scratch (PART 1-4): https://www.youtube.com/watch?v=wd7TZ4w1mSw&list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x - DONE

Retrieval:
- https://python.langchain.com/docs/concepts/retrieval/ - DONE



### Additional Material:
- Langchain RAG Tutorial : https://python.langchain.com/docs/tutorials/rag/
- Microsoft GenAI for Beginners on RAG: https://learn.microsoft.com/en-us/shows/generative-ai-for-beginners/retrieval-augmented-generation-rag-and-vector-databases-generative-ai-for-beginners?WT.mc_id=academic-105485-koreyst
- Research Survey Summary: https://www.promptingguide.ai/research/rag
- Visualization of Chunking: https://chunkviz.up.railway.app/
- Retrieval Strategies:https://medium.com/@vinayak.sengupta/exploring-the-core-of-augmented-intelligence-advancing-the-power-of-retrievers-in-rag-frameworks-3ef9fe273764

Vector Stores:
- https://python.langchain.com/docs/concepts/vectorstores/
- https://www.datacamp.com/blog/the-top-5-vector-databases


Chunking:
- https://www.pinecone.io/learn/chunking-strategies/

Document Parsing:
- https://www.youtube.com/watch?v=9lBTS5dM27c&ab_channel=DaveEbbelaar

LoRa:
- https://www.datacamp.com/tutorial/mastering-low-rank-adaptation-lora-enhancing-large-language-models-for-efficient-adaptation

RAFT:
- https://techcommunity.microsoft.com/blog/aiplatformblog/raft-a-new-way-to-teach-llms-to-be-better-at-rag/4084674

Adaptation:
- https://learn.microsoft.com/en-us/azure/developer/ai/augment-llm-rag-fine-tuning





# Coding

Install the following to run the coding examples below

In [1]:
pip install langchain_community rank_bm25 openai langchain_openai langchain_chroma wikipedia

Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 24.3.1 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip


## Example: Build a simple vector DB RAG from scratch


To begin with, we are going to build a vector database on a very simple dataset containing random facts on cats and dogs.

In [34]:

dataset = [
    "Cats can rotate their ears 180 degrees to better locate sounds.",
    "Dogs have a sense of smell that's 10,000 to 100,000 times more sensitive than humans.",
    "Cats spend approximately 70% of their lives sleeping.",
    "The Basenji dog breed is unique because it doesn't bark; it yodels instead.",
    "Domestic cats are descendants of African wildcats, which were first domesticated about 9,000 years ago.",
    "Dogs are known to dream, and puppies and older dogs tend to dream more frequently.",
    "A group of kittens is called a 'kindle,' while a group of adult cats is called a 'clowder.'",
    "Dalmatians are born completely white and develop their spots as they grow.",
    "Cats have a specialized collarbone that allows them to always land on their feet when falling.",
    "The Labrador Retriever has been the most popular dog breed in the United States for decades.",
    "Cats can rotate their ears 180 degrees which makes them great hunters."
]

In [35]:
# # Setup the Azure OpenAI client
# from openai import AzureOpenAI
# from google.colab import userdata

# client = AzureOpenAI(
#     azure_endpoint = userdata.get('AZURE_OPENAI_ENDPOINT'),
#     api_key=userdata.get('AZURE_OPENAI_API_KEY'),
#     api_version="2024-06-01" )

import os
from openai import AzureOpenAI
from dotenv import load_dotenv

load_dotenv()

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

For creating a vector database, we need an embedding that transforms each piece of text into an embedding, e.g. a numerical representation.

In [36]:
EMBEDDING_MODEL = "text-embedding-ada-002"

The simple_vector_database will be a list of tuples containing the original text and the embedding of the text.

In [37]:
# Create a vector DB
simple_vector_db = []

def get_embedding(chunk):
    response = client.embeddings.create(
        input=[chunk],
        model=EMBEDDING_MODEL
    )
    return response.data[0].embedding

def add_chunk_to_database(chunk):
    embedding = get_embedding(chunk)
    simple_vector_db.append((chunk, embedding))

In [38]:
# Populate the database
for i, chunk in enumerate(dataset):
    add_chunk_to_database(chunk)
    print(f'Added chunk {i+1}/{len(dataset)} to the database')

Added chunk 1/11 to the database
Added chunk 2/11 to the database
Added chunk 3/11 to the database
Added chunk 4/11 to the database
Added chunk 5/11 to the database
Added chunk 6/11 to the database
Added chunk 7/11 to the database
Added chunk 8/11 to the database
Added chunk 9/11 to the database
Added chunk 10/11 to the database
Added chunk 11/11 to the database


The embedding created has the dimension of 1536, i.e. 1536 numbers representing the original text.

In [39]:
len(simple_vector_db[0][1])

1536

Now that we have a vector database, we need to have a way to retrieve from it.

We will use the embeddings to retrieve texts that are numerically similar to an input query. For this we need a function that calculates the cosine similarity between two texts. The retrieve function then calculates the embedding of the input query and the embeddings of the texts in the vector db, returning the top_n most similar texts from the vector database.

In [40]:
# Define similarity Retriever
def cosine_similarity(a, b):
    dot_product = sum([x * y for x, y in zip(a, b)])
    norm_a = sum([x ** 2 for x in a]) ** 0.5
    norm_b = sum([x ** 2 for x in b]) ** 0.5
    return dot_product / (norm_a * norm_b)

def retrieve(query, top_n=3):
    query_embedding = get_embedding(query)
    similarities = []
    for chunk, embedding in simple_vector_db:
        similarity = cosine_similarity(query_embedding, embedding)
        similarities.append((chunk, similarity))
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:top_n]

Let's try our vector db retriever:

In [41]:
retrieve('How far can cats rotate their ears?')
#

[('Cats can rotate their ears 180 degrees to better locate sounds.',
  0.9183710617759656),
 ('Cats can rotate their ears 180 degrees which makes them great hunters.',
  0.8980675589993741),
 ('Cats have a specialized collarbone that allows them to always land on their feet when falling.',
  0.8194481567208423)]

The last step to a RAG chatbot is to pass the retrieved information along with the original query to a LLM that generates an answer based on the information it receives.

In [42]:
# Chatbot

GENERATION_MODEL = "gpt-4o-mini"

input_query = 'How far can cats rotate their ears? '

retrieved_knowledge = retrieve(input_query)

print('Retrieved knowledge:')
for chunk, similarity in retrieved_knowledge:
    print(f' - (similarity: {similarity:.2f}) {chunk}')

instruction_prompt = f"""You are a helpful chatbot.
Use only the following pieces of context to answer the question. Don't make up any new information:
{chr(10).join([f' - {chunk}' for chunk, similarity in retrieved_knowledge])}
"""

response = client.chat.completions.create(
    model=GENERATION_MODEL,
    messages=[
        {"role": "system", "content": instruction_prompt},
        {"role": "user", "content": input_query},
    ]
)

# Print the chatbot response
print('Chatbot response:')
print(response.choices[0].message.content)



Retrieved knowledge:
 - (similarity: 0.92) Cats can rotate their ears 180 degrees to better locate sounds.
 - (similarity: 0.90) Cats can rotate their ears 180 degrees which makes them great hunters.
 - (similarity: 0.83) Cats have a specialized collarbone that allows them to always land on their feet when falling.
Chatbot response:
Cats can rotate their ears 180 degrees to better locate sounds.


## Assignment:
Try out what happens, if you ask the chatbot something that is not in the cats & dogs datase, e.g. how to make a full English breakfast.

In [43]:
# Chatbot

GENERATION_MODEL = "gpt-4o-mini"

input_query = "How to make a full English breakfast?"

retrieved_knowledge = retrieve(input_query)

print("Retrieved knowledge:")
for chunk, similarity in retrieved_knowledge:
    print(f" - (similarity: {similarity:.2f}) {chunk}")

instruction_prompt = f"""You are a helpful chatbot.
Use only the following pieces of context to answer the question. Don't make up any new information:
{chr(10).join([f" - {chunk}" for chunk, similarity in retrieved_knowledge])}
"""

response = client.chat.completions.create(
    model=GENERATION_MODEL,
    messages=[
        {"role": "system", "content": instruction_prompt},
        {"role": "user", "content": input_query},
    ],
)

# Print the chatbot response
print("Chatbot response:")


Retrieved knowledge:
 - (similarity: 0.72) A group of kittens is called a 'kindle,' while a group of adult cats is called a 'clowder.'
 - (similarity: 0.71) Dalmatians are born completely white and develop their spots as they grow.
 - (similarity: 0.70) Cats spend approximately 70% of their lives sleeping.
Chatbot response:


## Example: Populate a ChromaDB vector database

Instead of just having a list of tuples, lets now use the cats&dogs dataset to pupulate a vector db with Chroma DB.

In [44]:
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma

In [46]:
# Initialize the embedding functions and ChromaDB
embeddings = AzureOpenAIEmbeddings(
    model=EMBEDDING_MODEL,
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

# client = AzureOpenAI(
#     azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
#     api_key=
#     api_version="2024-02-01",
# )
vector_db = Chroma(
    collection_name="cats-and-dogs",
    embedding_function=embeddings,
)

In [47]:
# Add texts to the vectorDB
vector_db.add_texts(dataset)

['6cc6b3ab-b4ce-490d-8d2c-2d13f2dd63ea',
 '03dd3db8-a87d-4ac6-9370-eee9f26e56a4',
 '98c1410e-cd16-4d6f-836e-14f939eeb978',
 '96931f4f-c14d-40cb-b2ee-2099bd7a3ee3',
 'd9534f66-ca71-43b7-aa17-18ef76d9029b',
 'aae2dc53-cb4d-4119-96aa-e3d237a67b92',
 '6473bd8d-d570-42f5-ad86-8562f078cf36',
 'e9b8e6e2-89a6-4cce-b71a-b253cddd3014',
 '2db85562-aa8f-4ec2-81b1-416d9d2d88f4',
 '4268dbb8-6373-4cff-8e07-d852fe1f79ba',
 '6011bc3f-c29a-4c76-8daa-5db751ef5787']

In [55]:
results = vector_db.similarity_search(
    "How far can cats rotate their ears",
    k=2,
)
for res in results:
    print(f"* {res.page_content}")

* Cats can rotate their ears 180 degrees to better locate sounds.
* Cats can rotate their ears 180 degrees which makes them great hunters.


## Assignment:

Try out what happens if you perform a similarity search on the input queries "How far can dogs rotate their ears?". Why do you think that is? Could this be a problem?

In [56]:
results2 = vector_db.similarity_search(
    "How far can dogs rotate their ears",
    k=2,
)
for res in results2:
    print(f"* {res.page_content}")

* Cats can rotate their ears 180 degrees to better locate sounds.
* Cats can rotate their ears 180 degrees which makes them great hunters.


## Example: Metadata filtering

One way to solve the problem observed above is to introduce metadata into the vector DB. In this case, it would be a label indicating be whether the text relates to cats or dogs.

In [57]:
# Create a new dataset
vector_db_metadata = Chroma(
    collection_name="cats-and-dogs-metadata",
    embedding_function=embeddings,
)

In [58]:
metadata = [{"Animal": "Cat"}, {"Animal": "Dog"},
            {"Animal": "Cat"}, {"Animal": "Dog"},
            {"Animal": "Cat"}, {"Animal": "Dog"},
            {"Animal": "Cat"}, {"Animal": "Dog"},
            {"Animal": "Cat"}, {"Animal": "Dog"},
            {"Animal": "Cat"},]

You can pass metadata objects as dictionaries when you add data to the vector DB.

In [60]:
vector_db_metadata.from_texts(
    dataset,
    collection_name="cats-and-dogs-metadata",
    metadatas = metadata,
    embedding = embeddings
    )

<langchain_chroma.vectorstores.Chroma at 0x20b54b8b200>

In a vector DB with metadata you can perform similarity search with previous filtering, i.e. for a question concerning docs you only look at entries that have that attribute.

In [61]:
results = vector_db_metadata.similarity_search(
    "How far can dogs rotate their ears?",
    k=2,
    filter={"Animal": "Dog"},
)
for res in results:
    print(f"* {res.page_content}")

* Dogs have a sense of smell that's 10,000 to 100,000 times more sensitive than humans.
* Dogs have a sense of smell that's 10,000 to 100,000 times more sensitive than humans.


## Example: MMR retrieval

Max mariginal relevance is a different retrieval method. Execute the code cell below and obeserve how the result changes in comparison to the standard similiarity search. Why do you think that is?

In [62]:
vector_db.max_marginal_relevance_search(
    query = "How far can cats rotate their ears",
    k = 2,
    fetch_k = 4)

[Document(id='6cc6b3ab-b4ce-490d-8d2c-2d13f2dd63ea', metadata={}, page_content='Cats can rotate their ears 180 degrees to better locate sounds.'),
 Document(id='2db85562-aa8f-4ec2-81b1-416d9d2d88f4', metadata={}, page_content='Cats have a specialized collarbone that allows them to always land on their feet when falling.')]

## Example: Similarity Threshold

You can also set a similarity threshold, meaning that results are only retrieved from the db if their similiarity is above a defined threshold.

In [63]:
retriever = vector_db.as_retriever(
    search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.4}
)

In [64]:
retriever.invoke("How do you call a group of kittens?")

[Document(id='6473bd8d-d570-42f5-ad86-8562f078cf36', metadata={}, page_content="A group of kittens is called a 'kindle,' while a group of adult cats is called a 'clowder.'"),
 Document(id='2db85562-aa8f-4ec2-81b1-416d9d2d88f4', metadata={}, page_content='Cats have a specialized collarbone that allows them to always land on their feet when falling.'),
 Document(id='d9534f66-ca71-43b7-aa17-18ef76d9029b', metadata={}, page_content='Domestic cats are descendants of African wildcats, which were first domesticated about 9,000 years ago.'),
 Document(id='98c1410e-cd16-4d6f-836e-14f939eeb978', metadata={}, page_content='Cats spend approximately 70% of their lives sleeping.')]

In [65]:
retriever = vector_db.as_retriever(
    search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.8}
)

In [66]:
retriever.invoke("How do you call a group of kittens?")

[Document(id='6473bd8d-d570-42f5-ad86-8562f078cf36', metadata={}, page_content="A group of kittens is called a 'kindle,' while a group of adult cats is called a 'clowder.'")]

In [67]:
retriever.invoke("How do I make a group of pancakes?")

No relevant docs were retrieved using the relevance score threshold 0.8


[]

## Example: Ensemble Retriever

You do not have to use a single retriever, but can combine different retrieval methods with an EnsembleRetriever.

In [68]:
from langchain.retrievers import BM25Retriever, EnsembleRetriever

In [69]:
semantic_retriever= vector_db.as_retriever(search_kwargs={"k": 2})


keyword_retriever = BM25Retriever.from_texts(dataset)
keyword_retriever.k =  2

ensemble_retriever = EnsembleRetriever(retrievers=[semantic_retriever,
                                                   keyword_retriever],
                                       weights=[0.5, 0.5])

In [70]:
semantic_retriever.invoke("How do you call a group of kittens?")

[Document(id='6473bd8d-d570-42f5-ad86-8562f078cf36', metadata={}, page_content="A group of kittens is called a 'kindle,' while a group of adult cats is called a 'clowder.'"),
 Document(id='2db85562-aa8f-4ec2-81b1-416d9d2d88f4', metadata={}, page_content='Cats have a specialized collarbone that allows them to always land on their feet when falling.')]

In [71]:
keyword_retriever.invoke("How do you call a group of kittens?")

[Document(metadata={}, page_content="A group of kittens is called a 'kindle,' while a group of adult cats is called a 'clowder.'"),
 Document(metadata={}, page_content="Dogs have a sense of smell that's 10,000 to 100,000 times more sensitive than humans.")]

In [72]:
ensemble_retriever.invoke("How do you call a group of kittens?")

[Document(id='6473bd8d-d570-42f5-ad86-8562f078cf36', metadata={}, page_content="A group of kittens is called a 'kindle,' while a group of adult cats is called a 'clowder.'"),
 Document(id='2db85562-aa8f-4ec2-81b1-416d9d2d88f4', metadata={}, page_content='Cats have a specialized collarbone that allows them to always land on their feet when falling.'),
 Document(metadata={}, page_content="Dogs have a sense of smell that's 10,000 to 100,000 times more sensitive than humans.")]

## Example: Build a RAG chatbot with langchain

Now let's rebuild the simple, from scratch cats and dogs retriever from above using langchain.

In [73]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

In [74]:
# Initialize the model you want to use
from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
    deployment_name="gpt-4o-mini",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2023-06-01-preview",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)
  

In [75]:
# Define a retriever
retriever = vector_db.as_retriever(search_kwargs={"k": 4})

In [76]:
# Define a prompt for the generation
prompt = ChatPromptTemplate.from_template("""
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know..

Question: {question}

Context: {context}

Answer:
""")


Here we will define the chatbot as a chain, that first uses the retriever and a formatting function that prepares the output in a for the LLM, than populates the prompt with the results, calls the LLM and returns the response.

In [77]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)


Finally, invoking the chain will give you the response of the chatbot.


In [78]:
rag_chain.invoke("How far can cats rotate their ears?")

'Cats can rotate their ears 180 degrees to better locate sounds.'

## Example: Add website text as corpus to the RAG chabot

So far, we used just a few sentences on cats and dogs as context. Vector databases will be typically containing many more and much larger documents. So, let's build a vector db that contains wikipedia articles on some movies. Langchain offeres a retriever class to get articles from Wikipedia.

In [79]:
from langchain_community.retrievers import WikipediaRetriever

retriever = WikipediaRetriever(
    top_k_results = 1,
    lang = 'en',
)

In [80]:
movie_docs =[]
movies = ["Inception", "Django Unchained", "Shutter Island", "The Dark Knight"]

In [81]:
for movie in movies:
  movie_docs += retriever.invoke(movie)

The retriever returns a document object, which is used by langchain to process and populate databases. It contains metadata (in this case title, source link and a summary) and the actual text retrieved as page content.

In [83]:
movie_docs[0]

Document(metadata={'title': 'Inception', 'summary': 'Inception is a 2010  science fiction  action  heist film written and directed by Christopher Nolan, who also produced it with Emma Thomas, his wife. The film stars Leonardo DiCaprio as a professional thief who steals information by infiltrating the subconscious of his targets. He is offered a chance to have his criminal history erased as payment for the implantation of another person\'s idea into a target\'s subconscious. The ensemble cast includes Ken Watanabe, Joseph Gordon-Levitt, Marion Cotillard, Elliot Page, Tom Hardy, Cillian Murphy, Tom Berenger, Dileep Rao, and Michael Caine.\nAfter the 2002 completion of Insomnia, Nolan presented to Warner Bros. a written 80-page treatment for a horror film envisioning "dream stealers," based on lucid dreaming. Deciding he needed more experience before tackling a production of this magnitude and complexity, Nolan shelved the project and instead worked on 2005\'s Batman Begins, 2006\'s The P

Longer documents can be split up into smaller parts with the RecursiveCharacterTextSplitter.

In [84]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)

movie_docs_split = text_splitter.split_documents(movie_docs)


In [85]:
for movie in movie_docs:
  print(movie.metadata['title'])

Inception
Django Unchained
Shutter Island (film)
The Dark Knight


In [86]:
len(movie_docs)

4

In [87]:
len(movie_docs_split)

23

Now lets create a vector db from the split documents and define a movie RAG chain.

In [88]:
movie_vector_db = Chroma.from_documents(documents=movie_docs_split, embedding=embeddings, persist_directory="persist")

In [89]:
movie_retriever = movie_vector_db.as_retriever(search_kwargs={"k": 4})

In [90]:
# Define a prompt for the generation
movie_prompt = ChatPromptTemplate.from_template("""
You are an assistant for question on famous movies. Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know. Do not answer any question that are not related to movies.

Question: {question}

Context: {context}

Answer:
""")


In [91]:
movie_retriever.invoke("Who plays in Inception?")

[Document(id='abcf179b-9e0f-4540-9e85-fa3cbac8df20', metadata={'source': 'https://en.wikipedia.org/wiki/Inception', 'summary': 'Inception is a 2010  science fiction  action  heist film written and directed by Christopher Nolan, who also produced it with Emma Thomas, his wife. The film stars Leonardo DiCaprio as a professional thief who steals information by infiltrating the subconscious of his targets. He is offered a chance to have his criminal history erased as payment for the implantation of another person\'s idea into a target\'s subconscious. The ensemble cast includes Ken Watanabe, Joseph Gordon-Levitt, Marion Cotillard, Elliot Page, Tom Hardy, Cillian Murphy, Tom Berenger, Dileep Rao, and Michael Caine.\nAfter the 2002 completion of Insomnia, Nolan presented to Warner Bros. a written 80-page treatment for a horror film envisioning "dream stealers," based on lucid dreaming. Deciding he needed more experience before tackling a production of this magnitude and complexity, Nolan she

In [92]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


movie_rag_chain = (
    {"context": movie_retriever | format_docs, "question": RunnablePassthrough()}
    | movie_prompt
    | llm
    | StrOutputParser()
)


In [93]:
movie_rag_chain.invoke('Who plays in Inception?')

'The film Inception stars Leonardo DiCaprio, Ken Watanabe, Joseph Gordon-Levitt, Marion Cotillard, Elliot Page, Tom Hardy, Cillian Murphy, Tom Berenger, Dileep Rao, and Michael Caine.'

## Example: RAG chatbot with memory in chainlit

Now, let's take the simple chatbot implementation in chainlit and turn it into a RAG chatbot with a vector DB on movies.


- Make a venv with python -m .venv venv
- Activate .venv with .venv\Scripts\activate


- Make a .env file

In [None]:
AZURE_OPENAI_ENDPOINT=<insert endpoint>
AZURE_OPENAI_API_KEY=<insert key>
QDRANT_ENDPOINT=<insert Qdrant Endpoint>
QDRANT_API_KEY=<insert API Key>

- Make a requirements.txt file

In [None]:
langchain_openai
qdrant-client
langchain_qdrant
langchain_core
chainlit
openai
wikipedia
langchain_community
python-dotenv



*  Create the following create_db.py file and execute with python create_db.py to create a qdrant collection


In [None]:
from langchain_community.retrievers import WikipediaRetriever
from langchain_openai import AzureOpenAIEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
import os
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())

embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    api_key=os.getenv('AZURE_OPENAI_API_KEY'),
    api_version="2024-06-01",
    azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT')
)

retriever = WikipediaRetriever(
    top_k_results = 1,
    lang = 'en',
)
qdrant_client = QdrantClient(
    url=os.getenv('QDRANT_ENDPOINT'),
    api_key=os.getenv('QDRANT_API_KEY')
)
if not qdrant_client.collection_exists(collection_name="local_movie_db"):
    qdrant_client.create_collection(
        collection_name="local_movie_db",
        vectors_config=VectorParams(
            size=1536,
            distance=Distance.COSINE
        )
    )
qdrant_db = QdrantVectorStore(
    client=qdrant_client,
    collection_name="local_movie_db",
    embedding=embeddings
)
movie_docs = []
movies = ["Inception", "The Return of the King", "Shutter Island", "The Dark Knight"]

for movie in movies:
  movie_docs += retriever.invoke(movie)

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)

movie_docs_split = text_splitter.split_documents(movie_docs)

movie_vector_db = qdrant_db.add_documents(documents=movie_docs_split)


- Create the following rag_chatbot.py file and execute with chainlit run rag_chatbot.py -w

In [None]:
from openai import AzureOpenAI
from langchain_openai import AzureOpenAIEmbeddings
import chainlit as cl
import os
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from dotenv import load_dotenv, find_dotenv
from chainlit.input_widget import Select, Slider

load_dotenv(find_dotenv())


@cl.on_chat_start
async def start_chat():
    await cl.ChatSettings([
        Select(
            id="language",
            values=["English", "Romanian"],
            label="Select your preferred language",
            initial_value="English"
        ),
        Slider(
            id="Temperature",
            label="Temperature",
            initial=0,
            min=0,
            max=1,
            step=0.1
        )
    ]
    ).send()
    cl.user_session.set("chat_history", [])
    cl.user_session.set("client", AzureOpenAI(
        azure_endpoint=os.environ.get('AZURE_OPENAI_ENDPOINT'),
        api_key=os.environ.get('AZURE_OPENAI_API_KEY'),
        api_version="2024-06-01"))
    cl.user_session.set("embedding_model", AzureOpenAIEmbeddings(
        model="text-embedding-ada-002",
        api_key=os.getenv('AZURE_OPENAI_API_KEY'),
        api_version="2024-06-01",
        azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT')
    ))
    cl.user_session.set("qdrant_client", QdrantClient(
            url=os.getenv('QDRANT_ENDPOINT'),
            api_key=os.getenv('QDRANT_API_KEY')
    ))
    cl.user_session.set("retriever", QdrantVectorStore(
        collection_name='local_movie_db',
        embedding=cl.user_session.get("embedding_model"),
        client=cl.user_session.get("qdrant_client")))
    if SCENARIO == "assignment_1":
        qdrant_client = QdrantClient(
            url=os.getenv("QDRANT_ENDPOINT"), api_key=os.getenv("QDRANT_API_KEY")
        )
        cl.user_session.set("qdrant_client", qdrant_client)

        retriever = get_retriever("English", embedding_model, qdrant_client)
        cl.user_session.set("retriever", retriever)


def get_system_prompt(language):
    return f"""
    You are a helpful assistant for question on famous movies.
    You will formulate all its answers in {language}.
    Base you answer only on pieces of information received as context below.
    If you don't know the answer, just say that you don't know.
    Do not answer any question that are not related to movies."""


@cl.on_settings_update
async def setup_agent(settings):
    cl.user_session.set("language", settings["language"])
    cl.user_session.set("temperature", settings["Temperature"])


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


@cl.on_message
async def message_send(message: cl.Message):
    language = cl.user_session.get("language", "English")
    temperature = cl.user_session.get("temperature", 0)
    chat_history = cl.user_session.get("chat_history", [])
    retriever = cl.user_session.get("retriever")
    client = cl.user_session.get("client")
    retrieved_docs = retriever.similarity_search(message.content, k=4)
    context = format_docs(retrieved_docs)
    system_prompt = get_system_prompt(language)
    chat_history.append({"role": "system", "content": system_prompt})
    chat_history.append({"role": "user", "content": f"QUESTION: {message.content}"})
    chat_history.append({"role": "system", "content": f"CONTEXT: {context}"})
    full_response = ""
    source_elements = []
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=temperature,
        stream=True,
        messages=[
            {"role": m["role"], "content": m["content"]}
            for m in chat_history
        ]
    )

    msg = cl.Message(content="")
    await msg.send()
    for chunk in stream:
        if not chunk.choices or not chunk.choices[0].delta:
            continue

        delta = chunk.choices[0].delta.content or ""
        full_response += delta
        await msg.stream_token(delta)
    await msg.update()

    source_elements.append(cl.Text(content=context, name="Context", display="side"))
    msg.content += "\n\nContext"
    msg.elements = source_elements
    await msg.update()

    chat_history.append({"role": "assistant", "content": full_response})
    cl.user_session.set("chat_history", chat_history)

## Assignment: Vector DB in Romanian and English

Retrieve wikipedia articles on the movies in Romanian and English. Depending on the language selected, the retriever should only query the articles of the specified language.

In [None]:
#Solution TBD

## Assignment: Standalone question

Implement a reformulation step, that after the first conversation turn reformulates follow-up questions into standalone questions to be used for retrieval and generation. Use only user questions and llm responses for the reformulation step and only send the reformulated question plus retrieved context to the LLM.