<a href="https://colab.research.google.com/github/nafis-neehal/LLM_Projects/blob/main/RAG_VectorDB/RAG_LangChain_ChromaDB_Harry_Potter_QA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install --quiet -r requirements.txt

In [1]:
from langchain.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFaceEndpoint

import os

from google.colab import userdata

In [2]:
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_KEY_BS1')
os.environ["HUGGINGFACEHUB_API_TOKEN"] = userdata.get('HF_TOKEN')

# Step 1: Load the Data

In [3]:
loader = DirectoryLoader('./data/', glob='./*.txt', loader_cls=TextLoader)
documents = loader.load()
len(documents)

7

# Step 2: Split it into chunks

In [4]:
# Split the documents into chunks of up to 1000 characters, with 100 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
len(texts)

7643
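To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-window splitter in plain Python. It is a simplification for illustration only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, and the helper name `split_with_overlap` is hypothetical, not part of LangChain.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window start advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, which is how overlap keeps a sentence that straddles a boundary retrievable from either side.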

# Step 3: Create the vector DB with Chroma (local, persisted to disk)

In [None]:
# Create a new database from scratch (embeds every chunk with OpenAIEmbeddings)
persist_directory = './db'
embeddings = OpenAIEmbeddings()

vectordb = Chroma.from_documents(documents = texts,
                                 embedding = embeddings,
                                 persist_directory = persist_directory)

In [6]:
# Load the existing database from disk instead of rebuilding it
persist_directory = './db'
embeddings = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

# Step 4: Make retriever from VectorDB
Source: https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore

In [42]:
retriever = vectordb.as_retriever(search_kwargs={"k":10})

In [48]:
docs = retriever.get_relevant_documents("Who was Severus Snape?")
len(docs)
docs[0]

Document(page_content='"Or he might have been sacked!" said Ron enthusiastically. "I mean, everyone hates him -"\n\u3000\u3000"Or maybe," said a very cold voice right behind them, "he\'s waiting to hear why you two didn\'t arrive on the school train."\n\u3000\u3000Harry spun around. There, his black robes rippling in a cold breeze, stood Severus Snape. He was a thin man with sallow skin, a hooked nose, and greasy, shoulder-length black hair, and at this moment, he was smiling in a way that told Harry he and Ron were in very deep trouble.\n\u3000\u3000"Follow me," said Snape.\n\u3000\u3000Not daring even to look at each other, Harry and Ron followed Snape up the steps into the vast, echoing entrance hall, which was lit with flaming torches. A delicious smell of food was wafting from the Great Hall, but Snape led them away from the warmth and light, down a narrow stone staircase that led into the dungeons.\n\u3000\u3000"In!" he said, opening a door halfway down the cold passageway and po
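Conceptually, the retriever embeds the query and returns the `k` chunks whose embeddings are closest to it. The sketch below shows that idea with toy 3-dimensional vectors and cosine similarity. The vectors, the texts, and the `top_k` helper are made up for illustration; Chroma uses its own index and distance settings, so this is not its actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k):
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [text for _, text in scored[:k]]


# Hypothetical (text, embedding) pairs standing in for the vector DB.
toy_index = [
    ("Snape taught Potions", [0.9, 0.1, 0.0]),
    ("Hagrid kept dragons", [0.0, 0.2, 0.9]),
    ("Snape was Head of Slytherin", [0.8, 0.3, 0.1]),
]
toy_query = [1.0, 0.2, 0.0]  # hypothetical embedding of "Who was Severus Snape?"

results = top_k(toy_query, toy_index, k=2)
print(results)
# ['Snape taught Potions', 'Snape was Head of Slytherin']
```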

# Step 5: Add the retrieved chunks to the LLM context - make a chain

In [44]:
template = """
Instruction: Answer to the best of your ability the question based on the context below.
If the question can't be answered using the information provided answer with "I don't know".

Context: {context}

Question: {query}

Answer: """

prompt_template = PromptTemplate(
    input_variables=["context","query"],
    template=template
)
Context = "Answer the question based on your prior knowledge about Harry Potter books."
Question = "Who was Sirius Black?"

### 1. Evaluation of GPT-3.5 without and with RAG - OpenAI API

In [49]:
def process_llm_response(llm_response):
    """Print the generated answer, then the source file of each retrieved chunk."""
    print(llm_response['result'])
    print('\n\nSources:')
    for source in llm_response['source_documents']:
        print(source.metadata['source'])

In [50]:
llm = OpenAI(model_name="gpt-3.5-turbo-instruct")
chain = LLMChain(llm=llm, prompt=prompt_template)
input = {"context":Context, "query":Question}
chain.run(input)

" Sirius Black was a character in the Harry Potter books. He was the godfather of Harry Potter and a member of the Order of the Phoenix. He was also a convicted criminal and the brother of Harry's mother, Lily Potter."

In [51]:
qa_chain = RetrievalQA.from_chain_type(llm=OpenAI(model_name="gpt-3.5-turbo-instruct"),
                                     chain_type="stuff",
                                     retriever=retriever,
                                     return_source_documents=True)
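With `chain_type="stuff"`, all retrieved chunks are concatenated and placed into a single prompt. The sketch below mimics that step in plain Python. It reuses this notebook's template only as a stand-in, since `RetrievalQA` applies its own default prompt unless one is passed in. The `stuff_context` helper and the two example chunks are hypothetical.

```python
template = """
Instruction: Answer to the best of your ability the question based on the context below.
If the question can't be answered using the information provided answer with "I don't know".

Context: {context}

Question: {query}

Answer: """


def stuff_context(chunks, separator="\n\n"):
    """Join every retrieved chunk into one context string."""
    return separator.join(chunks)


# Hypothetical retrieved chunks; in the notebook these come from the retriever.
retrieved = [
    "Sirius Black escaped from Azkaban.",
    "Sirius was Harry's godfather.",
]

filled = template.format(context=stuff_context(retrieved), query="Who was Sirius Black?")
print(filled)
```

Because every chunk lands in one prompt, `k` and `chunk_size` together determine how much of the model's context window the retrieved text consumes.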


In [52]:
query = "Who was Sirius Black?"
llm_response = qa_chain({"query": query})
process_llm_response(llm_response)

 Sirius Black was a wizard who had been believed guilty of mass murder for fourteen years, but new evidence had recently come to light that suggested he may not have committed the crimes. He had escaped from Azkaban, the wizard jail, and was being hunted by the Ministry of Magic. He was also Harry Potter's godfather and the last of the Black family line. Additionally, he was feared by many as he was believed to be Lord Voldemort's right-hand man. However, it was later revealed that he was actually innocent and the murders were committed by his friend, Wormtail.


Sources:
data/Harry Potter and the Order of the Phoenix.txt
data/Harry Potter and the Order of the Phoenix.txt
data/Harry Potter and the Prisoner of Azkaban .txt
data/Harry Potter and the Prisoner of Azkaban .txt
data/Harry Potter and The Half-Blood Prince.txt
data/Harry Potter and The Half-Blood Prince.txt
data/Harry Potter and the Prisoner of Azkaban .txt
data/Harry Potter and the Prisoner of Azkaban .txt
data/Harry Potter a
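Several retrieved chunks come from the same book, so the source list repeats. A small sketch of how the sources could be summarized instead, using the eight complete paths printed above (the truncated final entry is left out). The `summarize_sources` helper is hypothetical and not part of the notebook.

```python
from collections import Counter


def summarize_sources(source_paths):
    """Return (path, count) pairs, most frequent first."""
    return Counter(source_paths).most_common()


# The eight complete source paths from the output above.
example_sources = [
    "data/Harry Potter and the Order of the Phoenix.txt",
    "data/Harry Potter and the Order of the Phoenix.txt",
    "data/Harry Potter and the Prisoner of Azkaban .txt",
    "data/Harry Potter and the Prisoner of Azkaban .txt",
    "data/Harry Potter and The Half-Blood Prince.txt",
    "data/Harry Potter and The Half-Blood Prince.txt",
    "data/Harry Potter and the Prisoner of Azkaban .txt",
    "data/Harry Potter and the Prisoner of Azkaban .txt",
]

summary = summarize_sources(example_sources)
for path, count in summary:
    print(f"{count} x {path}")
```

In the notebook, the input list would be `[doc.metadata['source'] for doc in llm_response['source_documents']]`.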

### 2. Evaluation of Open-Source models without and with RAG - HuggingFace API

In [60]:
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id, max_length=128, temperature=0.1, max_new_tokens=250,
    return_full_text=False)
llm_chain = LLMChain(prompt=prompt_template, llm=llm)
print(llm_chain.run(input))

                    max_length was transferred to model_kwargs.
                    Please make sure that max_length is what you intended.


 Sirius Black was a character in the Harry Potter series, portrayed as a friend of Harry's godfather, Sir James Potter, and his godfather's best friend, Remus Lupin. Sirius was also the godfather of Harry Potter. He was a member of the Order of the Phoenix and was known as "the Prisoner of Azkaban" because he was wrongly accused of murdering thirteen people and was sent to Azkaban prison. Sirius was actually innocent, and the real killer was Peter Pettigrew, who had betrayed the Potters and Sirius to Lord Voldemort. Sirius was a large, black Ankou dog-like creature when he transformed into a werewolf during the full moon. He was eventually killed by the Death Eater, Bellatrix Lestrange, during the Battle of the Department of Mysteries.


In [61]:
qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                     chain_type="stuff",
                                     retriever=retriever,
                                     return_source_documents=True)
query = "Who was Sirius Black?"
llm_response = qa_chain({"query": query})
process_llm_response(llm_response)

 Sirius Black was a wizard who had been wrongly imprisoned in Azkaban for fourteen years for the mass murder of thirteen Muggles and one wizard. He was the godfather of Harry Potter and a close friend of James Potter and Remus Lupin. He was believed to have escaped from Azkaban two years ago and was currently being hunted by the Ministry of Magic. However, new evidence has come to light suggesting that Sirius may not have committed the crimes for which he was imprisoned and that he may not have even been present at the killings.


Sources:
data/Harry Potter and the Order of the Phoenix.txt
data/Harry Potter and the Order of the Phoenix.txt
data/Harry Potter and the Prisoner of Azkaban .txt
data/Harry Potter and the Prisoner of Azkaban .txt
data/Harry Potter and The Half-Blood Prince.txt
data/Harry Potter and The Half-Blood Prince.txt
data/Harry Potter and the Prisoner of Azkaban .txt
data/Harry Potter and the Prisoner of Azkaban .txt
data/Harry Potter and the Goblet of Fire.txt
data/Ha