# Textbot

A chatbot that answers questions about a given set of text documents.


## Setup

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_KEY")
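`os.getenv` returns `None` when a variable is missing. A typo in the `.env` file would then only show up later, as an authentication error from OpenAI. Below is a minimal sketch of failing early instead. The helper name `require_env` is hypothetical and is not part of python-dotenv or LangChain. The variable name `OPENAI_KEY` is simply the one this notebook's `.env` file is assumed to define.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    Hypothetical helper: not part of python-dotenv or LangChain.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file.")
    return value


# Usage (assumes load_dotenv() has already populated os.environ):
# OPENAI_API_KEY = require_env("OPENAI_KEY")
```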

In [2]:
# Imports

from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.document_loaders import DirectoryLoader


In [3]:
# Other imports
from os.path import normpath

## Load Documents & Process Documents

In this stage, we load the text documents from the `data` directory using LangChain's
`DirectoryLoader` class, with `TextLoader` reading each `.txt` file.

### Load Documents

In [7]:
dir_path = normpath("./data/")
loader = DirectoryLoader(dir_path, glob="./*.txt", loader_cls=TextLoader)

# Load the documents
documents = loader.load()

In [13]:
# View info about document object
print(f"The type of the documents object is: {type(documents)}")
print(f"Number of text documents loaded: {len(documents)}")

The type of the documents object is: <class 'list'>
Number of text documents loaded: 3
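Conceptually, `DirectoryLoader` with `loader_cls=TextLoader` globs the directory and turns every matching file into one document. Each document's `metadata["source"]` holds the file path, which is why three `.txt` files produce a list of three documents. Below is a rough stdlib-only sketch of that idea. The `SimpleDocument` dataclass and `load_text_dir` function are hypothetical stand-ins for LangChain's `Document` and loader. This is not LangChain's implementation, which offers additional options.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_dir(dir_path: str, pattern: str = "*.txt") -> List[SimpleDocument]:
    """Read every file matching `pattern` in `dir_path` into one document per file."""
    docs = []
    for path in sorted(Path(dir_path).glob(pattern)):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            # Record where the text came from, like metadata={'source': ...} in the output above
            docs.append(SimpleDocument(page_content=text, metadata={"source": str(path)}))
    return docs


# Usage: documents = load_text_dir("./data")
```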


### Process Documents

The `RecursiveCharacterTextSplitter` splits the documents into smaller, overlapping chunks
so that each piece is small enough to embed and to pass to the LLM as context.



In [11]:
# Create splitter object
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Apply splitter object on the documents
texts = text_splitter.split_documents(documents)

In [12]:
# View info about texts object
print(f"The type of the texts object is: {type(texts)}")
print(f"Total number of text chunks created: {len(texts)}")


The type of the texts object is: <class 'list'>
Total number of text chunks created: 49
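With `chunk_size=1000` and `chunk_overlap=200`, each chunk holds at most 1,000 characters. Neighboring chunks may share up to 200 characters, so text near a boundary keeps some of its surrounding context. `RecursiveCharacterTextSplitter` first tries to split on larger separators (paragraphs, then lines, then words) before falling back to single characters.

The sketch below deliberately skips that separator logic and shows only a simpler fixed-size sliding window. This makes the size/overlap arithmetic explicit: each window starts `chunk_size - chunk_overlap` (here 800) characters after the previous one. The function name `sliding_window_chunks` is hypothetical.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split `text` into fixed-size character windows that overlap by `chunk_overlap`.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter, it ignores
    paragraph, line, and word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For a 2,500-character text, this yields three windows starting at offsets 0, 800, and 1600.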


## Create Vector Database

The vector database is created with `Chroma` and stores the text embeddings.

Passing a directory path to the `persist_directory` argument saves the vector
database to disk.

We use OpenAI embeddings to create the text embeddings.

In [14]:
# Vector database directory
db_path = normpath("./chromadb/")

# Create text embedding object with OpenAI
embedding = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Create the vector database using Chroma and save the embeddings in the vector database
vector_db = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=db_path)
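Conceptually, a vector database stores one embedding vector per chunk, alongside the chunk's text and metadata. Below is a toy in-memory sketch of that storage step. The `toy_embed` function is a hypothetical hashed bag-of-words stand-in for `OpenAIEmbeddings`. Real embeddings come from a trained model and capture meaning, not just shared words. `ToyVectorStore` is likewise hypothetical and is not Chroma's API.

```python
import math
import re
import zlib
from typing import Dict, List


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):
        # crc32 gives the same bucket for a word in every run (unlike Python's hash())
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class ToyVectorStore:
    """Minimal in-memory store: one record (text, metadata, vector) per chunk."""

    def __init__(self, dim: int = 64):
        self.dim = dim
        self.records: List[Dict] = []

    def add_texts(self, texts: List[str], metadatas: List[Dict]) -> None:
        """Embed each text and keep it together with its metadata."""
        for text, meta in zip(texts, metadatas):
            self.records.append({"text": text, "metadata": meta, "vector": toy_embed(text, self.dim)})
```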

## Load Vector Database

Because `vector_db` was created in the previous cell of the same runtime, we can already
access the embeddings. In practice, however, the embeddings are usually created once and
stored in a vector database on disk, so that they can be loaded and reused whenever they
are needed.

To demonstrate this, let's persist `vector_db`, set it to `None`, and load it back from the directory.

In [15]:
# Persist the database to disk
vector_db.persist()

# Clear vector_db from memory
vector_db = None

In [16]:
# Load the persisted database from the disk
vector_db = Chroma(persist_directory=db_path, embedding_function=embedding)
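Persisting writes the stored vectors, texts, and metadata to disk. A later session can then rebuild the store without calling the embedding API again for the documents. The `embedding_function` is still passed on reload so that new queries are embedded the same way as the stored chunks. Below is a minimal sketch of the save/load round trip using a JSON file. Chroma uses its own on-disk format. The file name `store.json` and the helpers `save_records` and `load_records` are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_records(records: List[Dict], db_dir: str) -> Path:
    """Write records (text, metadata, vector) to <db_dir>/store.json."""
    path = Path(db_dir) / "store.json"  # hypothetical file name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records), encoding="utf-8")
    return path


def load_records(db_dir: str) -> List[Dict]:
    """Read the records back; restoring them needs no embedding calls."""
    path = Path(db_dir) / "store.json"
    return json.loads(path.read_text(encoding="utf-8"))
```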

## Create Retriever

By default, the retriever returns the top 4 text chunks most similar to the query. The
number of chunks can be changed through `search_kwargs`, as shown below.

In [17]:
# create a retriever object
retriever = vector_db.as_retriever()

In [27]:
# Retrieve the chunks most relevant to a query
docs = retriever.get_relevant_documents("Who is aladin?")

# View the number of chunks returned
print(f"Number of relevant text chunks: {len(docs)}")

Number of relevant text chunks: 4


In [28]:
# View the returned chunks, ordered by similarity
docs

[Document(page_content='THE ADVENTURES OF ALADDIN', metadata={'source': 'data\\aladin.txt'}),
 Document(page_content='an old lamp, Aladdin wondered. Perhaps he was a wizard. He decided to be on \nhis guard. Picking up the lamp, he retraced his steps up to the entrance.\n   "Give me the lamp," urged the wizard impatiently. "Hand it over," he began\nto shout, thrusting out his arm to grab it, but Aladdin cautiously drew back.\n   "Let me out first . . ."\n   "Too bad for you," snapped the stranger, slamming down the manhole cover, \nnever noticing that, as he did so, a ring slid off his finger. A terrified \nAladdin was left in pitch darkness, wondering what the wizard would do next. \nThen he trod on the ring. Aimlessly putting it on his finger, he twisted it \nround and round. Suddenly the room was flooded with a rosy light and a great \ngenie with clasped hands appeared on a cloud.\n   "At your command, sire," said the genie.\n   Now astoundede, Aladdin could only stammer:\n   "I want

In [37]:
# Return only the top 3 most similar chunks
retriever = vector_db.as_retriever(search_kwargs={"k": 3})

In [38]:
# Retrieve the chunks most relevant to a query
docs = retriever.get_relevant_documents("Who is Pinoccioh?")

# View the number of chunks returned
print(f"Number of relevant text chunks: {len(docs)}")

Number of relevant text chunks: 3


In [39]:
# View the returned chunks, ordered by similarity
docs

[Document(page_content='PPINOCCHIO\n   Once upon a time... a carpenter, picked up a strange lump of wood one day \nwhile mending a table. When he began to chip it, the wood started to moan. \nThis frightened the carpenter and he decided to get rid of it at once, so he \ngave it to a friend called Geppetto, who wanted to make a puppet. Geppetto, a \ncobbler, took his lump of wood home, thinking about the name he would give his\npuppet.\n   "I\'ll call him Pinocchio," he told himself. "It\'s a lucky name." Back in \nhis humble basement home and workshop, Geppetto started to carve the wood. \nSuddenly a voice squealed:\n   "Ooh! That hurt!" Geppeto was astonished to find that the wood was alive. \nExcitedly he carved a head, hair and eyes, which immediately stared right at \nthe cobbler. But the second Geppetto carved out the nose, it grew longer and \nlonger, and no matter how often the cobbler cut it down to size, it just \nstayed a long nose. The newly cut mouth began to chuckle and wh

In [40]:
print(f"The retriever search type is: {retriever.search_type}")
print(f"The retriever search arguments are: {retriever.search_kwargs}")

The retriever search type is: similarity
The retriever search arguments are: {'k': 3}
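With `search_type` set to `similarity`, the retriever embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below shows that ranking step over plain Python lists. It assumes the vectors are already computed and uses cosine similarity. Chroma's own distance metric is configurable, so cosine here is an assumption. The function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float], stored: List[Tuple[str, Sequence[float]]], k: int = 4) -> List[str]:
    """Return the texts of the k stored (text, vector) pairs most similar to query_vec, best first."""
    best = heapq.nlargest(k, stored, key=lambda item: cosine_similarity(query_vec, item[1]))
    return [text for text, _ in best]
```

The default `k=4` mirrors the retriever's default. Passing `k=3` corresponds to `search_kwargs={"k": 3}` above.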


## Create Chain

In [41]:
# Create the chain to answer questions

qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(openai_api_key=OPENAI_API_KEY),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
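`chain_type="stuff"` is the simplest way of combining retrieved chunks. All of them are "stuffed" into a single prompt together with the question, and the LLM is called once. This only works while the `k` retrieved chunks (up to about 1,000 characters each here) plus the question fit in the model's context window. Below is a sketch of how such a prompt could be assembled. The template wording and the name `build_stuff_prompt` are illustrative assumptions, and LangChain's built-in prompt may differ.

```python
from typing import List

# Illustrative template (assumption); not necessarily LangChain's exact prompt.
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n"
    "If you don't know the answer, say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(chunks: List[str], question: str) -> str:
    """Join all retrieved chunks into one context block and insert it with the question."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```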

In [42]:
# Print the answer, then the source file of each retrieved chunk
def process_llm_response(llm_response):
    print(llm_response["result"])
    print("\n\nSources:")
    for source in llm_response["source_documents"]:
        print(source.metadata["source"])
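With `k=3`, all retrieved chunks often come from the same file, so `process_llm_response` prints the same path several times, as in the Pinocchio answers below. Below is a variant that lists each source once, in the order it was first retrieved. The names `unique_sources` and `process_llm_response_unique` are hypothetical. The variant assumes the same response shape used above: `result`, plus `source_documents` whose items have a `metadata["source"]` entry.

```python
def unique_sources(llm_response: dict) -> list:
    """Return each source path once, preserving retrieval order."""
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(doc.metadata["source"] for doc in llm_response["source_documents"]))


def process_llm_response_unique(llm_response: dict) -> None:
    """Print the answer, then each distinct source file on its own line."""
    print(llm_response["result"])
    print("\n\nSources:")
    for source in unique_sources(llm_response):
        print(source)
```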

## Q&A with ChatBot

In [43]:
query = "Who is Pinocchio?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

 Pinocchio is a puppet created by Geppetto that comes to life and learns to be a good boy.


Sources:
data\pinocchio.txt
data\pinocchio.txt
data\pinocchio.txt


In [44]:
query = "What are the different stories we have?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

 The stories mentioned in the context are the story of Beauty and the Beast and the story of Pinocchio. 


Sources:
data\beauty_beast.txt
data\aladin.txt
data\pinocchio.txt


In [45]:
query = "Give me a brief overview of the Pinoccio story"
llm_response = qa_chain(query)
process_llm_response(llm_response)



The story of Pinocchio follows a carpenter named Geppetto who receives a talking piece of wood and decides to make it into a puppet. The puppet, named Pinocchio, comes to life and causes mischief with his long nose. He eventually runs away and gets involved with a puppet show, where he is reunited with his creator and learns valuable life lessons. The story also includes a frightening puppet-master, Giovanni, and a subplot about Pinocchio's search for his missing mother.


Sources:
data\pinocchio.txt
data\pinocchio.txt
data\pinocchio.txt


In [46]:
query = "Give me a brief overview of about the adventures of Aladin"
llm_response = qa_chain(query)
process_llm_response(llm_response)


 The Adventures of Aladdin is a classic story about a young man named Aladdin who comes across a magic lamp containing a powerful genie. With the help of the genie, Aladdin overcomes various obstacles and obtains great wealth and power. However, he must also outsmart a wicked sorcerer who wants the lamp for himself. Along the way, Aladdin falls in love with a princess and must prove himself worthy of her hand in marriage.


Sources:
data\aladin.txt
data\aladin.txt
data\aladin.txt
