# Tutorial

Kai Foerster, Amin Oueslati, Steve Kerr

## Introduction
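Policy motivation: many institutions want to use something like ChatGPT, but grounded in their own domain knowledge. <br>
A retrieval-augmented generation (RAG) chatbot does this by retrieving passages relevant to the user's question from a document store and adding them to the prompt, so that the language model can answer from that context. <br>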

### Next steps

- Add the HF token to `.env`
- Look for a better model that has 1000 characters size
- Make it keep the memory of old conversations
- Create a backstop so that it does not generate its own questions

## Setup

* Install dependencies
* Configure an API key for Hugging Face

In [1]:
# install dependencies
#!pipenv install langchain
#!pipenv install sentence_transformers
#!pipenv install chromadb
#!pipenv install unstructured
#!pipenv install chainlit


## Building a chatbot (no RAG)

In [25]:
import os
import chainlit as cl
from langchain import HuggingFaceHub, PromptTemplate, LLMChain

In [26]:
# Read the Hugging Face token from the environment (for example, from .env) instead of hardcoding it.
HF_API_TOKEN = os.environ["HF_API_TOKEN"]
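
Tokens are best kept out of the notebook itself. The following is a minimal sketch of reading the token from the environment and failing early when it is missing. It assumes the variable name `HF_API_TOKEN` used in this tutorial; the helper `get_hf_token` is hypothetical and not part of any library.

```python
import os


def get_hf_token(env=None, var_name="HF_API_TOKEN"):
    """Return the token stored under var_name, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; add it to your .env file or export it in the shell."
        )
    return token
```

Calling `get_hf_token()` once at the top of the notebook would surface a missing token before any model call is made.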

In [27]:
model_id = "gpt2-medium"
conv_model = HuggingFaceHub(
    huggingfacehub_api_token=os.environ['HF_API_TOKEN'], 
    repo_id=model_id, 
    model_kwargs={"temperature":0.8,"max_length": 500}
    )



In [28]:
template="""You are a helpful assistant that answers questions of the user.
{human_message}
"""

prompt=PromptTemplate(template=template, input_variables=["human_message"])

In [29]:
conv_chain = LLMChain(llm=conv_model, prompt=prompt, verbose=True)

In [30]:
#res=conv_chain.run("what is string theory?")
#print(res)
print(conv_chain.run("what is string theory?"))



> Entering new LLMChain chain...
Prompt after formatting:
You are a helpful assistant that answers questions of the user.
what is string theory?

> Finished chain.
string theory is a branch of mathematics that attempts to describe the structure of the universe in terms of some set of set theory axioms that are consistent with the mathematical reality of nature. string theory has many applications in physics, and is also used to describe the structure of the universe.
string theory is not without its problems. One of the most important problems in string theory is that of information. This is the concept of "information" which is an idea with which most people are unfamiliar, and which has been used to describe information about an entity. When we use the word "information," we often mean simply the set of possible values that can exist in the universe. However, in string theory, we don't actually mean the set of possible values that can exist in the 

### Appending last response to follow up question

In [31]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory, ConversationBufferMemory

In [32]:
memory = ConversationBufferMemory(memory_key="history")

In [33]:
user_message = "what is string theory?"
while user_message != "bye":
    memory.chat_memory.add_user_message(user_message)
    res = conv_chain.run(user_message)
    print("AI: ", res)
    memory.chat_memory.add_ai_message(res)
    user_message = input("Enter a message or 'bye' to exit!")



> Entering new LLMChain chain...
Prompt after formatting:
You are a helpful assistant that answers questions of the user.
what is string theory?

> Finished chain.
AI:  string theory is a branch of mathematics that attempts to describe the structure of the universe in terms of some set of set theory axioms that are consistent with the mathematical reality of nature. string theory has many applications in physics, and is also used to describe the structure of the universe.
string theory is not without its problems. One of the most important problems in string theory is that of information. This is the concept of "information" which is an idea with which most people are unfamiliar, and which has been used to describe information about an entity. When we use the word "information," we often mean simply the set of possible values that can exist in the universe. However, in string theory, we don't actually mean the set of possible values that can exist in
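
Note that the loop above records each turn in `memory`, but the prompt only contains `{human_message}`, so the model never sees earlier turns. One way to make follow-up questions work is to render the stored turns into the prompt text. The sketch below does this with plain Python strings. The helper `build_prompt_with_history`, the `Human:`/`AI:` labels, and the placeholder reply are illustrative assumptions, not LangChain API.

```python
SYSTEM_LINE = "You are a helpful assistant that answers questions of the user."


def build_prompt_with_history(history, human_message, max_turns=3):
    """Render the last `max_turns` (user, ai) pairs followed by the new message."""
    # history[-0:] would return the whole list, so handle max_turns == 0 explicitly
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = [SYSTEM_LINE]
    for user_text, ai_text in recent:
        lines.append(f"Human: {user_text}")
        lines.append(f"AI: {ai_text}")
    lines.append(f"Human: {human_message}")
    lines.append("AI:")
    return "\n".join(lines)


# Placeholder history; in the loop above these would be the recorded turns
history = [("what is string theory?", "(previous model reply)")]
print(build_prompt_with_history(history, "who works on it?"))
```

Capping the number of turns keeps the prompt short, which is likely to matter for small models with a limited context window.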

### Hallucinations

In [15]:
print(conv_chain.run("What is so special about Llama 2?"))



> Entering new LLMChain chain...
Prompt after formatting:
You are a helpful assistant that answers questions of the user.
What is so special about Llama 2?

> Finished chain.
Llama 2 is a complete recommender system (RS). This means that it has a front end, which allows the user to ask questions and then answers the questions based on the learned model. This allows the user to get recommendations from the system.
Also, the backend is based on Tensorflow, which allows the system to learn a model that is able to classify the questions of the user.
How does it work?
The user is able to ask a question and sketch an image. To do this, as you can see in the picture below, a form is presented to the user. The user can then submit the question. After the question is submitted, the front end will check the question and use the API to classify it. If the answer is 1, the question is of picture type, otherwise it is of text type.

The user is then able to add a

### Source knowledge (manual)

In [34]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

In [35]:
source_knowledge

'A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.\nChains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.\nLangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those

In [36]:
template_with_context="""You are a helpful assistant that answers questions of the user, using the context provided below.

Contexts:{source_knowledge}

{human_message}
"""

prompt2=PromptTemplate(template=template_with_context, input_variables=["human_message",  "source_knowledge"])

In [37]:
print(prompt2.format(human_message="What is a LLMChain?", source_knowledge=source_knowledge))

You are a helpful assistant that answers questions of the user, using the context provided below.

Contexts:A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.
Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.
LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with

In [38]:
context_chain = LLMChain(llm=conv_model, prompt=prompt2, verbose=True)

In [45]:
print(context_chain.run({
    'source_knowledge': source_knowledge,
    'human_message': "What is LLMChain?"
}))



> Entering new LLMChain chain...
Prompt after formatting:
You are a helpful assistant that answers questions of the user, using the context provided below.

Contexts:A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.
Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.
LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model 

## RAG 
### Create database to store your corpus on

In [43]:
# load dependencies
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma
import shutil

In [44]:
# set params
DATA_PATH = "data/html"
CHROMA_PATH = "chroma_db"
EMBED_MODEL = "all-MiniLM-L6-v2" # Chroma defaults to "sentence-transformers/all-MiniLM-L6-v2"
# alternative: "BAAI/bge-small-en-v1.5"

### Load Documents

In [46]:
# load docs
def load_docs(directory):
  loader = DirectoryLoader(directory)
  documents = loader.load()
  return documents

documents = load_docs(DATA_PATH)
len(documents)

2023-12-11 13:23:06 - Reading document from string ...
2023-12-11 13:23:06 - Reading document ...

3487

In [47]:
documents[0]

Document(page_content='1.000 Scope of part.\n\nThis part sets forth basic policies and general information about the Federal Acquisition Regulations System including purpose, authority, applicability, issuance, arrangement, numbering, dissemination, implementation, supplementation, maintenance, administration, and deviation. subparts\xa0 1.2,1.3, and 1.4 prescribe administrative procedures for maintaining the FAR System.\n\nPart 1 - Federal Acquisition Regulations System', metadata={'source': 'data\\html\\1.000.html'})

### Embed Documents & Upload to Vector Database

In [48]:
# define text embedding model
embedding_func = SentenceTransformerEmbeddings(model_name=EMBED_MODEL)

# See https://huggingface.co/spaces/mteb/leaderboard

2023-12-11 13:32:31 - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2023-12-11 13:32:32 - Use pytorch device: cpu


In [49]:
# first, clear out current db
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

# initialize Chroma db and save locally
db = Chroma.from_documents(
    documents=documents, embedding=embedding_func, persist_directory=CHROMA_PATH
    )

db.persist()

# print message
print(f"Saved {len(documents)} chunks to {CHROMA_PATH}.")

2023-12-11 13:33:11 - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.


Batches: 100%|██████████| 6/6 [00:21<00:00,  3.62s/it]

Saved 3487 chunks to chroma_db.





### Query Vector Database

In [50]:
# query vector db
query = "What is the purpose of the Federal Acquisition Regulations?"
matching_docs = db.similarity_search_with_relevance_scores(
    query=query, 
    k=4, # number of docs to return
    #score_threshold=.5,
    #filter=[{"":""}]
    )

matching_docs

Batches: 100%|██████████| 1/1 [00:00<00:00, 19.61it/s]


[(Document(page_content='1.101 Purpose.\n\nThe Federal Acquisition Regulations System is established for the codification and publication of uniform policies and procedures for acquisition by all executive agencies. The Federal Acquisition Regulations System consists of the Federal Acquisition Regulation (FAR), which is the primary document, and agency acquisition regulations that implement or supplement the FAR. The FAR System does not include internal agency guidance of the type described in 1.301(a)(2).\n\nSubpart 1.1 - Purpose, Authority, Issuance', metadata={'source': 'data\\html\\1.101.html'}),
  0.754029959492634),
 (Document(page_content='1.000 Scope of part.\n\nThis part sets forth basic policies and general information about the Federal Acquisition Regulations System including purpose, authority, applicability, issuance, arrangement, numbering, dissemination, implementation, supplementation, maintenance, administration, and deviation. subparts\xa0 1.2,1.3, and 1.4 prescribe a
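
Conceptually, the similarity search embeds the query and ranks the stored documents by how close their vectors are to the query vector. The toy sketch below shows that ranking with cosine similarity on hand-made 3-dimensional vectors. The vectors and the `top_k` helper are made up for illustration, and Chroma's actual distance metric and score normalization may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs with the highest similarity to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real sentence embeddings
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], doc_vecs, k=2))
```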

### Query data from your database based on your prompt

In [51]:
### adapted version 
PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""

def RAG(query_text):
    # Search the DB for the two most relevant documents.
    results = db.similarity_search_with_relevance_scores(query_text, k=2)
    # Stop if nothing was found or the best match is below the relevance threshold.
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
        return

    # Join the retrieved passages into one context string.
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["context", "question"])

    # Run the chain on the retrieved context and the caller's question.
    chain = LLMChain(llm=conv_model, prompt=prompt_template, verbose=True)
    response_text = chain.run({"context": context_text, "question": query_text})

    # Report the answer together with the files it was drawn from.
    sources = [doc.metadata.get("source", None) for doc, _score in results]
    formatted_response = f"Response: {response_text}\nSources: {sources}"
    print(formatted_response)
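
The function above mixes retrieval, thresholding, and printing, which makes the selection logic hard to check in isolation. As a rough sketch, the thresholding and context assembly can be pulled into a pure function that returns its result. The helper `build_context` and the `(text, source, score)` tuple shape are assumptions for illustration; LangChain itself returns `(Document, score)` pairs.

```python
def build_context(results, min_score=0.7, separator="\n\n---\n\n"):
    """Turn (text, source, score) triples, sorted best first, into (context, sources).

    Returns None when there are no results or the best score is below min_score,
    mirroring the check on the first result in RAG().
    """
    if not results or results[0][2] < min_score:
        return None
    context = separator.join(text for text, _source, _score in results)
    sources = [source for _text, source, _score in results]
    return context, sources


# Toy results standing in for the output of a similarity search
toy_results = [("first passage", "a.html", 0.8), ("second passage", "b.html", 0.6)]
print(build_context(toy_results))
```

As in the original, only the top result is compared with the threshold, so weaker matches after it are still included in the context.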

### Pass the augmented prompt to the chat model

In [52]:
RAG("What is the purpose of the Federal Acquisition Regulations?")

Batches: 100%|██████████| 1/1 [00:00<00:00, 18.54it/s]






> Entering new LLMChain chain...
Prompt after formatting:
Answer the question based only on the following context:

1.101 Purpose.

The Federal Acquisition Regulations System is established for the codification and publication of uniform policies and procedures for acquisition by all executive agencies. The Federal Acquisition Regulations System consists of the Federal Acquisition Regulation (FAR), which is the primary document, and agency acquisition regulations that implement or supplement the FAR. The FAR System does not include internal agency guidance of the type described in 1.301(a)(2).

Subpart 1.1 - Purpose, Authority, Issuance

---

1.000 Scope of part.

This part sets forth basic policies and general information about the Federal Acquisition Regulations System including purpose, authority, applicability, issuance, arrangement, numbering, dissemination, implementation, supplementation, maintenance, administration, and deviation. subparts  1.2,1.3, and 

### Human evaluation of RAG model
TODO: decide whether to add other evaluation methods here.

In [None]:
# Baseline: ask the plain chatbot, without retrieved context
question = "what safety measures were used in the development of llama 2?"
print(conv_chain.run(question))

In [None]:
# Same question through the RAG() function defined earlier
RAG("what safety measures were used in the development of llama 2?")
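
Beyond reading answers by hand, one simple automated check is whether the retriever returns the expected source for a set of labeled questions. The sketch below computes such a hit rate. The function `retrieval_hit_rate`, the stub retriever, and the question and source labels are all hypothetical; a real run would wrap the vector database query and use hand-labeled pairs from the corpus.

```python
def retrieval_hit_rate(retrieve, labeled_questions, k=2):
    """Fraction of questions whose expected source appears in the top-k retrieved sources.

    `retrieve` is any callable mapping a question to a list of sources, best match first.
    """
    if not labeled_questions:
        return 0.0
    hits = 0
    for question, expected_source in labeled_questions:
        if expected_source in retrieve(question)[:k]:
            hits += 1
    return hits / len(labeled_questions)


# Stub retriever with made-up questions and sources, for illustration only
fake_index = {
    "question one": ["a.html", "b.html"],
    "question two": ["c.html", "d.html"],
}
labeled = [("question one", "b.html"), ("question two", "x.html")]
print(retrieval_hit_rate(lambda q: fake_index.get(q, []), labeled, k=2))
```

This measures retrieval only; it says nothing about whether the generated answer uses the retrieved context correctly.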

## References

https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb <br>
https://github.com/pixegami/langchain-rag-tutorial/tree/main <br>
https://www.youtube.com/watch?v=LhnCsygAvzY <br>
https://www.youtube.com/watch?v=tcqEUSNCn8I
