# Tutorial

Kai Foerster, Amin Oueslati, Steve Kerr

## Introduction
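Policy motivation: many institutions want a ChatGPT-style assistant that answers from their own domain knowledge. <br>
A retrieval-augmented generation (RAG) chatbot retrieves relevant passages from a document collection and adds them to the prompt before the language model answers. <br>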

### Next steps

- Add HF token in `.env`

## Setup

* Install dependencies
* Configure an API key for Hugging Face

In [1]:
# install dependencies
!pipenv install langchain
!pipenv install sentence_transformers
!pipenv install chromadb
!pipenv install unstructured
!pipenv install chainlit
!pipenv install python-dotenv
!pipenv install bs4
!pipenv install tqdm

Loading .env environment variables...
Installing langchain...
Resolving langchain...
✔ Installation Succeeded
Installing dependencies from Pipfile.lock (1560fd)...
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Loading .env environment variables...
Installing sentence_transformers...
Resolving sentence_transformers...
✔ Installation Succeeded
Installing dependencies from Pipfile.lock (1560fd)...
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Loading .env environment variables...
Installing chromadb

## Building a chatbot (no RAG)

In [2]:
import os
import chainlit as cl
from langchain import HuggingFaceHub, PromptTemplate, LLMChain
from dotenv import load_dotenv

2023-12-12 15:38:12 - Loaded .env file


In [3]:
load_dotenv()
HF_API_TOKEN = os.getenv('HF_API_TOKEN')
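
`os.getenv` returns `None` when the variable is missing instead of raising, so a forgotten token can go unnoticed at this point. A minimal sketch of an explicit check follows. `require_env` is a hypothetical helper that is not part of the original notebook, and `DEMO_TOKEN` is a dummy variable used only so the sketch runs without a real token.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Dummy variable for the demo; in the notebook this would be HF_API_TOKEN.
os.environ["DEMO_TOKEN"] = "dummy"
token = require_env("DEMO_TOKEN")
print(f"Token loaded ({len(token)} characters)")
```

In the notebook the call would come right after `load_dotenv()`, for example `HF_API_TOKEN = require_env("HF_API_TOKEN")`.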

In [4]:
model_id = "tiiuae/falcon-7b-instruct"
conv_model = HuggingFaceHub(
    huggingfacehub_api_token=HF_API_TOKEN,  # loaded from .env in the previous cell
    repo_id=model_id, 
    model_kwargs={"temperature":0.8,"max_length": 1000}
    )



In [5]:
template="""You are a helpful assistant that answers questions of the user.
{human_message}
"""

prompt=PromptTemplate(template=template, input_variables=["human_message"])
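
To see what the model will receive before any chain is built, the placeholder substitution can be sketched with plain `str.format`. This is only a stand-in for illustration, not how `PromptTemplate` is implemented, and the sample question is made up.

```python
template = """You are a helpful assistant that answers questions of the user.
{human_message}
"""

# Fill the single placeholder, as the chain will do at run time.
filled = template.format(human_message="What is a vector database?")
print(filled)
```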

In [6]:
conv_chain = LLMChain(llm=conv_model, prompt=prompt, verbose=True)

In [7]:
print(conv_chain.run("How much does a cappucino cost at Pret a Manger in Berlin Mitte?"))



> Entering new LLMChain chain...
Prompt after formatting:
You are a helpful assistant that answers questions of the user.
How much does a cappucino cost at Pret a Manger in Berlin Mitte?

> Finished chain.
The cost of a cappucino at Pret a Manger in Berlin Mitte varies by size, but generally ranges from 4.95 to 6.95 euros.


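### Hallucinations

The model returns a precise price range even though nothing in the prompt or the chain gives it access to Pret a Manger's prices, so the figures are likely invented.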

In [8]:
print(conv_chain.run("How much does a cappucino cost at Pret a Manger in Berlin Mitte?"))



> Entering new LLMChain chain...
Prompt after formatting:
You are a helpful assistant that answers questions of the user.
How much does a cappucino cost at Pret a Manger in Berlin Mitte?

> Finished chain.
The cost of a cappucino at Pret a Manger in Berlin Mitte varies by size, but generally ranges from 4.95 to 6.95 euros.


### Source knowledge (manual)

In [9]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

In [10]:
template_with_context="""You are a helpful assistant that answers questions of the user, using the context provided below.

Contexts:{source_knowledge}

{human_message}
"""

prompt2=PromptTemplate(template=template_with_context, input_variables=["human_message",  "source_knowledge"])

In [11]:
print(prompt2.format(human_message="What is a LLMChain?", source_knowledge=source_knowledge))

You are a helpful assistant that answers questions of the user, using the context provided below.

Contexts:A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.
Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.
LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with

In [12]:
context_chain = LLMChain(llm=conv_model, prompt=prompt2, verbose=True)

In [13]:
print(context_chain.run({
    'source_knowledge': source_knowledge,
    'human_message': "What is Langchain?"
}))



> Entering new LLMChain chain...
Prompt after formatting:
You are a helpful assistant that answers questions of the user, using the context provided below.

Contexts:A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.
Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.
LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model 

## RAG
### Create a database to store your corpus

In [14]:
# load dependencies
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma
import shutil

In [15]:
# set params
DATA_PATH = "data/html"
CHROMA_PATH = "chroma_db"
EMBED_MODEL = "all-MiniLM-L6-v2" # Chroma defaults to "sentence-transformers/all-MiniLM-L6-v2"
# alternative: "BAAI/bge-small-en-v1.5"

### Load Documents

In [16]:
from bs4 import SoupStrainer
from langchain.document_loaders import BSHTMLLoader

In [17]:
# define Beautiful Soup key word args
bs_kwargs = {
    "features": "html.parser", 
    "parse_only": SoupStrainer("p") # only include relevant text
}

# define Loader key word args
loader_kwargs = {
    "open_encoding": "utf-8",
    "bs_kwargs": bs_kwargs
}

# define Loader (reuses DATA_PATH from the params cell)
loader = DirectoryLoader(
    path=DATA_PATH,
    glob="*.html",
    loader_cls=BSHTMLLoader,
    loader_kwargs=loader_kwargs,
    show_progress=True
    )

# load docs
documents = loader.load()
len(documents)

100%|██████████| 3487/3487 [00:04<00:00, 728.75it/s]


3487
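
`SoupStrainer("p")` restricts parsing to `<p>` elements, so text outside paragraphs (navigation, the page title) never reaches `page_content`. The effect can be sketched with the standard library alone. `ParagraphText` and the sample HTML are made up for illustration and are not what `BSHTMLLoader` runs internally.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # greater than 0 while inside at least one <p>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            self.chunks.append("\n")  # separate paragraphs

    def handle_data(self, data):
        if self._depth > 0:
            self.chunks.append(data)


# Hypothetical sample page, not taken from the FAR corpus.
sample = (
    "<html><head><title>1.101 Purpose.</title></head>"
    "<body><nav>Menu</nav><p>First paragraph.</p>"
    "<p>Second <b>one</b>.</p></body></html>"
)

parser = ParagraphText()
parser.feed(sample)
text = "".join(parser.chunks).strip()
print(text)
```

This also shows why the loaded documents carry an empty `title`: the `<title>` element sits outside every `<p>` and is filtered out with the rest.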

In [18]:
# clean up document content

import re

for doc in documents:
    #text = re.sub(r'\n+', ' ', doc.page_content)
    #text = re.sub(r'\t+', ' ', text)
    #text = text.strip()
    doc.page_content = doc.page_content.replace("\n", " ").replace("\t", " ").strip()
    print(doc.page_content)

Warranties of data shall be developed and used in accordance with agency regulations.
(a) Contractors debarred, suspended, or proposed for debarment are excluded from receiving contracts, and agencies shall not solicit offers from, award contracts to, or consent to subcontracts with these contractors, unless the agency head determines that there is a compelling reason for such action (see 9.405-1(a)(2), 9.405-2, 9.406-1(c), 9.407-1(d), and 23.506(e)). Contractors debarred, suspended, or proposed for debarment are also excluded from conducting business with the Government as agents or representatives of other contractors. (b) Contractors and other entities that have an active exclusion record in SAM because they have been declared ineligible on the basis of statutory or other regulatory procedures are excluded from receiving contracts, and if applicable, subcontracts, under the conditions and for the period set forth in the statute or regulation. Agencies shall not solicit offers from, 
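
The chained `replace` calls swap each newline or tab for one space, so a run of whitespace survives as a run of spaces. The commented-out `re.sub` lines point at a regex alternative. A minimal sketch of that idea follows, using a single `\s+` pattern (an assumption, slightly broader than the patterns in the comments) and a made-up sample string.

```python
import re

raw = "\n\t(a) Contractors debarred,\n\n\tsuspended, or proposed\tfor debarment\n"

# Approach used in the cell above: one space per newline or tab.
replaced = raw.replace("\n", " ").replace("\t", " ").strip()

# Regex variant: collapse any run of whitespace into a single space.
collapsed = re.sub(r"\s+", " ", raw).strip()

print(repr(replaced))
print(repr(collapsed))
```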

In [19]:
# inspect first doc
documents[0]

Document(page_content='Warranties of data shall be developed and used in accordance with agency regulations.', metadata={'source': 'data/html/46.708.html', 'title': ''})

### Label Metadata

In [20]:
# # add source label
# for doc in documents:
#     doc_source = re.search("\d{1,2}[.]\d+(\-\d)*", doc.metadata["source"]).group() 
#     doc.metadata["source"] = " ".join(["FAR", doc_source])
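
The commented-out cell above derives the label from the file name rather than from the `<title>` tag. A minimal sketch of that regex step in isolation follows. `far_label` is a hypothetical helper, and the second path is an assumed file name used only to exercise the optional `-N` suffix.

```python
import re

# Same pattern as the commented-out cell, written as a raw string.
SECTION_RE = re.compile(r"\d{1,2}[.]\d+(-\d)*")


def far_label(path: str) -> str:
    """Build a 'FAR <section>' label from a file path such as data/html/46.708.html."""
    match = SECTION_RE.search(path)
    if match is None:
        raise ValueError(f"no section number found in {path!r}")
    return f"FAR {match.group()}"


print(far_label("data/html/46.708.html"))
print(far_label("data/html/9.405-1.html"))  # assumed file name
```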

In [21]:
# define Beautiful Soup key word args
bs_kwargs = {
    "features": "html.parser", 
    "parse_only": SoupStrainer("title") # only include relevant text
}

# define Loader key word args
loader_kwargs = {
    "open_encoding": "utf-8",
    "bs_kwargs": bs_kwargs
}

loader = DirectoryLoader(
    path=DATA_PATH,
    glob="*.html",
    loader_cls=BSHTMLLoader,
    loader_kwargs=loader_kwargs,
    show_progress=True
    )

document_titles = loader.load()

# collect the <title> text of every file
title_list = [doc.metadata["title"] for doc in document_titles]

# add title label; relies on both loads returning the files in the same order
for doc, title in zip(documents, title_list):
    doc.metadata["source"] = " ".join(["FAR", title])

100%|██████████| 3487/3487 [00:02<00:00, 1251.03it/s]


In [22]:
# # add FAR part label
# import re 

# for doc in docs:
#     doc_part = re.search('^(\d{1,2})', doc.metadata['source']).group()
#     doc.metadata["part"] = " ".join(["FAR Part", doc_part])
    
#     print(doc.metadata["part"])

In [23]:
# inspect metadata 
doc_metadata = [doc.metadata for doc in documents]
doc_metadata[0:5]

[{'source': 'FAR 46.708 Warranties of data.', 'title': ''},
 {'source': 'FAR 9.405 Effect of listing.', 'title': ''},
 {'source': 'FAR 11.106 Purchase descriptions for service contracts.',
  'title': ''},
 {'source': 'FAR 16.204 Fixed-price incentive contracts.', 'title': ''},
 {'source': 'FAR 7.201 [Reserved]', 'title': ''}]

### Embed Documents & Upload to Vector Database

In [24]:
# define text embedding model
embedding_func = SentenceTransformerEmbeddings(model_name=EMBED_MODEL)

# See https://huggingface.co/spaces/mteb/leaderboard

2023-12-12 15:38:25 - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2023-12-12 15:38:25 - Use pytorch device: cpu


In [25]:
# first, clear out current db
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

# initialize Chroma db and save locally
db = Chroma.from_documents(
    documents=documents, embedding=embedding_func, persist_directory=CHROMA_PATH
    )

db.persist()

# print message
print(f"Saved {len(documents)} chunks to {CHROMA_PATH}.")

2023-12-12 15:38:25 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


Batches: 100%|██████████| 109/109 [01:42<00:00,  1.06it/s]


Saved 3487 chunks to chroma_db.


### Query Vector Database

In [26]:
# query vector db
query = "What is the purpose of the Federal Acquisition Regulations?"
matching_docs = db.similarity_search_with_relevance_scores(
    query=query, 
    k=4, # number of docs to return
    score_threshold=.5,
    #filter=[{"":""}]
    )

matching_docs

Batches: 100%|██████████| 1/1 [00:00<00:00, 66.43it/s]


[(Document(page_content='The Federal Acquisition Regulations System is established for the codification and publication of uniform policies and procedures for acquisition by all executive agencies. The Federal Acquisition Regulations System consists of the Federal Acquisition Regulation (FAR), which is the primary document, and agency acquisition regulations that implement or supplement the FAR. The FAR System does not include internal agency guidance of the type described in 1.301(a)(2).', metadata={'source': 'FAR 1.101 Purpose.', 'title': ''}),
  0.677825443885269),
 (Document(page_content='This part sets forth basic policies and general information about the Federal Acquisition Regulations System including purpose, authority, applicability, issuance, arrangement, numbering, dissemination, implementation, supplementation, maintenance, administration, and deviation. subparts\xa0 1.2,1.3, and 1.4 prescribe administrative procedures for maintaining the FAR System.', metadata={'source': 
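
Conceptually, the query above embeds the question, scores it against every stored vector, and keeps the best `k` results above `score_threshold`. The sketch below shows that idea with plain cosine similarity over toy vectors. It is an illustration only: the vectors are invented, and Chroma's index and its relevance-score normalisation are not reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, store, k=4, score_threshold=0.5):
    """Return up to k (doc_id, score) pairs at or above the threshold, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in store.items()]
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; a real embedding model uses hundreds of dimensions.
store = {
    "FAR 1.101": [0.9, 0.1, 0.0],
    "FAR 46.708": [0.1, 0.9, 0.2],
    "FAR 9.405": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]

for doc_id, score in top_k(query_vec, store):
    print(doc_id, round(score, 3))
```

With these toy numbers only one document clears the 0.5 threshold, which mirrors how a strict `score_threshold` can return fewer than `k` results.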

### Query data from your database based on your prompt

In [29]:
# adapted version
PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""

def RAG(query_text):
    # Search the DB for the passages most similar to the question.
    results = db.similarity_search_with_relevance_scores(query_text, k=5, score_threshold=.5)
    # Stop if nothing was retrieved or the best match is weak.
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")
        return

    # Join the retrieved passages and cap the context at 1000 characters.
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    if len(context_text) > 1000:
        context_text = context_text[:1000]
        print("Warning: Context exceeded 1000 characters, trimming from the end.")

    prompt_template = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["context", "question"])
    #prompt = prompt_template.format(context=context_text, question=query_text)
    #print(prompt)

    # Run the chain on the retrieved context and the user's question.
    chain = LLMChain(llm=conv_model, prompt=prompt_template, verbose=True)
    response_text = chain.run({"context": context_text, "question": query_text})

    # Report the answer together with the sources it was based on.
    sources = [doc.metadata.get("source", None) for doc, _score in results]
    formatted_response = f"Response: {response_text}\nSources: {sources}"
    print(formatted_response)
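
Slicing with `context_text[:1000]` can cut the last retrieved passage mid-sentence. One alternative keeps whole passages in ranked order until the character budget is used up, assuming that dropping whole trailing passages is acceptable. `build_context` is a hypothetical helper, not part of the notebook, and the sample passages are placeholders.

```python
def build_context(passages, max_chars=1000, sep="\n\n---\n\n"):
    """Join ranked passages, stopping before the first one that would exceed max_chars."""
    kept = []
    used = 0
    for passage in passages:
        # The separator only counts once there is something to separate.
        extra = len(passage) + (len(sep) if kept else 0)
        if used + extra > max_chars:
            break
        kept.append(passage)
        used += extra
    return sep.join(kept)


# Placeholder passages of 400 characters each.
passages = ["A" * 400, "B" * 400, "C" * 400]
context = build_context(passages)
print(len(context))
```

If even the top passage exceeds the budget the result is an empty string, so a caller may want to fall back to slicing in that case.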

### Pass the augmented prompt to the chat model

In [30]:
RAG("What does the Federal Acquisition Regulations define?")

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches: 100%|██████████| 1/1 [00:00<00:00,  8.25it/s]




> Entering new LLMChain chain...
Prompt after formatting:
Answer the question based only on the following context:

The Federal Acquisition Regulations System is established for the codification and publication of uniform policies and procedures for acquisition by all executive agencies. The Federal Acquisition Regulations System consists of the Federal Acquisition Regulation (FAR), which is the primary document, and agency acquisition regulations that implement or supplement the FAR. The FAR System does not include internal agency guidance of the type described in 1.301(a)(2).

---

This part sets forth basic policies and general information about the Federal Acquisition Regulations System including purpose, authority, applicability, issuance, arrangement, numbering, dissemination, implementation, supplementation, maintenance, administration, and deviation. subparts  1.2,1.3, and 1.4 prescribe administrative procedures for maintaining the FAR System.

---

Agen

### Human evaluation of RAG model
TODO: decide whether to add other evaluation methods here.

In [None]:
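# NOTE: `chat` and `messages` (and `augment_prompt` in the next cell) are not
# defined anywhere in this notebook; these cells will not run until they are.
from langchain.schema import HumanMessage

prompt = HumanMessage(
    content="what safety measures were used in the development of llama 2?"
)

res = chat(messages + [prompt])
print(res.content)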

In [None]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

## References

https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb <br>
https://github.com/pixegami/langchain-rag-tutorial/tree/main <br>
https://www.youtube.com/watch?v=LhnCsygAvzY <br>
https://www.youtube.com/watch?v=tcqEUSNCn8I <br>
https://towardsdatascience.com/a-3-step-approach-to-evaluate-a-retrieval-augmented-generation-rag-5acf2aba86de <br>
https://github.com/curiousily/Get-Things-Done-with-Prompt-Engineering-and-LangChain <br>
https://www.mlexpert.io/prompt-engineering/chatbot-with-local-llm-using-langchain <br>
https://www.youtube.com/watch?v=N7dGOUwufBM <br>
https://www.youtube.com/watch?v=ypzmPwLH_Q4&list=PLIUOU7oqGTLjAwPzyCu6m0wxLOlhJg8N5&index=2<br>
https://www.youtube.com/watch?v=qMIM7dECAkc <br>
https://www.youtube.com/watch?v=ukj_ITJKBwE&list=PLIUOU7oqGTLjAwPzyCu6m0wxLOlhJg8N5&index=6
