# Introduction
 
An LLM in isolation knows only what it was trained on, which does not include:
- personal data
- proprietary documents not on the internet
- data/articles written after the LLM was trained

LangChain Overview:

<center>
  <img src="images/overview.png"/>
</center> 

LangChain Components:

<center>
  <img src="images/components.png"/>
</center> 

Topics covered:
1. Document Loaders - load data from a variety of sources
2. Document Splitters - split documents into semantically meaningful chunks
3. Semantic Search - basic method of finding relevant information
4. Retrieval - retrieve documents to answer questions
5. Memory - to create a fully functional chatbot

In retrieval augmented generation (RAG), an LLM retrieves contextual documents from an external dataset as part of its execution. This is useful if we want to ask questions about specific documents (e.g., our PDFs, a set of videos, etc.).

<center>
  <img src="images/RAG.png"/>
</center> 

---

# Document Loaders

Document loaders handle the specifics of accessing data and converting it into a standardized format so that we can chat with it. Over 80 different types of document loaders are available.
- Accessing: websites, databases, YouTube, wiki, arXiv, EPUB, etc.
- Data Types: pdf, html, json, word, ppt, markdown, toml, email, etc.

Document loaders return a list of document objects standardized in a format consisting of content and the associated metadata.

Document loaders are also available for structured data. They can be used if there are any unstructured text-based columns within the structured data.
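
To make the "content plus metadata" shape concrete, here is a minimal plain-Python sketch. `SimpleDocument` and `load_pages` are hypothetical names used only for illustration; they are not LangChain classes, and a real loader does far more work to extract the text.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loader's output: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_pages(raw_pages, source):
    """Wrap each raw page string in a document that records where it came from."""
    return [
        SimpleDocument(page_content=text, metadata={"source": source, "page": i})
        for i, text in enumerate(raw_pages)
    ]


# Made-up page strings, used only to show the structure.
pages_demo = load_pages(["first page text", "second page text"], source="example.pdf")
print(len(pages_demo))
print(pages_demo[1].metadata)
```

Whatever the source, downstream steps only rely on these two fields, which is why loaders for very different formats can be swapped freely.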

<center>
  <img src="images/document_loaders.png"/>
</center> 

In [None]:
#! pip install langchain
#! pip install pypdf 

In [None]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

#### PDF

Let's load a PDF [transcript](https://see.stanford.edu/materials/aimlcs229/transcripts/MachineLearning-Lecture01.pdf) from Andrew Ng's famous CS229 course! These documents are the result of automated transcription so words and sentences are sometimes split unexpectedly.

In [None]:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf")
pages = loader.load()
len(pages)

NOTE: Each page is a document. A document contains page content and metadata

In [None]:
page = pages[0]
print(page.page_content[0:500])
print()
print(page.metadata)

#### YouTube

Load documents from YouTube to ask questions about videos

In [None]:
# ! pip install yt_dlp
# ! pip install pydub

In [None]:
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser #openai-whisper model for speech-to-text
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

In [None]:
url="https://www.youtube.com/watch?v=jGwO_UgTS7I"
save_dir="docs/youtube/"
loader = GenericLoader(
    YoutubeAudioLoader([url],save_dir),
    OpenAIWhisperParser()
)
docs = loader.load()
print(docs[0].page_content[0:100])

#### Internet URL

We import the web-based loader

In [None]:
from langchain.document_loaders import WebBaseLoader

In [None]:
loader = WebBaseLoader("https://github.com/basecamp/handbook/blob/master/37signals-is-you.md")
docs = loader.load()
print(docs[0].page_content[:100])
print()
print(docs[0].metadata)

#### Notion

Follow steps [here](https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/notion) for an example Notion site such as [this one](https://yolospace.notion.site/Blendle-s-Employee-Handbook-e31bff7da17346ee99f531087d8b133f):

* Duplicate the page into your own Notion space and export as `Markdown / CSV`.
* Unzip it and save it as a folder that contains the markdown file for the Notion page.
 

In [None]:
from langchain.document_loaders import NotionDirectoryLoader
loader = NotionDirectoryLoader("docs/Notion_DB")
docs = loader.load()

In [None]:
print(docs[0].page_content[0:200])

In [None]:
docs[0].metadata

In the next section, we look at Document Splitters to split the loaded documents. This is important so that the LLM receives only the content relevant to answering the question. This could be a paragraph or a few sentences that are most topical to what is being talked about.

---

## Document Splitters

<center>
  <img src="images/splitting.png"/>
</center> 

Large docs need to be split into smaller chunks. Chunking aims to keep text with common context together. During RAG, it will then be possible to retrieve pieces of content that are most relevant, instead of selecting the whole loaded document. After splitting the data, it goes into a vector store.

Document chunking is tricky because a chunk boundary can cut a sentence in half, and individual phrases may not mean much on their own. The goal is to keep semantically related text together in the same chunk.

The basics of all splitting involve 2 major parameters:
1. chunk size - size of the chunk (characters or tokens)
2. chunk overlap - overlap between consecutive chunks, like a sliding window, which carries some context across chunk boundaries and helps keep semantically related text together
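
The two parameters can be sketched with a plain sliding window over characters. This is only an illustration of size and overlap, assuming character-based lengths; `sliding_window_chunks` is a hypothetical helper, and LangChain's splitters work differently (they split on separators first and then merge pieces).

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character chunks that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


demo_chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(demo_chunks)
```

Each chunk starts with the last four characters of the previous one, which is the "sliding window" effect described above.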

The text splitters in langchain all have the following 2 methods:
1. create_documents() - create documents for a list of texts
2. split_documents() - splits documents

<center>
  <img src="images/example_splitter.png"/>
</center> 

Types of splitters in `langchain.text_splitter`:
* `CharacterTextSplitter()` - Implementation of splitting text that looks at characters
* `MarkdownHeaderTextSplitter()` - Implementation of splitting markdown files based on specific headers
* `TokenTextSplitter()` - Implementation of splitting text that looks at tokens
* `SentenceTransformersTokenTextSplitter()` - Implementation of splitting text that looks at tokens
* `RecursiveCharacterTextSplitter()` - Recursively tries to split by different characters
* `Language()` - for CPP, Python, Ruby, Markdown, etc.
* `NLTKTextSplitter()` - Implementation of splitting text using NLTK
* `SpacyTextSplitter()` - Splitting text based on Spacy implementation
* Others

The above splitters vary on the following:
- how the chunks are split
- how the length of the chunks is measured
- some use other, smaller models to determine sentence boundaries
- adding new pieces of metadata where relevant

The right way to split depends on the type of document. The code below gives examples of the following text splitters:
- RecursiveCharacterTextSplitter
- CharacterTextSplitter
- TokenTextSplitter
- MarkdownHeaderTextSplitter

In [None]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter, TokenTextSplitter, MarkdownHeaderTextSplitter

In [None]:
# check capability
chunk_size = 26
chunk_overlap = 4

# init splitters
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
c_splitter = CharacterTextSplitter(  # splits on a single separator; the default is a double newline ("\n\n")
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)

In [None]:
text1 = 'abcdefghijklmnopqrstuvwxyz'
print("Recursive:", r_splitter.split_text(text1))
print()
print("Character:", c_splitter.split_text(text1))

In [None]:
text2 = 'abcdefghijklmnopqrstuvwxyzabcdefg'
print("Recursive:", r_splitter.split_text(text2))
print()
print("Character:", c_splitter.split_text(text2))

In [None]:
text3 = "a b c d e f g h i j k l m n o p q r s t u v w x y z"
print("Recursive:", r_splitter.split_text(text3))
print()
print("Character:", c_splitter.split_text(text3))

In [None]:
c_splitter = CharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    separator = ' ' # change the separator to a single space
)
c_splitter.split_text(text3)

In [None]:
# new init
c_splitter = CharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0,
    separator = ' '
)
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0, 
    separators=["\n\n", "\n", " ", ""] # order matters in the list, as split will first happen based on \n\n (double new lines), and then \n, and so on
)

NOTE: RecursiveCharacterTextSplitter is recommended for generic text.
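
The idea behind recursive splitting can be sketched in plain Python: try the first separator, greedily merge pieces while they fit, and fall back to the next separator only for pieces that are still too long. `recursive_split` is a hypothetical, simplified helper (no overlap, and the separators are dropped at chunk boundaries); it is not LangChain's implementation.

```python
def recursive_split(text, chunk_size, separators):
    """Split on the first separator, recursing with later separators on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Nothing left to split on: fall back to hard cuts of chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: try the finer separators on it.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


demo_text = "one two three\n\nfour five six seven"
demo_split = recursive_split(demo_text, chunk_size=13, separators=["\n\n", " ", ""])
print(demo_split)
```

The paragraph break is tried first, so the first paragraph stays whole; only the second, longer paragraph is broken further on spaces.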

In [None]:
some_text = """When writing documents, writers will use document structure to group content. \
This can convey to the reader, which idea's are related. For example, closely related ideas \
are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. \n\n  \
Paragraphs are often delimited with a carriage return or two carriage returns. \
Carriage returns are the "backslash n" you see embedded in this string. \
Sentences have a period at the end, but also, have a space.\
and words are separated by space."""

print("Document Length:", len(some_text))
print()
print("Recursive:", r_splitter.split_text(some_text))
print()
print("Character:", c_splitter.split_text(some_text))

In [None]:
# smaller chunks with a period separator
c_splitter = CharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0,
    separator = ' '
)
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=0,
    separators=["\n\n", "\n", r"\. ", " ", ""]
)
print("Recursive:", r_splitter.split_text(some_text))
print()
print("Character:", c_splitter.split_text(some_text))

In [None]:
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=0,
    separators=["\n\n", "\n", r"(?<=\. )", " ", ""] # possible to specify a regex (lookbehind) to fix the period issue
)
r_splitter.split_text(some_text)

Apply the above methods to a PDF:

In [None]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf")
pages = loader.load()
len(pages)

In [None]:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=150,
    length_function=len # split based on the length of the characters
)

In [None]:
docs = text_splitter.split_documents(pages)
len(docs)

In [None]:
from langchain.document_loaders import NotionDirectoryLoader
loader = NotionDirectoryLoader("docs/Notion_DB")
notion_db = loader.load()

docs = text_splitter.split_documents(notion_db)

print(len(notion_db))
print(len(docs))

#### TokenTextSplitter

We can also split on token count explicitly, if we want.

This can be useful because LLMs often have context windows designated in tokens.

Tokens are often ~4 characters.
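
As a back-of-the-envelope sketch of that rule of thumb, the helpers below convert between characters and tokens. Both function names are hypothetical, and the 4-characters-per-token ratio is only an approximation; a real tokenizer will give different counts for most text.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token count from the ~4 characters-per-token rule of thumb."""
    return -(-len(text) // chars_per_token)  # ceiling division


def max_chars_for(token_budget, chars_per_token=4):
    """Approximate character budget that fits a token-based context window."""
    return token_budget * chars_per_token


print(estimate_tokens("foo bar bazzyfoo"))
print(max_chars_for(4000))
```

An estimate like this is useful for picking a character-based `chunk_size`; splitting on actual tokens avoids the guesswork.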

In [None]:
text_splitter = TokenTextSplitter(chunk_size=1, chunk_overlap=0)
text1 = "foo bar bazzyfoo"
text_splitter.split_text(text1)

In [None]:
text_splitter = TokenTextSplitter(chunk_size=2, chunk_overlap=0)
text1 = "foo bar bazzyfoo"
text_splitter.split_text(text1)

In [None]:
text_splitter = TokenTextSplitter(chunk_size=10, chunk_overlap=0)
docs = text_splitter.split_documents(pages)
docs[0]

#### Context aware splitting

Chunking aims to keep text with common context together.

Text splitting often uses sentences or other delimiters to keep related text together, but many documents (such as Markdown) have structure (headers) that can be used explicitly in splitting.
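
A plain-Python sketch of the idea: walk the lines, track which headers are currently "open", and attach them as metadata to each block of body text. `split_by_headers` is a hypothetical helper written for illustration and only approximates what a header-aware splitter does.

```python
def split_by_headers(markdown_text, headers_to_split_on):
    """Group lines under their enclosing headers and record those headers as metadata."""
    # Check longer markers first so "##" is not mistaken for "#".
    markers = sorted(headers_to_split_on, key=lambda pair: len(pair[0]), reverse=True)
    active = {}  # header name -> text of the header currently in effect
    chunks, body = [], []

    def flush():
        if body:
            chunks.append({"content": "\n".join(body), "metadata": dict(active)})
            body.clear()

    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        for marker, name in markers:
            if line.startswith(marker + " "):
                flush()
                # A new header closes any header at the same or a deeper level.
                for other_marker, other_name in headers_to_split_on:
                    if len(other_marker) >= len(marker):
                        active.pop(other_name, None)
                active[name] = line[len(marker):].strip()
                break
        else:
            body.append(line)
    flush()
    return chunks


demo_md = "# Title\n\n## Chapter 1\n\nHi this is Jim\n\nHi this is Joe\n\n### Section\n\nHi this is Lance\n\n## Chapter 2\n\nHi this is Molly"
demo_headers = [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")]
header_chunks = split_by_headers(demo_md, demo_headers)
for chunk in header_chunks:
    print(chunk)
```

Because each chunk carries its header path, a later retrieval step can filter or display results by chapter or section.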

We can use `MarkdownHeaderTextSplitter` to preserve header metadata in our chunks, as shown below.

In [None]:
markdown_document = """# Title\n\n \
## Chapter 1\n\n \
Hi this is Jim\n\n Hi this is Joe\n\n \
### Section \n\n \
Hi this is Lance \n\n 
## Chapter 2\n\n \
Hi this is Molly"""

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]

markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on
)
md_header_splits = markdown_splitter.split_text(markdown_document)
md_header_splits[0]

In [None]:
md_header_splits[1]

---

# Vector Stores and Embeddings

<center>
  <img src="images/vector_store.png"/>
</center> 


After splitting the documents into smaller and meaningful chunks, the next step is to put them in an **index** to easily retrieve the chunks to answer questions. We thus utilize embeddings and vector stores.
- The first step is to create embeddings, which are numerical representations of the text/chunks. Text with similar content will have similar vectors in the numeric space.
- We then store all the embeddings in a vector store. This allows us to easily look up similar vectors.
- The question at hand (input) is converted into an embedding and compared against all the entries in the vector store. The "n" most similar vectors/chunks are then picked. These, along with the question, are then passed to the LLM for a final response.
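
The lookup step can be sketched with cosine similarity over a tiny in-memory "store". The three-dimensional vectors and sentences below are made up by hand purely for illustration; real embeddings come from a model and have hundreds or thousands of dimensions. `cosine_similarity` and `top_n` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(query_vec, store, n):
    """Return the texts of the n store entries most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:n]]


# (text, hand-made embedding) pairs; the numbers are illustrative assumptions.
toy_store = [
    ("a sentence about pets", [0.9, 0.1, 0.0]),
    ("another sentence about pets", [0.8, 0.2, 0.1]),
    ("a sentence about cars", [0.0, 0.1, 0.9]),
]
query_vector = [1.0, 0.0, 0.0]  # pretend embedding of a pet-related question
nearest = top_n(query_vector, toy_store, n=2)
print(nearest)
```

A vector store does the same comparison, but with indexing structures that avoid scanning every entry.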

<center>
  <img src="images/embeddings.png"/>
</center> 

The Embeddings used in the example below are OpenAI embeddings. The VectorStore used is Chroma. Chroma is lightweight and in-memory!

Below is the end-to-end embedding workflow:

<center>
  <img src="images/embedding_workflow_1.png"/>
</center> 

<center>
  <img src="images/embedding_workflow_2.png"/>
</center> 

In [None]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

In [None]:
# ! pip install chromadb
from langchain.vectorstores import Chroma 

In [None]:
import numpy as np
from langchain.document_loaders import PyPDFLoader  # needed when this section is run on its own

# Load PDF
loaders = [
    # Duplicate documents on purpose - messy data
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf"),
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf"), # the same file is duplicated to introduce noise
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture02.pdf"),
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture03.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

In [None]:
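# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter  # needed when this section is run on its own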
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)
splits = text_splitter.split_documents(docs)
len(splits)

In [None]:
# init Embeddings 
from langchain.embeddings.openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings()

In [None]:
# chroma settings
persist_directory = 'docs/chroma/'
!rm -rf ./docs/chroma  # remove old database files if any

In [None]:
# create vector store
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding,
    persist_directory=persist_directory # chroma specific keyword argument to save the directory to disk
)
print(vectordb._collection.count()) # this should be the same as the number of splits we had before

In [None]:
# ask question
question = "is there an email i can ask for help"

# retrieve answer
docs = vectordb.similarity_search(question,k=3) # k=3 is the number of chunks/documents to be returned
print(len(docs))
print()
print(docs[0].page_content)

In [None]:
vectordb.persist() # saves the vector database

#### Failure modes

Basic similarity search will get you 80% of the way there very easily. 

But there are some failure modes that can creep up. 

Here are some edge cases that can arise - we'll fix them in the next class.

The next code snippet shows the duplicate chunk issue (chunks 1 and 2 are identical for the question below), which will be fixed later.

Notice that we're getting duplicate chunks (because of the duplicate `MachineLearning-Lecture01.pdf` in the index).

Semantic search fetches the most similar documents, but does not enforce diversity.

`docs[0]` and `docs[1]` are identical.

In [None]:
question = "what did they say about matlab?"
docs = vectordb.similarity_search(question,k=5)
print("Chunk1: ", docs[0])
print()
print("Chunk2:", docs[1])

Another failure mode of this approach is that chunks from irrelevant sources can also be picked up. In the example below, the question asks only about the third lecture, but printing the metadata shows that the results include other lectures as well.

In [None]:
question = "what did they say about regression in the third lecture?"
docs = vectordb.similarity_search(question,k=5)
for doc in docs:
    print(doc.metadata)

In [None]:
print(docs[4].page_content)

---

# Retrieval 

<center>
  <img src="images/retrieval.png"/>
</center> 

Retrieval techniques help address these failure modes and improve retrieval accuracy. Retrieval matters at query time, when we need to fetch the most relevant splits. The following is a list of different retrieval methods:

0. Simple semantic search

1. Maximal Marginal Relevance (MMR)
- Query the vector store
- Choose the 'fetch_k' most similar responses based on semantic search
- *Within those responses*, choose the 'k' most diverse responses
- Maximum marginal relevance strives to achieve both relevance to the query and diversity among the results
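
The selection loop can be sketched in plain Python. This follows the usual MMR scoring (relevance to the query minus redundancy with what is already selected, weighted by `lambda_mult`); the helper names and the two-dimensional toy vectors are assumptions for illustration, not LangChain's code.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, candidates, k, fetch_k, lambda_mult=0.5):
    """Return indices of k candidates, balancing relevance against redundancy."""
    # Step 1: shortlist the fetch_k candidates most similar to the query.
    shortlist = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query_vec, candidates[i]),
        reverse=True,
    )[:fetch_k]
    selected = []
    # Step 2: repeatedly pick the candidate with the best relevance/diversity trade-off.
    while shortlist and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(shortlist, key=score)
        selected.append(best)
        shortlist.remove(best)
    return selected


# Hand-made vectors: entries 0 and 1 are exact duplicates, entry 2 is different.
toy_vectors = [[1.0, 0.2], [1.0, 0.2], [1.0, -0.5]]
toy_query = [1.0, 0.0]
plain_top2 = sorted(range(3), key=lambda i: cosine(toy_query, toy_vectors[i]), reverse=True)[:2]
mmr_top2 = mmr(toy_query, toy_vectors, k=2, fetch_k=3)
print(plain_top2, mmr_top2)
```

Plain similarity returns the two duplicates, while MMR skips the second copy in favor of the less similar but more diverse entry.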

<center>
  <img src="images/mmr_1.png"/>
</center> 

<center>
  <img src="images/mmr_2.png"/>
</center>


2. Self-Query Retrieval - LLM Aided retrieval
- Useful when questions are not solely about the content we want to look up semantically but also include a metadata filter. This also applies when there is more than one question to be answered in the same query.
- We use the language model itself to split the original question into 2 separate parts: a filter term and a search term. We then use the vector store's metadata filtering capability.
- We often want to infer the metadata from the query itself. To address this, we can use SelfQueryRetriever, which uses an LLM to extract:
    - The query string to use for vector search
    - A metadata filter to pass in as well
Most vector databases support metadata filters, so this doesn't require any new databases or indexes.
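
The split into a search term and a metadata filter can be sketched as follows. Here a small regular expression stands in for the LLM step, which is a deliberate simplification; `split_query` and `apply_filter` are hypothetical helpers, and the toy documents are made up.

```python
import re

ORDINALS = {"first": 1, "second": 2, "third": 3}


def split_query(question):
    """Hypothetical stand-in for the LLM step: return (search_term, metadata_filter)."""
    match = re.search(r"\bin the (first|second|third) lecture\b", question)
    if not match:
        return question, {}
    lecture_no = ORDINALS[match.group(1)]
    source = f"docs/cs229_lectures/MachineLearning-Lecture{lecture_no:02d}.pdf"
    search_term = (question[:match.start()] + question[match.end():]).strip(" ?")
    return search_term, {"source": source}


def apply_filter(docs, metadata_filter):
    """Keep only documents whose metadata matches every key in the filter."""
    return [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]


toy_docs = [
    {"text": "regression intro", "metadata": {"source": "docs/cs229_lectures/MachineLearning-Lecture02.pdf"}},
    {"text": "more on regression", "metadata": {"source": "docs/cs229_lectures/MachineLearning-Lecture03.pdf"}},
]
search_term, metadata_filter = split_query("what did they say about regression in the third lecture?")
filtered_docs = apply_filter(toy_docs, metadata_filter)
print(search_term)
print(metadata_filter)
print([d["text"] for d in filtered_docs])
```

Only the search term is embedded for semantic search; the filter narrows the candidate set before (or while) the similarity search runs.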

<center>
  <img src="images/llm_aided_retrieval.png"/>
</center>

3. Contextual Compression
- Useful for pulling out only the relevant bits from the retrieved passages
- Once all semantically relevant documents are retrieved, use a language model to extract only the relevant sentences/segments. Then pass these to the final language model call.
- This technique is more costly because of the extra LLM calls, but it keeps the final answer focused on the most important content.
- Thus, we increase the number of results we can put in the context by shrinking each one to only the relevant information
- Information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses. Contextual compression is meant to fix this.
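
A rough sketch of the compression step is shown below. Keyword overlap stands in for the LLM extraction call, which is a strong simplification; `compress`, the stopword list, and the sample passage are all made up for illustration.

```python
import re


def compress(document_text, question):
    """Keep only sentences that share a content word with the question.

    Hypothetical stand-in for the LLM extraction step; a real compressor would
    ask a language model which passages are relevant.
    """
    stopwords = {"what", "did", "they", "say", "about", "the", "a", "is", "in"}
    keywords = {w for w in re.findall(r"[a-z]+", question.lower()) if w not in stopwords}
    sentences = re.split(r"(?<=[.!?])\s+", document_text.strip())
    return [s for s in sentences if keywords & set(re.findall(r"[a-z]+", s.lower()))]


# Made-up passage, used only to show the effect.
toy_passage = "The class meets twice a week. Homework will use MATLAB or Octave. Office hours are on Fridays."
compressed = compress(toy_passage, "what did they say about matlab?")
print(compressed)
```

Only the shortened passages are forwarded to the final LLM call, so more retrieved documents fit in the same context window.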

<center>
  <img src="images/compression.png"/>
</center>

It's worth noting that a vector database is not the only kind of tool for retrieving documents. The LangChain retriever abstraction includes other ways to retrieve documents, such as TF-IDF or SVM. TF-IDF is a traditional NLP technique that needs no vector database.

In [None]:
# ! pip install lark

In [None]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

In [None]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

In [None]:
# Vectordb and Embedding model settings 

persist_directory = 'docs/chroma/'

embedding = OpenAIEmbeddings()
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding
)

print(vectordb._collection.count())

#### MMR Examples

In [None]:
## MMR

texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]

question = "Tell me about all-white mushrooms with large fruiting bodies"

smalldb = Chroma.from_texts(texts, embedding=embedding)

print("Without MMR:")
print(smalldb.similarity_search(question, k=2))
print()

print("With MMR:")
print(smalldb.max_marginal_relevance_search(question,k=2, fetch_k=3))

Last class we introduced one problem: how to enforce diversity in the search results. `Maximum marginal relevance` strives to achieve both relevance to the query *and diversity* among the results.

In [None]:
## MMR - Example 2

question = "what did they say about matlab?"

print("Without MMR:")
docs_ss = vectordb.similarity_search(question,k=3)
print("chunk1:")
print(docs_ss[0].page_content[:100])
print()
print("chunk2:")
print(docs_ss[1].page_content[:100])

In [None]:
print("With MMR:")
docs_mmr = vectordb.max_marginal_relevance_search(question,k=3)
print("chunk1:")
print(docs_mmr[0].page_content[:100])
print()
print("chunk2:")
print(docs_mmr[1].page_content[:100])

#### Self Query Retrieval Examples

Addressing Specificity: working with metadata

In the last lecture, we showed that a question about the third lecture can include results from other lectures as well.

To address this, many vectorstores support operations on `metadata`.

`metadata` provides context for each embedded chunk.

In [None]:
'''
LANGCHAIN: Self Query Retriever
'''
from langchain.retrievers.self_query.base import SelfQueryRetriever

'''
LANGCHAIN: Allows to specify different fields in the metadata and what they correspond to
'''
from langchain.chains.query_constructor.base import AttributeInfo

In [None]:
question = "what did they say about regression in the third lecture?"

# option 1 - fix metadata by hand
docs = vectordb.similarity_search(
    question,
    k=3,
    filter={"source":"docs/cs229_lectures/MachineLearning-Lecture03.pdf"} # this ensures results are returned only from the 3rd lecture
)

for d in docs:
    print(d.metadata)

But we have an interesting challenge: we often want to infer the metadata from the query itself.

To address this, we can use `SelfQueryRetriever`, which uses an LLM to extract:
 
1. The `query` string to use for vector search
2. A metadata filter to pass in as well

Most vector databases support metadata filters, so this doesn't require any new databases or indexes.

In [None]:
# option 2 - infer the metadata from the query itself using LLM

# we only have 2 fields in the metadata - source and page
# we fill out the name, description & type for each of these attributes
metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The lecture the chunk is from, should be one of `docs/cs229_lectures/MachineLearning-Lecture01.pdf` \
        `docs/cs229_lectures/MachineLearning-Lecture02.pdf`, or `docs/cs229_lectures/MachineLearning-Lecture03.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page from the lecture",
        type="integer",
    ),
]


# we then specify information about what's in the document store
document_content_description = "Lecture notes"

# init the model
from langchain.llms import OpenAI
llm = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)

# init the self query retriever
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True
)

In [None]:
# test retriever
question = "what did they say about regression in the third lecture?"
docs = retriever.get_relevant_documents(question)
for d in docs:
    print(d.metadata)

#### Context Compression

Another approach for improving the quality of retrieved docs is compression.

Information most relevant to a query may be buried in a document with a lot of irrelevant text. 

Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this.

In [None]:
'''
LANGCHAIN: Init Compression Retriever
'''
from langchain.retrievers import ContextualCompressionRetriever

'''
LANGCHAIN: Extract the relevant bits from each document and pass those as the final return response
'''
from langchain.retrievers.document_compressors import LLMChainExtractor

In [None]:
# init compressor
compressor = LLMChainExtractor.from_llm(llm)

# init retriever
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever()
)

# helper function to print output
def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

# print output
question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)

The problem with the above output is that even though the documents are a lot shorter, some information is repeated. The repetition can be fixed using MMR.

In [None]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type = "mmr") # add MMR to contextual retriever to include diversity instead of duplication
)

question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)

#### Other Retrievers - TFIDF, SVM

It's worth noting that a vector database is not the only kind of tool for retrieving documents.

The `LangChain` retriever abstraction includes other ways to retrieve documents, such as TF-IDF or SVM.

Note: SVM uses embeddings but TF-IDF does not
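
To show why TF-IDF needs no embeddings, here is a small plain-Python sketch that ranks texts by summed TF-IDF weight of the query terms. The smoothed IDF formula, the helper names, and the sample texts are assumptions for illustration; this is not how `TFIDFRetriever` is implemented internally.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(texts, query):
    """Return text indices ordered by the summed TF-IDF weight of the query terms."""
    docs_tokens = [tokenize(t) for t in texts]
    n_docs = len(texts)
    # Document frequency: in how many texts does each term appear?
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))

    def score(tokens):
        counts = Counter(tokens)
        total = 0.0
        for term in tokenize(query):
            if term in counts:
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1  # smoothed IDF
                total += (counts[term] / len(tokens)) * idf
        return total

    return sorted(range(n_docs), key=lambda i: score(docs_tokens[i]), reverse=True)


# Made-up texts, used only to show the ranking.
toy_texts = [
    "office hours are held on friday",
    "matlab is used for homework assignments",
    "lectures cover supervised learning",
]
ranking = tfidf_rank(toy_texts, "what did they say about matlab")
print(ranking)
```

The ranking depends only on word counts, so exact word matches matter: a query about "Octave" would not find a text that only says "MATLAB".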

In [None]:
from langchain.retrievers import SVMRetriever
from langchain.retrievers import TFIDFRetriever
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
## Usual pipeline of loading and splitting

# Load PDF
loader = PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf")
pages = loader.load()
all_page_text=[p.page_content for p in pages]
joined_page_text=" ".join(all_page_text)

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1500,chunk_overlap = 150)
splits = text_splitter.split_text(joined_page_text)

In [None]:
# Retrieve - both of these have a from_texts method 
svm_retriever = SVMRetriever.from_texts(splits, embedding) # requires the embedding module
tfidf_retriever = TFIDFRetriever.from_texts(splits) # takes splits directly

In [None]:
question = "What are major topics for this class?"
docs_svm=svm_retriever.get_relevant_documents(question)
docs_svm[0]

In [None]:
question = "what did they say about matlab?"
docs_tfidf=tfidf_retriever.get_relevant_documents(question)
docs_tfidf[0]

# Q&A using Chatbot

<center>
  <img src="images/q&a.png"/>
</center>

We now take the retrieved documents, take the question, pass them both to the language model and answer the question. Steps:
- Multiple relevant documents have been retrieved from the vector store
- Potentially compress the relevant splits to fit into the LLM context
- Send the information along with the question & system prompt, for the LLM to select and format the answer

<center>
  <img src="images/rqa.png"/>
</center>

Once the final chunks are selected, there are a few different ways to retrieve the final answer. This is mainly to address the short context window problem.

<center>
  <img src="images/methods.png"/>
</center>

In [None]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

In [None]:
import datetime
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)

#### 0. Stuff 
Stuff all data as prompt into the context to pass to the llm
- Pros: Single call to LLM + LLM has access to all data at once
- Cons: LLM context length restrictions
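
In plain Python, "stuffing" amounts to joining every retrieved chunk into one prompt. `build_stuff_prompt` is a hypothetical helper, the prompt wording is shortened for illustration, and the chunks are made up.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate every retrieved chunk into one prompt for a single LLM call."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n"
        f"{context}\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


toy_chunks = ["Chunk one text.", "Chunk two text."]
stuffed_prompt = build_stuff_prompt(toy_chunks, "Is probability a class topic?")
print(stuffed_prompt)
print(len(stuffed_prompt))
```

The prompt grows linearly with the number and size of chunks, which is exactly where the context length restriction bites.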

In [None]:
# retrieval over documents
from langchain.chains import RetrievalQA

# prompt template
from langchain.prompts import PromptTemplate

In [None]:
# check vector db/import chunks
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
print(vectordb._collection.count())

In [None]:
# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, 
              just say that you don't know, don't try to make up an answer. 
              Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
            {context}
            Question: {question}
            Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

In [None]:
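# Run chain
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name=llm_name, temperature=0)  # chat model selected above via llm_name

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)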

In [None]:
question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]

In [None]:
result["source_documents"][0]

#### 1. Map Reduce 
Each individual chunk/document is first sent to the language model independently to get its own answer. Those individual answers are then "stuffed" into one more LLM call to get the final answer.
- Pros: Can operate over any number of documents + can process individual documents in parallel
- Cons: 
    - Many LLM calls + each document is treated independently, so context spread across documents may be missed
    - Many of the individual answers are likely to be "no information found", with only a few containing the actual answer. When these are compiled and sent to the LLM in the final call, it may respond "no information found" because most of the compiled answers say so. **If so, use Refine.**
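
The control flow can be sketched with a toy function in place of the model. `map_reduce_answer` and `toy_llm` are hypothetical; the toy "model" only checks its context for a keyword, and the chunks are made up, so this shows the call pattern rather than real answers.

```python
def map_reduce_answer(chunks, question, llm):
    """Query the LLM once per chunk (map), then once more over the partial answers (reduce)."""
    partial_answers = [llm(f"Context: {chunk}\nQuestion: {question}") for chunk in chunks]
    combined = "\n".join(partial_answers)
    return llm(f"Context: {combined}\nQuestion: {question}")


calls = []  # records every prompt so the number of LLM calls can be counted


def toy_llm(prompt):
    """Hypothetical stand-in for a model call; it only checks the context for a keyword."""
    calls.append(prompt)
    context = prompt.split("\nQuestion:")[0]
    if "probability" in context.lower():
        return "Probability is assumed as background."
    return "No information found."


toy_chunks = ["The course uses MATLAB.", "Familiarity with basic probability is assumed."]
final_answer = map_reduce_answer(toy_chunks, "Is probability a class topic?", toy_llm)
print(final_answer)
print(len(calls))
```

Two chunks cost three calls (one per chunk plus the final one), and the map calls are independent of each other, which is what makes them parallelizable.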

In [None]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)
result = qa_chain_mr({"query": question})
result["result"]

#### 2. Refine

This invokes the RefineDocuments chain, which makes sequential calls to the LLM, one per chunk. Each chunk is passed as a system message first, and the user question is used for finding the answer. In the next call to the LLM, the previous response is combined with the new chunk and the LLM is asked to improve the response. This process is repeated for each chunk. This might work better than the map-reduce chain because information is carried forward from one call to the next, even though the calls are sequential.
- Pros: Combines information and builds up the answer over time + provides longer answers
- Cons: Many LLM calls, hence slow (as many as map reduce)
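
The sequential pattern can be sketched the same way, again with a toy function in place of the model. `refine_answer` and `toy_refine_llm` are hypothetical names, and the chunks are made up; the point is only that each call sees the previous answer.

```python
def refine_answer(chunks, question, llm):
    """Feed chunks to the LLM one at a time, passing the running answer forward."""
    answer = ""
    for chunk in chunks:
        answer = llm(existing_answer=answer, context=chunk, question=question)
    return answer


def toy_refine_llm(existing_answer, context, question):
    """Hypothetical stand-in for a model call: extend the answer when the chunk looks relevant."""
    if "probability" not in context.lower():
        return existing_answer  # nothing new to add, keep the previous answer
    return (existing_answer + " " + context).strip()


toy_chunks = [
    "Familiarity with basic probability is assumed.",
    "The course uses MATLAB.",
    "Probability is reviewed in a discussion section.",
]
refined = refine_answer(toy_chunks, "Is probability a class topic?", toy_refine_llm)
print(refined)
```

Unlike map-reduce, an irrelevant chunk cannot drown out earlier findings, because the running answer survives each step; the cost is that the calls cannot run in parallel.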

In [None]:
qa_chain_refine = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_refine({"query": question})
result["result"]

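A limitation: the QA chain does not preserve conversational history, so a follow-up question is answered without any knowledge of the previous exchange.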

In [None]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [None]:
question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]

In [None]:
question = "why are those prerequesites needed?"
result = qa_chain({"query": question})
result["result"]

---

# Q&A with Memory

<center>
  <img src="images/chat_history.png"/>
</center>

- Adding memory allows follow-up questions: the chatbot takes the chat history into account when responding.
- The memory is passed to the retrieval chain. In this chain, the historical Q&A and the new question are condensed into a standalone question. Thus:
    - previous question + previous answer + new question = final standalone question
    - This matters because a follow-up question may ask for additional details that only make sense in context; the standalone question has the missing context from the earlier Q&A summarized and baked in
    - It is possible to customize the prompt template used to create the standalone question
    - It is possible to introduce different kinds of memory
    - If the entire memory needs to be retained, it can also be handled and managed separately
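
To make the condensing step concrete, here is a minimal sketch of how the history and the follow-up could be packed into one prompt. The template text and the helper `build_condense_prompt` are illustrative assumptions, not LangChain's own prompt; the LLM's reply to this prompt would become the standalone question.

```python
# Illustrative template (an assumption, not the library's exact wording)
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow-up question, "
    "rephrase the follow-up question to be a standalone question.\n\n"
    "Chat history:\n{chat_history}\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)

def build_condense_prompt(history, question):
    """Format (question, answer) pairs plus the new question into one prompt."""
    lines = []
    for past_question, past_answer in history:
        lines.append(f"Human: {past_question}")
        lines.append(f"Assistant: {past_answer}")
    return CONDENSE_TEMPLATE.format(
        chat_history="\n".join(lines),
        question=question,
    )

history = [
    ("Is probability a class topic?",
     "Yes, basic probability is assumed as a prerequisite."),
]
prompt = build_condense_prompt(history, "Why are those prerequisites needed?")
print(prompt)
```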


In [None]:
import os
import openai
import sys
sys.path.append('../..')

import panel as pn  # GUI
pn.extension()

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

In [None]:
import datetime
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)

In [None]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

In [None]:
question = "What are major topics for this class?"
docs = vectordb.similarity_search(question,k=3)
len(docs)

In [None]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name=llm_name, temperature=0)
llm.predict("Hello world!")

In [None]:
# Build prompt
from langchain.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"],template=template,)

# Run chain
from langchain.chains import RetrievalQA
question = "Is probability a class topic?"
qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=vectordb.as_retriever(),
                                       return_source_documents=True,
                                       chain_type_kwargs={"prompt": QA_CHAIN_PROMPT})


result = qa_chain({"query": question})
result["result"]

#### Memory

In [None]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

#### ConversationalRetrievalChain

This chain adds a new step: it takes the chat history and the new question, condenses them into a standalone question, and passes that question to the vector store to look up relevant documents.
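
The extra step can be sketched as a three-stage pipeline. The callables `condense`, `retrieve` and `answer` below are hypothetical stubs standing in for the LLM and the vector store; the point is only the order of operations, and that on the first turn, with no history, the question goes to the retriever unchanged.

```python
CORPUS = [
    "Basic probability and statistics are prerequisites.",
    "Homework is due on Fridays.",
]

def condense(history, question):
    """Hypothetical LLM call: fold the last question into the new one."""
    last_question, _ = history[-1]
    return f"{question} (context: {last_question})"

def retrieve(query):
    """Hypothetical retriever: match documents on the word 'probability'."""
    if "probability" not in query.lower():
        return []
    return [doc for doc in CORPUS if "probability" in doc.lower()]

def answer(query, docs):
    """Hypothetical LLM call: answer from the first retrieved document."""
    return docs[0] if docs else "I don't know."

def conversational_turn(question, chat_history):
    # 1. Condense only when there is history to condense
    standalone = condense(chat_history, question) if chat_history else question
    # 2. Retrieve with the standalone question
    docs = retrieve(standalone)
    # 3. Answer, then record the exchange
    reply = answer(standalone, docs)
    chat_history.append((question, reply))
    return standalone, reply

chat_history = []
first_standalone, first_reply = conversational_turn(
    "Is probability a class topic?", chat_history)
second_standalone, second_reply = conversational_turn(
    "Why are those prerequisites needed?", chat_history)
print(second_standalone)
print(second_reply)
```

Without the condensing step, the second question contains no retrievable keyword, so this toy retriever would return nothing for it.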

In [None]:
from langchain.chains import ConversationalRetrievalChain
retriever=vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)

In [None]:
question = "Is probability a class topic?"
result = qa({"question": question})
result['answer']

In [None]:
question = "why are those prerequisites needed?"
result = qa({"question": question})
result['answer']

### Final Q&A App

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA,  ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader

In [None]:
def load_db(file, chain_type, k):
    
    # load documents
    loader = PyPDFLoader(file)
    documents = loader.load()
    
    # split documents
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    docs = text_splitter.split_documents(documents)
    
    # define embedding
    embeddings = OpenAIEmbeddings()
    
    # create vector database from data
    db = DocArrayInMemorySearch.from_documents(docs, embeddings)
    
    # define retriever
    retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": k})
    
    # create a chatbot chain. Memory is managed externally.
    qa = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model_name=llm_name, temperature=0), 
        chain_type=chain_type, 
        retriever=retriever, 
        return_source_documents=True,
        return_generated_question=True,
    )
    return qa 
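
Because this chain is built without a `memory` argument, the caller has to pass the history in on every call and update it afterwards. The sketch below shows that pattern with `FakeChain`, a hypothetical stand-in that mimics only the input and output keys used in this notebook (`question`, `chat_history`, `answer`); it is not a LangChain class.

```python
class FakeChain:
    """Hypothetical stand-in for the chain returned by load_db."""

    def __call__(self, inputs):
        turns = len(inputs["chat_history"])
        return {
            "answer": f"answer to {inputs['question']!r} (saw {turns} earlier turns)"
        }

def ask(chain, chat_history, question):
    """Pass the history in explicitly, then append the new exchange."""
    result = chain({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result["answer"]

chain = FakeChain()
chat_history = []  # list of (question, answer) tuples, owned by the caller
first = ask(chain, chat_history, "Is probability a class topic?")
second = ask(chain, chat_history, "Why are those prerequisites needed?")
print(first)
print(second)
```

Clearing the conversation is then just a matter of resetting the list, which is what the GUI class in the next cell does.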

In [None]:
import panel as pn
import param

class cbfs(param.Parameterized):
    chat_history = param.List([])
    answer = param.String("")
    db_query  = param.String("")
    db_response = param.List([])
    
    def __init__(self, **params):
        super().__init__(**params)
        self.panels = []
        self.loaded_file = "docs/cs229_lectures/MachineLearning-Lecture01.pdf"
        self.qa = load_db(self.loaded_file,"stuff", 4)
    
    def call_load_db(self, count):
        if count == 0 or file_input.value is None:  # init or no file specified :
            return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")
        else:
            file_input.save("temp.pdf")  # local copy
            self.loaded_file = file_input.filename
            button_load.button_style="outline"
            self.qa = load_db("temp.pdf", "stuff", 4)
            button_load.button_style="solid"
        self.clr_history()
        return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")

    def convchain(self, query):
        if not query:
            return pn.WidgetBox(pn.Row('User:', pn.pane.Markdown("", width=600)), scroll=True)
        result = self.qa({"question": query, "chat_history": self.chat_history})
        self.chat_history.extend([(query, result["answer"])])
        self.db_query = result["generated_question"]
        self.db_response = result["source_documents"]
        self.answer = result['answer'] 
        self.panels.extend([
            pn.Row('User:', pn.pane.Markdown(query, width=600)),
            pn.Row('ChatBot:', pn.pane.Markdown(self.answer, width=600, styles={'background-color': '#F6F6F6'}))
        ])
        inp.value = ''  # clearing the input also clears the loading indicator
        return pn.WidgetBox(*self.panels,scroll=True)

    @param.depends('db_query')
    def get_lquest(self):
        if not self.db_query:
            return pn.Column(
                pn.Row(pn.pane.Markdown("Last question to DB:", styles={'background-color': '#F6F6F6'})),
                pn.Row(pn.pane.Str("no DB accesses so far"))
            )
        return pn.Column(
            pn.Row(pn.pane.Markdown("DB query:", styles={'background-color': '#F6F6F6'})),
            pn.pane.Str(self.db_query)
        )

    @param.depends('db_response', )
    def get_sources(self):
        if not self.db_response:
            return 
        rlist = [pn.Row(pn.pane.Markdown("Result of DB lookup:", styles={'background-color': '#F6F6F6'}))]
        for doc in self.db_response:
            rlist.append(pn.Row(pn.pane.Str(doc)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    @param.depends('convchain', 'clr_history') 
    def get_chats(self):
        if not self.chat_history:
            return pn.WidgetBox(pn.Row(pn.pane.Str("No History Yet")), width=600, scroll=True)
        rlist = [pn.Row(pn.pane.Markdown("Current Chat History variable", styles={'background-color': '#F6F6F6'}))]
        for exchange in self.chat_history:
            rlist.append(pn.Row(pn.pane.Str(exchange)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    def clr_history(self,count=0):
        self.chat_history = []
        return 

In [None]:
cb = cbfs()

file_input = pn.widgets.FileInput(accept='.pdf')
button_load = pn.widgets.Button(name="Load DB", button_type='primary')
button_clearhistory = pn.widgets.Button(name="Clear History", button_type='warning')
button_clearhistory.on_click(cb.clr_history)
inp = pn.widgets.TextInput( placeholder='Enter text here…')

bound_button_load = pn.bind(cb.call_load_db, button_load.param.clicks)
conversation = pn.bind(cb.convchain, inp) 

jpg_pane = pn.pane.Image( './img/convchain.jpg')

tab1 = pn.Column(
    pn.Row(inp),
    pn.layout.Divider(),
    pn.panel(conversation,  loading_indicator=True, height=300),
    pn.layout.Divider(),
)
tab2= pn.Column(
    pn.panel(cb.get_lquest),
    pn.layout.Divider(),
    pn.panel(cb.get_sources ),
)
tab3= pn.Column(
    pn.panel(cb.get_chats),
    pn.layout.Divider(),
)
tab4=pn.Column(
    pn.Row( file_input, button_load, bound_button_load),
    pn.Row( button_clearhistory, pn.pane.Markdown("Clears chat history. Can use to start a new topic" )),
    pn.layout.Divider(),
    pn.Row(jpg_pane.clone(width=400))
)
dashboard = pn.Column(
    pn.Row(pn.pane.Markdown('# ChatWithYourData_Bot')),
    pn.Tabs(('Conversation', tab1), ('Database', tab2), ('Chat History', tab3),('Configure', tab4))
)
dashboard

You can try alternate memory and retriever models by changing the configuration in the `load_db` function and the `convchain` method. [Panel](https://panel.holoviz.org/) and [Param](https://param.holoviz.org/) have many useful features and widgets that you can use to extend the GUI.
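
As one example of such a change, the retriever line in `load_db` could switch from plain similarity search to maximal marginal relevance (MMR). The helper below is a hypothetical convenience (not part of LangChain) that only builds the keyword arguments. `search_type` and `search_kwargs` are the arguments already passed to `as_retriever` in `load_db`; the `fetch_k` value of `4 * k` is an arbitrary assumption, and whether a given vector store supports MMR should be checked in its documentation.

```python
def retriever_kwargs(search_type="similarity", k=4):
    """Build keyword arguments for `as_retriever` (hypothetical helper)."""
    if search_type not in {"similarity", "mmr"}:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    kwargs = {"search_type": search_type, "search_kwargs": {"k": k}}
    if search_type == "mmr":
        # Fetch a larger candidate pool, then keep k diverse results.
        # The factor of 4 is an assumption chosen for illustration.
        kwargs["search_kwargs"]["fetch_k"] = 4 * k
    return kwargs

default_config = retriever_kwargs()
mmr_config = retriever_kwargs("mmr", k=4)
print(default_config)
print(mmr_config)

# Possible use inside load_db (not executed here):
# retriever = db.as_retriever(**retriever_kwargs("mmr", k))
```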

Panel-based chatbot inspired by Sophia Yang ([GitHub](https://github.com/sophiamyang/tutorials-LangChain)).

<center>
  <img src="images/final.png"/>
</center>