# LangChain Bot with custom data

## Technical Libraries
### Document Loaders

Here is an overview of the libraries used to load content from the most popular exchange formats: PDF, HTML, and video.
- langchain - [LangChain](https://www.langchain.com) GenAI application framework
- [langchain-openai](https://github.com/langchain-ai/langchain/blob/master/libs/partners/openai/README.md) - contains the LangChain integrations for OpenAI through their openai SDK (see other [partner SDK packages](https://github.com/langchain-ai/langchain/tree/master/libs/partners)).
- openai - OpenAI API interface
- yt_dlp - [yt-dlp](https://github.com/yt-dlp/yt-dlp) is a youtube-dl fork based on the now inactive youtube-dlc.
- pydub - [jiaaro/pydub](https://github.com/jiaaro/pydub) is a high-level interface for audio manipulation
- pypdf - [pypdf](https://github.com/py-pdf/pypdf) is a pure-Python PDF library
- lark - [parsing toolkit for Python](https://github.com/lark-parser/lark), used by LangChain's self-query retriever to parse structured queries
- [ffmpeg](https://ffmpeg.org) multimedia framework, required on the platform for video and audio processing (on macOS: `brew install ffmpeg`)
- docarray - [docarray](https://docs.docarray.org) is designed for the representation, transmission, storage, and retrieval of multimodal data.

### GPT versions

- **GPT-4** - The most capable GPT model series to date. Able to do complex tasks, but slower at giving answers. Currently used by ChatGPT Plus.
- **GPT-3.5**  - Faster than GPT-4 and more flexible than GPT Base. The “good enough” model series for most tasks, whether chat or general. 
- **GPT-3.5 Turbo** - The best model in the GPT-3.5 series. Currently used by the free version of ChatGPT. Cost effective and flexible.
- **GPT Base** - Not trained for instruction following. Best used when fine-tuned for specific tasks; otherwise use GPT-3.5 or GPT-4. Used for legacy cases as a replacement for the original GPT-3.
- **GPT-3** - The predecessor to GPT-3.5, now deprecated.

For embeddings, OpenAI recommends Ada v2 (`text-embedding-ada-002`).

## Intro

In retrieval augmented generation (RAG), the application retrieves contextual documents from an external dataset and passes them to the LLM as part of its execution. This makes it possible to ask questions about specific documents (e.g., our PDFs or a set of videos).
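The retrieve-then-generate flow can be sketched in plain Python. This is a minimal illustration under stated assumptions: word overlap stands in for embedding similarity, the three documents are made-up snippets, and `retrieve` and `build_prompt` are hypothetical helpers, not LangChain APIs. In a real RAG application the resulting prompt would be sent to an LLM.

```python
import re


# Illustrative sketch only: word overlap replaces embeddings, toy documents.
def tokenize(text: str) -> set[str]:
    """Lowercased set of words: a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents that share the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Put the retrieved context into the prompt for the LLM (hypothetical template)."""
    context_lines = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_lines}\n"
        f"Question: {question}"
    )


docs = [
    "The course email is cs229-qa@cs.stanford.edu.",
    "Homework will be done in MATLAB or Octave.",
    "Lecture 3 covers locally weighted regression.",
]
question = "What is the course email?"
prompt = build_prompt(question, retrieve(question, docs, k=1))
print(prompt)
```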

Each page of a loaded PDF becomes a `Document`. A `Document` contains text (`page_content`) and `metadata`.

<img src="./img/LangchainBotFlow.jpeg" width="500" />

Document loaders: LangChain has over 80 different document loaders to handle a variety of document types. Document loaders can also load structured data in a tabular format that you want to search over with semantic queries.





### Load variables and read API keys

In [8]:
import os
import openai
import sys
from langchain.document_loaders import PyPDFLoader
# videos
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader
# URLs, WebBaseLoader is a generic class for loading web pages
from langchain.document_loaders import WebBaseLoader
# Notion loader
from langchain.document_loaders import NotionDirectoryLoader


from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

In [3]:
print(openai.VERSION)

1.10.0


## Document loaders
### Load PDF

In [10]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/MachineLearning-Lecture01.pdf")
pages = loader.load()

In [11]:
print(len(pages))
# first page, first 500 characters
print(pages[0].page_content[0:500])
print(pages[0].metadata)

22
MachineLearning-Lecture01  
Instructor (Andrew Ng):  Okay. Good morning. Welcome to CS229, the machine 
learning class. So what I wanna do today is ju st spend a little time going over the logistics 
of the class, and then we'll start to  talk a bit about machine learning.  
By way of introduction, my name's  Andrew Ng and I'll be instru ctor for this class. And so 
I personally work in machine learning, and I' ve worked on it for about 15 years now, and 
I actually think that machine learning i
{'source': 'docs/MachineLearning-Lecture01.pdf', 'page': 0}


### Load video

The YouTube audio loader downloads the audio of a YouTube video. The OpenAI Whisper parser uses OpenAI's Whisper speech-to-text model to convert that audio into text for further processing.
Specify the video URL and a directory in which to save the audio files, then create a `GenericLoader` that combines the YouTube audio loader with the OpenAI Whisper parser. Calling `loader.load()` then loads the documents corresponding to the selected YouTube video.

In [5]:
url="https://www.youtube.com/watch?v=jGwO_UgTS7I"
save_dir="docs/youtube/"
loader = GenericLoader(
    YoutubeAudioLoader([url],save_dir),
    OpenAIWhisperParser()
)
docs = loader.load()

[youtube] Extracting URL: https://www.youtube.com/watch?v=jGwO_UgTS7I
[youtube] jGwO_UgTS7I: Downloading webpage
[youtube] jGwO_UgTS7I: Downloading ios player API JSON
[youtube] jGwO_UgTS7I: Downloading android player API JSON
[youtube] jGwO_UgTS7I: Downloading m3u8 information
[info] jGwO_UgTS7I: Downloading 1 format(s): 140
[download] docs/youtube//Stanford CS229： Machine Learning Course, Lecture 1 - Andrew Ng (Autumn 2018).m4a has already been downloaded
[download] 100% of   69.76MiB
[ExtractAudio] Not converting audio docs/youtube//Stanford CS229： Machine Learning Course, Lecture 1 - Andrew Ng (Autumn 2018).m4a; file is already in target format m4a
Transcribing part 1!
Transcribing part 2!
Transcribing part 3!
Transcribing part 4!


In [6]:
docs[0].page_content[0:500]

"Welcome to CS229 Machine Learning. Uh, some of you know that this is a class that's taught at Stanford for a long time. And this is often the class that, um, I most look forward to teaching each year because this is where we've helped, I think, several generations of Stanford students become experts in machine learning, got- built many of their products and services and startups that I'm sure, many of you or probably all of you are using, uh, uh, today. Um, so what I want to do today was spend s"

### URL Content

[WebBaseLoader](https://python.langchain.com/docs/integrations/document_loaders/web_base) loads text from HTML pages and uses [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) to parse the HTML. For custom loading logic, look at its child classes, such as IMSDbLoader, AZLyricsLoader, and CollegeConfidentialLoader.

To bypass SSL verification errors during fetching, you can set the `verify` option:

`loader.requests_kwargs = {'verify': False}`
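To make the HTML-to-text step concrete, here is a minimal sketch that uses only Python's standard-library `html.parser` as a stand-in for BeautifulSoup. It is a simplification under stated assumptions, not what `WebBaseLoader` does internally: the hypothetical `html_to_text` helper keeps text nodes, skips `<script>` and `<style>` contents, and joins the pieces with single spaces. The sample page is made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style> (illustrative sketch)."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # keep non-empty text only when we are outside <script>/<style>
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Benefits</h1><p>Paid <b>time</b> off</p><script>var x = 1;</script></body></html>"
)
print(html_to_text(page))  # Benefits Paid time off
```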


In [13]:

loader = WebBaseLoader("https://github.com/basecamp/handbook/blob/master/benefits-and-perks.md", requests_kwargs={"verify": False})
docs = loader.load()

In [12]:
print(docs[0].page_content[:500])



### Notion Loader

See example Notion site: [Blendle's Employee Handbook](https://yolospace.notion.site/Blendle-s-Employee-Handbook-e31bff7da17346ee99f531087d8b133f)
Steps:
- Duplicate the page into your own Notion space and export as Markdown / CSV.
- Unzip it and save it as a folder that contains the markdown file for the Notion page.

In [13]:

loader = NotionDirectoryLoader("docs/Notion_DB")
docs = loader.load()

In [14]:
print(docs[0].page_content[0:200])

# Blendle's Employee Handbook

This is a living document with everything we've learned working with people while running a startup. And, of course, we continue to learn. Therefore it's a document that


## Document Splitting

The goal is to retrieve the content that is most relevant to a question.

Because sentences and words vary in length, there is a lot of nuance in how you split the text so that semantically related content ends up in the same chunk. All text splitters in LangChain are based on splitting text into chunks of a given chunk size with some chunk overlap.
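The chunk size / chunk overlap idea can be illustrated with a plain sliding window over characters. This is a simplified sketch: the function `split_fixed` is hypothetical and, unlike LangChain's splitters, ignores separators entirely and cuts purely by position. On the 33-character example `text2` used further down it happens to produce the same two chunks.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks


print(split_fixed("abcdefghijklmnopqrstuvwxyzabcdefg", chunk_size=26, chunk_overlap=4))
# ['abcdefghijklmnopqrstuvwxyz', 'wxyzabcdefg']
```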

`langchain.text_splitter`:
- `CharacterTextSplitter()` - splits text on a single separator and measures chunk length in characters
- `MarkdownHeaderTextSplitter()` - splits Markdown text on headers and keeps the headers as metadata
- `SentenceTransformersTokenTextSplitter()` - splits text into token-based chunks using a sentence-transformers tokenizer
- `RecursiveCharacterTextSplitter()` - splits text on characters, recursively trying a list of separators (by default paragraphs, lines, words, then single characters) until the chunks are small enough
- `Language` - an enum of separator sets for source code and markup (e.g. C++, Python, Markdown), used with `RecursiveCharacterTextSplitter.from_language()`
- `NLTKTextSplitter()` - splits text on sentences using the NLTK toolkit
- `SpacyTextSplitter()` - splits text on sentences using spaCy
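The recursive strategy can be sketched in a few lines. This is a simplified illustration, not LangChain's implementation: `recursive_split` is a hypothetical helper that splits on the first separator found in the text, greedily merges the pieces back together (re-inserting that separator) up to `chunk_size`, and recurses with the remaining, finer separators on any piece that is still too long. It has no chunk overlap and strips whitespace from each chunk.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Simplified recursive character splitting without overlap (illustrative sketch)."""
    if len(text) <= chunk_size:
        return [text.strip()] if text.strip() else []
    # Use the first separator that occurs in the text; "" means "split into characters".
    sep: str = ""
    remaining: list[str] = []
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, remaining = candidate, separators[i + 1:]
            break
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        merged = piece if not current else current + sep + piece
        if len(merged) <= chunk_size:
            current = merged  # still fits: keep growing the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:  # the piece alone is too long: split it with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, remaining))
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


sample = "Paragraph one is here.\n\nParagraph two is a bit longer than one."
print(recursive_split(sample, 30, ["\n\n", "\n", " ", ""]))
# ['Paragraph one is here.', 'Paragraph two is a bit longer', 'than one.']
```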





In [15]:
# additional imports
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

# configs
chunk_size = 26
chunk_overlap = 4

#  splitters
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
c_splitter = CharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)

In [16]:
# text smaller than chunk size
text1 = 'abcdefghijklmnopqrstuvwxyz'
r_splitter.split_text(text1)

['abcdefghijklmnopqrstuvwxyz']

In [24]:
# text longer than chunk size: two chunks that share a 4-character overlap ('wxyz')
text2 = 'abcdefghijklmnopqrstuvwxyzabcdefg'
r_splitter.split_text(text2)

['abcdefghijklmnopqrstuvwxyz', 'wxyzabcdefg']

In [18]:
some_text = """When writing documents, writers will use document structure to group content. \
This can convey to the reader, which idea's are related. For example, closely related ideas \
are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. \n\n  \
Paragraphs are often delimited with a carriage return or two carriage returns. \
Carriage returns are the "backslash n" you see embedded in this string. \
Sentences have a period at the end, but also, have a space.\
and words are separated by space."""
len(some_text)

496

In [19]:
c_splitter = CharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0,
    separator = ' '
)
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0, 
    separators=["\n\n", "\n", " ", ""]
)

In [20]:
c_splitter.split_text(some_text)

['When writing documents, writers will use document structure to group content. This can convey to the reader, which idea\'s are related. For example, closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. \n\n Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this string. Sentences have a period at the end, but also,',
 'have a space.and words are separated by space.']

In [21]:
r_splitter.split_text(some_text)

["When writing documents, writers will use document structure to group content. This can convey to the reader, which idea's are related. For example, closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document.",
 'Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this string. Sentences have a period at the end, but also, have a space.and words are separated by space.']

In [22]:
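# smaller chunk size, plus a separator for sentence ends
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=0,
    separators=["\n\n", "\n", r"\. ", " ", ""],
    # is_separator_regex=True,  # required for r"\. " to be treated as a regex; without it the
    #                           # pattern is matched literally, so the split falls back to spaces
)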
r_splitter.split_text(some_text)

["When writing documents, writers will use document structure to group content. This can convey to the reader, which idea's are related. For example,",
 'closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document.',
 'Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this',
 'string. Sentences have a period at the end, but also, have a space.and words are separated by space.']

In [23]:
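# lookbehind variant: meant to keep the period at the end of the previous chunk
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=0,
    separators=["\n\n", "\n", r"(?<=\. )", " ", ""],
    # is_separator_regex=True,  # also required here; without it the output is identical to the previous cell
)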
r_splitter.split_text(some_text)

["When writing documents, writers will use document structure to group content. This can convey to the reader, which idea's are related. For example,",
 'closely related ideas are in sentances. Similar ideas are in paragraphs. Paragraphs form a document.',
 'Paragraphs are often delimited with a carriage return or two carriage returns. Carriage returns are the "backslash n" you see embedded in this',
 'string. Sentences have a period at the end, but also, have a space.and words are separated by space.']

In [24]:
# try with PDF

from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/MachineLearning-Lecture01.pdf")
pages = loader.load()

from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=150,
    length_function=len
)
docs = text_splitter.split_documents(pages)

In [26]:
# number of split documents
print(len(docs))
# number of original pages
print(len(pages))


77
22


In [27]:
# Notion DB example

from langchain.document_loaders import NotionDirectoryLoader
loader = NotionDirectoryLoader("docs/Notion_DB")
notion_db = loader.load()
docs = text_splitter.split_documents(notion_db)

In [28]:
# number of split Notion documents
print(len(docs))
# note: this prints the number of PDF pages loaded earlier; use len(notion_db) for the number of Notion documents
print(len(pages))


352
22


### TokenTextSplitter

In [29]:
from langchain.text_splitter import TokenTextSplitter
text_splitter = TokenTextSplitter(chunk_size=1, chunk_overlap=0)


In [30]:
text1 = "foo bar bazzyfoo"
text_splitter.split_text(text1)

['foo', ' bar', ' b', 'az', 'zy', 'foo']

In [43]:
text_splitter = TokenTextSplitter(chunk_size=10, chunk_overlap=0)
docs = text_splitter.split_documents(pages)

In [44]:
docs[0]

Document(page_content='MachineLearning-Lecture01  \n', metadata={'source': 'docs/MachineLearning-Lecture01.pdf', 'page': 0})

In [45]:
pages[0].metadata

{'source': 'docs/MachineLearning-Lecture01.pdf', 'page': 0}

### Context aware splitting
Chunking aims to keep text with a common context together.
Text splitting often uses sentences or other delimiters to keep related text together, but many documents (such as Markdown) have structure (headers) that can be used explicitly for splitting.
We can use `MarkdownHeaderTextSplitter` to preserve header metadata in our chunks, as shown below.
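The idea behind header-aware splitting can be sketched in plain Python. `split_on_headers` below is a hypothetical, simplified helper, not LangChain's `MarkdownHeaderTextSplitter`. It groups the lines under each header, records the headers currently in scope as metadata, and lets a new header close any open header at the same or a deeper level. It ignores code fences, so a `#` comment inside a fenced block would be misread as a header.

```python
def split_on_headers(
    markdown: str, headers: list[tuple[str, str]]
) -> list[tuple[str, dict[str, str]]]:
    """Split Markdown on the given header markers, attaching enclosing headers as metadata (sketch)."""
    level_of = {name: len(marker) for marker, name in headers}
    active: dict[str, str] = {}  # metadata key -> header text currently in scope
    buffer: list[str] = []
    chunks: list[tuple[str, dict[str, str]]] = []

    def flush() -> None:
        content = "\n".join(buffer).strip()
        if content:
            chunks.append((content, dict(active)))
        buffer.clear()

    for raw in markdown.splitlines():
        line = raw.strip()
        for marker, name in headers:
            if line.startswith(marker + " "):
                flush()  # the previous section ends at a new header
                for key in [k for k in active if level_of[k] >= len(marker)]:
                    del active[key]  # close headers at the same or a deeper level
                active[name] = line[len(marker):].strip()
                break
        else:
            if line:
                buffer.append(line)
    flush()
    return chunks


sample_md = "# Title\n\n## Chapter 1\n\nHi this is Jim\n\nHi this is Joe\n\n### Section\n\nHi this is Lance\n\n## Chapter 2\n\nHi this is Molly"
header_levels = [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")]
result = split_on_headers(sample_md, header_levels)
for content, metadata in result:
    print(metadata, repr(content))
```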

In [31]:
from langchain.document_loaders import NotionDirectoryLoader
from langchain.text_splitter import MarkdownHeaderTextSplitter
markdown_document = """# Title\n\n \
## Chapter 1\n\n \
Hi this is Jim\n\n Hi this is Joe\n\n \
### Section \n\n \
Hi this is Lance \n\n 
## Chapter 2\n\n \
Hi this is Molly"""

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]

In [32]:
markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on
)
md_header_splits = markdown_splitter.split_text(markdown_document)

In [49]:
md_header_splits[0]

Document(page_content='Hi this is Jim  \nHi this is Joe', metadata={'Header 1': 'Title', 'Header 2': 'Chapter 1'})

In [50]:
md_header_splits[1]

Document(page_content='Hi this is Lance', metadata={'Header 1': 'Title', 'Header 2': 'Chapter 1', 'Header 3': 'Section'})

In [51]:
# real markdown file
loader = NotionDirectoryLoader("docs/Notion_DB")
docs = loader.load()
txt = ' '.join([d.page_content for d in docs])

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
]
markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on
)


In [52]:
md_header_splits = markdown_splitter.split_text(txt)

In [53]:
md_header_splits[0]

Document(page_content="This is a living document with everything we've learned working with people while running a startup. And, of course, we continue to learn. Therefore it's a document that will continue to change.  \n**Everything related to working at Blendle and the people of Blendle, made public.**  \nThese are the lessons from three years of working with the people of Blendle. It contains everything from [how our leaders lead](https://www.notion.so/ecfb7e647136468a9a0a32f1771a8f52?pvs=21) to [how we increase salaries](https://www.notion.so/Salary-Review-e11b6161c6d34f5c9568bb3e83ed96b6?pvs=21), from [how we hire](https://www.notion.so/Hiring-451bbcfe8d9b49438c0633326bb7af0a?pvs=21) and [fire](https://www.notion.so/Firing-5567687a2000496b8412e53cd58eed9d?pvs=21) to [how we think people should give each other feedback](https://www.notion.so/Our-Feedback-Process-eb64f1de796b4350aeab3bc068e3801f?pvs=21) — and much more.  \nWe've made this document public because we want to learn fro

## Vector stores

After splitting content into small, semantically meaningful chunks, these chunks need to be put into an index so that we can easily retrieve them when it comes time to answer questions about this corpus of data.

An embedding is a numerical representation of a piece of text. Texts with similar content have similar vectors in this numeric space.

<img src="img/VectorStore.png"  width="500" />


In [33]:
from langchain.document_loaders import PyPDFLoader

# Load PDF
loaders = [
    # Duplicate documents on purpose - messy data
    PyPDFLoader("docs/MachineLearning-Lecture01.pdf"),
    PyPDFLoader("docs/MachineLearning-Lecture01.pdf"),
    PyPDFLoader("docs/MachineLearning-Lecture02.pdf"),
    PyPDFLoader("docs/MachineLearning-Lecture03.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

In [34]:
# split data
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)
splits = text_splitter.split_documents(docs)

In [35]:
len(splits)

209

### Embeddings

Embeddings are commonly used for:

- Search (where results are ranked by relevance to a query string)
- Clustering (where text strings are grouped by similarity)
- Recommendations (where items with related text strings are recommended)
- Anomaly detection (where outliers with little relatedness are identified)
- Diversity measurement (where similarity distributions are analyzed)
- Classification (where text strings are classified by their most similar label)

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness: small distances suggest high relatedness and large distances suggest low relatedness.
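Relatedness is commonly measured with cosine similarity, which compares the direction of two vectors and ranges from -1 (opposite) to 1 (same direction). A minimal sketch in plain Python follows; the 2-D vectors are made up for illustration, whereas real embeddings have hundreds or thousands of dimensions. For vectors of length 1 (OpenAI embeddings are normalized this way), cosine similarity equals the plain dot product, so the `np.dot` comparison in the next cell amounts to cosine similarity.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|): 1 = same direction, 0 = orthogonal, -1 = opposite."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))          # 24 / 25 = 0.96
print(dot(normalize(a), normalize(b)))  # about the same value once both vectors have length 1
```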

In [36]:
from langchain_openai import OpenAIEmbeddings
import numpy as np
embedding = OpenAIEmbeddings()
sentence1 = "i like dogs"
sentence2 = "i like canines"
sentence3 = "the weather is ugly outside"

embedding1 = embedding.embed_query(sentence1)
embedding2 = embedding.embed_query(sentence2)
embedding3 = embedding.embed_query(sentence3)

In [37]:
# dot product to compare embeddings
# OpenAI embeddings are normalized to length 1, so the dot product equals the cosine similarity
# (range -1 to 1; a higher number means more similar)
np.dot(embedding1, embedding2)

0.9631675619330522

In [25]:
np.dot(embedding1, embedding3)

0.7710630976675937

In [17]:
np.dot(embedding2, embedding3)

0.7596682675219122

### Vectorstores

LangChain integrates with over 30 different vector stores. Here [Chroma DB](https://www.trychroma.com) is used. By default, Chroma embeds documents with Sentence Transformers, but you can also use OpenAI embeddings, Cohere (multilingual) embeddings, or your own.

In this case, the [embeddings from OpenAI](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) are used.
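Conceptually, a vector store keeps each chunk's text and metadata next to its embedding and answers a query by ranking the stored vectors by similarity. The class below is a hypothetical, brute-force in-memory sketch; Chroma itself uses an approximate nearest-neighbour index (HNSW) and a persistent database. To stay self-contained, the sketch takes made-up 2-D vectors instead of calling an embedding model, and it normalizes vectors on insert so that a dot product gives the cosine similarity.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


def _unit(v: list[float]) -> list[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


@dataclass
class TinyVectorStore:
    """Hypothetical brute-force store: text, metadata and a unit-length vector per chunk."""

    texts: list[str] = field(default_factory=list)
    metadatas: list[dict] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, text: str, vector: list[float], metadata: Optional[dict] = None) -> None:
        self.texts.append(text)
        self.metadatas.append(metadata or {})
        self.vectors.append(_unit(vector))  # unit vectors: dot product == cosine similarity

    def similarity_search(self, query_vector: list[float], k: int = 4) -> list[tuple[str, float]]:
        q = _unit(query_vector)
        scored = [
            (text, sum(a * b for a, b in zip(q, vec)))
            for text, vec in zip(self.texts, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        return scored[:k]


store = TinyVectorStore()
store.add("Email the TAs at the course address.", [0.9, 0.1], {"source": "lecture01"})
store.add("Homework uses MATLAB or Octave.", [0.1, 0.9], {"source": "lecture01"})
store.add("Office hours are posted online.", [0.7, 0.3], {"source": "lecture02"})
for text, score in store.similarity_search([1.0, 0.0], k=2):
    print(f"{score:.3f}  {text}")
```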

In [18]:
#!rm -rf ./docs/chroma/*  # remove old database files if any

In [38]:
from langchain.vectorstores import Chroma
persist_directory = 'docs/chroma/'
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding,
    persist_directory=persist_directory
)

In [39]:
print(vectordb._collection.count())

209


In [40]:
# similarity search
question = "is there an email i can ask for help"
docs = vectordb.similarity_search(question,k=3)
len(docs)

3

In [41]:
docs[0].page_content

"cs229-qa@cs.stanford.edu. This goes to an acc ount that's read by all the TAs and me. So \nrather than sending us email individually, if you send email to this account, it will \nactually let us get back to you maximally quickly with answers to your questions.  \nIf you're asking questions about homework probl ems, please say in the subject line which \nassignment and which question the email refers to, since that will also help us to route \nyour question to the appropriate TA or to me  appropriately and get the response back to \nyou quickly.  \nLet's see. Skipping ahead — let's see — for homework, one midterm, one open and term \nproject. Notice on the honor code. So one thi ng that I think will help you to succeed and \ndo well in this class and even help you to enjoy this cla ss more is if you form a study \ngroup.  \nSo start looking around where you' re sitting now or at the end of class today, mingle a \nlittle bit and get to know your classmates. I strongly encourage you to f

In [42]:
# with Chroma >= 0.4 documents are persisted automatically and persist() is deprecated
vectordb.persist()

### Failure Modes

- Notice that we're getting duplicate chunks (because MachineLearning-Lecture01.pdf is in the index twice). Semantic search fetches all similar documents but does not enforce diversity.
- The second example shows that although the question is only about the third lecture, the results include chunks from other lectures as well.
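One simple mitigation for the first failure mode, sketched below, is to drop exact duplicates before indexing. `dedupe_chunks` is a hypothetical helper operating on made-up `(text, metadata)` pairs rather than LangChain `Document` objects: it keys each chunk on a SHA-256 hash of its text and keeps the first occurrence. It only catches identical text, not near-duplicates, and it does nothing about the second failure mode, which the metadata filtering and self-query sections below address.

```python
import hashlib


def dedupe_chunks(chunks: list[tuple[str, dict]]) -> list[tuple[str, dict]]:
    """Drop chunks whose text exactly matches an earlier chunk; keep the first copy."""
    seen: set[str] = set()
    unique = []
    for text, metadata in chunks:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()  # metadata is ignored
        if key not in seen:
            seen.add(key)
            unique.append((text, metadata))
    return unique


chunks = [
    ("Homework will be done in MATLAB or Octave.", {"copy": 1}),
    ("Homework will be done in MATLAB or Octave.", {"copy": 2}),  # same PDF loaded twice
    ("Locally weighted regression.", {"copy": 1}),
]
print(dedupe_chunks(chunks))
```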

In [43]:
question = "what did they say about matlab?"
docs = vectordb.similarity_search(question,k=5)

In [37]:
docs[0]


In [44]:
# duplicate documents
docs[1]

Document(page_content='those homeworks will be done in either MATLA B or in Octave, which is sort of — I \nknow some people call it a free ve rsion of MATLAB, which it sort  of is, sort of isn\'t.  \nSo I guess for those of you that haven\'t s een MATLAB before, and I know most of you \nhave, MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to \nplot data. And it\'s sort of an extremely easy to  learn tool to use for implementing a lot of \nlearning algorithms.  \nAnd in case some of you want to work on your  own home computer or something if you \ndon\'t have a MATLAB license, for the purposes of  this class, there\'s also — [inaudible] \nwrite that down [inaudible] MATLAB — there\' s also a software package called Octave \nthat you can download for free off the Internet. And it has somewhat fewer features than MATLAB, but it\'s free, and for the purposes of  this class,

In [45]:
question = "what did they say about regression in the third lecture?"
docs = vectordb.similarity_search(question,k=5)
for doc in docs:
    print(doc.metadata)

{'page': 0, 'source': 'docs/MachineLearning-Lecture03.pdf'}
{'page': 14, 'source': 'docs/MachineLearning-Lecture03.pdf'}
{'page': 4, 'source': 'docs/MachineLearning-Lecture03.pdf'}
{'page': 0, 'source': 'docs/MachineLearning-Lecture02.pdf'}
{'page': 6, 'source': 'docs/MachineLearning-Lecture03.pdf'}


In [46]:
print(docs[4].page_content)

data sets as well. So don’t want to talk about  that. If you’re interested, look up the work 
of Andrew Moore on KD-trees. He, sort of, fi gured out ways to fit these models much 
more efficiently. That’s not something I want  to go into today. Okay? Let me move one. 
Let’s take more questions later.  
So, okay. So that’s locally weighted regres sion. Remember the outline I had, I guess, at 
the beginning of this lecture. What I want to do now is talk about a probabilistic interpretation of linear regres sion, all right? And in partic ular of the – it’ll be this 
probabilistic interpretati on that let’s us move on to talk  about logistic regression, which 
will be our first classification algorithm. So le t’s put aside locally weighted regression for 
now. We’ll just talk about ordinary unwei ghted linear regression. Let’s ask the question 
of why least squares, right? Of all the thi ngs we could optimize how do we come up with 
this criteria for minimizing the square of  the area betw

## Retrieval

When you ingest data into your document storage system, you often don’t know which specific queries will be used to retrieve those documents. Building on the previous section, we want to retrieve the most relevant splits. Retrieval for LLM applications is still a relatively new and fast-moving area.

Problems with retrieving irrelevant or redundant splits:
- They might distract the LLM from the relevant information.
- They take up precious context space that could be used for other relevant information.

Techniques:

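- Maximum Marginal Relevance (MMR): select documents that are similar to the query in the embedding space but also diverse from one another.
- Self query: let an LLM turn the question into a semantic query plus a metadata filter.
- Compression: extract only the query-relevant parts of the retrieved documents.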

In [48]:
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings  # importing OpenAIEmbeddings from langchain.embeddings is deprecated
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding
)
print(vectordb._collection.count())

209


In [49]:
texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]
# add documents to database 
smalldb = Chroma.from_texts(texts, embedding=embedding)

### Diversity


<img src="img/MMR.png" width="500" />
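MMR can be sketched as a greedy loop: each step picks the candidate with the best trade-off between relevance to the query and redundancy with the documents already chosen, scored as `lambda_mult * sim(query, d) - (1 - lambda_mult) * max sim(d, selected)`. This is a simplified, hypothetical sketch on made-up 3-D vectors, similar in spirit to the reranking LangChain applies to the `fetch_k` nearest candidates; `lambda_mult=0.5` mirrors LangChain's default, where 1 means pure relevance and 0 pure diversity. Two candidates are identical, mimicking the duplicated lecture.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: list[float], candidates: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return the indices of k candidates selected by Maximal Marginal Relevance (sketch)."""
    relevance = [cosine(query, c) for c in candidates]
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # redundancy = similarity to the closest already-selected candidate
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0, 0.0]
candidates = [
    [0.8, 0.6, 0.0],  # relevant
    [0.8, 0.6, 0.0],  # exact duplicate of the first candidate
    [0.6, 0.0, 0.8],  # a bit less relevant, but about something else
]
top2_by_similarity = sorted(range(3), key=lambda i: cosine(query, candidates[i]), reverse=True)[:2]
print(top2_by_similarity)           # [0, 1]: the duplicate comes back
print(mmr(query, candidates, k=2))  # [0, 2]: MMR skips the duplicate
```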


In [50]:
# challenge: the question has more than one intent.
# it asks about a large fruiting body and being all-white,
# but we also want other information, like the fact that it's really poisonous.
question = "Tell me about all-white mushrooms with large fruiting bodies"
smalldb.similarity_search(question, k=2)
# k=2 returns the 2 most similar documents (the default is k=4);
# plain similarity search ranks purely by semantic similarity, so it does not enforce diversity


[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.'),
 Document(page_content='The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).')]

In [51]:
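# now MMR: fetch the fetch_k=3 most similar candidates, then pick k=2 of them
# that balance relevance to the question with diversity among themselves
smalldb.max_marginal_relevance_search(question, k=2, fetch_k=3)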

[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.'),
 Document(page_content='A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.')]

In [52]:
question = "what did they say about matlab?"
docs_ss = vectordb.similarity_search(question,k=3)
docs_ss[0].page_content[:100]

'those homeworks will be done in either MATLA B or in Octave, which is sort of — I \nknow some people '

In [53]:
docs_ss[1].page_content[:100]

'those homeworks will be done in either MATLA B or in Octave, which is sort of — I \nknow some people '

In [54]:
docs_mmr = vectordb.max_marginal_relevance_search(question,k=3)
docs_mmr[0].page_content[:100]

'those homeworks will be done in either MATLA B or in Octave, which is sort of — I \nknow some people '

In [55]:
docs_mmr[1].page_content[:100]

'algorithm then? So what’s different? How come  I was making all that noise earlier about \nleast squa'

### Specificity: working with metadata and filter

To get results from specific documents, vector stores support filtering on metadata, which provides context for each embedded chunk.
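To show what such a filter does, here is a simplified sketch of evaluating a metadata filter before similarity ranking. `matches` is a hypothetical helper supporting a Chroma-style subset of the syntax: a bare value means equality, and `$eq`, `$ne`, `$gt`, `$lt`, and `$in` compare a single field. Real vector stores evaluate filters inside the database and support more operators. The metadata values are taken from the outputs above.

```python
import operator

# supported comparison operators (a small subset, for illustration)
OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$in": lambda value, options: value in options,
}


def matches(metadata: dict, where: dict) -> bool:
    """True if every field condition in `where` holds for this chunk's metadata."""
    for field_name, condition in where.items():
        if field_name not in metadata:
            return False
        value = metadata[field_name]
        if isinstance(condition, dict):  # operator form, e.g. {"$gt": 5}
            if not all(OPS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:  # bare value: equality
            return False
    return True


chunks = [
    {"page": 0, "source": "docs/MachineLearning-Lecture03.pdf"},
    {"page": 14, "source": "docs/MachineLearning-Lecture03.pdf"},
    {"page": 0, "source": "docs/MachineLearning-Lecture02.pdf"},
]
third_lecture = [m for m in chunks if matches(m, {"source": "docs/MachineLearning-Lecture03.pdf"})]
later_pages = [m for m in chunks if matches(m, {"page": {"$gt": 5}})]
print(len(third_lecture), later_pages)  # 2 [{'page': 14, 'source': 'docs/MachineLearning-Lecture03.pdf'}]
```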


In [56]:
question = "what did they say about regression in the third lecture?"
docs = vectordb.similarity_search(
    question,
    k=3,
    filter={"source":"docs/MachineLearning-Lecture03.pdf"}
)
for d in docs:
    print(d.metadata)

{'page': 0, 'source': 'docs/MachineLearning-Lecture03.pdf'}
{'page': 14, 'source': 'docs/MachineLearning-Lecture03.pdf'}
{'page': 4, 'source': 'docs/MachineLearning-Lecture03.pdf'}


### Self Query retriever

This raises an interesting challenge: the metadata to filter on often has to be inferred from the query itself.
The retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.
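The output of the query-constructing step is essentially a pair: a semantic query string plus a metadata filter. In the retriever, the LLM produces that pair; the sketch below replaces the LLM with a hypothetical regex rule, purely to show the shape of a structured query for the example question used in this section. `construct_query` only recognizes phrases like "in the third lecture" and maps them to the `source` paths used in this notebook.

```python
import re

ORDINALS = {"first": 1, "second": 2, "third": 3}


def construct_query(question: str) -> tuple[str, dict[str, str]]:
    """Toy stand-in for the LLM query constructor: returns (semantic query, metadata filter)."""
    match = re.search(r"\bin the (first|second|third) lecture\b", question)
    if match is None:
        return question, {}  # nothing to filter on: plain semantic search
    number = ORDINALS[match.group(1)]
    # remove the lecture reference from the text used for semantic similarity
    semantic = question[: match.start()] + question[match.end():]
    semantic = re.sub(r"\s+", " ", semantic).strip().replace(" ?", "?")
    return semantic, {"source": f"docs/MachineLearning-Lecture{number:02d}.pdf"}


print(construct_query("what did they say about regression in the third lecture?"))
# ('what did they say about regression?', {'source': 'docs/MachineLearning-Lecture03.pdf'})
```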

The only class method for the self query base class is from_llm. There are eight specified parameters and one to allow us to pass keyword arguments (kwargs). Four of them are required parameters for creating a self query class: llm, vectorstore, document_contents, and metadata_field_info.

- `llm` is for passing a language model.
- `vectorstore` is used to pass a vector store like Chroma. 
- `document_contents` parameter is a bit misleading. It doesn’t refer to the actual contents of the stored documents but rather a short description of them.
- `metadata_field_info` is a sequence of `AttributeInfo` objects (or equivalent dictionaries) describing the metadata fields stored in the vector database.

There are four optional parameters as well: `structured_query_translator`, `chain_kwargs`, `enable_limit`, and `use_original_query`. The first two default to None, and the last two to False.

The `structured_query_translator` parameter lets us pass in a translator. Translators convert the structured query into filter statements for a specific vector database. The translator's allowed comparators and operators are passed into `chain_kwargs` as `allowed_comparators` and `allowed_operators`, because each vector store supports its own set of comparators and operators.

<img src="img/Self.png" width="500" />

In [65]:
from langchain_openai import OpenAI # langchain.llms OpenAI is deprecated 
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The lecture the chunk is from, should be one of `docs/MachineLearning-Lecture01.pdf`, `docs/MachineLearning-Lecture02.pdf`, or `docs/MachineLearning-Lecture03.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page from the lecture",
        type="integer",
    ),
]
# until January 4, 2024, langchain.llms.OpenAI defaulted to text-davinci-003, which is now deprecated;
# gpt-3.5-turbo-instruct is the recommended replacement
document_content_description = "Lecture notes"
llm = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True
)


```python
question = "what did they say about regression in the third lecture?"
```

```python
docs = retriever.get_relevant_documents(question)
for d in docs:
    print("Metadata", d.metadata)
```

```
Metadata {'page': 14, 'source': 'docs/MachineLearning-Lecture03.pdf'}
Metadata {'page': 0, 'source': 'docs/MachineLearning-Lecture03.pdf'}
Metadata {'page': 10, 'source': 'docs/MachineLearning-Lecture03.pdf'}
Metadata {'page': 10, 'source': 'docs/MachineLearning-Lecture03.pdf'}
```
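All four results come from Lecture03, presumably because the structured query generated from the question includes a filter on `source` that narrows the candidates before similarity ranking. A hypothetical sketch of that narrowing step, evaluating a Chroma-style `where` filter against toy metadata dicts in plain Python; a real vector store applies the filter inside the database, and the metadata values below are made up for illustration.

```python
import operator

# Chroma-style comparison operators mapped to Python functions
OPS = {"$eq": operator.eq, "$ne": operator.ne, "$gt": operator.gt,
       "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def matches(metadata: dict, where: dict) -> bool:
    """Return True if a metadata dict satisfies every condition in a `where` filter."""
    for key, cond in where.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        else:
            # cond looks like {"$eq": value}; a missing field never matches
            ok = key in metadata and all(OPS[op](metadata[key], value) for op, value in cond.items())
        if not ok:
            return False
    return True


# Toy metadata for three chunks (illustrative values)
chunks = [
    {"source": "docs/MachineLearning-Lecture01.pdf", "page": 3},
    {"source": "docs/MachineLearning-Lecture03.pdf", "page": 14},
    {"source": "docs/MachineLearning-Lecture03.pdf", "page": 0},
]
where = {"source": {"$eq": "docs/MachineLearning-Lecture03.pdf"}}
print([m["page"] for m in chunks if matches(m, where)])  # [14, 0]
```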


### Compression

One challenge with retrieval is that you usually don't know which queries your document storage system will face when you ingest data into it. As a result, the information most relevant to a query may be buried in a document with a lot of irrelevant text. Compressors make it easy to pass only the relevant information to the LLM. LangChain provides the `DocumentCompressor` abstraction, whose `compress_documents(documents: List[Document], query: str)` method runs on the retrieved documents. Instead of returning the retrieved documents as-is, the compressor uses the context of the query to shorten them or drop them, so that only the relevant information is returned.

- `LLMChainExtractor` uses an `LLMChain` to extract from each document only the statements that are relevant to the query.
- `EmbeddingsFilter` embeds both the retrieved documents and the query and filters out any documents whose embeddings aren’t sufficiently similar to the embedded query.
- `DocumentCompressorPipeline` makes it easy to create a pipeline of transformations and compressors and run them in sequence. For example, you may combine a `TextSplitter` and an `EmbeddingsFilter` to first break your documents into smaller pieces and then filter out the pieces that are not relevant to the query.
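The split-then-filter idea behind `DocumentCompressorPipeline` (a `TextSplitter` followed by an `EmbeddingsFilter`) can be sketched without LangChain. This is a hypothetical illustration, not LangChain's implementation: `embed` is a toy bag-of-words counter standing in for a real embedding model, and the similarity threshold of 0.1 is an arbitrary choice for this toy data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_sentences(doc: str) -> list:
    """Transformation step: break a document into sentence-sized pieces."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def embeddings_filter(pieces: list, query: str, threshold: float) -> list:
    """Compression step: keep only pieces similar enough to the query."""
    q = embed(query)
    return [p for p in pieces if cosine(embed(p), q) >= threshold]


def compress(docs: list, query: str, threshold: float = 0.1) -> list:
    """Run the steps in sequence: split every document, then filter the pieces."""
    pieces = [piece for doc in docs for piece in split_sentences(doc)]
    return embeddings_filter(pieces, query, threshold)


docs = ["The homeworks will be done in MATLAB or Octave. Form study groups early.",
        "Octave is free. The class has a final project."]
print(compress(docs, "what did they say about matlab?"))
# ['The homeworks will be done in MATLAB or Octave.']
```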

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

def pretty_print_docs(docs):
    """Print documents separated by a horizontal rule of dashes."""
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

# LLM the extractor uses to pull the relevant statements out of each document
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)
```

```python
# Wrap the vector store retriever so that its results are compressed before being returned
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever()
)
question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)
```

```
Document 1:

- "those homeworks will be done in either MATLA B or in Octave"
- "I know some people call it a free ve rsion of MATLAB"
- "MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to plot data."
- "there's also a software package called Octave that you can download for free off the Internet."
- "it has somewhat fewer features than MATLAB, but it's free, and for the purposes of this class, it will work for just about everything."
- "once a colleague of mine at a different university, not at Stanford, actually teaches another machine learning course."
----------------------------------------------------------------------------------------------------
Document 2:

- "those homeworks will be done in either MATLA B or in Octave"
- "I know some people call it a free ve rsion of MATLAB"
- "MATLAB is I guess part of the programming language that makes it very easy to write 
```

```python
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr")
)
```

```python
question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)
```

```
Document 1:

- "those homeworks will be done in either MATLA B or in Octave"
- "I know some people call it a free ve rsion of MATLAB"
- "MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write code for numerical routines, to move data around, to plot data."
- "there's also a software package called Octave that you can download for free off the Internet."
- "it has somewhat fewer features than MATLAB, but it's free, and for the purposes of this class, it will work for just about everything."
- "once a colleague of mine at a different university, not at Stanford, actually teaches another machine learning course."
----------------------------------------------------------------------------------------------------
Document 2:

"Oh, it was the MATLAB."
----------------------------------------------------------------------------------------------------
Document 3:

- "learning algorithms"
- "use learning algorithms"
- "send mail via 
```
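In the first run, Document 1 and Document 2 appear to begin with the same extracted statements, while with `search_type="mmr"` the second result is different. Maximal marginal relevance (MMR) picks each next result by trading relevance to the query against similarity to the results already picked. A minimal sketch, assuming precomputed toy vectors and a trade-off weight `lambda_mult` (LangChain's MMR search exposes a parameter of that name, default 0.5); this is an illustration of the technique, not LangChain's code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Pick k document indices by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0, 0.0]
docs = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.2]]  # docs 0 and 1 are duplicates

# Plain similarity search returns the duplicate pair
top2 = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)[:2]
print(top2)              # [0, 1]
print(mmr(query, docs))  # [0, 2]
```

With `lambda_mult=1.0` the redundancy term vanishes and MMR reduces to plain similarity ranking.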

### Other types of retrieval

These retrievers don't use a vector database; instead they rely on more traditional NLP techniques such as TF-IDF or an SVM. In the example, the SVM retriever also gives some good results.

```python
# SVMRetriever and TFIDFRetriever need scikit-learn installed
from langchain.retrievers import SVMRetriever
from langchain.retrievers import TFIDFRetriever
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load PDF
loader = PyPDFLoader("docs/MachineLearning-Lecture01.pdf")
pages = loader.load()
all_page_text = [p.page_content for p in pages]
joined_page_text = " ".join(all_page_text)

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
splits = text_splitter.split_text(joined_page_text)
```
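`RecursiveCharacterTextSplitter` tries a list of separators (by default paragraph breaks, line breaks, then spaces) before cutting inside words, so its chunk boundaries follow the text structure. To see only what `chunk_size` and `chunk_overlap` do, here is a simplified, hypothetical sliding-window splitter that ignores separators entirely:

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence cut at
    one boundary still appears whole in the neighbouring chunk.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

In this sketch, `chunk_size=1500` and `chunk_overlap=150` would make consecutive windows start 1350 characters apart.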


```python
# Retrieve
svm_retriever = SVMRetriever.from_texts(splits, embedding)
tfidf_retriever = TFIDFRetriever.from_texts(splits)
```

```python
question = "What are major topics for this class?"
docs_svm = svm_retriever.get_relevant_documents(question)
docs_svm[0]
```

```
Document(page_content="let me just check what questions you have righ t now. So if there are no questions, I'll just \nclose with two reminders, which are after class today or as you start to talk with other \npeople in this class, I just encourage you again to start to form project partners, to try to \nfind project partners to do your project with. And also, this is a good time to start forming \nstudy groups, so either talk to your friends  or post in the newsgroup, but we just \nencourage you to try to star t to do both of those today, okay? Form study groups, and try \nto find two other project partners.  \nSo thank you. I'm looking forward to teaching this class, and I'll see you in a couple of \ndays.   [End of Audio]  \nDuration: 69 minutes")
```

```python
question = "what did they say about matlab?"
docs_tfidf = tfidf_retriever.get_relevant_documents(question)
docs_tfidf[0]
```

```
Document(page_content="Saxena and Min Sun here did, wh ich is given an image like this, right? This is actually a \npicture taken of the Stanford campus. You can apply that sort of cl ustering algorithm and \ngroup the picture into regions. Let me actually blow that up so that you can see it more \nclearly. Okay. So in the middle, you see the lines sort of groupi ng the image together, \ngrouping the image into [inaudible] regions.  \nAnd what Ashutosh and Min did was they then  applied the learning algorithm to say can \nwe take this clustering and us e it to build a 3D model of the world? And so using the \nclustering, they then had a lear ning algorithm try to learn what the 3D structure of the \nworld looks like so that they could come up with a 3D model that you can sort of fly \nthrough, okay? Although many people used to th ink it's not possible to take a single \nimage and build a 3D model, but using a lear ning algorithm and that sort of clustering \nalgorithm is the first ste
```
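TF-IDF ranks chunks purely by exact term overlap, weighting each term by how rare it is across the chunks, so it has no notion of meaning. A minimal sketch of TF-IDF ranking in plain Python, for illustration only: LangChain's `TFIDFRetriever` uses scikit-learn's `TfidfVectorizer` instead, whose weighting and normalization details differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(chunks, query, k=1):
    """Return the k chunks with the highest summed TF-IDF weight for the query terms."""
    tokenized = [tokenize(c) for c in chunks]
    n = len(chunks)
    # Document frequency: number of chunks containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    def score(toks):
        if not toks:
            return 0.0
        tf = Counter(toks)
        # A term found in every chunk gets idf = log(1) = 0 and adds nothing
        return sum((tf[t] / len(toks)) * math.log(n / df[t]) for t in query_terms if t in tf)

    order = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    return [chunks[i] for i in order[:k]]


chunks = ["We will program in MATLAB or Octave.",
          "Form study groups with other students.",
          "Octave is a free alternative."]
print(tfidf_rank(chunks, "what did they say about matlab?"))
# ['We will program in MATLAB or Octave.']
```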

## Question Answering

<img src="img/QandA.png" width="500" />

The general flow is:
- question comes in
- look up the relevant documents
- pass those splits along with a system prompt and the human question to the language model 
- get the answer.
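The four steps of this flow can be sketched in plain Python. This is a hypothetical illustration, not LangChain's chain: `retrieve` is a toy word-overlap ranking standing in for the vector store lookup, and `llm` is any callable that takes a prompt string, so the example runs without an API key.

```python
import re


def words(text):
    """Set of lowercase words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, splits, k=2):
    """Look up relevant splits (toy version: rank by number of shared words)."""
    return sorted(splits, key=lambda s: len(words(s) & words(question)), reverse=True)[:k]


def build_prompt(question, docs):
    """Combine a system prompt, the retrieved splits and the human question."""
    context = "\n\n".join(docs)
    return ("System: Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


def answer(question, splits, llm):
    """Question in -> retrieve -> build prompt -> one LLM call -> answer out."""
    return llm(build_prompt(question, retrieve(question, splits)))


splits = ["Probability and statistics are prerequisites.",
          "Homeworks use MATLAB or Octave.",
          "The class ends with a project."]
prompts = []


def fake_llm(prompt):
    """Stand-in for a real model: records the prompt and returns a fixed reply."""
    prompts.append(prompt)
    return "stub answer"


print(answer("Is probability a prerequisite?", splits, fake_llm))  # stub answer
```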

By default, all retrieved chunks are passed in the same context window, i.e. in a single call to the language model. This "stuff" technique stuffs all the documents into the final prompt, so there is only one call to the language model. If there are too many documents to fit into the context window, a few other methods can be used:

<img src="img/QundAMethods.png" width="500" />

With the map-reduce technique, each document is first sent to the language model on its own to get an individual answer. Those answers are then combined into a final answer with one more call to the language model.
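In LangChain the technique is chosen with the `chain_type` argument of `RetrievalQA.from_chain_type` (for example `chain_type="map_reduce"`). The difference in the number of model calls can be sketched with a stub `counting_llm` in place of a real model; the prompt wording below is illustrative, not LangChain's built-in prompts. For n documents, "stuff" makes one call and map-reduce makes n + 1.

```python
def stuff(question, docs, llm):
    """'stuff': every document goes into one prompt, so exactly one LLM call."""
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def map_reduce(question, docs, llm):
    """'map_reduce': one call per document (map), then one call to combine (reduce)."""
    partial_answers = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    combined = "\n".join(partial_answers)
    return llm(f"Combine these answers into one:\n{combined}\n\nQuestion: {question}")


calls = []


def counting_llm(prompt):
    """Stub model: records each call and returns a numbered placeholder answer."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


docs = ["chunk one", "chunk two", "chunk three"]
stuff("What are the topics?", docs, counting_llm)
print(len(calls))  # 1
calls.clear()
map_reduce("What are the topics?", docs, counting_llm)
print(len(calls))  # 4
```

Map-reduce trades more calls (and latency) for the ability to handle more text than one context window holds.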


```python
import datetime
# Pick the model name: the snapshot gpt-3.5-turbo-0301 was used before 2 September 2023
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"

from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
# Reopen the persisted Chroma store with the same embedding model used to build it
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

print(llm_name)
```

```
gpt-3.5-turbo
```


```python
print(vectordb._collection.count())
```

```
209
```

```python
question = "What are major topics for this class?"
docs = vectordb.similarity_search(question, k=3)
len(docs)
```

```
3
```

```python
from langchain_openai import ChatOpenAI
# temperature=0 makes the output as deterministic as possible, which suits factual answers
llm = ChatOpenAI(model_name=llm_name, temperature=0)
```

### Retrieval Chain



```python
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)
```

```python
result = qa_chain({"query": question})
result["result"]
```

```
'The major topics for this class are machine learning and its various applications.'
```

```python
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
# Build the chain with the custom prompt
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
```
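With the default "stuff" chain type, the retrieved documents are joined and substituted for `{context}` in the prompt template, and the question for `{question}`. A minimal sketch of that substitution with plain `str.format`, assuming the documents are joined with a blank line; the two context strings are cleaned-up excerpts from the lecture, used only for illustration.

```python
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""

# Retrieved chunks (illustrative excerpts)
retrieved = ["I also assume familiarity with basic probability and statistics.",
             "Lastly, I also assume familiarity with basic linear algebra."]

# What the model would receive: context and question filled into the template
prompt = template.format(context="\n\n".join(retrieved), question="Is probability a class topic?")
print(prompt.splitlines()[-1])  # Helpful Answer:
```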

```python
question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]
```

```
'Yes, probability is a topic that will be covered in the class. Thanks for asking!'
```

```python
result["source_documents"][0]
```

```
Document(page_content="of this class will not be very program ming intensive, although we will do some \nprogramming, mostly in either MATLAB or Octa ve. I'll say a bit more about that later.  \nI also assume familiarity with basic proba bility and statistics. So most undergraduate \nstatistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna \nassume all of you know what ra ndom variables are, that all of you know what expectation \nis, what a variance or a random variable is. And in case of some of you, it's been a while \nsince you've seen some of this material. At some of the discussion sections, we'll actually \ngo over some of the prerequisites, sort of as  a refresher course under prerequisite class. \nI'll say a bit more about that later as well.  \nLastly, I also assume familiarity with basi c linear algebra. And again, most undergraduate \nlinear algebra courses are more than enough. So if you've taken courses like Math 51, \n103, Math 113 or 
```

## Chat

Document loading, splitting, storage, and retrieval, as well as how retrieval can be used to generate answers in Q&A with the `RetrievalQA` chain, have been introduced above.

<img src="img/ChatLangchain.png" width="400" />


```python
import os
import openai
import sys
sys.path.append('../..')

import panel as pn  # GUI
pn.extension()

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read local .env file

openai.api_key = os.environ['OPENAI_API_KEY']
llm_name = "gpt-3.5-turbo"
```

```python
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
```

```python
question = "What are major topics for this class?"
docs = vectordb.similarity_search(question, k=3)
len(docs)
```

```
3
```

```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name=llm_name, temperature=0)
llm.predict("Hello world!")
```

```
'Hello! How can I assist you today?'
```

```python
# Build prompt
from langchain.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

# Run chain
from langchain.chains import RetrievalQA
question = "Is probability a class topic?"
qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=vectordb.as_retriever(),
                                       return_source_documents=True,
                                       chain_type_kwargs={"prompt": QA_CHAIN_PROMPT})

result = qa_chain({"query": question})
result["result"]
```

```
'Yes, probability is a class topic assumed to be familiar to students, as mentioned by the instructor. Thanks for asking!'
```

### Memory in the conversation

```python
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)
```
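`ConversationBufferMemory` keeps the full conversation and hands it to the chain under `memory_key`; with `return_messages=True` it returns message objects rather than one formatted string. A hypothetical plain-Python stand-in showing that behaviour (the class and method names are illustrative, not LangChain's):

```python
class BufferMemorySketch:
    """Hypothetical stand-in for a buffer memory: keeps every exchange verbatim."""

    def __init__(self, memory_key="chat_history", return_messages=True):
        self.memory_key = memory_key
        self.return_messages = return_messages
        self.messages = []  # list of (role, text) tuples, oldest first

    def add_exchange(self, question, answer):
        """Store one human question and the AI answer to it."""
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))

    def load_variables(self):
        """Return the history under memory_key, as messages or as one string."""
        if self.return_messages:
            return {self.memory_key: list(self.messages)}
        lines = [f"{'Human' if role == 'human' else 'AI'}: {text}" for role, text in self.messages]
        return {self.memory_key: "\n".join(lines)}


memory = BufferMemorySketch()
memory.add_exchange("Is probability a class topic?", "Yes, it is assumed as a prerequisite.")
print(len(memory.load_variables()["chat_history"]))  # 2
```

Because nothing is ever dropped, the buffer grows with every turn, so very long chats eventually no longer fit into the model's context window.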

```python
from langchain.chains import ConversationalRetrievalChain
retriever = vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)
```

```python
question = "Is probability a class topic?"
result = qa({"question": question})
```

```python
result['answer']
```

```
'Yes, probability is a class topic in the course being described. The instructor assumes familiarity with basic probability and statistics as prerequisites for the class.'
```

```python
question = "why are those prerequesites needed?"
result = qa({"question": question})
```

```python
result['answer']
```

```
'Familiarity with basic probability and statistics is needed for the class because the course will involve concepts and techniques from these areas. Understanding random variables, expectations, variances, and other statistical concepts is essential for grasping the material covered in the course on machine learning. These prerequisites ensure that students have the necessary background knowledge to engage with the course content effectively.'
```

### Chatbot for your own documents

Try alternative memory and retriever models by changing the configuration in the `load_db` function and the `convchain` method. Panel and Param have many useful features and widgets you can use to extend the GUI.

You can pass in different prompt templates, not only for answering the question but also for rephrasing a follow-up question into a standalone question.
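Before retrieval, `ConversationalRetrievalChain` asks the LLM to rewrite a follow-up such as "why are those prerequesites needed?" into a standalone question using the chat history; the result is what the `cbfs` class stores in `db_query` from `generated_question`. A hypothetical sketch of how a `chat_history` list of `(question, answer)` tuples could be rendered into such a rephrasing prompt; the wording is illustrative, not necessarily LangChain's default, which can be overridden via the `condense_question_prompt` argument of `ConversationalRetrievalChain.from_llm`.

```python
def format_history(chat_history):
    """Render (human, ai) tuples as alternating Human/Assistant lines."""
    return "\n".join(f"Human: {human}\nAssistant: {ai}" for human, ai in chat_history)


# Illustrative rephrasing prompt
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, "
    "rephrase the follow up question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)


def condense_prompt(chat_history, question):
    """Build the prompt the LLM would receive for the rephrasing step."""
    return CONDENSE_TEMPLATE.format(chat_history=format_history(chat_history), question=question)


history = [("Is probability a class topic?",
            "Yes, basic probability and statistics are assumed as prerequisites.")]
print(condense_prompt(history, "why are those prerequesites needed?"))
```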

```python
from langchain_openai import OpenAIEmbeddings, ChatOpenAI  # langchain.chat_models is deprecated
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.document_loaders import PyPDFLoader
```

```python
def load_db(file, chain_type, k):
    # load documents
    loader = PyPDFLoader(file)
    documents = loader.load()
    # split documents
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    docs = text_splitter.split_documents(documents)
    # define embedding
    embeddings = OpenAIEmbeddings()
    # create vector database from data
    db = DocArrayInMemorySearch.from_documents(docs, embeddings)
    # define retriever, similarity search
    retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": k})
    # create a chatbot chain. Memory is managed externally.
    # llm_name is set in the setup cell of this section
    qa = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model_name=llm_name, temperature=0),
        chain_type=chain_type,
        retriever=retriever,
        return_source_documents=True,
        return_generated_question=True,
    )
    return qa
```


```python
import panel as pn
import param

class cbfs(param.Parameterized):
    """Chatbot state plus the panes shown in the dashboard."""
    chat_history = param.List([])
    answer = param.String("")
    db_query = param.String("")
    db_response = param.List([])

    def __init__(self, **params):
        super(cbfs, self).__init__(**params)
        self.panels = []
        self.loaded_file = "docs/MachineLearning-Lecture01.pdf"
        self.qa = load_db(self.loaded_file, "stuff", 4)

    def call_load_db(self, count):
        # file_input and button_load are widgets created in the next cell
        if count == 0 or file_input.value is None:  # init or no file specified
            return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")
        else:
            file_input.save("temp.pdf")  # local copy
            self.loaded_file = file_input.filename
            button_load.button_style = "outline"
            self.qa = load_db("temp.pdf", "stuff", 4)
            button_load.button_style = "solid"
        self.clr_history()
        return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")

    def convchain(self, query):
        if not query:
            return pn.WidgetBox(pn.Row('User:', pn.pane.Markdown("", width=600)), scroll=True)
        result = self.qa({"question": query, "chat_history": self.chat_history})
        self.chat_history.extend([(query, result["answer"])])
        self.db_query = result["generated_question"]
        self.db_response = result["source_documents"]
        self.answer = result['answer']
        self.panels.extend([
            pn.Row('User:', pn.pane.Markdown(query, width=600)),
            pn.Row('ChatBot:', pn.pane.Markdown(self.answer, width=600, styles={'background-color': '#F6F6F6'}))
        ])
        inp.value = ''  # clears loading indicator when cleared (inp is created in the next cell)
        return pn.WidgetBox(*self.panels, scroll=True)

    @param.depends('db_query')
    def get_lquest(self):
        if not self.db_query:
            return pn.Column(
                pn.Row(pn.pane.Markdown("Last question to DB:", styles={'background-color': '#F6F6F6'})),
                pn.Row(pn.pane.Str("no DB accesses so far"))
            )
        return pn.Column(
            pn.Row(pn.pane.Markdown("DB query:", styles={'background-color': '#F6F6F6'})),
            pn.pane.Str(self.db_query)
        )

    @param.depends('db_response')
    def get_sources(self):
        if not self.db_response:
            return
        rlist = [pn.Row(pn.pane.Markdown("Result of DB lookup:", styles={'background-color': '#F6F6F6'}))]
        for doc in self.db_response:
            rlist.append(pn.Row(pn.pane.Str(doc)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    @param.depends('convchain', 'clr_history')
    def get_chats(self):
        if not self.chat_history:
            return pn.WidgetBox(pn.Row(pn.pane.Str("No History Yet")), width=600, scroll=True)
        rlist = [pn.Row(pn.pane.Markdown("Current Chat History variable", styles={'background-color': '#F6F6F6'}))]
        for exchange in self.chat_history:
            rlist.append(pn.Row(pn.pane.Str(exchange)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    def clr_history(self, count=0):
        self.chat_history = []
        return
```

```python
# Create a chatbot
cb = cbfs()

file_input = pn.widgets.FileInput(accept='.pdf')
button_load = pn.widgets.Button(name="Load DB", button_type='primary')
button_clearhistory = pn.widgets.Button(name="Clear History", button_type='warning')
button_clearhistory.on_click(cb.clr_history)
inp = pn.widgets.TextInput(placeholder='Enter text here…')

bound_button_load = pn.bind(cb.call_load_db, button_load.param.clicks)
conversation = pn.bind(cb.convchain, inp)

jpg_pane = pn.pane.Image('./img/ChatWithYourData_Bot.jpg')

tab1 = pn.Column(
    pn.Row(inp),
    pn.layout.Divider(),
    pn.panel(conversation, loading_indicator=True, height=300),
    pn.layout.Divider(),
)
tab2 = pn.Column(
    pn.panel(cb.get_lquest),
    pn.layout.Divider(),
    pn.panel(cb.get_sources),
)
tab3 = pn.Column(
    pn.panel(cb.get_chats),
    pn.layout.Divider(),
)
tab4 = pn.Column(
    pn.Row(file_input, button_load, bound_button_load),
    pn.Row(button_clearhistory, pn.pane.Markdown("Clears chat history. Can use to start a new topic")),
    pn.layout.Divider(),
    pn.Row(jpg_pane.clone(width=400))
)
dashboard = pn.Column(
    pn.Row(pn.pane.Markdown('# ChatWithYourData_Bot')),
    pn.Tabs(('Conversation', tab1), ('Database', tab2), ('Chat History', tab3), ('Configure', tab4))
)
dashboard
```

# Code Llama

Starting from the foundation model Llama 2 (a decoder-only Transformer model, like the GPT models), Meta AI trained further on 500B tokens of training data, most of which was code.

From there, Code Llama comes in three variants (Code Llama, Code Llama - Python, and Code Llama - Instruct), each in four sizes.
The Code Llama models are free for research and commercial use.

<img src="img/Code-LLama.png" width="500px" />

Code Llama is a foundation model for code generation. The Code Llama models are trained using an infill objective and are designed for code completion within an IDE.


The Python version was trained on an additional dataset of 100B tokens of Python code. These models are intended for code generation.

Resource requirements for the 7B model: using 16-bit half precision for the parameters, the weights alone need about 14 GB of GPU memory. With 4-bit quantization, this drops to about 3.5 GB.
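These figures follow from parameters × bytes per parameter. A small sketch of that arithmetic, using 1 GB = 10^9 bytes; it counts only the weights, so activations, the KV cache, and quantization overhead (such as scaling constants) come on top in practice.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
# 7B @ 16-bit: 14.0 GB
# 7B @ 8-bit: 7.0 GB
# 7B @ 4-bit: 3.5 GB
```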

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

class ChatModel:
    def __init__(self, model="codellama/CodeLlama-7b-Instruct-hf"):
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True, # use 4-bit quantization
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model,
            quantization_config=quantization_config,
            device_map="cuda",
            cache_dir="./models", # download model to the models folder
        )
        self.tokenizer = AutoTokenizer.from_pretrained(
            model, use_fast=True, padding_side="left"
        )