<a href="https://colab.research.google.com/github/jai-llm/RAG_Docs_LLaMA2/blob/main/RAG_HastieBooks_chromaDB_V3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG Colab Notebook
@jai-llm

14-Sep-2023

This notebook uses **R**etrieval **A**ugmented **G**eneration (RAG) to improve the results from the LLaMA2 model and reduce hallucinations.

It uses RAG to answer ML questions, with the freely available ISLP and ESL textbooks by Hastie et al. as context.

These books can be downloaded from Prof. Hastie's webpage at:

1. ISLP - https://hastie.su.domains/ISLP/ISLP_website.pdf
2. ESL - https://hastie.su.domains/Papers/ESLII.pdf

<br>

**Note**: This notebook needs at least a T4 GPU. Go to Edit -> Notebook settings and select T4 GPU as the hardware accelerator. The free tier of Google Colab gives you access to a T4 GPU. The notebook uses about 7.6 GB of GPU RAM (15 GB of GPU RAM is available on a T4 GPU).

## Install Packages

In [None]:
# Reading in PDF Files
!pip install -q -U pypdf
# Setting Up Vector Store
!pip install -q -U chromadb
# Using Llama-7b-GPTQ LLM model in HuggingFace
!pip install -q -U torch auto-gptq transformers optimum
# LangChain - Loading PDFs, Text Chunking, BGE Embeddings, Retrieval QA Chain
!pip install -q -U langchain sentence_transformers



## Do Imports

In [None]:
# Import torch
import torch

# Import for loading PDFs from Google Drive.
# Note: Not needed if GDrive is already mounted or we are using wget to get files from Web.
from google.colab import drive

# Imports to read PDF and setup Chroma Vector Store
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceBgeEmbeddings

# Imports for LLM
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.llms import HuggingFacePipeline
from langchain import PromptTemplate  #, LLMChain

# Imports for QA Retrieval Chain
from langchain.chains import RetrievalQA

# Import to Clean Up LLM Output
import textwrap

## Constants

In [None]:
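# LLaMA2 Context Window is 4096 tokens, so the retrieved chunks must fit well within it.
# Note: RecursiveCharacterTextSplitter measures chunk size in characters by default, not tokens.
# For QA we want a larger chunk size with some overlap to keep context across chunk boundaries.
CHUNK_SIZE, CHUNK_OVERLAP = 1000, 200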
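
To see why these constants leave room in the context window, here is a rough budget check. It is only a sketch: the 4-characters-per-token ratio and the 200-token prompt overhead are assumptions, not measurements, and text with many formulas may tokenize less efficiently. The values k = 5 and 512 new tokens match the retriever and pipeline settings used later in this notebook.

```python
CONTEXT_WINDOW = 4096   # LLaMA2 context window, in tokens
CHARS_PER_TOKEN = 4     # ASSUMPTION: rough average for English prose
PROMPT_OVERHEAD = 200   # ASSUMPTION: tokens for the system prompt and question


def estimated_total_tokens(k, chunk_size, max_new_tokens):
    """Estimate tokens used by k retrieved chunks, the prompt, and the answer."""
    context_tokens = (k * chunk_size) // CHARS_PER_TOKEN
    return context_tokens + PROMPT_OVERHEAD + max_new_tokens


def fits_in_context(k, chunk_size, max_new_tokens):
    """Return True if the estimated total stays within the context window."""
    return estimated_total_tokens(k, chunk_size, max_new_tokens) <= CONTEXT_WINDOW


print(estimated_total_tokens(k=5, chunk_size=1000, max_new_tokens=512))
print(fits_in_context(k=5, chunk_size=1000, max_new_tokens=512))
```

Under these assumptions, five 1000-character chunks use well under half of the window, so there is headroom to raise k or the chunk size if answers lack context.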

## 1.0 Get Books From Hastie's Website

**Note:** Run this section once to download and store the files.

In [None]:
!pwd

/content


In [None]:
# Do this only once
!mkdir -p /hastie_pdfs

In [None]:
# This overwrites files in hastie_pdfs but does not need to be executed more than once
!wget https://hastie.su.domains/ISLP/ISLP_website.pdf -O /hastie_pdfs/ISLP_website.pdf
!wget https://hastie.su.domains/Papers/ESLII.pdf -O /hastie_pdfs/ESLII.pdf

--2023-09-15 00:07:12--  https://hastie.su.domains/ISLP/ISLP_website.pdf
Resolving hastie.su.domains (hastie.su.domains)... 159.89.149.97
Connecting to hastie.su.domains (hastie.su.domains)|159.89.149.97|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20053984 (19M) [application/pdf]
Saving to: ‘/hastie_pdfs/ISLP_website.pdf’


2023-09-15 00:07:13 (61.6 MB/s) - ‘/hastie_pdfs/ISLP_website.pdf’ saved [20053984/20053984]

--2023-09-15 00:07:13--  https://hastie.su.domains/Papers/ESLII.pdf
Resolving hastie.su.domains (hastie.su.domains)... 159.89.149.97
Connecting to hastie.su.domains (hastie.su.domains)|159.89.149.97|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21644344 (21M) [application/pdf]
Saving to: ‘/hastie_pdfs/ESLII.pdf’


2023-09-15 00:07:13 (70.3 MB/s) - ‘/hastie_pdfs/ESLII.pdf’ saved [21644344/21644344]



## 2.0 Process PDF Files & Store in Chroma DB Vector Store

### Load Documents for RAG

In [None]:
# Mount the drive - Do this only if GDrive is not mounted
# drive.mount('/content/drive')

In [None]:
# If you get Files from Hastie's Website - Use this Code
# Note: This Takes about 3 mins
loader = DirectoryLoader('/hastie_pdfs/', glob="./*.pdf", loader_cls=PyPDFLoader)

documents = loader.load()

In [None]:
# # For Files Stored in Google Drive Folder - Do the following
# # Note: Loading from GDrive takes about 3 mins
# loader = DirectoryLoader('/content/drive/MyDrive/hastie_pdfs/',
#                          glob="./*.pdf", loader_cls=PyPDFLoader)

# documents = loader.load()

In [None]:
# Should get about 1377 pages if all goes well
len(documents)

1377

In [None]:
# LLaMA-2 LLM Context is 4096 tokens so split the documents into chunks
# Chunks are up to 1000 characters with a 200 character overlap
# Should get 4439 chunks if all goes well
text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE,
                                               chunk_overlap=CHUNK_OVERLAP)
texts = text_splitter.split_documents(documents)

len(texts)

4439
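
To make the role of the overlap concrete, here is a minimal sliding-window chunker. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to split on separators such as paragraph and line breaks); the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "".join(chr(97 + i % 26) for i in range(2600))
chunks = chunk_text(sample)
print(len(chunks))
print(chunks[0][-200:] == chunks[1][:200])
```

The last 200 characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully contained in at least one chunk.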

In [None]:
# Print Sample Chunk from Book
texts[20]

Document(page_content='7.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325\n8 Tree-Based Methods 331\n8.1 The Basics of Decision Trees . . . . . . . . . . . . . . . . . 331\n8.1.1 Regression Trees . . . . . . . . . . . . . . . . . . . 331\n8.1.2 Classification Trees . . . . . . . . . . . . . . . . . . 337\n8.1.3 Trees Versus Linear Models . . . . . . . . . . . . . 341\n8.1.4 Advantages and Disadvantages of Trees . . . . . . . 341\n8.2 Bagging, Random Forests, Boosting, and Bayesian Additive\nRegression Trees . . . . . . . . . . . . . . . . . . . . . . . . 343\n8.2.1 Bagging . . . . . . . . . . . . . . . . . . . . . . . . 343\n8.2.2 Random Forests . . . . . . . . . . . . . . . . . . . . 346\n8.2.3 Boosting . . . . . . . . . . . . . . . . . . . . . . . . 347\n8.2.4 Bayesian Additive Regression Trees . . . . . . . . . 350\n8.2.5 Summary of Tree Ensemble Methods . . . . . . . . 353\n8.3 Lab: Tree-Based Methods . . . . . . . . . . . . . . . . . . . 354', metadata={'sou

### Create Retriever Embeddings - HF BGE Embeddings
BGE Embeddings were at the top of the Hugging Face MTEB leaderboard at the time of writing (https://huggingface.co/spaces/mteb/leaderboard).

In [None]:
# BGE Embedding Model for Retrieval. Embedding Size is 768.
model_name = "BAAI/bge-base-en"
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity

model_embedding = HuggingFaceBgeEmbeddings(
                    model_name=model_name,
                    model_kwargs={'device': 'cuda'},
                    encode_kwargs=encode_kwargs
                  )
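
The `normalize_embeddings` setting matters because, for unit-length vectors, the dot product equals the cosine similarity. The sketch below checks this with plain Python lists; the vectors are made-up toy values, not BGE embeddings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
print(cosine_similarity(a, b))
print(dot(l2_normalize(a), l2_normalize(b)))
```

Both printed values agree up to floating-point rounding, which is why normalizing once at embedding time lets the vector store rank by a cheap dot product.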

### Create Vector DB Store Using Chroma DB


In [None]:
%%time
# Embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
# Creating Vector Store takes ~ 2 mins

persist_directory = 'db'

## Here are the new embeddings being used
embedding = model_embedding

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding,
                                 persist_directory=persist_directory)

CPU times: user 2min 1s, sys: 968 ms, total: 2min 1s
Wall time: 2min 9s


In [None]:
# Returns the Top-k chunks from vectordb. Set to 2 to check.
retriever = vectordb.as_retriever(search_kwargs={"k": 2})

### Check Retrieval From Chroma DB

Both Approach 1 - Using Query and Approach 2 - Using Query Embedding should yield the same results.

Embeddings are created using BGE Embeddings.
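
The reason both approaches should agree is that a text query is embedded first and then searched by vector. The toy in-memory store below illustrates this; `ToyVectorStore`, `toy_embed`, and the five-word vocabulary are hypothetical stand-ins for Chroma and the BGE model, not their actual implementations.

```python
import math
import re

VOCAB = ["linear", "regression", "tree", "boosting", "forest"]  # toy vocabulary


def toy_embed(text):
    """Count vocabulary words and L2-normalize (stand-in for a real embedder)."""
    words = re.findall(r"[a-z]+", text.lower())
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyVectorStore:
    def __init__(self, texts):
        self.texts = list(texts)
        self.vectors = [toy_embed(t) for t in self.texts]

    def search_by_vector(self, vector, k=2):
        """Return the k texts whose embeddings have the highest dot product."""
        scores = [sum(a * b for a, b in zip(vector, v)) for v in self.vectors]
        ranked = sorted(range(len(self.texts)), key=lambda i: scores[i], reverse=True)
        return [self.texts[i] for i in ranked[:k]]

    def search(self, query, k=2):
        """Embed the query text, then delegate to the vector search."""
        return self.search_by_vector(toy_embed(query), k=k)


store = ToyVectorStore([
    "Linear regression predicts a quantitative response.",
    "A decision tree splits the predictor space.",
    "Boosting combines many tree models.",
])
query = "What is Linear Regression?"
by_text = store.search(query)
by_vector = store.search_by_vector(toy_embed(query))
print(by_text == by_vector)
print(by_text[0])
```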

In [None]:
# # Approach 1: Use Query to do Similarity Search
query = "What is Linear Regression?"
docs = vectordb.similarity_search(query)
print(docs[0].page_content)
print("\n")
print(docs[0].metadata)

3
Linear Regression
This chapter is about linear regression , a very simple approach for super-
vised learning. In particular, linear regression is a useful tool for predicting
a quantitative response. It has been around for a long time and is the topic
of innumerable textbooks. Though it may seem somewhat dull compared to
some of the more modern statistical learning approaches described in later
chapters of this book, linear regression is still a useful and widely used sta-
tistical learning method. Moreover, it serves as a good jumping-off point for
newer approaches: as we will see in later chapters, many fancy statistical
learning approaches can be seen as generalizations or extensions of linear
regression. Consequently, the importance of having a good understanding
of linear regression before studying more complex learning methods cannot
be overstated. In this chapter, we review some of the key ideas underlying


{'page': 77, 'source': '/hastie_pdfs/ISLP_website.pdf'}


In [None]:
print(docs[1].page_content)
print("\n")
print(docs[1].metadata)

3
Linear Regression
This chapter is about linear regression , a very simple approach for super-
vised learning. In particular, linear regression is a useful tool for predicting
a quantitative response. It has been around for a long time and is the topic
of innumerable textbooks. Though it may seem somewhat dull compared to
some of the more modern statistical learning approaches described in later
chapters of this book, linear regression is still a useful and widely used sta-
tistical learning method. Moreover, it serves as a good jumping-off point for
newer approaches: as we will see in later chapters, many fancy statistical
learning approaches can be seen as generalizations or extensions of linear
regression. Consequently, the importance of having a good understanding
of linear regression before studying more complex learning methods cannot
be overstated. In this chapter, we review some of the key ideas underlying


{'page': 77, 'source': '/hastie_pdfs/ISLP_website.pdf'}


In [None]:
# # Approach 2: Use Embedding Vector to do Similarity Search
embedding_vector = embedding.embed_query(query)
docs = vectordb.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)
print("\n")
print(docs[0].metadata)


3
Linear Regression
This chapter is about linear regression , a very simple approach for super-
vised learning. In particular, linear regression is a useful tool for predicting
a quantitative response. It has been around for a long time and is the topic
of innumerable textbooks. Though it may seem somewhat dull compared to
some of the more modern statistical learning approaches described in later
chapters of this book, linear regression is still a useful and widely used sta-
tistical learning method. Moreover, it serves as a good jumping-off point for
newer approaches: as we will see in later chapters, many fancy statistical
learning approaches can be seen as generalizations or extensions of linear
regression. Consequently, the importance of having a good understanding
of linear regression before studying more complex learning methods cannot
be overstated. In this chapter, we review some of the key ideas underlying


{'page': 77, 'source': '/hastie_pdfs/ISLP_website.pdf'}


In [None]:
print(docs[1].page_content)
print("\n")
print(docs[1].metadata)

3
Linear Regression
This chapter is about linear regression , a very simple approach for super-
vised learning. In particular, linear regression is a useful tool for predicting
a quantitative response. It has been around for a long time and is the topic
of innumerable textbooks. Though it may seem somewhat dull compared to
some of the more modern statistical learning approaches described in later
chapters of this book, linear regression is still a useful and widely used sta-
tistical learning method. Moreover, it serves as a good jumping-off point for
newer approaches: as we will see in later chapters, many fancy statistical
learning approaches can be seen as generalizations or extensions of linear
regression. Consequently, the importance of having a good understanding
of linear regression before studying more complex learning methods cannot
be overstated. In this chapter, we review some of the key ideas underlying


{'page': 77, 'source': '/hastie_pdfs/ISLP_website.pdf'}


In [None]:
# !nvidia-smi

## 3.0 Setup LLM
We are using LLaMA2 which is near the top of the leader board for Open LLM Models (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

LLaMA2 comes in 3 sizes: 7B, 13B and 70B. The base 7B model (assuming 4 bytes, or 32 bits, for each of the 7B weights) needs about 28 GB of RAM to load, so we use a 4-bit quantized version of the model, which should take about 3.5 GB and should load on the free tier of Colab. The model used is:

LLaMA2 7B Chat GPTQ - TheBloke's model, which is a GPTQ-quantized version of the original LLaMA2 chat model. The weights are compressed to 4 bits and activations are computed in 16-bit floats (`torch.float16` in the loading code below). The model card can be found at https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ.
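
The memory figures above follow from simple arithmetic, sketched here. It counts weights only and uses decimal gigabytes (1 GB = 10^9 bytes); activations, the KV cache, and the embedding model add to the total seen on the GPU. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # 7B parameters

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```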

In [None]:
model_id = "TheBloke/Llama-2-7b-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

In [None]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    # temperature=0.7,
    # top_p=0.95,
    repetition_penalty=1.15
)

llm = HuggingFacePipeline(pipeline=pipe)

### Check LLM

#### What is Linear Regression?

The LLM knows what Linear Regression is from its training data and gives a reasonable answer.

In [None]:
prompt = "What is Linear Regression?"
prompt_template=f'''[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]

'''

print("\n\n*** Generate:")

print(pipe(prompt_template)[0]['generated_text'])



*** Generate:




[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
What is Linear Regression?[/INST]

Linear regression is a statistical method used to model the relationship between two continuous variables: the independent variable (also called the predictor or explanatory variable) and the dependent variable (also called the outcome or response variable). The goal of linear regression is to create a linear equation that best predicts the value of the dependent variable based on the values of the independent variable(s).
In sim

#### What are trees?

The LLM does not have the context to know we are talking about ML trees rather than CS trees or trees in general.

This results in the LLM "hallucinating", that is, giving the wrong answer for our ML context: it describes botanical trees.

This can be fixed either by giving the LLM the chat history or, better yet, by improving the prompt as we see next.

In [None]:
prompt = "What are Trees?"
prompt_template=f'''[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]

'''

print("\n\n*** Generate:")

print(pipe(prompt_template)[0]['generated_text'])



*** Generate:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
What are Trees?[/INST]

Trees are living organisms that belong to the plant kingdom and are characterized by their ability to photosynthesize, grow from a single point (called a trunk), and have branches and roots that support their structure. They play a crucial role in the Earth's ecosystem by providing oxygen, food, shelter, and habitat for countless species of animals, insects, and microorganisms.
There are over 60,000 known tree species worldw

#### What are Decision Trees?
We get a really great response once the context is clear.

In [None]:
prompt = "What are Decision Trees?"
prompt_template=f'''[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]

'''

print("\n\n*** Generate:")

print(pipe(prompt_template)[0]['generated_text'])



*** Generate:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
What are Decision Trees?[/INST]

Decision trees are a popular machine learning algorithm used for both classification and regression tasks. They work by recursively partitioning the data into smaller subsets based on the values of the input features. Each internal node in the tree represents a feature selection and the leaf nodes represent the predicted class or value. The process of building a decision tree involves selecting the best split at eac

#### What are Boosted Trees?

Providing a bit more context to the LLM also returns a reasonable result on boosted trees, although the response frames them as an NLP technique, which is too narrow.

In [None]:
prompt = "What are Boosted Trees?"
prompt_template=f'''[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]

'''

print("\n\n*** Generate:")

print(pipe(prompt_template)[0]['generated_text'])



*** Generate:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
What are Boosted Trees?[/INST]

Thank you for asking! Boosted trees are a type of machine learning algorithm used in natural language processing (NLP) tasks, particularly in text classification and sentiment analysis. In this context, "boosted" refers to the use of multiple weak models combined to create a stronger predictive model.
The basic idea behind boosted trees is to train multiple decision trees on the same dataset, each with a different su

## Setup RAG Chain

RAG Chain = LLM + Retriever + Query Prompt
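
Conceptually, a "stuff" chain retrieves the top-k chunks, concatenates ("stuffs") them into a single prompt, and calls the LLM once. The sketch below shows that flow with plain functions; `fake_retrieve`, `fake_llm`, and `TEMPLATE` are hypothetical stand-ins, and this is not LangChain's actual implementation.

```python
TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"  # hypothetical template


def stuff_prompt(question, chunks, template=TEMPLATE):
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


def rag_answer(question, retrieve, llm, k=5):
    """Retrieve up to k chunks, build one prompt, and call the LLM once."""
    chunks = retrieve(question)[:k]
    prompt = stuff_prompt(question, chunks)
    return {"result": llm(prompt), "source_documents": chunks}


def fake_retrieve(question):
    # Stand-in for the vector store retriever.
    return ["Trees split the predictor space.", "Boosting fits trees sequentially."]


def fake_llm(prompt):
    # Stand-in for the LLaMA2 pipeline: reports how much context it received.
    return f"stub answer from a prompt of {len(prompt)} characters"


response = rag_answer("What are Trees?", fake_retrieve, fake_llm)
print(response["result"])
print(response["source_documents"])
```

Because every chunk goes into one prompt, the combined size of the k chunks has to stay within the model's context window.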

In [None]:
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

In [None]:
qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                  chain_type="stuff",
                                  retriever=retriever,
                                  return_source_documents=True)

In [None]:
# !nvidia-smi

### Check RAG Chain

#### What about Linear Regression?

Gives Reasonable Response about Linear Regression.

In [None]:
query = "What is Linear Regression?"
llm_response = qa_chain(query)
llm_response['result'].split('\n')

[' Linear regression is a type of regression analysis where the relationship between the independent variables and dependent variable is assumed to be linear.']

#### What about Trees?

With RAG the LLM does not hallucinate. It declines to answer, presumably because the retrieved context does not make clear whether we mean a decision tree (which is what we were thinking of) or a boosted tree.

In [None]:
query = "What are Trees?"
llm_response = qa_chain(query)
llm_response['result'].split('\n')

[" I don't know the answer to this question as I am not familiar with the specific algorithm being discussed in the passage."]

#### What about Boosted Trees?

Gives a short response for boosted trees that is not incorrect, but the answer lacks depth. We will try to improve on this with better prompting next.

In [None]:
query = "What are Boosted Trees?"
llm_response = qa_chain(query)
llm_response['result'].split('\n')

[' Boosted trees are a machine learning technique used to improve the accuracy of regression models. They work by combining multiple weak models to create a strong predictive model. In contrast to traditional regression techniques, which rely solely on a single model, boosted trees leverage the collective power of many weak models to produce more accurate predictions.']

## RAG With Better Prompting

In [None]:
## Default LLaMA-2 prompt style
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = """\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""

def get_prompt(instruction, new_system_prompt=DEFAULT_SYSTEM_PROMPT):
    SYSTEM_PROMPT = B_SYS + new_system_prompt + E_SYS
    prompt_template =  B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt_template

In [None]:
sys_prompt = """You are a helpful, respectful and honest assistant. Always answer as helpfully as possible using the context text provided. Your answers should only answer the question once and not have any text after the answer is done.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. """

instruction = """CONTEXT:\n\n {context}\n

Question: {question}"""
get_prompt(instruction, sys_prompt)

"[INST]<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible using the context text provided. Your answers should only answer the question once and not have any text after the answer is done.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. \n<</SYS>>\n\nCONTEXT:\n\n {context}\n\n\nQuestion: {question}[/INST]"

In [None]:
prompt_template = get_prompt(instruction, sys_prompt)

llama_prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [None]:
chain_type_kwargs = {"prompt": llama_prompt}

In [None]:
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

In [None]:
# create the chain to answer questions
qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type="stuff",
                                       retriever=retriever,
                                       chain_type_kwargs=chain_type_kwargs,
                                       return_source_documents=True)

In [None]:
## Cite sources
def wrap_text_preserve_newlines(text, width=110):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')

    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]

    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)

    return wrapped_text

def process_llm_response(llm_response):
    print(wrap_text_preserve_newlines(llm_response['result']))
    print('\n\nSources:')
    for source in llm_response["source_documents"]:
        print(source.metadata['source'])
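
Since each retrieved chunk also carries a `page` entry in its metadata (as the retrieval checks earlier show), the source listing can be condensed to one line per file with its page numbers. This is an optional sketch; `Doc` is a minimal hypothetical stand-in for the LangChain document object, and `summarize_sources` is not part of the original notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document with metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(source_documents):
    """Group retrieved chunks by source file and collect unique page numbers."""
    pages_by_source = {}
    for doc in source_documents:
        source = doc.metadata.get("source", "unknown")
        pages = pages_by_source.setdefault(source, [])
        page = doc.metadata.get("page")
        if page is not None and page not in pages:
            pages.append(page)
    return {source: sorted(pages) for source, pages in pages_by_source.items()}


docs = [
    Doc("chunk a", {"source": "/hastie_pdfs/ISLP_website.pdf", "page": 77}),
    Doc("chunk b", {"source": "/hastie_pdfs/ISLP_website.pdf", "page": 77}),
    Doc("chunk c", {"source": "/hastie_pdfs/ESLII.pdf", "page": 10}),
]
for source, pages in summarize_sources(docs).items():
    print(f"{source}: pages {pages}")
```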

### Use Prompted RAG to Answer Some ML Questions

RAG with better prompting gives us good responses. It figures out we are talking about decision trees. It also lists the source documents of the retrieved chunks used to produce the answer, which makes it easier to verify the response.

In [None]:
# Example 1
query = "What is Linear Regression?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

  Linear regression is a statistical technique used to establish a linear relationship between a dependent
variable (target variable) and one or more independent variables (predictor variables). It is a simple and
widely used approach for supervised learning, which means the algorithm tries to predict the value of the
target variable based on the input values of the predictor variables. The goal of linear regression is to
create a linear equation that best fits the observed data, allowing predictions to be made on new
observations.


Sources:
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ESLII.pdf
/hastie_pdfs/ESLII.pdf


In [None]:
# Example 2
query = "What are Trees?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

  Trees are a type of machine learning algorithm used for classification and regression tasks. They consist of
a series of splits or decisions made about the features of the data, with each split resulting in a smaller
subset of the data. The final prediction is made by evaluating the features of the remaining observations in
the dataset. Trees are particularly useful when dealing with complex datasets or high-dimensional feature
spaces, as they can handle both linear and nonlinear relationships between the features and the target
variable.


Sources:
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ESLII.pdf
/hastie_pdfs/ESLII.pdf


In [None]:
# Example 3
query = "What are Boosted Trees?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

  Boosted trees refer to the process of combining multiple decision trees to create a more accurate prediction
model. In the context of boosting, each decision tree is trained on the residuals of the previous tree,
resulting in a sequence of trees that collectively improve the accuracy of the model. The final output of the
boosting algorithm is a weighted sum of the predictions made by each individual tree, with larger weights
indicating greater importance in the overall prediction.


Sources:
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf


In [None]:
# Example 4
query = "What is Random Forest?"
llm_response = qa_chain(query)
process_llm_response(llm_response)



  A Random Forest is a machine learning algorithm that involves building multiple decision trees and combining
their predictions to make a final prediction. Each decision tree is built by randomly sampling the training
data and creating a subset of features for each tree. The Random Forest algorithm reduces overfitting by
averaging the predictions of multiple trees and using a majority vote to make the final prediction.


Sources:
/hastie_pdfs/ESLII.pdf
/hastie_pdfs/ESLII.pdf
/hastie_pdfs/ESLII.pdf
/hastie_pdfs/ESLII.pdf
/hastie_pdfs/ESLII.pdf


In [None]:
# Example 5
query = "What are the Assumptions of Linear Regression?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

  According to the given context, the assumptions of linear regression are:
1. Additivity: The relationship between the predictors and response is additive, meaning that the association
between a predictor and the response does not depend on the values of the other predictors.
2. Linearity: The change in the response associated with a one-unit change in the predictor is constant,
regardless of the value of the predictor.


Sources:
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf


In [None]:
# Example 6
query = "How does one correct for False Positives in Multiple Hypotheses testing?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

  To correct for false positives in multiple hypotheses testing, one can use multiple testing correction
procedures such as Holm's procedure or Bonferroni's procedure. These procedures adjust the significance level
based on the number of tests performed, reducing the risk of Type I errors. Additionally, more powerful
procedures like the Šidák correction or the Holm-Bonferroni method can be used in special cases where higher
power is desired while maintaining a controlled Family Wide Error Rate (FWER).


Sources:
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf
/hastie_pdfs/ISLP_website.pdf


## Conclusion

Key takeaways are:

1. Even small, quantized open-source LLMs such as the Llama-2-7b-Chat-GPTQ model are really good at QA.

2. Inference with our model is fast even on a T4 GPU.

3. Benefits of RAG include:
 * Reducing "hallucinations" by providing better context.
 * Increasing trust by providing source of response (White-boxing LLMs).

4. RAG + Prompting gives the best responses to our ML questions.


Additional Comments:
- **VectorDB** - Can be used to do semantic search (based on user query intent rather than text matching) to return the relevant chunks of text. In our example on Linear Regression, the ISLP text contains a chapter on linear regression, so the Top-2 results (text chunks) are from this chapter of the book.

- **LLM** - Can answer questions related to ML but sometimes it "hallucinates". For example when asked about trees it responds about trees in general rather than decision trees which is what we intended.

- **RAG** - Retrieves the Top-5 chunks from the vector store (context) and provides them to the LLM in addition to the query. This improves the LLM results for the ML questions by providing additional context and also reduces LLM "hallucinations". For example, asking the RAG chain about trees results in the LLM declining to answer, presumably because it cannot tell whether we meant a "decision" tree or a "boosted" tree.

- **RAG + Prompting** - Adding better prompting to the RAG chain gives good responses for all our ML questions.


