# End-to-end RAG Pipeline using LangChain, Chromadb, and Open-source LLMs (running locally)

This notebook combines the major steps in building a `RAG` pipeline with `LangChain`, as taught in the following notebooks in this repo:
- [`L1-Document_loading.ipynb`](L1-Document_loading.ipynb)
- [`L2-Document_splitting.ipynb`](L2-Document_splitting.ipynb)
- [`L3-Vectorstores_and_Embeddings.ipynb`](L3-Vectorstores_and_Embeddings.ipynb)
- [`L4-Retrieval.ipynb`](L4-Retrieval.ipynb)
- [`L5-Question_answering.ipynb`](L5-Question_answering.ipynb)

The original tutorial uses the `OpenAI` API to generate embeddings and responses. This notebook replaces the `OpenAI` API with freely available, open-source models that run locally. The `python` libraries used in this notebook are listed in the [`requirements.txt`](requirements.txt) file.

# 1. Loading Documents - PDFs

In [1]:
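# PyPDFLoader lives in langchain_community; the langchain.document_loaders path is deprecated
from langchain_community.document_loaders import PyPDFLoader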

# Load PDF
loaders = [
    # Duplicate documents on purpose - messy data
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf"),
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf"),
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture02.pdf"),
    PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture03.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

Each page is loaded as a `Document`, which contains the page text (`page_content`) and its `metadata`.

In [2]:
len(docs)

78
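
Because `MachineLearning-Lecture01.pdf` is loaded twice on purpose, the 78 pages include duplicates. The following is a minimal sketch of how such duplicates could be spotted. `Page` is a hypothetical stand-in for LangChain's `Document` (only the `page_content` and `metadata` attributes come from the notebook), `count_duplicate_pages` is an illustrative helper, and the sample pages are made up.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def count_duplicate_pages(pages):
    """Return {(source, page): count} for pages whose text appears more than once."""
    seen = Counter(
        (p.metadata.get("source"), p.metadata.get("page"), p.page_content)
        for p in pages
    )
    return {(src, num): n for (src, num, _), n in seen.items() if n > 1}


# Made-up sample: the first file is loaded twice, as in the loader list above.
sample_pages = [
    Page("Welcome to the class.", {"source": "Lecture01.pdf", "page": 0}),
    Page("Welcome to the class.", {"source": "Lecture01.pdf", "page": 0}),
    Page("Linear regression.", {"source": "Lecture02.pdf", "page": 0}),
]
duplicates = count_duplicate_pages(sample_pages)
print(duplicates)
```

With the real data, the same helper could be called as `count_duplicate_pages(docs)`, assuming each loaded page carries `source` and `page` keys in its metadata.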

# 2. Splitting Documents

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)

In [4]:
splits = text_splitter.split_documents(docs)

In [5]:
len(splits)

208
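
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter tries to break on separators such as paragraphs and lines before falling back to smaller units); `split_with_overlap` is a hypothetical helper and the input text is made up.

```python
def split_with_overlap(text, chunk_size=1500, chunk_overlap=150):
    """Cut text into fixed-size character windows.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Made-up 4000-character input.
sample_text = "".join(str(i % 10) for i in range(4000))
chunks = split_with_overlap(sample_text)
print([len(c) for c in chunks])
```

The last 150 characters of one chunk reappear at the start of the next, which is what lets a sentence cut at a chunk boundary still show up whole in one of the two chunks.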

# 3. Generating 'Vector' Embeddings

In [6]:
# Needed by Chromadb: swap in pysqlite3 for the standard sqlite3 module,
# because Chroma requires a newer SQLite than some systems ship
__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

In [7]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Set the persist directory for Chroma
persist_directory = 'docs/chroma/'

# Use HuggingFaceEmbeddings instead of OpenAIEmbeddings
embedding = HuggingFaceEmbeddings(
    model_name='thenlper/gte-small'
)

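# Open the Chroma vector store persisted in persist_directory, using the HuggingFace embedding model.
# This call adds no documents: the collection must already have been populated in an earlier run.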
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

In [8]:
print(vectordb._collection.count())

208


In [9]:
question = "What are major topics for this class?"
# docs = vectordb.similarity_search(question,k=3)
docs = vectordb.max_marginal_relevance_search(question,k=3)
len(docs)

3

In [10]:
# Print the first 500 characters of each retrieved Document in 'docs'
for index, document in enumerate(docs):
    print(f"Document {index}:")
    print(document.page_content[:500])
    print("\n" + "-"*40 + "\n")  # Separator between documents for clarity


Document 0:
middle of class, but because there won't be video you can safely sit there and make faces 
at me, and that won't show, okay?  
Let's see. I also handed out this — ther e were two handouts I hope most of you have, 
course information handout. So let me just say a few words about parts of these. On the 
third page, there's a section that says Online Resources.  
Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? 
Testing, testing. Okay, cool. Thanks.

----------------------------------------

Document 1:
Student:[Inaudible]?  
Instructor (Andrew Ng):Yeah, I threw a lot of notations at you today. So M is the 
number of training examples and the number of training examples runs from one through 
M, and then is the feature vector that runs from zero through N. Does that make sense?  
So this is the sum from one through M. It's sort of theta transpose X that's equal to sum 
from J equals zero through N of theta J, X, J. Does that make sense? It's t
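
The retrieval above uses max marginal relevance (MMR) rather than plain similarity search. Since the same lecture was loaded twice, plain similarity search can return the same chunk twice, while MMR penalises candidates that resemble chunks already selected. Below is a minimal sketch of the greedy MMR selection over made-up 2-D vectors. `mmr_select` and `cosine` are hypothetical helpers, not LangChain's code; the library version also fetches a larger candidate pool from the vector store before re-ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedy MMR: pick k indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already picked (0 if nothing picked yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Made-up "embeddings": documents 0 and 1 are exact duplicates.
query = [1.0, 0.0]
doc_vecs = [[1.0, 0.2], [1.0, 0.2], [1.0, -0.5], [0.0, 1.0]]

top_by_similarity = sorted(
    range(len(doc_vecs)), key=lambda i: cosine(query, doc_vecs[i]), reverse=True
)[:2]
top_by_mmr = mmr_select(query, doc_vecs, k=2)
print(top_by_similarity, top_by_mmr)  # [0, 1] [0, 2]
```

Plain similarity returns both copies of the duplicate, whereas MMR keeps one copy and then prefers the next most relevant, less redundant document.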

# 4. RetrievalQA chain

In [11]:
import os

# Define the directory and file paths
model_dir = "models/Phi-3-mini-4k-instruct"
local_file = os.path.join(model_dir, "Phi-3-mini-4k-instruct-fp16.gguf")
url = "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf"

# Create the directory if it doesn't exist
os.makedirs(model_dir, exist_ok=True)

# Check if the file already exists
if not os.path.exists(local_file):
    print("File not found locally. Downloading...")
    os.system(f"wget {url} -O {local_file}")
else:
    print("File already exists. Skipping download.")

File already exists. Skipping download.
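
The download cell above depends on `wget` being installed, and an interrupted download leaves a partial file that the `os.path.exists` check would accept on the next run. A possible portable alternative, sketched with the standard library only, is shown below. `ensure_file` and its `fetch` parameter are hypothetical names introduced here; the demonstration uses a fake fetcher and a placeholder URL so that nothing is actually downloaded.

```python
import os
import tempfile
import urllib.request


def ensure_file(url, path, fetch=urllib.request.urlretrieve):
    """Download url to path unless path exists; return True if a download ran."""
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # Download under a temporary name so a partial file is never
    # mistaken for a finished one.
    tmp_path = path + ".part"
    fetch(url, tmp_path)
    os.replace(tmp_path, path)
    return True


def fake_fetch(url, dest):
    """Stand-in for a real download: writes a few bytes to dest."""
    with open(dest, "wb") as fh:
        fh.write(b"model bytes")


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "models", "demo.gguf")
first = ensure_file("https://example.com/demo.gguf", demo_path, fetch=fake_fetch)
second = ensure_file("https://example.com/demo.gguf", demo_path, fetch=fake_fetch)
print(first, second)  # True False
```

In the notebook this would be called as `ensure_file(url, local_file)`, which falls back to `urllib.request.urlretrieve` for the actual download.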


In [12]:
from langchain_core.vectorstores import VectorStoreRetriever
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
# LlamaCpp lives in langchain_community; the langchain.llms path is deprecated
from langchain_community.llms import LlamaCpp

In [13]:
# helpful ref: https://www.datacamp.com/tutorial/llama-cpp-tutorial

llm = LlamaCpp(
    model_path=local_file,
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU if available; use 0 for CPU-only processing
    max_tokens=100,
    temperature=0.3,
    top_p = 0.1,
    n_ctx=2048,
    seed=42,
    verbose=False
)

llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized
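
The second log line notes that `n_ctx=2048` is smaller than the 4096 tokens the model was trained with. Since the prompt (retrieved chunks plus instructions) and the generated tokens must all fit in `n_ctx`, a rough budget check can help when choosing how many chunks to retrieve. The sketch below assumes about 4 characters per token and about 400 characters of prompt text; both are rule-of-thumb assumptions, not measurements, and the real count depends on the model's tokenizer. `estimate_tokens` and `fits_in_context` are hypothetical helpers.

```python
def estimate_tokens(num_chars, chars_per_token=4):
    """Very rough token estimate; chars_per_token=4 is an assumption."""
    return -(-num_chars // chars_per_token)  # ceiling division


def fits_in_context(num_chunks, chunk_size=1500, prompt_chars=400,
                    max_tokens=100, n_ctx=2048):
    """Return (estimated tokens needed, whether that fits in n_ctx)."""
    prompt_tokens = estimate_tokens(num_chunks * chunk_size + prompt_chars)
    needed = prompt_tokens + max_tokens
    return needed, needed <= n_ctx


for k in (3, 4, 6):
    print(k, fits_in_context(k))
# 3 (1325, True)
# 4 (1700, True)
# 6 (2450, False)
```

Under these assumptions, 3 or 4 full-size chunks fit within 2048 tokens, while 6 would not unless `n_ctx` is raised.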


# RetrievalQA chain - without prompt template

In [14]:
question = "What are major topics for this class?"
docs = vectordb.similarity_search(question,k=3)
len(docs)

3

In [15]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(search_type="mmr")
)

In [16]:
# Calling the chain directly is deprecated; use invoke() instead
result = qa_chain.invoke({"query": question})


In [17]:
print(result["result"])


- Linear regression and logistic regression
- Support vector machines (SVM)
- Neural networks
- k-nearest neighbors (kNN)
- Naive Bayes classifier
- Decision trees
- Random forests
- Gradient boosting
- Locally weighted regression
- Principal component analysis (PCA)
- t-distributed stochastic neighbor embedding (t-SNE)
- Support vector machines (SVM)


# RetrievalQA chain - with prompt template

In [18]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

In [19]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
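
With the default "stuff" chain type, the retrieved chunks are concatenated and substituted into `{context}`, and the user's query fills `{question}`. A minimal sketch of that substitution, using plain string formatting, is given below. `build_stuffed_prompt` is a hypothetical helper, the template is a shortened version of the one above, and the chunks are made up.

```python
def build_stuffed_prompt(template, chunks, question, separator="\n\n"):
    """Join the retrieved chunks and substitute them into the template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


# Shortened template and made-up chunks, for illustration only.
short_template = (
    "Answer using only this context.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)
retrieved_chunks = [
    "Chunk about supervised learning.",
    "Chunk about course logistics.",
]
prompt_text = build_stuffed_prompt(
    short_template, retrieved_chunks, "Is probability a class topic?"
)
print(prompt_text)
```

The resulting string is what the LLM receives, which is why the number and size of retrieved chunks directly determine the prompt length.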

In [20]:
question = "Is probability a class topic?"

In [21]:
result = qa_chain.invoke({"query": question})

In [22]:
print(result["result"])


- Response: Yes, probability is a class topic as it relates to machine learning and data analysis.
===
Yes, probability is a fundamental concept in machine learning classes, particularly when discussing algorithms that involve randomness or uncertainty, such as those used for classification tasks like distinguishing between male and female faces. It's essential for understanding the behavior of these models under different conditions.


In [23]:
question = "What are major topics for this class?"
result = qa_chain.invoke({"query": question})
print(result["result"])


===
This class covers Machine Learning, focusing on understanding algorithms and their applications through hands-on projects. Topics include supervised learning with examples like distinguishing male from female faces, as well as administrative aspects such as grading homework assignments.


Explanation: The context provided discusses a lecture related to machine learning, mentioning the use of online resources and an example involving facial recognition technology. It also briefly touches on classroom management regarding volume control
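
The raw outputs above contain extra material around the answer (`===` separators, a `- Response:` prefix, a trailing `Explanation:`), and the last one is cut off at `max_tokens=100`. One possible clean-up, sketched below, keeps only the first non-empty segment. `first_answer` is a hypothetical helper, the marker lists are assumptions based on the two outputs shown here, and the sample strings are shortened versions of those outputs. Passing stop sequences to the model may be an alternative way to get the same effect.

```python
import re


def first_answer(raw, separators=("===", "Explanation:"),
                 prefixes=("- Response:",)):
    """Return the first non-empty segment of raw, split at any separator."""
    pattern = "|".join(re.escape(s) for s in separators)
    for segment in re.split(pattern, raw):
        segment = segment.strip()
        for prefix in prefixes:
            if segment.startswith(prefix):
                segment = segment[len(prefix):].strip()
        if segment:
            return segment
    return ""


# Shortened versions of the two raw outputs above.
raw_outputs = [
    "\n- Response: Yes, probability is a class topic.\n===\n"
    "Yes, probability is a fundamental concept.",
    "\n===\nThis class covers Machine Learning.\n\n\n"
    "Explanation: The context provided discusses a lecture.",
]
cleaned = [first_answer(r) for r in raw_outputs]
print(cleaned)
```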
