# Retrieval-Augmented Generation (RAG)

## Retrieval
To use RAG, we need a database of documents that can supply relevant information for our queries. In this tutorial, we will create a database from the book "How to Build a Career in AI" by Andrew Ng. We will use LangChain, Chroma, and Hugging Face to perform RAG on this book.



The process of creating a database involves the following steps:

Chunking: We divide the book into smaller pieces, such as paragraphs or sentences, that can be easily indexed and retrieved.

Embedding: We use a pre-trained model from Hugging Face to convert each chunk into a vector representation, also known as a sentence embedding. This captures the semantic meaning of the chunk and allows us to compare it with other chunks or queries.

Indexing: We store the vector embeddings in a vector database, such as Chroma, that can efficiently perform similarity search. This means that given a query vector, we can find the most similar vectors in the database, and retrieve the corresponding chunks.
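
The three steps above can be sketched end to end with nothing but the standard library. This is a toy illustration, not the pipeline used in the rest of this notebook: the bag-of-words `embed` function is an assumed stand-in for a real sentence-embedding model, the in-memory list stands in for Chroma, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str) -> list[str]:
    """Chunking: split the text into sentences (a deliberately naive rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(text: str) -> Counter:
    """Embedding: toy bag-of-words vector standing in for a real model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Indexing: keep each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index, query, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


text = (
    "Precision is the share of predicted positives that are correct. "
    "Recall is the share of actual positives that are found. "
    "Bias is error from overly simple assumptions."
)
index = build_index(chunk_text(text))
top = search(index, "What is recall?", k=1)
print(top[0])
```

A real embedding model captures meaning rather than word overlap, and a vector database replaces the linear scan in `search` with an index built for fast similarity lookup, but the flow of data is the same.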

In [None]:
# install the vector database, LangChain (plus its community integrations), pypdf, and Hugging Face sentence_transformers
!pip install chromadb langchain langchain-community pypdf sentence_transformers

In [None]:
# install langchain experimental features (this will likely be moved to stable in the future).
!pip install --quiet langchain_experimental

In [None]:
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF file. PyPDFLoader reads one Document per page;
# load_and_split() then applies LangChain's default text splitter.
# loader = PyPDFLoader("How to Build a Career in AI.pdf")
loader = PyPDFLoader("ml.pdf")
pages = loader.load_and_split()

In [None]:
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

# Load an embedding model from Hugging Face.
model_name = "BAAI/bge-large-en-v1.5"
model_kwargs = {'device': 'cuda'}  # requires a GPU runtime; use 'cpu' otherwise
encode_kwargs = {'normalize_embeddings': True}  # scale each embedding to unit length

embed_model = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
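
The `normalize_embeddings` option scales each embedding to unit length, so cosine similarity between two embeddings reduces to a plain dot product. The sketch below checks this on two small made-up vectors; the helper names and the vectors themselves are assumptions for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (the effect of normalize_embeddings=True)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical 3-dimensional "embeddings"; real model output has far more dimensions.
u = [3.0, 4.0, 0.0]
v = [1.0, 2.0, 2.0]

u_hat, v_hat = l2_normalize(u), l2_normalize(v)
print(round(cosine(u, v), 4))        # cosine similarity of the raw vectors
print(round(dot(u_hat, v_hat), 4))   # dot product of the normalized vectors
```

Both printed values agree, which is why normalized embeddings pair well with vector stores that rank by inner product or cosine distance.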

In [None]:
from langchain_experimental.text_splitter import SemanticChunker

# Create a semantic text splitter. At a high level, it splits the text into
# sentences, groups them into windows of three sentences, and merges
# neighbouring windows that are similar in embedding space.
text_splitter = SemanticChunker(embed_model)

# Split the pages using the semantic chunker.
documents = text_splitter.split_documents(pages)
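
The idea behind semantic chunking can be illustrated with a simplified sketch. It is not `SemanticChunker`'s actual implementation: it compares single sentences instead of three-sentence windows, uses a fixed distance threshold chosen for this example, and replaces the embedding model with an assumed toy bag-of-words `embed` function. All names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real pipeline would call the embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - (dot / norm if norm else 0.0)


def semantic_chunks(text, threshold=0.8):
    """Start a new chunk wherever two neighbouring sentences are far apart."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, nxt in zip(sentences, sentences[1:]):
        if cosine_distance(embed(prev), embed(nxt)) > threshold:
            chunks.append(" ".join(current))  # close the chunk at the topic shift
            current = []
        current.append(nxt)
    chunks.append(" ".join(current))
    return chunks


text = (
    "Precision measures correct positive predictions. "
    "Recall measures found positive examples. "
    "Gradient descent updates weights iteratively. "
    "Gradient descent needs a learning rate."
)
chunks = semantic_chunks(text)
for chunk in chunks:
    print(chunk)
```

The two sentences about precision and recall stay together, as do the two about gradient descent, because the only large jump in distance falls between the second and third sentences.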

In [None]:
from langchain_community.vectorstores import Chroma

# Embed all document chunks and insert them into the vector database.
vector_db = Chroma.from_documents(
    documents,
    embed_model,  # model used to embed the document chunks before storing
    persist_directory='vector_db',  # directory on disk where the database is persisted
    # collection_name='ai_career',
    collection_name='ml_interview',  # name of the collection that stores the chunks
)

In [None]:
# Perform a vector similarity search for a query.
query = "Define precision and recall"

# Return the chunks for the five most similar embeddings in the database.
docs = vector_db.similarity_search(query, k=5)

# Print the fifth (least similar) of the five returned chunks.
print(docs[4].page_content)

Q1: What’s the trade-off between bias and variance? Answer: Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you’re using. This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for
you to generalize your knowledge from the training set to the test set. Variance is error due to too much complexity in the learning algorithm you’re using. This leads to the algorithm
being highly sensitive to high degrees of variation in your training data, which can lead your model
to overfit the data. You’ll be carrying too much noise from your training data for your model to be very useful
for your test data. The bias-variance decomposition essentially decomposes the learning error from any algorithm by adding the
bias, the variance and a bit of irreducible error due to noise in the underlying dataset. Essentially, if you make
the model more complex and add more variables, you’ll lose bias but gain

