# Exploring Simple Local RAG

This is a very simple exploration, using Langchain, Langsmith and AWS Bedrock, of how to create a basic RAG pipeline with a locally running ChromaDB instance loaded with a single source PDF document.

Langsmith is not actually needed for the implementation, but it's useful if we want to trace what's going on in our application.

### Prerequisites


In [None]:
%pip install -r requirements.txt --quiet 

### Authenticate to AWS and Langsmith and Load Environment Variables
Follow the instructions in the `README.md`.

In [None]:
# Load Environment Variables
from dotenv import load_dotenv, find_dotenv
import os

load_dotenv(find_dotenv())  # load the environment variables from .env

True
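
Before moving on, it can help to confirm that the variables were actually loaded. The following is a minimal sketch, not part of the original notebook: the helper `missing_env_vars` is hypothetical, and the variable names in `REQUIRED_VARS` are assumptions, so substitute whatever `README.md` actually asks for.

```python
import os

# Assumed variable names for illustration only; use the ones from README.md.
REQUIRED_VARS = ["AWS_REGION", "LANGCHAIN_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All expected environment variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.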

### AWS Bedrock
Create a Bedrock client, which will be used by the Bedrock classes in Langchain.

In [3]:
import os
import boto3

# Get the region from the environment (defaults to us-east-1 if AWS_REGION is not set)
region = os.environ.get("AWS_REGION", "us-east-1")

# Create a Bedrock client
bedrock_client = boto3.client(service_name='bedrock-runtime', region_name=region)

### Langsmith

In [23]:
# Initialize Langsmith Client
from langsmith import Client

client = Client()

### Load Documents (Lost in the Middle)

Download the [Lost in the Middle: How Language Models Use Long Contexts](https://arxiv.org/pdf/2307.03172.pdf) PDF paper, then split, chunk and embed the entire document into ChromaDB.

To test out the LLM, have a look at this paper to get an idea of what questions you can ask,
or load another online PDF from somewhere else.

Remember that you only have to load this the first time you run this notebook, to embed the data
into the vector-store. For any subsequent runs, skip this step.

**This step should be skipped if you have already created the vector-store.**

In [2]:
from langchain_community.document_loaders import PyPDFLoader

url = "https://arxiv.org/pdf/2307.03172.pdf"
loader = PyPDFLoader(url)
data = loader.load()

print(len(data))
print(data)

With the PDF pages loaded, we then need to prepare the data by splitting the text into smaller chunks using the "fixed-size chunking" method.

> Depending on the text embedding model you will be using, you may need to take into account the model's maximum context window and its embedding dimensions.

**This step should also be skipped if you have already created the vector-store.**

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split into chunks of up to 500 characters, with 128 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=128)
all_splits = text_splitter.split_documents(data)
print(len(all_splits))

# Iterate over the first 10 splits and print each one to check the chunking works as expected
for split in all_splits[:10]:
    print(split)
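
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is an illustration only: `fixed_size_chunks` is a hypothetical helper, and `RecursiveCharacterTextSplitter` behaves differently in that it prefers to break on separators such as paragraphs and spaces instead of cutting at exact character offsets.

```python
def fixed_size_chunks(text, chunk_size=500, chunk_overlap=128):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghij" * 100  # 1000 characters of dummy text
chunks = fixed_size_chunks(sample)
print([len(chunk) for chunk in chunks])
# The tail of each chunk is repeated at the head of the next one
print(chunks[0][-128:] == chunks[1][:128])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.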


### Create the embeddings from the pdf document with Bedrock
Creating vector embeddings from documents involves converting textual information into numerical form so that it can be processed and understood by machine learning models.

The embedding model is used for both creating the embeddings of pdf documents which will be stored in the vector-store as well as for creating the embedding of the user question.

Langchain has built-in support for many different embedding models; see the Bedrock integration docs: https://python.langchain.com/docs/integrations/text_embedding/bedrock
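
The reason the same model must embed both the documents and the question is that retrieval compares their vectors directly. The sketch below illustrates this with hand-written toy vectors and cosine similarity. Both are assumptions for illustration: real Titan embeddings have far more dimensions, and the distance metric ChromaDB actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy 3-dimensional "embeddings" standing in for stored document chunks
doc_vecs = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
query_vec = [0.9, 0.1, 0.0]  # toy embedding of the user question

print(top_k(query_vec, doc_vecs))
```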



In [4]:
# Initialize the BedrockEmbeddings Model
from langchain_community.embeddings import BedrockEmbeddings

embedding_model_id = "amazon.titan-embed-text-v1"

embedding_model = BedrockEmbeddings(
    client=bedrock_client,
    model_id=embedding_model_id
)

#### Initializing ChromaDB and generating the embeddings

**The following step should also be skipped if you have already created the vector-store.**

> Note: if you need to recreate the embeddings, the easiest is to simply delete the `./chroma_db` directory to start from scratch!


In [20]:
# Create the embeddings and store them in a local ChromaDB directory
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=all_splits, embedding=embedding_model, persist_directory="./chroma_db")

vectorstore.persist()
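
Instead of remembering whether the embedding step has already been run, the notebook could check for the persisted directory. This is a minimal sketch under the assumption that a non-empty `./chroma_db` directory means the vector-store was created; `vectorstore_exists` is a hypothetical helper, and it does not verify that the contents are a valid Chroma database.

```python
from pathlib import Path


def vectorstore_exists(persist_directory="./chroma_db"):
    """Return True if the directory exists and contains at least one entry."""
    path = Path(persist_directory)
    return path.is_dir() and any(path.iterdir())


if vectorstore_exists():
    print("Found an existing vector-store; the embedding step can be skipped.")
else:
    print("No vector-store found; run the embedding step.")
```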


### Load your local ChromaDB vector store as a Retriever

In [None]:
# Create a retriever
from langchain_community.vectorstores import Chroma

# Load the persisted Chroma database from disk
vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embedding_model)

retriever = vectorstore.as_retriever()

# Test the retriever with a similarity search
retriever.get_relevant_documents("why are language models not robust to changes in the position of relevant information?")

### Langchain Hub

The hub, which is also part of Langsmith, is a place to discover, share, and version control prompts. Prompts can be shared with the community or kept for private use.

The following imports a predefined prompt-template that will be used in the retrieval chain.

In [5]:
# Pulling in a forked community prompt
from langchain import hub
prompt = hub.pull("mrkmod/rag-prompt")
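
A RAG prompt-template is essentially a string with slots for the retrieved context and the user question. The sketch below shows the idea in plain Python. The template text is hypothetical, and the actual wording and variable names of `mrkmod/rag-prompt` may differ; `build_prompt` is likewise an illustrative helper, not a Langchain function.

```python
# Hypothetical template; the text of the prompt pulled from the hub may differ.
RAG_TEMPLATE = (
    "Use the following context to answer the question.\n"
    "If you don't know the answer, say so.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question, chunks):
    """Join the retrieved chunks and fill both slots of the template."""
    context = "\n\n".join(chunks)
    return RAG_TEMPLATE.format(context=context, question=question)


print(build_prompt("What is RAG?", ["RAG combines retrieval with generation."]))
```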


### Creating the Retrieval Augmented Generation chain

In [25]:
from langchain_community.llms import Bedrock
from langchain.chains import RetrievalQA

# experiment with different models if you like
llm_model_id = "anthropic.claude-instant-v1"
# llm_model_id = "anthropic.claude-v2"

llm = Bedrock(
    client=bedrock_client, model_id=llm_model_id
)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt}
)
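
Conceptually, the chain retrieves chunks for the question, places them in the prompt, and passes the prompt to the LLM. The following is a rough sketch of that flow with stub functions in place of the real retriever and Bedrock model. It is not how `RetrievalQA` is implemented internally; `run_rag`, `fake_retrieve` and `fake_generate` are hypothetical names, and the returned dictionary only mirrors the `query` and `result` keys used in this notebook.

```python
def run_rag(question, retrieve, generate, k=4):
    """Retrieve up to k chunks, build a prompt from them, and generate an answer."""
    docs = retrieve(question)[:k]
    context = "\n\n".join(docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return {"query": question, "result": generate(prompt)}


# Stubs standing in for the vector-store retriever and the LLM
def fake_retrieve(question):
    return ["chunk one", "chunk two"]


def fake_generate(prompt):
    return f"(stub answer based on {prompt.count('chunk')} chunks)"


output = run_rag("example question", fake_retrieve, fake_generate)
print(output["result"])
```

Swapping the stubs for real callables keeps the same shape: anything that maps a question to a list of strings can act as the retriever, and anything that maps a prompt to a string can act as the generator.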

In [None]:
# Run the Retrieval Chain

question = "why are language models not robust to changes in the position of relevant information?"

result = qa_chain.invoke({"query": question})

print(result["result"])