# Simple RAG using LangChain
## Introduction
- In this lesson we use the information stored in our vector database to answer questions.
- Retrieving relevant information for a question and passing it to the model as context is called Retrieval Augmented Generation (RAG).

In [1]:
%pip install -q --upgrade langchain langchain_community langchain_openai chromadb langchain-chroma


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## Setup embeddings model
- We set up the embeddings model, using the same model that was used to index the documents.

In [2]:
from langchain_openai import OpenAIEmbeddings

from langchain_openai import ChatOpenAI

embeddings = OpenAIEmbeddings()
print(embeddings.model)

text-embedding-ada-002
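
The reason for reusing the indexing model is that retrieval compares a query vector against stored document vectors, which is only meaningful when both come from the same embedding space. The sketch below illustrates that comparison with cosine similarity. It assumes a toy bag-of-words embedding (`toy_embed`, with a hypothetical fixed `VOCAB`) in place of a real model, so it does not reflect how `OpenAIEmbeddings` works internally.

```python
import math

# Hypothetical vocabulary standing in for a real embedding model.
VOCAB = ["rest", "api", "server", "jest", "testing", "snake_case"]


def toy_embed(text: str) -> list[float]:
    """Count how often each vocabulary word occurs in the text."""
    words = text.lower().replace(".", "").split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_vector = toy_embed("We use REST API to communicate with the server.")
query_vector = toy_embed("Which API does the server expose?")
unrelated_vector = toy_embed("For testing in javascript we use Jest.")

# The related document scores higher than the unrelated one.
print(cosine_similarity(query_vector, doc_vector))
print(cosine_similarity(query_vector, unrelated_vector))
```

If the query were embedded with a different vocabulary (that is, a different model), the vector positions would no longer line up and the scores would be meaningless.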


## Connecting to the vector database
- We now connect to the vector database, which uses persistent storage.
- We also select the collection that holds our documents.

In [3]:
import chromadb
from langchain_chroma import Chroma

#from langchain_community.vectorstores import Chroma

collection_name="documentation"
chroma_client = chromadb.PersistentClient(path="./chromadb")

# The vector database computes embeddings with the embedding function provided
# and stores the embedding for each document in its database.

vectordb = Chroma(
    client=chroma_client,
    collection_name=collection_name,
    embedding_function=embeddings,
)

## Convert the vector database to a retriever
- To use the vector database as part of a chain, we need to convert it to a retriever.
- We can set the number of results to retrieve with `k`. A similarity cutoff additionally requires `search_type="similarity_score_threshold"` and a `score_threshold` entry in `search_kwargs`.

In [4]:
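# `threshold` is not an `as_retriever` argument, so only `k` is set here.
# For a similarity cutoff, use instead:
#   vectordb.as_retriever(search_type="similarity_score_threshold",
#                         search_kwargs={"k": 4, "score_threshold": 0.7})
retriever = vectordb.as_retriever(search_kwargs={"k": 4})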

# https://python.langchain.com/docs/how_to/sequence/
docs = retriever.invoke("What are the architectural guidelines for this app ?")
print(docs)
print(len(docs))

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


[Document(id='5f95f5b1-6e65-4451-bd53-5e3b3ad67814', metadata={}, page_content='We use REST API to communicate with the server.'), Document(id='76388bf2-20a3-4f53-a7f4-cec237363f99', metadata={}, page_content='We prefer snake_case for naming variables over CamelCase in javascript.'), Document(id='3eb6ce36-b568-45f1-978e-a2ea98c3da0a', metadata={}, page_content='For testing in javascript we use Jest.')]
3
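
The warning in the output above reports that the index holds only 3 elements, so a request for `k=4` returns 3 documents. The plain-Python sketch below illustrates how a result limit and an optional score cutoff interact. The function `top_k_with_threshold` and the relevance scores are made up for illustration and are not taken from Chroma.

```python
def top_k_with_threshold(scored_docs, k, score_threshold=None):
    """Return up to k (score, text) pairs, best first, optionally dropping low scores."""
    ranked = sorted(scored_docs, key=lambda pair: pair[0], reverse=True)
    if score_threshold is not None:
        ranked = [pair for pair in ranked if pair[0] >= score_threshold]
    return ranked[:k]


# Made-up relevance scores, for illustration only.
scored = [
    (0.82, "We use REST API to communicate with the server."),
    (0.41, "We prefer snake_case for naming variables over CamelCase in javascript."),
    (0.35, "For testing in javascript we use Jest."),
]

# Asking for more results than exist simply returns everything available.
print(len(top_k_with_threshold(scored, k=4)))
# With a cutoff, weakly related documents are dropped before the limit applies.
print(len(top_k_with_threshold(scored, k=4, score_threshold=0.7)))
```

Without a cutoff, a retriever always returns its nearest neighbours, even when none of them is actually relevant to the question.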


## The RAG prompt template

- Now that we can retrieve the documents related to the query, we can add them to the prompt.
- Note that we add `Answer the question based only on the following context:` to the prompt.
- This instructs the LLM to use only the context to answer the question.

In [5]:
from typing import List

# Import from langchain_core; the older langchain.prompts / langchain.schema paths are legacy.
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])

template = """Answer the question based only on the following context:

    {context}

    Question: {question}
    """
prompt = ChatPromptTemplate.from_template(template)
print(prompt)

input_variables=['context', 'question'] input_types={} partial_variables={} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n\n    {context}\n\n    Question: {question}\n    '), additional_kwargs={})]
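
To make the placeholders concrete, here is a minimal sketch of what filling the template amounts to, using plain `str.format` rather than `ChatPromptTemplate`. The constant `TEMPLATE` is a simplified copy of the template without the indentation, and `format_context` and the sample question are assumptions for illustration.

```python
TEMPLATE = (
    "Answer the question based only on the following context:\n\n"
    "{context}\n\n"
    "Question: {question}\n"
)


def format_context(page_contents):
    """Join document texts with blank lines, mirroring format_docs above."""
    return "\n\n".join(page_contents)


retrieved = [
    "We use REST API to communicate with the server.",
    "For testing in javascript we use Jest.",
]

# Both placeholders are replaced; the result is the text the model receives.
filled_prompt = TEMPLATE.format(
    context=format_context(retrieved),
    question="Which test framework do we use?",
)
print(filled_prompt)
```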


## Running the chain
The whole chain now looks like this:
- The retriever finds the documents related to our query in the vector database.
- These are formatted and set as the `context` in the prompt.
- Then the whole prompt is sent to the LLM.

In [6]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

llm = ChatOpenAI(model="gpt-4o")

# The dict runs the retriever (then format_docs) and the passthrough on the same input.
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | output_parser
)
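
To show what the `|` composition does conceptually, the sketch below rebuilds the same shape in plain Python. It is a rough analogy and not LangChain's implementation: `Step`, `parallel`, and all the fake components are hypothetical stand-ins, and the fake model only echoes its prompt.

```python
class Step:
    """Wrap a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Run this step first, then feed its result into the next one.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Hypothetical stand-ins for the retriever, formatter, prompt, and model.
fake_retriever = Step(lambda question: ["We prefer snake_case for naming variables."])
join_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt_text: f"ECHO: {prompt_text}")

toy_chain = (
    parallel(context=fake_retriever | join_docs, question=passthrough)
    | fill_prompt
    | fake_llm
)
toy_result = toy_chain.invoke("How do we name variables?")
print(toy_result)
```

The question enters once and is fanned out to both branches, which is why the same input string reaches the retriever and the `question` slot of the prompt.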

## RAG vs LLM
- We can use the LLM directly to answer the question.
- The RAG chain, in contrast, answers using the retrieved context, here our snake_case naming guideline.

In [7]:
from langchain_core.globals import set_debug

# Make sure debug output is off for this comparison.
set_debug(False)

question = "Write a hello world function in Javascript"
result = llm.invoke(question)
print("LLM: ", result.content)
result = chain.invoke(question)
print("LLM + RAG", result)

LLM:  Certainly! Here's a simple "Hello, World!" function in JavaScript:

```javascript
function sayHello() {
  console.log("Hello, World!");
}

// Call the function to display the message
sayHello();
```

This function, `sayHello`, when called, will print "Hello, World!" to the console. You can run this code in any JavaScript environment, such as a web browser's developer console or Node.js.


Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


LLM + RAG ```javascript
function hello_world() {
    console.log("Hello, World!");
}

hello_world();
```


## Under the hood
- We enable LangChain's global debug mode to see what goes on under the hood.
- This allows us to see the populated prompt and the results.
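
Conceptually, debug mode reports the input and output of every step as it runs. The sketch below is a rough plain-Python analogy of that idea, not LangChain's actual callback mechanism; the helper `traced` and the two sample steps are hypothetical.

```python
import time


def traced(name, func, log):
    """Wrap func so each call appends start and end events to log."""

    def wrapper(value):
        log.append(f"[start] {name} input={value!r}")
        started = time.perf_counter()
        result = func(value)
        elapsed_ms = (time.perf_counter() - started) * 1000
        log.append(f"[end] {name} [{elapsed_ms:.0f}ms] output={result!r}")
        return result

    return wrapper


events = []
shout = traced("shout", str.upper, events)
exclaim = traced("exclaim", lambda text: text + "!", events)

# Each step records one start event and one end event, in call order.
final = exclaim(shout("hello"))
print("\n".join(events))
```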

In [8]:
from langchain_core.globals import set_debug

set_debug(True)
result = chain.invoke(question)
set_debug(False)

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": "Write a hello world function in Javascript"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question>] Entering Chain run with input:
[0m{
  "input": "Write a hello world function in Javascript"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": "Write a hello world function in Javascript"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnablePassthrough] Entering Chain run with input:
[0m{
  "input": "Write a hello world function in Javascript"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnablePassthrough] [2ms] Exiting Chain run with output:
[0m{
  "output": "Write a hello

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > chain:format_docs] Entering Chain run with input:
[0m[inputs]
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > chain:format_docs] [0ms] Exiting Chain run with output:
[0m{
  "output": "For testing in javascript we use Jest.\n\nWe prefer snake_case for naming variables over CamelCase in javascript.\n\nWe use REST API to communicate with the server."
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence] [313ms] Exiting Chain run with output:
[0m{
  "output": "For testing in javascript we use Jest.\n\nWe prefer snake_case for naming variables over CamelCase in javascript.\n\nWe use REST API to communicate with the server."
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<cont

## Grounding
- With RAG we can override knowledge the LLM already has.
- We can also add our own knowledge and avoid having to train or fine-tune the model.
- Consider the following examples, where we only want an answer if it is part of our knowledge.

In [9]:
result = llm.invoke("Are we using CamelCase for our function names ?")
print(result.content)

The choice of using CamelCase for function names depends on the coding standards and style guidelines adopted by your team or organization. In some programming languages and communities, CamelCase is preferred for naming functions, while in others, a different convention like snake_case is used.

Here are a few common conventions:

- **CamelCase**: Commonly used in languages like Java and C#. Example: `CalculateInterestRate()`
  
- **PascalCase**: Similar to CamelCase but often used interchangeably in contexts where the first letter is also capitalized. Example: `FetchUserData()`

- **snake_case**: Preferred in languages like Python. Example: `calculate_interest_rate()`

When deciding on a naming convention, consider the following:

1. **Language Guidelines**: Check if the language has its own recommended convention.
2. **Team Standards**: Align with your team’s existing code style for consistency.
3. **Readability**: Choose a style that makes your code easy to read and understand.

If

In [10]:
result = chain.invoke("Should we using CamelCase for function names in javascript?")
print(result)

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


The context provided does not specifically mention a preference for function naming conventions in JavaScript, only that snake_case is preferred for variable names. Therefore, it does not definitively specify whether CamelCase should be used for function names. Preferences for function naming may depend on specific style guides or individual/team preferences beyond what is mentioned in the context.


In [11]:
result = chain.invoke("What is the authentication framework used in this app?")
print(result)

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


The context provided does not specify the authentication framework used in the app.
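
In the runs above, grounding relies entirely on the prompt instruction. One possible complement, sketched here under stated assumptions, is to skip the model call when nothing relevant was retrieved. The names `answer_if_grounded`, `fake_generate`, and `FALLBACK`, as well as the scores and the `min_score` value, are hypothetical and not part of the notebook's chain.

```python
FALLBACK = "I don't know based on the provided documentation."


def answer_if_grounded(question, scored_docs, generate, min_score=0.5):
    """Call generate only when at least one document scores min_score or higher."""
    relevant = [text for score, text in scored_docs if score >= min_score]
    if not relevant:
        return FALLBACK
    return generate(question, relevant)


def fake_generate(question, context_docs):
    # Hypothetical stand-in for the prompt | llm part of the chain.
    return f"Answer based on {len(context_docs)} document(s)."


# A well-matched document leads to a generated answer.
print(answer_if_grounded(
    "Which test framework?",
    [(0.8, "For testing in javascript we use Jest.")],
    fake_generate,
))
# A poorly matched document leads to the fallback instead of a model call.
print(answer_if_grounded(
    "Which auth framework?",
    [(0.2, "We use REST API to communicate with the server.")],
    fake_generate,
))
```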
