# Simple RAG using LangChain
## Introduction
- In this lesson we use the information stored in our vector database to answer questions.
- Retrieving relevant information for a question and passing it to the model as context is called Retrieval Augmented Generation (RAG).

In [52]:
%pip install -q --upgrade langchain langchain_community langchain_openai chromadb langchain-chroma


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## Setup embeddings model
- We set up the embeddings model, using the same model that was used to index the documents.

In [53]:
from langchain_openai import OpenAIEmbeddings

from langchain_openai import ChatOpenAI

embeddings = OpenAIEmbeddings()
print(embeddings.model)

text-embedding-ada-002
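
The reason for reusing the indexing model is that retrieval compares the query vector with the stored document vectors, and vectors from different models are not comparable. The sketch below illustrates the idea with cosine similarity, one common measure. The three-dimensional vectors are made up for illustration (real embeddings are much larger), and the metric actually used depends on how the collection was created.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors, for illustration only (not real embeddings).
stored_doc = [0.9, 0.1, 0.0]
close_query = [0.8, 0.2, 0.0]
unrelated_query = [0.0, 0.0, 1.0]

# The query that points in a similar direction scores higher.
print(cosine_similarity(stored_doc, close_query))
print(cosine_similarity(stored_doc, unrelated_query))
```

A query embedded with a different model could have a different dimension (raising the error above) or, worse, the same dimension with unrelated meaning per coordinate.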


## Connecting to the vector database
- We now connect to the vector database that has persistent storage.
- And set the right collection name.

In [54]:
import chromadb
from langchain_chroma import Chroma

#from langchain_community.vectorstores import Chroma

collection_name="documentation"
chroma_client = chromadb.PersistentClient(path="./chromadb")

# The vector database calculates embeddings using the embedding function provided
# and stores the embeddings for each document in its database

vectordb = Chroma(
    client=chroma_client, collection_name=collection_name, embedding_function=embeddings
)

## Convert the vector database to a retriever
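- To use the vector database as part of a chain, we need to convert it to a retriever.
- We can set the number of results to retrieve with `k`. A similarity threshold is only applied when the search type is `similarity_score_threshold`.

In [55]:
# "k" sets the maximum number of documents to return.
# A bare `threshold=0.7` keyword argument is not a retriever option and does not filter results.
# To apply a cutoff, use search_type="similarity_score_threshold" together with
# search_kwargs={"k": 4, "score_threshold": 0.7}.
retriever = vectordb.as_retriever(search_kwargs={"k": 4})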

# https://python.langchain.com/docs/how_to/sequence/
docs = retriever.invoke("What are the architectural guidelines for this app ?")
print(docs)
print(len(docs))

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


[Document(id='af2aebf1-59ea-411e-8945-b2eda964641a', metadata={}, page_content='We use REST API to communicate with the server.'), Document(id='9dd6ef06-ee7c-4401-b5b3-77c71ffcd5a5', metadata={}, page_content='We prefer snake_case for naming variables over CamelCase in javascript.'), Document(id='c94fe47c-ee03-44ff-8d41-a0d1f599cc7f', metadata={}, page_content='For testing in javascript we use Jest.')]
3
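
The warning above appears because the collection holds only three documents while `k` is 4, so all three come back. A minimal sketch of how `k` and a score threshold interact, in plain Python rather than the retriever's real implementation. The `top_k` helper and the scores are hypothetical.

```python
def top_k(scored_docs: list[tuple[str, float]], k: int, score_threshold: float = 0.0) -> list[str]:
    """Return up to k documents whose score is at least score_threshold, best first."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:k]]

# Hypothetical relevance scores (higher means more similar); not real output.
scored = [
    ("We use REST API to communicate with the server.", 0.78),
    ("We prefer snake_case for naming variables over CamelCase in javascript.", 0.74),
    ("For testing in javascript we use Jest.", 0.61),
]

# k is larger than the number of documents, so everything is returned.
print(len(top_k(scored, k=4)))
# With a threshold, the weakest match is dropped even though k would allow it.
print(len(top_k(scored, k=4, score_threshold=0.7)))
```

Without a threshold the retriever always returns its best matches, however weak they are, which is why unrelated documents can end up in the context.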


## The RAG prompt template

- Now that we have retrieved the documents related to the query, we can add them to the prompt.
- Note that we add `Answer the question based only on the following context:` to the prompt.
- This instructs the LLM to use only the context to answer the question.

In [56]:
from typing import List
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents import Document

def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])

template = """Answer the question based only on the following context:

    {context}

    Question: {question}
    """
prompt = ChatPromptTemplate.from_template(template)
print(prompt)

input_variables=['context', 'question'] input_types={} partial_variables={} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n\n    {context}\n\n    Question: {question}\n    '), additional_kwargs={})]
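
To see what the populated prompt looks like, the sketch below fills the same placeholders using plain `str.format` instead of `ChatPromptTemplate`. It is an approximation for illustration: the `TEMPLATE` constant, the plain-string version of `format_docs`, and the sample question are assumptions, and the documents are the three shown in the retriever output.

```python
TEMPLATE = """Answer the question based only on the following context:

{context}

Question: {question}
"""

def format_docs(page_contents: list[str]) -> str:
    """Join the document texts with a blank line between them."""
    return "\n\n".join(page_contents)

retrieved = [
    "We use REST API to communicate with the server.",
    "We prefer snake_case for naming variables over CamelCase in javascript.",
    "For testing in javascript we use Jest.",
]

# Hypothetical question, used only to show the substitution.
filled = TEMPLATE.format(
    context=format_docs(retrieved),
    question="Which test framework do we use?",
)
print(filled)
```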


## Running the chain
The whole chain now looks like this:
- The retriever finds the documents most similar to our query in the vector database.
- These are formatted and set as the `context` in the prompt.
- Then the whole prompt is sent to the LLM.

In [57]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

llm = ChatOpenAI(model="gpt-4o")

chain = {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | output_parser
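
The `|` operator passes the output of one step to the next, and the leading dict runs its entries on the same input. The toy model below mimics that data flow in plain Python. It is not LangChain's implementation: `Step`, `parallel`, and all the `fake_*` objects are hypothetical stand-ins.

```python
from typing import Any, Callable

class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b means: run a, then feed its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

def parallel(steps: dict[str, Step]) -> Step:
    """Run every step on the same input and collect the results in a dict."""
    return Step(lambda value: {key: step.invoke(value) for key, step in steps.items()})

# Hypothetical stand-ins for the retriever, formatter, prompt, and model.
fake_retriever = Step(lambda question: ["For testing in javascript we use Jest."])
join_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda value: value)
fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda text: "ECHO: " + text)

toy_chain = (
    parallel({"context": fake_retriever | join_docs, "question": passthrough})
    | fake_prompt
    | fake_llm
)
result_text = toy_chain.invoke("Which test framework do we use?")
print(result_text)
```

The question reaches both branches: one turns it into context through retrieval, the other passes it through unchanged, and the prompt step receives both under the keys `context` and `question`.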

## RAG vs LLM
- We can use the LLM directly to answer the question, in which case it relies only on its training data.
- The RAG chain also passes the retrieved context, so the answer follows our documentation.

In [58]:
import langchain
langchain.debug = False

question = "Write a hello world function in Javascript"
result = llm.invoke(question)
print("LLM: ", result.content)
result = chain.invoke(question)
print("LLM + RAG", result)

LLM:  Certainly! Here's a simple "Hello, World!" function in JavaScript:

```javascript
function sayHello() {
    console.log("Hello, World!");
}

sayHello();
```

This function, `sayHello`, when called, will output "Hello, World!" to the console. You can run this code in any JavaScript environment, such as in the browser's developer console or a Node.js environment.


Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


LLM + RAG Based on the provided context, here is a simple "Hello, World!" function in JavaScript, following the preference for `snake_case` for variable names:

```javascript
function say_hello_world() {
    console.log('Hello, World!');
}

say_hello_world();
```

This function defines `say_hello_world` using `snake_case` and logs "Hello, World!" to the console.


## Under the hood
- We enable LangChain's global debug flag to see what goes on under the hood.
- This allows us to see the populated prompt and the results.

In [59]:
import langchain
langchain.debug = True
result = chain.invoke(question)
langchain.debug = False

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": "Write a hello world function in Javascript"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question>] Entering Chain run with input:
[0m{
  "input": "Write a hello world function in Javascript"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": "Write a hello world function in Javascript"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnablePassthrough] Entering Chain run with input:
[0m{
  "input": "Write a hello world function in Javascript"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnablePassthrough] [0ms] Exiting Chain run with output:
[0m{
  "output": "Write a hello

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > chain:format_docs] Entering Chain run with input:
[0m[inputs]
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > chain:format_docs] [0ms] Exiting Chain run with output:
[0m{
  "output": "For testing in javascript we use Jest.\n\nWe prefer snake_case for naming variables over CamelCase in javascript.\n\nWe use REST API to communicate with the server."
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence] [446ms] Exiting Chain run with output:
[0m{
  "output": "For testing in javascript we use Jest.\n\nWe prefer snake_case for naming variables over CamelCase in javascript.\n\nWe use REST API to communicate with the server."
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<cont

## Grounding
- With RAG we can override the LLM's built-in knowledge.
- We can also add our own knowledge without having to train or fine-tune the model.
- Consider the following example, where we only want an answer if it is part of our knowledge.

In [62]:
result = llm.invoke("Are we using CamelCase for our function names ?")
print(result.content)

Whether you should use CamelCase for function names depends on the coding standards and conventions of the particular programming language you are using, as well as the guidelines set by your development team or organization. 

Here are some general conventions:

1. **Python**: The PEP 8 style guide recommends using `snake_case` for function names.
   
2. **JavaScript**: The common convention is to use `camelCase` for function names.
   
3. **Java**: The convention is also to use `camelCase` for method names.
   
4. **C#**: It is typical to use `PascalCase` for method names, which is similar to CamelCase but with the first letter capitalized too.

5. **C++**: There is no single standard, but both `camelCase` and `PascalCase` are widely used, depending on the project or team standards.

It's important to consistently follow the conventions adopted by your project or team to ensure code readability and maintainability. If there's no set guideline, choose a style and apply it consistently

In [63]:
result = chain.invoke("Should we use CamelCase for function names in javascript?")
print(result)

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


Based on the given context, the preference is for using snake_case for naming variables in JavaScript. There is no specific mention of function naming conventions in the context provided. However, given the preference for snake_case for variables, it might suggest a similar preference for consistency. Therefore, it would be reasonable to consider using snake_case for function names as well, unless a different convention is specified for functions elsewhere.


In [64]:
result = chain.invoke("What is the authentication framework used in this app?")
print(result)

Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3


The context provided does not specify which authentication framework is used in the app.
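
Grounding can also be enforced in code rather than left entirely to the prompt. The sketch below assumes a retriever that may return no documents (for example when a score threshold is applied) and skips the model call in that case. `answer_if_grounded`, `FALLBACK`, and `stub_llm` are hypothetical names, and the stub replaces a real model so the example runs offline.

```python
from typing import Callable

FALLBACK = "I don't know based on the available documentation."

def answer_if_grounded(question: str, docs: list[str], ask_llm: Callable[[str], str]) -> str:
    """Call the model only when retrieval returned at least one document."""
    if not docs:
        return FALLBACK
    context = "\n\n".join(docs)
    prompt = (
        "Answer the question based only on the following context. "
        f"If the context does not contain the answer, reply exactly: {FALLBACK}\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

# Hypothetical model stub that records every prompt it receives.
calls: list[str] = []

def stub_llm(prompt: str) -> str:
    calls.append(prompt)
    return "stub answer"

# No documents retrieved, so the fallback is returned without calling the model.
print(answer_if_grounded("What is the authentication framework used in this app?", [], stub_llm))
```

Asking the model to reply with a fixed sentence also makes "no answer" easy to detect in downstream code.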
