# bRAG: Introduction to Retrieval-Augmented Generation (RAG) Setup

These notebooks walk through building RAG applications from scratch.

Along the way, they build toward a broader understanding of the RAG landscape, shown here:

![Screenshot 2024-03-25 at 8.30.33 PM.png](attachment:c566957c-a8ef-41a9-9b78-e089d35cf0b7.png)

## Prerequisites (optional but recommended)

### Do the first step only if you have never created a virtual environment for this repository. Otherwise, make sure the Python kernel you selected comes from your `venv/` folder.

In [1]:
# Create virtual environment
! python -m venv ../venv


In [None]:
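# Activate the virtual Python environment.
# Note: each `!` command runs in its own subshell, so activation does not persist
# to the notebook kernel. Run this in a terminal instead, then select the venv's
# interpreter as the notebook kernel.
! source ../venv/bin/activate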

In [None]:
# If your Python is not from your venv path, set your IDE's kernel selection (in the top-right corner) to the correct interpreter
# (the output should contain "...venv/bin/python")

! which python

/Users/taha/Desktop/bRAGAI/code/gh/bRAG-langchain/venv/bin/python
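As an alternative to `! which python`, the kernel itself can report whether it is running inside a virtual environment. A minimal sketch, assuming a standard `venv` (which makes `sys.prefix` differ from `sys.base_prefix`); `in_virtualenv` is a hypothetical helper name, and the `real_prefix` check only matters for legacy `virtualenv` releases.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the current interpreter runs inside a virtual environment."""
    # venv points sys.prefix at the environment directory, while sys.base_prefix
    # keeps pointing at the base Python installation.
    uses_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    # Legacy virtualenv (before version 20) set sys.real_prefix instead.
    return uses_venv or hasattr(sys, "real_prefix")


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```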


In [None]:
# Install all packages
! pip install -r ../requirements.txt --quiet

### If you skip the prerequisites and want to install only the packages this notebook needs into your global Python environment, run the command below; otherwise, proceed to the next step.

In [None]:
! pip install --quiet pinecone-client python-dotenv langchain langchain-community langchain-core langchain-openai beautifulsoup4 tiktoken numpy

## Environment

`(1) Packages`

In [None]:
import os
from dotenv import load_dotenv

# Load all environment variables from .env file
load_dotenv()

# Access the environment variables
langchain_tracing_v2 = os.getenv('LANGCHAIN_TRACING_V2')
langchain_endpoint = os.getenv('LANGCHAIN_ENDPOINT')
langchain_api_key = os.getenv('LANGCHAIN_API_KEY')

## LLM
openai_api_key = os.getenv('OPENAI_API_KEY')

## Pinecone Vector Database
pinecone_api_key = os.getenv('PINECONE_API_KEY')
pinecone_api_host = os.getenv('PINECONE_API_HOST')
index_name = os.getenv('PINECONE_INDEX_NAME')


`(2) LangSmith`

https://docs.smith.langchain.com/

In [None]:
os.environ['LANGCHAIN_TRACING_V2'] = langchain_tracing_v2
os.environ['LANGCHAIN_ENDPOINT'] = langchain_endpoint
os.environ['LANGCHAIN_API_KEY'] = langchain_api_key

`(3) API Keys`

In [None]:
os.environ['OPENAI_API_KEY'] = openai_api_key

# Pinecone keys
os.environ['PINECONE_API_KEY'] = pinecone_api_key
os.environ['PINECONE_API_HOST'] = pinecone_api_host
os.environ['PINECONE_INDEX_NAME'] = index_name
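`os.environ` only accepts string values, so if a key is missing from `.env`, `os.getenv` returns `None` and the assignments above fail with a `TypeError`. Also, `load_dotenv()` already copies `.env` values into `os.environ` (without overriding variables that are already set), so those assignments mostly document which keys are needed. A minimal pre-flight check, assuming the variable names this notebook reads; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names.

```python
import os

# Variable names read by the cells above
REQUIRED_VARS = [
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_API_HOST",
    "PINECONE_INDEX_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```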

`(4) Pinecone Init`

In [None]:
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])
index = pc.Index(os.environ['PINECONE_INDEX_NAME'])



## Part 1: Overview

[RAG quickstart](https://python.langchain.com/docs/tutorials/rag/)

In [None]:
from pprint import pprint
import bs4

from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Pinecone
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

#### INDEXING ####

# Load Documents
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed
vectorstore = Pinecone.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    index_name=index_name
)

retriever = vectorstore.as_retriever()

#### RETRIEVAL and GENERATION ####

# Prompt
prompt = hub.pull("rlm/rag-prompt")

# LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.1)

# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Question
pprint(rag_chain.invoke("How does LangChain use vector stores for efficient data retrieval?"))

('LangChain uses vector stores to embed and store diverse data sources like '
 'documents, text, and images. When a user submits a query, the system '
 'retrieves the most relevant information from the vector store based on '
 'vector similarity. This retrieved context is then provided to the large '
 'language model (LLM) to enhance its ability to generate accurate responses.')


## Part 2: Indexing

![Screenshot 2024-02-12 at 1.36.56 PM.png](attachment:d1c0f19e-1f5f-4fc6-a860-16337c1910fa.png)

In [None]:
# Documents
question = "How does LangChain handle multi-turn conversations in chat models?"
document = "LangChain supports chat models that manage complex, multi-turn conversations by maintaining state across interaction turns. This is achieved through structured message handling and conversation memory, which allows for the retention of historical context. Features like structured outputs and JSON-based response formatting further enable seamless integration with downstream applications, making it ideal for context-dependent use cases such as customer support and conversational AI."

[Count tokens](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb) with `tiktoken`. As a rule of thumb, [one token is roughly 4 characters](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) of English text.

In [None]:
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string(question, "cl100k_base")

12
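The ~4 characters per token figure is only a rule of thumb for English text. A minimal sketch of that heuristic, useful as a rough budget when `tiktoken` is unavailable; `estimate_tokens` is a hypothetical helper, and exact counts should still come from `num_tokens_from_string`.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate based on the ~4 characters/token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


question = "How does LangChain handle multi-turn conversations in chat models?"
print(len(question), "characters ->", estimate_tokens(question), "estimated tokens")
```

For this question the heuristic gives 17, whereas `tiktoken` counted 12 tokens with `cl100k_base`, so treat the estimate as a rough approximation only.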

[Text embedding models](https://python.langchain.com/docs/integrations/text_embedding/openai)

In [None]:
from langchain_openai import OpenAIEmbeddings
embd = OpenAIEmbeddings(model="text-embedding-3-large")
query_result = embd.embed_query(question)
document_result = embd.embed_query(document)
len(query_result)

3072

[Cosine similarity](https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions) is recommended for OpenAI embeddings (a score of 1 means the two vectors point in the same direction).

In [None]:
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

similarity = cosine_similarity(query_result, document_result)
print("Cosine Similarity:", similarity)

Cosine Similarity: 0.7714784751249568
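OpenAI's embeddings FAQ notes that its embeddings are normalized to length 1, so cosine similarity can be computed with a plain dot product. A minimal stdlib sketch showing that equivalence on made-up toy vectors: once both vectors are scaled to unit length, their dot product equals the cosine of the original vectors.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b))                     # cosine on the raw vectors
print(dot(normalize(a), normalize(b)))  # same value once both are unit length
```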


[Document Loaders](https://python.langchain.com/docs/integrations/document_loaders/)

LangChain provides document loaders for:
- [Webpages](https://python.langchain.com/docs/integrations/document_loaders/#webpages)
- [PDFs](https://python.langchain.com/docs/integrations/document_loaders/#pdfs)
- [Cloud Providers](https://python.langchain.com/docs/integrations/document_loaders/#cloud-providers)
- [Social Platforms](https://python.langchain.com/docs/integrations/document_loaders/#social-platforms)
- [Messaging Services](https://python.langchain.com/docs/integrations/document_loaders/#messaging-services)
- [Productivity tools](https://python.langchain.com/docs/integrations/document_loaders/#productivity-tools)
- [Common File Types](https://python.langchain.com/docs/integrations/document_loaders/#common-file-types)
  - CSVLoader: CSV files
  - DirectoryLoader: All files in a given directory
  - Unstructured: Many file types (see https://docs.unstructured.io/platform/supported-file-types)
  - JSONLoader: JSON files
  - BSHTMLLoader: HTML files
- All document loaders
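Whatever the source, a loader's `load()` returns a list of `Document` objects, each with `page_content` and a `metadata` dict (commonly including a `source` key). To make that shape concrete without any integrations, here is a hypothetical stand-in for a directory loader built on a plain dataclass; `SimpleDocument` and `load_text_directory` are illustrative names, not LangChain APIs.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in mirroring the page_content/metadata shape of a LangChain Document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def load_text_directory(directory: str, pattern: str = "*.txt") -> List[SimpleDocument]:
    """Read every file matching `pattern` in `directory` into one document per file."""
    docs = []
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        docs.append(SimpleDocument(page_content=text, metadata={"source": str(path)}))
    return docs


# Usage: write two small files to a temporary directory and load them
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b.txt").write_text("beta", encoding="utf-8")
    Path(tmp, "a.txt").write_text("alpha", encoding="utf-8")
    Path(tmp, "notes.md").write_text("ignored by the *.txt pattern", encoding="utf-8")
    demo_docs = load_text_directory(tmp)

print([doc.page_content for doc in demo_docs])
```

Swapping in a real loader changes how text is extracted, but the later steps (splitting, embedding) consume the same `page_content`/`metadata` pair.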

In [None]:
#### INDEXING ####

# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

[Splitter](https://python.langchain.com/docs/how_to/recursive_text_splitter/)

> This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.
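The separator-fallback idea in the quote can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: it measures size in characters, keeps no overlap between chunks, and greedily re-joins small pieces with the separator they were split on. `recursive_split` is a hypothetical name.

```python
from typing import List, Optional

DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]


def recursive_split(text: str, chunk_size: int,
                    separators: Optional[List[str]] = None) -> List[str]:
    """Try coarse separators first; fall back to finer ones only for pieces
    that are still longer than chunk_size (measured in characters)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if separators is None:
        separators = DEFAULT_SEPARATORS
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: cut into fixed-size slices
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    # "" is the last resort: split into individual characters
    pieces = list(text) if sep == "" else text.split(sep)
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the pending chunk, then recurse on the oversized piece
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # greedily merge small pieces
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "para one here.\n\npara two is a bit longer."
print(recursive_split(sample, chunk_size=20))
```

The first paragraph fits and stays whole; only the second one is broken down further, first on newlines (none found) and then on spaces.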

In [None]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter

# from_tiktoken_encoder measures chunk_size and chunk_overlap in tokens, not characters
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50,
)

# Make splits
splits = text_splitter.split_documents(blog_docs)
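`chunk_overlap` lets consecutive chunks share some context so that a sentence cut at a chunk boundary still appears whole in at least one chunk. A minimal sliding-window sketch of the overlap idea over a list of toy token IDs. This is a simplification (LangChain's splitter merges separator-delimited pieces rather than slicing fixed windows), and `sliding_windows` is a hypothetical helper.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[int], chunk_size: int, chunk_overlap: int) -> List[List[int]]:
    """Cut tokens into windows of at most chunk_size, each starting
    chunk_size - chunk_overlap positions after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the sequence
    return windows


# Toy "token IDs" 0..9 with chunk_size=4 and chunk_overlap=1
print(sliding_windows(list(range(10)), chunk_size=4, chunk_overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```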

[Vectorstores](https://python.langchain.com/docs/integrations/vectorstores/)

A vector store stores embedded data and performs similarity search over it.

LangChain integrates with 50+ vector stores.

In [None]:
# Index
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Pinecone

vectorstore = Pinecone.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    index_name=index_name
)

retriever = vectorstore.as_retriever()
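Conceptually, the Pinecone index above holds (vector, chunk) pairs and answers a query by ranking stored vectors by similarity. A minimal in-memory sketch of that idea with brute-force cosine similarity. `InMemoryVectorIndex` is a hypothetical class and the 2-D vectors are made up (real `text-embedding-3-large` vectors have 3072 dimensions); managed stores such as Pinecone typically rely on approximate nearest-neighbor indexes rather than scanning every vector. The `k` argument plays the same role as `search_kwargs={"k": ...}` when building a retriever.

```python
import math
from typing import List, Tuple


def cosine_score(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class InMemoryVectorIndex:
    """Brute-force vector store: keeps (embedding, text) pairs, returns the k most similar texts."""

    def __init__(self) -> None:
        self._items: List[Tuple[List[float], str]] = []

    def add(self, embedding: List[float], text: str) -> None:
        self._items.append((embedding, text))

    def similarity_search(self, query: List[float], k: int = 4) -> List[Tuple[str, float]]:
        # Score every stored vector against the query, highest similarity first
        scored = [(text, cosine_score(query, emb)) for emb, text in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index_demo = InMemoryVectorIndex()
index_demo.add([1.0, 0.0], "chunk about agents")
index_demo.add([0.0, 1.0], "chunk about prompts")
index_demo.add([0.7, 0.7], "chunk about agent prompts")
print(index_demo.similarity_search([0.9, 0.1], k=2))
```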

## Part 3: Retrieval

In [None]:
# Reuse the vector store built in Part 2. Calling Pinecone.from_documents again
# would upsert the same chunks a second time under new IDs, duplicating them in the index.
# k=1 returns only the single most similar chunk.
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

In [None]:
docs = retriever.invoke("How does LangChain handle multi-turn conversations in chat models?")

In [None]:
len(docs)

1

## Part 4: Generation

![Screenshot 2024-02-12 at 1.37.38 PM.png](attachment:f9b0e284-58e4-4d33-9594-2dad351c569a.png)

In [None]:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])
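`ChatPromptTemplate.from_template` fills `{context}` and `{question}` much like Python's `str.format`. A minimal sketch of that substitution with plain strings (not LangChain objects; `fake_docs` is made-up data). It also shows why joining retrieved chunks into one string first, as `format_docs` does in Part 1, gives the model cleaner context than passing a list, whose Python `repr` (brackets and quotes) ends up in the prompt.

```python
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

fake_docs = ["LangChain keeps conversation state.", "Memory stores earlier turns."]

# Passing a list directly embeds its Python repr in the prompt
raw_prompt = template.format(context=fake_docs, question="How is state kept?")

# Joining the chunks first (like format_docs) yields plain paragraphs
clean_prompt = template.format(context="\n\n".join(fake_docs), question="How is state kept?")

print(raw_prompt)
print(clean_prompt)
```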

In [None]:
# LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.1)

In [None]:
# Chain
chain = prompt | llm

In [None]:
# Run
chain.invoke({"context": docs, "question": "How does LangChain handle multi-turn conversations in chat models?"})

AIMessage(content='LangChain maintains state across conversation turns, making it suitable for prolonged, context-dependent conversations.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 495, 'total_tokens': 514, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-fd45bacc-e13a-4263-b397-20ad587bfa02-0', usage_metadata={'input_tokens': 495, 'output_tokens': 19, 'total_tokens': 514, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [None]:
from langchain import hub
prompt_hub_rag = hub.pull("rlm/rag-prompt")

In [None]:
prompt_hub_rag

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])

[RAG chains](https://python.langchain.com/docs/how_to/sequence/)

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("How does LangChain enhance LLM responses with Retrieval-Augmented Generation (RAG)?")

'LangChain enhances LLM responses with Retrieval-Augmented Generation (RAG) by allowing models to access up-to-date information through Document Loaders, Text Splitters, Embedding Models, Vector Stores, Retrievers, and RAG Chains. This enables the retrieval and merging of external data with model responses, enhancing applications such as question answering systems and recommendation engines.'
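In LCEL, the dict at the head of `rag_chain` passes the same input to each of its values: `retriever` receives the question and fetches context, while `RunnablePassthrough()` forwards the question unchanged. The `|` operator then feeds each step's output into the next. A plain-Python sketch of that data flow, with hypothetical stand-ins (`fake_retriever`, `fake_llm`, `join_chunks`, `fill_prompt`) in place of Pinecone, the prompt template, and OpenAI; it illustrates the composition only, not LangChain's internals.

```python
from typing import Dict, List


def fake_retriever(question: str) -> List[str]:
    """Hypothetical retriever: returns canned chunks instead of querying a vector store."""
    return ["RAG retrieves external documents.", "Retrieved text is added to the prompt."]


def join_chunks(chunks: List[str]) -> str:
    return "\n\n".join(chunks)


def fill_prompt(inputs: Dict[str, str]) -> str:
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"


def fake_llm(prompt: str) -> str:
    """Hypothetical model: echoes the prompt's last line instead of calling an API."""
    return "Answer to -> " + prompt.splitlines()[-1]


def rag_pipeline(question: str) -> str:
    # Step 1: the dict step; both branches receive the same question
    inputs = {"context": join_chunks(fake_retriever(question)), "question": question}
    # Steps 2-3: prompt | llm. A StrOutputParser step is unnecessary here
    # because fake_llm already returns a plain string.
    return fake_llm(fill_prompt(inputs))


print(rag_pipeline("What does RAG add to an LLM?"))
```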

## Conclusion: Basic RAG Setup

This notebook covers the basic setup for a Retrieval-Augmented Generation (RAG) application: creating a virtual environment, installing the required packages, and loading API keys from environment variables, followed by indexing, retrieval, and generation.

Key components include:

1. **Environment Setup**: Instructions for creating a virtual environment and installing dependencies from a `requirements.txt` file.
2. **API Key Management**: Loading and setting environment variables for various services, including LangChain and Pinecone.
3. **Document Loading and Indexing**: Using `WebBaseLoader` to load a blog post from a URL, then splitting it into manageable chunks with `RecursiveCharacterTextSplitter`.
4. **Embedding and Vector Store**: Creating embeddings for the documents and storing them in a Pinecone vector database for efficient retrieval.
5. **Retrieval and Generation**: Implementing a retrieval mechanism to fetch relevant documents based on user queries and generating responses using a language model (LLM).

This setup serves as a foundational framework for building RAG applications. Future notebooks will go deeper into each component, providing more advanced techniques and use cases for leveraging RAG effectively.