# Retrieval Augmented Generation Demo

## Prerequisites

### Make a `.env` file

```sh
cp .env.example .env
```

### Provide API keys in the `.env` file

- `LANGCHAIN_API_KEY` (optional, enables LangSmith tracing)
  - Sign up at https://smith.langchain.com/
  - Go to Settings -> API Keys
  - Create an API key
- `GROQ_API_KEY` (required)
  - Sign up at https://console.groq.com/
  - Go to API Keys
  - Create an API key

### Check that everything is working

- Check that the selected kernel is `.venv`
  - In the menu bar, click `Kernel`
  - Click `Change Kernel...` and select `.venv` if it is not already selected
- Check that all cells run without errors
  - In the menu bar, click `Kernel`
  - Click `Restart Kernel...`
  - In the menu bar, click `Edit`
  - Click `Clear Outputs of All Cells`
  - In the menu bar, click `Run`
  - Click `Run All Cells`

## Set up environment

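```python
import warnings
from dotenv import load_dotenv

# Silence library deprecation warnings to keep the notebook output readable.
warnings.filterwarnings('ignore')
# Load the keys from `.env` into the process environment.
load_dotenv()
```

```
True
```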
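`load_dotenv()` reads `KEY=VALUE` lines from `.env` into `os.environ`, and by default (`override=False`) it leaves variables that are already set untouched. The sketch below imitates that behavior with a hypothetical `parse_env_lines`/`apply_env` pair working on a plain dict; it deliberately skips features python-dotenv supports, such as `export` prefixes, multi-line values, and variable expansion. The key values are placeholders.

```python
def parse_env_lines(lines):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Hypothetical, simplified parser: no `export`, multi-line values,
    or variable expansion, unlike python-dotenv.
    """
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def apply_env(values, environ, override=False):
    """Copy parsed values into `environ`, keeping existing keys unless override=True."""
    for key, value in values.items():
        if override or key not in environ:
            environ[key] = value
    return environ


env_file = [
    "# Keys for the demo (placeholders)",
    "GROQ_API_KEY=gsk_example",
    'LANGCHAIN_API_KEY="ls_example"',
    "",
]
environ = {"GROQ_API_KEY": "already-set"}
apply_env(parse_env_lines(env_file), environ)
print(environ)
# {'GROQ_API_KEY': 'already-set', 'LANGCHAIN_API_KEY': 'ls_example'}
```

The already-set `GROQ_API_KEY` wins, which is why editing `.env` has no effect on a variable that was exported in the shell before Jupyter started.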

## Set up tracing
Define `LANGCHAIN_API_KEY` in your `.env` file.  
This step is optional.

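```python
import os

# Enable LangSmith tracing only when a key is configured; otherwise skip it.
if os.getenv("LANGCHAIN_API_KEY"):
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
```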

## Set up LLM
Define `GROQ_API_KEY` in your `.env` file.

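```python
from langchain_groq import ChatGroq

# ChatGroq reads GROQ_API_KEY from the environment.
# If Groq has retired this model, substitute a current model from the Groq console.
llm = ChatGroq(model="llama3-8b-8192")
```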
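Because `ChatGroq` picks up `GROQ_API_KEY` from the environment, a missing key otherwise shows up as an error from the Groq client. A hypothetical helper like `require_env` can fail earlier with a clearer message; it is a sketch for this demo, not part of `langchain_groq`.

```python
import os


def require_env(*names, environ=None):
    """Return the values of the named variables, raising if any is missing or empty.

    Hypothetical helper; pass `environ` to check a dict instead of os.environ.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variable(s): "
            + ", ".join(missing)
            + ". Add them to your .env file."
        )
    return [env[name] for name in names]


# In the notebook you would call it before creating the model:
# (groq_key,) = require_env("GROQ_API_KEY")
print(require_env("GROQ_API_KEY", environ={"GROQ_API_KEY": "gsk_example"}))
# ['gsk_example']
```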

## Indexing: Load

```python
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

len(docs[0].page_content)
```

```
43131
```
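`SoupStrainer(class_=(...))` tells BeautifulSoup to parse only elements whose `class` matches one of the listed names, so navigation bars, sidebars, and footers never reach `page_content`. The sketch below mimics that filtering with the standard library's `html.parser` instead of bs4; the `ClassFilter` class and the sample HTML are illustrative assumptions, and this is not how `WebBaseLoader` is implemented.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassFilter(HTMLParser):
    """Collect text only from inside elements carrying one of the `wanted` classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.depth = 0  # > 0 while inside a matching element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Count nesting once we are inside a match, or when this tag itself matches.
        if self.depth > 0 or classes & self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.parts.append(data.strip())


sample_html = """
<nav class="menu">Home | Posts</nav>
<h1 class="post-title">LLM Powered Autonomous Agents</h1>
<div class="post-content"><p>Agents plan, remember, and use tools.</p></div>
<footer>Footer text</footer>
"""
parser = ClassFilter(("post-title", "post-header", "post-content"))
parser.feed(sample_html)
print(parser.parts)
# ['LLM Powered Autonomous Agents', 'Agents plan, remember, and use tools.']
```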

## Indexing: Split

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Sizes are in characters; add_start_index stores each chunk's offset in metadata["start_index"].
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True,
)
all_splits = text_splitter.split_documents(docs)

len(all_splits)
```

```
66
```
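`chunk_size=1000` and `chunk_overlap=200` are measured in characters (the splitter's default length function is `len`), and neighboring chunks share up to 200 characters, so text near a boundary appears in both chunks and is less likely to lose its context. `RecursiveCharacterTextSplitter` also prefers to break on `"\n\n"`, then `"\n"`, then `" "`, before falling back to single characters. The simplified fixed-window splitter below ignores separators and only shows how size, overlap, and `start_index` relate; `split_with_overlap` is a hypothetical name, not the library's algorithm.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Returns (start_index, chunk) pairs, mirroring the `start_index`
    metadata that add_start_index=True records.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, len(text))
        chunks.append((start, text[start:end]))
        if end == len(text):
            break
        start += step
    return chunks


text = "".join(str(i % 10) for i in range(2500))  # 2,500 characters of filler
chunks = split_with_overlap(text)
print([(start, len(chunk)) for start, chunk in chunks])
# [(0, 1000), (800, 1000), (1600, 900)]
```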

## Indexing: Store

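```python
from langchain_chroma import Chroma
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

# Embed every chunk and store the vectors in an in-memory Chroma collection.
# HuggingFaceEmbeddings() uses its default sentence-transformers model, downloaded on first run.
print("Indexing started...")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=HuggingFaceEmbeddings())
print("Indexing complete.")
```

```
Indexing started...
Indexing complete.
```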
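`Chroma.from_documents` embeds every chunk with the given embedding model and stores the vectors alongside the chunk text and metadata. To make the idea concrete without downloading a model, the sketch below swaps in a toy hashed bag-of-words embedding (`toy_embed`, hypothetical and far weaker than a sentence-transformer) and uses a plain list as the "store".

```python
import math
import re
import zlib

DIM = 64  # toy vector size; real sentence-transformer embeddings have hundreds of dimensions


def toy_embed(text, dim=DIM):
    """Hash each lowercase word into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike Python's randomized str hash.
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(chunks):
    """Store (vector, text) pairs, like a vector store's add step."""
    return [(toy_embed(chunk), chunk) for chunk in chunks]


index = build_index([
    "Task decomposition breaks a large task into smaller steps.",
    "Memory lets an agent recall past observations.",
    "Tool use lets the model call external APIs.",
])
print(len(index), len(index[0][0]))
# 3 64
```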


## Retrieval and Generation: Retrieve

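```python
import os

# Avoid the Hugging Face tokenizers fork warning once parallel work starts.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Return the 6 chunks most similar to the query.
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

len(retrieved_docs)
```

```
6
```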
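With `search_type="similarity"` and `k=6`, the retriever embeds the question and returns the 6 stored chunks whose vectors are closest to it. A minimal sketch of that top-k step, assuming the vectors are already computed and using cosine similarity (Chroma's default distance is L2, which ranks results the same way as cosine when the vectors have unit length). The hand-picked 3-dimensional vectors are purely illustrative.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, index, k=6):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = ((cosine(query_vec, vec), text) for vec, text in index)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Tiny 3-dimensional "embeddings" chosen by hand for illustration.
index = [
    ([1.0, 0.0, 0.0], "task decomposition"),
    ([0.0, 1.0, 0.0], "agent memory"),
    ([0.7, 0.7, 0.0], "planning with memory"),
    ([0.0, 0.0, 1.0], "tool use"),
]
for score, text in top_k([0.9, 0.1, 0.0], index, k=2):
    print(f"{score:.3f}  {text}")
```

If fewer than `k` chunks are stored, `heapq.nlargest` simply returns all of them, ranked.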

## Retrieval and Generation: Generate

```python
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

example_messages = prompt.invoke({
    "context": "filler context",
    "question": "filler question",
}).to_messages()

example_messages
```

```
[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:")]
```
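The pulled prompt is a single human message with `{context}` and `{question}` slots. Reproducing it with an ordinary Python format string shows what the model receives once the chain fills those slots. The template text is copied from the rendered message in the output above; the hub version may change over time, and `RAG_TEMPLATE`/`render_prompt` are illustrative names.

```python
# Template text copied from the rendered `rlm/rag-prompt` message.
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. Use the following pieces "
    "of retrieved context to answer the question. If you don't know the answer, "
    "just say that you don't know. Use three sentences maximum and keep the answer "
    "concise.\nQuestion: {question} \nContext: {context} \nAnswer:"
)


def render_prompt(context, question):
    """Fill the template, mirroring how prompt.invoke({...}) substitutes its variables."""
    return RAG_TEMPLATE.format(context=context, question=question)


print(render_prompt("filler context", "filler question"))
```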

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


def format_docs(docs):
    # Join the retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Print the answer as it streams in.
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)
```

```
Task decomposition is a process that breaks down a problem or task into smaller, more manageable subtasks or steps.
```
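The `|` operator composes runnables left to right, and the leading dict runs both of its branches on the same input: `retriever | format_docs` turns the question into a context string while `RunnablePassthrough()` forwards the question unchanged. The plain-Python sketch below traces that data flow with stand-in functions (`fake_retrieve`, `join_chunks`, `build_prompt`, `fake_llm_stream`, and the sample chunks are all hypothetical; there is no real retrieval or LLM call) and yields the answer in pieces, the way `rag_chain.stream` does.

```python
def fake_retrieve(question, k=2):
    """Stand-in retriever: return up to k chunks sharing a word with the question."""
    chunks = [
        "Task decomposition splits a hard task into smaller steps.",
        "Chain of thought prompts the model to think step by step.",
        "Memory stores past observations.",
    ]
    words = set(question.lower().strip("?").split())
    return [c for c in chunks if words & set(c.lower().rstrip(".").split())][:k]


def join_chunks(chunks):
    """Plays the role of format_docs, but for plain strings."""
    return "\n\n".join(chunks)


def build_prompt(inputs):
    return f"Question: {inputs['question']}\nContext: {inputs['context']}\nAnswer:"


def fake_llm_stream(prompt):
    """Stand-in LLM: yield a canned answer word by word, like a streaming model."""
    answer = "Task decomposition breaks a task into smaller steps."
    for word in answer.split(" "):
        yield word + " "


def rag_stream(question):
    # Step 1: both dict branches receive the same question.
    inputs = {"context": join_chunks(fake_retrieve(question)), "question": question}
    # Steps 2-4: prompt, then LLM, then plain-string chunks.
    yield from fake_llm_stream(build_prompt(inputs))


for chunk in rag_stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)
print()
```

The stand-ins use new names so that running this cell in the notebook does not overwrite the real `format_docs`.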