# RAG: Retrieval Augmented Generation
We will build a RAG example using [LangChain](https://python.langchain.com), then use [Streamlit](https://docs.streamlit.io/) to turn it into a browser app.
The code in this tutorial is largely taken from an [official LangChain tutorial](https://python.langchain.com/docs/tutorials/rag/).

### Install required packages

In [None]:
pip install -q transformers==4.50.3 psycopg_binary streamlit python-dotenv beautifulsoup4

In [None]:
pip install -qU langchain-text-splitters langchain-community langgraph langchain-core langchain-huggingface langchain_postgres "langchain[openai]"

### Create a project

In [None]:
import digitalhub as dh

project = dh.get_or_create_project("rag-demo")

### Configure API keys and other environment variables
Configure the following environment variables in a `rag.env` file:

- `HF_TOKEN`: Your HuggingFace token
- `LANGSMITH_TRACING`: `true`
- `LANGSMITH_API_KEY`: Your LangChain API key
- `OPENAI_API_KEY`: Your OpenAI API key
- `PG_CONN_URL`: Connection URL to the Postgres database for PGVector, in the format: `postgresql+psycopg://user:password@host:port/database`
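
Later cells read these variables with `os.environ[...]`, which raises a `KeyError` on the first missing name. As a minimal sketch (the helper `missing_env_vars` is hypothetical, not part of any library), a check like the following could be run after the file has been loaded, to report every missing variable at once:

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = [
    "HF_TOKEN",
    "LANGSMITH_TRACING",
    "LANGSMITH_API_KEY",
    "OPENAI_API_KEY",
    "PG_CONN_URL",
]


def missing_env_vars(environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

The helper takes any mapping, so it can also be tried on a plain dictionary before touching the real environment.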

In [None]:
import os
from pathlib import Path
from dotenv import load_dotenv

env_path = Path(".") / "rag.env"
load_dotenv(dotenv_path=env_path, override=True)

### Chat model
A [chat model](https://python.langchain.com/docs/concepts/chat_models/) can interpret and generate natural-language text. We create a function to launch a pre-existing model on the platform and serve it.

In [None]:
chat_func = project.new_function(
    "chat", kind="huggingfaceserve", model_name="chatmodel", path="huggingface://meta-llama/meta-llama-3-8b-instruct"
)

In [None]:
chat_run = chat_func.run(
    action="serve",
    profile="1xa100",
    max_length="5000",
    envs=[{"name": "HF_TOKEN", "value": os.environ["HF_TOKEN"]}],
    wait=True,
)

In [None]:
chat_service_url = chat_run.refresh().status.to_dict()["service"]["url"]

In [None]:
from langchain.chat_models import init_chat_model

llm = init_chat_model("chatmodel", model_provider="openai", base_url=f"http://{chat_service_url}/openai/v1/")

### Embedding model
[Embedding models](https://python.langchain.com/docs/concepts/embedding_models/) map discrete data, such as words, to numerical vectors, which are more convenient for analysis, yet can still represent relationships between objects.
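
To illustrate how vectors can represent relationships, here is a small sketch using cosine similarity, a common way to compare embeddings. The 3-dimensional vectors are invented for the example; a real embedding model produces vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, made up for illustration only.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
}

sim_close = cosine_similarity(toy_vectors["cat"], toy_vectors["kitten"])
sim_far = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])
print(sim_close > sim_far)  # related words score higher than unrelated ones
```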

In [None]:
emb_func = project.new_function(
    "emb", kind="huggingfaceserve", model_name="embmodel", path="huggingface://thenlper/gte-base"
)

At the moment, you must serve the model manually from the platform. Set the backend to `HUGGINGFACE` and the inference task to `text-embedding`. Once served, copy its service URL and assign it to the following variable.

In [None]:
embedding_service_url = "<your_embedding_service_url>"

In [None]:
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

hf_embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=os.environ["HF_TOKEN"], api_url=f"http://{embedding_service_url}/v1/models/embmodel:predict"
)

In [None]:
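class CEmbeddings(HuggingFaceInferenceAPIEmbeddings):
    """Adapter that unwraps the serving endpoint's response into plain vectors."""

    def embed_documents(self, docs):
        # The served model answers with a JSON object; the vectors sit under "predictions".
        return hf_embeddings.embed_documents(docs)["predictions"]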


custom_embeddings = CEmbeddings(api_key=os.environ["HF_TOKEN"])

### Vector store
Embeddings are stored in [vector stores](https://python.langchain.com/docs/concepts/vectorstores/), which support similarity search: given a query, they return the stored texts whose embeddings are closest to it in meaning.
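
The idea behind a similarity search can be sketched with a toy in-memory store. `ToyVectorStore` is a hypothetical stand-in and does not mirror the `PGVector` API; the 2-dimensional vectors are invented, and a real store would also index the vectors rather than scan them all.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector store (not the PGVector API)."""

    def __init__(self):
        self._rows = []  # list of (unit_vector, text) pairs

    def add(self, vector, text):
        self._rows.append((normalize(vector), text))

    def search(self, query_vector, k=2):
        """Return the k stored texts most similar to the query vector."""
        query = normalize(query_vector)
        scored = [
            (sum(q * v for q, v in zip(query, vector)), text)
            for vector, text in self._rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = ToyVectorStore()
store.add([0.9, 0.1], "Agents break tasks into steps.")
store.add([0.8, 0.3], "Planning decomposes a goal into subgoals.")
store.add([0.1, 0.9], "Postgres stores rows in tables.")

top_hits = store.search([1.0, 0.2], k=2)
print(top_hits)
```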

In [None]:
from langchain_postgres import PGVector

vector_store = PGVector(
    embeddings=custom_embeddings,
    collection_name="my_docs",
    connection=os.environ["PG_CONN_URL"],
)

### Load and chunk contents
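Split documents into overlapping chunks, as models have an easier time understanding the context of smaller inputs. `RecursiveCharacterTextSplitter` tries to break on paragraph and line boundaries first, falling back to finer separators only when a piece is still longer than `chunk_size` characters.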
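
To show what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sliding-window splitter. It is only a sketch: the hypothetical `chunk_text` function cuts at fixed character offsets, whereas the LangChain splitter prefers natural boundaries, so the two will not produce the same chunks.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap (simplified sketch)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last 4 characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk.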

In [None]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

_ = vector_store.add_documents(documents=all_splits)

### Define prompt and operations
We define an object type to contain question, relevant context, and answer.

We define two operations that will enrich this object: one takes the question and performs a similarity search to obtain and add context, the other uses the question and context to generate and add the answer.
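
The enrichment pattern can be sketched in plain Python, without LangChain or LangGraph. Each operation returns only the keys it adds, and these are merged into the shared object. The names `run_pipeline`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins for the graph runner, the similarity search, and the chat model.

```python
def run_pipeline(state, steps):
    """Apply each step in order, merging its partial update into the state."""
    for step in steps:
        state = {**state, **step(state)}
    return state


# Hypothetical stand-in for the similarity search.
def fake_retrieve(state):
    return {"context": [f"(document relevant to: {state['question']})"]}


# Hypothetical stand-in for the chat model call.
def fake_generate(state):
    return {"answer": f"Answer based on {len(state['context'])} document(s)."}


final_state = run_pipeline(
    {"question": "What is Task Decomposition?"},
    [fake_retrieve, fake_generate],
)
print(final_state["answer"])
```

Starting from a state that holds only the question, the final state contains the question, the context, and the answer.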

In [None]:
from langchain import hub
from langchain_core.documents import Document
from typing_extensions import List, TypedDict

prompt = hub.pull("rlm/rag-prompt")


class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

### Compile application
We define a simple graph of operations: *retrieve* -> *generate*.

In [None]:
from langgraph.graph import START, StateGraph

graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

### Invoke

In [None]:
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])

## Streamlit: creating a browser app
[Streamlit](https://docs.streamlit.io/) is a Python framework to create browser applications with little code. We will create an app with a form, where the user will provide a document for retrieval and a question.

Chat and embedding models must be available beforehand. If you've run the cells above, the models should be up, so we can set environment variables that point to the services:

In [None]:
os.environ["CHAT_SERVICE_URL"] = chat_service_url
os.environ["EMBEDDING_SERVICE_URL"] = embedding_service_url

The following will create the Python file run by Streamlit. The code is largely the same as the steps above, with the necessary changes to define the Streamlit app.

In [None]:
%%writefile 'rag-streamlit-app.py'
import os
import bs4
import streamlit as st
from dotenv import load_dotenv
from langchain import hub
from langchain.chat_models import init_chat_model
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_core.documents import Document
from langchain_postgres import PGVector
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from pathlib import Path
from typing_extensions import List, TypedDict

# API keys
env_path = Path('.') / 'rag.env'
load_dotenv(dotenv_path=env_path, override=True)

# Chat model
llm = init_chat_model("chatmodel", model_provider="openai", base_url=f"http://{os.environ['CHAT_SERVICE_URL']}/openai/v1/")

# Embedding model
hf_embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=os.environ["HF_TOKEN"],
    api_url=f"http://{os.environ['EMBEDDING_SERVICE_URL']}/v1/models/embmodel:predict"
)

class CEmbeddings(HuggingFaceInferenceAPIEmbeddings):
    def embed_documents(self, docs):
        return hf_embeddings.embed_documents(docs)["predictions"]
custom_embeddings = CEmbeddings(api_key=os.environ["HF_TOKEN"])

# Vector store
vector_store = PGVector(
    embeddings=custom_embeddings,
    collection_name="my_docs",
    connection=os.environ["PG_CONN_URL"],
)

# Define prompt and operations
prompt = hub.pull("rlm/rag-prompt")

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

# Define graph of operations
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

# Streamlit setup
st.title("RAG App")
st.write("Welcome to the RAG (Retrieval-Augmented Generation) app.")
st.write("Please provide a link to the document to retrieve and your question.")
if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

qa = st.container()

with st.form("rag_form", clear_on_submit=True):
    rag_document = st.text_input("Document", "")
    question = st.text_input("Question", "")
    submit = st.form_submit_button("Submit")
    
if submit:
    # Load and chunk contents
    if question:
        if rag_document:
            loader = WebBaseLoader(
                web_paths=(rag_document,),
                bs_kwargs=dict(),
            )
            docs = loader.load()
            
            text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
            all_splits = text_splitter.split_documents(docs)
            
            # Index chunks
            _ = vector_store.add_documents(documents=all_splits)
    
        st.session_state.messages.append({"role": "user", "content": question})
        with qa.chat_message("user"):
            st.write(question)
    
        response = graph.invoke({"question": question})
        st.session_state.messages.append({"role": "assistant", "content": response["answer"]})
        with qa.chat_message("assistant"):
            st.write(response["answer"])
    else:
        with qa.chat_message("assistant"):
            st.write("You didn't provide a question!")

## Launch and test the Streamlit app
The command at the end of this section launches the Streamlit app from the file written by the previous cell. To access the app, you will need to forward port 8501 in Coder. Try the following document and question to test it.

Document:
```
https://lilianweng.github.io/posts/2023-06-23-agent
```
Question:
```
What is task decomposition?
```

In [None]:
!streamlit run rag-streamlit-app.py --browser.gatherUsageStats false