*To execute this notebook, choose the `llm-rag` kernel in the dropdown above. You may need to hit the "Select another kernel" button, and refresh the kernels list.*


# Retrieval Augmented Generation

The last section introduced prompt engineering techniques.
In this section, we extend that idea by:
1. searching through your own data,
2. returning the most relevant results, and
3. conditioning LLM continuations with these relevant facts. 

# The RAG Lifecycle

How can we use your own unstructured data like blog posts, documentation, and video collections to inform LLMs? 
1. Chunk the unstructured data
2. Compute embeddings on the chunks using a model
3. Index the embeddings 
4. Based on user queries, run vector similarity searches against the embeddings
5. Return the top K most similar vectors
6. Map the matching vectors back to their original chunks (for example, via their IDs or stored metadata)
7. Use the "similar" data in prompts
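The seven lifecycle steps above can be traced end to end in a few lines of plain Python. This sketch is a toy. It replaces the real embedding model with a hypothetical bag-of-words `embed` function and uses a Python list instead of a vector database, so only the shape of the loop carries over to the Pinecone code in this notebook.

```python
import math
import re
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# 1. Chunk the unstructured data (here: one chunk per paragraph).
document = "Metaflow builds ML workflows.\n\nConda manages dependencies.\n\nPinecone stores vectors."
chunks = [p.strip() for p in document.split("\n\n") if p.strip()]

# 2. Compute embeddings on the chunks.
vocab = sorted({w for c in chunks for w in re.findall(r"[a-z]+", c.lower())})
# 3. Index the embeddings, keeping each chunk's ID so results can be mapped back to text.
index = [(i, embed(c, vocab)) for i, c in enumerate(chunks)]

# 4-5. Run a similarity search for the user query and keep the top K results.
query = "Which tool manages dependencies?"
query_vector = embed(query, vocab)
k = 1
top = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)[:k]

# 6. Map the matching IDs back to the original chunks.
retrieved = [chunks[i] for i, _ in top]

# 7. Use the retrieved chunks in the prompt.
prompt = "Context:\n" + "\n".join(retrieved) + f"\n\nQuestion: {query}"
print(prompt)
```

In the rest of the notebook, step 2 uses a sentence-transformer model and steps 3 to 5 are handled by Pinecone.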

# Chunking documents and pushing them to VectorDBs

The process begins with workflows that chunk data and upload the embedded chunks into vector storage.
Many open-source and commercial vector storage solutions are now on the market.
They differ in many details, but their APIs generally do the same jobs: create an index, upsert embedded chunks, and query for the most similar vectors.
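To make "the same jobs" concrete, here is a hypothetical in-memory store with the three operations this notebook relies on: upserting records, querying for the nearest neighbors, and deleting. The record and match dictionaries imitate the shapes used with Pinecone below, but `InMemoryVectorStore` is an illustrative class, not any product's API. It also does an exact brute-force scan, whereas production vector stores typically rely on approximate nearest-neighbor indexes.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class InMemoryVectorStore:
    """Hypothetical toy store exposing the upsert / query / delete jobs most vector DBs share."""

    def __init__(self, dimension: int):
        self.dimension = dimension
        self.records = {}  # id -> (values, metadata)

    def upsert(self, vectors):
        # Insert new records, or overwrite existing records that have the same id.
        for v in vectors:
            if len(v["values"]) != self.dimension:
                raise ValueError(f"expected dimension {self.dimension}, got {len(v['values'])}")
            self.records[v["id"]] = (v["values"], v.get("metadata", {}))
        return {"upserted_count": len(vectors)}

    def query(self, vector, top_k=3, include_metadata=False):
        # Exact search: score every stored record, then keep the top_k highest scores.
        scored = ((cosine(vector, values), rid, meta) for rid, (values, meta) in self.records.items())
        best = heapq.nlargest(top_k, scored, key=lambda t: t[0])
        return {"matches": [
            {"id": rid, "score": score, **({"metadata": meta} if include_metadata else {})}
            for score, rid, meta in best
        ]}

    def delete(self, ids):
        for rid in ids:
            self.records.pop(rid, None)

store = InMemoryVectorStore(dimension=2)
store.upsert([
    {"id": "0", "values": [1.0, 0.0], "metadata": {"text": "about workflows"}},
    {"id": "1", "values": [0.0, 1.0], "metadata": {"text": "about vectors"}},
])
result = store.query([0.9, 0.1], top_k=1, include_metadata=True)
print(result["matches"][0]["id"], result["matches"][0]["metadata"]["text"])  # 0 about workflows
```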

<img style="display: block; max-width: 80%; height: auto; margin: auto;" src="./vector-storage-solutions.png"/>

<center> <a href="https://www.singlestore.com/blog/choosing-a-vector-database-for-your-gen-ai-stack/"> Source </a></center>

Let's use Pinecone to guide our example in this notebook.

## Dependencies

In [None]:
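import os

# Read the API keys from environment variables, or replace the defaults with your keys.
pinecone_api_key = os.environ.get("PINECONE_API_KEY", ...)
openai_key = os.environ.get("OPENAI_API_KEY", ...)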

## Manual index upload 

In [None]:
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key=pinecone_api_key)

index_name = "sample-index"
metric = "cosine"

In [None]:
from rag_tools.embedders.embedder import SentenceTransformerEmbedder

text_sample = [
    "Hello Gen AI friends!",
    "Metaflow helps you build production machine learning workflows",
    "Lots of people recognize machine learning systems require robust workflows",
]

# https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2
embedding_model = "paraphrase-MiniLM-L6-v2"

encoder = SentenceTransformerEmbedder(embedding_model, device="cpu")
embedding = encoder.embed(text_sample)

dimension=embedding.shape[1]
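This notebook's indexes use `metric = "cosine"`. Cosine similarity compares the direction of two vectors and ignores their length: $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$, which ranges from -1 (opposite directions) to 1 (same direction). A minimal standard-library sketch of that formula:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (||a|| * ||b||), a value in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]     # same direction as a, twice the length
c = [-1.0, -2.0, -3.0]  # opposite direction
print(cosine_similarity(a, b))  # approximately 1.0: length does not matter
print(cosine_similarity(a, c))  # approximately -1.0
```

To compare two of the sample sentences above, you could call `cosine_similarity(embedding[1].tolist(), embedding[2].tolist())`; the exact score depends on the embedding model.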

**Note**: If the following cell hangs, restart the notebook (you may need to press Cmd+Shift+P and click "Developer: Reload Window"), then create the index manually in the Pinecone console and skip this cell. If the notebook runs into too many state issues, you can run these cells in a terminal instead, after starting an interpreter with `python`.

In [None]:
pc.create_index(
    name=index_name, 
    dimension=dimension, 
    metric=metric,
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    ) 
)

In [None]:
index = pc.Index(index_name)

In [None]:
upsert_response = index.upsert(
    vectors = [
        {'id': "0", 'values': embedding[0].tolist(), 'metadata': {'text': text_sample[0]}},
        {'id': "1", 'values': embedding[1].tolist(), 'metadata': {'text': text_sample[1]}}, 
        {'id': "2", 'values': embedding[2].tolist(), 'metadata': {'text': text_sample[2]}} 
    ]
)

In [None]:
query = "What does Metaflow do?"
vector = encoder.embed([query])[0]  # embed a one-item list, then take its single vector
matches = index.query(vector=vector.tolist(), top_k=3, include_metadata=True)
matches = matches.to_dict()['matches']

In [None]:
matches

In [None]:
pc.delete_index(index_name)

## Chunk sample data

In [None]:
from rag_tools.filetypes.markdown import Mixin as Markdown

m = Markdown()
m.repo_params = [
    {
        "deployment_url": "docs.metaflow.org",
        "repository_path": "https://github.com/Netflix/metaflow-docs",
        "repository_ref": "master",
        "base_search_path": "docs",
        "exclude_paths": ["docs/v"],
        "exclude_files": ["README.md", "README"],
    }
]
df = m.load_df_from_repo_list()

word_count_threshold = 10
char_count_threshold = 25

# Keep only rows with more than N words and more than M characters.
df = df[df.word_count > word_count_threshold]
df = df[df.char_count > char_count_threshold]

df
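The `rag_tools` Markdown mixin does the actual splitting, and its internals are not shown here. As a rough illustration only, the hypothetical `chunk_markdown` function below splits a Markdown file before each heading and then applies the same kind of word and character thresholds as `word_count_threshold` and `char_count_threshold`.

```python
import re

def chunk_markdown(text: str, min_words: int = 10, min_chars: int = 25) -> list[str]:
    """Hypothetical heading-based chunker: split before each Markdown heading,
    then keep chunks with more than min_words words and more than min_chars characters."""
    # Zero-width split at the start of any line that begins a heading ("# ", "## ", ...),
    # so each heading stays attached to the section it introduces.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = [s.strip() for s in sections if s.strip()]
    return [c for c in chunks if len(c.split()) > min_words and len(c) > min_chars]

doc = """# Intro
Short.

## Dependencies
Use the conda or pypi decorators to declare the packages each step of a flow needs.
"""
for chunk in chunk_markdown(doc):
    print(chunk)
```

The "Intro" section is dropped because it is below the word threshold. A real chunker would also need to avoid treating `#` comments inside fenced code blocks as headings, which this sketch does not handle.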

## Create embeddings

In [None]:
import pandas as pd

# Instantiate an encoder
encoder = SentenceTransformerEmbedder(embedding_model, device="cpu")

# Fetch docs from dataframe
docs = df.contents.tolist()

# Encode documents
embeddings = encoder.embed(docs) # takes ~30-45 seconds on average in sandbox instance
dimension = len(embeddings[0])
print("Dimension is %s" % dimension)

## Create an index

In [None]:
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key=pinecone_api_key)

index_name = "metaflow-documentation"
metric = "cosine" # https://docs.pinecone.io/docs/indexes#distance-metrics

In [None]:
# list_indexes() returns full index descriptions; .names() gives just the names.
if index_name not in pc.list_indexes().names():
    # https://docs.pinecone.io/reference/create_index
    pc.create_index(name=index_name, dimension=dimension, metric=metric,
                    spec=ServerlessSpec(cloud="aws", region="us-east-1"))
else:
    print(f"Index {index_name} already exists")

## Upsert document chunks

In [None]:
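# connect to the index
index = pc.Index(index_name)

# Use each chunk's position in `docs` (and therefore in `df`) as its Pinecone ID,
# so query results can be mapped back to rows with df.iloc later on.
vectors = [
    {'id': str(idx), 'values': emb.tolist(), 'metadata': {'text': txt}}
    for idx, (txt, emb) in enumerate(zip(docs, embeddings))
]
upsert_response = index.upsert(vectors=vectors)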
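The cell above sends every chunk in a single `upsert` call. Pinecone limits the size of one upsert request (check its upsert documentation for the current limits), so larger corpora are usually sent in batches. Here is a minimal batching helper; the batch size of 100 in the usage comment is an assumption, not a documented value.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch

# Usage with this notebook's objects (a batch size of 100 is an assumption):
# for batch in batched(vectors, 100):
#     index.upsert(vectors=batch)
print([len(batch) for batch in batched(range(250), 100)])  # [100, 100, 50]
```

On Python 3.12 and later, `itertools.batched` does the same job but yields tuples instead of lists.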

## Search the index

In [None]:
# We want to find relevant data in our Pinecone index to condition the LLM on.
query = "How do I specify conda dependencies in my flow?"

### But first... a benchmark with the vanilla LLM

In [None]:
from langchain.prompts.chat import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI

In [None]:
human_template = "{user_query}"
chat_prompt = ChatPromptTemplate.from_messages([("human", human_template)])
chat = ChatOpenAI(openai_api_key=openai_key)
response = chat.invoke(chat_prompt.format_messages(user_query=query))
print(response.content)

## Embed and match our search vector

In [None]:
# embed with sentence transformer
k = 5

vector = encoder.embed([query])[0]
matches = index.query(vector=vector.tolist(), top_k=k, include_metadata=True)
matches = matches.to_dict()['matches']

In [None]:
# Example format of Pinecone's index.query(...).to_dict()['matches'], with each
# 'text' value abbreviated. This sample was captured from a run that used 'vec<N>'
# IDs; the upsert above uses plain integer strings (e.g. '264'), which the next
# cell parses with int().
# matches = [
#     {'id': 'vec264', 'score': 0.6444936, 'values': [],
#      'sparse_values': {'indices': [], 'values': []},
#      'metadata': {'text': 'The above instructions work even if you use [`@conda`\n decorators](/scaling/dependencies#managing-dependencies-with-conda-decorator) in your\n code; ...'}},
#     {'id': 'vec412', 'score': 0.5956618, 'values': [],
#      'sparse_values': {'indices': [], 'values': []},
#      'metadata': {'text': 'The `@pypi` and `@conda` decorators allow you to make arbitrary packages\n available to Metaflow steps, ...'}},
#     {'id': 'vec47', 'score': 0.5355435, ...,
#      'metadata': {'text': 'The `@conda_base` decorator specifies what libraries should be made available for all steps of a flow. ...'}},
#     {'id': 'vec314', 'score': 0.5200169, ...,
#      'metadata': {'text': 'As shown above, `@project` guarantees that all flows linked together within the\n same project and branch are isolated from other deployments. ...'}},
#     {'id': 'vec419', 'score': 0.51605517, ...,
#      'metadata': {'text': 'When using `--environment=conda` or `--environment=pypi` all steps are executed in\n isolated environments. ...'}},
# ]

In [None]:
# Pinecone IDs are stringified row positions, so convert them back to ints
row_idxs = [int(match['id']) for match in matches]

In [None]:
row_idxs

In [None]:
retrived_results = df.iloc[row_idxs, :]
retrived_results
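Because each chunk was upserted with its text in `metadata` and the query used `include_metadata=True`, the chunk text is also available straight from `matches`. The DataFrame lookup is still needed for columns that were not stored as metadata, such as `page_url`, which the prompt below uses. A small sketch of reading the text from the matches, using sample dictionaries abbreviated from the example output above (the helper name `texts_from_matches` is hypothetical):

```python
def texts_from_matches(matches: list[dict]) -> list[str]:
    """Return each match's chunk text, highest similarity score first."""
    ordered = sorted(matches, key=lambda match: match["score"], reverse=True)
    # Skip matches that carry no metadata (e.g. a query run without include_metadata=True).
    return [match["metadata"]["text"] for match in ordered if match.get("metadata")]

# Sample matches abbreviated from the example output above (order shuffled on purpose).
sample_matches = [
    {"id": "412", "score": 0.5956618,
     "metadata": {"text": "The `@pypi` and `@conda` decorators allow you to make arbitrary packages ..."}},
    {"id": "264", "score": 0.6444936,
     "metadata": {"text": "The above instructions work even if you use `@conda` decorators ..."}},
]
for text in texts_from_matches(sample_matches):
    print(text)
```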

## Using retrieved results in prompts

In [None]:
%env TOKENIZERS_PARALLELISM=false

In [None]:
system_message = """
You are a helpful assistant that helps learners use Metaflow to build production-grade machine learning workflows.
Here is some relevant context you can use, each piece with a link to the page in the Metaflow documentation it was retrieved from:
"""

context_template = """
{system_message}

{context}

Use the above pieces of context to condition the response.
"""
 
_context = ""
for _, row in retrived_results.iterrows():
    _context += "\n### context: {}\n### url: {} \n".format(
        row.contents, row.page_url
    )
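`ChatPromptTemplate` fills `{system_message}` and `{context}` into `context_template`. The plain-Python sketch below reproduces that formatting to show what the model actually receives. It adds one assumption that is not in the notebook: a hypothetical `max_chars` budget that stops adding lower-ranked chunks once the context grows too long. Characters are only a rough proxy for the model's token limit, and the default of 6000 is arbitrary.

```python
CONTEXT_TEMPLATE = """
{system_message}

{context}

Use the above pieces of context to condition the response.
"""

def build_context(rows: list[dict], max_chars: int = 6000) -> str:
    """Join retrieved chunks in rank order, stopping before the budget is exceeded."""
    parts, used = [], 0
    for row in rows:
        part = "\n### context: {}\n### url: {} \n".format(row["contents"], row["page_url"])
        if used + len(part) > max_chars:
            break  # hypothetical budget: drop this chunk and all lower-ranked ones
        parts.append(part)
        used += len(part)
    return "".join(parts)

def build_messages(system_message: str, rows: list[dict], user_query: str,
                   max_chars: int = 6000) -> list[tuple[str, str]]:
    """Return (role, content) pairs mirroring the notebook's chat prompt."""
    system = CONTEXT_TEMPLATE.format(
        system_message=system_message, context=build_context(rows, max_chars)
    )
    return [("system", system), ("human", user_query)]

# Illustrative dicts standing in for the DataFrame rows from retrived_results.iterrows().
rows = [
    {"contents": "Use @conda_base to set libraries for all steps.",
     "page_url": "https://docs.metaflow.org/scaling/dependencies"},
    {"contents": "Use @conda for step-specific additions.",
     "page_url": "https://docs.metaflow.org/scaling/dependencies"},
]
for role, content in build_messages("You help learners use Metaflow.", rows, "How do I use conda?"):
    print(f"[{role}]\n{content}")
```

Dropping whole chunks, rather than cutting one mid-sentence, keeps every piece of context paired with its URL.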

human_template = "{user_query}"

In [None]:
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", context_template),
    ("human", human_template),
])

chat = ChatOpenAI(openai_api_key=openai_key)

In [None]:
response = chat.invoke(chat_prompt.format_messages(user_query=query, context=_context, system_message=system_message))

In [None]:
from IPython.display import Markdown 
Markdown(response.content)

# Summary

In this notebook, you have seen how to unroll the core loop of a RAG pipeline that conditions LLM responses on your own data.
In the next lesson, we will automate these workflows using Metaflow, so you can build reactive systems that run data chunking, cleaning, and indexing pipelines when data is updated.