# Build RAG with Milvus, SambaNova and LangChain

This notebook shows how to build a RAG (Retrieval-Augmented Generation) pipeline with Milvus and SambaNova Cloud (SNCloud), orchestrated with LangChain.

A RAG system combines a retrieval component with a generative model. Given a question, it first retrieves the most relevant documents from a corpus (here, stored in Milvus). It then passes those documents to the generative model as context, so the generated answer is grounded in them.
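Below is a minimal, dependency-free sketch of the retrieve, augment, generate loop. It assumes a word-overlap score in place of embeddings and Milvus, and a hypothetical `fake_llm` in place of the SNCloud model. Only the control flow mirrors the notebook.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for a real embedding model.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank documents by how many query words they share and keep the top k.
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    # Augment the question with the retrieved passages as context.
    context = "\n\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def fake_llm(prompt: str) -> str:
    # Hypothetical placeholder for the generative model (SNCloud in this notebook).
    return f"[answer generated from a {len(prompt)}-character prompt]"


def rag(query: str, corpus: list[str], k: int = 2) -> str:
    # Retrieve, then augment, then generate.
    return fake_llm(build_prompt(query, retrieve(query, corpus, k)))


toy_corpus = ["Milvus is a vector database.", "SambaNova builds AI chips.", "Paris is in France."]
print(retrieve("What is a vector database?", toy_corpus, k=1))
# ['Milvus is a vector database.']
```

In the real pipeline, each of these three functions is replaced by a component: the Milvus retriever, the LangChain prompt template, and the SNCloud chat model.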


## Preparation

> If you are using Google Colab, you may need to **restart the runtime** for newly installed dependencies to take effect (open the "Runtime" menu at the top of the screen and select "Restart session" from the dropdown menu).

We use SNCloud as the LLM in this example. Get an [API key](https://cloud.sambanova.ai/) from SambaNova Cloud; the notebook reads it from the `SAMBANOVA_API_KEY` environment variable.

In [1]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_milvus import Milvus
from langchain_sambanova import ChatSambaNovaCloud
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain import hub
import os

In [None]:
os.environ["SAMBANOVA_URL"] = "https://api.sambanova.ai/v1/chat/completions"
os.environ["SAMBANOVA_API_KEY"] = "YOUR-API-KEY"
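Writing the key directly into a cell makes it easy to leak when the notebook is shared. As a hedged alternative, the hypothetical helper `ensure_env_var` below keeps any value already set in the environment and otherwise prompts for it with the standard-library `getpass`, which does not echo the input.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env_var(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return os.environ[name], asking for it interactively if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed key, so it is not echoed into the notebook output.
        value = prompt(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Usage in the notebook (prompts only if the variable is missing):
# ensure_env_var("SAMBANOVA_API_KEY")
```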

### Prepare the data

In [3]:
file_path = "../data/SambaNova_Solution.pdf"
loader = PyPDFLoader(file_path)

docs = loader.load()

print(len(docs))

6


In [4]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
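`chunk_size` caps each chunk's length (measured in characters by default), and `chunk_overlap` repeats the end of one chunk at the start of the next, so text near a boundary appears in both neighboring chunks. `RecursiveCharacterTextSplitter` first tries to split on paragraph breaks, then newlines, then spaces, and cuts mid-word only as a last resort. The hypothetical `sliding_window_chunks` below ignores separators entirely and shows only the size and overlap arithmetic.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With the notebook's settings (`chunk_size=1000`, `chunk_overlap=200`), each window would advance 800 characters.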

### Prepare the LLM and Embedding Model

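We initialize the LangChain SNCloud chat model (`ChatSambaNovaCloud`) and pull the RAG prompt template `rlm/rag-prompt` from LangChain Hub.
For embeddings we use `HuggingFaceInstructEmbeddings`, which loads an Instructor model from Hugging Face (`hkunlp/instructor-large` by default). Loading the model for the first time can take up to 60 seconds.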

In [None]:
llm = ChatSambaNovaCloud(
    model="Meta-Llama-3.1-405B-Instruct", max_tokens=1024, temperature=0.7, top_p=0.01
)

embeddings = HuggingFaceInstructEmbeddings()

prompt = hub.pull("rlm/rag-prompt")

## Load data into Milvus

### Create the Collection and insert data

In [6]:
URI = "../data/milvus_example.db"

vector_store = Milvus.from_documents(
    splits,
    embeddings,
    collection_name="langchain_example",
    connection_args={"uri": URI},
)
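`Milvus.from_documents` embeds every chunk with the embedding model and inserts the vectors, together with the chunk text and metadata, into the `langchain_example` collection. Because `URI` is a local file path, Milvus Lite stores the data in that file. The in-memory class below is a hypothetical, brute-force stand-in that uses sparse word-count vectors instead of `HuggingFaceInstructEmbeddings` and a linear scan instead of a Milvus index. It shows what storing and querying vectors amounts to.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a neural embedding: a sparse word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors (0.0 if either is empty).
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy analogue of Milvus.from_documents plus similarity search (no index)."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str]] = []  # (vector, original text)

    @classmethod
    def from_texts(cls, texts: list[str]) -> "InMemoryVectorStore":
        store = cls()
        for text in texts:
            store.rows.append((embed(text), text))  # embed once, at insert time
        return store

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        # Score every stored vector against the query and return the k best texts.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


toy_store = InMemoryVectorStore.from_texts(
    ["Milvus stores vectors", "SambaNova serves Llama models", "Bananas are yellow"]
)
print(toy_store.similarity_search("Which models does SambaNova serve?", k=1))
# ['SambaNova serves Llama models']
```

A real vector database avoids the linear scan by building an index over the vectors, which matters once the collection grows beyond a handful of chunks.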

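### Create a retriever

Wrap the vector store as a retriever, which returns the chunks most similar to a query (by default the top 4; pass `search_kwargs={"k": ...}` to `as_retriever` to change this).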

In [7]:
retriever = vector_store.as_retriever()

## Build RAG

### Retrieve data for a query

Let's specify a question about the loaded document.

In [8]:
question = "What is this document about?"

### Use the LLM to get a RAG response

Define a helper that joins the text of the retrieved documents into a single context string for the prompt.

In [9]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
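The retrieved chunks are joined with blank lines and substituted for the prompt's `{context}` variable, while the question fills `{question}`. The sketch below uses a hypothetical minimal `Doc` class and a made-up template (not the exact text of `rlm/rag-prompt`) to show the kind of string the LLM receives.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for LangChain's Document; only the fields used here.
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    # Same helper as in the notebook: one string, chunks separated by blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


# Made-up template; like the hub prompt, it expects `context` and `question`.
TEMPLATE = "Use the context to answer.\nQuestion: {question}\nContext: {context}\nAnswer:"

retrieved = [Doc("SambaNova builds DataScale."), Doc("DataScale targets science workloads.")]
prompt_text = TEMPLATE.format(question="What is this document about?", context=format_docs(retrieved))
print(prompt_text)
```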

Build the RAG chain using SNCloud `Meta-Llama-3.1-405B-Instruct`, HuggingFace embeddings and the Milvus vector store.

In [10]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke(question)

'This document is about SambaNova DataScale for Science, a platform that enables high-performance computing for various scientific applications, including high-resolution medical image analysis, large language models for science, and multi-physics simulation workloads. The platform aims to revolutionize research by unlocking insights trapped in unstructured data and accelerating discoveries. It delivers industry-leading performance and overcomes the limitations of traditional GPU-based systems.'
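The chain relies on LangChain Expression Language (LCEL). The `|` operator feeds one runnable's output into the next, and the dict literal is coerced into a `RunnableParallel` that sends the same question to both branches, so the prompt receives `{"context": ..., "question": ...}`. The toy classes below (`Step`, `as_step` and the `toy_*` components are hypothetical, not LangChain APIs) imitate that composition in plain Python to show how data flows through `rag_chain`. They do not reproduce LCEL's batching, streaming or async support.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for an LCEL runnable: a | b runs a, then feeds its output to b."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: Any) -> "Step":
        nxt = as_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Step":
        # Called for `some_dict | step`: dict.__or__ returns NotImplemented for a
        # non-dict operand, so Python falls back to this method.
        return as_step(other) | self


def as_step(obj: Any) -> Step:
    """Coerce a Step, a dict of branches, or a plain function into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # Fan the same input out to every branch and collect the outputs by key
        # (sequentially here; RunnableParallel can run branches concurrently).
        branches = {key: as_step(value) for key, value in obj.items()}
        return Step(lambda value: {key: b.invoke(value) for key, b in branches.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot convert {type(obj).__name__} to a Step")


# Hypothetical components mirroring the shape of the notebook's rag_chain.
toy_retriever = Step(lambda q: [f"chunk about {q}"])
toy_passthrough = Step(lambda q: q)
toy_prompt = Step(lambda d: f"Q: {d['question']}\nC: {d['context']}")
toy_llm = Step(str.upper)

toy_chain = (
    {"context": toy_retriever | (lambda docs: "\n\n".join(docs)), "question": toy_passthrough}
    | toy_prompt
    | toy_llm
)
print(toy_chain.invoke("milvus"))
```

As in `rag_chain`, a plain function such as `format_docs` can sit inside the pipe because it is coerced into a runnable, and the question string passed to `invoke` reaches both the retriever branch and the passthrough branch.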