# Build RAG with Milvus, SambaNova, and LangChain

This notebook shows how to build a RAG (Retrieval-Augmented Generation) pipeline with Milvus and SambaNova Cloud (SNCloud).

A RAG system pairs a retrieval system with a generative model. It first retrieves the documents most relevant to a prompt from a corpus stored in Milvus, then passes them to the generative model as context for producing the answer.
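To make the retrieve-then-generate flow concrete, here is a minimal toy sketch in plain Python. It is not how Milvus or LangChain work internally: word overlap stands in for vector search, and all names (`corpus`, `tokenize`, `retrieve`, `build_prompt`) are hypothetical and used only for illustration.

```python
# Toy retrieve-then-generate flow; every name here is hypothetical.
corpus = [
    "Milvus is a vector database for similarity search.",
    "LangChain composes LLM calls into chains.",
    "Bananas are rich in potassium.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of words without simple punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}

def retrieve(question, documents, k=1):
    """Rank documents by word overlap with the question (a stand-in for vector search)."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, context_docs):
    """Place the retrieved text ahead of the question, as a RAG prompt would."""
    context = "\n\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}"

question = "What is Milvus?"
top_docs = retrieve(question, corpus)
prompt_text = build_prompt(question, top_docs)
print(prompt_text)
```

In a real pipeline the final string would be sent to the LLM; the rest of this notebook replaces each toy piece with Milvus, an embedding model, and SNCloud.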


## Preparation

> If you are using Google Colab, you may need to **restart the runtime** for newly installed dependencies to take effect (click the "Runtime" menu at the top of the screen and select "Restart session" from the dropdown menu).

We will use SNCloud as the LLM in this example. Prepare an [API key](https://cloud.sambanova.ai/) and set it as the environment variable `SAMBANOVA_API_KEY`.

In [12]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_milvus import Milvus
from langchain_community.chat_models.sambanova import ChatSambaNovaCloud
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain import hub
import os

In [None]:
os.environ["SAMBANOVA_URL"] = "https://api.sambanova.ai/v1/chat/completions"
os.environ["SAMBANOVA_API_KEY"] = "YOUR-API-KEY"
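Hardcoding a key in a notebook makes it easy to leak. As one possible alternative, the sketch below reads a variable from the environment and fails early with a clear message if it is missing. The helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical; in this notebook the variable of interest would be `SAMBANOVA_API_KEY`.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise a clear error if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Hypothetical variable set here only so the demonstration can run.
os.environ["DEMO_API_KEY"] = "demo-value"
api_key = require_env("DEMO_API_KEY")
```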

### Prepare the data

In [14]:
file_path = "data/QuickStart.pdf"
loader = PyPDFLoader(file_path)

docs = loader.load()

print(len(docs))

2


In [15]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
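The `chunk_size` and `chunk_overlap` parameters control how long each chunk is and how much text adjacent chunks share. The sketch below illustrates the overlap idea with a plain fixed-width sliding window over characters. It is a simplification, not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to split on separators; the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice.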

### Prepare the LLM and Embedding Model

We initialize `ChatSambaNovaCloud` to prepare the SNCloud LLM and `HuggingFaceInstructEmbeddings` to prepare the embedding model. We also pull the `rlm/rag-prompt` template from the LangChain hub.

In [None]:
llm = ChatSambaNovaCloud(
    model="Meta-Llama-3.1-405B-Instruct", max_tokens=1024, temperature=0.7, top_k=1, top_p=0.01
)

embeddings = HuggingFaceInstructEmbeddings()

prompt = hub.pull("rlm/rag-prompt")

## Load data into Milvus

### Create the Collection and insert data

In [17]:
URI = "./data/milvus_example.db"

vector_store = Milvus.from_documents(
    splits,
    embeddings,
    collection_name="langchain_example",
    connection_args={"uri": URI},
)

E20241118 11:30:09.260223 370037 collection_data.cpp:84] [SERVER][Insert][] Insert data failed, errs: attempt to write a readonly database
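The log line above reports an attempt to write to a read-only database. One plausible cause, stated here as an assumption rather than a diagnosis, is that the directory holding the local database file is missing or not writable. The sketch below shows a pre-check that could be run before creating the vector store; the helper name `ensure_writable_parent` is hypothetical, and the demo uses a temporary directory instead of `./data`.

```python
import os
import tempfile

def ensure_writable_parent(db_path):
    """Create the parent directory of db_path if needed and report whether it is writable."""
    parent = os.path.dirname(os.path.abspath(db_path))
    os.makedirs(parent, exist_ok=True)
    return os.access(parent, os.W_OK)

# Demonstrated on a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "data", "milvus_example.db")
    writable = ensure_writable_parent(demo_path)
    parent_exists = os.path.isdir(os.path.dirname(demo_path))

print(writable, parent_exists)
```

If the check returns `False`, pointing `URI` at a location you own is one option. An existing database file with read-only permissions could also trigger a similar message, which this sketch does not check.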


### Create a retriever

Expose the vector store as a retriever so that it can be queried for the chunks most relevant to a question.

In [None]:
retriever = vector_store.as_retriever()
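Conceptually, a retriever embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with hand-written two-dimensional vectors and cosine similarity. It is an illustration only: the vectors, the names `cosine_similarity`, `top_k`, and `indexed`, and the choice of cosine similarity are assumptions, and the metric Milvus actually uses depends on how the index is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed, k=2):
    """Return the texts of the k entries most similar to query_vec."""
    scored = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]

# Hypothetical (text, embedding) pairs; real embeddings have hundreds of dimensions.
indexed = [
    ("chunk about installation", [1.0, 0.0]),
    ("chunk about pricing", [0.0, 1.0]),
    ("chunk about setup steps", [0.9, 0.1]),
]
nearest = top_k([1.0, 0.0], indexed, k=2)
print(nearest)
```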

## Build RAG

### Retrieve data for a query

Let's specify a question about the loaded document.

In [20]:
question = "What is this document about?"

### Use LLM to get a RAG response

Convert the retrieved documents into a string format.

In [21]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
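To see what `format_docs` produces without running the retriever, the sketch below calls it on two stand-in objects. `FakeDoc` is a hypothetical class that only mimics the `page_content` attribute the function relies on; it is not a LangChain type.

```python
from dataclasses import dataclass

@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing only the page_content attribute."""
    page_content: str

def format_docs(docs):
    # Join the text of each document, separated by a blank line.
    return "\n\n".join(doc.page_content for doc in docs)

context = format_docs([FakeDoc("First chunk."), FakeDoc("Second chunk.")])
print(context)
```

The blank line between chunks keeps their boundaries visible to the LLM once the string is placed into the prompt.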

Build the RAG chain using the SNCloud Meta-Llama-3.1-405B-Instruct model, HuggingFace embeddings, and the Milvus vector store.

In [22]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke(question)

'This document appears to be about building a complete Enterprise knowledge retriever app using RAG application with Langchain and Streamlit in Python. It also provides information on how to join the SambaNova Developer Community forum for support and communication. Additionally, it mentions the FastAPI Reference Doc and Cookbook for further guidance.'
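The chain above passes the question through four stages: the dictionary step builds `context` and `question`, the prompt template combines them, the LLM answers, and the parser returns a string. The sketch below mirrors that data flow with plain functions. Every function here is a hypothetical stub (no retrieval or model call happens), so it shows the order of operations rather than LangChain's implementation.

```python
def fake_retriever(question):
    """Stub retriever: always returns the same two chunks."""
    return ["Chunk A.", "Chunk B."]

def join_chunks(chunks):
    """Same joining rule as format_docs, applied to plain strings."""
    return "\n\n".join(chunks)

def fill_prompt(inputs):
    """Stub prompt template combining context and question."""
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"

def fake_llm(prompt_text):
    """Stub model: reports the prompt size instead of generating text."""
    return f"[answer based on {len(prompt_text)} characters of prompt]"

def run_chain(question):
    # Step 1: fan the question out into the two prompt inputs.
    inputs = {
        "context": join_chunks(fake_retriever(question)),
        "question": question,  # passed through unchanged
    }
    # Steps 2 and 3: fill the prompt, then call the (stub) model.
    return fake_llm(fill_prompt(inputs))

answer = run_chain("Why?")
print(answer)
```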