## 📝 Sample Code: RAG with ChatNVIDIA + NVIDIAEmbeddings + Chroma

We’ll use:
* ✅ ChatNVIDIA (to chat with an NVIDIA LLM)
* ✅ NVIDIAEmbeddings (to embed our documents/questions)
* ✅ Chroma (as the vector DB for storing and retrieving embeddings)

This example will:

1. Load the documents
2. Split the documents into chunks
3. Initialize the NVIDIA embeddings
4. Store the embeddings in a Chroma vector DB
5. Create a retriever over the Chroma DB
6. Initialize ChatNVIDIA (the LLM)
7. Create the RAG chain
8. Ask a question
9. Print the answer and sources
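
To get a feel for the chunking step before running the real pipeline, here is a minimal sketch of what a chunk size and a chunk overlap mean. It is a plain fixed-size sliding window over characters, not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to cut on separators such as paragraphs and sentences). The helper name `chunk_text` and the sample string are made up for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Hypothetical helper for illustration only; the real splitter is smarter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 1200-character dummy document made of repeating digits.
sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)

print("Number of chunks:", len(chunks))
print("Chunk lengths:", [len(c) for c in chunks])
# The last 50 characters of one chunk reappear at the start of the next one.
print("Overlap matches:", chunks[0][-50:] == chunks[1][:50])
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk, which usually helps retrieval.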


In [None]:
!hostname

In [None]:
!pip install -q langchain-community langchain-chroma langchain-nvidia-ai-endpoints

In [None]:
import os
from google.colab import userdata
os.environ['NVIDIA_API_KEY'] = userdata.get('NVIDIA_API_KEY')

In [None]:
# Skip this cell if you are using a local NIM on your own server
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
apikey = os.getenv('NVIDIA_API_KEY', "no-pass")

In [None]:
!git clone https://github.com/manote101/Building-Apps-with-NIM.git

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# --- Configuration ---
LLM_ENDPOINT = "https://integrate.api.nvidia.com/v1"
LLM_MODEL = "meta/llama-3.2-3b-instruct"
# LLM_MODEL ="nvidia/llama-3.1-nemotron-nano-vl-8b-v1"
EMBEDDING_ENDPOINT = "https://integrate.api.nvidia.com/v1"
EMBEDDING_MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"

# 1️⃣ Load your documents
loader = TextLoader("Building-Apps-with-NIM/data/doc1.txt")  # Replace with your file
documents = loader.load()

# 2️⃣ Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
print(f"Number of chunks: {len(docs)}")

# 3️⃣ Initialize NVIDIA embeddings
embedding_function = NVIDIAEmbeddings(
    base_url=EMBEDDING_ENDPOINT,
    model=EMBEDDING_MODEL
)

In [None]:
docs[0]

In [None]:
docs[4]

In [None]:
docs[5]

In [None]:
# 4️⃣ Store embeddings in Chroma vector DB
vector_db = Chroma.from_documents(
    documents=docs,
    embedding=embedding_function,
    persist_directory="./chroma_db"  # Directory to save your Chroma DB
)
# No explicit vector_db.persist() call is needed: langchain-chroma saves to persist_directory automatically

# 5️⃣ Create a retriever over the Chroma DB (returns the top 3 chunks per query)
retriever = vector_db.as_retriever(search_kwargs={"k": 3})

# 6️⃣ Initialize ChatNVIDIA (LLM)
llm = ChatNVIDIA(
    base_url=LLM_ENDPOINT,
    model=LLM_MODEL,
    temperature=0  # Lower temperature for factual answers
)

In [None]:
# 7️⃣ Create RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

# 8️⃣ Ask a question
query = "Are there any service providers/ISVs who have already implemented NeMo microservices?"
result = qa_chain.invoke({"query": query})  # calling qa_chain(query) directly is deprecated

# 9️⃣ Print the answer and sources
print("Answer:", result["result"])
print("\nSources:")
for doc in result["source_documents"]:
    print("-", doc.metadata["source"])

### 🔥 What happens here:
* ✅ NVIDIAEmbeddings embeds your document chunks into a vector space
* ✅ Chroma stores the vectors and retrieves the chunks most similar to the question
* ✅ The retrieved chunks are added as context to the prompt sent to ChatNVIDIA, which writes the final answer
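
To make the retrieve-then-answer flow concrete, here is a minimal, standard-library-only sketch of the same idea. It is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model (not what NVIDIAEmbeddings does), the in-memory list stands in for Chroma, and the prompt wording in `build_prompt` is hypothetical rather than the template `RetrievalQA` actually uses. All function names are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (what the retriever does)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Paste the retrieved chunks into the prompt ahead of the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


# Made-up chunks standing in for the split document.
chunks = [
    "Chroma is used here as the vector database.",
    "ChatNVIDIA is used to chat with an NVIDIA LLM.",
    "NVIDIAEmbeddings embeds documents and questions.",
    "The text splitter cuts documents into chunks.",
]

query = "Which component is the vector database?"
top = retrieve(query, chunks, k=2)
prompt = build_prompt(query, top)
print(prompt)  # in the real chain, this text is what gets sent to the LLM
```

In the notebook, the embedding model captures meaning rather than exact word matches, which is why a question in another language can still retrieve relevant English chunks.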

### You can also ask in Thai

In [None]:
# 8️⃣ Ask a question
# Thai for: "Who is using NeMo microservices?"
query = "มีใครใช้ Nemo microservices บ้าง"
result = qa_chain.invoke({"query": query})

# 9️⃣ Print the answer and sources
print("Answer:", result["result"])
print("\nSources:")
for doc in result["source_documents"]:
    print("-", doc.metadata["source"])