# Retrieval Augmented Generation (RAG)

In this section, we implement a complete RAG pipeline for answering questions based on a given context. Using the LangChain library, we'll walk through the entire process—from retrieving relevant context to generating accurate answers.

The pipeline has four stages. The first three have been covered so far; this notebook adds the fourth:
1. **Indexing**: Organize the raw documents into a structured format suitable for processing, such as splitting them into chunks or passages for more efficient retrieval.

2. **Embedding**: Convert each text chunk into a dense vector representation using a pre-trained embedding model. These embeddings capture the semantic meaning of the content.

3. **Vector Store**: Store the embeddings in a vector database (Qdrant in our case), allowing fast and scalable similarity search across the document collection.

4. **Retrieval and Generation**: In this notebook, given a user query, retrieve the most relevant document chunks from the vector store and feed them into a language model (EVE) to generate a context-aware, accurate response.
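To make the similarity search in step 3 concrete, here is a minimal sketch of what a vector store does conceptually at query time. It assumes cosine similarity as the distance metric (the metric configured for the Qdrant collection is not shown in this notebook), and the vectors, `normalize`, and `top_k` are illustrative names, not part of Qdrant or LangChain.

```python
import math

def normalize(vec):
    """Scale a vector to unit length, similar in spirit to normalize_embeddings=True."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_vec, chunk_vecs, k=1):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for i, vec in enumerate(chunk_vecs):
        v = normalize(vec)
        # For unit-length vectors, cosine similarity is just the dot product.
        score = sum(a * b for a, b in zip(q, v))
        scored.append((i, score))
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings" (assumed values, for illustration only)
toy_chunks = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [0.5, 0.9, 0.0]

best = top_k(toy_query, toy_chunks, k=2)
print(best)
```

A real vector database avoids this exhaustive scan by using an approximate nearest-neighbour index, but the ranking idea is the same: the chunk whose embedding points in the direction closest to the query embedding comes first.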

In [1]:
from langchain_huggingface import HuggingFaceEmbeddings

# Load the embeddings model
model_name = "nasa-impact/nasa-smd-ibm-st-v2"
encode_kwargs = {"normalize_embeddings": True}
embedder = HuggingFaceEmbeddings(
    model_name=model_name,  encode_kwargs=encode_kwargs
)



In [2]:
query = "what is tropomi?"
query_vector = embedder.embed_query(query)

In [3]:
from qdrant_client import QdrantClient
import os

from dotenv import load_dotenv
load_dotenv()

# get your keys from the qdrant UI
QDRANT_API_KEY = os.environ['QDRANT_API_KEY']
QDRANT_URL = os.environ['QDRANT_URL']

collection_name = "ingestion_demo"
client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

In [4]:
results = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    limit=1  # number of similar results you want
)

In [5]:
for result in results.points:
    print(f"retrieval score: {result.score}")
    print(f"retrieved chunk: {result.payload['content']}")

retrieval score: 0.7168092
retrieved chunk: TROPOMI (Tropospheric Monitoring Instrument) is a cutting-edge satellite instrument aboard the European Copernicus Sentinel-5 Precursor (S5P) satellite, launched in October 2017." 
            It plays an essential role in gathering data that helps scientists to better understand atmospheric processes and environmental changes. 
            TROPOMI’s high-resolution data contributes to global efforts in monitoring air quality, tracking climate trends, and protecting the ozone layer, 
            making it a key tool for advancing environmental science and policy worldwide.


## Generation

Once the relevant context is retrieved, it is passed to an LLM to generate a coherent and informed response based on both the query and the retrieved context.

This approach helps keep the generated answers grounded in the source documents, which improves accuracy and reduces hallucination.
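As a sketch of how retrieved context and the query can be combined into a single prompt, the helper below joins several chunks into one user message. It is an illustration under assumptions: `build_user_prompt` is a hypothetical name, the numbering of chunks is a design choice rather than something the notebook does, and the notebook itself retrieves a single chunk (`limit=1`).

```python
def build_user_prompt(chunks, question):
    """Combine retrieved chunks and the question into one user message."""
    numbered = []
    for i, chunk in enumerate(chunks, start=1):
        # Collapse newlines and runs of spaces left over from ingestion
        cleaned = " ".join(chunk.split())
        numbered.append(f"[{i}] {cleaned}")
    context = "\n".join(numbered)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Hypothetical chunks, standing in for payloads returned by the vector store
example_chunks = [
    "TROPOMI is a satellite instrument\n    aboard Sentinel-5 Precursor.",
    "It monitors air quality.",
]
prompt = build_user_prompt(example_chunks, "what is tropomi?")
print(prompt)
```

Numbering the chunks makes it easier to ask the model to point to the passage it relied on, and keeping the question last mirrors the system prompt's instruction to answer "the question at the end".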

In [6]:
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="Qwen/Qwen3-0.6B",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
    ),
)

chat_model = ChatHuggingFace(llm=llm)

Device set to use mps:0
The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [7]:
from langchain.messages import (
    HumanMessage,
    SystemMessage,
)

system_message = '''You are an expert assistant that answers questions about different topics.
If you don't know the answer, just say "I don't know." Don't try to make up an answer.
Use only the following pieces of context to answer the question at the end.
Do not use any prior knowledge.'''


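# Take the top-ranked chunk explicitly instead of relying on the
# `result` loop variable left over from the earlier printing cell.
context = results.points[0].payload['content']

messages = [
    SystemMessage(content=system_message),
    HumanMessage(
        content=f"Context: {context} Question: {query}"
    ),
]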

response = chat_model.invoke(messages)

`generation_config` default values have been modified to match model-specific defaults: {'do_sample': True}. If this is not desired, please set these values explicitly.


In [10]:
print(response.content)

<|im_start|>system
You are an expert assistant that answers questions about different topics.
If you don't know the answer, just say "I don't know." Don't try to make up an answer.
Use only the following pieces of context to answer the question at the end.
Do not use any prior knowledge.<|im_end|>
<|im_start|>user
Context: TROPOMI (Tropospheric Monitoring Instrument) is a cutting-edge satellite instrument aboard the European Copernicus Sentinel-5 Precursor (S5P) satellite, launched in October 2017." 
            It plays an essential role in gathering data that helps scientists to better understand atmospheric processes and environmental changes. 
            TROPOMI’s high-resolution data contributes to global efforts in monitoring air quality, tracking climate trends, and protecting the ozone layer, 
            making it a key tool for advancing environmental science and policy worldwide. Question: what is tropomi?<|im_end|>
<|im_start|>assistant
<think>
Okay, let's see. The user is