# RAG Application

![Simple RAG](../../images/simple_rag.png)

In this notebook, we're going to set up a simple RAG application that we'll be using as we learn more about LangSmith.

RAG (Retrieval Augmented Generation) is a popular technique for providing LLMs with relevant documents that will enable them to better answer questions from users. 

In our case, we are going to index some LangSmith documentation!

LangSmith makes it easy to trace any LLM application, no LangChain required!

### Setup

Make sure you set your environment variables, including your OpenAI API key.

In [11]:
# You can set them inline!
import os
os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"
os.environ["LANGSMITH_API_KEY"] = "LANGSMITH_API_KEY"
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = "langsmith-academy"

In [12]:
# Or you can use a .env file
from dotenv import load_dotenv
load_dotenv(dotenv_path="D:\lakshya\intro-to-langsmith\ls-academy\.env", override=True)

  load_dotenv(dotenv_path="D:\lakshya\intro-to-langsmith\ls-academy\.env", override=True)


True

### Simple RAG application

In [13]:
from langsmith import traceable
from openai import OpenAI
from typing import List
import nest_asyncio
import sys
import langchain_community
sys.path.append(os.path.abspath("."))
from utils import get_vector_db_retriever




MODEL_PROVIDER = "openai"
MODEL_NAME = "gpt-4o-mini"
APP_VERSION = 1.0
RAG_SYSTEM_PROMPT = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the latest question in the conversation. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.
"""

openai_client = OpenAI()
nest_asyncio.apply()
retriever = get_vector_db_retriever()

"""
retrieve_documents
- Returns documents fetched from a vectorstore based on the user's question
"""
@traceable(run_type="chain")
def retrieve_documents(question: str):
    return retriever.invoke(question)

"""
generate_response
- Calls `call_openai` to generate a model response after formatting inputs
"""
@traceable(run_type="chain")
def generate_response(question: str, documents):
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    messages = [
        {
            "role": "system",
            "content": RAG_SYSTEM_PROMPT
        },
        {
            "role": "user",
            "content": f"Context: {formatted_docs} \n\n Question: {question}"
        }
    ]
    return call_openai(messages)

"""
call_openai
- Returns the chat completion output from OpenAI
"""
@traceable(run_type="llm")
def call_openai(
    messages: List[dict], model: str = MODEL_NAME, temperature: float = 0.0
) -> str:
    return openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )

"""
langsmith_rag
- Calls `retrieve_documents` to fetch documents
- Calls `generate_response` to generate a response based on the fetched documents
- Returns the model response
"""
@traceable(run_type="chain")
def langsmith_rag(question: str):
    documents = retrieve_documents(question)
    response = generate_response(question, documents)
    return response.choices[0].message.content


Fetching pages: 100%|##########| 197/197 [00:43<00:00,  4.54it/s]


This should take a little less than a minute. We are indexing and storing LangSmith documentation in a SKLearn vector database.

In [16]:
question = "What is LangSmith used for?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"website": "www.google.com"}})
print(ai_answer)

LangSmith is a platform designed for building production-grade LLM applications, allowing users to monitor and evaluate their applications for reliability and performance. It provides features for tracing application requests, evaluating application quality over time, and testing prompts with version control. LangSmith is framework agnostic, meaning it can be used with or without LangChain's frameworks.


### Let's take a look in LangSmith!

In [19]:
LANGSMITH_TRACING='true'
LANGSMITH_ENDPOINT='https://api.smith.langchain.com'
LANGSMITH_API_KEY='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'
LANGSMITH_PROJECT='pr-grumpy-daily-45'
OPENAI_API_KEY = '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'

from langgraph.prebuilt import create_react_agent


def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"


agent = create_react_agent(
    model="openai:gpt-5-mini",
    tools=[get_weather],
    prompt="You are a helpful assistant.",
)

# Run the agent
agent.invoke(
    {"messages": [{"role": "user", "content": "What is the weather in San Francisco?"}]}
)

{'messages': [HumanMessage(content='What is the weather in San Francisco?', additional_kwargs={}, response_metadata={}, id='1874a56a-ea12-4e02-9b07-798edbfc1cc6'),
  AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_KNdASjJCOHfByHaY5FMx8Fez', 'function': {'arguments': '{"city":"San Francisco"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 24, 'prompt_tokens': 142, 'total_tokens': 166, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-5-mini-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-CMtSp7gf4kgav2K9dDfq5jeEXn4ud', 'service_tier': 'priority', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--3e7294a5-0e4b-43cd-99f3-5f558408138c-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'San Francisco'