# Introduction to LangChain and RAG

Welcome to the first part of our workshop! In this session, we'll explore how to build AI-powered applications using **LangChain**, a popular framework for developing applications with Large Language Models (LLMs). We'll start with a simple chatbot and then enhance it with Retrieval Augmented Generation (RAG).

## Setting Up Our Environment

First, we need to set up our environment. We'll use OpenAI's models, so we need an API key. You can define your `OPENAI_API_KEY` in the `.env` file.

The code checks that the key is set and defines some global configuration:
- `LLM_MODEL`: The specific model we'll use
- `LLM_TEMPERATURE`: Controls randomness in responses (0 makes the output as deterministic as possible)
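
The cells in this section assume `OPENAI_API_KEY` is already present in the process environment; nothing here reads the `.env` file itself. If your setup does not load it for you, the `python-dotenv` package's `load_dotenv()` is the usual choice. As a dependency-free sketch of the same idea, the hypothetical helper `load_env_file` (not part of the workshop code) reads simple `KEY=VALUE` lines and, as an assumption, leaves variables that are already set untouched:

```python
import os


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: copy simple KEY=VALUE lines from a file into os.environ."""
    loaded: dict[str, str] = {}
    if not os.path.exists(path):
        return loaded
    with open(path, encoding="utf-8") as file:
        for raw_line in file:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")
            # Assumption: variables already set in the environment take precedence
            if key and key not in os.environ:
                os.environ[key] = value
                loaded[key] = value
    return loaded
```

Calling `load_env_file()` before the API key check would make a key stored in `.env` visible to `os.environ.get`. The helper returns only the variables it actually set, which makes it easy to see what came from the file.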

In [None]:
import os

In [None]:
if not os.environ.get("OPENAI_API_KEY"):
    raise ValueError("Please set OPENAI_API_KEY environment variable")

LLM_MODEL = "gpt-4o-mini"
LLM_TEMPERATURE = 0

## Building a Simple ChatBot

Let's start by creating a basic chatbot using **LangChain**. We'll use:
- `ChatOpenAI`: The interface to OpenAI's chat models
- `SystemMessage`: Defines the bot's behavior and role
- `HumanMessage`: Represents user input

Our chatbot will act as a Financial Analyst. We'll create it by:
1. Instantiating the model
2. Defining a system prompt that sets the bot's role
3. Sending a user query and getting a response with `.invoke()`

This demonstrates the basic pattern of LLM interactions: prompt → response.

In [None]:
from IPython.display import Markdown
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

In [None]:
# TODO: Create a ChatOpenAI instance with the LLM model and temperature
base_model = ...

In [None]:
BASE_PROMPT = """
You are a Financial Analyst. Do your best to help the client with their request based on your expertise. Give a succinct and clear response.
"""

In [None]:
# Request from the client
request = "I want to invest in the technology sector. Can you please define an investment strategy?"

# Message list for the base model
messages = [
    SystemMessage(BASE_PROMPT),
    HumanMessage(request),
]

# Invoke the model with the messages
response = base_model.invoke(messages)

In [None]:
Markdown(response.content)

## Understanding Retrieval Augmented Generation (RAG)

Now comes the exciting part! RAG is a technique that enhances LLM responses by giving them access to external knowledge. Instead of relying solely on the model's training data, we can provide relevant information from our own database.
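
In its simplest form, the "augmentation" step is just prompt assembly: retrieved text is placed next to the question before the model is called. The sketch below illustrates that idea with a hypothetical `build_augmented_prompt` helper and made-up snippets. It is not the approach used later in this notebook, which passes retrieved documents back to the model through a tool call instead.

```python
def build_augmented_prompt(question: str, snippets: list[str]) -> str:
    """Hypothetical helper: place retrieved snippets ahead of the question."""
    # Number each snippet so the model (and the reader) can refer back to it
    context = "\n\n".join(
        f"[{index}] {snippet}" for index, snippet in enumerate(snippets, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up snippets standing in for retrieved articles
example_snippets = [
    "Chipmaker shares rose after strong quarterly earnings.",
    "Cloud providers announced new data center investments.",
]
example_prompt = build_augmented_prompt(
    "What is driving the technology sector?", example_snippets
)
print(example_prompt)
```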

### Vector Database Setup

To implement RAG, we need:
1. A collection of documents (in our case, a curated set of 1,000 articles from Bloomberg financial news)
2. A way to search these documents efficiently (vector database)
3. A function to retrieve relevant information based on queries

Here we use:
- `Chroma`: A vector database for storing and retrieving documents
- `OpenAIEmbeddings`: Converts text into vector representations

Let's first set up the global configuration for our retriever.

In [None]:
EMBEDDING_MODEL = "text-embedding-3-small"
RETRIEVAL_K = 3

We'll then define helper functions to load our documents and store them in a vector store.

In [None]:
import pickle

from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

In [None]:
def load_documents(pickle_filepath: str) -> list[Document]:
    """Load documents from a pickle file."""
    with open(pickle_filepath, "rb") as file:
        return pickle.load(file)


def initialize_vector_store(document_chunks: list[Document]) -> Chroma:
    """Reset the Chroma collection and initialize a vector store using document chunks."""
    Chroma().reset_collection()
    embedding_model = OpenAIEmbeddings(model=EMBEDDING_MODEL)
    return Chroma.from_documents(documents=document_chunks, embedding=embedding_model)

Let's load our documents and inspect the first one.

In [None]:
data_dir = "../data/"
data_file = "bloomberg_financial_news_1k.pkl"

# Load the documents from the pickle file
documents = load_documents(os.path.join(data_dir, data_file))

In [None]:
doc_str = f"{documents[0].metadata['Headline']}\n\n{documents[0].page_content}"
Markdown(doc_str)

### Initializing the Vector Store and the Retriever

The vector store and retriever are key components of our RAG system. Here's what happens in this section:

- Initialize a new Chroma vector store with these documents
- Create a retriever that will fetch the `RETRIEVAL_K` most relevant documents according to their embedding
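
To make "most relevant according to their embedding" concrete, here is a toy sketch of similarity search. The three-dimensional vectors and titles are invented for illustration (real embeddings have far more dimensions), and cosine similarity is assumed as the scoring function; the metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Return the k document titles whose vectors are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda title: cosine_similarity(query, index[title]),
        reverse=True,
    )
    return ranked[:k]


# Invented toy "embeddings"; real ones come from an embedding model
toy_index = {
    "Chip stocks rally": [0.9, 0.1, 0.0],
    "Oil prices slip": [0.0, 0.2, 0.9],
    "Cloud spending grows": [0.8, 0.3, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

print(top_k(toy_query, toy_index, k=2))
```

Here `k` plays the same role as `RETRIEVAL_K`: it caps how many of the best-scoring documents are returned.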

In [None]:
# Initialize the vector store with the documents
vector_store = initialize_vector_store(documents)

# Create a retriever instance from the vector store
retriever = vector_store.as_retriever(search_kwargs={"k": RETRIEVAL_K})

The retriever acts like a smart search engine: given a question or topic, it returns the most relevant documents from our database. It does so by embedding the query and finding the documents whose embeddings are most similar to it. In LangChain, this is also done with `.invoke()`. Let's try an example.

In [None]:
retrieval_query = "tech sector market trends"

# TODO: Invoke the retriever with the retrieval query
retrieved_documents = ...

In [None]:
display(retrieved_documents)

### Creating the RAG System

We can now augment our basic chatbot by providing it access to the retriever using **LangChain** tools, which allow the model to:
- Query the document database if needed
- Provide an answer based on the retrieved documents

#### Creating a tool with LangChain

We can create a tool using the `@tool` decorator from **LangChain** and provide it to the model using `.bind_tools()`. The model will receive all the relevant information about the tool thanks to the decorator. This way it knows how it works and can decide when to use it.
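
To give a feel for what "all the relevant information" can mean, the library-free sketch below builds a small description of a function from its name, docstring, and type hints. `describe_function` and `lookup_price` are hypothetical names used only for illustration, and this is not LangChain's implementation; the real `@tool` decorator produces a richer schema.

```python
import inspect
from typing import Any, Callable


def describe_function(func: Callable[..., Any]) -> dict[str, Any]:
    """Hypothetical helper: summarize a function the way a tool spec might."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        # Map each parameter name to the name of its annotated type
        "args": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


def lookup_price(ticker: str, days: int) -> float:
    """Return a placeholder price for a ticker."""
    return 0.0


print(describe_function(lookup_price))
```

This is why a clear docstring and typed arguments matter when writing a tool: they are the only description of the function that the model gets to see.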

In [None]:
from langchain_core.tools import tool

In [None]:
@tool
def retrieval(retrieval_query: str) -> list[Document]:
    """Retrieve documents based on a query."""
    # TODO: Return documents from the retriever
    return ...


# Create a list of tools and a dictionary of tool functions by name
tools = [retrieval]
# Use `t` as the loop variable so it is not confused with the `tool` decorator
tools_by_name = {t.name: t for t in tools}

In [None]:
RAG_PROMPT = """
You are a Financial Analyst with access to a Bloomberg Financial News database.

Query the database to help the client with their request. Give a succinct and clear response based on the information you find.
"""

# TODO: Create the RAG model by binding the base model with the retrieval tool
rag_model = ...

In [None]:
request = "I want to invest in the technology sector. Can you please define an investment strategy?"

# TODO: Message list for the RAG model
messages = ...

# TODO: Invoke the RAG model with the messages
rag_response = ...

Let's check the answer. As we can see, its content is empty, but a tool call has been made: the model is asking us to run the retrieval tool rather than answering directly.

In [None]:
Markdown(f"Content: {rag_response.content}\n\nTool Calls: {rag_response.tool_calls}")

Let's use the retrieval tool to retrieve documents following the model's query.

In [None]:
# Check if the RAG model response contains tool calls
if rag_response.tool_calls:
    # Get the first tool call from the response
    tool_call = rag_response.tool_calls[0]

    # Look up the requested tool by name
    # (named `selected_tool` so the imported `tool` decorator is not overwritten)
    selected_tool = tools_by_name[tool_call["name"]]

    # Invoke the tool with the tool call arguments
    # (named `retrieved_docs` so the full `documents` list loaded earlier is kept)
    retrieved_docs = selected_tool.invoke(tool_call["args"])

    # Combine the retrieved documents into a single string
    documents_str = "\n\n".join(
        [f"{doc.metadata['Headline']}\n\n{doc.page_content}\n" for doc in retrieved_docs]
    )

In [None]:
Markdown(documents_str)

We can now add the tool's output to the message chain with `ToolMessage`, so the model can answer based on the retrieved documents.

*Note: Here we use the base model instead of the RAG model to limit our agent to a single retrieval call. A fully autonomous agent could decide to make further calls to best answer the request.*
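
For comparison, the loop a more autonomous agent might run can be sketched without any real model. Everything below is a stand-in: `FakeResponse`, `FakeModel`, and `run_tool_loop` are hypothetical names, messages are plain tuples rather than LangChain message objects, and `max_rounds` is an assumed safety cap. The idea is to keep calling the model and executing the tools it requests until it answers without asking for one.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FakeResponse:
    """Stand-in for a model reply: text content plus optional tool calls."""

    content: str = ""
    tool_calls: list[dict[str, Any]] = field(default_factory=list)


class FakeModel:
    """Stand-in model: requests one retrieval, then answers."""

    def invoke(self, messages: list[tuple[str, Any]]) -> FakeResponse:
        already_retrieved = any(role == "tool" for role, _ in messages)
        if already_retrieved:
            return FakeResponse(content="Answer based on retrieved documents.")
        call = {
            "name": "retrieval",
            "args": {"retrieval_query": "tech trends"},
            "id": "call_1",
        }
        return FakeResponse(tool_calls=[call])


def run_tool_loop(
    model: FakeModel,
    messages: list[tuple[str, Any]],
    tools: dict[str, Callable[..., Any]],
    max_rounds: int = 3,
) -> FakeResponse:
    """Call the model repeatedly, executing requested tools, until it answers."""
    response = model.invoke(messages)
    for _ in range(max_rounds):
        if not response.tool_calls:
            break
        # Record the model's request, then each tool result, before asking again
        messages.append(("assistant", response))
        for call in response.tool_calls:
            result = tools[call["name"]](**call["args"])
            messages.append(("tool", result))
        response = model.invoke(messages)
    return response


fake_tools = {"retrieval": lambda retrieval_query: [f"doc about {retrieval_query}"]}
history = [("system", "You are a Financial Analyst."), ("user", "Define a strategy.")]
final = run_tool_loop(FakeModel(), history, fake_tools)
print(final.content)
```

If the cap is reached, the last response may still contain unanswered tool calls, so a real agent would need to handle that case explicitly. Limiting the notebook to one retrieval call sidesteps this entirely.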

In [None]:
from langchain_core.messages import ToolMessage

In [None]:
# TODO: Message list with the retrieved documents for the base model
# - RAG System prompt
# - Client request
# - RAG model response
# - Retrieval tool response
messages = [
    ...,
    ...,
    ...,
    ToolMessage(..., tool_call_id=tool_call["id"]),
]

# TODO: Invoke the base model with the messages
response = ...

In [None]:
Markdown(response.content)

## Practical Tips

- Watch the temperature setting: Lower values (like 0) are usually better for factual responses
- Pay attention to the number of retrieved documents (`RETRIEVAL_K`): More isn't always better
- The system prompt is crucial: It sets the context and behavior of your bot

## Conclusion

You just learned how to create a chatbot and augment it with a retrieval tool using **LangChain**. This concludes the first part of our workshop!

In the next section, we'll discover **LangGraph** and see how it lets us build sophisticated and flexible LLM workflows.