# <a id='toc1_'></a>[Build Agentic RAG with LangGraph](#toc0_)
https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/

**Table of contents**<a id='toc0_'></a>    
- [Build Agentic RAG with LangGraph](#toc1_)
- [Process Documents](#toc2_)
- [Create the retrieval tools](#toc3_)
  - [Store in a vector store (FAISS)](#toc3_1_)
- [Grade documents](#toc4_)
- [Rewrite question](#toc5_)
      - [Test: rewriting the question after an inconclusive tool response](#toc5_1_1_1_)
- [Generate an answer](#toc6_)
      - [Test: answer generation](#toc6_1_1_1_)
- [Assemble the graph](#toc7_)    
- [Run the agentic RAG](#toc8_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

In [78]:
# %%capture --no-stderr
# %pip install -U --quiet langgraph "langchain[openai]" langchain-community langchain-text-splitters

In [79]:
import os

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import ChatOpenAI
from langgraph.graph import MessagesState
from dotenv import load_dotenv

load_dotenv()

PDF_PATH = "./data"
EMBEDDING_MODEL_PATH = "./models/sentence_transformers"
SAMPLE_QUERY = "Quels sont les principaux composants du modèle Transformer ?"
SAMPLE_YES = "Les principaux composants du modèle Transformer sont l'encodeur, le décodeur, et le mécanisme d'attention."
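
Later cells read `MODEL_NAME`, `OPENAI_API_KEY`, and `OPENAI_API_BASE` from the environment once `load_dotenv()` has run. A small check can surface a missing variable before any model is constructed. The sketch below is only an illustration: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, not part of the notebook.

```python
import os

# Variable names taken from the os.getenv calls used later in this notebook
REQUIRED_VARS = ("MODEL_NAME", "OPENAI_API_KEY", "OPENAI_API_BASE")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Report anything missing in the current process environment
print("Missing environment variables:", missing_env_vars())
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.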

# <a id='toc2_'></a>[Process Documents](#toc0_)

In [80]:
def load_pdf_from_directory(dir_path: str):
    """Load local PDF files as LangChain Document objects (one per page)."""
    documents = []
    for filename in os.listdir(dir_path):
        file_path = os.path.join(dir_path, filename)
        if os.path.isfile(file_path) and filename.lower().endswith('.pdf'):
            try:
                loader = PyPDFLoader(file_path)
                documents.extend(loader.load())
            except Exception as e:
                print(f"Error loading file {filename}: {e}")
    return documents




docs_before_split = load_pdf_from_directory(PDF_PATH)
print(docs_before_split[1].page_content)

1 Introduction
Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks
in particular, have been firmly established as state of the art approaches in sequence modeling and
transduction problems such as language modeling and machine translation [ 35, 2, 5]. Numerous
efforts have since continued to push the boundaries of recurrent language models and encoder-decoder
architectures [38, 24, 15].
Recurrent models typically factor computation along the symbol positions of the input and output
sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden
states ht, as a function of the previous hidden state ht−1 and the input for position t. This inherently
sequential nature precludes parallelization within training examples, which becomes critical at longer
sequence lengths, as memory constraints limit batching across examples. Recent work has achieved
significant improvements in computational efficiency through facto
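
`os.listdir` returns entries in arbitrary order, so when the data directory holds several PDFs, the page that `docs_before_split[1]` points to may differ between machines. One way to make the order reproducible is to sort the paths before loading them. This is a sketch under that assumption; `list_pdf_paths` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path


def list_pdf_paths(dir_path: str) -> list[Path]:
    """Return the PDF files directly inside dir_path, sorted by name."""
    return sorted(
        path
        for path in Path(dir_path).iterdir()
        # Keep regular files only and match the extension case-insensitively
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

Each returned path could then be passed to `PyPDFLoader(str(path))` in place of the `os.listdir` loop.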

# <a id='toc3_'></a>[Create the retrieval tools](#toc0_)

In [81]:
huggingface_embedding_model = HuggingFaceBgeEmbeddings(
    model_name=EMBEDDING_MODEL_PATH,
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)


# Initialize the semantic chunker
text_splitter = SemanticChunker(huggingface_embedding_model)

# Split the documents into chunks
docs_after_split = text_splitter.split_documents(docs_before_split)
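
Semantic chunking, roughly, embeds neighbouring sentences and starts a new chunk where the distance between them is unusually large. The sketch below illustrates the idea only. It assumes a toy bag-of-words embedding and a fixed threshold, whereas `SemanticChunker` uses the real embedding model and, as far as I know, derives its breakpoint from the distribution of distances. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity; 1.0 when either vector is empty."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def semantic_chunks(sentences: list[str], threshold: float = 0.8) -> list[str]:
    """Start a new chunk wherever adjacent sentences are farther apart than threshold."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, nxt in zip(sentences, sentences[1:]):
        if cosine_distance(bag_of_words(prev), bag_of_words(nxt)) > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(nxt)
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "The encoder maps the input sequence to representations.",
    "The decoder uses the encoder representations to generate output.",
    "We trained on eight GPUs.",
    "Training on the GPUs took twelve hours.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

With these made-up sentences the split falls between the architecture sentences and the training sentences, because the second and third sentences share no words.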

## <a id='toc3_1_'></a>[Store in a vector store (FAISS)](#toc0_)

In [82]:
vectorstore = FAISS.from_documents(docs_after_split, huggingface_embedding_model)

In [83]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
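
Because the embeddings are normalized (`normalize_embeddings=True`), ranking by squared Euclidean distance and ranking by cosine similarity give the same order: for unit vectors, ‖a − b‖² = 2 − 2 a·b. The sketch below checks this on made-up 2-D vectors with `k=3`. It is a plain-Python illustration of top-k similarity search, not FAISS code, and the function names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k_l2(query, docs, k=3):
    """Indices of the k documents with the smallest squared L2 distance to query."""
    def sq_dist(i):
        return sum((q - d) ** 2 for q, d in zip(query, docs[i]))
    return sorted(range(len(docs)), key=sq_dist)[:k]


def top_k_cosine(query, docs, k=3):
    """Indices of the k documents with the largest dot product (cosine for unit vectors)."""
    def similarity(i):
        return sum(q * d for q, d in zip(query, docs[i]))
    return sorted(range(len(docs)), key=similarity, reverse=True)[:k]


# Made-up document vectors and query, all normalized to unit length
docs = [normalize(v) for v in [[1, 0], [0.9, 0.1], [0, 1], [-1, 0], [0.5, 0.5]]]
query = normalize([1, 0.2])

print(top_k_l2(query, docs))
print(top_k_cosine(query, docs))
```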

In [84]:
from langchain_core.messages import HumanMessage, ToolMessage
from langchain.tools.retriever import create_retriever_tool

# Initialize the model
response_model = ChatOpenAI(
    model=os.getenv("MODEL_NAME"),
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    openai_api_base=os.getenv("OPENAI_API_BASE"),
    max_tokens=8000,
    temperature=0,
)

# Create the retrieval tool
retriever_tool = create_retriever_tool(
    retriever=retriever,
    name="retrieve_blog_posts",
    description="Search and return information about the 'Attention Is All You Need' paper.",
)

# Bind the tool to the model
model_with_tools = response_model.bind_tools([retriever_tool])


result = retriever_tool.invoke({"query": SAMPLE_QUERY})
print(result)

Table 3: Variations on the Transformer architecture. Unlisted values are identical to those of the base
model. All metrics are on the English-to-German translation development set, newstest2013. Listed
perplexities are per-wordpiece, according to our byte-pair encoding, and should not be compared to
per-word perplexities. N d model dff h d k dv Pdrop ϵls
train PPL BLEU params
steps (dev) (dev) ×106
base 6 512 2048 8 64 64 0.1 0.1 100K 4.92 25.8 65
(A)
1 512 512 5.29 24.9
4 128 128 5.00 25.5
16 32 32 4.91 25.8
32 16 16 5.01 25.4
(B) 16 5.16 25.1 58
32 5.01 25.4 60
(C)
2 6.11 23.7 36
4 5.19 25.3 50
8 4.88 25.5 80
256 32 32 5.75 24.5 28
1024 128 128 4.66 26.0 168
1024 5.12 25.4 53
4096 4.75 26.2 90
(D)
0.0 5.77 24.6
0.2 4.95 25.5
0.0 4.67 25.3
0.2 5.47 25.7
(E) positional embedding instead of sinusoids 4.92 25.7
big 6 1024 4096 16 0.3 300K 4.33 26.4 213
development set, newstest2013. We used beam search as described in the previous section, but no
checkpoint averaging. We present these re

In [85]:
# Test the tool call
input_message = HumanMessage(content=SAMPLE_QUERY)

# Get the model's response
response = model_with_tools.invoke([input_message])
print("Model response:", response)

Model response: content="<think>\nOkay, the user is asking about the main components of the Transformer model. Let me recall what I know. The Transformer is a type of neural network used in natural language processing. It's different from RNNs because it uses self-attention mechanisms. The key components include the encoder and decoder, which process the input sequence, and the attention mechanism, which helps in finding relevant parts of the input. Also, there's the attention layer that allows the model to focus on specific parts of the input. I should make sure to mention these components clearly. Let me structure the answer to cover each part and explain how they work together.\n</think>\n\nLes principaux composants du modèle Transformer sont :\n\n1. **Encoder** :  \n   - Processe les séquences d'input en utilisant des **matrices de transition** (W) pour convertir les données en vecteurs d'attention.  \n   - Intégre des **matrices de transition** (W) pour transformer les données en 

In [86]:
import json
import re


def generate_query_or_response(state: MessagesState):
    """Call the model; if it requests retrieval, run the tool and answer from its output."""
    messages = state["messages"]
    response = model_with_tools.invoke(messages)

    if hasattr(response, 'tool_calls') and response.tool_calls:
        for tool_call in response.tool_calls:
            if tool_call['name'] == 'retrieve_blog_posts':
                # Run the tool; LangChain stores the parsed arguments under 'args'
                tool_response = retriever_tool.invoke(tool_call['args'])

                # Wrap the tool output in a ToolMessage tied to the originating call
                tool_message = ToolMessage(content=str(tool_response), tool_call_id=tool_call['id'])

                # Re-invoke the model with its own tool request followed by the tool output
                final_response = model_with_tools.invoke(messages + [response, tool_message])

                # Return the state update carrying the final response
                return MessagesState(messages=[final_response])
    else:
        # No structured tool call: look for <tool_call> tags in the text content
        tool_call_pattern = re.compile(r'<tool_call>(.*?)</tool_call>', re.DOTALL)
        for match in tool_call_pattern.findall(response.content):
            try:
                tool_call = json.loads(match)
                if tool_call['name'] == 'retrieve_blog_posts':
                    # This text format carries its arguments under 'arguments'
                    tool_response = retriever_tool.invoke(tool_call['arguments'])
                    tool_message = ToolMessage(content=str(tool_response), tool_call_id='manual-tool-call')
                    final_response = model_with_tools.invoke(messages + [tool_message])
                    return MessagesState(messages=[final_response])
            except json.JSONDecodeError as e:
                print("Error decoding tool call:", e)

    # Fall through: no usable tool call, so return the model's response unchanged
    return MessagesState(messages=[response])
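
The fallback branch relies on the model writing a JSON payload between `<tool_call>` tags in its text output. The parsing step can be isolated and tested on its own, as in the sketch below. `extract_tool_calls` is a hypothetical helper, and the sample string is made up to mimic that format.

```python
import json
import re

TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(content: str) -> list[dict]:
    """Parse every well-formed JSON payload found inside <tool_call> tags."""
    calls = []
    for raw in TOOL_CALL_PATTERN.findall(content):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Skip malformed payloads instead of aborting
        # Keep only JSON objects that name a tool
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


sample = (
    "I will search the paper.\n"
    '<tool_call>\n{"name": "retrieve_blog_posts", "arguments": {"query": "attention"}}\n</tool_call>\n'
    "<tool_call>not json</tool_call>"
)
parsed = extract_tool_calls(sample)
print(parsed)
```

Skipping malformed payloads, rather than stopping at the first one, lets a later well-formed call in the same response still be used.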
    

# <a id='toc4_'></a>[Grade documents](#toc0_)

In [87]:
from pydantic import BaseModel, Field
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
import os

# Initialize the grading model
grader_model = ChatOpenAI(
    model=os.getenv("MODEL_NAME"),
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    openai_api_base=os.getenv("OPENAI_API_BASE"),
    max_tokens=8000,
    temperature=0,
)

# Pydantic schema for the grader's structured output
class GradeDocuments(BaseModel):
    """Use a binary score for relevance check"""
    binary_score: str = Field(
        description="Relevance score: 'yes' if relevant, or 'no' if not relevant"
    )


# Define the improved grading prompt
GRADE_PROMPT = (
    "You are a grader assessing the relevance of a retrieved document to a user question. \n"
    "Here is the retrieved document: \n\n {context} \n\n"
    "Here is the user question: {question} \n"
    "If the document contains keywords or semantic meaning related to the user question, grade it as relevant. \n"
    "Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question. \n"
    "Be very strict in your evaluation. Only return 'yes' if the document is clearly relevant. \n"
    "Here are some examples: \n"
    "Example 1: \n"
    "Document: Reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering. \n"
    "Question: What does Lilian Weng say about types of reward hacking? \n"
    "Score: yes \n"
    "Example 2: \n"
    "Document: Meow \n"
    "Question: What does Lilian Weng say about types of reward hacking? \n"
    "Score: no \n"
    "Now, evaluate the following document and question: \n"
)

In [88]:
def grade_documents(state: MessagesState):
    """Decide whether the retrieved documents are relevant to the question."""
    question = state["messages"][0].content
    context = state["messages"][-1].content
    prompt = GRADE_PROMPT.format(question=question, context=context)
    response = (
        grader_model
        .with_structured_output(GradeDocuments)
        .invoke([HumanMessage(content=prompt)])
    )
    score = response.binary_score

    if score == "yes":
        return "generate_answer"
    else:
        return "rewrite_question"
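
`grade_documents` routes on an exact match with `"yes"`, so a verdict such as `"Yes"` or `"yes."` would be sent to `rewrite_question`. A tolerant comparison is one way to guard against that. The sketch below assumes the grader can return such variants; `route_from_score` is a hypothetical helper, not part of the notebook.

```python
def route_from_score(raw_score: str) -> str:
    """Map a grader verdict to the next node name, tolerating case and punctuation."""
    # Trim whitespace, then surrounding punctuation or quotes, then lowercase
    normalized = raw_score.strip().strip(".!'\"").lower()
    return "generate_answer" if normalized == "yes" else "rewrite_question"


print(route_from_score(" Yes. "))
print(route_from_score("no"))
```

Anything other than a recognizable "yes" falls back to rewriting the question, which matches the strict grading the prompt asks for.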


In [89]:
# Example case that should score "yes"
input_yes = {
    "messages": [
        HumanMessage(content=SAMPLE_QUERY),
        ToolMessage(content="", tool_call_id="1", name="retrieve_blog_posts", additional_kwargs={"args": {"query": "types of reward hacking"}}),
        ToolMessage(content=SAMPLE_YES, tool_call_id="1")
    ]
}

# Call grade_documents
next_step_yes = grade_documents(input_yes)
print("Next step for relevant document:", next_step_yes)

Next step for relevant document: generate_answer


In [90]:
# Example case that should score "no"
input_no = {
    "messages": [
        HumanMessage(content=SAMPLE_QUERY),
        ToolMessage(content="", tool_call_id="1", name="retrieve_blog_posts", additional_kwargs={"args": {"query": "types of reward hacking"}}),
        ToolMessage(content="blablabla", tool_call_id="1")
    ]
}

# Call grade_documents
next_step_no = grade_documents(input_no)
print("Next step for irrelevant document:", next_step_no)

Next step for irrelevant document: rewrite_question


# <a id='toc5_'></a>[Rewrite question](#toc0_)

In [91]:
REWRITE_PROMPT = (
    "Look at the input and try to reason about the underlying semantic intent / meaning.\n"
    "Here is the initial question:"
    "\n ------- \n"
    "{question}"
    "\n ------- \n"
    "Formulate an improved question:"
)

In [92]:
def rewrite_question(state: MessagesState):
    """Rewrite the user's original question."""
    messages = state["messages"]
    question = messages[0].content
    prompt = REWRITE_PROMPT.format(question=question)
    response = response_model.invoke([HumanMessage(content=prompt)])
    return {"messages": [HumanMessage(content=response.content)]}


#### <a id='toc5_1_1_1_'></a>[Test: rewriting the question after an inconclusive tool response](#toc0_)

In [93]:
# Example case that scores "no"
input_no = {
    "messages": [
        HumanMessage(content=SAMPLE_QUERY),
        ToolMessage(content="", tool_call_id="1", name="retrieve_blog_posts", additional_kwargs={"args": {"query": "types of reward hacking"}}),
        ToolMessage(content="blablabla", tool_call_id="1")
    ]
}
response = rewrite_question(input_no)
print(response["messages"][-1].content)


<think>
Okay, the user wants me to formulate an improved question based on the initial input. The original question is "Quels sont les principaux composants du modèle Transformer ?" which translates to "What are the main components of the Transformer model?" 

First, I need to understand the user's intent. They probably want a more specific or better phrased question. The original question is straightforward but maybe too general. Let me think about possible improvements.

The user might be looking for a question that invites a detailed explanation of the components. Maybe they want to ask about the key parts of the Transformer model. So, instead of "What are the main components," perhaps a more precise phrasing would be better. For example, "What are the primary components of the Transformer model?" or "What are the main building blocks of the Transformer model?" 

I should also consider if there's a nuance in the original question. The term "principaux" (main) is good, but maybe "key
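
The output above shows that this model places its reasoning inside a `<think>` block in `response.content`, so the rewritten question passed back into the graph would carry that reasoning along with it. One option is to strip the block before wrapping the text in a `HumanMessage`. The sketch below assumes the tags are always closed; `strip_reasoning` is a hypothetical helper.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks and any surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nThe user wants a sharper question.\n</think>\n\nWhat are the main building blocks of the Transformer?"
print(strip_reasoning(raw))
```

A response cut off before `</think>` (for example by a token limit) is left unchanged by this pattern and would need separate handling.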

# <a id='toc6_'></a>[Generate an answer](#toc0_)

In [94]:
GENERATE_PROMPT = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \n"
    "Context: {context}"
)

In [95]:
def generate_answer(state: MessagesState):
    """Generate an answer from the question and the retrieved context."""
    question = state["messages"][0].content
    context = state["messages"][-1].content
    prompt = GENERATE_PROMPT.format(question=question, context=context)
    response = response_model.invoke([HumanMessage(content=prompt)])
    return {"messages": [response]}


#### <a id='toc6_1_1_1_'></a>[Test: answer generation](#toc0_)

In [96]:
# With an unhelpful tool response
input_no = {
    "messages": [
        HumanMessage(content=SAMPLE_QUERY),
        ToolMessage(content="", tool_call_id="1", name="retrieve_blog_posts", additional_kwargs={"args": {"query": "types of reward hacking"}}),
        ToolMessage(content="blablabla", tool_call_id="1")
    ]
}

response = generate_answer(input_no)
response["messages"][-1].pretty_print()


<think>
Okay, the user is asking about the main components of the Transformer model. The context provided is "blablabla," which seems to be a placeholder or a random string. Since there's no actual information given about the Transformer components in the context, I can't answer based on that. I need to inform the user that I don't have the necessary information to provide the answer. But since the user wants three sentences, I'll make sure to mention that the context is empty and that I don't know the answer.
</think>

The context provided does not include information about the components of the Transformer model. I don't know the answer.


In [97]:
# With a complete tool response
input_yes = {
    "messages": [
        HumanMessage(content=SAMPLE_QUERY),
        ToolMessage(content="", tool_call_id="1", name="retrieve_blog_posts"),
        ToolMessage(content=SAMPLE_YES,
                   tool_call_id="1")
    ]
}

response = generate_answer(input_yes)
response["messages"][-1].pretty_print()


<think>
Okay, the user is asking about the main components of the Transformer model. The context provided says the main components are the encoder, decoder, and attention mechanism. I need to make sure I use three sentences and keep it concise. Let me check the context again to confirm. Yep, it lists those three components. I should present that clearly.
</think>

Les principaux composants du modèle Transformer sont l'encodeur, le décodeur et le mécanisme d'attention.


# <a id='toc7_'></a>[Assemble the graph](#toc0_)

In [98]:
from langgraph.graph import START, END, StateGraph, MessagesState
from langgraph.prebuilt import ToolNode, tools_condition

# Initialize the graph
workflow = StateGraph(MessagesState)

# Add the nodes
workflow.add_node("generate_query_or_response", generate_query_or_response)
workflow.add_node("retrieve", ToolNode([retriever_tool]))
workflow.add_node("rewrite_question", rewrite_question)
workflow.add_node("generate_answer", generate_answer)

# Add the entry edge
workflow.add_edge(START, "generate_query_or_response")

# Add the conditional edges
workflow.add_conditional_edges(
    "generate_query_or_response",
    tools_condition,
    {
        "tools": "retrieve",
        END: END
    }
)

workflow.add_conditional_edges(
    "retrieve",
    grade_documents,
    {
        "generate_answer": "generate_answer",
        "rewrite_question": "rewrite_question"
    }
)

workflow.add_edge("generate_answer", END)
workflow.add_edge("rewrite_question", "generate_query_or_response")

# Compile the graph
graph = workflow.compile()
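
The graph contains a cycle: `rewrite_question` feeds back into `generate_query_or_response`, so a question that never retrieves relevant context could keep looping until LangGraph's recursion limit stops the run. The control flow can be sketched in plain Python with an explicit cap on rewrites. This is a simplified stand-in, not LangGraph code: it always retrieves, whereas the real graph lets the model decide. `run_agentic_loop` and the stub callables are hypothetical.

```python
def run_agentic_loop(question, retrieve, grade, rewrite, answer, max_rewrites=2):
    """Retrieve, grade, then answer or rewrite; stop rewriting after max_rewrites."""
    current = question
    for attempt in range(max_rewrites + 1):
        context = retrieve(current)
        # Answer when the context is relevant, or when the rewrite budget is spent
        if grade(question, context) == "yes" or attempt == max_rewrites:
            return answer(question, context)
        current = rewrite(current)


# Stub components standing in for the retriever, grader, rewriter and generator
corpus = {"transformer components": "encoder, decoder, attention"}
result = run_agentic_loop(
    "composants?",
    retrieve=lambda q: corpus.get(q, ""),
    grade=lambda q, ctx: "yes" if ctx else "no",
    rewrite=lambda q: "transformer components",
    answer=lambda q, ctx: ctx or "I don't know.",
)
print(result)
```

In this toy run the first retrieval finds nothing, the question is rewritten once, and the second retrieval supplies the context used for the answer.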


# <a id='toc8_'></a>[Run the agentic RAG](#toc0_)

In [101]:
# Define the initial input
initial_input = MessagesState(messages=[HumanMessage(content="Qu'est-ce que le mécanisme d'attention dans le contexte du papier Attention Is All You Need ?")])

# Run the graph on the initial input
try:
    result = graph.invoke(initial_input)
    # print("Final response:", result)
    # Print every message in the final state
    for message in result['messages']:
        print("Message:", message.content)
except Exception as e:
    print("An error occurred:", e)


Message: Qu'est-ce que le mécanisme d'attention dans le contexte du papier Attention Is All You Need ?
Message: <think>
Okay, the user is asking about the attention mechanism in the context of the paper "Attention Is All You Need." Let me start by recalling what I know. The paper is about attention mechanisms, specifically how to focus on certain parts of a document. The user mentioned isolated attentions from the word 'its' for positions 5 and 6, which suggests that the attention model is designed to highlight specific words.

The user also provided some references to other papers, like [25] Mitchell et al., which talk about building a large corpus. The references seem to be about different aspects of attention mechanisms, such as how to handle long sequences and the use of self-attention. 

I need to make sure I explain the attention mechanism clearly. The key points from the user's question are that the attention is focused on the word 'its' and that the model uses a neighborhood ar
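
Printing every message in full makes long reasoning traces hard to scan. A compact view with a role label and truncated content is one alternative. The sketch below assumes each message exposes `type` and `content` attributes, as LangChain message objects do, and uses `SimpleNamespace` stand-ins so it runs on its own. `summarize_messages` is a hypothetical helper.

```python
from types import SimpleNamespace


def summarize_messages(messages, max_chars=200):
    """Return one 'role: content' line per message, truncating long content."""
    lines = []
    for message in messages:
        role = getattr(message, "type", "unknown")
        content = str(message.content)
        if len(content) > max_chars:
            content = content[:max_chars] + "..."
        lines.append(f"{role}: {content}")
    return lines


# Stand-in messages; with the notebook's graph this would be result["messages"]
demo = [
    SimpleNamespace(type="human", content="What is attention?"),
    SimpleNamespace(type="ai", content="x" * 500),
]
lines = summarize_messages(demo, max_chars=20)
for line in lines:
    print(line)
```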

In [100]:
graph.get_graph().print_ascii()
# or
# print(graph.get_graph().draw_mermaid())

                                  +-----------+                         
                                  | __start__ |                         
                                  +-----------+                         
                                        *                               
                                        *                               
                                        *                               
                         +----------------------------+                 
                         | generate_query_or_response |                 
                         +----------------------------+                 
                       .....            *            .....              
                  .....                 *                 .....         
               ...                      *                      .....    
    +----------+                        *                           ... 
    | retrieve |..                      *          

![Final agentic RAG graph](final_graph.png)