# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In this notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!

- 🤝 Breakout Room #2:
  1. Evaluating the LangGraph Application with LangSmith
  2. Adding Helpfulness Check and "Loop" Limits
  3. LangGraph for the "Patterns" of GenAI

# 🤝 Breakout Room #1

## Part 1: LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we're working with cycles rather than plain loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1:  Dependencies

We'll first install all our required libraries.

In [4]:
!pip install -qU langchain langchain_openai langchain-community langgraph arxiv duckduckgo_search==5.3.1b1

## Task 2: Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [5]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [6]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE4 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Duck Duck Go Web Search](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/ddg_search)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)

<div style="border: 2px solid white; background: black; padding: 10px;">

#### 🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a method.

Added:
- DuckDuckGoSearchRun()
- ArxivQueryRun()

</div>

In [7]:

from langchain.tools import tool

@tool
def add(a: int, b: int) -> int:
    """ Add two numbers"""
    print("adding()")
    return a + b

In [9]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun

tool_belt = [
   DuckDuckGoSearchRun(),
   ArxivQueryRun(),
]

### Model

Now we can set-up our model! We'll leverage the familiar OpenAI model suite for this example - but it's not *necessary* to use with LangGraph. LangGraph supports all models - though you might not find success with smaller models - as such, they recommend you stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [10]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o", temperature=0)

Now that we have our model set-up, let's "put on the tool belt", which is to say: We'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [11]:
model = model.bind_tools(tool_belt)

<div style="border: 2px solid white; background: black; padding: 10px;">

#### ❓ Question #1:

How does the model determine which tool to use?

#### ! Answer #1:

There are a couple of ways the model can know which tool to use:
- Explicit - if the tool is requested within the input the model could match on name or description
- Context - based on the question the model can identify the context of the question and find a best fit tool based on the tool name and description
- Complex - the language model may have been trained or finetuned to decide which tool to use

</div>
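
To make the answer above more concrete, here is a minimal sketch of the kind of information a model can draw on when choosing a tool: a name, a description, and an argument schema. The `describe_tool` helper and the `JSON_TYPES` table are hypothetical and written only for illustration; the layout follows the general OpenAI function-calling style, and the exact payload LangChain sends may differ.

```python
import inspect
import json

def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b

# Hypothetical lookup table: only `int` and `str` annotations are handled here.
JSON_TYPES = {int: "integer", str: "string"}

def describe_tool(fn) -> dict:
    """Hypothetical helper: build an OpenAI-style description of a function."""
    params = inspect.signature(fn).parameters
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,                       # what the model calls
            "description": (fn.__doc__ or "").strip(),  # why it would call it
            "parameters": {
                "type": "object",
                "properties": {
                    name: {"type": JSON_TYPES[p.annotation]}
                    for name, p in params.items()
                },
                "required": list(params),
            },
        },
    }

schema = describe_tool(add)
print(json.dumps(schema, indent=2))
```

The model never sees the function body, only a description like this one, which is why clear tool names and docstrings matter.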

## Task 4: Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is one that is stored in a `TypedDict` with the key `messages` and the value is a `Sequence` of `BaseMessage` objects that will be appended to whenever the state changes.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!
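
The walk-through above can be mimicked in plain Python. This is a minimal sketch rather than LangGraph itself: `append_messages` and `apply_update` are hypothetical stand-ins for an append-style reducer, and the messages are simple tuples instead of real message objects.

```python
def append_messages(existing: list, new: list) -> list:
    """Simplified stand-in for an append-style reducer: add, never replace."""
    return existing + new

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state using the reducer."""
    return {"messages": append_messages(state["messages"], update["messages"])}

# Step 1: the state starts empty.
state = {"messages": []}

# Step 2: the user's query is added.
state = apply_update(state, {"messages": [("human", "Who is the captain?")]})

# Step 3: the agent node returns only its new message; the reducer appends it.
state = apply_update(state, {"messages": [("ai", "tool_call: web_search")]})

for role, content in state["messages"]:
    print(f"{role}: {content}")
```

The point of the reducer is that each node returns only what it produced, and the history accumulates in one shared place.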

In [12]:
from typing import TypedDict, Annotated, Text
from langgraph.graph.message import add_messages
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
  messages: Annotated[list, add_messages]
  context: Annotated[Text, None]

## Task 5: It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [13]:
from langgraph.prebuilt import ToolNode

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

tool_node = ToolNode(tool_belt)

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `tool_node` is a node which can call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [14]:
from langgraph.graph import StateGraph, END

uncompiled_graph = StateGraph(AgentState)

uncompiled_graph.add_node("agent", call_model)
uncompiled_graph.add_node("action", tool_node)

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [15]:
uncompiled_graph.set_entry_point("agent")

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks whether the last message contains any `tool_calls`.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that if a dictionary is passed in as the third parameter (the mapping), it should be created with the possible outputs of our conditional function in mind. In this case `should_continue` outputs either `"action"` or `END`, which already name the destination nodes directly, so no mapping is needed.

In [16]:
def should_continue(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  return END

uncompiled_graph.add_conditional_edges(
    "agent",
    should_continue
)
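
To illustrate how a router's return value relates to a destination, with and without a mapping, here is a plain-Python sketch. It does not call LangGraph: `resolve_destination` is a hypothetical helper, the `END` string is a placeholder sentinel for this sketch, and messages are plain dictionaries.

```python
from typing import Optional

END = "__end__"  # placeholder sentinel used only in this sketch

def should_continue(state: dict) -> str:
    """Route to the action node if the last message requested any tools."""
    last_message = state["messages"][-1]
    return "action" if last_message.get("tool_calls") else END

def resolve_destination(router_output: str, mapping: Optional[dict] = None) -> str:
    """Hypothetical helper: translate a router's return value into a node name."""
    if mapping is None:
        return router_output       # the return value is itself the node name
    return mapping[router_output]  # the return value is a key into the mapping

wants_tool = {"messages": [{"content": "", "tool_calls": ["duckduckgo_search"]}]}
finished = {"messages": [{"content": "Adam Lowry", "tool_calls": []}]}

print(resolve_destination(should_continue(wants_tool)))
print(resolve_destination(should_continue(finished)))

# With a mapping, the router can return friendlier labels instead of node names.
label_mapping = {"continue": "action", "end": END}
print(resolve_destination("continue", label_mapping))
```

Either style works as long as every value the router can return has a destination.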

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [17]:
uncompiled_graph.add_edge("action", "agent")

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [18]:
compiled_graph = uncompiled_graph.compile()

<div style="border: 2px solid white; background: black; padding: 10px;">

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

#### ! Answer #2:

Conceptually there is no limit to the number of times this could cycle. In practice it would eventually exhaust some resource, for example by hitting token limits or running out of memory.

The simplest way to limit the number of cycles is to keep a counter that tracks how many repeats have been performed. There are a number of different ways this could be implemented:
- Counter in the agent node - set a counter and increment each time the action node is called. When the counter reaches a specified number, the agent then ends the cycle even if the other conditions have not been met.
- Counter within action node - the action itself could keep track of how many times it has been called and raise an exception when the limit is reached
- Counter within the conditional edge - the should-continue edge could check the counter and then decide to end

</div>
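
As one possible illustration of the "counter within the conditional edge" option, the sketch below counts how many messages in the state requested tools and ends the cycle once a budget is exceeded. It is plain Python with dictionary messages; `MAX_TOOL_ROUNDS`, `count_tool_rounds`, and `should_continue_with_limit` are hypothetical names, and `END` is a placeholder sentinel.

```python
END = "__end__"      # placeholder sentinel used only in this sketch
MAX_TOOL_ROUNDS = 3  # hypothetical budget for tool-calling rounds

def count_tool_rounds(messages: list) -> int:
    """Count the messages that requested at least one tool call."""
    return sum(1 for message in messages if message.get("tool_calls"))

def should_continue_with_limit(state: dict) -> str:
    messages = state["messages"]
    if not messages[-1].get("tool_calls"):
        return END       # the model answered directly
    if count_tool_rounds(messages) > MAX_TOOL_ROUNDS:
        return END       # budget exhausted: stop even though tools were requested
    return "action"

tool_request = {"content": "", "tool_calls": ["duckduckgo_search"]}
within_budget = {"messages": [tool_request] * 3}
over_budget = {"messages": [tool_request] * 4}

print(should_continue_with_limit(within_budget))
print(should_continue_with_limit(over_budget))
```

Because the count is derived from the messages already in the state, no extra state field is needed. LangGraph also accepts a `recursion_limit` value in the run config, which stops a run with an error once the step budget is exceeded, so a hand-rolled counter is mainly useful when you want to end gracefully instead.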

In [20]:
from langchain_core.messages import AIMessage, ToolMessage
from pprint import pprint

def print_messages(values):
    for message in values["messages"]:
        if isinstance(message, AIMessage):
            print("AI Message Content:")
            print(message.content)
            print("\nadditional_kwargs:")
            pprint(message.additional_kwargs, indent=4)
            print("\nresponse_metadata:")
            pprint(message.response_metadata, indent=4)
            print("\nusage_metadata:")
            pprint(message.usage_metadata, indent=4)
            print("\n" + "-" * 50 + "\n")
        
        elif isinstance(message, ToolMessage):
            print("Tool Message Content:")
            print(message.content)
            print(f"\nTool Name: {message.name}")
            print(f"Tool Call ID: {message.tool_call_id}")
            print("\n" + "-" * 50 + "\n")

## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [21]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="Who is the current captain of the Winnipeg Jets?")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")
        # print_messages(values)


Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_SK038R3spy4IadOxlPvigp6T', 'function': {'arguments': '{"query":"current captain of the Winnipeg Jets 2023"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 25, 'prompt_tokens': 156, 'total_tokens': 181}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-bf647c0d-8f70-44d5-993f-0a441ad3e2db-0', tool_calls=[{'name': 'duckduckgo_search', 'args': {'query': 'current captain of the Winnipeg Jets 2023'}, 'id': 'call_SK038R3spy4IadOxlPvigp6T', 'type': 'tool_call'}], usage_metadata={'input_tokens': 156, 'output_tokens': 25, 'total_tokens': 181})]



Receiving update from node: 'action'
[ToolMessage(content='Adam Lowry was named captain of the Winnipeg Jets on Tuesday. ... Hughes named Canucks captain, replaces Horvat Sep 11, 2023. 

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found the "tool_calls" `additional_kwarg`, and sent the state object to the action node
4. The action node ran the requested tool, added the result to the state object as a `ToolMessage`, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, could not find the "tool_calls" `additional_kwarg` and passed the state object to END where we see it output in the cell above!
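
The six steps above can be traced with a small simulation. This is a sketch under stated assumptions, not the real graph: `fake_agent` and `fake_action` are hypothetical stand-ins for the model and the tool node, the search result is canned, and `END` is a placeholder sentinel.

```python
END = "__end__"  # placeholder sentinel used only in this sketch

def fake_agent(state: dict) -> dict:
    """Stand-in for the model: request a tool once, then answer."""
    has_tool_result = any(m["role"] == "tool" for m in state["messages"])
    if has_tool_result:
        return {"role": "ai", "content": "Adam Lowry", "tool_calls": []}
    return {"role": "ai", "content": "", "tool_calls": ["duckduckgo_search"]}

def fake_action(state: dict) -> dict:
    """Stand-in for the tool node: return a canned search result."""
    return {"role": "tool", "content": "Adam Lowry was named captain."}

def should_continue(state: dict) -> str:
    return "action" if state["messages"][-1]["tool_calls"] else END

def run(state: dict) -> list:
    visited = []
    node = "agent"  # entry point
    while node != END:
        visited.append(node)
        if node == "agent":
            state["messages"].append(fake_agent(state))
            node = should_continue(state)  # conditional edge
        else:
            state["messages"].append(fake_action(state))
            node = "agent"                 # fixed edge: action -> agent
    return visited

state = {"messages": [{"role": "human",
                       "content": "Who is the current captain of the Winnipeg Jets?"}]}
visited = run(state)
print(visited)  # prints ['agent', 'action', 'agent']
```

The visit order matches the stream output above: one agent update, one action update, then a final agent update before the graph ends.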

Now let's look at an example that shows multiple tool usage - all with the same flow!

In [22]:
inputs = {"messages" : [HumanMessage(content="Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using DuckDuckGo.")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        if node == "action":
          print(f"Tool Used: {values['messages'][0].name}")
        print(values["messages"])
        if node == "agent":
           if values["messages"][0].tool_calls:
              print(values["messages"][0].tool_calls)

        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_xrBZwG3jb7tcDypjbHWDMFMC', 'function': {'arguments': '{"query": "QLoRA"}', 'name': 'arxiv'}, 'type': 'function'}, {'id': 'call_JmDPfsHyDbJL26jAmJ1OfuoP', 'function': {'arguments': '{"query": "latest Tweet"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 173, 'total_tokens': 223}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-04b67dd1-7be9-458d-88e5-aa8e9e7cdb54-0', tool_calls=[{'name': 'arxiv', 'args': {'query': 'QLoRA'}, 'id': 'call_xrBZwG3jb7tcDypjbHWDMFMC', 'type': 'tool_call'}, {'name': 'duckduckgo_search', 'args': {'query': 'latest Tweet'}, 'id': 'call_JmDPfsHyDbJL26jAmJ1OfuoP', 'type': 'tool_call'}], usage_metadata={'input_tokens': 173, 'output_tokens': 50, 'total_tokens': 223})]
[{'name': '

Better formatted responses: 

<table>
<tr>
<td width="20%" style="vertical-align: top;">

Receiving update from node: 'agent'

AI Message Content: (No direct content, tools were invoked)

Tool Calls:
1. Tool Name: arxiv
   - Arguments: {"query": "QLoRA"}
   - Call ID: call_5SHb1kz1wVy8u4U5TQBpeSUL
   - Type: function

2. Tool Name: duckduckgo_search
   - Arguments: {"query": "Tim Dettmers latest Tweet"}
   - Call ID: call_HXyuGGbbhkfwkHni4dhZsfFw
   - Type: function

3. Tool Name: duckduckgo_search
   - Arguments: {"query": "Mike Lewis latest Tweet"}
   - Call ID: call_YT2OwPJfYInecDmBMdI9znl6
   - Type: function

4. Tool Name: duckduckgo_search
   - Arguments: {"query": "Yunqing Dou latest Tweet"}
   - Call ID: call_sW2gtUpcsPUsGIvKKwfxFMVc
   - Type: function

5. Tool Name: duckduckgo_search
   - Arguments: {"query": "Armen Aghajanyan latest Tweet"}
   - Call ID: call_odsmxts9jkssZD67y0xucSDO
   - Type: function

Response Metadata:
- Model Name: gpt-4o-2024-05-13
- System Fingerprint: fp_157b3831f5
- Finish Reason: tool_calls
- Token Usage:
  - Prompt Tokens: 173
  - Completion Tokens: 121
  - Total Tokens: 294
  </td>
  <td width = "20%" style="vertical-align: top;">
Receiving update from node: 'action'


Tool Used: arxiv
Content:
Published: 2023-05-23
Title: QLoRA: Efficient Finetuning of Quantized LLMs
Snippet: Published: 2023-05-23 Title: QLoRA: Efficient Finetuning of Quantized LLMs Authors: Tim Dettmers, Artidoro ...

Published: 2024-05-27
Title: Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Snippet: Published: 2024-05-27 Title: Accurate LoRA-Finetuning Quantization of LLMs via Information Retention ...

Published: 2024-06-12
Title: Exploring Fact Memorization and Style Imitation in LLMs Using QLoRA: An Experimental Study and Quality Assessment Methods
Snippet: Published: 2024-06-12 Title: Exploring Fact Memorization and Style Imitation in LLMs Using QLoRA: An ...

Tool Used: duckduckgo_search
Content:
Snippet: Tim Dettmers, Ruslan A. Svirschevski, Vage Egiazarian, Denis Kuznedelev, ... (SpQR), a new compressed ...

Tool Used: duckduckgo_search
Content:
Snippet: Michael Lewis, the financial journalist who wrote a controversial book about the downfall of Sam Ban ...

Tool Used: duckduckgo_search
Content:
Snippet: Translator: Dj22031 Editor: Dj22031 Advance chapters available for patrons on Patreon. And a chapter ...

Tool Used: duckduckgo_search
Content:
Snippet: Armen Aghajanyan introduced Chameleon, FAIR's latest work on multimodal models, training 7B and 34B mo ...
  </td>
  <td width="20%" style="vertical-align: top;">
Receiving update from node: 'agent'
<br />  

AI Message Content:
Snippet: ### QLoRA Paper on Arxiv\nTitle: QLoRA: Efficient Finetuning of Quantized LLMs \nAuthors: Tim D ...

Latest Tweets from Authors:
Tim Dettmers:

Tweet Snippet: "I'm excited to announce our latest paper, introducing a family of early-fusion token-in token-out ...
Source: Tim Dettmers' Twitter
Mike Lewis:

Tweet Snippet: "Best-selling author Michael Lewis, who had unprecedented access to FTX founder Sam Bankman-Fr ...
Source: Mike Lewis' Twitter
Yunqing Dou:

Tweet Snippet: "The temperature seems to have come down! After a night of caring for a sick child at the hospi ...
Source: Yunqing Dou's Twitter
Armen Aghajanyan:

Tweet Snippet: "I'm excited to announce our latest paper, introducing a family of early-fusion token-in token-out ...
Source: Armen Aghajanyan's Twitter

Response Metadata:
- Model Name: gpt-4o-2024-05-13
- System Fingerprint: fp_157b3831f5
- Finish Reason: stop
- Token Usage:
   - Prompt Tokens: 2569
   - Completion Tokens: 484
   - Total Tokens: 3053

  </td>
</tr>
</table>


<div style="border: 2px solid white; background: black; padding: 10px;">

#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

1. The agent received the state containing the question
2. It called the gpt-4o model to identify the tool calls needed. These are all function calls:
    1. arxiv with the query "QLoRA"
    2. duckduckgo_search with the query "Tim Dettmers latest Tweet"
    3. duckduckgo_search with the query "Mike Lewis latest Tweet"
    4. duckduckgo_search with the query "Yunqing Dou latest Tweet"
    5. duckduckgo_search with the query "Armen Aghajanyan latest Tweet" 
3. The agent node's output was passed to the `should_continue` conditional edge, and because the `tool_calls` array was not empty the action node was called
4. The action node took the specified actions and updated the messages with data from the results of the actions
5. The agent called the gpt-4o LLM with the augmented data and the question and got the response
6. The `should_continue` conditional edge ran again, and because no tool calls were requested the process ended

</div>

## Part 1: LangSmith Evaluator

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [None]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | compiled_graph | parse_output

In [None]:
agent_chain.invoke({"question" : "What is OOUX?"})

'OOUX stands for Object-Oriented User Experience. It is a design methodology that focuses on structuring and organizing digital products around the objects that users interact with, rather than around tasks or features. The goal of OOUX is to create a more intuitive and user-friendly experience by aligning the design with the way users naturally think about and understand the world.\n\nKey principles of OOUX include:\n\n1. **Object Mapping**: Identifying the key objects that users will interact with in the system.\n2. **Object Modeling**: Defining the attributes and relationships of these objects.\n3. **Object Prioritization**: Determining which objects are most important to the user experience.\n4. **Object Interaction**: Designing how users will interact with these objects.\n\nBy focusing on objects, OOUX aims to create a more cohesive and understandable user experience, making it easier for users to navigate and use digital products.'


<div style="border: 2px solid white; background: black; padding: 10px;">

### Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]
```
</div>

<div style="border: 2px solid white; background: black; padding: 10px;">

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

Done 

</div>

In [None]:
questions = [
    "What are the four key components of the ORCA methodology in OOUX?",
    "What role do Calls-to-Action or CTAs play in the OOUX ORCA methodology?",
    "How does OOUX ORCA methodology differ from traditional UX design approaches",
    "Who came up with the OOUX ORCA methodology used in OOUX?",
    "What is noun-foraging in the context of OOUX, and why is it important?",
    "What are the key outputs from the OOUX ORCA methodology?"
]

answers = [
    {"must_mention" : ["Object", "Relationship"]},
    {"must_mention" : ["role", "user"]},
    {"must_mention" : ["object", "relationship"]},
    {"must_mention" : ["Sophia", "Prater"]},
    {"must_mention" : ["objects", "requirements"]},
    {"must_mention" : ["Object Map", "Calls-to-Action"]},
]

Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [None]:
from langsmith import Client

client = Client()
dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the OOUX to Evaluate RAG."
)

client.create_examples(
    inputs=[{"question" : q} for q in questions],
    outputs=answers,
    dataset_id=dataset.id,
)

<div style="border: 2px solid white; background: black; padding: 10px;">

#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

#### ! Answer #3:
The questions and answers are associated through their position in their respective arrays, or their index.

This can be problematic if the two are created manually. If an entry is accidentally inserted into one array and not the other, the answers would now be incorrect.

So maintenance and updating could be a problem and a possible cause of errors being introduced into the process.

This could be mitigated by:
- adding a unique identifier to the question and answer - this too can become a maintenance problem
- using a dictionary to incorporate question-answer pairing - this is a better solution and provides easier maintenance


</div>
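
One way to sketch the dictionary-based pairing suggested above is to keep each question next to its expected phrases and derive the two parallel lists from that single structure. The `qa_pairs` name is hypothetical; the two entries reuse questions and answers from the example dataset shown earlier.

```python
# Each entry keeps a question and its expected phrases together, so
# inserting or removing an entry cannot shift the alignment.
qa_pairs = [
    {"question": "Who authored the QLoRA paper?",
     "must_mention": ["Tim", "Dettmers"]},
    {"question": "What data type was created in the QLoRA paper?",
     "must_mention": ["NF4", "NormalFloat"]},
]

# Derive the parallel lists only at the point where they are needed.
inputs = [{"question": pair["question"]} for pair in qa_pairs]
outputs = [{"must_mention": pair["must_mention"]} for pair in qa_pairs]

for example_input, example_output in zip(inputs, outputs):
    print(example_input["question"], "->", example_output["must_mention"])
```

The derived `inputs` and `outputs` lists have the same shape as the ones passed to `client.create_examples` in the cell above.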

### Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive exact-match process to determine if our response contains specific strings.

In [None]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

<div style="border: 2px solid white; background: black; padding: 10px;">

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

This evaluation has some problems:
- It requires all of the phrases in `must_mention` to be present in the response - it is all or nothing
- It is a case-sensitive search, so there must be an exact match in spelling and casing - this is problematic because the word is the same no matter how it is cased. Differences in punctuation would also cause mismatches
- Spelling errors, especially with people's names, would cause mismatches
- A required word could be wholly contained within another word and be incorrectly counted as a match
- It assumes that the mere presence of the word means the answer is correct - the word could appear by chance in a wrong answer

Making the checks case-insensitive and using fuzzier matching could improve this metric.

</div>
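
As a sketch of how some of these gaps might be narrowed, the function below matches case-insensitively, requires whole-word matches, and returns partial credit instead of all-or-nothing. It is a plain function and is not wired into LangSmith; `must_mention_score` is a hypothetical name, and misspellings are still not handled.

```python
import re

def must_mention_score(prediction: str, required: list) -> float:
    """Fraction of required phrases found as whole words, ignoring case."""
    if not required:
        return 1.0
    hits = 0
    for phrase in required:
        # \b anchors stop "Tim" from matching inside "Timely".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, prediction, flags=re.IGNORECASE):
            hits += 1
    return hits / len(required)

print(must_mention_score("The paper was written by tim dettmers.", ["Tim", "Dettmers"]))
print(must_mention_score("Timely results were reported.", ["Tim"]))
print(must_mention_score("Objects come first.", ["Objects", "Relationship"]))
```

Returning a fraction makes it easier to see which answers were close, at the cost of no longer having a single pass/fail flag.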

Now that we have created our custom evaluator - let's initialize our `RunEvalConfig` with it!

In [None]:
from langchain.smith import RunEvalConfig, run_on_dataset

eval_config = RunEvalConfig(
    custom_evaluators=[must_mention],
)

### Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [None]:
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=agent_chain,
    evaluation=eval_config,
    verbose=True,
    project_name=f"RAG Pipeline - Evaluation - {uuid4().hex[0:8]}",
    project_metadata={"version": "1.0.0"},
)

View the evaluation results for project 'RAG Pipeline - Evaluation - e9175952' at:
https://smith.langchain.com/o/c97b0028-7cab-5f76-a748-1369ba450931/datasets/d3b782a8-33d4-4bbb-9f21-c4615cbd7854/compare?selectedSessions=b8e019b0-8e69-4c75-936a-922247d80d26

View all tests for Dataset Retrieval Augmented Generation - Evaluation Dataset - 6de61221 at:
https://smith.langchain.com/o/c97b0028-7cab-5f76-a748-1369ba450931/datasets/d3b782a8-33d4-4bbb-9f21-c4615cbd7854
[------------------------------------------------->] 6/6

| | feedback.must_mention | error | execution_time | run_id |
|---|---|---|---|---|
| count | 6 | 0.0 | 6.0 | 6 |
| unique | 2 | 0.0 | | 6 |
| top | True | | | 0e2cac21-f3e7-4e44-8b42-e257b75d41cc |
| freq | 5 | | | 1 |
| mean | | | 4.80136 | |
| std | | | 3.175931 | |
| min | | | 1.104858 | |
| 25% | | | 3.500257 | |
| 50% | | | 4.08973 | |
| 75% | | | 5.283324 | |

{'project_name': 'RAG Pipeline - Evaluation - e9175952',
 'results': {'ac2d04ae-a87d-4711-908a-748804fa48e2': {'input': {'question': 'What are the four key components of the ORCA methodology in OOUX?'},
   'feedback': [EvaluationResult(key='must_mention', score=True, value=None, comment=None, correction=None, evaluator_info={}, feedback_config=None, source_run_id=UUID('f328c84f-11fe-4ce7-ae49-f1c53e091a45'), target_run_id=None)],
   'execution_time': 3.518624,
   'run_id': '0e2cac21-f3e7-4e44-8b42-e257b75d41cc',
   'output': 'The ORCA methodology in Object-Oriented User Experience (OOUX) is a structured approach to designing user experiences that are intuitive and user-centered. ORCA stands for Objects, Relationships, Calls-to-Action, and Attributes. Here are the four key components:\n\n1. **Objects**: These are the primary nouns or entities within the system. Objects represent the core elements that users interact with. Identifying the key objects is the first step in understanding th

## Part 2: LangGraph with Helpfulness:

### Task 3: Adding Helpfulness Check and "Loop" Limits

Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm if it fully answers the user's query!

We're going to make a few key adjustments to account for this:

1. We're going to add an artificial limit on how many "loops" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.
2. We'll add to our existing conditional edge to obtain the behaviour we desire.

First, let's define our state again - we can check the length of the `messages` list in the state object, so we don't need additional state for this.

In [None]:
class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!

<div style="border: 2px solid white; background: black; padding: 10px;">

#### 🏗️ Activity #5:

Please write markdown for the following cells to explain what each is doing.

</div>

We create a graph which is bound to our AgentState

The two nodes (agent and action) are created in the graph using the call_model and tool_node nodes that were defined earlier

In [None]:
graph_with_helpfulness_check = StateGraph(AgentState)

graph_with_helpfulness_check.add_node("agent", call_model)
graph_with_helpfulness_check.add_node("action", tool_node)

We set the agent node as the entry point of the graph, so it is the first node called

In [None]:
graph_with_helpfulness_check.set_entry_point("agent")

We create a conditional edge function that first decides whether to call the action node and, if not, whether the answer is helpful:

- If it is helpful, it returns "end", causing the cycle to end
- If it is not helpful, it returns "continue", causing the cycle to repeat


In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

def tool_call_or_helpful(state):
  last_message = state["messages"][-1]

  # Route 1: the model requested a tool, so hand off to the "action" node.
  if last_message.tool_calls:
    return "action"

  # No tool call, so the last message is the candidate final answer.
  initial_query = state["messages"][0]
  final_response = last_message

  # Route 2: loop limit - stop once the conversation exceeds 10 messages.
  if len(state["messages"]) > 10:
    return "end"

  prompt_template = """\
  Given an initial query and a final response, determine if the final response is helpful or not. Please indicate a helpful answer with a 'Y' and an unhelpful as an 'N'.

  Initial Query:
  {initial_query}

  Final Response:
  {final_response}"""

  prompt_template = PromptTemplate.from_template(prompt_template)

  # A separate model acts as the judge of helpfulness.
  helpfulness_check_model = ChatOpenAI(model="gpt-4")

  # LCEL chain: fill the prompt, call the judge, return its reply as a string.
  helpfulness_chain = prompt_template | helpfulness_check_model | StrOutputParser()

  helpfulness_response = helpfulness_chain.invoke({"initial_query" : initial_query.content, "final_response" : final_response.content})

  # Route 3: a 'Y' verdict ends the run; anything else sends us back to the agent.
  if "Y" in helpfulness_response:
    return "end"
  else:
    return "continue"

<div style="border: 2px solid white; background: black; padding: 10px;">

#### 🏗️ Activity #4:

Please write what is happening in our `tool_call_or_helpful` function!

The function `tool_call_or_helpful` is called with the current state:
- it checks the last message
- if the last message contains tool calls, it returns "action", so the action node is called and the tools are executed
- if there are more than 10 messages, it assumes we are stuck in a loop and returns "end"
- otherwise, the first message (the question) and the final response are inserted into a prompt template and passed through the chain to the model, which judges whether the answer is helpful
- if the answer was helpful it returns "end" and the cycle ends
- if the answer was not helpful it returns "continue" and the cycle continues


</div>
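
One caveat about the final step: a substring test such as `"Y" in helpfulness_response` also matches any reply that merely contains a capital Y, for example "N. You could add sources." The sketch below shows one possible stricter check that only looks at the start of the reply. `parse_helpfulness_verdict` is a hypothetical helper, not part of the original notebook, and it assumes the judge model leads with its 'Y' or 'N' verdict.

```python
def parse_helpfulness_verdict(response: str) -> str:
    """Map the judge's reply to a route name.

    Assumption: the reply starts with the verdict ('Y' or 'N'),
    optionally wrapped in whitespace or quotes.
    """
    cleaned = response.strip().strip("'\"").upper()
    # Anything that does not clearly start with Y is treated as unhelpful.
    return "end" if cleaned.startswith("Y") else "continue"


# The substring test and the stricter parse disagree on this reply:
reply = "N. You could add sources."
print("Y" in reply)                      # substring test matches the Y in "You"
print(parse_helpfulness_verdict(reply))  # stricter parse reads the leading N
```

Treating every unclear reply as unhelpful is a design choice: it costs at most a few extra passes, and the message limit still guarantees that the loop ends.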

The conditional edge `tool_call_or_helpful` is attached to the `agent` node:

- If it returns "continue", the `agent` node is called again.
- If it returns "action", the `action` node is called.
- If it returns "end", the cycle ends.

In [None]:
graph_with_helpfulness_check.add_conditional_edges(
    "agent",
    tool_call_or_helpful,
    {
        "continue" : "agent",
        "action" : "action",
        "end" : END
    }
)
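
The three routes can be traced without calling any model. The sketch below mirrors the routing logic in plain Python under some loud assumptions: `FakeMessage` is a hypothetical stand-in for LangChain message objects, and `is_helpful` is an injected stub that replaces the GPT-4 helpfulness chain.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a LangChain message."""
    content: str
    tool_calls: list = field(default_factory=list)


def route(state: dict, is_helpful: Callable[[str, str], bool], max_messages: int = 10) -> str:
    """Same decision order as tool_call_or_helpful, with the LLM judge stubbed out."""
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "action"    # a tool was requested
    if len(state["messages"]) > max_messages:
        return "end"       # loop limit reached
    query = state["messages"][0].content
    return "end" if is_helpful(query, last_message.content) else "continue"


question = FakeMessage("What is LoRA?")
wants_tool = FakeMessage("", tool_calls=[{"name": "duckduckgo_search"}])
answer = FakeMessage("LoRA is a parameter-efficient fine-tuning method.")

print(route({"messages": [question, wants_tool]}, lambda q, a: True))
print(route({"messages": [question, answer]}, lambda q, a: True))
print(route({"messages": [question, answer]}, lambda q, a: False))
```

Each returned string is a key in the mapping passed to `add_conditional_edges`, which is how the graph decides which node, or `END`, comes next.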


We add an edge from the `action` node back to the `agent` node, so tool results are always returned to the agent.

In [None]:
graph_with_helpfulness_check.add_edge("action", "agent")


We compile the graph into a runnable agent.

In [None]:
agent_with_helpfulness_check = graph_with_helpfulness_check.compile()

We create the initial state and populate it with the question.

The agent is then called with these inputs in streaming mode, and each chunk is displayed as it is returned.

In [None]:
inputs = {"messages" : [HumanMessage(content="Related to machine learning, what is LoRA? Also, who is Tim Dettmers? Also, what is Attention?")]}

async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_pjS5V8V4qThLOJeitCdeIqIN', 'function': {'arguments': '{"query": "LoRA machine learning"}', 'name': 'duckduckgo_search'}, 'type': 'function'}, {'id': 'call_zahkkZUXV474ITd26LqJ9nX9', 'function': {'arguments': '{"query": "Tim Dettmers"}', 'name': 'duckduckgo_search'}, 'type': 'function'}, {'id': 'call_Q7EBvOMZv70gLUsYvupG7VXW', 'function': {'arguments': '{"query": "Attention in machine learning"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 76, 'prompt_tokens': 171, 'total_tokens': 247}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-f8ce237e-76d8-4051-8057-907dc3a77356-0', tool_calls=[{'name': 'duckduckgo_search', 'args': {'query': 'LoRA machine learning'}, 'id': 'call_pjS5V8V4qThLOJeitCdeIqIN', 'type': 'tool_call'}, 
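
The top-level `async for` in the cell above works in a notebook because the notebook already runs an event loop. In a plain Python script the same loop has to live inside a coroutine that is started with `asyncio.run`. The sketch below shows that shape with a hypothetical `fake_astream` generator standing in for `agent_with_helpfulness_check.astream`; the chunks it yields are made-up placeholders that only imitate the `{node: {"messages": [...]}}` structure of `stream_mode="updates"`.

```python
import asyncio


async def fake_astream(inputs: dict):
    """Hypothetical stand-in for astream(..., stream_mode="updates")."""
    yield {"agent": {"messages": ["(tool call request)"]}}
    yield {"action": {"messages": ["(tool result)"]}}
    yield {"agent": {"messages": ["(final answer)"]}}


async def collect_updates(stream) -> list:
    """Consume the stream and record which node produced each update."""
    seen = []
    async for chunk in stream:
        for node, values in chunk.items():
            seen.append((node, values["messages"]))
    return seen


# In a script, asyncio.run starts the event loop that a notebook provides for us.
updates = asyncio.run(collect_updates(fake_astream({"messages": []})))
for node, messages in updates:
    print(f"Receiving update from node: '{node}'")
    print(messages)
```

With the real agent, `fake_astream(...)` would be replaced by the `astream` call from the cell above.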

### Task 4: LangGraph for the "Patterns" of GenAI

Let's ask our system about the 4 patterns of Generative AI:

1. Prompt Engineering
2. RAG
3. Fine-tuning
4. Agents

In [None]:
patterns = ["prompt engineering", "RAG", "fine-tuning", "LLM-based agents"]

In [None]:
for pattern in patterns:
  what_is_string = f"What is {pattern} and when did it break onto the scene??"
  inputs = {"messages" : [HumanMessage(content=what_is_string)]}
  messages = agent_with_helpfulness_check.invoke(inputs)
  print(messages["messages"][-1].content)
  print("\n\n")

Would you like more detailed information on any specific aspect of prompt engineering or its applications?



If you have any more questions or need further details on RAG or related topics, feel free to ask!



Would you like more detailed information on any specific aspect of fine-tuning or its applications?



Would you like more detailed information on any specific aspect of LLM-based agents or their applications?



