# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In this notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!

- 🤝 Breakout Room #2:
  1. Evaluating the LangGraph Application with LangSmith
  2. Adding Helpfulness Check and "Loop" Limits
  3. LangGraph for the "Patterns" of GenAI

# 🤝 Breakout Room #1

## Part 1: LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Due to the inclusion of cycles over loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1:  Dependencies


## Task 2: Environment Variables

We'll want to set our OpenAI and Tavily API keys along with our LangSmith environment variables.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [2]:
os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY")

In [3]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE7 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain-community/tree/main/libs/community) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Tavily Search Results](https://github.com/langchain-ai/langchain-community/blob/main/libs/community/langchain_community/tools/tavily_search/tool.py)
- [Arxiv](https://github.com/langchain-ai/langchain-community/blob/main/libs/community/langchain_community/tools/arxiv/tool.py)

#### 🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a method.

In [4]:
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.tools.arxiv.tool import ArxivQueryRun

tavily_tool = TavilySearchResults(max_results=5)

tool_belt = [
    tavily_tool,
    ArxivQueryRun(),
]



### Model

Now we can set up our model! We'll leverage the familiar OpenAI model suite for this example - but LangGraph does not *require* OpenAI models. LangGraph works with any chat model - though you might not find success with smaller models - as such, the recommendation is to stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [5]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4.1-nano", temperature=0)

Now that we have our model set up, let's "put on the tool belt", which is to say: we'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [6]:
model = model.bind_tools(tool_belt)

#### ❓ Question #1:

How does the model determine which tool to use?

##### ✅ Answer:
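1. Each tool has a name, a description, and an argument schema. `bind_tools` passes these to the model, and the model selects the tool whose description best matches the task at hand.
2. We can also write a prompt that tells the model which tool to use in which situation, which gives us more direct control over the choice.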
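
To make the role of tool descriptions concrete, here is a rough sketch of the kind of OpenAI-style tool entry a model receives once tools are bound. The helper `describe_tool` and the description text are illustrative assumptions; this is not the exact payload `bind_tools` produces for the Tavily tool.

```python
import json


def describe_tool(name: str, description: str, parameters: dict) -> dict:
    """Build an OpenAI-style function-calling tool entry (illustrative helper)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": parameters,
                "required": list(parameters),
            },
        },
    }


# Hypothetical description text; the real tools ship their own wording.
search_schema = describe_tool(
    name="tavily_search_results_json",
    description="Search the web for current information.",
    parameters={"query": {"type": "string", "description": "Search query"}},
)

print(json.dumps(search_schema, indent=2))
```

The model only ever sees text like this, so a vague or misleading description is a likely cause when the wrong tool gets picked.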

## Task 4: Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is a `TypedDict` with the key `messages`, whose value is a list of `BaseMessage` objects that is appended to whenever the state changes.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!

In [7]:
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
  # add_messages appends new messages to the list instead of overwriting it
  messages: Annotated[list, add_messages]
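
The `Annotated[list, add_messages]` annotation attaches a reducer to the `messages` key, so node outputs are merged into the existing list rather than replacing it. The plain-Python sketch below mimics that idea only; `Message`, `append_messages`, and `apply_update` are hypothetical names, and the real `add_messages` does more (for example, it updates messages that share an ID).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str      # e.g. "human", "ai", "tool"
    content: str


def append_messages(existing: list, update: list) -> list:
    """Reducer: return a new list with the update appended, never overwriting."""
    return [*existing, *update]


def apply_update(state: dict, node_output: dict) -> dict:
    """Merge a node's partial output into the state using the reducer."""
    return {"messages": append_messages(state["messages"], node_output["messages"])}


state = {"messages": []}
state = apply_update(state, {"messages": [Message("human", "Who captains the Jets?")]})
state = apply_update(state, {"messages": [Message("ai", "(requests a web search)")]})

print([m.role for m in state["messages"]])  # ['human', 'ai']
```

This is why a node can return just `{"messages": [response]}`: the node supplies only the new message, and the reducer takes care of keeping the history.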

## Task 5: It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [8]:
from langgraph.prebuilt import ToolNode

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

tool_node = ToolNode(tool_belt)

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `tool_node` is a node which can call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [9]:
from langgraph.graph import StateGraph, END

uncompiled_graph = StateGraph(AgentState)

uncompiled_graph.add_node("agent", call_model)
uncompiled_graph.add_node("action", tool_node)

<langgraph.graph.state.StateGraph at 0x11571d160>

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [10]:
uncompiled_graph.set_entry_point("agent")

<langgraph.graph.state.StateGraph at 0x11571d160>

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks whether the last message contains any `tool_calls`.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight how the return value of our conditional function is used. Here `should_continue` returns either `"action"` (the name of our action node) or `END`, so LangGraph can route on the return value directly. If the function returned custom labels instead, such as `"continue"` or `"end"`, we would pass a dictionary as the third parameter (the mapping) to translate each label into a destination node.
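
As a plain-Python sketch of how a routing function's return value can be turned into a destination, with and without a mapping: `resolve_destination` is a hypothetical helper written for illustration, not part of LangGraph, and the `END` value here is a stand-in sentinel for this sketch only.

```python
END = "__end__"  # stand-in sentinel for this sketch only


def resolve_destination(label, mapping=None):
    """Turn a router's return value into a destination node name."""
    if mapping is None:
        # No mapping: the router is expected to return a node name (or END) itself.
        return label
    if label not in mapping:
        raise KeyError(f"router returned {label!r}, which is not in the mapping")
    return mapping[label]


# Router returns node names directly, so no mapping is needed.
print(resolve_destination("action"))

# Router returns custom labels, so a mapping translates them.
label_map = {"continue": "action", "end": END}
print(resolve_destination("continue", label_map))
print(resolve_destination("end", label_map))
```

The practical consequence is that when a mapping is supplied, every value the routing function can return needs a matching key.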

In [11]:
def should_continue(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  return END

uncompiled_graph.add_conditional_edges(
    "agent",
    should_continue
)

<langgraph.graph.state.StateGraph at 0x11571d160>

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [12]:
uncompiled_graph.add_edge("action", "agent")

<langgraph.graph.state.StateGraph at 0x11571d160>

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [13]:
simple_agent_graph = uncompiled_graph.compile()

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

##### ✅ Answer:
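1. There is no limit built into our graph itself, but LangGraph applies a default `recursion_limit` of 25 to every run. The limit counts steps through the graph rather than full loops, so with an agent step and an action step per loop our graph gets roughly 12 tool-calling loops before LangGraph raises a `GraphRecursionError`.
2. We can set `recursion_limit` to a different number in the config passed when invoking the graph, imposing a limit of our own choosing based on our use case and the expected number of cycles.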
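
In LangGraph the limit is supplied per run through the `recursion_limit` key of the config dictionary. To show what such a step budget does to an agent/action cycle, here is a plain-Python simulation; `run_agent_loop` and `StepLimitReached` are hypothetical names, and this is a sketch of the idea rather than LangGraph's implementation.

```python
class StepLimitReached(RuntimeError):
    """Raised when the loop uses up its step budget (hypothetical name)."""


def run_agent_loop(needs_tool, step_limit=25):
    """Alternate agent and action steps until the agent stops asking for tools.

    needs_tool: callable taking the number of tool calls made so far and
    returning True if the agent wants another one.
    Returns the list of nodes visited.
    """
    visited = []
    tool_calls = 0
    node = "agent"
    while node != "end":
        if len(visited) >= step_limit:
            raise StepLimitReached(f"stopped after {step_limit} steps")
        visited.append(node)
        if node == "agent":
            node = "action" if needs_tool(tool_calls) else "end"
        else:  # the action node always hands control back to the agent
            tool_calls += 1
            node = "agent"
    return visited


# An agent that needs two tool calls finishes in 5 steps.
print(run_agent_loop(lambda calls: calls < 2))

# An agent that never stops asking for tools hits the limit.
try:
    run_agent_loop(lambda calls: True, step_limit=6)
except StepLimitReached as err:
    print(err)
```

Note that each node visit consumes one step in this sketch, so a single agent-to-tool round trip costs two.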


## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [14]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="Who is the current captain of the Winnipeg Jets?")]}

async for chunk in simple_agent_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zCLk6w4W2zG90MkUMklmZOwn', 'function': {'arguments': '{"query":"current captain of the Winnipeg Jets"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 162, 'total_tokens': 185, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': None, 'id': 'chatcmpl-BsSxrFOGOx41X8ACwEd56vyMQfYoW', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--a6522bd8-0ed4-480d-80c0-ea0f9dfbdd21-0', tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': 'current captain of the Winnipeg Jets'}, 'id': 'call_zCLk6w4W2zG90MkUMklmZOwn', 'type': 

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found `tool_calls` on the last `AIMessage`, and sent the state object to the action node
4. The action node ran the requested tool, added the result to the state object as a `ToolMessage`, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, found no `tool_calls` on the last message, and passed the state object to END where we see it output in the cell above!

Now let's look at an example that shows multiple tool usage - all with the same flow!

In [15]:
inputs = {"messages" : [HumanMessage(content="Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using Tavily!")]}

async for chunk in simple_agent_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        if node == "action":
          print(f"Tool Used: {values['messages'][0].name}")
        print(values["messages"])

        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_N9W6wJL964vYnfUKdFCOHsMG', 'function': {'arguments': '{"query": "QLoRA"}', 'name': 'arxiv'}, 'type': 'function'}, {'id': 'call_qKsIvqaZ9ne6f70tPTDVgNqN', 'function': {'arguments': '{"query": "latest Tweet of author"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 54, 'prompt_tokens': 178, 'total_tokens': 232, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': None, 'id': 'chatcmpl-BsSz8fhlOhpB6jfBFY8S79UIFEsA0', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--a38efa5d-910a-4aa5-becc-5721c3eacea7-0', tool_calls=[{'name': 'arxiv', 'args': {'que

#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

##### ✅ Answer:
1. The agent requested two tools in parallel: Arxiv search (with the query "QLoRA") and TavilySearch (with the query "latest Tweet of author").
1.1 The Arxiv search tool usage is correct.
1.2 The TavilySearch tool usage is not correct; this is possibly because both calls were issued in the same step, so the model had no Arxiv results to rely on when writing the TavilySearch query.
2. The agent received the action outputs.
2.1 The Arxiv search output is correct.
2.2 The TavilySearch output is meaningless in the context of the initial query because it returned unrelated popular Twitter accounts (JK Rowling, author2author, etc.)
3. The agent synthesised the received information into one cohesive answer to the user. It correctly recognised that the Twitter results were unrelated to the original query, so it offered to search for tweets again in the final answer.

Overall, the agent used tools, received output, and then provided an answer. However, this was the wrong approach, as the agent needed to use the Arxiv tool first and only then TavilySearch. Because the agent ran both in parallel instead, it retrieved meaningless Twitter accounts unrelated to the authors of the paper found on Arxiv.

# 🤝 Breakout Room #2

## Part 1: LangSmith Evaluator

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [16]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain_with_formatting = convert_inputs | simple_agent_graph | parse_output

In [17]:
agent_chain_with_formatting.invoke({"question" : "What is RAG?"})

"RAG can refer to different concepts depending on the context. Could you please specify whether you're asking about RAG in the context of project management, machine learning, or another field?"

### Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]
```

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

##### ✅ Answer:
Added multiple Questions-Answers without removing the original ones.

In [18]:
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?",
    "What is the broader impact of the QLoRA system?",
    "How was the QLoRA system evaluated?",
    "Where do the authors of the QLoRA paper work?",
    "What dataset was used to tune Guanaco 65B?",
    "What considerations should the reader of the QLoRA paper take into account?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
    {"must_mention" : ["performance", "mobile"]},
    {"must_mention" : ["MMLU", "benchmark"]},
    {"must_mention" : ["University of Washington"]},
    {"must_mention" : ["OASST1"]},
    {"must_mention" : ["multilingual", "cross-entropy"]}
]


Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [19]:
from langsmith import Client

client = Client()

dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the QLoRA Paper to Evaluate RAG over the same paper."
)

client.create_examples(
    inputs=[{"question" : q} for q in questions],
    outputs=answers,
    dataset_id=dataset.id,
)

{'example_ids': ['b616c35a-715f-4481-9441-4af4223cd951',
  '6ba2b2b9-33f2-4c9d-8cef-fb05b293ba52',
  '90c01653-bdc7-4d51-beb2-6a67e2204fde',
  'b3a4daa5-72d3-486c-94ee-10083acbf3ea',
  '141b8c94-af30-4146-8a5b-6312e1736feb',
  'fe738f9f-2f42-4382-93fd-92f30ee3e1f4',
  '061e5706-7adf-454e-a5ab-ce60a705b01a',
  '4c505596-4eae-4b1b-8569-f47eeb2e0074',
  '3c203d8f-0410-47aa-846b-592cd15e8b3a',
  '3f58a520-5ad0-4384-b8ae-59993b852216',
  'c0421ca0-8699-413e-893c-844d815c18dd'],
 'count': 11}

#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

##### ✅ Answer:
1. From the technical standpoint, questions and answers are associated by their index in the two lists, which makes it really easy to mess things up. Even one addition or deletion of a question or answer without the matching change in the other list will pair every subsequent question with the wrong answer, as the indices will be out of sync.
2. From the AI Eval standpoint, correctness is judged by whether specific words appear in the response. This will lead to significant errors in cases where multiple appropriate synonyms are possible. We are removing degrees of freedom from the LLM by being overly specific in our AI Eval. This may be important in some instances, but overall it tells us little about true correctness (synonyms can be used; correct words may be used with incorrect explanations, etc.)
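
One way to reduce the index-mismatch risk is to pair questions and answers explicitly and fail loudly when the counts differ. A minimal sketch follows; `build_examples` is a hypothetical helper and the record layout is an assumption chosen for illustration, not a LangSmith requirement.

```python
def build_examples(questions, answers):
    """Pair each question with its expected answer, failing loudly on a mismatch."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers; lists are out of sync"
        )
    return [
        {"inputs": {"question": q}, "outputs": a}
        for q, a in zip(questions, answers)
    ]


examples = build_examples(
    ["What optimizer is used in QLoRA?", "Who authored the QLoRA paper?"],
    [{"must_mention": ["paged", "optimizer"]}, {"must_mention": ["Tim", "Dettmers"]}],
)
print(examples[1])
```

A length check only catches missing or extra entries. It cannot detect two answers swapped in place, so keeping each question and its answer together in a single record from the start is the sturdier design.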


### Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive exact-match process to determine if our response contains specific strings.

In [21]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

##### ✅ Answer:
1. We can change how the score is calculated to allow partially correct answers to get a non-zero score.
1.1 Right now, "score = all(...)" means the score is 1 when all phrases are found in the answer, and 0 if at least one is missing.
1.2 We can change it to "score = sum(...)/len(required)". This lets a partially correct answer contribute to the score (e.g., if 1 out of 2 required phrases is mentioned in the answer, the score will be 0.5). We would need to guard against an empty `required` list to avoid dividing by zero.
2. (Not related to the metric itself) I'd prefer to use an LLM as a judge here to assess overall correctness, even if the specific phrase is not in the answer.

P.S.
- As an alternative to point 1, we can change our dataset so that each question has just one associated correct phrase.
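
The partial-credit idea can be sketched as a standalone scoring function. `must_mention_fraction` is a hypothetical name, and the case-insensitive comparison and the handling of an empty requirement list are assumptions added here, not part of the original evaluator.

```python
def must_mention_fraction(prediction: str, required: list[str]) -> float:
    """Fraction of required phrases found in the prediction (case-insensitive)."""
    if not required:
        return 1.0  # nothing required, so nothing can be missing
    text = prediction.lower()
    hits = sum(phrase.lower() in text for phrase in required)
    return hits / len(required)


print(must_mention_fraction("QLoRA introduces the NF4 data type.", ["NF4", "NormalFloat"]))  # 0.5
print(must_mention_fraction("It uses a paged optimizer.", ["paged", "optimizer"]))  # 1.0
```

A function like this could replace the `all(...)` line inside the evaluator, with the returned fraction used as the score.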

### Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [22]:
experiment_results = client.evaluate(
    agent_chain_with_formatting,
    data=dataset_name,
    evaluators=[must_mention],
    experiment_prefix=f"Search Pipeline - Evaluation - {uuid4().hex[0:4]}",
    metadata={"version": "1.0.0"},
)

View the evaluation results for experiment: 'Search Pipeline - Evaluation - c525-cd206bba' at:
https://smith.langchain.com/o/190538f0-9737-4132-9112-5b0de958d02b/datasets/af84022b-6fbe-4308-b1c3-d9ef34b3abd7/compare?selectedSessions=b223985c-2bc8-4d46-844d-712acd816e1d





In [23]:
experiment_results

## Part 2: LangGraph with Helpfulness:

### Task 3: Adding Helpfulness Check and "Loop" Limits

Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm if it fully answers the user's query!

We're going to make a few key adjustments to account for this:

1. We're going to add an artificial limit on how many "loops" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.
2. We'll add to our existing conditional edge to obtain the behaviour we desire.

First, let's define our state again - we can check the length of the message list in the state, so we don't need any additional state for this.

In [24]:
class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!

#### 🏗️ Activity #5:

Please write markdown for the following cells to explain what each is doing.

##### YOUR MARKDOWN HERE
##### ✅ Answer:
The cell below:
1. Initialises an empty graph 
2. Adds two nodes on the graph:
2.1 Agent node (invoking LLM)
2.2 Action node (using tools)

At the moment, the nodes are not connected yet, so our graph lacks one key piece of being a graph — edges between nodes. 

In [41]:
graph_with_helpfulness_check = StateGraph(AgentState)

graph_with_helpfulness_check.add_node("agent", call_model)
graph_with_helpfulness_check.add_node("action", tool_node)

<langgraph.graph.state.StateGraph at 0x1216c0550>

##### YOUR MARKDOWN HERE
##### ✅ Answer:
The cell below:
1. Sets the entry point of the graph: execution begins at the agent node (LLM invocation)

In [42]:
graph_with_helpfulness_check.set_entry_point("agent")

<langgraph.graph.state.StateGraph at 0x1216c0550>

##### YOUR MARKDOWN HERE
##### ✅ Answer:
The cell below:
1. Imports the necessary classes from the LangChain libraries (used in the cell)
2. Defines a new function that takes the state as its argument:
3. Assigns the last message from the state to the last_message variable
4. Checks whether the last message contains tool calls; if true, returns "action" (route to "action")
5. Assigns the first message to the initial_query variable and the last message to the final_response variable (same as last_message above, so a bit redundant code-wise, but may be a clearer approach as the variable names differentiate between types of response)
6. Checks whether the number of messages is more than 10; if true, routes to END (our loop limit)
7. Specifies the prompt template for an evaluator that is provided with the initial query and the final response. The evaluator needs to return helpfulness (yes/no).
8. Initialises the prompt and the model
9. Sets up a runnable chain (prompt template -> model -> output parser)
10. Invokes the chain (passing in the initial query and the final response)
11. Checks whether the response is helpful; if true, routes to "end"; if false, routes to "continue" (try again, essentially)

**In short: the function defines the rules for routing in the graph based on specific conditions. The function evaluates (through LLM) whether the final_response to the initial_query is helpful, and ends the graph if it is; if not, it routes it to continue.**

In [43]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

def tool_call_or_helpful(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  initial_query = state["messages"][0]
  final_response = state["messages"][-1]

  # Loop limit: return the "end" label so it matches a key in the conditional edge mapping
  if len(state["messages"]) > 10:
    return "end"

  prompt_template = """\
  Given an initial query and a final response, determine if the final response is extremely helpful or not. Please indicate helpfulness with a 'Y' and unhelpfulness as an 'N'.

  Initial Query:
  {initial_query}

  Final Response:
  {final_response}"""

  helpfullness_prompt_template = PromptTemplate.from_template(prompt_template)

  helpfulness_check_model = ChatOpenAI(model="gpt-4.1-mini")

  helpfulness_chain = helpfullness_prompt_template | helpfulness_check_model | StrOutputParser()

  helpfulness_response = helpfulness_chain.invoke({"initial_query" : initial_query.content, "final_response" : final_response.content})

  if "Y" in helpfulness_response:
    return "end"
  else:
    return "continue"
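
The routing rules in this function can be exercised without calling a model by swapping the LLM judge for a stub. The sketch below is plain Python under that assumption; `Msg`, `route`, and `is_helpful` are hypothetical names, and the returned labels are chosen to match the keys of the mapping given to `add_conditional_edges`.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    content: str
    tool_calls: list = field(default_factory=list)


def route(messages, is_helpful, max_messages=10):
    """Return 'action', 'continue', or 'end' for the current message history."""
    last = messages[-1]
    if last.tool_calls:
        return "action"            # the model asked for a tool
    if len(messages) > max_messages:
        return "end"               # loop limit reached, stop regardless
    # Ask the judge whether the last response answers the first message.
    return "end" if is_helpful(messages[0].content, last.content) else "continue"


history = [Msg("What is LoRA?"), Msg("", tool_calls=[{"name": "arxiv"}])]
print(route(history, is_helpful=lambda q, a: True))    # action

history = [Msg("What is LoRA?"), Msg("LoRA is a low-rank adaptation method.")]
print(route(history, is_helpful=lambda q, a: False))   # continue
print(route(history, is_helpful=lambda q, a: True))    # end

long_history = [Msg("q")] + [Msg("a")] * 10
print(route(long_history, is_helpful=lambda q, a: False))  # end
```

Passing the judge in as a parameter keeps the branching logic testable on its own, separate from the cost and variability of a model call.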

#### 🏗️ Activity #4:

Please write what is happening in our `tool_call_or_helpful` function!

##### YOUR MARKDOWN HERE
##### ✅ Answer:
It seems like the homework should've been about writing out what happens in cells ABOVE, not BELOW. 

Please check the previous markdown cell where I outlined what tool_call_or_helpful function does. 

In short: the function defines the rules for routing in the graph based on specific conditions. The function evaluates (through LLM) whether the final_response to the initial_query is helpful, and ends the graph if it is; if not, it routes it to continue.

To avoid breaking the flow, I'll continue describing cells BELOW.

The cell below:
1. Connects the nodes with a conditional edge that starts from the agent node.
2. After the agent runs, tool_call_or_helpful decides where to go next:
2.1 If tool_call_or_helpful returns "continue", the graph goes back to the agent node.
2.2 If it returns "action", the graph goes to the action node.
2.3 If it returns "end", the graph goes to the END node.

We are using conditional edges to indicate that there is a decision to be made, and based on the output of that decision the graph can go in different ways.

In [44]:
graph_with_helpfulness_check.add_conditional_edges(
    "agent",
    tool_call_or_helpful,
    {
        "continue" : "agent",
        "action" : "action",
        "end" : END
    }
)

<langgraph.graph.state.StateGraph at 0x1216c0550>

##### YOUR MARKDOWN HERE
##### ✅ Answer:
The cell below:
1. Creates one more edge, connecting action back to agent. We need this because otherwise action output can't get back to the agent, and we'll be stuck with agent initiating an action, but never getting the result and control back. 

In [45]:
graph_with_helpfulness_check.add_edge("action", "agent")

<langgraph.graph.state.StateGraph at 0x1216c0550>

##### YOUR MARKDOWN HERE
##### ✅ Answer:
The cell below:
1. Compiles the graph. We need to compile because right now it is just a description of our graph, but not a runnable. Compile transforms that description into a Runnable. 

In [46]:
agent_with_helpfulness_check = graph_with_helpfulness_check.compile()

##### YOUR MARKDOWN HERE
##### ✅ Answer:
The cell below:
1. Initiates a human message to be sent to our runnable. 
2. Streams back execution of our Graph.

(After running the cell):
1. Agent correctly uses TavilySearch three times (for each question in our input)
2. Agent receives output from TavilySearch
3. Agent tells us the answer. Even though we don't see tool_call_or_helpful in the updates, manual inspection of the LangSmith Tracing shows that it was used, and it first correctly routed us to action, and then correctly routed us to End. 

In [49]:
inputs = {"messages" : [HumanMessage(content="Related to machine learning, what is LoRA? Also, who is Tim Dettmers? Also, what is Attention?")]}

async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_NZfWoHT3lgZWdv6t2zdQkmp5', 'function': {'arguments': '{"query": "LoRA machine learning"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}, {'id': 'call_ktrRkz58PSnRJyt3uO5G5d7Y', 'function': {'arguments': '{"query": "Tim Dettmers"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}, {'id': 'call_nIUPqyhIjiCAMVxC5SLq8uH1', 'function': {'arguments': '{"query": "Attention in machine learning"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 79, 'prompt_tokens': 177, 'total_tokens': 256, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': None, 'id': 'chatcmpl-BsU

### Task 4: LangGraph for the "Patterns" of GenAI

Let's ask our system about the 4 patterns of Generative AI:

1. Prompt Engineering
2. RAG
3. Fine-tuning
4. Agents

In [50]:
patterns = ["prompt engineering", "RAG", "fine-tuning", "LLM-based agents"]

In [51]:
for pattern in patterns:
  what_is_string = f"What is {pattern} and when did it break onto the scene??"
  inputs = {"messages" : [HumanMessage(content=what_is_string)]}
  messages = agent_with_helpfulness_check.invoke(inputs)
  print(messages["messages"][-1].content)
  print("\n\n")

Prompt engineering is the process of designing and refining prompts to effectively communicate with AI language models, such as GPT, to obtain desired responses. It involves crafting prompts that are clear, specific, and contextually appropriate to guide the AI in generating accurate and relevant outputs.

Prompt engineering has gained significant prominence with the rise of large language models (LLMs) like GPT-3, which were released around 2020. As these models became more capable and widely used, the importance of designing effective prompts to harness their full potential also grew. The concept of prompt engineering started to break onto the scene in 2020-2021, coinciding with the increasing adoption of LLMs in various applications and the recognition that prompt design is crucial for maximizing their usefulness.

Would you like more detailed information on the history or techniques of prompt engineering?



RAG, which stands for Retrieval-Augmented Generation, is a technique in na