# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In the following notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!

  - 🤝 Breakout Room #2:
  1. Evaluating the LangGraph Application with LangSmith
  2. Adding Helpfulness Check and "Loop" Limits
  3. LangGraph for the "Patterns" of GenAI

# 🤝 Breakout Room #1

## Part 1: LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that includes cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Due to the inclusion of cycles over loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion. Effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1:  Dependencies

We'll first install all our required libraries.

In [1]:
!pip install -qU langchain langchain_openai langchain-community langgraph arxiv duckduckgo_search==5.3.1b1

## Task 2: Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [2]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [3]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE4 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Duck Duck Go Web Search](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/ddg_search)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)

####🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a method.

We are using `duckduckgo_search` and `arxiv_query`

In [4]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun

duckduckgo_search = DuckDuckGoSearchRun()
arxiv_query = ArxivQueryRun()

tool_belt = [
   duckduckgo_search,
   arxiv_query
]

### Model

Now we can set-up our model! We'll leverage the familiar OpenAI model suite for this example - but it's not *necessary* to use with LangGraph. LangGraph supports all models - though you might not find success with smaller models - as such, they recommend you stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API.

In [5]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o", temperature=0)

Now that we have our model set-up, let's "put on the tool belt", which is to say: We'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [6]:
model = model.bind_tools(tool_belt)

#### ❓ Question #1:

How does the model determine which tool to use?

# Answer
When there are multiple tools available, the model can use a routing mechanism called call_tools, which takes the output from the model (usually a JSON or similar structured output) and decides which tool to invoke based on this data.
In addition that the model is provided with a list of tools available with their description 

```python
# Description for the arxiv tool
description: str = (
        "A wrapper around Arxiv.org "
        "Useful for when you need to answer questions about Physics, Mathematics, "
        "Computer Science, Quantitative Biology, Quantitative Finance, Statistics, "
        "Electrical Engineering, and Economics "
        "from scientific articles on arxiv.org. "
        "Input should be a search query."
    )

# Description for the duckduck tool
description: str = (
        "A wrapper around DuckDuckGo Search. "
        "Useful for when you need to answer questions about current events. "
        "Input should be a search query."
    )
    
```


## Task 4: Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StatefulGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is one that is stored in a `TypedDict` with the key `messages` and the value is a `Sequence` of `BaseMessages` that will be appended to whenever the state changes.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!

In [7]:
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

## Task 5: It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [8]:
from langgraph.prebuilt import ToolNode

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

tool_node = ToolNode(tool_belt)

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `tool_node` is a node which can call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [9]:
from langgraph.graph import StateGraph, END

uncompiled_graph = StateGraph(AgentState)

uncompiled_graph.add_node("agent", call_model)
uncompiled_graph.add_node("action", tool_node)

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [10]:
uncompiled_graph.set_entry_point("agent")

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks if there is a "function_call" kwarg present.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that the dictionary passed in as the third parameter (the mapping) should be created with the possible outputs of our conditional function in mind. In this case `should_continue` outputs either `"end"` or `"continue"` which are subsequently mapped to the action node or the END node.

In [11]:
def should_continue(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  return END

uncompiled_graph.add_conditional_edges(
    "agent",
    should_continue
)

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [12]:
uncompiled_graph.add_edge("action", "agent")

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [13]:
compiled_graph = uncompiled_graph.compile()

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

# Answer

There is no predefined limit on the number of iterations or cycles. However, to prevent the possibility of an infinite loop, you can implement a counter that tracks the number of iterations. By adding a condition to this counter, you can specify a maximum threshold. If the counter exceeds this threshold, the loop is terminated, effectively imposing a limit on the number of cycles.
 

## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fairs:

In [14]:
from langchain_core.messages import HumanMessage

# inputs = {"messages" : [HumanMessage(content="Who is the current captain of the Winnipeg Jets?")]}
inputs = {"messages" : [HumanMessage(content="Who are the author of 'ReAct: Synergizing Reasoning and Acting in Language Models' paper?")]}


async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_OU2YynDHygr54Mnd73G68rEl', 'function': {'arguments': '{"query":"ReAct: Synergizing Reasoning and Acting in Language Models"}', 'name': 'arxiv'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 168, 'total_tokens': 194}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-7f3fc5b5-ab42-4e30-864f-7c252912224f-0', tool_calls=[{'name': 'arxiv', 'args': {'query': 'ReAct: Synergizing Reasoning and Acting in Language Models'}, 'id': 'call_OU2YynDHygr54Mnd73G68rEl', 'type': 'tool_call'}], usage_metadata={'input_tokens': 168, 'output_tokens': 26, 'total_tokens': 194})]



Receiving update from node: 'action'
[ToolMessage(content='Published: 2023-03-10\nTitle: ReAct: Synergizing Reasoning and Acting in Language Models\nAuthors: Shunyu Yao, Jeffr

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found the "tool_calls" `additional_kwarg`, and sent the state object to the action node
4. The action node added the response from the OpenAI function calling endpoint to the state object and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, could not find the "tool_calls" `additional_kwarg` and passed the state object to END where we see it output in the cell above!

Now let's look at an example that shows a multiple tool usage - all with the same flow!

In [17]:
inputs = {"messages" : [HumanMessage(content="Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using DuckDuckGo.")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        if node == "action":
          print(f"Tool Used: {values['messages'][0].name}")
        print(values["messages"])

        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_cQB76mjO1IYlvWwB6p16wv8c', 'function': {'arguments': '{"query": "QLoRA"}', 'name': 'arxiv'}, 'type': 'function'}, {'id': 'call_xf55Gm8ZTKPJ2Hsuc3WM14Ht', 'function': {'arguments': '{"query": "latest Tweet"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 173, 'total_tokens': 223}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-ec18a8be-9e5f-4e50-a8f1-838508575b9d-0', tool_calls=[{'name': 'arxiv', 'args': {'query': 'QLoRA'}, 'id': 'call_cQB76mjO1IYlvWwB6p16wv8c', 'type': 'tool_call'}, {'name': 'duckduckgo_search', 'args': {'query': 'latest Tweet'}, 'id': 'call_xf55Gm8ZTKPJ2Hsuc3WM14Ht', 'type': 'tool_call'}], usage_metadata={'input_tokens': 173, 'output_tokens': 50, 'total_tokens': 223})]



Receivin

####🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

# Answer
1. User asks the question "Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using DuckDuckGo."  This prompt is explicit in giving instruction to LLM model to use the tools from the toolbelt.
2. We start the compiled graph using an async `astream` function passing in the input.
3. As `agent` is set as the entrypoint in the graph and the input (state object) is sent to the `agent`
4. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
5. The conditional edge received the state object, found the "tool_calls" `additional_kwarg`, and sent the state object to the action node. In the response we can see five `ToolMessage` and explained below in order
   1. On close inspection the first `ToolMessage` is the response from the `arxiv` tool providing the Paper on QLora with the authors. Note that there 4 authors for this paper.
   2. Next 4 `ToolMessage` are the response from the tool `duckduckgo_search` one for each author finding the latest tweets. Important point to note it is not clear which Author' latest tweets are in the tool reponse.
7. The agent node added a response to the state object and passed it along the conditional edge
8. The conditional edge received the state object, could not find the "tool_calls" `additional_kwarg` and passed the state object to END where we see it output in the cell above!

## Part 1: LangSmith Evaluator

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [18]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | compiled_graph | parse_output

In [19]:
agent_chain.invoke({"question" : "What is RAG?"})

"RAG stands for Retrieval-Augmented Generation. It is a technique used in natural language processing (NLP) that combines retrieval-based methods with generative models to improve the quality and relevance of generated text. Here's a brief overview of how it works:\n\n1. **Retrieval**: In the first step, the system retrieves relevant documents or pieces of information from a large corpus based on the input query. This is typically done using a retrieval model like BM25 or a dense retrieval model like DPR (Dense Passage Retrieval).\n\n2. **Augmentation**: The retrieved documents are then used to augment the input query. This means that the information from the retrieved documents is combined with the original query to provide more context and relevant information.\n\n3. **Generation**: Finally, a generative model, such as a transformer-based model like GPT-3, uses the augmented input to generate a response. The additional context from the retrieved documents helps the generative model p

### Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]
```

####🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

In [68]:
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?",
    "What is the primary goal of the QLoRA method?",
    "What are the main components of the QLoRA approach?",
    "What datasets were used to evaluate the QLoRA method?",
    "Who are the primary authors of the QLoRA paper?",
    "What models are fine-tuned using the QLoRA method?",
    "How does QLoRA achieve efficiency in fine-tuning?",
    "What are the key results reported in the QLoRA paper?",
    "What limitations of QLoRA are discussed in the paper?",
    "How does QLoRA differ from other fine-tuning methods?"
]

answers = [
    {"must_mention": ["paged", "optimizer"]},
    {"must_mention": ["NF4", "NormalFloat"]},
    {"must_mention": ["ground", "context"]},
    {"must_mention": ["Tim", "Dettmers"]},
    {"must_mention": ["PyTorch", "TensorFlow"]},
    {"must_mention": ["reduce", "parameters"]},
    {"must_mention": ["efficient", "fine-tuning"]},
    {"must_mention": ["low-rank", "adapters", "quantization"]},
    {"must_mention": ["datasets", "evaluation"]},
    {"must_mention": ["Tim", "Dettmers"]},
    {"must_mention": ["transformer", "models", "GPT-3", "BERT"]},
    {"must_mention": ["quantization", "efficiency"]},
    {"must_mention": ["results", "benchmarks", "performance"]},
    {"must_mention": ["limitations", "future work"]},
    {"must_mention": ["contrast", "methods", "fine-tuning"]}
]


In [69]:
examples = [{"question": q, "answer": a} for q, a in zip(questions, answers)]

Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [70]:
from langsmith import Client

client = Client()
dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the QLoRA Paper to Evaluate RAG over the same paper."
)

# client.create_examples(
#     inputs=[{"question" : q} for q in questions],
#     outputs=answers,
#     dataset_id=dataset.id,
# )
client.create_examples(
    inputs=[{"question": qa["question"]} for qa in examples],
    outputs=[{"answer": qa["answer"]} for qa in examples],
    dataset_id=dataset.id,
)



#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

1. As per this create example; in python the list are are ordered and it will be fed to this function in the order they are specified. The problem with this approach is that it can go out of sequence if an item is added or removed without updating the other list. It would be better to use different programming construct like dictionory in a pair of question and answer. 
 

### Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive exact-match process to determine if our response contains specific strings.

In [71]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

1. We are using key words which should be present in the output to determine the LLM output is valid or not. Also the LLM generate the response which is accurate but does not have the words we expect to be present. Presence of the keyword doesn't mean that it is relevant and accurate. 
2. I would compare the semantic meaning of the response with these key words and determine whther they colosets to each other.  I added custom evaluator which uses semantic comparison in addition to the key work match.

In [72]:
from langchain_openai.embeddings import OpenAIEmbeddings
import numpy as np
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

@run_evaluator
def semantic_match(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required_phrases = example.outputs.get("must_mention") or []
    
    prediction_embedding = embedding_model.embed_query(prediction)
    
    required_embeddings = [embedding_model.embed_query(phrase) for phrase in required_phrases]

    prediction_embedding = np.array(prediction_embedding)
    required_embeddings = np.array(required_embeddings)
    
    # Compute cosine similarity
    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    
    # Check similarity for each required phrase
    similarity_threshold = 0.7
    similarities = [cosine_similarity(prediction_embedding, req_emb) for req_emb in required_embeddings]
    
    # Determine if all required phrases have similarity above the threshold
    score = all(similarity >= similarity_threshold for similarity in similarities)

    return EvaluationResult(key="must_semantic_match", score=score)


Now that we have created our custom evaluator - let's initialize our `RunEvalConfig` with it!

In [73]:
from langchain.smith import RunEvalConfig, run_on_dataset

eval_config = RunEvalConfig(
    custom_evaluators=[must_mention,semantic_match],
)

Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [74]:
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=agent_chain,
    evaluation=eval_config,
    verbose=True,
    project_name=f"RAG Pipeline - Evaluation - {uuid4().hex[0:8]}",
    project_metadata={"version": "1.0.0"},
)

View the evaluation results for project 'RAG Pipeline - Evaluation - 8eaf11aa' at:
https://smith.langchain.com/o/e319c8f1-73bd-5d44-8897-d19a191ebc54/datasets/8c9adff8-853f-46bd-82ac-c48e51ccd15a/compare?selectedSessions=717135f8-68a1-491a-9f2c-b966f4da9e30

View all tests for Dataset Retrieval Augmented Generation - Evaluation Dataset - f27ed691 at:
https://smith.langchain.com/o/e319c8f1-73bd-5d44-8897-d19a191ebc54/datasets/8c9adff8-853f-46bd-82ac-c48e51ccd15a
[------------------------------------------------->] 15/15

Unnamed: 0,feedback.must_mention,feedback.must_semantic_match,error,execution_time,run_id
count,15,15,0.0,15.0,15
unique,1,1,0.0,,15
top,True,True,,,1f2a3d4b-34d1-4c6b-89eb-923db57eb674
freq,15,15,,,1
mean,,,,6.265765,
std,,,,2.606863,
min,,,,1.722211,
25%,,,,4.653603,
50%,,,,6.200908,
75%,,,,7.288406,


{'project_name': 'RAG Pipeline - Evaluation - 8eaf11aa',
 'results': {'abe28e08-a833-48ed-8712-0f9aed69f84b': {'input': {'question': 'What limitations of QLoRA are discussed in the paper?'},
   'feedback': [EvaluationResult(key='must_mention', score=True, value=None, comment=None, correction=None, evaluator_info={}, feedback_config=None, source_run_id=UUID('1bd8e610-5558-4714-bc0e-189be13f44b3'), target_run_id=None),
    EvaluationResult(key='must_semantic_match', score=True, value=None, comment=None, correction=None, evaluator_info={}, feedback_config=None, source_run_id=UUID('bec6004a-fd3f-42f6-b422-047ab0b57242'), target_run_id=None)],
   'execution_time': 6.728893,
   'run_id': '1f2a3d4b-34d1-4c6b-89eb-923db57eb674',
   'output': 'The paper "QLoRA: Efficient Finetuning of Quantized LLMs" discusses several limitations of QLoRA:\n\n1. **Memory Management**: While QLoRA introduces innovations like 4-bit NormalFloat (NF4) and double quantization to reduce memory usage, managing memory 

# Semantic Match Langsmith 

![Langsmith](./semantic-match.png)

## Part 2: LangGraph with Helpfulness:

### Task 3: Adding Helpfulness Check and "Loop" Limits

Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm if it fully answers the user's query!

We're going to make a few key adjustments to account for this:

1. We're going to add an artificial limit on how many "loops" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.
2. We'll add to our existing conditional edge to obtain the behaviour we desire.

First, let's define our state again - we can check the length of the state object, so we don't need additional state for this.

In [75]:
class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!

####🏗️ Activity #5:

Please write markdown for the following cells to explain what each is doing.

##### YOUR MARKDOWN HERE

We are instantiating a new `StateGraph` as `graph_with_helpfulness_check` which uses the `AgentState` to keep track of the current state when it is passed to another agent or a nodes.The idea is to run through the graph and finally determine the helpfulness of the response generated by the underlying LLMs and the tools.
Then we are adding two nodes in the graph where the first node is `agent` and set to call the fuction `call_model` and the second node is `action` which is added as `tool_node`. 
Just to remind what the `call_model` and `tool_node` represents here
   `call_model` - Is a custom fuction which takes the current state and calls the LLM model then the response is sent back to the caller.
   `tool_node` - Is an object of class `ToolNode` and instantiated with a list of tools to consume. This tool node is capable of using the tools `duckduckgo_search`, `arxiv_query`.

In [76]:
graph_with_helpfulness_check = StateGraph(AgentState)

graph_with_helpfulness_check.add_node("agent", call_model)
graph_with_helpfulness_check.add_node("action", tool_node)

##### YOUR MARKDOWN HERE

Now we are goint to set an entry point which is in our case an `agent` This agent is responsible for understanding the user questions and confer with LLM to see if we need to use any tools subsequently to generate a response.

In [77]:
graph_with_helpfulness_check.set_entry_point("agent")

##### YOUR MARKDOWN HERE

This function, tool_call_or_helpful, is designed to decide whether to invoke an action node or assess the helpfulness of a response generated by a StateGraph.

The function first examines the latest message in the state to determine if it includes any tool calls using the `AIMessage.tool_calls` property. If tool calls are present, it immediately returns "action", indicating that an action node should be called.

If no tool calls are detected, the function proceeds to evaluate the helpfulness of the response. It does this using initial query and the final response from the state message history. 

If the number of messages in the conversation exceeds 10, the function returns "END", terminating the process. This is to avoid the graph going in loop until helpful response is generated. 

A prompt template is created, instructing the model to assess the helpfulness of the final response based on the initial query. The template expects a 'Y' for helpful and an 'N' for unhelpful.
This prompt template is then used to construct a chain, which includes the ChatOpenAI model (using GPT-4) and a StrOutputParser to interpret the model’s output.

The model is invoked with the initial query and final response, and the output is parsed to determine if the response is helpful ('Y') or not ('N').

If the response is found to be helpful ('Y'), the function returns "end", indicating that the process should stop. Otherwise, it returns "continue", suggesting the need for further iteration in the loop until a helpful response is generated.

In [78]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

def tool_call_or_helpful(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  initial_query = state["messages"][0]
  final_response = state["messages"][-1]

  if len(state["messages"]) > 10:
    return "END"

  prompt_template = """\
  Given an initial query and a final response, determine if the final response is extremely helpful or not. Please indicate helpfulness with a 'Y' and unhelpfulness as an 'N'.

  Initial Query:
  {initial_query}

  Final Response:
  {final_response}"""

  prompt_template = PromptTemplate.from_template(prompt_template)

  helpfulness_check_model = ChatOpenAI(model="gpt-4")

  helpfulness_chain = prompt_template | helpfulness_check_model | StrOutputParser()

  helpfulness_response = helpfulness_chain.invoke({"initial_query" : initial_query.content, "final_response" : final_response.content})

  if "Y" in helpfulness_response:
    return "end"
  else:
    return "continue"

####🏗️ Activity #4:

Please write what is happening in our `tool_call_or_helpful` function!

##### YOUR MARKDOWN HERE

1. Retrieve the last message from the state to check if it contains any tool calls.

2. Use the tool_calls attribute of the last message to determine if an action node needs to be invoked. If tool calls are found, return "action".

3. Extract the first and last messages in the state for comparison to determine the helpfulness of the response.

4. Check if the total number of messages in the state exceeds 10. If it does, return "END" to terminate further processing.

5. Create a prompt template that instructs the language model to evaluate the helpfulness of the response based on the initial query and final response.

6. Construct a processing chain that includes the language model (ChatOpenAI) and an output parser (StrOutputParser) to handle the model's output.

7. Use the defined prompt template and chain to invoke the model, passing in the initial query and final response for evaluation.

8. Parse the model’s output to check if the response was deemed helpful ('Y') or not ('N').

9. If the response is helpful, return "end". If not, return "continue" to indicate that further iterations are required to find a helpful response.


We now add conditional edges for the node `agent` where a decision is made by calling the function `tool_call_or_helpful` and the output is lookedup in the objects `path_map` (3rd arguments in the `add_conditional_edges` function). 
If the function return `continue` that means it will start another iteration of the same task and it indicates that the generated content is not helpfull and failed by our helpfulness check.
If the function return `action` that means a tool call is necessary to have relevant information for generating valid and quality output. 
If the function return `end` then we reached the maximum number of iteration that we are allowed to perform. This limit is imposed to avoid the graph going into infinite loop.

In [79]:
graph_with_helpfulness_check.add_conditional_edges(
    "agent",
    tool_call_or_helpful,
    {
        "continue" : "agent",
        "action" : "action",
        "end" : END
    }
)

##### YOUR MARKDOWN HERE

Add the final communication link from `action` node to `agent` node.

In [81]:
graph_with_helpfulness_check.add_edge("action", "agent")

##### YOUR MARKDOWN HERE

Once the the nodes, edges are setup in the graph, we compile the graph ready for invokation.

In [82]:
agent_with_helpfulness_check = graph_with_helpfulness_check.compile()

##### YOUR MARKDOWN HERE

Below code invokes the agent asynchronously to process a query and stream updates as the agent progresses through its processing graph. The agent processes the input query in real-time, providing incremental updates from different nodes, and these updates are printed out as they are received.

The human message contains 3 questions in the same prompt. As we can see from the output below that the output of the agent is empty but the `tool_calls` property has 3 entries. The LLM model understood that there are 3 seperate questions to seach in the internet and rightly it picked the `duckduckgo_search` tools for searching the internet.

As per our conditional edge setup the next action would be to invoke he `action` node which runs necessary tools. Once all actions are complete the responses are sent back to the `agent'.

Finally the agent is generates a response based on the searches maded previously. Then the execution stopped as the agent ouput is checked for helpfulness. 

In [83]:
inputs = {"messages" : [HumanMessage(content="Related to machine learning, what is LoRA? Also, who is Tim Dettmers? Also, what is Attention?")]}

async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_57FNgm5EPmrnYbi6sOtly32Q', 'function': {'arguments': '{"query": "LoRA machine learning"}', 'name': 'duckduckgo_search'}, 'type': 'function'}, {'id': 'call_LYZodyniCY5ww90bFvIttzZU', 'function': {'arguments': '{"query": "Tim Dettmers"}', 'name': 'duckduckgo_search'}, 'type': 'function'}, {'id': 'call_i7snUV6tzNiPAA5POOrZqvPJ', 'function': {'arguments': '{"query": "Attention in machine learning"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 76, 'prompt_tokens': 171, 'total_tokens': 247}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-67d9d37a-5b38-4c1b-8049-95ad8336e99e-0', tool_calls=[{'name': 'duckduckgo_search', 'args': {'query': 'LoRA machine learning'}, 'id': 'call_57FNgm5EPmrnYbi6sOtly32Q', 'type': 'tool_call'}, 

### Task 4: LangGraph for the "Patterns" of GenAI

Let's ask our system about the 4 patterns of Generative AI:

1. Prompt Engineering
2. RAG
3. Fine-tuning
4. Agents

In [84]:
patterns = ["prompt engineering", "RAG", "fine-tuning", "LLM-based agents"]

In [85]:
for pattern in patterns:
  what_is_string = f"What is {pattern} and when did it break onto the scene??"
  inputs = {"messages" : [HumanMessage(content=what_is_string)]}
  messages = agent_with_helpfulness_check.invoke(inputs)
  print(messages["messages"][-1].content)
  print("\n\n")

Prompt engineering is a concept primarily associated with the field of artificial intelligence, particularly in the context of natural language processing (NLP) and large language models like GPT-3. It involves the design and crafting of prompts (input text) to elicit desired responses from AI models. The goal is to optimize the input to get the most accurate, relevant, or useful output from the model.

### Key Aspects of Prompt Engineering:
1. **Crafting Effective Prompts**: Designing prompts that are clear, specific, and structured in a way that the AI can understand and respond to appropriately.
2. **Iterative Testing**: Continuously refining prompts based on the responses received to improve the quality and relevance of the output.
3. **Understanding Model Behavior**: Gaining insights into how the model interprets different types of input to better predict and guide its responses.

### Emergence of Prompt Engineering:
Prompt engineering became more prominent with the advent of larg