# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In the following notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!

- 🤝 Breakout Room #2:
  1. Evaluating the LangGraph Application with LangSmith
  2. Adding Helpfulness Check and "Loop" Limits
  3. LangGraph for the "Patterns" of GenAI

# 🤝 Breakout Room #1

## Part 1: LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language (LCEL) to build coordinated, multi-actor, stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we use cycles rather than plain loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively recreating application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1:  Dependencies

We'll first install all our required libraries.

In [1]:
!pip install -qU langchain langchain_openai langchain-community langgraph arxiv duckduckgo_search==5.3.1b1

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-api-core 2.15.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0.dev0,>=3.19.5, but you have protobuf 5.27.4 which is incompatible.

## Task 2: Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key:··········


In [5]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE4 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Duck Duck Go Web Search](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/ddg_search)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)

#### 🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a method.

In [18]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun

web_search = DuckDuckGoSearchRun()
research_search = ArxivQueryRun()

tool_belt = [   
    web_search,
    research_search
]

### Model

Now we can set-up our model! We'll leverage the familiar OpenAI model suite for this example - but it's not *necessary* to use with LangGraph. LangGraph supports all models - though you might not find success with smaller models - as such, they recommend you stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [19]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o", temperature=0)

Now that we have our model set-up, let's "put on the tool belt", which is to say: We'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [20]:
model = model.bind_tools(tool_belt)

#### ❓ Question #1:

How does the model determine which tool to use?

##### Answer:
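- The gpt-4o model (LLM) decides which tool to use. `bind_tools` sends each tool's name, description, and argument schema along with the conversation, and the model compares the user's input against those descriptions to pick the tool (or tools) best suited to fulfil the request, or no tool at all.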
    - Reference: https://platform.openai.com/docs/guides/function-calling/if-the-model-generated-a-function-call

    - Reference: https://langchain-ai.github.io/langgraph/concepts/agentic_concepts/

- Example 1: If the model decides there is no need to call a tool, it will respond like this:
    - `ChatCompletionMessage(content='Hi there! I can help with that. Can you please provide your order ID?', role='assistant', function_call=None, tool_calls=None)`

- Example 2: If the model decides there is a need to call a tool, it will respond like this:
    - `tool_calls=[ChatCompletionMessageToolCall(id='call_62136354', function=Function(arguments='{"order_id":"order_12345"}', name='get_delivery_date'), type='function')]`






## Task 4: Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph`, which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is a `TypedDict` with the key `messages`, whose value is a sequence of `BaseMessage` objects that is appended to whenever the state changes.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!
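The walkthrough above can be sketched in plain Python. This is only an illustration of the "append, never overwrite" idea: `append_messages` and `apply_update` are hypothetical helper names, the messages are plain strings rather than real message objects, and LangGraph's own `add_messages` reducer does more than this simple concatenation.

```python
from typing import Any


def append_messages(current: list[Any], update: list[Any]) -> list[Any]:
    """Reducer: return a new list with the update appended (never overwrite)."""
    return current + update


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the shared state."""
    merged = dict(state)
    merged["messages"] = append_messages(state["messages"], update.get("messages", []))
    return merged


# Step 1: empty state. Steps 2 and 3: each actor contributes one message.
state = {"messages": []}
state = apply_update(state, {"messages": ["HumanMessage(#1)"]})
state = apply_update(state, {"messages": ["AgentMessage(#1)"]})
print(state["messages"])  # ['HumanMessage(#1)', 'AgentMessage(#1)']
```

Each node only returns what it wants to add, and the reducer decides how that addition is folded into the existing state.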

In [21]:
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
  # add_messages appends new messages to the list instead of overwriting it
  messages: Annotated[list, add_messages]

## Task 5: It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [22]:
from langgraph.prebuilt import ToolNode

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

tool_node = ToolNode(tool_belt)

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `tool_node` is a node which can call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [23]:
from langgraph.graph import StateGraph, END

uncompiled_graph = StateGraph(AgentState)

uncompiled_graph.add_node("agent", call_model)
uncompiled_graph.add_node("action", tool_node)

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [24]:
uncompiled_graph.set_entry_point("agent")

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks whether the last message contains any `tool_calls`.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that `add_conditional_edges` also accepts an optional third parameter: a dictionary that maps the possible outputs of our conditional function to node names. We don't need it here, because `should_continue` directly returns either `"action"` (the name of our action node) or `END`.

In [25]:
def should_continue(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  return END

uncompiled_graph.add_conditional_edges(
    "agent",
    should_continue
)

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [26]:
uncompiled_graph.add_edge("action", "agent")

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [28]:
compiled_graph = uncompiled_graph.compile()

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

##### Answer:
- The code above sets no explicit limit on the number of times the cycle can execute. LangGraph does apply its own `recursion_limit` when a graph runs (25 steps by default, as far as we know), which can be changed by passing `config={"recursion_limit": N}` when invoking the graph.
- To impose our own limit, we can add a counter to the agent's state object, increment it each time `call_model` runs, and have `should_continue` verify on every call that the counter is still below a defined maximum cycle count.

REF: https://github.com/langchain-ai/langchain/discussions/20258#discussioncomment-10123202

Ex: 
    
    MAX_CYCLES = 5  # maximum number of agent cycles allowed

    class AgentState(TypedDict):
        messages: Annotated[list, add_messages]
        cycle_count: int  # tracks the number of cycles so far

    def call_model(state):
        messages = state["messages"]
        response = model.invoke(messages)
        # Return the incremented count as a state update rather than mutating `state` in place
        return {"messages": [response], "cycle_count": state.get("cycle_count", 0) + 1}

    def should_continue(state):
        last_message = state["messages"][-1]

        # Stop if the cycle limit is reached
        if state.get("cycle_count", 0) >= MAX_CYCLES:
            return END

        # Continue if there's a tool call in the last message
        if last_message.tool_calls:
            return "action"

        return END
    
    uncompiled_graph = StateGraph(AgentState)
    .
    .
    .


    


## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [29]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="Who is the current captain of the Winnipeg Jets?")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_RFPjTo05am3pkCy9qAstFPGD', 'function': {'arguments': '{"query":"current captain of the Winnipeg Jets 2023"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 25, 'prompt_tokens': 156, 'total_tokens': 181}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-968b3efc-54d9-4331-b235-74b230b79ccf-0', tool_calls=[{'name': 'duckduckgo_search', 'args': {'query': 'current captain of the Winnipeg Jets 2023'}, 'id': 'call_RFPjTo05am3pkCy9qAstFPGD', 'type': 'tool_call'}], usage_metadata={'input_tokens': 156, 'output_tokens': 25, 'total_tokens': 181})]



Receiving update from node: 'action'
[ToolMessage(content='The Winnipeg Jets will have a captain for the 2023-24 season. After going captain-less in 2022-23, the Winnipeg Jets unveiled 

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found `tool_calls` on the last `AIMessage`, and sent the state object to the action node
4. The action node ran the requested tool, added the result to the state object as a `ToolMessage`, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, found no `tool_calls` on the last message, and passed the state object to END where we see it output in the cell above!
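To make the cycle concrete, here is a minimal stand-in for the flow above that runs without LangGraph or an API key. Everything in it is an assumption for illustration: `fake_agent` and `fake_action` are stubs for `call_model` and `tool_node`, messages are plain dictionaries, and the `END` string is a local sentinel rather than the constant imported from LangGraph.

```python
END = "__end__"  # local stand-in sentinel for this sketch


def fake_agent(state):
    """Stub for call_model: request a tool once, then give a final answer."""
    has_tool_result = any(m["role"] == "tool" for m in state["messages"])
    if has_tool_result:
        message = {"role": "ai", "content": "final answer", "tool_calls": []}
    else:
        message = {"role": "ai", "content": "", "tool_calls": ["web_search"]}
    return {"messages": [message]}


def fake_action(state):
    """Stub for tool_node: 'run' every requested tool and report back."""
    calls = state["messages"][-1]["tool_calls"]
    return {"messages": [{"role": "tool", "content": f"result of {c}"} for c in calls]}


def should_continue(state):
    """Conditional edge: go to the action node only if tools were requested."""
    return "action" if state["messages"][-1].get("tool_calls") else END


nodes = {"agent": fake_agent, "action": fake_action}


def run(state, entry="agent"):
    visited = []
    current = entry
    while current != END:
        visited.append(current)
        update = nodes[current](state)
        # Append the node's update to the shared state
        state = {"messages": state["messages"] + update["messages"]}
        # Conditional edge out of "agent", fixed edge from "action" back to "agent"
        current = should_continue(state) if current == "agent" else "agent"
    return state, visited


question = {"role": "human", "content": "Who is the current captain?"}
final_state, visited = run({"messages": [question]})
print(visited)  # ['agent', 'action', 'agent']
```

The visit order mirrors steps 2 through 6: agent, action, agent again, then the conditional edge routes to the end.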

Now let's look at an example that shows multiple tool usage - all with the same flow!

In [30]:
inputs = {"messages" : [HumanMessage(content="Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using DuckDuckGo.")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        if node == "action":
          print(f"Tool Used: {values['messages'][0].name}")
        print(values["messages"])

        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_GzidXeNMkq2LZBiuzTRu69bI', 'function': {'arguments': '{"query": "QLoRA"}', 'name': 'arxiv'}, 'type': 'function'}, {'id': 'call_65JVAMGXeZJJCYpv4VaYqhFD', 'function': {'arguments': '{"query": "latest Tweet"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 173, 'total_tokens': 223}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-9bd76f9d-0528-40b8-aea0-440c8d12b57e-0', tool_calls=[{'name': 'arxiv', 'args': {'query': 'QLoRA'}, 'id': 'call_GzidXeNMkq2LZBiuzTRu69bI', 'type': 'tool_call'}, {'name': 'duckduckgo_search', 'args': {'query': 'latest Tweet'}, 'id': 'call_65JVAMGXeZJJCYpv4VaYqhFD', 'type': 'tool_call'}], usage_metadata={'input_tokens': 173, 'output_tokens': 50, 'total_tokens': 223})]



Receivin

#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

##### Answer:

- Initial Request and State Population: The state object was populated with the initial user request: "Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using DuckDuckGo."

- First Agent Node Execution: The agent node (call_model) was invoked with the state object. The agent identified the need to perform two tool calls: one to search Arxiv for the QLoRA paper and another to search for the authors' latest tweets using DuckDuckGo. The agent added an AIMessage to the state object, specifying tool calls for Arxiv and DuckDuckGo searches

- First Action Node Execution (Arxiv Search): The action node (tool_node) executed the Arxiv search for "QLoRA". The tool returned a list of papers related to QLoRA, including their titles, publication dates, and summaries. This information was added to the state object

- Second Agent Node Execution: The agent node processed the Arxiv results and recognized the need to continue searching for the latest tweets of the authors. The agent prepared additional tool calls for DuckDuckGo searches, each aimed at finding the latest tweet of a specific author

- Second Action Node Execution (DuckDuckGo Searches): The action node executed DuckDuckGo searches for each author’s latest tweet. 
    - Rate Limit Error: The search for Tim Dettmers' latest tweet encountered a rate limit error, resulting in an error message being added to the state object. 
    - Successful Searches: The searches for the other authors (Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer) were successful, and the tool returned various pieces of information related to their names, though it did not specifically find their latest tweets. This information was added to the state object

- Final Agent Node Execution and Conclusion: The agent node processed the results, including the successful Arxiv search and the mixed results from the DuckDuckGo searches. The agent generated a final AIMessage summarizing the details of the QLoRA paper and the latest information it could find about the authors, noting that one of the Twitter searches encountered a rate limit error. The conditional edge received the state object, found no additional tool calls, and passed the state object to END






## Part 1: LangSmith Evaluator

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [32]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | compiled_graph | parse_output
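Conceptually, the `|` chain above is left-to-right composition: each step's output becomes the next step's input. The sketch below imitates that with plain functions so the data shapes are easy to see. `pipe` and `stub_graph` are hypothetical stand-ins (the stub simply echoes the question), and messages are plain dictionaries instead of `HumanMessage` objects.

```python
from functools import reduce


def pipe(*steps):
    """Compose callables left to right, similar in spirit to `a | b | c`."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def convert_inputs(input_object):
    # Dataset row -> graph state
    return {"messages": [{"role": "human", "content": input_object["question"]}]}


def stub_graph(state):
    """Hypothetical stand-in for compiled_graph: echoes the question back."""
    question = state["messages"][-1]["content"]
    reply = {"role": "ai", "content": f"You asked: {question}"}
    return {"messages": state["messages"] + [reply]}


def parse_output(input_state):
    # Graph state -> plain string taken from the last message
    return input_state["messages"][-1]["content"]


stub_chain = pipe(convert_inputs, stub_graph, parse_output)
print(stub_chain({"question": "What is RAG?"}))  # You asked: What is RAG?
```

The wrapper exists so the evaluation dataset can supply a simple `{"question": ...}` dictionary and receive a plain string back.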

In [33]:
agent_chain.invoke({"question" : "What is RAG?"})

'RAG stands for Retrieval-Augmented Generation. It is a technique used in natural language processing (NLP) and machine learning to improve the performance of language models by combining retrieval-based methods with generative models. Here’s a brief overview of how it works:\n\n1. **Retrieval**: In the first step, the system retrieves relevant documents or pieces of information from a large corpus or database. This is typically done using a retrieval model, such as BM25 or a dense retrieval model like DPR (Dense Passage Retrieval).\n\n2. **Augmentation**: The retrieved documents are then used to augment the input to the generative model. This means that the generative model has access to additional context or information that can help it produce more accurate and relevant responses.\n\n3. **Generation**: Finally, the generative model, such as GPT-3 or BERT, uses the augmented input to generate a response. The additional context provided by the retrieved documents helps the model gener

### Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]
```

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

In [34]:
questions = [
    "What is Celery used for in Python?",
    "What backend can be used with Celery to store task results?",
    "What is RabbitMQ used for?",
    "How does RabbitMQ ensure message durability?",
    "What is FastAPI?",
    "How does FastAPI handle data validation?",
    "What is Apache Airflow used for?",
    "How does Airflow schedule tasks?",
    "What is a DAG in Airflow?",
    "How does Airflow handle task dependencies?"
]

answers = [
    {"must_mention" : ["distributed", "tasks"]},
    {"must_mention" : ["Redis", "backend"]},
    {"must_mention" : ["message", "broker"]},
    {"must_mention" : ["persistent", "storage"]},
    {"must_mention" : ["web", "framework"]},
    {"must_mention" : ["Pydantic", "validation"]},
    {"must_mention" : ["workflow", "orchestration"]},
    {"must_mention" : ["scheduler", "timing"]},
    {"must_mention" : ["Directed", "Acyclic", "Graph"]},
    {"must_mention" : ["upstream", "downstream"]}
]


Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [35]:
from langsmith import Client

client = Client()
dataset_name = f"Python Distributed Task Execusion Using Celery FrameWork - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the python celery framework to Evaluate RAG."
)

client.create_examples(
    inputs=[{"question" : q} for q in questions],
    outputs=answers,
    dataset_id=dataset.id,
)

#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

##### Answer: 
 - The correct answers are associated with the questions by the order in which they are presented. Each answer corresponds to the question at the same index in their respective lists

 - For example:
   - The first question "What is Celery used for in Python?" corresponds to the first answer {"must_mention" : ["distributed", "tasks"]}.
   - The second question "What backend can be used with Celery to store task results?" corresponds to the second answer {"must_mention" : ["Redis", "backend"]}, and so on.

 - This approach is not inherently problematic as long as the order is maintained correctly. However, it does rely on the correct alignment of the lists. If the lists get out of sync (e.g., a question or answer is added or removed without adjusting both lists), it can lead to incorrect associations, which would result in the LLM being tested or trained with incorrect information.

 - The structure below keeps each question next to its answer, which avoids potential issues with list misalignment:
    qa_pairs = [
      {"question": "What is Celery used for in Python?", "answer": {"must_mention" : ["distributed", "tasks"]}},
      {"question": "What backend can be used with Celery to store task results?", "answer": {"must_mention" : ["Redis", "backend"]}}
    ]
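If we keep the two parallel lists, a cheap safeguard is to check their lengths before pairing them. The helper below is a hypothetical sketch (the name `pair_examples` is ours, not part of LangSmith) that raises when the lists are out of sync instead of silently dropping or mismatching items.

```python
def pair_examples(questions, answers):
    """Pair each question with its answer, refusing lists of different lengths."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers; lists are out of sync"
        )
    return [{"question": q, "answer": a} for q, a in zip(questions, answers)]


sample_questions = ["What is Celery used for in Python?", "What is RabbitMQ used for?"]
sample_answers = [
    {"must_mention": ["distributed", "tasks"]},
    {"must_mention": ["message", "broker"]},
]

qa_pairs = pair_examples(sample_questions, sample_answers)
print(qa_pairs[1]["answer"])  # {'must_mention': ['message', 'broker']}
```

A length check only catches missing or extra entries. It cannot detect two answers that were swapped, which is why storing pairs together is the safer layout.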




### Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive exact-match process to determine if our response contains specific strings.

In [38]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    # Lowercase the LLM output so the comparison with the ground truth is case-insensitive;
    # fall back to "" first so a missing output does not raise on .lower()
    prediction = (run.outputs.get("output") or "").lower()
    required = example.outputs.get("must_mention") or []
    score = all(phrase.lower() in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

##### ANSWER: 
- Case-Insensitive Matching: Ignore case differences when checking for required phrases, so that "Backend" and "backend" are treated the same (the evaluator above already does this by lowercasing both sides).


- Fuzzy Matching: Implement fuzzy matching to account for slight variations in phrasing, allowing for matches even if the wording isn't exact.
 ex: 
  ```
    from fuzzywuzzy import fuzz
    from langsmith.evaluation import EvaluationResult, run_evaluator

    @run_evaluator
    def must_mention(run, example) -> EvaluationResult:
        prediction = (run.outputs.get("output") or "").lower()
        required = example.outputs.get("must_mention") or []
        # Every required phrase must fuzzily match some part of the prediction
        score = all(fuzz.partial_ratio(phrase.lower(), prediction) > 80 for phrase in required)
        return EvaluationResult(key="must_mention", score=score)
  ```
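Another gap worth noting: plain substring matching also accepts partial-word hits, so a required phrase like "ground" is satisfied by "background". One possible fix, sketched here with the standard library only, is whole-word matching. The helper names `mentions_whole_word` and `must_mention_score` are hypothetical, and they work on plain strings rather than LangSmith run objects.

```python
import re


def mentions_whole_word(prediction: str, phrase: str) -> bool:
    """True if `phrase` appears in `prediction` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, prediction, flags=re.IGNORECASE) is not None


def must_mention_score(prediction: str, required: list[str]) -> bool:
    """All required phrases must appear as whole words."""
    return all(mentions_whole_word(prediction, phrase) for phrase in required)


print("ground" in "some background context")                     # True
print(mentions_whole_word("some background context", "ground"))  # False
```

The trade-off is strictness: whole-word matching also rejects inflected forms such as "optimizers" for the required phrase "optimizer", so it may need to be combined with stemming or the fuzzy approach above.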

Now that we have created our custom evaluator - let's initialize our `RunEvalConfig` with it!

In [39]:
from langchain.smith import RunEvalConfig, run_on_dataset

eval_config = RunEvalConfig(
    custom_evaluators=[must_mention],
)

### Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [41]:
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=agent_chain,
    evaluation=eval_config,
    verbose=True,
    project_name=f"RAG Pipeline - Evaluation - {uuid4().hex[0:8]}",
    project_metadata={"version": "1.0.0"},
)

View the evaluation results for project 'RAG Pipeline - Evaluation - 2b1e26e7' at:
https://smith.langchain.com/o/643e7e92-2e20-5db2-857b-11b791f166e5/datasets/2479a375-f72c-4d6a-877f-63b189da1707/compare?selectedSessions=00fdf7bb-bb14-498a-93a7-13f4a1bc120d

View all tests for Dataset Python Distributed Task Execusion Using Celery FrameWork - 9c9be26f at:
https://smith.langchain.com/o/643e7e92-2e20-5db2-857b-11b791f166e5/datasets/2479a375-f72c-4d6a-877f-63b189da1707
[------------------------------------------------->] 10/10

| | feedback.must_mention | error | execution_time | run_id |
|---|---|---|---|---|
| count | 10 | 0.0 | 10.0 | 10 |
| unique | 2 | 0.0 | | 10 |
| top | True | | | 1a4ab960-f341-40d6-bc28-bae29078ceb7 |
| freq | 7 | | | 1 |
| mean | | | 5.105512 | |
| std | | | 1.875904 | |
| min | | | 3.450379 | |
| 25% | | | 3.622051 | |
| 50% | | | 4.843229 | |
| 75% | | | 5.786199 | |


{'project_name': 'RAG Pipeline - Evaluation - 2b1e26e7',
 'results': {'6aa80829-ebc8-4c93-9cbc-b782c3aab399': {'input': {'question': 'What is a DAG in Airflow?'},
   'feedback': [EvaluationResult(key='must_mention', score=True, value=None, comment=None, correction=None, evaluator_info={}, feedback_config=None, source_run_id=UUID('f705973d-6a3e-4df0-923d-38e9297b85c4'), target_run_id=None)],
   'execution_time': 6.298502,
   'run_id': '1a4ab960-f341-40d6-bc28-bae29078ceb7',
   'output': "In Apache Airflow, a Directed Acyclic Graph (DAG) is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies. Here are some key points about DAGs in Airflow:\n\n1. **Directed**: The tasks are ordered, meaning that each task points to one or more tasks that should be executed after it.\n\n2. **Acyclic**: The graph does not contain any cycles, which means that you cannot return to a task once it has been executed. This ensures that the workflow 

## Part 2: LangGraph with Helpfulness:

### Task 3: Adding Helpfulness Check and "Loop" Limits

Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm if it fully answers the user's query!

We're going to make a few key adjustments to account for this:

1. We're going to add an artificial limit on how many "loops" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.
2. We'll add to our existing conditional edge to obtain the behaviour we desire.

First, let's define our state again - we can check the length of the state object, so we don't need additional state for this.

In [44]:
class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!

#### 🏗️ Activity #5:

Please write markdown for the following cells to explain what each is doing.

- We create a new `StateGraph`, using the `AgentState` class to track the agent's state.

- We add the "agent" node to the graph. It runs our `call_model` function, which invokes the model on the current messages and appends the response to the `AgentState`.

- We add the "action" node to the graph. It runs `tool_node`, which executes the tool calls requested by the agent using our tool belt.


In [45]:
graph_with_helpfulness_check = StateGraph(AgentState)

graph_with_helpfulness_check.add_node("agent", call_model)
graph_with_helpfulness_check.add_node("action", tool_node)

 - Set the entry point of the graph to the "agent" node. This means that the graph's execution starts by calling the model through the "agent" node, with the user query as input.


In [46]:
graph_with_helpfulness_check.set_entry_point("agent")

In [47]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

def tool_call_or_helpful(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  initial_query = state["messages"][0]
  final_response = state["messages"][-1]

  # Loop limit: return the lowercase "end" key so it matches the conditional edge mapping
  if len(state["messages"]) > 10:
    return "end"

  prompt_template = """\
  Given an initial query and a final response, determine if the final response is extremely helpful or not. Please indicate helpfulness with a 'Y' and unhelpfulness as an 'N'.

  Initial Query:
  {initial_query}

  Final Response:
  {final_response}"""

  prompt_template = PromptTemplate.from_template(prompt_template)

  helpfulness_check_model = ChatOpenAI(model="gpt-4")

  helpfulness_chain = prompt_template | helpfulness_check_model | StrOutputParser()

  helpfulness_response = helpfulness_chain.invoke({"initial_query" : initial_query.content, "final_response" : final_response.content})

  if "Y" in helpfulness_response:
    return "end"
  else:
    return "continue"

#### 🏗️ Activity #4:

Please write what is happening in our `tool_call_or_helpful` function!

- The `tool_call_or_helpful` function serves as the conditional edge in our state graph. Here's a brief overview:
    - Tool check: The function first checks whether the last message in the state contains any tool calls. If it does, the graph proceeds to the "action" node (which executes the requested tool, DuckDuckGo or Arxiv).

    - Loop limit: If the state holds more than 10 messages, the function signals the graph to end, which prevents infinite looping. The returned value must match the `"end"` key of the edge mapping.

    - Helpfulness check: Otherwise, the function evaluates the helpfulness of the final response using a GPT-4 model. A `PromptTemplate` structures the input for the model, and a `StrOutputParser` converts the model's reply to a string. If the model judges the response helpful ("Y"), the function returns "end". Otherwise, it returns "continue" to send the state back to the agent for another attempt.

- Add conditional edges to the "agent" node in the graph.
- The function `tool_call_or_helpful` determines the next step based on the current state, and the mapping translates each label it returns into the node to run next.


In [48]:
graph_with_helpfulness_check.add_conditional_edges(
    "agent",
    tool_call_or_helpful,
    {
        "continue" : "agent",
        "action" : "action",
        "end" : END
    }
)
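The routing decision and its mapping can be exercised without any LLM calls. The following is a minimal sketch, not the notebook's implementation: `FakeMessage`, `make_router`, `PATH_MAP`, and the `always_no` judge are hypothetical names introduced for illustration, and the `END` string is only a stand-in for LangGraph's `END` constant. It shows the same three-way decision and checks that every label the router returns is a key of the mapping.

```python
from dataclasses import dataclass, field
from typing import Callable

END = "__end__"  # stand-in for langgraph.graph.END (assumption for this sketch)


@dataclass
class FakeMessage:
    """Hypothetical message type with only the fields the router reads."""
    content: str
    tool_calls: list = field(default_factory=list)


def make_router(judge: Callable[[str, str], str], max_messages: int = 10):
    """Build a routing function; `judge` replaces the GPT-4 helpfulness chain."""
    def route(state: dict) -> str:
        last = state["messages"][-1]
        if last.tool_calls:                         # pending tool calls: run them
            return "action"
        if len(state["messages"]) > max_messages:   # loop limit reached
            return "end"
        verdict = judge(state["messages"][0].content, last.content)
        return "end" if "Y" in verdict else "continue"
    return route


# Same shape as the mapping passed to add_conditional_edges above.
PATH_MAP = {"continue": "agent", "action": "action", "end": END}

always_no = lambda query, answer: "N"  # stub judge that is never satisfied
route = make_router(always_no)

tool_state = {"messages": [FakeMessage("q"), FakeMessage("", tool_calls=[{"name": "arxiv"}])]}
short_state = {"messages": [FakeMessage("q"), FakeMessage("a")]}
long_state = {"messages": [FakeMessage("q")] + [FakeMessage("a")] * 10}

labels = [route(s) for s in (tool_state, short_state, long_state)]
print(labels)                               # ['action', 'continue', 'end']
print([PATH_MAP[label] for label in labels])
```

Even with a judge that never answers "Y", the third state is routed to the end because it holds 11 messages, which is the purpose of the loop limit.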

- Add an edge from the "action" node back to the "agent" node.
- This creates a loop: after an action is performed, the agent node is invoked again to process the results.
- If we don't add this edge, the process gets stuck at the action node itself, since there is no path leading out of it.


In [54]:
graph_with_helpfulness_check.add_edge("action", "agent")

Adding an edge to a graph that has already been compiled. This will not be reflected in the compiled graph.
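The role of the "action" to "agent" edge can be illustrated with a plain-Python model of the graph. This is only a toy sketch under stated assumptions: the `trace` helper and the `"<stuck>"` marker are hypothetical, and the code does not reproduce how LangGraph itself handles a node with no outgoing edge.

```python
def trace(edges, decisions, start="agent", max_steps=8):
    """Walk a toy graph: 'agent' picks its next node from `decisions`,
    every other node follows its single fixed edge in `edges`."""
    path, node, remaining = [start], start, iter(decisions)
    for _ in range(max_steps):
        if node == "agent":
            node = next(remaining, "END")   # conditional edge
        else:
            node = edges.get(node)          # fixed edge; None if missing
        if node is None:
            path.append("<stuck>")          # no way out of the previous node
            break
        path.append(node)
        if node == "END":
            break
    return path


decisions = ["action", "END"]  # the agent calls a tool once, then finishes

with_edge = trace({"action": "agent"}, decisions)
without_edge = trace({}, decisions)

print(with_edge)      # ['agent', 'action', 'agent', 'END']
print(without_edge)   # ['agent', 'action', '<stuck>']
```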


- Compile the graph with the helpfulness check into an executable form. This creates a runnable agent from the graph that includes the defined nodes and edges.



In [50]:
agent_with_helpfulness_check = graph_with_helpfulness_check.compile()

- Define the input message that the agent will process.

- Execute the compiled agent asynchronously with streaming updates. The loop iterates over each chunk the agent produces; for each node in the chunk, it prints the node's name and the messages generated at that stage.



In [51]:
inputs = {"messages" : [HumanMessage(content="Related to machine learning, what is LoRA? Also, who is Tim Dettmers? Also, what is Attention?")]}

async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_qYHh9yGWl8ZXBbvUxUaJLkRy', 'function': {'arguments': '{"query": "LoRA machine learning"}', 'name': 'duckduckgo_search'}, 'type': 'function'}, {'id': 'call_VQDw7q9RMPsrv9Lf40UW9phx', 'function': {'arguments': '{"query": "Tim Dettmers"}', 'name': 'duckduckgo_search'}, 'type': 'function'}, {'id': 'call_lynT8JvjXXznxTdhYItdRGPS', 'function': {'arguments': '{"query": "Attention in machine learning"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 76, 'prompt_tokens': 171, 'total_tokens': 247}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d286620b-15b8-4f34-86ef-5fbfb1377040-0', tool_calls=[{'name': 'duckduckgo_search', 'args': {'query': 'LoRA machine learning'}, 'id': 'call_qYHh9yGWl8ZXBbvUxUaJLkRy', 'type': 'tool_call'}, 
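The streaming cell above relies on the notebook's support for a top-level `async for`. In a plain Python script the same loop has to live inside a coroutine that is started with `asyncio.run`. The sketch below shows that pattern; `fake_astream` is a hypothetical stand-in for `agent_with_helpfulness_check.astream` that yields placeholder strings in the same `{node_name: update}` shape, so it runs without an API key.

```python
import asyncio


async def fake_astream(inputs, stream_mode="updates"):
    """Hypothetical stand-in for a compiled graph's astream().
    Yields one {node_name: state_update} dict per executed node."""
    yield {"agent": {"messages": ["<AIMessage requesting tool calls>"]}}
    yield {"action": {"messages": ["<ToolMessage with search results>"]}}
    yield {"agent": {"messages": ["<final AIMessage>"]}}


async def collect_updates(stream):
    """Print each update as in the notebook and record which nodes ran."""
    seen = []
    async for chunk in stream:
        for node, values in chunk.items():
            print(f"Receiving update from node: '{node}'")
            print(values["messages"])
            seen.append(node)
    return seen


# asyncio.run starts an event loop, which a notebook already provides.
nodes_seen = asyncio.run(collect_updates(fake_astream({"messages": []})))
```

Swapping `fake_astream(...)` for the real `astream(inputs, stream_mode="updates")` call should leave the loop body unchanged, assuming the compiled agent from the cells above is in scope.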

### Task 4: LangGraph for the "Patterns" of GenAI

Let's ask our system about the 4 patterns of Generative AI:

1. Prompt Engineering
2. RAG
3. Fine-tuning
4. Agents

In [52]:
patterns = ["prompt engineering", "RAG", "fine-tuning", "LLM-based agents"]

In [53]:
for pattern in patterns:
  what_is_string = f"What is {pattern} and when did it break onto the scene??"
  inputs = {"messages" : [HumanMessage(content=what_is_string)]}
  messages = agent_with_helpfulness_check.invoke(inputs)
  print(messages["messages"][-1].content)
  print("\n\n")

Prompt engineering is a concept primarily associated with the field of artificial intelligence, particularly in the context of natural language processing (NLP) and large language models like GPT-3. It involves the design and crafting of prompts (input text) to elicit desired responses from AI models. The goal is to optimize the input to get the most accurate, relevant, or useful output from the model.

### Key Aspects of Prompt Engineering:
1. **Crafting Effective Prompts**: Designing prompts that are clear, specific, and tailored to the task at hand.
2. **Iterative Testing**: Continuously refining prompts based on the responses received to improve the quality of the output.
3. **Understanding Model Behavior**: Gaining insights into how the model interprets different types of input to better predict and influence its responses.

### Emergence of Prompt Engineering:
Prompt engineering became more prominent with the advent of large-scale language models like OpenAI's GPT-3, which was re