# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In this notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!

- 🤝 Breakout Room #2:
  1. Evaluating the LangGraph Application with LangSmith
  2. Adding Helpfulness Check and "Loop" Limits
  3. LangGraph for the "Patterns" of GenAI

# 🤝 Breakout Room #1

## Part 1: LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we use cycles rather than loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1: Dependencies

We'll first install all our required libraries.

In [1]:
from IPython.display import display, Markdown

def pretty_print(message: str) -> None:
    display(Markdown(message))

In [2]:
!pip install -qU langchain langchain_openai langchain-community langgraph arxiv duckduckgo_search==5.3.1b1

## Task 2: Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [3]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [4]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE4 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Duck Duck Go Web Search](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/ddg_search)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)

#### 🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a method.

In [5]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun

tool_belt = [
    DuckDuckGoSearchRun(),
    ArxivQueryRun()
]

### Model

Now we can set up our model! We'll leverage the familiar OpenAI model suite for this example, but using OpenAI is not *necessary* with LangGraph. LangGraph supports all models, though you might not find success with smaller models. As such, they recommend you stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [6]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o", temperature=0)

Now that we have our model set-up, let's "put on the tool belt", which is to say: We'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [7]:
model = model.bind_tools(tool_belt)

#### ❓ Question #1:

How does the model determine which tool to use?

**Answer**

Every tool comes with a description field, which is assigned at class declaration. If we use the `@tool` decorator to convert a function into a tool, then the docstring of the decorated function becomes the description of the tool. The LLM that the agent uses decides whether to use a tool, and which one, on the basis of the prompt and the descriptions of the available tools.
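
To make the role of the description concrete, here is a minimal sketch in plain Python of how a function's docstring could end up in an OpenAI-style tool schema. The helper `to_function_schema` and the `web_search` function are hypothetical illustrations, not LangChain's actual conversion code, and the sketch assumes every parameter is a string.

```python
import inspect
import json


def web_search(query: str) -> str:
    """Search the web and return a short text summary of the results."""
    return f"(pretend results for {query!r})"  # placeholder body, no network call


def to_function_schema(func) -> dict:
    """Build an OpenAI-style tool description from a plain function (illustrative helper)."""
    params = inspect.signature(func).parameters
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            # The docstring is the text the model reads when choosing a tool.
            "description": inspect.getdoc(func),
            "parameters": {
                "type": "object",
                # Simplification: every parameter is treated as a string.
                "properties": {name: {"type": "string"} for name in params},
                "required": list(params),
            },
        },
    }


schema = to_function_schema(web_search)
print(json.dumps(schema, indent=2))
```

The model never sees the function body, only the name, description, and parameter schema, which is why a vague docstring tends to produce poor tool selection.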

## Task 4: Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is a `TypedDict` with the key `messages`, whose value is a sequence of `BaseMessage` objects that is appended to whenever the state changes.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!
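
The walkthrough above can be sketched in plain Python. This is a toy model under stated assumptions: `Message`, `append_messages`, and `apply_update` are hypothetical stand-ins, and the real `add_messages` reducer does more (for example, it can update a message in place when IDs match).

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "human", "ai", or "tool" (stand-in for LangChain message classes)
    content: str


def append_messages(existing: list, new: list) -> list:
    """Toy reducer: merge a node's update into state by appending, not replacing."""
    return existing + new


def apply_update(state: dict, update: dict) -> dict:
    """Return a new state with the node's partial update merged in."""
    return {"messages": append_messages(state["messages"], update["messages"])}


state = {"messages": []}
state = apply_update(state, {"messages": [Message("human", "Who captains the Jets?")]})
state = apply_update(state, {"messages": [Message("ai", "I should search the web.")]})

print([m.role for m in state["messages"]])  # ['human', 'ai']
```

Each node returns only the new messages, and the reducer decides how they are folded into the shared state.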

In [79]:
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
import operator
from langchain_core.messages import BaseMessage

from collections import defaultdict

class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

## Task 5: It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [80]:
from langgraph.prebuilt import ToolNode

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

tool_node = ToolNode(tool_belt)

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `tool_node` is a node which can call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [81]:
from langgraph.graph import StateGraph, END

uncompiled_graph = StateGraph(AgentState)

uncompiled_graph.add_node("agent", call_model)
uncompiled_graph.add_node("action", tool_node)

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [82]:
uncompiled_graph.set_entry_point("agent")

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks whether the last message contains any `tool_calls`.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that `add_conditional_edges` also accepts an optional third parameter (the mapping): a dictionary that should be created with the possible outputs of our conditional function in mind. In this case we omit it, because `should_continue` outputs either `"action"` or `END`, which are already the names of the action node and the END node.
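
To make the mapping concrete, here is a toy sketch in plain Python (not LangGraph's implementation) of a routing function whose labels are translated into destinations through a dictionary. The names `toy_router`, `path_map`, `next_node`, and `TOY_END` are hypothetical, and messages are plain dictionaries rather than LangChain message objects.

```python
TOY_END = "__end__"  # stand-in sentinel for the graph's finish marker


def toy_router(state: dict) -> str:
    """Routing function: inspect the last message and return a label."""
    last_message = state["messages"][-1]
    return "continue" if last_message.get("tool_calls") else "end"


# The mapping translates each label the router can return into a destination.
path_map = {"continue": "action", "end": TOY_END}


def next_node(state: dict) -> str:
    """Resolve the router's label through the mapping (hypothetical helper)."""
    return path_map[toy_router(state)]


wants_tool = {"messages": [{"content": "", "tool_calls": [{"name": "arxiv"}]}]}
final_answer = {"messages": [{"content": "Adam Lowry", "tool_calls": []}]}

print(next_node(wants_tool))    # action
print(next_node(final_answer))  # __end__
```

When the routing function returns destination names directly, the dictionary becomes redundant, which is why it can be left out.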

In [83]:
def should_continue(state):
  last_message = state["messages"][-1]

  print(f"So far executed {len(state['messages'])} steps.")

  if last_message.tool_calls:
    return "action"

  return END

uncompiled_graph.add_conditional_edges(
    "agent",
    should_continue
)

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [84]:
uncompiled_graph.add_edge("action", "agent")

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [85]:
compiled_graph = uncompiled_graph.compile()

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

**Answer**

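Our graph itself imposes no limit on the number of cycles; the only backstop is LangGraph's runtime `recursion_limit` (25 steps by default), which raises an error when it is exceeded. We can impose our own limit by checking it in the `should_continue` function: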

```python
MAX_ITERATIONS = 10 # or any other positive number

def should_continue(state):
  last_message = state["messages"][-1]

  if len(state["messages"]) > MAX_ITERATIONS:
    return END

  if last_message.tool_calls:
    return "action"

  return END
```
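
One caveat with a check on `len(state["messages"])` is that it counts messages, not cycles, and a single cycle can add several messages when the agent requests more than one tool. A possible variation, sketched here in plain Python with dictionary messages and hypothetical names (`MAX_AGENT_TURNS`, `count_agent_turns`, `capped_should_continue`, `TOY_END`), counts agent turns instead:

```python
MAX_AGENT_TURNS = 3    # illustrative cap, not a LangGraph setting
TOY_END = "__end__"    # stand-in for the graph's finish marker


def count_agent_turns(messages: list) -> int:
    """Count the messages produced by the agent node (role == "ai")."""
    return sum(1 for m in messages if m["role"] == "ai")


def capped_should_continue(state: dict) -> str:
    messages = state["messages"]
    if count_agent_turns(messages) >= MAX_AGENT_TURNS:
        return TOY_END  # cap reached: stop even if another tool call was requested
    if messages[-1].get("tool_calls"):
        return "action"
    return TOY_END


# Five messages, but only two agent turns so far.
history = [
    {"role": "human", "content": "Find the QLoRA paper and its authors."},
    {"role": "ai", "content": "", "tool_calls": [{"name": "arxiv"}, {"name": "search"}]},
    {"role": "tool", "content": "paper abstract"},
    {"role": "tool", "content": "search results"},
    {"role": "ai", "content": "", "tool_calls": [{"name": "search"}]},
]

print(len(history), count_agent_turns(history))       # 5 2
print(capped_should_continue({"messages": history}))  # action
```

Whether to cap messages or agent turns is a design choice; the turn-based cap is easier to reason about when tool calls run in parallel.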

## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [88]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="Who is the current captain of the Winnipeg Jets?")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        #print(values["messages"])
        pretty_print(str(values["messages"]))
        # print("\n\n")

So far executed 2 steps.
Receiving update from node: 'agent'


[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_2gn5GJhTZOtZQ3HqLepEADtm', 'function': {'arguments': '{"query":"current captain of the Winnipeg Jets 2023"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 25, 'prompt_tokens': 156, 'total_tokens': 181}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-0bdbb4c2-de0e-425a-b68b-734ff6c1fd7b-0', tool_calls=[{'name': 'duckduckgo_search', 'args': {'query': 'current captain of the Winnipeg Jets 2023'}, 'id': 'call_2gn5GJhTZOtZQ3HqLepEADtm', 'type': 'tool_call'}], usage_metadata={'input_tokens': 156, 'output_tokens': 25, 'total_tokens': 181})]

Receiving update from node: 'action'


[ToolMessage(content='Adam Lowry was named captain of the Winnipeg Jets on Tuesday. ... Sep 20, 2023. Latest News. Inside look at Vegas Golden Knights Aug 30, 2024. Vegas Golden Knights fantasy projections for 2024-25 Lowry will follow Andrew Ladd and Blake Wheeler to serve as the third captain of the new Winnipeg Jets franchise. - Sep 12, 2023. After a season without a captain, the Winnipeg Jets have named ... The Winnipeg Jets will have a captain for the 2023-24 season. After going captain-less in 2022-23, the Winnipeg Jets unveiled Adam Lowry as the club\'s new captain on Tuesday morning. "When I ... Posted September 12, 2023 9:29 am. Centre Adam Lowry was named the Winnipeg Jets new captain on Tuesday. Lowry is the third Jets captain since the team moved from Atlanta to Winnipeg in 2011. He follows Andrew Ladd and Blake Wheeler, who served as captain for five and six years respectively. — Winnipeg Jets (@NHLJets) September 6, 2023. In some ways, the Jets are now one more step removed from the Paul Maurice and Wheeler-led era, and further into the next phase of the Jets. Which will evidently be led by Adam Lowry and Josh Morrissey, and for now, Mark Scheifele and Rick Bowness.', name='duckduckgo_search', tool_call_id='call_2gn5GJhTZOtZQ3HqLepEADtm')]

So far executed 4 steps.
Receiving update from node: 'agent'


[AIMessage(content='The current captain of the Winnipeg Jets is Adam Lowry. He was named captain on September 12, 2023.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 478, 'total_tokens': 504}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'stop', 'logprobs': None}, id='run-b91ef67d-c68b-4ded-a595-62365d4346a8-0', usage_metadata={'input_tokens': 478, 'output_tokens': 26, 'total_tokens': 504})]

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found `tool_calls` on the last message, and sent the state object to the action node
4. The action node ran the requested tool, added the result to the state object as a `ToolMessage`, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, could not find any `tool_calls` on the last message, and passed the state object to END where we see it output in the cell above!
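
The six steps above can be traced with a toy loop in plain Python. This is a sketch under stated assumptions, not how LangGraph executes graphs: `fake_agent`, `fake_action`, `toy_should_continue`, and `run` are hypothetical stand-ins, the "model" is hard-coded, and messages are plain dictionaries.

```python
TOY_END = "__end__"  # stand-in for the graph's finish marker


def fake_agent(state: dict) -> dict:
    """Pretend model: request a search first, then answer once a tool result exists."""
    has_tool_result = any(m["role"] == "tool" for m in state["messages"])
    if has_tool_result:
        return {"messages": [{"role": "ai", "content": "Adam Lowry", "tool_calls": []}]}
    call = {"name": "search", "args": {"query": "Winnipeg Jets captain"}}
    return {"messages": [{"role": "ai", "content": "", "tool_calls": [call]}]}


def fake_action(state: dict) -> dict:
    """Pretend tool node: answer every tool call in the last AI message."""
    calls = state["messages"][-1]["tool_calls"]
    results = [
        {"role": "tool", "content": f"results for {c['args']['query']}"}
        for c in calls
    ]
    return {"messages": results}


def toy_should_continue(state: dict) -> str:
    return "action" if state["messages"][-1].get("tool_calls") else TOY_END


def run(question: str):
    state = {"messages": [{"role": "human", "content": question}]}
    visited = []
    node = "agent"  # entry point
    while node != TOY_END:
        visited.append(node)
        update = fake_agent(state) if node == "agent" else fake_action(state)
        state = {"messages": state["messages"] + update["messages"]}
        # After the agent, take the conditional edge; after a tool, always return to the agent.
        node = toy_should_continue(state) if node == "agent" else "agent"
    return state, visited


final_state, visited = run("Who is the current captain of the Winnipeg Jets?")
print(visited)                                  # ['agent', 'action', 'agent']
print(final_state["messages"][-1]["content"])   # Adam Lowry
```

The visit order `agent`, `action`, `agent` matches the three updates streamed in the cell above.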

Now let's look at an example that uses multiple tools - all with the same flow!

In [106]:
inputs = {"messages" : [HumanMessage(content="Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using DuckDuckGo.")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        pretty_print(f"Receiving update from node: **{node}**")
        if node == "action":
          print("    Tools Used:")
          for x in values['messages']: pretty_print(f"{x.name}")
        # print(">>>")
        # pretty_print(str(values["messages"]))
        # print("<<<")
        for x in values["messages"]:
          print("    Content:")
          if isinstance(x.content, str):
             pretty_print(x.content)
          else:
            for y in x.content: pretty_print(f"{y}")
          print("    Additional kwargs:")
          for y in x.additional_kwargs: pretty_print(f"{y}")
          try:
            print("    Tool calls:")
            for y in x.tool_calls: pretty_print(f"{y}")
          except AttributeError:
            print("    No tool calls requested")
        print("\n\n")

So far executed 2 steps.


Receiving update from node: **agent**

    Content:




    Additional kwargs:


tool_calls

refusal

    Tool calls:


{'name': 'arxiv', 'args': {'query': 'QLoRA'}, 'id': 'call_44nXHWXEiNrXi5CyFpMcgIHb', 'type': 'tool_call'}

{'name': 'duckduckgo_search', 'args': {'query': 'latest Tweet'}, 'id': 'call_4YrjCmmCknpTDHgyhxNAi92s', 'type': 'tool_call'}






Receiving update from node: **action**

    Tools Used:


arxiv

duckduckgo_search

    Content:


Published: 2023-05-23
Title: QLoRA: Efficient Finetuning of Quantized LLMs
Authors: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer
Summary: We present QLoRA, an efficient finetuning approach that reduces memory usage
enough to finetune a 65B parameter model on a single 48GB GPU while preserving
full 16-bit finetuning task performance. QLoRA backpropagates gradients through
a frozen, 4-bit quantized pretrained language model into Low Rank
Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all
previous openly released models on the Vicuna benchmark, reaching 99.3% of the
performance level of ChatGPT while only requiring 24 hours of finetuning on a
single GPU. QLoRA introduces a number of innovations to save memory without
sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is
information theoretically optimal for normally distributed weights (b) double
quantization to reduce the average memory footprint by quantizing the
quantization constants, and (c) paged optimziers to manage memory spikes. We
use QLoRA to finetune more than 1,000 models, providing a detailed analysis of
instruction following and chatbot performance across 8 instruction datasets,
multiple model types (LLaMA, T5), and model scales that would be infeasible to
run with regular finetuning (e.g. 33B and 65B parameter models). Our results
show that QLoRA finetuning on a small high-quality dataset leads to
state-of-the-art results, even when using smaller models than the previous
SoTA. We provide a detailed analysis of chatbot performance based on both human
and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable
alternative to human evaluation. Furthermore, we find that current chatbot
benchmarks are not trustworthy to accurately evaluate the performance levels of
chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to
ChatGPT. We release all of our models and code, including CUDA kernels for
4-bit training.

Published: 2024-05-27
Title: Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Authors: Haotong Qin, Xudong Ma, Xingyu Zheng, Xiaoyang Li, Yang Zhang, Shouda Liu, Jie Luo, Xianglong Liu, Michele Magno
Summary: The LoRA-finetuning quantization of LLMs has been extensively studied to
obtain accurate yet compact LLMs for deployment on resource-constrained
hardware. However, existing methods cause the quantized LLM to severely degrade
and even fail to benefit from the finetuning of LoRA. This paper proposes a
novel IR-QLoRA for pushing quantized LLMs with LoRA to be highly accurate
through information retention. The proposed IR-QLoRA mainly relies on two
technologies derived from the perspective of unified information: (1)
statistics-based Information Calibration Quantization allows the quantized
parameters of LLM to retain original information accurately; (2)
finetuning-based Information Elastic Connection makes LoRA utilizes elastic
representation transformation with diverse information. Comprehensive
experiments show that IR-QLoRA can significantly improve accuracy across LLaMA
and LLaMA2 families under 2-4 bit-widths, e.g., 4- bit LLaMA-7B achieves 1.4%
improvement on MMLU compared with the state-of-the-art methods. The significant
performance gain requires only a tiny 0.31% additional time consumption,
revealing the satisfactory efficiency of our IR-QLoRA. We highlight that
IR-QLoRA enjoys excellent versatility, compatible with various frameworks
(e.g., NormalFloat and Integer quantization) and brings general accuracy gains.
The code is available at https://github.com/htqin/ir-qlora.

Published: 2024-06-12
Title: Exploring Fact Memorization and Style Imitation in LLMs Using QLoRA: An Experimental Study and Quality Assessment Methods
Authors: Eugene Vyborov, Oleksiy Osypenko, Serge Sotnyk
Summary: There are various methods for adapting LLMs to different domains. The most
common methods are prompting, finetuning, and RAG. In this w

    Additional kwargs:
    Tool calls:
    No tool calls requested
    Content:


The latest Twitter news and updates. Twitter is a social networking service, primarily microblogging but also a picture and video sharing service, founded by Jack Dorsey, Noah Glass, Biz Stone and ... Twitter adds 'glorifying violence' warning to Trump tweet. Twitter has added a warning to one of President Donald Trump's tweets about protests in Minneapolis. ... In his latest broadside ... In more serious matters, EU Commissioner Thierry Breton visited Twitter's headquarters to conduct a stress test for the new Digital Services Act regulating everything from social media content ... Introducing a new form of Free (v2) access for write-only use cases and those testing the Twitter API with 1,500 Tweets/month at the app level, media upload endpoints, and Login with Twitter. Get ... Sept. 15, 2023. The federal prosecutors who charged former President Donald J. Trump with a criminal conspiracy over his attempts to overturn the 2020 election obtained 32 private messages from ...

    Additional kwargs:
    Tool calls:
    No tool calls requested



So far executed 5 steps.


Receiving update from node: **agent**

    Content:




    Additional kwargs:


tool_calls

refusal

    Tool calls:


{'name': 'duckduckgo_search', 'args': {'query': 'Tim Dettmers latest Tweet'}, 'id': 'call_FUmCGk79X5eaMb8uB38SpqCb', 'type': 'tool_call'}

{'name': 'duckduckgo_search', 'args': {'query': 'Artidoro Pagnoni latest Tweet'}, 'id': 'call_i12ExzlqnJR5JRZTJLryhn0n', 'type': 'tool_call'}

{'name': 'duckduckgo_search', 'args': {'query': 'Ari Holtzman latest Tweet'}, 'id': 'call_nJ9f6oxmACdQtRHasaaeGYGK', 'type': 'tool_call'}

{'name': 'duckduckgo_search', 'args': {'query': 'Luke Zettlemoyer latest Tweet'}, 'id': 'call_DzFCjSkMVofz4CUuRlwXlrQp', 'type': 'tool_call'}






Receiving update from node: **action**

    Tools Used:


duckduckgo_search

duckduckgo_search

duckduckgo_search

duckduckgo_search

    Content:


Tim Walz's conservative older brother, Jeff, told NewsNation that the "stories" he alluded to in a recent Facebook post are limited to stuff like puking on his siblings due to car sickness when ... Tim Walz was born in West Point, Neb. and grew up in rural Valentine. He previously described his upbringing as coming "from a town of 400 — 24 kids in a class, 12 cousins, farming, those ... Allen School Ph.D. student Tim Dettmers accepted the grand prize for QLoRA, a novel approach to finetuning pretrained models that significantly reduces the amount of GPU memory required — from over 780GB to less than 48GB — to finetune a 65B parameter model. With QLoRA, the largest publicly available models can be finetuned on a single ... In the chat quoted in the complaint, researcher Tim Dettmers talks about his back-and-forth with Meta's legal department whether the use of the book files as training data would be "legally ok ... Tech Moves: AI researcher Yejin Choi leaves Univ. of Washington and Allen Institute for AI. by Todd Bishop & Taylor Soper on August 2, 2024. Yejin Choi, who was named a 2022 MacArthur Fellow and ...

    Additional kwargs:
    Tool calls:
    No tool calls requested
    Content:


Artidoro Pagnoni. artidoro. Follow. mrm8488's profile picture Weyaxi's profile picture nezubn's profile picture. ... artidoro/model-tvergho. Updated Nov 18, 2023. artidoro/model-vinaic. Updated Nov 18, 2023. artidoro/model-vinaia. Updated Nov 18, 2023. datasets. None public yet. Company We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all previous openly ... In this paper, we address these aforementioned challenges associated with financial data and introduce FinGPT, an end-to-end open-source framework for financial large language models (FinLLMs). Adopting a data-centric approach, FinGPT underscores the crucial role of data acquisition, cleaning, and preprocessing in developing open-source FinLLMs. efficient finetuning of quantized LLMs. AUTHORs: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer Authors Info & Claims. NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems. Article No.: 441, Pages 10088 - 10115. Published: 30 May 2024 Publication History. Saturday, June 29, 2024. Introduction. Fine-tuning large language models (LLMs) is a common practice to adapt them for specific tasks, but it can be computationally expensive. LoRA (Low-Rank Adaptation) is a technique that makes this process more efficient by introducing small adapter modules to the model. These adapters capture task-specific ...

    Additional kwargs:
    Tool calls:
    No tool calls requested
    Content:


MSNBC host Ari Melber, during an interview with Trump campaign adviser Corey Lewandowski on Wednesday, threatened him with a defamation lawsuit for quoting the anchor calling the former President ... The team behind QLoRA includes Allen School Ph.D. student Artidoro Pagnoni; alum Ari Holtzman (Ph.D., '23), incoming professor at the University of Chicago; and professor Luke Zettlemoyer, who is also a research manager at Meta. Madrona Prize First Runner Up / Punica: Multi-Tenant LoRA Fine-tuned LLM Serving Corey Lewandowski, who joined Trump's presidential campaign team just two weeks ago after he was fired from his campaign in 2016, appeared on Wednesday's episode of The Beat The heated on-air dispute last night between MSNBC's Ari Melber and Donald Trump campaign adviser Corey Lewandowski didn't end with Wednesday's segment: Today, Lewandowski tweeted a video in ... The Astros are set to recall catcher Cesar Salazar from Triple-A Sugar Land for the September 1 roster expansion, allowing more flexibility for main catchers Yainer Diaz and Victor Caratini.

    Additional kwargs:
    Tool calls:
    No tool calls requested
    Content:


Luke Zettlemoyer is a research manager and site lead for FAIR Seattle. He is also a Professor in the Allen School of Computer Science & Engineering at the University of Washington. His research is in empirical computational semantics, where the goal is to build models that recover representations of the meaning of natural language text. Today we're joined by Luke Zettlemoyer, professor at University of Washington and a research manager at Meta. In our conversation with Luke, we cover multimodal generative AI, the effect of data on models, and the significance of open source and open science. We explore the grounding problem, the need for visual grounding and embodiment in View a PDF of the paper titled MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts, by Xi Victoria Lin and Akshat Shrivastava and Liang Luo and Srinivasan Iyer and Mike Lewis and Gargi Ghosh and Luke Zettlemoyer and Armen Aghajanyan ﻿ Twitter ﻿ Reddit. Join our list for notifications and early access to events ... About this Episode. Today we're joined by Luke Zettlemoyer, professor at University of Washington and a research manager at Meta. In our conversation with Luke, we cover multimodal generative AI, the effect of data on models, and the significance of open ... Provost Tricia Serio and President Ana Mari Cauce have appointed a Task Force on Artificial Intelligence to address these issues and to suggest a UW-wide AI strategy. Chaired by Andreas Bohman, vice president of UW-IT and the University's chief information officer, and Anind Dey, dean of the Information School, the task force will initially ...

    Additional kwargs:
    Tool calls:
    No tool calls requested



So far executed 10 steps.


Receiving update from node: **agent**

    Content:


Here are the latest updates related to the authors of the QLoRA paper:

### Tim Dettmers
- **Latest News**: Tim Dettmers, an Allen School Ph.D. student, accepted the grand prize for QLoRA, a novel approach to finetuning pretrained models that significantly reduces the amount of GPU memory required—from over 780GB to less than 48GB—to finetune a 65B parameter model. With QLoRA, the largest publicly available models can be finetuned on a single GPU.

### Artidoro Pagnoni
- **Latest News**: Artidoro Pagnoni is involved in the development of QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. He is also associated with other projects like FinGPT, an end-to-end open-source framework for financial large language models (FinLLMs).

### Ari Holtzman
- **Latest News**: Ari Holtzman, an alum of the Allen School (Ph.D., '23) and incoming professor at the University of Chicago, is part of the team behind QLoRA. He has also been mentioned in the context of a heated on-air dispute involving MSNBC host Ari Melber and Trump campaign adviser Corey Lewandowski.

### Luke Zettlemoyer
- **Latest News**: Luke Zettlemoyer is a research manager and site lead for FAIR Seattle, as well as a Professor in the Allen School of Computer Science & Engineering at the University of Washington. His research focuses on empirical computational semantics. He has been involved in discussions about multimodal generative AI, the effect of data on models, and the significance of open source and open science.

Unfortunately, I couldn't find their latest Tweets directly. If you need more specific information or direct links to their social media profiles, you might want to check their official Twitter accounts or other social media platforms.

    Additional kwargs:


refusal

    Tool calls:





#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer:

**Answer**

In multiple executions of this workflow the agent arrived at wrong answers, in different ways:
- The agent asks for the latest tweets of people it assumes are authors of the QLoRA paper before verifying the list of authors. As a result, it looks for tweets from the wrong people.
- The agent asks for "latest tweets" without specifying any names.
- The agent asks for the "latest tweets" of the correct people but collects the wrong data.
- The agent hits the DuckDuckGo rate limit, fails to collect the tweets, and does not try again.

The execution of the graph differs from run to run. Here is the list of steps performed in the run shown above:
1. User query received and read.
2. Agent decides to use two tools:
    - pull the QLoRA paper (abstract) from arxiv
    - search for "latest tweets" using duckduckgo
3. Based on the QLoRA abstract (contains list of authors) and a useless search result for "latest tweets" the agent decides to search the Internet for the latest tweet of each of the authors.
4. Based on collected output - which seems not to contain the actual tweets - it builds a response which talks about "latest updates" instead of "latest tweets".


## Part 1: LangSmith Evaluator

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [107]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | compiled_graph | parse_output
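
To make the data flow through `agent_chain` concrete, here is a minimal sketch that traces the shape of the data at each stage without calling LangChain. `FakeMessage`, `fake_graph`, and `run_chain` are hypothetical stand-ins (assumptions, not part of the notebook) for the LangChain message classes, `compiled_graph`, and the `|` composition; only `convert_inputs` and `parse_output` mirror the cell above.

```python
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a LangChain message; only `content` is modelled."""
    content: str


def convert_inputs(input_object: dict) -> dict:
    # Dataset row -> graph state: wrap the question in a one-message list.
    return {"messages": [FakeMessage(content=input_object["question"])]}


def fake_graph(state: dict) -> dict:
    # Hypothetical stand-in for compiled_graph: appends a canned reply to the state.
    reply = FakeMessage(content=f"(answer to: {state['messages'][-1].content})")
    return {"messages": state["messages"] + [reply]}


def parse_output(input_state: dict) -> str:
    # Graph state -> plain string: keep only the text of the last message.
    return input_state["messages"][-1].content


def run_chain(input_object: dict) -> str:
    # Plain function composition, mirroring convert_inputs | compiled_graph | parse_output.
    return parse_output(fake_graph(convert_inputs(input_object)))


result = run_chain({"question": "What is RAG?"})
print(result)
```

The point of the wrapper is that an evaluation dataset can supply a plain `{"question": ...}` dictionary and receive a plain string back, while the graph in the middle keeps working on its message-list state.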

In [108]:
pretty_print(agent_chain.invoke({"question" : "What is RAG?"}))

So far executed 2 steps.


RAG stands for Retrieval-Augmented Generation. It is a technique used in natural language processing (NLP) and machine learning to improve the performance of language models by combining retrieval-based methods with generative models. Here's a brief overview of how it works:

1. **Retrieval**: In the first step, the system retrieves relevant documents or pieces of information from a large corpus based on the input query. This is typically done using a retrieval model, such as BM25 or a dense retrieval model like DPR (Dense Passage Retrieval).

2. **Augmentation**: The retrieved documents are then used to augment the input query. This means that the information from the retrieved documents is combined with the original query to provide more context and relevant information.

3. **Generation**: Finally, a generative model, such as GPT-3 or BERT, uses the augmented input to generate a response. The generative model can produce more accurate and contextually relevant answers because it has access to additional information retrieved in the first step.

RAG is particularly useful in scenarios where the input query is complex or requires specific knowledge that may not be fully captured by the generative model alone. By leveraging external information sources, RAG can enhance the quality and relevance of the generated responses.

### Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]
```

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

In [115]:
questions = [
    "What is the capital city of Poland?",
    "How to store key-value pairs in Python?",
    "What capabilities agents add to LLM-based applications?",
    "Who is the current president of United States",
    "What is the most popular framework for distributed computations?",
]

answers = [
    {"must_mention" : ["Warsaw"]},
    {"must_mention" : ["dictionary", "dict"]},
    {"must_mention" : ["tool", "reasoning"]},
    {"must_mention" : ["Joe", "Biden"]},
    {"must_mention" : ["Apache", "Spark"]},
]

Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [116]:
from uuid import uuid4

from langsmith import Client

client = Client()
dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the QLoRA Paper to Evaluate RAG over the same paper."
)

client.create_examples(
    inputs=[{"question" : q} for q in questions],
    outputs=answers,
    dataset_id=dataset.id,
)

#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

**Answer**

The association is positional: the i-th entry of `questions` is paired with the i-th entry of `answers` when both lists are passed to `client.create_examples`. For every question we define a criterion for a correct answer; here we require specific terms to appear in the answer. Such a criterion has the following shortcomings:
- an answer may contain lots of irrelevant information and still pass as correct
- an answer may be correct while not containing all required terms
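
Because the two lists are paired by position, a length check before uploading can catch lists that have drifted out of sync. The sketch below is an assumption-laden illustration: `pair_examples` is a hypothetical helper, not a LangSmith API, and it only uses the first two entries of the dataset defined above.

```python
questions = [
    "What is the capital city of Poland?",
    "How to store key-value pairs in Python?",
]

answers = [
    {"must_mention": ["Warsaw"]},
    {"must_mention": ["dictionary", "dict"]},
]


def pair_examples(questions: list, answers: list) -> list:
    """Pair each question with the answer at the same index (hypothetical helper)."""
    if len(questions) != len(answers):
        raise ValueError(f"{len(questions)} questions but {len(answers)} answers")
    return [
        {"inputs": {"question": q}, "outputs": a}
        for q, a in zip(questions, answers)
    ]


examples = pair_examples(questions, answers)
for example in examples:
    print(example["inputs"]["question"], "->", example["outputs"]["must_mention"])
```

Keeping each question and its criteria together in one record, rather than in two parallel lists, would make the association explicit and harder to break when entries are added or reordered.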

In [117]:
pretty_print(agent_chain.invoke({"question" : "How are the correct answers associated with the questions in LangSmith evaluation?"}))

So far executed 2 steps.


In LangSmith evaluation, the correct answers are typically associated with questions through a process that involves defining a set of expected answers or ground truth for each question. This process can vary depending on the specific evaluation framework or methodology being used, but generally includes the following steps:

1. **Question Formulation**: Clearly define the questions that need to be evaluated. These questions should be specific and unambiguous.

2. **Ground Truth Creation**: Establish the correct answers for each question. This can be done by subject matter experts who provide the most accurate and reliable answers based on their knowledge and available information.

3. **Answer Mapping**: Each question is mapped to its corresponding correct answer. This mapping is crucial for the evaluation process as it serves as the reference against which the responses will be compared.

4. **Evaluation Criteria**: Define the criteria for evaluating the responses. This can include exact match, partial match, relevance, correctness, completeness, and other relevant metrics.

5. **Automated or Manual Evaluation**: Use automated tools or manual review to compare the provided answers with the correct answers. Automated tools can include algorithms that check for exact matches or use natural language processing techniques to assess the similarity between the provided and correct answers.

6. **Scoring and Feedback**: Assign scores based on the evaluation criteria and provide feedback on the performance. This helps in understanding how well the responses align with the correct answers.

In some advanced evaluation systems, machine learning models may be used to predict the correctness of answers based on training data that includes a large set of questions and their corresponding correct answers.

If you have a specific context or framework in mind for LangSmith evaluation, please provide more details so I can give a more tailored explanation.

### Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive exact-match process to determine if our response contains specific strings.

In [118]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

**Answer**

Ways to improve the `must_mention` metric:
- ignore letter casing when matching
- support matching rules other than requiring all listed terms, e.g. at least one term must appear in the answer for it to pass as correct
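
Both suggestions can be sketched as a plain string check, independent of LangSmith. The function name `mentions` and its `mode` and `ignore_case` parameters are assumptions made for illustration; the result could be returned as the `score` of the evaluator above, but that wiring is not shown here.

```python
def mentions(prediction: str, required: list, mode: str = "all", ignore_case: bool = True) -> bool:
    """Return True if `prediction` mentions the required phrases.

    mode="all": every phrase must appear; mode="any": at least one must appear.
    """
    if mode not in ("all", "any"):
        raise ValueError(f"unknown mode: {mode!r}")
    # casefold() is a more aggressive lower() intended for caseless comparison.
    text = prediction.casefold() if ignore_case else prediction
    phrases = [p.casefold() if ignore_case else p for p in required]
    hits = [phrase in text for phrase in phrases]
    return all(hits) if mode == "all" else any(hits)


answer = "In Python, key-value pairs are stored in a dict."
strict = mentions(answer, ["dictionary", "dict"], mode="all")
lenient = mentions(answer, ["dictionary", "dict"], mode="any")
case_insensitive = mentions("the capital is warsaw", ["Warsaw"])
case_sensitive = mentions("the capital is warsaw", ["Warsaw"], ignore_case=False)
print(strict, lenient, case_insensitive, case_sensitive)
```

This is still substring matching, so it inherits the other gap noted earlier: an answer can mention every required term and still be wrong or padded with irrelevant text.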

Now that we have created our custom evaluator - let's initialize our `RunEvalConfig` with it!

In [119]:
from langchain.smith import RunEvalConfig, run_on_dataset

eval_config = RunEvalConfig(
    custom_evaluators=[must_mention],
)

### Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [120]:
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=agent_chain,
    evaluation=eval_config,
    verbose=True,
    project_name=f"RAG Pipeline - Evaluation - {uuid4().hex[0:8]}",
    project_metadata={"version": "1.0.0"},
)

View the evaluation results for project 'RAG Pipeline - Evaluation - 1a6d3ccf' at:
https://smith.langchain.com/o/36ad7c64-e702-5d92-ad6e-24634a4b396a/datasets/c3a44be7-db1f-4956-802f-b270727d739f/compare?selectedSessions=5404d41a-497b-4fe2-b31e-bacdff10bfee

View all tests for Dataset Retrieval Augmented Generation - Evaluation Dataset - 661145e9 at:
https://smith.langchain.com/o/36ad7c64-e702-5d92-ad6e-24634a4b396a/datasets/c3a44be7-db1f-4956-802f-b270727d739f
[>                                                 ] 0/5So far executed 2 steps.
So far executed 2 steps.
So far executed 2 steps.
[--------->                                        ] 1/5So far executed 4 steps.
So far executed 4 steps.
[------------------->                              ] 2/5So far executed 2 steps.
[--------------------------------------->          ] 4/5So far executed 2 steps.
[------------------------------------------------->] 5/5

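| statistic | feedback.must_mention | error | execution_time | run_id |
|---|---|---|---|---|
| count | 5 | 0.0 | 5.0 | 5 |
| unique | 2 | 0.0 | | 5 |
| top | True | | | 9d6f84ca-a5be-441a-86ac-325b8801e48e |
| freq | 4 | | | 1 |
| mean | | | 4.099279 | |
| std | | | 3.334678 | |
| min | | | 0.631296 | |
| 25% | | | 2.761924 | |
| 50% | | | 2.820014 | |
| 75% | | | 4.848421 | |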


{'project_name': 'RAG Pipeline - Evaluation - 1a6d3ccf',
 'results': {'9ab6ae6f-a014-4f65-b111-f6e223c9afa0': {'input': {'question': 'What is the capital city of Poland?'},
   'feedback': [EvaluationResult(key='must_mention', score=True, value=None, comment=None, correction=None, evaluator_info={}, feedback_config=None, source_run_id=UUID('3943bddd-0615-427e-b90d-7384266f4920'), target_run_id=None)],
   'execution_time': 0.631296,
   'run_id': '9d6f84ca-a5be-441a-86ac-325b8801e48e',
   'output': 'The capital city of Poland is Warsaw.',
   'reference': {'must_mention': ['Warsaw']}},
  'd2932d26-cf60-407e-b6aa-469c823fe239': {'input': {'question': 'How to store key-value pairs in Python?'},
   'feedback': [EvaluationResult(key='must_mention', score=True, value=None, comment=None, correction=None, evaluator_info={}, feedback_config=None, source_run_id=UUID('b78fc3aa-201b-438c-bd3c-ec16fe72299b'), target_run_id=None)],
   'execution_time': 9.434741,
   'run_id': '02d91328-199c-461d-91ca-9c

## Part 2: LangGraph with Helpfulness:

### Task 3: Adding Helpfulness Check and "Loop" Limits

Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm if it fully answers the user's query!

We're going to make a few key adjustments to account for this:

1. We're going to add an artificial limit on how many "loops" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.
2. We'll add to our existing conditional edge to obtain the behaviour we desire.

First, let's define our state again - we can check the length of the `messages` list in the state, so we don't need additional state for this.

In [146]:
class AgentState(TypedDict):
  messages: Annotated[list, add_messages]
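
To see why the message count can double as a loop counter, consider this minimal sketch. `append_messages` is a simplified, hypothetical stand-in for the `add_messages` reducer: it only appends, whereas the real reducer also takes message IDs into account. The threshold of 10 matches the check used in the router further down; the loop itself is an illustration in which every agent reply is assumed to be judged unhelpful.

```python
def append_messages(existing: list, new: list) -> list:
    # Simplified stand-in for the add_messages reducer: append only.
    return existing + new


MAX_MESSAGES = 10  # same threshold as the router's len(...) > 10 check

state = {"messages": ["user question"]}
steps = 0

# Each pass adds one agent reply; the loop stops once the list exceeds the cap.
while len(state["messages"]) <= MAX_MESSAGES:
    state["messages"] = append_messages(state["messages"], [f"agent reply {steps + 1}"])
    steps += 1

print(steps, len(state["messages"]))
```

Since every node appends to `messages` rather than overwriting it, the list length only grows during a run, which is what makes it usable as a stop condition without a separate counter field.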

Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!

#### 🏗️ Activity #5:

Please write markdown for the following cells to explain what each is doing.

**Answer**

Create `graph_with_helpfulness_check` as a state machine (in the form of a StateGraph) that tracks transitions between different states (nodes). Two nodes are added to the graph:
- "agent" is associated with the function `call_model` (calling an LLM)
- "action" is associated with the function `tool_node` (function calling)

In [147]:
graph_with_helpfulness_check = StateGraph(AgentState)

graph_with_helpfulness_check.add_node("agent", call_model)
graph_with_helpfulness_check.add_node("action", tool_node)

**Answer**

The entry point for the graph is set to "agent", meaning that the state machine starts with the agent node.

In [148]:
graph_with_helpfulness_check.set_entry_point("agent")

**Answer**

The `tool_call_or_helpful` function defines logic to transition between states based on the context of the conversation. It accesses the last message in the `messages` list from the agent's `state`.

If the last message contains a tool call (the agent has requested the use of a tool), the function returns "action", which triggers a transition to the "action" node (`tool_node`).

The `initial_query` is the first message, and the `final_response` is the last message in the list.

If there are more than 10 messages in the conversation, the function skips the helpfulness check and returns the label that is meant to end the run, so the agent cannot keep retrying indefinitely.

The prompt template is used to check the helpfulness of the agent's response. It compares the `initial_query` and `final_response` and asks whether the final response is helpful (indicated by 'Y') or not (indicated by 'N').

If the helpfulness check returns a 'Y', the conversation is deemed helpful and the function returns "end", which leads to the conclusion of the conversation.

If the helpfulness check returns 'N', the conversation is marked as unhelpful, and the function returns "continue", indicating that the conversation should proceed.

In [149]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

def tool_call_or_helpful(state):
  last_message = state["messages"][-1]

  pretty_print(f"Messages count = {len(state['messages'])}")

  if last_message.tool_calls:
    pretty_print("Call to action!")
    return "action"

  initial_query = state["messages"][0]
  final_response = state["messages"][-1]

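  if len(state["messages"]) > 10:
    pretty_print("Limit of messages exceeded, END. The response is:")
    pretty_print(final_response.content)
    # Return the lowercase "end" label: it is the key that the conditional-edge mapping resolves to END.
    return "end"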

  pretty_print("Evaluate the final response.")

  prompt_template = """\
  Given an initial query and a final response, determine if the final response is extremely helpful or not. Please indicate helpfulness with a 'Y' and unhelpfulness as an 'N'.

  Initial Query:
  {initial_query}

  Final Response:
  {final_response}"""

  prompt_template = PromptTemplate.from_template(prompt_template)

  helpfulness_check_model = ChatOpenAI(model="gpt-4")

  helpfulness_chain = prompt_template | helpfulness_check_model | StrOutputParser()

  helpfulness_response = helpfulness_chain.invoke({"initial_query" : initial_query.content, "final_response" : final_response.content})

  if "Y" in helpfulness_response:
    pretty_print('Helfpul! The response is:')
    pretty_print(final_response.content)
    return "end"
  else:
    pretty_print('Not helpful :(')
    return "continue"
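
The check `"Y" in helpfulness_response` accepts any reply that contains a capital Y anywhere, including a negative verdict followed by an explanation. A stricter interpretation could look like the sketch below; `is_helpful` and `naive_is_helpful` are hypothetical helper names, and the sample replies are made up for illustration rather than taken from a real model run.

```python
def naive_is_helpful(verdict: str) -> bool:
    # The substring test used in the cell above.
    return "Y" in verdict


def is_helpful(verdict: str) -> bool:
    """Treat a judge reply as helpful only if it starts with 'Y' (hypothetical helper)."""
    cleaned = verdict.strip().strip("'\"").upper()
    return cleaned.startswith("Y")


samples = ["Y", "N", " 'Y' ", "yes", "N. You did not answer the question."]
for sample in samples:
    print(repr(sample), naive_is_helpful(sample), is_helpful(sample))
```

The last sample is the interesting one: the naive test reads the capital Y in "You" as approval, while the stricter version looks only at the start of the reply.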

#### 🏗️ Activity #4:

Please write what is happening in our `tool_call_or_helpful` function!

**Answer**

`tool_call_or_helpful` defines the routing logic of a conditional edge in the graph. If the agent asks for a tool call, we move to the `action` node, i.e. tool calling. Note: the message limit is not checked on this path, so an agent that keeps requesting tools is not stopped by the 10-message cap and is bounded only by LangGraph's own recursion limit.

If the agent provides a response, the number of messages in the graph state is checked. If it exceeds 10, the response is returned to the user without further checks. Otherwise the response is evaluated for helpfulness: if it is judged helpful, execution ends; if not, execution goes back to the `agent` node.

In [150]:
graph_with_helpfulness_check.add_conditional_edges(
    "agent",
    tool_call_or_helpful,
    {
        "continue" : "agent",
        "action" : "action",
        "end" : END
    }
)
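
The mapping above only helps if every label the router returns is one of its keys, and the keys are case-sensitive. The following sketch restates the router's decision order as a side-effect-free function and checks each label against the mapping. `route`, `next_node`, and the string used for `END` are assumptions for illustration; the real router also calls a model and prints progress.

```python
END = "__end__"  # placeholder for langgraph's END sentinel (illustration only)

ROUTES = {"continue": "agent", "action": "action", "end": END}


def route(has_tool_calls: bool, message_count: int, helpful: bool, limit: int = 10) -> str:
    """Hypothetical, side-effect-free version of the router's decision order."""
    if has_tool_calls:
        return "action"          # tool request wins, before any limit check
    if message_count > limit:
        return "end"             # loop limit reached
    return "end" if helpful else "continue"


def next_node(label: str) -> str:
    # A label missing from the mapping has no destination, so fail loudly.
    if label not in ROUTES:
        raise KeyError(f"router returned {label!r}, expected one of {sorted(ROUTES)}")
    return ROUTES[label]


print(next_node(route(has_tool_calls=True, message_count=2, helpful=False)))
print(next_node(route(has_tool_calls=False, message_count=11, helpful=False)))
print(next_node(route(has_tool_calls=False, message_count=4, helpful=False)))
```

In this sketch a label such as `"END"` would raise a `KeyError`, because only the lowercase `"end"` appears among the keys.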

**Answer**

Once the action (tool call) completes, its output is passed back to the agent.

In [151]:
graph_with_helpfulness_check.add_edge("action", "agent")

**Answer**

Compile the graph to make it executable.

In [152]:
agent_with_helpfulness_check = graph_with_helpfulness_check.compile()

**Answer**

Call the graph with an example query.

In [153]:
inputs = {"messages" : [HumanMessage(content="Related to machine learning, what is LoRA? Also, who is Tim Dettmers? Also, what is Attention?")]}

# Stream per-node updates and print each message's content, extra kwargs, and tool calls.
async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        pretty_print(f"Receiving update from node: **{node}**")
        if node == "action":
            print("    Tools Used:")
            for x in values["messages"]:
                pretty_print(f"{x.name}")
        for x in values["messages"]:
            print("    Content:")
            if isinstance(x.content, str):
                pretty_print(x.content)
            else:
                for y in x.content:
                    pretty_print(f"{y}")
            print("    Additional kwargs:")
            for y in x.additional_kwargs:
                pretty_print(f"{y}")
            try:
                print("    Tool calls:")
                for y in x.tool_calls:
                    pretty_print(f"{y}")
            except AttributeError:
                # Messages without a `tool_calls` attribute (here, the tool results) end up here.
                print("    No tool calls requested")
        print("\n\n")

Messages count = 2

Call to action!

Receiving update from node: **agent**

    Content:




    Additional kwargs:


tool_calls

refusal

    Tool calls:


{'name': 'duckduckgo_search', 'args': {'query': 'LoRA machine learning'}, 'id': 'call_MDG6mzJVHtsHMc42yRRXcnVq', 'type': 'tool_call'}

{'name': 'duckduckgo_search', 'args': {'query': 'Tim Dettmers'}, 'id': 'call_WFSBzzNTrX3OIBYWqihD3r5V', 'type': 'tool_call'}

{'name': 'duckduckgo_search', 'args': {'query': 'Attention in machine learning'}, 'id': 'call_PiZSFPUUvlE7ZTPkWbdGVZ2a', 'type': 'tool_call'}






Receiving update from node: **action**

    Tools Used:


duckduckgo_search

duckduckgo_search

duckduckgo_search

    Content:


Let's jump on LoRA. Low-Rank Adaptation of LLMs (LoRA) So, in usual fine-tuning, we. Take a pretrained model. Do Transfer Learning over new training data to slightly adjust these pre-trained weights LoRA's approach to decomposing ( Δ W ) into a product of lower rank matrices effectively balances the need to adapt large pre-trained models to new tasks while maintaining computational efficiency. The intrinsic rank concept is key to this balance, ensuring that the essence of the model's learning capability is preserved with significantly ... Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low rank perturbations to selected weight matrices. In this work, we compare the performance of LoRA and full finetuning on two target domains, programming and mathematics. We consider both the instruction finetuning ($\\approx$100K prompt-response ... For example, on the subset of the GLUE dataset with T5-Base, LoRA-GA outperforms LoRA by 5.69% on average. On larger models such as Llama 2-7B, LoRA-GA shows performance improvements of 0.34, 11.52%, and 5.05% on MT-bench, GSM8K, and Human-eval, respectively. Additionally, we observe up to 2-4 times convergence speed improvement compared to ... This is where Low-Rank Adaptation (LoRA) technique appeared to be a game-changer enabling fine-tuning very big LLMs to specific task on limited resources and datasets. LoRA introduces a seemingly-simple yet powerful and cost-effective way to fine-tune LLMs and adapt them to a specific task by integrating low-rank matrices into the model's ...

    Additional kwargs:
    Tool calls:
    No tool calls requested
    Content:


— Tim Dettmers is joining Ai2 as an AI researcher. Dettmers specializes in efficient deep learning at the intersection of machine learning, NLP, and computer systems with a focus on quantization ... Allen School Ph.D. student Tim Dettmers accepted the grand prize for QLoRA, a novel approach to finetuning pretrained models that significantly reduces the amount of GPU memory required — from over 780GB to less than 48GB — to finetune a 65B parameter model. With QLoRA, the largest publicly available models can be finetuned on a single ... Its purpose is to make cutting-edge research by Tim Dettmers, a leading academic expert on quantization and the use of deep learning hardware accelerators, accessible to the general public. QLoRA: One of the core contributions of bitsandbytes towards the democratization of AI. Tim Dettmers. Video. Tech Moves: AI researcher Yejin Choi leaves Univ. of Washington and Allen Institute for AI. by Todd Bishop & Taylor Soper on August 2, 2024 August 2, 2024 at 11:59 am. Tim Dettmers' research focuses on making foundation models, such as ChatGPT, accessible to researchers and practitioners by reducing their resource requirements. This involves developing novel compression and networking algorithms and building systems that allow for memory-efficient, fast, and cheap deep learning. ... Dettmers is a PhD ...

    Additional kwargs:
    Tool calls:
    No tool calls requested
    Content:


Learn how attention mechanisms in deep learning enable models to focus on relevant information and improve performance in tasks such as machine translation, image captioning, and speech recognition. Understand the steps and components of attention mechanism architecture and see examples of its applications. The Transformer, since then, has become a popular architecture choice for a variety of tasks. It is capable of capturing long-range dependencies in data making it a powerful tool not only for NLP but also for computer vision, audio, and protein folding. Transformer: the all-powerful in the land of machine learning. Attention mechanism is a fundamental invention in artificial intelligence and machine learning, redefining the capabilities of deep learning models. This mechanism, inspired by the human mental process of selective focus, has emerged as a pillar in a variety of applications, accelerating developments in natural language processing, computer vision, and beyond. In the ever-evolving field of deep learning, one concept that has garnered significant attention (pun intended) is the Attention Mechanism. This ingenious concept has revolutionized the way neural… The introduction of the Transformer model was a significant leap forward for the concept of attention in deep learning. Vaswani et al. described this model in the seminal paper titled "Attention is All You Need" in 2017. ... Attention mechanisms represent advancements in machine learning and computer vision, enabling models to prioritize ...

    Additional kwargs:
    Tool calls:
    No tool calls requested





Messages count = 6

Evaluate the final response.

Helfpul! The response is:

### LoRA (Low-Rank Adaptation)

LoRA, or Low-Rank Adaptation, is a technique used in machine learning to fine-tune large language models (LLMs) efficiently. Instead of adjusting all the parameters of a pre-trained model, LoRA focuses on training only low-rank perturbations to selected weight matrices. This approach significantly reduces the memory and computational resources required for fine-tuning, making it possible to adapt large models to specific tasks even with limited resources and datasets. The method involves decomposing the weight updates into a product of lower-rank matrices, which balances the need for adaptation while maintaining computational efficiency.

### Tim Dettmers

Tim Dettmers is a researcher specializing in efficient deep learning, particularly at the intersection of machine learning, natural language processing (NLP), and computer systems. He is known for his work on quantization and the use of deep learning hardware accelerators. One of his notable contributions is QLoRA, a method that significantly reduces the GPU memory required to fine-tune large pre-trained models. Dettmers' research aims to make advanced AI models more accessible by developing novel compression and networking algorithms that allow for memory-efficient, fast, and cost-effective deep learning.

### Attention in Machine Learning

Attention mechanisms are a fundamental concept in deep learning that enable models to focus on relevant parts of the input data, improving performance in various tasks such as machine translation, image captioning, and speech recognition. The attention mechanism was popularized by the Transformer model, introduced in the seminal paper "Attention is All You Need" by Vaswani et al. in 2017. This mechanism allows models to capture long-range dependencies in data, making it a powerful tool not only for NLP but also for computer vision, audio processing, and other domains. Attention mechanisms have revolutionized the capabilities of deep learning models by allowing them to prioritize important information, much like the human mental process of selective focus.

Receiving update from node: **agent**

    Content:


### LoRA (Low-Rank Adaptation)

LoRA, or Low-Rank Adaptation, is a technique used in machine learning to fine-tune large language models (LLMs) efficiently. Instead of adjusting all the parameters of a pre-trained model, LoRA focuses on training only low-rank perturbations to selected weight matrices. This approach significantly reduces the memory and computational resources required for fine-tuning, making it possible to adapt large models to specific tasks even with limited resources and datasets. The method involves decomposing the weight updates into a product of lower-rank matrices, which balances the need for adaptation while maintaining computational efficiency.

### Tim Dettmers

Tim Dettmers is a researcher specializing in efficient deep learning, particularly at the intersection of machine learning, natural language processing (NLP), and computer systems. He is known for his work on quantization and the use of deep learning hardware accelerators. One of his notable contributions is QLoRA, a method that significantly reduces the GPU memory required to fine-tune large pre-trained models. Dettmers' research aims to make advanced AI models more accessible by developing novel compression and networking algorithms that allow for memory-efficient, fast, and cost-effective deep learning.

### Attention in Machine Learning

Attention mechanisms are a fundamental concept in deep learning that enable models to focus on relevant parts of the input data, improving performance in various tasks such as machine translation, image captioning, and speech recognition. The attention mechanism was popularized by the Transformer model, introduced in the seminal paper "Attention is All You Need" by Vaswani et al. in 2017. This mechanism allows models to capture long-range dependencies in data, making it a powerful tool not only for NLP but also for computer vision, audio processing, and other domains. Attention mechanisms have revolutionized the capabilities of deep learning models by allowing them to prioritize important information, much like the human mental process of selective focus.

    Additional kwargs:


refusal

    Tool calls:





### Task 4: LangGraph for the "Patterns" of GenAI

Let's ask our system about the 4 patterns of Generative AI:

1. Prompt Engineering
2. RAG
3. Fine-tuning
4. Agents

In [154]:
patterns = ["prompt engineering", "RAG", "fine-tuning", "LLM-based agents"]

In [156]:
for pattern in patterns:
  what_is_string = f"What is {pattern} and when did it break onto the scene??"
  inputs = {"messages" : [HumanMessage(content=what_is_string)]}
  messages = agent_with_helpfulness_check.invoke(inputs)
  # print(messages["messages"][-1].content)
  print("\n\n")

Messages count = 2

Evaluate the final response.

Helfpul! The response is:

Prompt engineering is a concept primarily associated with the field of artificial intelligence, particularly in the context of natural language processing (NLP) and large language models like GPT-3. It involves the design and crafting of prompts (input text) to elicit desired responses from AI models. The goal is to optimize the input to get the most accurate, relevant, or useful output from the model.

### Key Aspects of Prompt Engineering:
1. **Crafting Effective Prompts**: Designing prompts that are clear, specific, and structured in a way that the AI can understand and respond to appropriately.
2. **Iterative Testing**: Continuously refining prompts based on the responses received to improve the quality and relevance of the output.
3. **Understanding Model Behavior**: Gaining insights into how the model interprets different types of prompts and using this knowledge to guide prompt design.

### Emergence of Prompt Engineering:
Prompt engineering became more prominent with the advent of large-scale language models like OpenAI's GPT-3, which was released in June 2020. The ability of these models to generate human-like text based on prompts led to a growing interest in how to effectively communicate with them to achieve specific goals. The term "prompt engineering" itself started gaining traction around this time as researchers and practitioners began to explore and document best practices for interacting with these advanced AI systems.

Would you like more detailed information or recent developments in prompt engineering?






Messages count = 2

Evaluate the final response.

Helfpul! The response is:

RAG stands for Retrieval-Augmented Generation. It is a technique in natural language processing (NLP) that combines retrieval-based methods with generative models to improve the quality and relevance of generated text. The key idea is to retrieve relevant documents or pieces of information from a large corpus and use them to inform and enhance the generation process.

RAG was introduced by Facebook AI Research (FAIR) in a paper titled "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," which was published in 2020. The technique leverages both retrieval and generation to handle knowledge-intensive tasks more effectively than either approach alone.

Would you like more detailed information or specific papers on RAG?






Messages count = 2

Evaluate the final response.

Helfpul! The response is:

Fine-tuning is a process in machine learning where a pre-trained model is further trained on a new, often smaller, dataset to adapt it to a specific task. This approach leverages the knowledge the model has already acquired during its initial training on a large dataset, making it more efficient and effective for specialized tasks.

### Key Aspects of Fine-Tuning:
1. **Pre-trained Model**: Start with a model that has been trained on a large dataset.
2. **New Dataset**: Use a smaller, task-specific dataset to further train the model.
3. **Adaptation**: The model adjusts its parameters to better perform the new task while retaining the general knowledge from the initial training.

### Benefits:
- **Efficiency**: Requires less data and computational resources compared to training a model from scratch.
- **Performance**: Often results in better performance on specific tasks due to the pre-existing knowledge in the model.

### Historical Context:
Fine-tuning became particularly prominent with the advent of deep learning and transfer learning techniques. It gained significant attention with the success of models like BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which demonstrated the effectiveness of fine-tuning for various natural language processing tasks.

To pinpoint the exact timeline and key milestones, I can look up more detailed information. Would you like me to do that?






Messages count = 2

Evaluate the final response.

Helfpul! The response is:

LLM-based agents, or Large Language Model-based agents, are artificial intelligence systems that leverage large language models to perform a variety of tasks. These tasks can include natural language understanding, text generation, translation, summarization, question answering, and more. The core technology behind these agents is typically a deep learning model trained on vast amounts of text data to understand and generate human-like text.

### Key Characteristics of LLM-based Agents:
1. **Natural Language Processing (NLP):** They excel in understanding and generating human language.
2. **Contextual Understanding:** They can maintain context over longer conversations or documents.
3. **Versatility:** They can be fine-tuned for specific tasks or domains.
4. **Scalability:** They can handle a wide range of applications from chatbots to complex decision-making systems.

### Breakthrough and Evolution:
The concept of LLM-based agents has been around for a while, but significant breakthroughs occurred with the development of models like OpenAI's GPT (Generative Pre-trained Transformer) series. Here are some key milestones:

1. **2018 - GPT-1:** OpenAI released the first version of GPT, which demonstrated the potential of pre-trained language models.
2. **2019 - GPT-2:** This version showed significant improvements in text generation and understanding, leading to widespread attention and adoption.
3. **2020 - GPT-3:** With 175 billion parameters, GPT-3 became one of the largest and most powerful language models, capable of performing a wide range of tasks with minimal fine-tuning.
4. **2021 and Beyond:** Continued advancements in model architecture, training techniques, and the release of other large models like Google's BERT, T5, and more have further solidified the role of LLM-based agents in AI applications.

### Current Trends:
- **Integration in Products:** LLM-based agents are now integrated into various products and services, including virtual assistants, customer service bots, and content creation tools.
- **Research and Development:** Ongoing research aims to make these models more efficient, ethical, and capable of understanding more complex tasks.
- **Ethical Considerations:** There is a growing focus on addressing biases, ensuring data privacy, and making these models more transparent and accountable.

Would you like to know more about a specific aspect of LLM-based agents or their applications?




