# Streaming

Streaming is an important UX consideration for LLM apps, and agents are no exception. Streaming with agents is more complicated because you may want to stream not only the tokens of the final answer, but also the intermediate steps the agent takes.

Our agent will use the OpenAI tools API for tool invocation, and we'll provide the agent with two tools:

1. `where_cat_is_hiding`: A tool that uses an LLM to tell us where the cat is hiding
2. `get_items`: A tool that uses an LLM to determine which items are in a given place

In this notebook, we'll see how to use `.stream` to stream action / observation pairs, and then we'll see how to use `.astream_log` to stream LLM output token by token, including from within the underlying tools.

In [12]:
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Sequence, TypeVar, Union
from uuid import UUID

from langchain import agents, hub
from langchain.prompts import ChatPromptTemplate
from langchain.tools import tool
from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.callbacks import Callbacks
from langchain_core.callbacks.base import AsyncCallbackHandler
from langchain_core.documents import Document
from langchain_core.messages import BaseMessage
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult
from langchain_openai import ChatOpenAI

## Create the model

**Attention** For older versions of langchain, we must set `streaming=True`

In [13]:
model = ChatOpenAI(temperature=0.0, streaming=True)

## Tools

We define two tools that rely on a chat model to generate output!

Please note a few different things:

1. We invoke the model using `.astream()` to force the output to stream (unfortunately, for older langchain versions you should still set `streaming=True` on the model)
2. We attach tags to the model so that we can filter on said tags when using `astream_log`

In [15]:
@tool
async def where_cat_is_hiding(callbacks: Callbacks) -> str:  # <--- Accept callbacks
    """Where is the cat hiding right now?"""
    # Attach tags, a run name, and the callbacks to the model.
    configured_model = model.with_config(
        {"tags": ["hiding_spot"], "run_name": "where_cat_at", "callbacks": callbacks}
    )
    chunks = [
        chunk
        async for chunk in configured_model.astream(
            "Make up a place in the house where the cat might be hiding right now. Include the name of the place only.",
        )
    ]
    return "".join(chunk.content for chunk in chunks)


@tool
async def get_items(place: str, callbacks: Callbacks) -> str:  # <--- Accept callbacks
    """Use this tool to look up which items are in the given place."""
    template = ChatPromptTemplate.from_messages(
        [
            (
                "human",
                "Can you tell me what kind of items I might find in the following place: '{place}'. "
                "List at least 3 such items separating them by a comma. And include a brief description of each item.",
            ),
        ]
    )
    chain = template | model.with_config(
        {"tags": ["get_items"], "run_name": "Get Items LLM", "callbacks": callbacks}  # <-- Propagate callbacks
    )
    chunks = [
        chunk
        async for chunk in chain.astream({"place": place})
    ]
    return "".join(chunk.content for chunk in chunks)

In [9]:
await where_cat_is_hiding.ainvoke({})

'The Fuzzy Fortress'

In [10]:
await get_items.ainvoke({"place": "on a table"})

'On a table, you might find a few common items such as:\n\n1. A book: A book is a written or printed work consisting of pages glued or sewn together along one side and bound in covers. It could be a novel, a textbook, or any other type of reading material.\n\n2. A coffee mug: A coffee mug is a cylindrical-shaped cup typically used for drinking hot beverages like coffee or tea. It usually has a handle for easy gripping and can be made of various materials such as ceramic, glass, or stainless steel.\n\n3. A vase with flowers: A vase is a decorative container, often made of glass or ceramic, used to hold flowers or other ornamental plants. It adds a touch of beauty and freshness to the surroundings, and the flowers inside can vary depending on personal preference or occasion.'

In [11]:
await get_items.ainvoke({"place": "The Mysterious Laundry Basket"})

"In 'The Mysterious Laundry Basket', you might find the following items:\n\n1. A tattered diary: This diary appears to be old and worn, with its pages yellowed and edges frayed. It contains cryptic writings, sketches, and mysterious symbols, hinting at hidden secrets and forgotten stories.\n\n2. A silver locket: This delicate locket is intricately designed with ornate patterns and engravings. It is locked shut, and its contents remain a mystery. The locket holds an air of nostalgia and whispers of forgotten memories.\n\n3. A faded map: This map seems to have been through many hands and countless adventures. Its edges are worn, and some areas are faded, making it difficult to decipher. It leads to an unknown destination, promising untold treasures or perhaps a hidden realm waiting to be discovered."

## Initialize the agent

Here, we'll initialize an OpenAI tools agent.

In [7]:
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-tools-agent")
# print(prompt.messages) -- to see the prompt
tools = [get_items, where_cat_is_hiding]
agent = agents.create_openai_tools_agent(
    model.with_config({"tags": ["agent_llm"]}), tools, prompt
)
agent_executor = agents.AgentExecutor(agent=agent, tools=tools)

## Stream Intermediate Steps

We'll use the `.astream` method of the `AgentExecutor` (the async counterpart of `.stream`) to stream the agent's intermediate steps.

The output alternates between actions and observations, and concludes with the final answer if the agent achieved its objective.

It'll look like this:

1. actions output
2. observations output
3. actions output
4. observations output

**... (continue until goal is reached) ...**

Then, if the final goal is reached, the agent will output the **final answer**.


The contents of these outputs are summarized here:

| Output             | Contents                                                                                          |
|----------------------|------------------------------------------------------------------------------------------------------|
| **Actions**   |  <ul> <li> `actions` `AgentAction` or a subclass </li><li> `messages` chat messages corresponding to action invocation </li></ul> |
| **Observations** | <ul> <li> `steps` History of what the agent did so far, including the current action and its observation </li><li> `messages` chat message with function invocation results (aka observations) </li></ul>|
| **Final answer** | <ul> <li> `output` the final answer text produced when the agent finishes </li><li> `messages` chat messages with the final output </li></ul>|

In [9]:
# Note: We use `pprint` to print only to depth 1, it makes it easier to see the output from a high level, before digging in.
import pprint

chunks = []

async for chunk in agent_executor.astream(
    {"input": "what items are located where the cat is hiding?"}
):
    chunks.append(chunk)
    print("------")
    pprint.pprint(chunk, depth=1)

------
{'actions': [...], 'messages': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'actions': [...], 'messages': [...]}
------
{'messages': [...], 'steps': [...]}
------
{'messages': [...],
 'output': 'The items located where the cat is hiding in "The Fuzzy Fortress" '
           'are:\n'
           '\n'
           '1. Fluffy Slippers\n'
           '2. Fuzzy Blankets\n'
           '3. Furry Pillows'}


### Using Messages

You can access the underlying `messages` from the outputs. Using messages can be nice when working with chat applications - because everything is a message!

In [13]:
chunks[0]["actions"]

[OpenAIToolAgentAction(tool='where_cat_is_hiding', tool_input={}, log='\nInvoking: `where_cat_is_hiding` with `{}`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_5OKVxVMSBDWzHsXACHozZqUo', 'function': {'arguments': '{}', 'name': 'where_cat_is_hiding'}, 'type': 'function'}]})], tool_call_id='call_5OKVxVMSBDWzHsXACHozZqUo')]

In [14]:
for chunk in chunks:
    print(chunk["messages"])

[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_5OKVxVMSBDWzHsXACHozZqUo', 'function': {'arguments': '{}', 'name': 'where_cat_is_hiding'}, 'type': 'function'}]})]
[FunctionMessage(content='The Fuzzy Fortress', name='where_cat_is_hiding')]
[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_sxYLixBFuB9mfIH3CiQlF2q5', 'function': {'arguments': '{\n  "place": "The Fuzzy Fortress"\n}', 'name': 'get_items'}, 'type': 'function'}]})]
[FunctionMessage(content="In 'The Fuzzy Fortress', you might find the following items:\n\n1. Fluffy Slippers: These cozy slippers are made of soft, plush material, providing ultimate comfort for your feet. They come in various vibrant colors and have adorable fuzzy animal faces on the front.\n\n2. Fuzzy Blankets: These luxurious blankets are crafted from high-quality fleece, ensuring warmth and snuggliness. They are available in different sizes and patterns, featuring cute fuzzy animals or 

In addition, they contain full logging information (`actions` and `steps`) which may be easier to process for rendering purposes.

### Using AgentAction/Observation

The outputs also contain richer structured information inside of `actions` and `steps`, which could be useful in some situations, but can also be harder to parse.

**Attention** `AgentFinish` is not available as part of the `streaming` method. If this is something you'd like to be added, please start a discussion on GitHub and explain the reasoning.

In [17]:
async for chunk in agent_executor.astream(
    {"input": "what items are located where the cat is hiding?"}
):
    # Agent Action
    if "actions" in chunk:
        for action in chunk["actions"]:
            print(f"Calling Tool: `{action.tool}` with input `{action.tool_input}`")
    # Observation
    elif "steps" in chunk:
        for step in chunk["steps"]:
            print(f"Tool Result: `{step.observation}`")
    # Final result
    elif "output" in chunk:
        print(f'Final Output: {chunk["output"]}')
    else:
        raise ValueError()
    print("---")

Calling Tool: `where_cat_is_hiding` with input `{}`
---
Tool Result: `The Cozy Cavern`
---
Calling Tool: `get_items` with input `{'place': 'Cozy Cavern'}`
---
Tool Result: `In the "Cozy Cavern," you might find:

1. Plush velvet cushions: Soft and luxurious, these cushions are perfect for sinking into and getting comfortable. They come in various colors, providing a cozy and inviting atmosphere in the cavern.
2. Rustic fireplace: A charming fireplace made of stone or brick, with a crackling fire warming the space. It adds both warmth and a rustic ambiance to the cavern, making it an ideal spot for relaxation and gathering.
3. Twinkling fairy lights: Delicate and enchanting, these fairy lights are strung across the cavern's ceiling, creating a magical and whimsical atmosphere. Their soft glow adds a touch of warmth and beauty to the space, making it feel even cozier.`
---
Final Output: The items located where the cat is hiding in the Cozy Cavern are:

1. Plush velvet cushions
2. Rustic f

## Streaming Tokens & More

For some applications, you may want to stream individual LLM tokens, surface information about tool execution, or output custom messages before / after tool executions.

There are different ways in which you might be able to achieve token streaming:

1. `astream_events`: **beta** API, introduced in recent versions of LangChain. This is the **recommended** approach.
2. [astream_log](https://python.langchain.com/docs/expression_language/interface#async-stream-intermediate-steps) API: Produces a granular log of all events that occur during execution. The log format is based on the [JSONPatch](https://jsonpatch.com/) standard. It's granular, but requires some effort to parse.
3. `callbacks`: This can be useful if you're on older versions of LangChain and cannot upgrade. This is **NOT** recommended, as for most applications you'll need to set up a queue and send the callbacks to another worker (i.e., there's hidden complexity!). `astream_events` does this under the hood!
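
To make the "hidden complexity" of the callbacks approach concrete, here is a minimal sketch of the queue hand-off it typically requires, written with `asyncio` only. `TokenQueueHandler`, `fake_llm`, `stream_tokens`, and `collect` are hypothetical names made up for this illustration; the handler is a plain class standing in for a callback handler, not a real LangChain `AsyncCallbackHandler` subclass, and no model is called.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class TokenQueueHandler:
    """Hypothetical stand-in for a callback handler; NOT a LangChain class."""

    def __init__(self) -> None:
        # None is used as a sentinel meaning "the run has finished".
        self.queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()

    async def on_llm_new_token(self, token: str) -> None:
        # Called by the producer for every new token.
        await self.queue.put(token)

    async def on_llm_end(self) -> None:
        # Called once by the producer when generation is complete.
        await self.queue.put(None)


async def fake_llm(handler: TokenQueueHandler) -> None:
    """Pretend model run that reports tokens through the handler."""
    for token in ["The ", "Cozy ", "Cubby"]:
        await handler.on_llm_new_token(token)
    await handler.on_llm_end()


async def stream_tokens() -> AsyncIterator[str]:
    """Run the producer in the background and yield tokens as they arrive."""
    handler = TokenQueueHandler()
    producer = asyncio.create_task(fake_llm(handler))
    while True:
        token = await handler.queue.get()
        if token is None:
            break
        yield token
    await producer  # surface any exception raised by the producer


async def collect() -> List[str]:
    return [token async for token in stream_tokens()]


# In a notebook, use `tokens = await collect()` instead of asyncio.run().
tokens = asyncio.run(collect())
print("".join(tokens))
```

Even in this toy version you need a sentinel, a background task, and error propagation, which is the bookkeeping that `astream_events` is described above as handling for you.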

**ATTENTION** 
* Make sure that you set the LLM to `streaming=True`
* Use async throughout (we will try to lift that restriction a bit, but for now if something isn't working use async!)

### Event Streaming

**NEW** This is a new API that only works with recent versions of langchain-core!

In this notebook, we'll see how to use `astream_events` to stream **token by token** from LLM calls used within the tools invoked by the agent. 

We will **only** stream tokens from LLMs used within tools and from no other LLMs (just to show that we can)! 

Feel free to adapt this example to the needs of your application.

Our agent will use the OpenAI tools API for tool invocation, and we'll provide the agent with two tools:

1. `where_cat_is_hiding`: A tool that uses an LLM to tell us where the cat is hiding
2. `get_items`: A tool that uses an LLM to determine which items are in a given place

⚠️ Beta ⚠️

Event Streaming is a **beta** API, and may change a bit based on feedback.

Keep in mind the following constraints (repeated in tools section):

* Streaming only works properly if using `async`
* Propagate callbacks if defining custom functions / runnables
* If creating a tool that uses an LLM, make sure to use `.astream()` on the LLM rather than `.ainvoke` to ask the LLM to stream tokens.

#### Event Reference


Here is a reference table that shows some events that might be emitted by the various Runnable objects.
Definitions for some of the Runnables are included after the table.

⚠️ When streaming, the inputs for the runnable will not be available until the input stream has been entirely consumed. This means that the inputs will be available on the corresponding `end` event rather than on the `start` event.


| event                | name             | chunk                           | input                                         | output                                          |
|----------------------|------------------|---------------------------------|-----------------------------------------------|-------------------------------------------------|
| on_chat_model_start  | [model name]     |                                 | {"messages": [[SystemMessage, HumanMessage]]} |                                                 |
| on_chat_model_stream | [model name]     | AIMessageChunk(content="hello") |                                               |                                                 |
| on_chat_model_end    | [model name]     |                                 | {"messages": [[SystemMessage, HumanMessage]]} | {"generations": [...], "llm_output": None, ...} |
| on_llm_start         | [model name]     |                                 | {'input': 'hello'}                            |                                                 |
| on_llm_stream        | [model name]     | 'Hello'                         |                                               |                                                 |
| on_llm_end           | [model name]     |                                 |                                               | 'Hello human!'                                  |
| on_chain_start       | format_docs      |                                 |                                               |                                                 |
| on_chain_stream      | format_docs      | "hello world!, goodbye world!"  |                                               |                                                 |
| on_chain_end         | format_docs      |                                 | [Document(...)]                               | "hello world!, goodbye world!"                  |
| on_tool_start        | some_tool        |                                 | {"x": 1, "y": "2"}                            |                                                 |
| on_tool_stream       | some_tool        | {"x": 1, "y": "2"}              |                                               |                                                 |
| on_tool_end          | some_tool        |                                 |                                               | {"x": 1, "y": "2"}                              |
| on_retriever_start   | [retriever name] |                                 | {"query": "hello"}                            |                                                 |
| on_retriever_chunk   | [retriever name] | {documents: [...]}              |                                               |                                                 |
| on_retriever_end     | [retriever name] |                                 | {"query": "hello"}                            | {documents: [...]}                              |
| on_prompt_start      | [template_name]  |                                 | {"question": "hello"}                         |                                                 |
| on_prompt_end        | [template_name]  |                                 | {"question": "hello"}                         | ChatPromptValue(messages: [SystemMessage, ...]) |


Here are declarations associated with the events shown above:

`format_docs`:

```python
from langchain_core.runnables import RunnableLambda


def format_docs(docs: List[Document]) -> str:
    '''Format the docs.'''
    return ", ".join([doc.page_content for doc in docs])

format_docs = RunnableLambda(format_docs)
```

`some_tool`:

```python
@tool
def some_tool(x: int, y: str) -> dict:
    '''Some_tool.'''
    return {"x": x, "y": y}
```

`prompt`:

```python
template = ChatPromptTemplate.from_messages(
    [("system", "You are Cat Agent 007"), ("human", "{question}")]
).with_config({"run_name": "my_template", "tags": ["my_template"]})
```
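
As a rough illustration of how events like those in the table could be filtered by tag, the sketch below keeps only the text streamed by chat models carrying the `hiding_spot` or `get_items` tags used in the tools above. It runs on hand-written sample dictionaries, not real captured output. The `event` and `name` keys and the `chunk` payload mirror the table columns; the `tags` and `data` keys, the `FakeChunk` class, and the `tokens_from_tagged_models` helper are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Set


@dataclass
class FakeChunk:
    """Hypothetical stand-in for AIMessageChunk; only `.content` is modelled."""

    content: str


def tokens_from_tagged_models(
    events: Iterable[Dict[str, Any]], wanted_tags: Set[str]
) -> Iterator[str]:
    """Yield streamed text from chat models carrying at least one wanted tag."""
    for event in events:
        if event["event"] != "on_chat_model_stream":
            continue  # skip start/end events and non-model events
        if not wanted_tags.intersection(event.get("tags", [])):
            continue  # skip models we are not interested in (e.g. the agent LLM)
        yield event["data"]["chunk"].content


# Hand-written sample events (assumed shape, not real captured output).
sample_events: List[Dict[str, Any]] = [
    {
        "event": "on_chat_model_stream",
        "name": "ChatOpenAI",
        "tags": ["agent_llm"],
        "data": {"chunk": FakeChunk("(agent token, filtered out)")},
    },
    {
        "event": "on_tool_start",
        "name": "where_cat_is_hiding",
        "tags": [],
        "data": {"input": {}},
    },
    {
        "event": "on_chat_model_stream",
        "name": "where_cat_at",
        "tags": ["hiding_spot"],
        "data": {"chunk": FakeChunk("The ")},
    },
    {
        "event": "on_chat_model_stream",
        "name": "where_cat_at",
        "tags": ["hiding_spot"],
        "data": {"chunk": FakeChunk("Cozy Cubby")},
    },
]

text = "".join(tokens_from_tagged_models(sample_events, {"hiding_spot", "get_items"}))
print(text)
```

With a real agent, the events would presumably come from an `async for` loop over `agent_executor.astream_events(...)`; check the exact arguments and event shape against the langchain-core version you have installed before relying on this filter.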



In [18]:
record_log_patches = [
    record_log_patch
    async for record_log_patch in agent_executor.astream_log(
        {"input": "what items are located where the cat is hiding?"},
        include_types=["tool", "llm"],
    )
]

The `astream_log` call above yields `RunLogPatch` objects. Below are a few sample JSON patch operations taken from them. These operations contain extremely granular information about all events that occurred during agent streaming.

In [19]:
record_log_patches[6]

RunLogPatch({'op': 'add',
  'path': '/streamed_output/-',
  'value': {'actions': [OpenAIToolAgentAction(tool='where_cat_is_hiding', tool_input={}, log='\nInvoking: `where_cat_is_hiding` with `{}`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_7j1dxpnUzJ1OUQrGaJPqtZes', 'function': {'arguments': '{}', 'name': 'where_cat_is_hiding'}, 'type': 'function'}]})], tool_call_id='call_7j1dxpnUzJ1OUQrGaJPqtZes')],
            'messages': [AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_7j1dxpnUzJ1OUQrGaJPqtZes', 'function': {'arguments': '{}', 'name': 'where_cat_is_hiding'}, 'type': 'function'}]})]}},
 {'op': 'replace',
  'path': '/final_output',
  'value': {'actions': [OpenAIToolAgentAction(tool='where_cat_is_hiding', tool_input={}, log='\nInvoking: `where_cat_is_hiding` with `{}`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_7j1dxpnUzJ1OU

In [20]:
record_log_patches[10]

RunLogPatch({'op': 'add',
  'path': '/logs/ChatOpenAI:2/streamed_output_str/-',
  'value': 'The'},
 {'op': 'add',
  'path': '/logs/ChatOpenAI:2/streamed_output/-',
  'value': AIMessageChunk(content='The')})
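
To show what these operations mean, here is a minimal sketch of folding JSONPatch-style `add` and `replace` operations into a plain state dictionary. `apply_ops` is a hypothetical helper written for this illustration (it is not a LangChain function), it covers only the two operation types shown in the samples, and the patches are hand-written rather than captured from a real run.

```python
from typing import Any, Dict, List


def apply_ops(state: Dict[str, Any], ops: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply JSONPatch-style `add`/`replace` ops to `state` in place."""
    for op in ops:
        if op["op"] not in ("add", "replace"):
            raise NotImplementedError(op["op"])
        # Split the JSON Pointer path and undo its escaping rules.
        keys = [
            key.replace("~1", "/").replace("~0", "~")
            for key in op["path"].split("/")[1:]
        ]
        # Walk down to the container that holds the final key.
        target: Any = state
        for key in keys[:-1]:
            target = target[int(key)] if isinstance(target, list) else target[key]
        last = keys[-1]
        if isinstance(target, list):
            if last == "-":  # "-" means "append to the end of the list"
                target.append(op["value"])
            elif op["op"] == "add":
                target.insert(int(last), op["value"])
            else:
                target[int(last)] = op["value"]
        else:
            target[last] = op["value"]
    return state


# Hand-written patches mimicking the structure of the samples above.
state: Dict[str, Any] = {
    "logs": {"ChatOpenAI:2": {"streamed_output_str": [], "final_output": None}}
}
patches = [
    [{"op": "add", "path": "/logs/ChatOpenAI:2/streamed_output_str/-", "value": "The"}],
    [{"op": "add", "path": "/logs/ChatOpenAI:2/streamed_output_str/-", "value": " Cozy"}],
    [{"op": "add", "path": "/logs/ChatOpenAI:2/streamed_output_str/-", "value": " Cubby"}],
    [{"op": "replace", "path": "/logs/ChatOpenAI:2/final_output", "value": "The Cozy Cubby"}],
]
for ops in patches:
    apply_ops(state, ops)

streamed = "".join(state["logs"]["ChatOpenAI:2"]["streamed_output_str"])
print(streamed)
```

Each patch is a small delta, so a consumer can either rebuild the full run state like this or react to individual operations as they arrive.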

### Parsing astream_log

As of 2024-01-12, LangChain does not yet provide utility code for working with `astream_log`, but such helpers are expected to be introduced in the near future.

In the meantime, to make it easier to use `astream_log`, we've included sample parsing code below.

If you're reading this a month or more after that date, please check the LangChain documentation about `astream_log` to see if stable helpers have been added to the library.

In [21]:
from typing import Any, AsyncIterator, Dict, List, Literal, Optional, Union

from langchain_core.tracers import RunLogPatch
from typing_extensions import TypedDict


class StartEvent(TypedDict):
    """Represents a start event."""

    event_type: Literal["start"]
    start_time: str  # timestamp string, e.g. '2024-01-12T16:21:21.480+00:00'
    type: Optional[str]  # e.g., llm, tool
    name: Optional[str]  # the name of the llm or tool if assigned
    tags: Optional[List[str]]
    metadata: Optional[Dict[str, Any]]


class EndEvent(TypedDict):
    """Represents an end event."""

    event_type: Literal["end"]
    start_time: str
    type: Optional[str]
    name: Optional[str]
    tags: Optional[List[str]]
    metadata: Optional[Dict[str, Any]]


class StreamEvent(TypedDict):
    """Represents a streaming event."""

    event_type: Literal["stream"]
    stream_type: Literal["str", "original"]
    value: Any  # the streamed string ("str") or the original chunk ("original")
    start_time: str
    type: Optional[str]
    name: Optional[str]
    tags: Optional[List[str]]
    metadata: Optional[Dict[str, Any]]


Event = Union[StartEvent, EndEvent, StreamEvent]


async def as_event_stream(
    run_log_patches: AsyncIterator[RunLogPatch]
) -> AsyncIterator[Event]:
    """Convert log patches into a stream of start, stream, and end events."""
    # Info keeps track of information like tags, metadata, type, name etc.
    info: Dict[str, Any] = {}
    async for run_log_patch in run_log_patches:
        for op in run_log_patch.ops:
            if op["op"] != "add":
                continue

            path = op["path"]

            if not path.startswith("/logs/"):
                continue

            path_in_logs = path[len("/logs/") :]

            components = path_in_logs.split("/")

            if len(components) == 1:
                # It's a start event.
                name = components[0]
                value = op["value"]
                info[name] = {
                    "start_time": value["start_time"],
                    "type": value.get("type"),
                    "name": value.get("name"),
                    "tags": value.get("tags"),
                    "metadata": value.get("metadata"),
                }

                yield {
                    "event_type": "start",
                    **info[name],  # TODO(Eugene): We should make a copy here
                }
                continue
            elif len(components) == 2:
                name, kind = components
                value = op["value"]
                if kind == "final_output":
                    info["value"] = value
                    continue
                elif kind == "end_time":
                    yield {
                        "event_type": "end",
                        **info[name],  # TODO(Eugene): We should make a copy here
                    }
                    continue
                else:
                    raise ValueError(op)
            elif len(components) == 3:
                name, kind, remainder = components
                if remainder != "-":
                    raise AssertionError(components)
                if kind == "streamed_output_str":
                    value = op["value"]
                    yield {
                        "event_type": "stream",
                        "stream_type": "str",
                        "value": value,
                        **info[name],  # TODO(Eugene): We should make a copy here
                    }
                    continue
                elif kind == "streamed_output":
                    value = op["value"]
                    yield {
                        "event_type": "stream",
                        "stream_type": "original",
                        "value": value,
                        **info[name],  # TODO(Eugene): We should make a copy here
                    }
                    continue
                else:
                    raise NotImplementedError(
                        f"Parsing for op: `{op}` not implemented."
                    )
            else:
                raise NotImplementedError(f"Parsing for op: `{op}` not implemented.")

In [22]:
# Let's materialize all events to make it convenient to work with them.
events = [
    event
    async for event in as_event_stream(
        agent_executor.astream_log(
            {"input": "what items are located where the cat is hiding?"},
            include_types=["tool", "llm"],
        )
    )
]
events[:3]

[{'event_type': 'start',
  'start_time': '2024-01-12T16:21:21.480+00:00',
  'type': 'llm',
  'name': 'ChatOpenAI',
  'tags': ['seq:step:3', 'agent_llm'],
  'metadata': {}},
 {'event_type': 'stream',
  'stream_type': 'str',
  'value': '',
  'start_time': '2024-01-12T16:21:21.480+00:00',
  'type': 'llm',
  'name': 'ChatOpenAI',
  'tags': ['seq:step:3', 'agent_llm'],
  'metadata': {}},
 {'event_type': 'stream',
  'stream_type': 'original',
  'value': AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_iwOW9rEwf6OtINFRCtw9hrd3', 'function': {'arguments': '', 'name': 'where_cat_is_hiding'}, 'type': 'function'}]}),
  'start_time': '2024-01-12T16:21:21.480+00:00',
  'type': 'llm',
  'name': 'ChatOpenAI',
  'tags': ['seq:step:3', 'agent_llm'],
  'metadata': {}}]

Let's use the parsed `astream_log` events to print a running account of what the agent and the tools are doing!

In [23]:
from langchain_core.messages import AIMessageChunk

async for event in as_event_stream(
    agent_executor.astream_log(
        {"input": "what items are located where the cat is hiding?"},
        include_types=["tool", "llm"],
    )
):
    function_message = None
    if event["event_type"] == "start":
        tags = ", ".join(sorted(event["tags"]))
        print(f"Start >> {event['name']} ({event['type']}), tags: {tags or []}")

    if event["event_type"] == "stream":
        value = event["value"]

        if event["stream_type"] != "original":
            continue

        if not value:
            continue
        if isinstance(value, AIMessageChunk):
            # print(event['time'])
            print(value.content, end="")
        else:
            raise NotImplementedError(type(value))

    if event["event_type"] == "end":
        print()
        print(f"End >> {event['name']} [{event['type']}]")
        print()
        function_message = None

Start >> ChatOpenAI (llm), tags: agent_llm, seq:step:3

End >> ChatOpenAI [llm]

Start >> where_cat_is_hiding (tool), tags: []
Start >> ChatOpenAI (llm), tags: hiding_spot
The Cozy Cubby
End >> ChatOpenAI [llm]


End >> where_cat_is_hiding [tool]

Start >> ChatOpenAI (llm), tags: agent_llm, seq:step:3

End >> ChatOpenAI [llm]

Start >> get_items (tool), tags: []
Start >> Get Items LLM (llm), tags: get_items, seq:step:2
At 'The Cozy Cubby', you might find the following items:

1. Plush Throw Blankets: Soft and luxurious, these blankets are perfect for snuggling up on the couch or adding a cozy touch to your bed. They come in various colors and patterns, providing warmth and comfort during chilly evenings.

2. Aromatherapy Candles: These candles are designed to create a soothing ambiance and fill the air with delightful scents. Made from natural ingredients, they offer a calming and relaxing atmosphere, perfect for unwinding after a long day.

3. Vintage Teacups: Delicate and charming, t