LangChain's streaming capabilities give users real-time feedback from applications built on Large Language Models (LLMs). Because a full LLM response can take several seconds to generate, showing partial results as they arrive is key to keeping an application responsive.

Here's a breakdown of the key concepts:

**What to Stream:**

*   **LLM Outputs:** Streaming the output of the LLM itself is the most common use case, allowing users to see the text being generated incrementally.
*   **Pipeline or Workflow Progress:** Streaming updates about the progress of workflows or pipelines provides users with a sense of the application's execution, including:
    *   **LangGraph Workflows:** This involves tracking changes to the graph state as individual nodes request updates.
    *   **LCEL Pipelines:** This involves capturing progress from individual sub-runnables as they execute.
*   **Custom Data:** Custom data can be streamed from specific steps within a workflow (whether a tool or a LangGraph node), providing more granular insights into the execution of the process.

**Streaming APIs:**

*   LangChain offers two main APIs for streaming output in real-time, supported by components that implement the Runnable Interface.
*   **`stream()` and `astream()`:**
    *   These methods are used to stream outputs from individual Runnables (e.g., a chat model) or any workflow created with LangGraph.
    *   `stream()` returns an iterator that yields chunks of output synchronously, while `astream()` is the asynchronous version for non-blocking workflows.
    *   The type of chunk yielded depends on the component being streamed (e.g., `AIMessageChunk` for chat models).
    *   When using these with LangGraph, you can control the type of output streamed using modes like "values", "updates", "debug", "messages", and "custom".
    *   With LCEL, `stream()` and `astream()` will stream the output of the last step in the chain.
*   **`astream_events`:**
    *   This asynchronous API provides access to custom events and intermediate outputs from LLM applications built entirely with LCEL.
    *   It is not usually needed with LangGraph, where `stream` and `astream` provide comprehensive capabilities.
    *   `astream_events` returns an async iterator that yields various types of events, allowing you to filter and process them.

**Writing Custom Data to the Stream:**

*   **LangGraph:** Use the `StreamWriter` to write custom data surfaced through `stream` and `astream`. This feature is not available for pure LCEL workflows.
*   **LCEL:** Use `dispatch_custom_event` or `adispatch_custom_event` to write custom data surfaced through the `astream_events` API.

**Auto-Streaming:**

*   LangChain can automatically enable streaming mode in certain cases, even when you’re not explicitly calling the streaming methods.
*   When you call `invoke` (or `ainvoke`) on a chat model, LangChain switches to streaming mode if it detects that the overall application is being streamed, for example because the enclosing application was started with `stream` or `astream`.

**Async Programming:**

*   LangChain offers both synchronous and asynchronous versions of its methods, with async methods typically prefixed with "a" (e.g., `ainvoke`, `astream`).
*   When writing async code, use async methods consistently for non-blocking behaviour and optimal performance.

**Important Notes:**

*   When processing chunks from a stream, ensure that the processing is efficient to avoid pausing the upstream component or causing timeouts.
*   The legacy `astream_log` API is not recommended for new projects.
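
One way to keep chunk processing from stalling the stream is to hand each chunk to a background worker through a queue. The sketch below uses only the standard library; `fake_token_stream` is a hypothetical stand-in for something like `chain.astream(...)`, and `slow_sink` simulates slow per-chunk work.

```python
import asyncio


async def fake_token_stream():
    """Stand-in for a real stream: yields a few text chunks."""
    for token in ["Why", " did", " the", " sun", "?"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def slow_sink(queue: asyncio.Queue, out: list) -> None:
    """Process chunks at its own pace (e.g. writing to a database)."""
    while True:
        chunk = await queue.get()
        if chunk is None:  # sentinel: the stream has finished
            break
        await asyncio.sleep(0.01)  # simulated slow work
        out.append(chunk)


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    worker = asyncio.create_task(slow_sink(queue, processed))
    async for chunk in fake_token_stream():
        # Cheap hand-off: the read loop never waits on the slow sink.
        queue.put_nowait(chunk)
    queue.put_nowait(None)
    await worker
    return processed


processed = asyncio.run(main())
print("".join(processed))
```

The trade-off is memory: an unbounded queue can grow if the sink is much slower than the stream, so a bounded queue may be preferable in practice.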


```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

# Requires a local Ollama server with the llama3.1 model available.
model = ChatOllama(model="llama3.1")

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
parser = StrOutputParser()
chain = prompt | model | parser

# Top-level `async for` works in a notebook; in a script, wrap this loop
# in an `async def` and run it with `asyncio.run(...)`.
async for event in chain.astream_events({"topic": "sun"}, version="v2"):
    # Print the event type, the component that emitted it, and its payload.
    # To see only model tokens, filter on
    # event["event"] == "on_chat_model_stream".
    print(event["event"], event["name"], event["data"], flush=True)
```

on_chain_start RunnableSequence {'input': {'topic': 'sun'}}
on_prompt_start ChatPromptTemplate {'input': {'topic': 'sun'}}
on_prompt_end ChatPromptTemplate {'output': ChatPromptValue(messages=[HumanMessage(content='tell me a joke about sun', additional_kwargs={}, response_metadata={})]), 'input': {'topic': 'sun'}}
on_chat_model_start ChatOllama {'input': {'messages': [[HumanMessage(content='tell me a joke about sun', additional_kwargs={}, response_metadata={})]]}}
on_chat_model_stream ChatOllama {'chunk': AIMessageChunk(content='Why', additional_kwargs={}, response_metadata={}, id='run-f7706db7-756a-47d0-b98a-78e8ecb7afaf')}
on_parser_start StrOutputParser {}
on_parser_stream StrOutputParser {'chunk': 'Why'}
on_chain_stream RunnableSequence {'chunk': 'Why'}
on_chat_model_stream ChatOllama {'chunk': AIMessageChunk(content=' did', additional_kwargs={}, response_metadata={}, id='run-f7706db7-756a-47d0-b98a-78e8ecb7afaf')}
on_parser_stream StrOutputParser {'chunk': ' did'}
on_chain_stream 