# Streaming

All `Runnable` objects implement a synchronous method called `stream` and an asynchronous variant called `astream`.

These methods are designed to stream the final output in chunks, yielding each chunk as soon as it is available.
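
To make the sync/async pairing concrete, here is a minimal pure-Python sketch. `ToyRunnable` and `collect` are hypothetical names invented for illustration and are not LangChain classes; the sketch only mirrors the shape of `stream` (a generator) and `astream` (an async generator).

```python
import asyncio
from typing import AsyncIterator, Iterator, List


class ToyRunnable:
    """Hypothetical stand-in for a Runnable; not part of LangChain."""

    def stream(self, text: str) -> Iterator[str]:
        # Yield each word as soon as it is "produced".
        for word in text.split():
            yield word

    async def astream(self, text: str) -> AsyncIterator[str]:
        # Same chunks, but control returns to the event loop between them.
        for word in text.split():
            await asyncio.sleep(0)
            yield word


async def collect(runnable: ToyRunnable, text: str) -> List[str]:
    # Consume the async stream chunk by chunk.
    return [chunk async for chunk in runnable.astream(text)]


toy = ToyRunnable()
sync_chunks = list(toy.stream("streaming yields chunks early"))
async_chunks = asyncio.run(collect(toy, "streaming yields chunks early"))
```

Both variants produce the same chunks; the async one simply lets other tasks run while waiting for the next chunk.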

Streaming is only possible if every step in the program knows how to process an input stream, that is, consume the input one chunk at a time and yield a corresponding output chunk.
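
The difference between a step that can stream and one that cannot can be sketched with plain generators. All three functions below (`fake_llm`, `upper_case`, `reverse_all`) are hypothetical and stand in for pipeline steps; none of them is a LangChain API.

```python
from typing import Iterable, Iterator, List


def fake_llm(prompt: str) -> Iterator[str]:
    # Hypothetical token source standing in for a model; ignores the prompt.
    for token in ["El", "on", " Musk", " is", " a", " billionaire", " entrepreneur"]:
        yield token


def upper_case(chunks: Iterable[str]) -> Iterator[str]:
    # Streaming-friendly: emits one output chunk per input chunk.
    for chunk in chunks:
        yield chunk.upper()


def reverse_all(chunks: Iterable[str]) -> Iterator[str]:
    # Not streaming-friendly: must read the whole input before emitting anything.
    yield "".join(chunks)[::-1]


streamed: List[str] = list(upper_case(fake_llm("who's elon musk?")))
blocked: List[str] = list(reverse_all(fake_llm("who's elon musk?")))

print(len(streamed))  # 7 chunks, available one at a time
print(len(blocked))   # 1 chunk, available only at the very end
```

A single step like `reverse_all` is enough to turn the whole pipeline into an all-at-once computation, because nothing downstream receives data until it finishes.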

The complexity of this processing can vary, from straightforward tasks like emitting tokens produced by an LLM, to more challenging ones like streaming parts of JSON results before the entire JSON is complete.
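
As a rough illustration of the harder case, the sketch below emits a parsed snapshot whenever the JSON received so far can be completed into a valid document. It is a best-effort toy, not how LangChain implements partial JSON parsing; `close_partial_json` and `stream_json` are hypothetical helpers, and partial scalars (such as a number cut in half) will surface as intermediate values.

```python
import json
from typing import Any, Iterable, Iterator, List


def close_partial_json(text: str) -> str:
    """Append the closers needed to turn a JSON prefix into a candidate document."""
    stack: List[str] = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    return text + ('"' if in_string else "") + "".join(reversed(stack))


def stream_json(chunks: Iterable[str]) -> Iterator[Any]:
    buffer = ""
    last = None
    for chunk in chunks:
        buffer += chunk
        try:
            parsed = json.loads(close_partial_json(buffer))
        except json.JSONDecodeError:
            continue  # prefix not parseable yet, e.g. it ends inside a key
        if parsed != last:
            last = parsed
            yield parsed


pieces = ['{"name": "El', 'on Musk", "comp', 'anies": ["Tesla"', ', "SpaceX"]}']
snapshots = list(stream_json(pieces))
for snapshot in snapshots:
    print(snapshot)
```

The second piece ends in the middle of a key, so no snapshot is emitted for it; the consumer sees three progressively more complete objects instead of waiting for the final closing brace.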

In [1]:
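import os
import sys
from typing import List, Tuple

from dotenv import load_dotenv
from langchain_core.messages import AIMessageChunk

# %load_ext autoreload
# %autoreload 2

# Make the parent of the notebook's working directory importable
# so that the project-local `srcs` package resolves.
parent_dir = os.path.dirname(os.getcwd())
sys.path.append(parent_dir)

# Project-local wrapper around the chat model (not part of LangChain).
from srcs.llm import LLM

# Load environment variables (e.g. API keys) from a .env file.
load_dotenv()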

True

In [3]:
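model = LLM()
question = "who's elon musk?"
# `model.stream` is the project-local wrapper's method. Judging by the output,
# it prints the answer as it arrives and returns (full_text, list_of_chunks).
r: Tuple[str, List[AIMessageChunk]] = model.stream(question)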

Elon Musk is a billionaire entrepreneur and CEO of multiple companies, including Tesla Inc., SpaceX, Neuralink, and The Boring Company. He is known for his work in advancing technology and innovation in industries such as electric vehicles, space exploration, and artificial intelligence. Musk is also involved in various philanthropic efforts and has a significant influence on the tech industry.

In [4]:
type(r)

tuple

In [5]:
print(r[0])

Elon Musk is a billionaire entrepreneur and CEO of multiple companies, including Tesla Inc., SpaceX, Neuralink, and The Boring Company. He is known for his work in advancing technology and innovation in industries such as electric vehicles, space exploration, and artificial intelligence. Musk is also involved in various philanthropic efforts and has a significant influence on the tech industry.


In [7]:
# Each AIMessageChunk exposes, among other fields:
#   content: str
#   additional_kwargs: dict
#   response_metadata: dict
#   id: str
print(r[1][:10])

[AIMessageChunk(content='', additional_kwargs={}, response_metadata={}, id='run-edeb4d0d-5a58-49dc-a0aa-121e03c0f1dc'), AIMessageChunk(content='El', additional_kwargs={}, response_metadata={}, id='run-edeb4d0d-5a58-49dc-a0aa-121e03c0f1dc'), AIMessageChunk(content='on', additional_kwargs={}, response_metadata={}, id='run-edeb4d0d-5a58-49dc-a0aa-121e03c0f1dc'), AIMessageChunk(content=' Musk', additional_kwargs={}, response_metadata={}, id='run-edeb4d0d-5a58-49dc-a0aa-121e03c0f1dc'), AIMessageChunk(content=' is', additional_kwargs={}, response_metadata={}, id='run-edeb4d0d-5a58-49dc-a0aa-121e03c0f1dc'), AIMessageChunk(content=' a', additional_kwargs={}, response_metadata={}, id='run-edeb4d0d-5a58-49dc-a0aa-121e03c0f1dc'), AIMessageChunk(content=' billionaire', additional_kwargs={}, response_metadata={}, id='run-edeb4d0d-5a58-49dc-a0aa-121e03c0f1dc'), AIMessageChunk(content=' entrepreneur', additional_kwargs={}, response_metadata={}, id='run-edeb4d0d-5a58-49dc-a0aa-121e03c0f1dc'), AIMessag

Message chunks are additive by design: adding them together gives the state of the response so far.

In [17]:
result = r[1][0] + r[1][1] + r[1][2] + r[1][3]

print(result)

content='Elon Musk' additional_kwargs={} response_metadata={} id='run-edeb4d0d-5a58-49dc-a0aa-121e03c0f1dc'
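
The same idea extends to a running total over every chunk. The sketch below uses a hypothetical `TextChunk` dataclass as a simplified stand-in for `AIMessageChunk` (it models only `content` and `id`, and its merge rule is an assumption made for illustration) so that it runs without a model or API key.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List


@dataclass(frozen=True)
class TextChunk:
    """Hypothetical, simplified stand-in for AIMessageChunk."""

    content: str
    id: str = "run-demo"

    def __add__(self, other: "TextChunk") -> "TextChunk":
        # Adding two chunks concatenates their content and keeps the first id.
        return TextChunk(content=self.content + other.content, id=self.id)


chunks: List[TextChunk] = [TextChunk(text) for text in ["", "El", "on", " Musk"]]

# accumulate() applies `+` cumulatively, giving the response state after each chunk.
states = [state.content for state in accumulate(chunks)]
final = states[-1]

print(states)  # ['', 'El', 'Elon', 'Elon Musk']
```

With the real chunks from the cell above, the same pattern would apply to `r[1]`, since the chunks there support `+` as shown.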
