# Batching and Streaming with LangChain LCEL

This notebook demonstrates **batching** and **streaming** capabilities
in LangChain when using the **LangChain Expression Language (LCEL)**.

The focus is on:
- Building an LCEL chain using prompt templates and chat models
- Comparing single invocation vs batch invocation
- Measuring execution time differences
- Streaming model output token-by-token

These techniques are essential for building scalable
and responsive LLM-powered applications.


In [None]:
import getpass
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

## Prompt Template Definition

A reusable `ChatPromptTemplate` is defined with dynamic placeholders
for both the pet type and breed.

This template will be reused across single, batch, and streaming calls.
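Conceptually, the template pairs a message role with a format string whose `{pet}` and `{breed}` slots are filled at call time. The plain-Python sketch below mimics that substitution with `str.format`. `DEMO_TEMPLATE` and `format_messages_demo` are hypothetical names used only for illustration and are not LangChain's implementation. The real `ChatPromptTemplate` exposes a `format_messages()` method for this purpose.

```python
# Hypothetical stand-in for the ChatPromptTemplate above: a (role, template)
# pair whose {pet} and {breed} placeholders are filled at call time.
DEMO_TEMPLATE = (
    "human",
    "I've recently adopted a {pet} which is a {breed}. "
    "Can you suggest several training tips?",
)


def format_messages_demo(**variables):
    """Return a one-message list with the placeholders substituted."""
    role, text = DEMO_TEMPLATE
    # str.format raises KeyError if a placeholder has no matching variable
    return [(role, text.format(**variables))]


# The same template is reused for different inputs
for pet, breed in [("cat", "Siamese"), ("dragon", "Night Fury")]:
    print(format_messages_demo(pet=pet, breed=breed))
```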


In [None]:
chat_template = ChatPromptTemplate.from_messages(
    [("human", "I've recently adopted a {pet} which is a {breed}. Can you suggest several training tips?")]
)

In [None]:
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

## Model Initialization

The chat model is initialized with `temperature=0` to reduce run-to-run
variation when comparing different invocation methods. This lowers randomness
but does not guarantee identical outputs.


In [None]:
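chat = ChatOpenAI(
    model="gpt-5-nano",
    temperature=0,
    # Extra request options passed through to OpenAI:
    # low output verbosity and medium reasoning effort
    model_kwargs={"text": {"verbosity": "low"}, "reasoning": {"effort": "medium"}},
)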



## LCEL Chain Construction

Using the LangChain Expression Language, the prompt template
and chat model are composed into a single executable chain.
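The `|` operator works because LCEL components implement Python's `__or__` hook, which returns a new runnable that feeds one step's output into the next. The toy `DemoStep` class below is a hypothetical, heavily simplified illustration of that idea. It has no batching, streaming, or async support, and it uses a fake model in place of `ChatOpenAI`. The `demo_` names are chosen so they do not overwrite `chat` or `chain` in the notebook.

```python
class DemoStep:
    """Hypothetical, simplified stand-in for an LCEL Runnable."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # a | b -> a new step that passes a's output into b
        return DemoStep(lambda value: other.invoke(self.invoke(value)))


demo_prompt = DemoStep(lambda d: f"I've recently adopted a {d['pet']} which is a {d['breed']}.")
demo_model = DemoStep(lambda text: f"[fake reply to] {text}")  # no API call

demo_chain = demo_prompt | demo_model
print(demo_chain.invoke({"pet": "cat", "breed": "Siamese"}))
# prints: [fake reply to] I've recently adopted a cat which is a Siamese.
```

Because `a | b | c` evaluates as `(a | b) | c`, longer pipelines compose the same way.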


In [None]:
chain = chat_template | chat

In [None]:
chain.invoke({"pet": "cat", "breed": "Siamese"})

## Batch Invocation

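Batching processes multiple inputs **concurrently**. LangChain's default
`batch()` implementation runs the individual calls on a thread pool. Each
input is still sent as a separate API request, but the waits overlap
instead of adding up.

Key characteristics:
- Concurrent execution of independent requests
- Total wall-clock time close to that of the slowest single call, not the sum
- Ideal for bulk processing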

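The timing gain comes from overlapping network waits, not from any per-request saving. The stdlib sketch below simulates two model calls with `time.sleep` and runs them first one after another, then on a thread pool, which roughly matches what the default `batch()` does. The `fake_call` function and its 0.2 s delay are assumptions standing in for real API latency. In LangChain, the pool size can be capped with `config={"max_concurrency": ...}`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_call(inputs, delay=0.2):
    """Hypothetical model call: sleeps to mimic network latency, no API used."""
    time.sleep(delay)
    return f"Training tips for a {inputs['breed']} {inputs['pet']}"


demo_inputs = [
    {"pet": "cat", "breed": "Siamese"},
    {"pet": "dragon", "breed": "Night Fury"},
]

# One call after another: total time is roughly the sum of the delays.
start = time.perf_counter()
sequential_results = [fake_call(x) for x in demo_inputs]
sequential_time = time.perf_counter() - start

# Calls overlap on a thread pool: total time is roughly the slowest single delay.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(demo_inputs)) as pool:
    batched_results = list(pool.map(fake_call, demo_inputs))
batched_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, batched: {batched_time:.2f}s")
```

`pool.map` returns results in input order. Likewise, `batch()` returns one output per input in the same order.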

In [None]:
%%time
chain.batch([{'pet': 'cat', 'breed': 'Siamese'}, {'pet': 'dragon', 'breed': 'Night Fury'}])

In [None]:
%%time
chain.invoke({'pet': 'cat', 'breed': 'Siamese'})

In [None]:
%%time
chain.invoke({'pet': 'dragon', 'breed': 'Night Fury'})

## Streaming Responses

Streaming allows tokens to be returned incrementally
as they are generated by the model.

This is useful for:
- Interactive applications
- Chat interfaces
- Reducing perceived latency
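Streaming does not make generation faster overall. Instead, the first words can appear as soon as they are produced. The sketch below uses a hypothetical generator, `fake_token_stream`, with an assumed fixed per-chunk delay, to compare the time to the first chunk with the time to the full response.

```python
import time


def fake_token_stream(text, delay=0.05):
    """Hypothetical model stream: yields one word at a time after a delay."""
    for word in text.split():
        time.sleep(delay)
        yield word + " "


demo_text = "Reward calm behavior with treats and keep sessions short"
start = time.perf_counter()
first_chunk_time = None
pieces = []
for piece in fake_token_stream(demo_text):
    if first_chunk_time is None:
        first_chunk_time = time.perf_counter() - start
    print(piece, end="", flush=True)  # show text as it arrives
    pieces.append(piece)
total_time = time.perf_counter() - start

print(f"\nfirst chunk after {first_chunk_time:.2f}s, full text after {total_time:.2f}s")
```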


In [None]:
chat = ChatOpenAI(   
    model="gpt-4",
    temperature=0, 
    model_kwargs = {'seed': 365},
    max_tokens =50

    )

## Streaming Execution

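The `stream()` method returns an iterator of `AIMessageChunk` objects.
Each chunk usually carries a small piece of text (often a single token),
and nothing is requested from the model until iteration begins.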
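Two properties of the returned iterator matter in the cells that follow. Nothing runs until iteration starts, and the iterator can be consumed only once. Calling `next()` by hand therefore removes that chunk from what a later loop will see, and a second loop over the same object yields nothing. The plain generator below, a hypothetical `demo_stream` rather than LangChain code, shows both behaviors.

```python
def demo_stream(text):
    """Hypothetical stand-in for chain.stream(): a lazy, one-shot generator."""
    print("[request started]")  # runs only when iteration begins
    for word in text.split():
        yield word + " "


stream_demo = demo_stream("Night Furies learn best with patience")  # nothing printed yet
first = next(stream_demo)     # prints "[request started]", returns "Night "
rest = "".join(stream_demo)   # only the remaining chunks
leftover = list(stream_demo)  # already exhausted, so this is empty
print(repr(first), repr(rest), leftover)
```

To stream the same prompt again, call `chain.stream(...)` a second time to get a fresh iterator.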


In [None]:
response = chain.stream({'pet': 'dragon', 'breed': 'Night Fury'})

In [None]:
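# Optional: next(response) pulls the first chunk by hand. That chunk is then
# consumed, so the loop below would print the response without it.
# next(response)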

In [None]:
for chunk in response:
    print(chunk.text, end="")

## Summary

This notebook demonstrated:

- Building an LCEL chain with prompt templates and chat models  
- Executing single requests using `invoke()`  
- Reducing total wall-clock time with concurrent execution using `batch()`  
- Streaming model output incrementally using `stream()`  

Batching and streaming are key techniques for
scalable, low-latency LLM applications.
