4 changes: 2 additions & 2 deletions src/langsmith/evaluate-complex-agent.mdx
@@ -340,7 +340,7 @@ Now let's define a parent agent that combines our two task-specific agents. The

```python
# Schema for routing user intent.
-# We'll use structured outputs to enforce that the model returns only
+# We'll use structured output to enforce that the model returns only
# the desired output.
class UserIntent(TypedDict):
"""The user's current intent in the conversation"""
@@ -1267,7 +1267,7 @@ qa_graph = create_agent(qa_llm, [lookup_track, lookup_artist, lookup_album])


# Schema for routing user intent.
-# We'll use structured outputs to enforce that the model returns only
+# We'll use structured output to enforce that the model returns only
# the desired output.
class UserIntent(TypedDict):
"""The user's current intent in the conversation"""
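For context on the hunks above: `UserIntent` is bound to the model via structured output to route conversations. A minimal sketch of that pattern follows; the `intent` field and the model name are illustrative assumptions, since the diff truncates the schema.

```python
from typing import Literal

from langchain.chat_models import init_chat_model
from typing_extensions import TypedDict


class UserIntent(TypedDict):
    """The user's current intent in the conversation"""

    # Hypothetical field; the diff cuts the schema off after the docstring.
    intent: Literal["refund", "question", "other"]


# Binding the schema makes the model return only a dict matching UserIntent.
router_llm = init_chat_model("gpt-4o-mini").with_structured_output(UserIntent)

result = router_llm.invoke("My order arrived broken and I want my money back.")
print(result["intent"])  # e.g. "refund"
```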
2 changes: 1 addition & 1 deletion src/langsmith/evaluate-on-intermediate-steps.mdx
@@ -274,7 +274,7 @@ class GradeHallucinations(BaseModel):
"""Binary score for hallucination present in generation answer."""
is_grounded: bool = Field(..., description="True if the answer is grounded in the facts, False otherwise.")

-# LLM with structured outputs for grading hallucinations
+# LLM with structured output for grading hallucinations
# For more see: https://python.langchain.com/docs/how_to/structured_output/
grader_llm = init_chat_model("gpt-4o-mini", temperature=0).with_structured_output(
GradeHallucinations,
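To make the truncated grader call above concrete, here is a hedged sketch of how it might be completed and used; the closing arguments and the sample inputs are assumptions, since the hunk ends mid-call.

```python
from langchain.chat_models import init_chat_model
from pydantic import BaseModel, Field


class GradeHallucinations(BaseModel):
    """Binary score for hallucination present in generation answer."""

    is_grounded: bool = Field(..., description="True if the answer is grounded in the facts, False otherwise.")


# Assumed completion of the call shown in the diff.
grader_llm = init_chat_model("gpt-4o-mini", temperature=0).with_structured_output(
    GradeHallucinations
)

grade = grader_llm.invoke(
    "Facts: the museum closes at 5pm.\nAnswer: you can visit until 8pm."
)
print(grade.is_grounded)  # False when the answer contradicts the facts
```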
2 changes: 1 addition & 1 deletion src/langsmith/prompt-engineering-concepts.mdx
@@ -77,7 +77,7 @@ Tools are interfaces the LLM can use to interact with the outside world. Tools c
Structured output is a feature of most state of the art LLMs, wherein instead of producing raw text as output they stick to a specified schema. This may or may not use [Tools](#tools) under the hood.

<Check>
-Structured outputs are similar to tools, but different in a few key ways. With tools, the LLM choose which tool to call (or may choose not to call any); with structured output, the LLM **always** responds in this format. With tools, the LLM may select **multiple** tools; with structured output, only one response is generate.
+Structured output is similar to tools, but different in a few key ways. With tools, the LLM chooses which tool to call (or may choose not to call any); with structured output, the LLM **always** responds in this format. With tools, the LLM may select **multiple** tools; with structured output, only one response is generated.
</Check>

### Model
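The contrast drawn in the `<Check>` block is easy to see in code. Below is a minimal sketch, assuming a hypothetical `Movie` schema and model name: with structured output the model always replies in the schema's shape, and there is exactly one response.

```python
from langchain.chat_models import init_chat_model
from pydantic import BaseModel


class Movie(BaseModel):
    """Hypothetical schema for illustration."""

    title: str
    year: int


llm = init_chat_model("gpt-4o-mini")

# Unlike a tool, which the model may skip or call several times,
# a structured output schema is always followed, exactly once.
structured_llm = llm.with_structured_output(Movie)

movie = structured_llm.invoke("Name one classic sci-fi film.")
print(movie.title, movie.year)
```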
2 changes: 1 addition & 1 deletion src/langsmith/trace-with-instructor.mdx
@@ -3,7 +3,7 @@ title: Trace with Instructor
sidebarTitle: Instructor (Python only)
---

-LangSmith provides a convenient integration with [Instructor](https://python.useinstructor.com/), a popular open-source library for generating structured outputs with LLMs.
+LangSmith provides a convenient integration with [Instructor](https://python.useinstructor.com/), a popular open-source library for generating structured output with LLMs.

In order to use it, you first need to set your LangSmith API key.

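For readers new to Instructor, a brief sketch of the structured output pattern it provides follows, based on its documented `response_model` usage; the `UserInfo` schema and model name are assumptions.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


# Patch the OpenAI client so completions are parsed into the Pydantic model.
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user.name, user.age)  # John Doe 30
```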
5 changes: 1 addition & 4 deletions src/oss/deepagents/overview.mdx
@@ -55,7 +55,7 @@ Deep agents applications can be deployed via [LangSmith Deployment](/langsmith/d
## Get started

:::python
-<CardGroup cols={2}>
+<CardGroup cols={3}>
<Card title="Quickstart" icon="rocket" href="/oss/deepagents/quickstart">
Build your first deep agent
</Card>
@@ -82,8 +82,5 @@ Deep agents applications can be deployed via [LangSmith Deployment](/langsmith/d
<Card title="Middleware" icon="layer-group" href="/oss/deepagents/middleware">
Understand the middleware architecture
</Card>
<Card title="Reference" icon="arrow-up-right-from-square" href="https://reference.langchain.com/javascript/deepagents/">
See the `deepagents` API reference
</Card>
</CardGroup>
:::
10 changes: 5 additions & 5 deletions src/oss/langchain/models.mdx
@@ -10,7 +10,7 @@ import ChatModelTabsJS from '/snippets/chat-model-tabs-js.mdx';
In addition to text generation, many models support:

* <Icon icon="hammer" size={16} /> [Tool calling](#tool-calling) - calling external tools (like databases queries or API calls) and use results in their responses.
* <Icon icon="shapes" size={16} /> [Structured output](#structured-outputs) - where the model's response is constrained to follow a defined format.
* <Icon icon="shapes" size={16} /> [Structured output](#structured-output) - where the model's response is constrained to follow a defined format.
* <Icon icon="image" size={16} /> [Multimodality](#multimodal) - process and return data other than text, such as images, audio, and video.
* <Icon icon="brain" size={16} /> [Reasoning](#reasoning) - models perform multi-step reasoning to arrive at a conclusion.

@@ -889,9 +889,9 @@ Below, we show some common ways you can use tool calling.

---

-## Structured outputs
+## Structured output

-Models can be requested to provide their response in a format matching a given schema. This is useful for ensuring the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured outputs.
+Models can be requested to provide their response in a format matching a given schema. This is useful for ensuring the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured output.

:::python
<Tabs>
@@ -1043,7 +1043,7 @@ Models can be requested to provide their response in a format matching a given s

:::python
<Note>
-**Key considerations for structured outputs:**
+**Key considerations for structured output:**

- **Method parameter**: Some providers support different methods (`'json_schema'`, `'function_calling'`, `'json_mode'`)
- `'json_schema'` typically refers to dedicated structured output features offered by a provider
@@ -1056,7 +1056,7 @@ Models can be requested to provide their response in a format matching a given s

:::js
<Note>
-**Key considerations for structured outputs:**
+**Key considerations for structured output:**

- **Method parameter**: Some providers support different methods (`'jsonSchema'`, `'functionCalling'`, `'jsonMode'`)
- **Include raw**: Use @[`includeRaw: true`][BaseChatModel.with_structured_output(include_raw)] to get both the parsed output and the raw @[`AIMessage`]
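As a companion to the Python note above, here is a hedged sketch of the `method` and `include_raw` options; the `Joke` schema and model name are illustrative assumptions.

```python
from langchain.chat_models import init_chat_model
from pydantic import BaseModel


class Joke(BaseModel):
    setup: str
    punchline: str


llm = init_chat_model("gpt-4o-mini")

# method='json_schema' requests the provider's dedicated structured output feature.
structured_llm = llm.with_structured_output(Joke, method="json_schema")
print(structured_llm.invoke("Tell me a joke about cats.").punchline)

# include_raw=True returns the parsed object alongside the raw AIMessage.
raw_llm = llm.with_structured_output(Joke, include_raw=True)
result = raw_llm.invoke("Tell me a joke about dogs.")
print(result["parsed"].setup)
print(result["raw"].response_metadata)
```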
9 changes: 0 additions & 9 deletions src/oss/langchain/tools.mdx
@@ -201,15 +201,6 @@ graph LR
K --> O
L --> O
F --> P
-
-%% Styling
-classDef runtimeStyle fill:#e3f2fd,stroke:#1976d2
-classDef resourceStyle fill:#e8f5e8,stroke:#388e3c
-classDef capabilityStyle fill:#fff3e0,stroke:#f57c00
-
-class A,B,C,D,E,F runtimeStyle
-class G,H,I,J,K,L resourceStyle
-class M,N,O,P capabilityStyle
```

### `ToolRuntime`
2 changes: 1 addition & 1 deletion src/oss/langgraph/persistence.mdx
@@ -600,7 +600,7 @@ A [state schema](/oss/langgraph/graph-api#schema) specifies a set of keys that a

But, what if we want to retain some information _across threads_? Consider the case of a chatbot where we want to retain specific information about the user across _all_ chat conversations (e.g., threads) with that user!

-With checkpointers alone, we cannot share information across threads. This motivates the need for the [`Store`](https://python.langchain.com/api_reference/langgraph/index.html#module-langgraph.store) interface. As an illustration, we can define an `InMemoryStore` to store information about a user across threads. We simply compile our graph with a checkpointer, as before, and with our new `in_memory_store` variable.
+With checkpointers alone, we cannot share information across threads. This motivates the need for the [`Store`](https://reference.langchain.com/python/langgraph/store/) interface. As an illustration, we can define an `InMemoryStore` to store information about a user across threads. We simply compile our graph with a checkpointer, as before, and with our new `in_memory_store` variable.

<Info>
**LangGraph API handles stores automatically**
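To ground the paragraph above, here is a minimal sketch of compiling a graph with both a checkpointer and a store; the graph shape and the namespace and key names are assumptions for illustration.

```python
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph import START, MessagesState, StateGraph
from langgraph.store.memory import InMemoryStore

in_memory_store = InMemoryStore()


def chat(state: MessagesState):
    # Placeholder node; a real node could read and write user memories via the store.
    return {"messages": []}


builder = StateGraph(MessagesState)
builder.add_node("chat", chat)
builder.add_edge(START, "chat")

# The checkpointer persists per-thread state; the store is shared across threads.
graph = builder.compile(checkpointer=InMemorySaver(), store=in_memory_store)

# Store entries are namespaced (e.g. by user ID), so any thread can read them.
in_memory_store.put(("users", "user-123"), "food_preference", {"likes": "pizza"})
print(in_memory_store.get(("users", "user-123"), "food_preference").value)
```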
12 changes: 12 additions & 0 deletions src/oss/python/integrations/chat/azure_chat_openai.mdx
@@ -135,6 +135,18 @@ print(ai_msg.content)
J'adore la programmation.
```

+## Streaming usage metadata
+
+OpenAI's Chat Completions API does not stream token usage statistics by default (see API reference [here](https://platform.openai.com/docs/api-reference/completions/create#completions-create-stream_options)).
+
+To recover token counts when streaming with @[`ChatOpenAI`] or `AzureChatOpenAI`, set `stream_usage=True` as an initialization parameter or on invocation:
+
+```python
+from langchain_openai import AzureChatOpenAI
+
+llm = AzureChatOpenAI(model="gpt-4.1-mini", stream_usage=True) # [!code highlight]
+```

## Specifying model version

Azure OpenAI responses contain a `model_name` response metadata property, which is the name of the model used to generate the response. However, unlike native OpenAI responses, it does not contain the specific version of the model, which is set on the deployment in Azure; e.g., it does not distinguish between `gpt-35-turbo-0125` and `gpt-35-turbo-0301`. This makes it tricky to know which version of the model was used to generate the response, which as a result can lead to, e.g., a wrong total cost calculation with `OpenAICallbackHandler`.
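As a follow-up to the `stream_usage` snippet above, here is a hedged sketch of reading token counts off a stream; the deployment configuration is assumed to come from environment variables, and `usage_metadata` is LangChain's standard field for token usage.

```python
from langchain_openai import AzureChatOpenAI

# Assumes AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and an API version
# are configured in the environment.
llm = AzureChatOpenAI(model="gpt-4.1-mini", stream_usage=True)

usage = None
for chunk in llm.stream("Hello!"):
    # With stream_usage=True, the final chunk carries usage_metadata.
    if chunk.usage_metadata:
        usage = chunk.usage_metadata

print(usage)  # e.g. {'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}
```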