diff --git a/src/langsmith/evaluate-complex-agent.mdx b/src/langsmith/evaluate-complex-agent.mdx
index 55977866d5..9712db3b73 100644
--- a/src/langsmith/evaluate-complex-agent.mdx
+++ b/src/langsmith/evaluate-complex-agent.mdx
@@ -340,7 +340,7 @@ Now let's define a parent agent that combines our two task-specific agents. The
```python
# Schema for routing user intent.
-# We'll use structured outputs to enforce that the model returns only
+# We'll use structured output to enforce that the model returns only
# the desired output.
class UserIntent(TypedDict):
"""The user's current intent in the conversation"""
@@ -1267,7 +1267,7 @@ qa_graph = create_agent(qa_llm, [lookup_track, lookup_artist, lookup_album])
# Schema for routing user intent.
-# We'll use structured outputs to enforce that the model returns only
+# We'll use structured output to enforce that the model returns only
# the desired output.
class UserIntent(TypedDict):
"""The user's current intent in the conversation"""
diff --git a/src/langsmith/evaluate-on-intermediate-steps.mdx b/src/langsmith/evaluate-on-intermediate-steps.mdx
index 5aa1c077c0..f4e94e7165 100644
--- a/src/langsmith/evaluate-on-intermediate-steps.mdx
+++ b/src/langsmith/evaluate-on-intermediate-steps.mdx
@@ -274,7 +274,7 @@ class GradeHallucinations(BaseModel):
"""Binary score for hallucination present in generation answer."""
is_grounded: bool = Field(..., description="True if the answer is grounded in the facts, False otherwise.")
-# LLM with structured outputs for grading hallucinations
+# LLM with structured output for grading hallucinations
# For more see: https://python.langchain.com/docs/how_to/structured_output/
grader_llm = init_chat_model("gpt-4o-mini", temperature=0).with_structured_output(
GradeHallucinations,
diff --git a/src/langsmith/prompt-engineering-concepts.mdx b/src/langsmith/prompt-engineering-concepts.mdx
index afbe58b99d..9cccb71628 100644
--- a/src/langsmith/prompt-engineering-concepts.mdx
+++ b/src/langsmith/prompt-engineering-concepts.mdx
@@ -77,7 +77,7 @@ Tools are interfaces the LLM can use to interact with the outside world. Tools c
Structured output is a feature of most state-of-the-art LLMs, wherein instead of producing raw text they produce output that conforms to a specified schema. This may or may not use [Tools](#tools) under the hood.
-Structured outputs are similar to tools, but different in a few key ways. With tools, the LLM choose which tool to call (or may choose not to call any); with structured output, the LLM **always** responds in this format. With tools, the LLM may select **multiple** tools; with structured output, only one response is generate.
+Structured output is similar to tools, but different in a few key ways. With tools, the LLM chooses which tool to call (or may choose not to call any); with structured output, the LLM **always** responds in this format. With tools, the LLM may select **multiple** tools; with structured output, only one response is generated.
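+
+As a minimal sketch (using LangChain's `with_structured_output` helper purely for illustration; many provider SDKs expose equivalent features), a schema-constrained call might look like:
+
+```python
+from pydantic import BaseModel, Field
+from langchain.chat_models import init_chat_model
+
+class Joke(BaseModel):
+    """A joke to tell the user."""
+    setup: str = Field(description="The setup of the joke")
+    punchline: str = Field(description="The punchline of the joke")
+
+llm = init_chat_model("gpt-4o-mini")
+structured_llm = llm.with_structured_output(Joke)
+
+# The model always responds with a `Joke` instance rather than free-form text.
+structured_llm.invoke("Tell me a joke about cats")
+```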
### Model
diff --git a/src/langsmith/trace-with-instructor.mdx b/src/langsmith/trace-with-instructor.mdx
index 0cfd96471d..cc750e840d 100644
--- a/src/langsmith/trace-with-instructor.mdx
+++ b/src/langsmith/trace-with-instructor.mdx
@@ -3,7 +3,7 @@ title: Trace with Instructor
sidebarTitle: Instructor (Python only)
---
-LangSmith provides a convenient integration with [Instructor](https://python.useinstructor.com/), a popular open-source library for generating structured outputs with LLMs.
+LangSmith provides a convenient integration with [Instructor](https://python.useinstructor.com/), a popular open-source library for generating structured output with LLMs.
In order to use it, you first need to set your LangSmith API key.
diff --git a/src/oss/deepagents/overview.mdx b/src/oss/deepagents/overview.mdx
index 2d33b417f3..75220f955c 100644
--- a/src/oss/deepagents/overview.mdx
+++ b/src/oss/deepagents/overview.mdx
@@ -55,7 +55,7 @@ Deep agents applications can be deployed via [LangSmith Deployment](/langsmith/d
## Get started
:::python
-
+
Build your first deep agent
@@ -82,8 +82,5 @@ Deep agents applications can be deployed via [LangSmith Deployment](/langsmith/d
Understand the middleware architecture
-
- See the `deepagents` API reference
-
:::
diff --git a/src/oss/langchain/models.mdx b/src/oss/langchain/models.mdx
index 3baf5aded4..cc721862b2 100644
--- a/src/oss/langchain/models.mdx
+++ b/src/oss/langchain/models.mdx
@@ -10,7 +10,7 @@ import ChatModelTabsJS from '/snippets/chat-model-tabs-js.mdx';
In addition to text generation, many models support:
* [Tool calling](#tool-calling) - calling external tools (like database queries or API calls) and using the results in their responses.
-* [Structured output](#structured-outputs) - where the model's response is constrained to follow a defined format.
+* [Structured output](#structured-output) - where the model's response is constrained to follow a defined format.
* [Multimodality](#multimodal) - process and return data other than text, such as images, audio, and video.
* [Reasoning](#reasoning) - models perform multi-step reasoning to arrive at a conclusion.
@@ -889,9 +889,9 @@ Below, we show some common ways you can use tool calling.
---
-## Structured outputs
+## Structured output
-Models can be requested to provide their response in a format matching a given schema. This is useful for ensuring the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured outputs.
+Models can be requested to provide their response in a format matching a given schema. This is useful for ensuring the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured output.
:::python
@@ -1043,7 +1043,7 @@ Models can be requested to provide their response in a format matching a given s
:::python
- **Key considerations for structured outputs:**
+ **Key considerations for structured output:**
- **Method parameter**: Some providers support different methods (`'json_schema'`, `'function_calling'`, `'json_mode'`)
- `'json_schema'` typically refers to dedicated structured output features offered by a provider
@@ -1056,7 +1056,7 @@ Models can be requested to provide their response in a format matching a given s
:::js
- **Key considerations for structured outputs:**
+ **Key considerations for structured output:**
- **Method parameter**: Some providers support different methods (`'jsonSchema'`, `'functionCalling'`, `'jsonMode'`)
- **Include raw**: Use @[`includeRaw: true`][BaseChatModel.with_structured_output(include_raw)] to get both the parsed output and the raw @[`AIMessage`]
diff --git a/src/oss/langchain/tools.mdx b/src/oss/langchain/tools.mdx
index 6e42554151..a67fb8d1ff 100644
--- a/src/oss/langchain/tools.mdx
+++ b/src/oss/langchain/tools.mdx
@@ -201,15 +201,6 @@ graph LR
K --> O
L --> O
F --> P
-
- %% Styling
- classDef runtimeStyle fill:#e3f2fd,stroke:#1976d2
- classDef resourceStyle fill:#e8f5e8,stroke:#388e3c
- classDef capabilityStyle fill:#fff3e0,stroke:#f57c00
-
- class A,B,C,D,E,F runtimeStyle
- class G,H,I,J,K,L resourceStyle
- class M,N,O,P capabilityStyle
```
### `ToolRuntime`
diff --git a/src/oss/langgraph/persistence.mdx b/src/oss/langgraph/persistence.mdx
index 37a62baf9c..441e07a5e9 100644
--- a/src/oss/langgraph/persistence.mdx
+++ b/src/oss/langgraph/persistence.mdx
@@ -600,7 +600,7 @@ A [state schema](/oss/langgraph/graph-api#schema) specifies a set of keys that a
But, what if we want to retain some information _across threads_? Consider the case of a chatbot where we want to retain specific information about the user across _all_ chat conversations (e.g., threads) with that user!
-With checkpointers alone, we cannot share information across threads. This motivates the need for the [`Store`](https://python.langchain.com/api_reference/langgraph/index.html#module-langgraph.store) interface. As an illustration, we can define an `InMemoryStore` to store information about a user across threads. We simply compile our graph with a checkpointer, as before, and with our new `in_memory_store` variable.
+With checkpointers alone, we cannot share information across threads. This motivates the need for the [`Store`](https://reference.langchain.com/python/langgraph/store/) interface. As an illustration, we can define an `InMemoryStore` to store information about a user across threads. We simply compile our graph with a checkpointer, as before, and with our new `in_memory_store` variable.
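+
+A minimal sketch of that setup (assuming a `StateGraph` builder named `builder` has already been defined) might look like:
+
+```python
+from langgraph.checkpoint.memory import InMemorySaver
+from langgraph.store.memory import InMemoryStore
+
+checkpointer = InMemorySaver()
+in_memory_store = InMemoryStore()
+
+# The checkpointer persists per-thread state; the store holds
+# information shared across all threads for a user.
+graph = builder.compile(checkpointer=checkpointer, store=in_memory_store)
+```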
**LangGraph API handles stores automatically**
diff --git a/src/oss/python/integrations/chat/azure_chat_openai.mdx b/src/oss/python/integrations/chat/azure_chat_openai.mdx
index 38f0a4405a..fb69e530e0 100644
--- a/src/oss/python/integrations/chat/azure_chat_openai.mdx
+++ b/src/oss/python/integrations/chat/azure_chat_openai.mdx
@@ -135,6 +135,18 @@ print(ai_msg.content)
J'adore la programmation.
```
+## Streaming usage metadata
+
+OpenAI's Chat Completions API does not stream token usage statistics by default (see API reference [here](https://platform.openai.com/docs/api-reference/completions/create#completions-create-stream_options)).
+
+To recover token counts when streaming with @[`ChatOpenAI`] or `AzureChatOpenAI`, set `stream_usage=True` as an initialization parameter or on invocation:
+
+```python
+from langchain_openai import AzureChatOpenAI
+
+llm = AzureChatOpenAI(model="gpt-4.1-mini", stream_usage=True) # [!code highlight]
+```
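+
+For example, you can then aggregate the streamed chunks and read the combined token counts off `usage_metadata` (a minimal sketch reusing the `llm` defined above):
+
+```python
+aggregate = None
+for chunk in llm.stream("Hello, how are you?"):
+    aggregate = chunk if aggregate is None else aggregate + chunk
+
+print(aggregate.usage_metadata)
+```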
+
## Specifying model version
Azure OpenAI responses contain a `model_name` response metadata property, which is the name of the model used to generate the response. However, unlike native OpenAI responses, it does not contain the specific version of the model, which is set on the deployment in Azure; e.g., it does not distinguish between `gpt-35-turbo-0125` and `gpt-35-turbo-0301`. This makes it tricky to know which version of the model was used to generate the response, which as a result can lead to, e.g., an incorrect total cost calculation with `OpenAICallbackHandler`.