getsentry · colin-sentry · May 21, 2024
diff --git a/src/docs/sdk/performance/modules/index.mdx b/src/docs/sdk/performance/modules/index.mdx
@@ -6,6 +6,7 @@ The list below contains SDK documentation for our various Performance Modules.
 
 - [App Starts](/sdk/performance/modules/app-starts/)
 - [Caches](/sdk/performance/modules/caches/)
+- [LLM Monitoring](/sdk/performance/modules/llm-monitoring/)
 - [Queues](/sdk/performance/modules/queues/)
 - [Queries](/sdk/performance/modules/queries/)
 - [Requests](/sdk/performance/modules/requests/)

diff --git a/src/docs/sdk/performance/modules/llm-monitoring.mdx b/src/docs/sdk/performance/modules/llm-monitoring.mdx
@@ -0,0 +1,66 @@
+---
+title: 'LLM Monitoring'
+---
+
+Sentry auto-generates LLM Monitoring data for common providers in Python, but you may need to manually annotate spans for other frameworks.
+
+## Span conventions
+
+### Span Operations
+
+| Span OP         | Description                                                                          |
+|:----------------|:-------------------------------------------------------------------------------------|
+| `ai.pipeline.*` | The top-level span, which covers one or more AI operations and helper functions.     |
+| `ai.run.*`      | A unit of work: a tool call, LLM execution, or helper method.                        |
+
+### Span Data
+
+| Attribute                   | Type    | Description                                           | Examples                                 |
+|-----------------------------|---------|-------------------------------------------------------|------------------------------------------|
+| `ai.input_messages`         | string  | The input messages sent to the model                  | `[{"role": "user", "message": "hello"}]` |
+| `ai.completion_tokens.used` | int     | The number of tokens used to respond to the message   | `10`                                     |
+| `ai.prompt_tokens.used`     | int     | The number of tokens used to process just the prompt  | `20`                                     |
+| `ai.total_tokens.used`      | int     | The total number of tokens used (prompt + completion) | `30`                                     |
+| `ai.model_id`               | string  | The vendor-specific ID of the model used              | `"gpt-4"`                                |
+| `ai.streaming`              | boolean | Whether the request was streamed back                 | `true`                                   |
+| `ai.responses`              | list    | The response messages sent back by the AI model       | `["hello", "world"]`                     |
+| `ai.pipeline.name`          | string  | The description of the parent ai.pipeline span        | `My AI pipeline`                         |
+
+## Instrumentation
+
+When a user creates a new AI pipeline, the SDK automatically creates spans that instrument both the pipeline and its AI operations.
+
+**Example**
+
+```python
+import sentry_sdk
+from openai import OpenAI
+from sentry_sdk.ai.monitoring import ai_track
+
+sentry_sdk.init(...)  # pass your DSN and tracing options here
+
+client = OpenAI()
+
+# The decorated function becomes the top-level ai.pipeline span.
+@ai_track(description="My AI pipeline")
+def invoke_pipeline():
+    # Each completion call below is recorded as a child ai.run.openai span.
+    result = client.chat.completions.create(
+        model="some-model", messages=[{"role": "system", "content": "hello"}]
+    ).choices[0].message.content
+
+    return client.chat.completions.create(
+        model="some-model", messages=[{"role": "system", "content": result}]
+    ).choices[0].message.content
+```
+
+This should result in the following spans.
+
+```
+<span op:"ai.pipeline" description:"My AI pipeline">
+	<span op:"ai.run.openai" description:"OpenAI Chat Completion" data[ai.total_tokens.used]:15 data[ai.pipeline.name]:"My AI pipeline" />
+	<span op:"ai.run.openai" description:"OpenAI Chat Completion" data[ai.total_tokens.used]:20 data[ai.pipeline.name]:"My AI pipeline" />
+</span>
+```
+
+Notice that the `ai.pipeline.name` data attribute on each child span is the description of the parent `ai.pipeline.*` span.
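
For frameworks that the SDK does not instrument automatically, the span conventions in this diff could be applied by hand. The sketch below is a minimal, stdlib-only illustration rather than the SDK's implementation: `SketchSpan` and `record_ai_run` are hypothetical names that are not part of `sentry_sdk`, and `SketchSpan.set_data` only mimics storing data on a span. It assumes that `ai.total_tokens.used` is the sum of prompt and completion tokens, which matches the example values in the Span Data table (20 + 10 = 30), and that `ai.input_messages` is serialized as JSON because the table types it as a string.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchSpan:
    """Hypothetical stand-in for an SDK span; not part of sentry_sdk."""

    op: str
    description: str
    data: Dict[str, Any] = field(default_factory=dict)
    children: List["SketchSpan"] = field(default_factory=list)

    def set_data(self, key: str, value: Any) -> None:
        self.data[key] = value


def record_ai_run(
    pipeline: SketchSpan,
    provider: str,
    description: str,
    model_id: str,
    messages: List[Dict[str, str]],
    responses: List[str],
    prompt_tokens: int,
    completion_tokens: int,
    streaming: bool = False,
) -> SketchSpan:
    """Attach an ai.run.* child span that follows the Span Data table."""
    span = SketchSpan(op=f"ai.run.{provider}", description=description)
    # Children carry the description of their parent ai.pipeline.* span.
    span.set_data("ai.pipeline.name", pipeline.description)
    span.set_data("ai.model_id", model_id)
    span.set_data("ai.streaming", streaming)
    # Assumption: input messages are stored as a JSON string.
    span.set_data("ai.input_messages", json.dumps(messages))
    span.set_data("ai.responses", responses)
    span.set_data("ai.prompt_tokens.used", prompt_tokens)
    span.set_data("ai.completion_tokens.used", completion_tokens)
    # Assumption: total = prompt + completion.
    span.set_data("ai.total_tokens.used", prompt_tokens + completion_tokens)
    pipeline.children.append(span)
    return span


# Usage: one pipeline span with a single manually annotated run.
pipeline = SketchSpan(op="ai.pipeline", description="My AI pipeline")
run = record_ai_run(
    pipeline,
    provider="custom",
    description="Custom Chat Completion",
    model_id="some-model",
    messages=[{"role": "user", "content": "hello"}],
    responses=["hi there"],
    prompt_tokens=20,
    completion_tokens=10,
)
```

In a real integration, the same keys would be set on the SDK's own span object inside the `ai.pipeline.*` span; the stand-in class only keeps the example runnable without dependencies.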