From 6086817052bee683e06ddb55d650aeb0fce94db8 Mon Sep 17 00:00:00 2001
From: Gowri Harish
Date: Wed, 12 Nov 2025 09:18:24 -0800
Subject: [PATCH 01/16] feat: initial paq writeup

---
 src/langsmith/annotation-queues.mdx | 134 +++++++++++++++++++++-------
 1 file changed, 100 insertions(+), 34 deletions(-)

diff --git a/src/langsmith/annotation-queues.mdx b/src/langsmith/annotation-queues.mdx
index 4c73a1f820..a1bf6940ed 100644
--- a/src/langsmith/annotation-queues.mdx
+++ b/src/langsmith/annotation-queues.mdx
@@ -3,34 +3,42 @@ title: Use annotation queues
 sidebarTitle: Use annotation queues
 ---
 
-_Annotation queues_ provide a streamlined, directed view for human annotators to attach feedback to specific [runs](/langsmith/observability-concepts#runs). While you can always annotate [traces](/langsmith/observability-concepts#traces) inline, annotation queues provide another option to group runs together, then have annotators review and provide [feedback](/langsmith/observability-concepts#feedback) on them.
+_Annotation queues_ provide a streamlined, directed view for human annotators to attach feedback to specific [runs](/langsmith/observability-concepts#runs). While you can always annotate [traces](/langsmith/observability-concepts#traces) inline, annotation queues provide a way to group runs together, prescribe rubrics, and track reviewer progress.
+
+LangSmith now supports two queue styles:
+
+- **Single-run queues** present one run at a time and let reviewers submit any rubric feedback you configure.
+- **Pairwise annotation queues** (PAQs) present two runs side-by-side so reviewers can quickly decide which output is better (or if they are equivalent) against the rubric items you define.
 
 ## Create an annotation queue
 
-To create an annotation queue:
+You can create either queue type from the [LangSmith UI](https://smith.langchain.com). The creation flow you see depends on whether you pick the single-run or pairwise option from the **Annotation queues** section or while working with datasets and experiments.
+
+### Create a single-run annotation queue
 
-1. Navigate to the **Annotation queues** section on the left-hand navigation panel of the [LangSmith UI](https://smith.langchain.com).
-1. Click **+ New annotation queue** in the top right corner.
-
-   ![Create Annotation Queue form with Basic Details, Annotation Rubric, and Feedback sections.](/langsmith/images/create-annotation-queue-new.png)
+1. Navigate to **Annotation queues** in the left navigation.
+2. Click **+ New annotation queue** in the top-right corner.
+
+![Create Annotation Queue form with Basic Details, Annotation Rubric, and Feedback sections.](/langsmith/images/create-annotation-queue-new.png)
 
-### Basic Details
+#### Basic Details
 
-1. Fill in the form with the **Name** and **Description** of the queue. You can also assign a **default dataset** to queue, which will streamline the process of sending the inputs and outputs of certain runs to datasets in your LangSmith [workspace](/langsmith/administration-overview#workspaces).
+1. Fill in the **Name** and **Description** of the queue.
+2. Optionally assign a **default dataset** to streamline exporting reviewed runs into a dataset in your LangSmith [workspace](/langsmith/administration-overview#workspaces).
 
-### Annotation Rubric
+#### Annotation Rubric
 
 1. Draft some high-level instructions for your annotators, which will be shown in the sidebar on every run.
-1. Click **+ Desired Feedback** to add feedback keys to your annotation queue. Annotators will be presented with these feedback keys on each run.
-1. Add a description for each, as well as a short description of each category, if the feedback is categorical.
+2. Click **+ Desired Feedback** to add feedback keys to your annotation queue. Annotators will be presented with these feedback keys on each run.
+3. Add a description for each, as well as a short description of each category, if the feedback is categorical.
 
-   ![Annotation queue rubric form with instructions and desired feedback entered.](/langsmith/images/create-annotation-rubric.png)
+![Annotation queue rubric form with instructions and desired feedback entered.](/langsmith/images/create-annotation-rubric.png)
 
 For example, with the descriptions in the previous screenshot, reviewers will see the **Annotation Rubric** details in the right-hand pane of the UI.
 
-   ![The rendered rubric for reviewers from the example instructions.](/langsmith/images/rubric-for-annotators.png)
+![The rendered rubric for reviewers from the example instructions.](/langsmith/images/rubric-for-annotators.png)
 
-### Collaborator Settings
+#### Collaborator Settings (single-run)
 
 When there are multiple annotators for a run:
 
@@ -41,56 +49,114 @@ When there are multiple annotators for a run:
 - **Enable reservations on runs**: When a reviewer views a run, the run is reserved for that reviewer for the specified **Reservation length**. If there are multiple reviewers per run as specified above, the run can be reserved by multiple reviewers (up to the number of reviewers per run) at the same time.
-
+
 We recommend enabling reservations. This will prevent multiple annotators from reviewing the same run at the same time.
-
+
 If a reviewer has viewed a run and then leaves the run without marking it **Done**, the reservation will expire after the specified **Reservation length**. The run is then released back into the queue and can be reserved by another reviewer.
-
+
 Clicking **Requeue** for a run's annotation will only move the current run to the end of the current user's queue; it won't affect the queue order of any other user. It will also release the reservation that the current user has on that run.
-
+
+
+Because of these settings, the number of runs visible to each reviewer can differ from the total queue size.
+
+You can click the pencil icon in **Annotation queues** to update any of these settings later.
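+
+If you manage queues from code, the LangSmith Python SDK also exposes annotation queue helpers. The sketch below is illustrative only: it assumes a recent SDK version where `Client.create_annotation_queue` accepts a name and description, and the queue name and description shown are placeholders. Rubric instructions and collaborator settings are still easiest to configure in the UI form described above.
+
+```python
+from langsmith import Client
+
+client = Client()  # reads the LANGSMITH_API_KEY environment variable
+
+# Create a single-run annotation queue with a name and description.
+queue = client.create_annotation_queue(
+    name="Chatbot quality review",
+    description="Weekly triage of flagged chatbot runs",
+)
+print(queue.id)  # keep the queue ID for adding runs later
+```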
+
+### Create a pairwise annotation queue
+
+Pairwise queues are designed for fast A/B comparisons between two experiments (often a baseline vs. a candidate model). You initiate them from the **Datasets & Experiments** page:
+
+1. Navigate to **Datasets & Experiments**, open a dataset, and select **exactly two experiments** you want to compare (the sketch after these steps shows one way to produce such a pair with the SDK).
+2. Click **Annotate**. In the popover, choose **Add to Pairwise Annotation Queue**. (The button is disabled until exactly two experiments are selected.)
+3. Decide whether to send the experiments to an existing pairwise queue or create a new one. Selecting **Create new** launches the pairwise queue form with both experiment IDs prefilled and locked at the top.
+4. Complete the form:
+   - **Basic details** (name and description)
+   - **Instructions & rubrics** tailored to pairwise scoring
+   - **Collaborator settings** (reviewer count, reservations, reservation length)
+5. Submit the form to create the queue. LangSmith immediately pairs runs from the two experiments and populates the queue; there is no separate populate step.
+
+Popover showing the “Add to Pairwise Annotation Queue” card highlighted after two experiments are selected.
+
+Pairwise creation pane with two selected experiments pinned, plus fields for name, description, rubric instructions, and pairwise rubric items.
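+
+A pairwise queue compares two existing experiments over the same dataset, so those experiments have to exist before step 1. The sketch below shows one way to produce such a pair with the SDK's `evaluate` helper; it is a minimal sketch in which the dataset name, input key, and stub target functions are placeholders for your real application code.
+
+```python
+from langsmith import evaluate
+
+# Two targets evaluated against the same dataset yield the two experiments
+# you select in step 1 above. Replace these stubs with calls into your app.
+def baseline(inputs: dict) -> dict:
+    return {"answer": f"baseline answer to: {inputs['question']}"}
+
+def candidate(inputs: dict) -> dict:
+    return {"answer": f"candidate answer to: {inputs['question']}"}
+
+baseline_results = evaluate(baseline, data="support-questions", experiment_prefix="baseline")
+candidate_results = evaluate(candidate, data="support-questions", experiment_prefix="candidate")
+
+# These experiment names are what you look for on the dataset page.
+print(baseline_results.experiment_name, candidate_results.experiment_name)
+```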
+
+Key differences for PAQs:
+
+- **Experiments**: You must provide two experiment sessions up front. LangSmith automatically pairs their runs in chronological order and populates the queue during creation.
+- **Rubric**: Pairwise rubric items only require a feedback key and (optionally) a description. For each rubric item, annotators decide whether Run A or Run B is better, or whether the two are equivalent.
+- **Dataset**: Pairwise queues do not use a default dataset, because comparisons span two experiments.
+- **Reservations & reviewers**: The same collaborator controls apply. Reservations help prevent two people from judging the same comparison simultaneously.
+
-As a result of the **Collaborator settings**, it's possible (and likely) that the number of runs visible to an individual in an annotation queue differs from the total number of runs in the queue compared to another user's queue size.
+Collaborator settings section for pairwise queues highlighting reviewer count, reservations, and reservation duration controls.
+
-You can update these settings at any time by clicking on the pencil icon in the **Annotation Queues** section.
+If you prefer to populate a PAQ later or via automation, you can create an empty queue under **Annotation queues** → **Pairwise** and use automation rules (see below) to add comparisons incrementally.
 
 ## Assign runs to an annotation queue
 
-To assign runs to an annotation queue, do one of the following:
+Depending on your queue type, there are several ways to populate it with work items. For single-run queues, you can also script this step; see the sketch after the options below.
+
+### Single-run queues
 
-- Click on **Add to Annotation Queue** in top right corner of any [trace](/langsmith/observability-concepts#traces) view. You can add any intermediate [run](/langsmith/observability-concepts#runs) (span) of the trace to an annotation queue, but not the root span.
-
-   ![Trace view with the Add to Annotation Queue button highglighted at the top of the screen.](/langsmith/images/add-to-annotation-queue.png)
+- **From a trace view**: Click **Add to Annotation Queue** in the top-right corner of any [trace](/langsmith/observability-concepts#traces) view. You can add any intermediate [run](/langsmith/observability-concepts#runs), but not the root span.
+
+![Trace view with the Add to Annotation Queue button highlighted at the top of the screen.](/langsmith/images/add-to-annotation-queue.png)
 
-- Select multiple runs in the runs table then click **Add to Annotation Queue** at the bottom of the page.
-
-   ![View of the runs table with runs selected. Add to Annotation Queue button at the botton of the page.](/langsmith/images/multi-select-annotation-queue.png)
+- **From the runs table**: Select multiple runs, then click **Add to Annotation Queue** at the bottom of the page.
+
+![View of the runs table with runs selected. Add to Annotation Queue button at the bottom of the page.](/langsmith/images/multi-select-annotation-queue.png)
 
-- [Set up an automation rule](/langsmith/rules) that automatically assigns runs that pass a certain filter and sampling condition to an annotation queue.
-- Navigate to the **Datasets & Experiments** page and select a dataset. On the dataset's page select one or multiple [experiments](/langsmith/evaluation-concepts#experiment). At the bottom of the page, click ** Annotate**. From the resulting popup, you can either create a new queue or add the runs to an existing one.
-
-   ![Selected experiments with the Annotate button at the bottom of the page.](/langsmith/images/annotate-experiment.png)
+- **Automation rules**: [Set up a rule](/langsmith/rules) to automatically assign runs that match a filter (for example, errors or low user scores) into a queue.
+- **Datasets & experiments**: Select one or more [experiments](/langsmith/evaluation-concepts#experiment) within a dataset and click **Annotate**. Choose an existing queue or create a new one, then confirm the (single-run) queue option.
+
+![Selected experiments with the Annotate button at the bottom of the page.](/langsmith/images/annotate-experiment.png)
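+
+If you want to script this step instead, the sketch below assumes your version of the LangSmith Python SDK exposes `Client.list_runs` filtering and `Client.add_runs_to_annotation_queue`; the project name, feedback key, and queue ID are placeholders.
+
+```python
+from itertools import islice
+
+from langsmith import Client
+
+client = Client()
+
+# Find traced runs whose user feedback score was 0 (for example, a thumbs-down).
+runs = client.list_runs(
+    project_name="my-chat-app",
+    filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 0))',
+)
+
+# Send up to 25 matching runs to an existing single-run annotation queue.
+run_ids = [run.id for run in islice(runs, 25)]
+client.add_runs_to_annotation_queue("YOUR_QUEUE_ID", run_ids=run_ids)
+```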
+
+### Pairwise annotation queues
+
+- **During creation**: Selecting two experiments and creating a PAQ automatically pairs the runs. No additional “populate” step is required.
+- **Populate later**: On an existing PAQ, open the queue and click **Populate** to add new comparisons. You can either provide a new pair of experiment IDs or choose a single pairwise experiment that already contains matched runs.
+- **Automation rules**: Create rules that feed candidate runs into the pairwise pipeline. For example, you can trigger a comparison whenever a new experiment completes, sending the baseline and candidate into the same PAQ.
+
+When augmenting an existing PAQ, LangSmith preserves historical comparisons and appends new pairs to the queue.
 
-It is often a good idea to assign runs that have a particular type of user feedback score (e.g., thumbs up, thumbs down) from the application to an annotation queue. This way, you can identify and address issues that are causing user dissatisfaction. To learn more about how to capture user feedback from your LLM application, follow the guide on [attaching user feedback](/langsmith/attach-user-feedback).
+
+Consider routing runs that already have user feedback (e.g., thumbs-down) into a single-run queue for triage and a pairwise queue for head-to-head comparisons against a stronger baseline. This helps you identify regressions quickly. Learn more about collecting user feedback in the [attach user feedback](/langsmith/attach-user-feedback) guide.
+
 
 ## Review runs in an annotation queue
 
-To review runs in an annotation queue:
+### Review a single-run queue
 
 1. Navigate to the **Annotation Queues** section through the left-hand navigation bar.
-1. Click on the queue you want to review. This will take you to a focused, cyclical view of the runs in the queue that require review.
-1. You can attach a comment, attach a score for a particular [feedback](/langsmith/observability-concepts#feedback) criteria, add the run to a dataset or mark the run as reviewed. You can also remove the run from the queue for all users, despite any current reservations or settings for the queue, by clicking the **Trash** icon next to **View run**.
+2. Click on the queue you want to review. This will take you to a focused, cyclical view of the runs in the queue that require review.
+3. You can attach a comment, attach a score for a particular [feedback](/langsmith/observability-concepts#feedback) criterion, add the run to a dataset, or mark the run as reviewed (see the SDK sketch below for the equivalent feedback call). You can also remove the run from the queue for all users, regardless of any current reservations or queue settings, by clicking the **Trash** icon next to **View run**.
-
+
 The keyboard shortcuts that are next to each option can help streamline the review process.
-
+
-   ![View or a run with the Annotate side panel. Keyboard shortcuts visible for options.](/langsmith/images/review-runs.png)
+![View of a run with the Annotate side panel. Keyboard shortcuts visible for options.](/langsmith/images/review-runs.png)
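+
+The scores reviewers submit in the queue are ordinary run [feedback](/langsmith/observability-concepts#feedback), so the same signal can also be recorded with the SDK. A minimal sketch using `Client.create_feedback`; the run ID is a placeholder and `completeness` stands in for one of your rubric keys.
+
+```python
+from langsmith import Client
+
+client = Client()
+
+# Record a rubric score and a reviewer comment against a specific run.
+client.create_feedback(
+    "YOUR_RUN_ID",
+    key="completeness",
+    score=1,
+    comment="Covered every part of the user's question.",
+)
+```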
+
+### Review a pairwise annotation queue
+
+1. From **Annotation queues**, switch to the **Pairwise** tab (or open the queue from the dataset where you created it).
+2. Each queue item displays Run A on the left and Run B on the right, along with your rubric.
+3. For every rubric item:
+   - Choose **A is better**, **B is better**, or **Equal**. The UI records binary feedback on both runs behind the scenes.
+   - Use hotkeys `A`, `B`, or `E` to lock in your choice.
+4. Once you finish all rubric items, press **Done** (or hit `Enter` on the final rubric item) to advance to the next comparison.
+5. Optional actions:
+   - Leave comments tied to either run.
+   - Requeue the comparison if you need to revisit it later.
+   - Open the full trace view for deeper debugging.
+
+Reservations, reviewer thresholds, and comments work the same as in single-run queues, so teams can mix queue types without learning a new workflow.
+
+Pairwise review screen showing runs side-by-side with the floating feedback bar containing A/B/Equal buttons and keyboard shortcuts.
 
 ## Video guide
+