21 changes: 19 additions & 2 deletions docs.json
@@ -52,8 +52,25 @@
{
"group": "Tools",
"pages": [
"tools/detections",
"tools/reports"
{
"group": "Detections",
"icon": "ghost",
"pages": [
"tools/detections/overview",
"tools/detections/hallucination-detection",
"tools/detections/document-relevance",
"tools/detections/polling-and-results-api"
]
},
{
"group": "Reports",
"icon": "clipboard",
"pages": [
"tools/reports/overview",
"tools/reports/integration",
"tools/reports/when-to-use"
]
}
]
},
{
44 changes: 44 additions & 0 deletions tools/detections/document-relevance.mdx
@@ -0,0 +1,44 @@
---
title: "Document Relevance"
description: "Measure whether retrieved documents actually support user queries."
icon: "bug"
---

# What is document relevance?

**Document relevance** measures how well your retrieval or search system finds context that is genuinely useful for answering the user's query. A document is considered **relevant** if it contains information that addresses at least one part of the query. Otherwise, it is marked **irrelevant**.

The document relevance score is calculated as the fraction of documents that are relevant to the query.

# How Quotient scores relevance

1. Compare each document (or chunk) against the full `user_query`.
2. Determine whether the document contains information relevant to any part of the query:
- If it does, mark it `relevant`.
- If it does not, mark it `irrelevant`.
3. Compute `relevant_documents / total_documents` to derive the overall score.
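
The arithmetic itself is straightforward. Quotient's per-document judgment is model-based; in the sketch below, `judge_relevance` is a hypothetical stand-in that only marks where that relevant/irrelevant decision plugs in.

```python
def judge_relevance(user_query: str, document: str) -> bool:
    """Placeholder relevance check (naive keyword overlap).

    Quotient's actual judgment is model-based; this stand-in only shows
    where a per-document relevant/irrelevant decision fits.
    """
    query_terms = set(user_query.lower().split())
    return any(term in document.lower() for term in query_terms)


def document_relevance_score(user_query: str, documents: list[str]) -> float:
    """Fraction of documents judged relevant to the query."""
    if not documents:
        return 0.0
    relevant = sum(judge_relevance(user_query, doc) for doc in documents)
    return relevant / len(documents)
```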

## What influences the score

- **Chunk granularity:** smaller chunks make it easier to mark only the useful passages as relevant.
- **Query clarity:** ambiguous prompts can lower relevancy; capture clarifying follow-ups in `message_history`.
- **Retriever filters:** tag each log with retriever configuration so you can compare performance across setups.
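
For example, a log call tagged with its retriever setup might look like the sketch below. The `tags` payload and key names (`retriever`, `chunk_size`, `top_k`) are illustrative assumptions, not a required schema.

```python
# Assumes the logger was initialized with detections enabled
# (see the Detections overview page).
log_id = quotient.log(
    user_query="What changed in the Q3 pricing policy?",
    model_output="The Q3 policy introduced regional pricing tiers.",
    documents=[
        "Q3 pricing update: regional tiers replace the flat rate.",
        "Unrelated onboarding checklist for new hires.",
    ],
    # Illustrative tag keys; use names that reflect your own retriever setup.
    tags={"retriever": "hybrid-bm25-dense", "chunk_size": "512", "top_k": "8"},
)
```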

## Why track document relevance?

Document relevance is a core metric for evaluating retrieval-augmented systems. Even if the AI generates well, weak retrieval can degrade the final answer. Monitoring this metric helps teams:

- Assess whether retrieval surfaces useful context.
- Debug cases where generation fails despite solid prompting.
- Improve recall and precision of retrievers.
- Watch for drift after retriever or data changes.

<Tip>
A sudden dip in relevancy is often the earliest warning that embeddings, indexing, or filters changed. Alert on sustained drops before they cascade into hallucinations.
</Tip>

<Tip>
High-performing systems typically show \> 75% document relevance. Lower scores may signal ambiguous user queries, incorrect retrieval, or noisy source data.
</Tip>

Next: [Polling & Results API](./polling-and-results-api).
63 changes: 63 additions & 0 deletions tools/detections/hallucination-detection.mdx
@@ -0,0 +1,63 @@
---
title: "Hallucination Detection"
description: "Understand how Quotient identifies hallucinations and why the metric matters."
icon: "flag"
---

# What counts as a hallucination?

The **hallucination rate** measures how often a model generates information that cannot be found in its provided inputs, such as retrieved documents, user messages, or system prompts.

Quotient reports an **extrinsic hallucination rate**: we check whether the model's output is supported by the context it was given, and flag any claims that are not.

<Accordion title="What is an Extrinsic Hallucination?">
<Tip>
Extrinsic hallucinations occur when a model generates content that is not backed by any input. This is distinct from **intrinsic hallucinations**, where the model generates text that is self-contradictory or logically incoherent regardless of the input.

We focus on **extrinsic** hallucination detection because this is what matters most in retrieval-augmented systems: **does the model stick to the facts it was given?**\
\
Refer to [How to Detect Hallucinations in Retrieval Augmented Systems: A Primer](https://blog.quotientai.co/how-to-detect-hallucinations-in-retrieval-augmented-systems-a-primer/) for an in-depth overview of hallucinations in augmented AI systems.
</Tip>
</Accordion>

# How Quotient detects hallucinations

1. **Segment the output** into atomic claims or sentences.
2. **Cross-check every claim** against all available context:
- `user_query` (what the user asked)
- `documents` (retrieved evidence)
- `message_history` (prior conversation turns)
- `instructions` (system or developer guidance)
3. **Flag unsupported claims** when no context backs them up.

If a sentence cannot be traced back to any provided evidence, it is marked as a hallucination.
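
A minimal sketch of that loop is below. The sentence splitting and the `is_supported` check are naive placeholders standing in for Quotient's model-based judgments; they only show the shape of the procedure.

```python
def split_into_claims(model_output: str) -> list[str]:
    """Naive segmentation: one claim per sentence."""
    return [s.strip() for s in model_output.split(".") if s.strip()]


def is_supported(claim: str, context: str) -> bool:
    """Placeholder support check; Quotient uses a model-based judge here."""
    return claim.lower() in context.lower()


def find_unsupported_claims(
    model_output: str,
    user_query: str,
    documents: list[str],
    message_history: list[str],
    instructions: list[str],
) -> list[str]:
    """Return the claims that no provided context backs up.

    All context is flattened to plain strings here for simplicity;
    in real logs, `message_history` follows the OpenAI message format.
    """
    context = " ".join([user_query, *documents, *message_history, *instructions])
    return [
        claim
        for claim in split_into_claims(model_output)
        if not is_supported(claim, context)
    ]
```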

## Inputs that improve detection quality

- **High-signal documents:** include only the evidence actually retrieved for the answer.
- **Conversation history:** pass the full multi-turn exchange so references to earlier turns can be validated.
- **Instructions:** provide system prompts so the detection pass understands guardrails and policies.

## Interpret hallucination results

- **`has_hallucination`**: Boolean flag indicating whether we found any unsupported claims.
- **Highlighted spans**: In the dashboard, statements are color-coded to show what lacked support.
- **Tag filters**: Slice hallucination rate by model, feature, or customer to prioritize remediation.
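
The same flag is available programmatically once detections finish; a short sketch is below (see [Polling & Results API](./polling-and-results-api) for the full helper).

```python
# Assumes `quotient` and `log_id` come from your logging setup
# (see the Detections overview page).
log = quotient.poll_for_detection(log_id=log_id)

if log.has_hallucination:
    # e.g. block the release, open a ticket, or sample for human review
    print(f"Unsupported claims detected in log {log.id}")
```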

<Tip>
Pair hallucination detection with assertions or automated tests when shipping prompt updates. A sudden spike often signals a regression in retrieval or guardrails.
</Tip>

# Why monitor hallucinations?

Extrinsic hallucinations are a primary failure mode in augmented AI systems. Even when retrieval succeeds, generation can drift. Tracking this metric helps teams:

- Catch hallucinations early in development.
- Monitor output quality after deployment.
- Guide prompt iteration and model fine-tuning.

<Tip>
Well-grounded systems typically show \< 5% hallucination rate. If yours is higher, it's often a signal that your data ingestion, retrieval pipeline, or prompting needs improvement.
</Tip>

Next: [Document Relevance](./document-relevance).
125 changes: 125 additions & 0 deletions tools/detections/overview.mdx
@@ -0,0 +1,125 @@
---
title: "Overview"
description: "Understand how Quotient detections work and how to enable them in your logging pipeline."
icon: "eye"
---

<CardGroup>
<Card title="Initialize the Logger" icon="chevron-right" href="#initialize-the-logger-with-detections">
Configure detection types and sampling without leaving this page.
</Card>
<Card title="Hallucination Detection" icon="flag" href="/tools/detections/hallucination-detection">
See how Quotient scores extrinsic hallucinations.
</Card>
<Card title="Document Relevance" icon="bug" href="/tools/detections/document-relevance">
Measure whether retrieved documents support an answer.
</Card>
<Card title="Polling & Results" icon="clock" href="/tools/detections/polling-and-results-api">
Retrieve detection results via the SDK.
</Card>
</CardGroup>

# What are Detections?

Detections are asynchronous analyses that run whenever you ship logs or traces to Quotient. They continuously score outputs for hallucinations, document relevance, and other reliability risks so you can intervene before they impact users.

Once configured, detections execute in the background. You can review outcomes in the dashboard or poll for them programmatically.

## Why enable detections

- **Catch issues fast:** surface hallucinations or irrelevant context without manually reviewing transcripts.
- **Quantify reliability:** trend hallucination rate and document relevance over time or by tag.
- **Prioritize fixes:** combine detection scores with tags (model version, customer tier) to see where to invest engineering time.

<Tip>
Keep `detection_sample_rate` high during development to observe every interaction. Dial it down in production once metrics stabilize.
</Tip>

## Configure detections in three steps

1. **Initialize the logger** with the detection types and sample rate that make sense for your workload.
2. **Send logs or traces** that include the user prompt, model output, and supporting evidence.
3. **Review the results** in the dashboard or via the SDK once detections finish processing.

# Initialize the Logger with Detections

Enable detections during logger initialization:

<CodeGroup>

```python logging.py
from quotientai import QuotientAI, DetectionType

quotient = QuotientAI(api_key="your-quotient-api-key")

logger = quotient.logger.init(
app_name="my-first-app",
environment="dev",
sample_rate=1.0,
# automatically run hallucination and document relevance detection on every output
detections=[DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY],
detection_sample_rate=1.0,
)
```

```typescript logging.ts
import { QuotientAI, DetectionType } from "quotientai";

const quotient = new QuotientAI({ apiKey: "your-quotient-api-key" });

const logger = quotient.logger.init({
appName: "my-first-app",
environment: "dev",
sampleRate: 1.0,
// automatically run hallucination and document relevance detection on every output
detections: [DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY],
detectionSampleRate: 1.0,
});
```

</CodeGroup>

# Send logs with detections enabled

After initialization, send logs that include the user query, model output, and any documents, instructions, or message history you want Quotient to evaluate.

<CodeGroup>

```python logging.py
log_id = quotient.log(
user_query="What is the capital of France?",
model_output="The capital of France is Paris.",
documents=[
"France is a country in Western Europe.",
"Paris is the capital of France.",
],
)
```

```typescript logging.ts
const logId = await quotient.log({
userQuery: "What is the capital of France?",
modelOutput: "The capital of France is Paris.",
documents: ["France is a country in Western Europe.", "Paris is the capital of France."],
});
```

</CodeGroup>

# Interpret detection outcomes

Each detection result is attached to the originating log. In the dashboard you can:

- Inspect hallucination highlights and see which sentences lack evidence.
- Review document relevance scores to spot noisy retrieval results.
- Filter by tags (environment, customer, model) to zero in on problematic slices.

Head to the [Detections Dashboard](https://app.quotientai.co/detections) to review results, export findings, or share links with teammates.

<Tip>
Combine detections with [Reports](/tools/reports) to move from single-log triage to trend analysis.
</Tip>

![Detections Dashboard](../../assets/detections-screenshot.png)

Continue with [Hallucination Detection](./hallucination-detection).
61 changes: 61 additions & 0 deletions tools/detections/polling-and-results-api.mdx
@@ -0,0 +1,61 @@
---
title: "Polling & Results API"
description: "Retrieve detection outcomes through the Quotient SDK."
icon: "clock"
---

# Poll for detections via the SDK

Use the polling helpers when you want to block until detections finish so you can act on the results immediately (e.g., inside an evaluation harness or CI job).

<CodeGroup>

```python logging.py
detection = quotient.poll_for_detection(log_id=log_id)
```

```typescript logging.ts
const detection = await quotient.pollForDetections(logId);
```

</CodeGroup>

# Parameters

- `log_id` **(string)**: Identifier of the log you want to poll for detections.
- `timeout` **(int)**: Maximum time to wait for a response in seconds. Defaults to `300`.
- `poll_interval` **(float)**: Interval between checks in seconds. Defaults to `2.0`.
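
For example, to fail fast in a CI job you might tighten both knobs; the values below are illustrative.

```python
# Wait at most two minutes, checking once per second.
detection = quotient.poll_for_detection(
    log_id=log_id,
    timeout=120,
    poll_interval=1.0,
)
```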

## Return value

`poll_for_detection` returns a `Log` object with these notable fields:

- `id` **(string)**: Unique identifier for the log.
- `app_name` **(string)**: Application that generated the log.
- `environment` **(string)**: Deployment environment (e.g., `dev`, `prod`).
- `detections` **(array)**: Detection types configured for this log.
- `detection_sample_rate` **(float)**: Sample rate applied for detections on this log.
- `user_query` **(string)**: Logged user input.
- `model_output` **(string)**: Logged model output.
- `documents` **(array)**: Context documents used for the detection run.
- `message_history` **(array)**: Prior messages following the OpenAI format.
- `instructions` **(array)**: Instructions provided to the model.
- `tags` **(object)**: Metadata associated with the log entry.
- `created_at` **(datetime)**: Timestamp when the log was created.
- `status` **(string)**: Current status of the log entry.
- `has_hallucination` **(boolean)**: Whether hallucinations were detected.
- `doc_relevancy_average` **(float)**: Average document relevancy score.
- `updated_at` **(datetime)**: Timestamp when the log was last updated.
- `evaluations` **(array)**: Evaluation results attached to the log.

## Example workflow

1. Log an interaction with detections enabled.
2. Call the polling helper and wait for it to return (in TypeScript, await the returned promise).
3. Inspect the returned `Log` payload for `has_hallucination` or `doc_relevancy_average` before deciding whether to alert, retry, or proceed.
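
A sketch of that workflow follows. The thresholds and the broad exception handling are illustrative assumptions; substitute whatever your SDK version raises on timeout and whatever alerting you already have.

```python
# Assumes `quotient` was initialized with detections enabled
# (see the Detections overview page).

# 1. Log an interaction with detections enabled.
log_id = quotient.log(
    user_query="What is the capital of France?",
    model_output="The capital of France is Paris.",
    documents=["Paris is the capital of France."],
)

# 2. Block until detections finish (or time out).
try:
    log = quotient.poll_for_detection(log_id=log_id, timeout=300, poll_interval=2.0)
except Exception:
    # Timed out: fall back to asynchronous processing instead of failing the job.
    print(f"Detections still pending for log {log_id}; will check again later.")
else:
    # 3. Inspect the results before deciding whether to alert, retry, or proceed.
    if log.has_hallucination:
        print("Unsupported claims detected; flag this interaction for review.")
    if log.doc_relevancy_average is not None and log.doc_relevancy_average < 0.75:
        print("Low document relevance; check the retriever configuration.")
```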

<Tip>
In long-running jobs, increase `timeout` or handle the timeout exception so you can fall back to asynchronous processing instead of failing the entire workflow.
</Tip>

Back to the [Detections overview](./overview).
8 changes: 2 additions & 6 deletions tools/reports.mdx
@@ -15,8 +15,8 @@ They give you a daily snapshot of your users’ queries, clustered by topic and

Reports are automatically generated based on the logs and traces you send to Quotient. Reports are available once 100+ [detections](/tools/detections) have been generated within a 30 day window. You can find more information on how to send logs and traces below:

- [Logs](/data-collection/logs)
- [Traces](/data-collection/traces)
- [Logs](/core-functionalities/logs/overview)
- [Traces](/core-functionalities/traces/overview)

![Reports](../assets/report-overview.png)

@@ -31,7 +31,3 @@ The Reports feature is particularly valuable for:
- **Query Optimization**: Find patterns in queries that lead to poor results
- **Resource Allocation**: Focus improvements on high-volume or high-risk areas
- **Trend Monitoring**: Track how system performance changes over time




34 changes: 34 additions & 0 deletions tools/reports/integration.mdx
@@ -0,0 +1,34 @@
---
title: "Integration"
description: "Learn how reports are generated from your logs, traces, and detections."
icon: "handshake"
---

# How to Integrate Reports

Reports are automatically generated based on the logs and traces you send to Quotient. They become available once 100+ [detections](/tools/detections) have been generated within a 30-day window.

## Prerequisites

- **Consistent logging**: Send structured logs with `user_query`, `model_output`, and evidence so detections can run.
- **Detections enabled**: Hallucination and document relevance detections provide the quality signals that power report scoring.
- **Tag hygiene**: Attach tags such as `model`, `customer`, or `feature` to slice reports by meaningful segments.

## Data pipeline at a glance

1. Your application emits logs and traces through the Quotient SDKs.
2. Detections execute asynchronously on each record.
3. Reports aggregate detections and metadata into daily clusters with trend charts.
4. You review the dashboard (or export via API) to plan remediation.

## Best practices

- Keep `detection_sample_rate` high enough to capture statistically significant coverage for each segment you care about.
- Align tags with your roadmap—if you track `model_version` or `retriever`, you can measure the impact of each launch.
- Review reports alongside [Logs](/core-functionalities/logs/overview) and [Traces](/core-functionalities/traces/overview) to trace issues from cluster to underlying interaction.

<Tip>
If your traffic is bursty, consider uploading a curated evaluation set to quickly hit the 100-detection threshold and unlock reports before production volume ramps.
</Tip>
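
A sketch of that approach: replay a small curated set through the logger so detections accumulate before production traffic does. The dataset shape and values below are illustrative; in practice you would load 100+ examples from your own evaluation data.

```python
from quotientai import QuotientAI, DetectionType

quotient = QuotientAI(api_key="your-quotient-api-key")
quotient.logger.init(
    app_name="my-first-app",
    environment="dev",
    sample_rate=1.0,
    detections=[DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY],
    detection_sample_rate=1.0,  # score every replayed example
)

# Illustrative curated set; load 100+ real examples from a file in practice.
eval_set = [
    {
        "user_query": "What is the capital of France?",
        "model_output": "The capital of France is Paris.",
        "documents": ["Paris is the capital of France."],
    },
]

for example in eval_set:
    quotient.log(**example)
```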

Back to the [Reports overview](./overview) or continue to [When to Use Reports](./when-to-use).