diff --git a/docs.json b/docs.json index d711397..8a5d213 100644 --- a/docs.json +++ b/docs.json @@ -52,8 +52,25 @@ { "group": "Tools", "pages": [ - "tools/detections", - "tools/reports" + { + "group": "Detections", + "icon": "ghost", + "pages": [ + "tools/detections/overview", + "tools/detections/hallucination-detection", + "tools/detections/document-relevance", + "tools/detections/polling-and-results-api" + ] + }, + { + "group": "Reports", + "icon": "clipboard", + "pages": [ + "tools/reports/overview", + "tools/reports/integration", + "tools/reports/when-to-use" + ] + } ] }, { diff --git a/tools/detections/document-relevance.mdx b/tools/detections/document-relevance.mdx new file mode 100644 index 0000000..d6e3579 --- /dev/null +++ b/tools/detections/document-relevance.mdx @@ -0,0 +1,44 @@ +--- +title: "Document Relevance" +description: "Measure whether retrieved documents actually support user queries." +icon: "bug" +--- + +# What is document relevance? + +**Document relevance** measures how well your retrieval or search system finds context that is genuinely useful for answering the user's query. A document is considered **relevant** if it contains information that addresses at least one part of the query. Otherwise, it is marked **irrelevant**. + +The document relevance score is calculated as the fraction of documents that are relevant to the query. + +# How Quotient scores relevance + +1. Compare each document (or chunk) against the full `user_query`. +2. Determine whether the document contains information relevant to any part of the query: + - If it does, mark it `relevant`. + - If it does not, mark it `irrelevant`. +3. Compute `relevant_documents / total_documents` to derive the overall score. + +## What influences the score + +- **Chunk granularity:** smaller chunks make it easier to mark only the useful passages as relevant. +- **Query clarity:** ambiguous prompts can lower relevancy; capture clarifying follow-ups in `message_history`. +- **Retriever filters:** tag each log with retriever configuration so you can compare performance across setups. + +## Why track document relevance? + +Document relevance is a core metric for evaluating retrieval-augmented systems. Even if the AI generates well, weak retrieval can degrade the final answer. Monitoring this metric helps teams: + +- Assess whether retrieval surfaces useful context. +- Debug cases where generation fails despite solid prompting. +- Improve recall and precision of retrievers. +- Watch for drift after retriever or data changes. + + + A sudden dip in relevancy is often the earliest warning that embeddings, indexing, or filters changed. Alert on sustained drops before they cascade into hallucinations. + + + + High-performing systems typically show \> 75% document relevance. Lower scores may signal ambiguous user queries, incorrect retrieval, or noisy source data. + + +Next: [Polling & Results API](./polling-and-results-api). diff --git a/tools/detections/hallucination-detection.mdx b/tools/detections/hallucination-detection.mdx new file mode 100644 index 0000000..adbb73a --- /dev/null +++ b/tools/detections/hallucination-detection.mdx @@ -0,0 +1,63 @@ +--- +title: "Hallucination Detection" +description: "Understand how Quotient identifies hallucinations and why the metric matters." +icon: "flag" +--- + +# What counts as a hallucination? + +The **hallucination rate** measures how often a model generates information that cannot be found in its provided inputs, such as retrieved documents, user messages, or system prompts. 
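+
+To make the definition concrete, here is a minimal, illustrative sketch of how an extrinsic hallucination rate could be computed once each claim in an output has been checked against the provided context. The `is_supported` helper is a naive stand-in named here only for illustration; Quotient's detection pipeline performs this cross-check for you and is far more robust.
+
+```python hallucination_rate_sketch.py
+# Illustrative only: extrinsic hallucination rate = unsupported claims / total claims.
+# `is_supported` is a naive stand-in for Quotient's claim-vs-context cross-check.
+def is_supported(claim: str, context: list[str]) -> bool:
+    # Treat a claim as supported if most of its words appear in a single context passage.
+    words = set(claim.lower().rstrip(".").split())
+    return any(
+        len(words & set(passage.lower().rstrip(".").split())) >= 0.8 * len(words)
+        for passage in context
+    )
+
+def hallucination_rate(claims: list[str], context: list[str]) -> float:
+    unsupported = sum(1 for claim in claims if not is_supported(claim, context))
+    return unsupported / len(claims) if claims else 0.0
+
+context = ["Paris is the capital of France.", "France is a country in Western Europe."]
+claims = [
+    "The capital of France is Paris.",         # supported by the first passage
+    "France has a population of 90 million.",  # not supported by any passage
+]
+print(hallucination_rate(claims, context))  # 0.5
+```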
+ +Quotient reports an **extrinsic hallucination rate**. We determine whether the model's output is externally unsupported by the context it was given. + + + + Extrinsic hallucinations occur when a model generates content that is not backed by any input. This is distinct from **intrinsic hallucinations**, where the model generates text that is self-contradictory or logically incoherent regardless of the input. + + We focus on **extrinsic** hallucination detection because this is what matters most in retrieval-augmented systems: **does the model stick to the facts it was given?**\ + \ + Refer to [How to Detect Hallucinations in Retrieval Augmented Systems: A Primer](https://blog.quotientai.co/how-to-detect-hallucinations-in-retrieval-augmented-systems-a-primer/) for an in-depth overview of hallucinations in augmented AI systems. + + + +# How Quotient detects hallucinations + +1. **Segment the output** into atomic claims or sentences. +2. **Cross-check every claim** against all available context: + - `user_query` (what the user asked) + - `documents` (retrieved evidence) + - `message_history` (prior conversation turns) + - `instructions` (system or developer guidance) +3. **Flag unsupported claims** when no context backs them up. + +If a sentence cannot be traced back to any provided evidence, it is marked as a hallucination. + +## Inputs that improve detection quality + +- **High-signal documents:** include only the evidence actually retrieved for the answer. +- **Conversation history:** pass the full multi-turn exchange so references to earlier turns can be validated. +- **Instructions:** provide system prompts so the detection pass understands guardrails and policies. + +## Interpret hallucination results + +- **`has_hallucination`**: Boolean flag indicating whether we found any unsupported claims. +- **Highlighted spans**: In the dashboard, statements are color-coded to show what lacked support. +- **Tag filters**: Slice hallucination rate by model, feature, or customer to prioritize remediation. + + + Pair hallucination detection with assertions or automated tests when shipping prompt updates. A sudden spike often signals a regression in retrieval or guardrails. + + +# Why monitor hallucinations? + +Extrinsic hallucinations are a primary failure mode in augmented AI systems. Even when retrieval succeeds, generation can drift. Tracking this metric helps teams: + +- Catch hallucinations early in development. +- Monitor output quality after deployment. +- Guide prompt iteration and model fine-tuning. + + + Well-grounded systems typically show \< 5% hallucination rate. If yours is higher, it's often a signal that your data ingestion, retrieval pipeline, or prompting needs improvement. + + +Next: [Document Relevance](./document-relevance). diff --git a/tools/detections/overview.mdx b/tools/detections/overview.mdx new file mode 100644 index 0000000..9e1bfe8 --- /dev/null +++ b/tools/detections/overview.mdx @@ -0,0 +1,125 @@ +--- +title: "Overview" +description: "Understand how Quotient detections work and how to enable them in your logging pipeline." +icon: "eye" +--- + + + + Configure detection types and sampling without leaving this page. + + + See how Quotient scores extrinsic hallucinations. + + + Measure whether retrieved documents support an answer. + + + Retrieve detection results via the SDK. + + + +# What are Detections? + +Detections are asynchronous analyses that run whenever you ship logs or traces to Quotient. 
They continuously score outputs for hallucinations, document relevance, and other reliability risks so you can intervene before they impact users. + +Once configured, detections execute in the background. You can review outcomes in the dashboard or poll for them programmatically. + +## Why enable detections + +- **Catch issues fast:** surface hallucinations or irrelevant context without manually reviewing transcripts. +- **Quantify reliability:** trend hallucination rate and document relevance over time or by tag. +- **Prioritize fixes:** combine detection scores with tags (model version, customer tier) to see where to invest engineering time. + + + Keep `detection_sample_rate` high during development to observe every interaction. Dial it down in production once metrics stabilize. + + +## Configure detections in three steps + +1. **Initialize the logger** with the detection types and sample rate that make sense for your workload. +2. **Send logs or traces** that include the user prompt, model output, and supporting evidence. +3. **Review the results** in the dashboard or via the SDK once detections finish processing. + +# Initialize the Logger with Detections + +Enable detections during logger initialization: + + + +```python logging.py +from quotientai import QuotientAI, DetectionType + +quotient = QuotientAI(api_key="your-quotient-api-key") + +logger = quotient.logger.init( + app_name="my-first-app", + environment="dev", + sample_rate=1.0, + # automatically run hallucination and document relevance detection on every output + detections=[DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY], + detection_sample_rate=1.0, +) +``` + +```typescript logging.ts +import { QuotientAI, DetectionType } from "quotientai"; + +const quotient = new QuotientAI({ apiKey: "your-quotient-api-key" }); + +const logger = quotient.logger.init({ + appName: "my-first-app", + environment: "dev", + sampleRate: 1.0, + // automatically run hallucination and document relevance detection on every output + detections: [DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY], + detectionSampleRate: 1.0, +}); +``` + + + +# Send logs with detections enabled + +After initialization, send logs that include the user query, model output, and any documents, instructions, or message history you want Quotient to evaluate. + + + +```python logging.py +log_id = quotient.log( + user_query="What is the capital of France?", + model_output="The capital of France is Paris.", + documents=[ + "France is a country in Western Europe.", + "Paris is the capital of France.", + ], +) +``` + +```typescript logging.ts +const logId = await quotient.log({ + userQuery: "What is the capital of France?", + modelOutput: "The capital of France is Paris.", + documents: ["France is a country in Western Europe.", "Paris is the capital of France."], +}); +``` + + + +# Interpret detection outcomes + +Each detection result is attached to the originating log. In the dashboard you can: + +- Inspect hallucination highlights and see which sentences lack evidence. +- Review document relevance scores to spot noisy retrieval results. +- Filter by tags (environment, customer, model) to zero in on problematic slices. + +Head to the [Detections Dashboard](https://app.quotientai.co/detections) to review results, export findings, or share links with teammates. + + + Combine detections with [Reports](/tools/reports) to move from single-log triage to trend analysis. 
+ + +![Detections Dashboard](../../assets/detections-screenshot.png) + +Continue with [Hallucination Detection](./hallucination-detection). diff --git a/tools/detections/polling-and-results-api.mdx b/tools/detections/polling-and-results-api.mdx new file mode 100644 index 0000000..94c9821 --- /dev/null +++ b/tools/detections/polling-and-results-api.mdx @@ -0,0 +1,61 @@ +--- +title: "Polling & Results API" +description: "Retrieve detection outcomes through the Quotient SDK." +icon: "clock" +--- + +# Poll for detections via the SDK + +Use the polling helpers when you want to block until detections finish so you can act on the results immediately (e.g., inside an evaluation harness or CI job). + + + +```python logging.py +detection = quotient.poll_for_detection(log_id=log_id) +``` + +```typescript logging.ts +const detection = await quotient.pollForDetections(logId); +``` + + + +# Parameters + +- `log_id` **(string)**: Identifier of the log you want to poll for detections. +- `timeout` **(int)**: Maximum time to wait for a response in seconds. Defaults to `300`. +- `poll_interval` **(float)**: Interval between checks in seconds. Defaults to `2.0`. + +## Return value + +`poll_for_detection` returns a `Log` object with these notable fields: + +- `id` **(string)**: Unique identifier for the log. +- `app_name` **(string)**: Application that generated the log. +- `environment` **(string)**: Deployment environment (e.g., `dev`, `prod`). +- `detections` **(array)**: Detection types configured for this log. +- `detection_sample_rate` **(float)**: Sample rate applied for detections on this log. +- `user_query` **(string)**: Logged user input. +- `model_output` **(string)**: Logged model output. +- `documents` **(array)**: Context documents used for the detection run. +- `message_history` **(array)**: Prior messages following the OpenAI format. +- `instructions` **(array)**: Instructions provided to the model. +- `tags` **(object)**: Metadata associated with the log entry. +- `created_at` **(datetime)**: Timestamp when the log was created. +- `status` **(string)**: Current status of the log entry. +- `has_hallucination` **(boolean)**: Whether hallucinations were detected. +- `doc_relevancy_average` **(float)**: Average document relevancy score. +- `updated_at` **(datetime)**: Timestamp when the log was last updated. +- `evaluations` **(array)**: Evaluation results attached to the log. + +## Example workflow + +1. Log an interaction with detections enabled. +2. Call the polling helper and wait for the promise/function to resolve. +3. Inspect the returned `Log` payload for `has_hallucination` or `doc_relevancy_average` before deciding whether to alert, retry, or proceed. + + + In long-running jobs, increase `timeout` or handle the timeout exception so you can fall back to asynchronous processing instead of failing the entire workflow. + + +Back to the [Detections overview](./overview). diff --git a/tools/reports.mdx b/tools/reports.mdx index 460143a..32de309 100644 --- a/tools/reports.mdx +++ b/tools/reports.mdx @@ -15,8 +15,8 @@ They give you a daily snapshot of your users’ queries, clustered by topic and Reports are automatically generated based on the logs and traces you send to Quotient. Reports are available once 100+ [detections](/tools/detections) have been generated within a 30 day window. 
You can find more information on how to send logs and traces below: -- [Logs](/data-collection/logs) -- [Traces](/data-collection/traces) +- [Logs](/core-functionalities/logs/overview) +- [Traces](/core-functionalities/traces/overview) ![Reports](../assets/report-overview.png) @@ -31,7 +31,3 @@ The Reports feature is particularly valuable for: - **Query Optimization**: Find patterns in queries that lead to poor results - **Resource Allocation**: Focus improvements on high-volume or high-risk areas - **Trend Monitoring**: Track how system performance changes over time - - - - diff --git a/tools/reports/integration.mdx b/tools/reports/integration.mdx new file mode 100644 index 0000000..027bdbf --- /dev/null +++ b/tools/reports/integration.mdx @@ -0,0 +1,34 @@ +--- +title: "Integration" +description: "Learn how reports are generated from your logs, traces, and detections." +icon: "handshake" +--- + +# How to Integrate Reports + +Reports are automatically generated based on the logs and traces you send to Quotient. They become available once 100+ [detections](/tools/detections) have been generated within a 30-day window. + +## Prerequisites + +- **Consistent logging**: Send structured logs with `user_query`, `model_output`, and evidence so detections can run. +- **Detections enabled**: Hallucination and document relevance detections provide the quality signals that power report scoring. +- **Tag hygiene**: Attach tags such as `model`, `customer`, or `feature` to slice reports by meaningful segments. + +## Data pipeline at a glance + +1. Your application emits logs and traces through the Quotient SDKs. +2. Detections execute asynchronously on each record. +3. Reports aggregate detections and metadata into daily clusters with trend charts. +4. You review the dashboard (or export via API) to plan remediation. + +## Best practices + +- Keep `detection_sample_rate` high enough to capture statistically significant coverage for each segment you care about. +- Align tags with your roadmap—if you track `model_version` or `retriever`, you can measure the impact of each launch. +- Review reports alongside [Logs](/core-functionalities/logs/overview) and [Traces](/core-functionalities/traces/overview) to trace issues from cluster to underlying interaction. + + + If your traffic is bursty, consider uploading a curated evaluation set to quickly hit the 100-detection threshold and unlock reports before production volume ramps. + + +Back to the [Reports overview](./overview) or continue to [When to Use Reports](./when-to-use). diff --git a/tools/reports/overview.mdx b/tools/reports/overview.mdx new file mode 100644 index 0000000..d8d7c66 --- /dev/null +++ b/tools/reports/overview.mdx @@ -0,0 +1,40 @@ +--- +title: "Overview" +description: "Understand how Quotient Reports summarize agent usage and surface high-risk topics." +icon: "eye" +--- + + + + Learn how detections, logs, and traces feed report generation. + + + Spot the scenarios where reports accelerate iteration. + + + +# Why Reports? + +Reports automatically surface how people are interacting with your agent, giving you a daily snapshot of user queries clustered by topic and ranked by risk. They combine semantic grouping with relevance and hallucination detections to highlight where your agent might be struggling. + +## What Reports deliver + +- **Daily digests** that summarize top intents, risky clusters, and notable regressions. +- **Issue drill-downs** that link directly to representative logs and detection evidence. 
+- **Trend charts** that make it easy to spot regressions after prompt, retriever, or model changes. + +## Inside the dashboard + +- **Overview**: high-level traffic, detection rates, and surfaced clusters for the past 24 hours. +- **Issues**: deep dives into specific query clusters with sample conversations and affected customers. +- **Filters**: slice every panel by tag (environment, customer tier, model version) to focus on the traffic that matters. + + + Reports unlock once Quotient has processed at least 100 detections in a rolling 30-day window. Keep detections enabled during onboarding to hit the threshold quickly. + + +![Reports Overview](../../assets/report-overview.png) + +![Reports Issues](../../assets/report-issues.png) + +Next: [How to Integrate Reports](./integration). diff --git a/tools/reports/when-to-use.mdx b/tools/reports/when-to-use.mdx new file mode 100644 index 0000000..b796418 --- /dev/null +++ b/tools/reports/when-to-use.mdx @@ -0,0 +1,32 @@ +--- +title: "When to Use Reports" +description: "Identify scenarios where Quotient Reports accelerate iteration and monitoring." +icon: "book-copy" +--- + +# When to Use Reports + +Reports are most valuable when you need to: + +- **Perform content gap analysis**: Identify topics where your agent consistently struggles. +- **Optimize queries**: Find patterns in prompts that lead to poor results. +- **Prioritize roadmap work**: Focus improvements on high-volume or high-risk areas. +- **Monitor trends**: Track how system performance changes over time. + +## Common scenarios + +- **Post-launch monitoring:** watch how a new prompt, retriever, or model performs across your customer base. +- **Support triage:** surface clusters tied to specific accounts so customer-facing teams have context before reaching out. +- **Evaluation cycles:** run scripted tests, trigger detections, and let reports highlight regressions between builds. + +## Signals to watch + +- Rising hallucination rate for a specific model or feature flag. +- Clusters with rapidly increasing query volume but low relevancy scores. +- Repeated issues for a high-value customer or account tier. + + + Bookmark the daily digest report and share it in Slack or email. Teams stay aligned on what to fix next without digging through raw logs. + + +Back to the [Reports overview](./overview).
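+
+The Reports dashboard surfaces these signals for you. If you also want to spot-check them in your own evaluation scripts, a rough sketch along these lines can help. It assumes you have already gathered detection outcomes (for example via `poll_for_detection` on logs from a scripted test run) into simple records tagged with the model that produced them; the record shape below is illustrative, not an SDK return type.
+
+```python signals_sketch.py
+# Illustrative only: compute a per-model hallucination rate from detection
+# results you have already collected. Replace `results` with records built by
+# your own evaluation script; this shape is not returned by the SDK.
+from collections import defaultdict
+
+results = [
+    {"model": "gpt-4o", "has_hallucination": False},
+    {"model": "gpt-4o", "has_hallucination": True},
+    {"model": "llama-3-70b", "has_hallucination": False},
+]
+
+counts = defaultdict(lambda: {"total": 0, "hallucinated": 0})
+for record in results:
+    counts[record["model"]]["total"] += 1
+    counts[record["model"]]["hallucinated"] += int(record["has_hallucination"])
+
+for model, stats in counts.items():
+    rate = stats["hallucinated"] / stats["total"]
+    print(f"{model}: {rate:.0%} hallucination rate over {stats['total']} logs")
+```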