SQL agent omits required category-discriminator filter when aggregating a metric fact table

## Symptom

When a fact table mixes multiple metric categories in one column (e.g. a `metric_line` column whose distinct values include `Total Revenue`, `Active Headcount`, `Operating Expense`), the SQL agent sometimes writes an aggregate over `value` without filtering on the discriminator column. The result is a sum across heterogeneous rows — semantically meaningless. Every downstream ratio or percentage built on that aggregate is therefore also wrong, even when the downstream math is correct.

## Synthetic repro

Schema: `finance_fact(year string, market string, metric_line string, value real)`

Sample rows:

| year | market | metric_line       | value |
|------|--------|-------------------|-------|
| 2024 | EU     | Total Revenue     | 1000  |
| 2024 | EU     | Active Headcount  | 320   |
| 2024 | EU     | Operating Expense | 800   |
| 2024 | EU     | Total Revenue     | 200   |

User question: `What is the 2024 revenue for market EU?`

Observed SQL: `SELECT SUM(value) FROM finance_fact WHERE year='2024' AND market='EU'` → 2320 (revenue + headcount + expense).
Expected SQL: `SELECT SUM(value) FROM finance_fact WHERE year='2024' AND market='EU' AND metric_line='Total Revenue'` → 1200.

Downstream knock-on: any answer expressed as a ratio (e.g. revenue / headcount) inherits the same error twice — both numerator and denominator are computed over the wrong row sets — so the percentage looks plausible but is wrong by a large margin.

## Suggested direction

- Prompt: in `fireflyframework_agentic/rag/retrieval/sql.py`, add a worked example showing this pitfall on a `*_line` / `*_type` discriminator column. Steer the agent to call `distinct_values` on any column whose name matches a known discriminator pattern before writing the aggregate.
- Schema annotation: optional `discriminator: true` on `ColumnSpec` that the prompt builder surfaces as "this column MUST appear in the WHERE clause when aggregating other columns from this table."
- Probe-trail check: when `run_select` aggregates a numeric column but no `distinct_values` was inspected for any string column in the same table, append a soft warning to the result the answerer sees.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQL agent omits required category-discriminator filter when aggregating a metric fact table #161

Symptom

Synthetic repro

Suggested direction

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

year	market	metric_line	value
2024	EU	Total Revenue	1000
2024	EU	Active Headcount	320
2024	EU	Operating Expense	800
2024	EU	Total Revenue	200

SQL agent omits required category-discriminator filter when aggregating a metric fact table #161

Description

Symptom

Synthetic repro

Suggested direction

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions