Skip to content

SQL agent omits required category-discriminator filter when aggregating a metric fact table #161

@javier-alvarez

Description

@javier-alvarez

Symptom

When a fact table mixes multiple metric categories in one column (e.g. a metric_line column whose distinct values include Total Revenue, Active Headcount, Operating Expense), the SQL agent sometimes writes an aggregate over value without filtering on the discriminator column. The result is a sum across heterogeneous rows — semantically meaningless. Every downstream ratio or percentage built on that aggregate is therefore also wrong, even when the downstream math is correct.

Synthetic repro

Schema: finance_fact(year string, market string, metric_line string, value real)

Sample rows:

year market metric_line value
2024 EU Total Revenue 1000
2024 EU Active Headcount 320
2024 EU Operating Expense 800
2024 EU Total Revenue 200

User question: What is the 2024 revenue for market EU?

Observed SQL: SELECT SUM(value) FROM finance_fact WHERE year='2024' AND market='EU' → 2320 (revenue + headcount + expense).
Expected SQL: SELECT SUM(value) FROM finance_fact WHERE year='2024' AND market='EU' AND metric_line='Total Revenue' → 1200.

Downstream knock-on: any answer expressed as a ratio (e.g. revenue / headcount) inherits the same error twice — both numerator and denominator are computed over the wrong row sets — so the percentage looks plausible but is wrong by a large margin.

Suggested direction

  • Prompt: in fireflyframework_agentic/rag/retrieval/sql.py, add a worked example showing this pitfall on a *_line / *_type discriminator column. Steer the agent to call distinct_values on any column whose name matches a known discriminator pattern before writing the aggregate.
  • Schema annotation: optional discriminator: true on ColumnSpec that the prompt builder surfaces as "this column MUST appear in the WHERE clause when aggregating other columns from this table."
  • Probe-trail check: when run_select aggregates a numeric column but no distinct_values was inspected for any string column in the same table, append a soft warning to the result the answerer sees.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions