Summary
df.metrics() appears to count failed rows from duroxide.executions, including orphan executions created when df.start() is rolled back, while df.instances and df.list_instances('failed') only show persisted workflow instances.
That makes the failed instance count disagree across public APIs after rollback scenarios.
Observed
Tested against current main at 11ac64e3adb64c14386be5c737b3a3806d873fc4.
After rollback-oriented tests, the counts diverged:
| source | total | completed | failed | running |
| --- | --- | --- | --- | --- |
| df.metrics() | 399 | 392 | 7 | 0 |
| df.instances | 396 | 392 | 4 | 0 |
| duroxide.executions | 399 | 392 | 7 | 0 |
The extra failed rows were in duroxide.executions with no matching row in df.instances:
SELECT
e.instance_id,
e.execution_id,
e.status,
left(e.output, 180) AS output_prefix,
i.id AS df_instance_id
FROM duroxide.executions e
LEFT JOIN df.instances i ON i.id = e.instance_id
WHERE e.status = 'Failed'
AND i.id IS NULL
ORDER BY e.instance_id;
Example output prefix:
Instance <id> not found after 5s (transaction may have been rolled back)
So df.metrics() reports these as failed instances even though df.instances and df.list_instances('failed') do not expose them as failed workflow instances.
Repro Shape
One way to trigger this is to start a workflow inside a transaction that later rolls back, wait for the worker to observe the missing instance, then compare the metrics API with df.instances.
BEGIN;
SELECT df.start('SELECT 1', 'rollback-metrics-probe');
ROLLBACK;
-- wait long enough for the worker to record the missing-instance failure
-- (the failure message in the run above reports "not found after 5s")
SELECT * FROM df.metrics();
SELECT status, count(*)
FROM df.instances
GROUP BY status;
SELECT e.instance_id, e.status, e.output, i.id AS df_instance_id
FROM duroxide.executions e
LEFT JOIN df.instances i ON i.id = e.instance_id
WHERE e.status = 'Failed'
AND i.id IS NULL;
Expected
Either:
- df.metrics() should count the same persisted workflow instances that df.instances / df.list_instances() expose, or
- the docs should clearly state that df.metrics().failed_instances includes lower-level failed duroxide.executions, including orphan executions created by rolled-back starts.
For dashboards and alerting, the current behavior makes rollback probes look like durable workflow failures.
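As a stopgap for dashboards, the failed executions can be split into those backed by a df.instances row and those that are orphans. This is a minimal sketch, assuming PostgreSQL (for the FILTER clause) and only the tables and columns already used in the queries in this report. The output column names are made up for illustration, and this is not how df.metrics() computes its numbers.

```sql
-- Split failed executions by whether a persisted workflow instance exists.
-- failed_with_instance and failed_orphans are illustrative names only.
SELECT
    count(*) FILTER (WHERE i.id IS NOT NULL) AS failed_with_instance,
    count(*) FILTER (WHERE i.id IS NULL)     AS failed_orphans
FROM duroxide.executions e
LEFT JOIN df.instances i ON i.id = e.instance_id
WHERE e.status = 'Failed';
```

Note that this counts execution rows, not distinct instances, so failed_with_instance can exceed the df.instances failed count if one instance has more than one failed execution.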
Notes
From a quick source read, df.metrics() appears to come from the generated get_system_metrics() path and counts failed rows in duroxide.executions. That explains why it can diverge from df.instances after rollback.