Hi everyone,
We've started to build Durable Pipelines in in the VS Code PostgreSQL extension. We hit a few rough edges in df.instances that we'd love your help smoothing. They all share one root cause: external clients (like us) can't paginate df.instances efficiently today.
The first durable surface our users will see in the MVP is a paginated Instances list sourced straight from df.instances. We identified three independent gaps surfaced. They land cleanly in any order, so we've ranked them by impact for convenience, but whichever you pick up first unblocks us.
Where we got stuck
There's no index on df.instances.created_at. The canonical list API df.list_instances has no pagination or total count support, and it omits timestamps. The random 8 character id can't serve as a proxy for chronological ordering either. So any client that wants a paginated, timestamped Instances list in chronological order has two options: (a) sort the full table on every page request, or (b) fetch a bounded window in one batch and paginate on the client as a workaround.
Root cause: three independent gaps
Gap 1: df.instances has no index for time ordering
Per src/lib.rs:190, 193, the only df.* indexes today are:
CREATE INDEX IF NOT EXISTS idx_instances_status ON df.instances(status);
CREATE INDEX IF NOT EXISTS idx_nodes_instance ON df.nodes(instance_id);
Notably absent: any index supporting ORDER BY created_at or completed_at. Every chronological query against df.instances requires a full table sort.
We considered using id as a proxy for chronological order to lean on the PK index. But short_id() (src/types.rs:91-101) keeps only the trailing 8 hex characters of a fresh UUIDv4. Fully random, no time component. So ORDER BY id is arbitrary, not chronological. No proxy is possible.
Gap 2: df.list_instances lacks pagination and total count
Per src/monitoring.rs:17-107, df.list_instances(status_filter, limit_count):
- Runs
SELECT id, label FROM df.instances ORDER BY created_at DESC LIMIT $N (gated by Gap 1).
- Caps at 10 000 rows.
- Has no
OFFSET and no total count return value.
- Enriches each row with a
client.get_instance_info(id) call per row.
There's no canonical API a downstream client can call to render "Page X of Y".
Gap 3: df.list_instances omits timestamps
The function returns (instance_id, label, function_name, status, execution_count, output). It does not return created_at, completed_at, or duration. Any UI that wants to show "when did this run?" has to run a second query against df.instances to enrich. That's PK keyed, so it's cheap, but it's still an extra round trip on every list call. And the data is right there in the function.
Our current workaround
In MVP, we plan to:
- Call
df.list_instances(NULL, 200) to fetch a bounded recent window (still gated by Gap 1, but only one sort per list call rather than one per page click).
- Enrich via
SELECT id, created_at, completed_at FROM df.instances WHERE id = ANY($ids). PK keyed, fast (workaround for Gap 3).
- Paginate on the client through the 200 row window (Prev / current page / Next, page size 20).
- Surface a banner when the window cap is hit: "Showing the 200 most recent instances. Extended pagination ships in a later release."
Functional, but it caps user visible history at 200 rows and still incurs a full table sort on every list operation. We want to swap in real server side pagination as soon as the upstream supports it.
Proposed changes
1. Add a (created_at DESC, id) index on df.instances
CREATE INDEX IF NOT EXISTS idx_instances_created_at_desc_id
ON df.instances (created_at DESC, id);
Smallest change with the largest payoff. Unblocks efficient ORDER BY created_at DESC for every downstream client, including the existing df.list_instances.
2. Add a paginated server side list function
Just a starting point to react to, we're flexible on the exact shape (keyset vs offset, total_count snapshot vs has_more, etc.):
df.list_instances_paginated(
status_filter text DEFAULT NULL,
limit_count int DEFAULT 100,
after_cursor text DEFAULT NULL
) RETURNS TABLE (
instance_id text,
label text,
function_name text,
status text,
execution_count bigint,
output text,
created_at timestamptz,
completed_at timestamptz,
total_count bigint,
next_cursor text
);
Happy to align on whatever idiom fits the extension best.
3. Return created_at and completed_at from df.list_instances
The smallest of the three. Even if (2) is out of scope, this alone would eliminate our enrichment query.
If any of these are welcome but lower priority on your side, give us a shout. We're happy to send PRs.
Hi everyone,
We've started to build Durable Pipelines in in the VS Code PostgreSQL extension. We hit a few rough edges in
df.instancesthat we'd love your help smoothing. They all share one root cause: external clients (like us) can't paginatedf.instancesefficiently today.The first durable surface our users will see in the MVP is a paginated Instances list sourced straight from
df.instances. We identified three independent gaps surfaced. They land cleanly in any order, so we've ranked them by impact for convenience, but whichever you pick up first unblocks us.Where we got stuck
There's no index on
df.instances.created_at. The canonical list APIdf.list_instanceshas no pagination or total count support, and it omits timestamps. The random 8 characteridcan't serve as a proxy for chronological ordering either. So any client that wants a paginated, timestamped Instances list in chronological order has two options: (a) sort the full table on every page request, or (b) fetch a bounded window in one batch and paginate on the client as a workaround.Root cause: three independent gaps
Gap 1:
df.instanceshas no index for time orderingPer
src/lib.rs:190, 193, the onlydf.*indexes today are:Notably absent: any index supporting
ORDER BY created_atorcompleted_at. Every chronological query againstdf.instancesrequires a full table sort.We considered using
idas a proxy for chronological order to lean on the PK index. Butshort_id()(src/types.rs:91-101) keeps only the trailing 8 hex characters of a fresh UUIDv4. Fully random, no time component. SoORDER BY idis arbitrary, not chronological. No proxy is possible.Gap 2:
df.list_instanceslacks pagination and total countPer
src/monitoring.rs:17-107,df.list_instances(status_filter, limit_count):SELECT id, label FROM df.instances ORDER BY created_at DESC LIMIT $N(gated by Gap 1).OFFSETand no total count return value.client.get_instance_info(id)call per row.There's no canonical API a downstream client can call to render "Page X of Y".
Gap 3:
df.list_instancesomits timestampsThe function returns
(instance_id, label, function_name, status, execution_count, output). It does not returncreated_at,completed_at, or duration. Any UI that wants to show "when did this run?" has to run a second query againstdf.instancesto enrich. That's PK keyed, so it's cheap, but it's still an extra round trip on every list call. And the data is right there in the function.Our current workaround
In MVP, we plan to:
df.list_instances(NULL, 200)to fetch a bounded recent window (still gated by Gap 1, but only one sort per list call rather than one per page click).SELECT id, created_at, completed_at FROM df.instances WHERE id = ANY($ids). PK keyed, fast (workaround for Gap 3).Functional, but it caps user visible history at 200 rows and still incurs a full table sort on every list operation. We want to swap in real server side pagination as soon as the upstream supports it.
Proposed changes
1. Add a
(created_at DESC, id)index ondf.instancesSmallest change with the largest payoff. Unblocks efficient
ORDER BY created_at DESCfor every downstream client, including the existingdf.list_instances.2. Add a paginated server side list function
Just a starting point to react to, we're flexible on the exact shape (keyset vs offset, total_count snapshot vs has_more, etc.):
Happy to align on whatever idiom fits the extension best.
3. Return
created_atandcompleted_atfromdf.list_instancesThe smallest of the three. Even if (2) is out of scope, this alone would eliminate our enrichment query.
If any of these are welcome but lower priority on your side, give us a shout. We're happy to send PRs.