Enabling efficient paginated listing of df.instances for external clients

Hi everyone,

We've started to build Durable Pipelines in in the VS Code PostgreSQL extension. We hit a few rough edges in `df.instances` that we'd love your help smoothing. They all share one root cause: external clients (like us) can't paginate `df.instances` efficiently today.

The first durable surface our users will see in the MVP is a paginated **Instances list** sourced straight from `df.instances`. We identified three independent gaps surfaced. They land cleanly in any order, so we've ranked them by impact for convenience, but whichever you pick up first unblocks us.

## Where we got stuck

There's no index on `df.instances.created_at`. The canonical list API `df.list_instances` has no pagination or total count support, and it omits timestamps. The random 8 character `id` can't serve as a proxy for chronological ordering either. So any client that wants a paginated, timestamped Instances list in chronological order has two options: (a) sort the full table on every page request, or (b) fetch a bounded window in one batch and paginate on the client as a workaround.

## Root cause: three independent gaps

### Gap 1: `df.instances` has no index for time ordering

Per `src/lib.rs:190, 193`, the only `df.*` indexes today are:

```sql
CREATE INDEX IF NOT EXISTS idx_instances_status ON df.instances(status);
CREATE INDEX IF NOT EXISTS idx_nodes_instance ON df.nodes(instance_id);
```

Notably absent: any index supporting `ORDER BY created_at` or `completed_at`. Every chronological query against `df.instances` requires a full table sort.

We considered using `id` as a proxy for chronological order to lean on the PK index. But `short_id()` (`src/types.rs:91-101`) keeps only the trailing 8 hex characters of a fresh UUIDv4. Fully random, no time component. So `ORDER BY id` is arbitrary, not chronological. No proxy is possible.

### Gap 2: `df.list_instances` lacks pagination and total count

Per `src/monitoring.rs:17-107`, `df.list_instances(status_filter, limit_count)`:

- Runs `SELECT id, label FROM df.instances ORDER BY created_at DESC LIMIT $N` (gated by Gap 1).
- Caps at 10 000 rows.
- Has no `OFFSET` and no total count return value.
- Enriches each row with a `client.get_instance_info(id)` call per row.

There's no canonical API a downstream client can call to render "Page X of Y". 

### Gap 3: `df.list_instances` omits timestamps

The function returns `(instance_id, label, function_name, status, execution_count, output)`. It does not return `created_at`, `completed_at`, or duration. Any UI that wants to show "when did this run?" has to run a second query against `df.instances` to enrich. That's PK keyed, so it's cheap, but it's still an extra round trip on every list call. And the data is right there in the function.

## Our current workaround

In MVP, we plan to:

1. Call `df.list_instances(NULL, 200)` to fetch a bounded recent window (still gated by Gap 1, but only one sort per list call rather than one per page click).
2. Enrich via `SELECT id, created_at, completed_at FROM df.instances WHERE id = ANY($ids)`. PK keyed, fast (workaround for Gap 3).
3. Paginate on the client through the 200 row window (Prev / current page / Next, page size 20).
4. Surface a banner when the window cap is hit: *"Showing the 200 most recent instances. Extended pagination ships in a later release."*

Functional, but it caps user visible history at 200 rows and still incurs a full table sort on every list operation. We want to swap in real server side pagination as soon as the upstream supports it.

## Proposed changes

### 1. Add a `(created_at DESC, id)` index on `df.instances`

```sql
CREATE INDEX IF NOT EXISTS idx_instances_created_at_desc_id
    ON df.instances (created_at DESC, id);
```

Smallest change with the largest payoff. Unblocks efficient `ORDER BY created_at DESC` for every downstream client, including the existing `df.list_instances`.

### 2. Add a paginated server side list function

Just a starting point to react to, we're flexible on the exact shape (keyset vs offset, total_count snapshot vs has_more, etc.):

```sql
df.list_instances_paginated(
    status_filter text DEFAULT NULL,
    limit_count   int  DEFAULT 100,
    after_cursor  text DEFAULT NULL
) RETURNS TABLE (
    instance_id     text,
    label           text,
    function_name   text,
    status          text,
    execution_count bigint,
    output          text,
    created_at      timestamptz,
    completed_at    timestamptz,
    total_count     bigint,
    next_cursor     text
);
```

Happy to align on whatever idiom fits the extension best.

### 3. Return `created_at` and `completed_at` from `df.list_instances`

The smallest of the three. Even if (2) is out of scope, this alone would eliminate our enrichment query.

--------------------------------------

If any of these are welcome but lower priority on your side, give us a shout. We're happy to send PRs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enabling efficient paginated listing of df.instances for external clients #146

Where we got stuck

Root cause: three independent gaps

Gap 1: `df.instances` has no index for time ordering

Gap 2: `df.list_instances` lacks pagination and total count

Gap 3: `df.list_instances` omits timestamps

Our current workaround

Proposed changes

1. Add a `(created_at DESC, id)` index on `df.instances`

2. Add a paginated server side list function

3. Return `created_at` and `completed_at` from `df.list_instances`

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Enabling efficient paginated listing of df.instances for external clients #146

Description

Where we got stuck

Root cause: three independent gaps

Gap 1: df.instances has no index for time ordering

Gap 2: df.list_instances lacks pagination and total count

Gap 3: df.list_instances omits timestamps

Our current workaround

Proposed changes

1. Add a (created_at DESC, id) index on df.instances

2. Add a paginated server side list function

3. Return created_at and completed_at from df.list_instances

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Gap 1: `df.instances` has no index for time ordering

Gap 2: `df.list_instances` lacks pagination and total count

Gap 3: `df.list_instances` omits timestamps

1. Add a `(created_at DESC, id)` index on `df.instances`

3. Return `created_at` and `completed_at` from `df.list_instances`