`persistSharingPost` → `persistPost` → recursive `getReplies()` walk hammers origin `/replies` (observed 429 rate-limit)

https://hl.oyasumi.dev/@ntek/019dbe77-60f6-7171-86d9-8fb369ec501d
I'm opening this issue to track the problem linked above.
We have not yet been able to investigate it thoroughly (for example, we do not know how to reproduce it), so the summary below was generated by Opus 4.7 from an analysis of the logs.

### Summary
When `persistSharingPost` runs (either directly from a valid `Announce`
or from a retry caused by the `(actor_id, sharing_id)` unique-constraint
violation), it calls `persistPost` on the original object, and
`persistPost` unconditionally walks the remote `replies` collection and
recursively re-invokes itself on each reply — which in turn walks that
reply's `replies` collection. Under normal load this already produces a
lot of traffic; when multiple such activities are processed concurrently
it produces bursty, parallel fetches of the same remote URL at
millisecond intervals, severe enough to be rejected with HTTP 429 by the
origin instance.

### Evidence (single offending URL hit three times in ~80 ms)

```
07:46:33.505  DBG  Fetched document: 200 'https://fedibird.com/users/***/statuses/<status id>/replies?only_other_accounts=true&page=true'
              (x-ratelimit-remaining: 0)
07:46:33.537  DBG  Fetching document: 'GET'  <same URL>
07:46:33.557  DBG  fedify·sig·http: Failed to verify with draft-cavage-http-signatures-12 (429); retrying with rfc9421...
07:46:33.583  DBG  Fetching document: 'GET'  <same URL>   (double-knock retry)
07:46:33.628  ERR  Failed to fetch document: 429 <same URL>
07:46:33.629  ERR  fedify·vocab: Failed to fetch '<same URL>': FetchError: HTTP 429
```

### Stack trace at the 429 failure

```
at getRemoteDocument (.../@fedify/fedify/.../docloader-*.js)
at load             (.../@fedify/fedify/.../authdocloader-*.js)
at CollectionPage.#fetchNext (.../@fedify/fedify/.../actor-*.js)
at CollectionPage.getNext    (.../@fedify/fedify/.../actor-*.js)
at traverseCollection        (.../@fedify/fedify/dist/vocab/mod.js)
at iterateCollection         (/app/src/federation/collection.ts:19)
at persistPost               (/app/src/federation/post.ts:453)
at persistSharingPost        (/app/src/federation/post.ts:516)
```

### Code path

`persistPost` (`src/federation/post.ts`) fetches the `replies`
collection once up-front:

```ts
const replies = await object.getReplies(options);
```

and later walks the entire collection, recursively persisting each
reply — which itself calls `getReplies()` on that reply:

```ts
if (replies != null) {
  for await (const item of iterateCollection(replies, { ...options, suppressError: true })) {
    if (!isPost(item)) continue;
    await persistPost(db, item, baseUrl, { ...options, skipUpdate: true, replyTarget: post });
    //      ^^^^^^^^^^^ every recursion refetches that reply's `replies` too
  }
}
```

`persistSharingPost` (same file) calls `persistPost(originalObject, …)`
on every invocation. It currently deduplicates only by the `Announce`
activity IRI, so when the same `(actor, sharing)` pair is announced
again (e.g. re-reblog or duplicated delivery), the insert fails against
the `posts_actor_id_sharing_id_unique` constraint, Fedify retries the
activity, and each retry re-enters this code path.
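
The interaction between IRI-only deduplication, the unique constraint, and retries can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ShareStore`, `deliver`, `countReplyWalks`, and the retry count are hypothetical names and values invented for illustration, and `replyWalks` merely stands in for the `persistPost` call plus its `replies` traversal. It does not reproduce Hollo's or Fedify's actual code.

```typescript
// Hypothetical in-memory model; none of these names come from Hollo or Fedify.
interface Announce {
  iri: string; // activity IRI
  actorId: string; // account that shared the post
  sharingId: string; // post being shared
}

class ShareStore {
  private seenIris = new Set<string>();
  // Models the unique (actor_id, sharing_id) index.
  private sharePairs = new Set<string>();
  // Counts how often the expensive replies walk would have run.
  replyWalks = 0;

  private static pairKey(a: Announce): string {
    return `${a.actorId}|${a.sharingId}`;
  }

  // Reported behavior: dedupe by IRI only, walk replies, then insert.
  persistByIriOnly(a: Announce): void {
    if (this.seenIris.has(a.iri)) return;
    this.replyWalks++;
    const key = ShareStore.pairKey(a);
    if (this.sharePairs.has(key)) throw new Error("unique constraint violation");
    this.sharePairs.add(key);
    this.seenIris.add(a.iri);
  }

  // Idempotent variant: check the (actor, sharing) pair before any work.
  persistIdempotent(a: Announce): void {
    const key = ShareStore.pairKey(a);
    if (this.sharePairs.has(key)) return;
    this.replyWalks++;
    this.sharePairs.add(key);
    this.seenIris.add(a.iri);
  }
}

// Stand-in for a queue that retries a failing activity up to maxAttempts times.
function deliver(handle: (a: Announce) => void, a: Announce, maxAttempts: number): void {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      handle(a);
      return;
    } catch {
      // Swallow the error and retry, as a retrying queue would.
    }
  }
}

// Delivers two Announces with different IRIs but the same (actor, sharing) pair.
function countReplyWalks(idempotent: boolean, maxAttempts: number): number {
  const store = new ShareStore();
  const handle = (a: Announce): void =>
    idempotent ? store.persistIdempotent(a) : store.persistByIriOnly(a);
  deliver(handle, { iri: "https://remote.example/announce/1", actorId: "alice", sharingId: "post-1" }, maxAttempts);
  deliver(handle, { iri: "https://remote.example/announce/2", actorId: "alice", sharingId: "post-1" }, maxAttempts);
  return store.replyWalks;
}
```

In this model, with three attempts per activity, `countReplyWalks(false, 3)` returns 4 (one walk for the first Announce, then one per failed attempt of the second), while `countReplyWalks(true, 3)` returns 1.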

### Impact
- Concurrent inbox handling causes the same origin URL to be refetched
  within tens of milliseconds.
- A single post with many replies produces a cascade: fetch root
  replies → N recursive `persistPost` calls → N further `getReplies()`
  fetches, etc.
- Observed outcome: 429 from the origin, further replies silently
  dropped.
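
To get a feel for the size of the cascade, the sketch below counts `replies` fetches for a synthetic thread. It assumes that every persisted post triggers exactly one `replies` fetch and ignores paging, embedded collections, and posts without a `replies` property, so the numbers are illustrative rather than a measurement. `ThreadNode`, `buildThread`, and `countRepliesFetches` are hypothetical helpers, not part of Hollo.

```typescript
// Hypothetical thread shape: each post lists its direct replies.
interface ThreadNode {
  id: string;
  replies: ThreadNode[];
}

// Builds a uniform thread: fanout[0] replies to the root, each with
// fanout[1] replies, and so on. An empty fanout yields a leaf post.
function buildThread(fanout: number[], id = "root"): ThreadNode {
  const [count, ...rest] = fanout;
  const replies: ThreadNode[] = [];
  for (let i = 0; i < (count ?? 0); i++) {
    replies.push(buildThread(rest, `${id}/${i}`));
  }
  return { id, replies };
}

// One fetch for this post's replies collection, plus the fetches made by
// the recursive call on every reply (leaves still cost one fetch each).
function countRepliesFetches(post: ThreadNode): number {
  let fetches = 1;
  for (const reply of post.replies) {
    fetches += countRepliesFetches(reply);
  }
  return fetches;
}

// Assumed worst case: every concurrent handler repeats the whole walk.
function countConcurrentFetches(post: ThreadNode, handlers: number): number {
  return countRepliesFetches(post) * handlers;
}
```

For a root with 3 replies that each have 2 replies, `countRepliesFetches(buildThread([3, 2]))` is 10; three handlers processing the same object concurrently would issue 30 fetches under these assumptions.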

### Suggested fixes
1. Make `persistSharingPost` idempotent on `(actor_id, sharing_id)`
   before insert, so duplicate Announces do not trigger retries and
   re-fetches at all:

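   ```ts
   // Assumes `and` and `eq` are imported from "drizzle-orm".
   // Look up an existing share by (account, original post) before inserting.
   const existingShare = await db.query.posts.findFirst({
     with: {
       account: { with: { owner: true } },
       sharing: { with: { account: { with: { owner: true } } } },
     },
     where: and(
       eq(posts.accountId, account.id),
       eq(posts.sharingId, originalPost.id),
     ),
   });
   // A duplicate Announce returns the stored row instead of hitting the constraint.
   if (existingShare != null) return existingShare;
   ```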

   Alternatively, insert with `onConflictDoNothing().returning()` and fall
   back to a lookup when no row is returned.

2. Reconsider whether `persistPost` must eagerly traverse the remote
   `replies` collection on every call — in particular, whether it
   should do so at all from `persistSharingPost`, and whether the
   recursive descent per-reply is necessary. Skipping it, bounding it
   (max depth / max items), or making it lazy would both reduce
   baseline load and eliminate the worst-case burst observed above.
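
As one possible shape for the bounding option in fix 2, here is a minimal sketch of a depth- and item-limited walk. It is a synchronous in-memory model, not a patch against `persistPost`: `walkRepliesBounded`, `WalkLimits`, and the `fetchReplies` callback are hypothetical, with `fetchReplies` standing in for `getReplies()` plus `iterateCollection`, which are asynchronous in the real code.

```typescript
// Hypothetical limits; the concrete values would need tuning.
interface WalkLimits {
  maxDepth: number; // 1 = only direct replies of the root are fetched
  maxItems: number; // total number of replies to persist per walk
}

// Returns the reply IDs that would be persisted, in visiting order.
function walkRepliesBounded(
  rootId: string,
  fetchReplies: (postId: string) => string[],
  limits: WalkLimits,
): string[] {
  const visited: string[] = [];
  const seen = new Set<string>([rootId]);

  const walk = (postId: string, depth: number): void => {
    // Check both budgets before issuing a fetch, so an exhausted
    // budget never costs an extra request.
    if (depth >= limits.maxDepth || visited.length >= limits.maxItems) return;
    for (const replyId of fetchReplies(postId)) {
      if (visited.length >= limits.maxItems) return;
      if (seen.has(replyId)) continue; // skip already-visited posts
      seen.add(replyId);
      visited.push(replyId);
      walk(replyId, depth + 1);
    }
  };

  walk(rootId, 0);
  return visited;
}
```

With `maxDepth: 1` the walk fetches only the root's `replies` collection and never descends into the replies of replies, which corresponds to dropping the per-reply recursion while still persisting direct replies.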

### Environment
- Hollo 0.7.11
- PostgreSQL 18
- Affected code: `src/federation/post.ts` — `persistSharingPost`,
  `persistPost`; `src/federation/collection.ts` — `iterateCollection`
