persistSharingPost → persistPost → recursive getReplies() walk hammers origin /replies (observed 429 rate-limit) #443

@ntsklab

https://hl.oyasumi.dev/@ntek/019dbe77-60f6-7171-86d9-8fb369ec501d
I am opening a ticket for the problem described in the post above.
We have not yet been able to investigate it thoroughly (for example, we do not know how to reproduce it), so the summary below was generated by Opus 4.7 from an analysis of the logs.

Summary

When persistSharingPost runs (either directly from a valid Announce
or from a retry caused by the (actor_id, sharing_id) unique-constraint
violation), it calls persistPost on the original object, and
persistPost unconditionally walks the remote replies collection and
recursively re-invokes itself on each reply — which in turn walks that
reply's replies collection. Under normal load this already produces a
lot of traffic; when multiple such activities are processed concurrently
it produces bursty, parallel fetches of the same remote URL at
millisecond intervals, severe enough to be rejected with HTTP 429 by the
origin instance.

Evidence (single offending URL hit three times in ~80 ms)

07:46:33.505  DBG  Fetched document: 200 'https://fedibird.com/users/***/statuses/<status id>/replies?only_other_accounts=true&page=true'
              (x-ratelimit-remaining: 0)
07:46:33.537  DBG  Fetching document: 'GET'  <same URL>
07:46:33.557  DBG  fedify·sig·http: Failed to verify with draft-cavage-http-signatures-12 (429); retrying with rfc9421...
07:46:33.583  DBG  Fetching document: 'GET'  <same URL>   (double-knock retry)
07:46:33.628  ERR  Failed to fetch document: 429 <same URL>
07:46:33.629  ERR  fedify·vocab: Failed to fetch '<same URL>': FetchError: HTTP 429

Stack trace at the 429 failure

at getRemoteDocument (.../@fedify/fedify/.../docloader-*.js)
at load             (.../@fedify/fedify/.../authdocloader-*.js)
at CollectionPage.#fetchNext (.../@fedify/fedify/.../actor-*.js)
at CollectionPage.getNext    (.../@fedify/fedify/.../actor-*.js)
at traverseCollection        (.../@fedify/fedify/dist/vocab/mod.js)
at iterateCollection         (/app/src/federation/collection.ts:19)
at persistPost               (/app/src/federation/post.ts:453)
at persistSharingPost        (/app/src/federation/post.ts:516)

Code path

persistPost (src/federation/post.ts) fetches the replies
collection once up-front:

const replies = await object.getReplies(options);

and later walks the entire collection, recursively persisting each
reply — which itself calls getReplies() on that reply:

if (replies != null) {
  for await (const item of iterateCollection(replies, { ...options, suppressError: true })) {
    if (!isPost(item)) continue;
    await persistPost(db, item, baseUrl, { ...options, skipUpdate: true, replyTarget: post });
    //      ^^^^^^^^^^^ every recursion refetches that reply's `replies` too
  }
}

persistSharingPost (same file) calls persistPost(originalObject, …)
on every invocation. It currently deduplicates only by the Announce
activity IRI, so when the same (actor, sharing) pair is announced
again (e.g. re-reblog or duplicated delivery), the insert fails against
the posts_actor_id_sharing_id_unique constraint, Fedify retries the
activity, and each retry re-enters this code path.

Impact

  • Concurrent inbox handling causes the same origin URL to be refetched
    within tens of milliseconds.
  • A single post with many replies produces a cascade: fetch root
    replies → N recursive persistPost calls → N further getReplies()
    fetches, etc.
  • Observed outcome: 429 from the origin, further replies silently
    dropped.
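
To put a rough number on the cascade described above, the following sketch counts how many replies-collection fetches a recursive walk would issue over an in-memory reply tree. `ReplyNode`, `countRepliesFetches`, and `buildTree` are hypothetical names for illustration and are not part of Hollo or Fedify. The model assumes exactly one fetch per post and ignores collection paging and retries, so real traffic is likely higher.

```typescript
// Hypothetical in-memory model of a reply tree; not a Hollo or Fedify type.
interface ReplyNode {
  id: string;
  replies: ReplyNode[];
}

// Counts one simulated replies-collection fetch per visited post, mirroring
// a recursion in which every persisted reply fetches its own replies.
function countRepliesFetches(node: ReplyNode): number {
  let fetches = 1; // the fetch for this post's own replies collection
  for (const reply of node.replies) {
    fetches += countRepliesFetches(reply);
  }
  return fetches;
}

// Builds a uniform tree: every post has `width` replies, `depth` levels deep.
function buildTree(width: number, depth: number, id = "root"): ReplyNode {
  const replies: ReplyNode[] = [];
  if (depth > 0) {
    for (let i = 0; i < width; i++) {
      replies.push(buildTree(width, depth - 1, `${id}/${i}`));
    }
  }
  return { id, replies };
}

// A root with 3 replies, each of which has 3 replies: 1 + 3 + 9 fetches.
const fetchCount = countRepliesFetches(buildTree(3, 2));
console.log(fetchCount); // 13
```

Under this simplified model the fetch count equals the number of posts in the thread, and it is multiplied again by every concurrent or retried activity that re-enters the same path.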

Suggested fixes

  1. Make persistSharingPost idempotent on (actor_id, sharing_id)
    before insert, so duplicate Announces do not trigger retries and
    re-fetches at all:

    const existingShare = await db.query.posts.findFirst({
      with: {
        account: { with: { owner: true } },
        sharing: { with: { account: { with: { owner: true } } } },
      },
      where: and(
        eq(posts.accountId, account.id),
        eq(posts.sharingId, originalPost.id),
      ),
    });
    if (existingShare != null) return existingShare;

    (Alternatively, use onConflictDoNothing().returning() and, when no
    row is returned, look up the existing share.)

  2. Reconsider whether persistPost must eagerly traverse the remote
    replies collection on every call — in particular, whether it
    should do so at all from persistSharingPost, and whether the
    recursive descent per-reply is necessary. Skipping it, bounding it
    (max depth / max items), or making it lazy would both reduce
    baseline load and eliminate the worst-case burst observed above.
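
As one way to picture the bounding option in fix 2, here is a minimal sketch of a walk limited by depth and by total item count. Everything in it is hypothetical: `walkReplies`, `WalkLimits`, and the `fetchReplies` callback are stand-ins for `persistPost`, its options, and the `getReplies()` / `iterateCollection` calls, and none of them are Hollo or Fedify APIs.

```typescript
// Hypothetical limits, introduced for illustration only.
interface WalkLimits {
  maxDepth: number; // how many reply levels below the root may be fetched
  maxItems: number; // total number of replies visited across the whole walk
}

// Stand-in for fetching one post's replies collection.
type FetchReplies = (postId: string) => Promise<string[]>;

async function walkReplies(
  rootId: string,
  fetchReplies: FetchReplies,
  limits: WalkLimits,
): Promise<string[]> {
  const visited: string[] = [];
  const seen = new Set<string>([rootId]);

  async function visit(postId: string, depth: number): Promise<void> {
    if (depth >= limits.maxDepth) return; // never fetch below the depth limit
    if (visited.length >= limits.maxItems) return; // item budget exhausted
    const replies = await fetchReplies(postId);
    for (const replyId of replies) {
      if (visited.length >= limits.maxItems) return;
      if (seen.has(replyId)) continue; // skip duplicates and cycles
      seen.add(replyId);
      visited.push(replyId);
      await visit(replyId, depth + 1);
    }
  }

  await visit(rootId, 0);
  return visited;
}

// Usage with a fake, in-memory thread instead of remote fetches.
const demoThread: Record<string, string[]> = {
  root: ["a", "b"],
  a: ["a1", "a2"],
  b: ["b1"],
  a1: ["a1x"],
};
const demoFetch: FetchReplies = async (id) => demoThread[id] ?? [];

walkReplies("root", demoFetch, { maxDepth: 1, maxItems: 10 })
  .then((ids) => console.log(ids)); // only the root's direct replies
```

With `maxDepth: 1` the walk issues a single fetch for the root's replies and does not descend into each reply, which is roughly the "skip the recursive descent" variant. Larger values trade completeness of the thread against load on the origin.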

Environment

  • Hollo 0.7.11
  • PostgreSQL 18
  • Affected code: src/federation/post.ts (persistSharingPost,
    persistPost); src/federation/collection.ts (iterateCollection)
