Skip to content

Fix: core.async/timeout combined with core.async/go prevented fat dynamic var GC#75769

Merged
perivamsi merged 2 commits into
masterfrom
galdre/oom-june-11-26
Jun 12, 2026
Merged

Fix: core.async/timeout combined with core.async/go prevented fat dynamic var GC#75769
perivamsi merged 2 commits into
masterfrom
galdre/oom-june-11-26

Conversation

@galdre

@galdre galdre commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Closes #75748.

Description

Well this was weird.

async.core/go somewhat reasonably conveys dynamic bindings to a new thread. core.async.impl.timers somewhat unreasonably stashes timeout queue entries without weak references.

It's got to be an unusual scenario, though: we put a long timeout (20 minutes) on an async.core/timeout channel inside a go block, and that results in our cached metadata provider (which is usually created and thrown away immediately) being un-GC-able for 20 minutes, no matter whether we've forgotten about that metadata-provider and assumed it would be cleaned up. Run this code in a tight loop, multiple times per second, caching data warehouse metadata in each one, and 💥 .

Every single query-processor invocation (dashboard card, API query, internal compile...) left behind a ~1MB memory anchor that nothing could release until the 20-minute timer expired.

timeouts-queue (defonce static, timers.clj:17)         ← GC root, JVM lifetime
 └─ TimeoutQueueEntry (strong ref, timers.clj:25)      ← removable only by daemon at deadline
     └─ ManyToManyChannel (timeout chan)
         └─ takes LinkedList (channels.clj:38)         ← cleanup never runs: no ops until expiry
             └─ deactivated alt-handler (alts! leftovers)
                 └─ go state-machine array
                     └─ BINDINGS-IDX: Var$Frame (go.clj:1048)
                         └─ qp.store/*metadata-provider* binding
                             └─ InvocationTracker → CachedProxyMetadataProvider (~917KB avg)

Regression test included.

Alternatives Considered

  • a/take! with a callback + a done-flag works (this performs no binding conveyance). But this would be one well-meaning "simplify this to a go block" away from regressing. We use scheduled-executors elsewhere already.
  • Closing the cancel channel on completion unparks the driver listeners, but the channel is sometimes owned by callers who reuse it across several queries (transforms), so closing it from inside one query seems wrong.

Checklist

  • Tests have been added/updated to cover changes in this PR

@galdre galdre requested a review from a team as a code owner June 11, 2026 23:46
@galdre galdre added the single-backport Automatically create PR on current release branch on merge label Jun 11, 2026
@galdre galdre changed the title This has to be the strangest Clojure footgun I've seen in many years. Fix: async.core/timeout combined with async.core/go prevented fat dynamic var GC Jun 11, 2026
@galdre galdre changed the title Fix: async.core/timeout combined with async.core/go prevented fat dynamic var GC Fix: core.async/timeout combined with core.async/go prevented fat dynamic var GC Jun 11, 2026
@perivamsi perivamsi merged commit e112552 into master Jun 12, 2026
573 of 585 checks passed
@perivamsi perivamsi deleted the galdre/oom-june-11-26 branch June 12, 2026 15:00
eliotmetabase pushed a commit that referenced this pull request Jun 12, 2026
… prevented fat dynamic var GC" (#75796)

Fix: `core.async/timeout` combined with `core.async/go` prevented fat dynamic var GC (#75769)

* This has to be the strangest Clojure footgun I've seen in many years.

* Kondo-ignore in the test file what the src file ignores

Co-authored-by: Timothy S. Dean <timothy.dean@metabase.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

single-backport Automatically create PR on current release branch on merge .Team/Graphy

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Investigate OOMs on Dependency Checks

3 participants