Skip to content

Defer run_check! to Sidekiq to fix web dyno memory pressure#216

Merged
JuanVqz merged 5 commits intomainfrom
feature/defer-run-check-to-sidekiq
May 6, 2026
Merged

Defer run_check! to Sidekiq to fix web dyno memory pressure#216
JuanVqz merged 5 commits intomainfrom
feature/defer-run-check-to-sidekiq

Conversation

@JuanVqz
Copy link
Copy Markdown
Member

@JuanVqz JuanVqz commented May 6, 2026

Closes #209

Problem

POST /lockfiles was holding the Puma web thread for the entire duration of run_check!. For a 182-gem lockfile that was 366 sequential DB + Redis round-trips before the response returned:

Lockfile.save                        → 1 INSERT
LockfileCheck.create_for!            → 1 INSERT (LockfileCheck)
  lockfile.gems.each                 → N INSERTs (GemCheck, one per gem)
enqueue_gem_checks                   → N Redis pushes (perform_async, one per gem)
render 202                           ← held until all 2N+2 round-trips done

Each Puma worker retained all those ActiveRecord objects in memory for the lifetime of the request. Ruby's heap does not release pages back to the OS after GC, so repeated large requests cause RSS to ratchet up permanently. This is consistent with the R14 pattern observed in production (average 114% quota, peak 209%).

Changes

Lockfiles::StartCheck (new Sidekiq job)

Moves all post-save work off the web thread. The controller now does exactly:

Lockfile.save                        → 1 INSERT
Lockfiles::StartCheck.perform_async  → 1 Redis push
render 202                           ← returns immediately

The job does the LockfileCheck creation and gem check enqueue asynchronously on a worker dyno.

LockfileCheck.create_for!insert_all

Replaces the N-INSERT loop with a single batched insert_all. All GemCheck rows are built in memory and written in one SQL statement.

Before: N sequential INSERT INTO gem_checks (one per gem)
After: 1 INSERT INTO gem_checks ... VALUES (...), (...), (...) for all gems

LockfileCheck#enqueue_gem_checksperform_bulk

Replaces N individual perform_async calls with one perform_bulk call.

Before: N sequential LPUSH to Redis (one per gem_check)
After: 1 bulk Redis push for all pending gem_checks

Lockfile#run_check!

Now a thin delegate to Lockfiles::StartCheck.perform_async. Accepts the same rails_release: keyword for explicit override.

Complexity reduction

For a 182-gem lockfile:

Before After
Web thread DB calls 183 1
Web thread Redis calls 182 1
Total web round-trips 366 2
Worker DB calls 0 2
Worker Redis calls 0 1

Tests

  • 27 examples, 0 failures
  • lockfile_spec.rb#run_check! now asserts StartCheck is enqueued with correct ids
  • lockfile_check_spec.rb#enqueue_gem_checks now asserts perform_bulk is called
  • api/lockfiles_controller_spec.rb — asserts StartCheck is enqueued on POST
  • start_check_spec.rb — new, covers the happy path, no-next-release no-op, and deleted-lockfile error

POST /lockfiles was holding the web thread for 366+ sequential DB+Redis
round-trips on a 182-gem lockfile (1 LockfileCheck INSERT + N GemCheck
INSERTs + N perform_async calls). Under concurrent load, multiple Puma
workers each retained hundreds of AR objects, causing RSS to grow and
never release back to the OS.

Changes:
- Add Lockfiles::StartCheck Sidekiq job: does the LockfileCheck +
  GemCheck creation and enqueue off the web thread
- Lockfile#run_check! now calls perform_async instead of doing the
  work inline; controllers return in milliseconds regardless of
  lockfile size
- LockfileCheck.create_for! replaces the N-INSERT gem loop with a
  single insert_all (one batched INSERT for all GemCheck rows)
- LockfileCheck#enqueue_gem_checks replaces N perform_async calls
  with a single perform_bulk (one Redis push for all pending checks)
- Update specs throughout to match new async contract
@JuanVqz JuanVqz self-assigned this May 6, 2026
@JuanVqz JuanVqz requested a review from a team May 6, 2026 15:41
JuanVqz added 4 commits May 6, 2026 09:44
- Rename 'no-op on deleted lockfile' to 'raises RecordNotFound' since
  raising is not a no-op; behavior is intentional (Sidekiq retries cover it)
- Rewrite 'uses next_rails_release' test to actually assert a LockfileCheck
  is created targeting the correct release, not just that perform_bulk runs
- Add comment on insert_all omitting returning: and the two-round-trip tradeoff
@JuanVqz JuanVqz marked this pull request as ready for review May 6, 2026 15:51
@JuanVqz JuanVqz merged commit cbbcc1a into main May 6, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Defer run_check! to a Sidekiq job to keep POST /lockfiles cheap

1 participant