
fix: add asyncio.Lock to prevent concurrent CUDA downloads#428

Merged
jamiepine merged 2 commits into jamiepine:main from shaun0927:fix/cuda-download-race-condition
Apr 16, 2026

Conversation

shaun0927 (Contributor) commented Apr 16, 2026

Summary

  • Add a module-level asyncio.Lock in backend/services/cuda.py to prevent concurrent calls to download_cuda_binary() from racing on the same temp file

Problem

The startup auto-update task (check_and_update_cuda_binary(), fired via create_background_task in app.py) and the manual download endpoint (POST /backend/download-cuda) can both invoke download_cuda_binary() concurrently.

The manual endpoint has two guards:

  1. Check if the binary already exists on disk (get_cuda_binary_path())
  2. Check if the progress manager reports status == "downloading"

Both guards are TOCTOU (time-of-check/time-of-use) races. The auto-update path performs several synchronous checks (_needs_server_download, _needs_cuda_libs_download, version introspection via subprocess) before it ever calls progress.update_progress() with status="downloading". Until that first status update fires, Guard 2 sees None and grants a second concurrent download.

Race window:

startup:  check_and_update → _needs_server_download()=True → ...
                                                  |
                                         [asyncio yields here]
                                                  |
HTTP:     POST /backend/download-cuda arrives during this gap
          guard: get_progress("cuda-backend") → None (not set yet)
          → starts its own download_cuda_binary()
                                                  |
startup:  resumes → also calls download_cuda_binary()

Both coroutines then write to the same deterministic temp file path (.download-CUDA-server.tmp), corrupting the download. The SHA-256 verification catches the corruption and aborts, but the error is silently logged with no user-visible feedback.
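This check-then-act gap can be reproduced with a toy asyncio model. The `status` dict and the `sleep(0)` calls below are illustrative stand-ins for the progress manager and the real await points, not the actual backend code:

```python
import asyncio

# Toy model of the race: `status` stands in for the progress manager,
# and each sleep(0) marks a point where the real code yields to the loop.
status = {"cuda-backend": None}
downloads_started = 0

async def download_cuda_binary() -> str:
    global downloads_started
    # Guard 2: check-then-act, with an await between the check and the act.
    if status["cuda-backend"] == "downloading":
        return "rejected"
    await asyncio.sleep(0)  # synchronous pre-checks run; the loop yields here
    status["cuda-backend"] = "downloading"  # status is set only after the gap
    downloads_started += 1
    await asyncio.sleep(0)  # the download itself
    status["cuda-backend"] = "done"
    return "started"

async def main():
    # The startup auto-update task and the HTTP handler fire concurrently.
    return await asyncio.gather(download_cuda_binary(), download_cuda_binary())

results = asyncio.run(main())
# Both coroutines pass the guard before either one sets the status.
```

Both calls report "started" and `downloads_started` ends up at 2, which is exactly the duplicate-download scenario described above.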

The most realistic trigger is a user clicking "Download CUDA" in the GPU settings immediately after a fresh install — exactly the intended UX flow.

Changes

File: backend/services/cuda.py
Change: Added a module-level asyncio.Lock and wrapped the download_cuda_binary() body in async with _download_lock

The lock is module-level and acquired before any status checks or downloads begin, eliminating the TOCTOU window entirely.

Verification

  • py_compile passes on the modified file
  • The lock is cooperative (asyncio.Lock, not threading.Lock), matching the existing async architecture
  • Existing callers are unaffected — the public API download_cuda_binary() signature is unchanged

Summary by CodeRabbit

  • Bug Fixes
    • Concurrent CUDA binary downloads are now serialized: the automatic update task and manual requests no longer run duplicate downloads or overlap on shared temp/destination files, which prevents failed or partial downloads, duplicate progress indicators, and file corruption.

The startup auto-update task and the manual download endpoint can both
invoke download_cuda_binary() concurrently. Without mutual exclusion,
both coroutines write to the same temp file path, corrupting the
download. The progress-manager status check is a TOCTOU race because
the status is not set until after several synchronous checks complete.

Add a module-level asyncio.Lock acquired at the top of
download_cuda_binary() so only one download can proceed at a time.
coderabbitai bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 17eba3b9-54f4-45af-a96a-ef0acd115059

📥 Commits

Reviewing files that changed from the base of the PR and between 30c3677 and 9f04b32.

📒 Files selected for processing (1)
  • backend/services/cuda.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/services/cuda.py

📝 Walkthrough

Walkthrough

A module-level asyncio.Lock named _download_lock is added, and download_cuda_binary() now short-circuits if the lock is held; otherwise it acquires the lock and delegates to a new internal coroutine, _download_cuda_binary_locked(), which contains the original download logic, serializing concurrent downloads.

Changes

CUDA Download Synchronization: backend/services/cuda.py
  • Added module-level asyncio.Lock _download_lock. download_cuda_binary() now checks locked() and returns early if the lock is held; otherwise it enters async with _download_lock and calls the new internal _download_cuda_binary_locked(version), which contains the prior download/extract logic.
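The admission logic described in this walkthrough can be sketched as follows; the locked helper's body is a stand-in, and only the lock choreography reflects the described change:

```python
import asyncio

_download_lock = asyncio.Lock()

async def download_cuda_binary(version: str) -> bool:
    # Short-circuit: a second caller bails out immediately instead of
    # queueing a duplicate download behind the first one.
    if _download_lock.locked():
        return False
    async with _download_lock:
        return await _download_cuda_binary_locked(version)

async def _download_cuda_binary_locked(version: str) -> bool:
    await asyncio.sleep(0)  # stand-in for the real download/extract work
    return True

async def demo():
    # The first call proceeds; the concurrent second call returns early.
    return await asyncio.gather(
        download_cuda_binary("test"), download_cuda_binary("test")
    )

results = asyncio.run(demo())
```

Because the locked() check and the subsequent acquisition run with no await between them, a single-threaded event loop cannot interleave another caller between the check and the acquire.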

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 A lock in the burrow to keep things right,
Two tasks once tangled, now polite.
Downloads wait in a tidy queue,
No clashing files, just calm and true.
Hooray — safe binaries through the night! 🎉

🚥 Pre-merge checks (3 passed)
  • Description Check: ✅ Passed (skipped because CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title directly describes the main change: adding an asyncio.Lock to serialize CUDA downloads and prevent race conditions)
  • Docstring Coverage: ✅ Passed (coverage is 100.00%, above the required 80.00% threshold)


coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `backend/services/cuda.py`:
- Around lines 251-256: Redundant background tasks can be queued because get_progress() is checked before _download_lock is acquired. Tighten download admission so only the first request enqueues work: either set the progress status to "queued" or "downloading" (via the same progress API that backs get_progress()) immediately before awaiting _download_lock in download_cuda_binary(), or move the existing admission check (the get_progress() check in backend/routes/cuda.py) so it runs while holding _download_lock. Relevant symbols: download_cuda_binary, _download_cuda_binary_locked, _download_lock, get_progress.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f80f2f08-bd23-4689-8421-006e36eb45d7

📥 Commits

Reviewing files that changed from the base of the PR and between 75abbb0 and 30c3677.

📒 Files selected for processing (1)
  • backend/services/cuda.py

Comment thread: backend/services/cuda.py
Address CodeRabbit review feedback: check _download_lock.locked()
before awaiting the lock so concurrent callers return immediately
instead of queueing behind the first download. This prevents the
route handler from returning "started" to multiple callers when only
one download actually proceeds.
@jamiepine jamiepine merged commit be7c0ce into jamiepine:main Apr 16, 2026
1 check passed