fix: add asyncio.Lock to prevent concurrent CUDA downloads #428
jamiepine merged 2 commits into jamiepine:main
The startup auto-update task and the manual download endpoint can both invoke download_cuda_binary() concurrently. Without mutual exclusion, both coroutines write to the same temp file path, corrupting the download. The progress-manager status check is a TOCTOU race because the status is not set until after several synchronous checks complete. Add a module-level asyncio.Lock acquired at the top of download_cuda_binary() so only one download can proceed at a time.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/services/cuda.py`:
- Around line 251-256: The issue is redundant background tasks can be queued
because get_progress() is checked before acquiring _download_lock; update
download admission so only the first request enqueues work: either set the
progress status to "queued" or "downloading" (via the same progress API used by
get_progress()) immediately before awaiting _download_lock in
download_cuda_binary, or move the existing admission-check logic (the
get_progress() check in backend/routes/cuda.py) so it runs while holding
_download_lock; reference the symbols download_cuda_binary,
_download_cuda_binary_locked, _download_lock, and get_progress when making the
change.
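One way to realize the first option is sketched below. It is an illustration under assumptions, not the project's code: `_status`, `try_admit`, and `request_download` are hypothetical stand-ins for the progress manager and the route handler, and the "queued" status is set in a synchronous check-and-set so that no other coroutine can run between the check and the update.

```python
import asyncio

_download_lock = asyncio.Lock()
_status = {"value": None}  # hypothetical stand-in for the progress manager


def get_progress():
    return _status["value"]


def try_admit() -> bool:
    """Check and set with no await in between, so it cannot be interleaved."""
    if get_progress() in ("queued", "downloading"):
        return False
    _status["value"] = "queued"
    return True


async def download_cuda_binary() -> None:
    async with _download_lock:
        _status["value"] = "downloading"
        await asyncio.sleep(0.01)  # simulated download
        _status["value"] = "complete"


def request_download(tasks: list) -> str:
    """Hypothetical route handler: enqueue work only for the first request."""
    if not try_admit():
        return "already in progress"
    tasks.append(asyncio.create_task(download_cuda_binary()))
    return "started"


async def main():
    tasks = []
    # Two back-to-back requests, before the background task has run at all.
    replies = [request_download(tasks), request_download(tasks)]
    await asyncio.gather(*tasks)
    return replies, len(tasks)


replies, task_count = asyncio.run(main())
print(replies, task_count)
```

Only the first request enqueues a task here; the second is rejected even though the background task has not yet started running.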
Address CodeRabbit review feedback: check _download_lock.locked() before awaiting the lock so concurrent callers return immediately instead of queueing behind the first download. This prevents the route handler from returning "started" to multiple callers when only one download actually proceeds.
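A minimal sketch of that check, under stated assumptions: the `caller` parameter and the boolean return value are hypothetical and only serve the demonstration. It relies on the fact that, in CPython's `asyncio.Lock`, acquiring an uncontended lock does not suspend, so nothing else runs between `locked()` and the acquire on a single event loop.

```python
import asyncio

_download_lock = asyncio.Lock()
completed = []  # names of callers whose download actually ran


async def download_cuda_binary(caller: str) -> bool:
    """Return True if this call performed the download, False if it was skipped."""
    # Bail out immediately instead of queueing behind an active download.
    if _download_lock.locked():
        return False
    async with _download_lock:
        await asyncio.sleep(0.01)  # simulated download
        completed.append(caller)
        return True


async def main():
    return await asyncio.gather(
        download_cuda_binary("auto-update"),
        download_cuda_binary("manual"),
    )


results = asyncio.run(main())
print(results, completed)
```

The second caller returns right away, which lets a route handler report that a download is already running rather than answering "started".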
Summary
- Add an `asyncio.Lock` in `backend/services/cuda.py` to prevent concurrent calls to `download_cuda_binary()` from racing on the same temp file

Problem
The startup auto-update task (`check_and_update_cuda_binary()`, fired via `create_background_task` in `app.py`) and the manual download endpoint (`POST /backend/download-cuda`) can both invoke `download_cuda_binary()` concurrently.

The manual endpoint has two guards:
1. A check via `get_cuda_binary_path()`
2. A check for `status == "downloading"`

Both guards are TOCTOU (time-of-check/time-of-use) races. The auto-update path performs several synchronous checks (`_needs_server_download`, `_needs_cuda_libs_download`, version introspection via subprocess) before it ever calls `progress.update_progress()` with `status="downloading"`. Until that first status update fires, Guard 2 sees `None` and grants a second concurrent download.

Within that race window, both coroutines write to the same deterministic temp file path (`.download-CUDA-server.tmp`), corrupting the download. The SHA-256 verification catches the corruption and aborts, but the error is silently logged with no user-visible feedback.

The most realistic trigger is a user clicking "Download CUDA" in the GPU settings immediately after a fresh install, which is exactly the intended UX flow.
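The check-then-set pattern behind Guard 2 can be reproduced in isolation. The sketch below is not the project's code: `progress`, `started`, and `unguarded_download` are hypothetical stand-ins, and `asyncio.sleep(0)` simulates whatever lets the other coroutine run before the status is published.

```python
import asyncio

progress = {"status": None}  # hypothetical stand-in for the progress manager
started = []                 # every caller that began "downloading"


async def unguarded_download(name: str) -> None:
    # Time of check: look at the published status.
    if progress["status"] == "downloading":
        return
    # Simulated pre-download work that yields to the event loop
    # before the status has been published.
    await asyncio.sleep(0)
    # Time of use: the status is only set now, too late for the other caller.
    progress["status"] = "downloading"
    started.append(name)


async def main() -> None:
    await asyncio.gather(
        unguarded_download("auto-update"),
        unguarded_download("manual"),
    )


asyncio.run(main())
print(started)
```

Both callers pass the check because neither has published its status by the time the other looks at it.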
Changes
`backend/services/cuda.py`: added an `asyncio.Lock` and wrapped the `download_cuda_binary()` body in `async with _download_lock`.

The lock is module-level and acquired before any status checks or downloads begin, eliminating the TOCTOU window entirely.
Verification
- `py_compile` passes on the modified file
- Uses `asyncio.Lock`, not `threading.Lock`, matching the existing async architecture
- `download_cuda_binary()` signature is unchanged