
fix: add asyncio.Lock to prevent concurrent CUDA downloads#428

Merged
jamiepine merged 2 commits into jamiepine:main from shaun0927:fix/cuda-download-race-condition
Apr 16, 2026

Conversation

shaun0927 (Contributor) commented Apr 16, 2026

Summary

  • Add a module-level asyncio.Lock in backend/services/cuda.py to prevent concurrent calls to download_cuda_binary() from racing on the same temp file

Problem

The startup auto-update task (check_and_update_cuda_binary(), fired via create_background_task in app.py) and the manual download endpoint (POST /backend/download-cuda) can both invoke download_cuda_binary() concurrently.

The manual endpoint has two guards:

  1. Check if the binary already exists on disk (get_cuda_binary_path())
  2. Check if the progress manager reports status == "downloading"

Both guards are TOCTOU (time-of-check/time-of-use) races. The auto-update path performs several synchronous checks (_needs_server_download, _needs_cuda_libs_download, version introspection via subprocess) before it ever calls progress.update_progress() with status="downloading". Until that first status update fires, Guard 2 sees None and grants a second concurrent download.

Race window:

startup:  check_and_update → _needs_server_download()=True → ...
                                                  |
                                         [asyncio yields here]
                                                  |
HTTP:     POST /backend/download-cuda arrives during this gap
          guard: get_progress("cuda-backend") → None (not set yet)
          → starts its own download_cuda_binary()
                                                  |
startup:  resumes → also calls download_cuda_binary()

Both coroutines then write to the same deterministic temp file path (.download-CUDA-server.tmp), corrupting the download. The SHA-256 verification catches the corruption and aborts, but the error is silently logged with no user-visible feedback.
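This check-then-act gap can be reproduced with a toy asyncio model. The `status` dict and the `sleep(0)` calls below are illustrative stand-ins for the progress manager and the real await points, not the actual backend code:

```python
import asyncio

# Toy model of the race: `status` stands in for the progress manager,
# and each sleep(0) marks a point where the real code yields to the loop.
status = {"cuda-backend": None}
downloads_started = 0

async def download_cuda_binary() -> str:
    global downloads_started
    # Guard 2: check-then-act, with an await between the check and the act.
    if status["cuda-backend"] == "downloading":
        return "rejected"
    await asyncio.sleep(0)  # synchronous pre-checks run; the loop yields here
    status["cuda-backend"] = "downloading"  # status is set only after the gap
    downloads_started += 1
    await asyncio.sleep(0)  # the download itself
    status["cuda-backend"] = "done"
    return "started"

async def main():
    # The startup auto-update task and the HTTP handler fire concurrently.
    return await asyncio.gather(download_cuda_binary(), download_cuda_binary())

results = asyncio.run(main())
# Both coroutines pass the guard before either one sets the status.
```

Both calls report "started" and `downloads_started` ends up at 2, which is exactly the duplicate-download scenario described above.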

The most realistic trigger is a user clicking "Download CUDA" in the GPU settings immediately after a fresh install — exactly the intended UX flow.

Changes

File: backend/services/cuda.py
Change: Added a module-level asyncio.Lock and wrapped the download_cuda_binary() body in async with _download_lock

The lock is module-level and acquired before any status checks or downloads begin, eliminating the TOCTOU window entirely.

Verification

  • py_compile passes on the modified file
  • The lock is cooperative (asyncio.Lock, not threading.Lock), matching the existing async architecture
  • Existing callers are unaffected — the public API download_cuda_binary() signature is unchanged

Summary by CodeRabbit

  • Bug Fixes
    • Concurrent CUDA binary downloads are now serialized: the automatic update task and manual requests no longer run duplicate downloads or overlap on shared temp/destination files, which prevents failed or partial downloads, duplicate progress indicators, and file corruption.

The startup auto-update task and the manual download endpoint can both
invoke download_cuda_binary() concurrently. Without mutual exclusion,
both coroutines write to the same temp file path, corrupting the
download. The progress-manager status check is a TOCTOU race because
the status is not set until after several synchronous checks complete.

Add a module-level asyncio.Lock acquired at the top of
download_cuda_binary() so only one download can proceed at a time.
coderabbitai bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 17eba3b9-54f4-45af-a96a-ef0acd115059

📥 Commits

Reviewing files that changed from the base of the PR and between 30c3677 and 9f04b32.

📒 Files selected for processing (1)
  • backend/services/cuda.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/services/cuda.py

📝 Walkthrough

Walkthrough

A module-level asyncio.Lock named _download_lock is added, and download_cuda_binary() now short-circuits if the lock is held; otherwise it acquires the lock and delegates to a new internal coroutine, _download_cuda_binary_locked(), which contains the original download logic, serializing concurrent downloads.

Changes

CUDA Download Synchronization: backend/services/cuda.py
  • Added module-level asyncio.Lock _download_lock. download_cuda_binary() now checks locked() and returns early if the lock is held; otherwise it enters async with _download_lock and calls the new internal _download_cuda_binary_locked(version), which contains the prior download/extract logic.
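The admission logic described in this walkthrough can be sketched as follows; the locked helper's body is a stand-in, and only the lock choreography reflects the described change:

```python
import asyncio

_download_lock = asyncio.Lock()

async def download_cuda_binary(version: str) -> bool:
    # Short-circuit: a second caller bails out immediately instead of
    # queueing a duplicate download behind the first one.
    if _download_lock.locked():
        return False
    async with _download_lock:
        return await _download_cuda_binary_locked(version)

async def _download_cuda_binary_locked(version: str) -> bool:
    await asyncio.sleep(0)  # stand-in for the real download/extract work
    return True

async def demo():
    # The first call proceeds; the concurrent second call returns early.
    return await asyncio.gather(
        download_cuda_binary("test"), download_cuda_binary("test")
    )

results = asyncio.run(demo())
```

Because the locked() check and the subsequent acquisition run with no await between them, a single-threaded event loop cannot interleave another caller between the check and the acquire.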

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 A lock in the burrow to keep things right,
Two tasks once tangled, now polite.
Downloads wait in a tidy queue,
No clashing files, just calm and true.
Hooray — safe binaries through the night! 🎉

🚥 Pre-merge checks (3 passed)
  • Description Check: ✅ Passed (skipped because CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title directly describes the main change: adding an asyncio.Lock to serialize CUDA downloads and prevent race conditions)
  • Docstring Coverage: ✅ Passed (coverage is 100.00%, above the required 80.00% threshold)


coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `backend/services/cuda.py`:
- Around lines 251-256: Redundant background tasks can be queued because get_progress() is checked before _download_lock is acquired. Tighten download admission so only the first request enqueues work: either set the progress status to "queued" or "downloading" (via the same progress API that backs get_progress()) immediately before awaiting _download_lock in download_cuda_binary(), or move the existing admission check (the get_progress() check in backend/routes/cuda.py) so it runs while holding _download_lock. Relevant symbols: download_cuda_binary, _download_cuda_binary_locked, _download_lock, get_progress.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f80f2f08-bd23-4689-8421-006e36eb45d7

📥 Commits

Reviewing files that changed from the base of the PR and between 75abbb0 and 30c3677.

📒 Files selected for processing (1)
  • backend/services/cuda.py

Comment thread: backend/services/cuda.py
Address CodeRabbit review feedback: check _download_lock.locked()
before awaiting the lock so concurrent callers return immediately
instead of queueing behind the first download. This prevents the
route handler from returning "started" to multiple callers when only
one download actually proceeds.
@jamiepine jamiepine merged commit be7c0ce into jamiepine:main Apr 16, 2026
1 check passed