[ci][llm] Deflake llm_batch_vllm:test_vllm_llama_lora model downloading behavior #60199
Conversation
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
A model downloading locking issue was causing vLLM engine init to fail with "Cannot find any model weights".

Cause: all download types (tokenizer, full model, etc.) shared the same file lock. When a tokenizer-only download acquired the lock first and completed, subsequent full model downloads would hit a TimeoutError, wait for the lock to be released, and then return without downloading (assuming the weights were already cached). The failure showed up inconsistently because in some cases the model files were already cached and the test passed.

Reproduction:
1. Clear cache with `rm -rf ~/.cache/huggingface/hub/models--s3:----air-example-data--llama-3.2-216M-dummy`
2. Run test with `pytest -vs release/llm_tests/batch/test_batch_vllm.py::test_vllm_llama_lora`
3. Observe the lock waits and the engine init failure: `RuntimeError: Cannot find any model weights`

Fix: use separate lock paths for different download types. This way, a completed tokenizer-only download can no longer cause a full model download to skip fetching the model weights.

Testing (on Anyscale and Ray OSS):
1. Clear cache
2. Add fix
3. Run test - success
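For context, a minimal sketch of the pre-fix failure mode, assuming a `filelock`-style API (the function names and lock path here are illustrative, not Ray's actual code):

```python
from filelock import FileLock, Timeout

# The bug: one lock path shared by ALL download types.
LOCK_PATH = "/tmp/model-download.lock"

def fetch_files(tokenizer_only: bool) -> None:
    """Stand-in for the actual Hugging Face download."""

def download(tokenizer_only: bool) -> None:
    try:
        # Try to become the downloader.
        with FileLock(LOCK_PATH, timeout=0):
            fetch_files(tokenizer_only)
    except Timeout:
        # Another process holds the lock: wait for it to finish, then
        # return, assuming the files are now cached. If that process was
        # tokenizer-only and we need full weights, nothing was fetched.
        with FileLock(LOCK_PATH):
            pass
```

Giving each download type its own lock path removes this false "already downloaded" signal.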
Code Review
This pull request effectively resolves a race condition in model downloading by introducing separate file locks for different download types. The use of a lock_suffix based on the download type is a clean and direct solution to the problem described. Additionally, the clarification in the log message for missing hash files is a welcome improvement, reducing potential confusion for users. The changes are well-implemented and address the issue correctly. I have one minor suggestion to improve maintainability.
```python
if tokenizer_only:
    lock_suffix = "-tokenizer"
elif exclude_safetensors:
    lock_suffix = "-exclude-safetensors"
else:
    lock_suffix = "-full"
```
The logic to use different lock suffixes based on the download type is a great fix for the race condition. To improve maintainability and avoid magic strings, you could consider defining "-tokenizer", "-exclude-safetensors", and "-full" as module-level or class-level constants. This would make the code more robust and easier to read, especially if these suffixes are ever used elsewhere.
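A sketch of that suggestion (constant names are illustrative; `tokenizer_only` and `exclude_safetensors` come from the diff above):

```python
# Module-level constants instead of inline magic strings.
TOKENIZER_LOCK_SUFFIX = "-tokenizer"
EXCLUDE_SAFETENSORS_LOCK_SUFFIX = "-exclude-safetensors"
FULL_LOCK_SUFFIX = "-full"

if tokenizer_only:
    lock_suffix = TOKENIZER_LOCK_SUFFIX
elif exclude_safetensors:
    lock_suffix = EXCLUDE_SAFETENSORS_LOCK_SUFFIX
else:
    lock_suffix = FULL_LOCK_SUFFIX
```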
…ding behavior (ray-project#60199) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
…ding behavior (ray-project#60199) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: jinbum-kim <jinbum9958@gmail.com>
…ding behavior (ray-project#60199) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Description
Deflakes `llm_batch_vllm:test_vllm_llama_lora` model downloading.

Problem

A model downloading locking issue was causing vLLM engine init to fail with "Cannot find any model weights".
Cause: all download types (tokenizer, full model, etc.) shared the same file lock. When a tokenizer-only download acquired the lock first and completed, subsequent full model downloads would hit a TimeoutError, wait for the lock to be released, and then return without downloading (assuming the weights were already cached).

The failure showed up inconsistently because in some cases the model files were already cached and the test passed.
Reproduction:
1. Clear cache: `rm -rf ~/.cache/huggingface/hub/models--llama-3.2-216M-lora-dummy/`
2. Run test: `pytest -vs release/llm_tests/batch/test_batch_vllm.py::test_vllm_llama_lora`
3. Observe engine init failure: `RuntimeError: Cannot find any model weights`

Fix: use separate lock paths for different download types.
This way, a completed tokenizer-only download can no longer cause a full model download to skip fetching the model weights.
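For illustration, a minimal sketch of how a per-type suffix keeps the lock files separate (the helper name and on-disk layout are assumptions, not the actual Ray code):

```python
def lock_path_for(model_cache_dir: str, lock_suffix: str) -> str:
    # Each download type locks its own file, so a finished
    # tokenizer-only download never satisfies the lock wait of a
    # concurrent full-model download.
    return f"{model_cache_dir}{lock_suffix}.lock"

# Distinct paths -> distinct locks, e.g.:
#   .../models--llama-3.2-216M-lora-dummy-tokenizer.lock
#   .../models--llama-3.2-216M-lora-dummy-full.lock
```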
Testing (on Anyscale and Ray OSS)

1. Clear cache
2. Add fix
3. Run test - success

Also ran all `llm_batch_vllm` release tests on Anyscale and Ray OSS to confirm no regressions in model downloading behavior.

Related issues
n/a - internal Ray CI
Additional information