Describe the bug
I ran lighteval and got the following error:
RuntimeError: No executable batch size found, reached zero.
To Reproduce
Here is the command to reproduce. Please run it on Google Colab with a T4 GPU.
!lighteval accelerate \
"model_name=EpistemeAI/VibeCoder-20B-alpha-0.001,max_length=16384,skip_special_tokens=False,generation_parameters={temperature:1,top_p:1,top_k:40,min_p:0,max_new_tokens:16384}" \
"extended|lcb:codegeneration|0|0" \
--remove-reasoning-tags --reasoning-tags="[('<|channel|>analysis<|message|>','<|end|><|start|>assistant<|channel|>final<|message|>')]"
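For reference, a variant of the same command that pins the batch size instead of relying on auto-detection (this assumes batch_size is accepted as a model arg like the other config fields, matching the config.batch_size read in the traceback below; it is a sketch, not a run I performed):
!lighteval accelerate \
"model_name=EpistemeAI/VibeCoder-20B-alpha-0.001,batch_size=1,max_length=16384,skip_special_tokens=False,generation_parameters={temperature:1,top_p:1,top_k:40,min_p:0,max_new_tokens:16384}" \
"extended|lcb:codegeneration|0|0" \
--remove-reasoning-tags --reasoning-tags="[('<|channel|>analysis<|message|>','<|end|><|start|>assistant<|channel|>final<|message|>')]"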
Expected behavior
The evaluation should run to completion. Instead, it fails during batch size detection with the log and traceback below:
[2025-09-27 18:36:17,303] [ INFO]: NumExpr defaulting to 2 threads. (utils.py:164)
[2025-09-27 18:36:17,690] [ INFO]: TensorFlow version 2.19.0 available. (config.py:112)
[2025-09-27 18:36:17,692] [ INFO]: JAX version 0.5.3 available. (config.py:125)
2025-09-27 18:36:20.005514: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1758998180.265845 9446 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1758998180.337247 9446 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1758998180.877699 9446 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1758998180.877741 9446 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1758998180.877747 9446 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1758998180.877751 9446 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-09-27 18:36:20.927331: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO 09-27 18:36:37 [__init__.py:241] Automatically detected platform cuda.
[2025-09-27 18:36:38,841] [ INFO]: --- INIT SEEDS --- (pipeline.py:249)
[2025-09-27 18:36:38,842] [ INFO]: --- LOADING TASKS --- (pipeline.py:210)
[2025-09-27 18:36:38,842] [ INFO]: Found 1 custom tasks in /content/lighteval/src/lighteval/tasks/extended/ifeval/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 2 custom tasks in /content/lighteval/src/lighteval/tasks/extended/ifbench/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 6 custom tasks in /content/lighteval/src/lighteval/tasks/extended/tiny_benchmarks/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 1 custom tasks in /content/lighteval/src/lighteval/tasks/extended/mt_bench/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 4 custom tasks in /content/lighteval/src/lighteval/tasks/extended/mix_eval/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 5 custom tasks in /content/lighteval/src/lighteval/tasks/extended/olympiade_bench/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 1 custom tasks in /content/lighteval/src/lighteval/tasks/extended/hle/main.py (registry.py:260)
[2025-09-27 18:36:38,843] [ INFO]: Found 23 custom tasks in /content/lighteval/src/lighteval/tasks/extended/lcb/main.py (registry.py:260)
[2025-09-27 18:36:38,844] [ WARNING]: Deprecation warning: You provided 4 arguments in your task name, but we no longer support the truncate_fewshot option. We will ignore the parameter for now, but it will fail in a couple of versions, so you should change your task name to suite|task|num_fewshot. (registry.py:287)
[2025-09-27 18:36:38,845] [ WARNING]: Careful, the task lcb:codegeneration is using evaluation data to build the few shot examples. (lighteval_task.py:269)
[2025-09-27 18:37:27,990] [ INFO]: Test gather tensor (parallelism.py:127)
[2025-09-27 18:37:28,161] [ INFO]: gathered_tensor tensor([0], device='cuda:0'), should be [0] (parallelism.py:130)
[2025-09-27 18:37:28,161] [ INFO]: --- LOADING MODEL --- (pipeline.py:177)
[2025-09-27 18:37:29,688] [ INFO]: Tokenizer truncation and padding size set to the left side. (transformers_model.py:450)
[2025-09-27 18:37:29,689] [ INFO]: We are not in a distributed setting. Setting model_parallel to False. (transformers_model.py:341)
[2025-09-27 18:37:29,689] [ INFO]: Model parallel was set to False, max memory set to None and device map to None (transformers_model.py:370)
You have loaded an FP4 model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model. To remove this warning, pass device_map = 'cuda'.
Fetching 41 files: 100% 41/41 [00:00<00:00, 49.79it/s]
Loading checkpoint shards: 100% 3/3 [00:42<00:00, 14.02s/it]
[2025-09-27 18:38:18,354] [ INFO]: Using Data Parallelism, putting model on device cuda (transformers_model.py:219)
[2025-09-27 18:38:32,983] [ INFO]: [CACHING] Initializing data cache (cache_management.py:105)
[2025-09-27 18:38:32,984] [ INFO]: --- RUNNING MODEL --- (pipeline.py:330)
[2025-09-27 18:38:32,984] [ INFO]: Running SamplingMethod.GENERATIVE requests (pipeline.py:313)
[2025-09-27 18:38:47,070] [ INFO]: Cache: Starting to process 268/268 samples (not found in cache) for tasks extended|lcb:codegeneration|0 (52eb0b8d3282c52f, GENERATIVE) (cache_management.py:399)
[2025-09-27 18:38:47,071] [ WARNING]: You cannot select the number of dataset splits for a generative evaluation at the moment. Automatically inferring. (data.py:206)
Splits: 0% 0/1 [00:00<?, ?it/s][2025-09-27 18:38:47,136] [ INFO]: Detecting largest batch size with max_input_length=16384 (transformers_model.py:503)
Splits: 0% 0/1 [00:05<?, ?it/s]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/lighteval/src/lighteval/main_accelerate.py:144 in accelerate │
│ │
│ 141 │ │ model_config=model_config, │
│ 142 │ ) │
│ 143 │ │
│ ❱ 144 │ pipeline.evaluate() │
│ 145 │ │
│ 146 │ pipeline.show_results() │
│ 147 │
│ │
│ /content/lighteval/src/lighteval/pipeline.py:282 in evaluate │
│ │
│ 279 │ │ │ │ ) │
│ 280 │ │ │ │ outputs = self._run_model() │
│ 281 │ │ else: │
│ ❱ 282 │ │ │ outputs = self._run_model() │
│ 283 │ │ │
│ 284 │ │ if self.is_main_process(): │
│ 285 │ │ │ self._post_process_outputs(outputs) │
│ │
│ /content/lighteval/src/lighteval/pipeline.py:335 in _run_model │
│ │
│ 332 │ │ if self.model.is_async: │
│ 333 │ │ │ outputs = asyncio.run(self._run_model_async()) │
│ 334 │ │ else: │
│ ❱ 335 │ │ │ outputs = self._run_model_sync() │
│ 336 │ │ │
│ 337 │ │ # Cleaning up the model before running metrics │
│ 338 │ │ self.model.cleanup() │
│ │
│ /content/lighteval/src/lighteval/pipeline.py:316 in _run_model_sync │
│ │
│ 313 │ │ │ logger.info(f"Running {sampling_method} requests") │
│ 314 │ │ │ match sampling_method: │
│ 315 │ │ │ │ case SamplingMethod.GENERATIVE: │
│ ❱ 316 │ │ │ │ │ model_outputs = self.model.greedy_until(docs) │
│ 317 │ │ │ │ │ outputs[sampling_method] = model_outputs │
│ 318 │ │ │ │ case SamplingMethod.LOGPROBS: │
│ 319 │ │ │ │ │ model_outputs = self.model.loglikelihood(docs) │
│ │
│ /content/lighteval/src/lighteval/utils/cache_management.py:402 in wrapper │
│ │
│ 399 │ │ │ │ logger.info( │
│ 400 │ │ │ │ │ f"Cache: Starting to process {len(docs_not_cached) │
│ 401 │ │ │ │ ) │
│ ❱ 402 │ │ │ │ new_results = func(self, docs_not_cached, *args, **kwa │
│ 403 │ │ │ │ │
│ 404 │ │ │ │ # Store new results in file cache │
│ 405 │ │ │ │ cache.cache_samples( │
│ │
│ /content/lighteval/src/lighteval/models/transformers/transformers_model.py:749 in greedy_until │
│ │
│ 746 │ │ if self.continuous_batching: │
│ 747 │ │ │ return self._continuous_greedy_until(docs) │
│ 748 │ │ else: │
│ ❱ 749 │ │ │ return self._padded_greedy_until(docs) │
│ 750 │ │
│ 751 │ def _generate_continuous( │
│ 752 │ │ self, │
│ │
│ /content/lighteval/src/lighteval/models/transformers/transformers_model.py:659 in _padded_greedy_until │
│ │
│ 656 │ │ │ │ max_context_continuation_size_allowed = min( │
│ 657 │ │ │ │ │ longest_context_continuation_size_in_split, self. │
│ 658 │ │ │ │ ) │
│ ❱ 659 │ │ │ batch_size = self._get_batch_size( │
│ 660 │ │ │ │ override_bs=self.config.batch_size, │
│ 661 │ │ │ │ max_input_length=max_context_continuation_size_allowe │
│ 662 │ │ │ │ starting_batch_size=starting_batch_size, │
│ │
│ /content/lighteval/src/lighteval/models/transformers/transformers_model.py:515 in _get_batch_size │
│ │
│ 512 │ │ │ F.log_softmax(self._model_call(test_batch).float(), dim=- │
│ 513 │ │ │ return batch_size │
│ 514 │ │ │
│ ❱ 515 │ │ batch_size = forward_batch() │
│ 516 │ │ logger.info(f"Determined largest batch size: {batch_size}") │
│ 517 │ │ return batch_size │
│ 518 │
│ │
│ /content/lighteval/src/lighteval/utils/parallelism.py:104 in decorator │
│ │
│ 101 │ │ │ ) │
│ 102 │ │ while True: │
│ 103 │ │ │ if batch_size == 0: │
│ ❱ 104 │ │ │ │ raise RuntimeError("No executable batch size found, re │
│ 105 │ │ │ try: │
│ 106 │ │ │ │ return function(batch_size, *args, **kwargs) │
│ 107 │ │ │ except Exception as e: │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: No executable batch size found, reached zero.
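For context on the error: the traceback ends in the batch-size auto-detection helper in parallelism.py, which retries the forward_batch() probe with ever smaller batch sizes and gives up once the size reaches zero. A minimal sketch of that backoff pattern (the function name, starting size, and halving step are assumptions for illustration, not the exact lighteval code):

def find_executable_batch_size(function, starting_batch_size):
    """Retry `function` with progressively smaller batch sizes (sketch)."""
    batch_size = starting_batch_size

    def decorator(*args, **kwargs):
        nonlocal batch_size
        while True:
            if batch_size == 0:
                # This is the error reported above.
                raise RuntimeError("No executable batch size found, reached zero.")
            try:
                return function(batch_size, *args, **kwargs)
            except Exception:
                # Assumption: the real helper inspects the exception for CUDA OOM
                # before retrying; here any failure simply halves the batch size.
                batch_size //= 2

    return decorator

On a T4 (16 GB) with a 20B-parameter model and max_input_length=16384, even the smallest probe appears to fail, which is consistent with the batch size reaching zero.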
Version info
The following commands were executed, which install the latest version of lighteval from source:
!git clone https://github.com/huggingface/lighteval
!cd lighteval && pip install -e .[dev] # make sure you have the correct transformers version installed!
!cd lighteval && pip install -e .[extended_tasks]
!pip install -q transformers triton==3.4 kernels
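The exact installed package versions were not captured above; a command like the following (plain pip and grep, nothing lighteval-specific assumed) would record them:
!pip list | grep -E "lighteval|transformers|accelerate|triton|kernels"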