Describe the bug
I ran lighteval and got the following error:
RuntimeError: No executable batch size found, reached zero.
To Reproduce
Here is the command to reproduce. Please run it on Google Colab with a T4 GPU.
!lighteval accelerate \
"model_name=EpistemeAI/VibeCoder-20B-alpha-0.001,max_length=16384,skip_special_tokens=False,generation_parameters={temperature:1,top_p:1,top_k:40,min_p:0,max_new_tokens:16384}" \
"extended|lcb:codegeneration|0|0" \
--remove-reasoning-tags --reasoning-tags="[('<|channel|>analysis<|message|>','<|end|><|start|>assistant<|channel|>final<|message|>')]"
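For reference, a variant of the same command that pins the batch size instead of relying on auto-detection (this assumes batch_size is accepted as a model arg like the other config fields, matching the config.batch_size read in the traceback below; it is a sketch, not a run I performed):
!lighteval accelerate \
"model_name=EpistemeAI/VibeCoder-20B-alpha-0.001,batch_size=1,max_length=16384,skip_special_tokens=False,generation_parameters={temperature:1,top_p:1,top_k:40,min_p:0,max_new_tokens:16384}" \
"extended|lcb:codegeneration|0|0" \
--remove-reasoning-tags --reasoning-tags="[('<|channel|>analysis<|message|>','<|end|><|start|>assistant<|channel|>final<|message|>')]"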
Expected behavior
The evaluation should run to completion. Instead, it fails during batch size detection with the log and traceback below:
[2025-09-27 18:36:17,303] [ INFO]: NumExpr defaulting to 2 threads. (utils.py:164)
[2025-09-27 18:36:17,690] [ INFO]: TensorFlow version 2.19.0 available. (config.py:112)
[2025-09-27 18:36:17,692] [ INFO]: JAX version 0.5.3 available. (config.py:125)
2025-09-27 18:36:20.005514: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1758998180.265845 9446 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1758998180.337247 9446 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1758998180.877699 9446 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1758998180.877741 9446 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1758998180.877747 9446 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1758998180.877751 9446 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-09-27 18:36:20.927331: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO 09-27 18:36:37 [__init__.py:241] Automatically detected platform cuda.
[2025-09-27 18:36:38,841] [ INFO]: --- INIT SEEDS --- (pipeline.py:249)
[2025-09-27 18:36:38,842] [ INFO]: --- LOADING TASKS --- (pipeline.py:210)
[2025-09-27 18:36:38,842] [ INFO]: Found 1 custom tasks in /content/lighteval/src/lighteval/tasks/extended/ifeval/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 2 custom tasks in /content/lighteval/src/lighteval/tasks/extended/ifbench/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 6 custom tasks in /content/lighteval/src/lighteval/tasks/extended/tiny_benchmarks/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 1 custom tasks in /content/lighteval/src/lighteval/tasks/extended/mt_bench/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 4 custom tasks in /content/lighteval/src/lighteval/tasks/extended/mix_eval/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 5 custom tasks in /content/lighteval/src/lighteval/tasks/extended/olympiade_bench/main.py (registry.py:260)
[2025-09-27 18:36:38,842] [ INFO]: Found 1 custom tasks in /content/lighteval/src/lighteval/tasks/extended/hle/main.py (registry.py:260)
[2025-09-27 18:36:38,843] [ INFO]: Found 23 custom tasks in /content/lighteval/src/lighteval/tasks/extended/lcb/main.py (registry.py:260)
[2025-09-27 18:36:38,844] [ WARNING]: Deprecation warning: You provided 4 arguments in your task name, but we no longer support the truncate_fewshot option. We will ignore the parameter for now, but it will fail in a couple of versions, so you should change your task name to suite|task|num_fewshot. (registry.py:287)
[2025-09-27 18:36:38,845] [ WARNING]: Careful, the task lcb:codegeneration is using evaluation data to build the few shot examples. (lighteval_task.py:269)
[2025-09-27 18:37:27,990] [ INFO]: Test gather tensor (parallelism.py:127)
[2025-09-27 18:37:28,161] [ INFO]: gathered_tensor tensor([0], device='cuda:0'), should be [0] (parallelism.py:130)
[2025-09-27 18:37:28,161] [ INFO]: --- LOADING MODEL --- (pipeline.py:177)
[2025-09-27 18:37:29,688] [ INFO]: Tokenizer truncation and padding size set to the left side. (transformers_model.py:450)
[2025-09-27 18:37:29,689] [ INFO]: We are not in a distributed setting. Setting model_parallel to False. (transformers_model.py:341)
[2025-09-27 18:37:29,689] [ INFO]: Model parallel was set to False, max memory set to None and device map to None (transformers_model.py:370)
You have loaded an FP4 model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model. To remove this warning, pass device_map = 'cuda'.
Fetching 41 files: 100% 41/41 [00:00<00:00, 49.79it/s]
Loading checkpoint shards: 100% 3/3 [00:42<00:00, 14.02s/it]
[2025-09-27 18:38:18,354] [ INFO]: Using Data Parallelism, putting model on device cuda (transformers_model.py:219)
[2025-09-27 18:38:32,983] [ INFO]: [CACHING] Initializing data cache (cache_management.py:105)
[2025-09-27 18:38:32,984] [ INFO]: --- RUNNING MODEL --- (pipeline.py:330)
[2025-09-27 18:38:32,984] [ INFO]: Running SamplingMethod.GENERATIVE requests (pipeline.py:313)
[2025-09-27 18:38:47,070] [ INFO]: Cache: Starting to process 268/268 samples (not found in cache) for tasks extended|lcb:codegeneration|0 (52eb0b8d3282c52f, GENERATIVE) (cache_management.py:399)
[2025-09-27 18:38:47,071] [ WARNING]: You cannot select the number of dataset splits for a generative evaluation at the moment. Automatically inferring. (data.py:206)
Splits: 0% 0/1 [00:00<?, ?it/s][2025-09-27 18:38:47,136] [ INFO]: Detecting largest batch size with max_input_length=16384 (transformers_model.py:503)
Splits: 0% 0/1 [00:05<?, ?it/s]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/lighteval/src/lighteval/main_accelerate.py:144 in accelerate │
│ │
│ 141 │ │ model_config=model_config, │
│ 142 │ ) │
│ 143 │ │
│ ❱ 144 │ pipeline.evaluate() │
│ 145 │ │
│ 146 │ pipeline.show_results() │
│ 147 │
│ │
│ /content/lighteval/src/lighteval/pipeline.py:282 in evaluate │
│ │
│ 279 │ │ │ │ ) │
│ 280 │ │ │ │ outputs = self._run_model() │
│ 281 │ │ else: │
│ ❱ 282 │ │ │ outputs = self._run_model() │
│ 283 │ │ │
│ 284 │ │ if self.is_main_process(): │
│ 285 │ │ │ self._post_process_outputs(outputs) │
│ │
│ /content/lighteval/src/lighteval/pipeline.py:335 in _run_model │
│ │
│ 332 │ │ if self.model.is_async: │
│ 333 │ │ │ outputs = asyncio.run(self._run_model_async()) │
│ 334 │ │ else: │
│ ❱ 335 │ │ │ outputs = self._run_model_sync() │
│ 336 │ │ │
│ 337 │ │ # Cleaning up the model before running metrics │
│ 338 │ │ self.model.cleanup() │
│ │
│ /content/lighteval/src/lighteval/pipeline.py:316 in _run_model_sync │
│ │
│ 313 │ │ │ logger.info(f"Running {sampling_method} requests") │
│ 314 │ │ │ match sampling_method: │
│ 315 │ │ │ │ case SamplingMethod.GENERATIVE: │
│ ❱ 316 │ │ │ │ │ model_outputs = self.model.greedy_until(docs) │
│ 317 │ │ │ │ │ outputs[sampling_method] = model_outputs │
│ 318 │ │ │ │ case SamplingMethod.LOGPROBS: │
│ 319 │ │ │ │ │ model_outputs = self.model.loglikelihood(docs) │
│ │
│ /content/lighteval/src/lighteval/utils/cache_management.py:402 in wrapper │
│ │
│ 399 │ │ │ │ logger.info( │
│ 400 │ │ │ │ │ f"Cache: Starting to process {len(docs_not_cached) │
│ 401 │ │ │ │ ) │
│ ❱ 402 │ │ │ │ new_results = func(self, docs_not_cached, *args, **kwa │
│ 403 │ │ │ │ │
│ 404 │ │ │ │ # Store new results in file cache │
│ 405 │ │ │ │ cache.cache_samples( │
│ │
│ /content/lighteval/src/lighteval/models/transformers/transformers_model.py:749 in greedy_until │
│ │
│ 746 │ │ if self.continuous_batching: │
│ 747 │ │ │ return self._continuous_greedy_until(docs) │
│ 748 │ │ else: │
│ ❱ 749 │ │ │ return self._padded_greedy_until(docs) │
│ 750 │ │
│ 751 │ def _generate_continuous( │
│ 752 │ │ self, │
│ │
│ /content/lighteval/src/lighteval/models/transformers/transformers_model.py:659 in _padded_greedy_until │
│ │
│ 656 │ │ │ │ max_context_continuation_size_allowed = min( │
│ 657 │ │ │ │ │ longest_context_continuation_size_in_split, self. │
│ 658 │ │ │ │ ) │
│ ❱ 659 │ │ │ batch_size = self._get_batch_size( │
│ 660 │ │ │ │ override_bs=self.config.batch_size, │
│ 661 │ │ │ │ max_input_length=max_context_continuation_size_allowe │
│ 662 │ │ │ │ starting_batch_size=starting_batch_size, │
│ │
│ /content/lighteval/src/lighteval/models/transformers/transformers_model.py:515 in _get_batch_size │
│ │
│ 512 │ │ │ F.log_softmax(self._model_call(test_batch).float(), dim=- │
│ 513 │ │ │ return batch_size │
│ 514 │ │ │
│ ❱ 515 │ │ batch_size = forward_batch() │
│ 516 │ │ logger.info(f"Determined largest batch size: {batch_size}") │
│ 517 │ │ return batch_size │
│ 518 │
│ │
│ /content/lighteval/src/lighteval/utils/parallelism.py:104 in decorator │
│ │
│ 101 │ │ │ ) │
│ 102 │ │ while True: │
│ 103 │ │ │ if batch_size == 0: │
│ ❱ 104 │ │ │ │ raise RuntimeError("No executable batch size found, re │
│ 105 │ │ │ try: │
│ 106 │ │ │ │ return function(batch_size, *args, **kwargs) │
│ 107 │ │ │ except Exception as e: │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: No executable batch size found, reached zero.
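For context on the error: the traceback ends in the batch-size auto-detection helper in parallelism.py, which retries the forward_batch() probe with ever smaller batch sizes and gives up once the size reaches zero. A minimal sketch of that backoff pattern (the function name, starting size, and halving step are assumptions for illustration, not the exact lighteval code):

def find_executable_batch_size(function, starting_batch_size):
    """Retry `function` with progressively smaller batch sizes (sketch)."""
    batch_size = starting_batch_size

    def decorator(*args, **kwargs):
        nonlocal batch_size
        while True:
            if batch_size == 0:
                # This is the error reported above.
                raise RuntimeError("No executable batch size found, reached zero.")
            try:
                return function(batch_size, *args, **kwargs)
            except Exception:
                # Assumption: the real helper inspects the exception for CUDA OOM
                # before retrying; here any failure simply halves the batch size.
                batch_size //= 2

    return decorator

On a T4 (16 GB) with a 20B-parameter model and max_input_length=16384, even the smallest probe appears to fail, which is consistent with the batch size reaching zero.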
Version info
The following commands were executed, which install the latest version of lighteval from source:
!git clone https://github.com/huggingface/lighteval
!cd lighteval && pip install -e .[dev] # make sure you have the correct transformers version installed!
!cd lighteval && pip install -e .[extended_tasks]
!pip install -q transformers triton==3.4 kernels
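The exact installed package versions were not captured above; a command like the following (plain pip and grep, nothing lighteval-specific assumed) would record them:
!pip list | grep -E "lighteval|transformers|accelerate|triton|kernels"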