Fixes for continuous batching #40828
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM. Surprised that sampling is not supported but CUDA graphs are. Wondering if you could just set `slice_inputs = not cuda_graph`?
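For reference, a minimal sketch of what the suggestion above would mean; `use_cuda_graph` is an illustrative flag name, not necessarily the actual attribute in `continuous_api.py`:

```python
# Hypothetical sketch of the reviewer's suggestion: derive input slicing
# from CUDA graph usage. `use_cuda_graph` is an assumed name.
use_cuda_graph = True

# CUDA graphs replay fixed shapes, so inputs must keep their padded size;
# without CUDA graphs, slicing inputs to the true length avoids wasted compute.
slice_inputs = not use_cuda_graph
```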
I think there were issues last time, but I can check
Tried that; it does not work for a few reasons. Will look into restoring it ASAP.
```diff
  # We centralize the logger here to coordinate between logging and progress bar
  logger = logging.getLogger("ContinuousBatchingLogger")
- logger.setLevel(logging.INFO)
+ # logger.setLevel(logging.INFO)
```
Was this intentional, btw?
Yes, it seems like the default should not be INFO. I can remove the commented-out line next time; I agree it will be cleaner :)
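As a usage note, a minimal sketch of how a user could opt back into INFO-level logs after this change, assuming only the logger name shown in the diff above:

```python
import logging

# The library no longer forces INFO on this logger, so callers who want
# continuous batching progress logs enable the level themselves.
logging.getLogger("ContinuousBatchingLogger").setLevel(logging.INFO)
```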
nice ty
* Fix for CB attn mask and refactor
* Tests for CB (not all passing)
* Passing tests and a logger fix
* Fixed the KV metrics that were broken when we moved to hybrid alloc
* Fix circular import and style
* Added tests for FA
* Unfolded test to have device expectations
* Fixes for H100
* more fixes for h100
* H100 are good
* Style
* Adding some comments from huggingface#40831
* Rename test
* Avoid 1 letter variables
* Dictonnary is only removed during kwargs
* Test for supported sample
* Fix a unvoluntary slice
* Fixes for non-sliced inputs and small example improvments
* Slice inputs is more understandabe
* Style
Some architectures, like `llama`, alter the attention mask if it is not a tensor, which was not compatible with the way CB created and handled the attention mask. Now, arguments like `attention_mask`, `cumulative_seqlens_k`, and `max_seqlen_k` are tensors or ints unless the model is hybrid, in which case they are dictionaries (see the sketch below). This is the main fix, but the PR also: `eager_paged`
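To illustrate the argument layout described above, a hedged sketch; the shapes, values, and hybrid key names (`full_attention`, `sliding_attention`) are assumptions for illustration, not the actual transformers code:

```python
import torch

# Non-hybrid model: plain tensor / int arguments (shapes are illustrative).
attention_mask = torch.ones(1, 1, 4, 16, dtype=torch.bool)  # 4D attention mask
cumulative_seqlens_k = torch.tensor([0, 16])                # cumulative key lengths
max_seqlen_k = 16                                           # plain int

# Hybrid model: one entry per attention type, keyed by layer type.
# The key names here are assumptions mirroring transformers' layer_types style.
attention_mask = {
    "full_attention": torch.ones(1, 1, 4, 16, dtype=torch.bool),
    "sliding_attention": torch.ones(1, 1, 4, 8, dtype=torch.bool),
}
max_seqlen_k = {"full_attention": 16, "sliding_attention": 8}
```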