[docs] continuous batching #44896

Open
stevhliu wants to merge 1 commit into huggingface:main from stevhliu:cb

@stevhliu
Member

Updates the continuous batching docs:

  • adds a new page for the API reference
  • adds sections for new features such as CUDA graphs, async batching, prefix caching, and logprobs (depending on when it's merged)
  • gives a clearer example of generation with varying loads using continuous_batching_context_manager
  • adds a new page explaining how the system works underneath. It covers the scheduling half of the system but not the memory side yet. I'm thinking it may make more sense to move the paged attention doc in here to fill the gap. What do you think, @remi-or?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu stevhliu requested a review from remi-or March 20, 2026 19:31
2 participants