-
Notifications
You must be signed in to change notification settings - Fork 30.6k
feat: err when unsupported attn impl is set w/ --continuous_batching
#40618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a1e18b1
to
11c5581
Compare
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Please add a test :)
457baa5
to
a8a0fca
Compare
Will add more advanced tests in CB later to test that the returned list of attention implementations is indeed compatible. |
a8a0fca
to
a2b348d
Compare
if self.use_continuous_batching: | ||
default_attn_impl = ContinuousBatchingManager.default_attention_implementation() | ||
# checking if attn_implementation is supported by continuous batching | ||
if self.args.attn_implementation is None: | ||
self.args.attn_implementation = default_attn_impl # default to sdpa_paged | ||
logger.info(f"No attn_implementation passed, defaulting to {default_attn_impl}") | ||
supported_attn_impl = ContinuousBatchingManager.supported_attention_implementations() | ||
if self.args.attn_implementation not in supported_attn_impl: | ||
raise ValueError( | ||
f"Continuous batching only supports {supported_attn_impl} as attn_implementation, got " | ||
f"{self.args.attn_implementation}" | ||
f"Try setting `--attn_implementation={default_attn_impl}`" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdy about putting this directly in CB's api? would be better in general + we can automatically add paged_{attn_implememntation}
no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤗 Okay!
supported_attn_impl = ContinuousBatchingManager.supported_attention_implementations() | ||
if self.args.attn_implementation not in supported_attn_impl: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here if attn impelmeention not supported, you can try to map it to the correct ones?
Because for sdpa eager and flash, its easy to map: prefix with paged|
But up to you !
huggingface#40618) * feat: err when unsupported attn impl is set w/ `--continuous_batching` * refactor: move defaults and support list to CB code * feat: add action item in error msg * fix(serve): add default attn implementation * feat(serve): add log when `attn_implementation` is `None` * feat: raise Exception when attn_implementation is not supported by CB
In #40479, we introduced a new
--continuous_batching
flag. Here we add a check to make sure the--attn_implementation
is compatible with CB and error when not.