[DO NOT MERGE] per-token dynamic observer #24
base: main
@Observer.register("per_token", alias="per_token_dynamic")
class PerTokenObserver(Observer):
    """
    Values targted for a dyanmic observer do not require calibration,
nit: spelling
__all__ = ["PerTokenObserver"]


@Observer.register("per_token", alias="per_token_dynamic")
Are per token observers always dynamic?
One thing I'm unclear on is how this is going to appear in the quantization config. Will it be a new strategy like tensor/channel/group?
@horheynm let's redo this PR as a quant arg
        :return: tuple of scale and zero point derived from the observed tensor
        """
        # reduce every dimension except token dimension
        reduce_dims = [idx for idx in range(observed.dim()) if idx != self.axis]
should not reduce along batch as well
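To make the difference concrete, here is a minimal sketch of the two reduction choices. It assumes a `(batch, tokens, hidden)` activation layout with the batch at dimension 0; `token_reduce_dims`, `keep_batch`, and `batch_axis` are hypothetical names for illustration and are not part of this PR.

```python
import torch


def token_reduce_dims(ndim, token_axis, keep_batch=False, batch_axis=0):
    """Hypothetical helper: dimensions to reduce over for per-token statistics."""
    kept = {token_axis % ndim}
    if keep_batch:
        # Assumption: batch lives at `batch_axis` and should keep its own statistics.
        kept.add(batch_axis % ndim)
    return tuple(idx for idx in range(ndim) if idx not in kept)


observed = torch.randn(2, 5, 8)  # assumed layout: (batch, tokens, hidden)

# Behaviour in the diff: reduce everything except the token dimension,
# so statistics are shared across the batch.
dims_shared = token_reduce_dims(observed.dim(), token_axis=1)
min_shared = torch.amin(observed, dim=dims_shared, keepdim=True)

# Alternative: also keep the batch dimension, giving one value
# per (sequence, token) pair.
dims_per_seq = token_reduce_dims(observed.dim(), token_axis=1, keep_batch=True)
min_per_seq = torch.amin(observed, dim=dims_per_seq, keepdim=True)
```

With this layout, `min_shared` has shape `(1, 5, 1)` while `min_per_seq` has shape `(2, 5, 1)`.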
This PR adds support for per-token dynamic observers. These observers find a scale and zero point for each group of values along a given token dimension.
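As a rough illustration of what "a scale and zero point per token" could look like, the sketch below derives asymmetric int8 parameters from per-token min/max values. This is not the PR's implementation: `per_token_qparams`, the 8-bit signed range, and the `(batch, tokens, hidden)` layout are all assumptions.

```python
import torch


def per_token_qparams(observed, token_axis=1, num_bits=8):
    """Hypothetical helper: asymmetric scale and zero point per token."""
    axis = token_axis % observed.dim()
    # Reduce every dimension except the token dimension, as in the diff.
    reduce_dims = tuple(idx for idx in range(observed.dim()) if idx != axis)
    min_vals = torch.amin(observed, dim=reduce_dims, keepdim=True)
    max_vals = torch.amax(observed, dim=reduce_dims, keepdim=True)

    # Make sure zero is representable inside the quantized range.
    min_vals = torch.clamp(min_vals, max=0.0)
    max_vals = torch.clamp(max_vals, min=0.0)

    # Assumed signed integer range, e.g. [-128, 127] for 8 bits.
    q_min = -(2 ** (num_bits - 1))
    q_max = 2 ** (num_bits - 1) - 1

    scale = (max_vals - min_vals) / float(q_max - q_min)
    # Guard against division by zero for all-zero tokens.
    scale = torch.clamp(scale, min=torch.finfo(observed.dtype).eps)
    zero_point = torch.clamp(q_min - torch.round(min_vals / scale), q_min, q_max)
    return scale, zero_point.to(torch.int32)


observed = torch.tensor([[[0.0, 1.0], [-2.0, 2.0]]])  # (batch=1, tokens=2, hidden=2)
scale, zero_point = per_token_qparams(observed)
```

Both outputs keep the token dimension, so here each has shape `(1, 2, 1)` and broadcasts against the observed tensor.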
TODO:
Unit testing