dbr quant: store auto_quant_state on the top level model #72934
Summary:
Before this PR, DBR quantization could not handle user code
that iterates over all module children. For example, consider
a forward function such as
```
def forward(self, x):
for module in self:
x = module(x)
return x
```
Before this PR, this code would break with DBR quantization, because
we attach `AutoQuantizationState` objects to each child, and those
objects live in the child's module hierarchy and will appear in
these kinds of iterations, changing the meaning of the user program.
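To make the failure mode concrete, here is a minimal, torch-free sketch of the pattern; `SequentialLike`, `Double`, and the placeholder `AutoQuantizationState` are hypothetical stand-ins, not the real DBR classes:

```python
class AutoQuantizationState:
    """Placeholder for DBR's per-module quantization bookkeeping."""
    def __call__(self, x):
        raise RuntimeError("AutoQuantizationState is not meant to be called")

class Double:
    def __call__(self, x):
        return 2 * x

class SequentialLike:
    """Minimal stand-in for nn.Sequential: iterates over its children."""
    def __init__(self, *children):
        self._children = list(children)
    def __iter__(self):
        return iter(self._children)
    def forward(self, x):
        for module in self:  # the user pattern from the summary
            x = module(x)
        return x

model = SequentialLike(Double(), Double())
assert model.forward(3) == 12

# Attaching quantization state as a child changes the iteration:
model._children.append(AutoQuantizationState())
try:
    model.forward(3)
    broken = False
except RuntimeError:
    broken = True
assert broken  # the injected state object is now visited by `for module in self`
```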
This PR reduces the scope of this problem to just the top level module.
Instead of attaching `AutoQuantizationState` objects to each child,
we register them in a map on the parent. Here is a before and after:
```
// toy model
model
|--> child1
// toy model with AutoQuantizationState objects, before this PR
model
|--> child1
| |--> _auto_quant_state
|--> _auto_quant_state
// toy model with AutoQuantizationState objects, after this PR
model
|--> child1
|--> _fqn_to_auto_quant_state_map
|--> ( ) --> _auto_quant_state // of `model`
|--> (child1) --> _auto_quant_state // of `model.child1`
```
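The post-PR layout can be sketched in plain Python (no torch dependency); the map name mirrors the diagram above, but the classes and the `" "` key used for the empty FQN are illustrative assumptions:

```python
class AutoQuantizationState:
    """Stand-in: per-module quantization bookkeeping, keyed by FQN."""
    def __init__(self, fqn):
        self.fqn = fqn

class Child:
    pass

class Model:
    def __init__(self):
        self.child1 = Child()

model = Model()

# One map on the parent instead of one attribute per child.
model._fqn_to_auto_quant_state_map = {
    " ": AutoQuantizationState(""),          # top-level model (empty FQN)
    "child1": AutoQuantizationState("child1"),
}

assert model._fqn_to_auto_quant_state_map["child1"].fqn == "child1"
# `child1` itself carries no registered state object, so iterating over
# the children of `child1` no longer sees quantization bookkeeping.
assert "_auto_quant_state" not in vars(model.child1)
```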
Note: `child1._auto_quant_state` works as before for convenience,
but the `child1` object now stores a soft link to its `_auto_quant_state`
instead of properly registering it in its module hierarchy. This is
somewhat hacky. If we need to improve this in the future, we could
remove this soft link and refactor the code to call the FQN map
instead.
Note: if the top level module iterates over its children, things will
still be broken. This is less likely, and we will recommend that the
user work around this by wrapping their model, or checking for the
`AutoQuantizationStateModuleDict` type in their iteration loop.
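The wrapping workaround can be sketched as follows, again with hypothetical torch-free stand-ins; the idea is that once the model is wrapped, the module doing the iteration is no longer the top-level module, so the top-level state map is out of its reach:

```python
class Double:
    def __call__(self, x):
        return 2 * x

class IteratingModel:
    """Stand-in for a model whose forward iterates over its children."""
    def __init__(self, *children):
        self._children = list(children)
    def __iter__(self):
        return iter(self._children)
    def forward(self, x):
        for module in self:
            x = module(x)
        return x

class Wrapper:
    """Hypothetical top-level module; DBR state would attach here,
    not on `self.mod`, so `self.mod`'s iteration stays clean."""
    def __init__(self, mod):
        self.mod = mod
    def forward(self, x):
        return self.mod.forward(x)

wrapped = Wrapper(IteratingModel(Double(), Double()))
assert wrapped.forward(3) == 12
```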
The impact of this change should be an improvement in coverage
of user models; we expect it to drive our coverage of
torchbenchmark models from 89% to 100%.
Test plan:
```
// previously disabled test cases with user code iterating
// over module children are now enabled, with wrappers
python test/test_quantization.py -k test_module_calls_items
python test/test_quantization.py -k test_vovnet_sequential
```
@vkuzo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Can you also comment on how this affects autograd?
I don't think there are any changes to autograd as long as the user runs forward on the top level model. Running forward on a submodule (without running it on the top level model) is no longer supported after this PR, which also matches FX graph mode quantization.
Referenced code:
```
m = SequentialAppendList(torch.nn.Conv2d(1, 1, 1)).eval()
class Wrapper(nn.Module):
```
Is there a way to detect this use case (iterating through children modules) and throw a nicer exception that tells the user to use a wrapper? What does the current exception look like?
AutoQuantizationState.forward throws an exception; that exception will be hit if user code tries to call the forward. In a future PR, we can make the exception more verbose and point to documentation.
Referenced code:
```
def get_fqn_valid_for_module_dict_key(fqn: str) -> str:
    """
    Modifies `fqn` to make it a valid key to a ModuleDict.
```
Why is this necessary? Is this just the syntax expected by torch.nn.ModuleDict?
One cannot have empty strings or dots in ModuleDict keys. I don't know exactly why.
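For illustration, a plausible sanitization in the spirit of `get_fqn_valid_for_module_dict_key` might look like this; the exact scheme in the PR (the placeholder chosen for the empty FQN, the replacement character for dots) is an assumption here. ModuleDict keys pass through `nn.Module.add_module`, which rejects empty names and names containing `.`, since dots are reserved as hierarchy separators in module paths:

```python
def get_fqn_valid_for_module_dict_key(fqn: str) -> str:
    """Rewrite `fqn` so it can be used as a ModuleDict key (sketch only)."""
    if fqn == "":
        fqn = " "                 # placeholder key for the top-level module
    return fqn.replace(".", ":")  # dots are reserved for module hierarchy paths

assert get_fqn_valid_for_module_dict_key("") == " "
assert get_fqn_valid_for_module_dict_key("child1") == "child1"
assert get_fqn_valid_for_module_dict_key("block.0.conv") == "block:0:conv"
```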
Referenced code:
```
# On the child, manually set the attribute without
# going through the `torch.nn.Module.__setattr__`
# function, to prevent this object from appearing in
# the child's module hierarchy.
```
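What "manually set the attribute" means can be sketched in plain Python: assigning through `object.__setattr__` bypasses a class's custom `__setattr__` hook, so the value never enters whatever registry that hook maintains. `TrackingModule` below is a hypothetical stand-in for a module class with a registering `__setattr__`, analogous to `torch.nn.Module`:

```python
class TrackingModule:
    """Stand-in: records every attribute set via normal assignment."""
    def __init__(self):
        object.__setattr__(self, "_registered", {})
    def __setattr__(self, name, value):
        # analog of nn.Module.__setattr__ registering submodules
        self._registered[name] = value
        object.__setattr__(self, name, value)

child = TrackingModule()

child.submodule = 123  # goes through __setattr__: registered
object.__setattr__(child, "_auto_quant_state", "state")  # bypass: not registered

assert "submodule" in child._registered
assert "_auto_quant_state" not in child._registered
assert child._auto_quant_state == "state"  # attribute access still works
```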
Does this affect the serialized output? I guess AutoQuantizationState is not currently serialized, is that right?
This PR is breaking BC with models using DBR created before this PR, because now the quantization state is always stored on the top level module. We are not calling out BC because we don't have people using this yet. Other than that, serialization will work as before.
Reviewed By: dzdang
Differential Revision: D34281074
Pulled By: vkuzo
fbshipit-source-id: 0e25fc1ec529c47f72478a1875fe43219feac6b1