
[Bug] (suggested temporary fix) Pytorch >= 2 causes mmrazor.engine to fail #632

Open
elisa-aleman opened this issue Mar 26, 2024 · 4 comments
Labels
bug Something isn't working

Comments


elisa-aleman commented Mar 26, 2024

Describe the bug

When using tools/train.py, I get the following error:

Traceback (most recent call last):
  File "/root/workspace/mmrazor/tools/train.py", line 121, in <module>
    main()
  File "/root/workspace/mmrazor/tools/train.py", line 55, in main
    register_all_modules
  File "/root/.cache/.../site-packages/mmrazor/utils/setup_env.py", line 65, in register_all_modules
    import mmrazor.engine  # noqa: F401,F403
    ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/.../site-packages/mmrazor/engine/__init__.py", line 2, in <module>
    from .hooks import (DMCPSubnetHook, DumpSubnetHook, EstimateResourcesHook,
  File "/root/.cache/.../site-packages/mmrazor/engine/hooks/__init__.py", line 2, in <module>
    from .dmcp_subnet_hook import DMCPSubnetHook
  File "/root/.cache/.../site-packages/mmrazor/engine/hooks/dmcp_subnet_hook.py", line 8, in <module>
    from mmrazor.structures import export_fix_subnet
  File "/root/.cache/.../site-packages/mmrazor/structures/__init__.py", line 2, in <module>
    from .quantization import *  # noqa: F401,F403
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/.../site-packages/mmrazor/structures/quantization/__init__.py", line 2, in <module>
    from .backend_config import *  # noqa: F401,F403
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/.../site-packages/mmrazor/structures/quantization/backend_config/__init__.py", line 2, in <module>
    from .academic import (get_academic_backend_config,
  File "/root/.cache/.../site-packages/mmrazor/structures/quantization/backend_config/academic.py", line 11, in <module>
    from .common_operator_config_utils import (_get_conv_configs,
  File "/root/.cache/.../site-packages/mmrazor/structures/quantization/backend_config/common_operator_config_utils.py", line 54, in <module>
    nn.Conv1d, nn.ConvTranspose1d, nn.BatchNorm1d, nnqr.Conv1d,
                                                   ^^^^^^^^^^^
  File "/root/.cache/.../site-packages/mmrazor/utils/placeholder.py", line 50, in __getattr__
    raise_import_error(string)
  File "/root/.cache/.../site-packages/mmrazor/utils/placeholder.py", line 43, in raise_import_error
    raise ImportError(
ImportError: `torch>=1.13` is not installed properly, plz check

However, I am using torch 2.0.0:

>>> import torch
>>> torch.__version__
2.0.0
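Since the real problem is the torch major version rather than a missing install, an explicit version gate would produce an accurate message instead of the misleading "`torch>=1.13` is not installed properly". A minimal sketch, assuming only that `torch.__version__` is a string like `2.0.0` or `1.13.1+cu117`; `torch_major_version` is a hypothetical helper, not part of mmrazor:

```python
def torch_major_version(version: str) -> int:
    """Parse the major component of a torch version string.

    Hypothetical helper: a gate like this could raise a clear
    'torch 2.x is not yet supported' error at import time.
    """
    # Drop any local suffix such as '+cu117' before splitting on dots.
    return int(version.split('+')[0].split('.')[0])
```

With torch installed, the check would be `torch_major_version(torch.__version__) >= 2`.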

To Reproduce

The command you executed.

python tools/train.py \

configuration redacted.

Additional context

Inspecting mmrazor/structures/quantization/backend_config/common_operator_config_utils.py myself led me to this line:

from torch.ao.quantization.fuser_method_mappings import (
        fuse_conv_bn, fuse_conv_bn_relu, fuse_convtranspose_bn, fuse_linear_bn,
        reverse2, reverse3, reverse_sequential_wrapper2)

Executing that import in my local environment raised a missing-name error, so I checked further.

Looking at the PyTorch repository at version 2.0.0, torch.ao.quantization.fuser_method_mappings.reverse2 is now torch.ao.quantization.fuser_method_mappings._reverse2, and likewise reverse3 -> _reverse3. Furthermore, reverse_sequential_wrapper2 is gone altogether.
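One way to cope with renames like reverse2 -> _reverse2 without pinning torch is a small resolver that tries each candidate name in order. This is a sketch; `resolve_attr` is a hypothetical helper, not mmrazor or torch code:

```python
import importlib


def resolve_attr(module_name: str, *candidates: str):
    """Return the first attribute found among candidate names.

    Intended for public names that later releases made private,
    as with the fuser-method helpers described above.
    """
    mod = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(mod, name):
            return getattr(mod, name)
    raise ImportError(f'none of {candidates} found in {module_name}')


# With torch installed, the failing import could then become, e.g.:
# reverse2 = resolve_attr('torch.ao.quantization.fuser_method_mappings',
#                         'reverse2', '_reverse2')
```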

Other namespaces that disappeared were:

  • torch.ao.quantization.backend_config.BackendPatternConfig._set_overwrite_output_fake_quantize
  • torch.ao.quantization.backend_config.BackendPatternConfig._set_overwrite_output_observer
  • torch.ao.quantization.backend_config.BackendPatternConfig._set_input_output_observed

Monkey-patching all of these (restoring the removed methods and aliasing the renamed ones) and then calling import mmrazor.engine no longer produces errors, but the proper solution needs to be compatible with torch >= 2.0.0 moving forward.

This bug might be related to #615

@elisa-aleman elisa-aleman added the bug Something isn't working label Mar 26, 2024
@elisa-aleman (Author)

Also related to #553

@chenjie04

If you look at the source code, this error does not require torch higher than 1.13, but rather less than or equal to 1.13; downgrading torch will fix the problem.

@elisa-aleman (Author)

> the source code

Then the requirements files need to be updated to follow PEP.

> degrading torch will fix the problem.

Regardless, torch 1.13 is extremely outdated; please update the source code.

@elisa-aleman (Author)

Note: the suggested fix above will not work with fusions, because of the changes to BackendPatternConfig between torch 1 and torch 2. Any model with potential fusions will misbehave under torch 2 unless these BackendPatternConfig definitions are updated to match the new version.
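For the BackendPatternConfig side, one defensive option is to invoke the torch 1.x-only private setters only when they still exist, so the same config-building code runs on both major versions. A sketch with a hypothetical `call_if_present` helper; the setter name in the docstring is one of those listed earlier in this issue, and whether a silent no-op is acceptable for fusion correctness would need to be verified per setter:

```python
def call_if_present(obj, method_name, *args):
    """Call obj.method_name(*args) if it exists; return obj for chaining.

    Lets torch 1.x-only setters such as
    BackendPatternConfig._set_input_output_observed degrade to a
    no-op on releases where they were removed.
    """
    method = getattr(obj, method_name, None)
    if callable(method):
        method(*args)
    return obj
```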
