[Bug] ZeroRedundancyOptimizer cannot support param-wise settings with torch.__version__ < 1.12.0
#778
Closed
2 tasks done
Labels
bug
Prerequisite
Environment
OrderedDict([('sys.platform', 'linux'), ('Python', '3.7.2 (default, May 11 2021, 10:20:27) [GCC 7.5.0]'), ('CUDA available', True), ('numpy_random_seed', 2147483648), ('GPU 0', 'NVIDIA A100-SXM-80GB MIG 2g.20gb'), ('CUDA_HOME', '/usr/local/cuda'), ('NVCC', 'Cuda compilation tools, release 11.1, V11.1.105'), ('GCC', 'gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0'), ('PyTorch', '1.9.1+cu111'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.1\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86\n - CuDNN 8.0.5\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized 
-fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n'), ('TorchVision', '0.10.1+cu111'), ('OpenCV', '4.5.1'), ('MMEngine', '0.3.1')])
Reproduces the problem - code sample
configs/swin/mask-rcnn_swin-t-p4-w7_fpn_1x_coco.py
Reproduces the problem - command or script
Reproduces the problem - error message
Additional information
Related to #716
Some configurations, such as mask-rcnn_swin-t-p4-w7_fpn_1x_coco.py, require paramwise_cfg to give different optimizer hyperparameters to some modules. However, torch.distributed.optim.ZeroRedundancyOptimizer only recently started supporting multiple param groups (Link); the feature is available with torch.__version__ >= 1.12.0.

The key implementation is keeping a flat List[torch.Tensor] in ZeroRedundancyOptimizer._all_params and passing the original params to the parent class (not to optimizer_class) as Optimizer.__init__(self, params, defaults).

We could copy its __init__ logic to support param-wise settings when torch.__version__ < 1.12.0. However, I think this kind of solution is quite messy and vulnerable to changes in torch.distributed.optim.ZeroRedundancyOptimizer.

I kindly ask for any good ideas to handle this!
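
To make the two pieces discussed above concrete, here is a rough sketch in plain Python. It is only an illustration under assumptions, not the MMEngine or PyTorch implementation: the helper names parse_version, supports_param_groups, and flatten_params are hypothetical, and the sketch assumes each param group stores its tensors under the "params" key as an iterable. The first two helpers gate on the 1.12.0 threshold mentioned above; the third shows the kind of flattening that turns param groups into a single list, similar in spirit to what _all_params holds.

```python
import re
from typing import Any, Dict, Iterable, List, Tuple, Union


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string like '1.9.1+cu111' into (1, 9, 1).

    Hypothetical helper: local and pre-release suffixes are simply ignored,
    and short versions such as '1.12' are padded to three components.
    """
    release = re.match(r"\d+(?:\.\d+)*", version)
    if release is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in release.group().split("."))
    return parts + (0,) * (3 - len(parts))


def supports_param_groups(torch_version: str) -> bool:
    """Return True if the version meets the 1.12.0 threshold from this issue."""
    return parse_version(torch_version) >= (1, 12, 0)


def flatten_params(
    params: Iterable[Union[Any, Dict[str, Any]]]
) -> List[Any]:
    """Flatten param groups (a list of dicts) into one flat list.

    Assumption: every group keeps its tensors in an iterable under 'params'.
    A list that is already flat is returned unchanged.
    """
    params = list(params)
    if params and all(isinstance(p, dict) for p in params):
        return [p for group in params for p in group["params"]]
    return params
```

With the environment reported above, supports_param_groups("1.9.1+cu111") evaluates to False, so a caller could use it to decide whether paramwise_cfg can be passed through as-is or needs a fallback.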