RuntimeError: Trying to backward through the graph a second time #1379

FrancescoSaverioZuppichini · 2021-09-27T13:23:34Z


Describe the Issue
GradientCumulativeOptimizerHook doesn't work.

Reproduction

What command, code, or script did you run?
Add GradientCumulativeOptimizerHook to your *_config.py file

custom_hooks = [
    dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4),
]

Output

2021-09-27 14:21:26,889 - mmdet - WARNING - GradientCumulativeOptimizerHook may slightly decrease performance if the model has BatchNorm layers.
Traceback (most recent call last):
  File "/home/zuppif/integration-object-detection/playground.py", line 81, in <module>
    main(Args(config_file, cfg_options=options))
  File "/home/zuppif/integration-object-detection/src/train.py", line 185, in main
    train_detector(
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmdet/apis/train.py", line 174, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/hooks/optimizer.py", line 115, in after_train_iter
    loss.backward()
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

Environment

Please run python -c "from mmcv.utils import collect_env; print(collect_env())"

{'sys.platform': 'linux', 'Python': '3.9.5 (default, Jun  4 2021, 12:28:51) [GCC 7.5.0]', 'CUDA available': True, 'GPU 0,1,2': 'GeForce GTX 1080 Ti', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Build cuda_11.2.r11.2/compiler.29373293_0', 'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0', 'PyTorch': '1.9.1+cu102', 'PyTorch compiling details': 'PyTorch built with:\n  - GCC 7.3\n  - C++ Version: 201402\n  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)\n  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n  - NNPACK is enabled\n  - CPU capability usage: AVX2\n  - CUDA Runtime 10.2\n  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70\n  - CuDNN 7.6.5\n  - Magma 2.5.2\n  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.1, 
USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n', 'TorchVision': '0.10.1+cu102', 'OpenCV': '4.5.3', 'MMCV': '1.3.13', 'MMCV Compiler': 'GCC 9.3', 'MMCV CUDA Compiler': '11.2'}

zhouzaida · 2021-09-27T15:15:13Z

Hi @FrancescoSaverioZuppichini, GradientCumulativeOptimizerHook and OptimizerHook are both set, so each of them calls loss.backward() on the same loss and the RuntimeError is raised. OptimizerHook is set here
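
The failure mode described above can be sketched without mmcv or PyTorch. Every class below is a hypothetical stand-in, not the real mmcv code. The sketch only mimics two facts visible in the traceback: the runner calls after_train_iter on every registered hook, and a loss can be backpropagated only once before its saved values are freed.

```python
class ToyLoss:
    """Stand-in for a loss tensor whose graph is freed after one backward."""

    def __init__(self):
        self.graph_alive = True

    def backward(self):
        if not self.graph_alive:
            raise RuntimeError(
                "Trying to backward through the graph a second time")
        self.graph_alive = False  # saved intermediate values are freed here


class ToyOptimizerHook:
    """Stand-in for the hook that is built from optimizer_config."""

    def after_train_iter(self, runner):
        runner.loss.backward()


class ToyCumulativeHook(ToyOptimizerHook):
    """Stand-in for a second optimizer hook added through custom_hooks."""


class ToyRunner:
    def __init__(self, hooks):
        self.hooks = hooks
        self.loss = None

    def call_hook(self, fn_name):
        # Same dispatch pattern as in the traceback: every hook is called.
        for hook in self.hooks:
            getattr(hook, fn_name)(self)

    def train_iter(self):
        self.loss = ToyLoss()  # a fresh graph for each iteration
        self.call_hook("after_train_iter")


def runs_cleanly(hooks):
    """Return True if one training iteration finishes without RuntimeError."""
    try:
        ToyRunner(hooks).train_iter()
        return True
    except RuntimeError:
        return False


print(runs_cleanly([ToyOptimizerHook(), ToyCumulativeHook()]))  # both hooks set
print(runs_cleanly([ToyCumulativeHook()]))                       # only one hook
```

With both stand-in hooks registered, the second backward() call hits an already freed graph. With a single hook the iteration completes.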

You can modify the config like this

optimizer_config = dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4)
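
For readers unsure what cumulative_iters=4 is meant to achieve, here is a minimal pure-Python sketch of gradient accumulation on a one-parameter model. It is not mmcv's implementation. The functions grad and train, the toy data, and the learning rate are all assumptions made for illustration, and the sketch assumes the number of samples is a multiple of cumulative_iters.

```python
def grad(w, x, y):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x


def train(samples, cumulative_iters, lr=0.01, w=0.0):
    """Accumulate scaled gradients and update w once per group of iterations."""
    accumulated = 0.0
    steps = 0
    for i, (x, y) in enumerate(samples, start=1):
        # Scale each contribution so the accumulated sum is a mean gradient.
        accumulated += grad(w, x, y) / cumulative_iters
        if i % cumulative_iters == 0:
            w -= lr * accumulated  # one optimizer step per group
            accumulated = 0.0      # plays the role of zero_grad()
            steps += 1
    return w, steps


# Eight toy samples of y = 2 * x, processed one per iteration.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)] * 2
w, steps = train(samples, cumulative_iters=4)
print(steps, w)
```

Eight iterations with cumulative_iters=4 produce two parameter updates, each using the mean gradient of four samples, which is how accumulation imitates a batch four times larger.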

FrancescoSaverioZuppichini · 2021-09-28T07:52:16Z

Super! Thank you. @zhouzaida, would it make sense to add an example to the docs to avoid this confusion?

zhouzaida · 2021-09-28T07:58:38Z

Yes, great suggestion. We plan to add more examples or tutorials on how to use hooks such as OptimizerHook and GradientCumulativeOptimizerHook.

xiviax · 2022-09-05T15:17:17Z

Hi @zhouzaida

I replaced optimizer_config = dict(grad_clip=None) in the linked file with
optimizer_config = dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4) as mentioned above.

However, I obtain the following error:

How can I change the Hook from the OptimizerHook to the GradientCumulativeOptimizerHook correctly?

Thanks in advance for your help!

zhouzaida · 2022-09-06T06:53:23Z

Hi @xiviax , could you provide the versions of mmcv and mmdet?

xiviax · 2022-09-06T07:08:35Z

Hi @zhouzaida
Thanks for your fast reply! :)
I am using mmcv 1.6.1. For mmdet, I clone the mmdetection repository into Colab, so it is version 2.25.1.

xiviax · 2022-09-06T07:13:06Z

These are the hooks that are being executed when I am not trying to use the GradientCumulativeOptimizerHook:

zhouzaida · 2022-09-06T12:47:06Z

It seems you have set fp16 (like here), so optimizer_config = dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4) is invalid. Could you provide the config you used?

xiviax · 2022-09-06T14:10:01Z

Ah I see! This is the config that I used:

#base config file
_base_='/content/mmdetection/configs/swin/mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco.py'

#declare self-customed dataset type
dataset_type='CocoDataset'

CATEGORY = 'bottletype'
BASE_WORK_DIR = '/content/drive/MyDrive/Uni Materialien/Master Thesis/PoC no label/Output Bottletype'

#images directory
img_dir_synthetic='/content/drive/MyDrive/Uni Materialien/Master Thesis/Ubuntu_Master_Thesis/Datasets/synthetic/50_bottles_randomized_labels/images'
img_dir_real='/content/drive/MyDrive/Uni Materialien/Master Thesis/Ubuntu_Master_Thesis/Datasets/real/2022_08_11-POC_bottles_conveyor/images'
img_dir_real_test='/content/drive/MyDrive/Uni Materialien/Master Thesis/Ubuntu_Master_Thesis/Datasets/real/2022_08_23-POC_testset/images'

#classes list
classes = ("beverage PET", "milk & dairy", "food", "beauty care", "home care",)

#model configuration
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=5),#change bbox and mask head numbers here.
        mask_head=dict(num_classes=5)))

# data setup
DATA_BASE_PATH_SYNTHETIC = '/content/drive/MyDrive/Uni Materialien/Master Thesis/Ubuntu_Master_Thesis/Datasets/synthetic/50_bottles_randomized_labels/coco_annotations'
DATA_BASE_PATH_REAL = '/content/drive/MyDrive/Uni Materialien/Master Thesis/Ubuntu_Master_Thesis/Datasets/real/2022_08_11-POC_bottles_conveyor/coco_annotations'
DATA_BASE_PATH_REAL_TEST = '/content/drive/MyDrive/Uni Materialien/Master Thesis/Ubuntu_Master_Thesis/Datasets/real/2022_08_23-POC_testset/coco_annotations'

data=dict(
  
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file=[
            f'{DATA_BASE_PATH_SYNTHETIC}/plastic_bottles_annotations_{CATEGORY}_train.json',
        ],
        img_prefix=img_dir_synthetic
    ),

    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=[
            f'{DATA_BASE_PATH_SYNTHETIC}/plastic_bottles_annotations_{CATEGORY}_val.json',
        ],
        img_prefix=img_dir_synthetic
    ),

    multi_val={
        'real_valset':dict(
            type=dataset_type,
            classes=classes,
            ann_file=[
                f'{DATA_BASE_PATH_REAL}/plastic_bottles_annotations_{CATEGORY}_val.json',
            ],
            img_prefix=img_dir_real
        ),
        'real_testset':dict(
            type=dataset_type,
            classes=classes,
            ann_file=[
                f'{DATA_BASE_PATH_REAL_TEST}/plastic_bottles_annotations_{CATEGORY}_val.json',
            ],
            img_prefix=img_dir_real_test
        )
    },

    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=[
            f'{DATA_BASE_PATH_REAL}/plastic_bottles_annotations_{CATEGORY}_val.json',#validate file path
        ],
        img_prefix=img_dir_real

    )
)

#optimizer_config = dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4)

#pretrained model
#load_from=f'{BASE_WORK_DIR}//synthetic/epoch_30.pth'

#resume training
#resume_from=f'{BASE_WORK_DIR}/synthetic/epoch_23.pth'

#where to save checkpoint files
work_dir = f'{BASE_WORK_DIR}/synthetic-bs8-hook'

checkpoint_config = dict(
    interval = 2,
    max_keep_ckpts = 1
)

I just saw that there is also a GradientCumulativeFp16OptimizerHook. I tried to put it in my config file the same way as the GradientCumulativeOptimizerHook; however, I am getting the same error:

Traceback (most recent call last):
  File "/content/mmdetection/tools/train.py", line 242, in <module>
    main()
  File "/content/mmdetection/tools/train.py", line 238, in main
    meta=meta)
  File "/content/mmdetection/mmdet/apis/train.py", line 186, in train_detector
    **cfg.optimizer_config, **fp16_cfg, distributed=distributed)
TypeError: __init__() got an unexpected keyword argument 'type'

Could the GradientCumulativeFp16OptimizerHook work in my case? If yes, how can it be implemented?

Thanks in advance for your answer!

zhouzaida · 2022-09-07T02:17:07Z

Hi, you need to remove this line from the config.
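
The TypeError in the traceback above can be reproduced in plain Python. The traceback shows mmdet unpacking cfg.optimizer_config and fp16_cfg straight into a constructor, so a "type" key in optimizer_config arrives as an unexpected keyword argument. ToyFp16Hook and build_when_fp16_is_set below are hypothetical stand-ins with an assumed signature, not the real mmcv or mmdet code.

```python
class ToyFp16Hook:
    """Hypothetical hook class whose __init__ has no 'type' parameter."""

    def __init__(self, grad_clip=None, loss_scale=512.0, distributed=True):
        self.grad_clip = grad_clip
        self.loss_scale = loss_scale
        self.distributed = distributed


def build_when_fp16_is_set(optimizer_config, fp16_cfg, distributed=False):
    # Mirrors the call shape shown in the traceback:
    #   **cfg.optimizer_config, **fp16_cfg, distributed=distributed
    return ToyFp16Hook(**optimizer_config, **fp16_cfg, distributed=distributed)


fp16_cfg = dict(loss_scale=512.0)  # illustrative value

# A plain optimizer_config unpacks cleanly into the constructor.
hook = build_when_fp16_is_set(dict(grad_clip=None), fp16_cfg)

# An optimizer_config that names a hook type does not.
try:
    build_when_fp16_is_set(
        dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4),
        fp16_cfg)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

This is consistent with the earlier remark that an optimizer_config containing type= becomes invalid once fp16 is set in the config.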

xiviax · 2022-09-08T12:32:26Z

Great. It seems to be working now. Thank you so much for your support!

kazuma-jp · 2023-04-11T10:22:51Z

Hi @zhouzaida.
Thank you for your description of GradientCumulativeOptimizerHook.
Can GradientCumulative be used in the fp16 setting?

I ran Mask R-CNN with the following config, but it doesn't work.

fp16 = dict(loss_scale=512.)

optimizer_config = dict(
    type="GradientCumulativeFp16OptimizerHook", cumulative_iters=4)

Detailed settings can be found here.
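
The two pitfalls reported earlier in this thread can be turned into a small config check. find_optimizer_hook_conflicts is a hypothetical helper, not part of mmcv or mmdet, and the three dicts are reduced versions of the configs quoted above (the grad_clip=None default in the first one is taken from the later comment). The helper flags the config in this comment, but the thread does not confirm that this is why it fails.

```python
def find_optimizer_hook_conflicts(cfg):
    """Return warnings for the two config pitfalls discussed in this thread."""
    warnings = []
    optimizer_config = cfg.get("optimizer_config") or {}

    # Pitfall 1: an optimizer hook in custom_hooks runs in addition to the
    # hook built from optimizer_config, so backward() is called twice.
    for hook in cfg.get("custom_hooks", []):
        if hook.get("type", "").endswith("OptimizerHook"):
            warnings.append(
                f"{hook['type']} is in custom_hooks; set it through "
                "optimizer_config so only one hook calls backward()")

    # Pitfall 2: a top-level fp16 setting combined with a 'type' key.
    if cfg.get("fp16") is not None and "type" in optimizer_config:
        warnings.append(
            "fp16 is set together with optimizer_config['type']; "
            "the thread reports this combination as invalid")
    return warnings


first_report = dict(
    optimizer_config=dict(grad_clip=None),
    custom_hooks=[
        dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4)],
)
suggested_fix = dict(
    optimizer_config=dict(
        type="GradientCumulativeOptimizerHook", cumulative_iters=4),
)
fp16_report = dict(
    fp16=dict(loss_scale=512.0),
    optimizer_config=dict(
        type="GradientCumulativeFp16OptimizerHook", cumulative_iters=4),
)

for name, cfg in [("first_report", first_report),
                  ("suggested_fix", suggested_fix),
                  ("fp16_report", fp16_report)]:
    print(name, find_optimizer_hook_conflicts(cfg))
```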

zhouzaida added the Usage label Sep 27, 2021

zhouzaida closed this as completed Sep 28, 2021

zhouzaida mentioned this issue Dec 20, 2021

FAQ #1601

Closed

zhouzaida mentioned this issue Apr 17, 2022

How to use GradientCumulativeOptimizerHook? #1889

Closed

HAOCHENYE mentioned this issue Jun 26, 2022

Add Cross-Iteration Batch Normalization and Accumulate Gradient #2074

Open
