Error when training htc_x101_64x4d_fpn_20e_16 model on a Custom Dataset #2020

@prateek-77

Describe the bug
I tried training the htc_x101_64x4d_fpn_20e_16gpu model on a custom dataset. I set 'seg_prefix' to the folder that contains my segmentation maps. Soon after training starts, it fails with the error: RuntimeError: 1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [1, 100, 100, 3]
Also, could you please explain the difference between HTC without semantic and HTC with semantic?

Reproduction

  1. What command or script did you run?
python tools/train.py ~/Prateek/Prateek/mmdetection2/mmdetection/configs/htc/htc_x101_64x4d_fpn_20e_16gpu.py
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
    I modified num_classes to match the custom dataset. I'm not sure what value of num_classes I should set in 'semantic_head'.

Environment

sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.168
GPU 0,1: GeForce RTX 2080 Ti
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.1.2
MMCV: 0.2.16
MMDetection: 1.0rc1+4b984a7
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.1

Error traceback

2020-01-26 15:34:51,233 - INFO - workflow: [('train', 1)], max: 25 epochs

Traceback (most recent call last):
  File "tools/train.py", line 124, in <module>
    main()
  File "tools/train.py", line 120, in main
    timestamp=timestamp)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/apis/train.py", line 133, in train_detector
    timestamp=timestamp)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/apis/train.py", line 319, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 268, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/apis/train.py", line 100, in batch_processor
    losses = model(**data)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/models/detectors/base.py", line 138, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/models/detectors/htc.py", line 230, in forward_train
    loss_seg = self.semantic_head.loss(semantic_pred, gt_semantic_seg)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/core/fp16/decorators.py", line 127, in new_func
    return old_func(*args, **kwargs)
  File "/home/user4/Prateek/Prateek/mmdetection2/mmdetection/mmdet/models/mask_heads/fused_semantic_head.py", line 108, in loss
    loss_semantic_seg = self.criterion(mask_pred, labels)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py", line 2021, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/user4/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py", line 1840, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: 1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [1, 100, 100, 3]
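
For reference, the shape mismatch in the traceback can be reproduced outside mmdetection. The sketch below is only an illustration under stated assumptions: the class count (5) is hypothetical, and it assumes the segmentation maps are grayscale label images that were saved or loaded with three identical channels, which would explain the trailing 3 in [1, 100, 100, 3]. It uses plain PyTorch cross entropy, which is the call that fails at the bottom of the traceback.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: 1 image, 5 semantic classes, a 100x100 map.
num_classes = 5
pred = torch.randn(1, num_classes, 100, 100)  # logits, shape (N, C, H, W)

# A label map loaded as a 3-channel image has shape (N, H, W, 3).
# Assumption: the three channels carry the same class id per pixel.
rgb_target = torch.randint(0, num_classes, (1, 100, 100, 1)).repeat(1, 1, 1, 3)

# Cross entropy on (N, C, H, W) logits expects an (N, H, W) target,
# so the 4D target is rejected.
try:
    F.cross_entropy(pred, rgb_target)
    failed = False
except (RuntimeError, ValueError) as err:
    failed = True
    print("cross_entropy rejected the target:", err)

# Keeping a single channel yields the expected (N, H, W) label map.
label_target = rgb_target[..., 0].long()
loss = F.cross_entropy(pred, label_target)
print(label_target.shape, loss.item())
```

If the maps really are color-coded rather than grayscale, taking one channel would not be correct, and a color-to-class-id lookup would be needed instead.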

Thanks for the help!
