Training with Custom Dataset #49

Closed
Shromm-Gaind opened this issue Feb 13, 2023 · 1 comment

Comments

Shromm-Gaind commented Feb 13, 2023

Task
I am trying to train FCAF3D with a custom dataset that emulates the SUN RGB-D dataset.

Environment
Docker image built from the Dockerfile.
GPU: 2080 Ti

Steps and Issues
To emulate the SUN RGB-D dataset:

  • I generated the binary file as a NumPy ndarray with 6 columns (x, y, z, r, g, b); attached in human-readable format as point_cloudcustom.txt. It is supposed to emulate the binary file produced by create_data.py, also attached in human-readable format: point_cloud.txt
    *GitHub doesn't let me attach a binary file. Also, I only have 733 point clouds, significantly fewer than the SUN RGB-D dataset.

  • I also generated the corresponding pkl file, based on the details provided at https://mmdetection3d.readthedocs.io/en/latest/datasets/sunrgbd_det.html (a sketch of how both files might be produced follows the pkl example below). The only difference is that you have to add at least two classes, otherwise it gives an AssertionError like this:

File "/mmdetection3d/mmdet3d/core/bbox/structures/base_box3d.py", line 47, in __init__
assert tensor.dim() == 2 and tensor.size(-1) == box_dim, tensor.size()
AssertionError: torch.Size([7])

Example pkl file in human-readable format: pklhumanreadable.txt
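
For reference, here is a minimal sketch of how the point-cloud .bin and info .pkl pair might be produced, assuming the SUN RGB-D layout described in the docs linked above. File names, class names, and box values are made up, and the exact info keys can differ between mmdetection3d versions, so treat this as an illustration rather than the actual conversion script:

import pickle
import numpy as np

# Point cloud: (N, 6) float32 rows of x, y, z, r, g, b,
# read here from the human-readable dump attached above.
points = np.loadtxt('point_cloudcustom.txt', dtype=np.float32)
assert points.ndim == 2 and points.shape[1] == 6
points.tofile('points/000001.bin')  # hypothetical output path

# Annotations: gt_boxes_upright_depth must stay 2-D with shape (num_gt, 7),
# i.e. (x, y, z, dx, dy, dz, yaw) per box. A 1-D array of shape (7,) is what
# produces the AssertionError: torch.Size([7]) shown above.
gt_boxes = np.array([[1.0, 2.0, 0.5, 0.8, 0.6, 1.2, 0.0],
                     [2.5, 1.0, 0.4, 0.5, 0.5, 0.9, 0.0]], dtype=np.float32)
info = {
    'point_cloud': {'num_features': 6, 'lidar_idx': 1},
    'pts_path': 'points/000001.bin',
    'annos': {
        'gt_num': gt_boxes.shape[0],
        'name': np.array(['chair', 'table']),        # hypothetical class names
        'class': np.array([0, 1], dtype=np.int64),   # label indices
        'gt_boxes_upright_depth': gt_boxes,
        'location': gt_boxes[:, :3],
        'dimensions': gt_boxes[:, 3:6],
        'rotation_y': gt_boxes[:, 6],
        'index': np.arange(gt_boxes.shape[0], dtype=np.int32),
    },
}
with open('sunrgbd_infos_train.pkl', 'wb') as f:
    pickle.dump([info], f)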

Reproduces the problem - Command or script
python tools/train.py configs/fcaf3d/fcaf3d_sunrgbd-3d-10class.py

Reproduces the problem - issue on 2080 Ti
Issue log: 2080ti_train_issue.txt

  File "/mmdetection3d/mmdet3d/models/backbones/me_resnet.py", line 89, in forward
    x = self.layer2(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/modules/resnet_block.py", line 59, in forward
    out = self.conv2(out)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiConvolution.py", line 321, in forward
    input._manager,
  File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiConvolution.py", line 84, in forward
    coordinate_manager._manager,
RuntimeError: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 10.75 GiB total capacity; 8.93 GiB already allocated; 201.06 MiB free; 9.09 GiB reserved in total by PyTorch)

Reproduces the problem - issue on A100
I also launched an instance with an A100 to verify whether it genuinely needed more memory.
(screenshots of the A100 run attached)

Thoughts
I am quite confused as to why my dataset is more memory hungry, given that it is only 733 point clouds. Also, the SUN RGB-D dataset trained fine on a 2080 Ti.

filaPro (Contributor) commented Feb 13, 2023

Hi, @Shromm-Gaind
Looks like the problem is in the scale of your data. I briefly looked at your data and the sizes of your objects are ~100. We operate in meters, so are your objects actually 100 meters in size? FCAF3D operates with 4 levels of 0.08, 0.16, 0.32, and 0.64 meters, which are much smaller than your objects, if I understand correctly.
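
A quick way to check (and, if needed, fix) the scale could look like the sketch below. It assumes the custom clouds are stored in centimeters, and the file path is hypothetical; whatever factor is used here, the box centers and dimensions in the pkl annotations would have to be rescaled by the same amount:

import numpy as np

# Hypothetical path; the .bin stores (N, 6) float32 rows: x, y, z, r, g, b.
points = np.fromfile('points/000001.bin', dtype=np.float32).reshape(-1, 6)
extent = points[:, :3].max(axis=0) - points[:, :3].min(axis=0)
print('scene extent (x, y, z):', extent)  # SUN RGB-D scenes are typically a few meters across

scale = 0.01  # assumption: coordinates are in centimeters; use 0.001 for millimeters
points[:, :3] *= scale
points.tofile('points/000001.bin')  # overwrite with meter-scaled coordinates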

filaPro closed this as completed Feb 28, 2023