Training with Custom Dataset #49

Closed
Shromm-Gaind opened this issue Feb 13, 2023 · 1 comment

Comments

Shromm-Gaind commented Feb 13, 2023

Task
I am trying to train FCAF3D with a custom dataset that emulates the SUN RGB-D dataset.

Environment
Docker image built from the Dockerfile.
GPU: 2080 Ti

Steps and Issues
To emulate the SUN RGB-D dataset:

  • I generated the binary file as a NumPy ndarray with 6 columns (x, y, z, r, g, b); attached in human-readable format as point_cloudcustom.txt. It is supposed to emulate the binary file produced by create_data.py, also attached in human-readable format: point_cloud.txt
    *GitHub doesn't let me attach a binary file. Also, I only have 733 point clouds, significantly fewer than the SUN RGB-D dataset.

  • I also generated the corresponding pkl file, based on the details provided at https://mmdetection3d.readthedocs.io/en/latest/datasets/sunrgbd_det.html (a sketch of how both files might be produced follows the pkl example below). The only difference is that you have to add at least two classes, otherwise it gives an AssertionError like this:

File "/mmdetection3d/mmdet3d/core/bbox/structures/base_box3d.py", line 47, in __init__
assert tensor.dim() == 2 and tensor.size(-1) == box_dim, tensor.size()
AssertionError: torch.Size([7])

Example pkl file in human-readable format: pklhumanreadable.txt
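
For reference, here is a minimal sketch of how the point-cloud .bin and info .pkl pair might be produced, assuming the SUN RGB-D layout described in the docs linked above. File names, class names, and box values are made up, and the exact info keys can differ between mmdetection3d versions, so treat this as an illustration rather than the actual conversion script:

import pickle
import numpy as np

# Point cloud: (N, 6) float32 rows of x, y, z, r, g, b,
# read here from the human-readable dump attached above.
points = np.loadtxt('point_cloudcustom.txt', dtype=np.float32)
assert points.ndim == 2 and points.shape[1] == 6
points.tofile('points/000001.bin')  # hypothetical output path

# Annotations: gt_boxes_upright_depth must stay 2-D with shape (num_gt, 7),
# i.e. (x, y, z, dx, dy, dz, yaw) per box. A 1-D array of shape (7,) is what
# produces the AssertionError: torch.Size([7]) shown above.
gt_boxes = np.array([[1.0, 2.0, 0.5, 0.8, 0.6, 1.2, 0.0],
                     [2.5, 1.0, 0.4, 0.5, 0.5, 0.9, 0.0]], dtype=np.float32)
info = {
    'point_cloud': {'num_features': 6, 'lidar_idx': 1},
    'pts_path': 'points/000001.bin',
    'annos': {
        'gt_num': gt_boxes.shape[0],
        'name': np.array(['chair', 'table']),        # hypothetical class names
        'class': np.array([0, 1], dtype=np.int64),   # label indices
        'gt_boxes_upright_depth': gt_boxes,
        'location': gt_boxes[:, :3],
        'dimensions': gt_boxes[:, 3:6],
        'rotation_y': gt_boxes[:, 6],
        'index': np.arange(gt_boxes.shape[0], dtype=np.int32),
    },
}
with open('sunrgbd_infos_train.pkl', 'wb') as f:
    pickle.dump([info], f)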

Reproduces the problem - Command or script
python tools/train.py configs/fcaf3d/fcaf3d_sunrgbd-3d-10class.py

Reproduces the problem - issue on 2080 Ti
Issue log: 2080ti_train_issue.txt

  File "/mmdetection3d/mmdet3d/models/backbones/me_resnet.py", line 89, in forward
    x = self.layer2(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/modules/resnet_block.py", line 59, in forward
    out = self.conv2(out)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiConvolution.py", line 321, in forward
    input._manager,
  File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiConvolution.py", line 84, in forward
    coordinate_manager._manager,
RuntimeError: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 10.75 GiB total capacity; 8.93 GiB already allocated; 201.06 MiB free; 9.09 GiB reserved in total by PyTorch)

Reproduces the problem - issue on A100
I also launched an instance with an A100 to verify whether it genuinely needed more memory.
(screenshots of the A100 run attached)

Thoughts
I am quite confused as to why my dataset is more memory hungry, given that it is only 733 point clouds. Also, the SUN RGB-D dataset trained fine on a 2080 Ti.

filaPro (Contributor) commented Feb 13, 2023

Hi, @Shromm-Gaind
Looks like the problem is in the scale of your data. I briefly looked at your data and the sizes of your objects are ~100. We operate in meters, so are your objects actually 100 meters in size? FCAF3D operates with 4 levels of 0.08, 0.16, 0.32, and 0.64 meters, which are much smaller than your objects, if I understand correctly.
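
A quick way to check (and, if needed, fix) the scale could look like the sketch below. It assumes the custom clouds are stored in centimeters, and the file path is hypothetical; whatever factor is used here, the box centers and dimensions in the pkl annotations would have to be rescaled by the same amount:

import numpy as np

# Hypothetical path; the .bin stores (N, 6) float32 rows: x, y, z, r, g, b.
points = np.fromfile('points/000001.bin', dtype=np.float32).reshape(-1, 6)
extent = points[:, :3].max(axis=0) - points[:, :3].min(axis=0)
print('scene extent (x, y, z):', extent)  # SUN RGB-D scenes are typically a few meters across

scale = 0.01  # assumption: coordinates are in centimeters; use 0.001 for millimeters
points[:, :3] *= scale
points.tofile('points/000001.bin')  # overwrite with meter-scaled coordinates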

filaPro closed this as completed Feb 28, 2023