Task
I am trying to train FCAF3D with a custom dataset that emulates the SUN RGB-D dataset.
Environment
Docker image built from the Dockerfile.
GPU: 2080 Ti
Steps and Issues
To emulate the SUN RGB-D dataset:
I generated each point cloud as a binary file holding a NumPy ndarray with 6 values per point (x, y, z, r, g, b); a human-readable example is attached (point_cloudcustom.txt). This is meant to match the binary file produced by create_data.py (human-readable example: point_cloud.txt).
*GitHub doesn't let me attach a binary file. Also, I only have 733 point clouds, significantly fewer than the SUN RGB-D dataset.
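A minimal sketch of the dump/reload round trip (random data stands in for a real cloud; the assumption that the .bin files are flat float32 dumps with six values per point is mine, and the file path is a placeholder):

```python
import numpy as np

# Toy stand-in for one converted cloud: N points of (x, y, z, r, g, b).
# Assumption: the .bin files written by create_data.py for SUN RGB-D are
# flat float32 dumps with 6 values per point and no header.
rng = np.random.default_rng(0)
points = rng.random((733, 6)).astype(np.float32)

points.tofile("/tmp/000001.bin")  # raw binary dump, no metadata

# Reload the same way a flat float32 dump is read back:
loaded = np.fromfile("/tmp/000001.bin", dtype=np.float32).reshape(-1, 6)
assert loaded.shape == (733, 6)
assert np.array_equal(loaded, points)
```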
File "/mmdetection3d/mmdet3d/models/backbones/me_resnet.py", line 89, in forward
x = self.layer2(x)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/modules/resnet_block.py", line 59, in forward
out = self.conv2(out)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiConvolution.py", line 321, in forward
input._manager,
File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiConvolution.py", line 84, in forward
coordinate_manager._manager,
RuntimeError: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 10.75 GiB total capacity; 8.93 GiB already allocated; 201.06 MiB free; 9.09 GiB reserved in total by PyTorch)
Reproduces the problem - issue on A100
I also launched an instance with an A100 to verify whether it genuinely needed more memory.
Thoughts
I am quite confused as to why my dataset is more memory-hungry, given that it is only 733 point clouds. Also, training ran fine with the SUN RGB-D dataset on a 2080 Ti.
Hi, @Shromm-Gaind
Looks like the problem is the scale of your data. I briefly looked at your data, and the sizes of your objects are ~100. We operate in meters, so are your objects actually 100 meters in size? FCAF3D operates with 4 levels of 0.08, 0.16, 0.32, and 0.64 meters, which are much smaller than your objects, if I understand correctly.
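If the coordinates are in some other unit, a quick sanity check and rescale before dumping the .bin files could be sketched like this (the factor of 0.01 assumes centimeters; substitute whatever converts your units to meters -- the data below is a toy stand-in):

```python
import numpy as np

def to_meters(points, scale=0.01):
    """Rescale x, y, z (columns 0-2); leave r, g, b untouched.

    scale=0.01 assumes the input is in centimeters -- adjust it to
    whatever converts your units to meters.
    """
    out = np.asarray(points, dtype=np.float32).copy()
    out[:, :3] *= scale
    return out

# Toy cloud whose spatial extent is ~100 units, as described above.
rng = np.random.default_rng(0)
cloud = np.c_[rng.random((1000, 3)) * 100.0, rng.random((1000, 3))]

before = cloud[:, :3].max(axis=0) - cloud[:, :3].min(axis=0)
fixed = to_meters(cloud)
after = fixed[:, :3].max(axis=0) - fixed[:, :3].min(axis=0)
print(before, after)  # extents drop from ~100 units to ~1 m
```

A huge metric extent also explains the OOM: the sparse voxel grid MinkowskiEngine builds grows with scene size at a fixed voxel size, so a 100 m scene produces vastly more voxels than a 10 m room.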
Prerequisites
I have searched Issues and Discussions but could not get the expected help. I have also searched through the issues on this GitHub page.
I have read the documentation (https://mmdetection3d.readthedocs.io/en/latest/2_new_data_model.html) but could not get the expected help.
I also generated the corresponding pkl file, based on the details provided at https://mmdetection3d.readthedocs.io/en/latest/datasets/sunrgbd_det.html. The only difference is that you have to add at least two classes; otherwise it raises an AssertionError.
Example pkl file in human-readable format: pklhumanreadable.txt
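As a concrete, hypothetical example of one scene's entry pickled the way I read the linked docs (field names and values are illustrative only, with two classes included to sidestep the assertion error):

```python
import pickle
import numpy as np

# Field names follow my reading of the SUN RGB-D dataset docs linked
# above; treat this as a sketch of one scene's entry, not a spec.
info = {
    "point_cloud": {"num_features": 6, "lidar_idx": 1},
    "pts_path": "points/000001.bin",
    "annos": {
        "gt_num": 2,  # at least two boxes / two classes
        "name": np.array(["chair", "table"]),
        "location": np.array([[0.5, 1.2, 0.4], [1.5, 2.0, 0.3]]),
        "dimensions": np.array([[0.6, 0.6, 0.9], [1.2, 0.8, 0.7]]),
        "rotation_y": np.array([0.0, 1.57]),
        "index": np.arange(2),
        "class": np.array([0, 1]),
        # One (x, y, z, dx, dy, dz, heading) row per box, in meters.
        "gt_boxes_upright_depth": np.array(
            [[0.5, 1.2, 0.4, 0.6, 0.6, 0.9, 0.0],
             [1.5, 2.0, 0.3, 1.2, 0.8, 0.7, 1.57]]
        ),
    },
}

# The infos file is a pickled list with one such dict per scene.
with open("/tmp/custom_infos_train.pkl", "wb") as f:
    pickle.dump([info], f)
```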
Reproduces the problem - Command or script
python tools/train.py configs/fcaf3d/fcaf3d_sunrgbd-3d-10class.py
Reproduces the problem - issue on 2080 Ti
Issue as log:
2080ti_train_issue.txt