You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I run the training script in the example of CON with the default args (= grid mode) using ShapeNet (downloaded by occupancy_networks repo's script) using 32GB GPU. However, it caused OOM. When setting -bs 24, it works (memory usage 30622MiB / 32510MiB).
Is this an intended behavior?
$ python -u examples/con/train.py -dd /mnt/nfs-mnj-hot-02/tmp/sosk/pynif3dcon/occupancy_networks/data/ShapeNet -sd saved_models_grid
Traceback (most recent call last):
File "examples/con/train.py", line 218, in <module>
main()
File "examples/con/train.py", line 214, in main
train(dataset, model, optimizer, args)
File "examples/con/train.py", line 103, in train
prediction = model(input_points, query_points)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/nfs-mnj-hot-02/tmp/sosk/pynif3dcon/pynif3d/pynif3d/pipeline/con.py", line 99, in forward
features = self.feature_encoder(input_points)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/nfs-mnj-hot-02/tmp/sosk/pynif3dcon/pynif3d/pynif3d/models/con/local_pool_pointnet.py", line 275, in forward
input_points, c, feature_grid=grid_id
File "/mnt/nfs-mnj-hot-02/tmp/sosk/pynif3dcon/pynif3d/pynif3d/models/con/local_pool_pointnet.py", line 191, in generate_coordinate_features
fea_grid = self.feature_processing_fn(fea_grid)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/nfs-mnj-hot-02/tmp/sosk/pynif3dcon/pynif3d/pynif3d/models/con/unet3d.py", line 289, in forward
x = layer(encoders_features[idx + 1], x)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/nfs-mnj-hot-02/tmp/sosk/pynif3dcon/pynif3d/pynif3d/models/con/unet3d.py", line 172, in forward
x = self.layer(x)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/nfs-mnj-hot-02/tmp/sosk/pynif3dcon/pynif3d/pynif3d/models/con/unet3d.py", line 82, in forward
x = self.relu(self.convolution1(self.group_norm1(x)))
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 246, in forward
input, self.num_groups, self.weight, self.bias, self.eps)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2112, in group_norm
torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 3.00 GiB (GPU 0; 31.75 GiB total capacity; 27.60 GiB already allocated; 2.92 GiB free; 27.72 GiB reserved in total by PyTorch)
About OOM, it would be yes for using grid with default setting.
Please try to add the argument --n-levels 3 or -nl 3 to set the U-Net 3D depth size, which is same setup for official implementation (i.e. in default the depth size is 4 for U-Net 2D and 3 for U-Net 3D. Please see official config in detail).
(Not urgent question.)
I run the training script in the example of CON with the default args (= grid mode) using ShapeNet (downloaded by occupancy_networks repo's script) using 32GB GPU. However, it caused OOM. When setting
-bs 24
, it works (memory usage 30622MiB / 32510MiB).Is this an intended behavior?
The env (at mnj) is here: (I run https://github.com/pytorch/pytorch/blob/master/torch/utils/collect_env.py)
The text was updated successfully, but these errors were encountered: