Fix pytorch 1.13 error due to 2D weights #601

Merged
merged 2 commits into from
Jun 15, 2023

@ssheorey ssheorey commented Jun 13, 2023

Fixes a regression that appeared when moving to PyTorch 1.13.1.
Also reported in #567, #580, and #590.

Tested: the change works with both PyTorch 1.13.1 and TensorFlow 2.8.4.

$ python -mipdb scripts/run_pipeline.py torch -c ml3d/configs/randlanet_semantickitti.yml --dataset.dataset_path /export/share/datasets/SemanticKITTI/ --pipeline SemanticSegmentation --dataset.use_cache True --pipeline.num_workers 0 --pipeline.pin_memory False
/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/runpy.py:127: RuntimeWarning: 'ipdb.__main__' found in sys.modules after import of package 'ipdb', but prior to execution of 'ipdb.__main__'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
> /mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/scripts/run_pipeline.py(1)<module>()                            
----> 1 import os                                                                                                             
      2 import argparse                                                                                                       
      3 import logging                                                                                                        
                                                                                                                              
ipdb> c                                                                                                                       
Using external Open3D-ML in /home/ssheorey/projects/open3d_ml/Open3D-ML_2                                                     
regular arguments                                                                                                             
backend: gloo                                                                                                                 
batch_size: null                                                                                                              
cfg_dataset: null                                                                                                             
cfg_file: ml3d/configs/randlanet_semantickitti.yml                                                                            
cfg_model: null                                                                                                               
cfg_pipeline: null                                                                                                            
ckpt_path: null                                                                                                               
dataset: null                                                                                                                 
dataset_path: null                                                                                                            
device: cuda                                                                                                                  
device_ids:                                                                                                                   
- '0'                                                                                                                         
framework: torch                                                                                                              
host: localhost                                                                                                               
main_log_dir: null                                                                                                            
max_epochs: null                                                                                                              
mode: null                                                                                                                    
model: null                                                                                                                   
node_rank: 0                                                                                                                  
nodes: 1                                                                                                                      
pipeline: SemanticSegmentation                                                                                                
port: '12355'                                                                                                                 
seed: 0                                                                                                                       
split: train                                                                                                                  
                                                                                                                              
extra arguments                                                                                                               
dataset.dataset_path: /export/share/datasets/SemanticKITTI/                                                                   
dataset.use_cache: 'True'      
pipeline.num_workers: '0'                                                                                                     
pipeline.pin_memory: 'False'                                                                                                  
                                                               
INFO - 2023-06-13 12:57:02,092 - semantic_segmentation - DEVICE : cuda   
INFO - 2023-06-13 12:57:02,092 - semantic_segmentation - Logging in file : ./logs/RandLANet_SemanticKITTI_torch/log_train_2023-06-13_12:57:02.txt
INFO - 2023-06-13 12:57:02,645 - semantickitti - Found 19130 pointclouds for train                                            
INFO - 2023-06-13 12:57:06,678 - semantickitti - Found 4071 pointclouds for validation
INFO - 2023-06-13 12:57:08,010 - semantic_segmentation - Initializing from scratch.                                           
INFO - 2023-06-13 12:57:08,019 - semantic_segmentation - Writing summary in train_log/00013_RandLANet_SemanticKITTI_torch.    
INFO - 2023-06-13 12:57:08,023 - semantic_segmentation - Started training                                                     
INFO - 2023-06-13 12:57:08,024 - semantic_segmentation - === EPOCH 0/100 ===                                                  
training:   0%|                                                                                      | 0/4783 [00:02<?, ?it/s]
Traceback (most recent call last):  
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/ipdb/__main__.py", line 323, in main               
    pdb._runscript(mainpyfile)                                                                                                
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/pdb.py", line 1573, in _runscript                                
    self.run(statement)
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/bdb.py", line 580, in run                                        
    exec(cmd, globals, locals)                                 
  File "<string>", line 1, in <module>                                                                                        
  File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/scripts/run_pipeline.py", line 1, in <module>             
    import os                                                                                                                 
  File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/scripts/run_pipeline.py", line 192, in main 
    pipeline.run_train()
  File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/ml3d/torch/pipelines/semantic_segmentation.py", line 411, in run_train
    loss, gt_labels, predict_scores = model.get_loss(
  File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/ml3d/torch/models/randlanet.py", line 378, in get_loss
    loss = Loss.weighted_CrossEntropyLoss(scores, labels)
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/torch/nn/functional.py", line 3026, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: weight tensor should be defined either for all 19 classes or no classes but got weight tensor of shape: [1, 19]  
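
For readers hitting the same error: the message says the `weight` passed to the loss has shape `[1, 19]`, while `cross_entropy` expects a 1D tensor with one entry per class. The diff of this PR is not shown in the conversation, so the following is only a minimal sketch of the general remedy, assuming the weights can simply be flattened. `flatten_class_weights` and the dummy `scores`, `labels`, and `weight_2d` tensors are hypothetical names used for illustration, not code from the repository.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 19  # class count taken from the error message above


def flatten_class_weights(weight: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: collapse a [1, C] (or [C]) weight tensor to 1D [C]."""
    return weight.reshape(-1)


torch.manual_seed(0)
scores = torch.randn(8, NUM_CLASSES)          # dummy logits for 8 points
labels = torch.randint(0, NUM_CLASSES, (8,))  # dummy integer class labels
weight_2d = torch.ones(1, NUM_CLASSES)        # shape [1, 19], as in the traceback

try:
    F.cross_entropy(scores, labels, weight=weight_2d)
except RuntimeError as err:
    # Raised by PyTorch versions that require a 1D weight tensor.
    print("2D weight rejected:", err)

# Flattening gives one weight per class, which cross_entropy accepts.
weight_1d = flatten_class_weights(weight_2d)
loss = F.cross_entropy(scores, labels, weight=weight_1d)
print("weight shape:", tuple(weight_1d.shape), "loss is finite:", bool(torch.isfinite(loss)))
```

On PyTorch versions that accept only 1D weights, the first call reports the error and the second succeeds. Since this PR describes the failure as a regression, earlier versions presumably accepted the 2D shape as well.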

@benjaminum benjaminum left a comment

:lgtm:

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @ssheorey)

@ssheorey ssheorey merged commit 4d6e9dc into dev Jun 15, 2023
2 checks passed
@ssheorey ssheorey deleted the ss/fix-get_class_weights-regression branch June 15, 2023 16:58
@RauchLukas RauchLukas mentioned this pull request Aug 7, 2023
3 tasks