
[Bug] multiprocessing gets stuck when creating waymo gt_database #2371

Open
3 tasks done
KickCellarDoor opened this issue Mar 20, 2023 · 3 comments

Comments

@KickCellarDoor
Contributor

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

1.1 branch: https://github.com/open-mmlab/mmdetection3d/tree/1.1

Environment

I am using an NGC PyTorch container with PyTorch 1.14.

Reproduces the problem - code sample

```shell
python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
```

Most of the processes get stuck when creating the GT database. Creating the GT database for Waymo takes about 24 days on a 56-core machine.

Reproduces the problem - command or script

```shell
python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
```

Reproduces the problem - error message

No error message.

Additional information

Inside the `GTDatabaseCreater.create_single` function, `point_indices = box_np_ops.points_in_rbbox(points, gt_boxes_3d)` is called, and it already uses multi-threaded acceleration. If you additionally wrap it in multiprocessing, most of the processes get stuck and the overall generation speed becomes much slower than a single thread.
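A common mitigation for this kind of nested parallelism (a sketch, assuming the threads come from OpenMP/MKL-backed numpy or numba code; the environment variables below are the standard knobs for those libraries, not mmdetection3d options) is to pin each worker process to a single thread before the numeric libraries are imported:

```python
import os

# Hedged sketch: cap the thread pools of common numeric backends to one
# thread per process, so multiprocessing workers do not fight each other.
# These variables must be set before numpy/numba are first imported.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS",
            "OPENBLAS_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = "1"
```

With each worker limited to one thread, a high `--workers` value no longer oversubscribes the machine.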

Change

```python
multi_db_infos = mmengine.track_parallel_progress(
    self.create_single,
    ((loop_dataset(i)
      for i in range(len(self.dataset))), len(self.dataset)),
    self.num_worker)
```

to

```python
multi_db_infos = mmengine.track_progress(
    self.create_single,
    ((loop_dataset(i)
      for i in range(len(self.dataset))), len(self.dataset)),
)
```

is much, much faster (24 days -> 5 hours).
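For context, `mmengine.track_progress` applies the function serially in the main process, so the already multi-threaded `points_in_rbbox` keeps the whole machine to itself. A minimal stand-in illustrating those serial semantics (`track_progress_sketch` is a hypothetical helper, not mmengine's implementation):

```python
# Minimal stand-in for the serial behaviour of mmengine.track_progress:
# apply func to every task in the main process, one at a time.
def track_progress_sketch(func, tasks):
    # Mirror mmengine's API: tasks may be a plain sequence or an
    # (iterable, total) pair, as in the snippet above.
    if isinstance(tasks, tuple):
        iterable, total = tasks
    else:
        iterable, total = tasks, len(tasks)
    results = []
    for i, task in enumerate(iterable, 1):
        results.append(func(task))
        # A real implementation would update a progress bar here (i / total).
    return results

results = track_progress_sketch(lambda x: x * 2, ([1, 2, 3], 3))
# results == [2, 4, 6]
```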

@JingweiZhang12
Contributor

Thanks for your feedback. Could you please create a PR to reduce the default number of workers and add docs about this dataset conversion?

@OrcunCanDeniz

I was also getting stuck at `numPointsInGtCalculater`. As in #2364, adding the lines below solved the issue so far.

```python
import multiprocessing
multiprocessing.set_start_method('spawn')
```
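A minimal runnable sketch of that workaround: `set_start_method` must run before any pools are created, and the entry point has to be guarded, otherwise every spawned child re-executes the script (`work` is a placeholder, not an mmdetection3d function):

```python
import multiprocessing

def work(x):
    # Placeholder for the per-sample job (e.g. one create_single call).
    return x + 1

if __name__ == "__main__":
    # 'spawn' starts fresh interpreters instead of forking, which avoids
    # inheriting locked thread state from the parent process.
    multiprocessing.set_start_method("spawn", force=True)
    with multiprocessing.Pool(processes=2) as pool:
        out = pool.map(work, [1, 2, 3])
    print(out)  # prints [2, 3, 4]
```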

@ammaryasirnaich

I am also stuck at this part. Did you manage to solve it?
