
[Bug] multiprocessing gets stuck when creating waymo gt_database #2371

Open
3 tasks done
KickCellarDoor opened this issue Mar 20, 2023 · 3 comments

Comments

@KickCellarDoor
Contributor

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

1.1 branch: https://github.com/open-mmlab/mmdetection3d/tree/1.1

Environment

I am using an NGC PyTorch container with PyTorch 1.14.

Reproduces the problem - code sample

```shell
python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
```

Most of the processes get stuck when creating the GT database. Creating the GT database for Waymo takes about 24 days on a 56-core machine.

Reproduces the problem - command or script

```shell
python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
```

Reproduces the problem - error message

No error message.

Additional information

Inside the `GTDatabaseCreater.create_single` function, `point_indices = box_np_ops.points_in_rbbox(points, gt_boxes_3d)` is called, and it already uses multi-threaded acceleration. If you additionally wrap it in multiprocessing, most of the processes get stuck and the overall generation speed becomes much slower than a single thread.
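A common mitigation for this kind of nested parallelism (a sketch, assuming the threads come from OpenMP/MKL-backed numpy or numba code; the environment variables below are the standard knobs for those libraries, not mmdetection3d options) is to pin each worker process to a single thread before the numeric libraries are imported:

```python
import os

# Hedged sketch: cap the thread pools of common numeric backends to one
# thread per process, so multiprocessing workers do not fight each other.
# These variables must be set before numpy/numba are first imported.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS",
            "OPENBLAS_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = "1"
```

With each worker limited to one thread, a high `--workers` value no longer oversubscribes the machine.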

Change

```python
multi_db_infos = mmengine.track_parallel_progress(
    self.create_single,
    ((loop_dataset(i)
      for i in range(len(self.dataset))), len(self.dataset)),
    self.num_worker)
```

to

```python
multi_db_infos = mmengine.track_progress(
    self.create_single,
    ((loop_dataset(i)
      for i in range(len(self.dataset))), len(self.dataset)),
)
```

is much, much faster (24 days -> 5 hours).
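For context, `mmengine.track_progress` applies the function serially in the main process, so the already multi-threaded `points_in_rbbox` keeps the whole machine to itself. A minimal stand-in illustrating those serial semantics (`track_progress_sketch` is a hypothetical helper, not mmengine's implementation):

```python
# Minimal stand-in for the serial behaviour of mmengine.track_progress:
# apply func to every task in the main process, one at a time.
def track_progress_sketch(func, tasks):
    # Mirror mmengine's API: tasks may be a plain sequence or an
    # (iterable, total) pair, as in the snippet above.
    if isinstance(tasks, tuple):
        iterable, total = tasks
    else:
        iterable, total = tasks, len(tasks)
    results = []
    for i, task in enumerate(iterable, 1):
        results.append(func(task))
        # A real implementation would update a progress bar here (i / total).
    return results

results = track_progress_sketch(lambda x: x * 2, ([1, 2, 3], 3))
# results == [2, 4, 6]
```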

@JingweiZhang12
Contributor

Thanks for your feedback. Could you please create a PR to reduce the default number of workers and add docs about this dataset conversion?

@OrcunCanDeniz

I was also getting stuck at `numPointsInGtCalculater`. As in #2364, adding the lines below solved the issue so far.

```python
import multiprocessing
multiprocessing.set_start_method('spawn')
```
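A minimal runnable sketch of that workaround: `set_start_method` must run before any pools are created, and the entry point has to be guarded, otherwise every spawned child re-executes the script (`work` is a placeholder, not an mmdetection3d function):

```python
import multiprocessing

def work(x):
    # Placeholder for the per-sample job (e.g. one create_single call).
    return x + 1

if __name__ == "__main__":
    # 'spawn' starts fresh interpreters instead of forking, which avoids
    # inheriting locked thread state from the parent process.
    multiprocessing.set_start_method("spawn", force=True)
    with multiprocessing.Pool(processes=2) as pool:
        out = pool.map(work, [1, 2, 3])
    print(out)  # prints [2, 3, 4]
```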

@ammaryasirnaich

I am also stuck at this part. Did you manage to solve it?
