IO Performance Issues #4

Open
kylegenova opened this issue Aug 1, 2020 · 2 comments

kylegenova commented Aug 1, 2020

Previously (see #3), a user reported poor IO performance that was bottlenecking training. In response, a recent commit (801f5b1) added two new flags to meshes2dataset.py: --optimize and --optimize_only. These flags generate a sharded, compressed tfrecords dataset to reduce IO overhead. The files are written to a subdirectory inside the dataset_directory path. The train.py script looks for that directory and, if it exists, trains from it rather than from the existing files (which remain, because they are useful for interactive visualization and the evaluation scripts).
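For readers unfamiliar with why sharding and compression cut IO overhead, here is a minimal tf.data sketch of the general reading pattern. The shard filename pattern, feature names, and shapes below are illustrative assumptions, not the actual schema produced by meshes2dataset.py.

```python
# Minimal sketch (not the repository's actual input pipeline) of reading a
# sharded, GZIP-compressed TFRecord dataset with tf.data.
import tensorflow as tf

def make_dataset(tfrecord_dir, batch_size=24):
  # Each shard is a separate file; interleaving reads across shards in
  # parallel is what hides per-file IO latency on slow storage.
  files = tf.data.Dataset.list_files(tfrecord_dir + '/*.tfrecords', shuffle=True)
  dataset = files.interleave(
      lambda path: tf.data.TFRecordDataset(path, compression_type='GZIP'),
      cycle_length=8,  # read up to 8 shards concurrently
      num_parallel_calls=tf.data.experimental.AUTOTUNE)

  def parse(example_proto):
    # Hypothetical feature spec, for illustration only.
    features = {'sample_points': tf.io.FixedLenFeature([10000, 3], tf.float32)}
    return tf.io.parse_single_example(example_proto, features)

  return (dataset
          .map(parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
          .shuffle(1000)
          .batch(batch_size)
          .prefetch(tf.data.experimental.AUTOTUNE))
```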

Commit f78dbc4 enables this behavior by default. If you are an existing user seeing less than 100% GPU utilization, please Ctrl+C training, git pull, rerun the meshes2dataset.py script with the flags --optimize and --optimize_only (the latter flag skips the first part of dataset creation, which has already been run), and rerun the training command (no change to the training command is required; it will resume using the new tfrecords data). Unfortunately, this new meshes2dataset.py step can take several hours on ShapeNet, and it also consumes ~3 MB of extra disk space per dataset element (totaling 129 GB extra on ShapeNet). However, in the tested cases it has resulted in 100% GPU utilization. With this change, I see ~3.5 steps/sec with a batch size of 24 on a V100, and ~2 steps/sec with a batch size of 24 on a P100.
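To illustrate why the training command does not need to change, here is a rough sketch of the fallback behavior described above. The subdirectory name and function name are placeholders for illustration, not the actual ldif code.

```python
# Sketch of the described fallback: prefer the optimized tfrecords
# subdirectory inside dataset_directory when it exists, otherwise use the
# original per-element files. 'optimized' is a hypothetical directory name.
import os

def select_data_source(dataset_directory):
  optimized_dir = os.path.join(dataset_directory, 'optimized')  # hypothetical name
  if os.path.isdir(optimized_dir) and os.listdir(optimized_dir):
    # Sharded + compressed tfrecords are present: train from them.
    return ('tfrecords', optimized_dir)
  # Fall back to the original (slower to read) per-element files.
  return ('original', dataset_directory)

# Example: source_kind, path = select_data_source('/path/to/dataset_directory')
```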

The size of the shards and their contents could be further optimized, but without a failing example I'm not sure what the optimal settings are. If you experience less than 100% GPU utilization after this change, please comment below and I will do my best to address your issue. Similarly, if you can confirm 100% utilization on a networked HDD, that would be highly appreciated, since I can't easily test that setup.

One other minor note: a byproduct of this change is that the 10K points per sample are no longer randomly re-drawn from the 100K each time a mesh is seen; instead, the same 10K points are used every time. Because those 10K points are never seen by the network directly, but are instead used to generate local pointclouds based on the SIF elements, I anticipate no effect from this change unless the dataset is extremely small. However, I will verify this quantitatively before closing the issue.
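For concreteness, here is a small numpy sketch of the difference; the array contents, sizes, and function names are illustrative stand-ins, not the repository's sampling code.

```python
# Minimal numpy sketch of the sampling change. 'all_points' stands in for a
# mesh's 100K precomputed surface samples.
import numpy as np

rng = np.random.default_rng(0)
all_points = rng.standard_normal((100_000, 3)).astype(np.float32)

# Old behavior: draw a fresh random 10K subset every time the mesh is seen.
def sample_old(points, k=10_000):
  idx = rng.choice(points.shape[0], size=k, replace=False)
  return points[idx]

# New behavior: one fixed 10K subset is chosen at dataset-creation time,
# stored in the tfrecord, and reused on every visit to the mesh.
fixed_subset = all_points[rng.choice(all_points.shape[0], size=10_000, replace=False)]

def sample_new():
  return fixed_subset
```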

@chengzhag

The tfrecords dataset significantly improved training speed on a machine with limited storage bandwidth! Thanks for the update!

With a storage server connected over GbE, a single 1080 Ti GPU can train at 1.5 steps/sec at almost full utilization:
[screenshot of GPU utilization]
Previously, the same GPU could only achieve less than 0.6 steps/sec with the same network storage server, or 1.0 steps/sec with an NVMe SSD (tested today).

@chengzhag

PS: with the tfrecords dataset and an NVMe SSD, a 1080 Ti can achieve 1.8 steps/sec at full utilization:
[screenshot of GPU utilization]
