

Option to process images on GPU for speed at the expense of memory #2288

Merged 3 commits on Aug 1, 2023

Conversation

@AX-I (Contributor) commented Aug 1, 2023

Analogous to #2110. This fix speeds up training by over 3x when using multi-GPU DDP.

The observation that prompted this PR: training with --num-devices=2 is significantly slower than training with a single GPU, with extremely high CPU usage (~2600% on a 28-core CPU).

Profiling with --logging.profiler=pytorch reveals that collate_image_dataset_batch() in pixel_samplers.py takes 70 ms with --num-devices=2 and 30 ms with a single GPU. The batch image tensors are stored on the CPU, so the indexing operation is CPU-intensive.

[Profiler screenshot]

This PR adds an option to remove "image" from exclude_batch_keys_from_device so that the image tensor stays on the GPU. With suitable hardware, the option reduces CPU usage from 2600% to 200% and speeds up 2-GPU training from ~70 Krays/s to ~250 Krays/s. Single-GPU training also speeds up from 100 to 140 Krays/s. VRAM usage increases from 2 GB to 5 GB.

Profiling the 2-gpu training with this fix confirms that the pixel sampler is no longer a bottleneck.

[Profiler screenshot after the fix]

Training configuration: nerfacto default params, on F2Nerf grass scene
Machine: 2x Xeon Gold 5120 (disabled HT), 2x Titan RTX
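The mechanism above can be illustrated with a minimal sketch (this is not nerfstudio's actual code; the helper name and the FakeTensor stub are hypothetical stand-ins for torch tensors): keys listed in exclude_batch_keys_from_device are left on the CPU during batch collation, and everything else is moved to the training device.

```python
# Hedged sketch of the exclude_batch_keys_from_device behaviour.
# FakeTensor stands in for a torch.Tensor so the example is self-contained.
class FakeTensor:
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        # Mimics torch's Tensor.to(device): returns a copy on the target device.
        return FakeTensor(device)


def move_batch_to_device(batch, device, exclude_batch_keys_from_device):
    """Move every tensor in `batch` to `device`, except excluded keys."""
    return {
        key: value if key in exclude_batch_keys_from_device else value.to(device)
        for key, value in batch.items()
    }


batch = {"image": FakeTensor(), "mask": FakeTensor()}

# Default: "image" is excluded, so it stays on the CPU and pixel-sampler
# indexing into it is CPU-bound.
out = move_batch_to_device(batch, "cuda:0", exclude_batch_keys_from_device=["image"])
print(out["image"].device)  # cpu
print(out["mask"].device)   # cuda:0

# With the new option, "image" is no longer excluded and lives on the GPU,
# at the cost of the extra VRAM noted above.
out = move_batch_to_device(batch, "cuda:0", exclude_batch_keys_from_device=[])
print(out["image"].device)  # cuda:0
```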

@AX-I (Contributor, Author) commented Aug 1, 2023

Might also help with #1638

@AX-I changed the title from "Fix pixel sampler CPU bottleneck" to "Option to process images on GPU for speed at the expense of memory" on Aug 1, 2023
@tancik (Contributor) commented Aug 1, 2023

The "expense of memory" is the tricky part. Many users have 8gb cards which nerfacto has been designed to fit onto. When the dataset is moved to GPU, the memory is no longer predictable.

@AX-I (Contributor, Author) commented Aug 1, 2023

The "expense of memory" is the tricky part. Many users have 8gb cards which nerfacto has been designed to fit onto. When the dataset is moved to GPU, the memory is no longer predictable.

Converted this to a config option! That should allow for all use cases.
--pipeline.datamanager.images-on-gpu True

CPU usage is reduced from 1700% to 100%, with even greater savings for multi-GPU training.
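A minimal sketch of how such a config option might be wired up (hypothetical code: the dataclass name and get_exclude_keys helper are illustrative; only the flag --pipeline.datamanager.images-on-gpu and the key name exclude_batch_keys_from_device come from this PR). nerfstudio exposes dataclass-based configs on the CLI, so a boolean field with an underscored name maps to the hyphenated flag:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataManagerConfig:
    """Illustrative config; maps to --pipeline.datamanager.images-on-gpu on the CLI."""

    images_on_gpu: bool = False  # keep "image" tensors on the GPU after loading


def get_exclude_keys(config: DataManagerConfig) -> List[str]:
    """Batch keys kept on the CPU during collation."""
    return [] if config.images_on_gpu else ["image"]


print(get_exclude_keys(DataManagerConfig()))                    # ['image']
print(get_exclude_keys(DataManagerConfig(images_on_gpu=True)))  # []
```

The default (False) preserves the existing behaviour for 8 GB cards; users with spare VRAM opt in explicitly.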
@AX-I (Contributor, Author) commented Aug 1, 2023

Changed optional to bool and rebased

@tancik (Contributor) left a review

LGTM

@tancik tancik merged commit 98d36ed into nerfstudio-project:main Aug 1, 2023
4 checks passed