Option to process images on GPU for speed at the expense of memory #2288
Analogous to #2110. This fix speeds up training by over 3x when using multi-GPU DDP.

The observation that prompted this PR: training with `--num-devices=2` is significantly slower than training with a single GPU, with extremely high CPU usage (~2600% on a 28-core CPU). Profiling with `--logging.profiler=pytorch` reveals that `collate_image_dataset_batch()` in `pixel_samplers.py` takes 70 ms with `--num-devices=2` versus 30 ms with a single GPU. The batch image tensors are stored on the CPU, so the indexing operation is CPU-intensive.

This PR adds an option to remove `"image"` from `exclude_batch_keys_from_device`, keeping the image tensor on the GPU. With suitable hardware, the option reduces CPU usage from 2600% to 200% and speeds up 2-GPU training from ~70 Krays/s to ~250 Krays/s. Single-GPU training also speeds up, from 100 to 140 Krays/s. VRAM usage increases from 2 GB to 5 GB. Profiling the 2-GPU training with this fix confirms that the pixel sampler is no longer a bottleneck.
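The mechanism can be sketched as follows. This is a minimal sketch, not the PR's actual code: the `images_on_gpu` option name is hypothetical, and `move_batch_to_device` only mirrors the behaviour of nerfstudio's batch-moving utility, which skips any key listed in `exclude_batch_keys_from_device` when transferring a batch to the training device.

```python
import torch

DEFAULT_EXCLUDE = ["image", "mask"]  # keys normally kept on the CPU

def exclude_keys(images_on_gpu: bool) -> list:
    """Build the exclude list; dropping "image" lets it move to the GPU."""
    keys = list(DEFAULT_EXCLUDE)
    if images_on_gpu:
        keys.remove("image")
    return keys

def move_batch_to_device(batch: dict, device: str, exclude=()) -> dict:
    # Move every tensor to `device`, except the excluded keys,
    # which stay wherever they already are (typically the CPU).
    return {k: (v if k in exclude else v.to(device)) for k, v in batch.items()}

device = "cuda" if torch.cuda.is_available() else "cpu"
batch = {"image": torch.zeros(2, 8, 8, 3), "indices": torch.zeros(2, 3)}

# Default: "image" is excluded, so pixel-sampler indexing runs on the CPU.
default_batch = move_batch_to_device(batch, device, exclude=exclude_keys(False))
# With the option: "image" is transferred along with the rest of the batch.
gpu_batch = move_batch_to_device(batch, device, exclude=exclude_keys(True))
```

Keeping `"image"` in the exclude list trades CPU time for lower VRAM usage; removing it does the opposite, which is why this is exposed as an opt-in rather than a new default.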
Training configuration: nerfacto default params, on F2Nerf grass scene
Machine: 2x Xeon Gold 5120 (disabled HT), 2x Titan RTX