
Reduce GPU memory usage in CacheDataloader #1730

Merged (6 commits) on Jun 7, 2023

Conversation

jkulhanek (Contributor)

This PR implements functionality allowing Datasets to keep some tensors on the CPU (as is already done for "image"), reducing GPU memory requirements. Currently, I keep the "depth_image" tensor on the CPU.

Note, I haven't tested it yet. Which datasets should I test it on?
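A minimal sketch of the idea (the names below are illustrative, not taken from the actual CacheDataloader code): the cached batch lives on the CPU, and when a batch is served, every tensor except those on an exclusion list is moved to the GPU device. A tiny stand-in class is used here so the sketch runs without PyTorch; with real tensors, `value.to(device)` behaves the same way.

```python
# Illustrative sketch of keeping selected keys on the CPU.
# Function and key names are hypothetical, not from the PR.

def move_batch_to_device(batch, device, exclude_keys=("image", "depth_image")):
    """Move every tensor in `batch` to `device`, except excluded keys,
    which stay on the CPU to save GPU memory."""
    return {
        key: value if key in exclude_keys else value.to(device)
        for key, value in batch.items()
    }

# Tiny stand-in for a tensor so the sketch runs without PyTorch.
class FakeTensor:
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)

batch = {"image": FakeTensor(), "depth_image": FakeTensor(), "indices": FakeTensor()}
moved = move_batch_to_device(batch, "cuda:0")
# "image" and "depth_image" stay on the CPU; "indices" is moved to the GPU.
```

The excluded tensors are only transferred (per-ray or per-pixel, after sampling) when actually needed, which is where the memory savings come from.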

tancik (Contributor) commented Apr 13, 2023

If you have a LiDAR iPhone, you can create a capture with Polycam that includes depth.

machenmusik (Contributor)

By keeping these tensors on the CPU, should we also expect some modest performance improvement due to reduced GPU transfer bandwidth usage?

jkulhanek (Contributor, Author)

Unfortunately, I don't have a LiDAR iPhone. I guess I can run a monocular depth prediction model.

jkulhanek (Contributor, Author)

This is also related to #1465

nepfaff (Contributor) commented Apr 21, 2023

You could try this dataset for testing (I had it lying around, so I thought it might help here). I only tested it with Instant-NGP, though, so you will have to go through that dataparser.

https://drive.google.com/file/d/1-8hXmkmtvHI36N6_WVV2uBZnTx39HFno/view?usp=sharing

NOTE: The data is very noisy, so don't expect fantastic reconstructions.

jkulhanek (Contributor, Author)

I have tested this, and it worked with the SDF dataparser. I will, however, make it more general by also reducing the memory used by segmentation masks, etc.

jkulhanek force-pushed the reduce-gpu-usage-cached-dataloader branch from 3a2bfbc to 5c19d09 on May 16, 2023 18:16
jkulhanek force-pushed the reduce-gpu-usage-cached-dataloader branch from 5c19d09 to 9f68fd2 on May 16, 2023 18:18
jkulhanek (Contributor, Author)

The implementation is complete. Ideally it should be tested on all possible setups; so far I have only tested the sdf-studio parser with neus-facto and vanilla nerfacto.

jkulhanek requested a review from tancik on May 17, 2023 13:11
tancik (Contributor) commented May 18, 2023

I'd be curious to hear @machenmusik's perspective on this PR, since he has looked more into dataloader/device stuff.

machenmusik (Contributor)

> I'd be curious to hear @machenmusik's perspective on this PR, since he has looked more into dataloader/device stuff.

I'm stymied by Windows and torch 2 issues at the moment; once that is cleared up, I can hopefully try some other variations with this.

machenmusik (Contributor)

(Looks like there are some conflicts now...)

machenmusik (Contributor)

I finally got a chance to look at this with a private dataset, doing the obvious conflict resolution, and merged it into #2003, where I assume the performance impact would be magnified.

Caveats

  • This is on Windows without torch.compile
  • I did not attempt to assess visual quality

With this PR:

nerfacto-huge default

Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec
-----------------------------------------------------------------------------------
0 (0.00%)           2 s, 171.857 ms      2 d, 12 h, 19 m, 45 s
10 (0.01%)          6 s, 422.363 ms      7 d, 10 h, 22 m, 52 s 2.20 K
20 (0.02%)          4 s, 872.264 ms      5 d, 15 h, 18 m, 49 s 17.88 K
30 (0.03%)          2 s, 899.456 ms      3 d, 8 h, 30 m, 58 s 31.78 K
40 (0.04%)          2 s, 895.946 ms      3 d, 8 h, 24 m, 38 s 32.26 K
50 (0.05%)          2 s, 894.647 ms      3 d, 8 h, 22 m, 0 s  32.83 K
60 (0.06%)          2 s, 900.305 ms      3 d, 8 h, 30 m, 56 s 32.73 K

nerfacto default

Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec
-----------------------------------------------------------------------------------
0 (0.00%)           749.999 ms           6 h, 14 m, 59 s
10 (0.03%)          113.636 ms           56 m, 47 s           87.39 K
20 (0.07%)          40.626 ms            20 m, 17 s           110.38 K
30 (0.10%)          30.469 ms            15 m, 13 s           141.99 K
40 (0.13%)          29.687 ms            14 m, 49 s           152.92 K
50 (0.17%)          35.937 ms            17 m, 56 s           127.79 K
60 (0.20%)          36.719 ms            18 m, 19 s           121.24 K
70 (0.23%)          30.469 ms            15 m, 11 s           150.73 K
80 (0.27%)          36.738 ms            18 m, 19 s           131.05 K

Without this PR:

nerfacto-huge default

Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec
-----------------------------------------------------------------------------------
0 (0.00%)           1 s, 15.627 ms       1 d, 4 h, 12 m, 42 s
10 (0.01%)          6 s, 670.184 ms      7 d, 17 h, 15 m, 51 s 2.06 K
20 (0.02%)          5 s, 219.445 ms      6 d, 0 h, 57 m, 20 s 16.32 K
30 (0.03%)          3 s, 68.791 ms       3 d, 13 h, 13 m, 7 s 30.15 K
40 (0.04%)          3 s, 17.879 ms       3 d, 11 h, 47 m, 47 s 30.19 K
50 (0.05%)          3 s, 27.521 ms       3 d, 12 h, 3 m, 20 s 29.94 K
60 (0.06%)          2 s, 984.983 ms      3 d, 10 h, 51 m, 59 s 30.13 K

nerfacto default

Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec
-----------------------------------------------------------------------------------
0 (0.00%)           750.084 ms           6 h, 15 m, 2 s
10 (0.03%)          109.382 ms           54 m, 40 s           97.09 K
20 (0.07%)          47.217 ms            23 m, 35 s           93.68 K
30 (0.10%)          47.999 ms            23 m, 58 s           91.19 K
40 (0.13%)          47.656 ms            23 m, 47 s           92.19 K
50 (0.17%)          47.656 ms            23 m, 47 s           93.28 K
60 (0.20%)          46.875 ms            23 m, 23 s           93.93 K
70 (0.23%)          46.875 ms            23 m, 22 s           91.76 K
80 (0.27%)          45.331 ms            22 m, 36 s           94.99 K

So if others can confirm the output is the same or better (which I think was done previously), I think this is good. Let me know if you'd like me to commit the conflict resolution here for clarity.

(cc @tancik @jkulhanek)

jkulhanek (Contributor, Author)

@machenmusik thanks! Do you, by any chance, also have GPU memory usage stats?

machenmusik (Contributor)

> @machenmusik thanks! Do you, by any chance, also have GPU memory usage stats?

No, sorry; the huge model saturates this machine anyway, so we wouldn't really see anything there.
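For reference, one way to gather the GPU memory stats asked about above, assuming PyTorch is available (this helper is illustrative and not part of the PR):

```python
# Hypothetical helper for reporting peak GPU memory usage (not from the PR).
import torch

def peak_gpu_memory_mib():
    """Peak allocated GPU memory in MiB since the last peak-stats reset,
    or None when no CUDA device is available."""
    if not torch.cuda.is_available():
        return None
    return torch.cuda.max_memory_allocated() / (1024 ** 2)

# Typical use: call torch.cuda.reset_peak_memory_stats() before training,
# run some steps, then log peak_gpu_memory_mib() with and without the PR.
```

Comparing this number across the two branches would show the caching savings directly, independent of whether the model itself saturates the card.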

jkulhanek (Contributor, Author)

Can we merge this?

machenmusik (Contributor)

It would be great if someone could independently assess that visual quality is not degraded, but I have no objections.

tancik (Contributor) left a review

LGTM

tancik enabled auto-merge (squash) June 7, 2023 17:23
tancik merged commit 3b0f758 into main on Jun 7, 2023
4 checks passed
tancik deleted the reduce-gpu-usage-cached-dataloader branch June 7, 2023 18:14
Labels: data processing, python, quality of life, speedup

5 participants