
Reduce GPU memory usage in CacheDataloader #1730

Merged (6 commits) on Jun 7, 2023

Conversation

jkulhanek (Contributor)

This PR implements functionality allowing Datasets to keep some tensors on the CPU (as is already done for "image"), reducing GPU memory requirements. Currently, I keep the "depth_image" tensor on the CPU.

Note, I haven't tested it yet. Which datasets should I test it on?
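A minimal sketch of the idea (the names below are illustrative, not taken from the actual CacheDataloader code): the cached batch lives on the CPU, and when a batch is served, every tensor except those on an exclusion list is moved to the GPU device. A tiny stand-in class is used here so the sketch runs without PyTorch; with real tensors, `value.to(device)` behaves the same way.

```python
# Illustrative sketch of keeping selected keys on the CPU.
# Function and key names are hypothetical, not from the PR.

def move_batch_to_device(batch, device, exclude_keys=("image", "depth_image")):
    """Move every tensor in `batch` to `device`, except excluded keys,
    which stay on the CPU to save GPU memory."""
    return {
        key: value if key in exclude_keys else value.to(device)
        for key, value in batch.items()
    }

# Tiny stand-in for a tensor so the sketch runs without PyTorch.
class FakeTensor:
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)

batch = {"image": FakeTensor(), "depth_image": FakeTensor(), "indices": FakeTensor()}
moved = move_batch_to_device(batch, "cuda:0")
# "image" and "depth_image" stay on the CPU; "indices" is moved to the GPU.
```

The excluded tensors are only transferred (per-ray or per-pixel, after sampling) when actually needed, which is where the memory savings come from.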

tancik (Contributor) commented Apr 13, 2023

If you have a LiDAR iPhone, you can create a capture with Polycam that includes depth.

machenmusik (Contributor)

By keeping these tensors on the CPU, should we also expect some modest performance improvement due to reduced GPU transfer bandwidth usage?

jkulhanek (Contributor, Author)

Unfortunately, I don't have a LiDAR iPhone. I guess I can run a monocular depth prediction model.

jkulhanek (Contributor, Author)

This is also related to #1465

nepfaff (Contributor) commented Apr 21, 2023

You could try this dataset for testing (I had it lying around, so I thought it might help here). I only tested it with Instant-NGP, though, so you will have to go through that dataparser.

https://drive.google.com/file/d/1-8hXmkmtvHI36N6_WVV2uBZnTx39HFno/view?usp=sharing

NOTE: The data is very noisy, so don't expect fantastic reconstructions.

jkulhanek (Contributor, Author)

I have tested this, and it worked with the SDF dataparser. I will, however, make it more general by also reducing the memory used by segmentation masks, etc.

jkulhanek force-pushed the reduce-gpu-usage-cached-dataloader branch from 3a2bfbc to 5c19d09 on May 16, 2023 18:16
jkulhanek force-pushed the reduce-gpu-usage-cached-dataloader branch from 5c19d09 to 9f68fd2 on May 16, 2023 18:18
jkulhanek (Contributor, Author)

The implementation is complete. Ideally it should be tested on all possible setups; so far I have only tested the sdf-studio parser with neus-facto and vanilla nerfacto.

jkulhanek requested a review from tancik on May 17, 2023 13:11
tancik (Contributor) commented May 18, 2023

I'd be curious to hear @machenmusik's perspective on this PR, since he has looked more into dataloader/device stuff.

machenmusik (Contributor)

> I'd be curious to hear @machenmusik's perspective on this PR, since he has looked more into dataloader/device stuff.

I'm stymied by Windows and torch 2 issues at the moment; once that is cleared up, I can hopefully try some other variations with this.

machenmusik (Contributor)

(Looks like there are some conflicts now...)

machenmusik (Contributor)

I finally got a chance to look at this with a private dataset, doing the obvious conflict resolution, and merged it into #2003, where I assume the performance impact would be magnified.

Caveats

  • This is on Windows without torch.compile
  • I did not attempt to assess visual quality

With this PR:

nerfacto-huge default

Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec
-----------------------------------------------------------------------------------
0 (0.00%)           2 s, 171.857 ms      2 d, 12 h, 19 m, 45 s
10 (0.01%)          6 s, 422.363 ms      7 d, 10 h, 22 m, 52 s 2.20 K
20 (0.02%)          4 s, 872.264 ms      5 d, 15 h, 18 m, 49 s 17.88 K
30 (0.03%)          2 s, 899.456 ms      3 d, 8 h, 30 m, 58 s 31.78 K
40 (0.04%)          2 s, 895.946 ms      3 d, 8 h, 24 m, 38 s 32.26 K
50 (0.05%)          2 s, 894.647 ms      3 d, 8 h, 22 m, 0 s  32.83 K
60 (0.06%)          2 s, 900.305 ms      3 d, 8 h, 30 m, 56 s 32.73 K

nerfacto default

Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec
-----------------------------------------------------------------------------------
0 (0.00%)           749.999 ms           6 h, 14 m, 59 s
10 (0.03%)          113.636 ms           56 m, 47 s           87.39 K
20 (0.07%)          40.626 ms            20 m, 17 s           110.38 K
30 (0.10%)          30.469 ms            15 m, 13 s           141.99 K
40 (0.13%)          29.687 ms            14 m, 49 s           152.92 K
50 (0.17%)          35.937 ms            17 m, 56 s           127.79 K
60 (0.20%)          36.719 ms            18 m, 19 s           121.24 K
70 (0.23%)          30.469 ms            15 m, 11 s           150.73 K
80 (0.27%)          36.738 ms            18 m, 19 s           131.05 K

Without this PR:

nerfacto-huge default

Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec
-----------------------------------------------------------------------------------
0 (0.00%)           1 s, 15.627 ms       1 d, 4 h, 12 m, 42 s
10 (0.01%)          6 s, 670.184 ms      7 d, 17 h, 15 m, 51 s 2.06 K
20 (0.02%)          5 s, 219.445 ms      6 d, 0 h, 57 m, 20 s 16.32 K
30 (0.03%)          3 s, 68.791 ms       3 d, 13 h, 13 m, 7 s 30.15 K
40 (0.04%)          3 s, 17.879 ms       3 d, 11 h, 47 m, 47 s 30.19 K
50 (0.05%)          3 s, 27.521 ms       3 d, 12 h, 3 m, 20 s 29.94 K
60 (0.06%)          2 s, 984.983 ms      3 d, 10 h, 51 m, 59 s 30.13 K

nerfacto default

Step (% Done)       Train Iter (time)    ETA (time)           Train Rays / Sec
-----------------------------------------------------------------------------------
0 (0.00%)           750.084 ms           6 h, 15 m, 2 s
10 (0.03%)          109.382 ms           54 m, 40 s           97.09 K
20 (0.07%)          47.217 ms            23 m, 35 s           93.68 K
30 (0.10%)          47.999 ms            23 m, 58 s           91.19 K
40 (0.13%)          47.656 ms            23 m, 47 s           92.19 K
50 (0.17%)          47.656 ms            23 m, 47 s           93.28 K
60 (0.20%)          46.875 ms            23 m, 23 s           93.93 K
70 (0.23%)          46.875 ms            23 m, 22 s           91.76 K
80 (0.27%)          45.331 ms            22 m, 36 s           94.99 K

So if others can confirm the output is the same or better (which I think was done previously), I think this is good. Let me know if you'd like me to commit the conflict resolution here for clarity.

(cc @tancik @jkulhanek)

jkulhanek (Contributor, Author)

@machenmusik thanks! Do you, by any chance, also have GPU memory usage stats?

machenmusik (Contributor)

> @machenmusik thanks! Do you, by any chance, also have GPU memory usage stats?

No, sorry; the huge model saturates this machine anyway, so we wouldn't really see anything there.
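For reference, one way to gather the GPU memory stats asked about above, assuming PyTorch is available (this helper is illustrative and not part of the PR):

```python
# Hypothetical helper for reporting peak GPU memory usage (not from the PR).
import torch

def peak_gpu_memory_mib():
    """Peak allocated GPU memory in MiB since the last peak-stats reset,
    or None when no CUDA device is available."""
    if not torch.cuda.is_available():
        return None
    return torch.cuda.max_memory_allocated() / (1024 ** 2)

# Typical use: call torch.cuda.reset_peak_memory_stats() before training,
# run some steps, then log peak_gpu_memory_mib() with and without the PR.
```

Comparing this number across the two branches would show the caching savings directly, independent of whether the model itself saturates the card.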

jkulhanek (Contributor, Author)

Can we merge this?

machenmusik (Contributor)

It would be great if someone could independently assess that visual quality is not degraded, but I have no objections.

tancik (Contributor) left a review

LGTM

tancik enabled auto-merge (squash) June 7, 2023 17:23
tancik merged commit 3b0f758 into main on Jun 7, 2023
4 checks passed
tancik deleted the reduce-gpu-usage-cached-dataloader branch June 7, 2023 18:14
Labels: data processing, python, quality of life, speedup

5 participants