add gradient accumulation support #2049
Conversation
Would it make sense to:
(relevant to #2003, which won't fit on smaller GPUs)
I think it's better to let the user decide the batch size and the corresponding learning rate.
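For context, one common heuristic for coupling these two settings (not something this PR prescribes) is to scale the learning rate linearly with the ray batch size; the numbers below are purely illustrative:

```python
# Linear LR scaling heuristic; baseline values are assumptions, not repo defaults.
base_lr = 1e-2          # assumed reference learning rate
base_num_rays = 4096    # assumed reference ray batch size

def scaled_lr(num_rays_per_batch: int) -> float:
    """Scale the learning rate linearly with the chosen ray batch size."""
    return base_lr * num_rays_per_batch / base_num_rays

print(scaled_lr(8192))  # -> 0.02
```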
nerfstudio/engine/trainer.py
Outdated
loss = functools.reduce(torch.add, loss_dict.values())
self.grad_scaler.scale(loss).backward()  # type: ignore
internal_step = 0
while True:
This would be clearer with a for loop, since internal_step is unnecessary except for counting iterations
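For reference, a minimal sketch of how the accumulation step could read with a for loop; the function and parameter names (`accumulation_steps`, `pipeline`, etc.) are illustrative assumptions, not necessarily the merged code:

```python
import functools

import torch

def train_iteration(pipeline, optimizer, grad_scaler, step, accumulation_steps=2):
    """Illustrative only: one outer train step made of `accumulation_steps` sub-steps."""
    optimizer.zero_grad()
    for _ in range(accumulation_steps):  # replaces `internal_step` / `while True`
        _, loss_dict, _ = pipeline.get_train_loss_dict(step=step)
        loss = functools.reduce(torch.add, loss_dict.values())
        # Scale so the accumulated gradient matches one full-size batch
        # (assumption; the PR may keep the raw sum instead).
        grad_scaler.scale(loss / accumulation_steps).backward()
    grad_scaler.step(optimizer)
    grad_scaler.update()
```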
I think memory budgeting is a larger PR in itself, since going from batch size to memory consumption is a pretty nontrivial thing. I'd be in favor of merging in grad accumulation and thinking about how to automatically set these parameters later.
Here are two experiments which use a ray batch size of 8192 and accumulate two 4096-ray steps.
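As a sanity check on why the two settings should be comparable, here is a toy example (a stand-in linear model, not the nerfstudio pipeline) showing that two half-scaled 4096-ray backward passes reproduce the gradient of a single 8192-ray pass:

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, dtype=torch.float64, requires_grad=True)
rays = torch.randn(8192, 4, dtype=torch.float64)  # stand-in for an 8192-ray batch

def loss_fn(batch):
    return (batch @ w).pow(2).mean()

# One full 8192-ray step.
loss_fn(rays).backward()
full_grad = w.grad.clone()

# Two accumulated 4096-ray sub-steps, each scaled by 1/2 to match the mean.
w.grad = None
for chunk in rays.split(4096):
    (loss_fn(chunk) / 2).backward()

print(torch.allclose(full_grad, w.grad))  # -> True
```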
Looks good, thanks!
* add gradient accumulation support
* fix 2 blank lines
* fix possibly unbound variable
* update gradient accumulation step with for loop
* add assert and ignore pyright check

---------

Co-authored-by: Zhang Jian <zhangjian49@lenovo.com>
This modification does not hurt the current repo. The number of train steps is the same as before, so the optimizer and scheduler work the old way. Users can easily increase the batch size since the NeRF network has no batch-norm layers.
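A rough sketch of why the step count is unchanged, assuming a generic PyTorch optimizer and scheduler (illustrative, not the nerfstudio trainer): accumulation happens inside one outer iteration, so `optimizer.step()` and `scheduler.step()` are still called once per iteration.

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

accumulation_steps = 2   # assumed setting
max_iterations = 30      # outer step count is unchanged by accumulation

for step in range(max_iterations):
    optimizer.zero_grad()
    for _ in range(accumulation_steps):
        rays = torch.randn(4096, 4)                      # 4096-ray sub-batch
        loss = model(rays).pow(2).mean() / accumulation_steps
        loss.backward()                                  # gradients accumulate
    optimizer.step()                                     # once per outer step
    scheduler.step()                                     # LR schedule sees the same step count

print(scheduler.last_epoch)  # -> 30, same as a run without accumulation
```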