This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

Reduce Memory Use of GPUs in one line code. #34

Open
yumi-cn opened this issue Dec 11, 2020 · 5 comments

Comments


yumi-cn commented Dec 11, 2020

I tried to run this project's code on 4x RTX 2080 Ti (11 GB each).

The original args, "--view-per-batch 4 --pixel-per-view 2048", cause a CUDA OOM error within just 2 iterations, so I reduced the batch size to "--view-per-batch 4 --pixel-per-view 128", which works well for the first 5000 iterations, and "--view-per-batch 2 --pixel-per-view 128" works well for the first 25000 iterations.

Both eventually hit the OOM error at the voxel split step (just a guess). So I checked the memory-management code and did not find any call that releases PyTorch's unused cache, such as:

torch.cuda.empty_cache()

so I tried adding this call at the end of "NSVFModel.clean_caches" in "fairnr/models/nsvf.py":

    def clean_caches(self, reset=False):
        self.encoder.clean_runtime_caches()
        if reset:
            self.encoder.reset_runtime_caches()
        torch.cuda.empty_cache()  # release cached memory once the model is done

This really helps me get through more split steps (but still not the split after 75000 iterations).
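Why `empty_cache()` helps can be illustrated with a toy model of PyTorch's caching allocator (a simplified sketch for intuition only, not the real implementation): freed tensors go back into a cache the allocator keeps for reuse, so the memory the device sees as reserved stays high even after tensors are freed; `empty_cache()` returns those cached blocks to the device, which matters exactly when the voxel split requests one big new allocation.

```python
class ToyCachingAllocator:
    """Toy model of a caching GPU allocator (hypothetical, illustration only)."""

    def __init__(self, capacity_mb):
        self.capacity = capacity_mb
        self.allocated = 0  # memory held by live tensors
        self.cached = 0     # freed blocks kept around for reuse

    @property
    def reserved(self):
        # what the device sees as occupied by this process
        return self.allocated + self.cached

    def malloc(self, size):
        if self.cached >= size:
            # satisfy the request from cached blocks
            self.cached -= size
        elif self.reserved + size > self.capacity:
            raise MemoryError("CUDA out of memory (toy model)")
        self.allocated += size
        return size

    def free(self, size):
        # freed memory goes to the cache, not back to the device
        self.allocated -= size
        self.cached += size

    def empty_cache(self):
        # analogue of torch.cuda.empty_cache(): give cached blocks back
        self.cached = 0


a = ToyCachingAllocator(capacity_mb=10_000)
old = a.malloc(4000)   # current voxel grid
a.malloc(5000)         # larger grid built by the split
a.free(old)            # old grid freed, but its block stays cached
try:
    a.malloc(5000)     # reserved 9000 + 5000 > 10000 -> fails
except MemoryError:
    pass
a.empty_cache()        # release cached blocks to the device
a.malloc(5000)         # reserved 5000 + 5000 <= 10000 -> fits now
```

The same pattern explains the numbers above: without the cache release, the old grid's memory still counts against the device during the next split.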

Before adding this line:

Memory use on the CUDA device: 4000 MB -> (voxel split) 8000 MB -> (voxel split) OOM error

After adding this line:

Memory use on the CUDA device: 4000 MB -> (voxel split) 6800 MB -> (voxel split) 9900 MB -> (voxel split) OOM error

And I haven't noticed any negative effect on the results so far.

I also tried other ways to avoid the OOM, such as adding "--fp16" to enable fp16 mode via the apex module (which is said to reduce memory use thanks to float16), but that just raises the error I reported in issue #33.

If you are interested in running this code on other CUDA devices (especially ones without as much GPU memory as a 32 GB V100), this one-line change and the bug report may be useful to you.

Thanks for replying.

@ghasemikasra39

On which dataset and on which object of that dataset are you training?


yumi-cn commented Dec 13, 2020

> On which dataset and on which object of that dataset are you training?

I have tested on the Synthetic-NSVF dataset, e.g. the Bike and the Palace scenes.


yyeboah commented Dec 16, 2020

@yumi-cn Thanks for sharing your insights. For those that cannot make use of half precision, and haven’t got 32 GB of GPU memory, is there any other way to get past sub-division at 75K ?


yumi-cn commented Dec 16, 2020

> @yumi-cn Thanks for sharing your insights. For those that cannot make use of half precision, and haven't got 32 GB of GPU memory, is there any other way to get past sub-division at 75K ?

Actually I do have some ideas about this, but I can't share them yet (they may go into a paper). Also, I find that sub-dividing at 25K and training for about 40K iterations is usable for most scenes at ordinary quality; if you don't need very high precision, you don't need the sub-division at 75K.
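For reference, a sketch of how to truncate the schedule so training never attempts the 75K split, assuming the flag names from this repo's README (`--half-voxel-size-at`, `--reduce-step-size-at`) are the ones controlling sub-division in your version; double-check against the training command you are using:

```shell
# Drop 75000 from the sub-division schedule so the last split is at 25K.
python -u train.py ${DATASET} \
    --half-voxel-size-at "5000,25000" \
    --reduce-step-size-at "5000,25000" \
    --view-per-batch 2 --pixel-per-view 128
```

This trades some final resolution for a memory footprint that stays within an 11 GB card, consistent with the schedule described above.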

@MultiPath (Contributor)

Also, maybe the initial voxel size is too small; you could try making it bigger.
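A sketch of this suggestion, assuming the initial voxel size is set with the `--voxel-size` flag as in the repo's example commands (the actual value is scene-dependent, so the number below is only illustrative):

```shell
# Coarser initial grid -> fewer voxels -> less memory after each split.
# Try roughly doubling your current value, e.g. 0.4 -> 0.8 (scene units).
python -u train.py ${DATASET} --voxel-size 0.8 \
    --view-per-batch 2 --pixel-per-view 128
```

Since each sub-division roughly halves the voxel size, a coarser start shifts the whole memory curve down at every stage of training.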
