Cuda OOM in plots.py during mesh extraction #18

Closed
athena913 opened this issue Jul 3, 2021 · 4 comments

Comments

@athena913

Hi,

Thank you for sharing the code. I trained the model, but when I run eval.py to extract the mesh from the latest checkpoint, I get a CUDA OOM error in File "../code/utils/plots.py", line 197, in get_surface_high_res_mesh, which is the line
grid_points = torch.cat(g, dim=0)
I tried changing the following code snippet in plots.py, but it did not help. My GPU has 11GB of memory. I also tried using 2 GPUs, but I think the eval code uses only a single GPU. Calling torch.cuda.empty_cache() to clear the CUDA cache did not help either; I still get the OOM error. Could you please provide some guidance on how to fix the OOM problem? Thanks.

g = []
for i, pnts in enumerate(torch.split(grid_points, 100000, dim=0)):
    g.append(torch.bmm(vecs.unsqueeze(0).repeat(pnts.shape[0], 1, 1).transpose(1, 2),
                       pnts.unsqueeze(-1)).squeeze() + s_mean)
grid_points = torch.cat(g, dim=0)
@iernstig

I found a similar problem in another project (Neural Sparse Voxel Fields), where they released the unused cache. Maybe that could be useful here as well? facebookresearch/NSVF#34

@athena913
Author

Thanks. As I mentioned in my post above, I have already tried emptying the cache, which is what the other post suggests. I have also tried reducing the precision from 32-bit. Neither change helped.

However, this problem occurs only for some objects (e.g. SCAN ID 24), where len(g) > 2600 after the for loop. If len(g) < 2600 (e.g. SCAN ID 65), the mesh extraction works well. The paper uses a single Nvidia V100, which has 32GB of GPU memory. I don't have that much GPU memory, so I will have to look into distributing the memory across multiple GPUs to see if that helps resolve the issue.
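
For a rough sense of why len(g) around 2600 is the tipping point on an 11GB card, the numbers in this thread can be turned into a back-of-the-envelope estimate. This is only a sketch: it assumes float32 points with three coordinates each, that every chunk holds the full 100000 points, and that the original grid_points, the list g, and the output of torch.cat are all alive at the moment of the cat. The helper names are made up for illustration and are not part of plots.py.

```python
# Rough memory estimate for the torch.cat call, using the numbers in this thread.
# Assumptions (not confirmed by the repository): float32 values, 3 coordinates
# per point, and every chunk in g holding exactly chunk_size points.

BYTES_PER_FLOAT32 = 4
COORDS_PER_POINT = 3
GIB = 2 ** 30


def tensor_gib(num_points):
    """Size in GiB of one (num_points, 3) float32 tensor."""
    return num_points * COORDS_PER_POINT * BYTES_PER_FLOAT32 / GIB


def cat_peak_gib(num_chunks, chunk_size=100_000):
    """Estimated peak while torch.cat runs.

    Three full-size copies are assumed to coexist: the original grid_points,
    the transformed chunks held in g, and the freshly allocated cat output.
    """
    return 3 * tensor_gib(num_chunks * chunk_size)


one_copy = tensor_gib(2600 * 100_000)
peak = cat_peak_gib(2600)
print(f"one copy of the grid: {one_copy:.2f} GiB")
print(f"estimated peak at torch.cat: {peak:.2f} GiB")
```

Under these assumptions the grid alone accounts for most of an 11GB GPU before the network weights and activations are counted, which would be consistent with scans just above that size failing and smaller ones succeeding.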

@lioryariv
Owner

You can address this simply by passing a lower resolution parameter, e.g.: --resolution 256
where the default value is 512. It determines the grid resolution for marching cubes, so it will slightly affect the resolution of the extracted mesh, but the difference should not be very noticeable; 512 is quite large.

Best
Lior
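
As a rough illustration of why lowering --resolution helps so much: if the number of grid points grows with the cube of the resolution (an assumption about how the grid is built, not something checked against plots.py), halving the resolution cuts the grid by about a factor of 8. The function names below are hypothetical.

```python
# Sketch of how the grid size might scale with --resolution.
# Assumption: the point count is proportional to resolution ** 3.


def point_ratio(old_resolution, new_resolution):
    """Factor by which the number of grid points shrinks."""
    return (old_resolution / new_resolution) ** 3


def scaled_chunks(old_chunks, old_resolution, new_resolution):
    """Approximate len(g) after changing the resolution."""
    return old_chunks / point_ratio(old_resolution, new_resolution)


ratio = point_ratio(512, 256)
chunks_at_256 = scaled_chunks(2600, 512, 256)
print(f"grid shrinks by a factor of {ratio:.0f}")
print(f"about {chunks_at_256:.0f} chunks of 100000 points instead of 2600")
```

With the same cubic assumption, the memory needed for the grid drops by the same factor, which fits the report that the scans failing at 512 extract fine at 256.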

@athena913
Author

Yes, that works. Thank you very much.


3 participants