
Memory error when generating image #81

Closed
jacobmacweb opened this issue Mar 14, 2021 · 11 comments
@jacobmacweb

jacobmacweb commented Mar 14, 2021

I encounter this error when running the imagine command:

Traceback (most recent call last):
  File "c:\users\miner\appdata\local\programs\python\python38\lib\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\miner\appdata\local\programs\python\python38\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Miner\AppData\Local\Programs\Python\Python38\Scripts\imagine.exe\__main__.py", line 7, in <module>
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\cli.py", line 111, in main
    fire.Fire(train)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\fire\core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\cli.py", line 107, in train
    imagine()
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\deep_daze.py", line 447, in forward
    _, loss = self.train_step(epoch, i)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\deep_daze.py", line 380, in train_step
    out, loss = self.model(self.clip_encoding)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\deep_daze.py", line 168, in forward
    out = self.model()
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 97, in forward
    out = self.net(coords)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 76, in forward
    x = self.net(x)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 48, in forward
    out = self.activation(out)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 19, in forward
    return torch.sin(self.w0 * x)
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 3.85 GiB already allocated; 79.44 MiB free; 3.87 GiB reserved in total by PyTorch)

I attempted clearing the CUDA cache, but the same error occurred.

>>> import torch
>>> torch.cuda.empty_cache()
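
(As far as I understand, empty_cache only returns blocks PyTorch has cached but is not using back to the driver; it cannot free memory held by live tensors. A quick check, assuming a fresh interpreter:)

>>> torch.cuda.memory_allocated()  # memory actually held by tensors
0
>>> torch.cuda.memory_reserved()   # memory held by PyTorch's caching allocator
0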
@jacobmacweb
Author

In a fresh session, torch reports no memory allocated:

>>> print(torch.cuda.memory_summary(device=None, abbreviated=False))
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| GPU reserved memory   |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Allocations           |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|===========================================================================|

@jacobmacweb
Author

Running torch.cuda.memory_summary() just after https://github.com/lucidrains/deep-daze/blob/main/deep_daze/deep_daze.py#L168 produces the following output:

|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  360168 KB |    1374 MB |   12678 MB |   12327 MB |
|       from large pool |  347904 KB |    1362 MB |   12629 MB |   12290 MB |
|       from small pool |   12264 KB |      13 MB |      49 MB |      37 MB |
|---------------------------------------------------------------------------|
| Active memory         |  360168 KB |    1374 MB |   12678 MB |   12327 MB |
|       from large pool |  347904 KB |    1362 MB |   12629 MB |   12290 MB |
|       from small pool |   12264 KB |      13 MB |      49 MB |      37 MB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    1396 MB |    1396 MB |    1396 MB |       0 B  |
|       from large pool |    1382 MB |    1382 MB |    1382 MB |       0 B  |
|       from small pool |      14 MB |      14 MB |      14 MB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |   20760 KB |   25791 KB |  275962 KB |  255202 KB |
|       from large pool |   18688 KB |   23808 KB |  224128 KB |  205440 KB |
|       from small pool |    2072 KB |    2139 KB |   51834 KB |   49762 KB |
|---------------------------------------------------------------------------|
| Allocations           |     351    |     359    |     725    |     374    |
|       from large pool |      88    |      92    |     137    |      49    |
|       from small pool |     263    |     272    |     588    |     325    |
|---------------------------------------------------------------------------|
| Active allocs         |     351    |     359    |     725    |     374    |
|       from large pool |      88    |      92    |     137    |      49    |
|       from small pool |     263    |     272    |     588    |     325    |
|---------------------------------------------------------------------------|
| GPU reserved segments |      25    |      25    |      25    |       0    |
|       from large pool |      18    |      18    |      18    |       0    |
|       from small pool |       7    |       7    |       7    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |      11    |      12    |     171    |     160    |
|       from large pool |       6    |       6    |      15    |       9    |
|       from small pool |       5    |       7    |     156    |     151    |
|===========================================================================|

@jacobmacweb
Author

For reference, I have a GeForce RTX 2060

@afiaka87

afiaka87 commented Mar 14, 2021

There's a similar issue happening in: #80 (comment)

But yeah, you don't have enough VRAM. Most consumer GPUs don't, so don't feel bad. With less than 8 GiB of VRAM it's pretty tough to do, but you might be able to if you set image_width to 256 or lower. A lot of people have hit this issue today, so please check the link for information on how to solve it. I've typed too much for now, ha.
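
If you're calling the library from Python rather than the CLI, a reduced-memory run would look roughly like this (a sketch only; argument names assumed from the README and the CLI flags, so double-check them against your installed version):

from deep_daze import Imagine

model = Imagine(
    text = 'a forest at dawn',   # whatever prompt you like
    image_width = 256,           # smaller output size is the biggest memory saver
    num_layers = 16,             # fewer SIREN layers also cuts VRAM
    batch_size = 4,
)
model()  # runs the training loop and periodically saves images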

Edit: as usual (unfortunately), the best free way to run this program is with the Google Colab notebooks. If you're not opposed to that, you can use it for free (seriously) and you're basically guaranteed a GPU with 16 GB of VRAM. You can find them on the front page of this project ("README.md").

@afiaka87

@discordstars

@jacobmacweb
Author

@afiaka87 Oh alright, thanks for the quick response. I'll give it a go with a smaller image width; I already tried a smaller batch size.

@afiaka87

afiaka87 commented Mar 14, 2021

For sure, no problem. The most important bit on that page is @NotNANtoN's benchmarks for the 256 image_width while varying batch size, with GPU usage on the right. bs is the batch_size; grad_acc stands for --gradient_accumulate_every (it defaults to 4, but you don't need it as much with higher batch sizes).

bs 8, num_layers 48: 5.3 GB
bs 16, num_layers 48: 5.46 GB - 2.0 it/s
bs 32, num_layers 48: 5.92 GB - 1.67 it/s
bs 8, num_layers 44: 5 GB - 2.39 it/s
bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 it/s
bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 it/s
bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 it/s

Keep in mind that your OS (Windows or Linux) is going to be using some GPU VRAM as well, anywhere from 500 MB to 2 GB in my experience.
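
If you want to see how much of the card PyTorch itself is holding at any point (this won't show what the OS or display driver is using), plain PyTorch is enough. A quick sketch:

import torch

total = torch.cuda.get_device_properties(0).total_memory
reserved = torch.cuda.memory_reserved(0)    # held by PyTorch's caching allocator
allocated = torch.cuda.memory_allocated(0)  # actually in use by tensors

print(f"total:     {total / 2**30:.2f} GiB")
print(f"reserved:  {reserved / 2**30:.2f} GiB")
print(f"allocated: {allocated / 2**30:.2f} GiB")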

@afiaka87

@discordstars Thanks for filing an issue btw! We always appreciate it even if we're too busy to get around to helping everyone.

If you're new to GitHub, make sure you mash that "Close Issue" button if you feel your question's been answered. Do let me know if you manage to get it working on there; it's useful for future users to know whether it's even possible.

@jacobmacweb
Author

jacobmacweb commented Mar 14, 2021

Not new, but thanks for the reminder.

I'll give it a go with smaller image sizes and batch sizes and update the issue before I close it :)

Edit: and oops, I must have entirely skimmed over the links in the README. I'll do that after too (for the sake of actually getting decent output)

@afiaka87

Not new, but thanks for the reminder.

My bad. I try to make as few assumptions about people on here as I can. Hope it didn't come across as patronizing.

@jacobmacweb
Author

@afiaka87 Absolutely not, no worries 😆 just making a remark.

I was able to run with --image-width 256 on the 6 GiB of VRAM. I haven't tried other resolutions, but this is working at ~2.84 it/s.
