
DefaultCPUAllocator: can't allocate memory: you tried to allocate 195696230400 bytes #3

Closed
4ndr3aR opened this issue Jun 23, 2021 · 4 comments

4ndr3aR commented Jun 23, 2021

Hey there,

first of all, thank you for the wonderful repo, it works great!

However, I've been experimenting for a few hours now and I can't process more than seventy frames. I'm using a resolution of 960x480, but reducing the frame size doesn't seem to solve the problem.

Most of the time the script is interrupted by the usual CUDA OOM errors:

/local/data/venvs/swav/lib/python3.6/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
  File "eval_generic.py", line 127, in <module>
    processor.interact(with_bg_msk, frame_idx, rgb.shape[1], obj_idx)
  File "/local/data/repos/STCN/inference_core_yv.py", line 119, in interact
    key_v = self.prop_net.encode_value(self.images[:,frame_idx].cuda(), qf16, self.prob[self.enabled_obj,frame_idx].cuda())
  File "/local/data/repos/STCN/model/eval_network.py", line 47, in encode_value
    f16 = self.value_encoder(frame, kf16.repeat(k,1,1,1), masks, others)
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local/data/repos/STCN/model/modules.py", line 114, in forward
    x = self.bn1(x)
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 178, in forward
    self.eps,
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/nn/functional.py", line 2282, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 1.04 GiB (GPU 0; 31.75 GiB total capacity; 29.64 GiB already allocated; 667.50 MiB free; 29.76 GiB reserved in total by PyTorch)
Processing video1 ...
N/A% (0 of 1) |                                                                                                                                        | Elapsed Time: 0:00:00 ETA:  --:--:--Traceback (most recent call last):
  File "eval_generic.py", line 107, in <module>
    mem_every=args.mem_every, include_last=args.include_last)
  File "/local/data/repos/STCN/inference_core_yv.py", line 38, in __init__
    self.prob = torch.zeros((self.k+1, t, 1, nh, nw), dtype=torch.float32, device=self.device)
RuntimeError: CUDA out of memory. Tried to allocate 47.21 GiB (GPU 0; 31.75 GiB total capacity; 416.90 MiB already allocated; 29.99 GiB free; 446.00 MiB reserved in total by PyTorch)
Processing video1 ...

But sometimes there are much more disturbing errors like this one:

Traceback (most recent call last):
  File "eval_generic.py", line 80, in <module>
    for data in progressbar(test_loader, max_value=len(test_loader), redirect_stdout=True):
  File "/local/data/venvs/swav/lib/python3.6/site-packages/progressbar/shortcuts.py", line 10, in progressbar
    for result in progressbar(iterator):
  File "/local/data/venvs/swav/lib/python3.6/site-packages/progressbar/bar.py", line 547, in __next__
    value = next(self._iterable)
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/local/data/venvs/swav/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/local/data/repos/STCN/dataset/generic_test_dataset.py", line 101, in __getitem__
    masks = torch.from_numpy(all_to_onehot(masks, labels)).float()
RuntimeError: [enforce fail at CPUAllocator.cpp:71] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 195696230400 bytes. Error code 12 (Cannot allocate memory)

100% (1 of 1) |########################################################################################################################################| Elapsed Time: 0:00:00 ETA:  00:00:00

The command line is fairly standard:

python eval_generic.py --data_path /local/data/dataset/dummy-test-set --output /local/data/repos/STCN/output-dummy-test-set

The only thing that changes is the number of images and their resolution (960x480 is the maximum).

Is there a way to do inference one batch at a time, without allocating all the memory at the beginning, thereby avoiding all these OOMs?

Thank you!

hkchengrex (Owner) commented:

Can you print self.k?


4ndr3aR commented Jun 25, 2021

Here you are:

Processing video1 ...
InferenceCore.__init__() - t: 500 - h: 480 - w: 960
InferenceCore.__init__() - nh: 480 - nw: 960
N/A% (0 of 1) |                                                                                                                                        | Elapsed Time: 0:00:00 ETA:  --:--:--Traceback (most recent call last):
  File "eval_generic.py", line 107, in <module>
    mem_every=args.mem_every, include_last=args.include_last)
  File "/local/data/repos/STCN/inference_core_yv.py", line 42, in __init__
    self.prob = torch.zeros((self.k+1, t, 1, nh, nw), dtype=torch.float32, device=self.device)
RuntimeError: CUDA out of memory. Tried to allocate 47.21 GiB (GPU 0; 31.75 GiB total capacity; 416.90 MiB already allocated; 29.99 GiB free; 446.00 MiB reserved in total by PyTorch)
InferenceCore.__init__() - self.k: 54

Yep, 960 × 480 × 500 × 4 × 55 is exactly 47.21 GiB.
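A quick sanity check of that arithmetic (tensor shape taken from the traceback above, 4 bytes per float32 element):

```python
# Size of torch.zeros((k + 1, t, 1, nh, nw), dtype=torch.float32)
# with the values printed above: k = 54, t = 500, nh = 480, nw = 960.
k, t, nh, nw = 54, 500, 480, 960
size_bytes = (k + 1) * t * 1 * nh * nw * 4  # 4 bytes per float32
print(f"{size_bytes / 2**30:.2f} GiB")  # 47.21 GiB, matching the OOM message
```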

hkchengrex (Owner) commented:

self.k stores the number of objects -- do you really have 55 objects? Otherwise it seems like your mask files have some problems. We use np.unique to determine the number of objects.
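To illustrate the point with a toy mask (not from the repo): `np.unique` treats every distinct gray value as a separate label, so any interpolation artifact inflates the object count.

```python
import numpy as np

# A clean mask with one real object (label 5) on background 0:
clean = np.array([[0, 0, 5], [0, 5, 5]], dtype=np.uint8)
print(np.unique(clean))    # [0 5] -> one object

# Resizing with interpolation can introduce intermediate gray values,
# each of which is then counted as its own "object":
blurred = np.array([[0, 2, 5], [1, 3, 5]], dtype=np.uint8)
print(np.unique(blurred))  # [0 1 2 3 5] -> four "objects"
```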


4ndr3aR commented Jun 25, 2021

I actually have 5 objects in the mask, and I think I already know the problem: I resized the original masks to lower resolutions, and the anti-aliasing almost certainly produced a lot of intermediate colors, which are seen as new classes that don't really exist.
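The usual fix for this is to resize label masks with nearest-neighbor sampling (e.g. `Image.NEAREST` in PIL or `cv2.INTER_NEAREST` in OpenCV), which can never create values that weren't in the original mask. A minimal NumPy-only sketch of the idea (helper name is ours):

```python
import numpy as np

def resize_mask_nearest(mask: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize: pick source pixels, never blend them."""
    h, w = mask.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source col for each output col
    return mask[rows[:, None], cols]

mask = np.array([[0, 0, 5, 5], [0, 0, 5, 5]], dtype=np.uint8)
small = resize_mask_nearest(mask, 1, 2)
print(np.unique(small))  # only the original labels survive: [0 5]
```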

Thanks for the support & debugging!
