This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

TensorCircularBuffer with capacity of 1 million or larger fails #87

Open
d3sm0 opened this issue Oct 18, 2022 · 1 comment

d3sm0 commented Oct 18, 2022

A replay buffer with a capacity of 1 million tries to allocate 846.72 GB. Steps to reproduce:

from rlmeta.storage import TensorCircularBuffer
import torch

rb = TensorCircularBuffer(capacity=int(1e6))
rb.append(torch.randn(10, 3, 84, 84))

Log:

RuntimeError: [enforce fail at alloc_cpu.cpp:66] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 846720000000 bytes. Error code 12 (Cannot allocate memory)
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x55 (0x7fd5b71980c5 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::alloc_cpu(unsigned long) + 0x7ac (0x7fd5b71894cc in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x23bc3 (0x7fd5b7176bc3 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #3: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x7bf (0x7fd5e04a5b2f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::detail::empty_cpu(c10::ArrayRef<long>, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) + 0x40 (0x7fd5e04a64a0 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x34 (0x7fd5e04a64f4 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::native::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1f (0x7fd5e09b826f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x24f700b (0x7fd5e122a00b in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0xe3 (0x7fd5e0f75653 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x24d200f (0x7fd5e120500f in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::empty_memory_format::call(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1b7 (0x7fd5e0fb3077 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x4586c (0x7fd5b5ba886c in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0x49700 (0x7fd5b5bac700 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #13: <unknown function> + 0x4a0c0 (0x7fd5b5bad0c0 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #14: <unknown function> + 0x1dd0f (0x7fd5b5b80d0f in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
<omitting python frames>
frame #30: <unknown function> + 0x3feb0 (0x7fd65cacbeb0 in /lib64/libc.so.6)
frame #31: __libc_start_main + 0x80 (0x7fd65cacbf60 in /lib64/libc.so.6)

xiaomengy (Contributor) commented

Hi, thanks for this issue. In the current implementation, we have to pre-allocate the full capacity up front. Here the total size of 1e6 float tensors is 10 * 3 * 84 * 84 * 4 * 1e6 bytes ~= 846.72 GB. We plan to add some kind of LazyTensor, or something like NumPy's memmap, to handle very large buffers in the future.
For the current case, I'd suggest using the original 8-bit integer input for Atari (uint8 frames) instead of a float tensor, which cuts the size to 1/4.
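
To make the arithmetic above concrete, here is a minimal pure-Python sketch of the pre-allocation estimate. The helper names `buffer_bytes` and `max_capacity` are hypothetical and are not part of rlmeta; the sketch assumes the buffer reserves `capacity` slots, each holding one appended item of the shape used in the report.

```python
import math

# Hypothetical helpers for illustration only; they are not part of rlmeta.

def buffer_bytes(capacity, item_shape, bytes_per_element):
    """Bytes needed to pre-allocate `capacity` slots of items with `item_shape`."""
    return capacity * math.prod(item_shape) * bytes_per_element


def max_capacity(budget_bytes, item_shape, bytes_per_element):
    """Largest capacity whose pre-allocation fits within `budget_bytes`."""
    return budget_bytes // (math.prod(item_shape) * bytes_per_element)


CAPACITY = 1_000_000
ITEM_SHAPE = (10, 3, 84, 84)  # shape of the tensor appended in the report

float32_bytes = buffer_bytes(CAPACITY, ITEM_SHAPE, 4)  # float32: 4 bytes per element
uint8_bytes = buffer_bytes(CAPACITY, ITEM_SHAPE, 1)    # uint8: 1 byte per element

print(f"float32: {float32_bytes / 1e9:.2f} GB")  # float32: 846.72 GB
print(f"uint8:   {uint8_bytes / 1e9:.2f} GB")    # uint8:   211.68 GB
```

Under these assumptions, switching to 8-bit storage still requires about 211.68 GB at a capacity of 1e6, so a smaller capacity may also be needed on typical machines. Calling `max_capacity` with the available RAM in bytes gives a rough upper bound for the `capacity` argument.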
