This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Replay buffer crashes after being cleared #85

Closed
d3sm0 opened this issue Oct 5, 2022 · 4 comments

d3sm0 commented Oct 5, 2022

Minimal example:

import torch
from _rlmeta_extension import UniformSampler
from rlmeta.core.replay_buffer import ReplayBuffer
from rlmeta.storage import TensorCircularBuffer

# Buffer with capacity 12 and uniform sampling.
replay_buffer = ReplayBuffer(TensorCircularBuffer(12), UniformSampler())

while True:
    # Split a (12, 2) tensor into 12 chunks, each of shape (1, 2).
    for t in torch.randn(size=(12, 2)).chunk(12, dim=0):
        replay_buffer.append(t)
        replay_buffer.sample(12)
    # After this call, the next pass through the loop crashes.
    replay_buffer.clear()

Stack trace:

RuntimeError: output with shape [2] doesn't match the broadcast shape [1, 2]
Exception raised from mark_resize_outputs at ../aten/src/ATen/TensorIterator.cpp:1181 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7fd72c9a220e in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7fd72c97d5e8 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #2: at::TensorIteratorBase::mark_resize_outputs(at::TensorIteratorConfig const&) + 0x241 (0x7fd755cf6301 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x64 (0x7fd755cf6e54 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x19d4f8c (0x7fd755f11f8c in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::native::copy_(at::Tensor&, at::Tensor const&, bool) + 0x62 (0x7fd755f12ec2 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::_ops::copy_::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) + 0x75 (0x7fd756886555 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x46e94f5 (0x7fd758c264f5 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::copy_::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) + 0x75 (0x7fd756886555 in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x46ea6ad (0x7fd758c276ad in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) + 0x16e (0x7fd7568cdbce in /home/d3sm0/.venvs/torch_env/lib64/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x495df (0x7fd7024265df in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0x4a0c0 (0x7fd7024270c0 in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
frame #13: <unknown function> + 0x1dd0f (0x7fd7023fad0f in /home/d3sm0/code/research/rlmeta/_rlmeta_extension.cpython-310-x86_64-linux-gnu.so)
<omitting python frames>
frame #31: <unknown function> + 0x3feb0 (0x7fd7aa936eb0 in /lib64/libc.so.6)
frame #32: __libc_start_main + 0x80 (0x7fd7aa936f60 in /lib64/libc.so.6)
frame #33: _start + 0x25 (0x5649803a1095 in /home/d3sm0/.venvs/torch_env/bin/python)

xiaomengy self-assigned this Oct 5, 2022

@xiaomengy
Contributor

Thanks for this issue. It looks like a TensorCircularBuffer schema reset issue. I will prepare a PR to fix it today.

@xiaomengy
Contributor

Hi, sorry for the delay; I had some deadlines last week. I created #86 to fix this issue and to introduce a reset method for CircularBuffers. Both clear and reset empty the buffer, but reset also resets internal state such as initialized_ and schema_. So clear is more efficient when the data schema never changes.
Please try that PR and confirm that it resolves all of the issues. Thanks for reporting this.
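
To illustrate the clear/reset distinction described above, here is a minimal pure-Python sketch. It is a toy model, not rlmeta's implementation: the class `ToyCircularBuffer` and its attributes `initialized` and `schema` are hypothetical stand-ins for the `initialized_` and `schema_` state mentioned in the comment, and the "schema" here is assumed to be just the length of each item.

```python
class ToyCircularBuffer:
    """Toy stand-in for a circular buffer (NOT rlmeta's TensorCircularBuffer)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.size = 0
        self.cursor = 0
        self.initialized = False  # hypothetical analogue of initialized_
        self.schema = None        # hypothetical analogue of schema_

    def append(self, item):
        shape = (len(item),)
        if not self.initialized:
            # The first append fixes the schema for all later items.
            self.schema = shape
            self.initialized = True
        elif shape != self.schema:
            raise ValueError(
                f"item shape {shape} does not match schema {self.schema}")
        self.data[self.cursor] = list(item)
        self.cursor = (self.cursor + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def clear(self):
        # Cheap: forget the contents but keep the storage and the schema.
        self.size = 0
        self.cursor = 0

    def reset(self):
        # Also drop the schema, so the next append may use a new layout.
        self.clear()
        self.data = [None] * self.capacity
        self.initialized = False
        self.schema = None


buf = ToyCircularBuffer(4)
buf.append([1.0, 2.0])
buf.clear()
buf.append([3.0, 4.0])       # same schema: accepted after clear()
buf.reset()
buf.append([1.0, 2.0, 3.0])  # new schema: accepted only after reset()
```

In this sketch, clear only rewinds the counters, which is why it is the cheaper call when every item has the same layout; reset is the one to use when the layout of the stored items may change.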

@d3sm0
Author

d3sm0 commented Oct 18, 2022

Thank you, I'll have a look!

@d3sm0
Author

d3sm0 commented Oct 19, 2022

Closing this issue, looks neat!

d3sm0 closed this as completed Oct 19, 2022