A small question about _get_individual_q in qtran_base.py #108

Closed
Johnson221b opened this issue Mar 14, 2024 · 2 comments

Comments

@Johnson221b

Hello, I noticed that in the _get_individual_q function of qtran_base.py you first do the following:

if transition_idx == 0:
    _, self.target_hidden = self.target_rnn(inputs, self.eval_hidden)

May I ask why this is done, and what error would occur without it?

Also, while combining your code with my own work, I ran into this error:

Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 9420]], which is output 0 of AsStridedBackward0, is at version 40; expected version 39 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Do you have any idea what might be causing this?

Sorry to bother you, and thank you very much for reading and replying!

@starry-sky6688
Owner

  1. Because the target_rnn also needs the history of observations as input, and target_hidden is what remembers it. In RL, the target here is always computed for the *next* obs, so for the first transition of each episode we feed the first obs into target_rnn once so that it gets recorded; otherwise the target network's memory would start from the second obs.

  2. You will have to debug that one yourself. It looks like you modified a tensor in place somewhere; the usual fix is to write the modified result into a new variable instead of overwriting the original tensor.
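Point 1 can be sketched with a toy recurrent network. This is a minimal illustration, not the repository's actual code: the GRUCell, the shapes, and the episode of three observations are all made up for the example; only the warm-up idea (feed obs_0 to the target RNN once, then step it on each *next* obs) comes from the answer above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.GRUCell(4, 8)                        # stands in for target_rnn (hypothetical sizes)
obs = [torch.randn(1, 4) for _ in range(3)]   # obs_0, obs_1, obs_2 of one episode

hidden = torch.zeros(1, 8)                    # stands in for target_hidden
# The target value at step t is computed from the NEXT observation obs_{t+1}.
# If the target RNN only ever saw obs_{t+1}, its hidden state would never
# contain obs_0, so at transition 0 we feed obs_0 once purely to initialize
# the memory (this is what the transition_idx == 0 branch does):
hidden = rnn(obs[0], hidden)                  # warm-up: record obs_0
for t in range(len(obs) - 1):
    hidden = rnn(obs[t + 1], hidden)          # target steps on the next obs
```

After the loop, `hidden` summarizes the whole history obs_0..obs_2 rather than only obs_1..obs_2.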
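Point 2 can be reproduced in a few lines. This is a generic illustration of the error, not the asker's actual code: `sigmoid` is chosen only because its backward pass reuses its output, so modifying that output in place bumps the tensor's version counter and breaks `backward()`, exactly like the "is at version 40; expected version 39" message above. The fix shown is the one the answer suggests, writing the result to a new tensor.

```python
import torch

# Broken: in-place modification of a tensor needed for gradient computation
x = torch.ones(3, requires_grad=True)
y = torch.sigmoid(x)     # sigmoid's backward reuses its output y
y.add_(1.0)              # in-place edit bumps y's version counter
try:
    y.sum().backward()   # RuntimeError: modified by an inplace operation
except RuntimeError as e:
    print("backward failed:", e)

# Fixed: put the modified result in a new variable instead
x2 = torch.ones(3, requires_grad=True)
y2 = torch.sigmoid(x2)
z = y2 + 1.0             # out-of-place: y2 is left untouched
z.sum().backward()       # succeeds; x2.grad is now populated
```

As the hint in the error message says, `torch.autograd.set_detect_anomaly(True)` will point at the offending in-place operation in your own code.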

@Johnson221b
Author

Thank you very much!!!
