Questions about cost_transform #7
Thank you for your question. In the dataset's sample_prob, the intuition behind using cost_transform is similar to reward reweighting in offline RL, i.e., sampling high-reward (in our case, low-cost) trajectories with high probability. So we use a cost transform to make the sampling probability linearly correlated with the transformed cost returns. Later on, however, it turned out that this small trick did not play a significant role in the experiments; the data augmentation trick is more important. Therefore, we did not mention this minor implementation detail in the paper, though it was enabled by default. Please let me know if you have any further questions.
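A minimal sketch of this kind of cost-reweighted trajectory sampling (the function name, the linear weighting, and the 70 - x constant below are illustrative assumptions, not necessarily the repository's exact implementation):

```python
import numpy as np

def trajectory_sample_prob(cost_returns, cost_transform=lambda x: 70.0 - x):
    """Sampling probability linearly correlated with the transformed cost return,
    so low-cost trajectories are drawn more often (akin to reward reweighting)."""
    weights = np.asarray([cost_transform(c) for c in cost_returns], dtype=np.float64)
    weights = np.clip(weights, 0.0, None)   # guard against negative weights
    return weights / weights.sum()

# A trajectory with cost return 5 gets a much higher probability than one with 65.
probs = trajectory_sample_prob([5.0, 30.0, 65.0])
idx = np.random.choice(len(probs), p=probs)
```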
Thank you for explaining! I have grasped the second intuition behind the forward path and what was attempted with the sample prob. I also understood the details mentioned in the paper. However, I'm still having trouble understanding the initial intuition behind the forward path. When the constraint violation budget (is this referring to the costs-to-go?) becomes small, even if the value increases due to the transform…
Sorry for the confusion. Note that the embedding layers for the cost return and reward return are Linear layers (see here), so if the input target cost-to-go is 0 or close to 0, the corresponding weights are essentially deactivated. You would be correct if we used nn.Embedding to process integer cost return inputs, but we regard the cost return as a continuous variable since its scale depends on the task. Therefore, we thought inputs with a non-zero cost return value could help learn the Linear embedding layer's weights.
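Here is a minimal PyTorch sketch of that point (illustrative only, not the repository's actual module): with a Linear embedding of a scalar cost return, a zero input produces only the bias and sends no gradient to the weight, so the weight is effectively deactivated for that sample.

```python
import torch
import torch.nn as nn

embed_cost = nn.Linear(1, 8)  # continuous cost return -> hidden vector

# Target cost-to-go of exactly 0: output is just the bias, weight gradient is 0.
zero_ctg = torch.zeros(1, 1)
embed_cost(zero_ctg).sum().backward()
print(embed_cost.weight.grad.abs().sum())   # tensor(0.)

embed_cost.zero_grad()

# Transformed target, e.g. 50 - 0 = 50: the weight now receives a real gradient.
transformed_ctg = torch.full((1, 1), 50.0)
embed_cost(transformed_ctg).sum().backward()
print(embed_cost.weight.grad.abs().sum())   # > 0
```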
Thank you for the clear explanation. I now understand that with nn.Linear, if we input 0 or a value close to 0, the input weight becomes inactive. (I wasn't fully grasping the difference between nn.Linear and nn.Embedding. I learned a lot from your explanation, thank you🙏)
You are correct. The intuition is simply to make a target cost return that is close to 0 less likely to deactivate the network weights. But I want to emphasize that this is just an intuition and a minor trick. If your target cost value is not 0, like the 10, 20, 40 in my experiments, then using this transform or not should not make much difference.
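Just to make that concrete, here is a hypothetical illustration of the 50 - x transform discussed above (not a prescription for your own targets):

```python
# A target cost of 0 would feed an exact zero into the Linear embedding;
# transforming it keeps the embedding input clearly non-zero.
cost_transform = lambda x: 50.0 - x

for target_cost in (0, 10, 20, 40):
    print(f"raw target {target_cost:>2} -> embedding input {cost_transform(target_cost)}")
# raw target  0 -> embedding input 50.0
# raw target 10 -> embedding input 40.0
# raw target 20 -> embedding input 30.0
# raw target 40 -> embedding input 10.0
```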
Thank you for your detailed response! I understand now, so I'll close this issue. I have other questions as well, but since they are not about cost_transform, I'll ask them in a new issue.
Thank you for the wonderful work. I have read the paper on the Constrained Decision Transformer. Could you explain why you apply cost_transform when calculating a cost-to-go (50 - x in CDT's forward and 70 - x in the dataset's sample_prob)? Also, if this is mentioned in the paper, I would appreciate it if you could tell me where it is written. Thank you.