
Questions about cost_transform #7

Closed
Tsunehiko opened this issue Aug 14, 2023 · 6 comments

@Tsunehiko

Thank you for the wonderful work. I have read the Constrained Decision Transformer paper. Could you explain why you apply cost_transform when calculating the cost-to-go (50 - x in CDT's forward pass and 70 - x in the dataset's sample_prob)? Also, if this is mentioned in the paper, I would appreciate it if you could tell me where it is written. Thank you.

@liuzuxin
Owner

Thank you for your question.
In the forward pass, this is a heuristic design and an experimental feature. The intuition is to give smaller constraint-violation budgets larger input values in the input sequence (similar to the reward return).
Since most of our experiments in the CDT paper use a target cost range smaller than 50, we adopted this value to transform the target cost return so that it is positive and increases linearly as the violation budget shrinks.
Another intuition: suppose we want the agent to achieve zero constraint violations, i.e., the target cost return is 0. We thought a large input value (50) might be better than 0 as the input, because a 0 input effectively masks out the attention weights (zero multiplied by any attention value is still 0).
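Concretely, the forward-pass transform can be sketched as below. The constant 50 comes from this discussion; the surrounding code is an illustrative sketch, not the repo's exact implementation:

```python
# Illustrative sketch of the forward-pass heuristic described above.
# The constant 50 is from this discussion; everything else is assumed.
def cost_transform(cost_to_go, bound=50.0):
    # Smaller violation budgets map to larger positive input values,
    # so a strict target cost of 0 becomes 50 rather than a zero token.
    return bound - cost_to_go

assert cost_transform(0.0) == 50.0   # strictest budget -> largest input
assert cost_transform(40.0) == 10.0  # looser budget -> smaller input
```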

In the dataset's sample_prob, the intuition behind cost_transform is similar to reward reweighting in offline RL, i.e., sampling high-reward (in our case, low-cost) trajectories with higher probability. So we use the cost transform to make the sampling probability linearly correlated with the transformed cost returns.
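That reweighting can be sketched as follows. The 70 - x transform is from this discussion; the normalization step is an assumption about how such weights would become sampling probabilities:

```python
# Illustrative sketch: weight each trajectory by its transformed cost
# return (70 - c, from this discussion), then normalize so that
# low-cost trajectories are sampled with higher probability.
def sample_probs(cost_returns, bound=70.0):
    weights = [bound - c for c in cost_returns]
    total = sum(weights)
    return [w / total for w in weights]

probs = sample_probs([10.0, 30.0, 50.0])
assert probs[0] > probs[1] > probs[2]  # lowest cost -> highest probability
assert abs(sum(probs) - 1.0) < 1e-9   # valid probability distribution
```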

But later on, it turned out that this small trick did not play a significant role in the experiments; the data-augmentation trick matters more. Therefore, we did not mention this minor implementation detail in the paper, though it was enabled by default. Please let me know if you have any further questions.

@Tsunehiko
Author

Thank you for explaining! I have grasped the second intuition behind the forward pass and what was attempted with sample_prob. I also understood the details mentioned in the paper. However, I'm still having trouble understanding the first intuition behind the forward pass. When the constraint-violation budget (is this referring to the cost-to-go?) becomes small, even if the value increases due to cost_transform, it seems that the input weight in the input sequence cannot be controlled, since the value passes through the embedding. What exactly is meant by "input weight" in this context? I would greatly appreciate a more detailed explanation of the first intuition.

@liuzuxin
Owner

Sorry for the confusion. Note that the embedding layers for the cost return and reward return are Linear layers (see here), so if the input target cost-to-go c is zero, the weight term Wc = 0 contributes nothing, and only the bias term is passed to the next attention layer. If the bias is initialized to be small or zero, the cost input tokens will likewise leave the corresponding attention weights deactivated.

You would be correct if we used nn.Embedding to process integer cost-return inputs, but we treat the cost return as a continuous variable, since its range depends on the task. We therefore thought that non-zero cost-return inputs could help the Linear embedding layer learn its weights.
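This point about the Linear embedding can be illustrated numerically without PyTorch (the weights below are made up for illustration):

```python
# A Linear embedding of a scalar return c computes W*c + b per output
# dimension, so an input of c = 0 leaves only the bias term.
W = [0.5, -1.0, 0.25]  # illustrative weights, one per embedding dim
b = [0.0, 0.0, 0.0]    # bias initialized to zero

def embed(c):
    return [w * c + bi for w, bi in zip(W, b)]

# A zero target cost return yields an all-zero token embedding, which
# contributes nothing to the downstream attention computation.
assert embed(0.0) == [0.0, 0.0, 0.0]
# A transformed (non-zero) input actually exercises the learned weights.
assert embed(50.0) == [25.0, -50.0, 12.5]
```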

@Tsunehiko
Author

Tsunehiko commented Aug 16, 2023

Thank you for the clear explanation. I now understand that with nn.Linear, an input of 0 (or close to it) makes the weight contribution inactive. (I hadn't fully grasped the difference between nn.Linear and nn.Embedding; I learned a lot from your explanation, thank you🙏)
However, I believe what you just explained is closely related to the second intuition. Is the first intuition, that the transformed value increases linearly as the target cost return decreases, intended to prevent the attention weights from becoming inactive when the target cost return is close to 0? Or is there another significant reason for the linear increase?

@liuzuxin
Owner

You are correct. The intuition is simply to make a target cost return close to 0 less likely to deactivate the network weights. But I want to emphasize that this is just an intuition and a minor trick: if your target cost value is not 0, like 10, 20, or 40 in my experiments, then using this cost_transform in the forward pass or not will not significantly affect the final results.

@Tsunehiko
Author

Thank you for your detailed response! I understand now, so I'll close this issue. I have other questions as well, but since they are unrelated to cost_transform, I'll ask them in a new issue.
