
Questions about cost_transform #7

Closed
Tsunehiko opened this issue Aug 14, 2023 · 6 comments

@Tsunehiko

Thank you for the wonderful work. I have read the Constrained Decision Transformer paper. Could you explain why you apply cost_transform when calculating the cost-to-go (50 - x in CDT's forward pass and 70 - x in the dataset's sample_prob)? Also, if this is mentioned in the paper, I would appreciate it if you could tell me where it is written. Thank you.

@liuzuxin
Owner

Thank you for your question.
In the forward pass, this is a heuristic design and an experimental feature. The intuition is to give smaller constraint-violation budgets larger input values in the input sequence (similar to the reward return).
Since most of our experiments in the CDT paper use a target cost range smaller than 50, we adopted this value to transform the target cost return so that it is positive and increases linearly as the violation budget shrinks.
Another intuition: suppose we want the agent to achieve zero constraint violations, i.e., the target cost return is 0. We thought a large input value (50) might be better than 0 as the input, because a 0 input effectively masks out the attention weights (zero multiplied by any attention value is still 0).
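Concretely, the forward-pass transform can be sketched as below. The constant 50 comes from this discussion; the surrounding code is an illustrative sketch, not the repo's exact implementation:

```python
# Illustrative sketch of the forward-pass heuristic described above.
# The constant 50 is from this discussion; everything else is assumed.
def cost_transform(cost_to_go, bound=50.0):
    # Smaller violation budgets map to larger positive input values,
    # so a strict target cost of 0 becomes 50 rather than a zero token.
    return bound - cost_to_go

assert cost_transform(0.0) == 50.0   # strictest budget -> largest input
assert cost_transform(40.0) == 10.0  # looser budget -> smaller input
```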

In the dataset's sample_prob, the intuition behind cost_transform is similar to reward reweighting in offline RL, i.e., sampling high-reward (in our case, low-cost) trajectories with higher probability. So we use the cost transform to make the sampling probability linearly correlated with the transformed cost returns.
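That reweighting can be sketched as follows. The 70 - x transform is from this discussion; the normalization step is an assumption about how such weights would become sampling probabilities:

```python
# Illustrative sketch: weight each trajectory by its transformed cost
# return (70 - c, from this discussion), then normalize so that
# low-cost trajectories are sampled with higher probability.
def sample_probs(cost_returns, bound=70.0):
    weights = [bound - c for c in cost_returns]
    total = sum(weights)
    return [w / total for w in weights]

probs = sample_probs([10.0, 30.0, 50.0])
assert probs[0] > probs[1] > probs[2]  # lowest cost -> highest probability
assert abs(sum(probs) - 1.0) < 1e-9   # valid probability distribution
```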

But later on, it turned out that this small trick did not play a significant role in the experiments; the data-augmentation trick matters more. Therefore, we did not mention this minor implementation detail in the paper, though it was enabled by default. Please let me know if you have any further questions.

@Tsunehiko
Author

Thank you for explaining! I have grasped the second intuition behind the forward pass and what was attempted with sample_prob. I also understood the details mentioned in the paper. However, I'm still having trouble understanding the first intuition behind the forward pass. When the constraint-violation budget (is this referring to the cost-to-go?) becomes small, even if the value increases due to cost_transform, it seems that the input weight in the input sequence cannot be controlled, since the value passes through the embedding. What exactly is meant by "input weight" in this context? I would greatly appreciate a more detailed explanation of the first intuition.

@liuzuxin
Owner

Sorry for the confusion. Note that the embedding layers for the cost return and reward return are Linear layers (see here), so if the input target cost-to-go c is zero, the weight term Wc = 0 contributes nothing, and only the bias term is passed to the next attention layer. If the bias is initialized to be small or zero, the cost input tokens will likewise leave the corresponding attention weights deactivated.

You would be correct if we used nn.Embedding to process integer cost-return inputs, but we treat the cost return as a continuous variable, since its range depends on the task. We therefore thought that non-zero cost-return inputs could help the Linear embedding layer learn its weights.
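This point about the Linear embedding can be illustrated numerically without PyTorch (the weights below are made up for illustration):

```python
# A Linear embedding of a scalar return c computes W*c + b per output
# dimension, so an input of c = 0 leaves only the bias term.
W = [0.5, -1.0, 0.25]  # illustrative weights, one per embedding dim
b = [0.0, 0.0, 0.0]    # bias initialized to zero

def embed(c):
    return [w * c + bi for w, bi in zip(W, b)]

# A zero target cost return yields an all-zero token embedding, which
# contributes nothing to the downstream attention computation.
assert embed(0.0) == [0.0, 0.0, 0.0]
# A transformed (non-zero) input actually exercises the learned weights.
assert embed(50.0) == [25.0, -50.0, 12.5]
```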

@Tsunehiko
Author

Tsunehiko commented Aug 16, 2023

Thank you for the clear explanation. I now understand that with nn.Linear, an input of 0 (or close to it) makes the weight contribution inactive. (I hadn't fully grasped the difference between nn.Linear and nn.Embedding; I learned a lot from your explanation, thank you🙏)
However, I believe what you just explained is closely related to the second intuition. Is the first intuition, that the transformed value increases linearly as the target cost return decreases, intended to prevent the attention weights from becoming inactive when the target cost return is close to 0? Or is there another significant reason for the linear increase?

@liuzuxin
Owner

You are correct. The intuition is simply to make a target cost return close to 0 less likely to deactivate the network weights. But I want to emphasize that this is just an intuition and a minor trick: if your target cost value is not 0, like 10, 20, or 40 in my experiments, then using this cost_transform in the forward pass or not will not significantly affect the final results.

@Tsunehiko
Author

Thank you for your detailed response! I understand now, so I'll close this issue. I have other questions as well, but since they are unrelated to cost_transform, I'll ask them in a new issue.
