About the operation in updating policy net #16

Closed
qingyue2014 opened this issue Aug 20, 2019 · 3 comments
Comments

@qingyue2014

Can you explain the goal of computing the value loss and the action loss when you update the policy net? I don't think the way the net is updated is consistent with the formula in your paper.

Or am I misunderstanding something?

@eric-xw
Owner

eric-xw commented Aug 30, 2019

Hi,

The RL part is a common implementation of policy gradient with a baseline, and the overall implementation is aligned with the formulation in the paper. Can you be more specific about your questions?
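
For reference, here is a rough, generic sketch of what policy gradient with a baseline typically looks like (the function and variable names are illustrative, not the exact code in this repository):

    import torch

    def policy_gradient_loss(log_probs, rewards, baseline, mask):
        # log_probs: (batch, T) log-probabilities of the sampled tokens
        # rewards:   (batch, T) per-step rewards (a sequence-level reward can be broadcast)
        # baseline:  (batch, T) value estimates used to reduce variance
        # mask:      (batch, T) 1 for real tokens, 0 for padding
        advantage = (rewards - baseline).detach()  # no gradient flows through the advantage
        action_loss = -(log_probs * advantage * mask).sum() / mask.sum()
        # the baseline (value head) regresses toward the observed reward
        value_loss = (((baseline - rewards) ** 2) * mask).sum() / mask.sum()
        return action_loss, value_loss

Minimizing action_loss with gradient descent is the same as doing gradient ascent on the expected reward, and value_loss trains the baseline.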

Thanks,

@qingyue2014
Author

qingyue2014 commented Aug 30, 2019

Sorry, my previous statement was not clear. My questions are as follows.

  1. About the loss of the reward net. In your paper, the objective of the reward function is to minimize the expectation of the reward under the empirical distribution minus the expectation under the policy network's distribution, but in your code the sign of the loss (train_AREL.py, line 138) is the opposite:
    loss = -torch.sum(gt_score) + torch.sum(gen_score)
    Why is that?
  2. About the loss of the policy net. The variable opt.rl_weight is used in calculating the loss. What is the meaning of the variables loss and tf_loss in
    loss = opt.rl_weight * loss + (1 - opt.rl_weight) * tf_loss

Looking forward to your reply! Thanks.

@eric-xw
Owner

eric-xw commented Aug 30, 2019

  1. In the paper, we show the objective functions to be maximized (gradient ascent). In practice, we usually minimize the corresponding loss functions with gradient descent instead, but the two are equivalent.
  2. tf_loss is the cross-entropy loss, which helps stabilize the training.
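
As a rough sketch of both points above (illustrative names and dummy values, not the exact lines in train_AREL.py):

    import torch

    # 1) Reward model: the paper states the objective to be maximized
    #    (roughly E_data[r] - E_policy[r]); the code minimizes its negation,
    #    which gives the same update direction.
    gt_score = torch.randn(8)    # reward scores for ground-truth stories (dummy values)
    gen_score = torch.randn(8)   # reward scores for generated stories (dummy values)
    reward_loss = -torch.sum(gt_score) + torch.sum(gen_score)

    # 2) Policy update: blend the policy-gradient (RL) loss with the
    #    cross-entropy loss; rl_weight balances the two terms.
    rl_weight = 0.9                   # placeholder value
    rl_loss = torch.tensor(1.23)      # placeholder policy-gradient loss
    tf_loss = torch.tensor(0.45)      # placeholder cross-entropy loss
    loss = rl_weight * rl_loss + (1 - rl_weight) * tf_loss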

@eric-xw closed this as completed Aug 30, 2019