Hi @pathak22,

first of all, thanks for releasing the code!

I have been taking a look at it and I have a question concerning the few lines of code where you wrote # TODO: historical accident .... Why are you multiplying the loss by 20 and 288 in these lines?

grads = tf.gradients(self.loss * 20.0, pi.var_list)
self.forwardloss = self.forwardloss * 288.0

I understand this is related to the batch size (or rollout steps) and to the number of features representing a state, but I cannot really see the point of multiplying in this way... could you please give me a hint?

Thanks,
Yes. So, these values are nothing special. The universe starter agent originally used tf.reduce_sum instead of tf.reduce_mean to compute the loss, and the hyper-parameters in my A3C+ICM code were tuned to that setup. But summing across the batch and channel dimensions is bad practice, because as soon as one changes the environment, batch size or network architecture, the other hyper-parameters stop making sense. Hence, I replaced tf.reduce_sum with tf.reduce_mean and factored the constants (e.g. 288, 20, etc.) out explicitly. This makes the code generalizable across different network architectures, input sizes and environments.
Moreover, to help users understand the code better, I deliberately added the comment # TODO: historical accident ... wherever the constants were factored out. Hope this answers your question.
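As a minimal sketch of what "factoring the constant out" means, assuming (as the question above suggests) that 20 corresponds to the rollout length and 288 to the dimensionality of the state features; the shapes and variable names here are illustrative, not read from the repository:

import tensorflow as tf

# Illustrative (assumed) shapes: 20 rollout steps, 288-dim feature embedding.
pred_error = tf.random.normal([20, 288])  # per-step forward-model prediction error

# Original universe-starter-agent style: sum over every element.
loss_sum = tf.reduce_sum(tf.square(pred_error))

# Refactored style: mean over every element, with the constants multiplied back in
# so the loss scale, and hence the previously tuned hyper-parameters, stay unchanged.
loss_mean_scaled = tf.reduce_mean(tf.square(pred_error)) * 20.0 * 288.0

# The two are numerically identical here (up to floating-point error); the mean-based
# form has a scale independent of batch and feature size, and the historical tuning
# survives only through the explicit constants.

In other words, the constants are just the old reduce_sum normalization made visible, rather than anything meaningful being multiplied into the loss.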