Temperature parameter is not handled properly #5

Closed

haarnoja opened this issue Nov 8, 2017 · 5 comments
haarnoja (Owner) commented Nov 8, 2017

The temperature parameter (alpha) is missing from the TD updates. For example,

v_next = tf.squeeze(tf.reduce_logsumexp(q_next, axis=1)) # N

should have alpha in it, as in Eq. (10) in the paper; the code is correct only if alpha = 1.

As a quick fix to change the temperature, you can set scale_reward = 1 / temperature and alpha = 1, which has an equivalent effect, as discussed on page 2 of the paper.
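For reference, a minimal sketch of what the corrected line could look like with an explicit temperature (alpha is a hypothetical scalar here; tensor names follow the snippet above); with alpha = 1 it reduces to the line as written:

alpha = 0.1  # hypothetical temperature value
# Soft value per Eq. (10): V = alpha * logsumexp(Q / alpha) over actions
v_next = tf.squeeze(alpha * tf.reduce_logsumexp(q_next / alpha, axis=1))  # N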

haarnoja (Owner, Author) commented:

The latest refactor removes the temperature coefficient (alpha). To adjust the temperature, you can change reward_scale instead.
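A toy numerical check of that equivalence (illustrative NumPy, not code from this repo): with temperature alpha the soft value is alpha * logsumexp(Q / alpha) and the policy is softmax(Q / alpha); under rewards scaled by 1 / alpha and temperature 1, the corresponding Q-function is the original Q divided by alpha, so the induced policy is identical and the value is only rescaled.

import numpy as np

alpha = 0.2                      # hypothetical temperature
q = np.array([1.0, 2.0, 3.0])    # toy Q-values for three actions

def boltzmann(x):
    # Softmax policy over a 1-D array of (temperature-scaled) Q-values.
    z = np.exp(x - x.max())
    return z / z.sum()

v_temp = alpha * np.log(np.sum(np.exp(q / alpha)))   # soft value with explicit temperature
q_scaled = q / alpha                                 # Q-function under rewards scaled by 1/alpha
v_scaled = np.log(np.sum(np.exp(q_scaled)))          # soft value with temperature fixed at 1

print(np.allclose(v_temp, alpha * v_scaled))                   # True: values differ only by the factor alpha
print(np.allclose(boltzmann(q / alpha), boltzmann(q_scaled)))  # True: identical Boltzmann policy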

immars commented Mar 21, 2018

Sorry to re-raise this issue.
I have implemented a version of soft Q-learning and find it convenient to have a separate alpha. I think alpha effectively acts as the entropy coefficient in policy gradient methods, which can be annealed without affecting the value iteration of the critic.

haarnoja (Owner, Author) commented:

In my experience, annealing alpha to zero was a little problematic because of how it enters the value function (V = alpha * log sum exp(Q / alpha)): a naive implementation of this obviously fails as alpha -> 0 (see the sketch below). How did you fix this? If you'd like to share your code, I'd be happy to merge your PR :).
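To illustrate (a toy NumPy sketch, not code from this repo): the naive transcription of the value overflows once alpha is small relative to max(Q), while the usual max-subtraction rewrite stays finite but simply collapses to a hard max, so annealing all the way to zero still degenerates.

import numpy as np

def soft_value_naive(q, alpha):
    # Direct transcription of V = alpha * log sum exp(Q / alpha);
    # exp(Q / alpha) overflows once alpha is small relative to max(Q).
    return alpha * np.log(np.sum(np.exp(np.asarray(q) / alpha)))

def soft_value_stable(q, alpha):
    # Max-subtraction rewrite: V = max(Q) + alpha * log sum exp((Q - max(Q)) / alpha).
    # Stays finite, but tends to max(Q) as alpha -> 0 (the entropy bonus vanishes).
    q = np.asarray(q, dtype=np.float64)
    q_max = q.max()
    return q_max + alpha * np.log(np.sum(np.exp((q - q_max) / alpha)))

q = [1.0, 2.0, 3.0]
print(soft_value_naive(q, 1.0))    # ~3.41
print(soft_value_naive(q, 1e-3))   # inf (overflow warning): the naive form blows up
print(soft_value_stable(q, 1e-3))  # ~3.0: finite, but effectively a hard max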

immars commented Apr 2, 2018

That's right; in my experiments, if alpha is annealed below a threshold (around 0.08 or so), training becomes numerically unstable. But it can take a much larger value at the beginning of training to encourage exploration.

The code is now in a private branch of hobotrl. The code structure is quite different from this repo, I think, and would be hard to merge, so I created a gist with the relevant code pieces.
alpha_exploration can be an object of a subclass of Python float whose value changes each time it is evaluated as a float (see the sketch below).
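Roughly the idea, as a hypothetical sketch (the class name and the linear schedule are made up here; this is not the actual hobotrl code):

class AnnealedAlpha(float):
    # Float-like temperature whose float(...) value follows an anneal schedule.
    # Hypothetical sketch, not the hobotrl implementation.
    def __new__(cls, start, end, decay_steps):
        obj = super().__new__(cls, start)       # the static float value is the starting alpha
        obj.start, obj.end, obj.decay_steps = start, end, decay_steps
        obj.step = 0                            # advanced by the training loop
        return obj

    def __float__(self):
        # Linear anneal from start to end; picked up whenever code calls float(alpha).
        frac = min(self.step / float(self.decay_steps), 1.0)
        return self.start + frac * (self.end - self.start)

alpha = AnnealedAlpha(start=1.0, end=0.1, decay_steps=100000)
alpha.step = 50000
print(float(alpha))    # 0.55, halfway through the anneal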

immars commented Apr 2, 2018

I've pushed to hobotrl for your reference.
