question about sampling #11

Robotuks · 2018-08-02T13:46:41Z

Hey,

I wanted to ask about the calculation in the sample function:

return tf.argmax(tf.log(u) / probs, axis=1)

It divides by probs. Does that mean that lower probabilities have a better chance of getting picked? Better exploration?

Robotuks · 2018-08-02T14:06:34Z

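log(u) makes it negative (u is between 0 and 1), so dividing by a larger prob gives a value closer to zero, and argmax favors higher probabilities. So everything is fine. But probs can be equal to 0, no?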
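
For reference, a short sketch of why this works, assuming the u values are independent Uniform(0, 1) draws and every p_i is strictly positive (the thread does not show how u is generated, so that part is an assumption). The symbol E_i is introduced here only for illustration:

```latex
u_i \sim \mathrm{Uniform}(0,1)
\;\Longrightarrow\;
E_i = -\frac{\log u_i}{p_i} \sim \mathrm{Exponential}(\text{rate } p_i)

\arg\max_i \frac{\log u_i}{p_i} = \arg\min_i E_i ,
\qquad
\Pr\Bigl[\arg\min_j E_j = i\Bigr] = \frac{p_i}{\sum_j p_j}
```

So index i should be picked with probability p_i when the probs sum to 1, not with the inverse. The derivation breaks down at p_i = 0, which is the division-by-zero case raised above.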

inoryy · 2018-08-02T15:36:05Z

@Robotuks I think I ensure somewhere that probs are never actually 0 (it's kind of a hack, but hey, it works)
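
A minimal pure-Python sketch of the same trick for a single row of probabilities, to check the behavior empirically. This is not the repository's code: the `eps` clamp is a hypothetical stand-in for whatever guard keeps probs away from 0, and `empirical_frequencies` is a helper made up for this illustration.

```python
import math
import random
from collections import Counter


def sample(probs, rng, eps=1e-12):
    """Return one index i, drawn with probability close to probs[i].

    Mirrors argmax(log(u) / probs) for one row. `eps` is an assumed
    guard against division by zero, not taken from the original code.
    """
    scores = []
    for p in probs:
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite and <= 0
        scores.append(math.log(u) / max(p, eps))
    # The largest score is the one closest to zero.
    return max(range(len(probs)), key=scores.__getitem__)


def empirical_frequencies(probs, n=20000, seed=0):
    """Sample n times and return the observed frequency of each index."""
    rng = random.Random(seed)
    counts = Counter(sample(probs, rng) for _ in range(n))
    return [counts[i] / n for i in range(len(probs))]


if __name__ == "__main__":
    print(empirical_frequencies([0.7, 0.2, 0.1]))
    print(empirical_frequencies([0.5, 0.5, 0.0]))
```

With the clamp, an entry whose probability is 0 gets a hugely negative score and is effectively never selected, while the other entries are sampled roughly in proportion to their probabilities.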

Robotuks closed this as completed Aug 2, 2018
