Entropy scheduling #15
Labels: good first issue (Good for newcomers), P3 (Issue moderate in impact or severity), serial (Serial training related)
In reinforcement learning there is the well-known explore-exploit dilemma. In league training it is crucial to have better entropy coefficient scheduling, for the following reasons:
(1) If the entropy of the policy drops to zero too fast, the policy may get stuck in a local optimum and fail to explore more states.
(2) If the entropy of the policy drops too slowly, the policy may fail to select the right action at pivotal moments, and training becomes very slow.
One way to address this problem is a good schedule. Assuming there is a validation metric we can use, such as win rate, we decrease the entropy coefficient only when the win rate is on a plateau.
This is similar to the learning rate scheduler `ReduceLROnPlateau` in PyTorch. Could you let us know whether there is documentation on how entropy scheduling can be supported?
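To make the request concrete, here is a minimal sketch of what such a scheduler could look like, mirroring the interface of PyTorch's `ReduceLROnPlateau`. The class name `EntropyCoefOnPlateau` and all hyperparameter names are hypothetical, not part of any existing API; the metric is assumed to be "higher is better" (e.g. win rate).

```python
class EntropyCoefOnPlateau:
    """Hypothetical plateau-based entropy coefficient scheduler.

    Decays the entropy coefficient by `factor` only after the monitored
    metric (e.g. win rate) has failed to improve for more than `patience`
    consecutive evaluations, analogous to torch.optim.lr_scheduler.ReduceLROnPlateau.
    """

    def __init__(self, init_coef=0.01, factor=0.5, patience=5,
                 threshold=1e-3, min_coef=1e-4):
        self.coef = init_coef          # current entropy coefficient
        self.factor = factor           # multiplicative decay on plateau
        self.patience = patience       # tolerated evaluations without improvement
        self.threshold = threshold     # minimum change to count as improvement
        self.min_coef = min_coef       # floor so exploration never fully vanishes
        self.best = float("-inf")
        self.num_bad_steps = 0

    def step(self, metric):
        """Call once per evaluation with the latest metric; returns the coef."""
        if metric > self.best + self.threshold:
            # Metric improved: reset the plateau counter.
            self.best = metric
            self.num_bad_steps = 0
        else:
            self.num_bad_steps += 1
            if self.num_bad_steps > self.patience:
                # Plateau detected: decay the entropy coefficient.
                self.coef = max(self.coef * self.factor, self.min_coef)
                self.num_bad_steps = 0
        return self.coef
```

A trainer would then call `sched.step(win_rate)` after each evaluation round and use the returned coefficient when forming the loss, e.g. `loss = pg_loss - coef * entropy`.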