Entropy scheduling #15

Closed
5 of 11 tasks
zxzzz0 opened this issue Jul 25, 2021 · 2 comments
Labels
good first issue (Good for newcomers), P3 (Issue moderate in impact or severity), serial (Serial training related)

Comments

@zxzzz0
zxzzz0 commented Jul 25, 2021

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable: (Not applicable)

In reinforcement learning there is the well-known explore-exploit dilemma. In league training it is crucial to have better entropy coefficient scheduling, for the following reasons:
(1) If the policy's entropy drops to zero too quickly, the policy might get stuck in a local optimum and fail to explore more states.
(2) If the policy's entropy drops too slowly, the policy might fail to select the right action at pivotal moments, and training becomes very slow.

One way to address the above problem is good scheduling. Assuming there is some validation measurement we can use, such as win rate, we decrease the entropy coefficient only when the win rate has plateaued.

This is similar to the learning rate scheduler ReduceLROnPlateau in PyTorch.
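
To make the idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the class name `ReduceEntropyCoefOnPlateau`, its parameters, and the absolute improvement threshold are all hypothetical, and it is not an existing DI-engine or PyTorch API. It multiplies the entropy coefficient by `factor` once the win rate has failed to improve for more than `patience` consecutive evaluations.

```python
class ReduceEntropyCoefOnPlateau:
    """Hypothetical helper (not a library API): decay the entropy coefficient
    when a validation metric such as win rate stops improving."""

    def __init__(self, init_coef=0.01, factor=0.5, patience=3,
                 threshold=0.01, min_coef=0.0):
        if not 0.0 < factor < 1.0:
            raise ValueError("factor must be in (0, 1)")
        self.coef = init_coef
        self.factor = factor          # multiplicative decay applied on a plateau
        self.patience = patience      # evaluations to tolerate without improvement
        self.threshold = threshold    # absolute gain that counts as an improvement
        self.min_coef = min_coef      # lower bound on the coefficient
        self.best = float("-inf")
        self.num_bad_evals = 0

    def step(self, win_rate):
        """Record one evaluation result and return the coefficient to use next."""
        if win_rate > self.best + self.threshold:
            self.best = win_rate
            self.num_bad_evals = 0
        else:
            self.num_bad_evals += 1
        if self.num_bad_evals > self.patience:
            # Win rate is on a plateau: reduce exploration pressure.
            self.coef = max(self.coef * self.factor, self.min_coef)
            self.num_bad_evals = 0
        return self.coef


# Usage: call step() after each evaluation and pass the result to the loss.
scheduler = ReduceEntropyCoefOnPlateau(init_coef=0.01, factor=0.5, patience=2)
for win_rate in [0.30, 0.40, 0.50, 0.50, 0.50, 0.50]:
    entropy_coef = scheduler.step(win_rate)
```

The returned value would then weight the entropy bonus in the policy loss. Higher win rate is treated as better here, whereas ReduceLROnPlateau defaults to minimizing its metric.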

Could you let us know whether there could be some documentation on how entropy scheduling can be supported?

@PaParaZz1 PaParaZz1 added the P1 (Issue that should be fixed within a few weeks), good first issue (Good for newcomers), and serial (Serial training related) labels Jul 26, 2021
@PaParaZz1
Member

This problem can be abstracted into a more general one: how to add a customized scheduling module to the whole training pipeline. We will provide some examples within one week.

@PaParaZz1 PaParaZz1 added the P3 (Issue moderate in impact or severity) label and removed the P1 (Issue that should be fixed within a few weeks) label Aug 2, 2021
@zxzzz0 zxzzz0 changed the title from "Entropy scheduling?" to "Entropy scheduling" Aug 4, 2021
@PaParaZz1
Member

This issue has been resolved in #38.

Will-Nie pushed a commit to Will-Nie/DI-engine that referenced this issue Nov 16, 2021
2 participants