I know the DeepMind paper uses a replace_rate of 0.55. In Go under that ruleset there is no draw result, so 0.55 is reasonable. In Reversi, however, draws do happen, so is 0.55 too high for the replace rate?
With 0.55, the next generation has to beat the best model in a clear majority of games, and a draw doesn't count. That seems difficult. The best model is then replaced less often, which means the self-play policy improves more slowly, and so does the training data.
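To put rough numbers on this, here is a small back-of-the-envelope sketch (just an illustration, not this repo's evaluation code): if draws count toward the game total but not as wins, the win rate an evenly matched challenger can expect drops with the draw rate, so the effective gap to the 0.55 bar widens.

```python
# Illustration: how draws raise the effective bar when they count
# toward the game total but not as wins.
REPLACE_RATE = 0.55

for draw_rate in (0.0, 1 / 8, 1 / 4):
    # An evenly matched challenger wins half of the decisive games,
    # so its expected win rate over all games is (1 - draw_rate) / 2.
    even_match_win_rate = (1 - draw_rate) / 2
    print(f"draw rate {draw_rate:.3f}: expected win rate {even_match_win_rate:.3f}, "
          f"gap to threshold {REPLACE_RATE - even_match_win_rate:.3f}")
```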
Or another question: in your practice, how often does a game end in a draw during evaluation? In my local run it happens at a rate of about 1/8. I am still at an early stage of training, and I also rewrote the self-play part, so I don't know whether this 1/8 rate is reasonable. Just curious what draw rate you got.
Thanks.
As you said, it is difficult to decide a good replace_rate.
> you just drop out the draw games when evaluating
Yes, but I don't know whether it is a good choice.
In my evaluation configuration, the number of games for evaluation is 200; the paper uses 400.
A few false positives are still likely to occur.
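To get a feel for that, here is a rough sketch of the false-positive probability, i.e. the chance that a challenger of exactly equal strength clears the 0.55 bar by luck. It assumes draws are excluded and each decisive game is a fair coin flip, and uses scipy only for the binomial tail.

```python
# Rough false-positive estimate: probability that an equally strong
# challenger still reaches a 0.55 win rate over n evaluation games.
# Assumes draws are excluded and each decisive game is won with p = 0.5.
import math
from scipy.stats import binom

def false_positive_prob(n_games: int, replace_rate: float = 0.55) -> float:
    wins_needed = math.ceil(replace_rate * n_games)
    # P(wins >= wins_needed) for a Binomial(n_games, 0.5) variable.
    return binom.sf(wins_needed - 1, n_games, 0.5)

print(false_positive_prob(200))  # ~0.09 with 200 games
print(false_positive_prob(400))  # ~0.03 with 400 games, as in the paper
```

So moving from 200 to 400 games roughly cuts the chance of a lucky replacement from about 9% to about 3%.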
> Yes, but I don't know whether it is a good choice.
I think it depends on whether a draw is a common outcome between two players of the same level. If it is rarely seen in practice, then it is a good choice (although I would just ignore those draw games rather than count them toward the 200); if draws are common, then it is not. But that is basically a property of the game, not of the training program. I am not good at Reversi, so I can't tell.
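For what it's worth, "ignoring the draw games" here would just mean dropping draws from the denominator instead of letting them count against the challenger. A minimal sketch of the two options (illustrative only, not this repo's actual evaluation code):

```python
# Two ways to score an evaluation run that contains draws.
REPLACE_RATE = 0.55

def win_rate_counting_draws(wins: int, losses: int, draws: int) -> float:
    # Draws stay in the denominator, so they count against the challenger.
    return wins / (wins + losses + draws)

def win_rate_ignoring_draws(wins: int, losses: int, draws: int) -> float:
    # Draws are dropped; only decisive games decide replacement.
    decisive = wins + losses
    return wins / decisive if decisive else 0.0

# Example: 200 evaluation games with the ~1/8 draw rate mentioned above.
wins, losses, draws = 100, 75, 25
print(win_rate_counting_draws(wins, losses, draws) >= REPLACE_RATE)  # 0.500 -> False
print(win_rate_ignoring_draws(wins, losses, draws) >= REPLACE_RATE)  # ~0.571 -> True
```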
reversi-alpha-zero/src/reversi_zero/configs/normal.py, line 4 in 61922cc