Increasing the lower limit condition of SPRT #1734
Comments
We already have such a condition: a lower limit of 100 games in addition to SPRT. This bar could be raised, but we have not had a promotion below 50% since it was introduced, so I don't see this as a serious problem.
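Here is a minimal sketch of the rule as described in this comment. It uses Wald's SPRT on wins and losses and accepts a pass only after a minimum number of games. Several details are assumptions for illustration and do not come from the server code: the Elo bounds (0 and +35), alpha = beta = 0.05, the function names, and the choice to gate only passes and not fails. Draws are ignored.

```python
import math


def elo_to_prob(elo: float) -> float:
    """Expected score for a rating difference of `elo` (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, losses: int, elo0: float, elo1: float) -> float:
    """Log-likelihood ratio of H1 (strength elo1) against H0 (strength elo0)."""
    p0, p1 = elo_to_prob(elo0), elo_to_prob(elo1)
    return wins * math.log(p1 / p0) + losses * math.log((1 - p1) / (1 - p0))


def sprt_status(wins: int, losses: int, elo0: float = 0.0, elo1: float = 35.0,
                alpha: float = 0.05, beta: float = 0.05,
                min_games: int = 100) -> str:
    """Return 'pass', 'fail' or 'uncertain'; a pass only counts after min_games."""
    llr = sprt_llr(wins, losses, elo0, elo1)
    lower = math.log(beta / (1 - alpha))   # Wald bound for accepting H0
    upper = math.log((1 - beta) / alpha)   # Wald bound for accepting H1
    if llr <= lower:
        return "fail"                      # fails are not gated (assumption)
    if llr >= upper and wins + losses >= min_games:
        return "pass"
    return "uncertain"


print(sprt_status(35, 0))   # uncertain: LLR is above the bound, but only 35 games
print(sprt_status(70, 30))  # pass: 100 games and LLR above the upper bound
```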
I didn't know this condition existed. The data from LZ 165 and 166 shows that around game 250, the win rate dropped low enough for SPRT to go back to uncertain. The cost would fall on test matches where the new network is expected to be much stronger. Still, is playing 150 extra test games a big drawback compared with the benefit of preventing early false positives?
Given the attack of non-authentic games being uploaded (#1705) and the sudden, unexplained, extreme spike in server traffic that DoS'd the servers (#1731), I'm highly suspicious of two networks in a row being promoted that should have failed. It seems we have attracted some malicious attention recently. There are scenarios where, with enough clients under one person's control (Colab), someone could change the outcome of matches.
I don't think so. If that were the case, it would be easy to notice in the match games. I see this simply as a fortunate opportunity to optimize the match settings.
Perhaps this is because the recent networks are so similar in strength? Most of the new challenger networks seem to be about as strong as the current one (mostly in the 48-53% range, according to match results). Over many matches between similarly strong networks, low-probability outcomes become possible. For example, a network with a true 50% win rate may score highly early on and be promoted by SPRT. I noticed that networks promoted by SPRT still end up with 400 match games. So why don't we forbid promotion before 400 games?
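The "low-probability outcomes" point can be checked with a quick Monte Carlo sketch. It simulates many 400-game matches for a challenger that truly scores 50% and counts how often that challenger is promoted under two settings: a 100-game minimum before a pass, and the 400-game minimum suggested here. The SPRT parameters (H0 = 50%, H1 = +35 Elo, alpha = beta = 0.05), the 55% fixed-length fallback at game 400, and all names are assumptions. The printed rates are simulation estimates, not project data.

```python
import math
import random

P1 = 1.0 / (1.0 + 10.0 ** (-35.0 / 400.0))  # assumed H1: +35 Elo; H0: 50%
WIN_STEP = math.log(P1 / 0.5)                # LLR change per win
LOSS_STEP = math.log((1 - P1) / 0.5)         # LLR change per loss (negative)
LOWER = math.log(0.05 / 0.95)                # alpha = beta = 0.05 (assumed)
UPPER = math.log(0.95 / 0.05)


def promoted(results, min_games, fallback=0.55):
    """True if promoted: SPRT pass after min_games, or (hypothetical)
    fixed-length win-rate rule once all games have been played."""
    llr = 0.0
    for n, r in enumerate(results, start=1):
        llr += WIN_STEP if r else LOSS_STEP
        if llr <= LOWER:
            return False                     # SPRT fail ends the match early
        if llr >= UPPER and n >= min_games:
            return True                      # SPRT pass, only after min_games
    return sum(results) / len(results) >= fallback


def false_promotion_rate(min_games, p=0.5, trials=2000, games=400, seed=1):
    """Fraction of simulated matches in which an equal-strength network is promoted."""
    rng = random.Random(seed)  # same seed -> same simulated matches for every rule
    hits = 0
    for _ in range(trials):
        results = [1 if rng.random() < p else 0 for _ in range(games)]
        hits += promoted(results, min_games)  # True counts as 1
    return hits / trials


for m in (100, 400):
    print(m, false_promotion_rate(m))
```

Both settings replay the same simulated matches, so every promotion under the 400-game minimum is also a promotion under the 100-game one. The difference between the two rates is therefore the share of early false positives removed by the stricter rule.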
I don't see an issue with what we are currently doing. It just seems that we have been unlucky.
For 425 games, what win rate would pass?
http://zero.sjeng.org/match-games/5b6de84ddfd61771ce49e084: 402 games at 54.98%, which cannot pass.
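The win count needed for a pure SPRT pass after a given number of games comes from solving the log-likelihood ratio for the number of wins. This sketch assumes SPRT with H0 = 0 Elo, H1 = +35 Elo, alpha = beta = 0.05, and no draws. The constants and the function name are hypothetical, and the project's real thresholds may differ.

```python
import math

P1 = 1.0 / (1.0 + 10.0 ** (-35.0 / 400.0))  # assumed H1: +35 Elo; H0: 50%
WIN_STEP = math.log(P1 / 0.5)                # LLR change per win
LOSS_STEP = math.log((1 - P1) / 0.5)         # LLR change per loss (negative)
UPPER = math.log(0.95 / 0.05)                # alpha = beta = 0.05 (assumed)


def min_wins_to_pass(games: int) -> int:
    """Smallest w with w*WIN_STEP + (games - w)*LOSS_STEP >= UPPER."""
    # The LLR is linear in w, so solve for w and round up to a whole game.
    w = (UPPER - games * LOSS_STEP) / (WIN_STEP - LOSS_STEP)
    return math.ceil(w)


for n in (400, 402, 425):
    w = min_wins_to_pass(n)
    print(n, w, f"{w / n:.2%}")
```

With these assumed parameters, an SPRT pass at 400 games needs 225 wins (56.25%), and at 425 games it needs 238 wins (56.00%). A 221-of-402 record (54.98%) would not pass on SPRT alone.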
This is the third time it has happened, now with LZ 169: https://zero.sjeng.org/match-games/5b7b67d9cc3dde4a3e75ec80. The data shows that a lower limit of 200 games for an SPRT pass would have prevented it.
This is now the second network in a row to be promoted with less than 55% at 400 games (LZ 165 and 166, at 51% and 54% win rate).
I understand that SPRT is about statistical confidence in the data, but there is one problem:
if the exact same games were played in reverse order (from game 400 back to game 1), both of these networks would get an SPRT fail.
Also, networks with a higher win rate at 400 games would get a fail, while prematurely promoted networks with a lower win rate at 400 games get promoted.
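The order effect can be reproduced with a small sketch: the same 400 results, replayed forward and then backward, stop at different points with opposite decisions. The record is constructed for illustration and is not the LZ 165 or 166 data. The SPRT parameters and the 100-game minimum before a pass are assumptions.

```python
import math

P1 = 1.0 / (1.0 + 10.0 ** (-35.0 / 400.0))  # assumed H1: +35 Elo; H0: 50%
WIN_STEP = math.log(P1 / 0.5)                # LLR change per win
LOSS_STEP = math.log((1 - P1) / 0.5)         # LLR change per loss (negative)
LOWER = math.log(0.05 / 0.95)                # alpha = beta = 0.05 (assumed)
UPPER = math.log(0.95 / 0.05)


def sprt_stop(results, min_games=100):
    """Return (decision, game_number) at the first SPRT stop, else ('uncertain', n)."""
    llr = 0.0
    for n, r in enumerate(results, start=1):
        llr += WIN_STEP if r else LOSS_STEP
        if llr <= LOWER:
            return "fail", n
        if llr >= UPPER and n >= min_games:
            return "pass", n
    return "uncertain", len(results)


# Constructed 400-game record with 204 wins (51%): a lucky 70-30 start, then 134-166.
games = [1] * 70 + [0] * 30 + [1] * 134 + [0] * 166
print(sprt_stop(games))        # ('pass', 100)
print(sprt_stop(games[::-1]))  # ('fail', 28)
```

In this constructed case, the forward order passes at game 100 even though the full record is only 51%. The reversed order fails at game 28.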
So, to fix this problem, I suggest adding a restrictive condition to promotion:
"Even if SPRT has passed, block promotion until more than 250 of the 400 games (62.5%) have been played:
->if SPRT is still PASS at 62.5%, then promote
->if SPRT has fallen back to uncertain at 62.5%, then continue the match until game 400, or stop earlier if the win rate is too low (as would normally happen)"
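The proposed rule can be sketched as a match loop. A fail may end the match at any time, but a pass is only accepted once 250 games have been played. A match that is still uncertain at game 400 falls back to a fixed win-rate threshold. The SPRT parameters, the 55% fallback, and the names `run_match` and `block_until` are hypothetical.

```python
import math

P1 = 1.0 / (1.0 + 10.0 ** (-35.0 / 400.0))  # assumed H1: +35 Elo; H0: 50%
WIN_STEP = math.log(P1 / 0.5)                # LLR change per win
LOSS_STEP = math.log((1 - P1) / 0.5)         # LLR change per loss (negative)
LOWER = math.log(0.05 / 0.95)                # alpha = beta = 0.05 (assumed)
UPPER = math.log(0.95 / 0.05)


def run_match(results, block_until=250, max_games=400, fallback=0.55):
    """Proposed rule: a fail may stop the match at any time,
    but a pass only counts once block_until games have been played."""
    llr, wins, n = 0.0, 0, 0
    for n, r in enumerate(results[:max_games], start=1):
        wins += r
        llr += WIN_STEP if r else LOSS_STEP
        if llr <= LOWER:
            return "fail", n
        if llr >= UPPER and n >= block_until:
            return "promote", n
    # Still uncertain at the end: hypothetical fixed-length win-rate threshold.
    return ("promote" if n and wins / n >= fallback else "reject"), n


# Constructed record with a lucky start: 170 wins of 400 (42.5%).
early_luck = [1] * 70 + [0] * 30 + [0, 0, 1] * 100
print(run_match(early_luck, block_until=100))  # ('promote', 100)
print(run_match(early_luck, block_until=250))  # ('fail', 264)
```

With a 100-game block, this constructed record is promoted at game 100. With the proposed 250-game block, SPRT rejects it at game 264. A genuinely strong network still passes, just no earlier than game 250, which is the extra cost discussed in this thread.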
With this condition, no matter in which order the games are played, a significantly stronger network will still get a pass by the end of the games, just not too early.
So it would prevent false positives (caused by high variance in the first games) from getting a pass.
Also note that this condition is only restrictive: it wouldn't force a failed SPRT to play on until game 400, so no time or resources are wasted.