increasing lower limit condition of SPRT #1734

Closed
wonderingabout opened this issue Aug 15, 2018 · 9 comments

Comments

@wonderingabout
Contributor

wonderingabout commented Aug 15, 2018

This is now the second network in a row to be promoted with a winrate below 55% at 400 games (LZ 165 and 166, at 51% and 54%).
I understand that SPRT is about statistical confidence in the data, but there is one problem:

If the exact same games had been played in reverse order (from game 400 back to game 1), both of these networks would have failed SPRT.

Also, networks with a higher winrate at 400 games can fail, while networks that were promoted too early, with a lower winrate at 400 games, get promoted.
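
The order-dependence claim can be illustrated with a minimal sketch of a plain Wald SPRT on win/loss results. Everything here is an assumption made for illustration: the hypothesised winrates `P0` and `P1`, the error rates `ALPHA` and `BETA`, and the made-up sequence of results. It is not the server's actual implementation or settings, and it ignores any minimum-games rule.

```python
import math

# Assumed parameters for illustration only; the thread does not state them.
P0, P1 = 0.50, 0.55        # hypothesised winrates: "not stronger" vs "stronger"
ALPHA = BETA = 0.05        # tolerated error rates

UPPER = math.log((1 - BETA) / ALPHA)   # log-likelihood ratio at or above this: PASS
LOWER = math.log(BETA / (1 - ALPHA))   # log-likelihood ratio at or below this: FAIL
WIN_STEP = math.log(P1 / P0)                # added to the ratio after a win
LOSS_STEP = math.log((1 - P1) / (1 - P0))   # added to the ratio after a loss


def sprt_first_decision(results):
    """Return (decision, games_played) for a sequence of 1 (win) / 0 (loss)."""
    llr = 0.0
    for n, won in enumerate(results, start=1):
        llr += WIN_STEP if won else LOSS_STEP
        if llr >= UPPER:
            return "pass", n
        if llr <= LOWER:
            return "fail", n
    return "undecided", len(results)


# Hypothetical 150-game match with a hot start and a cold finish (75 wins).
games = [1] * 40 + [0] * 10 + [1, 0] * 25 + [0] * 40 + [1] * 10

forward = sprt_first_decision(games)
backward = sprt_first_decision(games[::-1])
print("forward:", forward)
print("reversed:", backward)
print("overall winrate:", sum(games) / len(games))
```

Under these assumptions the same 150 results, with an overall winrate of exactly 50%, pass after 31 games in the original order and fail after 47 games in reverse order, because the test stops at the first boundary it crosses.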

So to fix this problem, I suggest adding a restrictive condition to promotion:
"Even if SPRT has passed, block promotion until more than 250 of the 400 games (62.5%) have been played:
-> if SPRT is still at PASS at 62.5%, then promote
-> if SPRT has fallen back to uncertain at 62.5%, then continue the match until game 400, or stop earlier if the winrate is too low (as would normally happen)"

With this condition, no matter in which order the games are played, a network that is significantly stronger will get its pass by the end of the match, not too early.
So it would prevent false positives (due to high variance in the first games) from getting a pass.

Also note that this condition only restricts passes: it would not force a match with a failed SPRT to play until game 400 (so no time and resources are spent needlessly).
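
The proposed rule can be sketched as a small change to a plain Wald SPRT loop: a FAIL is honoured at any time, while a PASS only counts once enough games have been played. The function name `gated_sprt`, the parameters `P0`, `P1`, `ALPHA`, `BETA`, and the example results are all hypothetical choices for illustration, not the project's real code or settings.

```python
import math

# Assumed parameters for illustration only; the thread does not state them.
P0, P1 = 0.50, 0.55
ALPHA = BETA = 0.05

UPPER = math.log((1 - BETA) / ALPHA)
LOWER = math.log(BETA / (1 - ALPHA))
WIN_STEP = math.log(P1 / P0)
LOSS_STEP = math.log((1 - P1) / (1 - P0))


def gated_sprt(results, min_games_for_pass=250, max_games=400):
    """SPRT where PASS is only accepted from min_games_for_pass games onwards.

    FAIL is still accepted at any point, so weak networks stop early.
    Returns (decision, games_played).
    """
    llr = 0.0
    n = 0
    for n, won in enumerate(results[:max_games], start=1):
        llr += WIN_STEP if won else LOSS_STEP
        if llr <= LOWER:
            return "fail", n
        if llr >= UPPER and n >= min_games_for_pass:
            return "pass", n
    return "undecided", n


# Hypothetical match: 40 straight wins, then nothing but losses.
hot_start = [1] * 40 + [0] * 110

print("no gate:  ", gated_sprt(hot_start, min_games_for_pass=1))
print("gate at 250:", gated_sprt(hot_start))
print("all losses:", gated_sprt([0] * 400))
```

With these made-up results, the ungated test passes after 31 games, while the gated test keeps playing and fails after 105 games. A network that only loses is still stopped early (after 28 games here), which matches the point that the condition does not make failing matches longer.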

@jkiliani

We already have such a condition, setting a lower limit of 100 games in addition to SPRT. While this bar could potentially be raised, we never had promotions below 50% anymore since then, so I don't see this as a serious problem.

@wonderingabout
Contributor Author

wonderingabout commented Aug 15, 2018

I didn't know this condition existed.
Then how about increasing this lower limit to around 250 games?

The data from LZ 165 and 166 shows that around game 250, the winrate dropped low enough for SPRT to go back to uncertain:
165: https://zero.sjeng.org/match-games/5b7036fadfd61771ce509048
166: https://zero.sjeng.org/match-games/5b7408fecc3dde4a3e5d27ed

The problem would then be test matches where the network is expected to be much stronger. But is playing 150 extra test games a big drawback compared with the benefit of preventing early false positives?
Also, can't test matches be set up with a different lower limit (for example 100 games)?

@wonderingabout changed the title from "add a restrictive condition to SPRT" to "increasing lower limit of SPRT" on Aug 15, 2018
@wonderingabout changed the title from "increasing lower limit of SPRT" to "increasing lower limit condition of SPRT" on Aug 15, 2018
@nathanloop

nathanloop commented Aug 15, 2018

Given the attack of non-authentic games being uploaded (#1705) and the sudden unexplained extreme spike in server traffic that DOS'd the servers (#1731) I'm highly suspicious of two networks in a row being promoted that should have failed.

It seems like we've attracted some malicious attention recently. There are scenarios where with enough clients under one person's control (Colab) one could change the outcome of matches.

@wonderingabout
Contributor Author

wonderingabout commented Aug 15, 2018

I don't think so. If that were the case, it would be easy to notice in the match games.
Also, this can happen again in the future.

I see it simply as a fortunate opportunity to optimize the match settings.

@kfc51151271

Perhaps it is because the recent networks are of similar strength? Most of the new challenger networks seem to be about as strong as the old one (mostly in the 48-53% range, according to the match results). After many matches between networks of similar strength, some low-probability outcomes are to be expected (for example, a network with a true 50% winrate may get a high winrate early on and be promoted by SPRT).

I noticed that networks promoted by SPRT still end up playing 400 match games. So why don't we prevent a network from being promoted before 400 games?
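
The chance of such a low-probability promotion can be computed exactly for a simplified model, which may help compare different lower limits. This is only a sketch under stated assumptions: a plain Wald SPRT with hypothetical parameters `P0`, `P1`, `ALPHA`, `BETA`, independent games with a fixed true winrate, and a match that counts as "not promoted" if it is still undecided after `max_games`. None of these are confirmed server settings.

```python
import math

# Assumed parameters for illustration only; the thread does not state them.
P0, P1 = 0.50, 0.55
ALPHA = BETA = 0.05

UPPER = math.log((1 - BETA) / ALPHA)
LOWER = math.log(BETA / (1 - ALPHA))
WIN_STEP = math.log(P1 / P0)
LOSS_STEP = math.log((1 - P1) / (1 - P0))


def pass_probability(true_winrate, min_games_for_pass, max_games=400):
    """Exact chance of promotion for a network with the given true winrate.

    Tracks, game by game, the probability of each win count among matches
    that are still running. FAIL stops a match at any time; PASS only
    counts once min_games_for_pass games have been played.
    """
    alive = {0: 1.0}   # wins so far -> probability the match is still undecided
    promoted = 0.0
    for n in range(1, max_games + 1):
        nxt = {}
        for wins, prob in alive.items():
            outcomes = ((wins + 1, prob * true_winrate),
                        (wins, prob * (1 - true_winrate)))
            for new_wins, p in outcomes:
                llr = new_wins * WIN_STEP + (n - new_wins) * LOSS_STEP
                if llr <= LOWER:
                    continue                  # FAIL: match stops here
                if llr >= UPPER and n >= min_games_for_pass:
                    promoted += p             # PASS: match stops here
                    continue
                nxt[new_wins] = nxt.get(new_wins, 0.0) + p
        alive = nxt
    return promoted


def at_least_one_promotion(per_match_probability, matches):
    """Chance that at least one of several independent matches is promoted."""
    return 1 - (1 - per_match_probability) ** matches


for gate in (1, 100, 250):
    p = pass_probability(0.50, gate)
    print(f"lower limit {gate:3d}: single match {p:.4f}, "
          f"at least once in 10 matches {at_least_one_promotion(p, 10):.4f}")
```

In this model a higher lower limit can only reduce the promotion chance of an equal-strength network, since every sequence of results that passes with the stricter limit also passes with the looser one. How large the reduction is depends on the assumed parameters.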

@MartinDevelopment

MartinDevelopment commented Aug 15, 2018

There seems to be no issue to me with what we are currently doing. It just seems that we are unlucky.

| Games | c910dee9 wins | c910dee9 % of all 397 games | 9c56ae62 wins | 9c56ae62 % of all 397 games |
|---|---|---|---|---|
| 1-100 | 39 | 9.82 | 61 | 15.37 |
| 101-200 | 46 | 11.59 | 54 | 13.60 |
| 201-300 | 54 | 13.60 | 46 | 11.59 |
| 301-397 | 44 | 11.08 | 53 | 13.35 |

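
To see how the running score moved during this match, the per-block win counts from the table can be turned into cumulative winrates. This is a small arithmetic sketch; the block sizes are inferred from the game ranges in the table (the last block covers 97 games), and the helper name `cumulative_winrates` is made up for this example.

```python
# Wins per block of games, copied from the table above.
block_sizes = [100, 100, 100, 97]
wins = {
    "c910dee9": [39, 46, 54, 44],
    "9c56ae62": [61, 54, 46, 53],
}


def cumulative_winrates(block_wins, sizes):
    """Winrate after each block, as a list of fractions."""
    rates = []
    total_wins = 0
    total_games = 0
    for block_win_count, block_size in zip(block_wins, sizes):
        total_wins += block_win_count
        total_games += block_size
        rates.append(total_wins / total_games)
    return rates


for name, block_wins in wins.items():
    rates = cumulative_winrates(block_wins, block_sizes)
    print(name, [f"{rate:.1%}" for rate in rates])
```

For 9c56ae62 the running winrate goes from 61.0% after 100 games to 57.5%, 53.7% and finally 53.9% after 397 games, so most of its lead comes from the first 100 games.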

@l1t1

l1t1 commented Aug 15, 2018

For 425 games, what winrate is needed to pass?

@l1t1

l1t1 commented Aug 15, 2018

http://zero.sjeng.org/match-games/5b6de84ddfd61771ce49e084 : 402 games at a 54.98% winrate did not pass.
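
A rough way to approach the question of what winrate is needed at a given game count is to find the smallest number of wins whose log-likelihood ratio reaches the upper SPRT bound at that point. The sketch below assumes a plain Wald SPRT with hypothetical parameters `P0`, `P1`, `ALPHA`, `BETA`; the real server settings are not given in this thread, so the resulting numbers are only indicative. It also ignores the fact that a match may already have passed or failed earlier.

```python
import math

# Assumed parameters for illustration only; the thread does not state them.
P0, P1 = 0.50, 0.55
ALPHA = BETA = 0.05

UPPER = math.log((1 - BETA) / ALPHA)
WIN_STEP = math.log(P1 / P0)
LOSS_STEP = math.log((1 - P1) / (1 - P0))


def min_wins_to_pass(games):
    """Smallest win count whose log-likelihood ratio reaches UPPER after `games` games.

    Returns None if even winning every game would not be enough.
    """
    for wins in range(games + 1):
        llr = wins * WIN_STEP + (games - wins) * LOSS_STEP
        if llr >= UPPER:
            return wins
    return None


for games in (402, 425):
    wins = min_wins_to_pass(games)
    print(f"{games} games: at least {wins} wins ({wins / games:.1%})")
```

Under these assumed parameters, a pass at exactly 425 games needs at least 238 wins (about 56.0%), and a pass at exactly 402 games needs at least 226 wins (about 56.2%).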

@wonderingabout
Contributor Author

This has now happened a third time, with LZ 169:

https://zero.sjeng.org/match-games/5b7b67d9cc3dde4a3e75ec80

The data shows that a lower limit of 200 games for an SPRT pass would have avoided it.
