SAI periodically disconnects in sabaki #123

Open
cryptsport opened this issue Jul 27, 2020 · 27 comments
Assignees
Labels
bug Something isn't working

Comments

@cryptsport

0.17.5 works without problems, but 0.17.6 periodically stops when playing against LZ, and the match has to be restarted from where it stopped.

@cryptsport
Author

Has anyone else come across this? Maybe the problem is in Sabaki?

@cryptsport
Author

I launched it in q5go and got the same problem: 0.17.5 works, 0.17.6 does not. The same happens if I play against SAI myself. Does this problem exist for others, or am I doing something wrong?

@Vandertic
Member

@cryptsport can you give more information? For example, the command line you use in Sabaki?

@cryptsport
Author

cryptsport commented Sep 3, 2020

Yes, of course! It just happened again: Sabaki 0.51.1, sai-0.17.6-gpu, network file a8e32bb8, options --gtp --noponder -w networkfile.gz

Thinking at most 36.3 seconds...
NN eval=0.497300. Agent eval=0.497711 (lambda=0.300, mu=0.030)
cpus=10
Playouts: 25, Win: 52.85%, PV: R16 Q4 D4
Playouts: 77, Win: 51.93%, PV: D4 Q16 Q4 F16 R17 Q17 R16
Playouts: 120, Win: 51.79%, PV: Q17 Q4 D4 Q16 R16 R15 R17
Playouts: 178, Win: 51.66%, PV: R16 Q4 D4 Q16 Q17 R15 R17
Playouts: 232, Win: 51.58%, PV: Q17 Q4 D4 Q16 R16 R15 R17
Playouts: 316, Win: 51.59%, PV: Q17 Q4 D4 Q16 R16 R15 R17
Playouts: 402, Win: 51.61%, PV: Q17 Q4 D4 Q16 R16 R15 R17
Playouts: 557, Win: 51.67%, PV: Q17 Q4 D4 Q16 R16 R15 R17 F16
Playouts: 670, Win: 51.71%, PV: Q17 Q4 D4 Q16 R16 R15 R17 F16
Playouts: 742, Win: 51.72%, PV: Q17 Q4 D4 Q16 R16 R15 R17 F16
Playouts: 825, Win: 51.74%, PV: R16 Q4 D4 Q16 Q17 R15 R17
Playouts: 899, Win: 51.74%, PV: Q17 R4 D4 Q16 R16 R15 R17 O4
Playouts: 1007, Win: 51.74%, PV: R4 R16 D4 Q4 Q3 R5 R3 O16
Playouts: 1114, Win: 51.76%, PV: R4 R16 D4 Q4 Q3 R5 R3 O16

Q17 -> 112 (V: 52.29%) (LCB: 51.81%) (N: 5.74%) (A: -0.7) (B: 0.13) PV: Q17 R4 D4 Q16 R16 R15 R17 O4
R4 -> 122 (V: 52.19%) (LCB: 51.76%) (N: 6.36%) (A: -0.7) (B: 0.13) PV: R4 R16 D4 Q4 Q3 R5 R3 O16
R16 -> 104 (V: 52.20%) (LCB: 51.68%) (N: 5.69%) (A: -0.7) (B: 0.13) PV: R16 Q3 D4 Q16 Q17 P17 R17 F16
C4 -> 121 (V: 52.11%) (LCB: 51.66%) (N: 5.04%) (A: -0.7) (B: 0.13) PV: C4 R16 Q4 D4 D3 C5 C3 O16
D3 -> 118 (V: 52.10%) (LCB: 51.65%) (N: 5.19%) (A: -0.7) (B: 0.13) PV: D3 Q16 R4 D4 C4 C5 C3 P3 Q6
Q3 -> 106 (V: 52.04%) (LCB: 51.55%) (N: 6.37%) (A: -0.6) (B: 0.13) PV: Q3 R16 D4 Q4 R4 R5 R3 O16
Q4 -> 153 (V: 51.60%) (LCB: 51.13%) (N: 9.99%) (A: -0.5) (B: 0.13) PV: Q4 R16 C4 D4 D3 C5 C3 O16 E17
Q16 -> 116 (V: 51.53%) (LCB: 51.01%) (N: 8.24%) (A: -0.5) (B: 0.13) PV: Q16 D3 Q4 F16 C5 F4 C8 R17 R16
D4 -> 113 (V: 51.47%) (LCB: 50.92%) (N: 7.69%) (A: -0.5) (B: 0.13) PV: D4 R16 R4 Q4 Q3 R5 R3 O16
D16 -> 62 (V: 50.19%) (LCB: 49.16%) (N: 9.00%) (A: -0.0) (B: 0.13) PV: D16 D17 E17 D15 E16 C17 Q16 Q4 D4 J17 F14 D13 L17
C3 -> 8 (V: 50.20%) (LCB: 46.27%) (N: 0.98%) (A: -0.0) (B: 0.13) PV: C3 Q16 Q4 F16 R17 Q17 R16
R17 -> 7 (V: 49.98%) (LCB: 42.60%) (N: 0.95%) (A: 0.1) (B: 0.13) PV: R17 Q4 D4 F16 R3 Q3 R4
R3 -> 7 (V: 49.67%) (LCB: 43.31%) (N: 1.07%) (A: 0.2) (B: 0.13) PV: R3 Q16 D4 F16 R17 Q17 R16
E17 -> 6 (V: 49.43%) (LCB: 38.01%) (N: 0.95%) (A: 0.2) (B: 0.12) PV: E17 Q4 D15 C15 D14 C13

Root -> 1157 (V: 51.79%) (LCB: 51.54%) (N: 0.00%) (A: -0.6) (B: 0.13)

6.2 average depth, 15 max depth
929 non leaf nodes, 1.24 average children
1157 visits, 405223 nodes, 1155 playouts, 31 n/s

And that is all! Nothing appears on the board. Sometimes this happens after several moves, sometimes, like now, after the first.
EDIT: I noticed a difference: in 0.17.5, lz-genmove_analyze W 50 is followed by "= info move (...)" lines and then the move, but in 0.17.6 these lines are missing.

@Vandertic
Member

Uhm, I tried to reproduce your bug, without success. I don't suppose you are using an unusual GPU or GPU driver? Some OpenCL drivers are known to be broken, and this sort of thing could happen with them.
In particular: the output of lz-genmove_analyze is the same in both versions (apart from the areas field added in 0.17.6), so the fact that info move does not appear means that the search has crashed.
As for why 0.17.5 appears to work and 0.17.6 does not, I suppose the problem might be triggered by some improvements to Network that the LZ devs added to LZ/next and that we pulled.
To be sure that the problem lies there, you should try running SAI with the --cpu-only option and see if this stops the crashes.
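
For reference, the info move lines discussed here have a simple key/value layout, which can be seen in the Sabaki log posted later in this thread. Below is a minimal Python sketch of splitting one such line into per-move records. It assumes, as in that log, that every key except pv is followed by a single integer and that pv runs to the end of the record. The name parse_info_line is made up for illustration and is not part of SAI or Sabaki.

```python
def parse_info_line(line):
    """Split one lz-genmove_analyze output line into a list of per-move dicts.

    Assumed layout (taken from the log in this thread, not from a spec):
    info move <vertex> <key> <int> ... pv <vertex> <vertex> ...
    """
    moves = []
    # Everything before the first "info move " is ignored.
    for chunk in line.split("info move ")[1:]:
        tokens = chunk.split()
        entry = {"move": tokens[0]}
        i = 1
        while i < len(tokens):
            key = tokens[i]
            if key == "pv":
                # The principal variation takes the rest of the record.
                entry["pv"] = tokens[i + 1:]
                break
            entry[key] = int(tokens[i + 1])
            i += 2
        moves.append(entry)
    return moves


sample = (
    "info move C8 visits 23 winrate 4592 prior 4687 lcb 4055 areas 17311 "
    "order 0 pv C8 D9 D10 "
    "info move B9 visits 12 winrate 4355 prior 1327 lcb 3166 areas 26834 "
    "order 1 pv B9 B5"
)
for record in parse_info_line(sample):
    print(record["move"], record["visits"], record["pv"])
```

Judging by the stderr lines in the same log, winrate and lcb appear to be in hundredths of a percent (4592 would be 45.92%), but that reading is an inference from this thread rather than a documented guarantee.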

@cryptsport
Author

cryptsport commented Sep 4, 2020

sai-0.17.6-cpu doesn't work either (it fails the same way). I noticed that in the latest attempts it didn't make even one move. I'll try to find out why there were sometimes several moves before.
I run many different engines in Sabaki, and this has not happened with the others (katago, lz, gtp4zen, amigo...).
EDIT: maybe I was able to figure something out. With a smaller 9b net it works so far (more than 50 moves, and the game continues). But the SAI network is not as big as the 40x384 KataGo network, which works for me.
EDIT 2: sai-0.17.6-gpu (--cpu-only) doesn't work with the 12b network but works with the 9b network.
12b - a8e32bb8, 9b - c5de38e8
EDIT 3: the NVIDIA drivers are the ones for this video card. The first 12b network, 88b43a77, also doesn't work (sai-0.17.6).

@cryptsport
Author

cryptsport commented Sep 4, 2020

I added "-t 2", and sai-0.17.6-gpu with the 12b network works now. I'd be interested to hear your comment.
EDIT: with the 12b network, up to "-t 5" works and "-t 6" doesn't.
EDIT 2: with the 20b network c215fd3b, up to "-t 3" works and "-t 4" doesn't.
(AMD Athlon X4 950)
With 0.17.5 everything works.

@Vandertic
Member

Wow. This is interesting and I don't think I have ever seen this problem anywhere else.
I still believe the problem has to do with the latest LZ commits that we included. Unfortunately there is no release of this version of LZ, so to check whether the problem is there, one would need to compile it under Windows and try.
Sorry, but I really don't understand what's happening with your configuration. I'll ask @amato-gianluca if he has any ideas...

Vandertic added the bug label Sep 6, 2020

@cryptsport
Author

cryptsport commented Sep 6, 2020

I disabled CPU virtualization, but it changed nothing. And today, with "-t 3", it stopped at move 74. There is little information in the Sabaki log file. Can I get a more detailed log somehow?
I am interested in your project. If any tests are needed, I'm ready!

@cryptsport
Author

I closed all other applications, and sai-0.17.6-gpu with the 20b network works with "-t 5". But without the "-t" parameter it still doesn't work. In general, nothing is clear :)

@cryptsport
Author

cryptsport commented Sep 7, 2020

With the parameter "-t 3", sai-0.17.6-gpu is reasonably stable (~50%). Does this parameter affect strength or playing style, at the same number of visits or the same seconds per move?

@Vandertic
Member

It is an optimization parameter. To get the most "nodes" per second (and hence the fewest seconds per move) you have to find the optimal value, which will generally depend on your hardware configuration. Actually, changing this number will also change the playing style a bit, as the tree exploration will change. The difference should be small, though.
BTW, can I ask you again for your hardware and software setup? I didn't understand it well from what you wrote above.
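
One way to look for that optimal value is to compare the n/s figure that the engine prints at the end of each search (for example "1157 visits, 405223 nodes, 1155 playouts, 31 n/s" in the log above) across different -t settings. Below is a rough Python sketch, assuming that summary format. The function names are hypothetical, and the caller is expected to collect the stderr lines for each -t value on the same network and position, since figures from different setups are not comparable.

```python
import re
from statistics import mean

# Matches the search summary line printed on stderr, e.g.
# "1157 visits, 405223 nodes, 1155 playouts, 31 n/s"
SUMMARY = re.compile(
    r"(\d+) visits, (\d+) nodes, (\d+) playouts, (\d+(?:\.\d+)?) n/s"
)


def parse_summary(line):
    """Return the summary fields as a dict, or None if the line is not a summary."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    visits, nodes, playouts, nps = m.groups()
    return {
        "visits": int(visits),
        "nodes": int(nodes),
        "playouts": int(playouts),
        "nps": float(nps),
    }


def mean_nps_by_threads(logs):
    """logs maps a -t value to the stderr lines collected with that value."""
    result = {}
    for threads, lines in logs.items():
        rates = [s["nps"] for s in map(parse_summary, lines) if s is not None]
        if rates:
            result[threads] = mean(rates)
    return result


print(parse_summary("1157 visits, 405223 nodes, 1155 playouts, 31 n/s"))
```

The -t value with the highest mean n/s would be the candidate to use, provided the engine is also stable at that setting.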

@cryptsport
Author

AMD Athlon X4 950, GF GT610, 8GB RAM, Windows 7. ok?

@cryptsport
Author

cryptsport commented Sep 8, 2020

You previously wrote "info move does not appear means that the search has crashed." In the Sabaki log file I found this:

[2020-09-08 15:03:05.161] sai (in) : play B D8
[2020-09-08 15:03:05.196] sai (out) : =
[2020-09-08 15:03:05.196] sai (out) :
[2020-09-08 15:03:05.217] sai (in) : lz-genmove_analyze W 50
[2020-09-08 15:03:05.246] sai (out) : =
[2020-09-08 15:03:05.250] sai (err) : Thinking at most 5.0 seconds...
[2020-09-08 15:03:05.487] sai (err) : NN eval=0.455956. Agent eval=0.463221 (lambda=0.300, mu=0.030)
[2020-09-08 15:03:05.488] sai (err) : cpus=3
[2020-09-08 15:03:05.734] sai (out) :
[2020-09-08 15:03:07.739] sai (err) : Playouts: 18, Win: 46.43%, PV: C8 D9 D10 F10 B12 B13 D7
[2020-09-08 15:03:10.393] sai (err) :
[2020-09-08 15:03:10.409] sai (err) : C8 - 20 (V: 45.63%) (LCB: 39.48%) (N: 46.87%) (A: 1.8) (B: 0.12) PV: C8 D9 D10 F10 B12 B13 D7 F8 F7
[2020-09-08 15:03:10.409] sai (err) : B9 - 11 (V: 43.05%) (LCB: 29.87%) (N: 13.27%) (A: 2.9) (B: 0.11) PV: B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9
[2020-09-08 15:03:10.410] sai (err) : D7 - 7 (V: 38.72%) (LCB: 3.70%) (N: 17.57%) (A: 4.8) (B: 0.11) PV: D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:10.410] sai (err) :
[2020-09-08 15:03:10.410] sai (err) : Root - 40 (V: 44.38%) (LCB: 35.81%) (N: 6.04%) (A: 2.6) (B: 0.11)
[2020-09-08 15:03:10.411] sai (err) :
[2020-09-08 15:03:10.411] sai (err) : 5.8 average depth, 12 max depth
[2020-09-08 15:03:10.412] sai (err) : 33 non leaf nodes, 1.15 average children
[2020-09-08 15:03:10.412] sai (err) : 40 visits, 11510 nodes, 38 playouts, 7 n/s
[2020-09-08 15:03:10.413] sai (err) :
[2020-09-08 15:03:19.758] sai (in) : undo
[2020-09-08 15:03:19.807] sai (out) : =
[2020-09-08 15:03:19.808] sai (out) :
[2020-09-08 15:03:19.846] sai (in) : lz-genmove_analyze W 50
[2020-09-08 15:03:19.899] sai (out) : =
[2020-09-08 15:03:19.900] sai (err) : Thinking at most 5.0 seconds...
[2020-09-08 15:03:19.901] sai (err) : NN eval=0.455956. Agent eval=0.463221 (lambda=0.300, mu=0.030)
[2020-09-08 15:03:19.902] sai (err) : cpus=3
[2020-09-08 15:03:20.365] sai (out) : info move C8 visits 23 winrate 4592 prior 4687 lcb 4055 areas 17311 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 12 winrate 4355 prior 1327 lcb 3166 areas 26834 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:20.869] sai (out) : info move C8 visits 26 winrate 4606 prior 4687 lcb 4141 areas 16656 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 13 winrate 4372 prior 1327 lcb 3323 areas 25939 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:21.387] sai (out) : info move C8 visits 29 winrate 4633 prior 4687 lcb 4209 areas 15565 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 14 winrate 4371 prior 1327 lcb 3437 areas 25814 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:21.891] sai (out) : info move C8 visits 32 winrate 4691 prior 4687 lcb 4252 areas 13231 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 15 winrate 4418 prior 1327 lcb 3527 areas 23619 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:22.367] sai (err) : Playouts: 20, Win: 45.18%, PV: C8 D9 D10 F10 B12 B13 D7 F8 F7
[2020-09-08 15:03:22.400] sai (out) : info move C8 visits 35 winrate 4690 prior 4687 lcb 4270 areas 13286 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 7 winrate 3872 prior 1757 lcb 369 areas 47614 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:22.915] sai (out) : info move C8 visits 38 winrate 4686 prior 4687 lcb 4300 areas 13481 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 8 winrate 4095 prior 1757 lcb 689 areas 37313 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:23.419] sai (out) : info move C8 visits 41 winrate 4701 prior 4687 lcb 4313 areas 12961 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 9 winrate 4194 prior 1757 lcb 1404 areas 32905 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:23.935] sai (out) : info move C8 visits 44 winrate 4762 prior 4687 lcb 4362 areas 10532 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 10 winrate 4270 prior 1757 lcb 1918 areas 29615 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:24.438] sai (out) : info move C8 visits 47 winrate 4753 prior 4687 lcb 4371 areas 10986 order 0 pv C8 D9 D10 F10 B12 B13 D7 F8 F7 info move B9 visits 16 winrate 4344 prior 1327 lcb 3418 areas 27001 order 1 pv B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8 info move D7 visits 11 winrate 4212 prior 1757 lcb 2189 areas 32380 order 2 pv D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:25.112] sai (out) : play C8
[2020-09-08 15:03:25.113] sai (out) :
[2020-09-08 15:03:25.114] sai (err) :
[2020-09-08 15:03:25.115] sai (err) : C8 - 49 (V: 47.33%) (LCB: 43.60%) (N: 46.87%) (A: 1.2) (B: 0.12) PV: C8 D9 D10 F10 B12 B13 D7 F8 F7
[2020-09-08 15:03:25.116] sai (err) : B9 - 16 (V: 43.45%) (LCB: 34.19%) (N: 13.27%) (A: 2.7) (B: 0.11) PV: B9 B5 H3 C8 B8 D7 E10 F10 E9 E11 F9 G9 G8
[2020-09-08 15:03:25.116] sai (err) : D7 - 13 (V: 41.91%) (LCB: 26.60%) (N: 17.57%) (A: 3.3) (B: 0.11) PV: D7 E8 C8 E7 D9 E9 B12
[2020-09-08 15:03:25.116] sai (err) :
[2020-09-08 15:03:25.117] sai (err) : Root - 80 (V: 45.59%) (LCB: 41.00%) (N: 6.04%) (A: 1.8) (B: 0.11)
[2020-09-08 15:03:25.117] sai (err) :
[2020-09-08 15:03:25.117] sai (err) : 5.9 average depth, 14 max depth
[2020-09-08 15:03:25.117] sai (err) : 61 non leaf nodes, 1.28 average children
[2020-09-08 15:03:25.118] sai (err) : 80 visits, 23314 nodes, 40 playouts, 8 n/s
[2020-09-08 15:03:25.118] sai (err) :
[2020-09-08 15:03:25.151] leelaz (in) : play W C8

Here, at [2020-09-08 15:03:19.758] (in) : undo, I clicked "start engine vs engine game" and the game resumed.
Doesn't this mean that the move was found, but perhaps not passed to Sabaki?
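
The pattern in the log above (an lz-genmove_analyze request that never receives a play reply before the next command is sent) can be searched for mechanically in longer logs. Below is a minimal Python sketch, assuming the Sabaki log line format shown above ("[timestamp] engine (in|out|err) : text") and assuming that a completed request ends with an (out) line starting with "play ". The function name is made up for illustration.

```python
import re

# "[2020-09-08 15:03:05.217] sai (in) : lz-genmove_analyze W 50"
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<engine>\S+)\s+"
    r"\((?P<stream>in|out|err)\)\s*:\s?(?P<text>.*)$"
)


def find_stalled_genmoves(lines):
    """Return (engine, timestamp, command) for requests that got no 'play' reply.

    A request counts as stalled when the same engine receives another command
    before answering, or when the log ends while the request is still open
    (the latter may simply mean the log was cut off mid-search).
    """
    pending = {}  # engine name -> (timestamp, command)
    stalled = []
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if m is None:
            continue
        engine, stream, text = m["engine"], m["stream"], m["text"].strip()
        if stream == "in":
            if engine in pending:
                stalled.append((engine,) + pending.pop(engine))
            if text.startswith("lz-genmove_analyze"):
                pending[engine] = (m["ts"], text)
        elif stream == "out" and text.startswith("play "):
            pending.pop(engine, None)
    stalled.extend((engine,) + req for engine, req in pending.items())
    return stalled
```

Called with the lines of a saved log, for example find_stalled_genmoves(open("sabaki.log", encoding="utf-8")) with a hypothetical file name, it lists the requests worth inspecting; stderr output after such a request, as in the excerpt above, suggests the search itself finished and only the reply on stdout was missing.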

@cryptsport
Author

cryptsport commented Sep 8, 2020

It would still be very interesting to find the cause of the failure! I ran it in SmartGo, and so far everything is fine (sai-0.17.6-gpu, --gtp --noponder -w networkfile.gz, 20b network 7fa70321, game finished after 242 moves). To check, I want to run a match of several dozen games.
EDIT: a match of 10 games (SmartGo - SAI) went fine. It also plays against KataGo without any problems.

@cryptsport
Author

SmartGo does not have a separate field like Sabaki's for time_settings (e.g. time_settings 0 6 1). How can I set it?

@Vandertic
Member

I am at a loss. Glad to hear that at least with SmartGo SAI seems to work fine. I will think about this further.

@cryptsport
Author

It works with Drago too! It also works with Sabaki 0.33.4. The "incompatibility" probably arose with the later versions of Sabaki. In about an hour I will look for the Sabaki version in which the crash first occurs.

@cryptsport
Author

cryptsport commented Sep 9, 2020

It works with Sabaki 0.33.4, 0.35.1, 0.40.0 and 0.40.1. From there on, 0.41.0 and 0.43.3 don't work. Does this tell you anything?
Is it possible to fix it? Or is it a Sabaki problem?
EDIT: I tried it again in q5go, and it does not work. It works with "-t 3".
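
The Sabaki results above narrow the change down to a window between two releases. Below is a small Python sketch of that reasoning, using only the versions reported in this comment. The helper names are hypothetical, and the sketch assumes plain dotted numeric version strings.

```python
def version_key(version):
    """Turn '0.40.1' into (0, 40, 1) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results):
    """results maps a version string to True (works) or False (stalls).

    Returns (last_good, first_bad), or None when there is no clean boundary
    (all results the same, or good and bad versions interleaved).
    """
    ordered = sorted(results, key=version_key)
    good = [v for v in ordered if results[v]]
    bad = [v for v in ordered if not results[v]]
    if not good or not bad:
        return None
    last_good, first_bad = good[-1], bad[0]
    if version_key(last_good) > version_key(first_bad):
        return None
    return last_good, first_bad


sabaki = {
    "0.33.4": True,
    "0.35.1": True,
    "0.40.0": True,
    "0.40.1": True,
    "0.41.0": False,
    "0.43.3": False,
}
print(regression_window(sabaki))
```

For the versions listed here this points at whatever changed between Sabaki 0.40.1 and 0.41.0; since the stalls are intermittent, each "works" entry is only as reliable as the number of games played with that version.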

@cryptsport
Author

I didn't expect this, but it works with the old version, q5go-1.1-win! Maybe there is a simple explanation for this?

@cryptsport
Author

Even "-t 1" does not help in version 0.18.1. It stops after a few moves and often does not make even one move, with both the SAI network and the LZ network.

cryptsport changed the title from "version 0.17.6 periodically disconnects in sabaki" to "SAI periodically disconnects in sabaki" Feb 20, 2021

@Vandertic
Member

Try version 0.18.2. We reverted a shared mutex update that might have made the problem worse.
Also, have you tried using --cpu-only?

@cryptsport
Author

I tried both cpu and gpu. Now I will check sai-0.18.2-gpu. But it works with Drago!

@cryptsport
Author

I was not expecting this!!! sai-0.18.2-gpu works! I will check again.

@cryptsport
Author

cryptsport commented Feb 20, 2021

It ran with the SAI network without stopping until the end of the game. For some reason the problem with the LZ network remains.
EDIT: I launched it again with the SAI network, and it works!

@cryptsport
Author

sai-0.18.2-cpu does not work with either the SAI or the LZ network.

@cryptsport
Author

cryptsport commented Feb 20, 2021

I launched sai-0.18.2-gpu with the LZ network again, and now it works.
And now sai-0.18.2-gpu with the LZ network doesn't work. It probably depends on its mood.
