LZ winrate vs various anchors #1557

Open
ryouiki opened this issue Jun 15, 2018 · 48 comments

Comments

@ryouiki

ryouiki commented Jun 15, 2018

I ran over ten thousand games testing recent LZ networks against various anchor networks, to see how well each LZ network performs against other known networks.

[Chart: 400visits200games_7]

e.g. Each green dot represents the win rate (y) of one LZ network (x) versus the ELF network (400 visits / 200 games). The green line shows the win rate trend.

Sometimes a newer network performs worse than its predecessor against a certain anchor.
However, they seem to follow the upward trend in the long run.
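
To get a feel for how much of the fluctuation between consecutive networks could be plain sampling noise, here is a minimal sketch that puts a rough 95% interval around one measured win rate. It assumes independent games and uses the normal approximation to the binomial. The helper name `winrate_interval` and the 100-of-200 input are illustrative choices, not part of the test setup.

```python
import math
from typing import Tuple


def winrate_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (rate, low, high) using the normal approximation to the binomial."""
    if games <= 0 or not 0 <= wins <= games:
        raise ValueError("need 0 <= wins <= games and games > 0")
    rate = wins / games
    # Standard error of a proportion estimated from `games` independent trials.
    stderr = math.sqrt(rate * (1.0 - rate) / games)
    low = max(0.0, rate - z * stderr)
    high = min(1.0, rate + z * stderr)
    return rate, low, high


# Hypothetical input: 100 wins out of 200 games.
rate, low, high = winrate_interval(100, 200)
print(f"{rate:.1%} (roughly {low:.1%} to {high:.1%})")
```

With 200 games per pairing the interval is roughly plus or minus 7 percentage points near a 50% win rate, so a dip of a few points between two consecutive networks is within what noise alone could produce.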

Edit) update : ~LZ#150, LeelaMaster G13
Edit) update : duplicated match fixed
Edit) update 06.28 : ~LZ#152, 20 Block V16
Edit) update 07.01 : ~LZ#153
Edit) update 07.04 : 20 Block V17
Edit) update 07.23 : ~LZ#157

@dzhurak

dzhurak commented Jun 15, 2018

Nice job!

@l1t1

l1t1 commented Jun 15, 2018

nice diagram

@sethtroisi
Member

Nice work! We've been doing something similar in MiniGo, where we play the best network against ~20 older networks (last, 2 ago, 4 ago, 10 ago...). This has reduced the noise in our signal, but we need it more because we often see -200 Elo followed by +200 Elo.

@ryouiki
Author

ryouiki commented Jun 21, 2018

Chart updated

  • LZ#149, LZ#150 matches.
  • LeelaMaster G13 anchor.

@TFiFiE
Contributor

TFiFiE commented Jun 21, 2018

Can you give us the numbers in raw text? One could use those to estimate the self-play inflation factor.

@ryouiki
Author

ryouiki commented Jun 21, 2018

LZ#  vs LZ#117  vs LZ#121  vs LZ#130  vs LM_G13  vs 20Bv15  vs ELF     vs LZ#133  vs 20Bv14
121  58.0%      -          -          -          -          7.0%       -          -
122  -          -          -          -          -          -          -          -
123  -          -          -          -          -          -          -          -
124  61.0%      -          -          -          -          -          -          -
125  62.5%      61.5%      -          -          -          -          -          -
126  71.0%      -          -          -          -          -          -          -
127  73.0%      57.5%      -          -          -          -          -          -
128  70.0%      59.5%      -          -          -          -          -          -
129  71.0%      63.0%      -          -          -          -          -          -
130  70.0%      67.5%      -          -          -          14.0%      -          -
131  -          -          -          -          -          15.0%      -          -
132  -          -          -          -          -          -          -          -
133  78.0%      70.5%      55.5%      26.0%      24.5%      13.5%      -          33.5%
134  -          -          -          -          -          -          -          -
135  69.5%      71.5%      50.5%      -          -          10.5%      54.5%      35.5%
136  -          -          -          -          -          -          -          -
137  77.5%      75.0%      58.5%      34.5%      35.0%      14.5%      51.5%      -
138  81.5%      75.0%      59.5%      -          -          15.5%      54.0%      -
139  79.5%      74.5%      59.0%      41.0%      34.5%      14.5%      49.5%      -
140  -          78.5%      66.0%      -          -          18.5%      55.5%      -
141  80.0%      73.5%      53.5%      36.0%      29.0%      19.5%      55.5%      38.5%
142  82.0%      -          60.5%      -          -          15.5%      60.5%      45.0%
143  -          82.5%      62.5%      -          -          17.0%      65.0%      44.0%
144  80.0%      76.0%      61.0%      41.5%      29.0%      18.5%      60.5%      39.0%
145  81.0%      78.0%      67.5%      39.5%      38.0%      19.0%      69.5%      47.0%
146  78.0%      81.0%      69.0%      42.5%      41.0%      19.5%      68.0%      52.0%
147  86.5%      82.0%      75.0%      48.0%      42.5%      25.5%      -          52.5%
148  79.0%      83.0%      71.0%      48.5%      42.5%      30.0%      -          53.5%
149  83.5%      82.0%      74.0%      48.5%      46.0%      29.5%      77.0%      -
150  82.5%      86.0%      80.5%      54.5%      52.0%      30.5%      -          -
(- = no data)

@TFiFiE
Contributor

TFiFiE commented Jun 21, 2018

If it's 200 games per match, why are some winrates not a multiple of 0.5%?

@pcengine

@ryouiki Could you please add a test against LeelaMaster G9? It seems G9 is a bit stronger than G13...

@ryouiki
Author

ryouiki commented Jun 22, 2018

@TFiFiE Even though I passed the -m10 option, duplicate games can still occur. Duplicate games are not counted.
In the LZ150 vs ELF match there were four duplicate games, all involving a ladder.
LZ150 won 60 games out of 196 = 30.6% win rate.

LZ#150 vs ELFv0 match SGFs

LZ#150 vs ELF Match Detail
# Black: Leela Zero
# BlackCommand: ..\..\..\leelaz.exe -g -v 400 --noponder -t4 -q -m10 -r10 -w ..\..\..\net#150.2b80.gz
# BlackLabel: Leela Zero:0.15
# BlackVersion: 0.15
# Date: June 20, 2018 5:19:47 AM
# Host: ---
# Komi: 7.5
# Referee: -
# Size: 19
# White: Leela Zero
# WhiteCommand: ..\..\..\leelaz.exe -g -v 400 --noponder -t4 -q -m10 -r10 -w ..\..\..\net#elf0.62b5.gz
# WhiteLabel: Leela Zero:0.15
# WhiteVersion: 0.15
# Xml: 0
#
#GAME	RES_B	RES_W	RES_R	ALT	DUP	LEN	TIME_B	TIME_W	CPU_B	CPU_W	ERR	ERR_MSG
0	W+R	W+R	W+R	0	-	184	24.4	38.6	0	0	0	
1	W+R	W+R	W+R	1	-	235	27.3	45	0	0	0	
2	W+R	W+R	W+R	0	-	184	24	35.2	0	0	0	
3	W+R	W+R	W+R	1	-	283	32.9	49.8	0	0	0	
4	B+R	B+R	B+R	0	-	109	12.3	20	0	0	0	
5	W+R	W+R	W+R	1	-	319	32.8	55.9	0	0	0	
6	W+R	W+R	W+R	0	-	162	22.6	33.9	0	0	0	
7	W+R	W+R	W+R	1	-	285	34.3	52.7	0	0	0	
8	B+R	B+R	B+R	0	-	91	10.5	18.8	0	0	0	
9	B+R	B+R	B+R	1	-	156	22.2	31.9	0	0	0	
10	B+R	B+R	B+R	0	-	95	11.7	17.5	0	0	0	
11	W+R	W+R	W+R	1	-	239	32.4	50.7	0	0	0	
12	W+R	W+R	W+R	0	-	274	37.5	58.1	0	0	0	
13	W+R	W+R	W+R	1	-	191	24.3	34.9	0	0	0	
14	B+R	B+R	B+R	0	-	91	9.1	17.3	0	0	0	
15	W+R	W+R	W+R	1	-	183	24.3	38.9	0	0	0	
16	B+R	B+R	B+R	0	-	187	25.2	38.1	0	0	0	
17	W+R	W+R	W+R	1	-	233	29.8	45.5	0	0	0	
18	B+R	B+R	B+R	0	-	117	15.8	24.1	0	0	0	
19	W+R	W+R	W+R	1	-	287	35.3	57.3	0	0	0	
20	B+R	B+R	B+R	0	-	91	9.4	19.1	0	0	0	
21	W+R	W+R	W+R	1	-	179	22.4	33.7	0	0	0	
22	W+R	W+R	W+R	0	-	178	23	36.2	0	0	0	
23	B+R	B+R	B+R	1	-	100	12.9	20.3	0	0	0	
24	W+R	W+R	W+R	0	-	164	20.8	32.5	0	0	0	
25	W+R	W+R	W+R	1	-	267	36.8	57.7	0	0	0	
26	W+R	W+R	W+R	0	-	222	29	44.7	0	0	0	
27	W+R	W+R	W+R	1	-	245	31.4	51.3	0	0	0	
28	W+R	W+R	W+R	0	-	146	20.8	30	0	0	0	
29	W+R	W+R	W+R	1	-	249	29.5	48.7	0	0	0	
30	W+R	W+R	W+R	0	-	184	21.1	31.5	0	0	0	
31	W+R	W+R	W+R	1	-	159	17.9	31.9	0	0	0	
32	W+R	W+R	W+R	0	-	294	39.2	60.2	0	0	0	
33	W+R	W+R	W+R	1	-	183	23.8	34.7	0	0	0	
34	B+R	B+R	B+R	0	-	91	6.8	12.7	0	0	0	
35	B+R	B+R	B+R	1	-	270	35.8	54.5	0	0	0	
36	B+R	B+R	B+R	0	-	91	8.6	12.7	0	0	0	
37	B+R	B+R	B+R	1	-	104	12	18	0	0	0	
38	B+R	B+R	B+R	0	-	91	9.4	13.4	0	0	0	
39	W+R	W+R	W+R	1	-	133	16.8	24.9	0	0	0	
40	W+R	W+R	W+R	0	-	268	31.8	52.8	0	0	0	
41	W+R	W+R	W+R	1	-	221	30	45	0	0	0	
42	W+R	W+R	W+R	0	-	158	19.8	30.5	0	0	0	
43	W+R	W+R	W+R	1	-	297	36.3	57.2	0	0	0	
44	W+R	W+R	W+R	0	-	100	12.2	19	0	0	0	
45	W+R	W+R	W+R	1	-	213	27	42.6	0	0	0	
46	W+R	W+R	W+R	0	-	268	36.2	60.2	0	0	0	
47	W+R	W+R	W+R	1	-	159	20.2	30.8	0	0	0	
48	W+R	W+R	W+R	0	-	138	17.9	27.6	0	0	0	
49	W+R	W+R	W+R	1	-	147	18.9	28.2	0	0	0	
50	W+R	W+R	W+R	0	-	208	27.7	42.1	0	0	0	
51	B+R	B+R	B+R	1	-	214	23	36.7	0	0	0	
52	W+R	W+R	W+R	0	-	176	22.4	32.5	0	0	0	
53	W+R	W+R	W+R	1	-	263	32.7	53.5	0	0	0	
54	W+R	W+R	W+R	0	-	286	36.2	55.9	0	0	0	
55	W+R	W+R	W+R	1	-	153	18.8	27.3	0	0	0	
56	W+R	W+R	W+R	0	-	314	34.5	54.5	0	0	0	
57	W+R	W+R	W+R	1	-	295	37.2	56.1	0	0	0	
58	B+R	B+R	B+R	0	-	91	13.7	21.5	0	0	0	
59	W+R	W+R	W+R	1	-	155	20.2	29.5	0	0	0	
60	W+R	W+R	W+R	0	-	238	30	43.8	0	0	0	
61	W+R	W+R	W+R	1	-	263	30.9	51.1	0	0	0	
62	B+R	B+R	B+R	0	-	91	10	14.3	0	0	0	
63	B+R	B+R	B+R	1	-	186	24	38.7	0	0	0	
64	W+R	W+R	W+R	0	-	284	32.7	52.9	0	0	0	
65	W+R	W+R	W+R	1	-	169	23.8	35.6	0	0	0	
66	W+R	W+R	W+R	0	-	144	17.5	28	0	0	0	
67	W+R	W+R	W+R	1	-	149	16.7	23.7	0	0	0	
68	W+R	W+R	W+R	0	-	188	23.3	35.5	0	0	0	
69	W+R	W+R	W+R	1	-	177	22.4	35.6	0	0	0	
70	W+R	W+R	W+R	0	-	166	21.1	31.6	0	0	0	
71	W+R	W+R	W+R	1	-	159	22.1	34.3	0	0	0	
72	B+R	B+R	B+R	0	-	91	8.5	13.3	0	0	0	
73	W+R	W+R	W+R	1	-	153	22	35.9	0	0	0	
74	B+R	B+R	B+R	0	-	109	13	20	0	0	0	
75	B+R	B+R	B+R	1	-	224	26	40.3	0	0	0	
76	B+R	B+R	B+R	0	-	167	20.3	30	0	0	0	
77	W+R	W+R	W+R	1	-	175	22.3	36.2	0	0	0	
78	W+R	W+R	W+R	0	-	186	25.9	40.3	0	0	0	
79	W+R	W+R	W+R	1	-	207	28.1	43.1	0	0	0	
80	B+R	B+R	B+R	0	-	127	15.2	23.6	0	0	0	
81	W+R	W+R	W+R	1	-	251	28.9	46.7	0	0	0	
82	W+R	W+R	W+R	0	-	146	17.4	26.3	0	0	0	
83	W+R	W+R	W+R	1	-	171	21.8	32.9	0	0	0	
84	W+R	W+R	W+R	0	-	136	16.1	25.4	0	0	0	
85	W+R	W+R	W+R	1	-	227	32.5	49.3	0	0	0	
86	W+R	W+R	W+R	0	-	144	18.6	30.4	0	0	0	
87	W+R	W+R	W+R	1	-	155	20.4	27.8	0	0	0	
88	W+R	W+R	W+R	0	-	306	39.1	58.6	0	0	0	
89	B+R	B+R	B+R	1	-	142	18.8	29.7	0	0	0	
90	B+R	B+R	B+R	0	-	133	16.1	22.6	0	0	0	
91	W+R	W+R	W+R	1	-	185	23.4	34.5	0	0	0	
92	W+R	W+R	W+R	0	-	212	24.9	37.7	0	0	0	
93	W+R	W+R	W+R	1	-	223	30.2	48	0	0	0	
94	W+R	W+R	W+R	0	-	132	15.9	24	0	0	0	
95	W+R	W+R	W+R	1	-	259	27.3	44.7	0	0	0	
96	W+R	W+R	W+R	0	-	182	24.2	37.8	0	0	0	
97	B+R	B+R	B+R	1	-	120	15.4	24.2	0	0	0	
98	B+R	B+R	B+R	0	20?	91	9.9	19.3	0	0	0	
99	W+R	W+R	W+R	1	-	241	30.1	47.6	0	0	0	
100	W+R	W+R	W+R	0	-	296	34.9	56.6	0	0	0	
101	W+R	W+R	W+R	1	-	91	11.2	15.8	0	0	0	
102	B+R	B+R	B+R	0	-	91	11	19	0	0	0	
103	B+R	B+R	B+R	1	-	92	12.7	18.5	0	0	0	
104	B+R	B+R	B+R	0	-	91	11.4	19.4	0	0	0	
105	B+R	B+R	B+R	1	-	92	11.3	17.8	0	0	0	
106	W+R	W+R	W+R	0	-	216	27.4	40.6	0	0	0	
107	W+R	W+R	W+R	1	-	143	17.6	26.4	0	0	0	
108	B+R	B+R	B+R	0	-	91	9.1	19.4	0	0	0	
109	B+R	B+R	B+R	1	-	164	21.2	32.4	0	0	0	
110	B+R	B+R	B+R	0	-	121	14	23.3	0	0	0	
111	B+R	B+R	B+R	1	-	148	17.8	28.9	0	0	0	
112	B+R	B+R	B+R	0	-	157	20.7	29.9	0	0	0	
113	W+R	W+R	W+R	1	-	295	35.4	58.4	0	0	0	
114	W+R	W+R	W+R	0	-	282	38.2	57.8	0	0	0	
115	W+R	W+R	W+R	1	-	217	31	44.2	0	0	0	
116	W+R	W+R	W+R	0	-	106	13.5	20.1	0	0	0	
117	B+R	B+R	B+R	1	-	280	33.9	54.9	0	0	0	
118	B+R	B+R	B+R	0	-	133	15.8	21.4	0	0	0	
119	B+R	B+R	B+R	1	-	214	28	42.8	0	0	0	
120	W+R	W+R	W+R	0	-	114	14.5	22	0	0	0	
121	W+R	W+R	W+R	1	-	203	24.7	36.4	0	0	0	
122	W+R	W+R	W+R	0	-	174	22.9	36.8	0	0	0	
123	W+R	W+R	W+R	1	-	251	32.3	50.9	0	0	0	
124	W+R	W+R	W+R	0	-	188	26.3	43.4	0	0	0	
125	W+R	W+R	W+R	1	-	293	41.5	62.7	0	0	0	
126	B+R	B+R	B+R	0	72?	91	8.7	13.2	0	0	0	
127	W+R	W+R	W+R	1	-	203	23.5	35.4	0	0	0	
128	W+R	W+R	W+R	0	-	198	25.8	39	0	0	0	
129	W+R	W+R	W+R	1	-	203	26.5	40.9	0	0	0	
130	W+R	W+R	W+R	0	-	168	22.2	35.2	0	0	0	
131	W+R	W+R	W+R	1	-	283	35.5	52.5	0	0	0	
132	W+R	W+R	W+R	0	-	286	32.5	53.5	0	0	0	
133	B+R	B+R	B+R	1	-	92	11.6	18.7	0	0	0	
134	B+R	B+R	B+R	0	-	91	13.1	21.7	0	0	0	
135	W+R	W+R	W+R	1	-	213	27.6	43.4	0	0	0	
136	B+R	B+R	B+R	0	-	173	21.5	32.4	0	0	0	
137	B+R	B+R	B+R	1	-	92	6.6	11.3	0	0	0	
138	W+R	W+R	W+R	0	-	172	21.4	32.4	0	0	0	
139	W+R	W+R	W+R	1	-	169	19.4	29.5	0	0	0	
140	W+R	W+R	W+R	0	-	248	31.7	49.1	0	0	0	
141	W+R	W+R	W+R	1	-	95	11.8	18.8	0	0	0	
142	W+R	W+R	W+R	0	-	180	23.6	35	0	0	0	
143	B+R	B+R	B+R	1	-	94	9.1	13.3	0	0	0	
144	W+R	W+R	W+R	0	-	182	22.9	35.6	0	0	0	
145	B+R	B+R	B+R	1	-	194	23.7	38.5	0	0	0	
146	B+R	B+R	B+R	0	-	91	10.9	20.8	0	0	0	
147	W+R	W+R	W+R	1	-	167	21.7	35.4	0	0	0	
148	W+R	W+R	W+R	0	-	214	27.8	45	0	0	0	
149	B+R	B+R	B+R	1	-	182	20.8	34.5	0	0	0	
150	W+R	W+R	W+R	0	-	238	29.9	48	0	0	0	
151	B+R	B+R	B+R	1	-	102	12.9	20.2	0	0	0	
152	B+R	B+R	B+R	0	-	95	9.4	17	0	0	0	
153	W+R	W+R	W+R	1	-	183	25.7	37	0	0	0	
154	B+R	B+R	B+R	0	-	91	11.3	18.1	0	0	0	
155	W+R	W+R	W+R	1	-	291	34.1	55.1	0	0	0	
156	W+R	W+R	W+R	0	-	176	22.9	34.9	0	0	0	
157	W+R	W+R	W+R	1	-	279	38.3	59.9	0	0	0	
158	B+R	B+R	B+R	0	-	91	6.6	15.8	0	0	0	
159	W+R	W+R	W+R	1	-	231	29.1	48.6	0	0	0	
160	W+R	W+R	W+R	0	-	222	28	40.1	0	0	0	
161	W+R	W+R	W+R	1	-	167	23.6	36.1	0	0	0	
162	W+R	W+R	W+R	0	-	100	11.6	16.5	0	0	0	
163	W+R	W+R	W+R	1	-	231	28.9	44.1	0	0	0	
164	W+R	W+R	W+R	0	-	152	19.4	28.9	0	0	0	
165	W+R	W+R	W+R	1	-	133	15.8	24	0	0	0	
166	W+R	W+R	W+R	0	-	266	33.7	50.6	0	0	0	
167	W+R	W+R	W+R	1	-	213	27.4	40.4	0	0	0	
168	B+R	B+R	B+R	0	-	199	23.4	37.7	0	0	0	
169	W+R	W+R	W+R	1	-	117	14.1	20.2	0	0	0	
170	W+R	W+R	W+R	0	-	296	38	58.8	0	0	0	
171	W+R	W+R	W+R	1	-	249	33.7	49.9	0	0	0	
172	W+R	W+R	W+R	0	-	190	23.3	36.6	0	0	0	
173	B+R	B+R	B+R	1	-	180	22	32.4	0	0	0	
174	B+R	B+R	B+R	0	34?	91	7.6	14.5	0	0	0	
175	W+R	W+R	W+R	1	-	331	49.4	79.2	0	0	0	
176	W+R	W+R	W+R	0	-	174	26.7	41.1	0	0	0	
177	B+R	B+R	B+R	1	-	126	17.1	26.4	0	0	0	
178	B+R	B+R	B+R	0	-	159	18.4	27.2	0	0	0	
179	W+R	W+R	W+R	1	-	151	16.9	24.7	0	0	0	
180	W+R	W+R	W+R	0	-	230	28.1	42.6	0	0	0	
181	W+R	W+R	W+R	1	-	157	19.8	32.4	0	0	0	
182	W+R	W+R	W+R	0	-	208	26.8	38.8	0	0	0	
183	W+R	W+R	W+R	1	-	91	14.6	18.2	0	0	0	
184	B+R	B+R	B+R	0	-	135	17.4	24.9	0	0	0	
185	W+R	W+R	W+R	1	-	189	23.7	39.9	0	0	0	
186	B+R	B+R	B+R	0	-	91	10.5	19	0	0	0	
187	W+R	W+R	W+R	1	-	271	33.5	57.6	0	0	0	
188	B+R	B+R	B+R	0	-	93	12	17.3	0	0	0	
189	W+R	W+R	W+R	1	-	189	23.2	34.9	0	0	0	
190	W+R	W+R	W+R	0	-	162	22.2	33.7	0	0	0	
191	W+R	W+R	W+R	1	-	263	32.5	51.8	0	0	0	
192	W+R	W+R	W+R	0	-	150	18.2	28.8	0	0	0	
193	W+R	W+R	W+R	1	-	215	26.6	44.4	0	0	0	
194	B+R	B+R	B+R	0	34?	91	7	12.5	0	0	0	
195	W+R	W+R	W+R	1	-	287	35.8	57.4	0	0	0	
196	B+R	B+R	B+R	0	-	159	20.3	32.4	0	0	0	
197	B+R	B+R	B+R	1	-	194	26	38.1	0	0	0	
198	B+R	B+R	B+R	0	-	121	14.4	22.5	0	0	0	
199	W+R	W+R	W+R	1	-	197	27.5	41.7	0	0	0	

match 20 = match 98
match 72 = match 126
match 34 = match 174, 194
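
The duplicate handling described above can be sketched in a few lines of Python. The example reads result rows in the layout of the match detail table, skips every row whose DUP column is not `-`, and counts wins for the program listed as Black in the header. The function name `dedup_winrate` and the four-row sample are illustrative. The sketch also assumes that RES_R is reported from the point of view of the header's Black program even when colors alternate (ALT=1).

```python
def dedup_winrate(lines):
    """Return (wins, games) for the header's Black program, ignoring duplicates.

    Expected columns: GAME RES_B RES_W RES_R ALT DUP LEN ...
    A DUP value other than '-' (for example '20?') marks a repeated game.
    """
    wins = games = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, header comments and the column row
        fields = line.split()  # works for tab- or space-separated rows
        res_r, dup = fields[3], fields[5]
        if dup != "-":
            continue  # duplicate of an earlier game, not counted
        games += 1
        if res_r.startswith("B+"):
            wins += 1
    return wins, games


# Four rows copied from the table above (whitespace normalised).
SAMPLE = """
#GAME RES_B RES_W RES_R ALT DUP LEN TIME_B TIME_W CPU_B CPU_W ERR ERR_MSG
0 W+R W+R W+R 0 - 184 24.4 38.6 0 0 0
4 B+R B+R B+R 0 - 109 12.3 20 0 0 0
20 B+R B+R B+R 0 - 91 9.4 19.1 0 0 0
98 B+R B+R B+R 0 20? 91 9.9 19.3 0 0 0
"""

wins, games = dedup_winrate(SAMPLE.splitlines())
print(f"{wins}/{games} = {wins / games:.1%}")
```

Reporting the raw `wins/games` pair rather than only a percentage keeps the information about how many unique games each cell is based on.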

@ryouiki
Author

ryouiki commented Jun 22, 2018

@pcengine

Recent networks beat G9 easily.
LM G9 seems much weaker than G13.

LZ# vs LM G09 vs LM G13
149 63.5% 48.5%
150 68.5% 54.5%

each cell represents winrate from 200 matches (400 visits)

@pcengine

@ryouiki Ok, thanks a lot.

@TFiFiE
Contributor

TFiFiE commented Jun 22, 2018

If not every match consists of 200 (unique) games, maybe (also) give it as "wins-losses" or "wins/games" to prevent the unnecessary bit of information loss.

@ryouiki
Author

ryouiki commented Jun 25, 2018

@TFiFiE There were 14 duplicate games out of 24200 in total.
To compensate for the information loss, I ran 14 additional games and updated the table above. 😁

@TFiFiE
Contributor

TFiFiE commented Jun 25, 2018

Your numbers suggest the supposed rating differences of the networks are in reality inflated by a factor of about 4.
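
For readers who want to check this kind of estimate, a measured win rate can be turned into an implied rating gap with the standard logistic Elo formula, diff = 400 * log10(p / (1 - p)). The sketch below is only an illustration: `elo_diff` is a hypothetical helper, and the official rating differences needed to compute an actual inflation factor are not listed in this thread, so they are left out.

```python
import math


def elo_diff(winrate: float) -> float:
    """Elo difference implied by a win rate under the logistic Elo model."""
    if not 0.0 < winrate < 1.0:
        raise ValueError("winrate must be strictly between 0 and 1")
    return 400.0 * math.log10(winrate / (1.0 - winrate))


# Win rates taken from the table above (400 visits, 200 games each).
measured = {
    "LZ#150 vs LZ#130": 0.805,
    "LZ#150 vs LZ#117": 0.825,
}

for pairing, winrate in measured.items():
    print(f"{pairing}: {elo_diff(winrate):+.0f} Elo implied by the measured win rate")
```

Under this reading, an inflation factor would be the official rating gap between two networks divided by the gap implied by their measured head-to-head win rate.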

@diadorak

An alternative interpretation is that higher visit counts amplify strength differences.

@l1t1

l1t1 commented Jun 26, 2018

Please add a comparison against 20b v16.

@ryouiki
Author

ryouiki commented Jun 28, 2018

Chart updated

  • LZ#151, LZ#152 matches.
  • 20Block V16 anchor.

@ryouiki
Author

ryouiki commented Jul 4, 2018

Chart updated

  • LZ#153 matches.
  • 20Block V17 anchor.

LZ#  vs LZ#117  vs LZ#121  vs LZ#130  vs LM G13  vs 20Bv15  vs 20Bv16  vs 20Bv17  vs ELF
121  58.0%      -          -          -          -          -          -          7.0%
122  -          -          -          -          -          -          -          -
123  -          -          -          -          -          -          -          -
124  61.0%      -          -          -          -          -          -          -
125  62.5%      61.5%      -          -          -          -          -          -
126  71.0%      -          -          -          -          -          -          -
127  73.0%      57.5%      -          -          -          -          -          -
128  70.0%      59.5%      -          -          -          -          -          -
129  71.0%      63.0%      -          -          -          -          -          -
130  70.0%      67.5%      -          -          -          -          -          14.0%
131  -          -          -          -          -          -          -          15.0%
132  -          -          -          -          -          -          -          -
133  78.0%      70.5%      55.5%      26.0%      24.5%      26.5%      -          13.5%
134  -          -          -          -          -          -          -          -
135  69.5%      71.5%      50.5%      -          -          -          -          10.5%
136  -          -          -          -          -          -          -          -
137  77.5%      75.0%      58.5%      34.5%      35.0%      28.0%      24.0%      14.5%
138  81.5%      75.0%      59.5%      -          -          -          -          15.5%
139  79.5%      74.5%      59.0%      41.0%      34.5%      27.0%      26.0%      14.5%
140  -          78.5%      66.0%      -          -          -          -          18.5%
141  80.0%      73.5%      53.5%      36.0%      29.0%      27.5%      29.5%      19.5%
142  82.0%      -          60.5%      -          -          -          -          15.5%
143  -          82.5%      62.5%      -          -          -          -          17.0%
144  80.0%      76.0%      61.0%      41.5%      29.0%      32.0%      30.5%      18.5%
145  81.0%      78.0%      67.5%      39.5%      38.0%      43.0%      -          19.0%
146  78.0%      81.0%      69.0%      42.5%      41.0%      26.5%      36.0%      19.5%
147  86.5%      82.0%      75.0%      48.0%      42.5%      37.5%      -          25.5%
148  79.0%      83.0%      71.0%      48.5%      42.5%      38.0%      39.5%      30.0%
149  83.5%      82.0%      74.0%      48.5%      46.0%      41.5%      40.0%      29.5%
150  82.5%      86.0%      80.5%      54.5%      52.0%      46.5%      43.5%      30.5%
151  90.5%      81.0%      75.5%      52.0%      46.0%      46.5%      39.5%      33.5%
152  85.0%      81.5%      76.0%      57.0%      44.5%      51.5%      39.5%      33.0%
153  -          -          79.5%      60.0%      55.0%      51.0%      47.0%      32.0%
(- = no data)

@l1t1

l1t1 commented Jul 4, 2018

best_5b.txt.gz
Someone uploaded this file in a QQ group, saying that a foreigner trained it with more games and that it is stronger than the official best 6b. Can you test it?

@alreadydone alreadydone mentioned this issue Jul 4, 2018
@ryouiki
Author

ryouiki commented Jul 4, 2018

I'll test the 64x5 net. (trained by @NhanHo)
Early result :

  vs new 64x5
LZ#57 (64x5) 18.50%
LZ#91 (128x6) 78.50%

each cell represents winrate from 200 matches (400 visits)

@ryouiki
Author

ryouiki commented Jul 5, 2018

LZ network winrates vs new 64x5

[Chart: 400visits200games_new5b]

Much better than the official best 5-block network (LZ#57) and similar to LZ#75.

@jokkebk

jokkebk commented Jul 5, 2018

Seems impressive. This net would probably work nicely for mobile?

@l1t1

l1t1 commented Jul 6, 2018

How about the strongest 15b weights, 4dad? Maybe LZ#153 will be the last king of 15b.

[Image: 2018-07-06_083331]

http://storage.sjeng.org/networks/4dad2d0d72e0ea54f8be4eb366095c9255b92bbd816ccb28ac3c28fb037cbb6d.gz

@ryouiki
Author

ryouiki commented Jul 6, 2018

@l1t1 Maybe I could try. However, the result will become worthless once a legitimate promotion happens.

@l1t1

l1t1 commented Jul 6, 2018

gcp said in a post that he would start the next size after a reasonably long period

@l1t1

l1t1 commented Jul 6, 2018

#1113
gcp
commented 10 days ago
I expect this to be close to parity, so I'll switch if the next learning rate drop (around 200k non-promoted) doesn't produce anything.

@wonderingabout
Contributor

@ryouiki can you upload the 200-game archive of LZ 153 vs 20b v17? I find the games more convenient to watch in my local SGF reader.

thanks

@ryouiki
Author

ryouiki commented Jul 9, 2018

@wonderingabout Here is LZ153 vs 20B v17

LZ153vs20V17(v400).zip

Feel free to request any SGFs you would like to watch.

@wonderingabout
Contributor

thanks
Another reason I want them is that these SGFs show more diversity than the official matches.

@wonderingabout
Contributor

wonderingabout commented Jul 9, 2018

btw, I wanted to mention last time that around 5-10% of the games in the old archive, and in this one too, are wrongly resigned:
one side is clearly winning and then resigns.
For example, look at leelazero-1.sgf in the v17 archive.

i linked it here: http://eidogo.com/#3EYojEPLI

edit : game 9 of the archive too
http://eidogo.com/#2fFTd0kZs

edit 2 : game 15 of the archive seems so too
http://eidogo.com/#1uR7CWFE5

edit 3: game 21 too it seems
http://eidogo.com/#FpkG2wI4

@ThorAvaTahr

hmm that's frustrating, where does this happen? In official games too?

@ThorAvaTahr

(btw I would also resign if my opponent would continue playing in such a bad position :P , but I should assume leelaz has no such emotions :) )

@ryouiki
Author

ryouiki commented Jul 10, 2018

This diversity comes from the lower visit count.
That is why I test against multiple anchors to confirm strength.

FYI, I use the -r10 option to speed up testing, while official matches use -r5.

@wonderingabout
Contributor

i see, but the data should be taken with a 10% delta considering the wrongly resigned games, i think
however, it is interesting because of diversity

@ryouiki
Author

ryouiki commented Jul 10, 2018

I'm testing 20B V18 and LZ#154. However, they are not performing as well as expected so far.

@l1t1

l1t1 commented Jul 10, 2018

@ryouiki do you mean they are both weaker than their Elo suggests?

@ryouiki
Author

ryouiki commented Jul 11, 2018

In my test, 20 block V18 was inferior to V17.
LZ nets mostly got higher win rates against V18 (5 of the 7 networks below).

LZ# vs 20B V17 vs 20B V18
139 26% 34.5%
144 30.5% 31%
146 36% 32.5%
150 43.5% 46.5%
151 39.5% 44%
152 39.5% 40.5%
153 47% 44.5%

each cell represents winrate from 200 matches (400 visits)

@l1t1

l1t1 commented Jul 11, 2018

@l1t1

l1t1 commented Jul 11, 2018

20b v18 failed at 0.3

@diadorak

Could you also run the tests against LeelaZero + PhoenixGo's weights (#1477 )? Another external benchmark would be very interesting! Thanks.

@ryouiki
Author

ryouiki commented Jul 20, 2018

I tried! But LZ+PhoenixGo was not stable enough for a large number of games. It crashed after 30-40 games. 😟

@wonderingabout
Contributor

wonderingabout commented Jul 20, 2018

@ryouiki may I kindly ask you again for the game archives of LZ 156 vs 20b v17 and LZ 157 vs 20b v17?
I'd like to have a look at more interesting games.

thanks

edit: v18 and v19 turned out to be weaker, so I'm more interested in seeing how v17 plays against the latest official LZ networks

@22nsuk

22nsuk commented Jul 21, 2018

Is it possible to add GX37?
Thank you for your work.

@ryouiki
Author

ryouiki commented Jul 23, 2018

LZ#156vs20V17(v400).zip
LZ#157vs20V17(v400).zip

Graph updated. (some data points are missing though)

@wonderingabout
Contributor

Thanks a lot, I greatly appreciate it.

@diadorak

Wow it seems #157 is quite a bit stronger than #156 using ELF as the benchmark.

@wonderingabout
Contributor

@ryouiki

Hi again ryouiki,
do you plan to continue this series?
If so, I'd really like to see the game archives of LZ 157 vs the latest LZ (162 for now).

thanks again

@wonderingabout
Contributor

wonderingabout commented Aug 10, 2018

@ryouiki
I'm analyzing some of these low-visit games, and they are interesting for the diversity they include.
If you see my message and want to continue this series, I'd be very thankful to see LZ 157 vs LZ 162+163, still at 400 visits (there are plenty of 1600/3200-visit games in matches or training, but I find these games have their own interest).

edit: if you do it for LZ 164, I'd prefer LZ 163+164 vs LZ 157

@TFiFiE TFiFiE mentioned this issue Mar 31, 2019