Extract the first $100\,000$ games in algebraic notation.
```bash
gunzip all_with_filtered_anotations_since1998.txt.zip -cv | \
tail -n +6 | \
head -n 100000 | \
sed '{s/^.*### //}' | \
sed '{s/[WB][0-9]*\.//g}' > first_100000_games_raw_AN.txt
```
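For readers who prefer to do the cleanup in Python, the two `sed` substitutions can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function name `clean_line` and the sample line are hypothetical, and the sketch only assumes what the `sed` patterns themselves imply (a `### ` separator before the moves and prefixes of the form `W<number>.` or `B<number>.`).

```python
import re

def clean_line(line: str) -> str:
    """Reduce one raw dataset line to a space-separated list of SAN moves."""
    # Same as sed 's/^.*### //': drop everything up to and including "### ".
    moves = re.sub(r'^.*### ', '', line)
    # Same as sed 's/[WB][0-9]*\.//g': strip prefixes such as "W1." or "B12.".
    moves = re.sub(r'[WB][0-9]*\.', '', moves)
    # Unlike the sed pipeline, also trim surrounding whitespace.
    return moves.strip()

# Hypothetical input line, made up for illustration (not copied from the dataset).
sample = 'some metadata fields ### W1.d4 B1.d5 W2.c4 B2.e6'
print(clean_line(sample))
```

The first substitution is greedy, as in `sed`, so it removes everything up to the last `### ` on the line.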

In [None]:
# Print the first two games to check the extracted format.
with open('first_100000_games_raw_AN.txt') as raw_data:
    for line in raw_data.readlines()[:2]:
        print(line)

d4 d5 c4 e6 Nc3 Nf6 cxd5 exd5 Bg5 Be7 e3 Ne4 Bxe7 Nxc3 Bxd8 Nxd1 Bxc7 Nxb2 Rb1 Nc4 Bxc4 dxc4 Ne2 O-O Nc3 b6 d5 Na6 Bd6 Rd8 Ba3 Bb7 e4 f6 Ke2 Nc7 Rhd1 Ba6 Ke3 Kf7 g4 g5 h4 h6 Rh1 Re8 f3 Bb7 hxg5 fxg5 d6 Nd5+ Nxd5 Bxd5 Rxh6 c3 d7 Re6 Rh7+ Kg8 Rbh1 Bc6 Rh8+ Kf7 Rxa8 Bxd7 Rh7+ 

e4 d5 exd5 Qxd5 Nc3 Qa5 d4 Nf6 Nf3 c6 Ne5 Bf5 g4 Be4 f3 Bd5 a3 Nbd7 Be3 Nxe5 dxe5 Nxg4 Bd4 e6 b4 Qd8 Nxd5 Qxd5 c4 Ne3 cxd5 Nxd1 dxc6 bxc6 Rxd1 Be7 Ba6 O-O Ke2 Rab8 Rc1 Rfd8 Rhd1 c5 Bxc5 Rxd1 Rxd1 Bxc5 bxc5 g6 c6 Rb2+ Rd2 



In [14]:
import chess
import chess.engine
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
import time

In [22]:
EVALS_CNT = 100000  # target number of unique positions to evaluate
engine = chess.engine.SimpleEngine.popen_uci('../stockfish_14.1_linux_x64_bmi2/stockfish_14.1_linux_x64_bmi2')
engine.configure({'Threads': 16})

evaluations = {}  # EPD string -> score from White's point of view
with open('../first_100000_games_raw_AN.txt') as raw_data:
    for line in tqdm(raw_data.readlines()):
        # The target is checked only between games, so the final count
        # can slightly exceed EVALS_CNT.
        if len(evaluations) >= EVALS_CNT:
            break
        board = chess.Board()
        for move_alg_not in line.split():
            if len(evaluations) % (EVALS_CNT // 100) == 0:
                print(f'Got {len(evaluations)} out of {EVALS_CNT} evaluations.')
            try:
                board.push_san(move_alg_not)
            except ValueError as err:
                # Illegal or unparsable move: report it and skip the rest of the game.
                print(err)
                break
            epd = board.epd()
            if epd in evaluations:
                continue  # position already evaluated
            # Mate scores are mapped to large values via mate_score.
            evaluations[epd] = engine.analyse(
                board,
                chess.engine.Limit(time=.432),
                info=chess.engine.INFO_SCORE
            )['score'].white().score(mate_score=100000)
engine.quit()

  0%|          | 0/100000 [00:00<?, ?it/s]

Got 0 out of 100000 evaluations.


  0%|          | 14/100000 [06:55<907:15:34, 32.67s/it] 

Got 1000 out of 100000 evaluations.


  0%|          | 31/100000 [14:07<632:43:20, 22.79s/it]

Got 2000 out of 100000 evaluations.


  0%|          | 45/100000 [21:13<696:23:26, 25.08s/it] 

Got 3000 out of 100000 evaluations.


  0%|          | 62/100000 [28:23<631:45:39, 22.76s/it]

Got 4000 out of 100000 evaluations.


  0%|          | 80/100000 [35:41<773:42:49, 27.88s/it]

Got 5000 out of 100000 evaluations.


  0%|          | 96/100000 [42:27<699:28:57, 25.21s/it]

Got 6000 out of 100000 evaluations.


  0%|          | 112/100000 [49:55<768:48:23, 27.71s/it]

Got 7000 out of 100000 evaluations.


  0%|          | 130/100000 [57:13<715:08:25, 25.78s/it]

Got 8000 out of 100000 evaluations.


  0%|          | 145/100000 [1:04:25<897:36:54, 32.36s/it]

Got 9000 out of 100000 evaluations.


  0%|          | 162/100000 [1:11:22<621:11:56, 22.40s/it]

Got 10000 out of 100000 evaluations.


  0%|          | 177/100000 [1:18:13<743:18:09, 26.81s/it]

Got 11000 out of 100000 evaluations.


  0%|          | 191/100000 [1:25:47<807:54:43, 29.14s/it] 

Got 12000 out of 100000 evaluations.


  0%|          | 206/100000 [1:33:04<694:53:15, 25.07s/it]

Got 13000 out of 100000 evaluations.


  0%|          | 222/100000 [1:40:30<746:26:33, 26.93s/it]

Got 14000 out of 100000 evaluations.


  0%|          | 239/100000 [1:47:52<951:27:36, 34.33s/it]

Got 15000 out of 100000 evaluations.


  0%|          | 254/100000 [1:54:53<574:28:28, 20.73s/it] 

Got 16000 out of 100000 evaluations.


  0%|          | 264/100000 [1:59:43<581:31:56, 20.99s/it]

illegal san: 'Qe2' in rn3rk1/pb1qbppp/1p1ppn2/2p5/5PP1/3P1N1P/PPPNP1B1/R1BQ1RK1 w - - 1 10


  0%|          | 266/100000 [2:00:23<529:24:27, 19.11s/it]

illegal san: 'Nbd2' in r1bqk2r/1ppn1ppp/p1n1p3/2bpP3/5P2/2P2NP1/PP1P2BP/RNBQK2R w KQkq - 0 8


  0%|          | 269/100000 [2:01:56<703:28:27, 25.39s/it]

Got 17000 out of 100000 evaluations.


  0%|          | 283/100000 [2:08:51<1087:16:09, 39.25s/it]

Got 18000 out of 100000 evaluations.


  0%|          | 300/100000 [2:16:36<701:22:47, 25.33s/it] 

Got 19000 out of 100000 evaluations.


  0%|          | 315/100000 [2:24:01<1119:59:28, 40.45s/it]

Got 20000 out of 100000 evaluations.


  0%|          | 327/100000 [2:30:53<1040:52:34, 37.59s/it]

Got 21000 out of 100000 evaluations.


  0%|          | 341/100000 [2:38:04<920:54:56, 33.27s/it] 

Got 22000 out of 100000 evaluations.


  0%|          | 357/100000 [2:45:26<714:25:20, 25.81s/it] 

Got 23000 out of 100000 evaluations.


  0%|          | 375/100000 [2:52:34<564:27:56, 20.40s/it]

Got 24000 out of 100000 evaluations.


  0%|          | 391/100000 [2:59:33<893:31:04, 32.29s/it]

Got 25000 out of 100000 evaluations.


  0%|          | 404/100000 [3:06:59<916:03:37, 33.11s/it] 

Got 26000 out of 100000 evaluations.


  0%|          | 419/100000 [3:14:20<823:27:18, 29.77s/it] 

Got 27000 out of 100000 evaluations.


  0%|          | 433/100000 [3:21:06<1048:28:35, 37.91s/it]

Got 28000 out of 100000 evaluations.


  0%|          | 444/100000 [3:28:01<963:40:25, 34.85s/it] 

Got 29000 out of 100000 evaluations.


  0%|          | 459/100000 [3:35:56<761:16:08, 27.53s/it] 

Got 30000 out of 100000 evaluations.


  0%|          | 475/100000 [3:43:02<1010:41:30, 36.56s/it]

Got 31000 out of 100000 evaluations.


  0%|          | 492/100000 [3:50:17<590:35:07, 21.37s/it] 

Got 32000 out of 100000 evaluations.


  1%|          | 508/100000 [3:57:11<768:07:59, 27.79s/it]

Got 33000 out of 100000 evaluations.


  1%|          | 526/100000 [4:04:23<673:30:54, 24.37s/it]

Got 34000 out of 100000 evaluations.


  1%|          | 541/100000 [4:11:52<886:03:54, 32.07s/it]

Got 35000 out of 100000 evaluations.


  1%|          | 557/100000 [4:19:07<801:50:05, 29.03s/it]

Got 36000 out of 100000 evaluations.


  1%|          | 576/100000 [4:26:01<516:49:59, 18.71s/it]

Got 37000 out of 100000 evaluations.


  1%|          | 591/100000 [4:32:56<730:55:34, 26.47s/it]

Got 38000 out of 100000 evaluations.


  1%|          | 607/100000 [4:40:34<812:04:24, 29.41s/it]

Got 39000 out of 100000 evaluations.


  1%|          | 623/100000 [4:48:02<766:58:48, 27.78s/it]

Got 40000 out of 100000 evaluations.


  1%|          | 640/100000 [4:55:07<790:59:26, 28.66s/it]

Got 41000 out of 100000 evaluations.


  1%|          | 656/100000 [5:01:53<676:28:54, 24.51s/it]

Got 42000 out of 100000 evaluations.


  1%|          | 672/100000 [5:09:37<912:41:09, 33.08s/it]

Got 43000 out of 100000 evaluations.


  1%|          | 691/100000 [5:16:37<645:21:46, 23.39s/it] 

Got 44000 out of 100000 evaluations.


  1%|          | 705/100000 [5:23:49<663:37:54, 24.06s/it] 

Got 45000 out of 100000 evaluations.


  1%|          | 717/100000 [5:30:45<993:03:59, 36.01s/it] 

Got 46000 out of 100000 evaluations.


  1%|          | 731/100000 [5:38:05<941:36:49, 34.15s/it] 

Got 47000 out of 100000 evaluations.


  1%|          | 749/100000 [5:45:19<598:27:38, 21.71s/it]

Got 48000 out of 100000 evaluations.


  1%|          | 762/100000 [5:52:09<927:10:56, 33.63s/it] 

Got 49000 out of 100000 evaluations.


  1%|          | 775/100000 [5:59:46<861:19:40, 31.25s/it] 

Got 50000 out of 100000 evaluations.


  1%|          | 791/100000 [6:07:03<773:10:34, 28.06s/it]

Got 51000 out of 100000 evaluations.


  1%|          | 805/100000 [13:41:59<76822:02:36, 2788.04s/it] 

Got 52000 out of 100000 evaluations.


  1%|          | 819/100000 [13:48:49<1370:12:55, 49.74s/it]   

Got 53000 out of 100000 evaluations.


  1%|          | 834/100000 [13:56:17<922:27:27, 33.49s/it] 

Got 54000 out of 100000 evaluations.


  1%|          | 850/100000 [14:03:33<863:14:59, 31.34s/it]

Got 55000 out of 100000 evaluations.


  1%|          | 867/100000 [14:10:49<830:23:16, 30.16s/it]

Got 56000 out of 100000 evaluations.


  1%|          | 885/100000 [14:24:43<639:06:42, 23.21s/it]  

Got 57000 out of 100000 evaluations.


  1%|          | 901/100000 [14:32:02<748:57:03, 27.21s/it]

Got 58000 out of 100000 evaluations.


  1%|          | 916/100000 [14:39:16<914:32:14, 33.23s/it] 

Got 59000 out of 100000 evaluations.


  1%|          | 934/100000 [14:46:37<695:29:45, 25.27s/it]

Got 60000 out of 100000 evaluations.


  1%|          | 953/100000 [14:53:38<764:52:03, 27.80s/it]

Got 61000 out of 100000 evaluations.


  1%|          | 971/100000 [15:01:07<860:04:37, 31.27s/it]

Got 62000 out of 100000 evaluations.


  1%|          | 988/100000 [15:08:16<676:54:36, 24.61s/it]

Got 63000 out of 100000 evaluations.


  1%|          | 1002/100000 [15:15:03<879:11:17, 31.97s/it]

Got 64000 out of 100000 evaluations.


  1%|          | 1017/100000 [15:22:12<835:29:37, 30.39s/it]

Got 65000 out of 100000 evaluations.


  1%|          | 1031/100000 [15:30:01<625:58:17, 22.77s/it] 

Got 66000 out of 100000 evaluations.


  1%|          | 1045/100000 [15:36:45<600:29:01, 21.85s/it]

Got 67000 out of 100000 evaluations.


  1%|          | 1061/100000 [15:44:28<743:47:06, 27.06s/it]

Got 68000 out of 100000 evaluations.


  1%|          | 1079/100000 [15:51:37<389:50:02, 14.19s/it]

Got 69000 out of 100000 evaluations.


  1%|          | 1095/100000 [15:58:56<565:31:29, 20.58s/it]

Got 70000 out of 100000 evaluations.


  1%|          | 1110/100000 [16:06:05<779:04:04, 28.36s/it] 

Got 71000 out of 100000 evaluations.


  1%|          | 1126/100000 [16:13:27<597:21:27, 21.75s/it]

Got 72000 out of 100000 evaluations.


  1%|          | 1143/100000 [16:20:42<790:09:47, 28.77s/it]

Got 73000 out of 100000 evaluations.


  1%|          | 1156/100000 [16:27:56<724:13:26, 26.38s/it] 

Got 74000 out of 100000 evaluations.


  1%|          | 1172/100000 [16:34:40<628:23:11, 22.89s/it] 

Got 75000 out of 100000 evaluations.


  1%|          | 1186/100000 [16:41:40<878:50:52, 32.02s/it] 

Got 76000 out of 100000 evaluations.


  1%|          | 1201/100000 [16:49:31<784:30:41, 28.59s/it] 

Got 77000 out of 100000 evaluations.


  1%|          | 1217/100000 [16:56:02<725:29:32, 26.44s/it]

Got 78000 out of 100000 evaluations.


  1%|          | 1234/100000 [17:03:39<681:06:48, 24.83s/it] 

Got 79000 out of 100000 evaluations.


  1%|          | 1247/100000 [17:10:50<952:28:19, 34.72s/it] 

Got 80000 out of 100000 evaluations.


  1%|▏         | 1263/100000 [17:17:51<674:08:51, 24.58s/it]

Got 81000 out of 100000 evaluations.


  1%|▏         | 1277/100000 [17:25:21<735:02:21, 26.80s/it] 

Got 82000 out of 100000 evaluations.


  1%|▏         | 1292/100000 [17:32:26<919:33:20, 33.54s/it]

Got 83000 out of 100000 evaluations.


  1%|▏         | 1311/100000 [17:40:13<756:45:54, 27.61s/it]

Got 84000 out of 100000 evaluations.


  1%|▏         | 1329/100000 [17:47:25<740:33:19, 27.02s/it]

Got 85000 out of 100000 evaluations.


  1%|▏         | 1344/100000 [17:54:02<898:49:21, 32.80s/it]

Got 86000 out of 100000 evaluations.


  1%|▏         | 1365/100000 [18:01:52<676:47:54, 24.70s/it] 

Got 87000 out of 100000 evaluations.


  1%|▏         | 1387/100000 [18:08:57<566:20:25, 20.68s/it]

Got 88000 out of 100000 evaluations.


  1%|▏         | 1398/100000 [18:15:56<956:27:32, 34.92s/it] 

Got 89000 out of 100000 evaluations.


  1%|▏         | 1413/100000 [18:23:39<1045:51:23, 38.19s/it]

Got 90000 out of 100000 evaluations.


  1%|▏         | 1428/100000 [18:30:40<766:13:47, 27.98s/it] 

Got 91000 out of 100000 evaluations.


  1%|▏         | 1443/100000 [18:37:33<652:56:47, 23.85s/it]

Got 92000 out of 100000 evaluations.


  1%|▏         | 1457/100000 [18:45:05<875:37:14, 31.99s/it]

Got 93000 out of 100000 evaluations.


  1%|▏         | 1469/100000 [18:52:28<1036:41:02, 37.88s/it]

Got 94000 out of 100000 evaluations.


  1%|▏         | 1485/100000 [18:59:54<732:48:06, 26.78s/it] 

Got 95000 out of 100000 evaluations.


  2%|▏         | 1501/100000 [19:06:57<545:47:30, 19.95s/it]

Got 96000 out of 100000 evaluations.


  2%|▏         | 1518/100000 [19:14:01<677:28:06, 24.76s/it]

Got 97000 out of 100000 evaluations.


  2%|▏         | 1537/100000 [19:20:49<477:28:07, 17.46s/it]

Got 98000 out of 100000 evaluations.


  2%|▏         | 1555/100000 [19:28:31<657:40:39, 24.05s/it]

Got 99000 out of 100000 evaluations.


  2%|▏         | 1574/100000 [19:35:58<603:05:29, 22.06s/it]

Got 100000 out of 100000 evaluations.


  2%|▏         | 1575/100000 [19:36:44<1225:37:11, 44.83s/it]
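A back-of-the-envelope check of the time budget helps in reading the log above. The sketch below assumes that every call to `engine.analyse` uses its full time limit and ignores all other overhead; the constant name `SECONDS_PER_EVAL` is introduced here for illustration only.

```python
EVALS_CNT = 100_000        # target number of positions, as in the cell above
SECONDS_PER_EVAL = 0.432   # value passed to chess.engine.Limit(time=...)

engine_seconds = EVALS_CNT * SECONDS_PER_EVAL
engine_hours = engine_seconds / 3600
print(f'{engine_hours:.1f} hours of engine time')
```

Under these assumptions the engine needs about 12 hours. The logged wall-clock total of 19:36:44 includes a single jump from 6:07:03 to 13:41:59 between two consecutive progress messages; without that gap of roughly 7.5 hours the run matches the estimate closely. The remaining-time figures printed by `tqdm` do not apply here, because the bar counts games while the loop stops on the number of evaluated positions.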


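Reached the target of $100\,000$ evaluations after analyzing $1\,575$ games. The limit is checked only at the start of each game, so the final dictionary holds slightly more than $100\,000$ positions.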
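Because the loop tests the target only between games and prints progress on every move, it can overshoot the target and repeat a progress message while the count sits on a multiple of `EVALS_CNT // 100`. One possible alternative is sketched below with toy data in place of the engine. The helper `collect_unique` and its parameters are hypothetical and were not part of the run shown here.

```python
from typing import Callable, Dict, Hashable, Iterable

def collect_unique(games: Iterable[Iterable[Hashable]],
                   limit: int,
                   evaluate: Callable[[Hashable], int],
                   report_every: int) -> Dict[Hashable, int]:
    """Evaluate each distinct position once and stop at exactly `limit` results."""
    results: Dict[Hashable, int] = {}
    for game in games:
        for position in game:
            if position in results:
                continue  # already evaluated, nothing new to report
            results[position] = evaluate(position)
            # Report only right after the count changes, so each message appears once.
            if len(results) % report_every == 0:
                print(f'Got {len(results)} out of {limit} evaluations.')
            # Check the cap after every insertion, not only between games.
            if len(results) >= limit:
                return results
    return results

# Toy data: strings stand in for EPD keys and len() stands in for the engine call.
toy_games = [['a', 'b', 'c'], ['a', 'b', 'd', 'e']]
toy_results = collect_unique(toy_games, limit=4, evaluate=len, report_every=2)
print(len(toy_results))
```

In the real loop, `position` would be `board.epd()` and `evaluate` would wrap the `engine.analyse` call. An illegal move would still need to end the current game early.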

In [27]:
from_kaggle_df = pd.DataFrame({'epd': epd, 'value': value} for (epd, value) in evaluations.items())
from_kaggle_df

index,epd,value
0,rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR ...,30
1,rnbqkbnr/ppp1pppp/8/3p4/3P4/8/PPP1PPPP/RNBQKBN...,48
2,rnbqkbnr/ppp1pppp/8/3p4/2PP4/8/PP2PPPP/RNBQKBN...,40
3,rnbqkbnr/ppp2ppp/4p3/3p4/2PP4/8/PP2PPPP/RNBQKB...,36
4,rnbqkbnr/ppp2ppp/4p3/3p4/2PP4/2N5/PP2PPPP/R1BQ...,28
...,...,...
100086,6R1/8/4P3/2pB2b1/P1P3k1/3r4/8/4K3 b - -,509
100087,6R1/8/4P3/2pB2b1/P1P2k2/3r4/8/4K3 w - -,887
100088,8/8/4P3/2pB2R1/P1P2k2/3r4/8/4K3 b - -,949
100089,8/8/4P3/2pB2k1/P1P5/3r4/8/4K3 w - -,1016


In [28]:
from_kaggle_df.to_csv('datasets/boards_scores.csv', index=False)

In [29]:
from_kaggle_df = pd.DataFrame({'epd': epd, 'Winner': int(value > 0)} for (epd, value) in evaluations.items())
from_kaggle_df.to_csv('datasets/kaggle.csv', index=False)
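The `Winner` column is a binary label derived from the sign of the score. A small sketch of the mapping, with a hypothetical helper name and made-up scores, shows how the three cases are handled:

```python
def winner_label(score: int) -> int:
    """Return 1 when the score favours White, otherwise 0."""
    return int(score > 0)

# Hypothetical scores, chosen only to show the three cases.
for score in (30, 0, -48):
    print(score, winner_label(score))
```

A score of exactly 0 is labelled 0, so level positions fall into the same class as positions that favour Black.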