<a href="https://colab.research.google.com/github/manjavacas/Data-Mining/blob/master/notebook/chess_mining_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Chess Game Analysis with Data Mining**

  *Data Mining. Academic year 2019/2020.*


*   Alberto Velasco Mata
*   Diego Pedregal Hidalgo
*   Rubén Márquez Villalta
*   Antonio Manjavacas



## **1. DATA COLLECTION**
In this section we collect the initial data.

- We use the `berserk` Python client to access the [*lichess.org*](https://lichess.org) API:

In [0]:
!pip install berserk
import berserk

- We redefine the `export_by_player` function so that we can pass the `opening` and `clocks` parameters, which the library's own version of the function does not expose:

In [0]:
from berserk.formats import PGN, NDJSON
import berserk.models

def export_by_player(self, username, as_pgn=None, since=None, until=None,
                     max=None, vs=None, rated=None, perf_type=None,
                     color=None, analysed=None, moves=None, tags=None,
                     evals=None, opening=None, clocks=None):
    """Stream a user's games, forwarding the `opening` and `clocks` query parameters."""
    path = f'api/games/user/{username}'
    params = {
        'since': since,
        'until': until,
        'max': max,
        'vs': vs,
        'rated': rated,
        'perfType': perf_type,
        'color': color,
        'analysed': analysed,
        'moves': moves,
        'tags': tags,
        # Manually included: booleans become 'true'/'false'; None is kept as None
        # so that the parameter is left out of the request instead of sending 'none'
        'clocks': None if clocks is None else str(clocks).lower(),
        'evals': evals,
        'opening': None if opening is None else str(opening).lower(),
    }
    # Force PGN as the default output format for this client
    self.pgn_as_default = True
    fmt = PGN if (as_pgn if as_pgn is not None else self.pgn_as_default) else NDJSON
    # Stream the response and yield the games one by one
    yield from self._r.get(path, params=params, fmt=fmt, stream=True,
                           converter=berserk.models.Game.convert)
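
The two manually included parameters are sent as lowercase strings. As a minimal sketch of that conversion, assuming unset parameters should simply be left out of the query, the helpers `to_query_value` and `build_params` below are hypothetical and not part of `berserk`:

```python
def to_query_value(value):
    """Hypothetical helper: serialize a Python value for a URL query string."""
    if value is None:
        return None                  # unset: caller should omit the parameter
    if isinstance(value, bool):
        return str(value).lower()    # True -> 'true', False -> 'false'
    return value                     # numbers and strings pass through unchanged


def build_params(**kwargs):
    """Hypothetical helper: drop unset parameters and serialize the rest."""
    return {key: to_query_value(val) for key, val in kwargs.items() if val is not None}


params = build_params(clocks=True, opening=False, evals=None, max=50)
print(params)  # {'clocks': 'true', 'opening': 'false', 'max': 50}
```

Checking `is None` explicitly matters here: calling `str(None).lower()` yields the string `'none'`, which would be sent as a value instead of omitting the parameter.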

- We obtain the IDs of the top n = `USER_COUNT` players on the site **in classical mode** (`PerfType.CLASSICAL`):

In [3]:
USER_COUNT = 200

# API client
client = berserk.Client()

from berserk import PerfType
user_ids = [u['id'] for u in client.users.get_leaderboard(PerfType.CLASSICAL, USER_COUNT)]

print(f'Top {len(user_ids)} users:\n{user_ids}')

import json
with open('user_ids.json', 'w') as f:
  json.dump({'user_ids': user_ids}, f)


Top 200 users:
['wolverines1', 'procellariidae', 'classyplays', 'zugzwang_tv', 'labestia2017', 'king_to_f5', 'rediska_petrovna', 'oloap62', 'osho3058', 'jasrom', 'kaljakeisari', 'allexx7777', 'schemato', 'abdulkader-rifai', 'wittke', 'ibragim_64', 'massterofmayhem', 'angelina-lina', 'brooklynboy', 'arabian_night', 'sodkosodbileg12', 'lord-universe', 'gek76', 'subhra2019', 'caring', 'talaicito', 'h_roy', 'rufat_nasibov', 'augusto1962', 'arteler', 'vadimcernov', 'tiger_tigerov', 'erinyu', 'colakturkey', 'trailo', 'speedcobra', 'rochade_augsburg', 'salasar1955', 'kholpilova', 'marianag', 'mockmorra', 'boodesh', 'nemjeff', 'serg_01', 'thekingburak', 'demarionash', 'nikonovenik', 'oluwadurotimi', 'busonolsun56', 'cizar', 'anwen_digo', 'koh99koh', 'fatkul_c_askar', 'yudhishtira', 'raj1981', 'lookmeintheeye', 'jonathan_wolf', 'thmachine', 'bigty', 'val1957', 'fhchess', 'ulysse06', 'dmachin', 'kisel70', 'philipsen', 'sforgasi009', 'statistical', 'josip_buje', 'kotikribolow', 'leochess67', 'pla

- We obtain the games these players played in **October 2019**. The cell below iterates only over the second half of the list (`user_ids[100:]`):

In [0]:
from datetime import datetime
from berserk import PerfType

START_TIME = datetime(year=2019, month=10, day=1)
END_TIME = datetime(year=2019, month=11, day=1)


games = []
user_ids_subset = user_ids[100:]
for user_id in user_ids_subset:
  # Get list of games for each player
  user_games = list(export_by_player(client, user_id,
                                clocks=True,
                                opening=True,
                                since=int(1000 * START_TIME.timestamp()),
                                until=int(1000 * END_TIME.timestamp()),
                                perf_type=PerfType.CLASSICAL))
  
  games.extend(user_games)
  print(f"> {len(user_games)} games from '{user_id}'")
  

print(f"Got {len(games)} games from {len(user_ids_subset)} top users ({START_TIME.strftime('%d/%m/%Y')} - {END_TIME.strftime('%d/%m/%Y')})")
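
The `since` and `until` arguments in the cell above are passed as Unix timestamps in milliseconds. A `datetime` created without `tzinfo` is interpreted in the local time zone of the machine, so the exact bounds may shift by a few hours depending on where the notebook runs. The sketch below is not what the notebook does: it pins the bounds to UTC, and `to_epoch_ms` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_epoch_ms(dt):
    """Hypothetical helper: convert a datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


# Assumption: the month boundaries are meant in UTC
start = datetime(2019, 10, 1, tzinfo=timezone.utc)
end = datetime(2019, 11, 1, tzinfo=timezone.utc)

start_ms = to_epoch_ms(start)
end_ms = to_epoch_ms(end)
print(start_ms, end_ms)  # 1569888000000 1572566400000
```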

- The result is a set of games in `.pgn` format. For example:

In [5]:
# Sample game
print(games[0])

[Event "Classical Shield Arena"]
[Site "https://lichess.org/FcongsoM"]
[Date "2019.10.11"]
[Round "-"]
[White "TheRealOwnage"]
[Black "bakhtin_va"]
[Result "0-1"]
[UTCDate "2019.10.11"]
[UTCTime "16:00:02"]
[WhiteElo "2182"]
[BlackElo "2223"]
[WhiteRatingDiff "-10"]
[BlackRatingDiff "+16"]
[Variant "Standard"]
[TimeControl "1200+10"]
[ECO "B12"]
[Opening "Caro-Kann Defense: Maróczy Variation"]
[Termination "Time forfeit"]

1. e4 { [%clk 0:10:00] } c6 { [%clk 0:20:00] } 2. d4 { [%clk 0:09:57] } d5 { [%clk 0:20:07] } 3. f3 { [%clk 0:09:55] } e6 { [%clk 0:20:09] } 4. Ne2 { [%clk 0:09:50] } c5 { [%clk 0:19:06] } 5. Nbc3 { [%clk 0:09:45] } Nc6 { [%clk 0:18:19] } 6. exd5 { [%clk 0:09:26] } exd5 { [%clk 0:18:15] } 7. Be3 { [%clk 0:09:23] } c4 { [%clk 0:17:22] } 8. Qd2 { [%clk 0:09:19] } Bb4 { [%clk 0:16:35] } 9. O-O-O { [%clk 0:09:18] } Nge7 { [%clk 0:16:13] } 10. a3 { [%clk 0:08:57] } Ba5 { [%clk 0:16:03] } 11. Nf4 { [%clk 0:08:51] } b5 { [%clk 0:15:36] } 12. g4 { [%clk 0:07:49] } b4 { [%clk
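
The tag pairs and the `%clk` comments of a game like the one above can be extracted with the standard library alone. The following is a rough sketch based on regular expressions, not a full PGN parser. `parse_tags` and `parse_clocks` are hypothetical helper names, and `SAMPLE_PGN` is a shortened excerpt of the sample game.

```python
import re

# Shortened excerpt of the sample game (a few tags and the first two moves)
SAMPLE_PGN = '''[Event "Classical Shield Arena"]
[Site "https://lichess.org/FcongsoM"]
[Result "0-1"]
[WhiteElo "2182"]
[BlackElo "2223"]
[TimeControl "1200+10"]
[ECO "B12"]

1. e4 { [%clk 0:10:00] } c6 { [%clk 0:20:00] } 2. d4 { [%clk 0:09:57] } d5 { [%clk 0:20:07] }
'''

# A tag pair occupies a whole line: [Name "value"]
TAG_RE = re.compile(r'^\[(\w+) "(.*)"\]$', re.MULTILINE)
# A clock annotation has the form [%clk h:mm:ss]
CLK_RE = re.compile(r'\[%clk (\d+):(\d{2}):(\d{2})\]')


def parse_tags(pgn):
    """Return the PGN tag pairs as a dict of strings."""
    return dict(TAG_RE.findall(pgn))


def parse_clocks(pgn):
    """Return every clock reading in seconds, in move order (White, Black, White, ...)."""
    return [int(h) * 3600 + int(m) * 60 + int(s) for h, m, s in CLK_RE.findall(pgn)]


tags = parse_tags(SAMPLE_PGN)
clocks = parse_clocks(SAMPLE_PGN)
print(tags['ECO'], tags['Result'])  # B12 0-1
print(clocks)                       # [600, 1200, 597, 1207]
```

Even-indexed entries of `clocks` belong to White and odd-indexed entries to Black, since the readings alternate with the moves.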

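- We save all the games to a single file (`games.pgn`). The file is opened in append mode (`'a'`), so re-running the cell adds the games again instead of overwriting the file: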

In [0]:
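# Save games as PGN file (append mode; UTF-8 so that names such as "Maróczy" are written correctly)
with open('games.pgn', 'a', encoding='utf-8') as f:
  for game in games:
    f.write(game)
    # Separate consecutive games with blank lines
    f.write("\n\n")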
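
To read the games back from such a file, the concatenated text has to be split into individual games again. The sketch below assumes every game starts with an `[Event "..."]` tag at the beginning of a line, as in the sample above. `split_games` is a hypothetical helper, and the two games in `demo` are made up for illustration.

```python
import re


def split_games(pgn_text):
    """Hypothetical helper: split concatenated PGN text into one string per game."""
    # Assumption: each game begins with an [Event "..."] tag at the start of a line
    chunks = re.split(r'\n+(?=\[Event ")', pgn_text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Two tiny made-up games, joined with blank lines like in the saved file
demo = (
    '[Event "A"]\n[Result "1-0"]\n\n1. e4 e5 1-0' + "\n\n"
    + '[Event "B"]\n[Result "0-1"]\n\n1. d4 d5 0-1' + "\n\n"
)

games_back = split_games(demo)
print(len(games_back))  # 2
```

With the real file, the same function could be applied to the text returned by `open('games.pgn', encoding='utf-8').read()`.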