Reproducibility-Lab

This in-class lab focused on the reproducibility of GitHub repositories.
The lab submission targets Maia-2 (NeurIPS 2024).

This repository contains my reproducibility work for the NeurIPS 2024 paper:

Maia-2: A Unified Model for Human-AI Alignment in Chess

Zhenwei Tang, Difan Jiao, Reid McIlroy-Young, Jon Kleinberg, Siddhartha Sen, Ashton Anderson

Repository Contents

reproducibility_lab.ipynb — Executed Colab notebook with all baseline + variation experiments, plus reflections.

runs/ — Experiment outputs: JSON files for baseline/variation, both batch and position-wise inference.

logs/ — Environment metadata (env_meta.json) and package lockfile (requirements.lock.txt).

reproduce/ —

* Reproducibility.md — Full reproducibility report (environment, commands, results, reflection).

* run_all.sh — Script to automatically reproduce baseline + variation runs and save outputs/logs.

Environment Setup

Clone the upstream Maia-2 repo (not included here):

git clone https://github.com/CSSLab/maia2.git
cd maia2

Create a fresh environment (Python 3.10+ recommended):

python -m venv .venv
source .venv/bin/activate
pip install -U pip

Install dependencies:

pip install chess==1.10.0 einops==0.8.0 gdown==5.2.0 numpy==2.1.3 pandas==2.2.3 \
    pyzstd==0.15.9 requests==2.32.3 torch==2.4.0 tqdm==4.65.0
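
After installing, it can help to confirm that the active environment matches the pins. The following is a minimal sketch, not part of this repository: the helper name check_pins is hypothetical, and it relies only on the standard-library importlib.metadata module. It treats a local build tag such as "+cu121" as matching the pinned base version, which is an assumption about how wheels are labelled on your platform.

```python
from importlib import metadata

# Pins copied from the install command above.
PINNED = {
    "chess": "1.10.0",
    "einops": "0.8.0",
    "gdown": "5.2.0",
    "numpy": "2.1.3",
    "pandas": "2.2.3",
    "pyzstd": "0.15.9",
    "requests": "2.32.3",
    "torch": "2.4.0",
    "tqdm": "4.65.0",
}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore a local build tag (text after "+") when comparing versions.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is installed at the expected version.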

How to Reproduce Results

From the root of this repo, run the shell script:

bash reproduce/run_all.sh

This will:

* Run batch inference (baseline + variation).
* Run position-wise inference (baseline + variation).
* Save all outputs under runs/.
* Save logs and environment snapshots under logs/.

Alternatively, open reproducibility_lab.ipynb and re-execute cells manually in Colab or Jupyter.
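
If you re-run cells manually, the environment snapshot is not captured for you. The sketch below shows one way to record a comparable snapshot using only the standard library. It is an illustration under assumptions: the function names and the field names in the dictionary are hypothetical and do not necessarily match the schema of logs/env_meta.json in this repository.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def collect_env_meta():
    """Gather basic interpreter and platform facts (field names are illustrative)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


def write_env_meta(path):
    """Write the snapshot as JSON, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    meta = collect_env_meta()
    path.write_text(json.dumps(meta, indent=2))
    return meta


if __name__ == "__main__":
    # Print instead of writing, so running the sketch has no side effects.
    print(json.dumps(collect_env_meta(), indent=2))
```

Calling write_env_meta with a path of your choice stores the snapshot alongside your run outputs.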

Expected Results

Batch Inference: Accuracy numbers comparable between baseline and variation (with minor changes depending on batch size/model type).

Position-wise Inference: Predicted moves and win probabilities are logged, showing how predictions change when the Elo parameters are varied.
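
One way to check that baseline and variation accuracies are "comparable" is to compare them against an explicit tolerance. The sketch below is an assumption-laden illustration: the JSON key "accuracy" and the default tolerance of 0.01 are hypothetical, and the files under runs/ may use a different structure, so adjust the key to whatever your outputs contain.

```python
import json
import math
from pathlib import Path


def load_metric(path, key="accuracy"):
    """Read one numeric metric from a JSON file; `key` is a hypothetical field name."""
    with Path(path).open() as fh:
        return float(json.load(fh)[key])


def comparable(baseline, variation, abs_tol=0.01):
    """Return True when the two values differ by at most abs_tol."""
    return math.isclose(baseline, variation, rel_tol=0.0, abs_tol=abs_tol)
```

For example, comparable(load_metric(a), load_metric(b)) reports whether two runs agree within one percentage point when accuracy is stored as a fraction.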

Exact numbers may differ slightly depending on hardware (CPU vs GPU) and random seeds. Our experiments used a random seed of 42 on an A100 GPU in Google Colab.
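
For manual re-runs, the seed can be fixed along the following lines. This is a minimal sketch rather than the exact code used in the notebook: the helper name set_seed is hypothetical, and NumPy and PyTorch are seeded only if they are installed. Fixing seeds reduces run-to-run variation but does not guarantee identical numbers across different hardware.

```python
import random


def set_seed(seed=42):
    """Seed the available random number generators; numpy and torch are optional."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed.
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed.
    return seed
```

Call set_seed(42) once at the start of a session, before any model loading or inference.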

Notes

The upstream repo (CSSLab/maia2) is not included here — only reproducibility artifacts.

Logs and outputs in this repo were generated using Google Colab (GPU runtime).

Non-determinism may arise from GPU differences and API randomness.
