This project aims to use reinforcement learning algorithms to play the game 2048.

Agent playing

This repository is a project about using DQN (Deep Q-Network, a Q-learning method) to play the game 2048, with the environment accelerated using Numba. The algorithm implementation comes from Stable Baselines, and the environment is a custom OpenAI Gym environment. The environment supports two board representations: binary and non-binary. The binary representation uses power-of-two matrices to represent each tile of the board, whereas the non-binary representation uses the raw board matrix.
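A minimal sketch of one way to build a binary observation, assuming it is a stack of 0/1 planes where plane k marks the tiles equal to 2**k (plane 0 marks empty cells). The function name `to_binary_obs`, the channels-last layout, and the choice of 16 planes are assumptions for illustration, not the environment's actual code.

```python
import numpy as np

N_PLANES = 16  # assumption: enough planes for tiles up to 2**15 = 32768


def to_binary_obs(board, n_planes=N_PLANES):
    """Encode a raw 2048 board (tile values, 0 = empty) as one-hot planes.

    Plane k is 1 where the tile equals 2**k; plane 0 marks empty cells.
    Returns an array of shape (rows, cols, n_planes) (hypothetical layout).
    """
    board = np.asarray(board, dtype=np.int64)
    # Exponent of each tile; empty cells keep exponent 0.
    exps = np.zeros(board.shape, dtype=np.int64)
    nonzero = board > 0
    exps[nonzero] = np.log2(board[nonzero]).astype(np.int64)
    # Set exactly one plane per cell using advanced indexing.
    obs = np.zeros(board.shape + (n_planes,), dtype=np.uint8)
    rows, cols = np.indices(board.shape)
    obs[rows, cols, exps] = 1
    return obs
```

With this encoding, a 2048 tile and a 1024 tile activate different planes instead of differing only in magnitude, which is one common reason to prefer it over raw tile values.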

The model uses two types of neural networks as feature extractors: a CNN (Convolutional Neural Network) and an MLP (Multi-Layer Perceptron). The agent performed better with the CNN than with the MLP, probably because a CNN can extract spatial features from the board. With the CNN extractor, the agent reached the 2048 tile in 10% of 1000 games played.
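To illustrate what "spatial features" can mean on a 2048 board, the sketch below slides two hand-picked two-cell filters over the log2 board, which is what a single convolution filter computes before bias and activation. This is only an illustration under that assumption: the filters here are chosen by hand, whereas the project's CNN learns its own filters during training.

```python
import numpy as np


def conv2d_valid(x, kernel):
    """2-D cross-correlation with 'valid' padding (one conv filter, no bias)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out


board = np.array([[2, 2, 4, 8],
                  [0, 4, 4, 16],
                  [2, 8, 16, 32],
                  [0, 0, 2, 2]])
# Work on log2 values so equal tiles give equal inputs; empty cells map to 0.
log_board = np.where(board > 0, np.log2(np.maximum(board, 1)), 0)

# Difference filters: a zero marks two neighbouring cells with equal values
# (a mergeable pair, or two empty cells).
horizontal_diff = conv2d_valid(log_board, np.array([[1.0, -1.0]]))
vertical_diff = conv2d_valid(log_board, np.array([[1.0], [-1.0]]))
```

Because each output only looks at neighbouring cells, the same filter detects a mergeable pair anywhere on the board, something an MLP on a flattened board has to learn separately for every position.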


Optuna is an automatic hyperparameter optimization software framework, designed particularly for machine learning. It features an imperative, define-by-run style user API. Thanks to this define-by-run API, code written with Optuna is highly modular, and users can dynamically construct the search spaces for the hyperparameters.

There is a guide on how to use this library here.


Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.

There is a guide on how to use this library here.
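A sketch of the kind of hot loop Numba can compile: sliding and merging one row to the left following the 2048 rules. The function `merge_row_left` is hypothetical and not taken from this repository; the fallback decorator only lets the sketch run (without the speed-up) when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without Numba (no JIT speed-up).
    def njit(func):
        return func


@njit
def merge_row_left(row):
    """Slide one row to the left and merge equal tiles following 2048 rules.

    Returns the new row and the score gained (sum of the merged tile values).
    Each tile merges at most once per move.
    """
    n = row.shape[0]
    out = np.zeros(n, dtype=row.dtype)
    score = 0
    pos = 0   # next free position in `out`
    held = 0  # tile waiting for a possible merge partner (0 means none)
    for i in range(n):
        v = row[i]
        if v == 0:
            continue  # empty cells are skipped, so the row slides
        if held == v:
            out[pos] = 2 * v  # merge the held tile with this one
            score += 2 * v
            pos += 1
            held = 0
        else:
            if held != 0:
                out[pos] = held  # held tile cannot merge: write it out
                pos += 1
            held = v
    if held != 0:
        out[pos] = held
    return out, score
```

The other three moves can reuse this function by flipping or transposing the board before and after the call. The first call triggers compilation; later calls run as machine code.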


Installing dependencies

pip install -r [requirements_cpu.txt|requirements-gpu.txt]

Choose the appropriate file depending on whether you wish to run the models on a CPU or a GPU.


Using a conda environment

conda env create -f [conda_env_gpu.yml|conda_env_cpu.yml]

To install the environment, execute the following commands:

git clone
cd 2048-gym/gym-game2048/
pip install -e .


usage: [-h] --agent AGENT
                         [--tensorboard-log TENSORBOARD_LOG]
                         [--study-name STUDY_NAME] [--trials TRIALS]
                         [--n-timesteps N_STEPS] [--save-freq SAVE_FREQ]
                         [--save-dir SAVE_DIR] [--log-interval LOG_INTERVAL]
                         [--no-binary] [--seed SEED]
                         [--eval-episodes EVAL_EPISODES]
                         [--extractor EXTRACTOR] [--layer-normalization]
                         [--num-cpus NUM_CPUS] [--layers LAYERS [LAYERS ...]]
                         [--penalty PENALTY] [--load_path LOAD_PATH]
                         [--num_timesteps_log NUM_TIMESTEPS_LOG]

optional arguments:
  -h, --help            show this help message and exit
  --agent AGENT, -ag AGENT
                        Algorithm to use to train the model - DQN, ACER, PPO2
  --tensorboard-log TENSORBOARD_LOG, -tl TENSORBOARD_LOG
                        Tensorboard log directory
  --study-name STUDY_NAME, -sn STUDY_NAME
                        The name of the study used by Optuna to create the
  --trials TRIALS, -tr TRIALS
                        The number of trials for the Optuna optimization - 0
                        is the default setting and keeps trying until the script is
  --n-timesteps N_STEPS, -nt N_STEPS
                        Number of timesteps the model is going to run.
  --save-freq SAVE_FREQ, -sf SAVE_FREQ
                        The interval between model saves.
  --save-dir SAVE_DIR, -sd SAVE_DIR
                        Directory in which to save models
  --log-interval LOG_INTERVAL, -li LOG_INTERVAL
                        Log interval
  --no-binary, -bi      Do not use binary observation space
  --seed SEED           Seed
  --eval-episodes EVAL_EPISODES, -ee EVAL_EPISODES
                        The number of episodes to test after training the
  --extractor EXTRACTOR, -ex EXTRACTOR
                        The extractor used to create the features from
                        observation space - (mlp or cnn)
  --layer-normalization, -ln
                        Use layer normalization - Only for DQN
  --num-cpus NUM_CPUS, -nc NUM_CPUS
                        Number of CPUs to use. DQN only accepts 1
  --layers LAYERS [LAYERS ...], -l LAYERS [LAYERS ...]
                        List of neuron counts for the DQN network. The number
                        of elements in the list is the number of layers.
  --penalty PENALTY, -pe PENALTY
                        How much to penalize the model when it chooses an invalid
  --load_path LOAD_PATH, -lp LOAD_PATH
                        Load model from
  --num_timesteps_log NUM_TIMESTEPS_LOG, -ntl NUM_TIMESTEPS_LOG
                        Continuing timesteps for tensorboard_log
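The flags above follow standard `argparse` conventions. The sketch below is a partial, hypothetical reconstruction of a few of them, mainly to show how `--layers` turns several values into a list whose length is the number of layers. The value types and defaults are assumptions, not the project's actual parser.

```python
import argparse


def build_parser():
    """Partial, hypothetical reconstruction of the training CLI above."""
    parser = argparse.ArgumentParser(description="Train a 2048 agent (sketch)")
    parser.add_argument("--agent", "-ag", required=True,
                        help="Algorithm to use to train the model - DQN, ACER, PPO2")
    parser.add_argument("--extractor", "-ex", default="cnn", choices=["mlp", "cnn"],
                        help="Feature extractor (mlp or cnn); default assumed")
    parser.add_argument("--no-binary", "-bi", action="store_true",
                        help="Do not use binary observation space")
    # nargs="+" collects one or more integers: one entry per hidden layer.
    parser.add_argument("--layers", "-l", type=int, nargs="+", default=[256],
                        help="Neurons per layer (assumed type and default)")
    parser.add_argument("--penalty", "-pe", type=float, default=0.0,
                        help="Penalty for an invalid move (assumed type and default)")
    parser.add_argument("--n-timesteps", "-nt", dest="n_steps", type=int,
                        default=1_000_000, help="Number of timesteps (assumed default)")
    return parser


args = build_parser().parse_args(
    ["--agent", "dqn", "-ex", "cnn", "-l", "256", "128", "-bi"])
```

Here `args.layers` is `[256, 128]`, which would describe a network with two hidden layers.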


Play the game using a trained agent.


Note: it is necessary to change the model path and agent inside
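A minimal sketch of a play loop, assuming the classic Gym API (`reset()` returns an observation, `step()` returns `obs, reward, done, info`) and a Stable Baselines style `model.predict(obs, deterministic=...)` that returns `(action, state)`. The helper `play_episode` is hypothetical and not the repository's play script.

```python
def play_episode(env, model, deterministic=True, max_steps=10_000):
    """Play one game with a trained agent and return the total reward.

    Assumes the classic Gym step API and a Stable Baselines style
    model.predict(obs, deterministic=...) -> (action, state).
    """
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # Deterministic actions show the policy's preferred move each turn.
        action, _state = model.predict(obs, deterministic=deterministic)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

`max_steps` guards against an agent that keeps choosing invalid moves and never ends the game.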


View the best model's actions using Tkinter.


Note: it is necessary to change the pickle game data inside
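The format of the pickled game data is not described here. A minimal sketch, assuming a game is stored as a list of `(board, action)` pairs, shows how such a file could be written and read back before being displayed. `save_game` and `load_game` are hypothetical helpers.

```python
import pickle

import numpy as np


def save_game(path, boards, actions):
    """Store a played game as a list of (board, action) pairs (assumed format)."""
    history = [(np.asarray(b), int(a)) for b, a in zip(boards, actions)]
    with open(path, "wb") as f:
        pickle.dump(history, f)


def load_game(path):
    """Load the list of (board, action) pairs written by save_game."""
    with open(path, "rb") as f:
        return pickle.load(f)
```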


