Training neural networks in TensorFlow 2.0 with 5x less memory

See the paper!

Checkmate breaks the GPU memory wall by enabling researchers to train large state-of-the-art models that do not fit in GPU memory. Checkmate applies optimal tensor rematerialization (as detailed in our paper at MLSys 2020) to trade off space and time.

At the moment, Checkmate only supports TensorFlow 2.0. PyTorch support is coming soon!


Get started with pip install ""

Ensure you have installed either tensorflow-gpu>=2.0.0 or tensorflow.

Quick start

Get started in five minutes with our TF 2.0 quickstart tutorial.

Adapt your Keras model to fit within the memory constraints of a single GPU:

import tensorflow as tf
import checkmate

# Build any Keras model; VGG19 is used here as an example.
model = tf.keras.applications.vgg19.VGG19(...)

# Compile a memory-optimized training step from the model, a loss, and an
# optimizer (loss, optimizer, and sample_input are assumed to be defined).
train_iteration_fn = checkmate.tf2.compile(model, loss, optimizer,
    input_spec=sample_input[0], label_spec=sample_input[1])

# Use the compiled function as a drop-in training iteration.
for image, label in train_ds:
    prediction, loss = train_iteration_fn(image, label)

Key ideas

From our paper at MLSys 2020:

Modern neural networks are increasingly bottlenecked by the limited capacity of on-device
GPU memory. Prior work explores dropping activations as a strategy to scale to larger
neural networks under memory constraints. However, these heuristics assume uniform
per-layer costs and are limited to simple architectures with linear graphs, limiting their
usability. In this paper, we formalize the problem of trading-off DNN training time and
memory requirements as the tensor rematerialization optimization problem, a generalization
of prior checkpointing strategies. We introduce Checkmate, a system that solves for
optimal schedules in reasonable times (under an hour) using off-the-shelf MILP solvers,
then uses these schedules to accelerate millions of training iterations. Our method scales
to complex, realistic architectures and is hardware-aware through the use of
accelerator-specific, profile-based cost models. In addition to reducing training cost,
Checkmate enables real-world networks to be trained with up to 5.1× larger input sizes.
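The space-time trade-off at the heart of rematerialization can be seen on a toy linear chain of layers: storing every activation for the backward pass costs O(n) memory, while keeping only every k-th activation as a checkpoint and recomputing the segments in between cuts peak memory to roughly O(n/k + k), at the cost of an extra forward pass. Checkmate's MILP solver generalizes this to arbitrary graphs and non-uniform layer costs; the sketch below is only the classic uniform-cost heuristic, and all names in it are illustrative, not part of the Checkmate API.

```python
import math

def simulate(n_layers, checkpoint_every=None):
    """Simulate backprop memory/compute over a linear chain of n_layers.

    Returns (peak_stored_activations, total_forward_ops).
    checkpoint_every=None means store every activation (no rematerialization).
    """
    if checkpoint_every is None:
        # Store-all: one forward pass; every activation is live during backward.
        return n_layers, n_layers
    k = checkpoint_every
    num_checkpoints = math.ceil(n_layers / k)
    # Peak memory: all stored checkpoints plus one recomputed segment of size k.
    peak = num_checkpoints + k
    # Compute: the original forward pass plus at most one recompute per layer.
    forward_ops = 2 * n_layers
    return peak, forward_ops

n = 100
print(simulate(n))                                    # store everything
print(simulate(n, checkpoint_every=int(math.sqrt(n))))  # sqrt(n) checkpointing
```

With n = 100 layers, store-all keeps 100 activations live, while checkpointing every 10th layer keeps only 20 at peak in exchange for doubling forward work, the same trade Checkmate makes, but with an optimal rather than heuristic schedule.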


If you use Checkmate in your work, please cite us with:

@article{jain2019checkmate,
  title={Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization},
  author={Jain, Paras and Jain, Ajay and Nrusimha, Aniruddha and Gholami, Amir and
          Abbeel, Pieter and Keutzer, Kurt and Stoica, Ion and Gonzalez, Joseph E},
  journal={arXiv preprint arXiv:1910.02653},
  year={2019}
}