lucasdavid/tf-experiment

TF-Experiment

Template for TensorFlow experiments. Includes loading and mixing of configuration files, model deserialization, and logging to disk and wandb.

Setup

This project can run in several environments. The configuration each one requires is listed below.

Local

You are responsible for setting up every aspect of your local environment, from GPU drivers to Python package dependencies.

When using GPUs, you must have all of TensorFlow's software requirements set up, mainly NVIDIA CUDA and cuDNN.

You must also install the Python dependencies:

pip install -r requirements.txt
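
Once the dependencies are installed, it can help to confirm that TensorFlow actually sees the GPU before launching a long run. The following is a minimal sketch, not part of this repository: the helper names visible_gpus and describe are hypothetical, and only tf.config.list_physical_devices comes from TensorFlow itself.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None when TensorFlow itself cannot be imported, which points to a
    problem with the Python dependencies rather than with the GPU stack.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def describe(gpus):
    """Turn the result of visible_gpus() into a one-line status message."""
    if gpus is None:
        return "TensorFlow is not installed; run pip install -r requirements.txt first."
    if not gpus:
        return "TensorFlow is installed but sees no GPU; check the drivers, CUDA and cuDNN."
    return "TensorFlow sees {} GPU(s): {}".format(len(gpus), ", ".join(gpus))


if __name__ == "__main__":
    print(describe(visible_gpus()))
```

An empty list usually means the driver, CUDA or cuDNN versions do not match what the installed TensorFlow build expects.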

Docker

You can avoid most of this configuration by using Docker, docker-compose and docker-nvidia-container (the latter only if you are using a GPU).

To set up the entire infrastructure, run:

docker-compose build

If you don't have a GPU, or haven't installed docker-nvidia-container, remove the runtime: nvidia entry from the docker-compose.yml file and change BASE_IMAGE to tensorflow/tensorflow:latest-jupyter. Then run the build again, and Docker should set everything up for you.

SDumont Supercomputer

The job submitted to the execution queue must set up the right environment itself.

For example, your job script could be:

#!/bin/bash
#...

nodeset -e $SLURM_JOB_NODELIST
module load gcc/7.4 python/3.9.1 cudnn/8.2_cuda-11.1

pip3.9 install -r requirements.txt

python ...

Running

Local

SOURCE=experiments/train_and_finetune.py
LOGS=./logs/classification/cifar10/train.rn50.noaug

python $SOURCE with                                       \
  config/classification/train_and_finetune.yml            \
  config/classification/datasets/cifar10.yml              \
  setup.paths.ckpt=$LOGS/backup                           \
  -F $LOGS

Docker

The simplest way to run an experiment is to start the container and run the Python interpreter inside it. This amounts to prefixing the previous run command with docker-compose exec tf-experiment. For example:

source config/docker/.env

SOURCE=experiments/train_and_finetune.py
LOGS=./logs/classification/cifar10/train.rn50.noaug


docker-compose exec $SERVICE                                  \
  python -X pycache_prefix=$PYTHONPYCACHEPREFIX $SOURCE with  \
  config/classification/train_and_finetune.yml                \
  setup.paths.ckpt=$LOGS/backup                               \
  -F $LOGS
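
When launching several runs from a script, the same prefixing can be done programmatically. The sketch below is a hypothetical helper, not part of this repository: docker_command, the /tmp/pycache path and the argument list are illustrative, and the service name tf-experiment is taken from the text above.

```python
import shlex


def docker_command(local_command, service, pycache_prefix=None):
    """Prefix a local `python ...` command with `docker-compose exec <service>`."""
    if not local_command or local_command[0] != "python":
        raise ValueError("expected a command starting with 'python'")
    interpreter = ["python"]
    if pycache_prefix:
        # Mirrors the -X pycache_prefix=... option used in the example above.
        interpreter += ["-X", "pycache_prefix=" + pycache_prefix]
    return ["docker-compose", "exec", service] + interpreter + local_command[1:]


# The local run command, split into arguments.
logs = "./logs/classification/cifar10/train.rn50.noaug"
local = [
    "python", "experiments/train_and_finetune.py", "with",
    "config/classification/train_and_finetune.yml",
    "setup.paths.ckpt=" + logs + "/backup",
    "-F", logs,
]

# "/tmp/pycache" is a placeholder value, not the repository's setting.
command = docker_command(local, "tf-experiment", pycache_prefix="/tmp/pycache")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting problems with values such as the wandb tag list.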

To change the components used in an experiment, append additional configuration mixins to the command:

EXPERIMENT=voc12-noaug
EXPERIMENT_TAGS="['voc12', 'rn50']"

docker-compose exec $SERVICE                                  \
  python -X pycache_prefix=$PYTHONPYCACHEPREFIX $SOURCE with  \
  config/classification/train_and_finetune.yml                \
  config/classification/datasets/voc12.yml                    \
  config/augmentation/randaug.yml                             \
  config/logging/wandb.train.yml                              \
  setup.paths.ckpt=$LOGS/backup                               \
  setup.paths.wandb_dir=$LOGS                                 \
  setup.wandb.name=$EXPERIMENT                                \
  setup.wandb.tags="$EXPERIMENT_TAGS"                         \
  -F $LOGS
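
To illustrate how such a command might resolve into a single configuration, here is a minimal sketch. It assumes, without confirmation from this repository, that files are merged in the order given, that later files override earlier ones, and that dotted key=value arguments are applied last. The dictionaries stand in for parsed YAML files and their keys are hypothetical; values are kept as plain strings for simplicity.

```python
import copy


def deep_update(base, updates):
    """Recursively merge `updates` into a copy of `base`; later values win."""
    merged = copy.deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def set_dotted(config, dotted_key, value):
    """Set config["a"]["b"] = value for the dotted key "a.b", creating levels as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        if not isinstance(node.get(part), dict):
            node[part] = {}
        node = node[part]
    node[leaf] = value


def resolve(mixins, overrides):
    """Merge the mixins in order, then apply the key=value overrides."""
    config = {}
    for mixin in mixins:
        config = deep_update(config, mixin)
    for item in overrides:
        key, _, value = item.partition("=")
        set_dotted(config, key, value)
    return config


# Hypothetical stand-ins for the parsed YAML files; the real keys may differ.
base = {"setup": {"paths": {"ckpt": "./ckpt"}}, "dataset": {"name": "cifar10"}}
voc12 = {"dataset": {"name": "voc12"}}
randaug = {"augmentation": {"policy": "randaug"}}

config = resolve(
    [base, voc12, randaug],
    ["setup.paths.ckpt=./logs/backup", "setup.wandb.name=voc12-noaug"],
)
print(config)
```

Under these assumptions the order of the mixins matters: a dataset file placed after the base file replaces only the keys it defines and leaves the rest of the base configuration intact.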

See the runners in runners/docker for examples of how to run experiments in the Docker environment.

SDumont

See the runners in runners/sdumont for examples of how to run experiments on the LNCC Santos Dumont Supercomputer.