Template for TensorFlow experiments. It includes configuration loading and mixing (mixins), model deserialization, and logging to disk and to Weights & Biases (wandb).
This project can be used in several distinct environments: locally, with Docker, or on a SLURM cluster. The configuration each one requires is listed below.
When running locally, you are expected to set up every aspect of your environment yourself, from GPU drivers to Python package dependencies.
When using GPUs, all of TensorFlow's GPU software requirements must be installed, mainly NVIDIA CUDA and cuDNN, in versions compatible with the installed TensorFlow release.
You must also install the Python dependencies:
pip install -r requirements.txt
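After installing the drivers and requirements, it is worth confirming that TensorFlow can actually see the GPU. A minimal check, assuming TensorFlow 2.1 or later (where tf.config.list_physical_devices exists); the helper name visible_gpus is illustrative and not part of this project:

```python
import importlib.util


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or [] if TensorFlow is absent."""
    # Skip gracefully when TensorFlow is not installed in this interpreter.
    if importlib.util.find_spec("tensorflow") is None:
        return []
    import tensorflow as tf

    # tf.config.list_physical_devices is available from TensorFlow 2.1 onwards.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("TensorFlow sees:", ", ".join(gpus))
    else:
        print("No GPU visible: check the NVIDIA driver, CUDA and cuDNN versions.")
```

An empty list on a GPU machine usually points to a driver or CUDA/cuDNN version mismatch rather than to the project code.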
Most of this configuration can be avoided by using Docker, docker-compose and docker-nvidia-container (the last one only if you are using a GPU).
To set up the entire infrastructure, run:
docker-compose build
If you don't have a GPU setup and haven't installed docker-nvidia-container, remove the "runtime: nvidia" entry from the docker-compose.yml file and change BASE_IMAGE to tensorflow/tensorflow:latest-jupyter.
Now run the build, and Docker should set everything up for you.
On a SLURM cluster, the job submitted to the queue must set up the right environment itself. For example, your job script can be:
#!/bin/bash
#...
nodeset -e $SLURM_JOB_NODELIST
module load gcc/7.4 python/3.9.1 cudnn/8.2_cuda-11.1
pip3.9 install -r requirements.txt
python ...
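A training run is started by passing the experiment script its configuration files and overrides, for example:
SOURCE=experiments/train_and_finetune.py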
LOGS=./logs/classification/cifar10/train.rn50.noaug
python $SOURCE with \
config/classification/train_and_finetune.yml \
config/classification/datasets/cifar10.yml \
setup.paths.ckpt=$LOGS/backup \
-F $LOGS
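The arguments after the configuration files, such as setup.paths.ckpt=$LOGS/backup, override single entries of the loaded configuration through dotted paths. The with/-F syntax matches the Sacred command line, which does this parsing itself (and where -F DIR attaches a file-storage observer that records the run under DIR). The sketch below only illustrates the idea: apply_override is a hypothetical helper, and parsing values as Python literals, so that "['voc12', 'rn50']" becomes a list while paths stay strings, is an assumption about how values are interpreted:

```python
import ast
import copy


def apply_override(config, assignment):
    """Return a copy of `config` with one dotted `path=value` assignment applied."""
    path, _, raw = assignment.partition("=")
    try:
        # Assumption: values that look like Python literals (lists, numbers) are parsed.
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Anything else, such as a path or a run name, is kept as a plain string.
        value = raw
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = path.split(".")
    for key in parents:
        child = node.get(key)
        if not isinstance(child, dict):
            # Create (or replace a non-mapping) intermediate level.
            child = node[key] = {}
        node = child
    node[leaf] = value
    return updated


config = {"setup": {"paths": {"ckpt": None}, "wandb": {"tags": []}}}
config = apply_override(config, "setup.paths.ckpt=./logs/backup")
config = apply_override(config, "setup.wandb.tags=['voc12', 'rn50']")
print(config)
# {'setup': {'paths': {'ckpt': './logs/backup'}, 'wandb': {'tags': ['voc12', 'rn50']}}}
```

Returning a copy instead of mutating in place keeps the base configuration reusable across several runs.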
The simplest way to run an experiment is to start the container (e.g. with docker-compose up -d) and run the Python interpreter inside it, which amounts to prepending docker-compose exec tf-experiment to the previous run command. For example:
source config/docker/.env
SOURCE=experiments/train_and_finetune.py
LOGS=./logs/classification/cifar10/train.rn50.noaug
docker-compose exec $SERVICE \
python -X pycache_prefix=$PYTHONPYCACHEPREFIX $SOURCE with \
config/classification/train_and_finetune.yml \
setup.paths.ckpt=$LOGS/backup \
-F $LOGS
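The -X pycache_prefix=DIR option (Python 3.8+) makes the interpreter write compiled bytecode under DIR instead of into __pycache__ folders next to the sources. Presumably this keeps the mounted project tree free of .pyc files, though that motive is an assumption. To check the setting from inside the container, a small helper (bytecode_location is a hypothetical name) can report where the bytecode for a given source file would go:

```python
import importlib.util
import sys


def bytecode_location(source_path):
    """Return (active pycache prefix, path where bytecode for `source_path` is cached)."""
    # sys.pycache_prefix is None unless -X pycache_prefix=DIR or the
    # PYTHONPYCACHEPREFIX environment variable is set (Python 3.8+).
    prefix = sys.pycache_prefix
    # cache_from_source honours the prefix when one is active.
    cached = importlib.util.cache_from_source(source_path)
    return prefix, cached


prefix, cached = bytecode_location("experiments/train_and_finetune.py")
print("pycache_prefix:", prefix)
print("bytecode for the experiment script goes to:", cached)
```

Run it with the same python -X pycache_prefix=... invocation as above; without the option, the reported prefix is None and the bytecode path points into experiments/__pycache__.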
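You can add new mixins to change the components used in an experiment by simply appending them to the command. The example below appends dataset (voc12), augmentation (randaug) and logging (wandb.train) mixins: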
EXPERIMENT=voc12-noaug
EXPERIMENT_TAGS="['voc12', 'rn50']"
docker-compose exec $SERVICE \
python -X pycache_prefix=$PYTHONPYCACHEPREFIX $SOURCE with \
config/classification/train_and_finetune.yml \
config/classification/datasets/voc12.yml \
config/augmentation/randaug.yml \
config/logging/wandb.train.yml \
setup.paths.ckpt=$LOGS/backup \
setup.paths.wandb_dir=$LOGS \
setup.wandb.name=$EXPERIMENT \
setup.wandb.tags="$EXPERIMENT_TAGS" \
-F $LOGS
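How the mixins are combined is not spelled out here. Assuming each YAML file loads into a nested mapping and later files override earlier ones key by key (the behavior of Sacred-style "with a.yml b.yml" updates), the layering can be sketched with plain dictionaries. The functions deep_merge and compose, and the keys in the sample mixins, are hypothetical stand-ins rather than the project's actual code or configuration schema:

```python
import copy


def deep_merge(base, update):
    """Return a new dict with `update` layered over `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists and new keys from the later mixin win outright.
            merged[key] = copy.deepcopy(value)
    return merged


def compose(*mixins):
    """Fold mixins left to right, so later files override earlier ones."""
    config = {}
    for mixin in mixins:
        config = deep_merge(config, mixin)
    return config


# Hypothetical contents standing in for train_and_finetune.yml, voc12.yml and randaug.yml.
base = {"dataset": {"name": "cifar10", "batch_size": 128}, "augmentation": None}
voc12 = {"dataset": {"name": "voc12"}}
randaug = {"augmentation": {"policy": "randaug"}}

config = compose(base, voc12, randaug)
print(config)
# {'dataset': {'name': 'voc12', 'batch_size': 128}, 'augmentation': {'policy': 'randaug'}}
```

Because merging recurses into nested mappings, a dataset mixin only needs to list the keys it changes; everything else is inherited from the files before it.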
See runners/docker for examples of how to run experiments in the Docker environment, and runners/sdumont for examples of how to run them on the LNCC Santos Dumont supercomputer.