<a href="https://colab.research.google.com/github/neuroidss/EEG-GAN-audio-video/blob/main/GPU_Training_Alias_Free_GAN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GPU Training - Alias-Free GAN
by duskvirkus

This is a notebook for training Alias-Free GAN on a Colab GPU instance.

Repository: https://github.com/duskvirkus/alias-free-gan

# GPU check

If this fails, change the runtime type via `Runtime > Change runtime type` and select GPU.

In [None]:
!nvidia-smi -L

GPU 0: Tesla K80 (UUID: GPU-1d414d62-3efc-9e84-a3a5-4ff14511b2bc)
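
The same check can be done from Python, which is handy if you want a later cell to stop early when no GPU is attached. This is a minimal sketch: `parse_gpu_list` and `list_gpus` are hypothetical helper names, and the parsing assumes the `GPU N: name (UUID: ...)` line format shown in the output above.

```python
import shutil
import subprocess


def parse_gpu_list(text):
    """Return the GPU names found in `nvidia-smi -L` style output."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("GPU ") and ":" in line:
            # "GPU 0: Tesla K80 (UUID: ...)" -> "Tesla K80"
            name = line.split(":", 1)[1].split("(UUID", 1)[0].strip()
            names.append(name)
    return names


def list_gpus():
    """Run `nvidia-smi -L`; return [] when the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


gpus = list_gpus()
print(gpus if gpus else "No GPU found: switch the runtime type to GPU.")
```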


## Connect Google Drive

This notebook is designed to be used with Google Drive connected. If you'd like to use it without Google Drive, you'll have to make changes.

The main reason is that Colab sessions automatically shut off after a number of hours (~10 for free, ~20 for Pro, ~24 for Pro+). Training progress is at risk of being lost unless it is saved to persistent storage.

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive
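
Because everything later in the notebook writes under the Drive mount, it can be worth confirming the mount before training starts. A minimal sketch, assuming the mount point used above; `drive_is_mounted` is a hypothetical helper name.

```python
import os


def drive_is_mounted(mount_point="/content/drive"):
    """True when `mount_point` exists and is a mounted filesystem."""
    return os.path.ismount(mount_point)


if not drive_is_mounted():
    print("Google Drive is not mounted; checkpoints would be lost when the session ends.")
```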


## Clone / cd into Repository

In [None]:
import os
drive_path = '/content/drive/MyDrive/'
repo_container_dir = 'colab-alias-free-gan'
repo_name = 'alias-free-gan'
git_repo = 'https://github.com/duskvirkus/alias-free-gan.git'
branch_name = 'stable'

working_dir = os.path.join(drive_path, repo_container_dir, repo_name)

if os.path.isdir(working_dir):
  %cd {working_dir}
else:
  container_path = os.path.join(drive_path, repo_container_dir)
  os.makedirs(container_path, exist_ok=True)  # don't fail if the container directory already exists
  %cd {container_path}
  !git clone --branch {branch_name} {git_repo}
  %cd {repo_name}
  !mkdir pretrained

/content/drive/MyDrive/colab-alias-free-gan/alias-free-gan


## Install Dependencies

In [None]:
!python install.py

Collecting pytorch-lightning
  Downloading pytorch_lightning-1.4.8-py3-none-any.whl (924 kB)
Collecting pytorch-lightning-bolts
  Downloading pytorch_lightning_bolts-0.3.2-py3-none-any.whl (253 kB)
Collecting wandb
  Downloading wandb-0.12.2-py2.py3-none-any.whl (1.7 MB)
Collecting ninja
  Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
Collecting pydantic
  Downloading pydantic-1.8.2-cp37-cp37m-manylinux2014_x86_64.whl (10.1 MB)
Collecting pyhocon
  Downloading pyhocon-0.3.58.tar.gz (114 kB)
Collecting opencv-python-headless
  Downloading opencv_python_he

## Convert Dataset

You can skip this section if you already have a dataset in the correct format.

Currently, a dataset must contain images of a single size, which must be one of the following: 256 by 256 **or** 512 by 512 **or** 1024 by 1024.
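
A small guard like the following can catch an unsupported size before any conversion or training starts. It is only a sketch: `SUPPORTED_SIZES` and `check_size` are hypothetical names, and the allowed values are simply the three sizes listed above.

```python
SUPPORTED_SIZES = (256, 512, 1024)  # the three sizes listed in this section


def check_size(size):
    """Return `size` as an int, or raise if it is not one of the supported sizes."""
    size = int(size)
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"size must be one of {SUPPORTED_SIZES}, got {size}")
    return size


print(check_size(512))  # prints 512
```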

Tools for preparing a dataset for conversion are beyond the scope of this notebook. [dvschultz/dataset-tools](https://github.com/dvschultz/dataset-tools) is suggested to help with this process.

Structure of your dataset:
```
dataset_root_dir # name of your dataset is suggested
  |- sub_directory # anything (this has to do with labels which is an unsupported feature at current time)
    |- image01.png
    |- images_can_have_any_names.png
    |- they_also_be.jpg
    |...continued # Suggested minimum size is 1000+ images.
```

The above example would result in an input of `unconverted_dataset='path/to/dataset_root_dir'`
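
Before converting, you may want to confirm that the folder matches this layout. The sketch below counts images per sub-directory and flags a dataset that is below the suggested minimum. The helper names and the extension list are assumptions made for illustration; it does not check image dimensions.

```python
from pathlib import Path

# Assumption: extensions taken from the example tree above, plus ".jpeg".
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def summarize_dataset(root):
    """Count image files in each immediate sub-directory of `root`."""
    counts = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            counts[sub.name] = sum(
                1 for f in sub.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def check_dataset(root, minimum=1000):
    """Return a list of problems; an empty list means the layout looks fine."""
    counts = summarize_dataset(root)
    problems = []
    if not counts:
        problems.append("no sub-directory found under the dataset root")
    total = sum(counts.values())
    if total < minimum:
        problems.append(f"only {total} images found; {minimum}+ are suggested")
    return problems


# Example (path is a placeholder):
# print(check_dataset('path/to/dataset_root_dir'))
```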

In [None]:
model_size = 512

In [None]:
%rmdir /content/dataset-creation
%mkdir /content/dataset-creation
#%mkdir /content/dataset-creation/sq-512
!unzip -j -o -q /content/drive/MyDrive/sq-{model_size}.zip -d /content/dataset-creation/sq-{model_size}

rmdir: failed to remove '/content/dataset-creation': No such file or directory
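
For reference, `unzip -j` drops the folder paths stored in the archive, so every image lands directly in `sq-{model_size}`, which then acts as the single sub-directory the converter expects. A rough Python equivalent using the standard `zipfile` module is sketched below. `extract_flat` is a hypothetical helper and, like `unzip -j -o`, it overwrites files that share a name.

```python
import os
import zipfile


def extract_flat(zip_path, out_dir):
    """Extract every file in `zip_path` directly into `out_dir`, dropping archive sub-paths."""
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = os.path.basename(info.filename)
            if info.is_dir() or not name:
                continue  # skip directory entries
            target = os.path.join(out_dir, name)
            with archive.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
            extracted.append(target)
    return extracted


# Example with the paths used in the cell above:
# extract_flat('/content/drive/MyDrive/sq-512.zip', '/content/dataset-creation/sq-512')
```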


In [None]:
unconverted_dataset = '/content/dataset-creation'
out_path = '/content/drive/MyDrive/datasets-aliasfree/sq-' + str(model_size)  # model_size is an int, so convert it before concatenating
%mkdir /content/drive/MyDrive/datasets-aliasfree
dataset_size = model_size # one of the following 256, 512, 1024
!python scripts/convert_dataset.py --size {dataset_size} {unconverted_dataset} {out_path}

mkdir: cannot create directory ‘/content/drive/MyDrive/datasets-aliasfree’: File exists
Make dataset of image sizes: 512
  "Argument interpolation should be of type InterpolationMode instead of int. "
  "Argument interpolation should be of type InterpolationMode instead of int. "
2193it [00:29, 73.84it/s]


## Info on training options

Most training options work rather well out of the box. See the training section for suggested arguments.

You can see a full list of training options by running the following cell.

In [None]:
!python scripts/trainer.py --help

## Training

Results from training can be found in `results` directory.
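
To pick up the most recent checkpoint without typing its path by hand, something like the sketch below may help. It assumes the `results/training-NNNNNN/<kimg>-kimg-...-checkpoint.pt` layout seen in the paths used later in this notebook, and that the zero-padded names all have the same width so they sort correctly as text. `latest_checkpoint` is a hypothetical helper name.

```python
from pathlib import Path


def latest_checkpoint(results_dir="results"):
    """Return the last checkpoint path under `results_dir` by name order, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    candidates = sorted(
        root.glob("training-*/*-checkpoint.pt"),
        # Assumption: zero-padded run and file names, so text order matches numeric order.
        key=lambda p: (p.parent.name, p.name),
    )
    return candidates[-1] if candidates else None


print(latest_checkpoint())  # None until a training run has saved a checkpoint
```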

**Resume from Checkpoint**

Set `--resume_from 'path/to/checkpoint.pt'`

If you resume from a checkpoint that doesn't use the new kimg naming scheme, use `--start_kimg_count` to set the starting count manually.
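
The sketch below shows one way to decide whether `--start_kimg_count` is needed for a checkpoint file. It assumes that "new" checkpoint names begin with a zero-padded kimg count, as in `000000022-kimg-sq-512-checkpoint.pt`. The regular expression and helper names are illustrative and not part of the repository, and the helper is meant for checkpoint files, not pretrained model names.

```python
import os
import re

# Assumption: new-style names start with "<digits>-kimg-".
KIMG_PATTERN = re.compile(r"^(\d+)-kimg-")


def kimg_from_name(checkpoint_path):
    """Return the kimg count encoded in the file name, or None if it has none."""
    match = KIMG_PATTERN.match(os.path.basename(checkpoint_path))
    return int(match.group(1)) if match else None


def resume_args(checkpoint_path, manual_kimg=None):
    """Build the resume flags, adding --start_kimg_count only for old-style names."""
    args = ["--resume_from", checkpoint_path]
    if kimg_from_name(checkpoint_path) is None:
        if manual_kimg is None:
            raise ValueError("old-style checkpoint name: pass manual_kimg")
        args += ["--start_kimg_count", str(manual_kimg)]
    return args


print(kimg_from_name("results/training-000011/000000022-kimg-sq-512-checkpoint.pt"))  # 22
```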

**Transfer Learning Options**

See repository for transfer learning options. https://github.com/duskvirkus/alias-free-gan/blob/devel/pretrained_models.json

Use `--resume_from 'model_name'`. wget is used to automatically download the pretrained models.

**Training from Scratch**

This is not recommended. Transfer learning from any model, even one unrelated to your dataset, will be faster and consume fewer resources. Use transfer learning unless no pretrained model is available or you have an explicit reason not to. To train from scratch, simply leave resume blank, like so: `--resume_from ''`.

**Augmentations**

Use `--augment True` to enable augmentations with `AdaptiveAugmentation`. See help for more options.

### Suggested Batch Size

For Colab Pro GPUs (16 GB), here are the suggested batch sizes:
- 256: batch size 8 recommended
- 512: batch size 4 recommended (tentative)
- 1024: batch size 4 on a P100 or 2 on a V100

Feel free to experiment to see whether a larger batch fits. For the best performance, keep the batch size a power of 2.
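
As a quick sanity check on a chosen batch size, the sketch below keeps the suggestions in a lookup table and tests the power-of-2 rule. The `effective_batch` helper assumes the trainer honours PyTorch Lightning's `--accumulate_grad_batches` setting in the usual way, which this notebook does not confirm. All names here are hypothetical.

```python
# Suggested per-step batch sizes for 16 GB GPUs, taken from the list above.
# 512 is marked as uncertain there; for 1024 the list gives 4 on a P100 and 2 on a V100.
SUGGESTED_BATCH = {256: 8, 512: 4, 1024: 4}


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def effective_batch(batch, accumulate_grad_batches=1):
    """Images contributing to one optimizer step when gradients are accumulated."""
    return batch * accumulate_grad_batches


batch = SUGGESTED_BATCH[512]
print(is_power_of_two(batch))     # True
print(effective_batch(batch, 4))  # 16
```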

### Troubleshooting

If you get a CUDA out of memory error, try reducing the `batch`.

If you get another error, please report it at https://github.com/duskvirkus/alias-free-gan/issues/new

If the model makes it through the first epoch, you're unlikely to encounter any errors after that.




In [None]:
#model_size = 512
#dataset_location = '/content/drive/MyDrive/datasets-aliasfree/sq-512'
dataset_location = '/content/drive/MyDrive/sq-512.zip'
#resume = 'rosinality-ffhq-800k'
#resume = 'pretrained/000000020-kimg-sq-256-checkpoint.pt'
#resume = 'results/training-000003/000000066-kimg-sq-512-checkpoint.pt'
#resume = 'results/training-0000011/000000022-kimg-sq-512-checkpoint.pt'
batch_size = 4
#batch_size = 8
augmentations = True # ada

sample_frequency = 1 # in kimgs or thousands of images
checkpoint_frequency = 1 # in kimgs or thousands of images
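
To get a feel for what a frequency of 1 kimg means in practice, the arithmetic can be sketched as below. It assumes a kimg is simply 1000 images shown to the model, as the comments above say; how the trainer counts them internally is not confirmed here. The helper names are hypothetical.

```python
import math


def batches_per_kimg(batch_size, kimgs=1):
    """Batches needed to show `kimgs` thousand images to the model."""
    return math.ceil(kimgs * 1000 / batch_size)


def kimgs_per_epoch(dataset_images):
    """How many kimgs one pass over the dataset amounts to."""
    return dataset_images / 1000


print(batches_per_kimg(4))    # 250
print(kimgs_per_epoch(2193))  # 2.193
```

With a batch size of 4, a sample and a checkpoint would be written roughly every 250 batches under this assumption, so Drive storage can fill up quickly.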

In [None]:
!python scripts/trainer.py \
  --gpus 1 \
  --max_epochs 1000000 \
  --accumulate_grad_batches 4 \
  --size {model_size} \
  --dataset_path {dataset_location} \
  --batch {batch_size} \
  --save_sample_every_kimgs {sample_frequency} \
  --save_checkpoint_every_kimgs {checkpoint_frequency} \
  --augment {augmentations} \
  --auto_scale_batch_size True
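
If you prefer to assemble the command in Python, for example to add `--resume_from` only when a checkpoint is set, a sketch like the following may help. It uses only the flags that appear in this notebook's training cells. `build_trainer_args` is a hypothetical helper, not part of the repository.

```python
def build_trainer_args(size, dataset_path, batch, resume=None,
                       sample_every=1, checkpoint_every=1, augment=True):
    """Return the argument list for scripts/trainer.py with the flags used in this notebook."""
    args = [
        "python", "scripts/trainer.py",
        "--gpus", "1",
        "--max_epochs", "1000000",
        "--accumulate_grad_batches", "4",
        "--size", str(size),
        "--dataset_path", dataset_path,
        "--batch", str(batch),
        "--save_sample_every_kimgs", str(sample_every),
        "--save_checkpoint_every_kimgs", str(checkpoint_every),
        "--augment", str(augment),
        "--auto_scale_batch_size", "True",
    ]
    if resume:
        # Only add the flag when a checkpoint or model name was given.
        args += ["--resume_from", resume]
    return args


print(" ".join(build_trainer_args(512, "/content/drive/MyDrive/datasets-aliasfree/sq-512", 4)))
# To launch it: subprocess.run(build_trainer_args(...), check=True)
```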

In [None]:
#model_size = 512
dataset_location = '/content/drive/MyDrive/datasets-aliasfree/sq-' + str(model_size)  # model_size is an int, so convert it before concatenating
#resume = 'rosinality-ffhq-800k'
#resume = 'pretrained/000000020-kimg-sq-256-checkpoint.pt'
#resume = 'results/training-000009/000000011-kimg-sq-512-checkpoint.pt'
resume = 'results/training-000011/000000022-kimg-sq-512-checkpoint.pt'
batch_size = 4
#batch_size = 8
augmentations = True # ada

sample_frequency = 1 # in kimgs or thousands of images
checkpoint_frequency = 1 # in kimgs or thousands of images

In [None]:
!python scripts/trainer.py \
  --gpus 1 \
  --max_epochs 1000000 \
  --accumulate_grad_batches 4 \
  --size {model_size} \
  --dataset_path {dataset_location} \
  --resume_from {resume} \
  --batch {batch_size} \
  --save_sample_every_kimgs {sample_frequency} \
  --save_checkpoint_every_kimgs {checkpoint_frequency} \
  --augment {augmentations} \
  --auto_scale_batch_size True

Using Alias-Free GAN version: 1.1.0
Resuming from custom checkpoint...
Dataset path: /content/drive/MyDrive/datasets-aliasfree/sq-512
Initialized MultiResolutionDataset dataset with 2193 images
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type          | Params
------------------------------------------------
0 | generator     | Generator     | 14.4 M
1 | g_ema         | Generator     | 14.4 M
2 | discriminator | Discriminator | 29.0 M
------------------------------------------------
57.9 M    Trainable params
0         Non-trainable params
57.9 M    Total params
231.431   Total estimated model params size (MB)
Training: -1it [00:00, ?it/s]

Resuming from: results/training-000011/000000022-kimg-sq-512-checkpoint.pt

AlignFreeGAN device: cuda:0


  f"conv2d_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.conv2d()."
Epoch