<a href="https://colab.research.google.com/github/jeffheaton/present/blob/master/youtube/gan/colab_gan_train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![Jeff Heaton](https://raw.githubusercontent.com/jeffheaton/present/master/images/github.jpg)

Copyright 2021 by [Jeff Heaton](https://www.youtube.com/channel/UCR1-GEpyOPzT2AO4D_eifdw), [released under Apache 2.0 license](https://github.com/jeffheaton/present/blob/master/LICENSE)
# Training StyleGAN2 in Google CoLab

GANs can be trained with either Google Colab Free or Pro.  The Pro version is recommended because it offers better GPU instances, longer runtimes, and more lenient idle timeouts.  Make sure that you are running this notebook with a GPU runtime.

Your training data and trained neural networks will be stored on Google Drive (GDRIVE).  For GANs, I lay out my GDRIVE like this:

* ./data/gan/images - RAW images I wish to train on.
* ./data/gan/dataset - Actual training datasets that I convert from the raw images.
* ./data/gan/experiments - The output from StyleGAN2, my image previews and saved network snapshots.

The drive is mounted to the following location.

```
/content/drive/MyDrive/data
```
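
If the folders do not exist yet, a minimal sketch like the following could create the layout described above. The helper name `make_gan_dirs` is hypothetical, and the folder names assume the paths used by the commands in this notebook (which write to `dataset`, singular). Run it only after the drive has been mounted.

```python
import os

def make_gan_dirs(base="/content/drive/MyDrive/data"):
    """Create the images, dataset, and experiments folders under base/gan."""
    created = []
    for name in ("images", "dataset", "experiments"):
        path = os.path.join(base, "gan", name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```

Calling `make_gan_dirs()` with no arguments targets the mounted drive location shown above; pass a different base path to try it somewhere else first.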


# What Sort of GPU do you Have?

The type of GPU assigned to you by Colab will greatly affect your training time. Some sample times that I achieved with Colab are given here.  I've found that Colab Pro generally starts you with a V100; however, if you run scripts around the clock for a few days in a row, you will generally be throttled back to a P100.

* 1024x1024 - V100 - 566 sec/tick (CoLab Pro)
* 1024x1024 - P100 - 1819 sec/tick (CoLab Pro)
* 1024x1024 - T4 - 2188 sec/tick (CoLab Free)

If you use Google CoLab Pro, it generally will not disconnect before 24 hours, even if you are inactive, as long as your script is still running.  Free CoLab WILL disconnect a perfectly good running script if you do not interact for a few hours.  The following link describes how to circumvent this issue.

* [How to prevent Google Colab from disconnecting?](https://stackoverflow.com/questions/57113226/how-to-prevent-google-colab-from-disconnecting)


In [None]:
!nvidia-smi
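
If you would rather check the GPU from Python, for example to print a warning when no GPU runtime is attached, something along these lines may work. This is a sketch: the helper name `gpu_name` is hypothetical, and it assumes `nvidia-smi` is on the path whenever a GPU is present.

```python
import shutil
import subprocess

def gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools installed, so most likely no GPU runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return result.stdout.strip().splitlines()[0]

name = gpu_name()
print(name if name else "No GPU detected; switch to a GPU runtime.")
```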

# Set Up New Environment

You will likely need to train for more than 24 hours.  Colab will disconnect you, so you must be prepared to restart training when this eventually happens.  Training is divided into ticks; every so many ticks (50 by default), your neural network is evaluated and a snapshot is saved.  When CoLab shuts down, all training after the last snapshot is lost. It might seem desirable to snapshot after each tick; however, the snapshotting process itself takes nearly an hour.  It is important to learn an optimal snapshot interval for your resolution and training data.
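
To get a feel for the trade-off, the rough calculation below estimates how much training could be lost if the session ends just before a snapshot. It uses the sample 1024x1024 timings listed earlier in this notebook and deliberately ignores the time spent on the snapshot itself, so treat the numbers as ballpark figures only. The function name `max_lost_hours` is made up for this illustration.

```python
def max_lost_hours(sec_per_tick, snap):
    """Worst-case training time (hours) lost if the session ends just before a snapshot."""
    return sec_per_tick * snap / 3600

# Sample sec/tick values from the GPU timings earlier in this notebook
for gpu, sec_per_tick in [("V100", 566), ("P100", 1819), ("T4", 2188)]:
    for snap in (10, 50):
        hours = max_lost_hours(sec_per_tick, snap)
        print(f"{gpu}: --snap {snap} risks up to {hours:.1f} hours")
```

On a V100, for instance, the default of 50 ticks puts close to 8 hours of training at risk, while 10 ticks keeps that under 2 hours.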

We will mount GDRIVE so that your snapshots are saved there.  You must also place your training images in GDRIVE.

In [None]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:  # google.colab is only importable inside Colab
    print("Note: not using Google CoLab")
    COLAB = False

You must also install NVIDIA StyleGAN2 ADA PyTorch.  We also need to downgrade PyTorch to a version that StyleGAN2 ADA supports.

In [None]:
!pip install torch==1.8.1 torchvision==0.9.1
!git clone https://github.com/NVlabs/stylegan2-ada-pytorch.git
!pip install ninja

# Find Your Files

The drive is mounted to the following location.

```
/content/drive/MyDrive/data
```

It might be helpful to use an `ls` command to establish the exact path for your images.

In [None]:
!ls /content/drive/MyDrive/data/gan/images

# Convert Your Images

In [None]:
!python /content/stylegan2-ada-pytorch/dataset_tool.py --source /content/drive/MyDrive/data/gan/images/circuit --dest /content/drive/MyDrive/data/gan/dataset/circuit

The following command can be used to clear out the newly created dataset.  If something goes wrong and you need to clean up your images and rerun the above command, you should delete your partially created dataset directory.

In [None]:
#!rm -R /content/drive/MyDrive/data/gan/dataset/circuit/*

# Clean Up your Images

It is important that all images have the same dimensions and color depth.  This code can identify images that have issues.

In [None]:
import os
from PIL import Image
from tqdm.notebook import tqdm

IMAGE_PATH = '/content/drive/MyDrive/data/gan/images/fish'
# Only consider regular files, not subdirectories
files = [f for f in os.listdir(IMAGE_PATH) if os.path.isfile(os.path.join(IMAGE_PATH, f))]

# Size of the first valid RGB image; every other image is compared against it
base_size = None
for file in tqdm(files):
  file2 = os.path.join(IMAGE_PATH, file)
  # Close each file after reading its header so handles are not leaked
  with Image.open(file2) as img:
    sz = img.size
    mode = img.mode
  if base_size and sz != base_size:
    print(f"Inconsistent size: {file2}")
  elif mode != 'RGB':
    print(f"Inconsistent color format: {file2}")
  else:
    base_size = sz


# Perform Initial Training

In [None]:
import os

# Modify these to suit your needs
EXPERIMENTS = "/content/drive/MyDrive/data/gan/experiments"
DATA = "/content/drive/MyDrive/data/gan/dataset/circuit"
SNAP = 10

# Build the command and run it
cmd = f"/usr/bin/python3 /content/stylegan2-ada-pytorch/train.py --snap {SNAP} --outdir {EXPERIMENTS} --data {DATA}"
!{cmd}

In [None]:
!/usr/bin/python3 /content/stylegan2-ada-pytorch/train.py --snap 25 --resume /content/drive/MyDrive/data/gan/experiments/00007-circuit-auto1/network-snapshot-000500.pkl --outdir /content/drive/MyDrive/data/gan/experiments --data /content/drive/MyDrive/data/gan/dataset/circuit

# Resume Training

In [None]:
import os

# Modify these to suit your needs
EXPERIMENTS = "/content/drive/MyDrive/data/gan/experiments"
NETWORK = "network-snapshot-000100.pkl"
RESUME = os.path.join(EXPERIMENTS, "00008-circuit-auto1-resumecustom", NETWORK)
DATA = "/content/drive/MyDrive/data/gan/dataset/circuit"
SNAP = 10

# Build the command and run it
cmd = f"/usr/bin/python3 /content/stylegan2-ada-pytorch/train.py --snap {SNAP} --resume {RESUME} --outdir {EXPERIMENTS} --data {DATA}"
!{cmd}
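
Since you will probably resume many times, it can be convenient to look up the most recent snapshot instead of typing the run folder and file name by hand. The sketch below assumes the naming seen in this notebook: run folders that begin with a zero-padded counter (such as `00008-...`) and snapshot files named `network-snapshot-NNNNNN.pkl`. The helper `latest_snapshot` is hypothetical and not part of StyleGAN2 ADA.

```python
import glob
import os

def latest_snapshot(experiments):
    """Return the newest snapshot in the highest-numbered run folder that has one, or None."""
    # Run folders start with a zero-padded counter, so sorting by name orders them by run.
    runs = sorted(d for d in glob.glob(os.path.join(experiments, "*")) if os.path.isdir(d))
    for run in reversed(runs):
        # Snapshot numbers are zero-padded too, so the last name is the latest snapshot.
        snaps = sorted(glob.glob(os.path.join(run, "network-snapshot-*.pkl")))
        if snaps:
            return snaps[-1]
    return None
```

With this in place, `RESUME = latest_snapshot(EXPERIMENTS)` could replace the hand-built path above; check that it is not `None` before building the command.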