# GANSFORMER
---------------
This notebook needs to be run inside the TensorFlow 1.15 Py3 GPU Docker container.

## Docker Configuration
When running this container for the first time, uncomment and execute the lines in the cell below. They install the Python requirements and the required Linux packages from APT, and define the environment variables needed to compile the custom CUDA ops.

In [2]:
# # Prep Tensorflow Docker with required software and libraries
# !apt-get update
# !apt-get install ffmpeg libsm6 libxext6 -y
# !pip install -r requirements.txt
# !pip install joblib 

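# # Set ENV variables for compiling TF ops.
# # `!export` only affects a throwaway subshell, so set them on the kernel process instead.
# import os
# os.environ['PATH'] = '/usr/local/cuda-10.0/bin' + os.pathsep + os.environ.get('PATH', '')
# os.environ['LD_LIBRARY_PATH'] = '/usr/local/cuda-10.0/lib64' + os.pathsep + os.environ.get('LD_LIBRARY_PATH', '')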
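
Before compiling the custom ops, it can help to confirm that the CUDA directories are actually visible to the notebook kernel. A minimal sketch of such a check, assuming CUDA 10.0 lives under `/usr/local/cuda-10.0` as in the cell above; the helper name `missing_cuda_dirs` is hypothetical and not part of Gansformer:

```python
import os

def missing_cuda_dirs(env, cuda_home='/usr/local/cuda-10.0'):
    """Return the CUDA directories that are absent from the search paths in `env`."""
    # cuda_home is an assumption taken from the paths used in this notebook
    required = {
        'PATH': os.path.join(cuda_home, 'bin'),
        'LD_LIBRARY_PATH': os.path.join(cuda_home, 'lib64'),
    }
    missing = {}
    for var, directory in required.items():
        entries = env.get(var, '').split(os.pathsep)
        if directory not in entries:
            missing[var] = directory
    return missing

# Check the environment of the running kernel; an empty dict means both are set
print(missing_cuda_dirs(os.environ))
```

Any variable that shows up in the returned dict still needs the corresponding directory added before the CUDA ops are compiled.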

In [2]:
!python3 --version

Python 3.6.9


In [3]:
!nvidia-smi

Mon Dec  6 16:56:06 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:09:00.0  On |                  N/A |
| 40%   34C    P8    22W / 260W |    951MiB / 10985MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+---------------------------------------------------------------------------

In [3]:
# Verify lung data volume mounted in container
!ls data | wc -l

194921
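
The same check can be done from Python, which also makes it easy to see whether anything other than PNG files ended up in the volume. A minimal sketch; `count_by_extension` is a hypothetical helper, and `data` is the mounted directory verified above:

```python
import os
from collections import Counter

def count_by_extension(directory):
    """Count regular files in `directory`, grouped by lower-cased extension."""
    counts = Counter()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                ext = os.path.splitext(entry.name)[1].lower()
                counts[ext] += 1
    return counts

# Only runs when the data volume is actually mounted
if os.path.isdir('data'):
    print(count_by_extension('data'))
```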


In [4]:
import os
import shutil
from PIL import Image
from joblib import Parallel, delayed
import cv2

---------------
## Build COVIDx Dataset
To keep things simple, we will use the tools provided by the team that developed Gansformer. First, we need to build a dataset so that the COVIDx image data is available as TFRecords.

Notes:
- Updated `dnnlib/tflib/custom_ops.py` to account for TF 1.15 and to allow multiple CPUs.
- Modified `dataset_tool.py` to report image shape errors.
- Processed all images so that they are the same size; the documentation is misleading on this requirement.

In [9]:
# # Related to update of dnnlib
# !rm -rf /external_code/gan/gansformer/dnnlib/tflib/cudacache/

With the changes, we need to verify that the container is correctly configured and ready to compile the CUDA functions. `test_nvcc.cu` was pulled from the StyleGAN2 repo, as it was not available within the Gansformer library.

In [4]:
!nvcc test_nvcc.cu -o test_nvcc -run

CPU says hello.
GPU says hello.


#### Comments
There were issues creating a custom dataset using the provided tools. After opening an issue on GitHub, I was informed that `prepare_data.py` sorts the files alphabetically. Thus, when I was trying to catch malformed images, I was actually moving ones that weren't causing the errors. I modified the assert statement so that it provides information about the offending file. Here we see that there is an image of size 1024x1024 where 512x512 was expected. The data object stores the first image's size as `self.shape` and expects all following images to be the same size.

In [7]:
!python prepare_data.py \
--task covidx \
--images-dir 'data' \
--format png --ratio 0.7 \
--shards-num 20 --max-images 194921

Preparing the covidx dataset...
Loading images from data
  8%|██▋                               | 15340/194921 [12:45<2:29:15, 20.05it/s]
Traceback (most recent call last):
  File "prepare_data.py", line 217, in <module>
    run_cmdline(sys.argv)
  File "prepare_data.py", line 214, in run_cmdline
    prepare(**vars(args))
  File "prepare_data.py", line 185, in prepare
    shards_num = shards_num, max_imgs = max_images)
  File "prepare_data.py", line 78, in <lambda>
    "png": lambda tfdir, imgdir, **kwargs: dataset_tool.create_from_imgs(tfdir, imgdir, format = "png", **kwargs),
  File "/tf/notebooks/Final Project/gansformer/dataset_tool.py", line 696, in create_from_imgs
    tfr.add_img(img)
  File "/tf/notebooks/Final Project/gansformer/dataset_tool.py", line 84, in add_img
    assert img.shape == self.shape, f'Img: {img.shape} Self: {self.shape} \n {img}'
AssertionError: Img: (3, 1024, 1024) Self: (3, 512, 512) 
 [[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  .
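
Since the files are processed in sorted order and every image is compared with the first one, the offending files can be located ahead of time by scanning the directory the same way. A minimal sketch, assuming plain lexicographic ordering and that Pillow is available (it is imported earlier in this notebook); `find_size_mismatches` and its `get_size` parameter are hypothetical names, not part of `dataset_tool.py`:

```python
import os

def read_size(path):
    """Read (width, height) from the image header via Pillow."""
    # Imported lazily so the scanning logic can be exercised without Pillow
    from PIL import Image
    with Image.open(path) as img:
        return img.size

def find_size_mismatches(directory, get_size=read_size):
    """Return (reference_size, [(filename, size), ...]) listing the files whose
    size differs from the first file in alphabetical order."""
    reference = None
    mismatches = []
    for name in sorted(os.listdir(directory)):
        size = get_size(os.path.join(directory, name))
        if reference is None:
            reference = size  # the first image defines the expected size
        elif size != reference:
            mismatches.append((name, size))
    return reference, mismatches
```

Calling `find_size_mismatches('data')` would then list every file that would trip the assertion, rather than stopping at the first one.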

In [None]:
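def resize_img(path):
    """
    Resize the image at `path` to 512x512 in place so the TFRecords can be built.
    """
    temp_img = cv2.imread(path)
    if temp_img is None:
        # cv2.imread returns None for unreadable files; skip them instead of crashing in cv2.resize
        print(f'Could not read {path}')
        return
    temp_img = cv2.resize(temp_img, (512, 512))
    cv2.imwrite(path, temp_img)

# Process all images in parallel (this overwrites the original files)
files = [os.path.join('data', x) for x in os.listdir('data')]

# Assign the result so the notebook does not echo a long list of None values
_ = Parallel(n_jobs=24)(delayed(resize_img)(path) for path in files)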

## DATASET IS BUILT DO NOT RERUN

STOP!

In [24]:
!python prepare_data.py \
--task covidx \
--images-dir 'data' \
--format png --ratio 0.7 \
--shards-num 20 --max-images 194921

Preparing the covidx dataset...
Loading images from data
100%|█████████████████████████████████| 194921/194921 [2:56:18<00:00, 18.43it/s]
Completed preparations for covidx!


------------------------------
# Train GANSFORMER
Starting with the CLEVR dataset, we will continue training on our COVID lung dataset. The hope is to transition the model to a new domain without having to train for over a week on 8 or so GPUs. As the model trains, output is written to the `/gansformer/results/clevr-*` directory. This includes model weights saved as pickle files, as well as generated images and attention maps.

In [10]:
!python run_network.py \
--train \
--gpus 0 \
--ganformer-default \
--expname clevr-scratch \
--dataset covidx \
--eval-images-num 10000 \
--keep-samples True \
--metrics fid

[1m[37mStart model training from scratch[0m
Local submit - run_dir: results/clevr-scratch-000
dnnlib: Running training.training_loop.training_loop() on localhost...
[1m[37mStreaming data using training.dataset.TFRecordDataset datasets...[0m
Dataset shape:  [1m[34m[3, 256, 256][0m
Dynamic range:  [1m[34m[0, 255][0m
[1m[37mConstructing networks...[0m
Setting up TensorFlow plugin 'fused_bias_act.cu': Preprocessing... Compiling... Loading... Done.
Setting up TensorFlow plugin 'upfirdn_2d.cu': Preprocessing... Compiling... Loading... Done.

[1m[34mG                                            Params    OutputShape         WeightShape     [0m
---                                          ---       ---                 ---             
[1mltnt_emb/emb[0m                                 512       (16, 32)            (16, 32)        
[1mG_mapping/AttLayer_0[0m                         6336      (?, 32)             -               
[1mG_mapping/Dense0_0[0m                    


[1m[34mD                   Params    OutputShape         WeightShape     [0m
---                 ---       ---                 ---             
[1mltnt_emb/emb[0m        512       (16, 32)            (16, 32)        
[1m256x256/FromRGB[0m     512       (?, 128, 256, 256)  (1, 1, 3, 128)  
[1m256x256/Conv0[0m       147584    (?, 128, 256, 256)  (3, 3, 128, 128)
[1m256x256/Conv1_down[0m  295168    (?, 256, 128, 128)  (3, 3, 128, 256)
[1m256x256/Skip[0m        32768     (?, 256, 128, 128)  (1, 1, 128, 256)
[1m128x128/Conv0[0m       590080    (?, 256, 128, 128)  (3, 3, 256, 256)
[1m128x128/Conv1_down[0m  1180160   (?, 512, 64, 64)    (3, 3, 256, 512)
[1m128x128/Skip[0m        131072    (?, 512, 64, 64)    (1, 1, 256, 512)
[1m64x64/Conv0[0m         2359808   (?, 512, 64, 64)    (3, 3, 512, 512)
[1m64x64/Conv1_down[0m    2359808   (?, 512, 32, 32)    (3, 3, 512, 512)
[1m64x64/Skip[0m          262144    (?, 512, 32, 32)    (1, 1, 512, 512)
[1m32x32/Conv0[0m        

tick [1m20   [0m kimg [1m[31m   161.4[0m   loss/reg: G ([1m[34m 1.289[0m [1m 0.000[0m) D ([1m[34m 1.106[0m [1m 0.084[0m)   grad norms: G ( 4.960  0.000) D ( 4.468  0.422)   time [1m9h 19m 11s[0m sec/kimg  195.81 maxGPU  7.4GB [1mclevr-scratch[0m 
tick [1m21   [0m kimg [1m[31m   169.5[0m   loss/reg: G ([1m[34m 1.260[0m [1m 0.000[0m) D ([1m[34m 1.073[0m [1m 0.086[0m)   grad norms: G ( 5.172  0.000) D ( 4.133  0.437)   time [1m9h 45m 27s[0m sec/kimg  195.48 maxGPU  7.4GB [1mclevr-scratch[0m 
network-snapshot-000169        time 5m 19s      [1m[34m fid 95.8610   [0m
tick [1m22   [0m kimg [1m[31m   177.5[0m   loss/reg: G ([1m[34m 1.310[0m [1m 0.000[0m) D ([1m[34m 1.060[0m [1m 0.095[0m)   grad norms: G ( 6.001  0.000) D ( 3.942  0.458)   time [1m10h 17m 12s[0m sec/kimg  195.48 maxGPU  7.4GB [1mclevr-scratch[0m 
tick [1m23   [0m kimg [1m[31m   185.6[0m   loss/reg: G ([1m[34m 1.316[0m [1m 0.000[0m) D ([1m[34m 1.031[0m [1m

network-snapshot-000387        time 5m 19s      [1m[34m fid 47.2564   [0m
tick [1m49   [0m kimg [1m[31m   395.3[0m   loss/reg: G ([1m[34m 1.068[0m [1m 0.000[0m) D ([1m[34m 1.100[0m [1m 0.082[0m)   grad norms: G ( 5.602  0.000) D ( 2.488  0.615)   time [1m22h 46m 14s[0m sec/kimg  193.18 maxGPU  7.4GB [1mclevr-scratch[0m 
tick [1m50   [0m kimg [1m[31m   403.3[0m   loss/reg: G ([1m[34m 1.079[0m [1m 0.000[0m) D ([1m[34m 1.078[0m [1m 0.089[0m)   grad norms: G ( 6.001  0.000) D ( 2.541  0.615)   time [1m23h 12m 10s[0m sec/kimg  192.97 maxGPU  7.4GB [1mclevr-scratch[0m 
tick [1m51   [0m kimg [1m[31m   411.4[0m   loss/reg: G ([1m[34m 1.051[0m [1m 0.000[0m) D ([1m[34m 1.081[0m [1m 0.085[0m)   grad norms: G ( 6.271  0.000) D ( 2.499  0.605)   time [1m23h 36m 14s[0m sec/kimg  179.05 maxGPU  7.4GB [1mclevr-scratch[0m 
network-snapshot-000411        time 4m 51s      [1m[34m fid 42.5393   [0m
tick [1m52   [0m kimg [1m[31m   419.5[0m

The model has finished training and currently reaches an FID score of 28.95. The weights for this stage of the model have been uploaded to our Checkpoints folder on Google Drive. Training or image generation can be resumed by downloading the `network-snapshot-000604.pkl` snapshot:

https://drive.google.com/drive/folders/167zK1KXJmceGagUwymjBYiutWbs-7UUi?usp=sharing
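
When several snapshots are present in a run directory, the most recent one can be picked out by its kimg number. A minimal sketch, assuming snapshots follow the `network-snapshot-<kimg>.pkl` naming seen in the logs; `latest_snapshot` is a hypothetical helper, not part of Gansformer:

```python
import glob
import os
import re

def latest_snapshot(run_dir):
    """Return (kimg, path) of the highest-numbered snapshot in `run_dir`, or None."""
    best = None
    for path in glob.glob(os.path.join(run_dir, 'network-snapshot-*.pkl')):
        match = re.search(r'network-snapshot-(\d+)\.pkl$', os.path.basename(path))
        if match:
            kimg = int(match.group(1))
            if best is None or kimg > best[0]:
                best = (kimg, path)
    return best

# Prints None if the run directory does not exist or holds no snapshots
print(latest_snapshot(os.path.join('results', 'clevr-scratch-000')))
```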

-------------------------------------
# Create Dataset for Head CT Scans


In [5]:
# Verify head ct volume mounted in container
!ls head_data | wc -l

170028


In [7]:
!python prepare_data.py \
--task headct \
--images-dir 'head_data' \
--format png --ratio 0.7 \
--max-images 170028 

Preparing the headct dataset...
Loading images from head_data
100%|█████████████████████████████████| 170028/170028 [2:21:20<00:00, 20.05it/s]
Completed preparations for headct!


In [10]:
weights = os.path.join('gansformer', 'results', 'clevr-scratch-000', 'network-snapshot-000604.pkl')

In [23]:
# Pick up from the previous experiment
!python run_network.py \
--train \
--gpus 0 \
--ganformer-default \
--expname clevr-scratch \
--dataset headct \
--eval-images-num 10000 \
--keep-samples True \
--metrics fid

[1m[37mResuming clevr-scratch-000, from results/clevr-scratch-000/network-snapshot-000604.pkl, kimg 604[0m
Local submit - run_dir: results/clevr-scratch-000
dnnlib: Running training.training_loop.training_loop() on localhost...
[1m[37mStreaming data using training.dataset.TFRecordDataset datasets...[0m
Dataset shape:  [1m[34m[3, 256, 256][0m
Dynamic range:  [1m[34m[0, 255][0m
[1m[37mLoading networks from results/clevr-scratch-000/network-snapshot-000604.pkl...[0m
Setting up TensorFlow plugin 'fused_bias_act.cu': Loading... Done.
Setting up TensorFlow plugin 'upfirdn_2d.cu': Loading... Done.

[1m[34mG                                            Params    OutputShape         WeightShape     [0m
---                                          ---       ---                 ---             
[1mltnt_emb/emb[0m                                 512       (16, 32)            (16, 32)        
[1mG_mapping/AttLayer_0[0m                         6336      (?, 32)             -     


[1m[34mD                   Params    OutputShape         WeightShape     [0m
---                 ---       ---                 ---             
[1mltnt_emb/emb[0m        512       (16, 32)            (16, 32)        
[1m256x256/FromRGB[0m     512       (?, 128, 256, 256)  (1, 1, 3, 128)  
[1m256x256/Conv0[0m       147584    (?, 128, 256, 256)  (3, 3, 128, 128)
[1m256x256/Conv1_down[0m  295168    (?, 256, 128, 128)  (3, 3, 128, 256)
[1m256x256/Skip[0m        32768     (?, 256, 128, 128)  (1, 1, 128, 256)
[1m128x128/Conv0[0m       590080    (?, 256, 128, 128)  (3, 3, 256, 256)
[1m128x128/Conv1_down[0m  1180160   (?, 512, 64, 64)    (3, 3, 256, 512)
[1m128x128/Skip[0m        131072    (?, 512, 64, 64)    (1, 1, 256, 512)
[1m64x64/Conv0[0m         2359808   (?, 512, 64, 64)    (3, 3, 512, 512)
[1m64x64/Conv1_down[0m    2359808   (?, 512, 32, 32)    (3, 3, 512, 512)
[1m64x64/Skip[0m          262144    (?, 512, 32, 32)    (1, 1, 512, 512)
[1m32x32/Conv0[0m        

tick [1m21   [0m kimg [1m[31m   773.5[0m   loss/reg: G ([1m[34m 1.353[0m [1m 0.000[0m) D ([1m[34m 0.895[0m [1m 0.125[0m)   grad norms: G ( 7.998  0.000) D ( 2.252  0.726)   time [1m9h 05m 28s[0m sec/kimg  179.17 maxGPU  7.5GB [1mclevr-scratch[0m 
network-snapshot-000773        time 4m 53s      [1m[34m fid 49.1943   [0m
tick [1m22   [0m kimg [1m[31m   781.5[0m   loss/reg: G ([1m[34m 1.331[0m [1m 0.000[0m) D ([1m[34m 0.903[0m [1m 0.113[0m)   grad norms: G ( 7.178  0.000) D ( 2.184  0.754)   time [1m9h 34m 36s[0m sec/kimg  179.29 maxGPU  7.5GB [1mclevr-scratch[0m 
tick [1m23   [0m kimg [1m[31m   789.6[0m   loss/reg: G ([1m[34m 1.307[0m [1m 0.000[0m) D ([1m[34m 0.876[0m [1m 0.110[0m)   grad norms: G ( 6.836  0.000) D ( 2.094  0.742)   time [1m9h 58m 42s[0m sec/kimg  179.29 maxGPU  7.5GB [1mclevr-scratch[0m 
tick [1m24   [0m kimg [1m[31m   797.7[0m   loss/reg: G ([1m[34m 1.367[0m [1m 0.000[0m) D ([1m[34m 0.843[0m [1m 

tick [1m49   [0m kimg [1m[31m   999.3[0m   loss/reg: G ([1m[34m 1.261[0m [1m 0.000[0m) D ([1m[34m 0.908[0m [1m 0.121[0m)   grad norms: G ( 7.488  0.000) D ( 2.255  1.046)   time [1m21h 12m 54s[0m sec/kimg  180.20 maxGPU  7.5GB [1mclevr-scratch[0m 
tick [1m50   [0m kimg [1m[31m  1007.3[0m   loss/reg: G ([1m[34m 1.260[0m [1m 0.000[0m) D ([1m[34m 0.899[0m [1m 0.114[0m)   grad norms: G ( 7.450  0.000) D ( 2.162  1.038)   time [1m21h 37m 11s[0m sec/kimg  180.59 maxGPU  7.5GB [1mclevr-scratch[0m 
tick [1m51   [0m kimg [1m[31m  1015.4[0m   loss/reg: G ([1m[34m 1.298[0m [1m 0.000[0m) D ([1m[34m 0.900[0m [1m 0.106[0m)   grad norms: G ( 7.554  0.000) D ( 2.050  1.038)   time [1m22h 01m 29s[0m sec/kimg  180.82 maxGPU  7.5GB [1mclevr-scratch[0m 
network-snapshot-001015        time 4m 49s      [1m[34m fid 37.1312   [0m
tick [1m52   [0m kimg [1m[31m  1023.5[0m   loss/reg: G ([1m[34m 1.332[0m [1m 0.000[0m) D ([1m[34m 0.859[0m [

-------------------
# Generate Samples for Classifier

In [36]:
!python generate.py --gpus 0 --model network-snapshot-001087.pkl --output-dir finals --images-num 1000

Loading networks...
Setting up TensorFlow plugin 'fused_bias_act.cu': Loading... Done.
Setting up TensorFlow plugin 'upfirdn_2d.cu': Loading... Done.

[1m[34mGs                                           Params    OutputShape         WeightShape     [0m
---                                          ---       ---                 ---             
[1mltnt_emb/emb[0m                                 512       (16, 32)            (16, 32)        
[1mG_mapping/AttLayer_0[0m                         6336      (?, 32)             -               
[1mG_mapping/Dense0_0[0m                           1056      (?, 32)             (32, 32)        
[1mG_mapping/Dense0_1[0m                           1056      (?, 32)             (32, 32)        
[1mG_mapping/AttLayer_1[0m                         6336      (?, 32)             -               
[1mG_mapping/Dense1_0[0m                           1056      (?, 32)             (32, 32)        
[1mG_mapping/Dense1_1[0m                          

100%|██████████| 1000/1000 [01:49<00:00,  9.16image (125 batches of 8 images)/s]
Saving images...
100%|███████████████████████████████████████| 1000/1000 [00:13<00:00, 72.59it/s]
