<a href="https://colab.research.google.com/github/serhatataman/ProjectAmbitionColab/blob/main/Project_Ambition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Ambition



## Setup

The GPU that Colab assigns you greatly affects training time. Some sample times I measured on Colab are listed below. Colab Pro generally starts you on a V100; however, if you run scripts non-stop, 24 hours a day for a few days in a row, you will generally be throttled back to a P100.

*   1024x1024 - V100 - 566 sec/tick (Colab Pro)
*   1024x1024 - P100 - 1819 sec/tick (Colab Pro)
*   1024x1024 - T4 - 2188 sec/tick (Colab Free)
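
To put these figures in perspective, the sketch below converts seconds per tick into wall-clock hours. `hours_for_ticks` is a hypothetical helper, and the budget of 100 ticks is only an illustrative assumption, not a recommendation from this notebook.

```python
# Seconds per tick measured above at 1024x1024.
SEC_PER_TICK = {"V100": 566, "P100": 1819, "T4": 2188}


def hours_for_ticks(sec_per_tick, ticks):
    """Return the wall-clock hours needed to run `ticks` training ticks."""
    return sec_per_tick * ticks / 3600


# Assumption: 100 ticks is an arbitrary, illustrative budget.
for gpu, sec in SEC_PER_TICK.items():
    print(f"{gpu}: {hours_for_ticks(sec, 100):.1f} h per 100 ticks")
```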


If you use Google Colab Pro, it will generally not disconnect within 24 hours, even if you (but not your script) are inactive. Free Colab WILL disconnect a perfectly healthy running script if you do not interact with the notebook for a few hours.


Note: if the next cell fails with the error `NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running`, select GPU as the hardware accelerator under `Edit > Notebook settings`.

In [20]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Sat Jan  8 01:44:23 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [21]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 27.3 gigabytes of available RAM

You are using a high-RAM runtime!


### Mount Google Drive
```
/content/drive/MyDrive/data
```

Use the `ls` command to establish the exact path to your images.

In [22]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Using Google CoLab")
except ImportError:
    # google.colab is only importable inside a Colab runtime
    print("Not using Google CoLab")
    COLAB = False

Mounted at /content/drive
Using Google CoLab


In [None]:
!ls /content/drive/MyDrive

'Colab Notebooks'			  'Master Application Docs.'
'Copy of [#1 Sharing Session] TSO.docx'    Personal
'Copy of [#4 Re-Entry Session] TSO.docx'   ProjectAmbition
'CS:GO config'				  'Reflection groups guideline.gdoc'
 CV					   tests
'Graduation Thesis'			   UDI
 ielts					   Wallpapers



### Import NVIDIA stylegan3

If the project folder already exists, the cell below skips installation and changes into the project directory. Otherwise, it clones the repo and creates the `downloads`, `datasets`, and `raw_data` folders.

In [None]:
import os
if os.path.isdir("/content/drive/MyDrive/ProjectAmbition"):
    %cd "/content/drive/MyDrive/ProjectAmbition"
else:
    #install script
    %cd "/content/drive/MyDrive/"
    !mkdir ProjectAmbition
    %cd ProjectAmbition
    !git clone https://github.com/NVlabs/stylegan3
    !mkdir downloads
    !mkdir datasets
    !mkdir raw_data

/content/drive/MyDrive/ProjectAmbition


In [None]:
%cd "/content/drive/MyDrive/ProjectAmbition/stylegan3"
!git config --global user.name "serhatataman"
!git config --global user.email "serhatataman13@hotmail.com"
!git fetch origin
!git checkout origin/main -- train.py

/content/drive/MyDrive/ProjectAmbition/stylegan3



### Checking if directories/files exist

In [None]:
import os

if not os.path.exists("/content/drive/MyDrive/ProjectAmbition/stylegan3/dataset_tool.py"):
  raise FileNotFoundError("dataset_tool.py file does not exist!")

if not os.path.exists("/content/drive/MyDrive/ProjectAmbition/raw_data"):
  raise FileNotFoundError("raw_data folder does not exist!")

if not os.path.exists("/content/drive/MyDrive/ProjectAmbition/dataset"):
  print("dataset folder does not exist! creating the folder...")
  os.mkdir("/content/drive/MyDrive/ProjectAmbition/dataset")

if not os.path.exists("/content/drive/MyDrive/ProjectAmbition/output"):
  print("output folder does not exist! creating the folder...")
  os.mkdir("/content/drive/MyDrive/ProjectAmbition/output")
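
The folder checks above can also be written more compactly. This is a minimal sketch, assuming a hypothetical helper named `ensure_dirs`; it relies on `os.makedirs(..., exist_ok=True)`, which creates missing parent folders and does nothing if the folder already exists.

```python
import os


def ensure_dirs(base, names):
    """Create each folder under `base` if it is missing; return the full paths."""
    paths = []
    for name in names:
        path = os.path.join(base, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths


# Example call (paths taken from the cell above):
# ensure_dirs("/content/drive/MyDrive/ProjectAmbition", ["dataset", "output"])
```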

## Fetch images

### Scrape images from WikiArt

In [None]:
import urllib.request  # `import urllib` alone does not guarantee urllib.request is loaded
import re
from bs4 import BeautifulSoup
import time

file_path = "/content/drive/MyDrive/ProjectAmbition/raw_data"
base_url = "https://www.wikiart.org"

# iterate through all artists by last name alphabetically
for c in range(ord('a'), ord('z') + 1):
    char = chr(c)
    artist_list_url = base_url + '/en/Alphabet/' + char + '/text-list'

    genre_soup = BeautifulSoup(urllib.request.urlopen(artist_list_url), "lxml")
    artist_list_main = genre_soup.find("main")
    lis = artist_list_main.find_all("li")

    # for each list element
    for li in lis:

        # get the date range
        for line in li.text.splitlines():
            # if line.startswith(",") and "-" in line:
            #     parts = line.split('-')
            #     if len(parts) == 2:
            #         born = int(re.sub("[^0-9]", "", parts[0]))
            #         died = int(re.sub("[^0-9]", "", parts[1]))

            # look for artists who may have created work that could in public domain
            # if born > 1850 and died > 0 and (born < 1900 or died < 1950):

            link = li.find("a")
            if link is None:
                continue

            artist = link.attrs["href"]

            # get the artist's main page
            artist_url = base_url + artist
            artist_soup = BeautifulSoup(urllib.request.urlopen(artist_url), "lxml")

            # only keep artists whose main page mentions abstract or avant-garde work
            if "Abstract" in artist_soup.text or "abstract" in artist_soup.text or "Avant-garde" \
                    in artist_soup.text or "avant-garde" in artist_soup.text:
                print('Artist: ' + artist)

                # get the artist's web page for the artwork
                url = base_url + artist + '/all-works/text-list'

                try:
                    artist_work_soup = BeautifulSoup(urllib.request.urlopen(url), "lxml")
                except Exception:  # do not swallow KeyboardInterrupt
                    print("Error retrieving artist's work list. Url was: " + url)
                    continue

                # get the main section
                artist_main = artist_work_soup.find("main")
                image_count = 0
                artist_name = artist.split("/")[2]

                # get the list of artwork (separate names avoid shadowing the outer loop's `li` and `lis`)
                works = artist_main.find_all("li")

                # for each artwork entry
                for work in works:
                    link = work.find("a")

                    if link is not None:
                        painting = link.attrs["href"]

                        # get the painting
                        url = base_url + painting
                        # print('Painting base url: ' + url)

                        try:
                            painting_soup = BeautifulSoup(urllib.request.urlopen(url), "lxml")

                        except Exception:  # do not swallow KeyboardInterrupt
                            print("error retrieving page")
                            continue

                        # check the copyright
                        if "Public domain" in painting_soup.text:

                            # check the genre
                            genre = painting_soup.find("span", {"itemprop": "genre"})
                            if genre is not None and genre.text == "abstract":

                                # get the url
                                og_image = painting_soup.find("meta", {"property": "og:image"})
                                image_url = og_image["content"].split("!")[0]  # ignore the !Large.jpg at the end

                                save_path = file_path + "/" + artist_name + "_" + str(image_count) + ".jpg"

                                # download the file
                                try:
                                    print(f"Downloading {image_url} to {save_path}")
                                    time.sleep(0.2)  # try not to get a 403
                                    urllib.request.urlretrieve(image_url, save_path)
                                    image_count = image_count + 1
                                except Exception as e:
                                    print("Failed downloading " + image_url, e)
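
The small string-handling steps in the scraper above can be pulled out and checked without touching the network. The sketch below is an illustration only: the helper names are hypothetical, the example URL in the comment is made up, and `mentions_style` uses a case-insensitive match, which is slightly broader than the four explicit checks in the cell above.

```python
import posixpath

# Keywords the scraper looks for on an artist's main page.
STYLE_KEYWORDS = ("abstract", "avant-garde")


def mentions_style(page_text):
    """Return True if the page text mentions any keyword, ignoring case."""
    lowered = page_text.lower()
    return any(keyword in lowered for keyword in STYLE_KEYWORDS)


def full_size_url(og_image_url):
    """Drop a trailing size marker such as '!Large.jpg' from an image URL."""
    return og_image_url.split("!")[0]


def save_path_for(folder, artist_href, index):
    """Build '<folder>/<artist>_<index>.jpg' from an href like '/en/<artist>'."""
    artist_name = artist_href.split("/")[2]
    return posixpath.join(folder, f"{artist_name}_{index}.jpg")


# Hypothetical example:
# full_size_url("https://example.org/images/a/b.jpg!Large.jpg")
```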


### Download zip of art pieces

In [None]:
import requests

zip_file_url = 'http://web.fsktm.um.edu.my/~cschan/source/ICIP2017/wikiart.zip'

# stream=True downloads the body in chunks instead of loading it all into memory
with requests.get(zip_file_url, stream=True) as response:
    response.raise_for_status()
    # the inner `with` closes the file even if the download fails midway
    with open('/content/drive/MyDrive/ProjectAmbition/downloads/data.zip', "wb") as handle:
        for chunk in response.iter_content(chunk_size=8192):
            handle.write(chunk)

# Note: iter_content may yield chunks whose size differs from chunk_size,
# because the response can be decoded as it is read.
# See body-content-workflow and Response.iter_content in the Requests docs for further reference.

print('Download completed...')

Download completed...


Unzip downloaded files

In [18]:
import zipfile

zip_filename = "/content/drive/MyDrive/ProjectAmbition/downloads/data.zip"
directory_to_extract_to = "/content/drive/MyDrive/ProjectAmbition/downloads/"

with zipfile.ZipFile(zip_filename, "r") as zip_ref:
    for name in zip_ref.namelist():
        try:
            zip_ref.extract(name, directory_to_extract_to)
        except Exception as e:  # zipfile.BadZipFile is already a subclass of Exception
            print("A file is corrupted. Filename: " + str(name))
            print(e)

print("Unzip successful...")

A file is corrupted. Filename: wikiart/Baroque/rembrandt_woman-standing-with-raised-hands.jpg
Bad CRC-32 for file 'wikiart/Baroque/rembrandt_woman-standing-with-raised-hands.jpg'
A file is corrupted. Filename: wikiart/Post_Impressionism/vincent-van-gogh_l-arlesienne-portrait-of-madame-ginoux-1890.jpg
Error -3 while decompressing data: invalid block type
Unzip successful...
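
As the output shows, the cell prints a success message even when some members fail. A minimal sketch of a variant that reports which members failed is given below; `extract_all_tolerant` is a hypothetical helper name, not part of `zipfile`.

```python
import zipfile


def extract_all_tolerant(zip_path, dest_dir):
    """Extract every member that can be read.

    Returns (extracted_count, failed_names) so the caller can decide
    whether the archive needs to be downloaded again.
    """
    extracted, failed = 0, []
    with zipfile.ZipFile(zip_path, "r") as zf:
        for name in zf.namelist():
            try:
                zf.extract(name, dest_dir)
                extracted += 1
            except Exception:  # e.g. zipfile.BadZipFile on a bad CRC-32
                failed.append(name)
    return extracted, failed
```

A caller could then print the success message only when `failed` is empty.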


## Cleanup images

See what dataset_tool.py can do

In [None]:
cmd = "/usr/bin/python3 /content/drive/MyDrive/ProjectAmbition/stylegan3/dataset_tool.py --help"
!{cmd}

### Remove frames

Remove picture frames from the images, if there are any. This process overwrites each picture with its frame-free version.

In [None]:
import os
import numpy as np
from PIL import Image
from scipy.spatial import distance

# This python script removes pictures' frames if there are any

from_path = '/content/drive/MyDrive/ProjectAmbition/raw_data/'
to_path = '/content/drive/MyDrive/ProjectAmbition/raw_data/'


def find_left():
    left = 0
    for i in range(0, w_pad):
        r_stdev = np.std(np_img[h_pad:-h_pad, i:i + 1, 0:1])
        g_stdev = np.std(np_img[h_pad:-h_pad, i:i + 1, 1:2])
        b_stdev = np.std(np_img[h_pad:-h_pad, i:i + 1, 2:3])
        if r_stdev * r_stdev + g_stdev * g_stdev + b_stdev * b_stdev > thresh1:
            break

        r_med = np.median(np_img[h_pad:-h_pad, i:i + 1, 0:1])
        g_med = np.median(np_img[h_pad:-h_pad, i:i + 1, 1:2])
        b_med = np.median(np_img[h_pad:-h_pad, i:i + 1, 2:3])
        dst = distance.euclidean((r_med, g_med, b_med), (r_global_med, g_global_med, b_global_med))
        if dst < thresh2:
            break

        left = left + 1
    return left


def find_top():
    top = 0
    for i in range(0, h_pad):
        r_stdev = np.std(np_img[i:i + 1, w_pad:-w_pad, 0:1])
        g_stdev = np.std(np_img[i:i + 1, w_pad:-w_pad, 1:2])
        b_stdev = np.std(np_img[i:i + 1, w_pad:-w_pad, 2:3])
        if r_stdev * r_stdev + g_stdev * g_stdev + b_stdev * b_stdev > thresh1:
            break

        r_med = np.median(np_img[i:i + 1, w_pad:-w_pad, 0:1])
        g_med = np.median(np_img[i:i + 1, w_pad:-w_pad, 1:2])
        b_med = np.median(np_img[i:i + 1, w_pad:-w_pad, 2:3])
        dst = distance.euclidean((r_med, g_med, b_med), (r_global_med, g_global_med, b_global_med))
        if dst < thresh2:
            break

        top = top + 1
    return top


def find_right(right):
    # `right` starts at the image width passed in by the caller
    for i in range(0, w_pad):
        r_stdev = np.std(np_img[h_pad:-h_pad, w - i - 1:w - i, 0:1])
        g_stdev = np.std(np_img[h_pad:-h_pad, w - i - 1:w - i, 1:2])
        b_stdev = np.std(np_img[h_pad:-h_pad, w - i - 1:w - i, 2:3])
        if r_stdev * r_stdev + g_stdev * g_stdev + b_stdev * b_stdev > thresh1:
            break

        r_med = np.median(np_img[h_pad:-h_pad, w - i - 1:w - i, 0:1])
        g_med = np.median(np_img[h_pad:-h_pad, w - i - 1:w - i, 1:2])
        b_med = np.median(np_img[h_pad:-h_pad, w - i - 1:w - i, 2:3])
        dst = distance.euclidean((r_med, g_med, b_med), (r_global_med, g_global_med, b_global_med))
        if dst < thresh2:
            break

        right = right - 1
    return right


def find_bottom(bottom):
    for i in range(0, h_pad):
        r_stdev = np.std(np_img[h - i - 1:h - i, w_pad:-w_pad, 0:1])
        g_stdev = np.std(np_img[h - i - 1:h - i, w_pad:-w_pad, 1:2])
        b_stdev = np.std(np_img[h - i - 1:h - i, w_pad:-w_pad, 2:3])
        if r_stdev * r_stdev + g_stdev * g_stdev + b_stdev * b_stdev > thresh1:
            break

        r_med = np.median(np_img[h - i - 1:h - i, w_pad:-w_pad, 0:1])
        g_med = np.median(np_img[h - i - 1:h - i, w_pad:-w_pad, 1:2])
        b_med = np.median(np_img[h - i - 1:h - i, w_pad:-w_pad, 2:3])
        dst = distance.euclidean((r_med, g_med, b_med), (r_global_med, g_global_med, b_global_med))
        if dst < thresh2:
            break

        bottom = bottom - 1
    return bottom


for file in os.listdir(from_path):
    path = os.path.join(from_path, file)
    img = Image.open(path).convert("RGB")  # grayscale or RGBA images would break the 3-channel indexing below
    file_name, file_extension = os.path.splitext(path)
    print('Removing frames for image:', file_name + file_extension)

    np_img = np.asarray(img)
    # print("shape = " + str(np_img.shape))

    thresh1 = 15000
    thresh2 = 30
    w = img.width
    h = img.height
    pad = 30
    w_pad = w // pad
    h_pad = h // pad

    r_global_med = np.median(np_img[h_pad:-h_pad, w_pad:-w_pad, 0:1])
    g_global_med = np.median(np_img[h_pad:-h_pad, w_pad:-w_pad, 1:2])
    b_global_med = np.median(np_img[h_pad:-h_pad, w_pad:-w_pad, 2:3])

    left = find_left()
    top = find_top()
    right = find_right(w)
    bottom = find_bottom(h)

    # print("left = " + str(left) + ", top = " + str(top) +
      #    ", right = " + str(right) + ", bottom = " + str(bottom) + "\n")

    # img.save(to_path + file)  # save the original
    cropped_img = img.crop((left, top, right, bottom))
    cropped_img.save(file_name + file_extension)  # and the cropped version


Removing frames for image: /content/drive/MyDrive/ProjectAmbition/raw_data/peasant-and-horse-1910.jpg
Removing frames for image: /content/drive/MyDrive/ProjectAmbition/raw_data/horses.jpg
Removing frames for image: /content/drive/MyDrive/ProjectAmbition/raw_data/spring-1907.jpg
Removing frames for image: /content/drive/MyDrive/ProjectAmbition/raw_data/the-rise-of-green-square-and-the-woman-s-violin-1916.jpg
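
The heuristic in the cell above treats an edge column (or row) as frame while it is both flat (low summed colour variance) and far from the image's median colour. The sketch below reproduces that idea for the left edge only, using the standard library instead of NumPy so it can be read in isolation. It is a simplification under stated assumptions: `frame_width_left` is a hypothetical name, the medians are taken over the whole image rather than the padded interior, and every column is scanned rather than only the outer `w // 30`.

```python
import math
from statistics import median, pvariance


def frame_width_left(pixels, var_thresh=15000, dist_thresh=30):
    """Count leading columns that look like a frame.

    `pixels` is a list of rows; each row is a list of (r, g, b) tuples.
    """
    height, width = len(pixels), len(pixels[0])
    # Median colour of the whole image, one value per channel.
    global_med = tuple(
        median(px[c] for row in pixels for px in row) for c in range(3)
    )
    left = 0
    for x in range(width):
        column = [pixels[y][x] for y in range(height)]
        # A busy column (high summed variance) is treated as painting.
        variance = sum(pvariance([px[c] for px in column]) for c in range(3))
        if variance > var_thresh:
            break
        # A flat column close to the global median is also treated as painting.
        col_med = tuple(median(px[c] for px in column) for c in range(3))
        if math.dist(col_med, global_med) < dist_thresh:
            break
        left += 1
    return left
```

The default thresholds mirror `thresh1` and `thresh2` from the cell above.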


### Resize images

Resize the images to 1024x1024 and overwrite the originals.

In [None]:
import os
from PIL import Image

image_source_path = '/content/drive/MyDrive/ProjectAmbition/raw_data/'
image_target_path = '/content/drive/MyDrive/ProjectAmbition/raw_data/'

# Use Pillow to resize every image to the target size

for filename in os.listdir(image_source_path):
    path = os.path.join(image_source_path, filename)
    # Image.LANCZOS is the same filter as the old Image.ANTIALIAS alias, which newer Pillow versions removed
    image = Image.open(path).resize((1024, 1024), Image.LANCZOS)

    image.save(os.path.join(image_target_path, filename))  # save() returns None, so nothing to assign
    print(f"{filename} image is resized...")
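
Note that the resize above stretches every image to a square regardless of its aspect ratio. If that distortion matters, one alternative (not what this notebook does) is to crop the largest centred square first. The sketch below only computes the crop box; `center_square_box` is a hypothetical helper, and the resulting tuple has the `(left, top, right, bottom)` layout that Pillow's `Image.crop` expects.

```python
def center_square_box(width, height):
    """Return the (left, top, right, bottom) box of the largest centred square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# Possible use with Pillow (assumption, not part of the cell above):
# image = image.crop(center_square_box(image.width, image.height))
```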


## Convert images

Convert the raw images into a dataset using StyleGAN3's built-in `dataset_tool.py`.

In [None]:
!python /content/drive/MyDrive/ProjectAmbition/stylegan3/dataset_tool.py --source /content/drive/MyDrive/ProjectAmbition/raw_data/ --dest /content/drive/MyDrive/ProjectAmbition/dataset/ --resolution=1024x1024

100% 4/4 [00:01<00:00,  2.90it/s]


The following command can be used to clear out the newly created dataset. If something goes wrong and you need to clean up your images and rerun the above command, you should delete your partially created dataset directory.

In [None]:
# !rm -R /content/drive/MyDrive/ProjectAmbition/dataset/*

## Training

### Initial training

In [None]:
cmd = "/usr/bin/python3 /content/drive/MyDrive/ProjectAmbition/stylegan3/train.py --help"
!{cmd}

Ninja is a required dependency and must be installed.

In [None]:
!pip install ninja

In [None]:
!pip install torch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 torchtext==0.10.0
!pip install transformers==4.8.0

In [None]:
import os

# !export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Modify these to suit your needs
OUTPUT = "/content/drive/MyDrive/ProjectAmbition/output"
DATA = "/content/drive/MyDrive/ProjectAmbition/dataset"
# Snap: How often should the model generate samples and a .pkl file
SNAP = 4
# Mirrored: Should the images be mirrored left to right?
MIRRORED = True


# Build the command and run it
cmd = f"/usr/bin/python3 /content/drive/MyDrive/ProjectAmbition/stylegan3/train.py --cfg=stylegan3-t --gpus=1 --batch=8 --gamma=8.2 --snap {SNAP} --outdir {OUTPUT} --data {DATA} --mirror={MIRRORED}"
!{cmd}
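
The f-string above works as long as none of the paths contain spaces. A minimal sketch of a more defensive way to build the same command is shown below, assuming a hypothetical helper `build_train_command`; it quotes each argument with `shlex.quote` so that a path with spaces survives the shell. The flags are the ones used in the cell above.

```python
import shlex


def build_train_command(train_py, outdir, data, snap=4, mirror=True):
    """Return a shell-safe train.py command line as a single string."""
    args = [
        "/usr/bin/python3", train_py,
        "--cfg=stylegan3-t", "--gpus=1", "--batch=8", "--gamma=8.2",
        f"--snap={snap}", f"--outdir={outdir}", f"--data={data}",
        f"--mirror={mirror}",
    ]
    # Quote every argument so spaces and shell metacharacters stay intact.
    return " ".join(shlex.quote(arg) for arg in args)
```

In a notebook cell, the returned string can be run with `!{cmd}` exactly as before.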

### Resume training

In [None]:
import os


# Modify these to suit your needs
OUTPUT = "/content/drive/MyDrive/ProjectAmbition/output"
NETWORK = "network-snapshot-000100.pkl"
RESUME = os.path.join(OUTPUT, "00008-circuit-auto1-resumecustom", NETWORK)
DATA = "/content/drive/MyDrive/ProjectAmbition/dataset"
# Snap: How often should the model generate samples and a .pkl file
SNAP = 4
# Mirrored: Should the images be mirrored left to right?
MIRRORED = True


# Build the command and run it
# Colab provides a single GPU, so --gpus and --batch match the initial training run
cmd = f"/usr/bin/python3 /content/drive/MyDrive/ProjectAmbition/stylegan3/train.py --cfg=stylegan3-t --gpus=1 --batch=8 --gamma=8.2 --snap {SNAP} --resume {RESUME} --outdir {OUTPUT} --data {DATA} --mirror={MIRRORED}"
!{cmd}
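
Rather than hard-coding the run folder and snapshot name, the most recent snapshot can be looked up. This is a sketch under an assumption: run folders and snapshot files use zero-padded numbers (as in `00008-...` and `network-snapshot-000100.pkl` above), so a plain lexicographic sort puts the newest last. `latest_snapshot` is a hypothetical helper.

```python
import glob
import os


def latest_snapshot(output_dir):
    """Return the path of the newest network snapshot, or None if there is none."""
    pattern = os.path.join(output_dir, "*", "network-snapshot-*.pkl")
    # Zero-padded names sort correctly as plain strings.
    candidates = sorted(glob.glob(pattern))
    return candidates[-1] if candidates else None


# Possible use (assumption): RESUME = latest_snapshot(OUTPUT)
```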

## Generate image