
Materials and Methods

Jimena Lozano edited this page Sep 26, 2021 · 13 revisions

This project aims to build an interface that lets the user navigate the latent space generated by StyleGAN and produce personalized output under the user's control. Doing so requires meeting all the computational, software, and hardware requirements that StyleGAN imposes, while also making the system available to the laboratory.

Software and Hardware

Hardware

In a machine learning development environment, one of the most important ingredients is powerful compute: high-performance CPUs and GPUs to train models. StyleGAN was developed by NVIDIA and depends on CUDA, so it runs only on NVIDIA GPUs. It requires one or more high-end NVIDIA GPUs with at least 11 GB of DRAM; the StyleGAN authors recommend an NVIDIA DGX-1 with 8 Tesla V100 GPUs. According to this comparison [https://www.chrisplaysgames.com/gadgets/2019/02/26/training-at-home-and-in-the-cloud/], performance across other NVIDIA GPU models is as follows:

[Figure: GPUs-perf (performance comparison across NVIDIA GPU models)]

The following table shows training times for a V100 GPU. At ITBA we have an NVIDIA Titan Xp GPU, which has almost the same number of CUDA cores and similar specs to the GTX 1080, so training time is expected to be about 2.29x the listed times:

[Table: readme (training times for a V100 GPU)]
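
The scaling described above is simple arithmetic. The following is a minimal sketch, assuming the 2.29x factor applies uniformly to every V100 time in the table. The function name is hypothetical, and the input value is a placeholder for illustration, not a figure from the table.

```python
# Slowdown factor relative to a single V100, taken from the text above.
TITAN_XP_SLOWDOWN = 2.29


def estimate_titan_xp_time(v100_time, slowdown=TITAN_XP_SLOWDOWN):
    """Scale a V100 training time (in any unit) to an estimated Titan Xp time."""
    if v100_time < 0:
        raise ValueError("training time must be non-negative")
    return v100_time * slowdown


# Placeholder input for illustration only, not a value from the table.
example_v100_days = 10.0
print("{:.1f} days".format(estimate_titan_xp_time(example_v100_days)))
```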

Using an NVIDIA GPU is not the only requirement. The system must also have:

  • NVIDIA driver 391.35 or newer,
  • CUDA toolkit 9.0 or newer,
  • cuDNN 7.3.1 or newer.
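
These minimums can be checked with a small version comparison. The sketch below assumes the installed versions are already known as dotted strings; how they are obtained is left out, and the helper names and the example versions are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.3.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '9' compares equal to '9.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions listed above.
MINIMUMS = {"driver": "391.35", "cuda": "9.0", "cudnn": "7.3.1"}


def check_gpu_stack(installed):
    """Map each component to True/False; missing components count as failing."""
    return {
        name: name in installed and meets_minimum(installed[name], minimum)
        for name, minimum in MINIMUMS.items()
    }


# Hypothetical installed versions, for illustration only.
print(check_gpu_stack({"driver": "418.67", "cuda": "10.0", "cudnn": "7.3.0"}))
```

In this example the driver and CUDA toolkit pass, while cuDNN 7.3.0 falls just short of the 7.3.1 minimum.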

Software

A challenge with machine learning development environments is that they rely on complex, continuously evolving open-source frameworks and toolkits, and on equally complex and fast-moving hardware ecosystems. The frameworks and toolkits required for StyleGAN are:

  • 64-bit Python 3.6 installation. Anaconda3 is recommended, with numpy 1.14.3 or newer.
  • TensorFlow 1.10.0 or newer with GPU support.
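
A quick way to inspect the Python side of these requirements is sketched below, using only the standard library. The function name is hypothetical, and the check only reports whether the modules can be found; it does not verify their versions or GPU support.

```python
import importlib.util
import struct
import sys


def python_environment_report(required_modules=("numpy", "tensorflow")):
    """Collect basic facts about the running interpreter."""
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # Pointer size in bits: 64 on a 64-bit interpreter.
        "is_64_bit": struct.calcsize("P") * 8 == 64,
        # find_spec returns None when a top-level module is not installed.
        "modules_found": {
            name: importlib.util.find_spec(name) is not None
            for name in required_modules
        },
    }


print(python_environment_report())
```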

Another software requirement is the operating system: either Linux or Windows can be used, but Linux is strongly recommended for performance and compatibility reasons.

Solution Design

Our solution consists of an application hosted on the ITBA Titan GPU machine, where the StyleGAN environment is set up and where the network can be trained and results generated. This application will be a REST API that serves as an interface to the StyleGAN2 methods, using StyleGAN2 as a library with a fixed pre-trained network. For the laboratory, a second application will be developed to provide a layer of usability that is more appealing to the researchers. The two applications will work together: the front application consumes the StyleGAN API, and the user at the lab will ultimately see and use only the front application.
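
To make the split between the API and the generator concrete, here is a minimal sketch of how a request could be routed to the generator. It is not the project's actual API: the endpoint path, the parameter names, and the StubGenerator class are all hypothetical, and the stub returns file names instead of running StyleGAN2.

```python
import json
from urllib.parse import parse_qs, urlparse


class StubGenerator:
    """Hypothetical stand-in for the StyleGAN2 wrapper; returns file names only."""

    def generate_random_images(self, qty, seed):
        return ["seed{}_{}.png".format(seed, i) for i in range(qty)]


def handle_get(url, generator):
    """Route a GET request URL to the generator; return (status, JSON body)."""
    parsed = urlparse(url)
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    if parsed.path != "/images/random":  # hypothetical endpoint
        return 404, json.dumps({"error": "unknown endpoint"})
    try:
        qty = int(params["qty"])
        seed = int(params["seed"])
    except (KeyError, ValueError):
        return 400, json.dumps({"error": "qty and seed must be integers"})
    images = generator.generate_random_images(qty, seed)
    return 200, json.dumps({"images": images})


status, body = handle_get("/images/random?qty=2&seed=7", StubGenerator())
print(status, body)
```

Keeping the routing in a plain function means it can be attached to any HTTP server and tested without starting one; the front application would only ever see the JSON responses.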

The following architecture diagram illustrates the application flow:

The user will generate images from the pre-trained network and then use a transition from one face (the "original") to another (the "destination") to exploit the style-mixing properties of the latent space. The transition produces faces similar to the original one, and the choice of destination face controls which features are mixed and which stay the same (or almost the same).

Python and StyleGAN2's repository

The StyleGAN2 code is downloaded from its public repository on GitHub and used as a library by the Python methods and scripts written for the API. The only script that uses StyleGAN2 is encapsulated in a Generator class, which takes all the input that the StyleGAN2 methods need to download and initialize the network in TensorFlow. Once the Generator class and the network are running, the following methods can be used:

  • generate_random_images: receives as input the number of images to generate, and a random seed to generate those images from.
  • generate_transition: receives the seed corresponding to the original face, the starting point of the transition, and the seed corresponding to the destination face, to which the transition will be directed. It also receives a speed scalar and a number of faces to generate in the transition. With these two parameters, the steps in the transition are defined: step = (seed_to - seed_from) * speed / qty
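
The step formula above can be sketched as a linear walk between two latent vectors. This is an illustration under stated assumptions, not the project's Generator code: it assumes each seed is first mapped to a latent vector, applies the formula per component, and starts counting at the first step after the original face. The helper names are hypothetical, and Python's random module stands in for whatever seed-to-latent mapping the project uses.

```python
import random

LATENT_SIZE = 512  # StyleGAN's default latent dimensionality


def latent_from_seed(seed, size=LATENT_SIZE):
    """Stand-in for the seed-to-latent mapping: a seeded Gaussian vector."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def transition_latents(seed_from, seed_to, qty, speed=1.0, size=LATENT_SIZE):
    """Return qty latent vectors stepping from the original toward the destination."""
    if qty <= 0:
        raise ValueError("qty must be positive")
    start = latent_from_seed(seed_from, size)
    end = latent_from_seed(seed_to, size)
    # step = (seed_to - seed_from) * speed / qty, applied per latent component
    step = [(e - s) * speed / qty for s, e in zip(start, end)]
    return [[s + k * d for s, d in zip(start, step)] for k in range(1, qty + 1)]


frames = transition_latents(seed_from=1, seed_to=2, qty=5, speed=1.0, size=4)
print(len(frames), len(frames[0]))
```

With speed set to 1.0 the last vector lands on the destination latent; a smaller speed stops part of the way there, which keeps the generated faces closer to the original.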
