Materials and Methods
This project aims to build an interface that lets the user navigate the latent space generated by StyleGAN and produce a personalized output under the user's control. This requires meeting all the computational, software, and hardware requirements that StyleGAN imposes while making the system available to the laboratory.
For a machine learning development environment, one of the most important ingredients is powerful compute: high-performance CPUs and GPUs to train models. Because StyleGAN was developed by NVIDIA, it works only on NVIDIA GPUs. One or more high-end NVIDIA GPUs with at least 11GB of DRAM are required. StyleGAN recommends an NVIDIA DGX-1 with 8 Tesla V100 GPUs. According to this research [https://www.chrisplaysgames.com/gadgets/2019/02/26/training-at-home-and-in-the-cloud/], the performance comparison with other NVIDIA GPU models is as follows:

The following table shows training times for a V100 GPU. At ITBA we have an NVIDIA Titan XP GPU, which has almost the same number of CUDA cores and similar specs to the GTX 1080, so training is expected to take 2.29x as long, as follows:

Using an NVIDIA GPU is not the only requirement. The system must also provide:
- NVIDIA driver 391.35 or newer,
- CUDA toolkit 9.0 or newer,
- cuDNN 7.3.1 or newer.
A challenge with machine learning development environments is that they rely on complex and continuously evolving open source machine learning frameworks and toolkits, and complex and continuously evolving hardware ecosystems. The frameworks and toolkits required for StyleGAN are:
- 64-bit Python 3.6 installation. Anaconda3 is recommended, with numpy 1.14.3 or newer.
- TensorFlow 1.10.0 or newer with GPU support.
Another software requirement is the operating system: both Linux and Windows are supported, but Linux is strongly recommended for performance and compatibility reasons.
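The minimum versions listed above can be checked programmatically. The following is a minimal sketch, not part of the project: the helper names (`parse_version`, `meets_minimum`, `check_environment`) and the dictionary keys are hypothetical, and it assumes plain dotted numeric version strings (suffixes such as `rc1` are not handled). Versions are compared as integer tuples because comparing them as strings would wrongly rank `1.9.0` above `1.10.0`.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.14.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum` (both dotted strings)."""
    return parse_version(installed) >= parse_version(minimum)


# Minimum versions taken from the requirement lists above.
# The key names are illustrative only.
MINIMUMS = {
    "driver": "391.35",
    "cuda": "9.0",
    "cudnn": "7.3.1",
    "numpy": "1.14.3",
    "tensorflow": "1.10.0",
}


def check_environment(installed):
    """Map each known component in `installed` to True/False."""
    return {
        name: meets_minimum(version, MINIMUMS[name])
        for name, version in installed.items()
        if name in MINIMUMS
    }
```

For example, `check_environment({"numpy": "1.16.4", "cuda": "8.0"})` reports numpy as sufficient and CUDA as too old. How the installed versions are obtained (for instance from `numpy.__version__`) is left out of the sketch.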
Our solution consists of a web application hosted on the ITBA machine with the Titan GPU, where the StyleGAN environment is set up and where the training of the network and the generation of results can be requested via HTTP. This web application is a REST API that serves as an interface to the methods of StyleGAN2, which it uses as a library with a fixed pre-trained network. For the laboratory, a second application will be developed to provide a usability layer that is more appealing to the researchers. The two applications work together: the front application consumes the StyleGAN API (the back-end application), and the user at the lab only sees and uses the front application. The back-end application receives HTTP requests, performs the requested functionality, and returns the necessary responses. Besides using the StyleGAN network, the back-end application persists the faces generated by the network in an in-memory database. This keeps track of all the faces the user can experiment with and makes it possible to later ask for faces similar to a previously generated one.
The user generates images from the pre-trained network and then uses the transition from one face (the "original") to another (the "destination") to exploit the style mixing properties of the latent space and generate faces similar to the original. By selecting the destination face toward which the original should transition, the user also controls which features are mixed and which stay the same (or almost the same).
The StyleGAN2 repository is downloaded from its public GitHub page and used as a library by the Python methods and scripts written for the API. The only script that uses StyleGAN2 directly is encapsulated in a Generator class, which takes all the input the StyleGAN2 methods need to download the network and initialize it in TensorFlow. Once the Generator class and the network are running, the following methods can be used:
- generate_random_images: receives as input the number of images to generate, and a random seed to generate those images from.
- generate_transition: receives the seed corresponding to the original face, the starting point of the transition, and the seed corresponding to the destination face, to which the transition will be directed. It also receives a speed scalar and a number of faces to generate in the transition. With these two parameters, the steps in the transition are defined:
step = (seed_to - seed_from) * speed / qty
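The step formula above can be illustrated with a short sketch. This is not the project's Generator code: the function name `transition_points` is hypothetical, and two assumptions are made that the text does not confirm. First, the subtraction is applied element-wise to the latent vectors derived from the two seeds rather than to the raw integer seeds. Second, the points are taken at multiples 1..qty of the step, so that with a speed of 1 the last point coincides with the destination.

```python
def transition_points(start, end, speed, qty):
    """Return `qty` points that move from `start` toward `end`.

    `start` and `end` are equal-length sequences of floats (assumed here to
    be the latent vectors of the original and destination faces).
    """
    if qty <= 0:
        raise ValueError("qty must be a positive integer")
    if len(start) != len(end):
        raise ValueError("start and end must have the same length")

    # step = (seed_to - seed_from) * speed / qty, applied per component
    step = [(e - s) * speed / qty for s, e in zip(start, end)]

    # Point i lies i steps away from the starting vector.
    return [
        [s + i * d for s, d in zip(start, step)]
        for i in range(1, qty + 1)
    ]


points = transition_points([0.0, 10.0], [4.0, 2.0], speed=1.0, qty=4)
# The last point equals the destination when speed is 1.
```

Under these assumptions, a speed below 1 stops short of the destination, which keeps the generated faces closer to the original, and a speed above 1 overshoots it.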
A service layer was programmed for the web application to consume. It instantiates the Generator and the database, persists each generated face using the seed the Generator used, and associates it with an identification number. This ID is returned to the user to identify the face and later to request transitions from the Generator.
The proposed solution does not consider training on a custom dataset. The StyleGAN network used to generate images is a pre-trained network provided by NVIDIA researchers, available to download through a Google Drive link. The dataset used to train this network, Flickr-Faces-HQ (FFHQ), is a high-quality image dataset of human faces, originally created for StyleGAN. It consists of 70,000 high-quality PNG images at 1024×1024 resolution and is highly diverse in terms of age, ethnicity, and image background. It also covers accessories such as eyeglasses and hats. The images come from Flickr, a website for storing and sharing photos, so the dataset inherits all the biases of that website. Filters were used to prune the set and remove the occasional statues, paintings, or photos of photos. Finally, the images were cropped and aligned and used to train the network.
The database is an in-memory database that uses the SQLite3 SQL engine. This reduces latency when accessing the database, because it is part of the application rather than a client-server database hosted outside the application's environment. As mentioned above, it contains a table with an ID column and a Seed column. Every generated face is persisted as a row, with the seed stored as a unique long number and the ID as the primary key. When a face is identified by its ID, its seed can be looked up in the database and later mapped into the latent space using the Generator's methods.
Flask is a Python framework for web applications. Using Flask, a REST API was developed with endpoints that use the Generator Service. The following endpoints are declared:
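The ID-to-seed table described above could look roughly like the following sketch, which uses Python's standard `sqlite3` module with the special `:memory:` database name. The class name `FaceStore`, the table name `faces`, and the method names are hypothetical; the project's actual schema and service code are not shown in this document.

```python
import sqlite3


class FaceStore:
    """Keeps an id -> seed mapping in an in-memory SQLite database."""

    def __init__(self):
        # ":memory:" creates a database that lives only inside this process.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE faces ("
            "id INTEGER PRIMARY KEY, "
            "seed INTEGER NOT NULL UNIQUE)"
        )

    def add(self, seed):
        """Store a seed and return the id assigned to the new face."""
        cursor = self.conn.execute(
            "INSERT INTO faces (seed) VALUES (?)", (seed,)
        )
        self.conn.commit()
        return cursor.lastrowid

    def seed_for(self, face_id):
        """Return the seed stored for `face_id`, or None if it is unknown."""
        row = self.conn.execute(
            "SELECT seed FROM faces WHERE id = ?", (face_id,)
        ).fetchone()
        return None if row is None else row[0]

    def ids(self):
        """Return all stored face ids in ascending order."""
        rows = self.conn.execute("SELECT id FROM faces ORDER BY id")
        return [row[0] for row in rows]
```

Because the data lives in process memory, it is lost when the application stops, and it is not shared between separate processes.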
- GET `/faces`: to obtain the ids of the generated faces stored in the database.
- POST `/faces?id={id}`: to generate the face with the given `id` and save it in the results folder.
- POST `/faces?amount={amount}`: to generate `amount` random faces and save them in the results folder.
- POST `/faces?amount={amount}&id1={id1}&id2={id2}&speed={speed}`: to generate a transition from the face with `id1` to the face with `id2`, using `speed` and `amount` as the transition parameters. `id2` is optional; if it is not given, a random face is chosen as the destination of the transition.
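As an illustration of how a client such as the front application might call the transition endpoint listed above, here is a minimal sketch using only the Python standard library. The base URL, the function names, and the handling of the response are assumptions; the response format of the API is not described in this document, so the body is returned as raw bytes.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical address of the back-end; adjust to the real deployment.
BASE_URL = "http://localhost:8000"


def transition_url(amount, id1, speed, id2=None, base_url=BASE_URL):
    """Build the URL for the transition endpoint; id2 is optional."""
    params = {"amount": amount, "id1": id1}
    if id2 is not None:
        params["id2"] = id2
    params["speed"] = speed
    return f"{base_url}/faces?{urlencode(params)}"


def request_transition(amount, id1, speed, id2=None):
    """Send the POST request and return the raw response body."""
    request = Request(transition_url(amount, id1, speed, id2), method="POST")
    with urlopen(request) as response:
        return response.read()
```

Omitting `id2` leaves the parameter out of the query string, which corresponds to the case where the server picks a random destination face.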
Although Flask is a good framework for declaring the endpoints needed for the API, it is not a web server. Flask does include a default WSGI (Web Server Gateway Interface) server, Werkzeug's development server, but it is not intended for use in production: it is not designed to be particularly efficient, stable, or secure, and it does not support all the features of an HTTP server that production use requires. We therefore replaced it with Gunicorn, a production-ready WSGI server that receives the HTTP requests and passes them to the Flask API. Once the Gunicorn server is running, the front application can make all the requests specified above.
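A Gunicorn setup for this kind of application can be expressed as a small Python configuration file. The sketch below is an assumption, not the project's actual configuration: the file name, port, and timeout value are illustrative. `bind`, `workers`, and `timeout` are standard Gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file; values are illustrative)

# Address and port on which Gunicorn listens for HTTP requests.
bind = "0.0.0.0:8000"

# A single worker process is assumed here: the in-memory SQLite database
# and the loaded network live inside one process, so several workers
# would each hold their own separate copy.
workers = 1

# Seconds a worker may take to answer before Gunicorn restarts it.
# Raised above the 30-second default on the assumption that generating
# images can be slow.
timeout = 120
```

With such a file, the server could be started with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` stands for the module and variable that hold the Flask application and is a placeholder here.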