
Google Summer of Code 2021 - TensorFlow


Improving TensorFlow GAN library

Project Link | TF-GAN Library | Blog Post

Project Abstract

TensorFlow GAN (TF-GAN) is a lightweight library that provides a convenient way to train and evaluate GAN models. The main aim of this project is to update the TF-GAN library by adding more recent GAN variants with wider applications. For this we selected ESRGAN [1] for the task of image super-resolution and ControlGAN [2] for text-to-image generation. Along with these examples, we also add new functionality to TF-GAN that helps with the training, evaluation, and inference of GAN models. The project also adds notebook tutorials for these examples, which give users insight into the implementation, training, and evaluation of the models, demonstrate useful features of the TF-GAN library, and allow the models to be trained directly on Google Colaboratory.

Project Scope

ESRGAN [1] - Enhanced Super-Resolution Generative Adversarial Network

Image super-resolution is the task of reconstructing a high-resolution (HR) image from a given low-resolution (LR) image, and it has numerous applications today. The Super-Resolution GAN (SRGAN) model was a major breakthrough in this field and was capable of generating photorealistic images; however, it also produced artifacts that reduced the overall visual quality. To overcome this, the ESRGAN [1] model was proposed, with three major changes to the SRGAN model:

  1. Using Residual-in-Residual Dense Blocks (RRDB) without batch normalization as the basic network building unit.
  2. Using the improved relativistic average formulation from RelativisticGAN to compute the adversarial loss (sketched below).
  3. Improving the perceptual loss function by using features before activation.
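
To illustrate the second change, here is a minimal sketch of the relativistic average GAN losses in plain TensorFlow. It follows the formulation from the RelativisticGAN paper and is not necessarily identical to the implementation contributed to TF-GAN in #46:

```python
import tensorflow as tf


def relativistic_average_losses(real_logits, fake_logits):
    """Relativistic average GAN losses, given raw discriminator logits."""
    bce = tf.nn.sigmoid_cross_entropy_with_logits

    # D_ra(x_r, x_f) = sigmoid(C(x_r) - E[C(x_f)]): is a real image more
    # realistic than the average fake, and vice versa?
    real_vs_avg_fake = real_logits - tf.reduce_mean(fake_logits)
    fake_vs_avg_real = fake_logits - tf.reduce_mean(real_logits)

    # Discriminator: real images should beat the average fake, and fakes
    # should lose to the average real.
    d_loss = tf.reduce_mean(
        bce(labels=tf.ones_like(real_logits), logits=real_vs_avg_fake) +
        bce(labels=tf.zeros_like(fake_logits), logits=fake_vs_avg_real))

    # Generator: the symmetric objective, pushing fakes above the average real.
    g_loss = tf.reduce_mean(
        bce(labels=tf.zeros_like(real_logits), logits=real_vs_avg_fake) +
        bce(labels=tf.ones_like(fake_logits), logits=fake_vs_avg_real))

    return d_loss, g_loss
```

Intuitively, the discriminator no longer judges "real vs. fake" in isolation; it estimates whether an image is more realistic than the average image of the opposite kind, which gives the generator gradients from both real and fake samples.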

Through this project, the ESRGAN model was added as an example to the TF-GAN library (#47). Additionally, two notebooks implementing end-to-end training of the model on GPU as well as TPU were added; both can be run directly on Google Colaboratory (#48). The model was trained on the DIV2K dataset and achieved good results, some of which are displayed in the tutorial notebook. Evaluation metrics such as the Fréchet Inception Distance (FID) and Inception Score were also calculated for the model using TF-GAN. The relativistic average GAN loss used in the model was added as a loss function to TF-GAN as well (#46). The ESRGAN example has not been merged into TF-GAN yet and is in review.
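
As a rough sketch of how such metrics can be computed with the tensorflow_gan evaluation utilities (the random tensors below are stand-ins for Inception activations, which the notebooks obtain by running real and generated images through the Inception network; exact shapes are illustrative):

```python
import tensorflow as tf
import tensorflow_gan as tfgan

# Stand-ins for Inception activations: in the actual notebooks these come
# from running DIV2K HR images and ESRGAN outputs through the Inception
# network (e.g. via tfgan.eval.run_inception); random tensors keep this
# sketch self-contained.
real_pool3 = tf.random.normal([64, 2048])   # pool_3 features, real images
fake_pool3 = tf.random.normal([64, 2048])   # pool_3 features, generated images
fake_logits = tf.random.normal([64, 1008])  # Inception logits, generated images

# Inception Score from the classifier logits of the generated images.
inception_score = tfgan.eval.classifier_score_from_logits(fake_logits)

# FID from the pool_3 activations of real and generated images.
fid = tfgan.eval.frechet_classifier_distance_from_activations(
    real_pool3, fake_pool3)
```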

ControlGAN [2] - Controllable Text-to-Image Generation

The Controllable Text-to-Image Generative Adversarial Network (ControlGAN) generates high-quality images from textual descriptions and can change certain visual attributes of the image based on those descriptions. This has potential applications in areas such as art generation, UI design, and image editing. The generator of ControlGAN makes use of two attention modules, a spatial attention module and a channel-wise attention module. The discriminator also differs from that of other GAN networks: it checks the correlation between subregions of the generated image and the words of the description. A perceptual loss function is used as well, to improve the quality of the generated images.
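
To give a flavour of the attention mechanism, here is a rough, hypothetical sketch of word-level spatial attention, in which each spatial region of the image feature map attends over the word embeddings of the description. The layer name, shapes, and projection below are illustrative and are not the project's actual implementation:

```python
import tensorflow as tf


class SpatialWordAttention(tf.keras.layers.Layer):
    """Illustrative word-level spatial attention for text-to-image GANs."""

    def __init__(self, channels):
        super().__init__()
        # Project word embeddings into the image feature space.
        self.project = tf.keras.layers.Dense(channels, use_bias=False)

    def call(self, image_feats, word_embs):
        # image_feats: [B, H, W, C], word_embs: [B, L, D]
        b, h, w, c = tf.unstack(tf.shape(image_feats))
        regions = tf.reshape(image_feats, [b, h * w, c])  # [B, N, C]
        words = self.project(word_embs)                   # [B, L, C]

        # Similarity of every region to every word, softmaxed over words,
        # so each subregion picks out the words most relevant to it.
        attn = tf.nn.softmax(
            tf.matmul(regions, words, transpose_b=True), axis=-1)  # [B, N, L]

        # Word-context vector for each region, reshaped back into a map.
        context = tf.matmul(attn, words)                  # [B, N, C]
        return tf.reshape(context, [b, h, w, c])
```

The channel-wise attention module in ControlGAN plays the complementary role, correlating words with feature channels rather than with spatial locations.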

This is a work in progress: although the basic implementation is complete, the model is currently being trained on the Caltech-UCSD Birds dataset. Once the training process is done, it will also be added to the TF-GAN library.

What's Next?

Currently, almost all text-to-image generation models are trained on datasets such as CUB and COCO for benchmarking, and as far as we know only the results for such models are publicly available. Once the implementation of ControlGAN is complete, we plan to extend it to serve real-world applications in areas such as art generation and image editing, and for that we are looking for other relevant datasets on which to train the model. At the same time, we are also looking for ways to improve its performance.

At present there are not many publicly available resources exploring text-to-image generation, so we are also planning to publish a tutorial or blog post discussing the implementation and training process of ControlGAN and the results obtained.

Acknowledgement

I would like to thank Margaret Maynard-Reid and Joel Shor for their valuable guidance and mentorship. I would also like to thank Google Cloud Platform and the TPU Research Cloud for their support, which helped accelerate the development of this project.


References

[1] ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

[2] Controllable Text-to-Image Generation