
SinGAN

image

As a final project in the course “Digital Image Processing” (Computer Science faculty, Technion - Israel Institute of Technology) we were given a list of papers and had to choose one to either apply to a different task or improve its performance. The goal of the project is not to produce a new article, but to investigate the chosen paper in detail, try our own ideas, and analyze and report the outcomes.

I chose the paper “SinGAN: Learning a Generative Model from a Single Natural Image” (Tamar Rott Shaham, Tali Dekel, Tomer Michaeli). It was a natural choice for me, as it was a great opportunity to refresh and expand my theoretical knowledge and practical skills in deep learning, a field I am very enthusiastic about.

(*) Links for the original work:

Project | Arxiv | CVF | Supplementary materials | Talk (ICCV`19)

The first idea I chose to implement is adding an attention mechanism to the GAN at each of the pyramidal levels, in an attempt to give the network the ability to relate features across the whole image (as attention mechanisms do), and measuring performance with the same metrics used for the original SinGAN: SIFID (a single-image variant of the Fréchet Inception Distance) and RMSE. A minimal sketch of such an attention block is shown below.
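For concreteness, one common way to add such a layer is a SAGAN-style self-attention block placed between the convolutional blocks of a single-scale SinGAN generator (and optionally the discriminator). The sketch below is only an illustration of that idea; the class name, channel-reduction factor, and placement are assumptions, not necessarily the exact variant used in this repository.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention block (illustrative sketch).
    Intended to be inserted between the conv blocks of one SinGAN scale."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)   # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                       # B x C' x HW
        attn = torch.softmax(torch.bmm(q, k), dim=-1)            # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                     # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x                              # residual connection
```

Note that the attention map has size HW × HW, which is exactly what makes such a block memory-expensive at the finer (higher-resolution) scales of the pyramid.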

On some examples, a particular attention architecture achieved a better SIFID, although the improvement was not visually noticeable, and evaluating it on the whole test dataset that SinGAN was originally evaluated on was too memory-expensive.

Another idea I examined in this project is creating multiple animations from a single one, in the same manner that SinGAN creates multiple similar images from a single image. The difference from the paper's approach is adding sequence memory to the method instead of only performing a random walk in the learned z-space of one image. For this, I used RNN/LSTM-like architectures and extended the pyramidal SinGAN architecture; a rough sketch of the idea follows.
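To illustrate what “sequence memory” means here, the sketch below passes a sequence of per-frame noise maps through a minimal convolutional LSTM cell, so each frame's latent map depends on the previous frames rather than being an independent step of a random walk. The names, shapes, and the way the output would be fed back into the pyramid are assumptions made for illustration; the architecture described in the report differs in its details.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell carrying memory across frames."""
    def __init__(self, in_ch, hidden_ch, kernel_size=3):
        super().__init__()
        self.hidden_ch = hidden_ch
        # one convolution producing the input/forget/output/candidate gates
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

def correlated_latents(z_frames, hidden_ch=32):
    """Turn independent per-frame noise maps into a temporally correlated
    sequence of latent maps for a (pretrained) SinGAN pyramid."""
    b, in_ch, height, width = z_frames[0].shape
    cell = ConvLSTMCell(in_ch, hidden_ch)
    to_z = nn.Conv2d(hidden_ch, in_ch, kernel_size=1)  # project back to noise channels
    state = (torch.zeros(b, hidden_ch, height, width),
             torch.zeros(b, hidden_ch, height, width))
    out = []
    for z in z_frames:
        h, state = cell(z, state)
        out.append(to_z(h))
    return out
```

Each latent map in the returned sequence would then be injected at a chosen scale of the frozen SinGAN pyramid to render one frame of the animation.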

In both implementations a noticeable bottleneck was the lack of GPU memory, which I handled by investigating less costly attention mechanisms for the first idea and by pruning training at the finest scales of the pyramid for the second (among other technical tweaks). The report first briefly introduces the work and the motivation for it, then overviews and explains the mechanisms and networks used, and finally reproduces, reasonably closely, the paper's results on one of the original 50 images. A sketch of one cheaper attention variant is shown below.
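As an example of what “less costly” can mean, the variant below (building on the SelfAttention sketch above) average-pools the keys and values before the attention product, so the attention map shrinks from HW × HW to HW × (HW / r²). The reduction factor and the pooling choice are illustrative assumptions, not necessarily the variant used here.

```python
import torch
import torch.nn.functional as F

class PooledSelfAttention(SelfAttention):
    """Cheaper attention sketch: keys/values are average-pooled by a factor r,
    shrinking the attention map and its memory footprint."""
    def __init__(self, channels, reduce=4):
        super().__init__(channels)
        self.reduce = reduce

    def forward(self, x):
        b, c, h, w = x.shape
        small = F.avg_pool2d(x, self.reduce)                    # B x C x h/r x w/r
        hw_small = small.shape[2] * small.shape[3]
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)   # B x HW x C'
        k = self.key(small).view(b, -1, hw_small)               # B x C' x HW/r^2
        attn = torch.softmax(torch.bmm(q, k), dim=-1)           # B x HW x HW/r^2
        v = self.value(small).view(b, -1, hw_small)             # B x C x HW/r^2
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x
```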

Here are some demonstrations of the main results:

Image generation (results similar to the reproduced ones, and slightly better):

image

image

image

Super-resolution (ours achieves better NIQE scores):

image

Finally, here are some GIFs created with the proposed approach (under the memory restrictions we had):

The original (regular and reversed):

Generated:
