PyTorch-based implementations (from scratch) of several distinct deep learning approaches [1-3] that aim to solve a popular problem in computer vision called style transfer. Put simply, the task in style transfer is to generate an image that preserves the content of image x (i.e. semantics, shapes, edges, etc.) while matching the style of image y (i.e. textures, patterns, color, etc.). One may ask: what is the correct balance between content and style? As it turns out, the answer is more subjective than in typical optimization/ML problems - "beauty is in the eye of the beholder", as they say.
An interactive notebook can be accessed here.
- PyTorch (>=1.12.1)
- Torchvision (>=0.13.1)
- Pillow (>=7.1.2)
- Matplotlib (>=3.2.2)
- Tqdm (>=4.64.0)
Clone the repo to install:
$ git clone https://github.com/kianzohoury/style_transfer.git
and install the package along with its dependencies using pip:
$ pip install style_transfer/.
Unlike the other methods, which require training, Gatys et al. [1] proposed optimizing directly on the image itself. In this manner, the pixels of the image are treated as parameters, and "training" involves updating pixel values rather than a neural network's weights. As seen below, the stylized images are visually pleasing and preserve content quite well, but are not efficient to generate (~75 seconds for 150 L-BFGS iterations on an NVIDIA V100 GPU).
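As a minimal sketch of the idea (not this repository's implementation), direct image optimization with L-BFGS in PyTorch looks roughly like the following; the loss shown is a dummy placeholder for the actual content/style/total-variation objective:

```python
import torch

# The generated image itself is the "parameter" that L-BFGS updates.
content = torch.rand(1, 3, 256, 256)              # placeholder content image
generated = content.clone().requires_grad_(True)  # start from the content image
optimizer = torch.optim.LBFGS([generated], lr=1e-3, max_iter=10)

def closure():
    optimizer.zero_grad()
    # In practice, the loss combines content, style (Gram matrix), and total
    # variation terms computed from pretrained VGG features.
    loss = (generated - content).pow(2).mean()    # dummy stand-in loss
    loss.backward()
    return loss

for step in range(50):                            # outer optimization steps
    optimizer.step(closure)
```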
To run this method from the command line, cd into /style_transfer and execute the following:
python -m stylize gatys --content-src <content path> --style-src <style path>
Optionally, the same can be achieved by calling stylize.run_gatys_optimization():
from stylize import run_gatys_optimization
stylized_img = run_gatys_optimization(
content_src="examples/content/tuebingen_neckarfront.jpeg",
style_src="examples/style/van_gogh_starry_night.jpeg",
...
)
Some of the important options are described below:
- `--content-weight` (float): Content loss weight. Default: 1.0.
- `--style-weight` (float): Style loss weight. Default: 1e6.
- `--tv-weight` (float): Total variation regularization weight. Default: 1e-6.
- `--lbfgs-iters` (int): Max number of L-BFGS iterations per optimization step. Default: 10.
- `--num-steps` (int): Number of image optimization steps, resulting in a maximum of `lbfgs_iters * num_steps` total L-BFGS iterations. Default: 50.
- `--lr` (float): Learning rate for the L-BFGS optimizer. Default: 1e-3.
- `--init-noise` (bool): Initializes the generated image with noise. Default: False.
- `--save-fp` (str, optional): Path to save the generated image, using a valid format (e.g. jpg, tiff). Default: None.
- `--save-gif` (bool): If True, saves a .gif version of the image saved under `--save-fp`. Default: False.

An example invocation combining several of these options is shown below. Refer to the method signature of stylize.run_gatys_optimization() for the full list of options.
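For example, a fuller invocation might look like the following (the paths and option values are purely illustrative):

```
$ python -m stylize gatys \
    --content-src examples/content/tuebingen_neckarfront.jpeg \
    --style-src examples/style/van_gogh_starry_night.jpeg \
    --style-weight 1e6 --num-steps 50 --save-fp stylized.jpg
```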
As an aside, feature inversion can be performed by optimizing only the style term of the perceptual loss objective. Starting from a noise image, the Gram matrix (G = FF^T) of each style layer's features is the signal that guides the inversion of the pretrained features, and the resulting image resembles a texture. For more info, refer to Gatys et al.'s texture synthesis paper [add citation].
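As a minimal sketch (the normalization factor below is one common choice and may not match this repository's implementation), the Gram matrix of a feature map can be computed as follows:

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Gram matrix G = F F^T of a (batch, channels, height, width) feature map."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)        # flatten spatial dimensions
    gram = f @ f.transpose(1, 2)          # (b, c, c) channel correlations
    return gram / (c * h * w)             # one common normalization choice
```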
Johnson et al. [2] proposed a feed-forward transformation network, which is significantly faster than Method I (~1000x faster for 256 x 256 images). However, the major drawback of this method is that a transformation network must be trained separately for each style...
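To illustrate the difference in inference cost, here is a sketch of feed-forward stylization; `transform_net` is a hypothetical stand-in, not the architecture from [2] or this repository:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a trained image transformation network.
transform_net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=9, padding=4),
)

content = torch.rand(1, 3, 256, 256)      # placeholder content image
with torch.no_grad():
    # Once trained for a given style, stylization is a single forward
    # pass -- no per-image optimization loop is needed.
    stylized = transform_net(content)
```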
[1] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image Style Transfer Using Convolutional Neural Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2414-2423.
[2] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," in Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 694-711.
[3] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2223-2232.