# References | 12. Generative Deep Learning

In [None]:
from IPython.display import YouTubeVideo

---

## Text Generation

In [None]:
YouTubeVideo('LY7x2Ihqjmc', width=853, height=480) #  Sunspring | A Sci-Fi Short Film Starring Thomas Middleditch

#### Tutorials:

["Text generation with a miniature GPT"](https://keras.io/examples/generative/text_generation_with_miniature_gpt/): pretty much the same as here, with some interesting variations (the Transformer architecture is closer to what's used for ChatGPT).  
["Text generation with an RNN"](https://www.tensorflow.org/text/tutorials/text_generation): using an RNN to train an autoregressive character-level language model (with some nice tricks using `tf.data.Dataset`).

#### References

One of the most famous blog posts in deep learning, the inspiration for the above tutorial: [Andrej Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks"](https://karpathy.github.io/2015/05/21/rnn-effectiveness/).   
[Holtzman et al., "The Curious Case of Neural Text Degeneration"](https://arxiv.org/abs/1904.09751)


### The rise of large language models (LLMs)

Truly remarkable results emerge with very large models. Several companies have built such models to try and make a business out of them. They have APIs with a free tier that allow you to test these capabilities:

- [OpenAI's ChatGPT](https://openai.com/blog/chatgpt/)  
- [OpenAI's GPT-4](https://openai.com/api/)
- [Cohere](https://cohere.ai/)
- [Anthropic](https://www.anthropic.com)
- [GooseAI](https://goose.ai/) (open-source)
- [Hugging Face Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) (open-source)

In [None]:
YouTubeVideo('Dmm4UG-6jxA', width=853, height=480) # MIT 6.S191: Deep Generative Modeling
                                                   # 3G5hWM6jqPk for the 2023 edition
                                                   # QcLlc9lj2hk for the 2022 edition

In [None]:
YouTubeVideo('5WoItGTWV54', width=853, height=480) # Stanford CS 231N, Lecture 13 | Generative Models

### Even more sampling

- [min p sampling](https://arxiv.org/abs/2407.01082) ([video](https://www.youtube.com/watch?v=LTf_SJOQH4s)): take the top probability, multiply it by a factor (e.g. `0.2`, giving 20% of the top probability), and use the result as a threshold: any token with a probability below it is discarded
- [top a sampling](https://github.com/BlinkDL/RWKV-LM/tree/4cb363e5aa31978d801a47bc89d28e927ab6912e?tab=readme-ov-file#the-top-a-sampling-method): same idea as *min p*, except that the threshold is computed as $\alpha \cdot \text{top-prob}^\beta$, where $\text{top-prob}$ is the highest probability among the tokens, and $\alpha\ (= 0.2)$ and $\beta\ (= 2)$ are hyperparameters
- [locally typical sampling](https://arxiv.org/abs/2202.00666) ([video](https://www.youtube.com/watch?v=_EDr3ryrT_Y&pp=ygUYdHlwaWNhbCBzYW1wbGluZyBraWxjaGVy) & [interview](https://www.youtube.com/watch?v=AvHLJqtmQkE)): sample only from tokens whose information content (negative log-probability) is close to the conditional entropy of the model's next-token distribution
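
The three strategies above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation of any of the linked papers or repositories: the function names and the `mass` parameter (the share of probability mass kept by typical sampling) are assumptions made for this example, and each function takes an already-normalised vector of next-token probabilities.

```python
import numpy as np


def min_p_filter(probs, min_p=0.2):
    """Keep tokens whose probability is at least min_p * (top probability)."""
    probs = np.asarray(probs, dtype=np.float64)
    threshold = min_p * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()  # renormalise so the result sums to 1


def top_a_filter(probs, alpha=0.2, beta=2.0):
    """Keep tokens whose probability is at least alpha * (top probability) ** beta."""
    probs = np.asarray(probs, dtype=np.float64)
    threshold = alpha * probs.max() ** beta
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()


def typical_filter(probs, mass=0.9):
    """Keep the tokens whose surprisal is closest to the entropy,
    adding tokens until their cumulative probability reaches `mass`
    (`mass` is a hypothetical parameter name for this sketch)."""
    probs = np.asarray(probs, dtype=np.float64)
    surprisal = -np.log(np.clip(probs, 1e-12, None))  # information content per token
    entropy = np.sum(probs * surprisal)               # expected information content
    order = np.argsort(np.abs(surprisal - entropy))   # most "typical" tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, mass) + 1    # smallest set reaching `mass`
    kept = np.zeros_like(probs)
    kept[order[:cutoff]] = probs[order[:cutoff]]
    return kept / kept.sum()


# Toy next-token distribution over a 5-token vocabulary.
probs = [0.4, 0.3, 0.15, 0.1, 0.05]

print(min_p_filter(probs))          # threshold 0.08: the last token is dropped
print(top_a_filter(probs))          # threshold 0.032: all five tokens survive
print(typical_filter(probs, 0.8))   # keeps the three tokens closest to the entropy

# Sampling then proceeds as usual from the filtered distribution.
rng = np.random.default_rng(0)
token = rng.choice(len(probs), p=min_p_filter(probs))
```

On this toy distribution, *top a* is more permissive than *min p*: squaring a top probability below 1 shrinks the threshold, so the flatter the distribution, the more tokens are kept.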

---

## Style Transfer

### Tutorials

Have a look at the [official TensorFlow tutorial](https://www.tensorflow.org/tutorials/generative/style_transfer): it contains useful additional information!

The two implementations differ in several ways (for instance in the values used for the various loss weights), so their results cannot be compared directly.

However, understanding those differences and being able to integrate the two approaches into one is a *very* good exercise!

Also, have a look at Andrew Ng's videos below!

### References

[Gatys et al., "A Neural Algorithm of Artistic Style"](https://arxiv.org/abs/1508.06576)

Andrew Ng's videos on Neural Style Transfer (as part of [this playlist](https://www.youtube.com/watch?v=R39tWYYKNcI&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF&index=37)):

In [None]:
YouTubeVideo('R39tWYYKNcI', width=853, height=480) # Andrew Ng, C4W4L06 What is neural style transfer?

In [None]:
YouTubeVideo('ChoV5h7tw5A', width=853, height=480) # Andrew Ng, C4W4L07 What are deep CNs learning?

In [None]:
YouTubeVideo('xY-DMAJpIP4', width=853, height=480) # Andrew Ng, C4W4L08 Cost Function

In [None]:
YouTubeVideo('b1I5X3UfEYI', width=853, height=480) # Andrew Ng, C4W4L09 Content Cost Function

In [None]:
YouTubeVideo('QgkLfjfGul8', width=853, height=480) # Andrew Ng, C4W4L10 Style Cost Function

### Johno Whitaker, Gram Matrix

In [None]:
YouTubeVideo("PdNHkTLU2oQ", width=853, height=480, start=3366) # playback starts at 3366 s; the style transfer part starts at 1239 s

---

## Deep Dream

An online tool for DeepDream: [deepdreamgenerator.com](https://deepdreamgenerator.com/).

In [None]:
YouTubeVideo('BsSmBPmPeYQ', width=853, height=480) # Deep Dream (Google) - Computerphile

---

## Variational Autoencoders

### Latent space, KL divergence

In [None]:
YouTubeVideo("sV2FOdGqlX0", width=853, height=480) #  Variational Autoencoder (VAE) Latent Space Visualization 

In [None]:
YouTubeVideo('ErfnhcEV1O8', width=853, height=480) # A Short Introduction to Entropy, Cross-Entropy and KL-Divergence

In [None]:
YouTubeVideo('SxGYPqCgJWM', width=853, height=480) #  Intuitively Understanding the KL Divergence

### Talks & courses

In [None]:
YouTubeVideo('rjZL7aguLAs', width=853, height=480) # ICLR14: D Kingma: Auto-Encoding Variational Bayes 

In [None]:
YouTubeVideo('MAGBUh77bNg', width=853, height=480) # Stanford CS236: Deep Generative Models I 2023 I Lecture 5 - VAEs 

In [None]:
YouTubeVideo('8cO61e_8oPY', width=853, height=480) # Stanford CS236: Deep Generative Models I 2023 I Lecture 6 - VAEs 

In [None]:
YouTubeVideo('NlIqjtbjjRE', width=853, height=480) # L4 Latent Variable Models and Variational AutoEncoders -- CS294-158 SP24 Deep Unsupervised Learning 

---

### References

[TensorFlow tutorial](https://www.tensorflow.org/tutorials/generative/cvae)

[Kingma and Welling, "Auto-Encoding Variational Bayes"](https://arxiv.org/abs/1312.6114)  
[Kingma and Welling, "An Introduction to Variational Autoencoders"](https://arxiv.org/abs/1906.02691)

In [None]:
YouTubeVideo('9zKuYvjFFS8', width=853, height=480) # Arxiv Insights, Variational Autoencoders

---

## GANs

### Tutorials

- [Soumith Chintala, "How to Train a GAN? Tips and tricks to make GANs work"](https://github.com/soumith/ganhacks)
- [TensorFlow DCGAN tutorial](https://www.tensorflow.org/tutorials/generative/dcgan)
- The TensorFlow website also has [one tutorial on CycleGAN](https://www.tensorflow.org/tutorials/generative/cyclegan) and one on [Pix2Pix](https://www.tensorflow.org/tutorials/generative/pix2pix), two GAN variants.

### Zoos: list of all GAN variants

When it comes to GANs, the explosion of variants has been so enormous that it is rather difficult (impossible?) to keep up:

- [Avinash Hindupur, "The GAN Zoo"](https://github.com/hindupuravinash/the-gan-zoo)
- [Jihye Back, "GAN-Zoos"](https://happy-jihye.github.io/gan/)

### References

[Goodfellow et al., "Generative Adversarial Networks"](https://arxiv.org/abs/1406.2661)

[Radford et al., "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"](https://arxiv.org/abs/1511.06434)

If you want to know more about where this idea of [minimax](https://en.wikipedia.org/wiki/Minimax) comes from, I can recommend the [Yale Game Theory lecture series](https://www.youtube.com/watch?v=nM3rTU927io&list=PL6EF60E1027E1A10B).

See also [this page](https://cs.stanford.edu/people/eroberts/courses/soco/projects/1998-99/game-theory/Minimax.html).
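
For reference, this is how the minimax game appears in the Goodfellow et al. paper listed above. The discriminator $D$ tries to maximise the value function $V$ while the generator $G$ tries to minimise it, with $p_{\text{data}}$ the data distribution and $p_z$ the noise prior fed to the generator:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$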

In [None]:
YouTubeVideo('ilkSwsggSNM', width=853, height=480) # Sebastian Raschka, transposed convolutions

In [None]:
YouTubeVideo('myGAju4L7O8', width=853, height=480) #  NIPS 2016 Workshop on Adversarial Training - Soumith Chintala - How to train a GAN

In [None]:
YouTubeVideo('ANszao6YQuM', width=853, height=480) # Stanford CS230: Deep Learning | Autumn 2018 | Lecture 4 - Adversarial Attacks / GANs

In [None]:
YouTubeVideo('9JpdAg6uMXs', width=853, height=480) # Introduction to GANs, NIPS 2016 | Ian Goodfellow, OpenAI

In [None]:
YouTubeVideo('HGYYEUSm-0Q', width=853, height=480) # Ian Goodfellow: Generative Adversarial Networks (NIPS 2016 tutorial)

In [None]:
YouTubeVideo('eyxmSmjmNS0', width=853, height=480) # [Classic] Generative Adversarial Networks (Paper Explained)

### Notable experiments

In [None]:
YouTubeVideo('9QuDh3W3lOY', width=853, height=480) # Synthesizing High-Resolution Images with StyleGAN2

In [None]:
YouTubeVideo('9reHvktowLY', width=853, height=480) # CycleGAN horse zebra 0'7

In [None]:
YouTubeVideo('A6bo_mIOto0', width=853, height=480) # Mario Klingemann: StyleGAN2 - mapping music to facial expressions in real time

---

## Positional Embeddings

Strike back!

In [None]:
YouTubeVideo('1biZfFLPRSY', width=853, height=480) #  Positional embeddings in transformers EXPLAINED | Demystifying positional encodings. 

In [None]:
YouTubeVideo('T3OT8kqoqjc', width=853, height=480) # How positional encoding works in transformers? 

---

## Diffusion

Original tutorial: [Denoising Diffusion Probabilistic Model](https://keras.io/examples/generative/ddpm/)  

See also this, with an introduction to the FID score (to measure the quality of images):  [Denoising Diffusion Implicit Models](https://keras.io/examples/generative/ddim/)  

Two more in Keras:

[High-performance image generation using Stable Diffusion in KerasCV](https://keras.io/guides/keras_cv/generate_images_with_stable_diffusion/)  
[A walk through latent space with Stable Diffusion](https://keras.io/examples/generative/random_walks_with_stable_diffusion/)

The original paper: [Ho et al., "Denoising Diffusion Probabilistic Models"](https://arxiv.org/abs/2006.11239) (and for an extra-thick cream top-up, the [authors' implementation](https://github.com/hojonathanho/diffusion)).

In [None]:
YouTubeVideo('1CIpzeNxIhU', width=853, height=480) # How AI Image Generators Work (Stable Diffusion / Dall-E) - Computerphile

In [None]:
YouTubeVideo('-lz30by8-sU', width=853, height=480) # Stable Diffusion in Code (AI Image Generation) - Computerphile

In [None]:
YouTubeVideo('fbLgFrlTnGU', width=853, height=480) # What are Diffusion Models?

In [None]:
YouTubeVideo('W-O7AZNzbzQ', width=853, height=480) # DDPM - Diffusion Models Beat GANs on Image Synthesis (Machine Learning Research Paper Explained)

In [None]:
YouTubeVideo('cS6JQpEY9cs', width=853, height=480) # Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications 

In [None]:
YouTubeVideo('T0Qxzf0eaio', width=853, height=480) # Miika Aittala: Elucidating the Design Space of Diffusion-Based Generative Models 

Good resources also exist in PyTorch, such as the video series by [Fast.ai](https://www.fast.ai/):
- [Lesson 9: Deep Learning Foundations to Stable Diffusion, 2022](https://www.youtube.com/watch?v=_7rMfsA24Ls)
- [Lesson 9A 2022 - Stable Diffusion deep dive](https://www.youtube.com/watch?v=0_BBRNYInx8)
- [Lesson 9B - the math of diffusion](https://www.youtube.com/watch?v=mYpjmM7O-30)
- [Lesson 10: Deep Learning Foundations to Stable Diffusion, 2022](https://www.youtube.com/watch?v=6StU6UtZEbU)

As well as Johno Whitaker's [intro on Diffusion](https://www.youtube.com/watch?v=XTs7M6TSK9I) from his own course, [AIAIART](https://github.com/johnowhitaker/aiaiart).

And the in-depth code guide: the [Annotated Diffusion Model](https://huggingface.co/blog/annotated-diffusion).