# References | Text & Image Generation

---

## Text Generation

[Sunspring | A Sci-Fi Short Film Starring Thomas Middleditch](https://www.youtube.com/watch?v=LY7x2Ihqjmc)

#### Tutorials:

["Text generation with a miniature GPT"](https://keras.io/examples/generative/text_generation_with_miniature_gpt/): pretty much the same as here, with some interesting variations (the Transformer architecture is closer to what's used for ChatGPT).  
["Text generation with an RNN"](https://www.tensorflow.org/text/tutorials/text_generation): trains an auto-regressive character-level language model with an RNN (with some nice tricks using `tf.data.Dataset`).

#### References

One of the most famous blog posts in deep learning, the inspiration for the above tutorial:  
[Andrej Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks"](https://karpathy.github.io/2015/05/21/rnn-effectiveness/).   
[Holtzman et al., "The Curious Case of Neural Text Degeneration"](https://arxiv.org/abs/1904.09751)


### The rise of large language models (LLMs)

Truly remarkable results emerge with very large models. Several companies have built such models and are trying to turn them into a business. They offer APIs with a free tier that let you test these capabilities:

- [OpenAI's ChatGPT](https://openai.com/blog/chatgpt/)  
- [OpenAI's GPT-4](https://openai.com/api/)
- [Cohere](https://cohere.ai/)
- [Anthropic](https://www.anthropic.com)
- [GooseAI](https://goose.ai/) (open-source)
- [Hugging Face Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) (open-source)

[MIT 6.S191: Deep Generative Modeling (2025)](https://www.youtube.com/watch?v=SdTZAMDKrNY)  
[MIT 6.S191: Deep Generative Modeling (2024)](https://www.youtube.com/watch?v=Dmm4UG-6jxA)  
[MIT 6.S191: Deep Generative Modeling (2023)](https://www.youtube.com/watch?v=3G5hWM6jqPk)  
[MIT 6.S191: Deep Generative Modeling (2022)](https://www.youtube.com/watch?v=QcLlc9lj2hk)  

[Stanford CS 231N, Lecture 13 | Generative Models](https://www.youtube.com/watch?v=5WoItGTWV54)

### Even more sampling

- [min p sampling](https://arxiv.org/abs/2407.01082) ([video](https://www.youtube.com/watch?v=LTf_SJOQH4s)): multiply the top token probability by a factor (e.g. `0.2`) and use the result as a threshold; any token with a lower probability is discarded
- [top a sampling](https://github.com/BlinkDL/RWKV-LM/tree/4cb363e5aa31978d801a47bc89d28e927ab6912e?tab=readme-ov-file#the-top-a-sampling-method): same idea as *min p*, except that the threshold is $\alpha \cdot \text{top-prob}^\beta$, where $\text{top-prob}$ is the highest token probability and $\alpha\ (= 0.2)$ and $\beta\ (= 2)$ are hyperparameters
- [locally typical sampling](https://arxiv.org/abs/2202.00666) ([video](https://www.youtube.com/watch?v=_EDr3ryrT_Y&pp=ygUYdHlwaWNhbCBzYW1wbGluZyBraWxjaGVy) & [interview](https://www.youtube.com/watch?v=AvHLJqtmQkE)): sample only from tokens whose information content (negative log-probability) is close to the conditional entropy of the model, i.e. the expected information content of the next token
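
The three filters above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the reference implementation from any of the linked papers or repositories: the function names, the `mass` parameter of the typical filter (the cumulative probability to keep, a common way to decide how "close" is close enough), and the toy distribution are all made up for the example.

```python
import numpy as np


def renormalise(probs):
    """Rescale the surviving probabilities so they sum to 1."""
    return probs / probs.sum()


def min_p_filter(probs, min_p=0.2):
    """Discard tokens whose probability is below min_p * (top probability)."""
    threshold = min_p * probs.max()
    return renormalise(np.where(probs >= threshold, probs, 0.0))


def top_a_filter(probs, alpha=0.2, beta=2.0):
    """Same idea as min p, with threshold alpha * (top probability) ** beta."""
    threshold = alpha * probs.max() ** beta
    return renormalise(np.where(probs >= threshold, probs, 0.0))


def typical_filter(probs, mass=0.95):
    """Keep the tokens whose information content is closest to the entropy.

    `mass` is an assumed hyperparameter: tokens are added, closest first,
    until their cumulative probability reaches it.
    """
    info = -np.log(np.clip(probs, 1e-12, None))  # information content per token
    entropy = np.sum(probs * info)               # expected information content
    order = np.argsort(np.abs(info - entropy))   # most "typical" tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, mass) + 1
    kept = np.zeros_like(probs)
    kept[order[:cutoff]] = probs[order[:cutoff]]
    return renormalise(kept)


# Toy next-token distribution over a 5-token vocabulary (hypothetical values).
probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
rng = np.random.default_rng(0)

for name, filtered in [
    ("min p", min_p_filter(probs)),
    ("top a", top_a_filter(probs)),
    ("typical", typical_filter(probs, mass=0.2)),
]:
    token = rng.choice(len(filtered), p=filtered)
    print(name, np.round(filtered, 3), "sampled token:", token)
```

On this toy distribution, *min p* keeps three tokens and *top a* keeps four, because squaring the top probability lowers the threshold. With a small `mass`, the typical filter keeps only the second token: its information content is closest to the entropy, so the most likely token is not necessarily retained.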

### Positional Embeddings

[Positional embeddings in transformers EXPLAINED | Demystifying positional encodings.](https://www.youtube.com/watch?v=1biZfFLPRSY)  
[How positional encoding works in transformers?](https://www.youtube.com/watch?v=T3OT8kqoqjc)

---

## Variational Autoencoders

### Latent space, KL divergence

[Variational Autoencoder (VAE) Latent Space Visualization](https://www.youtube.com/watch?v=sV2FOdGqlX0)  
[A Short Introduction to Entropy, Cross-Entropy and KL-Divergence](https://www.youtube.com/watch?v=ErfnhcEV1O8)  
[Intuitively Understanding the KL Divergence](https://www.youtube.com/watch?v=SxGYPqCgJWM)

### Talks & courses

[ICLR14: D Kingma: Auto-Encoding Variational Bayes](https://www.youtube.com/watch?v=rjZL7aguLAs)  
[Stanford CS236: Deep Generative Models I 2023 I Lecture 5 - VAEs](https://www.youtube.com/watch?v=MAGBUh77bNg)  
[Stanford CS236: Deep Generative Models I 2023 I Lecture 6 - VAEs](https://www.youtube.com/watch?v=8cO61e_8oPY)  
[L4 Latent Variable Models and Variational AutoEncoders -- CS294-158 SP24 Deep Unsupervised Learning](https://www.youtube.com/watch?v=NlIqjtbjjRE)

---

### References

[TensorFlow tutorial](https://www.tensorflow.org/tutorials/generative/cvae)

[Kingma and Welling, "Auto-Encoding Variational Bayes"](https://arxiv.org/abs/1312.6114)  
[Kingma and Welling, "An Introduction to Variational Autoencoders"](https://arxiv.org/abs/1906.02691)

[Arxiv Insights, Variational Autoencoders](https://www.youtube.com/watch?v=9zKuYvjFFS8)

---

## Diffusion

Original tutorial: [Denoising Diffusion Probabilistic Model](https://keras.io/examples/generative/ddpm/)  

See also [Denoising Diffusion Implicit Models](https://keras.io/examples/generative/ddim/), which includes an introduction to the FID score (a measure of the quality of generated images).

Two more in Keras:

[High-performance image generation using Stable Diffusion in KerasCV](https://keras.io/guides/keras_cv/generate_images_with_stable_diffusion/)  
[A walk through latent space with Stable Diffusion](https://keras.io/examples/generative/random_walks_with_stable_diffusion/)

The original paper: [Ho et al., "Denoising Diffusion Probabilistic Models"](https://arxiv.org/abs/2006.11239) (and for an extra-thick cream top-up, the [authors' implementation](https://github.com/hojonathanho/diffusion)).

[How AI Image Generators Work (Stable Diffusion / Dall-E) - Computerphile](https://www.youtube.com/watch?v=1CIpzeNxIhU)  
[Stable Diffusion in Code (AI Image Generation) - Computerphile](https://www.youtube.com/watch?v=-lz30by8-sU)  
[What are Diffusion Models?](https://www.youtube.com/watch?v=fbLgFrlTnGU)  
[DDPM - Diffusion Models Beat GANs on Image Synthesis (Machine Learning Research Paper Explained)](https://www.youtube.com/watch?v=W-O7AZNzbzQ)  
[Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications](https://www.youtube.com/watch?v=cS6JQpEY9cs)  
[Miika Aittala: Elucidating the Design Space of Diffusion-Based Generative Models](https://www.youtube.com/watch?v=T0Qxzf0eaio)

Good resources exist in PyTorch, such as the series of videos by [Fast.ai](https://www.fast.ai/):
- [Lesson 9: Deep Learning Foundations to Stable Diffusion, 2022](https://www.youtube.com/watch?v=_7rMfsA24Ls)
- [Lesson 9A 2022 - Stable Diffusion deep dive](https://www.youtube.com/watch?v=0_BBRNYInx8)
- [Lesson 9B - the math of diffusion](https://www.youtube.com/watch?v=mYpjmM7O-30)
- [Lesson 10: Deep Learning Foundations to Stable Diffusion, 2022](https://www.youtube.com/watch?v=6StU6UtZEbU)

As well as Johno Whitaker's [intro to diffusion](https://www.youtube.com/watch?v=XTs7M6TSK9I) from his own course, [AIAIART](https://github.com/johnowhitaker/aiaiart).

And the in-depth code guide: the [Annotated Diffusion Model](https://huggingface.co/blog/annotated-diffusion).