Commit 9a7023a
text conditional generative modelling
1 parent cd83949

File tree

1 file changed (+111, -3)

src/content/lessons/introduction.mdx

Lines changed: 111 additions & 3 deletions
@@ -13,6 +13,8 @@ import T from '../../components/TypstMath.astro'
This is an introduction to the main settings encountered in generative modelling. The first Lectures will introduce the main algorithms and concepts for the vanilla unconditional generative modelling task.
At the end of the course, we will make excursions to class-conditional and text-conditional generative modelling.

The goal of this Lecture is to understand the relationship between vanilla unconditional generative modelling and industrial generative models such as DALL-E, Stable Diffusion, GPT, etc.

#### Unconditional Generative Modelling

@@ -91,6 +93,8 @@ export const catGallery = [
</figcaption>
</figure>

**Assumption** The core underlying assumption of generative modelling is that the data $x_1, \dots, x_n$ is drawn from some *unknown* underlying distribution $p_{data}$: for all $i \in 1, \dots, n$,

<T block v='x_i ~ underbrace(p_"data", "unknown") .' />
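The i.i.d. assumption above can be played out numerically. In this sketch we take the role of nature: a hypothetical 1-D `p_data` (a two-mode Gaussian mixture, purely illustrative and not part of the Lecture's setup) produces the dataset, and the modeller only ever sees the samples, never the function itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_p_data(n):
    """Hypothetical *unknown* p_data: a two-mode 1-D Gaussian mixture.
    The modeller only observes the samples x_1, ..., x_n, never this code."""
    modes = rng.choice([-2.0, 2.0], size=n)       # pick one of two modes
    return modes + 0.5 * rng.standard_normal(n)   # add Gaussian noise around it

# The dataset x_1, ..., x_n drawn i.i.d. from p_data.
x = sample_p_data(1000)
print(x.shape)  # (1000,)
```

The generative modelling task is then to recover a sampler for `p_data` from `x` alone.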
@@ -179,18 +183,122 @@ export const catDogGallery = [
</figure>

**Assumption** For *class-conditional* generative models, the assumption is that the data $x_1, \dots, x_n$ is drawn from some *unknown* underlying conditional probability distribution**s** $p_{data}( \cdot | y = y_i)$: for all $i \in 1, \dots, n$,

<T block v='x_i ~ underbrace(p_"data" (dot | y = y_i), "unknown"), y_i in {"cat", "dog"}.' />

**Goal** Using the labelled data $(x_1, y_1), \dots, (x_n, y_n)$, the goal is to *generate* new samples $x^{\text{new}}$ that look like they were drawn from the same *unknown* conditional distributions $p_{data}(\cdot | y)$. More precisely, we want to be able to generate new images of cats $x^{\text{new cat}}$ and dogs $x^{\text{new dog}}$ that follow the conditional probability distributions

<T block v='x^"new cat" ~ p_"data" (dot | y="cat") ,' />
<T block v='x^"new dog" ~ p_"data" (dot | y="dog") .' />

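As a toy numerical analogue of the two conditional distributions above (hypothetical 1-D Gaussians standing in for images, not the course's actual models): each class label selects its own conditional distribution, and sampling a "new cat" means drawing from the cat conditional:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D conditionals p_data(. | y): one Gaussian per class.
CONDITIONALS = {
    "cat": (-2.0, 0.5),  # (mean, std) of p_data(. | y = "cat")
    "dog": (+2.0, 0.5),  # (mean, std) of p_data(. | y = "dog")
}

def sample_conditional(y, n=1):
    """Draw n samples x ~ p_data(. | y) for a given class label y."""
    mean, std = CONDITIONALS[y]
    return mean + std * rng.standard_normal(n)

x_new_cat = sample_conditional("cat", n=500)  # x^"new cat"
x_new_dog = sample_conditional("dog", n=500)  # x^"new dog"
```

In the real task these conditionals are unknown and high-dimensional; the model must learn them from the labelled data.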
**Remark i)**
To train class-conditional generative models, we could split the dataset into two parts, one with all the cat images and one with all the dog images, and train two separate unconditional generative models. However, this would not leverage similarities between the two classes: both cats and dogs have four legs, a tail, fur, etc. Class-conditional generative models can share information across classes.

**Remark ii)**
*Generative modelling is a very different task from standard supervised learning*. In the usual classification task, given an empirical labelled data distribution $(x_1, y_1), \dots, (x_n, y_n)$, the goal is to estimate the probability that a given new image $x$ is a cat or a dog, i.e. we want to estimate $p_{data}(y = \text{cat} | x)$.
Conversely, in class-conditional generative modelling, we are given a class (e.g. cat), and we want to estimate the probability distribution of images of cats $p_{data}(x | y = \text{cat})$ and sample new images from this distribution.
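The contrast in Remark ii) can be made concrete on a toy 1-D model (hypothetical unit-variance Gaussians per class with equal priors, chosen only for illustration): a classifier evaluates $p_{data}(y = \text{cat} | x)$ for a given $x$, while a conditional generative model draws samples from $p_{data}(x | y = \text{cat})$:

```python
import numpy as np
from math import exp, sqrt, pi

rng = np.random.default_rng(2)

MEAN = {"cat": -2.0, "dog": 2.0}   # hypothetical p_data(x | y) = N(MEAN[y], 1)
PRIOR = {"cat": 0.5, "dog": 0.5}   # equal class priors

def gaussian_pdf(x, mean, std=1.0):
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

def classify(x):
    """Supervised direction: p_data(y = "cat" | x) via Bayes' rule."""
    joint = {y: PRIOR[y] * gaussian_pdf(x, MEAN[y]) for y in MEAN}
    return joint["cat"] / sum(joint.values())

def generate(y, n=1):
    """Generative direction: sample x ~ p_data(x | y)."""
    return MEAN[y] + rng.standard_normal(n)

p_cat = classify(-2.0)        # an x near the cat mode is almost surely a cat
x_new = generate("cat", n=3)  # three new "cat" samples
```

Both directions are tied by Bayes' rule, but they answer opposite questions: one maps images to label probabilities, the other maps a label to a distribution over images.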

#### Text-Conditional Generative Modelling

**What** In *text-conditional* generative modelling, we are given a set of data (e.g. images) and their text descriptions

<T block v='text("Data: ") underbrace({(x_1, y_1),dots, (x_n, y_n)}, n "images" x_i "and their text description " y_i) .' />

export const catDogTextGallery = [
  {
    url: "https://images.pexels.com/photos/57416/cat-sweet-kitty-animals-57416.jpeg?auto=compress&cs=tinysrgb&w=800",
    caption: "x1, y1='A cat licking its paw'",
    alt: "Cute cat 1"
  },
  {
    url: "https://images.pexels.com/photos/20787/pexels-photo.jpg?auto=compress&cs=tinysrgb&w=800",
    caption: "x2, y2='A cat staring into the camera'",
    alt: "Cute cat 2"
  },
  {
    url: "https://images.pexels.com/photos/1183434/pexels-photo-1183434.jpeg?auto=compress&cs=tinysrgb&w=800",
    caption: "x3, y3='A cat yawning'",
    alt: "Cute cat 3"
  },
  {
    url: "https://images.pexels.com/photos/58997/pexels-photo-58997.jpeg?auto=compress&cs=tinysrgb&w=800",
    caption: "x4, y4='A dog running'",
    alt: "Cute dog 1"
  },
  {
    url: "https://images.pexels.com/photos/731022/pexels-photo-731022.jpeg?auto=compress&cs=tinysrgb&w=800",
    caption: "x5, y5='A dog sleeping'",
    alt: "Cute dog 2"
  },
  {
    url: "https://images.pexels.com/photos/551628/pexels-photo-551628.jpeg?auto=compress&cs=tinysrgb&w=800",
    caption: "x6, y6='A dog staring into the camera.'",
    alt: "Cute dog 3"
  }
]

<figure>
  <div style={{
    display: "grid",
    gridTemplateColumns: "repeat(auto-fit, minmax(200px, 1fr))",
    gap: "1rem",
    alignItems: "start"
  }}>
    {catDogTextGallery.map((cat, idx) => (
      <figure key={idx} style={{ margin: 0 }}> {/* remove default margin */}
        <img
          src={cat.url}
          alt={cat.alt}
          style={{ width: "100%", height: "auto", objectFit: "cover", display: "block" }}
        />
        <figcaption style={{
          textAlign: "center",
          fontSize: "0.85rem",
          color: "#6b7280",
          margin: 0, // remove figcaption margin
          marginTop: "0.25rem" // optional small spacing
        }}>
          {cat.caption}
        </figcaption>
      </figure>
    ))}
  </div>

  <figcaption style={{
    textAlign: "center",
    marginTop: "1rem",
    fontStyle: "italic",
    color: "#6b7280"
  }}>
    A dataset of cat and dog photos and their text descriptions (source: Pexels.com).
  </figcaption>
</figure>

For instance, [Stable Diffusion](https://stabledifffusion.com/) was trained on the [LAION-5B dataset](https://laion.ai/blog/laion-5b/), a dataset of 5 billion images and their textual descriptions.

**Assumption** For *text-conditional* generative models, the assumption is that the data $x_1, \dots, x_n$ is drawn from some *unknown* underlying conditional probability distribution**s** $p_{data}( \cdot | y = y_i)$: for all $i \in 1, \dots, n$,

<T block v='x_i ~ underbrace(p_"data" (dot | y = y_i), "unknown"), y_i "is a text description".' />

The main difference from the class-conditional setting is that the conditioning variable $y_i$ is now a free-form text description, not one of a fixed set of classes.

**Goal** Using the data and their text descriptions $(x_1, y_1), \dots, (x_n, y_n)$, the goal is to *generate* new samples $x^{\text{new}}$ given a text description.
More precisely, given a text description $y^{\text{new}}$, we want to be able to generate new images $x^{\text{new}}$ that follow the conditional probability distribution

<T block v='x^"new" ~ p_"data" (dot | y=y^"new") .' />

**Remark** Text-conditional generative modelling is challenging in several respects:
- one usually observes only one sample $x_i$ per text description $y_i$, i.e., one has to leverage similarities between text descriptions to learn the conditional distributions $p_{data}(\cdot | y)$;
- one has to handle *new text descriptions* $y^{\text{new}}$ that were not seen during training;
- text descriptions are complex objects that are not easy to handle (variable sequence lengths). Handling text requires a lot of engineering (tokenization, embeddings, transformers, etc.) and is beyond the scope of this Lecture.

### Unconditional Generative Modelling

#### 1 and 2-Dimensional Examples