> #### In practice, if we have a dataset of horse photos, we can train a generative model on this dataset to learn the complex relationships between the pixels in the images. Then, we can sample from this trained model to generate new, realistic images of horses that did not exist in the original dataset.
> > ![image.png](attachment:79602f25-2f20-452e-bd8f-a47e706a9afb.png)
> > - Generative modeling estimates $p(x)$:
>>> - Generative modeling aims to model the probability $p(x)$ of an observation $x$. Sampling from this distribution allows us to generate new observations.

> - To build a generative model, we need a dataset of many examples of the entity we want to generate, called the training data. Each data point, or observation, consists of many features, such as pixel values for images or words/letters for text.
> - The goal is to build a model that can generate new sets of features that look like they were created using the same rules as the original data, which is an incredibly difficult task for image generation due to the vast number of possible pixel value arrangements.
> - A generative model should be probabilistic, not deterministic, to produce various outputs rather than the same output repeatedly. The model should have a random component influencing the generated samples.
> - We aim to create a model that mimics the unknown probability distribution underlying the images in the training dataset, and that generates new, distinct observations resembling the original training set.

# Generative Versus Discriminative Modeling
> Generative and discriminative models are two types of machine learning algorithms.
> > `Generative models` focus on modeling the underlying distribution of the data and can generate new data instances.
> > `Discriminative models` learn the boundary between classes and are more robust to outliers.
> > > Generative models have `explanatory power` and are suited for unsupervised learning tasks, while discriminative models generally perform better for classification tasks.

>> ![image.png](attachment:1986cc89-23bb-4826-9c6e-37e6a4deabed.png)
>> - Discriminative modeling estimates $p(y|x)$:
>>> - Discriminative modeling aims to model the probability of a label $y$ given some observation $x$.

> **Note:** We can also build a generative model of the conditional probability $p(x|y)$ (the probability of seeing an observation $x$ with a specific label $y$). Such models are called `conditional generative models`.
>> **Example:** If our dataset contains different types of fruit, we could tell our generative model to specifically generate an image of an apple.
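
The fruit example can be sketched with a toy conditional generative model. This is a minimal illustration under loud assumptions: the single feature (a fruit "diameter" in cm), the class names, and the Gaussian form of $p(x|y)$ are all hypothetical and chosen only to keep the code short. A real image model would be far richer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: one feature per observation (diameter in cm),
# grouped by label y. These numbers are invented for illustration.
x_apple = rng.normal(loc=8.0, scale=0.5, size=500)
x_cherry = rng.normal(loc=2.0, scale=0.2, size=500)

# "Training": estimate p(x | y) for each label, assuming a Gaussian per class.
params = {
    "apple": (x_apple.mean(), x_apple.std()),
    "cherry": (x_cherry.mean(), x_cherry.std()),
}

def sample_given_label(label, n, rng):
    """Draw n new observations from the fitted p(x | y = label)."""
    mu, sigma = params[label]
    return rng.normal(mu, sigma, size=n)

# Ask the model specifically for apples.
new_apples = sample_given_label("apple", 5, rng)
print(new_apples)
```

A discriminative model trained on the same data would instead return $p(y|x)$ for a given diameter and would have no way to produce new diameters.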

# Generative Modeling and AI
> Generative modeling goes beyond categorizing data and aims to capture a comprehensive understanding of the data distribution. Although challenging due to the vast possible outputs and limited dataset, generative models can employ techniques like deep learning to achieve this.
> > - Additionally, generative modeling is used in other areas of AI, such as `reinforcement learning`. Instead of solely focusing on optimizing a policy for a specific task, agents can learn a world model of the environment using generative models. This allows them to adapt quickly to new tasks without the need for retraining.

> To achieve human-like intelligence, generative modeling is a crucial component. Humans are excellent generative models, capable of imagining different perspectives, envisioning various outcomes, and planning for the future.
> > - Neuroscientific theories propose that our perception of reality is a generative model that accurately simulates our surroundings. Understanding how to replicate this ability in machines is vital for advancements in brain research and general artificial intelligence.

# The Generative Modeling Framework
> - We have a dataset of observations.
> - We assume that the observations have been generated according to some unknown distribution, $p_{data}$.
> - We want to build a generative model $p_{model}$ that mimics $p_{data}$. If we achieve this goal, we can sample from $p_{model}$ to generate observations that appear to have been drawn from $p_{data}$.
> - Therefore, the desirable properties of $p_{model}$ are:
> > 1. `Accuracy:` If $p_{model}$ is high for a generated observation, it should look like it has been drawn from $p_{data}$. If $p_{model}$ is low for a generated observation, it should not look like it has been drawn from $p_{data}$.
> > 2. `Generation:`  It should be possible to easily sample a new observation from $p_{model}$.
> > 3. `Representation:` It should be possible to understand how different high-level features in the data are represented by $p_{model}$.
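
The framework can be made concrete with a deliberately simple stand-in. In this sketch, $p_{data}$ is a 2D Gaussian (an assumption made only so that a dataset can be created; in practice it is unknown), and $p_{model}$ is a Gaussian fitted to the observations. The names `true_mean`, `true_cov`, and `log_p_model` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the unknown p_data, used here only to create a dataset.
true_mean = np.array([1.0, -2.0])
true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])
X = rng.multivariate_normal(true_mean, true_cov, size=5000)

# Build p_model: a Gaussian whose parameters are estimated from the dataset.
mean_hat = X.mean(axis=0)
cov_hat = np.cov(X, rowvar=False)

def log_p_model(x):
    """Log-density of the fitted Gaussian at a single point x."""
    d = x - mean_hat
    k = mean_hat.size
    _, logdet = np.linalg.slogdet(cov_hat)
    maha = d @ np.linalg.solve(cov_hat, d)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)

# Generation: sampling from p_model is easy.
samples = rng.multivariate_normal(mean_hat, cov_hat, size=3)

# Accuracy: points typical of the data score high, far-away points score low.
typical = log_p_model(true_mean)
unlikely = log_p_model(np.array([10.0, 10.0]))
print(samples, typical, unlikely)
```

This toy model covers the accuracy and generation properties. It says little about representation, because a single Gaussian has no high-level features to inspect.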


# Representation Learning
> `Representation learning is a key concept`. To illustrate this, imagine describing your appearance to someone who has never seen you. Instead of listing the color of every pixel in a photo, you would give them a general idea of what an average person looks like and then add specific features like hair color or glasses. With just a few statements, they could create a rough image of you in their mind. This is similar to how representation learning works.
> > Rather than `directly modeling the high-dimensional data`, we `map each observation to a lower-dimensional space` and `learn a function to map it back to the original domain`. Each point in this latent space represents a high-dimensional observation.

> The idea of encoding the training data into a latent space, allowing for sampling and decoding, is present in many generative modeling techniques. This process simplifies the complex manifold of the data, such as pixel space, into a more easily sampled latent space. This facilitates the generation of well-formed images.
> > ![image.png](attachment:d46d7d88-999d-4893-b7aa-ebff37770e29.png)
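
The encode, sample, decode loop can be sketched with a linear stand-in. Here PCA (via NumPy's SVD) plays the role of both mapping functions. This is a swapped-in technique chosen for brevity, not the method of any particular generative model, which would typically learn nonlinear mappings with neural networks. The data, the 2D latent size, and the names `encode` and `decode` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations in 10 dimensions that really vary
# along only 2 hidden directions, plus a little noise.
latent_true = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent_true @ mixing + 0.01 * rng.normal(size=(200, 10))

# Learn a 2D latent space: the top two principal directions of the data.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]  # shape (2, 10)

def encode(x):
    """Map observations to the lower-dimensional latent space."""
    return (x - mean) @ components.T

def decode(z):
    """Map latent points back to the original domain."""
    return z @ components + mean

Z = encode(X)
X_rec = decode(Z)
reconstruction_error = np.mean((X - X_rec) ** 2)

# Sample a new latent point (scaled like the encoded data) and decode it.
z_new = rng.normal(size=2) * Z.std(axis=0)
x_new = decode(z_new)
print(reconstruction_error, x_new.shape)
```

Each row of `Z` is a 2D point standing in for a 10D observation, and `x_new` is a new observation that never appeared in `X`.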

# Core Probability Theory
> Generative modeling is closely linked to statistical modeling of probability distributions. You don't need a deep understanding of statistical theory to build deep learning models, but it's worth building a solid understanding of basic probabilistic theory to fully appreciate the task at hand.
> > To start, we will define `five key terms`.
> > > 1. The `sample space`, which is the complete set of possible values for an observation.
> > > 2. A `probability density function` (or simply density function) is a function $p(x)$ that maps a point $x$ in the sample space to a non-negative number. The integral of the density function over all points in the sample space must equal 1, so that it is a well-defined probability distribution. Note that a density value can exceed 1 at individual points; only the integral is constrained.
> > > 3. `Parametric modeling` is a technique that we can use to structure our approach to finding a suitable $p_{model}(x)$. A parametric model is a family of density functions $p_θ(x)$ that can be described using a finite number of parameters, $θ$.
> > > 4. The `likelihood` $ℒ(θ|x)$ of a parameter set $θ$ is a function that measures the plausibility of $θ$, given some observed point $x$. It is defined as follows:
>>>> $ℒ(θ|x)=p_{\theta}(x)$ <br>
>>>> That is, the likelihood of $θ$ given some observed point $x$ is defined to be the value of the density function parameterized by $θ$, at the point $x$. <br>
> > >> - If we have a whole dataset $X$ of independent observations, then we can write:
> > >> > $ℒ(θ|X) = \prod_{x \in X}p_{\theta}(x)$
> > >> - Since the product of a large number of small density values can underflow numerically and is awkward to work with, we often use the `log-likelihood` $ℓ$ instead:
> > >> > $ℓ(θ|X) = \sum_{x \in X}\log p_{\theta}(x)$
> > > 5. `Maximum likelihood estimation` is the technique that allows us to estimate $\hat{θ}$, the set of parameters $θ$ of a density function $p_θ(x)$ that is most likely to explain some observed data $X$.
> > >> More formally: $\hat{θ} = \underset{θ}{\arg\max}\ ℓ(θ|X)$
> > >> > $\hat{θ}$ is also called the `maximum likelihood estimate (MLE)`.
> > >> Neural networks typically minimize a loss function, so we can equivalently talk about finding the set of parameters that minimize the `negative log-likelihood`: $\hat{θ} = \underset{θ}{\arg\min}\ \bigl(-ℓ(θ|X)\bigr) = \underset{θ}{\arg\min}\ \bigl(-\log p_{\theta}(X)\bigr)$
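
The likelihood, log-likelihood, and MLE definitions can be checked numerically. The sketch below assumes a one-dimensional Gaussian parametric family with $θ = (\mu, \sigma)$ and a synthetic dataset. Both are illustrative choices, and the function names are hypothetical. For this family the MLE has a closed form: the sample mean and the (biased) sample standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic dataset X of independent observations (illustrative only).
X = rng.normal(loc=3.0, scale=2.0, size=2000)

def log_density(x, mu, sigma):
    """log p_theta(x) for a Gaussian with theta = (mu, sigma)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def log_likelihood(mu, sigma):
    """Sum of log-densities over the dataset."""
    return np.sum(log_density(X, mu, sigma))

# Closed-form maximum likelihood estimate for the Gaussian family.
mu_hat = X.mean()
sigma_hat = X.std()  # ddof=0, which is the MLE of sigma

# The raw likelihood is a product of 2000 small numbers and underflows.
likelihood_product = np.prod(np.exp(log_density(X, mu_hat, sigma_hat)))

# The negative log-likelihood stays finite and is smallest at the MLE.
nll_at_mle = -log_likelihood(mu_hat, sigma_hat)
nll_elsewhere = -log_likelihood(0.0, 1.0)
print(likelihood_product, nll_at_mle, nll_elsewhere)
```

The product evaluates to exactly `0.0` in double precision, which is why the sum of logs is the quantity used in practice. When no closed form exists, as with neural networks, the negative log-likelihood is minimized by gradient-based optimization instead.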

# Generative Model Taxonomy
> While all types of generative models ultimately aim to solve the same task, they all take slightly different approaches to modeling the density function $p_θ(x)$.
> > Broadly speaking, there are three possible approaches:
> > > 1. Explicitly model the density function, but constrain the model in some way, so that the density function is tractable (it can be calculated).
> > > 2. Explicitly model a tractable approximation of the density function
> > > 3. Implicitly model the density function, through a stochastic process that directly generates data.

>> Six families of generative models are discussed, forming a taxonomy:
>> > ![image.png](attachment:173c1f35-e380-4097-a0da-f1a6008228a6.png)

> Generative models can be split into two categories:
> > 1. Those with explicit `probability density function (pdf)` modeling
> > 2. Those with implicit modeling.
> > > - Implicit density models focus on generating data directly, without estimating `pdf`. The most well-known example is generative adversarial networks.
> > > - Explicit density models can be further divided into `tractable` models and `approximate` models.
> > > > - `Tractable models` impose constraints on the model architecture, making pdf calculation easier. Autoregressive models generate output sequentially, while normalizing flow models use invertible functions to generate complex distributions.
> > > > - `Approximate density models` include variational autoencoders, which optimize an approximation of the joint density function, and energy-based models that utilize Markov chain sampling. `Diffusion models` train a model to gradually denoise corrupted images to `approximate the density function`.
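
For the two tractable families, the architectural constraint can be written down directly. The following is a sketch in standard notation that is assumed here rather than taken from the notes above: $x_1, \dots, x_n$ are the components of an observation $x$, $f_{\theta}$ is an invertible map from the data space to a latent space, and $p_Z$ is a simple base density such as a standard Gaussian.

$$
p_{\theta}(x) = \prod_{i=1}^{n} p_{\theta}(x_i \mid x_1, \dots, x_{i-1}) \quad \text{(autoregressive)}
$$

$$
p_{\theta}(x) = p_Z\bigl(f_{\theta}(x)\bigr)\,\left|\det \frac{\partial f_{\theta}(x)}{\partial x}\right| \quad \text{(normalizing flow)}
$$

In both cases every factor on the right-hand side can be evaluated exactly, which is what makes the density tractable.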

# END of Chapter 1