# **Introduction to Latent Variable Models**


Before getting started on latent variable models, we first need to understand what latent variables are.

## Learning Objectives

By the end of this lesson, students will be able to:

- Explain the basic idea behind latent variable models
- Identify different types of latent variable models

## **Where Does the Autoregressive Model Fail?**

1. **Incorrect Dependency Assumptions:**
   If the assumed order of dependencies doesn't match the true data structure, the model may miss important relationships and produce poor predictions.

2. **Slow Inference Time:**
   Autoregressive models generate outputs sequentially, making inference slower compared to parallelizable models like latent variable models or transformers with non-autoregressive decoding.

3. **Error Accumulation:**
   Since each prediction depends on previous outputs, small errors can compound over time, degrading output quality, especially in long sequences (e.g., text generation or time series).

4. **Limited Global Context:**
   The model often focuses heavily on recent history and may struggle to capture long-range dependencies unless enhanced (e.g., with attention mechanisms).
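
Points 2 and 3 can be illustrated with a small sketch. It assumes a hypothetical first-order autoregressive rule in which each value is just a coefficient times the previous value; the coefficients `1.00` (true process) and `1.02` (slightly wrong model) are made up for illustration. Because every step has to wait for the previous one, generation is sequential, and the small per-step error grows with the length of the sequence.

```python
def rollout(coef, x0, steps):
    """Generate a sequence one step at a time; each value depends on the previous one."""
    xs = [x0]
    for _ in range(steps):
        xs.append(coef * xs[-1])  # cannot be computed before xs[-1] exists
    return xs

# Hypothetical coefficients: the model is off by only 2% per step.
true_seq = rollout(1.00, 1.0, 50)
model_seq = rollout(1.02, 1.0, 50)

# Absolute gap between the model's rollout and the true sequence at every step.
errors = [abs(m - t) for m, t in zip(model_seq, true_seq)]

for step in (1, 10, 50):
    print(f"step {step:2d}: error = {errors[step]:.3f}")
```

The gap at step 50 is far larger than at step 1, even though the per-step mistake never changes.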

# **Latent Variable**

A **latent variable** is a hidden or unobservable factor inferred from observed data. It captures the underlying structure of data by analyzing patterns and relationships among visible variables.

For example, consider the concept of intelligence. Intelligence cannot be directly observed or measured, but it can be inferred through measurable indicators such as performance on tests, problem-solving skills, or decision-making abilities. Intelligence, in this case, is a latent variable.

Latent variables are key to **representation learning**, enabling models to extract meaningful patterns that aren't directly measurable. They are widely used in:

* **Variational Autoencoders (VAEs)**
* **Gaussian Mixture Models (GMMs)**
* **Topic Modeling**
* **Reinforcement Learning**







# **Latent Variables vs Observable Variables**

<center>

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSX963YvC_jQSf5JmjpbVin9x-731iIQfd-DA&s" width=30%>
</center>

| **Aspect**             | **Observable Variables**          | **Latent Variables**             |
|------------------------|-----------------------------------|----------------------------------|
| **Definition**         | Directly measured or observed.    | Not directly measured; inferred from observable variables. |
| **Role in Modeling**   | Input or output in statistical models. | Used to explain patterns in the data that cannot be directly measured. |
| **Properties**         | Manifest in the real world.       | Considered as hidden or unobserved. |
| **Inference**          | Directly observed.                | Inferred from observable variables. |
| **Complexity**         | Captures directly measurable aspects. | Captures complex relationships and underlying structures. |
| **Statistical Models** | Used in various statistical models. | Crucial in latent variable models (e.g., factor analysis, latent class models). |
| **Example**            | Test scores, height, weight.      | Intelligence, personality traits. |



# **Latent Variable Models**

<center>

<img src="https://i.postimg.cc/gJCvq6nS/Latent-Variable-Model.png" width=40%>
</center>

Latent variable models describe the probability distribution of the data with the help of latent variables. The key idea is:

> **There's a hidden structure ($z$)** behind the data ($x$), and if we can understand that structure, we can **generate** or **analyze** the data better.

In stricter mathematical terms, data points $x$ that follow a probability distribution $p(x)$ are mapped to latent variables $z$ that follow a distribution $p(z)$.

Given that idea, we can now define five basic terms:

### **1. Prior Distribution: $p(z)$**

This defines our beliefs about the **latent variables** **before** seeing any data.

> *If we're modeling handwritten digits, we might assume each digit from 0 to 9 is equally likely: a **uniform prior**.*

### **2. Likelihood: $p(x|z)$**

This explains **how data $x$** is generated **from** latent variables $z$.

> *Given that the digit is a "7" (i.e., $z = 7$), the likelihood models what the image of a "7" typically looks like: its shape, stroke thickness, etc.*

This is the **generative process**:

$$
\text{latent } z \rightarrow \text{observed } x
$$
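
The generative process can be sketched as two sampling steps: first draw $z$ from the prior, then draw $x$ from the likelihood. The toy model below is an assumption made for illustration (three clusters with a uniform prior and a 1-D Gaussian likelihood); the names `PRIOR`, `MEANS`, and `STD` are hypothetical.

```python
import random

# Hypothetical toy model: z picks one of three clusters, x is a noisy 1-D value.
PRIOR = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}   # p(z), uniform over three clusters
MEANS = {0: -4.0, 1: 0.0, 2: 4.0}        # mean of p(x|z) for each cluster
STD = 1.0                                # shared standard deviation of p(x|z)

def sample_z(rng):
    """Draw a latent value z from the prior p(z)."""
    labels = list(PRIOR)
    weights = [PRIOR[k] for k in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

def sample_x_given_z(z, rng):
    """Draw an observation x from the likelihood p(x|z)."""
    return rng.gauss(MEANS[z], STD)

def generate(rng):
    """Latent z -> observed x."""
    z = sample_z(rng)
    x = sample_x_given_z(z, rng)
    return z, x

rng = random.Random(0)
samples = [generate(rng) for _ in range(5)]
for z, x in samples:
    print(f"z={z}  x={x:.2f}")
```

In a real dataset only the `x` values would be visible; the `z` column is exactly the part that is hidden.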


### **3. Joint Distribution: $p(x, z)$**

This combines the prior and the likelihood:

$$
p(x, z) = p(x|z) \cdot p(z)
$$

It tells us the probability of seeing both a specific data point $x$ and a latent cause $z$ together.
This forms the full description of the model.
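
As a worked example with made-up numbers (an assumption for illustration, not values from a trained model): take the uniform digit prior, so $p(z = 7) = 0.1$, and suppose the likelihood assigns $p(x \mid z = 7) = 0.02$ to a particular image $x$. Then:

$$
p(x, z = 7) = p(x \mid z = 7) \cdot p(z = 7) = 0.02 \times 0.1 = 0.002
$$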


### **4. Marginal Distribution: $p(x)$**

This gives us the probability of the observed data **regardless of the latent variable**. We obtain it by integrating out $z$:

$$
p(x) = \int p(x, z) \, dz = \int p(x|z)p(z) \, dz
$$

This is useful for evaluating how well the model **explains the data**.
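
When the integral has no closed form, one option is to approximate it by averaging the likelihood over samples from the prior. The sketch below assumes a toy model chosen because its marginal is known exactly: $z \sim \mathcal{N}(0, 1)$ and $x \mid z \sim \mathcal{N}(z, \sigma^2)$, which gives $p(x) = \mathcal{N}(0, 1 + \sigma^2)$. The value `SIGMA = 0.5` and the observation `x_obs = 0.8` are hypothetical.

```python
import math
import random

SIGMA = 0.5  # hypothetical noise standard deviation of p(x|z)

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def marginal_mc(x, n_samples, rng):
    """Monte Carlo estimate: p(x) ~ average of p(x|z_i) with z_i drawn from p(z)."""
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)          # z_i ~ p(z) = N(0, 1)
        total += normal_pdf(x, z, SIGMA)  # p(x | z_i)
    return total / n_samples

def marginal_exact(x):
    """Closed-form marginal for this toy model: N(0, 1 + SIGMA^2)."""
    return normal_pdf(x, 0.0, math.sqrt(1.0 + SIGMA ** 2))

rng = random.Random(0)
x_obs = 0.8
estimate = marginal_mc(x_obs, 20000, rng)
exact = marginal_exact(x_obs)
print(f"Monte Carlo: {estimate:.4f}   exact: {exact:.4f}")
```

The two numbers should agree closely here; in higher-dimensional models this naive estimator needs far more samples, which is part of why the marginal is hard to compute.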


### **5. Posterior Distribution: $p(z|x)$**

This represents **inference**: estimating the latent variable $z$ given the observed data $x$:

$$
p(z|x) = \frac{p(x|z) \cdot p(z)}{p(x)}
$$

> *Given I saw data $x$, what's the probability that it came from latent $z$?*

Since $p(x)$ involves an integral, calculating this posterior is often intractable, which is why we use **approximation methods** like Variational Inference or Monte Carlo sampling.
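
When $z$ takes only a few discrete values, the integral becomes a short sum and Bayes' rule can be applied directly. The sketch below assumes a hypothetical three-cluster model (uniform prior, 1-D Gaussian likelihood); all parameter values are made up for illustration.

```python
import math

# Hypothetical toy model with a discrete latent variable.
PRIOR = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}   # p(z)
MEANS = {0: -4.0, 1: 0.0, 2: 4.0}        # mean of p(x|z)
STD = 1.0                                # standard deviation of p(x|z)

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior(x):
    """Bayes' rule: p(z|x) = p(x|z) p(z) / p(x)."""
    joint = {z: normal_pdf(x, MEANS[z], STD) * PRIOR[z] for z in PRIOR}  # p(x, z)
    evidence = sum(joint.values())  # p(x); a sum replaces the integral for discrete z
    return {z: joint[z] / evidence for z in joint}

post = posterior(3.1)
for z, prob in post.items():
    print(f"p(z={z} | x=3.1) = {prob:.4f}")
```

An observation near 4 is attributed almost entirely to the cluster centered at 4. With a continuous, high-dimensional $z$ the `evidence` line is the step that stops being computable.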


### 🎯 Two Core Processes

| Process        | Direction         | Mathematical Form | Description                           |
| -------------- | ----------------- | ----------------- | ------------------------------------- |
| **Generation** | $z \rightarrow x$ | $p(x \mid z)$     | From latent space to data (sampling)  |
| **Inference**  | $x \rightarrow z$ | $p(z \mid x)$     | From data to latent space (reasoning) |

Inference is the inverse of generation: generation maps latent variables to data, while inference maps data back to latent variables.




# **Types of Latent Variable Models**
Latent variable models can be grouped according to whether the observed and latent variables are categorical or continuous.


### 1. Factor Analysis:
- Factor Analysis is used when there are continuous observed variables, and the goal is to identify underlying factors that explain correlations among these variables. It assumes that the observed variables are influenced by a set of latent factors.

- In psychology, factor analysis might be applied to understand the underlying factors influencing observed behaviors, such as intelligence tests where various test scores could be influenced by a latent factor like "cognitive ability."
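
The assumption behind factor analysis can be sketched from the generative side: each observed score is a loading times a shared latent factor plus independent noise. The code below only simulates data under that assumption; it does not fit a factor model. The loadings and noise level are hypothetical.

```python
import math
import random

LOADINGS = [0.9, 0.8, 0.7]  # hypothetical loadings of three tests on one factor
NOISE_STD = 0.5             # hypothetical test-specific noise

def simulate_person(rng):
    """One person's three test scores, all driven by the same latent ability."""
    ability = rng.gauss(0.0, 1.0)  # latent factor z (never observed)
    return [w * ability + rng.gauss(0.0, NOISE_STD) for w in LOADINGS]

def pearson(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)
scores = [simulate_person(rng) for _ in range(5000)]
test_a = [s[0] for s in scores]
test_b = [s[1] for s in scores]

corr_ab = pearson(test_a, test_b)
print(f"correlation between test A and test B: {corr_ab:.2f}")
```

The two tests never influence each other directly, yet they are strongly correlated because both depend on the same hidden factor. Factor analysis works backwards from such correlations to the loadings.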

### 2. Item Response Theory (IRT):
- IRT is commonly used in educational testing to model the relationship between individuals' abilities and their responses to test items. It assumes that the probability of a correct response depends on both the individual's ability and the difficulty of the item.

- In a standardized test, IRT could be used to model how the difficulty of each question influences the probability of a correct response, providing insights into an individual's ability.
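
One common way to express this dependence is the one-parameter logistic (Rasch) form, where the probability of a correct response is a logistic function of ability minus difficulty. The lesson does not commit to a specific formula, so treat this as one possible choice; the function name `p_correct` and the example values are hypothetical.

```python
import math

def p_correct(ability, difficulty):
    """One-parameter logistic model: P(correct) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A test taker with a fixed (hypothetical) ability facing easy, medium, and hard items.
ability = 0.5
for difficulty in (-1.0, 0.5, 2.0):
    prob = p_correct(ability, difficulty)
    print(f"difficulty {difficulty:+.1f}: P(correct) = {prob:.2f}")
```

When ability equals difficulty the probability is exactly 0.5; harder items push it down and easier items push it up.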

### 3. Latent Profile Analysis:
- Latent Profile Analysis is used when dealing with continuous observed variables, seeking to identify subgroups (profiles) within the population based on patterns in those measurements. It assumes that a categorical latent variable (the profile) underlies the observed continuous variables.

- In marketing, latent profile analysis might be applied to understand different customer segments based on their preferences and behaviors, helping businesses tailor their marketing strategies.

### 4. Latent Class Analysis:

- Latent Class Analysis is commonly used in clustering or categorizing individuals based on shared characteristics or behaviors, especially when dealing with categorical data. It assumes that there are distinct latent classes underlying the observed categorical variables.

- In healthcare, latent class analysis could be used to identify different health-related behavior patterns among individuals, such as smoking habits, diet choices, and exercise routines.


### Summary

| **Observed Variable** | **Latent Variable** | **Model Type**             | **Use Case Example**       |
| --------------------- | ------------------- | -------------------------- | -------------------------- |
| Continuous            | Continuous          | Factor Analysis            | Psychological testing      |
| Categorical           | Continuous          | Item Response Theory (IRT) | Educational testing        |
| Continuous            | Categorical         | Latent Profile Analysis    | Customer segmentation      |
| Categorical           | Categorical         | Latent Class Analysis      | Health behavior clustering |




## Different examples of Latent Variable Models in AI

### 1. **Autoencoders:**
   - **Application:** Anomaly Detection, Data Denoising.
   - Autoencoders use neural networks to map input data to a latent space and then reconstruct the input from this space. They are applied in anomaly detection and data denoising tasks.

### 2. **Variational Autoencoders (VAEs):**
   - **Application:** Image Generation, Data Compression.
   - VAEs use a probabilistic approach to learn a latent representation of data. They are trained to generate data similar to the training set and are used in tasks such as image generation and data compression.

### 3. **Generative Adversarial Networks (GANs):**
   - **Application:** Image Generation, Style Transfer.
   - GANs consist of a generator and a discriminator that are trained simultaneously through adversarial training. The latent space captures variations in data, making GANs powerful for generating realistic images and performing style transfer.

### 4. **Hidden Markov Models (HMMs):**
   - **Application:** Speech Recognition, Natural Language Processing.
   - HMMs model sequences of observations, making them useful in speech recognition and natural language processing tasks. The latent variables represent hidden states that govern the observed sequence.
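
To make the role of the hidden states concrete, here is a minimal sketch of the forward algorithm, which computes the probability of an observed sequence by summing over every possible hidden state path. The two-state weather model and all of its probabilities are hypothetical values chosen for illustration.

```python
# Hypothetical HMM: hidden weather states, observed umbrella usage.
STATES = ("rainy", "sunny")
START = {"rainy": 0.5, "sunny": 0.5}                      # p(z_1)
TRANS = {                                                  # p(z_t | z_{t-1})
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.2, "sunny": 0.8},
}
EMIT = {                                                   # p(x_t | z_t)
    "rainy": {"umbrella": 0.9, "none": 0.1},
    "sunny": {"umbrella": 0.2, "none": 0.8},
}

def sequence_likelihood(observations):
    """Forward algorithm: p(x_1, ..., x_T), marginalizing over hidden states."""
    # alpha[s] = p(x_1..x_t, z_t = s), initialized for t = 1
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        # The right-hand side uses the previous alpha before it is reassigned.
        alpha = {
            s: EMIT[s][obs] * sum(alpha[prev] * TRANS[prev][s] for prev in STATES)
            for s in STATES
        }
    return sum(alpha.values())

likelihood = sequence_likelihood(["umbrella", "umbrella", "none"])
print(f"p(umbrella, umbrella, none) = {likelihood:.4f}")
```

This is the marginal $p(x)$ from earlier in the lesson, computed efficiently because the hidden states form a chain.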

### 5. **Latent Dirichlet Allocation (LDA):**
   - **Application:** Topic Modeling, Text Analysis.
   - LDA is used for discovering topics within a collection of documents. Latent variables in LDA represent topics, and the model assigns probabilities of topics to each document and words to each topic.

### 6. **Factor Analysis:**
   - **Application:** Psychometrics, Finance.
   - Factor Analysis is applied in psychometrics to identify underlying factors influencing observed variables. It is also used in finance for modeling common risk factors.

### 7. **Latent Semantic Analysis (LSA):**
   - **Application:** Information Retrieval, Text Mining.
   - LSA is used to discover the latent structure in large datasets of text. It reduces the dimensions of the term-document matrix and uncovers relationships between terms.

### 8. **Latent Class Models:**
   - **Application:** Marketing Segmentation, Social Sciences.
   - Latent Class Models are used for categorizing individuals into latent classes based on observed variables. They find applications in marketing, social sciences, and healthcare.


# **Reference**

The AI Summer. Latent Variable Models: A Comprehensive Guide. The AI Summer. https://theaisummer.com/latent-variable-models/

