### LECTURE 2

### **Probability Density Function (PDF), Cumulative Distribution Function (CDF), and Quantile**

- **PDF (Probability Density Function)**: Describes the relative likelihood of a continuous variable near a given value. The probability of any single exact value is zero; the area under the PDF over an interval gives the probability of the variable falling within that interval.
- **CDF (Cumulative Distribution Function)**: Obtained by integrating the PDF from $-\infty$ up to a value $x$. It gives the probability that the random variable is less than or equal to $x$.
- **Quantile**: The inverse of the CDF: the value below which a given fraction of observations fall. For example, the 0.25 quantile (or 25th percentile) is the value below which 25% of the data lie.
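
As a minimal sketch of how the three quantities relate, the snippet below uses `statistics.NormalDist` from the Python standard library. The standard normal (mean 0, standard deviation 1) is only an illustrative choice, not something fixed by these notes.

```python
from statistics import NormalDist

# Standard normal, chosen only for illustration.
dist = NormalDist(mu=0.0, sigma=1.0)

density_at_zero = dist.pdf(0.0)       # a density, not a probability
prob_below_one = dist.cdf(1.0)        # P(X <= 1)
first_quartile = dist.inv_cdf(0.25)   # quantile function = inverse CDF

# Probability of an interval = difference of CDF values = area under the PDF.
prob_within_one_sigma = dist.cdf(1.0) - dist.cdf(-1.0)

# Applying the CDF to a quantile returns the original probability.
round_trip = dist.cdf(first_quartile)
```

Here `round_trip` recovers 0.25, which is what "the quantile is the inverse of the CDF" means in practice.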

---

### **Empirical and Theoretical Distributions**

- **Theoretical Distribution**: A probability distribution derived from a known mathematical model (e.g., Normal, Poisson).
- **Empirical Distribution**: Based on observed data. It approximates the distribution of a dataset and is typically represented by the empirical CDF or histogram.
- Empirical distributions are used when the true distribution is unknown or difficult to model.
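
A small sketch of an empirical CDF compared with a theoretical one. The helper name `empirical_cdf`, the sample size, and the seed are illustrative assumptions.

```python
import random
from statistics import NormalDist

def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)

rng = random.Random(0)                     # fixed seed for reproducibility
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
theory = NormalDist(0.0, 1.0)              # the model the data were drawn from

# Largest difference between empirical and theoretical CDF on a few test points.
largest_gap = max(
    abs(empirical_cdf(samples, x) - theory.cdf(x))
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0)
)
```

With enough data the gap becomes small, which is why the empirical CDF is a reasonable stand-in when the true distribution is unknown.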

---

### **Homoscedastic and Heteroscedastic Errors**

- **Homoscedasticity**: The variance of the errors is constant.
- **Heteroscedasticity**: The variance of the errors is not constant; it changes from one data point to another (for example, it grows with the measured value).

---

### **Kolmogorov's Axioms and Probability**

Kolmogorov formalized the foundation of probability with three axioms:

1. **Non-negativity**: For any event $A$, the probability is non-negative:  
   $P(A) \geq 0$
2. **Normalization**: The probability of the entire sample space is 1:  
   $P(\Omega) = 1$
3. **Additivity**: For any two mutually exclusive events $A$ and $B$:  
   $P(A \cup B) = P(A) + P(B)$

These axioms form the basis of modern probability theory.

---

### **Bayes' Theorem**

Bayes' Theorem updates the probability of a hypothesis based on new evidence:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

- $P(A|B)$: Posterior probability (updated belief)  
- $P(B|A)$: Likelihood of observing $B$ given $A$  
- $P(A)$: Prior probability of $A$  
- $P(B)$: Marginal probability of $B$  

Bayes' theorem is used in many fields, such as medicine, machine learning, and decision theory.
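
A worked example may help. The sketch below applies the theorem to a diagnostic test; the numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) and the function name `posterior` are made up for illustration. The marginal $P(B)$ is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# A = "has the disease", B = "test is positive" (hypothetical numbers).
p_disease_given_positive = posterior(
    prior=0.01, likelihood=0.95, false_positive_rate=0.05
)
```

With these numbers the posterior is about 16%: even a fairly accurate test gives a modest posterior when the prior is small.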

---

### **Transformations of Random Variables**

Transforming a random variable means applying a function to it, creating a new variable.

- **Example**: Let $X$ be a random variable and $Y = g(X)$ a transformation.
- To find the **distribution of $Y$**:
  - If $X$ is continuous with PDF $f_X$ and $g$ is invertible and differentiable, then:

$$
f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|
$$

- This is used to derive distributions of functions of random variables (e.g., squares, sums, logarithms).
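
As an illustration of the formula, take $X \sim \mathcal{U}(0, 1)$ and $Y = X^2$ (an example chosen here, not taken from the lecture). Then $g^{-1}(y) = \sqrt{y}$, so $f_Y(y) = 1 / (2\sqrt{y})$ on $(0, 1]$, and integrating gives $P(Y \leq y) = \sqrt{y}$. The sketch checks this by sampling.

```python
import math
import random

def pdf_y(y):
    """Density of Y = X**2 for X ~ U(0, 1): f_X(sqrt(y)) * |d sqrt(y)/dy|."""
    if not 0.0 < y <= 1.0:
        return 0.0
    f_x = 1.0                              # uniform density on (0, 1)
    jacobian = 1.0 / (2.0 * math.sqrt(y))  # |d/dy g^{-1}(y)|
    return f_x * jacobian

rng = random.Random(1)
ys = [rng.random() ** 2 for _ in range(100_000)]

# Analytic prediction: P(Y <= 0.25) = sqrt(0.25) = 0.5.
fraction_below = sum(1 for y in ys if y <= 0.25) / len(ys)
```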


### LECTURE 3

### **Monte Carlo Integration (Crude and Hit-or-Miss)**

- **Monte Carlo integration** uses random sampling to approximate definite integrals.
- **Crude Monte Carlo**:  
  Estimate the integral $\int_a^b f(x) \, dx$ by sampling $x_i \sim \mathcal{U}(a, b)$ and computing:  
  $$
  I \approx (b - a) \cdot \frac{1}{N} \sum_{i=1}^N f(x_i)
  $$
- **Hit-or-Miss method**:  
  Sample uniformly in a rectangle that encloses the graph of $f(x)$ (assuming $f(x) \geq 0$ on the interval).  
  The integral is approximated by the fraction of points that fall below the curve times the area of the rectangle.
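
Both estimators can be sketched in a few lines. The function names, the test integrand $x^2$ on $[0, 1]$ (exact value $1/3$), and the sample sizes are illustrative assumptions; the hit-or-miss version assumes $0 \leq f(x) \leq f_{\max}$ on the interval.

```python
import random

def crude_mc(f, a, b, n, rng):
    """Crude Monte Carlo: (b - a) times the average of f at uniform points."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

def hit_or_miss(f, a, b, f_max, n, rng):
    """Hit-or-miss: fraction of points under the curve times the box area.

    Assumes 0 <= f(x) <= f_max for all x in [a, b].
    """
    hits = 0
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(0.0, f_max)
        if y < f(x):
            hits += 1
    return (b - a) * f_max * hits / n

def integrand(x):
    return x * x  # exact integral on [0, 1] is 1/3

rng = random.Random(42)
crude_estimate = crude_mc(integrand, 0.0, 1.0, 100_000, rng)
hit_or_miss_estimate = hit_or_miss(integrand, 0.0, 1.0, 1.0, 100_000, rng)
```

For the same number of samples, the crude estimate typically scatters less than the hit-or-miss one, because hit-or-miss reduces each sample to a yes/no outcome.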

---

### **Mean, Median, and Expected Value**

- **Mean**: Arithmetic average of a dataset.
- **Median**: Middle value when data are ordered. Less sensitive to outliers.
- **Expected value** ($\mathbb{E}[X]$): Theoretical mean of a random variable. For continuous variables:  
  $$
  \mathbb{E}[X] = \int x f(x) \, dx
  $$

---

### **Standard Deviation, MAD (1), Variance, MAD (2), Quantile Region, Interquartile Range, Mode**

- **Standard deviation** ($\sigma$): Measures spread around the mean.
- **MAD_1 (Mean Absolute Deviation)**:  
  $$
  \text{MAD}_1 = \frac{1}{N} \sum_{i=1}^N |x_i - \bar{x}|
  $$
- **Variance**:  
  $$
  \text{Var}(X) = \mathbb{E}[(X - \mu)^2], \quad \text{with } \mu = \mathbb{E}[X]
  $$
- **MAD_2 (Median Absolute Deviation)**: $\text{MAD}_2 = \text{median}\left(|x_i - \text{median}(x)|\right)$
- **Quantile region**: Range containing a central portion of the distribution (e.g., 95% interval).
- **Interquartile range (IQR)**:  
  $$
  \text{IQR} = Q_{75} - Q_{25}
  $$
  It contains the central 50% of the data.
- **Mode**: Most frequent value in a dataset.
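
The sketch below computes these location and spread measures with the standard-library `statistics` module on a small made-up dataset containing one outlier. Note that `statistics.quantiles` uses one particular interpolation convention, so the quartiles (and the IQR) can differ slightly between libraries.

```python
import statistics

data = [1, 2, 2, 3, 4, 5, 100]  # made-up values; 100 is a deliberate outlier

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
std = statistics.pstdev(data)   # population form: divides by N

# MAD_1: mean absolute deviation around the mean.
mad_mean = sum(abs(x - mean) for x in data) / len(data)

# MAD_2: median absolute deviation around the median.
mad_median = statistics.median([abs(x - median) for x in data])

# Quartiles: quantiles(..., n=4) returns the three cut points Q25, Q50, Q75.
q25, q50, q75 = statistics.quantiles(data, n=4)
iqr = q75 - q25
```

The outlier pulls the mean, the standard deviation, and MAD_1 far from the bulk of the data, while the median, MAD_2, and the IQR barely notice it.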

---

### **Skewness and Kurtosis**

- **Skewness**: Measures asymmetry of a distribution.
  - Positive skew: tail to the right.
  - Negative skew: tail to the left.
- **Kurtosis**: Measures how likely extreme values (far from the average) are in a distribution.
  - High kurtosis: heavy tails.
  - Low kurtosis: light tails.
  - Normal distribution has kurtosis $= 3$.
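
A minimal sketch of both quantities as ratios of central moments (population form, no small-sample correction). The helper names and the two test distributions are illustrative choices.

```python
import random

def central_moment(data, k):
    """k-th central moment: average of (x - mean)**k."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Third central moment divided by sigma**3."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth central moment divided by sigma**4 (equals 3 for a Gaussian)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

rng = random.Random(2)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
exponential = [rng.expovariate(1.0) for _ in range(100_000)]

gaussian_skew = skewness(gaussian)        # close to 0 (symmetric)
gaussian_kurt = kurtosis(gaussian)        # close to 3
exponential_skew = skewness(exponential)  # positive: tail to the right
```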

---

### **PDF vs Sample Statistics, Bessel's Correction**

- **PDF statistics**: Theoretical values (mean, variance, etc.) computed from a probability distribution.
- **Sample statistics**: Estimates of these quantities based on data.
- **Bessel’s correction**: When estimating variance from a sample, divide by $N - 1$ instead of $N$ to correct bias:  
  $$
  s^2 = \frac{1}{N - 1} \sum_{i=1}^N (x_i - \bar{x})^2
  $$
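
The bias can be seen in a quick simulation. This is a sketch under simple assumptions: many small Gaussian samples (true variance 1, $N = 5$), with the variance computed both ways and averaged. The parameter name `ddof` is borrowed from common usage and is just a label here.

```python
import random

def variance(data, ddof):
    """Sample variance dividing by (N - ddof): ddof=0 is biased, ddof=1 is Bessel."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / (n - ddof)

rng = random.Random(7)
n_points, n_trials = 5, 20_000

biased_sum = 0.0
unbiased_sum = 0.0
for _ in range(n_trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    biased_sum += variance(sample, ddof=0)
    unbiased_sum += variance(sample, ddof=1)

biased_average = biased_sum / n_trials      # expected (N - 1)/N = 0.8
unbiased_average = unbiased_sum / n_trials  # expected 1.0
```

Dividing by $N$ underestimates the true variance by the factor $(N - 1)/N$, because the deviations are measured from $\bar{x}$ rather than from the unknown $\mu$.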

---

### **Uncertainties of Estimators**

- Every estimator has **uncertainty** due to finite sample size.
- For the **sample mean**:
  $$
  \text{Standard error} = \frac{\sigma}{\sqrt{N}}
  $$
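- For the **sample standard deviation** ($s$) of Gaussian data, the standard error can be approximated as:
  $$
  \text{SE}(s) \approx \frac{\sigma}{\sqrt{2N}}
  $$
  where $\sigma$ is the true standard deviation and $N$ is the sample size. By error propagation, the corresponding uncertainty on the **sample variance** is $\text{SE}(s^2) \approx 2\sigma \, \text{SE}(s) = \sigma^2 \sqrt{2/N}$.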
- For the **interquartile range (IQR)**, the uncertainty depends on the density around the quartiles; for Gaussian data, a rough estimate of its standard error is:
  $$
  \text{SE}(\text{IQR}) \approx \frac{1.57 \, \sigma}{\sqrt{N}}
  $$
- Confidence intervals express the likely range of the true parameter.
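
The formulas for the mean and the standard deviation can be checked by brute force: repeat the same experiment many times and look at the scatter of the estimates. This is a sketch assuming Gaussian data; the values of $\sigma$, $N$, and the number of repetitions are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(3)
sigma, n_points, n_trials = 2.0, 25, 5_000

means, stds = [], []
for _ in range(n_trials):
    sample = [rng.gauss(0.0, sigma) for _ in range(n_points)]
    means.append(statistics.fmean(sample))
    stds.append(statistics.stdev(sample))   # uses Bessel's correction

# Scatter of the estimators across repeated experiments.
scatter_of_mean = statistics.stdev(means)
scatter_of_std = statistics.stdev(stds)

# Predictions from the approximate formulas.
predicted_se_mean = sigma / math.sqrt(n_points)
predicted_se_std = sigma / math.sqrt(2 * n_points)
```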

---

### **PDFs: Uniform, Gaussian, Log-Normal, Chi-Squared, Poisson**

- **Uniform**: All values in an interval have equal probability density.  
  $$
  f(x) = \frac{1}{b - a} \quad \text{for } x \in [a, b]
  $$
  This distribution has standard deviation $\sigma = \frac{b - a}{\sqrt{12}}$.
- **Gaussian (Normal)**: Curve defined by mean $\mu$ and std $\sigma$.  
  $$
  f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
  $$
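  - The convolution of two Gaussians is also a Gaussian.
  - It is the "queen" of distributions: many quantities approximately follow this shape (a consequence of the central limit theorem), and it is easy to work with analytically.
  - About 68% of the probability lies within $1\sigma$ of the mean and about 95% within $2\sigma$.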

- **Log-Normal**: $X \sim \text{LogNormal}$ means $\ln X \sim \text{Normal}$.
- **Chi-squared** ($\chi^2$):  
  If the $x_i$ are independent Gaussian variables with mean $\mu$ and standard deviation $\sigma$, and we define standardized variables as  
  $$
  z_i = \frac{x_i - \mu}{\sigma},
  $$  
  then the sum of their squares  
  $$
  Q = \sum_{i=1}^K z_i^2
  $$  
  follows a **chi-squared distribution** with $K$ degrees of freedom.

  The number of degrees of freedom $K$ is equal to the number of **independent** data points used in the sum.

- **Poisson**: Discrete distribution for count data.  
  $$
  P(k; \mu) = \frac{\mu^k e^{-\mu}}{k!}
  $$
  - Where $\mu$ is the mean (the expected number of events) and $k$ is the number of events occurring.
  - Known as the "law of rare events".
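
A short sketch of the Poisson formula, checking numerically that the probabilities sum to 1 and that both the mean and the variance equal $\mu$. The value $\mu = 3$ and the truncation of the sum at $k = 60$ are illustrative choices.

```python
import math

def poisson_pmf(k, mu):
    """P(k; mu) = mu**k * exp(-mu) / k!"""
    return mu ** k * math.exp(-mu) / math.factorial(k)

mu = 3.0
ks = range(60)  # terms beyond k = 60 are negligible for mu = 3

total_probability = sum(poisson_pmf(k, mu) for k in ks)
mean = sum(k * poisson_pmf(k, mu) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)
```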

---

### **Importance Sampling**

- Hit-or-miss and crude MC are inefficient when the integrand is zero (or negligible) over large regions, or when the integration domain is very extended, because both methods sample from the uniform distribution.
- Instead of sampling from the uniform, sample from a **proposal distribution** $g(x)$ and weight each sample by $f(x_i)/g(x_i)$:
  $$
  I \approx \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{g(x_i)}, \quad x_i \sim g
  $$
- Works best when $g(x)$ is close to the shape of $f(x)$.
- Reduces variance and computational cost if $g(x)$ is well chosen.
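
A minimal sketch of importance sampling: draw $x_i$ from the proposal $g$ and average the weights $f(x_i)/g(x_i)$. The integrand $e^{-x^2/2}$ over the whole real line (exact value $\sqrt{2\pi}$) and the Gaussian proposal with $\sigma = 2$ are assumptions made for this example.

```python
import math
import random
from statistics import NormalDist

def f(x):
    """Integrand; its integral over the real line is sqrt(2*pi)."""
    return math.exp(-0.5 * x * x)

# Proposal g(x): a Gaussian wider than the integrand, so the weights stay bounded.
proposal = NormalDist(mu=0.0, sigma=2.0)

def importance_sampling(f, proposal, n, rng):
    """Estimate the integral of f as the average of f(x)/g(x) with x ~ g."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(proposal.mean, proposal.stdev)
        total += f(x) / proposal.pdf(x)
    return total / n

rng = random.Random(8)
estimate = importance_sampling(f, proposal, 100_000, rng)
exact = math.sqrt(2.0 * math.pi)
```

A uniform sampler could not even be used here without truncating the infinite domain, which is one practical reason to prefer a proposal that matches the integrand.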


### LECTURE 4



### **Central Limit Theorem (CLT)**

- The CLT states that the sum (or mean) of a large number of independent, identically distributed random variables with finite variance tends to follow a **normal distribution**, regardless of the original distribution.
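
The theorem can be illustrated with a quick simulation. The sketch below averages 30 uniform draws, a clearly non-Gaussian starting distribution, and checks that about 68% of the averages fall within one predicted standard deviation of 0.5, as expected for a Gaussian. The numbers of terms and trials are arbitrary.

```python
import math
import random

rng = random.Random(11)
n_terms, n_trials = 30, 20_000

# For U(0, 1): mean 0.5 and sigma = 1/sqrt(12); the mean of n terms has sigma/sqrt(n).
expected_mean = 0.5
expected_std = math.sqrt(1.0 / 12.0) / math.sqrt(n_terms)

sample_means = [
    sum(rng.random() for _ in range(n_terms)) / n_terms
    for _ in range(n_trials)
]

# Fraction of sample means within one expected standard deviation.
within_one_sigma = sum(
    1 for m in sample_means if abs(m - expected_mean) <= expected_std
) / n_trials
```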

---

### **Law of Large Numbers (LLN)**

- The LLN states that as the number of observations $N$ increases, the sample mean $\bar{x}$ converges to the true mean $\mu$:
  $$
  \lim_{N \to \infty} \bar{x} = \mu
  $$
- This is a statement about convergence **in probability**.
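
A small sketch of the convergence, using rolls of a fair die (true mean 3.5) as an illustrative example; the checkpoints at which the running mean is stored are arbitrary.

```python
import random

rng = random.Random(5)
checkpoints = (10, 1_000, 100_000)

running_means = {}
total = 0
for n in range(1, max(checkpoints) + 1):
    total += rng.randint(1, 6)        # one roll of a fair die
    if n in checkpoints:
        running_means[n] = total / n  # sample mean after n rolls
```

The entries of `running_means` approach 3.5 as $N$ grows, with fluctuations that shrink roughly as $1/\sqrt{N}$.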

---

### **Multidimensional PDFs**

- In 2D, the joint distribution can be described by:
  - **Mean vector**:  
    $$
    \vec{\mu} = (\mu_x, \mu_y)
    $$

  - **Covariance matrix**:  
    $$
    \Sigma = \begin{pmatrix}
    \sigma_x^2 & \text{cov}(x, y) \\
    \text{cov}(y, x) & \sigma_y^2
    \end{pmatrix}
    $$
    The two off-diagonal elements are equal to each other, and they are 0 if and only if $x$ and $y$ are uncorrelated.

  - **Correlation coefficient**:  
    $$
    \rho = \frac{\text{cov}(x, y)}{\sigma_x \sigma_y}
    $$
    It expresses the strength of the linear correlation between the two variables on a scale from $-1$ to $1$.

  - **Principal axes**: determined by the eigenvectors of $\Sigma$; in this rotated coordinate system the correlation vanishes by construction.
  - **2D confidence ellipses**: contours of constant joint probability density. Be careful: the probability enclosed by a given number of sigmas depends on the dimension. In 2D, the $1\sigma$ ellipse encloses only about 39% of the probability. One can choose the contour that encloses 68%, by analogy with 1D, but that contour is not the $1\sigma$ ellipse (it corresponds to about $1.52\sigma$).
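
For a 2D Gaussian, the probability inside the $k\sigma$ ellipse is $1 - e^{-k^2/2}$. The sketch below evaluates it, inverts it, and checks the 39% figure by sampling two independent unit Gaussians (the simplest case, chosen for illustration; a correlated Gaussian reduces to it in the principal-axes frame). The function names are hypothetical.

```python
import math
import random

def enclosed_probability_2d(k):
    """Probability inside the k-sigma ellipse of a 2D Gaussian."""
    return 1.0 - math.exp(-0.5 * k * k)

def sigma_level_2d(p):
    """Number of sigmas whose ellipse encloses probability p in 2D."""
    return math.sqrt(-2.0 * math.log(1.0 - p))

one_sigma_content = enclosed_probability_2d(1.0)  # about 0.39
level_for_68 = sigma_level_2d(0.683)              # about 1.52

# Monte Carlo check with independent unit Gaussians: count points with r <= 1.
rng = random.Random(9)
n = 100_000
inside = sum(
    1 for _ in range(n)
    if rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 <= 1.0
)
simulated_content = inside / n
```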

---

### **Correlation vs Causation**

Correlation does not imply causation!
Just because the sun burns our skin and also makes us thirsty, it doesn't mean that thirst causes sunburn!

- **Pearson's correlation** ($r$): Measures the linear correlation between two datasets. It takes values between $-1$ and $1$, and $r = 0$ means the two are linearly uncorrelated. It has two problems:
  - it is sensitive to outliers
  - it does not account for measurement uncertainties
- **Spearman's rho**: Measures monotonic (rank-based) correlation.
- **Kendall's tau**: Measures ordinal association between two variables.
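
The difference between linear and rank-based correlation can be sketched by hand. The helper names are illustrative, the ranking helper assumes there are no ties, and the data ($y = x^3$) are made up to be monotonic but not linear.

```python
def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Ranks from 1 to n; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y = [v ** 3 for v in x]   # monotonic but non-linear relation

r_pearson = pearson(x, y)    # below 1: the relation is not a straight line
r_spearman = spearman(x, y)  # 1 up to rounding: the ordering is preserved
```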

---

### **Rejection Sampling**

Rejection sampling is a method to generate random samples from a complex distribution $p(x)$, using a simpler proposal distribution $q(x)$.

The procedure works as follows:

1. **Choose a proposal distribution** $q(x)$ from which it's easy to sample (often a uniform distribution).  
   Make sure it's "wide enough" to cover the shape of $p(x)$, including its tails.

2. **Find a constant** $M$ such that for all $x$:
   $$
   p(x) \leq M q(x)
   $$
   This ensures the proposal dominates the target distribution.

3. **Generate a candidate sample** $x$ from $q(x)$.

4. **Draw a random number** $u$ from $\mathcal{U}(0, 1)$.

5. **Accept or reject**:
   - Accept $x$ if  
     $$
     u < \frac{p(x)}{M q(x)}
     $$
   - Otherwise, reject $x$ and go back to step 3.

The set of accepted $x$ values will follow the target distribution $p(x)$.
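
The steps above can be sketched as follows. The target $p(x) = 6x(1 - x)$ on $[0, 1]$, the uniform proposal $q(x) = 1$, and the constant $M = 1.5$ (the maximum of $p$) are choices made for this example.

```python
import random

def target_pdf(x):
    """Target density p(x) = 6 x (1 - x) on [0, 1]; its maximum is 1.5."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, rng, m=1.5):
    """Draw n samples from target_pdf using a U(0, 1) proposal.

    Returns the accepted samples and the observed acceptance rate.
    """
    proposal_pdf = 1.0                 # q(x) = 1 on [0, 1]
    samples = []
    trials = 0
    while len(samples) < n:
        x = rng.random()               # step 3: candidate from q
        u = rng.random()               # step 4: uniform number in [0, 1)
        trials += 1
        if u < target_pdf(x) / (m * proposal_pdf):   # step 5: accept or reject
            samples.append(x)
    return samples, n / trials

samples, acceptance_rate = rejection_sample(20_000, random.Random(4))
sample_mean = sum(samples) / len(samples)   # the target has mean 0.5
```

The expected acceptance rate is $1/M$ (about 67% here), so a tight bound $M$ wastes fewer candidates.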


---

### **Inverse Transform Sampling**

- Used to sample from a distribution with a known CDF $F(x)$ and quantile function $F^{-1}(u)$.
- Steps:
  1. Sample $u$ from $\mathcal{U}(0, 1)$.
  2. Compute $x = F^{-1}(u)$.
- Normalization is really important here: $F$ must be the CDF of a properly normalized PDF, so that it runs from 0 to 1.
- If the CDF or the quantile function cannot be derived by hand, they can be obtained numerically.
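
As a sketch of the two steps, consider the exponential distribution with rate $\lambda$ (an example chosen here because its CDF inverts by hand): $F(x) = 1 - e^{-\lambda x}$, so $F^{-1}(u) = -\ln(1 - u) / \lambda$.

```python
import math
import random

def sample_exponential(rate, n, rng):
    """Inverse transform sampling for the exponential distribution."""
    samples = []
    for _ in range(n):
        u = rng.random()                         # step 1: u in [0, 1)
        samples.append(-math.log(1.0 - u) / rate)  # step 2: x = F^{-1}(u)
    return samples

rate = 2.0
draws = sample_exponential(rate, 100_000, random.Random(6))

sample_mean = sum(draws) / len(draws)            # theory: 1 / rate = 0.5
sample_median = sorted(draws)[len(draws) // 2]   # theory: ln(2) / rate
```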

### LECTURE 5