# ML, Data Analysis
### Probability: Joint probability mass function (joint PMF)
The **joint probability mass function** (**joint PMF**) of two **discrete** random variables $X$ and $Y$ is defined by:
<br> $\large p(x,y)=P(X=x \;and\; Y=y)$
<br>Where:
- $X$ is a discrete random variable with domain $\mathcal{X} = \{x_1,x_2,...,x_m\}$
- $Y$ is a discrete random variable with domain $\mathcal{Y} = \{y_1,y_2,...,y_n\}$
- $P(X=x \;and\; Y=y)$ is the probability that $X$ takes value $x$ and $Y$ takes value $y$ at the same time.

<hr>

**Marginal probability mass functions** are computed by:
- Marginal of $X$: $p(x)=P(X=x)=\sum_{y\in \mathcal{Y}}p(x,y)$
- Marginal of $Y$: $p(y)=P(Y=y)=\sum_{x\in \mathcal{X}}p(x,y)$
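
As a small illustration of the two sums above, the sketch below stores a joint PMF as a 2-D NumPy array and sums it along each axis. The table values are hypothetical, chosen only so that the entries add up to 1. Rows are assumed to index $x$ and columns to index $y$.

```python
import numpy as np

# Hypothetical joint PMF table p(x, y): rows index x, columns index y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

# Marginal of X: sum over all y (across columns).
p_x = joint.sum(axis=1)
# Marginal of Y: sum over all x (across rows).
p_y = joint.sum(axis=0)

print("p(x):", p_x)
print("p(y):", p_y)
```

Each marginal is itself a valid PMF, so its entries also sum to 1.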

<hr>

**Conditional probability mass functions:**
- Conditional of $X$ given $Y$: $p(x∣y)=P(X=x\;∣\;Y=y)=\frac{p(x,y)}{p(y)}$ for $p(y)\neq 0$
- Conditional of $Y$ given $X$: $p(y∣x)=P(Y=y\;∣\;X=x)=\frac{p(x,y)}{p(x)}$ for $p(x)\neq 0$
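
The two ratios above can be sketched with NumPy broadcasting. The joint table here is hypothetical, and every marginal is assumed to be nonzero so that no division by zero occurs.

```python
import numpy as np

# Hypothetical joint PMF table p(x, y): rows index x, columns index y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

# p(x|y): divide every column j by p(y_j).
p_x_given_y = joint / p_y
# p(y|x): divide every row i by p(x_i).
p_y_given_x = joint / p_x[:, None]

print("Column sums of p(x|y):", p_x_given_y.sum(axis=0))
print("Row sums of p(y|x):", p_y_given_x.sum(axis=1))
```

For a fixed $y$, the values $p(x∣y)$ form a PMF over $x$, which is why each column of the first table sums to 1. The same holds for each row of the second table.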

<hr>

From **conditionals** and **marginals** to **joint PMF**:
- $p(x,y)=p(x∣y)p(y)=p(y∣x)p(x)$
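
A minimal sketch of this product rule, starting from a hypothetical marginal $p(x)$ and a hypothetical conditional table $p(y∣x)$, then confirming that the other factorization gives the same joint table:

```python
import numpy as np

# Hypothetical inputs: a marginal p(x) and a conditional table p(y|x).
p_x = np.array([0.5, 0.5])
p_y_given_x = np.array([[0.8, 0.2],   # p(y | x = x_1)
                        [0.2, 0.8]])  # p(y | x = x_2)

# Product rule: p(x, y) = p(y|x) p(x); scale each row by its p(x).
joint = p_y_given_x * p_x[:, None]

# The other factorization, p(x|y) p(y), must reproduce the same table.
p_y = joint.sum(axis=0)
p_x_given_y = joint / p_y
same = np.allclose(joint, p_x_given_y * p_y)

print(joint)
print("Both factorizations agree:", same)
```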

**Reminder 1:** $X$ and $Y$ are **independent** if and only if:
- $p(x,y)=p(x)p(y)$ for all $x\in\mathcal{X},y∈\mathcal{Y}$
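
One way to test this condition numerically is to compare the joint table with the outer product of its marginals. The helper name `is_independent` and both tables below are hypothetical. With empirical PMFs the equality rarely holds exactly, so a tolerance (or a proper statistical test) would be needed in practice.

```python
import numpy as np

def is_independent(joint, tol=1e-12):
    """Check whether p(x, y) == p(x) p(y) for every cell of the table."""
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return bool(np.allclose(joint, np.outer(p_x, p_y), atol=tol))

# Built as an outer product, so it is independent by construction.
independent_table = np.outer([0.4, 0.6], [0.5, 0.5])
# Mass concentrated on the diagonal: X and Y tend to take the same index.
dependent_table = np.array([[0.4, 0.1],
                            [0.1, 0.4]])

print(is_independent(independent_table))
print(is_independent(dependent_table))
```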

<hr>

**Estimation of PMFs from data:** Given $N$ **i.i.d.** (independent and identically distributed) samples $\{(x_i,y_i) \mid i=1,2,\dots,N\}$:
- Empirical joint PMF (an estimate of the true PMF): $\hat{p}(x,y)=\frac{1}{N}\sum_{i=1}^N \mathbb{I}(x_i=x\; and\; y_i=y)$
    - Where $\mathbb{I}(\cdot)$ is the **equality indicator** function.
- Empirical marginal of $X$: $\hat{p}(x)=\frac{1}{N}\sum_{i=1}^N \mathbb{I}(x_i=x)=\sum_y\hat{p}(x,y)$
- Empirical marginal of $Y$: $\hat{p}(y)=\frac{1}{N}\sum_{i=1}^N \mathbb{I}(y_i=y)=\sum_x\hat{p}(x,y)$

**Reminder 2:** The **equality indicator** is defined by:
$\mathbb{I}(x=a)=\begin{cases}
  1, & \text{if } x=a \\
  0, & \text{otherwise}
\end{cases}$
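
The empirical formulas can be read directly as averages of indicator values. The sketch below uses a hypothetical sample of $N=5$ pairs; the function names `p_hat_joint` and `p_hat_x` are illustrative only. A boolean NumPy array plays the role of the indicator, and `np.mean` computes $\frac{1}{N}\sum_i \mathbb{I}(\cdot)$.

```python
import numpy as np

# Hypothetical sample of N = 5 pairs (x_i, y_i).
xs = np.array([0, 0, 1, 1, 1])
ys = np.array([0, 1, 1, 1, 0])

def p_hat_joint(x, y):
    """Fraction of samples with x_i == x and y_i == y."""
    return np.mean((xs == x) & (ys == y))

def p_hat_x(x):
    """Fraction of samples with x_i == x."""
    return np.mean(xs == x)

print(p_hat_joint(1, 1))  # two of the five pairs are (1, 1)
print(p_hat_x(1))         # three of the five x_i equal 1
```

Summing `p_hat_joint(1, y)` over both values of `y` gives `p_hat_x(1)`, matching the second equality in the empirical marginal formula.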

<hr>

Below is a **Python** function that computes the empirical joint, marginal, and conditional PMFs from data. It is then applied to a simple dataset holding samples of two random variables.

<hr>
https://github.com/ostad-ai/Machine-Learning
<br> Explanation: https://www.pinterest.com/HamedShahHosseini/Machine-Learning/background-knowledge

In [1]:
# Import the required modules
import numpy as np

In [2]:
def compute_probabilities(X, Y):
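    """
    Compute the empirical joint, marginal, and conditional PMFs
    from paired samples X and Y (array-likes of equal length).

    Returns a dict of arrays; rows index the sorted unique values
    of X and columns index the sorted unique values of Y.
    """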
    X = np.asarray(X).flatten()
    Y = np.asarray(Y).flatten()
    n = len(X)
    
    # Get unique values
    x_vals = np.unique(X)
    y_vals = np.unique(Y)
    
    # Number of distinct values of X and Y
    n_x = len(x_vals)
    n_y = len(y_vals)
    
    # 1. Joint probability p(x,y)
    joint_counts = np.zeros((n_x, n_y))
    for i, x in enumerate(x_vals):
        for j, y in enumerate(y_vals):
            joint_counts[i, j] = np.sum((X == x) & (Y == y))
    
    joint_probs = joint_counts / n
    
    # 2. Marginal probabilities
    p_x = np.sum(joint_probs, axis=1)  # p(x)
    p_y = np.sum(joint_probs, axis=0)  # p(y)
    
    # 3. Conditional probabilities
    # p(x|y) = p(x,y) / p(y)
    p_x_given_y = np.zeros((n_x, n_y))
    for j in range(n_y):
        if p_y[j] > 0:
            p_x_given_y[:, j] = joint_probs[:, j] / p_y[j]
    
    # p(y|x) = p(x,y) / p(x)
    p_y_given_x = np.zeros((n_x, n_y))
    for i in range(n_x):
        if p_x[i] > 0:
            p_y_given_x[i, :] = joint_probs[i, :] / p_x[i]
    
    return {
        'x_values': x_vals,
        'y_values': y_vals,
        'joint': joint_probs,
        'marginal_x': p_x,
        'marginal_y': p_y,
        'conditional_x_given_y': p_x_given_y,
        'conditional_y_given_x': p_y_given_x
    }
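
The nested loops above scan the data once per $(x,y)$ cell. As an alternative sketch (not the function used in this notebook), the joint counts can be accumulated in a single pass with `np.unique(..., return_inverse=True)` and `np.add.at`. The function name `joint_pmf_vectorized` and the small sample are hypothetical.

```python
import numpy as np

def joint_pmf_vectorized(X, Y):
    """Empirical joint PMF without Python loops over value pairs."""
    X = np.asarray(X).ravel()
    Y = np.asarray(Y).ravel()
    # x_idx[i] is the row of sample i; y_idx[i] is its column.
    x_vals, x_idx = np.unique(X, return_inverse=True)
    y_vals, y_idx = np.unique(Y, return_inverse=True)
    counts = np.zeros((len(x_vals), len(y_vals)))
    # Add 1 to cell (x_idx[i], y_idx[i]) for every sample i,
    # correctly accumulating repeated index pairs.
    np.add.at(counts, (x_idx, y_idx), 1)
    return x_vals, y_vals, counts / len(X)

x_vals, y_vals, joint = joint_pmf_vectorized([0, 0, 1, 1, 1],
                                             [0, 1, 1, 1, 0])
print(joint)
```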

<hr style="height:3px; background:yellow;">

### Example
The data is given as pairs $(x_i,y_i)$, which are samples of random variables $X$ and $Y$. We may assume they represent the following sample space:  
- $X\in\{0,1\}$ may denote Gender: 0 = Female, 1 = Male
- $Y\in\{0,1\}$ may denote Smoker: 0 = No, 1 = Yes

In [3]:
# The dataset
X = np.array([0]*40 + [1]*60)  # 40 zeros, 60 ones
Y = np.array([0]*30 + [1]*10 + [0]*25 + [1]*35)

probs = compute_probabilities(X, Y)

print("X values (female, male):", probs['x_values'])
print("Y values (nonsmoker, smoker):", probs['y_values'])
print("\nJoint distribution p(x,y):")
print('[(female,non-smoker),(female,smoker)]')
print('[(male,non-smoker),(male,smoker)]')
print('----------')
print('[p(0,0),p(0,1)]')
print('[p(1,0),p(1,1)]')
print('-----------')
print(probs['joint'])
print("\nMarginal p(x); p(female), p(male):", probs['marginal_x'])
print("Marginal p(y); p(nonsmoker), p(smoker):", probs['marginal_y'])

X values (female, male): [0 1]
Y values (nonsmoker, smoker): [0 1]

Joint distribution p(x,y):
[(female,non-smoker),(female,smoker)]
[(male,non-smoker),(male,smoker)]
----------
[p(0,0),p(0,1)]
[p(1,0),p(1,1)]
-----------
[[0.3  0.1 ]
 [0.25 0.35]]

Marginal p(x); p(female), p(male): [0.4 0.6]
Marginal p(y); p(nonsmoker), p(smoker): [0.55 0.45]
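
The printed joint table is enough to answer two follow-up questions: what the conditional PMF of smoking given gender looks like, and whether the two variables are independent in this sample. The sketch below re-enters the joint table from the output above instead of calling `compute_probabilities`, so that it runs on its own.

```python
import numpy as np

# Joint table copied from the printed output above.
joint = np.array([[0.30, 0.10],
                  [0.25, 0.35]])
p_x = joint.sum(axis=1)  # p(female), p(male)
p_y = joint.sum(axis=0)  # p(nonsmoker), p(smoker)

# p(y|x): divide each row by its marginal p(x).
p_y_given_x = joint / p_x[:, None]
print("p(y | female):", p_y_given_x[0])
print("p(y | male):  ", p_y_given_x[1])

# Independence would require p(x, y) == p(x) p(y) in every cell.
independent = bool(np.allclose(joint, np.outer(p_x, p_y)))
print("Independent?", independent)
```

For instance, $\hat{p}(\text{smoker}∣\text{female})=0.1/0.4=0.25$, whereas the marginal $\hat{p}(\text{smoker})=0.45$. The two differ, so $X$ and $Y$ are not independent under this empirical PMF.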
