<a href="https://colab.research.google.com/github/wingated/cs473/blob/main/labs/cs473_lab_week_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><p><b>After clicking the "Open in Colab" link, copy the notebook to your own Google Drive before getting started, or it will not save your work</b></p>

# BYU CS 473 Lab Week 4

## Introduction:
KL divergence is one of the most commonly used concepts in machine learning. Here, we'll explore some of its basic properties.

---
## Exercise #1: Symmetry    

KL divergence is a *measure*, but not a *metric*. This means that while it satisfies some of the properties of a distance metric, it does not satisfy all of them.

For example, KL divergence is NOT symmetric. First, implement a function that calculates the KL divergence between two discrete distributions. Then, craft an example to demonstrate that it is not symmetric.

In [3]:
import numpy as np

def kl(a,b):
    # a is an n-dimensional discrete distribution (nonnegative, sums to 1)
    # b is an n-dimensional discrete distribution (nonnegative, sums to 1)
    #
    # return: KL(a||b)

    # your code here
    return np.sum(np.where(a != 0, a * np.log(a / b), 0))

# find an example where kl(a,b) != kl(b,a)
a = np.array([0.1, 0.9])
b = np.array([0.8, 0.2])
kl_a_b = kl(a, b)
kl_b_a = kl(b, a)

print(f"Distribution 'a': {a}")
print(f"Distribution 'b': {b}")


print(f"KL(a || b) = {kl_a_b:.4f}")
print(f"KL(b || a) = {kl_b_a:.4f}")

print(f"Is KL(a || b) == KL(b || a)? {'Yes' if np.isclose(kl_a_b, kl_b_a) else 'No'}")

# etc

Distribution 'a': [0.1 0.9]
Distribution 'b': [0.8 0.2]
KL(a || b) = 1.1457
KL(b || a) = 1.3627
Is KL(a || b) == KL(b || a)? No
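
The `kl` above relies on `np.where`, which still evaluates `a * np.log(a / b)` for every entry and so can emit runtime warnings when `a` contains zeros. Below is a minimal sketch of a variant that masks first, assuming the convention that terms with `a(i) = 0` contribute 0 and that the divergence is infinite when `b(i) = 0` while `a(i) > 0`. The name `kl_masked` is illustrative and not part of the lab.

```python
import numpy as np

def kl_masked(a, b):
    """KL(a || b) for discrete distributions, skipping entries where a == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    support = a > 0                      # only these terms contribute
    if np.any(b[support] == 0):          # a puts mass where b has none
        return np.inf
    return float(np.sum(a[support] * np.log(a[support] / b[support])))

a = np.array([0.1, 0.9])
b = np.array([0.8, 0.2])
print(f"KL(a || b) = {kl_masked(a, b):.4f}")
print(f"KL(b || a) = {kl_masked(b, a):.4f}")
print(f"KL([1, 0] || [0.5, 0.5]) = {kl_masked([1.0, 0.0], [0.5, 0.5]):.4f}")
```

On the two distributions from the exercise this should agree with the values printed above; the last call shows a case with a zero entry in the first argument.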


---
## Exercise #2: Triangle inequality

Another property that KL divergence does not satisfy is the triangle inequality, which states that

kl(a,c) <= kl(a,b)+kl(b,c)

Prove that KL divergence does not satisfy the triangle inequality by crafting a counter-example.

In [5]:
a = np.array([0.9, 0.1])
b = np.array([0.5, 0.5])
c = np.array([0.1, 0.9])

kl_a_c = kl(a, c)
kl_a_b = kl(a, b)
kl_b_c = kl(b, c)

indirect_path_sum = kl_a_b + kl_b_c

print(f"Distribution 'a': {a}")
print(f"Distribution 'b': {b}")
print(f"Distribution 'c': {c}")

print(f"Direct Divergence D_KL(a || c)          = {kl_a_c:.4f}")
print(f"Indirect Path D_KL(a || b) + D_KL(b || c) = {indirect_path_sum:.4f}")

print(f"Does the triangle inequality hold? (a,c <= a,b + b,c)")
print(f"Result: {kl_a_c <= indirect_path_sum}")

Distribution 'a': [0.9 0.1]
Distribution 'b': [0.5 0.5]
Distribution 'c': [0.1 0.9]
Direct Divergence D_KL(a || c)          = 1.7578
Indirect Path D_KL(a || b) + D_KL(b || c) = 0.8789
Does the triangle inequality hold? (a,c <= a,b + b,c)
Result: False
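
The counter-example above uses one particular intermediate distribution. As a rough sketch, the same check can be swept over a grid of intermediate distributions `b = [t, 1 - t]` to see whether the violation is special to `t = 0.5`. The helper `triangle_holds` and the grid are illustrative choices, not part of the lab.

```python
import numpy as np

def kl(a, b):
    # KL(a || b) for strictly positive discrete distributions
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(a * np.log(a / b)))

def triangle_holds(a, b, c):
    # True when KL(a || c) <= KL(a || b) + KL(b || c)
    return kl(a, c) <= kl(a, b) + kl(b, c)

a = np.array([0.9, 0.1])
c = np.array([0.1, 0.9])

# Sweep the intermediate distribution b between a and c
for t in np.linspace(0.2, 0.8, 7):
    b = np.array([t, 1.0 - t])
    print(f"t = {t:.1f}: KL(a||b) + KL(b||c) = {kl(a, b) + kl(b, c):.4f}, "
          f"holds = {triangle_holds(a, b, c)}")
```

Both terms of the indirect path are convex in `t`, and the path equals the direct divergence when `b = a` or `b = c`, so every `b` strictly between the two gives a smaller sum than `KL(a || c)`.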


---
## Exercise #3: Proofs

Prove that:

1) kl(a,a) = 0
2) kl(a,b) >= 0

Extra credit:

3) kl(a,b) = 0 iff a==b

1.
Definition of KL divergence: $$ D_{KL}(a || b) = \sum_{i} a(i) \log\left(\frac{a(i)}{b(i)}\right) $$  
To find $D_{KL}(a || a)$, substitute $b$ with $a$:
$$ D_{KL}(a || a) = \sum_{i} a(i) \log\left(\frac{a(i)}{a(i)}\right) $$
For every $i$ with $a(i) > 0$ the fraction inside the logarithm is 1, and terms with $a(i) = 0$ contribute 0 by the convention $0 \log 0 = 0$:
$$ D_{KL}(a || a) = \sum_{i} a(i) \log(1) $$ where $\log(1) = 0$, so
$$ D_{KL}(a || a) = \sum_{i} a(i) \cdot 0 = 0 $$
Hence $D_{KL}(a || a) = 0$.




2.
Definition of KL divergence: $$ D_{KL}(a || b) = \sum_{i} a(i) \log\left(\frac{a(i)}{b(i)}\right) $$  

Using the log property $\log(\frac{x}{y}) = -\log(\frac{y}{x})$, the equation becomes
$$ D_{KL}(a || b) = -\sum_{i} a(i) \log\left(\frac{b(i)}{a(i)}\right) $$   
where the sum runs over the $i$ with $a(i) > 0$. The sum can be interpreted as the expected value of $\log(\frac{b}{a})$ under $a$:
$$ \sum_{i} a(i) \log\left( \frac{b(i)}{a(i)} \right) = \mathbb{E}_{a}\left[\log\left(\frac{b}{a}\right)\right] $$
Since $\log(x)$ is a concave function, Jensen's inequality gives
$$ \mathbb{E}_a\left[ \log\left( \frac{b}{a} \right) \right] \leq \log \left( \mathbb{E}_a\left[ \frac{b}{a} \right] \right) $$
Evaluating the expectation on the right:
$$ \mathbb{E}_a\left[\frac{b}{a}\right] = \sum_{i:\, a(i) > 0} a(i) \cdot \frac{b(i)}{a(i)} = \sum_{i:\, a(i) > 0} b(i) \leq 1 $$
The last step holds because $b$ is a probability distribution, so $\sum_i b(i) = 1$. Since $\log$ is increasing,
$$ \mathbb{E}_a\left[ \log\left(\frac{b}{a}\right) \right] \leq \log(1) = 0 $$
Negating both sides:
$$ D_{KL}(a || b) = - \mathbb{E}_a\left[\log\left(\frac{b}{a}\right)\right] \geq -(0) = 0 $$
Hence $D_{KL}(a || b) \geq 0$.
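
Properties 1 and 2 can also be sanity-checked numerically. The sketch below draws random distributions and reports the smallest divergence it finds. It is an empirical check, not a proof, and the seed, trial count, dimension, and Dirichlet sampling are arbitrary illustrative choices.

```python
import numpy as np

def kl(a, b):
    # KL(a || b) for strictly positive discrete distributions
    return float(np.sum(a * np.log(a / b)))

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
n_trials, dim = 1000, 5

# Dirichlet samples are nonnegative and each row sums to 1
A = rng.dirichlet(np.ones(dim), size=n_trials)
B = rng.dirichlet(np.ones(dim), size=n_trials)

values = np.array([kl(a, b) for a, b in zip(A, B)])
self_values = np.array([kl(a, a) for a in A])

print(f"smallest KL(a || b) over {n_trials} trials: {values.min():.6f}")
print(f"largest |KL(a || a)|: {np.abs(self_values).max():.6f}")
```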





3.
An "if and only if" statement must be proved in both directions.<br>
Part 1: "if" direction: if $a = b$, then $D_{KL}(a || b) = 0$.
$$ \text{ If }a(i) = b(i) \text{ for all } i, \text{ then } D_{KL}(a || b) = D_{KL}(a || a) = 0.$$
Part 2: "only if" direction: if $D_{KL}(a || b) = 0$, then $a = b$. <br>
Jensen's inequality $\mathbb{E}[\log(X)] \leq \log(\mathbb{E}[X])$ applied to $X = \frac{b}{a}$ gives $-D_{KL}(a || b) = \mathbb{E}_a\left[\log\left(\frac{b}{a}\right)\right] \leq \log\left(\mathbb{E}_a\left[\frac{b}{a}\right]\right) \leq 0$. If $D_{KL}(a || b) = 0$, both inequalities must hold with equality:
$$ \mathbb{E}_a\left[ \log\left( \frac{b}{a} \right) \right] = \log \left( \mathbb{E}_a\left[ \frac{b}{a} \right] \right) = 0 $$
Because $\log$ is strictly concave, equality in Jensen's inequality requires the random variable to equal a constant $c$ for all $i$ with $a(i) > 0$:
$$ \frac{b(i)}{a(i)} = c \implies b(i) = c \cdot a(i) $$
To find the value of $c$, sum over all $i$ with $a(i) > 0$ and use the fact that $a$ is a probability distribution:
$$ \sum_{i:\, a(i) > 0} b(i) = c \sum_{i:\, a(i) > 0} a(i) = c \cdot 1 = c $$
The left-hand side is $\mathbb{E}_a\left[\frac{b}{a}\right]$, whose logarithm is 0, so
$$ \log(c) = 0 \implies c = 1 $$
Since $c = 1$, $b(i) = a(i)$ wherever $a(i) > 0$. These values already sum to 1, so $b(i) = 0 = a(i)$ everywhere else, and we can conclude that:
$$ b(i) = a(i) \text{ for all } i \implies a = b $$
Hence $D_{KL}(a || b) = 0$ if and only if $a = b$.
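
Property 3 can be illustrated numerically as well. The sketch below mixes a fixed distribution `a` with the uniform distribution and shows the divergence shrinking to zero only as the mixture returns to `a` itself. The particular `a` and the mixing weights `eps` are arbitrary illustrative choices.

```python
import numpy as np

def kl(a, b):
    # KL(a || b) for strictly positive discrete distributions
    return float(np.sum(a * np.log(a / b)))

a = np.array([0.7, 0.2, 0.1])
u = np.full(3, 1.0 / 3.0)               # uniform distribution

eps_values = [0.5, 0.1, 0.01, 0.0]
divergences = []
for eps in eps_values:
    b = (1.0 - eps) * a + eps * u       # b equals a exactly when eps == 0
    divergences.append(kl(a, b))
    print(f"eps = {eps:<5}: KL(a || b) = {divergences[-1]:.8f}")
```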