# Divergence vs Metric Functions


## Introduction

In this notebook, we will discuss the difference between divergence and metric functions. We will also see how they are related to each other.

## Divergence

A divergence measures how much one probability distribution differs from another. It is a non-negative scalar that equals zero exactly when the two distributions are identical. Divergences are widely used in machine learning and statistics to compare distributions, for example a model's predicted distribution against the data distribution.

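There are many divergence functions, such as the Kullback-Leibler (KL) divergence, the Jensen-Shannon (JS) divergence, and the broader family of f-divergences. Each has its own properties and applications. Cross-entropy is closely related to KL divergence, since $H(p, q) = H(p) + D_{KL}(p \| q)$, but it is not itself a divergence: $H(p, p) = H(p)$, which is generally nonzero.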
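
A minimal sketch of the KL divergence $D_{KL}(p \| q) = \sum_i p_i \log(p_i / q_i)$ and the cross-entropy $H(p, q) = -\sum_i p_i \log q_i$, assuming discrete distributions that sum to one and $q_i > 0$ wherever $p_i > 0$. The helper names `kl_divergence` and `cross_entropy` are illustrative, not library functions. `scipy.stats.entropy` is used only as a cross-check, since with two arguments it returns the KL divergence in nats.

```python
import numpy as np
from scipy.stats import entropy


def kl_divergence(p, q):
    """KL divergence D_KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i log q_i in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(q[mask])))


p = [0.1, 0.2, 0.7]
q = [0.2, 0.3, 0.5]

print(kl_divergence(p, q))                             # same value as entropy(p, q)
print(np.isclose(kl_divergence(p, q), entropy(p, q)))  # True
print(kl_divergence(p, p))                             # 0.0: zero for identical distributions
print(cross_entropy(p, p))                             # H(p) > 0, so cross-entropy is not a divergence
print(np.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q)))  # True
```

The last line checks the identity $H(p, q) = H(p) + D_{KL}(p \| q)$ numerically.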

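The term "divergence" has different meanings depending on context and community:

- In statistics and information theory, a divergence measures how much one probability distribution differs from another. This is the sense used in this notebook.

- In physics, the divergence of a vector field, such as the velocity field of a fluid, measures the net outward flux per unit volume at a point: positive at sources, negative at sinks.

- In mathematics (vector calculus), the divergence $\nabla \cdot \mathbf{F}$ is the differential operator behind that physical quantity; it measures the rate at which a vector field spreads out from a point.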
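
For contrast, here is the vector-calculus sense: for a 2-D field $\mathbf{F} = (F_x, F_y)$, $\nabla \cdot \mathbf{F} = \partial F_x / \partial x + \partial F_y / \partial y$. A minimal finite-difference sketch with `numpy.gradient`, assuming the illustrative field $\mathbf{F}(x, y) = (x, y)$, whose divergence is exactly 2 everywhere. Nothing here involves probability distributions.

```python
import numpy as np

# Grid on [-1, 1] x [-1, 1]; with indexing="ij", axis 0 is x and axis 1 is y
xs = np.linspace(-1.0, 1.0, 21)
ys = np.linspace(-1.0, 1.0, 21)
X, Y = np.meshgrid(xs, ys, indexing="ij")

# Illustrative vector field F(x, y) = (x, y), pointing radially outward
Fx, Fy = X, Y

# Divergence = dFx/dx + dFy/dy, each derivative taken along its own axis
div = np.gradient(Fx, xs, axis=0) + np.gradient(Fy, ys, axis=1)

print(div.min(), div.max())  # both approximately 2.0
```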

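Under the minimal definition common in statistics (a divergence $D(p \| q)$ is non-negative and equals zero if and only if $p = q$), every metric on probability distributions is also a divergence, but most divergences are not metrics, because they can be asymmetric or violate the triangle inequality.

For example, the Kullback-Leibler divergence is not a metric: in general $D_{KL}(p \| q) \neq D_{KL}(q \| p)$, and it does not satisfy the triangle inequality.

By contrast, the Hellinger distance and the square root of the Jensen-Shannon divergence (the Jensen-Shannon distance) satisfy all the metric axioms; the Jensen-Shannon divergence itself does not satisfy the triangle inequality. Which quantities are called "divergences" and which are called "distances" is partly a matter of convention in each community.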
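
A quick numerical check of which of these quantities behave like metrics. It uses `scipy.stats.entropy` for KL and `scipy.spatial.distance.jensenshannon`, which returns the Jensen-Shannon distance (the square root of the JS divergence). The `hellinger` helper is our own, following the definition $H(p, q) = \frac{1}{\sqrt{2}} \lVert \sqrt{p} - \sqrt{q} \rVert_2$. Checks on a few points can disprove the metric property but cannot prove it.

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon


def hellinger(p, q):
    """Hellinger distance (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2 for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2))


p = [0.1, 0.2, 0.7]
q = [0.2, 0.3, 0.5]

# KL is asymmetric: the two directions give different values.
print(entropy(p, q), entropy(q, p))

# Three Bernoulli distributions on which KL violates the triangle inequality.
a, b, c = [0.1, 0.9], [0.5, 0.5], [0.9, 0.1]
print(entropy(a, c) > entropy(a, b) + entropy(b, c))  # True: KL(a||c) exceeds the two-step sum

# JS distance and Hellinger distance are symmetric and satisfy the triangle inequality here.
print(np.isclose(jensenshannon(p, q), jensenshannon(q, p)))              # True
print(jensenshannon(a, c) <= jensenshannon(a, b) + jensenshannon(b, c))  # True
print(np.isclose(hellinger(p, q), hellinger(q, p)))                      # True
print(hellinger(a, c) <= hellinger(a, b) + hellinger(b, c))              # True
```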

## Metric Function

A metric (also called a distance function) on a set $X$ is a function $d: X \times X \to \mathbb{R}$ that assigns a scalar to each pair of points and satisfies the following properties:

1. Non-negativity: $d(x, y) \geq 0$ for all $x, y \in X$.

2. Identity of indiscernibles: $d(x, y) = 0$ if and only if $x = y$.

3. Symmetry: $d(x, y) = d(y, x)$ for all $x, y \in X$.

4. Triangle inequality: $d(x, y) + d(y, z) \geq d(x, z)$ for all $x, y, z \in X$.
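
The four axioms can be tested empirically on a finite set of points. A minimal sketch: `check_metric_axioms` is a hypothetical helper, not a library function, and passing on a sample does not prove that a function is a metric, while a single failing pair or triple disproves it. SciPy's cosine distance is included because it is a common "distance" that is not a metric: it is zero for parallel vectors such as (1, 0) and (2, 0), and it violates the triangle inequality for (1, 0), (1, 1), (0, 1).

```python
import numpy as np
from itertools import product
from scipy.spatial.distance import euclidean, cosine


def check_metric_axioms(d, points, tol=1e-12):
    """Empirically test the four metric axioms of d on a finite set of points."""
    pts = [np.asarray(x, dtype=float) for x in points]
    results = {"non_negativity": True, "identity": True, "symmetry": True, "triangle": True}
    for x, y in product(pts, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:
            results["non_negativity"] = False
        # d(x, y) == 0 must hold exactly when x == y
        if (abs(dxy) <= tol) != np.array_equal(x, y):
            results["identity"] = False
        if abs(dxy - d(y, x)) > tol:
            results["symmetry"] = False
    for x, y, z in product(pts, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:
            results["triangle"] = False
    return results


points = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
print(check_metric_axioms(euclidean, points))  # all True
print(check_metric_axioms(cosine, points))     # identity and triangle are False
```

The small tolerance `tol` absorbs floating-point rounding, for example a cosine distance of about 2e-16 between a vector and itself.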

```python
from scipy.spatial.distance import euclidean, cityblock, cosine, jensenshannon
from scipy.stats import entropy

# Example data: two discrete probability distributions over three outcomes
p = [0.1, 0.2, 0.7]
q = [0.2, 0.3, 0.5]

# Distance functions.
# Euclidean and Manhattan (cityblock) distances are true metrics.
# Cosine distance is NOT a metric: it is zero for any two parallel vectors
# and can violate the triangle inequality.
euclidean_distance = euclidean(p, q)
manhattan_distance = cityblock(p, q)
cosine_distance = cosine(p, q)

# Divergence-based measures (natural log by default)
kl_divergence = entropy(p, q)      # KL(p || q): asymmetric, not a metric
js_distance = jensenshannon(p, q)  # square root of the JS divergence: a metric

print(f"Euclidean Distance: {euclidean_distance}")
print(f"Manhattan Distance: {manhattan_distance}")
print(f"Cosine Distance: {cosine_distance}")
print(f"KL Divergence: {kl_divergence}")
print(f"JS Distance: {js_distance}")
```

```
Euclidean Distance: 0.24494897427831774
Manhattan Distance: 0.3999999999999999
Cosine Distance: 0.050751810770051864
KL Divergence: 0.08512282595722162
JS Distance: 0.14799046918127484
```
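
`scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon *distance*, the square root of the Jensen-Shannon divergence $\mathrm{JSD}(p \| q) = \tfrac{1}{2} D_{KL}(p \| m) + \tfrac{1}{2} D_{KL}(q \| m)$ with $m = \tfrac{1}{2}(p + q)$. A small check with the same `p` and `q` as above, recomputing the divergence from its definition in nats:

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

p = np.array([0.1, 0.2, 0.7])
q = np.array([0.2, 0.3, 0.5])
m = 0.5 * (p + q)  # mixture distribution

# Jensen-Shannon divergence from its definition (natural log, matching the defaults above)
js_div = 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

# scipy's jensenshannon returns the square root of this quantity (the JS distance)
js_dist = jensenshannon(p, q)

print(js_div)
print(np.isclose(js_dist ** 2, js_div))  # True
```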


### Example: Iris Data

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance
import numpy as np

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit a Gaussian Naive Bayes model and get its predicted class probabilities on the test set
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_probs = nb_model.predict_proba(X_test)

# Fit a Logistic Regression model and get its predicted class probabilities
lr_model = LogisticRegression(max_iter=200)
lr_model.fit(X_train, y_train)
lr_probs = lr_model.predict_proba(X_test)

# Mean KL(NB || LR) over the test samples.
# Assumes no predicted probability is exactly zero: a zero NB probability would give nan here,
# whereas scipy.special.rel_entr treats such terms as 0.
kl_divergence = np.sum(nb_probs * np.log(nb_probs / lr_probs), axis=1).mean()

# Mean Jensen-Shannon distance, computed row by row (axis=1)
js_distance = jensenshannon(nb_probs, lr_probs, axis=1).mean()

# Mean Wasserstein distance. Note: passing each row as u_values/v_values treats the
# three probabilities as sample points on the real line, so this compares the spread
# of the probability values rather than the class distributions themselves.
wasserstein_dist = np.mean([wasserstein_distance(nb_probs[i], lr_probs[i]) for i in range(len(nb_probs))])

print(f"KL Divergence: {kl_divergence}")
print(f"JS Distance: {js_distance}")
print(f"Wasserstein Distance: {wasserstein_dist}")
```

```
KL Divergence: 0.07279188207395282
JS Distance: 0.13794915308916347
Wasserstein Distance: 0.052222296152880265
```
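
In the Iris cell above, each row of probabilities is passed to `scipy.stats.wasserstein_distance` as sample values. To compare two categorical distributions instead, the probabilities can be passed as weights (`u_weights`, `v_weights`) on a common set of support points. A minimal sketch on the toy vectors from the first example; placing the classes at positions 0, 1, 2 is an assumption, and because the Iris species have no natural order, the resulting ground distance |i - j| between classes is arbitrary.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Illustrative predicted class probabilities for one sample from two models
p = np.array([0.1, 0.2, 0.7])
q = np.array([0.2, 0.3, 0.5])
classes = np.arange(len(p))  # assumed positions 0, 1, 2 for the classes

# Probabilities as weights on the class positions: compares the two categorical distributions
w_weighted = wasserstein_distance(classes, classes, u_weights=p, v_weights=q)

# Probabilities as sample values (the pattern used in the Iris cell): compares the
# empirical distribution of the three numbers themselves
w_values = wasserstein_distance(p, q)

print(w_weighted)  # about 0.3
print(w_values)    # about 0.1333
```

For the weighted version on a line, the distance is the sum of absolute differences between the two cumulative distributions, here $|0.1 - 0.2| + |0.3 - 0.5| = 0.3$.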
