# Jensen-Shannon Distance
The Jensen-Shannon distance measures how different two probability distributions are. It is the square root of the Jensen-Shannon divergence, a symmetric quantity built from the Kullback-Leibler (KL) divergence: each distribution is compared against the average of the two, and the two KL divergences are averaged with equal weights. It is often used in natural language processing applications to measure the similarity between documents. The Jensen-Shannon divergence is

$$ JS(P,Q) = \frac{1}{2} KL(P,M) + \frac{1}{2} KL(Q,M) $$

where $ M = \frac{P + Q}{2} $ is the elementwise average of $ P $ and $ Q $. The Jensen-Shannon distance is $ \sqrt{JS(P,Q)} $.

The key to computing the Jensen-Shannon distance is therefore knowing how to compute $ KL $. For discrete distributions over $ N $ outcomes:

$$ KL(P,Q) = \sum_{n=1}^N \left(p_n \times \log \left(\frac{p_n}{q_n}\right) \right) $$
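
As a rough illustration of the sum above, the following sketch evaluates each term separately before adding them up. The helper name `kl_terms` is hypothetical and introduced only for this example. It assumes every probability is strictly positive and uses the natural logarithm, so the result is in nats.

```python
import numpy as np

def kl_terms(p, q):
    """Per-outcome terms p_n * log(p_n / q_n); assumes all entries are > 0."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return p * np.log(p / q)

# Example distributions (illustrative values)
p = np.array([0.36, 0.48, 0.16])
q = np.array([0.30, 0.50, 0.20])

terms = kl_terms(p, q)   # one term per outcome
kl_pq = terms.sum()      # KL(P,Q) is the sum of the terms

print("terms   =", terms)
print("KL(P,Q) =", kl_pq)
```

Individual terms can be negative (wherever $ p_n < q_n $), but their sum is never negative, and it is zero when the two distributions are identical.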

Both the Jensen-Shannon divergence and the Jensen-Shannon distance are symmetric, meaning $ JS(P,Q) = JS(Q,P) $. This is in contrast to the Kullback-Leibler divergence, which is not symmetric: in general $ KL(P,Q) \neq KL(Q,P) $.
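
The symmetry claim can be checked numerically. The sketch below is a minimal check, not a definitive implementation: the function names `kl` and `js_distance` and the two-outcome distributions are assumptions chosen so that the asymmetry of KL is easy to see.

```python
import numpy as np

def kl(p, q):
    """KL(P,Q) in nats; assumes all entries are strictly positive."""
    return float(np.sum(p * np.log(p / q)))

def js_distance(p, q):
    """Square root of the equally weighted average of KL(P,M) and KL(Q,M)."""
    m = 0.5 * (p + q)
    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))

# Illustrative distributions, deliberately far apart
p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

print("KL(P,Q) =", kl(p, q))
print("KL(Q,P) =", kl(q, p))           # differs from KL(P,Q)
print("JS(P,Q) =", js_distance(p, q))
print("JS(Q,P) =", js_distance(q, p))  # same as JS(P,Q)
```

Swapping the arguments changes the KL value but leaves the Jensen-Shannon value untouched, because $ M $ and the sum of the two KL terms do not depend on the order of $ P $ and $ Q $.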


In [33]:
import numpy as np

In [34]:
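def KL(p, q):
    # Kullback-Leibler divergence KL(P,Q), in nats (natural log)
    # p and q are NumPy arrays holding probability distributions
    # assumes every entry of p and q is strictly positive
    return p.dot(np.log(p / q))

def JS(p, q):
    # Jensen-Shannon distance: square root of the JS divergence
    m = 0.5 * (p + q)  # elementwise average of P and Q
    left = KL(p, m)
    right = KL(q, m)
    return np.sqrt((left + right) / 2)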
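
The `KL` function above evaluates `np.log(p/q)` directly, so a zero entry in `p` produces `nan` (from `0 * log(0)`). One possible workaround, sketched here under the assumption that SciPy is available, is `scipy.special.rel_entr`, which treats `0 * log(0 / q)` as 0. The names `kl_safe` and `js_safe` are hypothetical and used only for this example.

```python
import numpy as np
from scipy.special import rel_entr

def kl_safe(p, q):
    """KL(P,Q) in nats; rel_entr returns 0 where p_n == 0."""
    return float(np.sum(rel_entr(p, q)))

def js_safe(p, q):
    """Jensen-Shannon distance that tolerates zero probabilities."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    m = 0.5 * (p + q)  # m_n > 0 wherever p_n > 0 or q_n > 0
    return float(np.sqrt(0.5 * kl_safe(p, m) + 0.5 * kl_safe(q, m)))

# Illustrative distributions containing zeros
p0 = np.array([0.5, 0.5, 0.0])
q0 = np.array([0.0, 0.5, 0.5])

print("JS distance with zeros =", js_safe(p0, q0))
```

Because $ M $ is positive wherever either distribution is positive, the two KL terms inside the Jensen-Shannon computation stay finite even when $ KL(P,Q) $ itself would be infinite.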


In [35]:
# The sample probabilities
p = np.array([0.36, 0.48, 0.16], dtype=np.float32)
q = np.array([0.30, 0.50, 0.20], dtype=np.float32)

In [36]:
js_pq = JS(p, q)
js_qp = JS(q, p)

In [37]:
print("Jensen Shannon distance from formula")
print(f"JS(P,Q) dist = {np.around(js_pq, 5)}")
print(f"JS(Q,P) dist = {np.around(js_qp, 5)}")

Jensen Shannon distance from formula
JS(P,Q) dist = 0.0508
JS(Q,P) dist = 0.0508


---------------

In [38]:
from scipy.spatial import distance

In [39]:
sk_pq = distance.jensenshannon(p, q)
sk_qp = distance.jensenshannon(q, p)

In [40]:
print("Jensen Shannon distance from scipy")
print(f"JS(P,Q) dist = {np.around(sk_pq, 5)}")
print(f"JS(Q,P) dist = {np.around(sk_qp, 5)}")

Jensen Shannon distance from scipy
JS(P,Q) dist = 0.0508
JS(Q,P) dist = 0.0508
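
`distance.jensenshannon` uses the natural logarithm by default and also accepts a `base` argument. The sketch below, using illustrative values, shows how the base changes the scale of the result and how to recover the divergence by squaring the distance. With `base=2` the distance lies between 0 and 1.

```python
import numpy as np
from scipy.spatial import distance

p = np.array([0.36, 0.48, 0.16])
q = np.array([0.30, 0.50, 0.20])

dist_nats = distance.jensenshannon(p, q)          # natural log (default)
dist_bits = distance.jensenshannon(p, q, base=2)  # log base 2
div_nats = dist_nats ** 2                         # JS divergence in nats

# Two distributions with no overlap reach the maximum distance of 1 in base 2
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
max_dist = distance.jensenshannon(a, b, base=2)

print("distance (nats) =", dist_nats)
print("distance (bits) =", dist_bits)
print("divergence      =", div_nats)
print("max distance    =", max_dist)
```

Changing the base only rescales the result: the base-2 distance equals the natural-log distance divided by $ \sqrt{\ln 2} $.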
