# Metrics 

We'd like to compare asymptotic path complexity to Cyclomatic Complexity and NPATH. We can view each of these three metrics as taking a set $S$ of $N$ functions and partitioning it into subsets of functions that the metric does not distinguish.

Then, a metric $A$ can be considered 'better' than a metric $B$ if $A$ divides $S$ into more subsets than $B$ does. However, we would also like each subset to contain roughly the same number of elements. For example, suppose metric $A$ divides the set $S=\{1, 2, 3, 4\}$ into $\{\{1\}, \{2, 3, 4\}\}$, whereas metric $B$ divides $S$ into $\{\{1, 2\}, \{3, 4\}\}$. Although both metrics divide $S$ into the same number of subsets, we consider metric $B$ better, as the elements of $S$ are more uniformly distributed across its subsets.
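To make the partition view concrete, the sketch below groups functions by metric value and reports each subset's share of $S$. It assumes that two functions land in the same subset exactly when the metric assigns them the same value; the helper name `subset_shares` and the numeric metric values are hypothetical.

```python
from collections import Counter

def subset_shares(metric_values):
    """Return each subset's share of S, largest subset first.

    Assumption: functions with equal metric values share a subset.
    """
    sizes = sorted(Counter(metric_values).values(), reverse=True)
    total = sum(sizes)
    return [size / total for size in sizes]

# Hypothetical metric values for the four functions in S = {1, 2, 3, 4}.
metric_a = [10, 20, 20, 20]  # induces {{1}, {2, 3, 4}}
metric_b = [10, 10, 20, 20]  # induces {{1, 2}, {3, 4}}

print(subset_shares(metric_a))  # [0.75, 0.25]
print(subset_shares(metric_b))  # [0.5, 0.5]
```

Each list of shares is a discrete probability distribution over the subsets of the partition.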

Hence, we'd like to be able to measure how close a distribution is to the uniform distribution. We can do this with an information-theoretic approach or with distances between distributions. Several such measures are listed below.

Note: Another possibility not discussed here is the Kolmogorov-Smirnov Test.

## Shannon Entropy 

Shannon entropy is defined as 

$$H(X) = -\sum_{i=0}^{N-1} p_i \log_2 p_i$$
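A minimal sketch of this formula follows. It assumes the distribution is given as a plain list of probabilities and uses the usual convention that terms with $p_i = 0$ contribute nothing; the function name is illustrative.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    # Skip zero-probability outcomes: 0 * log2(0) is taken to be 0.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(shannon_entropy([0.5, 0.5]))    # 1.0
print(shannon_entropy([0.25, 0.75]))  # about 0.811
```

For a fixed number of outcomes, entropy is largest for the uniform distribution, so a higher value means a more even spread.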

## Conditional Entropy

The conditional entropy of $Y$ given $X$ is

$$H(Y|X) = -\sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)}$$
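The following sketch evaluates this sum directly. It assumes the joint distribution is supplied as a dictionary mapping `(x, y)` pairs to probabilities, and it uses base-2 logarithms to match the Shannon entropy definition (the formula itself leaves the base open); the function name is illustrative.

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(Y|X) in bits from a dict {(x, y): p(x, y)}."""
    # Marginal p(x), obtained by summing the joint over y.
    p_x = defaultdict(float)
    for (x, _y), pxy in joint.items():
        p_x[x] += pxy
    return -sum(
        pxy * math.log2(pxy / p_x[x])
        for (x, _y), pxy in joint.items()
        if pxy > 0
    )

# Y is fully determined by X, so no uncertainty remains once X is known.
determined = {(0, 0): 0.5, (1, 1): 0.5}
# X and Y are independent fair bits, so knowing X tells us nothing about Y.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```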

## Measuring Distance Between Distributions

1. Bhattacharyya Distance 

For probability distributions $p$ and $q$ over $X$, 

$$ D_b(p, q) = -\ln(BC(p, q)) $$

where 

$$BC(p, q) = \sum_{x\in X} \sqrt{p(x)q(x)}$$
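As a sketch, the two formulas translate into a few lines. The code assumes `p` and `q` are lists of probabilities aligned by index over the same outcomes; returning infinity when the distributions share no support is a choice made here, not something the definition above states.

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between aligned discrete distributions."""
    # Bhattacharyya coefficient BC(p, q).
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if bc == 0:
        # No overlapping support: -ln(0) diverges.
        return math.inf
    return -math.log(bc)

uniform = [0.5, 0.5]
skewed = [0.25, 0.75]
d = bhattacharyya_distance(skewed, uniform)
```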

2. Mutual Information 

Let $(X, Y)$ be a pair of random variables with joint distribution $P_{(X,Y)}$ and marginal distributions $P_X$ and $P_Y$. Further, let $D_{KL}$ be the Kullback-Leibler divergence. Then the mutual information is

$$ I(X; Y) = D_{KL}(P_{(X,Y)}||P_X \otimes P_Y) $$
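For discrete variables, this divergence is a sum over the joint distribution of $p(x, y) \log \frac{p(x, y)}{p(x)p(y)}$. The sketch below computes it under the assumption that the joint distribution is a dictionary mapping `(x, y)` pairs to probabilities and that logarithms are base 2; the function name is illustrative.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(X; Y) in bits from a dict {(x, y): p(x, y)}."""
    # Marginals P_X and P_Y, obtained by summing out the other variable.
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), pxy in joint.items():
        p_x[x] += pxy
        p_y[y] += pxy
    # KL divergence between the joint and the product of the marginals.
    return sum(
        pxy * math.log2(pxy / (p_x[x] * p_y[y]))
        for (x, y), pxy in joint.items()
        if pxy > 0
    )

independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
identical = {(0, 0): 0.5, (1, 1): 0.5}
```

Independent variables give zero mutual information, while two identical fair bits share one full bit.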

3. Kullback-Leibler Divergence

The Kullback-Leibler divergence between distributions $P$ and $Q$ is defined as

$$ D_{KL}(P || Q) = -\sum_{x} P(x) \log \frac{Q(x)}{P(x)} $$
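A minimal sketch of the divergence, assuming `p` and `q` are lists of probabilities aligned by index and using base-2 logarithms (the definition above does not fix the base). Terms with $P(x) = 0$ are skipped, and the result is infinite if $Q(x) = 0$ where $P(x) > 0$.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits for aligned discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log2(pi / qi)
    return total

skewed = [0.25, 0.75]
uniform = [0.5, 0.5]
print(kl_divergence(skewed, uniform))  # about 0.189
```

When $Q$ is uniform over $N$ outcomes, the divergence equals $\log_2 N - H(P)$, so it measures directly how far $P$ is from uniform.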

4. Entropy Ratio 

One possible way to compare two distributions is simply to take the ratio of their entropies. Suppose we have two distributions over the same $N$ points. The minus signs in the two entropies cancel, leaving

$$\frac{H_1(X)}{H_2(X)} = \frac{\sum_{i=0}^{N-1} p_{1, i} \log_2 p_{1, i}}{\sum_{i=0}^{N-1} p_{2, i} \log_2 p_{2, i}} $$
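The ratio can be sketched as below, assuming both distributions are lists of probabilities. The ratio is undefined when the second distribution has zero entropy, so this sketch raises an error in that case; that handling is a choice made here.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def entropy_ratio(p1, p2):
    """Ratio H_1 / H_2 of the entropies of two distributions."""
    h2 = entropy_bits(p2)
    if h2 == 0:
        raise ValueError("second distribution has zero entropy")
    return entropy_bits(p1) / h2

# Comparing a skewed two-way split against the uniform two-way split.
print(entropy_ratio([0.25, 0.75], [0.5, 0.5]))  # about 0.811
```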

5. Hellinger Distance 

For two discrete probability distributions $P = (p_1, \ldots, p_k)$ and $Q = (q_1, \ldots, q_k)$, the Hellinger distance is

$$ H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^k \left(\sqrt{p_i} - \sqrt{q_i} \right)^2}$$
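A short sketch of this distance, assuming `p` and `q` are lists of probabilities aligned by index; the function name is illustrative.

```python
import math

def hellinger_distance(p, q):
    """Hellinger distance between aligned discrete distributions."""
    squared = sum(
        (math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)
    )
    return math.sqrt(squared) / math.sqrt(2)

print(hellinger_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(hellinger_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

The distance is bounded between 0 (identical distributions) and 1 (disjoint supports), which makes values comparable across partitions of different sizes.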