Working on concavity of entropy and convexity of KL divergence
gnthibault committed Feb 22, 2024
1 parent 94328b8 commit 897b37a
Showing 2 changed files with 33 additions and 2 deletions.
31 changes: 31 additions & 0 deletions InformationTheoryOptimization.ipynb
@@ -177,6 +177,37 @@
" return p, -np.dot(p,SafeLog2(p))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Interesting property of entropy\n",
"### Concavity of entropy and convexity of KL-divergence\n",
"The entropy is concave in the space of probability mass function, more formally, this reads:\n",
"\\begin{align*}\n",
" H[\\lambda p_1 + (1-\\lambda p_2)] \\geq \\lambda H[p_1] + (1-\\lambda p_2) H[p_2]\n",
"\\end{align*}\n",
"where $p_1$ and $p_2$ are probability mass functions and $\\lambda \\in [0,1]$\n",
"\n",
"Proof: Let $X$ be a discrete random variable with possible outcomes $\\mathcal{X} := {x_i, i \\in 0,1,\\dots N-1}$ and let $u(x)$ be the probability mass function of a discrete uniform distribution on $X \\in \\mathcal{X}$. Then, the entropy of an arbitrary probability mass function $p(x)$ can be rewritten as\n",
"\n",
"\\begin{align*}\n",
" H(X) &= - \\sum_{i=0}^{N-1} p(x_i)log(p(x_i)) \\\\\n",
" &= - \\sum_{i=0}^{N-1} p(x_i)log\\left(\\frac{p(x_i)}{u(x_i)} u(x_i)\\right) \\\\\n",
" &= - \\sum_{i=0}^{N-1} p(x_i)log\\left(\\frac{p(x_i)}{u(x_i)}\\right) - \\sum_{i=0}^{N-1} p(x_i)log(u(x_i)) \\\\\n",
" &= -KL[p\\|u] - \\sum_{i=0}^{N-1} p(x_i)log(u(x_i)) \\\\\n",
" &= -KL[p\\|u] - log \\left(\\frac{1}{N} \\right) \\sum_{i=0}^{N-1} p(x_i) \\\\\n",
" &= log(N) - KL[p\\|u]\n",
" log(N) - H(X) &= KL[p\\|u]\n",
"\\end{align*}\n",
"\n",
"Where $KL[p\\|u]$ is the Kullback-Leibler divergence between $p$ and the discrete uniform distriution $u$ over $\\mathcal{X}$, a concept we will explain more in detail later on this page. \n",
"Note that the KL divergence is convex in the space of the pair of probability distributions $(p,q)$:\n",
"\\begin{align*}\n",
" KL[\\lambda p_1 + (1-\\lambda p_2) \\| \\lambda q_1 + (1-\\lambda q_2)] \\geq \\lambda KL[p_1\\|q_1] + (1-\\lambda p_2) KL[p_2\\|q_2]\n",
"\\end{align*}\n"
]
},
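{
"cell_type": "markdown",
"metadata": {},
"source": [
 "Below is a quick numerical sanity check of the statements above (a sketch added for illustration, not part of the original derivation): it samples a few probability mass functions with numpy and verifies the identity $\\log(N) - H(p) = KL[p\\|u]$, the concavity inequality for the entropy, and the joint convexity inequality for the KL divergence. The helpers `entropy` and `kl_div` are defined here only for this check and use natural logarithms.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
 "import numpy as np\n",
 "\n",
 "rng = np.random.default_rng(0)\n",
 "\n",
 "def entropy(p):\n",
 "    # Shannon entropy in nats, with the convention 0*log(0) = 0\n",
 "    p = p[p > 0]\n",
 "    return -np.sum(p * np.log(p))\n",
 "\n",
 "def kl_div(p, q):\n",
 "    # KL divergence KL[p||q] in nats; assumes q > 0 wherever p > 0\n",
 "    mask = p > 0\n",
 "    return np.sum(p[mask] * np.log(p[mask] / q[mask]))\n",
 "\n",
 "N, lam = 5, 0.3\n",
 "u = np.full(N, 1.0 / N)  # discrete uniform distribution on N outcomes\n",
 "p1, p2 = rng.dirichlet(np.ones(N)), rng.dirichlet(np.ones(N))\n",
 "q1, q2 = rng.dirichlet(np.ones(N)), rng.dirichlet(np.ones(N))\n",
 "\n",
 "# Identity derived above: log(N) - H(p) = KL[p||u]\n",
 "print(np.log(N) - entropy(p1), kl_div(p1, u))\n",
 "\n",
 "# Concavity of the entropy\n",
 "mix_H = entropy(lam * p1 + (1 - lam) * p2)\n",
 "print(mix_H >= lam * entropy(p1) + (1 - lam) * entropy(p2))\n",
 "\n",
 "# Joint convexity of the KL divergence\n",
 "lhs = kl_div(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)\n",
 "rhs = lam * kl_div(p1, q1) + (1 - lam) * kl_div(p2, q2)\n",
 "print(lhs <= rhs)"
]
},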
{
"cell_type": "markdown",
"metadata": {},
Expand Down
4 changes: 2 additions & 2 deletions OptimalTransportWasserteinDistance.ipynb
@@ -559,7 +559,7 @@
"metadata": {},
"source": [
"### OT and statistical concepts\n",
"Some of the basics to understand the following statements can be found in the notebook \"InformationTheoryOptimization\"\n",
"Some of the basics to understand the following statements can be found in the notebook \"InformationTheoryOptimization\" this part is also partly a direct reproduction of Marco Cuturi famous article \"Sinkhorn Distances: Lightspeed Computation of Optimal Transport\"\n",
"\n",
"I would like to stop and mention that as we now interpret $P$ as a joint probability matrix, we can define its entropy, the marginal probabiilty entropy, and KL-divergence between two different transportation matrix. These takes the form of\n",
"\n",
@@ -585,7 +585,7 @@
 KL(P\|rc^T) = h(r) + h(c) - h(P)
"\\end{align*}\n",
"\n",
"This quantity is also the mutual information $I(X\\|Y)$ of two random variables $(X, Y)$ should they follow the joint probability $P$ (Cover and Thomas, 1991, §2). Hence, the set of tables P whose Kullback-Leibler divergence to rcT is constrained to lie below a certain threshold can be interpreted as the set of joint probabilities P in U (r, c) which have sufficient entropy with respect to h(r) and h(c), or small enough mutual information. For reasons that will become clear in Section 4, we call the quantity below the Sinkhorn distance of r and c:"
"This quantity is also the mutual information $I(X\\|Y)$ of two random variables $(X, Y)$ should they follow the joint probability $P$ . Hence, the set of tables P whose Kullback-Leibler divergence to rcT is constrained to lie below a certain threshold can be interpreted as the set of joint probabilities P in U (r, c) which have sufficient entropy with respect to h(r) and h(c), or small enough mutual information. For reasons that will become clear in Section 4, we call the quantity below the Sinkhorn distance of r and c:"
]
},
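{
"cell_type": "markdown",
"metadata": {},
"source": [
 "As a quick illustration (a small sketch added here, not taken from Cuturi's paper), the cell below draws an arbitrary joint probability matrix P, computes its marginals r and c, and checks numerically that $KL(P\|rc^T) = h(r) + h(c) - h(P)$, i.e. that this quantity is the mutual information of the coupling. The helper h is defined only for this check and uses natural logarithms.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
 "import numpy as np\n",
 "\n",
 "rng = np.random.default_rng(0)\n",
 "\n",
 "# An arbitrary joint probability matrix P (non-negative entries summing to 1)\n",
 "P = rng.random((4, 6))\n",
 "P /= P.sum()\n",
 "r, c = P.sum(axis=1), P.sum(axis=0)  # its marginals\n",
 "\n",
 "def h(p):\n",
 "    # Shannon entropy (in nats) of a probability vector or matrix\n",
 "    p = np.ravel(p)\n",
 "    p = p[p > 0]\n",
 "    return -np.sum(p * np.log(p))\n",
 "\n",
 "# KL(P || r c^T); all entries of P are strictly positive here\n",
 "kl = np.sum(P * np.log(P / np.outer(r, c)))\n",
 "\n",
 "print(kl, h(r) + h(c) - h(P))  # the two values coincide"
]
},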
},
{
Expand Down
