# Table of contents

1. [Probability rules](#Rules)  
    1.1 [Ensemble](#ensamble)   
    1.2 [Joint probability](#joint)  
    1.3 [Marginal probability](#marginal)  
    1.4 [Conditional probability](#conditional)  
    1.5 [Product (chain) rule](#product)  
    1.6 [Sum rule](#sum)  
    1.7 [Bayes' rule](#bayes)  
    1.8 [Independence](#ind)  
2. [Entropy](#entropy)  
    2.1 [Shannon information content](#shanonic)    
    2.2 [Entropy of an ensemble](#ensambleentropy)  
    2.3 [Joint entropy](#jointentropy)    
    2.4 [Relative entropy / KL divergence](#relativeentropy)  
    


```python
import numpy as np
import matplotlib.pyplot as plt
```



<a id="Rules"> </a>
# Probability rules

<a id="ensamble"> </a>
## Ensemble

$X$ - ensemble  
$x$ - outcome (the value of a random variable)  
$A_{x}$ - set of possible values  
$P_{x}$ - probabilities of the possible values  


\begin{equation} 
\begin{split}
X = (x, A_{x},P_{x})
\end{split}
\end{equation}





<a id="joint"> </a>
## Joint probability


\begin{equation} \label{eq1}
\begin{split}
P(x,y) = P(y,x)
\end{split}
\end{equation}

<a id="marginal"> </a>
## Marginal probability


\begin{equation}
\begin{split}
P(x) = \sum_{y}P(x,y)
\end{split}
\end{equation}
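
As a small numerical illustration of marginalisation, the sketch below stores a joint distribution as a 2-D array and sums out one variable. The table values and the names `joint`, `p_x`, and `p_y` are hypothetical, chosen only for this example.

```python
import numpy as np

# Hypothetical joint distribution P(x, y): rows index x, columns index y.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

# Marginalise by summing over the variable we want to remove.
p_x = joint.sum(axis=1)  # P(x) = sum_y P(x, y)
p_y = joint.sum(axis=0)  # P(y) = sum_x P(x, y)

print(p_x)
print(p_y)
```

Both marginals sum to 1 because the joint table does.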


<a id="conditional"> </a>
## Conditional probability

\begin{equation}
\begin{split}
P(x=a_{i}|y=b_{j}) = \frac {P(x=a_{i},y=b_{j})} {P(y = b_{j})}
\end{split}
\end{equation}
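
A minimal sketch of the same definition on a table, assuming a hypothetical 2x2 joint distribution: dividing each column of the joint by the corresponding marginal $P(y = b_{j})$ gives $P(x|y = b_{j})$.

```python
import numpy as np

# Hypothetical joint distribution P(x, y): rows index x, columns index y.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_y = joint.sum(axis=0)  # marginal P(y)

# P(x | y) = P(x, y) / P(y); broadcasting divides column j by P(y = b_j).
p_x_given_y = joint / p_y

print(p_x_given_y)
```

Each column of `p_x_given_y` is a distribution over $x$, so every column sums to 1.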


<a id="product"> </a>
## Product (chain) rule

Joint probability = conditional × marginal  

Obtained by rearranging the definition of conditional probability (multiply both sides by $P(y)$).
  
$\mathcal{H}$ - the assumptions on which the probabilities are based  

\begin{equation} \label{eq2.6}
\begin{split}
P(x,y|\mathcal{H}) & = P(x|y,\mathcal{H})P(y|\mathcal{H}) \\
 & = P(y|x,\mathcal{H})P(x|\mathcal{H})
\end{split}
\end{equation}
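
The two factorisations can be checked numerically. The sketch below uses a hypothetical joint table (the assumptions $\mathcal{H}$ are left implicit) and rebuilds it from each conditional and its marginal.

```python
import numpy as np

# Hypothetical joint distribution P(x, y): rows index x, columns index y.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

p_x_given_y = joint / p_y            # column j is P(x | y = b_j)
p_y_given_x = joint / p_x[:, None]   # row i is P(y | x = a_i)

# Rebuild the joint both ways: P(x|y) P(y) and P(y|x) P(x).
from_y = p_x_given_y * p_y
from_x = p_y_given_x * p_x[:, None]

print(np.allclose(from_y, joint), np.allclose(from_x, joint))
```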




<a id="sum"> </a>
## Sum rule

The marginal probability rewritten using the product (chain) rule:


\begin{equation}
\begin{split}
P(x|\mathcal{H}) & = \sum_{y}P(x,y|\mathcal{H}) \\
 & = \sum_{y}P(x|y,\mathcal{H})P(y|\mathcal{H})
\end{split}
\end{equation}
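
When the conditional $P(x|y)$ is stored as a matrix whose columns are indexed by $y$, the sum rule is a matrix-vector product. The numbers below are hypothetical and serve only as a sketch.

```python
import numpy as np

# Hypothetical P(y) and P(x | y); column j of the matrix is P(x | y = b_j).
p_y = np.array([0.5, 0.5])
p_x_given_y = np.array([[0.9, 0.2],
                        [0.1, 0.8]])

# P(x) = sum_y P(x | y) P(y), written as a matrix-vector product.
p_x = p_x_given_y @ p_y

print(p_x)
```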


<a id="bayes"> </a>
## Bayes' rule

Follows from the product (chain) rule and the symmetry of the joint probability:    


\begin{equation}
\begin{split}
P(x,y|\mathcal{H}) & = P(y,x|\mathcal{H}) \\[10pt]
P(x,y|\mathcal{H}) & = P(x|y,\mathcal{H})P(y|\mathcal{H}) \\[10pt]
P(y,x|\mathcal{H}) & = P(y|x,\mathcal{H})P(x|\mathcal{H}) \\[10pt]
P(x|y,\mathcal{H})P(y|\mathcal{H}) & = P(y|x,\mathcal{H})P(x|\mathcal{H}) \\[10pt]
P(x|y,\mathcal{H}) & = \frac{P(y|x,\mathcal{H})P(x|\mathcal{H})}{P(y|\mathcal{H})}
\end{split}
\end{equation}

More generally, for parameters $\theta$ and data $D$:


\begin{equation}
\begin{split}
P(\theta|D,\mathcal{H}) & = \frac{P(D|\theta,\mathcal{H})P(\theta|\mathcal{H})}{P(D|\mathcal{H})}
\end{split}
\end{equation}

The terms are named:

\begin{equation}
\begin{split}
\text{posterior} & = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}
\end{split}
\end{equation}
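
A worked sketch of Bayes' rule with made-up numbers: $\theta$ takes two values (say "condition present" and "condition absent"), and $D$ is a single positive test result. The prior and likelihood values are hypothetical. The evidence is computed with the sum rule.

```python
import numpy as np

# Hypothetical prior P(theta): [condition present, condition absent].
prior = np.array([0.01, 0.99])

# Hypothetical likelihood P(D = positive | theta) for each value of theta.
likelihood = np.array([0.95, 0.05])

# Evidence P(D) = sum_theta P(D | theta) P(theta)  (sum rule).
evidence = np.sum(likelihood * prior)

# Posterior P(theta | D) = likelihood * prior / evidence.
posterior = likelihood * prior / evidence

print(posterior)
```

With these numbers the posterior probability of "condition present" is about 0.16, even though the test is positive, because the prior is small.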



<a id="ind"> </a>
## Independence

Two random variables are independent iff: 


\begin{equation}
\begin{split}
P(x,y) & = P(x)P(y)
\end{split}
\end{equation}
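
For a finite table, this condition can be tested by comparing the joint with the outer product of its marginals. The helper `is_independent` and both tables are hypothetical, written for this sketch only.

```python
import numpy as np

def is_independent(joint, tol=1e-12):
    """Return True if P(x, y) equals P(x) P(y) in every cell of the table."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return bool(np.allclose(joint, np.outer(p_x, p_y), atol=tol))

# A table that does not factorise, and one built as a product of marginals.
dependent = np.array([[0.10, 0.20],
                      [0.30, 0.40]])
independent = np.outer([0.3, 0.7], [0.4, 0.6])

print(is_independent(dependent))    # False
print(is_independent(independent))  # True
```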


<a id="entropy"> </a>
# Entropy

<a id="shanonic"> </a>
## Shannon information content

\begin{equation}
\begin{split}
h(x) & = \log_{2} \frac{1}{P(x)}
\end{split}
\end{equation}
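
A quick numerical sketch: rarer outcomes carry more information. The probabilities below are hypothetical, and the result is in bits because the logarithm is base 2.

```python
import numpy as np

# Hypothetical outcome probabilities.
p = np.array([0.5, 0.25, 0.125])

# Shannon information content h(x) = log2(1 / P(x)), in bits.
h = np.log2(1.0 / p)

print(h)  # 1, 2 and 3 bits
```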

<a id="ensambleentropy"> </a>
## Entropy of an ensemble

The entropy is the average Shannon information content of an outcome:

\begin{equation}
\begin{split}
H(X) & = \sum_{x} P(x)\log_{2}\frac{1}{P(x)}
\end{split}
\end{equation}
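
A minimal sketch of the entropy sum, assuming base-2 logarithms (bits) and the usual convention that outcomes with $P(x) = 0$ contribute nothing. The function name `entropy` is hypothetical.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution given as an array."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0  # skip zero-probability outcomes (0 * log(1/0) is taken as 0)
    return float(np.sum(p[nz] * np.log2(1.0 / p[nz])))

print(entropy([0.5, 0.5]))         # 1.0 (a fair coin)
print(entropy([0.5, 0.25, 0.25]))  # 1.5
print(entropy([1.0, 0.0]))         # 0.0 (a certain outcome)
```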

<a id="jointentropy"> </a>
## Joint entropy

\begin{equation}
\begin{split}
H(X,Y) & = \sum_{x,y} P(x,y)\log_{2}\frac{1}{P(x,y)}
\end{split}
\end{equation}
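
The joint entropy is the same sum taken over every cell of the joint table. The sketch below uses a hypothetical table and a hypothetical helper `entropy`, and compares the result with the entropies of the marginals. The joint entropy never exceeds their sum and equals it when the variables are independent.

```python
import numpy as np

def entropy(p):
    """Entropy in bits; accepts a 1-D distribution or a 2-D joint table."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(1.0 / p[nz])))

# Hypothetical joint distribution P(x, y).
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

h_xy = entropy(joint)             # H(X, Y)
h_x = entropy(joint.sum(axis=1))  # H(X)
h_y = entropy(joint.sum(axis=0))  # H(Y)

# An independent joint with the same marginals, for comparison.
indep = np.outer(joint.sum(axis=1), joint.sum(axis=0))
h_indep = entropy(indep)

print(h_xy, h_x + h_y, h_indep)
```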

<a id="relativeentropy"> </a>
## Relative entropy / KL divergence

\begin{equation}
\begin{split}
D_{KL}(P||Q) & = \sum_{x} P(x)\log_{2}\frac{P(x)}{Q(x)}
\end{split}
\end{equation}  

$D_{KL}$ satisfies Gibbs' inequality, with equality only when $P = Q$: 

\begin{equation}
\begin{split}
D_{KL}(P||Q) & \geq 0
\end{split}
\end{equation}    

$D_{KL}$ is not symmetric in general, so it is not strictly a distance:  

\begin{equation}
\begin{split}
D_{KL}(P||Q) & \neq D_{KL}(Q||P)
\end{split}
\end{equation}    
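
Both properties can be seen on a small example. The sketch below assumes base-2 logarithms and that $Q(x) > 0$ wherever $P(x) > 0$. The function name `kl_divergence` and the two distributions are hypothetical.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) in bits; assumes Q(x) > 0 wherever P(x) > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0  # terms with P(x) = 0 contribute nothing
    return float(np.sum(p[nz] * np.log2(p[nz] / q[nz])))

# Two hypothetical distributions over the same two outcomes.
p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)

print(d_pq, d_qp)           # both non-negative, and different from each other
print(kl_divergence(p, p))  # 0.0
```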