# Probability and Information Theory



## Probability Mass Function (PMF) 


> <img src="images/3.Probability/Dice.png" height="30px" width="60px" class="hint" style="text-align:left"/> The domain of X contains discrete values

* The domain of P must be the set of all possible states of x.
* ∀x ∈ x, 0 ≤ P(x) ≤ 1. An impossible event has probability 0, and no state can be less probable than that. Likewise, an event that is guaranteed to happen has probability 1, and no state can have a greater chance of occurring.
* $\sum_{x \in \mathrm{x}} P(x) = 1$. We refer to this property as being normalized.

![PMF](images/PMF.png)


Example: Uniform distribution over k states

$$ P(X = x_i) = \frac{1}{k} \tag{3.1} $$

![Uniform Discrete](images/3.Probability/UniformDiscrete.png)
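As a quick check of the PMF properties listed above, the sketch below builds the uniform distribution of equation 3.1 with NumPy. The choice `k = 6` (a fair die) is an assumption made only for illustration.

```python
import numpy as np

k = 6                             # assumed number of states (a fair six-sided die)
states = np.arange(1, k + 1)      # x_1, ..., x_k
pmf = np.full(k, 1.0 / k)         # P(X = x_i) = 1/k for every state

# Property checks from the list above
assert np.all((pmf >= 0) & (pmf <= 1))   # 0 <= P(x) <= 1 for every state
assert np.isclose(pmf.sum(), 1.0)        # probabilities sum to 1 (normalized)

for x, p in zip(states, pmf):
    print(f"P(X = {x}) = {p:.4f}")
```

Any other positive integer `k` works the same way, since each state always receives probability 1/k.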

## Probability Density Function (PDF)


 * The domain of p must be the set of all possible states of x.
 * ∀x ∈ x, p(x) ≥ 0. Note that we do not require p(x) ≤ 1.
 * $\int p(x)\,dx = 1$

Example: Normal Distribution

![Normal Distribution](images/3.Probability/PDF.png)
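The remark that p(x) ≤ 1 is not required can be seen numerically. The sketch below evaluates a normal density with a small standard deviation (`sigma = 0.1` is an assumed value, chosen only so the peak rises above 1) and approximates the integral with a simple Riemann sum. The helper `normal_pdf` is written here for illustration and is not taken from any library.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

sigma = 0.1                            # assumed value; small sigma gives a tall, narrow peak
x = np.linspace(-1.0, 1.0, 20001)      # grid covering +/- 10 standard deviations
p = normal_pdf(x, sigma=sigma)

dx = x[1] - x[0]
area = np.sum(p) * dx                  # Riemann-sum approximation of the integral of p(x)
peak = p.max()

print(f"max p(x)  = {peak:.3f}")       # the density value itself can exceed 1
print(f"integral  = {area:.6f}")       # the area under the curve is still about 1
```

The peak is roughly 3.99, yet the area stays at 1: a density is a probability per unit of x, not a probability.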

## Joint, Marginal and Conditional Probability

P(X = x, Y = y) is the joint distribution of X and Y, while P(X = x | Y = y) is the conditional distribution of X given Y. Summing the joint distribution over all values of Y gives the marginal distribution of X:

$$
P(X=x) = \sum_y P(X=x,Y=y) = \sum_y P(X=x \mid Y=y)P(Y=y)
$$


$$
p(x) = \int p(x,y)dy
$$

The following image illustrates the three probability distributions with a two-way table. I highly recommend [this great video](https://www.youtube.com/watch?v=SrEmzdOT65s) from zedstatistics.

![Probability Distribution](Images/3.Probability/ProbabilityDistribution.png)



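```python
import pandas as pd
import numpy as np

# Load the raw counts; the first CSV column (show names) becomes the index
tv = pd.read_csv("TVShowProbability.csv", index_col=0)
tv
```

| | Male | Female |
|---|---:|---:|
| Game of Thrones | 80 | 120 |
| West World | 100 | 25 |
| Other | 50 | 125 |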


```python
# Create marginal totals (column and row)
tv["Total"] = tv.sum(axis=1)
# DataFrame.append was removed in pandas 2.0; add the totals row with .loc instead
tv.loc["RowTotal"] = tv.sum(axis=0)
tv
```

| | Male | Female | Total |
|---|---:|---:|---:|
| Game of Thrones | 80 | 120 | 200 |
| West World | 100 | 25 | 125 |
| Other | 50 | 125 | 175 |
| RowTotal | 230 | 270 | 500 |


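```python
# Convert to a two-way probability table by dividing by the grand total
# P(Gender) = sum of P(TVShow, Gender) over TVShow
# P(TVShow) = sum of P(TVShow, Gender) over Gender
# (.ix was removed in pandas 1.0; .loc is the label-based replacement)
tv_prob = tv / tv.loc["RowTotal", "Total"]
tv_prob
```

| | Male | Female | Total |
|---|---:|---:|---:|
| Game of Thrones | 0.16 | 0.24 | 0.4 |
| West World | 0.2 | 0.05 | 0.25 |
| Other | 0.1 | 0.25 | 0.35 |
| RowTotal | 0.46 | 0.54 | 1.0 |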


```python
# Add a conditional column given Gender = "Female"
# P(TVShow | Gender) = P(TVShow, Gender) / P(Gender)
tv_prob["Condition | Female"] = tv_prob["Female"] / tv_prob.loc["RowTotal", "Female"]
tv_prob
```

| | Male | Female | Total | Condition \| Female |
|---|---:|---:|---:|---:|
| Game of Thrones | 0.16 | 0.24 | 0.4 | 0.444444 |
| West World | 0.2 | 0.05 | 0.25 | 0.092593 |
| Other | 0.1 | 0.25 | 0.35 | 0.462963 |
| RowTotal | 0.46 | 0.54 | 1.0 | 1.0 |
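The same relationships can be checked for both genders at once. The sketch below is a self-contained NumPy version that reuses the counts from the two-way table above; the variable names are illustrative. It verifies that each conditional distribution sums to 1 and that summing P(TVShow | Gender) P(Gender) over Gender recovers the marginal P(TVShow).

```python
import numpy as np

# Counts copied from the two-way table above
# rows: Game of Thrones, West World, Other; columns: Male, Female
counts = np.array([[80, 120],
                   [100, 25],
                   [50, 125]], dtype=float)

joint = counts / counts.sum()      # P(TVShow, Gender)
p_show = joint.sum(axis=1)         # P(TVShow): sum over Gender
p_gender = joint.sum(axis=0)       # P(Gender): sum over TVShow

# P(TVShow | Gender): divide each column by the marginal of that gender
cond_show_given_gender = joint / p_gender

# Each conditional distribution (one per column) sums to 1
assert np.allclose(cond_show_given_gender.sum(axis=0), 1.0)

# sum over Gender of P(TVShow | Gender) * P(Gender) gives back P(TVShow)
recovered = cond_show_given_gender @ p_gender
assert np.allclose(recovered, p_show)

print(cond_show_given_gender.round(6))
```

The second column of the printed array matches the "Condition | Female" column computed with pandas.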


## Probability Calculation



### Chain Rule

> Think of joint probability as a logical AND operation

$$ P(x^{(1)},\ldots,x^{(n)}) = P(x^{(1)})\prod_{i=2}^{n} P(x^{(i)} \mid x^{(1)},\ldots,x^{(i-1)}) $$

$$ P(a,b,c) = P(a|b,c)P(b,c) = P(a|b,c)P(b|c)P(c) $$
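The three-variable factorization can be verified numerically. The sketch below draws an arbitrary joint table over (a, b, c) with assumed sizes 2, 3 and 4 and an arbitrary random seed, derives each factor by marginalizing and dividing, and confirms that their product rebuilds the joint distribution.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, only for reproducibility
joint = rng.random((2, 3, 4))       # unnormalized table indexed by (a, b, c)
joint /= joint.sum()                # normalize to get P(a, b, c)

p_c = joint.sum(axis=(0, 1))        # P(c),        shape (4,)
p_bc = joint.sum(axis=0)            # P(b, c),     shape (3, 4)
p_b_given_c = p_bc / p_c            # P(b | c),    shape (3, 4)
p_a_given_bc = joint / p_bc         # P(a | b, c), shape (2, 3, 4)

# Chain rule: P(a, b, c) = P(a | b, c) P(b | c) P(c)
rebuilt = p_a_given_bc * p_b_given_c * p_c
assert np.allclose(rebuilt, joint)
print("chain rule holds:", np.allclose(rebuilt, joint))
```

NumPy broadcasting aligns the trailing axes, so the `(3, 4)` and `(4,)` arrays are applied along the b and c axes without explicit reshaping.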



### Independence

$$ \forall x \in X, y \in Y,\; P(X=x,Y=y) = P(X=x)P(Y=y) $$
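One way to test this condition on a finite table is to compare the joint distribution with the outer product of its marginals. The sketch below applies that check to the TV show counts used earlier; `is_independent` is a hypothetical helper written for this example, and the comparison uses a floating-point tolerance rather than a statistical test.

```python
import numpy as np

def is_independent(joint):
    """Return True if a 2-D joint table equals the product of its marginals."""
    p_rows = joint.sum(axis=1)                 # marginal over the column variable
    p_cols = joint.sum(axis=0)                 # marginal over the row variable
    return bool(np.allclose(joint, np.outer(p_rows, p_cols)))

# Counts from the TV show table (rows: shows, columns: Male, Female)
counts = np.array([[80, 120],
                   [100, 25],
                   [50, 125]], dtype=float)
joint = counts / counts.sum()

# A table that is independent by construction: product of the same marginals
product = np.outer(joint.sum(axis=1), joint.sum(axis=0))

independent = is_independent(joint)
print("TV show and gender independent:", independent)
print("product-of-marginals table independent:", is_independent(product))
```

For the TV show data the check fails: for example, P(Game of Thrones, Female) = 0.24, while P(Game of Thrones) P(Female) = 0.4 × 0.54 = 0.216.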