###### This IPython notebook demonstrates the fundamentals of Machine Learning

##### Import the required Python packages

In [22]:
import pandas as pd
import numpy as np

# Joint and Marginal Distribution (Two Way Distribution)

To demonstrate the concepts of Joint and Marginal Distribution, I reference data from _John Kruschke (Doing Bayesian Data Analysis, 2015)_ <br>
Data Source: _Snee, Ron. (1974). Graphical display of two-way contingency tables. The American Statistician. 28. 9-12._

In [2]:
data= pd.DataFrame({'eye':['Blue','Brown','Green','Hazel','Blue','Brown','Green','Hazel','Blue','Brown','Green','Hazel','Blue','Brown','Green','Hazel'], 
                    'hair':['Black','Black','Black','Black','Blond','Blond','Blond','Blond','Brown','Brown','Brown','Brown','Red','Red','Red','Red'], 
                    'count':[20,68,5,15,94,7,16,10,84,119,29,54,17,26,14,14]})

In [3]:
##### Observe Data

In [4]:
data.head(n=3)

,eye,hair,count
0,Blue,Black,20
1,Brown,Black,68
2,Green,Black,5


In [5]:
#### Observing Data in Pivot Table Format

In [6]:
pd.pivot_table(data, values='count', index=['eye'],columns=['hair'])

eye,Black,Blond,Brown,Red
Blue,20,94,84,17
Brown,68,7,119,26
Green,5,16,29,14
Hazel,15,10,54,14


In [7]:
##### Observing Data with Totals appended as a Row and Column in the pivot table
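
A minimal sketch of one way to append the totals, using the `margins` option of `pd.pivot_table`. The frame is rebuilt here so the block runs on its own; the variable name `totals` and the label `Total` are illustrative choices, not taken from the original notebook.

```python
import pandas as pd

# Rebuild the hair/eye counts in the same order as the `data` frame above.
eyes = ['Blue', 'Brown', 'Green', 'Hazel']
hairs = ['Black', 'Blond', 'Brown', 'Red']
counts = [20, 68, 5, 15, 94, 7, 16, 10, 84, 119, 29, 54, 17, 26, 14, 14]
data = pd.DataFrame({'eye': eyes * 4,
                     'hair': [h for h in hairs for _ in eyes],
                     'count': counts})

# margins=True appends a row and a column holding the aggregated totals;
# margins_name sets the label used for both (hypothetical label 'Total').
totals = pd.pivot_table(data, values='count', index='eye', columns='hair',
                        aggfunc='sum', margins=True, margins_name='Total')
print(totals)
```

The bottom-right cell of `totals` holds the grand total of all counts, which is the denominator used for the proportions that follow.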

In [8]:
### Proportion of the total: each count divided by the sum of all counts

In [9]:
data['proportion']=np.round(data['count']/np.sum(data['count']), 3)
data.head(n=3)

,eye,hair,count,proportion
0,Blue,Black,20,0.034
1,Brown,Black,68,0.115
2,Green,Black,5,0.008


#### Joint Probability Distribution <br>
$p(h, e) = p(e, h)$ 

##### It is the probability of a particular combination of values of the given variables, in this case hair and eye color

In [10]:
pd.pivot_table(data, values='proportion', index=['eye'],columns=['hair'])

eye,Black,Blond,Brown,Red
Blue,0.034,0.159,0.142,0.029
Brown,0.115,0.012,0.201,0.044
Green,0.008,0.027,0.049,0.024
Hazel,0.025,0.017,0.091,0.024
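
As a quick check on the table above, the following sketch computes the joint distribution from the unrounded counts and confirms two properties: the cells sum to 1, and $p(h, e) = p(e, h)$ (transposing the table only swaps the roles of rows and columns). The frame is rebuilt so the block is self-contained; the name `joint` is an illustrative choice.

```python
import pandas as pd

# Rebuild the hair/eye counts in the same order as the `data` frame above.
eyes = ['Blue', 'Brown', 'Green', 'Hazel']
hairs = ['Black', 'Blond', 'Brown', 'Red']
counts = [20, 68, 5, 15, 94, 7, 16, 10, 84, 119, 29, 54, 17, 26, 14, 14]
data = pd.DataFrame({'eye': eyes * 4,
                     'hair': [h for h in hairs for _ in eyes],
                     'count': counts})

# Joint probability of each (eye, hair) cell, without intermediate rounding.
joint = data.pivot_table(values='count', index='eye', columns='hair',
                         aggfunc='sum') / data['count'].sum()

print(joint.round(3))
print(joint.to_numpy().sum())  # all joint probabilities together

# p(h, e) = p(e, h): the same cell read from the transposed table.
print(joint.loc['Blue', 'Black'] == joint.T.loc['Black', 'Blue'])
```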


#### Marginal Probability Distribution: computed by summing the joint probabilities over the other variable

$p(h) = \sum_{e} p(h, e)$: the marginal probability of a hair color is the sum of the joint probabilities across eye colors

###### Hair Color

In [11]:
pd.pivot_table(data=data, index=['hair'], values=['proportion'], aggfunc='sum').reset_index().rename(columns={'proportion':'total'})

,hair,total
0,Black,0.182
1,Blond,0.215
2,Brown,0.483
3,Red,0.121


#### Marginal Probability Distribution
$p(e) = \sum_{h} p(h, e)$: the marginal probability of an eye color is the sum of the joint probabilities across hair colors
##### Eye Color

In [12]:
pd.pivot_table(data=data, index=['eye'], values=['proportion'], aggfunc='sum').reset_index().rename(columns={'proportion':'total'})

,eye,total
0,Blue,0.364
1,Brown,0.372
2,Green,0.108
3,Hazel,0.157
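
The totals above are sums of proportions that were already rounded to three decimals, so each set of marginals adds up to 1.001 rather than exactly 1. A minimal sketch of an alternative, assuming it is acceptable to sum the raw counts first and divide once at the end (the names `p_hair` and `p_eye` are illustrative):

```python
import pandas as pd

# Rebuild the hair/eye counts in the same order as the `data` frame above.
eyes = ['Blue', 'Brown', 'Green', 'Hazel']
hairs = ['Black', 'Blond', 'Brown', 'Red']
counts = [20, 68, 5, 15, 94, 7, 16, 10, 84, 119, 29, 54, 17, 26, 14, 14]
data = pd.DataFrame({'eye': eyes * 4,
                     'hair': [h for h in hairs for _ in eyes],
                     'count': counts})

total = data['count'].sum()

# Marginal of hair: sum counts over eye colors, then normalize once.
p_hair = data.groupby('hair')['count'].sum() / total
# Marginal of eye: sum counts over hair colors, then normalize once.
p_eye = data.groupby('eye')['count'].sum() / total

print(p_hair.round(3))
print(p_eye.round(3))
print(p_hair.sum(), p_eye.sum())
```

Rounding only for display keeps each marginal distribution summing to 1 up to floating-point error.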


### Putting everything together

In [13]:
df=pd.pivot_table(data, values='count', index=['eye'],columns=['hair'])
df['marginal_eye']=df.sum(axis=1)
# DataFrame.append was removed in pandas 2.0; pd.concat appends the totals row instead
pd.concat([df, df.sum(numeric_only=True).rename('marginal_hair').to_frame().T])

eye,Black,Blond,Brown,Red,marginal_eye
Blue,20,94,84,17,215
Brown,68,7,119,26,220
Green,5,16,29,14,64
Hazel,15,10,54,14,93
marginal_hair,108,127,286,71,592


In [14]:
# Joint probability distribution: divide each hair/eye count by the grand total (592)
dd = round(df/592, 2)
# DataFrame.append was removed in pandas 2.0; pd.concat appends the totals row instead
pd.concat([dd, dd.sum(numeric_only=True).rename('marginal_hair').to_frame().T])

eye,Black,Blond,Brown,Red,marginal_eye
Blue,0.03,0.16,0.14,0.03,0.36
Brown,0.11,0.01,0.2,0.04,0.37
Green,0.01,0.03,0.05,0.02,0.11
Hazel,0.03,0.02,0.09,0.02,0.16
marginal_hair,0.18,0.22,0.48,0.11,1.0


### Conditional Probability - the probability of _h given e_

$p(h \mid e) = \frac{p(e,h)}{p(e)} = \frac{p(e,h)}{\sum_{h'} p(e,h')}$

In [15]:
dd.reset_index()

,eye,Black,Blond,Brown,Red,marginal_eye
0,Blue,0.03,0.16,0.14,0.03,0.36
1,Brown,0.11,0.01,0.2,0.04,0.37
2,Green,0.01,0.03,0.05,0.02,0.11
3,Hazel,0.03,0.02,0.09,0.02,0.16


In [16]:
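# Take the first row (Blue eyes) and drop the 'eye' label column
cp = dd.reset_index()[:1]
# Divide each joint probability by p(Blue) = 0.36 to get p(h | Blue)
round(cp.iloc[:, 1:]/0.36, 3)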

,Black,Blond,Brown,Red,marginal_eye
0,0.083,0.444,0.389,0.083,1.0


###### The probability of each hair color given Blue eyes. For example, p(Black | Blue) is 0.083
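
The value 0.083 comes from dividing joint probabilities that were already rounded to two decimals (0.03 / 0.36). A minimal sketch that computes the whole conditional table $p(h \mid e)$ from the raw counts instead, dividing each row by its own row total; the names `counts_table` and `p_hair_given_eye` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Rebuild the hair/eye counts in the same order as the `data` frame above.
eyes = ['Blue', 'Brown', 'Green', 'Hazel']
hairs = ['Black', 'Blond', 'Brown', 'Red']
counts = [20, 68, 5, 15, 94, 7, 16, 10, 84, 119, 29, 54, 17, 26, 14, 14]
data = pd.DataFrame({'eye': eyes * 4,
                     'hair': [h for h in hairs for _ in eyes],
                     'count': counts})

# Rows are eye colors, columns are hair colors.
counts_table = data.pivot_table(values='count', index='eye', columns='hair',
                                aggfunc='sum')

# p(h | e) = p(e, h) / p(e). The grand total cancels, so each row of counts
# is simply divided by that row's sum.
p_hair_given_eye = counts_table.div(counts_table.sum(axis=1), axis=0)

print(p_hair_given_eye.round(3))
print(p_hair_given_eye.sum(axis=1))  # each row is a distribution over hair
```

With unrounded inputs, p(Black | Blue) is 20/215, about 0.093, so the two-decimal rounding in the earlier cell shifts the result noticeably for small cells.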