### What information does Bayes' theorem use to update our previous knowledge about the parameters?

1. It starts from our previous knowledge about the parameters (called the prior distribution).
2. It combines this with new information obtained from the observed data (the likelihood).
3. The result is updated knowledge about the parameters (called the posterior distribution).
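
In symbols, the three steps above can be written as a single formula. The notation is an assumption made here for illustration, not taken from the original notes: $\theta$ stands for the parameters and $D$ for the observed data.

$$
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
$$

Here $p(\theta)$ is the prior, $p(D \mid \theta)$ is the likelihood of the data, $p(\theta \mid D)$ is the posterior, and $p(D)$ is a normalizing constant that makes the posterior sum (or integrate) to 1.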

### What does the prior probability represent?

Prior probability, in Bayesian statistical inference, is the probability of an event before new data is collected. This is the best rational assessment of the probability of an outcome based on the current knowledge before an experiment is performed. 

Often simply called the prior, the prior probability distribution of an uncertain quantity expresses one's beliefs about that quantity before some evidence is taken into account.

### Wine Analysis

In [1]:
# dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.naive_bayes import GaussianNB 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer

In [2]:
# load training dataset
wine_train = pd.read_csv('https://raw.githubusercontent.com/ArashVafa/DESC624/master/wine_flag_training.csv')
# load test dataset
wine_test = pd.read_csv('https://raw.githubusercontent.com/ArashVafa/DESC624/master/wine_flag_test.csv')

In [3]:
wine_train.head()

Unnamed: 0,Type,Alcohol_flag,Sugar_flag
0,Red,High,High
1,Red,High,Low
2,Red,Low,High
3,Red,High,Low
4,Red,Low,Low


Build contingency tables for calculations.

In [10]:
# contingency tables for Type and Alcohol_flag
ct1 = pd.crosstab(wine_train.Type, wine_train.Alcohol_flag, margins=True)
ct1

Type \ Alcohol_flag,High,Low,All
Red,218,282,500
White,268,232,500
All,486,514,1000


In [12]:
# contingency tables for Type and Sugar_flag
ct2 = pd.crosstab(wine_train.Type, wine_train.Sugar_flag, margins=True)
ct2

Type \ Sugar_flag,High,Low,All
Red,116,384,500
White,300,200,500
All,416,584,1000
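
As a possible cross-check, a contingency table can also be normalized by row, which gives conditional proportions such as p(Alcohol_flag ∣ Type) directly instead of dividing counts by hand. The sketch below rebuilds a small frame from the Type and Alcohol_flag counts shown above so that it runs without the CSV files; the names `demo` and `cond` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Rebuild a small frame from the Type x Alcohol_flag counts shown above
# (218 High / 282 Low for Red, 268 High / 232 Low for White).
rows = (
    [("Red", "High")] * 218 + [("Red", "Low")] * 282
    + [("White", "High")] * 268 + [("White", "Low")] * 232
)
demo = pd.DataFrame(rows, columns=["Type", "Alcohol_flag"])

# normalize="index" divides each row by its row total,
# so each cell is an estimate of p(Alcohol_flag | Type).
cond = pd.crosstab(demo["Type"], demo["Alcohol_flag"], normalize="index")
print(cond.round(3))
```

With the real data, the same call on `wine_train.Type` and `wine_train.Alcohol_flag` should give the same proportions, assuming the training file matches the counts above.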


In [29]:
# The prior probability of Type = Red and Type = White.
prob1 = ct1['All']['Red']/ct1['All']['All']
print("The prior probability of Type = Red is ", round(prob1*100,2), "%")
print("The prior probability of Type = White is ", round((1-prob1)*100,2), "%")

# The probability of high and low alcohol content.
prob2 = ct1['High']['All']/ct1['All']['All']
print("The probability of high alcohol content is: ", round(prob2*100,2), "%")
print("The probability of low alcohol content is: ", round((1-prob2)*100,2), "%")

# The probability of high and low sugar content.
prob3 = ct2['High']['All']/ct2['All']['All']
print("The probability of high sugar content is: ", round(prob3*100,2), "%")
print("The probability of low sugar content is: ", round((1-prob3)*100,2), "%")

# The conditional probabilities p(Alcohol_flag = High ∣ Type = Red) and p(Alcohol_flag = Low ∣ Type = Red).
prob4 = ct1['High']['Red']/ct1['All']['Red']
print("The conditional probability p(Alcohol_flag = High ∣ Type = Red) is ", round(prob4*100,2), "%")
print("The conditional probability p(Alcohol_flag = Low ∣ Type = Red) is ", round((1-prob4)*100,2), "%")

# The conditional probabilities p(Alcohol_flag = High ∣ Type = White) and p(Alcohol_flag = Low ∣ Type = White).
prob5 = ct1['High']['White']/ct1['All']['White']
print("The conditional probability p(Alcohol_flag = High ∣ Type = White) is ", round(prob5*100, 2), "%")
print("The conditional probability p(Alcohol_flag = Low ∣ Type = White) is ", round((1-prob5)*100, 2), "%")

# The conditional probabilities p(Sugar_flag = High ∣ Type = Red) and p(Sugar_flag = Low ∣ Type = Red).
prob6 = ct2['High']['Red']/ct2['All']['Red']
print("The conditional probability p(Sugar_flag = High ∣ Type = Red) is ", round(prob6*100, 2), "%")
print("The conditional probability p(Sugar_flag = Low ∣ Type = Red) is ", round((1-prob6)*100, 2), "%")

# The conditional probabilities p(Sugar_flag = High ∣ Type = White) and p(Sugar_flag = Low ∣ Type = White).
prob7 = ct2['High']['White']/ct2['All']['White']
print("The conditional probability p(Sugar_flag = High ∣ Type = White) is ", round(prob7*100, 2), "%")
print("The conditional probability p(Sugar_flag = Low ∣ Type = White) is ", round((1-prob7)*100, 2), "%")

The prior probability of Type = Red is  50.0 %
The prior probability of Type = White is  50.0 %
The probability of high alcohol content is:  48.6 %
The probability of low alcohol content is:  51.4 %
The probability of high sugar content is:  41.6 %
The probability of low sugar content is:  58.4 %
The conditional probability p(Alcohol_flag = High ∣ Type = Red) is  43.6 %
The conditional probability p(Alcohol_flag = Low ∣ Type = Red) is  56.4 %
The conditional probability p(Alcohol_flag = High ∣ Type = White) is  53.6 %
The conditional probability p(Alcohol_flag = Low ∣ Type = White) is  46.4 %
The conditional probability p(Sugar_flag = High ∣ Type = Red) is  23.2 %
The conditional probability p(Sugar_flag = Low ∣ Type = Red) is  76.8 %
The conditional probability p(Sugar_flag = High ∣ Type = White) is  60.0 %
The conditional probability p(Sugar_flag = Low ∣ Type = White) is  40.0 %
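
The priors and conditional probabilities above are the ingredients Bayes' theorem needs to produce a posterior. As a minimal sketch, the code below estimates the posterior probability of each wine type for a hypothetical wine with high alcohol and low sugar. It assumes, in the naive Bayes style, that Alcohol_flag and Sugar_flag are conditionally independent given Type; that assumption and the variable names are illustrative and not taken from the original notebook. The counts are copied from the contingency tables.

```python
# Counts copied from the contingency tables above.
n_type = {"Red": 500, "White": 500}
n_alcohol_high = {"Red": 218, "White": 268}
n_sugar_low = {"Red": 384, "White": 200}
total = sum(n_type.values())

# Unnormalized posterior: prior * p(Alcohol=High | Type) * p(Sugar=Low | Type).
# Multiplying the two conditionals ASSUMES they are independent given Type.
joint = {}
for t in n_type:
    prior = n_type[t] / total
    p_alcohol = n_alcohol_high[t] / n_type[t]
    p_sugar = n_sugar_low[t] / n_type[t]
    joint[t] = prior * p_alcohol * p_sugar

# Dividing by the evidence makes the posteriors sum to 1.
evidence = sum(joint.values())
posterior = {t: joint[t] / evidence for t in joint}

for t, p in posterior.items():
    print(f"p(Type = {t} | Alcohol_flag = High, Sugar_flag = Low) = {p:.4f}")
```

Under this assumption, the posterior for Red comes out at roughly 0.61, up from the 0.5 prior. The shift is driven mainly by low sugar being much more common among red wines (76.8 %) than white wines (40.0 %).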
