# week 2, session 2
# Decision theory

We assume the following imports:

In [None]:
import math
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy import special
from sklearn.metrics import confusion_matrix
from ipywidgets import interact,FloatSlider

# 1) Bayes' Theorem

Recall that for a real variable $x$ and class labels $C_k$, $k = 1,\dots,c$, Bayes' theorem gives:

$P(C_k|x) = \dfrac{p(x|C_k)P(C_k)}{p(x)}  = \dfrac{p(x|C_k)P(C_k)}{\sum_j p(x|C_j)P(C_j)}$

In this exercise session we will illustrate Bayes' theorem by letting the class-conditional distributions be three normal distributions, each with its own parameters.

We define three univariate normal distributions $p_1, p_2, p_3$ to represent the three possible classes
 $C_1, C_2$ and $C_3$. The prior probabilities of the three classes are $P(C_1) = 0.4, P(C_2) = 0.3, P(C_3) = 0.3$. The distributions have different means, $\mu_1 = -2, \mu_2 = 0, \mu_3 = 2$, but the same variance, $\sigma^2_1 = \sigma^2_2 = \sigma^2_3 = 1$.
 
 ### Questions
 
 
$\star$ Plot the three class-conditional density functions $p(x|C_1)$, $p(x|C_2)$ and $p(x|C_3)$ in the same figure, in the interval $-10 \leq x \leq 10$.
 
$\star$ Compute and plot the resulting density $p(x)$.

$\star$ Compute and plot the posterior probabilities $P(C_1|x), P(C_2|x), P(C_3|x)$ in the same figure.

### Hints

$\bullet$ You can use the scipy.stats function norm.pdf to get the value of the normal probability density function in $x$ as:

    px = norm.pdf(x,mean,sigma)
    
$\bullet$ $p(x)$ can be computed using the law of total probability.
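
As a minimal sketch of the two hints combined, the posterior at a single point can be evaluated by forming the joint terms $p(x|C_k)P(C_k)$ and dividing by their sum. The evaluation point `x0 = 0.5` and the variable names are illustrative assumptions; the means, standard deviations and priors are those stated in the exercise text.

```python
import numpy as np
from scipy.stats import norm

# Parameters from the exercise text
means = np.array([-2.0, 0.0, 2.0])
sigmas = np.array([1.0, 1.0, 1.0])
priors = np.array([0.4, 0.3, 0.3])

x0 = 0.5  # illustrative evaluation point (an assumption, not part of the exercise)

# joint[k] = p(x0|C_k) * P(C_k)
joint = norm.pdf(x0, means, sigmas) * priors
# law of total probability: p(x0) = sum_k p(x0|C_k) P(C_k)
px0 = joint.sum()
# Bayes' theorem: P(C_k|x0) = joint[k] / p(x0)
posterior = joint / px0
print(posterior)
```

Replacing `x0` by an array built with `np.linspace` (and broadcasting over the classes) gives the curves asked for in the questions.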


In [None]:
#Plot densities P(x|c)

#compute and plot the density p(x)

#compute and plot posterior P(C|x)

# 2) Simulate samples from the class conditional distribution

In this exercise you will write a simulator that randomly draws samples from multiple normal distributions. We still consider the same three-class situation (with the same means, variances and priors as in the previous exercise).

Sampling from the density $p(x)$ can be simulated by a generative process with two steps:

- First: The class of the sample is determined randomly based on the prior probability for each class. 
- Second: The value of the sample is determined by a random draw from the normal distribution associated with the selected sample class.
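
The two steps above can be sketched in vectorized form as follows. This is one possible variant, not the loop-based approach that the hints further down point towards; the function name `draw_samples` and the use of `np.random.default_rng` with a fixed seed are assumptions made for illustration.

```python
import numpy as np

# Parameters from the exercise text
means = np.array([-2.0, 0.0, 2.0])
sigmas = np.array([1.0, 1.0, 1.0])
priors = np.array([0.4, 0.3, 0.3])

def draw_samples(n, rng):
    """Draw n (class, value) pairs; hypothetical helper for illustration."""
    # Step 1: pick a class index 0, 1 or 2 according to the priors
    classes = rng.choice(len(priors), size=n, p=priors)
    # Step 2: draw each value from the normal of its selected class
    values = rng.normal(loc=means[classes], scale=sigmas[classes])
    return classes, values

rng = np.random.default_rng(0)  # fixed seed only to make the sketch repeatable
sample_classes, sample_values = draw_samples(100_000, rng)

# empirical class frequencies, which should be close to the priors
print(np.bincount(sample_classes) / sample_classes.size)
```

Here the classes are coded 0, 1, 2 so that they can index the parameter arrays directly.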

### Questions
$\star$ Define a function to draw samples from the class conditional distribution, simulating the generative process. The function must return two values, the sample-class and the sample-value. 

$\star$ Draw 100,000 samples using the function and plot the histogram of the sample values in the same plot as the density $p(x)$.

$\star$ Comment on how the resulting figure changes when you draw different numbers of samples.

### Hints

$\bullet$ Consider using the numpy function np.random.choice for simulating the first step. For instance, the following line simulates flipping a fair coin with sample space $S=\{0,1 \}$:
   
    flip = np.random.choice(2,1,p=[.5,.5])[0]
    
$\bullet$ You can use the numpy function np.random.normal to simulate random draws from a normal distribution, as:

    sampleValue = np.random.normal(loc = mean, scale = sigma)
    
$\bullet$ If you draw the samples iteratively in a loop, you can store the individual sample results in predefined arrays, for instance initialized as:

    N = 100000
    sampleValues = np.zeros(N)
    sampleClasses = np.zeros(N)

In [None]:
#define function to simulate draws from the class conditional distribution

#draw samples from the distribution

#plot histogram of samples together with the density p(x)

# 3) Decision boundaries
A decision rule is a division of the space of $x$ such that each point is uniquely associated
with a single class $C_k$.
For instance, we can set up a 1D decision rule by dividing the space of $x$ into three intervals:
$I_1 =\; ] -\infty, d_1]\,, I_2 = \; ] d_1, d_2]\,, I_3 = \;] d_2, \infty [$.
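
Such an interval rule can be applied to an array of values with `np.digitize`. The sketch below is one possible way to do it, with classes coded 0, 1, 2 and illustrative boundary values (assumptions, not the boundaries of the exercise); `right=True` makes each interval closed on the right, matching $I_1$, $I_2$, $I_3$ above.

```python
import numpy as np

d1, d2 = -1.0, 1.0  # illustrative boundaries (assumed values)
x = np.array([-5.0, -1.0, 0.0, 1.0, 1.5])

# right=True means bins[i-1] < x <= bins[i], so
# x <= d1 -> 0, d1 < x <= d2 -> 1, x > d2 -> 2
predicted = np.digitize(x, bins=[d1, d2], right=True)
print(predicted)  # [0 0 1 1 2]
```

Points that lie exactly on a boundary ($x = d_1$ or $x = d_2$) fall in the interval to the left of it.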

The errors of a decision rule can be summarized by a confusion matrix $R$, also known as an error matrix.
In an experimental setting, the confusion matrix records the true class of each sample against the class the decision rule assigns it to. In this way it can be used to evaluate the performance of the decision model.

We still consider the same three-class situation (with the same means, variances and priors as in the previous exercise). 

In this exercise we display confusion matrices such that each row represents the instances of a predicted class and each column represents the instances of an actual class.
An element $R_{ij}$ of the matrix hence shows how many samples of actual class $j$ were assigned to class $i$ by the decision rule.
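
To make the orientation concrete, here is a small sketch with made-up labels (the two arrays are illustrative assumptions, not simulator output). Passing the predicted classes as the first argument of `confusion_matrix` puts the predicted class on the rows and the actual class on the columns.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# made-up labels for six samples (classes coded 0, 1, 2)
actual    = np.array([0, 0, 1, 2, 2, 2])
predicted = np.array([0, 1, 1, 2, 2, 0])

# first argument -> rows, second argument -> columns
CM = confusion_matrix(predicted, actual)
print(CM)
# [[1 0 1]
#  [1 1 0]
#  [0 0 2]]
```

For example, `CM[0, 2] = 1` counts the single sample whose actual class is 2 but which was assigned to class 0.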

### Questions

$\star$ Plot the following two decision boundaries together with the posterior class probabilities.
    
$\quad d_1 = -3.8\;,\quad d_2 = 1$

$\star$ Use your simulator to draw $N = 100,000$ samples. Record both the sample values and sample classes ('actual class').  For each sample also record the 'predicted class' (determined according to the decision rules). Store these records in two individual arrays of length $N$.

$\star$ Explain and comment on the two confusion matrices, computed by the code (using the sklearn.metrics function confusion_matrix):
             
         CM = confusion_matrix(predictedClasses,sampleClasses)
         CM_normalized = CM / CM.sum()
       
### Hints
$\bullet$ The following code snippet plots the confusion matrices as heat maps.
        



In [None]:
#define decision boundaries and plot together with the posterior class probabilities P(c|x)

#sample from the distribution using the simulator function

#compute and display confusion matrix
CM = confusion_matrix(predictedClasses,sampleClasses)
CM_normalized = CM / CM.sum()

#plot CM as heatmap
plt.title('confusion matrix (counts)')
sns.heatmap(CM,xticklabels=['true1','true2','true3'],yticklabels=['pred1','pred2','pred3'],annot=True,cmap="Reds", fmt = 'd')
plt.show()
#plot CM_normalized as heatmap
plt.title('empirical confusion matrix')
sns.heatmap(CM_normalized,xticklabels=['true1','true2','true3'],yticklabels=['pred1','pred2','pred3'],annot=True,cmap="Reds",fmt=".5f")
plt.show()

# 4) Decision process
Each element of the confusion matrix $R$ can be computed analytically as the joint probability that a sample belongs to class $C_k$ and falls in interval $I_j$: $R_{jk} = P(C_k)\int_{I_j} p(x|C_k)\, dx$.
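
As a cross-check of the widget code below, the same integrals can be written compactly with the normal CDF. This is a minimal sketch assuming the boundaries $d_1 = -3.8$, $d_2 = 1$ from the previous exercise and weighting each column by its prior, as `getConfusionMatrix` does; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Parameters from the exercise text
means = np.array([-2.0, 0.0, 2.0])
sigmas = np.array([1.0, 1.0, 1.0])
priors = np.array([0.4, 0.3, 0.3])
d1, d2 = -3.8, 1.0  # boundaries from the previous exercise

# interval edges for I_1 = ]-inf, d1], I_2 = ]d1, d2], I_3 = ]d2, inf[
edges = np.array([-np.inf, d1, d2, np.inf])

# cdf_at_edges[i, k] = P(x <= edges[i] | C_k)
cdf_at_edges = norm.cdf(edges[:, None], loc=means[None, :], scale=sigmas[None, :])

# interval_mass[j, k] = integral of p(x|C_k) over I_j
interval_mass = np.diff(cdf_at_edges, axis=0)

# weight column k by the prior P(C_k): rows = predicted, columns = true class
R = interval_mass * priors
print(R)
```

Each column of `R` sums to the prior of its class, so the whole matrix sums to 1, and the sum of the diagonal is the probability of a correct decision.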

We still consider the same three-class situation (with the same means, variances and priors as in the previous exercise). 

Execute the following code block. It shows an interactive widget that lets you change the decision boundaries and the priors of the three distributions, and plots the corresponding figures, including the analytically computed confusion matrix.

### Questions

$\star$ Comment on how the plots of the posterior class probabilities $P(c|x)$ and the density $p(x)$ change for different prior configurations.

$\star$ Can you create a situation in which a class has no points $x$ where it is most probable? How would such a situation influence a decision process?

$\star$ How does the colored area in the first subfigure relate to the classification error?

$\star$ Comment on how you can find an 'optimal' decision rule (by changing $d_1$ and $d_2$ and monitoring the confusion matrix).




In [None]:
#define means, variances and sigmas for the three normals
means = [-2, 0, 2]
variances = [1, 1, 1]
sigmas = np.sqrt(variances)

#function to compute confusion matrix
def getConfusionMatrix(d1, d2, prior1, prior2, prior3):

    #compute the confusion matrix for intervals: ]-inf, d1] , ]d1, d2] , ]d2,inf[
    #rows: predicted class (interval), columns: true class
    R = np.zeros((3, 3))
    for k in range(3):
        cdf_d1 = norm.cdf(d1, means[k], sigmas[k])
        cdf_d2 = norm.cdf(d2, means[k], sigmas[k])
        R[0, k] = cdf_d1            #mass of class k in ]-inf, d1]
        R[1, k] = cdf_d2 - cdf_d1   #mass of class k in ]d1, d2]
        R[2, k] = 1 - cdf_d2        #mass of class k in ]d2, inf[

    #weight each column by the prior of its true class
    return R * np.array([prior1, prior2, prior3])

#function to draw posterior class probabilities, decision boundaries and confusion matrix in the same plot
def drawDecisionProcess(d1,d2,prior1,prior2,prior3):    
    
    #normalize the priors (to sum to 1)
    prior_sum = prior1 + prior2 + prior3    
    prior1 = prior1/prior_sum;
    prior2 = prior2/prior_sum;
    prior3 = prior3/prior_sum;    
    print('P(C1) = ',prior1)
    print('P(C2) = ',prior2)
    print('P(C3) = ',prior3)
    
    x = np.linspace(-10,10, 100)
    px_c1 = norm.pdf(x,means[0],sigmas[0]);
    px_c2 = norm.pdf(x,means[1],sigmas[1]);
    px_c3 = norm.pdf(x,means[2],sigmas[2]);

    fig = plt.figure(num=None, figsize=(26, 12), dpi=80, facecolor='w', edgecolor='k')
    
    ax1 = fig.add_subplot(221) 
    ax1.set_title('joint probability distributions p(x|c)*P(c)')
    ax1.plot(x, prior1 * px_c1)
    ax1.plot(x, prior2 * px_c2)
    ax1.plot(x, prior3 * px_c3)
    ax1.plot([d1,d1],[0,0.16],'k')
    ax1.plot([d2,d2],[0,0.16],'k')
    
    
    
    fill_style = {'color': 'r', 'alpha': 0.1}
    xmistake = np.linspace(d1,10,50)
    ax1.fill_between(xmistake,np.zeros_like(xmistake),prior1 * norm.pdf(xmistake,means[0],sigmas[0]),**fill_style)
    
    xmistake = np.linspace(-10,d1,50)
    ax1.fill_between(xmistake,np.zeros_like(xmistake),prior2 * norm.pdf(xmistake,means[1],sigmas[1]),**fill_style)
    xmistake = np.linspace(d2,10,50)
    ax1.fill_between(xmistake,np.zeros_like(xmistake),prior2 * norm.pdf(xmistake,means[1],sigmas[1]),**fill_style)
    
    xmistake = np.linspace(-10,d2,50)
    ax1.fill_between(xmistake,np.zeros_like(xmistake),prior3 * norm.pdf(xmistake,means[2],sigmas[2]),**fill_style)
    
    #plot P(c|x)
    px = px_c1 * prior1 + px_c2 * prior2 + px_c3 * prior3;
    P1x = px_c1*prior1/px;
    P2x = px_c2*prior2/px;
    P3x = px_c3*prior3/px;
    
    ax2 = fig.add_subplot(222)
    ax2.set_title('posterior class probabilities P(c|x)')
    ax2.plot(x,P1x)
    ax2.plot(x,P2x)
    ax2.plot(x,P3x)
    ax2.plot([d1,d1],[0,1],'k')
    ax2.plot([d2,d2],[0,1],'k')
    
    #plot p(x)
    ax3 = fig.add_subplot(223) 
    ax3.set_title('density p(x)')
    ax3.plot(x, px)
     
    #plot confusion matrix
    R = getConfusionMatrix(d1,d2,prior1,prior2,prior3);
    
    ax4 = fig.add_subplot(224) 
    
    sns.heatmap(R,xticklabels=['true1','true2','true3'],yticklabels=['pred1','pred2','pred3'],annot=True,fmt=".5f",cmap="Reds")
    ax4.set_title('confusion matrix')
    
interact(drawDecisionProcess, 
	d1=FloatSlider(min = -10.0, max = 0.0, value = -3.8, continuous_update=False),
	d2 = FloatSlider(min = 0.0, max = 10, value = 1.0, continuous_update = False),
	prior1 = FloatSlider(min = 0.0, max = 1, step = .01, value = 0.4, continuous_update = False),
	prior2 = FloatSlider(min = 0.0, max = 1, step = .01, value = 0.3, continuous_update = False),
	prior3 = FloatSlider(min = 0.0, max = 1, step = .01, value = 0.3, continuous_update = False),
);    