## Part 4: Bayes rule
1. Implement the Bayes rule. Your implementation should meet the following requirements:
   - Do not use explicit loops over the test sample size (10,000 or 5,000). 
   - You are allowed to use loops over the number of centers (10 or 20), although you can avoid all loops.
2. Test your code with the test data you just generated. (Note that you do not need training data for the Bayes rule.) Report your results (on the test data) as a 2-by-2 table. 

The Bayes rule for binary classification (under the zero-one loss), as derived in class, is: predict $Y = 1$ if

$$
P(Y = 1 \mid X = x) \ge P(Y = 0 \mid X=x), 
$$

or equivalently

$$ \frac{P(Y = 1 \mid X = x)}{P(Y = 0 \mid X=x)} \ge 1.$$

Following the data generation process, we have

$$
\begin{aligned}
\frac{P(Y=1\mid X=x)}{P(Y=0\mid X=x)} &= \frac{P(Y=1) \cdot P(X=x\mid Y=1)}{P(Y=0) \cdot P(X=x\mid Y=0)} \\
&= \frac{(1/2)\cdot 10^{-1}\sum_{l=1}^{10}(2\pi s^2)^{-1}\exp\left(-\lVert\mathbf{x}-\mathbf{m}_{1l}\rVert^2/(2s^2)\right)}{(1/2)\cdot 10^{-1}\sum_{l=1}^{10}(2\pi s^2)^{-1}\exp\left(-\lVert\mathbf{x}-\mathbf{m}_{0l}\rVert^2/(2s^2)\right)} \\
&= \frac{\sum_{l=1}^{10}\exp\left(-\lVert\mathbf{x}-\mathbf{m}_{1l}\rVert^2/(2s^2)\right)}{\sum_{l=1}^{10}\exp\left(-\lVert\mathbf{x}-\mathbf{m}_{0l}\rVert^2/(2s^2)\right)}.
\end{aligned}
$$
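
Since the prior and the normalizing constants cancel, the rule reduces to comparing the two sums of exponentials. As a minimal sketch (not necessarily how the notebook does it), the same comparison can be carried out in the log domain, which guards against underflow when the squared distances are large relative to $s^2$. The function names `log_mixture_score` and `bayes_rule_log`, and the toy centers and points, are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_mixture_score(x, centers, s):
    """Log of sum_l exp(-||x - m_l||^2 / (2 s^2)) for every row of x."""
    # Pairwise squared distances via broadcasting, shape (n_samples, n_centers).
    sq_dist = np.sum((x[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    a = -sq_dist / (2 * s ** 2)
    # Log-sum-exp trick: subtract the row maximum before exponentiating.
    a_max = a.max(axis=1, keepdims=True)
    return a_max[:, 0] + np.log(np.sum(np.exp(a - a_max), axis=1))

def bayes_rule_log(x, centers0, centers1, s):
    """Predict 1 where the class-1 score is at least the class-0 score."""
    score1 = log_mixture_score(x, centers1, s)
    score0 = log_mixture_score(x, centers0, s)
    return (score1 >= score0).astype(int)

# Toy data (hypothetical): two centers per class in 2-D.
centers0 = np.array([[0.0, 0.0], [0.0, 1.0]])
centers1 = np.array([[3.0, 0.0], [3.0, 1.0]])
x_toy = np.array([[0.1, 0.5], [2.9, 0.5], [1.0, 0.5]])
pred_toy = bayes_rule_log(x_toy, centers0, centers1, s=1.0)
```

There is no loop over samples or centers here: the `(n_samples, 1, 2)` and `(1, n_centers, 2)` arrays broadcast to a single `(n_samples, n_centers, 2)` array of differences.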

In [19]:
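def calculate_prob(data, mu):
    # Squared distance from every sample to every center, shape (n_samples, n_centers).
    sq_dist = np.sum((data[:, None] - mu) ** 2, axis=2)
    # Unnormalized mixture density; the constant factors cancel in the ratio.
    # The global `s` is the standard deviation of each mixture component.
    return np.sum(np.exp(-sq_dist / (2 * s ** 2)), axis=1)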

In [None]:
def bayes_rule(data, mu0, mu1):
    return np.where(calculate_prob(data, mu1) >= calculate_prob(data, mu0), 1, 0)

In [20]:
y_pred_bayes = bayes_rule(X_testing, mu_k0, mu_k1)

In [22]:
print(calc_confusion_matrix(y_pred_bayes, Y_testing))

[[3855  615]
 [1145 4385]]
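
The table can be summarized as a single misclassification rate. The sketch below copies the four counts from the output above and assumes, as is conventional for a confusion matrix, that the diagonal entries are the correctly classified test points; the row/column orientation of `calc_confusion_matrix` does not affect this calculation. The variable names are hypothetical.

```python
import numpy as np

# Counts copied from the printed table above.
confusion = np.array([[3855, 615],
                      [1145, 4385]])

# Assumption: diagonal = correct predictions, off-diagonal = errors.
n_total = confusion.sum()
n_correct = np.trace(confusion)
error_rate = (n_total - n_correct) / n_total

print(f"accuracy = {n_correct / n_total:.4f}, error rate = {error_rate:.4f}")
# prints: accuracy = 0.8240, error rate = 0.1760
```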
