# General Instructions to students:

1. There are 5 types of cells in this notebook. The cell type will be indicated within the cell.
    1. Markdown cells with problem written in it. (DO NOT TOUCH THESE CELLS) (**Cell type: TextRead**)
    2. Python cells with setup code for further evaluations. (DO NOT TOUCH THESE CELLS) (**Cell type: CodeRead**)
    3. Python code cells with some template code or empty cell. (FILL CODE IN THESE CELLS BASED ON INSTRUCTIONS IN CURRENT AND PREVIOUS CELLS) (**Cell type: CodeWrite**)
    4. Markdown cells where a written reasoning or conclusion is expected. (WRITE SENTENCES IN THESE CELLS) (**Cell type: TextWrite**)
    5. Temporary code cells for convenience and TAs. (YOU MAY DO WHAT YOU WILL WITH THESE CELLS, TAs WILL REPLACE WHATEVER YOU WRITE HERE WITH OFFICIAL EVALUATION CODE) (**Cell type: Convenience**)
    
2. You are not allowed to insert new cells in the submitted notebook.

3. You are not allowed to import any extra packages.

4. The code is to be written in Python 3.6 syntax. Latest versions of other packages maybe assumed.

5. In CodeWrite Cells, the only outputs to be given are plots asked in the question. Nothing else to be output/print. 

6. If TextWrite cells ask you to give accuracy/error/other numbers you can print them on the code cells, but remove the print statements before submitting.

7. The convenience code can be used to check the expected syntax of the functions. At a minimum, your entire notebook must run with "run all" with the convenience cells as it is. Any runtime failures on the submitted notebook as it is will get zero marks.

8. All code must be written by yourself. Copying from other students/material on the web is strictly prohibited. Any violations will result in zero marks. 

9. You may discuss broad ideas with friends, but all code must be written by yourself.

9. All datasets will be given as .npz files, and will contain data in 4 numpy arrays :"X_train, Y_train, X_test, Y_test". In that order. The meaning of the 4 arrays can be easily inferred from their names.

10. All plots must be labelled properly, all tables must have rows and columns named properly.

11. Plotting the data and prediction is highly encouraged for debugging. But remove debugging/understanding code before submitting.


In [2]:
# Cell type : CodeRead




**Cell type : TextRead**

# Problem 1: Learning Binary Bayes Classifiers from data with Max. Likelihood 

Derive Bayes classifiers under assumptions below, and use ML estimators to compute and return the results on a test set. 

1a) Assume $X|Y=-1 \sim \mathcal{N}(\mu_-, I)$ and  $X|Y=1 \sim \mathcal{N}(\mu_+, I)$. *(Same known covariance)*

1b) Assume $X|Y=-1 \sim \mathcal{N}(\mu_-, \Sigma)$ and $X|Y=1 \sim \mathcal{N}(\mu_+, \Sigma)$ *(Same unknown covariance)*

1c) Assume $X|Y=-1 \sim \mathcal{N}(\mu_-, \Sigma_-)$ and $X|Y=1 \sim \mathcal{N}(\mu_+, \Sigma_+)$ *(different unknown covariance)*




In [3]:
import matplotlib as plt
import math

def probability_function(data_point,mu,sigma):
    x_minus_mu = np.subtract(data_point,mu)
    x_minus_mu_transpose = np.subtract(data_point,mu).T
    sigma_inverse = np.linalg.inv(sigma)
    product = np.dot(x_minus_mu_transpose,sigma_inverse)
    liklihood =(1.0/(np.sqrt(np.linalg.det(sigma))))*math.exp(-0.5*np.dot(product,x_minus_mu ))
    return liklihood


def Bayes1a(X_train, Y_train, X_test):
    x_train_pos = []
    x_train_neg = []
    
    for (b, c) in zip(X_train, Y_train):
        
        if c == 1:
            x_train_pos.append(b)
        else:
            x_train_neg.append(b)
            
    x_train_pos_mean_vector = np.mean(x_train_pos, axis=0)
    
    x_train_neg_mean_vector = np.mean(x_train_neg, axis=0)
    
    prior_for_class_plus_one = float(len(x_train_pos))/float(len(X_train))
    
    prior_for_class_minus_one = float(len(x_train_neg))/float(len(X_train))
    
    sigma = np.identity(len(X_train[0]))

    Y_test_pred = []
    
    for sample in np.array(X_test):
        probability_for_class_plus_one = probability_function(sample,x_train_pos_mean_vector,
                                                              sigma)*prior_for_class_plus_one
        
        probability_for_class_minus_one = probability_function(sample,x_train_neg_mean_vector,
                                                              sigma)*prior_for_class_minus_one
        
        if probability_for_class_plus_one >= probability_for_class_minus_one :
            Y_test_pred.append(1)
            
        else:
            Y_test_pred.append(-1)
            
    return Y_test_pred

    """ Give prediction for test instance using assumption 1a.

    
    Arguments:
    X_train: numpy array of shape (n,d)
    Y_train: +1/-1 numpy array of shape (n,)
    X_test : numpy array of shape (m,d)

    Returns:
    Y_test_pred : +1/-1 numpy array of shape (m,)
    
    """




def Bayes1b(X_train, Y_train, X_test):
    """ Give prediction for test instance using assumption 1b.

    Arguments:
    X_train: numpy array of shape (n,d)
    Y_train: +1/-1 numpy array of shape (n,)
    X_test : numpy array of shape (m,d)

    Returns:
    Y_test_pred : +1/-1 numpy array of shape (m,)
    
    """
    x_train_pos = []
    x_train_neg = []
    
    for (b, c) in zip(X_train, Y_train):
        
        if c == 1:
            x_train_pos.append(b)
        else:
            x_train_neg.append(b)
            
    x_train_pos_mean_vector = np.mean(x_train_pos, axis=0)
    x_train_neg_mean_vector = np.mean(x_train_neg, axis=0)
    
    prior_for_class_plus_one = float(len(x_train_pos))/float(len(X_train))
    prior_for_class_minus_one = float(len(x_train_neg))/float(len(X_train))
    
    sigma = np.cov(X_train.T)
    

    Y_test_pred = []
    
    for sample in np.array(X_test):
        probability_for_class_plus_one = probability_function(sample, x_train_pos_mean_vector,
                                                              sigma) * prior_for_class_plus_one
        
        probability_for_class_minus_one = probability_function(sample, x_train_neg_mean_vector,
                                                               sigma) * prior_for_class_minus_one
        
        if probability_for_class_plus_one >= probability_for_class_minus_one:
            Y_test_pred.append(1)
        else:
            Y_test_pred.append(-1)
            
    return Y_test_pred


def Bayes1c(X_train, Y_train, X_test):
    """ Give prediction for test instance using assumption 1c.

    Arguments:
    X_train: numpy array of shape (n,d)
    Y_train: +1/-1 numpy array of shape (n,)
    X_test : numpy array of shape (m,d)

    Returns:
    Y_test_pred : +1/-1 numpy array of shape (m,)
    
    """
    x_train_pos = []
    x_train_neg = []
    
    for (b, c) in zip(X_train, Y_train):
        if c == 1:
            x_train_pos.append(b)
        else:
            x_train_neg.append(b)
            
    x_train_pos_mean_vector = np.mean(x_train_pos, axis=0)
    x_train_neg_mean_vector = np.mean(x_train_neg, axis=0)
    
    prior_for_class_plus_one = float(len(x_train_pos))/float(len(X_train))
    prior_for_class_minus_one = float(len(x_train_neg))/float(len(X_train))
    
    sigma_positive = np.cov(np.array(x_train_pos).T)
    sigma_negative = np.cov(np.array(x_train_neg).T)


    Y_test_pred = []
    for sample in np.array(X_test):
        
        probability_for_class_plus_one = probability_function(sample, x_train_pos_mean_vector, 
                                                              sigma_positive) * prior_for_class_plus_one
        
        probability_for_class_minus_one = probability_function(sample, x_train_neg_mean_vector,
                                                               sigma_negative) * prior_for_class_minus_one
        
        if probability_for_class_plus_one >= probability_for_class_minus_one:
            Y_test_pred.append(1)
        else:
            Y_test_pred.append(-1)
            
    return Y_test_pred



In [4]:
import numpy as np
# Cell type : Convenience

# Testing the functions above

# To TAs: Replace this cell with the testing cell developed.

# To students: You may use the example here for testing syntax issues 
# with your functions, and also as a sanity check. But the final evaluation
# will be done for different inputs to the functions. (So you can't just 
# solve the problem for this one example given below.) 


# X_train_pos = np.random.randn(1000,2)+np.array([[1.,2.]])
# X_train_neg = np.random.randn(1000,2)+np.array([[2.,4.]])
# X_train = np.concatenate((X_train_pos, X_train_neg), axis=0)
# Y_train = np.concatenate(( np.ones(1000), -1*np.ones(1000) ))
# X_test_pos = np.random.randn(1000,2)+np.array([[1.,2.]])
# X_test_neg = np.random.randn(1000,2)+np.array([[2.,4.]])
# X_test = np.concatenate((X_test_pos, X_test_neg), axis=0)
# Y_test = np.concatenate(( np.ones(1000), -1*np.ones(1000) ))

# Y_pred_test_1a = Bayes1a(X_train, Y_train, X_test)
# Y_pred_test_1b = Bayes1b(X_train, Y_train, X_test)
# Y_pred_test_1c = Bayes1c(X_train, Y_train, X_test)





**Cell type : TextRead**

# Problem 1

1d) Run the above three algorithms (Bayes1a,1b and 1c), for the three datasets given (dataset1_1.npz, dataset1_2.npz, dataset1_3.npz) in the cell below.

In the next CodeWrite cell, Plot all the classifiers (3 classification algos on 3 datasets = 9 plots) on a 2d plot (color the positively classified area light green, and negatively classified area light red). Add the training data points also on the plot. Plots to be organised into 3 plots follows: One plot for each dataset, with three subplots in each for the three classifiers. Label the 9 plots appropriately. 

In the next Textwrite cell, summarise (use the plots of the data and the assumptions in the problem to explain) your observations regarding the six learnt classifiers, and also give the error rate of the three classifiers on the three datasets as 3x3 table, with appropriately named rows and columns.


In [5]:
# Cell type : CodeWrite
import matplotlib.pyplot as plt
import numpy as np
# write the code for loading the data, running the three algos, and plotting here. 
# (Use the functions written previously.)
with np.load('dataset1_1.npz') as data:
    X_train_one = data['arr_0']
    Y_train_one = data['arr_1']
    X_test_one = data['arr_2']
    Y_test_one = data['arr_3']

y_predicted11a = Bayes1a(X_train_one,Y_train_one,X_test_one)
y_predicted11b = Bayes1b(X_train_one,Y_train_one,X_test_one)
y_predicted11c = Bayes1c(X_train_one,Y_train_one,X_test_one)


plt.figure(1,figsize=(10,5))

plt.subplot(131)

length_of_dataset_one = len(X_test_one)

for data in range(length_of_dataset_one):
    
    if (y_predicted11a[data] == 1):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='tomato', marker='o' )
    
    if (Y_train_one[data] == 1):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='tomato', marker='o' )
    
    
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 1a",fontsize=10)
        
    
    
plt.subplot(132)

for data in range(length_of_dataset_one):
    
    if (y_predicted11b[data] == 1):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='tomato', marker='o' )   

    if (Y_train_one[data] == 1):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='tomato', marker='o' )
        
        
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 1b ",fontsize=10)
        
    
plt.subplot(133)

for data in range(length_of_dataset_one):
    
    if (y_predicted11c[data] == 1):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='tomato', marker='o' ) 
    
    if (Y_train_one[data] == 1):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='tomato', marker='o' )
    
        
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 1c ",fontsize=10)

plt.show()

##############################################################
with np.load('dataset1_2.npz') as data:
    X_train_two = data['arr_0']
    Y_train_two = data['arr_1']
    X_test_two= data['arr_2']
    Y_test_two= data['arr_3']

y_predicted12a = Bayes1a(X_train_two,Y_train_two,X_test_two)
y_predicted12b = Bayes1b(X_train_two,Y_train_two,X_test_two)
y_predicted12c = Bayes1c(X_train_two,Y_train_two,X_test_two)

    
plt.figure(2,figsize=(10,5))
plt.subplot(131)

length_of_dataset_two = len(X_test_two)

for data in range(length_of_dataset_two):
    
    if (y_predicted12a[data] == 1):
        plt.scatter(X_test_two[data][0], X_test_two[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_test_two[data][0], X_test_two[data][1], c='tomato', marker='o' )
               
    if (Y_train_two[data] == 1):
        plt.scatter(X_train_two[data][0], X_train_two[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_train_two[data][0], X_train_two[data][1], c='tomato', marker='o' )
        
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2a ",fontsize=10)

plt.subplot(132)

for data in range(length_of_dataset_two):
    
    if (y_predicted12b[data] == 1):
        plt.scatter(X_test_two[data][0], X_test_two[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_test_two[data][0], X_test_two[data][1], c='tomato', marker='o' ) 
        
    if (Y_train_two[data] == 1):
        plt.scatter(X_train_two[data][0], X_train_two[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_train_two[data][0], X_train_two[data][1], c='tomato', marker='o' )
        
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2b ",fontsize=10)

plt.subplot(133)

for data in range(length_of_dataset_two):
    
    if (y_predicted12c[data] == 1):
        plt.scatter(X_test_two[data][0], X_test_two[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_test_two[data][0], X_test_two[data][1], c='tomato', marker='o' ) 
        
        
    if (Y_train_two[data] == 1):
        plt.scatter(X_train_two[data][0], X_train_two[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_train_two[data][0], X_train_two[data][1], c='tomato', marker='o' )

plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2c",fontsize=10)

plt.show()


###################################################

with np.load('dataset1_3.npz') as data:
    X_train_three = data['arr_0']
    Y_train_three = data['arr_1']
    X_test_three = data['arr_2']
    Y_test_three = data['arr_3'] 

y_predicted13a = Bayes1a(X_train_three,Y_train_three,X_test_three)
y_predicted13b = Bayes1b(X_train_three,Y_train_three,X_test_three)
y_predicted13c = Bayes1c(X_train_three,Y_train_three,X_test_three)
    
plt.figure(3,figsize=(10,5))

plt.subplot(131)

length_of_dataset_three = len(X_test_three)

for data in range(length_of_dataset_three):
    
    if (y_predicted13a[data] == 1):
        plt.scatter(X_test_three[data][0], X_test_three[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_test_three[data][0], X_test_three[data][1], c='tomato', marker='o' )   
    
    if (Y_train_three[data] == 1):
        plt.scatter(X_train_three[data][0], X_train_three[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_train_three[data][0], X_train_three[data][1], c='tomato', marker='o' )
        
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 1a",fontsize=10)

plt.subplot(132)

for data in range(length_of_dataset_three):
    
    if (y_predicted13b[data] == 1):
        plt.scatter(X_test_three[data][0], X_test_three[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_test_three[data][0], X_test_three[data][1], c='tomato', marker='o' ) 
        
    if (Y_train_three[data] == 1):
        plt.scatter(X_train_three[data][0], X_train_three[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_train_three[data][0], X_train_three[data][1],c='tomato', marker='o' )
        
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2b ",fontsize=10)

plt.subplot(133)

for data in range(length_of_dataset_three):
    
    if (y_predicted13c[data] == 1):
        plt.scatter(X_test_three[data][0], X_test_three[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_test_three[data][0], X_test_three[data][1], c='tomato', marker='o' ) 
        
    if (Y_train_three[data] == 1):
        plt.scatter(X_train_three[data][0], X_train_three[data][1], c='lightgreen', marker='o' )
        
    else:
        plt.scatter(X_train_three[data][0], X_train_three[data][1], c='tomato', marker='o' )
        
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 3c ",fontsize=10)

plt.show()



<Figure size 1000x500 with 3 Axes>

KeyboardInterrupt: 

** Cell type : TextWrite ** 
(Write your observations and table of errors here)




** Cell type : TextRead ** 


# Problem 2 : Learning Multiclass Bayes Classifiers from data with Max. Likeli.

Derive Bayes classifiers under assumptions below, and use ML estimators to compute and return the results on a test set. The $4\times 4$ loss matrix giving the loss incurred for predicting $i$ when truth is $j$ is below.

$L=\begin{bmatrix} 0 &1 & 2& 3\\ 1 &0 & 1& 2\\ 2 &1 & 0& 1\\ 3 &2 & 1& 0 \end{bmatrix}$ 

2a) Assume $X|Y=a$ is distributed as Normal with mean $\mu_a$ and variance $I$.

2b) Assume $X|Y=a$ is distributed as Normal with mean $\mu_a$ and variance $\Sigma$.

2c) Assume $X|Y=a$ is distributed as Normal with mean $\mu_a$ and variance $\Sigma_a$.



In [None]:
# Cell type : CodeWrite
# Fill in functions in this cell

import  numpy as np
def Bayes2a(X_train, Y_train, X_test):
    """ Give Bayes classifier prediction for test instances 
    using assumption 2a.

    Arguments:
    X_train: numpy array of shape (n,d)
    Y_train: {1,2,3,4} numpy array of shape (n,)
    X_test : numpy array of shape (m,d)

    Returns:
    Y_test_pred : {1,2,3,4} numpy array of shape (m,)
    
    """
    loss_matrix = [[0,1,2,3],
                   [1,0,1,2],
                   [2,1,0,1],
                   [3,2,1,0]]
    
    x_train_class_1 = []
    x_train_class_2 = []
    x_train_class_3 = []
    x_train_class_4 = []
    
    for (b, c) in zip(X_train, Y_train):
        if c == 1:
            x_train_class_1.append(b)
        elif c == 2:
            x_train_class_2.append(b)
        elif c == 3:
            x_train_class_3.append(b)
        elif c == 4:
            x_train_class_4.append(b)
            
    x_train_class_1_mean = np.mean(x_train_class_1, axis=0)
    
    x_train_class_2_mean = np.mean(x_train_class_2, axis=0)
    
    x_train_class_3_mean = np.mean(x_train_class_3, axis=0)
    x_train_class_4_mean = np.mean(x_train_class_4, axis=0)
    
    prior_for_class_1 = float(len(x_train_class_1)) / float(len(X_train))
    
    prior_for_class_2 = float(len(x_train_class_2)) / float(len(X_train))
    
    prior_for_class_3 = float(len(x_train_class_3)) / float(len(X_train))
    prior_for_class_4 = float(len(x_train_class_4)) / float(len(X_train))
    
    covariance_for_data = np.identity(len(X_train[0]))
    
    Y_test_pred = []
    
    for test_data in np.array(X_test):
        
        probability_vector = []
        
        probability_for_class_1 = probability_function(test_data, x_train_class_1_mean,
                                                       covariance_for_data)*prior_for_class_1
        
        probability_for_class_2 = probability_function(test_data, x_train_class_2_mean,
                                                       covariance_for_data)*prior_for_class_2
        
        probability_for_class_3 = probability_function(test_data, x_train_class_3_mean,
                                                       covariance_for_data)*prior_for_class_3
        
        probability_for_class_4 = probability_function(test_data, x_train_class_4_mean,
                                                       covariance_for_data)*prior_for_class_4
        
        probability_vector.append( probability_for_class_1)
        
        probability_vector.append(probability_for_class_2 )
        
        probability_vector.append(probability_for_class_3 )
        
        probability_vector.append( probability_for_class_4)
       
        dot_product = np.dot( probability_vector,loss_matrix)

        minimum_value = np.argmin(dot_product, axis=0 )
        
        Y_test_pred.append(minimum_value+1)
        
    return Y_test_pred
    #sigma = np.cov(X_train.T)
    
    
def Bayes2b(X_train, Y_train, X_test):
    """ Give Bayes classifier prediction for test instances 
    using assumption 2b.

    Arguments:
    X_train: numpy array of shape (n,d)
    Y_train: {1,2,3,4} numpy array of shape (n,)
    X_test : numpy array of shape (m,d)

    Returns:
    Y_test_pred : {1,2,3,4} numpy array of shape (m,)
    
    """
    loss_matrix = [[0,1,2,3],
                   [1,0,1,2],
                   [2,1,0,1],
                   [3,2,1,0]]
    
    x_train_class_1 = []
    x_train_class_2 = []
    x_train_class_3 = []
    x_train_class_4 = []
    
    for (b, c) in zip(X_train, Y_train):
        if c == 1:
            x_train_class_1.append(b)
        elif c == 2:
            x_train_class_2.append(b)
        elif c == 3:
            x_train_class_3.append(b)
        elif c == 4:
            x_train_class_4.append(b)
            
    x_train_class_1_mean = np.mean(x_train_class_1, axis=0)
    x_train_class_2_mean = np.mean(x_train_class_2, axis=0)
    
    x_train_class_3_mean = np.mean(x_train_class_3, axis=0)
    
    x_train_class_4_mean = np.mean(x_train_class_4, axis=0)
    
    prior_for_class_1 = float(len(x_train_class_1)) / float(len(X_train))
    
    prior_for_class_2 = float(len(x_train_class_2)) / float(len(X_train))
           
    prior_for_class_3 = float(len(x_train_class_3)) / float(len(X_train))
    prior_for_class_4 = float(len(x_train_class_4)) / float(len(X_train))
    
    covariance_for_data = np.cov(np.array(X_train).T)
    
    Y_test_pred = []
    
    for test_data in np.array(X_test):
        
        probability_vector = []
        
        probability_for_class_1 = probability_function(test_data, x_train_class_1_mean,
                                                       covariance_for_data)*prior_for_class_1
        
        probability_for_class_2 = probability_function(test_data, x_train_class_2_mean,
                                                       covariance_for_data)*prior_for_class_2
        
        probability_for_class_3 = probability_function(test_data, x_train_class_3_mean,
                                                       covariance_for_data)*prior_for_class_3
        
        probability_for_class_4 = probability_function(test_data, x_train_class_4_mean,
                                                       covariance_for_data)*prior_for_class_4
        
        probability_vector.append(probability_for_class_1)
        
        probability_vector.append(probability_for_class_2)
        
        probability_vector.append(probability_for_class_3)
        
        probability_vector.append(probability_for_class_4)
       
        dot_product = np.dot(probability_vector,loss_matrix)

        minimum_value = np.argmin(dot_product, axis=0)
        
        Y_test_pred.append(minimum_value+1)
        
    return Y_test_pred

def Bayes2c(X_train, Y_train, X_test):
    """ Give Bayes classifier prediction for test instances 
    using assumption 2c.

    Arguments:
    X_train: numpy array of shape (n,d)
    Y_train: {1,2,3,4} numpy array of shape (n,)
    X_test : numpy array of shape (m,d)

    Returns:
    Y_test_pred : {1,2,3,4} numpy array of shape (m,)
    
    """
    loss_matrix = [[0,1,2,3],
                   [1,0,1,2],
                   [2,1,0,1],
                   [3,2,1,0]]
    
    x_train_class_1 = []
    x_train_class_2 = []
    x_train_class_3 = []
    x_train_class_4 = []
    
    for (b, c) in zip(X_train, Y_train):
        if c == 1:
            x_train_class_1.append(b)
        elif c == 2:
            x_train_class_2.append(b)
        elif c == 3:
            x_train_class_3.append(b)
        elif c == 4:
            x_train_class_4.append(b)
            
    x_train_class_1_mean = np.mean(x_train_class_1, axis=0)
    
    x_train_class_2_mean = np.mean(x_train_class_2, axis=0)
    x_train_class_3_mean = np.mean(x_train_class_3, axis=0)
    
    x_train_class_4_mean = np.mean(x_train_class_4, axis=0)
    
    prior_for_class_1 = float(len(x_train_class_1)) / float(len(X_train))
    
    prior_for_class_2 = float(len(x_train_class_2)) / float(len(X_train))
    prior_for_class_3 = float(len(x_train_class_3)) / float(len(X_train))
    prior_for_class_4 = float(len(x_train_class_4)) / float(len(X_train))
    
    covariance_for_data_of_class_1 = np.cov(np.array(x_train_class_1).T)
    
    covariance_for_data_of_class_2 = np.cov(np.array(x_train_class_2).T)
    
    covariance_for_data_of_class_3 = np.cov(np.array(x_train_class_3).T)
    
    
    covariance_for_data_of_class_4 = np.cov(np.array(x_train_class_4).T)
    
    
    Y_test_pred = []
    
    for test_data in np.array(X_test):
        probability_vector = []
        
        probability_for_class_1 = probability_function(test_data, x_train_class_1_mean,
                                                       covariance_for_data_of_class_1)*prior_for_class_1
        
        probability_for_class_2 = probability_function(test_data, x_train_class_2_mean,
                                                       covariance_for_data_of_class_2)*prior_for_class_2
        
        probability_for_class_3 = probability_function(test_data, x_train_class_3_mean,
                                                       covariance_for_data_of_class_3)*prior_for_class_3
        
        probability_for_class_4 = probability_function(test_data, x_train_class_4_mean,
                                                       covariance_for_data_of_class_4)*prior_for_class_4
        
        probability_vector.append(probability_for_class_1)
        
        probability_vector.append(probability_for_class_2)
        
        probability_vector.append(probability_for_class_3)
        
        probability_vector.append(probability_for_class_4)
       
        dot_product = np.dot(probability_vector,loss_matrix)

        minimum_value = np.argmin(dot_product, axis=0)
        Y_test_pred.append(minimum_value+1)
    #print(Y_test_pred)
    return Y_test_pred

    
    


    


In [None]:
# Cell type : Convenience

# Testing the functions above

# Data 1
import numpy as np

# mat1=np.array([[1.,0.],[0.,1.]])
# mat2=np.array([[1.,0.],[0.,1.]])
# mat3=np.array([[1.,0.],[0.,1.]])
# mat4=np.array([[1.,0.],[0.,1.]])

# X_train_1 = np.dot(np.random.randn(1000,2), mat1)+np.array([[0.,0.]])
# X_train_2 = np.dot(np.random.randn(1000,2), mat2)+np.array([[0.,2.]])
# X_train_3 = np.dot(np.random.randn(1000,2), mat3)+np.array([[2.,0.]])
# X_train_4 = np.dot(np.random.randn(1000,2), mat4)+np.array([[2.,2.]])
# 
# X_train = np.concatenate((X_train_1, X_train_2, X_train_3, X_train_4), axis=0)
# Y_train = np.concatenate(( np.ones(1000), 2*np.ones(1000), 3*np.ones(1000), 4*np.ones(1000) ))
# 
# 
# X_test_1 = np.dot(np.random.randn(1000,2), mat1)+np.array([[0.,0.]])
# X_test_2 = np.dot(np.random.randn(1000,2), mat2)+np.array([[0.,2.]])
# X_test_3 = np.dot(np.random.randn(1000,2), mat3)+np.array([[2.,0.]])
# X_test_4 = np.dot(np.random.randn(1000,2), mat4)+np.array([[2.,2.]])
# 
# X_test = np.concatenate((X_test_1, X_test_2, X_test_3, X_test_4), axis=0)
# Y_test = np.concatenate(( np.ones(1000), 2*np.ones(1000), 3*np.ones(1000), 4*np.ones(1000) ))


#Y_pred_test_2a = Bayes2a(X_train, Y_train, X_test)
#Y_pred_test_2b = Bayes2b(X_train, Y_train, X_test)
#Y_pred_test_2c = Bayes2c(X_train, Y_train, X_test)

**Cell type : TextRead**

# Problem 2

2d) Run the above three algorithms (Bayes2a,2b and 2c), for the two datasets given (dataset2_1.npz, dataset2_2.npz) in the cell below.

In the next CodeWrite cell, Plot all the classifiers (3 classification algos on 2 datasets = 6 plots) on a 2d plot (color the 4 areas classified as 1,2,3 and 4 differently). Add the training data points also on the plot. Plots to be organised as follows: One plot for each dataset, with three subplots in each for the three classifiers. Label the 6 plots appropriately. 

In the next Textwrite cell, summarise your observations regarding the six learnt classifiers. Give the *expected loss* (use the Loss matrix given in the problem.) of the three classifiers on the two datasets as 2x3 table, with appropriately named rows and columns. Also, give the 4x4 confusion matrix of the final classifier for all three algorithms and both datasets. 


In [None]:
# Cell type : CodeWrite
# write the code for loading the data, running the three algos, and plotting here. 
# (Use the functions written previously.)
import numpy as np

with np.load('dataset2_1.npz') as data:
    X_train_one = data['arr_0']
    Y_train_one = data['arr_1']
    X_test_one = data['arr_2']
    Y_test_one = data['arr_3']

y_predicted21a = Bayes2a(X_train_one,Y_train_one,X_test_one)
y_predicted21b = Bayes2b(X_train_one,Y_train_one,X_test_one)
y_predicted21c = Bayes2c(X_train_one,Y_train_one,X_test_one)


plt.figure(4,figsize=(10,5))

plt.subplot(131)

length_of_dataset_one = len(X_test_one)

for data in range(length_of_dataset_one):
    
    if (y_predicted21a[data] == 1):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='r', marker='o' )
    
    elif (y_predicted21a[data] == 2):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='g', marker='o' )
    
    elif (y_predicted21a[data] == 3):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='b', marker='o' )
    
    elif (y_predicted21a[data] == 4):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='y', marker='o' )
        
for data in range(length_of_dataset_one):
    
    if (Y_train_one[data] == 1):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='r', marker='o' )
        
    elif (Y_train_one[data] == 2):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='g', marker='o' )
    
    elif (Y_train_one[data] == 3):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='b', marker='o' )
    
    elif (Y_train_one[data] == 4):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='y', marker='o' )
    
    
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2a",fontsize=10)
        
    
    
plt.subplot(132)

length_of_dataset_one = len(X_test_one)

for data in range(length_of_dataset_one):
    
    if (y_predicted21b[data] == 1):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='r', marker='o' )
    
    elif (y_predicted21b[data] == 2):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='g', marker='o' )
    
    elif (y_predicted21b[data] == 3):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='b', marker='o' )
    
    elif (y_predicted21b[data] == 4):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='y', marker='o' )
        
for data in range(length_of_dataset_one):
    
    if (Y_train_one[data] == 1):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='r', marker='o' )
        
    elif (Y_train_one[data] == 2):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='g', marker='o' )
    
    elif (Y_train_one[data] == 3):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='b', marker='o' )
    
    elif (Y_train_one[data] == 4):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='y', marker='o' )
    
    
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2b",fontsize=10)
        
    
plt.subplot(133)

length_of_dataset_one = len(X_test_one)

for data in range(length_of_dataset_one):
    
    if (y_predicted21c[data] == 1):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='r', marker='o' )
    
    elif (y_predicted21c[data] == 2):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='g', marker='o' )
    
    elif (y_predicted21c[data] == 3):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='b', marker='o' )
    
    elif (y_predicted21c[data] == 4):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='y', marker='o' )
        
for data in range(length_of_dataset_one):
    
    if (Y_train_one[data] == 1):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='r', marker='o' )
        
    elif (Y_train_one[data] == 2):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='g', marker='o' )
    
    elif (Y_train_one[data] == 3):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='b', marker='o' )
    
    elif (Y_train_one[data] == 4):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='y', marker='o' )
    
    
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2c",fontsize=10)

plt.show()

##############################################################

with np.load('dataset2_2.npz') as data:
    X_train_one = data['arr_0']
    Y_train_one = data['arr_1']
    X_test_one = data['arr_2']
    Y_test_one = data['arr_3']

y_predicted22a = Bayes2a(X_train_one,Y_train_one,X_test_one)
y_predicted22b = Bayes2b(X_train_one,Y_train_one,X_test_one)
y_predicted22c = Bayes2c(X_train_one,Y_train_one,X_test_one)


plt.figure(4,figsize=(10,5))

plt.subplot(131)

length_of_dataset_one = len(X_test_one)

for data in range(length_of_dataset_one):
    
    if (y_predicted22a[data] == 1):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='r', marker='o' )
    
    elif (y_predicted22a[data] == 2):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='g', marker='o' )
    
    elif (y_predicted22a[data] == 3):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='b', marker='o' )
    
    elif (y_predicted22a[data] == 4):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='y', marker='o' )
        
for data in range(length_of_dataset_one):
    
    if (Y_train_one[data] == 1):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='r', marker='o' )
        
    elif (Y_train_one[data] == 2):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='g', marker='o' )
    
    elif (Y_train_one[data] == 3):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='b', marker='o' )
    
    elif (Y_train_one[data] == 4):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='y', marker='o' )
    
    
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2a",fontsize=10)
        
    
    
plt.subplot(132)

length_of_dataset_one = len(X_test_one)

for data in range(length_of_dataset_one):
    
    if (y_predicted22b[data] == 1):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='r', marker='o' )
    
    elif (y_predicted22b[data] == 2):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='g', marker='o' )
    
    elif (y_predicted22b[data] == 3):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='b', marker='o' )
    
    elif (y_predicted22b[data] == 4):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='y', marker='o' )
        
for data in range(length_of_dataset_one):
    
    if (Y_train_one[data] == 1):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='r', marker='o' )
        
    elif (Y_train_one[data] == 2):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='g', marker='o' )
    
    elif (Y_train_one[data] == 3):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='b', marker='o' )
    
    elif (Y_train_one[data] == 4):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='y', marker='o' )
    
    
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2b",fontsize=10)
        
    
plt.subplot(133)

length_of_dataset_one = len(X_test_one)

for data in range(length_of_dataset_one):
    
    if (y_predicted22c[data] == 1):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='r', marker='o' )
    
    elif (y_predicted22c[data] == 2):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='g', marker='o' )
    
    elif (y_predicted22c[data] == 3):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='b', marker='o' )
    
    elif (y_predicted22c[data] == 4):
        plt.scatter(X_test_one[data][0], X_test_one[data][1], c='y', marker='o' )
        
for data in range(length_of_dataset_one):
    
    if (Y_train_one[data] == 1):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='r', marker='o' )
        
    elif (Y_train_one[data] == 2):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='g', marker='o' )
    
    elif (Y_train_one[data] == 3):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='b', marker='o' )
    
    elif (Y_train_one[data] == 4):
        plt.scatter(X_train_one[data][0], X_train_one[data][1], c='y', marker='o' )
    
    
plt.xlabel("x1",fontsize=8)

plt.ylabel("x2", fontsize=8)

plt.title("classifier 2c",fontsize=10)

plt.show()

** Cell type : TextWrite ** 
(Write your observations and table of errors here)



**Cell type : TextRead **

# Problem 3 : Bias-Variance analysis in regression

Do bias variance analysis for the following setting: 

$X \sim Unif([-1,1]\times[-1,1])$

$Y=\exp(-4*||X-a||^2) + \exp(-4*||X-b||^2) + \exp(-4*||X-c||^2)$

where $a=[0.5,0.5], b=[-0.5,0.5], c=[0.5, -0.5]$.

Regularised Risk = $\frac{1}{m} \sum_{i=1}^m (w^\top \phi(x_i) - y_i)^2 + \frac{\lambda}{2} ||w||^2 $ 

Sample 50 (X,Y) points from above distribution, and do ridge regularised polynomial regression with degrees=[1,2,4,8,16] and regularisation parameters ($\lambda$) = [1e-9, 1e-7, 1e-5, 1e-3, 1e-1, 1e1]. Repeat for 100 times, and estimate the bias and variance for all 15 algorithms. You may approximate the distribution over X by discretising the $[-1,1]\times[-1,1]$ space into 10000 points. (Both expectations over S and (x,y) are simply estimates due to the finiteness of our experiments and sample)
 
3a) For each of the 30 algorithms (corresponding to 5 degrees and 6 lambda values) analyse the contour plot of the estimated $f_S$ for 3 different training sets. And the average $g(x) = E_S [f_S(x)]$. Write one function for doing everything in the code cell below. 

3b) In the next text cell, give the Bias and Variance computed as a $5\times 6$ matrix, appropriately label the rows and columns. And give your conclusion in one or two sentences. 




In [136]:
# Cell type : CodeWrite
from numpy import linalg as LA
import matplotlib.pyplot as plt
import random
coordinates_list = []
def generate_X_train():
    global coordinates_list
    x = np.linspace(-1,1,10)   
    for xcord in x :
        for ycord in x:
            a= [xcord,ycord]
            coordinates_list.append(a)
            
generate_X_train()
def polynomial_regression_ridge_pred(X_test, wt_vector, degree=1):
    """ Give the value of the learned polynomial function, on test data.

    Arguments:
    X_test: numpy array of shape (n,d)
    wt_vec: numpy array of shape (d',)

    Returns:
    Y_test_pred : numpy array of shape (n,)
    
    """
    Y_test_pred = []
    feature_mapped_X_test = generate_feature_mapped_data(X_test,degree)
    for test_data in X_test:
        Y_test_pred.append(np.dot(np.array(test_data), wt_vector))
    return Y_test_pred
    

    
    
def visualise_polynomial_2d(wt_vector, degree, title=""):
    """
    Give a contour plot over the 2d-data domain for the learned polynomial given by the weight
     vector wt_vector.
    
    """
   
   X,Y = np.meshgrid(np.linspace(-1,1,100), np.linspace(-1,1,100))

    z = np.random.rand(100,100)
    for i in range(100):
        for j in range(100):
            feature_mapped_data = feature_mapping_function(X[i][j],Y[i][j],degree)
            z[i][j] = np.dot(wt_vector,feature_mapped_data)
            

    plt.contourf(X,Y,z,levels=np.linspace(0.,1.2 , 20))
    plt.title('learned function : degree= '+ str(degree) + title)
    plt.colorbar()
    
def generate_feature_mapped_data(X_train,degree):
    feature_mapped_X_data = []
    for data in X_train:
        feature_mapped_data = feature_mapping_function(data[0],data[1],degree)
        feature_mapped_X_data.append(feature_mapped_data)
    return feature_mapped_X_data   
    
def feature_mapping_function(x1,x2,degree):
        feature_mapped_vector = []
        for i in range(degree+1):
            for j in range(i+1):
                feature_mapped_vector.append((x1**j)*x2**(i-j))
        return feature_mapped_vector


def sample_X_train():
    global coordinates_list
    return random.sample(coordinates_list, 50)
def sample_Y_train(sample_points):
   
    Y_train = []
    for points in sample_points:
        Y_train.append(y_value(points))
    return Y_train

def y_value(points):
    a =[0.5,0.5]
    b = [-0.5,0.5]
    c = [0.5,-0.5]
    #sub = np.subtract(points,a)
    x = LA.norm(np.asmatrix(np.subtract(points,a)), 'fro')
    y = LA.norm(np.asmatrix(np.subtract(points,b)), 'fro')
    z = LA.norm(np.asmatrix(np.subtract(points,c)), 'fro')
    value = np.exp(-4*(x)**2) + np.exp(-4*(y)**2) + np.exp(-4*(z)**2)

    return value

def polynomial_regression_ridge_train(X_train, Y_train, degree=1, reg_param=0.01):
    """ Give best polynomial fitting data, based on empirical squared error minimisation.

    Arguments:
    X_train: numpy array of shape (n,d)
    Y_train: numpy array of shape (n,)

    Returns:
    w : numpy array of shape (d',) with appropriate d'
    
    """
    
    fo_identity_matrix = len(X_train[0])
   
    tranposed_data = np.array(X_train).transpose()
    
    inside_bracket =  np.linalg.inv( np.dot(tranposed_data,np.array(X_train)) + \
                      len(X_train)*reg_param*.05*np.identity(fo_identity_matrix))
  

    inverse_of_matrix = np.linalg.inv(inside_bracket) 
   
    
    xtrain_tarnpose_mul_y_train = np.dot(tranposed_data,Y_train) 
    w_hat = np.dot(inverse_of_matrix,xtrain_tarnpose_mul_y_train)
    x = LA.norm(np.asmatrix(w_hat), 'fro')
    
    return w_hat/x
    

    
    
def compute_BV_error_sample_plot(degree, reg_param, num_training_samples=50):
    global coordinates_list
    
    weight_list_for_100_samples = []
    for i in range(100):
        sample_x_train = sample_X_train()
        sampl_y_train = sample_Y_train(sample_x_train)
        sample_featured_x_train = generate_feature_mapped_data(sample_x_train,degree)
        
        weight_list_for_100_samples.append(polynomial_regression_ridge_train\
                                               (sample_featured_x_train,sampl_y_train,\
                                                degree,reg_param))
   
    g_x = np.mean(weight_list_for_100_samples, axis=0)
    visualise_polynomial_2d(g_x, degree, title="")
    

    feature_mapped_Xtrain = generate_feature_mapped_data(coordinates_list,degree)
    sum_for_each_weight = []
    #variance finding
    for weight in weight_list_for_100_samples :
        sum =0
        for co_ordinate in feature_mapped_Xtrain:
            sum = sum + (np.dot(co_ordinate, weight) - np.dot(co_ordinate,g_x))**2
        sum_for_each_weight.append(np.average(sum))
    variance = np.average(sum_for_each_weight)
    
    #bias finding
    bias = 0
    
    for co_ordinate in coordinates_list:
        f = feature_mapping_function(co_ordinate[0],co_ordinate[1],\
                                                           degree)
        #print("feature mapped",f)
        g = g_x
        #print("g_x",g)
        
        h = y_value(co_ordinate)
        #print("y_value",h)
        
        product = np.dot(g,f)
        #print("np.dot(g,f)",product)
        bias = bias + (product-h)**2
        #print(count)
        #print("bias",bias)
        #print("********************************")
        
    bias = bias/10000.0 
    
    #mean_square_error
    mean_square_error = []
    for weight in weight_list_for_100_samples :
        sum =0
        for co_ordinate in coordinates_list:
            sum = sum + (np.dot(feature_mapping_function(co_ordinate[0],co_ordinate[1],\
                                                degree), weight) - y_value(co_ordinate))**2
        mean_square_error.append(np.average(sum))
    error = np.average(mean_square_error)
        
    return bias,variance,error
        
        
        
        
            
            
            
        
    
    """Write code for generating data, fitting polynomial for given degree and reg_param. 
    Use num_training_samples samples for training.
        
    Compute the $f_S$ of 100 runs. 

    Plot 3 examples of learned function to illustrate how learned function varies 
    with different training samples. Also plot the average $f_S$ of all 100 runs.
    
    In total 4 subplots in one plot with appropriate title including degree and lambda value.
    
    Fill code to compute bias and variance, and average mean square error using the computed 100 $f_S$ functions.
    
    All contourplots are to be drawn with levels=np.linspace(0,1.2,20)
    
    Also return bias, variance, mean squared error. """


for degree in [1,2,4,8,16]:
     for reg_param in [1e-9, 1e-7, 1e-5, 1e-3, 1e-1, 1e1]:
        b,v,e = compute_BV_error_sample_plot(2, 1e-9)
        print('================================')
        print('Degree= '+str(2)+' lambda= '+str(1e-9))
        print('Bias = '+str(b))
        print('Variance = '+str(v))
        print('MSE = '+str(e))



visualise 2d data [array([[1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       ...,
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.]]), array([[-1.        , -1.        , -1.        , ..., -1.        ,
        -1.        , -1.        ],
       [-0.97979798, -0.97979798, -0.97979798, ..., -0.97979798,
        -0.97979798, -0.97979798],
       [-0.95959596, -0.95959596, -0.95959596, ..., -0.95959596,
        -0.95959596, -0.95959596],
       ...,
       [ 0.95959596,  0.95959596,  0.95959596, ...,  0.95959596,
         0.95959596,  0.95959596],
       [ 0.97979798,  0.97979798,  0.97979798, ...,  0.97979798,
         0.97979798,  0.97979798],
       [ 1.        ,  1.        ,  1.        , ...,  1.        ,
         1.        ,  1.        ]]), array([[-1.        , -0.97979798, -0.95959596, ...,  0.95959596,
         0.97979798,  1.        ],
       [-1.        , -0.9

NameError: name 'Z' is not defined

In [None]:
#Cell type: convenience
# X_train_pos = np.random.randn(1000,2)+np.array([[1.,2.]])
# X_train_neg = np.random.randn(1000,2)+np.array([[2.,4.]])
# X_train = np.concatenate((X_train_pos, X_train_neg), axis=0)
# Y_train = np.concatenate(( np.ones(1000), -1*np.ones(1000) ))
# X_test_pos = np.random.randn(1000,2)+np.array([[1.,2.]])
# X_test_neg = np.random.randn(1000,2)+np.array([[2.,4.]])
# X_test = np.concatenate((X_test_pos, X_test_neg), axis=0)
# Y_test = np.concatenate(( np.ones(1000), -1*np.ones(1000) ))
# sample_x_train = sample_X_train()
# #print("sample_x_train",sample_x_train)
# sampl_y_train = sample_Y_train(sample_x_train)
# print("sampl_y_train", sampl_y_train)
# sample_featured_x_train = generate_feature_mapped_data(sample_x_train,1)
# #print("sample_featured_x_train",sample_featured_x_train)
# polynomial_regression_ridge_train(sample_featured_x_train, sampl_y_train, degree=1, reg_param=0.01)


** Cell type: TextWrite **
Give the biases and variances computed for the various algorithms with various degrees and lambdas and summarise your findings.



** Cell type : TextRead **

# Problem 4 : Analyse overfitting and underfitting in Regression


Consider the 2-dimensional regression dataset "dateset4_1.npz". Do polynomial ridge regression for degrees = [1,2,4,8,16], and regularisation parameter $\lambda$ = [1e-9, 1e-7, 1e-5, 1e-3, 1e-1, 1e1]. Do all the above by using three different subset sizes of the training set : 50, 100, 200 and 1000. (Just take the first few samples of X_train and Y_train.)

Regularised Risk = $\frac{1}{m} \sum_{i=1}^m (w^\top \phi(x_i) - y_i)^2 + \frac{\lambda}{2} ||w||^2 $ 

The lambda value is given by the regularisation parameter. 

In the next codewrite cell, for each training set size compute how the train and test squared error varies with degree (by changing $\phi$) and regularisation parameter (changing $\lambda$). Compute the "best" degree and regularisation parameter based on the test squared error. Give a contour plot of the learned function for the chosen hyper-parameters, with appropriate title including the hyperparameters. Total number of figures = 4 (one for each training set size.)

Summarise your findings in the next tex cell in a few sentences. And reproduce the tables showing train and test error for various training sizes, with appropriate row and column names.




In [122]:
# Cell type : CodeWrite 
import numpy as np
import matplotlib.pyplot as plt
with np.load('dataset4_1.npz') as data:
    X_train = data['arr_0']
    Y_train = data['arr_1']
    X_test = data['arr_2']
    Y_test = data['arr_3']
    
def train_error(w_vector,sample_x_train,sampl_y_train):
    total_error = 0
    for i in range(len(sample_x_train)):
        total_error = total_error + (np.dot(w_vector,sample_x_train[i]) - sampl_y_train[i])**2
    return float(total_error)/len(sample_x_train)

def test_error(w_vector, sample_x_test, sampl_y_test):
    total_error = 0
    for i in range(len(sample_x_test)):
        total_error = total_error + (np.dot(w_vector,sample_x_test[i]) - sampl_y_test[i])**2
    return float(total_error)/len(sample_x_test)
    
        
        
    
minimum_test_error
best_degree
best_reg_param 
def plot_the_graph_for_points(sample_points):
    global minimum_test_error 
    minimum_test_error = 2323432433443
    global best_degree
    global best_reg_param
    global Y_test
    
    for degree in [1,2,4,8,16]:
        for reg_param in [1e-9, 1e-7, 1e-5, 1e-3, 1e-1, 1e1]:

            sample_x_train = X_train[0:sample_points,:]
            sampl_y_train = Y_train[:sample_points]

            sample_featured_x_train = generate_feature_mapped_data(sample_x_train,degree)
            w_vector = polynomial_regression_ridge_train(sample_featured_x_train, sampl_y_train, degree, reg_param)

            trained_error = train_error(w_vector,sample_featured_x_train,sampl_y_train)
            print(trained_error)

            sample_featured_x_test = generate_feature_mapped_data(X_test,degree)
            print("&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&")
            tested_error = test_error(w_vector, sample_featured_x_test, Y_test )
            print(tested_error)
            #print(degree ,  reg_param,  trained_error,  tested_error )
            #print(minimum_test_error)

            if tested_error<minimum_test_error :
                minimum_test_error = tested_error
                best_degree = degree
                best_reg_param = reg_param
    print("results",best_degree,best_reg_param)

def plotting_the_graph():
    #for size 50
    plot_the_graph_for_points(50)
    
    #for size 100
    plot_the_graph_for_points(100)
    
    #for size 200
    plot_the_graph_for_points(200)
    
    #for size 1000
    plot_the_graph_for_points(1000)
    
plotting_the_graph()
    
            
            
        
        

0.8168239735973664
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
0.9969201258378148
0.8168239733854373
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
0.9969201258449569
0.8168239521925216
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
0.9969201265590775
0.8168218329669785
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
0.9969201979198459
0.8166105691692198
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
0.996926823433949
0.8008057436903526
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
0.9952458190193434
1.051560479356989
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1.2184115828980415
1.0515604786714385
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1.2184115823907122
1.0515604101163278
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1.2184115316575916
1.051553554657866
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1.2184064582768368
1.0508685424114896
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1.2178984439471745
0.9915345573267073
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1.168565772859734
1.1707834814345208
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1.273767893800578
1.1707834802622168
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
1.27376789296047

** Cell type : TextWrite **