For Questions 13-15, consider the target function:

$f(x_1,x_2)=\mathrm{sign}(x_1^2+x_2^2-0.6)$

Generate a training set of $N = 1000$ points on $\mathcal{X} = [-1,1] \times [-1,1]$, picking each $\mathbf{x} \in \mathcal{X}$ with uniform probability.

Generate simulated noise by flipping the sign of the output in a random $10\%$ subset of the generated training set.
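
The setup above can be sketched as a single vectorized helper. This is only an illustration: the name `make_noisy_circle`, its parameters, and the fixed seed are assumptions and do not appear in the assignment.

```python
import numpy as np

def make_noisy_circle(n=1000, noise=0.1, rng=None):
    """Sample n points uniformly on [-1, 1]^2, label them with f, flip a `noise` fraction."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 0.6)
    y[y == 0] = 1.0  # tie-breaking convention (an assumption; ties have probability zero)
    # flip the labels of a random subset containing exactly noise * n points
    flip = rng.choice(n, size=int(noise * n), replace=False)
    y[flip] *= -1
    return X, y

X, y = make_noisy_circle(rng=np.random.default_rng(0))
clean = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 0.6)
print(X.shape, np.mean(y != clean))  # fraction of labels that differ from the noiseless target
```

Choosing the flipped indices without replacement guarantees that exactly 10% of the labels change, rather than 10% on average.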


$13.$ Carry out Linear Regression without transformation, i.e. with feature vector $(1, x_1, x_2)$, to find the weight $\mathbf{w}_{\rm lin}$, and use $\mathbf{w}_{\rm lin}$ directly for classification.

What is the closest value to the classification (0/1) in-sample error $E_{\rm in}$? 

Run the experiment 1000 times and take the average $E_{\rm in}$ in order to reduce variation in your results.
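
For reference, the least-squares weights can also be obtained directly from the pseudo-inverse, $\mathbf{w}_{\rm lin} = Z^{\dagger}\mathbf{y}$, where $Z$ holds the feature vectors as rows. The single-run sketch below assumes a fixed seed and illustrative variable names (`Z`, `w_lin`, `e_in`); it is not the notebook's own solution, which uses scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# one noisy training set, generated as described above
X = rng.uniform(-1, 1, size=(N, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 0.6)
flip = rng.choice(N, size=N // 10, replace=False)
y[flip] *= -1

Z = np.column_stack([np.ones(N), X])     # feature vector (1, x1, x2)
w_lin = np.linalg.pinv(Z) @ y            # least-squares solution via the pseudo-inverse
e_in = np.mean(np.sign(Z @ w_lin) != y)  # 0/1 in-sample error of sign(w . z)
print(w_lin, e_in)
```

Because the target is a circle centered at the origin, no line through the square separates the classes well, so an error close to one half is what one would expect here.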

In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import SGDClassifier

def f(x):
    return x[0]**2 + x[1]**2 - 0.6

In [2]:
e = []
# run for 1000 times
for _ in range(1000):
    # Generate the training set
    X_train = np.random.uniform(-1, 1, size=(1000, 2))
    y_train = np.sign(list(map(f, X_train)))
    X_train = np.insert(X_train, 0, 1, axis=1)

    # Flip 10% of the signs
    flip_ids = np.random.choice(1000, 100, replace=False)
    y_train[flip_ids] = -y_train[flip_ids]

    # run linear regression
    LR = LinearRegression(fit_intercept=False)
    LR.fit(X_train, y_train)

    # calculate in-sample error
    y_pred = np.sign(LR.predict(X_train))
    e_in = np.sum(y_train != y_pred) / len(y_train)
    e.append(e_in)
    
e = sum(e) / len(e)
print('Average in-sample error: ' + str(e))

Average in-sample error: 0.504855


$14.$ Now, transform the training data into the following nonlinear feature vector:

$(1, x_1, x_2, x_1x_2, x_1^2, x_2^2)$

Find the vector $\tilde{\mathbf{w}}$ that corresponds to the solution of Linear Regression, and take it for classification. 

Which of the following hypotheses is closest to the one you find using Linear Regression on the transformed input? 

"Closest" here means the candidate that agrees the most with your hypothesis, i.e. the one with the highest probability of agreeing with it on a randomly selected point.

In [3]:
coefs = []
for _ in range(1000):
    # Generate the training set
    X_train = np.random.uniform(-1, 1, size=(1000, 2))
    y_train = np.sign(list(map(f, X_train)))
    X_train = np.insert(X_train, 0, 1, axis=1)
    X_train = np.insert(X_train, X_train.shape[1], X_train[:, 1] * X_train[:, 2], axis=1)
    X_train = np.insert(X_train, X_train.shape[1], X_train[:, 1]**2, axis=1)
    X_train = np.insert(X_train, X_train.shape[1], X_train[:, 2]**2, axis=1)
    
    # Flip 10% of the signs
    flip_ids = np.random.choice(1000, 100, replace=False)
    y_train[flip_ids] = -y_train[flip_ids]

    # run linear regression
    LR = LinearRegression(fit_intercept=False)
    LR.fit(X_train, y_train)

    # bookkeep the weights
    coefs.append(LR.coef_)
    
coef = np.average(coefs, axis=0)
print("Weights: " + str(coef))

Weights: [ -9.92140857e-01   9.62066238e-04   1.85701329e-03  -1.28203618e-03
   1.56198573e+00   1.55271854e+00]
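
One way to compare the fitted hypothesis with a candidate is to estimate their agreement probability by Monte Carlo sampling. In the sketch below, `w_fit` is the averaged weight vector printed above, rounded to three decimals, while `w_cand` is a hypothetical candidate chosen only for illustration; the helper names `transform` and `agreement` are likewise assumptions.

```python
import numpy as np

def transform(X):
    """Map (x1, x2) to the feature vector (1, x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

def agreement(w_a, w_b, n=100_000, rng=None):
    """Monte Carlo estimate of P[sign(w_a . z) == sign(w_b . z)] for x uniform on [-1, 1]^2."""
    rng = np.random.default_rng() if rng is None else rng
    Z = transform(rng.uniform(-1, 1, size=(n, 2)))
    return np.mean(np.sign(Z @ w_a) == np.sign(Z @ w_b))

w_fit = np.array([-0.992, 0.001, 0.002, -0.001, 1.562, 1.553])  # rounded averaged weights
w_cand = np.array([-1.0, -0.05, 0.08, 0.13, 1.5, 1.5])          # hypothetical candidate
print(agreement(w_fit, w_cand, rng=np.random.default_rng(0)))
```

Running `agreement` once per candidate and taking the largest value picks the closest hypothesis in the sense defined in the question.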


$15.$ Following Question 14, what is the closest value to the classification out-of-sample error $E_{\rm out}$ of your hypothesis? 

Estimate it by generating a new set of 1000 points and adding noise as before. Average over 1000 runs to reduce the variation in your results.

In [4]:
e = []
for _ in range(1000):
    # Generate the training set
    X_train = np.random.uniform(-1, 1, size=(1000, 2))
    y_train = np.sign(list(map(f, X_train)))
    X_train = np.insert(X_train, 0, 1, axis=1)
    X_train = np.insert(X_train, X_train.shape[1], X_train[:, 1] * X_train[:, 2], axis=1)
    X_train = np.insert(X_train, X_train.shape[1], X_train[:, 1]**2, axis=1)
    X_train = np.insert(X_train, X_train.shape[1], X_train[:, 2]**2, axis=1)

    # Flip 10% of the signs
    flip_ids = np.random.choice(1000, 100, replace=False)
    y_train[flip_ids] = -y_train[flip_ids]

    # run linear regression
    LR = LinearRegression(fit_intercept=False)
    LR.fit(X_train, y_train)

    # Generate the test set
    X_test = np.random.uniform(-1, 1, size=(1000, 2))
    y_test = np.sign(list(map(f, X_test)))
    X_test = np.insert(X_test, 0, 1, axis=1)
    X_test = np.insert(X_test, X_test.shape[1], X_test[:, 1] * X_test[:, 2], axis=1)
    X_test = np.insert(X_test, X_test.shape[1], X_test[:, 1]**2, axis=1)
    X_test = np.insert(X_test, X_test.shape[1], X_test[:, 2]**2, axis=1)

    # Flip 10% of the signs
    flip_ids = np.random.choice(1000, 100, replace=False)
    y_test[flip_ids] = -y_test[flip_ids]

    # run prediction and calculate error
    y_pred = np.sign(LR.predict(X_test))
    e_out = np.sum(y_pred != y_test) / len(y_pred)
    e.append(e_out)
    
e = sum(e) / len(e)
print("Average out-of-sample error: " + str(e))

Average out-of-sample error: 0.125684
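
A short calculation helps interpret this number. Let $\mu = P[g(\mathbf{x}) \neq f(\mathbf{x})]$ be the disagreement between the learned hypothesis $g$ and the noiseless target $f$, and assume each test label is flipped with probability $0.1$ independently of $\mathbf{x}$ (a close approximation to flipping a random 10% subset). The error measured against the noisy labels $y$ is then

$$P[g(\mathbf{x}) \neq y] = 0.9\,\mu + 0.1\,(1 - \mu) = 0.1 + 0.8\,\mu .$$

So $0.1$ is the floor that even $g = f$ would reach, and under this assumption the measured $0.125684$ corresponds to roughly $\mu \approx (0.125684 - 0.1)/0.8 \approx 0.032$.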


For Questions 18-20, you will play with logistic regression. Please use the following set for training:

https://www.csie.ntu.edu.tw/~htlin/mooc/datasets/mlfound_algo/hw3_train.dat

and the following set for testing:

https://www.csie.ntu.edu.tw/~htlin/mooc/datasets/mlfound_algo/hw3_test.dat

In [5]:
data = np.genfromtxt('hw3_train.dat')
X_train = data[:, :-1]
y_train = data[:, -1]

data = np.genfromtxt('hw3_test.dat')
X_test = data[:, :-1]
y_test = data[:, -1]

$18.$ Implement the fixed learning rate gradient descent algorithm for logistic regression. Run the algorithm with $\eta = 0.001$ and $T = 2000$. 

What is $E_{\rm out}(g)$ from your algorithm, evaluated using the 0/1 error on the test set?
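
The question describes full-batch gradient descent, in which every step uses the gradient of the cross-entropy error over all $N$ training examples. A minimal NumPy sketch of that update is given here, assuming labels in $\{-1, +1\}$. The helper names `logistic_gd` and `zero_one_error` and the synthetic demo data are illustrative and not part of the assignment.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def logistic_gd(X, y, eta=0.001, T=2000):
    """Fixed-learning-rate batch gradient descent on E_in(w) = mean(ln(1 + exp(-y w.x)))."""
    w = np.zeros(X.shape[1])
    for _ in range(T):
        # grad E_in(w) = (1/N) * sum_n sigmoid(-y_n w.x_n) * (-y_n x_n)
        grad = -(sigmoid(-y * (X @ w)) * y) @ X / len(y)
        w -= eta * grad
    return w

def zero_one_error(w, X, y):
    """Fraction of points where sign(w.x) disagrees with the label."""
    return np.mean(np.where(X @ w >= 0, 1.0, -1.0) != y)

# synthetic, linearly separable demo data (not the homework files)
rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y_demo = np.where(X_demo @ np.array([0.5, 2.0, -1.0]) >= 0, 1.0, -1.0)
w_demo = logistic_gd(X_demo, y_demo, eta=0.1, T=500)
print(zero_one_error(w_demo, X_demo, y_demo))
```

With the homework data, the same functions would be called as `logistic_gd(X_train, y_train, eta=0.001, T=2000)` followed by `zero_one_error(w, X_test, y_test)`.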

In [6]:
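# run logistic regression
# Note: each partial_fit call below is one stochastic update on a randomly drawn
# example, so this cell approximates full-batch gradient descent rather than implementing it.
# loss='log_loss' requires scikit-learn >= 1.1; older releases call it 'log'.
LR = SGDClassifier(loss='log_loss', fit_intercept=False, learning_rate='constant', eta0=0.001, penalty=None)
classes = np.unique(y_train)
for _ in range(2000):
    idx = np.random.randint(0, X_train.shape[0])
    # slicing with idx:idx + 1 keeps X 2-D and y 1-D, which avoids the column_or_1d warning
    LR.partial_fit(X_train[idx:idx + 1], y_train[idx:idx + 1], classes=classes)
y_pred = LR.predict(X_test)
e = np.sum(y_pred != y_test) / len(y_pred)
print('Out-of-sample error: ' + str(e))
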
Out-of-sample error: 0.475


$19.$ Implement the fixed learning rate gradient descent algorithm for logistic regression. Run the algorithm with $\eta = 0.01$ and $T = 2000$.

What is $E_{\rm out}(g)$ from your algorithm, evaluated using the 0/1 error on the test set?

In [7]:
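# run logistic regression
# one randomly drawn example per update, i.e. stochastic rather than full-batch steps
# loss='log_loss' requires scikit-learn >= 1.1; older releases call it 'log'.
LR = SGDClassifier(loss='log_loss', fit_intercept=False, learning_rate='constant', eta0=0.01, penalty=None)
classes = np.unique(y_train)
for _ in range(2000):
    idx = np.random.randint(0, X_train.shape[0])
    # slicing with idx:idx + 1 keeps X 2-D and y 1-D, which avoids the column_or_1d warning
    LR.partial_fit(X_train[idx:idx + 1], y_train[idx:idx + 1], classes=classes)
y_pred = LR.predict(X_test)
e = np.sum(y_pred != y_test) / len(y_pred)
print('Out-of-sample error: ' + str(e))
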
Out-of-sample error: 0.27


$20.$ Implement the fixed learning rate stochastic gradient descent algorithm for logistic regression. Instead of randomly choosing $n$ in each iteration, simply pick the examples in cyclic order $n = 1, 2, \ldots, N, 1, 2, \ldots$

Run the algorithm with $\eta = 0.001$ and $T = 2000$. What is $E_{\rm out}(g)$ from your algorithm, evaluated using the 0/1 error on the test set?
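
The cyclic variant can be written out by hand in a few lines. The sketch below assumes labels in $\{-1, +1\}$ and uses zero-based indices, so the visiting order is $0, 1, \ldots, N-1, 0, 1, \ldots$; the function name `logistic_sgd_cyclic` and the two-point demo are illustrative only.

```python
import numpy as np

def logistic_sgd_cyclic(X, y, eta=0.001, T=2000):
    """Fixed-learning-rate SGD for logistic regression, visiting examples in cyclic order."""
    w = np.zeros(X.shape[1])
    N = len(y)
    for t in range(T):
        n = t % N                   # cyclic choice of the example
        s = -y[n] * (X[n] @ w)
        # single-example gradient is sigmoid(s) * (-y_n x_n); step against it
        w += eta * y[n] * X[n] / (1.0 + np.exp(-s))
    return w

# tiny hand-checkable demo: two updates with eta = 1 starting from w = 0
X_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
y_demo = np.array([1.0, -1.0])
w_two = logistic_sgd_cyclic(X_demo, y_demo, eta=1.0, T=2)
print(w_two)  # w is [0.5, -0.5]
```

Both updates start from a zero inner product, so the sigmoid factor is $0.5$ each time, which makes the result easy to verify by hand.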

In [8]:
# run logistic regression
# loss='log_loss' requires scikit-learn >= 1.1; older releases call it 'log'.
LR = SGDClassifier(loss='log_loss', fit_intercept=False, learning_rate='constant', eta0=0.001, penalty=None)
classes = np.unique(y_train)
for i in range(2000):
    # visit the examples in cyclic order
    idx = i % X_train.shape[0]
    # slicing with idx:idx + 1 keeps X 2-D and y 1-D, which avoids the column_or_1d warning
    LR.partial_fit(X_train[idx:idx + 1], y_train[idx:idx + 1], classes=classes)
y_pred = LR.predict(X_test)
e = np.sum(y_pred != y_test) / len(y_pred)
print('Out-of-sample error: ' + str(e))

Out-of-sample error: 0.471666666667
