# **CS 5361/6361 Machine Learning**

**Linear Regression Exercise**

**Authors:**
Ruben Martinez
Alberto Valles

**Exercise:**

1.  Classify the MNIST dataset using linear regression by simply treating the class as a continuous variable and then rounding the results.  
2.  Classify the MNIST dataset using linear regression after converting the class to a 10-D vector using a one-hot representation.  
3.  Repeat the previous step but now add the squares of the pixel intensities as features.  

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import time

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

In [2]:
x_train = np.float32(x_train/255).reshape(x_train.shape[0],-1)
x_test = np.float32(x_test/255).reshape(x_test.shape[0],-1)

In [3]:
print(x_train.shape)

(60000, 784)
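The preprocessing in cell 2 scales pixel intensities to [0, 1] and flattens each 28x28 image into a 784-dimensional row. A minimal sketch of the same operation on two hypothetical random images (the `toy_` names are illustrative and not part of the notebook):

```python
import numpy as np

# Two hypothetical 28x28 grayscale images with raw uint8 intensities in [0, 255].
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

# Same transformation as cell 2: scale to [0, 1], cast to float32,
# then flatten every image into a single row of 28 * 28 = 784 features.
toy_flat = np.float32(toy_images / 255).reshape(toy_images.shape[0], -1)
print(toy_flat.shape, toy_flat.dtype)
```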


**1. Classify the MNIST dataset using linear regression by simply treating the class as a continuous variable and then rounding the results.**

In [4]:
from sklearn.linear_model import LinearRegression

# create linear regression model
model = LinearRegression()
model.fit(x_train, y_train) # train the model
predictions = model.predict(x_train) # predict class labels in training data
print(predictions)

[4.192953  1.2161877 3.2152336 ... 6.724541  4.980859  6.311388 ]


In [5]:
# round the predictions to the nearest integer to get the predicted classes
predicted_classes = np.round(predictions).astype(int)

# clip the predicted classes to the valid digit range [0, 9]:
# negative predictions become 0 and predictions above 9 become 9
predicted_classes = np.clip(predicted_classes, 0, 9)

# calculate accuracy on the training set
accuracy = np.mean(predicted_classes == y_train)
print(f'Accuracy (treating classes as continuous): {accuracy * 100:.2f}%')

Accuracy (treating classes as continuous): 23.45%
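The decoding step used above (round, then clip) can be illustrated on a handful of values. The numbers below are hypothetical and chosen only to show how out-of-range regression outputs are mapped back to digits; they are not taken from the model.

```python
import numpy as np

# Hypothetical regression outputs, including values outside the range [0, 9].
toy_predictions = np.array([-0.7, 0.4, 3.6, 8.4, 9.8, 11.2])

# Round to the nearest integer, then clamp into the valid digit range.
toy_classes = np.clip(np.round(toy_predictions).astype(int), 0, 9)
print(toy_classes)  # [0 0 4 8 9 9]
```

Note that clipping only guarantees a valid label; a prediction of 11.2 is reported as 9 regardless of the true digit.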


**2. Classify the MNIST dataset using linear regression after converting the class to a 10-D vector using a one-hot representation.**

In [6]:
# OneHotEncoder converts the class labels to 10-dimensional vectors
from sklearn.preprocessing import OneHotEncoder

one_hot_encoder = OneHotEncoder(sparse_output=False) # create encoder that returns a dense array

# one-hot encode the class labels (0-9) into a 10-D vector
y_train_one_hot = one_hot_encoder.fit_transform(y_train.reshape(-1, 1)) # (60000, 10)
y_test_one_hot = one_hot_encoder.transform(y_test.reshape(-1, 1))

# fit the model on the one-hot targets
model.fit(x_train, y_train_one_hot)

# make the predictions on the training data
predictions = model.predict(x_train)

# take the argmax of the predictions to convert them back to class labels
predicted_classes = np.argmax(predictions, axis=1)

# calculate accuracy on the training set
accuracy = np.mean(predicted_classes == y_train)
print(f'Accuracy (one-hot encoding): {accuracy * 100:.2f}%')

Accuracy (one-hot encoding): 85.77%
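For integer labels 0 to 9, the one-hot encoding and the argmax decoding can also be sketched with NumPy alone. This is an illustrative alternative, not the notebook's method; it should match the `OneHotEncoder` output only under the assumption that all ten digits occur in the labels being encoded.

```python
import numpy as np

# Hypothetical labels used only for illustration.
toy_labels = np.array([3, 0, 9, 3])

# Row k of the 10x10 identity matrix is the one-hot vector for class k,
# so indexing with the labels yields one vector per example.
toy_one_hot = np.eye(10, dtype=np.float32)[toy_labels]  # shape (4, 10)

# argmax along the class axis recovers the original integer labels.
decoded = np.argmax(toy_one_hot, axis=1)
print(toy_one_hot.shape, decoded)
```

With regression outputs in place of exact one-hot vectors, the same `np.argmax` call picks the class whose score is largest, which is why no rounding or clipping is needed in this formulation.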


**3. Repeat the previous step but now add the squares of the pixel intensities as features.**

In [7]:
# augment the features with the squared pixel intensities
x_train_augmented = np.concatenate([x_train, x_train**2], axis=1)
x_test_augmented = np.concatenate([x_test, x_test**2], axis=1)

# train the model
model.fit(x_train_augmented, y_train_one_hot)

# make the predictions on the augmented training matrix
predictions = model.predict(x_train_augmented)

# take argmax of the predictions to convert them back to class labels
predicted_classes = np.argmax(predictions, axis=1)

# calculate accuracy on the training set
accuracy = np.mean(predicted_classes == y_train)
print(f'Accuracy (one-hot encoding with squares of pixel intensities): {accuracy * 100:.2f}%')


Accuracy (one-hot encoding with squares of pixel intensities): 89.17%
