__(a)__ The function below uses the OpenCV library to read each image in grayscale and flatten it into a vector. We use it to convert the images in _3dshapes_train_ and _3dshapes_test_. We also keep the 2D versions, since we use them later for feature extraction.
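
As a quick illustration of the two layouts, the sketch below uses a synthetic array in place of a real image (the array `img` is an assumption, not data from the assignment). It shows that flattening a 64×64 grayscale image gives a 4096-element vector that can be reshaped back without loss.

```python
import numpy as np

image_dim = 64

# Hypothetical stand-in for a grayscale image as returned by
# cv.imread(..., cv.IMREAD_GRAYSCALE): a 2D uint8 array.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(image_dim, image_dim), dtype=np.uint8)

vec = img.flatten()                            # row-major copy, shape (4096,)
restored = vec.reshape(image_dim, image_dim)   # back to the 2D layout

print(vec.shape)                               # (4096,)
print(np.array_equal(img, restored))           # True
```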

In [240]:
import numpy as np
import cv2 as cv
import math

y_train = np.load("orientations_train.npy")

train_size = 10000
image_length = 4096
image_dim = 64

def vectorize_images(filename, data_size):
    # X for training and X_2d for feature extraction
    X = np.empty(shape=(data_size, image_length))
    X_2d = np.empty(shape=(data_size, image_dim, image_dim), dtype='uint8')

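    for i in range(data_size):
        path = f"{filename}/{i}.jpg"
        img = cv.imread(path, cv.IMREAD_GRAYSCALE)
        if img is None:  # cv.imread returns None instead of raising on a missing or unreadable file
            raise FileNotFoundError(path)
        X[i] = img.flatten()
        X_2d[i] = img
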
    return X, X_2d

X_train, X_train_2d = vectorize_images("3dshapes_train", train_size)

__(b)__ To find the optimal weight vector, we set the gradient of the least-squares error function to zero. The resulting closed-form solution is: <br>

$ \mathbf{w}^* = {({X}^\intercal X)}^{-1}{X}^\intercal \mathbf{t} $ <br>

Below we implement a function that computes the optimal weights and use it to train the model. <br> <br>
To test the model, we first generate predictions with $ \hat{\mathbf{y}} = X_{\text{test}} \mathbf{w}^* $, where $ \mathbf{w}^* = {({X}^\intercal X)}^{-1}{X}^\intercal \mathbf{t} $ is computed from the training data, and then compare them with the actual labels.
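
For reference, the closed-form solution can be derived in a few steps. This is a sketch that assumes the error function is the usual sum of squared residuals (the factor $ \frac{1}{2} $ is a convention and does not change the minimizer):

$$
E(\mathbf{w}) = \frac{1}{2}\lVert X\mathbf{w} - \mathbf{t} \rVert^2
\quad\Longrightarrow\quad
\nabla_{\mathbf{w}} E = {X}^\intercal (X\mathbf{w} - \mathbf{t}) = 0
\quad\Longrightarrow\quad
{X}^\intercal X \mathbf{w}^* = {X}^\intercal \mathbf{t}
\quad\Longrightarrow\quad
\mathbf{w}^* = {({X}^\intercal X)}^{-1}{X}^\intercal \mathbf{t}
$$

The last step assumes that $ {X}^\intercal X $ is invertible. The code uses `np.linalg.pinv`, which coincides with the ordinary inverse in that case.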

In [241]:
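# Compute the optimal parameters by minimizing the least-squares error function.
# np.linalg.pinv equals the inverse when X.T @ X is invertible and still
# returns a result when it is singular.
def linear_regression(X, y):
    return np.linalg.pinv(X.T @ X) @ X.T @ y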

w = linear_regression(X_train, y_train)
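
One way to sanity-check the closed form is to run it on a small synthetic problem and compare it with NumPy's least-squares solver. The sketch below is only an illustration: the data, `w_true`, and the noise level are made up and have nothing to do with the image dataset.

```python
import numpy as np

def linear_regression(X, y):
    # Same closed form as in the notebook cell above
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Synthetic, well-conditioned regression problem (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=200)

w_closed = linear_regression(X, y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Both approaches should agree up to floating-point error
print(np.allclose(w_closed, w_lstsq))
```

On ill-conditioned data the two can differ more noticeably, because forming `X.T @ X` squares the condition number of `X`.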

In [242]:
y_test = np.load("orientations_test.npy")

test_size = 1000

X_test, X_test_2d = vectorize_images("3dshapes_test", test_size)
y_prediction = X_test @ w

RMSE = math.sqrt(np.square(np.subtract(y_test, y_prediction)).mean()) 
print(f"Root Mean Square Error is: {RMSE}")

Root Mean Square Error is: 1.511688025740367e-06
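
The error metric used above can be wrapped in a small helper so that it is computed the same way in every experiment. The function name `rmse` and the toy numbers are assumptions made for this sketch.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Residuals are -1, 1, -2, so the mean squared error is (1 + 1 + 4) / 3 = 2
value = rmse([0.0, 10.0, 20.0], [1.0, 9.0, 22.0])
print(value)  # sqrt(2), about 1.414
```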


__(c)__ Below is our feature extraction code based on SIFT (scale-invariant feature transform), written with the help of the OpenCV documentation. It finds keypoints in each image and computes a descriptor for every keypoint. We train the model on the stacked descriptors, each paired with the label of the image it came from. <br>
We chose SIFT because, according to our research, it is good at detecting local features such as edges, corners, and blobs, and its features are invariant to scale and rotation. We therefore expected it to capture orientation-related features.
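
The stacking step can be illustrated without OpenCV. In the sketch below, the per-image descriptor arrays are random placeholders (not real SIFT output) with 128 columns each; the point is only to show how a variable number of descriptors per image is combined with image-level labels.

```python
import numpy as np

# Hypothetical per-image descriptors: image 0 has 2 keypoints,
# image 1 has none, image 2 has 3.
rng = np.random.default_rng(0)
descs = [rng.random((2, 128)), np.empty((0, 128)), rng.random((3, 128))]
labels = np.array([10.0, 20.0, 30.0])   # one label per image

counts = [len(d) for d in descs]
X_stacked = np.vstack(descs)            # one row per descriptor
y_stacked = np.repeat(labels, counts)   # each label repeated once per descriptor

print(X_stacked.shape)                  # (5, 128)
print(y_stacked)                        # [10. 10. 30. 30. 30.]
```

Images without keypoints contribute no rows, so they drop out of both training and evaluation.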

In [243]:
# RESOURCES: https://docs.opencv.org/4.x/da/df5/tutorial_py_sift_intro.html

def extract_features(X_2d, y, data_size):
    keypoints = np.empty(data_size, dtype=object)
    number_of_keypoints = 0

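    # Initiate SIFT detector (available in the main module since OpenCV 4.4;
    # older builds expose it as cv.xfeatures2d.SIFT_create in opencv-contrib)
    sift = cv.SIFT_create()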

    for i in range(data_size):
        # find the keypoints
        kp = sift.detect(X_2d[i], None)
        keypoints[i] = kp
        number_of_keypoints += len(kp)

    X_sift = np.empty(shape = (number_of_keypoints, 128))
    y_sift = np.empty(number_of_keypoints)

    idx = 0
    for i in range(data_size):
        # find the descriptors
        kp, desc = sift.compute(X_2d[i], keypoints[i])

        if len(kp) != 0:
            for d in desc:
                X_sift[idx] = d
                y_sift[idx] = y[i]
                idx += 1

    # keep only the rows that were filled, in case compute() dropped keypoints
    return X_sift[:idx], y_sift[:idx]

In [244]:
# extracting features for training
X_train_sift, y_train_sift = extract_features(X_train_2d, y_train, train_size)

# training the model with the extracted features
w_sift = linear_regression(X_train_sift, y_train_sift)

# extracting features from the test data 
X_test_sift, y_test_sift = extract_features(X_test_2d, y_test, test_size)

# making predictions using the w coming from the extracted data set
y_prediction_sift = X_test_sift @ w_sift

# calculating the RMSE
RMSE_sift = math.sqrt(np.square(np.subtract(y_test_sift, y_prediction_sift)).mean()) 
print(f"Root Mean Square Error with the extracted features is: {RMSE_sift}")

Root Mean Square Error with the extracted features is: 17.759797638538384


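__(c)__ The RMSE with the extracted features (about 17.76) is clearly much worse than the previous result. Possible reasons are that too much information is lost during feature extraction, that the extracted features are not the right ones for this task, or that there is a mistake in our implementation. Note also that this RMSE is computed per descriptor, with each descriptor carrying the label of its image, rather than per image as before.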
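
One possible way to put the two results on the same footing is to average the descriptor-level predictions within each image and compute the RMSE per image. The sketch below uses made-up numbers and a hypothetical `image_idx` array; `extract_features` as written does not record which image each descriptor came from, so it would need to be extended to return that.

```python
import numpy as np

# Hypothetical descriptor-level predictions and the image each one belongs to
image_idx = np.array([0, 0, 2, 2, 2])
y_pred_desc = np.array([9.0, 11.0, 28.0, 31.0, 34.0])
y_true_img = np.array([10.0, 20.0, 30.0])    # one label per image

n_images = len(y_true_img)
sums = np.bincount(image_idx, weights=y_pred_desc, minlength=n_images)
counts = np.bincount(image_idx, minlength=n_images)

# Images without keypoints (image 1 here) have no prediction and are skipped
has_kp = counts > 0
y_pred_img = sums[has_kp] / counts[has_kp]   # mean prediction per image
rmse_img = np.sqrt(np.mean((y_true_img[has_kp] - y_pred_img) ** 2))

print(y_pred_img)                            # [10. 31.]
print(rmse_img)                              # sqrt(0.5), about 0.707
```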