# The Netflix Recommendation Problem - Dimension Reduction Techniques
## Authors: Ba Khuong DANG, Kartik VISWANATHAN
## MSD 2024-25

### Read the dataset

In [1]:
!wget -O ml-latest-small.zip -N https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
!unzip -o ml-latest-small.zip


--2025-02-12 13:21:51--  https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 978202 (955K) [application/zip]
Saving to: ‘ml-latest-small.zip’


2025-02-12 13:21:52 (1.49 MB/s) - ‘ml-latest-small.zip’ saved [978202/978202]

Archive:  ml-latest-small.zip
  inflating: ml-latest-small/links.csv  
  inflating: ml-latest-small/tags.csv  
  inflating: ml-latest-small/ratings.csv  
  inflating: ml-latest-small/README.txt  
  inflating: ml-latest-small/movies.csv  


In [2]:
# Load the ratings.csv file into a pandas DataFrame
import pandas as pd
ratings = pd.read_csv('ml-latest-small/ratings.csv')

In [3]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [4]:
ratings.shape

(100836, 4)

### Sparse Representation
* Since the data is sparse (most users have not rated most movies), we should represent the data in a sparse format to save memory and improve computational efficiency.
* The sparse format typically represents the data as a user-item matrix, where rows correspond to users, columns correspond to movies, and the values are the ratings.
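
To get a feel for how sparse this matrix is, the sketch below compares the number of stored ratings with the number of cells in a dense user-item matrix. The counts (100,836 ratings, 610 users, 9,724 movies) are the ones reported in this notebook's outputs. The 8-bytes-per-value figure assumes `float64` and ignores the index overhead of a sparse format, so it is only a rough estimate.

```python
# Counts taken from this notebook's outputs
n_ratings = 100_836
n_users, n_movies = 610, 9_724

n_cells = n_users * n_movies
density = n_ratings / n_cells

# Rough memory estimate for the values alone, assuming float64 (8 bytes each)
dense_mb = n_cells * 8 / 1e6
sparse_values_mb = n_ratings * 8 / 1e6

print(f"density: {density:.4%}")
print(f"dense values: {dense_mb:.1f} MB, stored values only: {sparse_values_mb:.1f} MB")
```

Fewer than 2% of the cells hold a rating, which is why a compressed format is a natural fit here.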

### Features
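* UserID and MovieID are categorical identifiers. We map them to contiguous integer indices with `LabelEncoder` so they can serve as row and column indices of the user-item matrix (one-hot encoding would be an alternative for other models)
* Rating is the target variable (output) we want to predict
* Timestamp is dropped, since temporal effects are not modeled in this analysis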
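
As a small illustration of what label encoding achieves, the sketch below maps raw IDs with gaps to contiguous indices using only NumPy. The ID values are hypothetical, and this is not the `LabelEncoder` call used in the next cell; it only shows the same kind of mapping on a toy array.

```python
import numpy as np

# Hypothetical raw IDs with gaps, similar in spirit to movieId
raw_ids = np.array([1, 3, 6, 47, 50, 3])

# np.unique returns the sorted unique IDs and, with return_inverse=True,
# the position of each raw ID within that sorted list
unique_ids, codes = np.unique(raw_ids, return_inverse=True)

print(unique_ids)  # sorted distinct IDs
print(codes)       # contiguous indices usable as matrix rows/columns
```

The original IDs can be recovered with `unique_ids[codes]`, which is the role `inverse_transform` plays for an encoder.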

In [5]:
# Label encode the user and movie ids
from sklearn.preprocessing import LabelEncoder

# Initialize LabelEncoders
user_enc = LabelEncoder()
movie_enc = LabelEncoder()

# Fit and transform the training data
ratings['user'] = user_enc.fit_transform(ratings['userId'].values)
ratings['movie'] = movie_enc.fit_transform(ratings['movieId'].values)

# Display the first few rows of the encoded ratings data
print("\nRatings data:")
print(ratings.head())



Ratings data:
   userId  movieId  rating  timestamp  user  movie
0       1        1     4.0  964982703     0      0
1       1        3     4.0  964981247     0      2
2       1        6     4.0  964982224     0      5
3       1       47     5.0  964983815     0     43
4       1       50     5.0  964982931     0     46


In [6]:
from scipy.sparse import coo_matrix
from scipy.sparse import csr_matrix

#create a sparse matrix of train data
n_users = ratings['user'].max() + 1
n_movies = ratings['movie'].max() + 1
print(n_users, n_movies)

sparse_matrix_train = csr_matrix((ratings['rating'], (ratings['user'], ratings['movie'])), shape=(n_users, n_movies))

print(sparse_matrix_train.shape)
print(sparse_matrix_train)

610 9724
(610, 9724)
<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 100836 stored elements and shape (610, 9724)>
  Coords	Values
  (0, 0)	4.0
  (0, 2)	4.0
  (0, 5)	4.0
  (0, 43)	5.0
  (0, 46)	5.0
  (0, 62)	3.0
  (0, 89)	5.0
  (0, 97)	4.0
  (0, 124)	5.0
  (0, 130)	5.0
  (0, 136)	5.0
  (0, 184)	5.0
  (0, 190)	3.0
  (0, 197)	5.0
  (0, 201)	4.0
  (0, 224)	5.0
  (0, 257)	3.0
  (0, 275)	3.0
  (0, 291)	5.0
  (0, 307)	4.0
  (0, 314)	4.0
  (0, 320)	5.0
  (0, 325)	4.0
  (0, 367)	3.0
  (0, 384)	4.0
  :	:
  (609, 9238)	5.0
  (609, 9246)	4.5
  (609, 9256)	4.0
  (609, 9268)	5.0
  (609, 9274)	3.5
  (609, 9279)	3.5
  (609, 9282)	3.0
  (609, 9288)	3.0
  (609, 9304)	3.0
  (609, 9307)	2.5
  (609, 9312)	4.5
  (609, 9317)	3.0
  (609, 9324)	3.0
  (609, 9339)	4.0
  (609, 9341)	4.0
  (609, 9348)	3.5
  (609, 9371)	3.5
  (609, 9372)	3.5
  (609, 9374)	5.0
  (609, 9415)	4.0
  (609, 9416)	4.0
  (609, 9443)	5.0
  (609, 9444)	5.0
  (609, 9445)	5.0
  (609, 9485)	3.0


### Rank of the users x movie sparse matrix

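The rank is chosen using a fixed tolerance $\epsilon$: the singular values are sorted in descending order, and the rank is the position of the first singular value whose gap to the next one falls below $\epsilon$. That singular value is also returned as the threshold $\tau$ used later for SVT, together with the fraction of energy (sum of squared singular values) retained at that rank.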
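
The selection rule can be tried on a handful of made-up singular values before running it on the full matrix. This is a minimal sketch: `first_small_gap_rank` is a hypothetical helper, the values in `s` are invented, and the fallback to the full length when no gap is small enough is an assumption added for safety.

```python
import numpy as np

def first_small_gap_rank(s, epsilon):
    """Return (rank, tau) at the first gap s[i] - s[i+1] below epsilon.

    Hypothetical helper for illustration; falls back to the full length
    when no gap is small enough.
    """
    s = np.sort(np.asarray(s, dtype=float))[::-1]  # descending order
    for i in range(len(s) - 1):
        if s[i] - s[i + 1] < epsilon:
            return i + 1, s[i]
    return len(s), s[-1]

s = [10.0, 5.0, 4.999, 1.0]
rank, tau = first_small_gap_rank(s, epsilon=0.01)

# Fraction of the total energy captured by the first `rank` singular values
energy = np.sum(np.sort(s)[::-1][:rank] ** 2) / np.sum(np.square(s))
print(rank, tau, energy)
```

Here the first gap (10 to 5) is large, while the second (5 to 4.999) is below the tolerance, so the rule stops at the second singular value.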

In [7]:
import numpy as np
from scipy.sparse.linalg import svds

def rank_calculator(sparse_matrix_train, epsilon):
  
  #Convert to array
  train_data_matrix = sparse_matrix_train.toarray()

  #Perform SVD
  u, s, vt = svds(train_data_matrix, k=min(train_data_matrix.shape) - 1)

  #Sort the s values in descending order
  s = np.sort(s)[::-1]

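  # Defaults in case no consecutive gap falls below epsilon
  rank = len(s)
  tau = s[-1]

  # Stop at the first pair of consecutive singular values closer than epsilon
  for i in range(len(s)-1):
    if s[i] - s[i+1] < epsilon:
      tau = s[i]
      rank = i + 1
      break

  total_energy = np.sum(s**2)  # Total energy: sum of all squared singular values
  current_energy = np.sum(s[:rank]**2)  # Energy captured by the first `rank` singular values
  energy = current_energy / total_energy  # Fraction of the energy retained at this rank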


  return rank, energy, tau

### Create the error calculation function

In [8]:
from scipy.sparse import issparse

# error calculation function for sparse matrix

def error_calc(X_S, Y_S):

    # Convert sparse matrices to dense arrays if necessary
    if issparse(X_S):
        X_S = X_S.toarray()
    if issparse(Y_S):
        Y_S = Y_S.toarray()
    
    #Norm calculation X_S - Y_S
    normXminusY = np.linalg.norm(X_S - Y_S, ord=2)

    #Norm of Y_S
    norm_Y = np.linalg.norm(Y_S, ord=2)

    if norm_Y == 0:
        raise ValueError("Norm of Y is 0")

    #Calculate the error
    error = normXminusY/norm_Y
  
    return error
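
One detail worth keeping in mind: for a 2-D array, `np.linalg.norm(..., ord=2)` returns the spectral norm (the largest singular value), not the Frobenius norm that appears in the objectives of this notebook. The toy matrix below is an assumption chosen so that both values are easy to check by hand.

```python
import numpy as np

A = np.diag([3.0, 4.0])

spectral = np.linalg.norm(A, ord=2)       # largest singular value
frobenius = np.linalg.norm(A, ord="fro")  # sqrt of the sum of squared entries

print(spectral, frobenius)
```

The relative errors reported during fitting are therefore ratios of spectral norms. Passing `ord="fro"` would give a Frobenius-norm ratio instead, which would change the reported numbers.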

## Singular Value Projection Algorithm

SVP aims to find a low-rank matrix $X$ that approximates an observed matrix $Y$ by solving:
$$
\min_{X} \|X - Y\|_F^2 \quad \text{subject to} \quad \text{rank}(X) \leq r
$$
where the rank $r$ is a fixed desired rank.
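
A single SVP iteration can be sketched on a small dense matrix as a gradient step followed by a projection onto the set of rank-$r$ matrices. The function name `svp_step` and the random toy matrix are assumptions for illustration; the estimator class later in this notebook works on sparse matrices with `svds` instead.

```python
import numpy as np

def svp_step(X, Y, rank, learning_rate=1.0):
    """One SVP iteration on dense arrays: gradient step, then rank-r projection."""
    X_half = X + learning_rate * (Y - X)
    U, s, Vt = np.linalg.svd(X_half, full_matrices=False)
    # Hard thresholding: keep only the `rank` largest singular values
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 5))   # toy dense matrix, not the ratings data
X = np.zeros_like(Y)

X = svp_step(X, Y, rank=2)
print(np.linalg.matrix_rank(X))
print(np.linalg.norm(X - Y, "fro") / np.linalg.norm(Y, "fro"))
```

Note that with `learning_rate=1` the intermediate matrix equals `Y` itself, so one step already yields the best rank-$r$ approximation of `Y`. This may help explain why very few iterations are needed at that setting.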

In [9]:
# Get the rank, retained energy and threshold for epsilon = 0.002 (target: around 90% energy)
rank, energy, tau = rank_calculator(sparse_matrix_train, 0.002)
print("rank=", rank, "energy=", energy, "tau=", tau)

rank= 216 energy= 0.8809382042360632 tau= 34.276247085610635


In [10]:
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix
from scipy.sparse.linalg import svds
from sklearn.base import BaseEstimator
import time
from scipy.sparse import issparse

In [11]:
#SVP Estimator class to use with cross validation in scikit learn

class SVPEstimator(BaseEstimator):
    def __init__(self, rank=10, max_iter=100, tol=1e-3, learning_rate=1):
        self.rank = rank
        self.max_iter = max_iter
        self.tol = tol
        self.learning_rate = learning_rate
        self.X = None
        self.fit_time = None
        self.error = None
        self.iter = None
        

    def fit(self, Y, y=None):
        start_time = time.time()
        #Init X
        self.X = lil_matrix(Y.shape, dtype = np.float64) #Sparse matrix with 64 bit float values for better stability
        iter = 0
        error = self.tol + 99
        prev_error = 100
        while(error > self.tol and iter < self.max_iter and prev_error > error):
            #Update the previous error
            prev_error = error

            #Update the iteration count
            iter += 1

            #Update the X matrix
            X_half = self.X + self.learning_rate * (Y - self.X)
            U, S, Vt = svds(X_half, k = self.rank)
            U_sparse = csr_matrix(U)
            Vt_sparse = csr_matrix(Vt)
            S_sparse = csr_matrix(np.diag(S))
            self.X = U_sparse @ S_sparse @ Vt_sparse 
            
            #Calculate the error
            error = self._error_calc(self.X, Y)

        self.error = error
        self.iter = iter
        self.fit_time = time.time() - start_time
        print(f"Fit time: {self.fit_time:.6f} seconds with {self.iter} iterations and error {self.error:.6f}.")
        return self
    
    def score(self, Y, y=None):
        # Extract observed entries in the validation fold
        val_indices = list(zip(Y.nonzero()[0], Y.nonzero()[1]))
        val_values = Y.data

        # Compute the error only on the observed entries in the validation fold
        error = self._error_calc_validation(self.X, val_indices, val_values)
        return -error  # For scikit-learn, higher score is better

    def _error_calc(self, X_S, Y_S):

        # Convert sparse matrices to dense arrays if necessary
        if issparse(X_S):
            X_S = X_S.toarray()
        if issparse(Y_S):
            Y_S = Y_S.toarray()
        
        #Norm calculation X_S - Y_S
        normXminusY = np.linalg.norm(X_S - Y_S, ord=2)

        #Norm of Y_S
        norm_Y = np.linalg.norm(Y_S, ord=2)

        if norm_Y == 0:
            raise ValueError("Norm of Y is 0")

        #Calculate the error
        error = normXminusY/norm_Y
    
        return error

    def _error_calc_validation(self, X, val_indices, val_values):
        
        # Extract predicted values at the validation indices
        pred_values = np.array([X[i, j] for (i, j) in val_indices])

        # Compute the Frobenius norm of the difference
        numerator = np.linalg.norm(pred_values - val_values, ord=2)

        # Compute the Frobenius norm of the validation values
        denominator = np.linalg.norm(val_values, ord=2)

        if denominator == 0:
            raise ValueError("Norm of validation values is 0")

        # Compute the relative error
        error = numerator / denominator

        return error


In [12]:
from sklearn.model_selection import cross_val_score
#Initialize the SVP Estimator
svp_estimator = SVPEstimator(rank=rank, max_iter=100, tol=1e-3, learning_rate=1)

#fit the model once for timing purposes
svp_estimator.fit(sparse_matrix_train)
SVPTimingOneIter = svp_estimator.fit_time/svp_estimator.iter
print("Fit time SVP One iteration: ", SVPTimingOneIter)

#Perform 5 fold CV
cv_scores = cross_val_score(svp_estimator, sparse_matrix_train, cv=5, n_jobs=10)

print("Cross validation errors: ", -cv_scores)
print("Mean CV error: ", -np.mean(cv_scores))

Fit time: 18.471084 seconds with 3 iterations and error 0.064134.
Fit time SVP One iteration:  6.157027959823608
Fit time: 9.495206 seconds with 2 iterations and error 0.060736.
Fit time: 9.621317 seconds with 2 iterations and error 0.063242.
Fit time: 9.919137 seconds with 2 iterations and error 0.058039.
Fit time: 9.946097 seconds with 2 iterations and error 0.060150.
Fit time: 10.099894 seconds with 2 iterations and error 0.059354.
Cross validation errors:  [0.949641   0.94135507 0.95269472 0.95222441 0.95165312]
Mean CV error:  0.9495136629343819


### Improve SVP based on learning rate

In [14]:
from sklearn.model_selection import KFold
# Define a range for learning rates
learning_rates = [0.08, 0.1, 0.2, 0.5, 1]

# Perform cross-validation
cv_errors = []
for learning_rate in learning_rates:
    svp_estimator = SVPEstimator(rank=rank, learning_rate=learning_rate, max_iter=100, tol=1e-3)
    cv_scores = cross_val_score(svp_estimator, sparse_matrix_train, cv=KFold(n_splits=5, shuffle=True, random_state=42), n_jobs=10)
    cv_errors.append(-np.mean(cv_scores))  # Convert scores to errors
    print("Cross validation errors: ", -cv_scores)
    print("Mean CV error: ", -np.mean(cv_scores))

# Find the best learning rate
best_learning_rate = learning_rates[np.argmin(cv_errors)]
print(f"Best learning rate: {best_learning_rate}")

# Best error
best_error_svp = np.min(cv_errors)
print(f"Best error: {best_error_svp}")


Fit time: 526.345350 seconds with 35 iterations and error 0.062799.
Fit time: 567.554138 seconds with 36 iterations and error 0.059501.
Fit time: 572.897774 seconds with 35 iterations and error 0.059296.
Fit time: 576.597844 seconds with 36 iterations and error 0.061912.
Fit time: 579.490688 seconds with 37 iterations and error 0.058775.
Cross validation errors:  [0.95945764 0.95602964 0.93400425 0.95142328 0.95331217]
Mean CV error:  0.950845395702809
Fit time: 434.294079 seconds with 29 iterations and error 0.062799.
Fit time: 446.902071 seconds with 28 iterations and error 0.061912.
Fit time: 449.548540 seconds with 28 iterations and error 0.058775.
Fit time: 456.115894 seconds with 29 iterations and error 0.059501.
Fit time: 456.949611 seconds with 28 iterations and error 0.059296.
Cross validation errors:  [0.9594225  0.95601807 0.9339909  0.95146222 0.95333579]
Mean CV error:  0.9508458980906127
Fit time: 221.946574 seconds with 14 iterations and error 0.058775.
Fit time: 222.054

### Conclusion SVP
* The SVP algorithm converges quickly at a learning rate of 1 (2 to 3 iterations), since the rank is fixed (hard rank constraint).
* The cross-validation errors (about 0.95) are far higher than the training errors (about 0.06), which suggests over-fitting: the model is capturing noise in the training folds. The chosen rank is probably too high.
* This SVP method has no regularization. Regularization, which penalizes large values, might improve the results (as we will see in the next algorithm).
* More data could also yield better results.
* A learning rate of 1 gives the best error.

## Singular Value Threshold

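SVT applies soft thresholding to the singular values, which corresponds to nuclear-norm regularization:
$$
\min_{X} \frac{1}{2}\|X - Y\|_F^2 + \tau \|X\|_*
$$
With the factor $\frac{1}{2}$, the minimizer is obtained by shrinking each singular value of $Y$ by $\tau$ and clipping at zero.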

### Selection of hyper parameter $\tau$
It is very important to select the correct values for the hyper parameter $\tau$.
* $\tau$ determines the threshold for the singular values
* Larger $\tau$ values shrink more singular values towards zero, resulting in lower-rank matrices (under-fitting)
* Smaller $\tau$ values do the opposite, resulting in better fit for the training data (over-fitting)
* Using the right $\tau$ we can control the over/under fitting of the data

In our case we use the value $\tau \approx 34.28$ returned by `rank_calculator` with $\epsilon = 0.002$, which retains about $88\%$ of the energy (close to the $90\%$ target).
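
The effect of $\tau$ can be seen on a toy example. The sketch below applies soft thresholding to a small diagonal matrix whose singular values are known in advance. The helper name `soft_threshold_svd` and the numbers are assumptions for illustration.

```python
import numpy as np

def soft_threshold_svd(M, tau):
    """Shrink every singular value of M by tau and clip at zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt, s_shrunk

M = np.diag([5.0, 3.0, 1.0])  # toy matrix with singular values 5, 3, 1
X, s_shrunk = soft_threshold_svd(M, tau=2.0)

print(s_shrunk)                  # singular values after shrinkage
print(np.linalg.matrix_rank(X))  # rank drops because one value hit zero
```

Unlike the hard rank constraint of SVP, the resulting rank is not fixed in advance: it depends on how many singular values exceed $\tau$.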

In [15]:
#SVT Estimator class to use with cross validation in scikit learn

class SVTEstimator(BaseEstimator):
    def __init__(self, max_iter=100, tol=1e-3, learning_rate=1, tau=0):
        self.max_iter = max_iter
        self.tol = tol
        self.learning_rate = learning_rate
        self.tau = tau
        self.X = None
        self.fit_time = None
        self.error = None
        self.iter = None
        

    def fit(self, Y, y=None):
        start_time = time.time()
        #Init X
        self.X = lil_matrix(Y.shape, dtype = np.float64) #Sparse matrix with 64 bit float values for better stability
        iter = 0
        error = self.tol + 99
        prev_error = 100
        while(error > self.tol and iter < self.max_iter and prev_error > error):
            #Update the previous error
            prev_error = error

            #Update the iteration count
            iter += 1

            #Update the X matrix
            X_half = self.X + self.learning_rate * (Y - self.X)
            U, S, Vt = svds(X_half, k = min(X_half.shape) - 1) # Take full possible rank
            S = np.maximum(S - self.tau, 0) #Soft thresholding according to the paper
            U_sparse = csr_matrix(U)
            Vt_sparse = csr_matrix(Vt)
            S_sparse = csr_matrix(np.diag(S))
            self.X = U_sparse @ S_sparse @ Vt_sparse 

            #Calculate the error
            error = self._error_calc(self.X, Y)

        self.error = error
        self.iter = iter
        self.fit_time = time.time() - start_time
        print(f"Fit time: {self.fit_time:.6f} seconds with {self.iter} iterations and error {self.error:.6f}.")
        return self
    
    def score(self, Y, y=None):
        # Extract observed entries in the validation fold
        val_indices = list(zip(Y.nonzero()[0], Y.nonzero()[1]))
        val_values = Y.data

        # Compute the error only on the observed entries in the validation fold
        error = self._error_calc_validation(self.X, val_indices, val_values)
        return -error  # For scikit-learn, higher score is better

    def _error_calc(self, X_S, Y_S):

        # Convert sparse matrices to dense arrays if necessary
        if issparse(X_S):
            X_S = X_S.toarray()
        if issparse(Y_S):
            Y_S = Y_S.toarray()
        
        #Norm calculation X_S - Y_S
        normXminusY = np.linalg.norm(X_S - Y_S, ord=2)

        #Norm of Y_S
        norm_Y = np.linalg.norm(Y_S, ord=2)

        if norm_Y == 0:
            raise ValueError("Norm of Y is 0")

        #Calculate the error
        error = normXminusY/norm_Y
    
        return error

    def _error_calc_validation(self, X, val_indices, val_values):
        
        # Extract predicted values at the validation indices
        pred_values = np.array([X[i, j] for (i, j) in val_indices])

        # Compute the Frobenius norm of the difference
        numerator = np.linalg.norm(pred_values - val_values, ord=2)

        # Compute the Frobenius norm of the validation values
        denominator = np.linalg.norm(val_values, ord=2)

        if denominator == 0:
            raise ValueError("Norm of validation values is 0")

        # Compute the relative error
        error = numerator / denominator

        return error


In [17]:
from sklearn.model_selection import KFold

# Fit once for timing purposes
svt_estimator = SVTEstimator(learning_rate=1, max_iter=100, tol=1e-3, tau=tau)
svt_estimator.fit(sparse_matrix_train)
SVTTimingOneIter = svt_estimator.fit_time/svt_estimator.iter
print("Fit time SVT One iteration: ", SVTTimingOneIter)

# Perform cross-validation

svt_estimator = SVTEstimator(learning_rate=1, max_iter=100, tol=1e-3, tau=tau)
   
#Perform 5 fold CV
cv_scores = cross_val_score(svt_estimator, sparse_matrix_train, cv=5, n_jobs=10)

print("Cross validation errors: ", -cv_scores)
print("Mean CV error: ", -np.mean(cv_scores))


Fit time: 35.117518 seconds with 3 iterations and error 0.064137.
Fit time SVT One iteration:  11.705839236577352
Fit time: 11.886038 seconds with 2 iterations and error 0.070169.
Fit time: 11.927572 seconds with 2 iterations and error 0.072201.
Fit time: 12.304155 seconds with 2 iterations and error 0.071113.
Fit time: 16.400495 seconds with 3 iterations and error 0.069509.
Fit time: 19.446574 seconds with 4 iterations and error 0.074855.
Cross validation errors:  [0.93201281 0.92213269 0.93473676 0.93388739 0.93098753]
Mean CV error:  0.9307514365661478


### Improve SVT via learning rate

In [22]:
from sklearn.model_selection import KFold
# Define a range for learning rates
learning_rates = [0.7, 0.8, 0.9, 1]

# Perform cross-validation
cv_errors = []
for learning_rate in learning_rates:
    svt_estimator = SVTEstimator(tau=tau, learning_rate=learning_rate, max_iter=100, tol=1e-3)
    cv_scores = cross_val_score(svt_estimator, sparse_matrix_train, cv=KFold(n_splits=5, shuffle=True, random_state=42), n_jobs=10)
    cv_errors.append(-np.mean(cv_scores))  # Convert scores to errors
    print("Cross validation errors: ", -cv_scores)
    print("Mean CV error: ", -np.mean(cv_scores))

# Find the best learning rate
best_learning_rate = learning_rates[np.argmin(cv_errors)]
print(f"Best learning rate: {best_learning_rate}")

# Best error
best_error_svt = np.min(cv_errors)
print(f"Best error: {best_error_svt}")

Fit time: 563.936560 seconds with 31 iterations and error 0.109637.
Fit time: 579.147510 seconds with 29 iterations and error 0.099491.
Fit time: 596.240002 seconds with 29 iterations and error 0.099346.
Fit time: 606.115481 seconds with 31 iterations and error 0.103978.
Fit time: 616.166507 seconds with 32 iterations and error 0.099348.
Cross validation errors:  [0.93941369 0.93967368 0.90919421 0.93241664 0.93610611]
Mean CV error:  0.9313608659037327
Fit time: 424.424214 seconds with 23 iterations and error 0.095932.
Fit time: 442.844289 seconds with 22 iterations and error 0.087054.
Fit time: 446.545746 seconds with 22 iterations and error 0.086928.
Fit time: 455.662537 seconds with 23 iterations and error 0.090981.
Fit time: 474.814241 seconds with 25 iterations and error 0.086930.
Cross validation errors:  [0.93989508 0.93975547 0.90949782 0.93260845 0.93624331]
Mean CV error:  0.9316000249196227
Fit time: 304.901247 seconds with 16 iterations and error 0.085273.
Fit time: 337.23

### Conclusion SVT
* SVT applies soft thresholding to the singular values
* This is, in essence, nuclear-norm regularization
* It is less robust to noise
* Convergence is slower than for SVP
* Performance can degrade in the presence of outliers
* Tuning the $\tau$ parameter is very important

## ADMiRA (Atomic Decomposition for Minimum Rank Approximation)
ADMiRA is based on the idea of decomposing a matrix into a combination of simpler "atomic" components. Here the atoms are rank-one matrices, and the goal is to approximate the target matrix with as few atoms as possible (that is, with low rank) while minimizing the approximation error.

ADMiRA extends the compressive sampling matching pursuit (CoSaMP) algorithm from sparse vectors to low-rank matrices: it iteratively selects atoms (rank-one matrices) to approximate the target matrix, which makes it useful for matrix completion and related low-rank recovery problems. The approach is greedy.

The algorithm flow is as follows, with the target rank as input:

1. SVD the residual and select the top 2 x Rank atoms (rank 1 matrices)
2. Add the selected atoms to the current approximation
3. Truncate to retain the top rank atoms
4. Update the residual (Y - X)
5. Calculate the error
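
The steps above can be sketched on a small dense matrix as follows. This mirrors the simplified flow used in this notebook and is not meant as a reference implementation of the published algorithm, which may differ in details. The names `truncate` and `admira_like_step` and the random toy matrix are assumptions for illustration.

```python
import numpy as np

def truncate(M, k):
    """Best rank-k approximation of a dense matrix via SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def admira_like_step(X, Y, rank):
    """One iteration of the simplified scheme listed above (dense arrays)."""
    residual = Y - X
    atoms = truncate(residual, 2 * rank)   # step 1: top 2r atoms of the residual
    candidate = X + atoms                  # step 2: merge with current estimate
    X_new = truncate(candidate, rank)      # step 3: keep the top r atoms
    residual = Y - X_new                   # step 4: update the residual
    error = np.linalg.norm(residual, 2) / np.linalg.norm(Y, 2)  # step 5
    return X_new, error

rng = np.random.default_rng(0)
Y = rng.normal(size=(8, 6))   # toy data, not the ratings matrix
X, error = admira_like_step(np.zeros_like(Y), Y, rank=2)
print(np.linalg.matrix_rank(X), error)
```

Starting from a zero estimate, the first iteration reduces to the best rank-$r$ approximation of `Y`, the same result a single SVP step with learning rate 1 would give. Each iteration needs two SVDs, which makes it more expensive per iteration.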

In [23]:
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix, issparse
from scipy.sparse.linalg import svds
import time
from sklearn.base import BaseEstimator

class ADMIRAEstimator(BaseEstimator):
    def __init__(self, rank=10, max_iter=100, tol=1e-3):
        self.rank = rank
        self.max_iter = max_iter
        self.tol = tol
        self.X = None
        self.fit_time = None
        self.error = None
        self.iter = None

    def fit(self, Y, y=None):
        start_time = time.time()
        # Initialize the approximation matrix X
        self.X = lil_matrix(Y.shape, dtype=np.float64)
        residual = Y - self.X  # Initial residual
        iter = 0
        error = self.tol + 99
        prev_error = 100

        while error > self.tol and iter < self.max_iter and prev_error > error:
            # Update the previous error
            prev_error = error
            
            # Step 1: Select top 2 * rank atoms from the residual
            U, S, Vt = svds(residual, k=2 * self.rank)
            U = csr_matrix(U)
            Vt = csr_matrix(Vt)
            S = csr_matrix(np.diag(S))

            # Step 2: Combine the selected atoms with the current approximation
            X_candidate = self.X + U @ S @ Vt

            # Step 3: Truncate to retain only the top rank atoms
            U_trunc, S_trunc, Vt_trunc = svds(X_candidate, k=self.rank)
            U_trunc = csr_matrix(U_trunc)
            Vt_trunc = csr_matrix(Vt_trunc)
            S_trunc = csr_matrix(np.diag(S_trunc))
            self.X = U_trunc @ S_trunc @ Vt_trunc

            # Step 4: Update the residual
            residual = Y - self.X

            # Step 5: Calculate the error
            error = self._error_calc(residual, Y)
            iter += 1

        self.error = error
        self.iter = iter
        self.fit_time = time.time() - start_time
        print(f"Fit time: {self.fit_time:.6f} seconds with {self.iter} iterations and error {self.error:.6f}.")
        return self

    def score(self, Y, y=None):
        # Extract observed entries in the validation fold
        val_indices = list(zip(Y.nonzero()[0], Y.nonzero()[1]))
        val_values = Y.data

        # Compute the error only on the observed entries in the validation fold
        error = self._error_calc_validation(self.X, val_indices, val_values)
        return -error  # For scikit-learn, higher score is better

    def _error_calc(self, residual, Y):
        # Convert sparse matrices to dense arrays if necessary
        if issparse(residual):
            residual = residual.toarray()
        if issparse(Y):
            Y = Y.toarray()

        # Norm calculation residual
        norm_residual = np.linalg.norm(residual, ord=2)

        # Norm of Y
        norm_Y = np.linalg.norm(Y, ord=2)

        if norm_Y == 0:
            raise ValueError("Norm of Y is 0")

        # Calculate the error
        error = norm_residual / norm_Y

        return error

    def _error_calc_validation(self, X, val_indices, val_values):
        # Extract predicted values at the validation indices
        pred_values = np.array([X[i, j] for (i, j) in val_indices])

        # Compute the Frobenius norm of the difference
        numerator = np.linalg.norm(pred_values - val_values, ord=2)

        # Compute the Frobenius norm of the validation values
        denominator = np.linalg.norm(val_values, ord=2)

        if denominator == 0:
            raise ValueError("Norm of validation values is 0")

        # Compute the relative error
        error = numerator / denominator

        return error

In [24]:
admira = ADMIRAEstimator(rank=rank, max_iter=100, tol=1e-3)

#fit the model once for timing purposes
admira.fit(sparse_matrix_train)
ADMIRATimingOneIter = admira.fit_time/admira.iter
print("Fit time ADMIRA One iteration: ", ADMIRATimingOneIter)

#Perform 5 fold CV
cv_scores = cross_val_score(admira, sparse_matrix_train, cv=5, n_jobs=10)

print("Cross validation errors: ", -cv_scores)
print("Mean CV error: ", -np.mean(cv_scores))

Fit time: 57.088973 seconds with 2 iterations and error 0.064134.
Fit time ADMIRA One iteration:  28.54448652267456
Fit time: 59.380467 seconds with 2 iterations and error 0.063242.
Fit time: 60.474447 seconds with 2 iterations and error 0.058039.
Fit time: 61.567287 seconds with 2 iterations and error 0.060150.
Fit time: 62.456772 seconds with 2 iterations and error 0.059354.
Fit time: 79.724009 seconds with 3 iterations and error 0.060736.
Cross validation errors:  [0.949641   0.94135507 0.95269472 0.95222441 0.95165312]
Mean CV error:  0.9495136629343817


## Conclusion

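The training relative errors of the 3 algorithms are very close (between about $0.06$ and $0.075$ at a learning rate of 1), and so are the cross-validation errors ($0.93$ to $0.95$). In terms of measured time per iteration, SVP is the fastest (about 6.2 s), followed by SVT (about 11.7 s), while ADMiRA is the slowest (about 28.5 s).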

### Performance and Convergence
SVP is simpler to analyze and implement than other methods and has a strong geometric convergence rate, making it faster. ADMiRA has a performance guarantee for the general case where the solution is approximately low-rank and the measurements are noisy. SVT is sensitive to noise and outliers and the convergence is slow.
### Computational Complexity and Scalability
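SVP has low computational demands compared to the other methods. On this dataset, SVT took roughly twice as long per iteration as SVP, and ADMiRA, which performs two SVDs per iteration, took more than four times as long. SVT usually becomes prohibitively expensive for moderately large datasets, because it computes a nearly full SVD at every iteration.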

### Choice of algorithm
SVP seems like the best choice here since the rank of the matrix is fixed in the beginning. SVP is based on iterative hard-thresholding. The algorithm involves projecting candidate solutions onto the set of low-rank matrices and has a geometric convergence rate even with noisy measurements. SVT on the other hand is sensitive to outliers and is too slow to converge. 