#Project 3: Matrix Factorization Methods
by Latif Masud

##Overview
For this project, I used the prediction matrix from a movie recommender as the baseline, and then performed a singular value decomposition (SVD) to fine-tune the predictions.

##Libraries
For this project, just the standard `numpy` and `scipy` libraries are needed. 

In [3]:
from numpy import *
from scipy import optimize

##Recommender System
The code below comes from project two, refactored into a single function that yields the predictions. The function takes the number of users and movies, generates a random matrix of integer ratings from 0 to 10 (where 0 means "not rated"), normalizes the ratings by each movie's mean, and then makes predictions for all users. It returns both the normalized ratings and the predictions.

In [8]:
def normalize_ratings(ratings, did_rate):
    num_movies = ratings.shape[0]
    
    ratings_mean = zeros(shape = (num_movies, 1))
    ratings_norm = zeros(shape = ratings.shape)
    
    for i in range(num_movies): 
        idx = where(did_rate[i] == 1)[0]
        
        #  Calculate mean 
        ratings_mean[i] = mean(ratings[i, idx])
        ratings_norm[i, idx] = ratings[i, idx] - ratings_mean[i]
    
    return ratings_norm, ratings_mean

def unroll_params(X_and_theta, num_users, num_movies, num_features):
    m = X_and_theta[:num_movies * num_features]
    x = m.reshape((num_features, num_movies)).transpose()
    y = X_and_theta[num_movies * num_features:]
    theta = y.reshape(num_features, num_users ).transpose()
    return x, theta

def calculate_gradient(X_and_theta, ratings, did_rate, num_users, num_movies, num_features, reg_param):
	X, theta = unroll_params(X_and_theta, num_users, num_movies, num_features)
	
	# we multiply by did_rate because we only want to consider observations for which a rating was given
	difference = X.dot( theta.T ) * did_rate - ratings
	X_grad = difference.dot( theta ) + reg_param * X
	theta_grad = difference.T.dot( X ) + reg_param * theta
	
	# wrap the gradients back into a column vector 
	return r_[X_grad.T.flatten(), theta_grad.T.flatten()]

def calculate_cost(X_and_theta, ratings, did_rate, num_users, num_movies, num_features, reg_param):
    X, theta = unroll_params(X_and_theta, num_users, num_movies, num_features)

    cost = sum( (X.dot( theta.T ) * did_rate - ratings) ** 2 ) / 2
    regularization = (reg_param / 2) * (sum( theta**2 ) + sum(X**2))
    return cost + regularization

def generate_predictions (num_users, num_movies):
	ratings = random.randint(11, size= (num_movies, num_users))

	did_rate = (ratings != 0)*1

	ratings, ratings_mean = normalize_ratings(ratings, did_rate)
	num_users = ratings.shape[1]
	num_features = 3

	movie_features = random.randn( num_movies, num_features )
	user_prefs = random.randn( num_users, num_features )
	initial_X_and_theta = r_[movie_features.T.flatten(), user_prefs.T.flatten()]

	reg_param = 30

	minimized_cost_and_optimal_params = optimize.fmin_cg(
		calculate_cost, 
		fprime=calculate_gradient, 
		x0=initial_X_and_theta,
		args=(ratings, did_rate, num_users, num_movies, num_features, reg_param),
		maxiter=100, disp=True, full_output=True)


	cost, optimal_movie_features_and_user_prefs = minimized_cost_and_optimal_params[1], minimized_cost_and_optimal_params[0]
	movie_features, user_prefs = unroll_params(optimal_movie_features_and_user_prefs, num_users, num_movies, num_features)
	all_predictions = movie_features.dot( user_prefs.T )

	# predictions_for_latif = all_predictions[:, 0:1] + ratings_mean
	# print predictions_for_latif
	return ratings, all_predictions
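
Because `fmin_cg` relies on `calculate_gradient` agreeing with `calculate_cost`, a finite-difference comparison is a cheap sanity check. The sketch below is a standalone restatement of the two functions; the names `unroll`, `cost`, `grad`, and `numerical_grad`, as well as the test sizes and the value of `lam`, are illustrative assumptions rather than part of the project code. It also assumes, as `normalize_ratings` guarantees, that the normalized rating is zero wherever `did_rate` is zero.

```python
import numpy as np

def unroll(params, num_users, num_movies, num_features):
    # Same layout as unroll_params: movie features first, then user preferences
    X = params[:num_movies * num_features].reshape(num_features, num_movies).T
    theta = params[num_movies * num_features:].reshape(num_features, num_users).T
    return X, theta

def cost(params, Y, R, num_users, num_movies, num_features, lam):
    X, theta = unroll(params, num_users, num_movies, num_features)
    err = X.dot(theta.T) * R - Y
    return np.sum(err ** 2) / 2 + (lam / 2) * (np.sum(theta ** 2) + np.sum(X ** 2))

def grad(params, Y, R, num_users, num_movies, num_features, lam):
    X, theta = unroll(params, num_users, num_movies, num_features)
    err = X.dot(theta.T) * R - Y
    X_grad = err.dot(theta) + lam * X
    theta_grad = err.T.dot(X) + lam * theta
    return np.r_[X_grad.T.flatten(), theta_grad.T.flatten()]

def numerical_grad(f, params, eps=1e-6):
    # Central differences, one parameter at a time
    g = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        g[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return g

# Small random problem (hypothetical sizes); Y is zero wherever R is zero
rng = np.random.default_rng(0)
num_movies, num_users, num_features, lam = 4, 3, 2, 1.5
R = (rng.random((num_movies, num_users)) > 0.3).astype(float)
Y = rng.standard_normal((num_movies, num_users)) * R
params = rng.standard_normal((num_movies + num_users) * num_features)
args = (Y, R, num_users, num_movies, num_features, lam)

analytic = grad(params, *args)
numeric = numerical_grad(lambda p: cost(p, *args), params)
max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

A small `max_diff` suggests the analytic gradient and the cost are consistent; a large one would point to a mismatch in either the formula or the flatten/unroll ordering.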

##Singular Value Decomposition
To perform the SVD, I first generate some predictions by choosing the number of movies and users I would like predictions for. Below, we see the normalized ratings matrix, followed by the predictions matrix.

In [9]:
num_movies = 4
num_users = 4

ratings, predictions = generate_predictions (num_users, num_movies)

ratings

Optimization terminated successfully.
         Current function value: 23.041667
         Iterations: 9
         Function evaluations: 22
         Gradient evaluations: 22


array([[ 0.33333333, -1.66666667,  1.33333333,  0.        ],
       [-0.75      ,  2.25      , -1.75      ,  0.25      ],
       [ 2.        ,  1.        ,  0.        , -3.        ],
       [-2.66666667, -0.66666667,  3.33333333,  0.        ]])

In [10]:
predictions = matrix(predictions)
predictions

matrix([[ -5.53853069e-17,  -1.42417524e-16,   4.56549720e-17,
          -9.26551084e-17],
        [  3.24354524e-18,  -1.51954546e-16,   8.41832418e-17,
          -1.58561635e-16],
        [  2.28809159e-16,  -7.10492278e-17,   6.80980117e-19,
           1.03227063e-16],
        [  1.77491298e-16,  -1.47839560e-16,   5.48058584e-17,
          -2.42709579e-17]])
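
The entries of `predictions` are on the order of 1e-16, that is, zero up to floating-point noise. One way to see why is to compare the final cost reported by `fmin_cg` with the cost at `X = 0`, `theta = 0`, which is simply half the sum of the squared normalized ratings. The check below uses the values printed above; `cost_at_zero` and `reported_cost` are names introduced here for illustration.

```python
import numpy as np

# Normalized ratings exactly as printed in the notebook output above
ratings = np.array([
    [ 0.33333333, -1.66666667,  1.33333333,  0.        ],
    [-0.75      ,  2.25      , -1.75      ,  0.25      ],
    [ 2.        ,  1.        ,  0.        , -3.        ],
    [-2.66666667, -0.66666667,  3.33333333,  0.        ],
])

# With X = 0 and theta = 0 the prediction and regularization terms vanish,
# so the cost reduces to half the sum of squared ratings
cost_at_zero = np.sum(ratings ** 2) / 2

# "Current function value" reported by fmin_cg above
reported_cost = 23.041667

print(cost_at_zero, reported_cost)
```

The two values agree to the printed precision, which suggests that with `reg_param = 30` on a 4x4 matrix the regularization term dominates and the optimizer settles on all-zero factors.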

Next, we introduce linear dependencies between the rows of `predictions`:

In [12]:
predictions[0] = predictions[1] + predictions[2]
predictions[-1] = predictions[1]

predictions

matrix([[  2.32052705e-16,  -2.23003774e-16,   8.48642219e-17,
          -5.53345715e-17],
        [  3.24354524e-18,  -1.51954546e-16,   8.41832418e-17,
          -1.58561635e-16],
        [  2.28809159e-16,  -7.10492278e-17,   6.80980117e-19,
           1.03227063e-16],
        [  3.24354524e-18,  -1.51954546e-16,   8.41832418e-17,
          -1.58561635e-16]])

Above, the first row is now the sum of the second and third rows, and the last row duplicates the second, so the matrix has at most two linearly independent rows. We can now perform the SVD using `numpy`'s `linalg.svd` function.

In [13]:
V, Sigma, Ustar = linalg.svd(predictions)

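Now that we have our initial SVD matrices, we find the number of non-zero singular values. We first build `SigmaDag`, a matrix whose diagonal holds the reciprocals of the leading singular values, and then count its non-zero entries. Only the first two diagonal entries are filled in, because the matrix was constructed to have two independent rows: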

In [15]:
V = matrix(V)
Ustar = matrix(Ustar)

# Reciprocals of the two leading singular values go on the diagonal; the rest stays zero
SigmaDag = zeros((num_movies, num_users))
SigmaDag[[0,1], [0,1]] = 1/Sigma[:2]

S = count_nonzero(SigmaDag)
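
The count above is fixed at two by how `SigmaDag` is filled in. As an alternative sketch, the number of non-zero singular values can be derived from the singular values themselves by comparing them against a small tolerance. The example uses a random matrix with the same row dependencies rather than the project data, and the tolerance formula is an assumption chosen to mirror the default used by `numpy.linalg.matrix_rank`.

```python
import numpy as np

# Random 4x4 matrix with the same dependencies as in the notebook:
# row 0 = row 1 + row 2, and row 3 = row 1
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A[0] = A[1] + A[2]
A[3] = A[1]

# Singular values only, in descending order
s = np.linalg.svd(A, compute_uv=False)

# Treat singular values below this threshold as zero
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
S = int(np.sum(s > tol))

print(s)
print(S)
```

Counting against a tolerance avoids hard-coding the rank, at the price of having to choose a threshold that suits the scale of the data.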

Once we know the number of non-zero singular values, `S`, we reduce the decomposition to the part they apply to by keeping only the first `S` columns of `V`:

In [17]:
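# Diagonal matrix of the S leading singular values
# (`matrix` is used instead of the `mat` alias, which newer NumPy releases no longer provide)
Sig = matrix(eye(S)*Sigma[:S])

# Keep the first S columns of V
new_predictions = V[:,:S]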

new_predictions

matrix([[-0.74274582,  0.21983777],
        [-0.41143934, -0.48033079],
        [-0.33130649,  0.70016856],
        [-0.41143934, -0.48033079]])
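
To check that truncating to `S` components loses nothing for a rank-`S` matrix, the reduced factors can be multiplied back together. The sketch below follows the notebook's naming (`V`, `Sigma`, `Ustar`, `Sig`) but runs on a random matrix with the same row dependencies, uses plain arrays instead of `matrix`, and takes `S = 2` as given; `A_rebuilt` and `max_error` are names introduced here.

```python
import numpy as np

# Random 4x4 matrix with row 0 = row 1 + row 2 and row 3 = row 1 (rank 2)
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
A[0] = A[1] + A[2]
A[3] = A[1]

V, Sigma, Ustar = np.linalg.svd(A)

S = 2                      # number of non-zero singular values, assumed known
Sig = np.diag(Sigma[:S])   # S x S diagonal matrix of the leading singular values

# Truncated product: (4 x S) . (S x S) . (S x 4)
A_rebuilt = V[:, :S].dot(Sig).dot(Ustar[:S, :])
max_error = np.max(np.abs(A - A_rebuilt))

print(max_error)
```

The reconstruction error stays at the level of floating-point rounding, so the two retained components carry all of the information in the matrix.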

##Conclusion
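From the data, the only positive entries are in the second column, in the first and third rows (0.2198 and 0.7002); every other entry came out negative. I read the positive entries as the predictions that held and the negative ones as poor predictors. This reading should be treated with caution, however: `new_predictions` contains the first two columns of `V` (singular vectors) rather than rescaled ratings, the sign of a singular vector is not uniquely determined, and the underlying predictions were numerically zero (on the order of 1e-16).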
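
When reading signs off the columns of `V`, it may help to keep in mind that an SVD is only determined up to a simultaneous sign flip of each pair of left and right singular vectors. The following is a minimal sketch on a random matrix (not the project data); `V_flipped` and `Ustar_flipped` are names introduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
V, Sigma, Ustar = np.linalg.svd(A)

# Flip the sign of the first left singular vector and the matching right one
V_flipped = V.copy()
Ustar_flipped = Ustar.copy()
V_flipped[:, 0] *= -1
Ustar_flipped[0, :] *= -1

# Both sets of factors multiply back to the same matrix
original = V.dot(np.diag(Sigma)).dot(Ustar)
flipped = V_flipped.dot(np.diag(Sigma)).dot(Ustar_flipped)

print(np.allclose(original, flipped))
```

Since both factorizations are equally valid, whether a given entry of `V` is positive or negative depends partly on the convention the SVD routine happens to pick.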