### Implementing FunkSVD - Solution

In this notebook we will take a look at writing our own function that performs FunkSVD, which will follow the steps you saw in the previous video.  If you find that you aren't ready to tackle this task on your own, feel free to skip to the following video where you can watch as I walk through the steps.

To test our algorithm, we will run it on the subset of the data you worked with earlier.  Run the cell below to get started.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import sparse
# import svd_tests as t
%matplotlib inline

# Read in the datasets
movies = pd.read_csv('movies.csv')
reviews = pd.read_csv('ratings.csv')

In [3]:
movies.head(2)

   movieId             title                                       genres
0        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1        2    Jumanji (1995)                   Adventure|Children|Fantasy


In [41]:
reviews['Implicit'] = 1
implicit_matrix = reviews.pivot(index='userId', columns='movieId', values='Implicit')
implicit_np = implicit_matrix.fillna(0).values
print(reviews.head(2))
print()
print(implicit_np)

   userId  movieId  rating   timestamp  Implicit
0       1       16     4.0  1217897793         1
1       1       24     1.5  1217895807         1

[[0. 0. 0. ... 0. 0. 0.]
 [1. 0. 1. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [1. 1. 1. ... 0. 1. 0.]]


In [42]:
# Create user-by-item matrix
user_items = reviews[['userId', 'movieId', 'rating', 'timestamp']]
user_by_movie = user_items.groupby(['userId', 'movieId'])['rating'].first().unstack()
# The line above pivots the reviews into a user-by-movie matrix: rows are userId,
# columns are movieId, and values are ratings (NaN where a user has not rated a movie)

# Create data subset
ratings_mat = np.matrix(user_by_movie)
print(ratings_mat)

[[nan nan nan ... nan nan nan]
 [5.  nan 2.  ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [3.  3.  2.  ... nan 4.5 nan]]


In [43]:
user_items.head(3)

   userId  movieId  rating   timestamp
0       1       16     4.0  1217897793
1       1       24     1.5  1217895807
2       1       32     4.0  1217896246


`1.` You will use the **user_movie_subset** matrix (`ratings_mat` in this notebook) to show that your FunkSVD algorithm will converge.  In the cell below, use the comments and docstring to assist you as you write your own function to perform FunkSVD.  You may also want to try to complete the function on your own without the assistance of comments.  Feel free to remove from and add to the function in any way that gets you a working solution!

**Notice:** There isn't a sigma matrix in this version of matrix factorization.
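
To make the contrast concrete, here is a small sketch comparing the shapes returned by a classic SVD with the two matrices FunkSVD works with. The array `A` and the choice `k = 2` are made up for illustration and are not part of the notebook's data.

```python
import numpy as np

# A small, fully observed matrix (illustrative values only)
A = np.array([[5.0, 3.0, 1.0],
              [4.0, 2.0, 1.0],
              [1.0, 1.0, 5.0],
              [1.0, 2.0, 4.0]])

# Classic SVD: three factors, with the singular values in s playing the role of sigma
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)
print(np.allclose(U @ np.diag(s) @ Vt, A))

# FunkSVD-style factorization: only two factors,
# a user-by-feature matrix and a feature-by-movie matrix
k = 2
user_mat = np.random.rand(A.shape[0], k)
movie_mat = np.random.rand(k, A.shape[1])
print((user_mat @ movie_mat).shape)
```

In the two-matrix form, any scaling that sigma would carry is simply absorbed into the learned user and movie factors.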

In [14]:
def FunkSVD(ratings_mat, latent_features=4, learning_rate=0.0001, iters=100):
    '''
    This function performs matrix factorization using a basic form of FunkSVD with no regularization
    
    INPUT:
    ratings_mat - (numpy array) a matrix with users as rows, movies as columns, and ratings as values
    latent_features - (int) the number of latent features used
    learning_rate - (float) the learning rate 
    iters - (int) the number of iterations
    
    OUTPUT:
    user_mat - (numpy array) a user by latent feature matrix
    movie_mat - (numpy array) a latent feature by movie matrix
    '''
    
    # Set up useful values to be used through the rest of the function
    n_users = ratings_mat.shape[0]
    n_movies = ratings_mat.shape[1]
    num_ratings = np.count_nonzero(~np.isnan(ratings_mat))  # number of non-NaN (observed) ratings
    
    # initialize the user and movie matrices with random values
    user_mat = np.random.rand(n_users, latent_features)
    movie_mat = np.random.rand(latent_features, n_movies)
    
    # initialize sse at 0 for first iteration
    sse_accum = 0
    
    # header for running results
    print("Optimizaiton Statistics")
    print("Iterations | Mean Squared Error ")
    
    # for each iteration
    for iteration in range(iters):

        # update our sse
        old_sse = sse_accum
        sse_accum = 0
        
        # For each user-movie pair
        for i in range(n_users):
            for j in range(n_movies):
                
                # if the rating exists
                if ratings_mat[i, j] > 0:
                    
                    # compute the error as the actual minus the dot product of the user and movie latent features
                    diff = ratings_mat[i, j] - np.dot(user_mat[i, :], movie_mat[:, j])
                    
                    # Keep track of the sum of squared errors for the matrix
                    sse_accum += diff**2
                    
                    # update the values in each matrix in the direction of the gradient
                    for k in range(latent_features):
                        user_mat[i, k] += learning_rate * (2*diff*movie_mat[k, j])
                        movie_mat[k, j] += learning_rate * (2*diff*user_mat[i, k])

        # print results for iteration
        print("%d \t\t %f" % (iteration+1, sse_accum / num_ratings))
        
    return user_mat, movie_mat 
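
To see the update rule in isolation, the following is a minimal sketch that runs the same per-rating gradient step on a tiny hand-made matrix. The helper name `funk_svd_quiet`, the matrix `toy`, the seed, and the hyperparameters are all assumptions chosen for illustration; the only change from the function above is that the loop over `k` is written as a vector operation and the MSE is returned instead of printed.

```python
import numpy as np

def funk_svd_quiet(ratings, latent_features=2, learning_rate=0.01, iters=200, seed=0):
    """Basic FunkSVD on a small array; returns the factors and the MSE per iteration."""
    rng = np.random.default_rng(seed)
    n_users, n_movies = ratings.shape
    user_mat = rng.random((n_users, latent_features))
    movie_mat = rng.random((latent_features, n_movies))
    observed = np.argwhere(~np.isnan(ratings))  # (row, col) index of each known rating
    mse_history = []
    for _ in range(iters):
        sse = 0.0
        for i, j in observed:
            diff = ratings[i, j] - user_mat[i, :] @ movie_mat[:, j]
            sse += diff ** 2
            # same gradient step as the loop over k, applied to whole vectors
            user_mat[i, :] += learning_rate * 2 * diff * movie_mat[:, j]
            movie_mat[:, j] += learning_rate * 2 * diff * user_mat[i, :]
        mse_history.append(sse / len(observed))
    return user_mat, movie_mat, mse_history

# Illustrative 4 x 4 ratings matrix with a few missing entries
toy = np.array([[5.0, np.nan, 3.0, 1.0],
                [4.0, 2.0, np.nan, 1.0],
                [np.nan, 1.0, 5.0, 4.0],
                [1.0, np.nan, 4.0, 5.0]])

toy_user_mat, toy_movie_mat, toy_mse = funk_svd_quiet(toy)
toy_preds = toy_user_mat @ toy_movie_mat
print("first MSE:", toy_mse[0], "last MSE:", toy_mse[-1])
print("no NaN in predictions:", not np.isnan(toy_preds).any())
```

Because only the observed entries are visited, the missing values never enter the error, yet the product of the two factors still yields a number for every cell.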

`2.` Try out your function on the **user_movie_subset** dataset.  First try 4 latent features, a learning rate of 0.005, and 10 iterations.  When you take the dot product of the resulting U and V matrices, how does the resulting **user_movie** matrix compare to the original subset of the data?

In [13]:
user_mat, movie_mat = FunkSVD(ratings_mat, latent_features=4, learning_rate=0.005, iters=10)

Optimizaiton Statistics
Iterations | Mean Squared Error 
1 		 1.818485
2 		 1.005561
3 		 0.815117
4 		 0.757099
5 		 0.726403
6 		 0.706726
7 		 0.692563
8 		 0.681494
9 		 0.672288
10 		 0.664250


In [14]:
print(np.dot(user_mat, movie_mat))
print(ratings_mat)

[[3.74254625 3.12564738 2.94960103 ... 2.95777575 3.7813217  3.54420361]
 [4.15346294 3.42347213 3.28718451 ... 3.16479875 4.23408375 4.01544629]
 [3.87773304 3.16788576 3.08569479 ... 2.95070402 3.86948398 3.55391836]
 ...
 [3.60165184 2.81919134 2.85003319 ... 2.44972091 3.58200693 3.35486406]
 [3.72761626 3.16716278 3.08553802 ... 3.00710869 4.05674581 3.91766681]
 [3.64610261 3.02690147 2.92221242 ... 2.83406739 3.75253126 3.53785642]]
[[nan nan nan ... nan nan nan]
 [5.  nan 2.  ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [3.  3.  2.  ... nan 4.5 nan]]


**After only 10 iterations the predicted ratings are already in a plausible range, and the MSE on the observed ratings has dropped from about 1.82 to about 0.66.  The visible predictions still sit between roughly 2.4 and 4.2, so the extremes are not captured well yet.  For example, the 5 in the second row, first column is predicted as about 4.15, and the 2 in the second row, third column is predicted as about 3.29.  Clearly the model is not done learning, but the error is moving in the right direction.**
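
Rather than comparing the two printed matrices by eye, the predictions can be scored only where a rating actually exists. The helper below is a sketch; the name `observed_rmse` and the example arrays are hypothetical and not part of the original notebook.

```python
import numpy as np

def observed_rmse(preds, ratings):
    """RMSE between preds and ratings over the entries where a rating exists."""
    ratings = np.asarray(ratings, dtype=float)
    preds = np.asarray(preds, dtype=float)
    mask = ~np.isnan(ratings)            # True where the user rated the movie
    errors = preds[mask] - ratings[mask]
    return float(np.sqrt(np.mean(errors ** 2)))

# Illustrative values only
example_ratings = np.array([[5.0, np.nan, 2.0],
                            [3.0, 3.0, np.nan]])
example_preds = np.array([[4.0, 3.4, 3.0],
                          [3.0, 3.0, 2.8]])
print(observed_rmse(example_preds, example_ratings))
```

With the notebook's variables this would be called as `observed_rmse(np.dot(user_mat, movie_mat), ratings_mat)`.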

`3.` Let's try out the function again on the **user_movie_subset** dataset.  This time we will again use 4 latent features and a learning rate of 0.005.  However, let's bump up the number of iterations to 25.  When you take the dot product of the resulting U and V matrices, how does the resulting **user_movie** matrix compare to the original subset of the data?  What do you notice about your error at the end of the 25 iterations?

In [7]:
user_mat, movie_mat = FunkSVD(ratings_mat, latent_features=4, learning_rate=0.005, iters=25)

Optimizaiton Statistics
Iterations | Mean Squared Error 
1 		 1.805307
2 		 1.003988
3 		 0.813784
4 		 0.756192
5 		 0.725731
6 		 0.706158
7 		 0.691983
8 		 0.680800
9 		 0.671383
10 		 0.663040
11 		 0.655354
12 		 0.648065
13 		 0.641017
14 		 0.634127
15 		 0.627363
16 		 0.620726
17 		 0.614238
18 		 0.607926
19 		 0.601819
20 		 0.595936
21 		 0.590292
22 		 0.584892
23 		 0.579739
24 		 0.574828
25 		 0.570155


In [8]:
print(np.dot(user_mat, movie_mat))
print(ratings_mat)

[[3.86369583 3.13803501 2.90827713 ... 2.74455902 4.35590931 3.54761509]
 [4.23847276 3.35930304 3.28389381 ... 2.91620664 4.93164154 4.47033271]
 [3.83463129 2.95393082 3.00763943 ... 2.55329904 4.50206533 4.24820136]
 ...
 [3.80665088 2.64934229 2.57000037 ... 2.24605603 4.08644329 3.10014865]
 [3.979836   3.11098902 3.02227418 ... 2.70033302 4.54507699 3.94050355]
 [3.60615454 2.69598211 2.68312209 ... 2.31610277 4.10468441 3.60656789]]
[[nan nan nan ... nan nan nan]
 [5.  nan 2.  ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [3.  3.  2.  ... nan 4.5 nan]]


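**With 25 iterations the MSE on the observed ratings falls to about 0.570 and is still decreasing at every step, so the model has not converged yet.  The reconstruction of the user-movie matrix is closer than before (the 5 in the second row, first column is now predicted as about 4.24) but it is far from exact.**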

The last time we placed an **np.nan** value into this matrix the entire svd algorithm in python broke.  Let's see if that is still the case using your FunkSVD function.  In the below cell, I have placed a nan into the first cell of your numpy array.  

`4.` Use 4 latent features, a learning rate of 0.005, and 25 iterations (the model trained in the previous step).  Are you able to run your FunkSVD without it breaking (something that was not true of the built-in SVD)?  Do you get a prediction for the nan value?  What is your prediction for the missing value?  Use the cells below to answer these questions.

In [15]:
ratings_mat[0, 0] = np.nan
ratings_mat

matrix([[nan, nan, nan, ..., nan, nan, nan],
        [5. , nan, 2. , ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [3. , 3. , 2. , ..., nan, 4.5, nan]])

In [9]:
preds = np.dot(user_mat, movie_mat)
# ratings_mat[0, 0] is nan, so this is a prediction for a pair that was never rated
print("The predicted value for the missing rating at [0, 0] is : {:.4f}".format(preds[0, 0]))
print()
# For comparison, ratings_mat[1, 0] holds an observed rating
print("The predicted value for the observed rating at [1, 0] is : {:.4f}".format(preds[1, 0]))
print("The actual value for the observed rating at [1, 0] is : {}".format(ratings_mat[1, 0]))
print()
assert not np.isnan(preds[0, 0])
print("That's right! You just predicted a rating for a user-movie pair that was never rated!")

The predicted value for the missing rating at [0, 0] is : 3.8637

The predicted value for the observed rating at [1, 0] is : 4.2385
The actual value for the observed rating at [1, 0] is : 5.0

That's right! You just predicted a rating for a user-movie pair that was never rated!


`5.` Now that you have a set of predictions for each user-movie pair, let's answer a few questions about your results.  Provide the correct values for each of the variables below, and check your solutions using the tests below.

In [24]:
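# first_1000_users is not defined earlier in this notebook; it is assumed here to be
# the same 668 x 10325 user-by-movie matrix as ratings_mat
first_1000_users = np.array(ratings_mat)

# How many actual ratings exist in first_1000_users
num_ratings = np.count_nonzero(~np.isnan(first_1000_users))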
print("The number of actual ratings in the first_1000_users is {}.".format(num_ratings))
print()


# How many ratings did we make for user-movie pairs that didn't have ratings
ratings_for_missing = first_1000_users.shape[0]*first_1000_users.shape[1] - num_ratings
print("The number of ratings made for user-movie pairs that didn't have ratings is {}".format(ratings_for_missing))

The number of actual ratings in the first_1000_users is 105339.

The number of ratings made for user-movie pairs that didn't have ratings is 6791761


In [29]:
# Test your results against the solution
assert num_ratings == 105339, "Oops!  The number of actual ratings doesn't quite look right."
assert ratings_for_missing == 6791761, "Oops!  The number of movie-user pairs that you made ratings for that didn't actually have ratings doesn't look right."

# Make sure you made predictions on all the missing user-movie pairs
preds = np.dot(user_mat, movie_mat)
assert np.isnan(preds).sum() == 0
print("Nice job!  Looks like you have predictions made for all the missing user-movie pairs! But I still have one question... How good are they?")

Nice job!  Looks like you have predictions made for all the missing user-movie pairs! But I still have one question... How good are they?
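
One common way to approach the "how good are they?" question is to hide some of the known ratings before training and then score the predictions on those hidden entries. The sketch below shows only the masking step; the function name `holdout_split`, the test fraction, and the seed are assumptions for illustration rather than part of the original exercise.

```python
import numpy as np

def holdout_split(ratings, test_fraction=0.2, seed=0):
    """Return (train, test) copies of ratings; each known rating lands in exactly one."""
    ratings = np.asarray(ratings, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = np.where(~np.isnan(ratings))      # positions of the known ratings
    n_test = int(round(test_fraction * len(rows)))
    test_idx = rng.choice(len(rows), size=n_test, replace=False)
    train = ratings.copy()
    test = np.full_like(ratings, np.nan)
    # move the selected ratings from the training matrix to the test matrix
    test[rows[test_idx], cols[test_idx]] = ratings[rows[test_idx], cols[test_idx]]
    train[rows[test_idx], cols[test_idx]] = np.nan
    return train, test

# Illustrative matrix with 9 known ratings
demo = np.array([[5.0, np.nan, 2.0, 4.0],
                 [3.0, 3.0, np.nan, 1.0],
                 [np.nan, 4.5, 2.0, 3.5]])
demo_train, demo_test = holdout_split(demo, test_fraction=0.25)
print(np.count_nonzero(~np.isnan(demo_train)), "training ratings")
print(np.count_nonzero(~np.isnan(demo_test)), "held-out ratings")
```

Training `FunkSVD` on the training matrix and measuring the error only at the held-out positions gives an estimate of how well the model predicts ratings it has never seen.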


In [15]:
preds.shape

(668, 10325)

In [17]:
ratings_mat.shape

(668, 10325)

In [18]:
preds

array([[3.86369583, 3.13803501, 2.90827713, ..., 2.74455902, 4.35590931,
        3.54761509],
       [4.23847276, 3.35930304, 3.28389381, ..., 2.91620664, 4.93164154,
        4.47033271],
       [3.83463129, 2.95393082, 3.00763943, ..., 2.55329904, 4.50206533,
        4.24820136],
       ...,
       [3.80665088, 2.64934229, 2.57000037, ..., 2.24605603, 4.08644329,
        3.10014865],
       [3.979836  , 3.11098902, 3.02227418, ..., 2.70033302, 4.54507699,
        3.94050355],
       [3.60615454, 2.69598211, 2.68312209, ..., 2.31610277, 4.10468441,
        3.60656789]])

In [19]:
ratings_mat

matrix([[nan, nan, nan, ..., nan, nan, nan],
        [5. , nan, 2. , ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [3. , 3. , 2. , ..., nan, 4.5, nan]])

# SVD++

In [44]:
import numpy as np

def svdpp(ratings_mat, implicit, latent_features=4, learning_rate=0.0001, iters=100):
    '''
    SVD++ Matrix Factorization with biases and implicit feedback.
    
    INPUT:
    ratings_mat - (numpy array) User-Item matrix with ratings.
    implicit - (numpy array) User-Item implicit feedback matrix.
    latent_features - (int) Number of latent features.
    learning_rate - (float) Learning rate for gradient descent.
    iters - (int) Number of iterations for gradient descent.
    
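    OUTPUT:
    user_mat - (numpy array) User factors matrix (users x latent features).
    movie_mat - (numpy array) Item factors matrix (latent features x movies).
    bu - (numpy array) User biases.
    bi - (numpy array) Item biases.
    implicit_factors - (numpy array) Implicit item factors (movies x latent features).
    '''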
    # Useful values
    n_users, n_movies = ratings_mat.shape
    num_ratings = np.count_nonzero(~np.isnan(ratings_mat))
    
    # Initialize matrices & biases
    user_mat = np.random.rand(n_users, latent_features)
    movie_mat = np.random.rand(latent_features, n_movies)
    implicit_factors = np.random.rand(n_movies, latent_features)
    mu = np.nanmean(ratings_mat)  # global bias
    bu = np.random.rand(n_users)  # user biases
    bi = np.random.rand(n_movies)  # item biases
    
    # Iterative optimization
    for iteration in range(iters):
        sse_accum = 0
        for i in range(n_users):
            N_u = np.where(implicit[i, :] == 1)[0]
            N_u_len = np.sqrt(len(N_u) + 1e-10)  # adding a small value to avoid division by zero
            for j in range(n_movies):
                if ratings_mat[i, j] > 0:
                    implicit_sum = np.sum(implicit_factors[N_u, :], axis=0)
                    implicit_factor = implicit_sum / N_u_len
                    
                    # Calculate the error for this rating
                    dot_product = np.dot(user_mat[i, :] + implicit_factor, movie_mat[:, j])
                    pred = mu + bu[i] + bi[j] + dot_product
                    diff = ratings_mat[i, j] - pred
                    
                    # Update biases
                    bu[i] += learning_rate * (diff - 0.001 * bu[i])  # 0.001 is a regularization factor
                    bi[j] += learning_rate * (diff - 0.001 * bi[j])  # same here
                    
                    # Update matrices
                    for k in range(latent_features):
                        user_mat[i, k] += learning_rate * (2 * diff * movie_mat[k, j] - 0.001 * user_mat[i, k])
                        movie_mat[k, j] += learning_rate * (2 * diff * (user_mat[i, k] + implicit_factor[k]) - 0.001 * movie_mat[k, j])
                        for l in N_u:
                            implicit_factors[l, k] += learning_rate * (diff * movie_mat[k, j] / N_u_len - 0.001 * implicit_factors[l, k])
                    
                    # Sum squared error
                    sse_accum += diff ** 2
        
        # Print progress
        print(f"Iteration {iteration+1}: MSE = {sse_accum / num_ratings:.5f}")
        
    return user_mat, movie_mat, bu, bi, implicit_factors


In [45]:
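# svdpp returns five values: user factors, movie factors, user biases, item biases, implicit factors
user_matsvd, movie_matsvd, bu, bi, ysvd = svdpp(ratings_mat, implicit_np, latent_features=4, learning_rate=0.005, iters=25)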

  movie_mat[k, j] += learning_rate * (2 * diff * (user_mat[i, k] + implicit_factor[k]) - 0.001 * movie_mat[k, j])
  sse_accum += diff ** 2
  movie_mat[k, j] += learning_rate * (2 * diff * (user_mat[i, k] + implicit_factor[k]) - 0.001 * movie_mat[k, j])


KeyboardInterrupt: 

In [8]:
def predict_rating(user_mat, movie_mat, bu, bi, mu, implicit_factors, user_idx, movie_idx, implicit_vec):
    N_u = np.where(implicit_vec == 1)[0]
    implicit_sum = np.sum(implicit_factors[N_u, :], axis=0)
    implicit_factor = implicit_sum / np.sqrt(len(N_u) + 1e-10)  # avoid division by zero
    dot_product = np.dot(user_mat[user_idx, :] + implicit_factor, movie_mat[:, movie_idx])
    pred = mu + bu[user_idx] + bi[movie_idx] + dot_product
    return pred

In [None]:
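user_index = 5
movie_index = 6
# Assumes svdpp's five return values were unpacked as
# user_matsvd, movie_matsvd, bu, bi, ysvd; mu is recomputed because svdpp does not return it
mu = np.nanmean(ratings_mat)
predicted_rating = predict_rating(user_matsvd, movie_matsvd, bu, bi, mu, ysvd,
                                  user_index, movie_index, implicit_np[user_index, :])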

# Asymmetric SVD

In [45]:
def AsymmetricSVD(ratings_mat, user_implicit_mat, latent_features=4, learning_rate=0.0001, iters=100):
    '''
    This function performs matrix factorization using AsymmetricSVD for implicit feedback.

    INPUT:
    ratings_mat - (numpy array) a matrix with users as rows, movies as columns, and ratings as values.
    user_implicit_mat - (numpy array) a matrix representing implicit interactions of users with items.
    latent_features - (int) the number of latent features used.
    learning_rate - (float) the learning rate.
    iters - (int) the number of iterations.

    OUTPUT:
    q_mat - (numpy array) an item by latent feature matrix; a user's preferences are built from the rows of the items they rated.
    w_mat - (numpy array) a latent feature by item matrix used to score the target item.
    user_bias - (numpy array) bias for each user.
    item_bias - (numpy array) bias for each item.
    global_bias - (float) global bias.
    '''
    
    n_users, n_items = ratings_mat.shape
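    # Convert first so that np.matrix and np.ndarray inputs are indexed the same way
    ratings_mat = np.array(ratings_mat, dtype=float)
    global_bias = np.mean(ratings_mat[ratings_mat > 0])  # mean of the observed ratings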
    
    user_bias = np.zeros(n_users)
    item_bias = np.zeros(n_items)
    
    q_mat = np.random.rand(n_items, latent_features)
    w_mat = np.random.rand(latent_features, n_items)

    print("Optimization Statistics")
    print("Iterations | Mean Squared Error ")

    for iteration in range(iters):
        sse_accum = 0
        
        for i in range(n_users):
            for j in range(n_items):
                if ratings_mat[i, j] > 0:
                    N_u = np.where(user_implicit_mat[i, :] == 1)[0]
                    implicit_sum = np.sum(q_mat[N_u, :] * ratings_mat[i, N_u][:, np.newaxis], axis=0)

                    pred = global_bias + user_bias[i] + item_bias[j] + np.dot(implicit_sum, w_mat[:, j])
                    diff = ratings_mat[i, j] - pred
                    sse_accum += diff**2
                    
                    # Update biases
                    user_bias[i] += learning_rate * (diff - 0.002 * user_bias[i])
                    item_bias[j] += learning_rate * (diff - 0.002 * item_bias[j])

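                    # Update factors
                    for k in range(latent_features):
                        # q is updated once for every item the user interacted with
                        for l in N_u:
                            q_mat[l, k] += learning_rate * (diff * w_mat[k, j] * ratings_mat[i, l] - 0.002 * q_mat[l, k])
                        # w is updated once per rating, so it sits outside the loop over N_u
                        w_mat[k, j] += learning_rate * (diff * implicit_sum[k] - 0.002 * w_mat[k, j])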

        print("%d \t\t %f" % (iteration+1, sse_accum / np.count_nonzero(~np.isnan(ratings_mat))))

    return q_mat, w_mat, user_bias, item_bias, global_bias

In [46]:
user_matsvd, movie_matsvd,ubias, ibias, bias = AsymmetricSVD(ratings_mat, implicit_np, latent_features=4, learning_rate=0.005, iters=25)

Optimization Statistics
Iterations | Mean Squared Error 


  sse_accum += diff**2
  w_mat[k, j] += learning_rate * (diff * implicit_sum[k] - 0.002 * w_mat[k, j])
  w_mat[k, j] += learning_rate * (diff * implicit_sum[k] - 0.002 * w_mat[k, j])


KeyboardInterrupt: 

Both AsymmetricSVD and SVD++ are extensions of the traditional matrix factorization approach to incorporate implicit feedback. While they do seem similar due to the incorporation of implicit feedback, there are nuanced differences in how they handle and utilize this feedback. Here's a comparison:

1. **Approach to Implicit Feedback**:
    - **AsymmetricSVD**: There are no free user latent factors. A user is represented entirely by the items they have interacted with: the latent vectors (`Q`) of those items are summed, weighted by the user's ratings, and this sum plays the role of the user factor in the prediction.
    - **SVD++**: Implicit feedback is considered as an additional set of factors (usually termed `Y`). These are summed up and combined with the original user factors before making a prediction. In the case of SVD++, the implicit factors are more like "corrections" or "adjustments" to the user factors based on the items they've interacted with.

2. **Model Complexity**:
    - **AsymmetricSVD**: Typically fewer parameters when there are many more users than items, because the model learns two item-side matrices (`Q` and `W`) plus the biases, but no per-user factor vector.
    - **SVD++**: More parameters are introduced with the addition of the `Y` matrix. Given that every item has an associated vector in this `Y` matrix, the model can become computationally intensive for large datasets.

3. **Predictions**:
    - **AsymmetricSVD**: The prediction for a user-item pair is the bias terms plus a dot product between the user's rating-weighted sum of `Q` vectors and the item's latent factors in `W`.
    - **SVD++**: The prediction is the bias terms plus a dot product between the adjusted user factor (original user factors + the normalized sum of `Y` vectors for the items the user has interacted with) and the item latent factors.

4. **Learning**:
    - Both models update their parameters using gradient descent. For each rating, SVD++ updates the user factors, the item factors, and the `Y` vectors of every item the user has interacted with, while AsymmetricSVD updates `W` for the rated item and the `Q` vectors of every item the user has interacted with.

5. **Practical Implications**:
    - AsymmetricSVD might be faster to compute due to its lesser complexity. 
    - SVD++ can potentially capture more intricate patterns due to the additional parameters, but at the cost of computational complexity.

To put it succinctly, while both methods aim to incorporate implicit feedback, they do so in different ways. SVD++ keeps explicit user factors and adjusts them with a distinct set of implicit factors before making predictions. AsymmetricSVD, on the other hand, drops the explicit user factors and builds the user representation from the factors of the items the user has interacted with. These differences can lead to variations in prediction accuracy, training time, and computational complexity between the two methods.
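
The two prediction rules can be placed side by side for a single user. This is a sketch that follows the `svdpp` and `AsymmetricSVD` functions defined earlier in this notebook; every array and bias value below is random or made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_factors = 5, 3

# One user's explicit ratings (nan = unrated)
user_ratings = np.array([4.0, np.nan, 2.0, np.nan, 5.0])
rated = np.where(~np.isnan(user_ratings))[0]     # N(u): items the user interacted with

item_factors = rng.random((n_factors, n_items))  # plays the role of movie_mat / w_mat
mu, b_u, b_i, target_item = 3.5, 0.1, -0.2, 1    # illustrative bias terms

# SVD++: explicit user vector p_u plus a normalized sum of implicit factors Y
p_u = rng.random(n_factors)
Y = rng.random((n_items, n_factors))
svdpp_user = p_u + Y[rated].sum(axis=0) / np.sqrt(len(rated))
svdpp_pred = mu + b_u + b_i + svdpp_user @ item_factors[:, target_item]

# AsymmetricSVD: no p_u; the user vector is a rating-weighted sum of item factors Q
Q = rng.random((n_items, n_factors))
asym_user = (user_ratings[rated][:, np.newaxis] * Q[rated]).sum(axis=0)
asym_pred = mu + b_u + b_i + asym_user @ item_factors[:, target_item]

print("SVD++ prediction:", svdpp_pred)
print("AsymmetricSVD prediction:", asym_pred)
```

The only per-user parameter in the SVD++ branch is `p_u`; the AsymmetricSVD branch has none, which is where the difference in parameter count comes from.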

In [31]:
def predict_rating(user_index, item_index, ratings_mat, q_mat, w_mat, user_implicit_mat, user_bias, item_bias, global_bias):
    '''
    Predict the rating of a given user for a given item using the AsymmetricSVD model.

    INPUT:
    user_index - (int) Index of the user.
    item_index - (int) Index of the item/movie.
    ratings_mat - (numpy array) a matrix with users as rows, movies as columns, and ratings as values.
    q_mat - (numpy array) an item by latent feature matrix, as returned by AsymmetricSVD.
    w_mat - (numpy array) a latent feature by item matrix, as returned by AsymmetricSVD.
    user_implicit_mat - (numpy array) a matrix representing implicit interactions of users with items.
    user_bias - (numpy array) bias for each user.
    item_bias - (numpy array) bias for each item.
    global_bias - (float) global bias.

    OUTPUT:
    prediction - (float) Predicted rating of the user for the item.
    '''
    ratings_mat = np.asarray(ratings_mat)

    # The user is represented by the rating-weighted sum of the q_mat rows of the
    # items they interacted with, exactly as in the AsymmetricSVD training loop
    N_u = np.where(user_implicit_mat[user_index, :] == 1)[0]
    implicit_sum = np.sum(q_mat[N_u, :] * ratings_mat[user_index, N_u][:, np.newaxis], axis=0)

    prediction = global_bias + user_bias[user_index] + item_bias[item_index] + np.dot(implicit_sum, w_mat[:, item_index])
    
    return prediction

# Example usage:
# Let's say you want to predict the rating of user with index 5 for item with index 10:
# predicted_rating = predict_rating(5, 10, ratings_mat, q_mat, w_mat, user_implicit_mat, user_bias, item_bias, global_bias)
# print(predicted_rating)

### Understanding Asymmetric SVD

In traditional matrix factorization, we attempt to decompose the rating matrix \( R \) into two matrices. One matrix represents user latent factors, while the other represents item latent factors. The primary goal of this decomposition is to reconstruct the original matrix as closely as possible by the dot product of these two matrices.

Asymmetric SVD offers a nuanced approach, especially with the incorporation of implicit feedback.

Let's break down the structure of the predicted rating formula:

$$
\tilde{r}_{ui} = \mu + b_i + b_u + \sum_{f=1}^{n_{\text{factors}}} \sum_{j=1}^{n_{\text{items}}} r_{uj} Q_{j,f} W_{f,i}
$$

Here's what each component represents:

1. \( \mu \): This symbolizes the global bias or the average rating across all items and users. It establishes a general baseline for all ratings.

2. \( b_i \): This is the bias for item \( i \). Essentially, it captures the divergence of ratings for item \( i \) from the global average.

3. \( b_u \): This represents the bias for user \( u \), indicating how a user \( u \) rates items compared to the average.

4. The double summation: This component is what distinctly characterizes Asymmetric SVD.
   - \( r_{uj} \): Represents the rating user \( u \) allocates to item \( j \). 
   - \( Q_{j,f} \): Represents the weight of the \( f^{th} \) latent factor for item \( j \). Intriguingly, rather than presenting user preferences directly, this matrix discerns user preferences through their ratings.
   - \( W_{f,i} \): Specifies the weight of the \( f^{th} \) latent factor for item \( i \).

This double summation is fundamentally a weighted sum of all the latent factors for every item rated by user \( u \). This weighted sum is then employed to predict the rating for item \( i \).
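
The double summation can be checked numerically against its vectorized form. In this sketch `Q` is stored as items by factors and `W` as factors by items, matching the indices \( Q_{j,f} \) and \( W_{f,i} \) above; the ratings vector and the random factors are illustrative, and unrated items are encoded as 0 so that they drop out of the sum.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_factors = 4, 2

r_u = np.array([5.0, 0.0, 3.0, 0.0])     # user u's ratings, 0 where unrated
Q = rng.random((n_items, n_factors))      # Q[j, f]
W = rng.random((n_factors, n_items))      # W[f, i]
i = 1                                     # item whose rating is being predicted

# Literal double summation from the formula (bias terms left out)
double_sum = 0.0
for f in range(n_factors):
    for j in range(n_items):
        double_sum += r_u[j] * Q[j, f] * W[f, i]

# Same quantity as two matrix-vector products
user_vector = r_u @ Q                     # length n_factors, inferred from the ratings
vectorized = user_vector @ W[:, i]

print(double_sum, vectorized)
```

The intermediate `user_vector` is the stand-in for a user latent factor: it is never learned directly, only computed from the ratings and `Q`.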

Furthermore, ignoring the bias terms, the equivalent item-item recommender would be expressed as:

$$
\tilde{R} = RS = RQW
$$

Where:
- \( R \): The original matrix of ratings (users by items), with unrated entries treated as 0.
- \( Q \): The items-by-factors matrix with entries \( Q_{j,f} \), which is pivotal in capturing the "preferences" based on ratings.
- \( W \): The factors-by-items matrix with entries \( W_{f,i} \), which captures the weight of latent factors for items.
- \( S = QW \): The resulting item-by-item similarity matrix.

The product \( RQ \) results in a user-by-latent factor matrix. These latent factors are directly inferred from user ratings. Multiplying this with \( W \) produces the predicted rating matrix.

The term "asymmetric" is derived from the fact that the item-item similarity matrix \( S \), which arises from \( Q \) and \( W \), is in general not symmetric: \( Q \) and \( W \) are learned independently, so the influence of item \( j \) on the prediction for item \( i \) need not equal the influence of item \( i \) on item \( j \). While \( Q \) deduces user preferences from their ratings, \( W \) discerns the significance of the latent factors for the items.
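
A quick numerical check illustrates the asymmetry. The sketch assumes `Q` is stored as items by factors (the shape of `q_mat` in the code above), so the similarity matrix is formed as `Q @ W`; all values are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, n_factors = 3, 4, 2

R = rng.integers(0, 6, size=(n_users, n_items)).astype(float)  # 0 stands for "unrated"
Q = rng.random((n_items, n_factors))
W = rng.random((n_factors, n_items))

S = Q @ W                      # item-by-item similarity implied by the two factor matrices
R_tilde = R @ S                # predictions without the bias terms

print("S shape:", S.shape)
print("S symmetric?", np.allclose(S, S.T))
print("R @ S equals (R @ Q) @ W?", np.allclose(R_tilde, (R @ Q) @ W))
```

Because matrix multiplication is associative, the item-item view `R @ S` and the latent-factor view `(R @ Q) @ W` give the same predictions; they differ only in which intermediate product is materialized.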
