## Stochastic Gradient Descent
For the stochastic gradient descent (SGD) function, we derive update formulae for $U$ and $V$ respectively.

Within SGD the update functions for vectors in the user and item matrices are as follows: 

For every pair $(m, i)$ for which $R_{m,i}$ is an observed rating, with $\alpha$ the learning rate,

$U_{i} = U_{i} + \alpha V_{m}(R_{m,i} - \langle V_{m}, U_{i}\rangle)$

$V_{m} = V_{m} + \alpha U_{i}(R_{m,i} - \langle V_{m}, U_{i}\rangle)$
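As a rough illustration, the two updates can be sketched for a single observed rating as follows. The helper name `sgd_step`, the row-wise storage of the vectors, and the toy sizes are assumptions made for this example. The sketch also assumes that both updates use the values from before the step, which the formulae above leave open.

```python
import numpy as np

def sgd_step(U, V, i, m, rating, alpha):
    """Apply one SGD update for the observed rating R[m, i] (hypothetical helper)."""
    # Residual between the observed rating and the current prediction <V_m, U_i>
    residual = rating - V[m] @ U[i]
    # Assumption: keep a copy of U_i so both updates use the pre-update values
    u_old = U[i].copy()
    U[i] = U[i] + alpha * residual * V[m]
    V[m] = V[m] + alpha * residual * u_old
    return residual

rng = np.random.default_rng(0)
U = rng.uniform(0, 1, size=(4, 3))  # 4 vectors U_i with 3 latent factors each
V = rng.uniform(0, 1, size=(5, 3))  # 5 vectors V_m with 3 latent factors each

before = abs(sgd_step(U, V, i=1, m=2, rating=4.0, alpha=0.01))
after = abs(4.0 - V[2] @ U[1])
print(before, after)  # the residual for this rating shrinks after the step
```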

The error function takes the estimated ratings $\hat{R}$ (the inner product of $V$ and $U$) along with the ratings matrix $R$ and calculates the RMSE as follows:

For vectors $x_i \in R$ and $y_i \in \hat{R}$, $\text{RMSE}(R, \hat{R}) = \left[\frac{1}{n}\sum_{i=1}^{n} \|x_i - y_i\|_2^2 \right]^{1/2}$

## Alternating Least Squares
### Introduction
Our implementation of ALS is based on lecture 14 of CME 323: Distributed Algorithms and Optimization (Stanford, Spring 2015). ALS is quite similar to stochastic gradient descent but differs in one key aspect: instead of updating one vector at a time, $U$ and $V$ take turns being held fixed while the other is optimized. Additionally, in this formulation the user and item matrices $U$ and $V$ have dimensions $k \times n$ and $k \times m$ respectively. The complete ratings matrix $R$ is thus estimated via $\hat{R} = U^TV$.

This is formulated as the following non-convex optimization problem which seeks to minimize least squared error and handle regularization to avoid overfitting:

$$\min_{U,V} \sum_{r_{ij} \text{observed}}{(r_{ij}-u_{i}^Tv_{j})^2} + \lambda(\sum_{i}\|u_i\|^2 + \sum_{j}\|v_j\|^2)$$
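A minimal sketch of evaluating this objective is given below. It assumes unobserved ratings are stored as `NaN` in $R$ and that $U$ and $V$ are $k \times n$ and $k \times m$; the function name `als_objective` and the toy values are illustrative only.

```python
import numpy as np

def als_objective(R, U, V, lam):
    """Regularized squared error over the observed (non-NaN) entries of R."""
    observed = ~np.isnan(R)                 # assumption: NaN marks an unobserved rating
    residuals = (R - U.T @ V)[observed]     # r_ij - u_i^T v_j for observed pairs only
    penalty = lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return np.sum(residuals ** 2) + penalty

R = np.array([[5.0, np.nan],
              [np.nan, 3.0]])
U = np.ones((2, 2))  # k = 2 factors, n = 2 users
V = np.ones((2, 2))  # k = 2 factors, m = 2 items
value = als_objective(R, U, V, lam=0.1)
print(value)
```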

While gradient descent can be used, it is slow and requires a large number of iterations, which motivates ALS. With $U$ fixed the objective is a convex function of $V$, and vice versa. ALS therefore alternates between fixing one matrix and optimizing the other until convergence. Below is the general algorithm as described in the Stanford materials:

* Initialize the $k \times n$ and $k \times m$ matrices $U$ and $V$
* Repeat the following until convergence
    * For all column vectors $i = 1, \dots, n$
    $$ u_i = \left(\sum_{r_{ij}\in r_{i *}}{v_jv_j^T} + \lambda I_k\right)^{-1} \sum_{r_{ij}\in r_{i *}}{r_{ij}v_{j}}$$

    * For all column vectors $j = 1, \dots, m$
    $$ v_j = \left(\sum_{r_{ij}\in r_{* j}}{u_iu_i^T} + \lambda I_k\right)^{-1} \sum_{r_{ij}\in r_{* j}}{r_{ij}u_{i}}$$

To break it down into pieces:
* $\sum v_jv_j^T$ and $\sum u_iu_i^T$ are sums of column vectors multiplied by their own transposes. For the $u_i$ update, the sum runs over the columns of $V$ that correspond to items user $i$ has rated; for the $v_j$ update, it runs over the columns of $U$ that correspond to users who have rated item $j$.
* $\lambda I_k$ adds the regularization term, weighted by $\lambda$, to avoid overfitting.
* $\sum_{r_{ij}\in r_{i *}}{r_{ij}v_{j}}$ and $\sum_{r_{ij}\in r_{* j}}{r_{ij}u_{i}}$ scale each column feature vector by its rating, with indexing handled in the same way as for $\sum v_jv_j^T$ and $\sum u_iu_i^T$.
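The pieces above can be sketched for a single user column as follows. The helper name `update_user_vector` and the toy matrix are assumptions for illustration, and the sketch solves the linear system rather than forming an explicit inverse, which is a substitution on our part rather than something the algorithm above prescribes.

```python
import numpy as np

def update_user_vector(V, ratings, rated_items, lam):
    """Solve the regularized least squares problem for one user column u_i."""
    k = V.shape[0]
    A = lam * np.eye(k)          # lambda * I_k
    b = np.zeros(k)
    for j, r_ij in zip(rated_items, ratings):
        v_j = V[:, j]
        A += np.outer(v_j, v_j)  # accumulate v_j v_j^T over rated items
        b += r_ij * v_j          # accumulate r_ij v_j over rated items
    # Solving A u = b gives the same u_i as multiplying by the inverse of A
    return np.linalg.solve(A, b)

V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])  # k = 2 factors, m = 3 items
u_i = update_user_vector(V, ratings=[2.0, 3.0], rated_items=[0, 1], lam=0.0)
print(u_i)
```

The update for an item column $v_j$ has the same shape with the roles of $U$ and $V$ swapped.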

maybe I want to just discuss indexing separately at the start and then modify talking about it in each point?


### Implementation 
Don't forget to talk about regularization.

ALS begins when the matrix $R$ is created. Because ALS requires us to subset $V$ and $U$ to the columns for the items a user has rated, or the users that have rated an item, we used several hash maps to store these indices. The hash maps were created during matrix initialization, which was efficient because we were already iterating over the items each user has rated. With the matrix created and our maps initialized, we created $U$ and $V$ as random matrices with entries drawn from a uniform distribution. For the optimization steps we found that $\sum_{r_{ij}\in r_{i *}}{v_jv_j^T}$ and $\sum_{r_{ij}\in r_{* j}}{u_iu_i^T}$ are the same as $V_jV_j^T$ and $U_iU_i^T$, with $V_j$ and $U_i$ being the subsets of $V$ and $U$ corresponding to observed ratings. The same shortcut does not apply directly to $\sum_{r_{ij}\in r_{i *}}{r_{ij}v_{j}}$ and $\sum_{r_{ij}\in r_{* j}}{r_{ij}u_{i}}$; instead, we found that we could multiply the observed ratings as a row vector by $V_j^T$ or $U_i^T$ and get the same result as taking the sum.

Our final update functions for our matrices thus looked like:

$ u_i = ({V_jV_j^T + \lambda I_k})^{-1} {R_{i*}V_{j}^T}$


$ v_j = ({U_iU_i^T + \lambda I_k})^{-1} {R_{*j}U_{i}^T}$
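The equivalence between the summation form and the matrix form can be checked numerically. The snippet below is a small sanity check on random data; the sizes and variable names are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n_rated = 3, 4
V_j = rng.uniform(0, 1, size=(k, n_rated))  # columns of V for the items one user rated
r = rng.uniform(1, 5, size=n_rated)         # that user's observed ratings

# Summation form: one outer product and one scaled vector per rated item
outer_sum = sum(np.outer(V_j[:, c], V_j[:, c]) for c in range(n_rated))
weighted_sum = sum(r[c] * V_j[:, c] for c in range(n_rated))

# Matrix form used in the final update functions
same_gram = np.allclose(outer_sum, V_j @ V_j.T)
same_rhs = np.allclose(weighted_sum, r @ V_j.T)
print(same_gram, same_rhs)
```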

#### Starting implementation

In [None]:
# Imports
from tqdm import tqdm
import matrix_modules
import pandas as pd
import numpy as np

# load in the data
ratings, news, users = matrix_modules.load_dataset()

# create the matrix and user and item hash maps 
R, item_idx, user_idx = matrix_modules.create_item_cluster_mat(ratings, news, isALS=True)

# make them into lists and sort
item_idx = {cluster_number : sorted(list(users)) for cluster_number, users in item_idx.items()}
user_idx = {user_id : sorted(list(ratings)) for user_id, ratings in user_idx.items()}

# is it possible to just filter out users that havent shown up yet? no but this is relying on the full data of them
# so well need to implement more

# initialize U and V
K = 5 # five latent factors tentatively 
I = len(user_idx) # number of users
M = 30 # number of items
np.random.seed(42)
U = np.random.uniform(0, 1, size=K*I).reshape((K, I))
V = np.random.uniform(0, 1, size=K*M).reshape((K, M))
Uold = np.zeros_like(U)
Vold = np.zeros_like(V)

# initialize a dataframe of the matrix to look at data
df = pd.DataFrame(R)
df

In [None]:
def rmse(X, Y):
    return np.sqrt(np.nanmean((X-Y)**2))

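def max_update(X, Y):
    # Largest relative change between two iterates (infinity norm)
    return np.linalg.norm(((X-Y)/Y).ravel(), np.inf)

# U is k x n and V is k x m, so the estimate of R is U^T V
error = [(0, rmse(R, U.T @ V))]
update = [(0, max(max_update(Uold, U), max_update(Vold, V)))]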

In [None]:
def alternating_least_squares(U, V, R, user_map, item_map, max_iterations=10, lambda_reg=0.01):
    """
    Takes in the ratings matrix, user and item matrices, and performs alternating least squares optimization for iterations
    determined by max_iterations regularized by lambda_reg.

    Args:
        U (np.ndarray) : The k x n user feature matrix.
        V (np.ndarray) : The k x m item feature matrix.
        R (np.ndarray) : The ratings matrix.
        user_map (dict) : The hash map containing user ids as keys and item indices as values, gets used to subset the ratings matrix and V.
        item_map (dict) : The hash map containing item ids as keys and user indices as values, gets used to subset the ratings matrix and U.
        max_iterations (int) : The number of iterations to run alternating least squares for.
        lambda_reg (float) : The regularization term in the alternating least squares algorithm.

    Returns:
        U (np.ndarray) : The optimized user feature matrix.
        V (np.ndarray) : The optimized item feature matrix.
    """
    # Initialize k and the number of columns in each matrix
    k, u_cols = U.shape
    _, v_cols = V.shape
    k_In = np.diag(np.full(k, lambda_reg))

    # Start optimizing U and V
    for iteration in tqdm(range(1, max_iterations+1), total=max_iterations, desc='Starting ALS iterations'):
        # Snapshot the factors before this sweep so the relative update can be measured afterwards
        Uold = U.copy()
        Vold = V.copy()

        # Fix V and optimize U
        for i in tqdm(range(u_cols), total=u_cols, desc='Optimizing U', leave=False):
            # Subset to the items user i has rated so that only a small k x k matrix is inverted
            ratings_row = R[i, user_map[i]]
            rated_items = V[:, user_map[i]]

            # Update the ith vector of U
            U[:, i] = np.linalg.inv((rated_items @ rated_items.T) + k_In) @ (ratings_row @ rated_items.T)

        # Fix U and optimize V
        for j in tqdm(range(v_cols), total=v_cols, desc='Optimizing V', leave=False):
            # Get the ratings for the item 
            ratings_row = R[item_map[j], j]
            user_features = U[:, item_map[j]]

            # Update the jth vector of V
            V[:, j] = np.linalg.inv((user_features @ user_features.T) + k_In) @ (ratings_row @ user_features.T)

        # Record the error and largest relative update for this iteration.
        # append mutates the module-level lists in place; `+=` would rebind them as locals and raise UnboundLocalError
        error.append((iteration, rmse(R, U.T @ V)))
        update.append((iteration, max(max_update(Uold, U), max_update(Vold, V))))

    return U, V


In [None]:
U_new, V_new = alternating_least_squares(U, V, R, user_idx, item_idx)

In [None]:
r_hat = U_new.T @ V_new

## Factorization Machines
### Introduction
Introduced in 2010, factorization machines combine the advantages of support vector machines with those of factorization models. Factorization machines capture all single and pairwise interactions between variables with a closed model equation computable in linear time. This is advantageous because it allows model parameters to be learned with stochastic gradient descent. 
Factorization machines utilize high dimensional feature vectors along with a feature matrix denoted as $V$. We implemented a factorization machine of degree 2, which per the introductory paper on factorization machines has the following equation: 

$$ \hat{y}(x) := w_0 + \sum_{i=1}^{n}{w_i x_i} + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j\rangle x_i x_j$$

The model parameters to be estimated are $w_0$, $w$ and $V$, where $w_0$ represents the global bias, $w$ holds the weights of all possible features in a feature vector $x$, and $V$ is an $n \times k$ feature matrix. Pairwise interactions are modeled by $\langle \mathbf{v}_i, \mathbf{v}_j\rangle$.

A row of the feature matrix $V$ is denoted $\mathbf{v}_i$; it describes the $i$-th feature with $k$ factors, where $k$ is the dimensionality of the factorization. 
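To make the linear-time claim concrete, the sketch below evaluates the degree 2 model equation twice: once with the double sum as written, and once with the reformulation of the pairwise term given in the factorization machines paper, $\frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_i v_{i,f}x_i\right)^2 - \sum_i v_{i,f}^2x_i^2\right]$. The function names and random test values are assumptions for illustration, not part of our implementation.

```python
import numpy as np

def fm_predict_naive(w_0, w, V, x):
    """Model equation with the explicit double sum over pairs i < j."""
    n = len(x)
    pairwise = sum((V[i] @ V[j]) * x[i] * x[j]
                   for i in range(n) for j in range(i + 1, n))
    return w_0 + w @ x + pairwise

def fm_predict_linear(w_0, w, V, x):
    """Same prediction with the pairwise term computed in O(k n)."""
    xv = x @ V  # for each factor f: sum_i v_{i,f} x_i
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return w_0 + w @ x + pairwise

rng = np.random.default_rng(2)
n, k = 5, 3
w_0, w = 0.5, rng.normal(size=n)
V, x = rng.normal(size=(n, k)), rng.normal(size=n)

naive = fm_predict_naive(w_0, w, V, x)
fast = fm_predict_linear(w_0, w, V, x)
print(np.isclose(naive, fast))
```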





### Gradient Descent
Per the introductory paper on factorization machines, model parameters $w_0$, $w$ and $V$ can all be learned via gradient descent methods on a variety of losses. As a result, we utilized stochastic gradient descent to optimize and tune our model parameters with our data. Below is the gradient vector of the function $\hat{y}$ for the estimated model parameters.
$$\frac{\partial}{\partial\theta}\hat{y}(x) = \begin{cases} 1, & \text{if } \theta \text{ is } w_0 \\ x_i, & \text{if } \theta \text{ is } w_i \\ x_i\sum_{j=1}^{n}{v_{j,f}}x_j - v_{i,f}x_i^2, & \text{if } \theta \text{ is } v_{i,f} \end{cases} $$

Additionally, to stay consistent in our judgement of our baseline models, we focused on minimizing residuals under the squared loss function, $(y - \hat{y})^2$, which is shown below: 

 $$(y -  (w_0 + \sum_{i=1}^{n}{w_i x_i} + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j\rangle x_i x_j))^2$$

Given the gradient vector and the squared loss, we utilize the following update formulae: 

$w_0^{new} = w_0 + 2\alpha(y - \hat{y})$

$w_i^{new} = w_i + 2\alpha(y - \hat{y}) x_i$

$v_{i,f}^{new} = v_{i,f} + 2\alpha(y - \hat{y})\left(x_i\sum_{j=1}^{n}{v_{j,f}}x_j - v_{i,f}x_i^2\right)$
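A vectorized sketch of one such SGD step on a dense feature vector is shown below. The name `fm_sgd_step`, the dense representation, and the toy initial values are assumptions for this example; our actual implementation works on sparse indices instead.

```python
import numpy as np

def fm_sgd_step(w_0, w, V, x, y, alpha):
    """One SGD step on the squared loss for a single (x, y) pair."""
    xv = x @ V  # for each factor f: sum_j v_{j,f} x_j
    y_pred = w_0 + w @ x + 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    err = y - y_pred
    # Entry (i, f) is x_i * sum_j v_{j,f} x_j - v_{i,f} x_i^2
    grad_V = np.outer(x, xv) - V * (x ** 2)[:, None]
    w_0 = w_0 + 2 * alpha * err
    w = w + 2 * alpha * err * x
    V = V + 2 * alpha * err * grad_V
    return w_0, w, V, err

x = np.array([1.0, 0.0, 1.0, 1.0])  # toy feature vector with three active features
w_0, w, V = 0.0, np.zeros(4), np.full((4, 2), 0.1)

errors = []
for _ in range(20):
    w_0, w, V, err = fm_sgd_step(w_0, w, V, x, y=1.0, alpha=0.01)
    errors.append(abs(err))
print(errors[0], errors[-1])  # the residual on this sample shrinks over the steps
```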

### Implementation
Factorization machines facilitate the usage of high dimensional feature vectors. To manage the large number of users and items in our dataset, a sparse matrix was created from the tensorflow compatible dataset as a way to hold our feature vectors efficiently in memory. Feature vectors included information about the user, their interaction, rating, previous interactions, median time of day of interactions, item features and personal taste (which may be added later once everything works). Model parameters $V$, $w_0$ and $w$ were initialized randomly with samples from a uniform distribution. Parameters were updated at different cadences: every feature vector contains multiple $w_i$ and $v_{i,f}$, so $w_0$ was updated once per feature vector, while $w_i$ and $v_{i,f}$ were updated for every feature present in that vector. 
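The idea behind the sparse representation can be sketched as follows: each feature block (users, categories, news IDs, and so on) gets its own contiguous range of indices, and a feature vector is stored as the list of its populated indices. The helper `build_offsets` and the toy vocabularies are hypothetical and only illustrate the indexing scheme.

```python
def build_offsets(vocabularies):
    """Assign each feature block a contiguous, non-overlapping index range."""
    encoders, offset = {}, 0
    for name, values in vocabularies.items():
        encoders[name] = {value: offset + idx for idx, value in enumerate(values)}
        offset += len(values)  # the next block starts after this one ends
    return encoders, offset

# Toy vocabularies (hypothetical values, not taken from the dataset)
vocabularies = {
    "user_id": ["U1", "U2"],
    "category": ["sports", "news"],
    "news_id": ["N1", "N2", "N3"],
}
encoders, n_features = build_offsets(vocabularies)

# One interaction becomes a sorted list of populated indices with implicit scores of 1
interaction = {"user_id": "U2", "category": "sports", "news_id": "N3"}
indices = sorted(encoders[name][value] for name, value in interaction.items())
scores = [1] * len(indices)
print(indices, n_features)
```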

### Current implementation is not utilizing clustering results and operating under sparse matrix with just zero and 1 for now

In [None]:
# matrix[index] = {'user_id' : user_id_coded[user_id], 'time' : time_coded[time],
#                         'news_id' : news_id_coded[news_id], 'category' : category_coded[category],
#                         'sub_category' : sub_category_coded[sub_category], 'history' : []}
        
# under clustering how does this change? instead of having 70000 indices for articles we have 30 for clusters and the rating Y is a lot larger to predict, and we'll have to change the format of the tensorflow dataset
# to better reflect this, namely in tfds we would trim down every set of interactions into clusters, so if user A interacted with 2 items from IC 1 and 3 items from IC2 we would just have those cumulative scores for them 

def y_hat(w_0, x, w, features_matrix):
    """ 
    Calculates y hat which is used to find the loss
    """
    # x holds the indices of the populated features in this feature vector.
    feature_indices = np.asarray(x)

    # Scores are all 1's for now since the ratings are all implicit.
    scores = np.ones(len(feature_indices)) # will have to change under the different clustering methods because ratings are a lot more diverse.
    
    # Get the \sum{i=1}^{n}{w_ix_i} term (ravel handles w stored as either (n,) or (1, n))
    linear_term = np.sum(np.ravel(w)[feature_indices] * scores)
    
    # Accumulate the pairwise interaction term
    total = 0.0

    # Get the number of rows in the subset of the original features matrix
    rows, _ = features_matrix.shape

    # Visit each pair of rows once (row_2 > row_1), matching the double sum in the model equation
    for row_1 in range(rows):
        for row_2 in range(row_1 + 1, rows):
            # Add the interaction weighted by the two feature values
            total += scores[row_1] * scores[row_2] * np.inner(features_matrix[row_1, :], features_matrix[row_2, :])
    # return the linear combination of everything to predict y
    return w_0 + linear_term + total

def update_w_0(w_0, alpha, err):
    """
    Updates w_0 for stochastic gradient descent
    """
    return w_0 + 2*alpha*err

def update_w_i(x_i, w_i, alpha, err):
    """
    Updates all w_i terms 
    """
    return w_i + 2*alpha*err*x_i

def update_v_ij(x, v_ij, alpha, err, subset, rowi):
    """
    Updates all v_ij terms
    """
    # x_i * sum_j v_j x_j - v_i x_i^2 simplifies to x_i * sum_{j != i} v_j x_j
    total = np.zeros_like(subset[rowi, :], dtype=float)
    rows, _ = subset.shape
    for rowj in range(rows):
        if rowj != rowi:
            total += subset[rowj, :] * x[rowj]
    return v_ij + 2*alpha*err*x[rowi]*total





In [None]:
np.random.seed(42)

def factorization_machine(feature_vectors, k, n_features, alpha):
    """
    Implementation of the factorization machine algorithm.
    """
    # Steps for factorization machine:
    
    # init random model parameters w_0, w and V
    w_0 = 1

    # Initialize w
    w = np.random.uniform(0, 1, size=n_features).reshape((1, n_features))
    w_old = np.zeros_like(w)

    # Initialize V
    V = np.random.uniform(0, 1, size=n_features*k).reshape((n_features, k))
    Vold = np.zeros_like(V)

    # iterate through every value
    for row in range(len(feature_vectors)):
        # update all w_i and v_ij at iteration
        # Might be more efficient if we do it this way instead... 

        # Create a subset of the matrix for all rows corresponding to the features that are found in the feature vector
        indices = feature_vectors[row]["indices"] # gets a list of all populated indices in the row, indices is of form [1, 456, 990899, etc]
        scores = feature_vectors[row]["scores"] # gets a list of all scores in the row, for now scores is of the form [1, 1,1,1, 1,1 ] but this is modular enough to utilize the previous clustering results
        features_matrix = V[indices, :] # gets the subset of the feature matrix 
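        # Assumption: the observed target is stored under a hypothetical "target" key; implicit interactions default to 1
        y = feature_vectors[row].get("target", 1.0)
        err = y - y_hat(w_0, indices, w, features_matrix) # residual y - y_hat used by the updates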
        for i in range(len(scores)):
            w[:, indices[i]] = update_w_i(scores[i], w[:, indices[i]], alpha, err)
            V[indices[i], :] = update_v_ij(scores, V[indices[i], :], alpha, err, features_matrix, i) # update the index corresponding to indices i in the features matrix  
        w_0 = w_0 + 2 * alpha * err


    return w_0, w, V


Recall that a gradient descent update is calculated as follows: var(new) = var(old) - learning_rate * gradient

In [1]:
import matrix_modules
import numpy as np
import pandas as pd
from tqdm import tqdm

In [None]:
def prepare_sparse_feature_vectors(dataset, news):
    """ 
    Prepares a sparse representation of feature vectors for factorization machines.
    
    Args:
        dataset (pd.DataFrame) : The tensorflow compatible dataset for the sparse vectors.
        news (pd.DataFrame) : The news dataset that is used to get news Ids for proper indexing in the sparse representation.
    """
    # Encode each unique user ID as a contiguous index for the feature vector w and feature matrix V
    user_id_coded = {u_id : index for index, u_id in enumerate(dataset['user_id'].unique())} 
    
    # The next set of indices starts right after the last user index
    max_ = len(user_id_coded)

    # Repeat the above process for categories, sub categories, news IDs, history, and times
    category_coded = {category : index + max_ for index, category in enumerate(news['category'].unique())}
    max_2 = category_coded[news['category'].unique()[-1]] + 1    
    sub_category_coded = {sub_category : index + max_2 for index, sub_category in enumerate(news['sub_category'].unique())} 
    max_3 = sub_category_coded[news['sub_category'].unique()[-1]] + 1
    news_id_coded = {news_id : index + max_3 for index, news_id in enumerate(news['news_id'])}
    max_4 = news_id_coded[news['news_id'].to_list()[-1]] + 1
    news_id_history_coded = {news_id : index + max_4 for index, news_id in enumerate(news['news_id'])}
    max_5 = news_id_history_coded[news['news_id'].to_list()[-1]] + 1
    time_coded = {index : index + max_5 for index in range(24)}
    
    # load the dataset with different aggregation techniques (should probably just update the function in matrix modules.)
    dataset = pd.DataFrame() 
    train_split = '80_20'
    for i in range(2):
        df = pd.read_csv(f"../MIND_large/{train_split}/train_chunk{i}.csv", index_col=0)
        dataset = pd.concat([dataset, df])   
    previously_viewed = dataset.groupby('user_id')['news_id'].apply(list)
    user_item_map = previously_viewed.to_dict()
    new_data = []
    for user_id in dataset['user_id']:
        new_data.append(user_item_map[user_id])
    dataset['viewed_items'] = new_data

    def create_sparse_vectors(df):
        """
        Uses the hash maps which map features to their indices to create a sparse matrix representation of the data in the form of a dictionary
        with row index as keys and the rows indices and scores as values.

        Args:
            df (pd.DataFrame) : The dataframe of the tensorflow compatible dataset used to create feature vectors.

        Returns:
            matrix (dict) : A matrix as a sparse representation.
        """
        # Initialize the empty matrix.
        matrix = {}

        # Iterate over the relevant information in the matrix.
        for index, row in tqdm(enumerate(zip(df['user_id'], df['time'], df['news_id'], df['category'], df['sub_category'], df['viewed_items'])), total=len(df), desc='Populating Sparse Matrix'):

            # Unpack the values from the row.
            user_id, time, news_id, category, sub_category, history = row

            # Populate the matrix at the row.
            matrix[index] = {'indices' : [user_id_coded[user_id], category_coded[category], sub_category_coded[sub_category], news_id_coded[news_id]], 'history' : [], 'scores' : []}
            
            # Might need to figure out how to populate scores
            # DEAL WITH SCORES HERE


            #  {'user_id' : user_id_coded[user_id], 'time' : time_coded[time],
                            # 'news_id' : news_id_coded[news_id], 'category' : category_coded[category],
                            # 'sub_category' : sub_category_coded[sub_category], 'history' : []}
            
            # Add the items in the users history to the index 
            for item in history:
                matrix[index]['history'].append(news_id_history_coded[item])
            matrix[index]['indices'].append(time_coded[time])
            
        return matrix

    matrix = create_sparse_vectors(dataset)
    return matrix

In [2]:
dataset, _, _ = matrix_modules.load_dataset()
news = pd.read_csv('../MIND_large/csv/news.csv', index_col=0)

user_id_coded = {u_id : index for index, u_id in enumerate(dataset['user_id'])} # dataset needs to be the one with the grouped-by info
user_id_coded
max_ = user_id_coded[dataset['user_id'].to_list()[-1]] + 1
category_coded = {category : index + max_ for index, category in enumerate(news['category'].unique())}
category_coded

max_2 = category_coded[news['category'].unique()[-1]] + 1
max_2
sub_category_coded = {sub_category : index + max_2 for index, sub_category in enumerate(news['sub_category'].unique())} 
sub_category_coded

max_3 = sub_category_coded[news['sub_category'].unique()[-1]] + 1
max_3
news_id_coded = {news_id : index + max_3 for index, news_id in enumerate(news['news_id'])}
news_id_coded


max_4 = news_id_coded[news['news_id'].to_list()[-1]] + 1
max_4
news_id_history_coded = {news_id : index + max_4 for index, news_id in enumerate(news['news_id'])}
news_id_history_coded 
max_5 = news_id_history_coded[news['news_id'].to_list()[-1]] + 1
max_5
time_coded = {index : index + max_5 for index in range(24)}
time_coded


In [None]:
# category_coded sub_category_coded news_id_coded news_id_history_coded time_coded
def create_sparse_vectors(df):
    """
    Creates sparse feature vectors 
    """
    matrix = {}
    for index, row in tqdm(enumerate(zip(df['user_id'], df['time'], df['news_id'], df['category'], df['sub_category'], df['viewed_items'])), total=len(df), desc='Populating Sparse Matrix'):
        user_id, time, news_id, category, sub_category, history = row
        matrix[index] = {'indices' : [user_id_coded[user_id], category_coded[category], sub_category_coded[sub_category], news_id_coded[news_id]], 'history' : []}
        
        
        #  {'user_id' : user_id_coded[user_id], 'time' : time_coded[time],
                        # 'news_id' : news_id_coded[news_id], 'category' : category_coded[category],
                        # 'sub_category' : sub_category_coded[sub_category], 'history' : []}
        for item in history:
            matrix[index]['history'].append(news_id_history_coded[item])
        matrix[index]['indices'].append(time_coded[time])
        matrix[index]['indices'].sort()  # sort the populated indices; the dict itself has no sort method

    return matrix

matrix = create_sparse_vectors(dataset)
matrix

In [2]:
import pandas as pd
import matrix_modules
import numpy as np

In [15]:
full_ratings, news, users = matrix_modules.load_dataset(full=True)

In [17]:
R, item_idx, user_idx = items[0]
item_idx

{0: {1, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15, 16, 22, 23, 24, 27, 30, 31, 33, 38,
  ...
  343, 345, 348, 355, 357, 358, 360
[output truncated]

In [18]:
len(item_idx)

30

In [19]:
len(user_idx)

255990

In [3]:
print("Loading in data")
full_ratings, news, users = matrix_modules.load_dataset(full=True)


# Create a ratings matrix R, and item and user index hash maps for easy subsetting.
items = [matrix_modules.create_item_cluster_mat(full_ratings, news, isALS=True, num_users=len(users), num_clusters=len(news['cluster'].unique()))]

del news, users

# items is now of the form [(R, item_idx, user_idx)]
counter = 0
results = pd.DataFrame()

R, item_idx, user_idx = items[0]

# Make the indices into lists and sort them.
item_idx = {num : sorted(list(users)) for num, users in item_idx.items()}
user_idx = {user_id : sorted(list(ratings)) for user_id, ratings in user_idx.items()}

# Create feature matrices for testing.
user_features = pd.read_csv("../MIND_large/csv/full_user_clusters.csv", index_col=0)
item_features = pd.read_csv("../MIND_large/csv/full_item_features.csv", index_col=0)
user_features_new, item_features_new = matrix_modules.create_features(user_features, item_features, userClustering=False, umapDim=1)



Loading in data




In [14]:
len(users)

NameError: name 'users' is not defined

In [13]:
len(item_idx)

255990

In [6]:
user_features_new.T.shape

(1, 255990)

In [10]:
item_features_new.T.shape

(1, 30)

In [20]:
K = 5 # Here is where we choose the number of latent factors we would like to include in our matrices.
I = len(user_idx) # number of users
M = len(item_idx) # number of items
U = np.random.uniform(0, 1, size=K*I).reshape((K, I))
V = np.random.uniform(0, 1, size=K*M).reshape((K, M))
# V = np.concatenate((V, item_features_new.T), axis=0)

In [21]:
U.shape

(5, 255990)

In [22]:
U_news = np.concatenate((U, user_features_new.T), axis=0)
U_news.shape

(6, 255990)