test data generated from training data? #9

Open
diskun00 opened this issue Jun 10, 2019 · 0 comments

Hi Rianne,

I have a question about the support matrix data. From the code, it seems the training rating matrix is treated as the full dataset and is also used to generate the test support matrix.

  1. The rating matrix rating_mx_train is generated from the training rating data only (at test time, it contains the training and validation data).

    gc-mc/gcmc/preprocessing.py

    Lines 191 to 193 in 722f37d

    rating_mx_train = np.zeros(num_users * num_items, dtype=np.float32)
    rating_mx_train[train_idx] = labels[train_idx].astype(np.float32) + 1.
    rating_mx_train = sp.csr_matrix(rating_mx_train.reshape(num_users, num_items))

  2. The support matrices are generated from adj_train, which is rating_mx_train.

    gc-mc/gcmc/train.py

    Lines 203 to 216 in 722f37d

    adj_train_int = sp.csr_matrix(adj_train, dtype=np.int32)
    for i in range(NUMCLASSES):
        # build individual binary rating matrices (supports) for each rating
        support_unnormalized = sp.csr_matrix(adj_train_int == i + 1, dtype=np.float32)
        if support_unnormalized.nnz == 0 and DATASET != 'yahoo_music':
            # yahoo music has dataset split with not all ratings types present in training set.
            # this produces empty adjacency matrices for these ratings.
            sys.exit('ERROR: normalized bipartite adjacency matrix has only zero entries!!!!!')
        support_unnormalized_transpose = support_unnormalized.T
        support.append(support_unnormalized)
        support_t.append(support_unnormalized_transpose)

  3. But then 'test_support' is extracted from 'support' by indexing the rows of the test users:

    test_support = support[np.array(test_u)]
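
To make the three steps concrete, here is a minimal sketch on toy data. The sizes, ratings, and the train/test split are made up for illustration. The normalization step is skipped, and stacking the per-rating supports with sp.hstack before row indexing is an assumption about what happens between steps 2 and 3, since the excerpts above do not show it.

```python
import numpy as np
import scipy.sparse as sp

# Toy setup: every value here is an illustrative assumption, not repo data.
num_users, num_items, num_classes = 3, 4, 2

# Five observed ratings as (user, item, class index) triples.
u = np.array([0, 0, 1, 2, 2])
v = np.array([0, 1, 2, 1, 3])
r = np.array([0, 1, 0, 1, 0])

# Flattened label vector indexed by user * num_items + item; -1 means "no rating".
labels = np.full(num_users * num_items, -1, dtype=np.int32)
idx_nonzero = u * num_items + v
labels[idx_nonzero] = r

# Hypothetical split: first three ratings for training, last two for testing.
train_idx, test_idx = idx_nonzero[:3], idx_nonzero[3:]

# Step 1: rating matrix filled from the training indices only.
rating_mx_train = np.zeros(num_users * num_items, dtype=np.float32)
rating_mx_train[train_idx] = labels[train_idx].astype(np.float32) + 1.
rating_mx_train = sp.csr_matrix(rating_mx_train.reshape(num_users, num_items))

# Step 2: one binary support matrix per rating class.
adj_train_int = sp.csr_matrix(rating_mx_train, dtype=np.int32)
support = [sp.csr_matrix(adj_train_int == i + 1, dtype=np.float32)
           for i in range(num_classes)]

# Assumption: supports are stacked side by side before row indexing.
support_stacked = sp.hstack(support, format='csr')

# Step 3: rows of the users that appear in the test set.
test_u = np.unique(u[3:])
test_support = support_stacked[test_u]

print("edges in support:", support_stacked.nnz)
print("edges in test_support:", test_support.nnz)
```

In this toy split, user 2 only has test ratings, so the rows selected for test_support contain no edges at all: the test ratings never entered rating_mx_train.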

Shouldn't we change line 192 to
rating_mx_train[idx_nonzero] = labels[idx_nonzero].astype(np.float32) + 1.0
so that rating_mx_train contains all of the rating data?
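
A small sketch of what the proposed change would do, again on made-up toy data. The helper build_rating_mx is hypothetical and only wraps the three quoted lines so that both index sets can be compared; idx_nonzero is assumed to hold the flattened positions of all observed ratings.

```python
import numpy as np
import scipy.sparse as sp


def build_rating_mx(labels, idx, num_users, num_items):
    """Hypothetical helper: write label + 1 at the positions in idx only."""
    mx = np.zeros(num_users * num_items, dtype=np.float32)
    mx[idx] = labels[idx].astype(np.float32) + 1.
    return sp.csr_matrix(mx.reshape(num_users, num_items))


# Toy data: illustrative assumptions, not taken from the repo's datasets.
num_users, num_items = 3, 4
u = np.array([0, 0, 1, 2, 2])
v = np.array([0, 1, 2, 1, 3])
r = np.array([0, 1, 0, 1, 0])

labels = np.full(num_users * num_items, -1, dtype=np.int32)
idx_nonzero = u * num_items + v      # positions of all observed ratings
labels[idx_nonzero] = r
train_idx = idx_nonzero[:3]          # hypothetical training subset

mx_train = build_rating_mx(labels, train_idx, num_users, num_items)   # current line 192
mx_all = build_rating_mx(labels, idx_nonzero, num_users, num_items)   # proposed line 192

print("nonzeros with train_idx:  ", mx_train.nnz)
print("nonzeros with idx_nonzero:", mx_all.nnz)
```

With idx_nonzero, the entries at the held-out positions become nonzero too, so the supports built from this matrix would also include the ratings the model is later evaluated on.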

