Why put test[u][0] into the candidate sequences? #21

BEbillionaireUSD · 2021-02-07T05:46:34Z

In the evaluate function, an item_idx list is built and test[u][0] is put into it.
My concern is that test[u][0] is exactly the item we want to predict. Done this way, the model only has to choose among these candidates, and one of them is the item we want to predict.
Is this a kind of data leakage, or am I misunderstanding something?

BEbillionaireUSD · 2021-02-07T05:48:42Z

Specifically, this is the part I mean:

for i in reversed(train[u]):
    seq[idx] = i
    idx -= 1
    if idx == -1: break
rated = set(train[u])
rated.add(0)
item_idx = []  # [test[u][0]]
for _ in range(101):
    t = np.random.randint(1, itemnum + 1)
    while t in rated: t = np.random.randint(1, itemnum + 1)
    item_idx.append(t)

predictions = -model.predict(*[np.array(l) for l in [[u], [seq], item_idx]])

kang205 · 2021-02-07T08:06:02Z

As I remember, test[u][0] should be the validation item. The meaning is: given the training events plus the validation item (i.e., the second-to-last item), predict the test item (the last item).
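Kang's description corresponds to a leave-one-out split by time. The sketch below is a hypothetical illustration, not the repository's code: the function names are made up, and sending users with fewer than 3 interactions entirely to training is an assumption. It shows which items form the model input and which item is the target at validation time versus test time.

```python
from typing import Dict, List, Tuple


def leave_one_out_split(user_seqs: Dict[int, List[int]]) -> Tuple[dict, dict, dict]:
    """Split each user's time-ordered items: all but the last two for training,
    the second-to-last for validation, the last for test.
    Users with fewer than 3 interactions keep everything in training (assumption)."""
    train, valid, test = {}, {}, {}
    for u, items in user_seqs.items():
        if len(items) < 3:
            train[u], valid[u], test[u] = list(items), [], []
        else:
            train[u], valid[u], test[u] = items[:-2], [items[-2]], [items[-1]]
    return train, valid, test


def build_input_seq(history: List[int], maxlen: int) -> List[int]:
    """Keep the most recent `maxlen` items and left-pad with 0 (the padding id)."""
    recent = history[-maxlen:]
    return [0] * (maxlen - len(recent)) + recent


# Usage on a made-up user whose items are already sorted by time
user_seqs = {1: [5, 3, 8, 2, 9]}
train, valid, test = leave_one_out_split(user_seqs)
# Validation: input = training events only; target = valid[1][0]
val_input = build_input_seq(train[1], maxlen=4)
# Test: input = training events + validation item; target = test[1][0]
test_input = build_input_seq(train[1] + valid[1], maxlen=4)
```

Here the validation input is [0, 5, 3, 8] with target 2, and the test input is [5, 3, 8, 2] with target 9: the target item is never part of its own input sequence.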

-- Wang-Cheng Kang

BEbillionaireUSD · 2021-02-07T11:11:04Z

Thanks for your quick reply!
But there is another function that calculates the validation recall rate, evaluate_valid().
That function doesn't involve the test item; instead, it adds the validation item to item_idx.

Here is the function:

rated = set(train[u])
rated.add(0)
item_idx = [valid[u][0]]
for _ in range(100):
    t = np.random.randint(1, itemnum + 1)
    while t in rated: t = np.random.randint(1, itemnum + 1)
    item_idx.append(t)

predictions = -model.predict(sess, [u], [seq], item_idx)
predictions = predictions[0]

My understanding of this part is: the model randomly samples 100 candidates from all items (except those the user has already interacted with) and adds the item it wants to predict to the candidate set. It then scores these 101 candidates.
The loop seems a little strange.
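The protocol described here (one true item plus 100 sampled negatives, then ranking the true item among them) can be sketched with the standard library. This is an illustrative sketch, not the repository's code: the function names and the toy scorer are hypothetical, ties are counted in the true item's favor (an assumption), and HR@10 / NDCG@10 are assumed as the metrics usually reported with this protocol.

```python
import math
import random
from typing import List, Optional, Sequence, Set, Tuple


def sample_candidates(target: int, rated: Set[int], itemnum: int,
                      num_neg: int = 100,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Return [target] followed by `num_neg` item ids drawn uniformly from
    1..itemnum, redrawing any id in `rated` (mirrors the quoted loop)."""
    if rng is None:
        rng = random.Random()
    candidates = [target]
    for _ in range(num_neg):
        # randint is inclusive on both ends, like np.random.randint(1, itemnum + 1)
        t = rng.randint(1, itemnum)
        while t in rated:
            t = rng.randint(1, itemnum)
        candidates.append(t)
    return candidates


def rank_of_target(scores: Sequence[float]) -> int:
    """0-based rank of candidate 0 (the true item): the number of other
    candidates with a strictly higher score."""
    return sum(1 for s in scores[1:] if s > scores[0])


def hit_and_ndcg(rank: int, k: int = 10) -> Tuple[float, float]:
    """HR@k and NDCG@k for one user with exactly one relevant item."""
    if rank < k:
        return 1.0, 1.0 / math.log2(rank + 2)
    return 0.0, 0.0


# Usage with a made-up scorer standing in for model.predict
rng = random.Random(0)
rated = {0, 5, 3, 8}                      # padding id + training items
cands = sample_candidates(target=2, rated=rated, itemnum=1000, rng=rng)
scores = [-abs(i - 2) for i in cands]     # hypothetical: ids closer to 2 score higher
rank = rank_of_target(scores)
hit, ndcg = hit_and_ndcg(rank)
```

Because the model scores each candidate on its own, it is never told which of the 101 ids is the true one; the metric only checks where that id lands in the ranking.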

kang205 · 2021-02-07T15:45:40Z

The evaluation is all about time: the training/validation/test partition is based on time. For validation, the task is predicting the second-to-last item in the user sequence given the training events (all previous actions), so we don't involve the test item (you can think of it this way: at validation time, we haven't seen the test item yet).

Your understanding is correct: it evaluates the ranking of the test item among 100 random samples instead of among all items. This sampled evaluation is faster and was commonly adopted in the literature, but avoid it whenever possible, as it is actually a biased estimator.
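To make the "biased estimator" remark concrete, the sketch below compares a full-ranking rank with the rank one would expect under the 100-negative protocol. It is a hypothetical illustration: the function names and toy scores are made up, and it assumes negatives are drawn uniformly, with replacement, from the user's unrated items.

```python
from typing import Sequence, Set


def full_rank(scores: Sequence[float], target: int, rated: Set[int]) -> int:
    """0-based rank of `target` among ALL unrated items 1..len(scores)-1.
    scores[i] is the model score for item id i; index 0 is padding."""
    t = scores[target]
    return sum(1 for i in range(1, len(scores))
               if i != target and i not in rated and scores[i] > t)


def expected_sampled_rank(full: int, pool_size: int, num_neg: int = 100) -> float:
    """Expected rank under the sampled protocol: each of `num_neg` uniform draws
    from `pool_size` unrated items outranks the target with prob. full / pool_size."""
    return num_neg * full / pool_size


itemnum = 1000
scores = [0.0] + [float(i) for i in range(1, itemnum + 1)]  # hypothetical: score = item id
rated = {0}
r_full = full_rank(scores, target=950, rated=rated)           # items 951..1000 score higher
r_sampled = expected_sampled_rank(r_full, pool_size=itemnum)   # 100 * 50 / 1000
```

In this toy case the target sits at rank 50 among all items (a miss for HR@10), while its expected rank among 100 sampled negatives is 5.0, so the sampled metric will usually count it as a hit.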

-- Wang-Cheng Kang

BEbillionaireUSD · 2021-02-08T03:03:29Z

Thanks! But what if I want to predict the next item without knowing the true next item? Should I let item_idx contain all items?
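One possible approach, sketched here as an assumption rather than anything stated in the repository: at inference time there is no held-out item, so instead of a 101-item candidate list you score every item id (for example by passing all ids 1..itemnum as the candidate list, if memory allows), using the user's full known history as the input sequence, then drop items the user has already interacted with and keep the top k. The function name and scores below are hypothetical.

```python
import heapq
from typing import List, Sequence, Set


def recommend_top_k(scores: Sequence[float], rated: Set[int], k: int = 10) -> List[int]:
    """Return the k highest-scoring item ids the user has not interacted with.
    scores[i] is the model score for item id i; index 0 is padding."""
    unseen = (i for i in range(1, len(scores)) if i not in rated)
    return heapq.nlargest(k, unseen, key=lambda i: scores[i])


# Hypothetical scores for items 1..5 (index 0 is padding)
scores = [0.0, 0.2, 0.9, 0.1, 0.7, 0.5]
top3 = recommend_top_k(scores, rated={2}, k=3)
```

Item 2 has the highest score but is skipped because the user already interacted with it, so top3 is [4, 5, 1]. heapq.nlargest avoids fully sorting the score list, which matters when itemnum is large.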

coolsubbu · 2021-06-22T09:02:36Z

Hi,

I would like to know the answer to the same question asked by cherylLbt.

How do we predict the next item without knowing the true next item?

Thanks
coolsubbu
