Question on indices mapping - pure collaborative-filtering example #12

micheledemeo · 2020-10-13T10:29:27Z

import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import SVDpp  # pure data, algorithm SVD++

# a multi-character separator such as "::" needs the python parsing engine
data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
                   names=["user", "item", "label", "time"], engine="python")

# split whole data into three folds for training, evaluating and testing
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])

train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_testset(eval_data)
test_data = DatasetPure.build_testset(test_data)

train_data.item_indices[np.where(train_data.user_indices==2124)]
# => array([ 990, 2207, 2125, 2051, 2534, 2452,  950, 1219, 1680, 1110])

data[(data['user']==2124) & (data['item']==990)]
# => no record for user 2124 & item 990

Can you clarify how the map of indices works?

Shadz13 · 2020-10-13T11:56:50Z

I have also encountered this with the YouTube model, and I am also trying to trace the indices back to the original id numbers for both items and users. My mapped user ids differ from the original user ids.

Do the item ids change as well, and how can we map them back?

@massquantity, could you please also clarify this for the YouTube model, so that we don't have to open a new issue?

massquantity · 2020-10-13T12:34:19Z

Sorry about that, I think the whole id-mapping mechanism needs a more thorough design.
The mapping logic is in libreco/data/data_info.py. When you call DatasetPure.build_trainset, all users and items are mapped to internal indices. Here is the code:

    @property
    def user2id(self):
        unique = np.unique(self.interaction_data["user"])
        u2id = dict(zip(unique, range(self.n_users)))
        u2id[-1] = len(unique)   # -1 represents a new (unseen) user
        return u2id

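Basically, this operation maps the original ids into the range [0, n_users]: known users get the indices 0 to n_users - 1 in sorted order of their original ids, and the key -1 (a new user) gets index n_users. The library does this because it is much more convenient to work with contiguous indices internally.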
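To make the mapping concrete, here is a minimal standalone sketch of the same idea using only NumPy. The toy ids and the variable names `original_users` and `unique_users` are made up for illustration; `id2user` is built here by inverting the dict, which is an assumption about how the reverse mapping could be derived, not a copy of the library code.

```python
import numpy as np

# Toy stand-in for the "user" column of the interaction data (original ids).
original_users = np.array([5, 42, 5, 1000, 42, 7])

# np.unique returns the distinct ids in sorted order.
unique_users = np.unique(original_users)
n_users = len(unique_users)

# Original id -> contiguous internal index, mirroring the property above.
user2id = dict(zip(unique_users.tolist(), range(n_users)))
user2id[-1] = n_users  # extra slot for a new (unseen) user

# Internal index -> original id: invert the dict (assumed, for illustration).
id2user = {idx: user for user, idx in user2id.items()}

print(user2id)     # {5: 0, 7: 1, 42: 2, 1000: 3, -1: 4}
print(id2user[2])  # 42
```

Note that the internal index depends only on the sorted position of an id among the ids present in the training set, so it will generally differ from the original id.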

To translate between original ids and the indices used in the library:

>>> mapping_user = data_info.user2id        # dict: original id -> index used in the library
>>> mapping_user[2124]                      # index used in the library for original user id 2124
>>> mapping_id = data_info.id2user          # dict: index used in the library -> original id

The same works for items via item2id and id2item.
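In the snippet from the question, 2124 in `train_data.user_indices == 2124` and the returned item values such as 990 are presumably internal indices, so querying the raw DataFrame with them compares two different id spaces. The sketch below shows the round trip on toy data. The four dicts are hypothetical stand-ins for `data_info.user2id`, `item2id`, `id2user` and `id2item`, and `user_indices` / `item_indices` only imitate the arrays on `train_data`.

```python
import numpy as np
import pandas as pd

# Toy ratings with original (raw) ids; stands in for the MovieLens sample.
data = pd.DataFrame({
    "user": [10, 10, 30, 30, 30],
    "item": [500, 700, 500, 900, 700],
})

# Hypothetical stand-ins for the data_info mappings and their inverses.
user2id = {u: i for i, u in enumerate(np.unique(data["user"]).tolist())}
item2id = {m: i for i, m in enumerate(np.unique(data["item"]).tolist())}
id2user = {i: u for u, i in user2id.items()}
id2item = {i: m for m, i in item2id.items()}

# Internal index arrays, analogous to train_data.user_indices / item_indices.
user_indices = data["user"].map(user2id).to_numpy()
item_indices = data["item"].map(item2id).to_numpy()

# Items of the user whose *internal* index is 1 ...
internal_items = item_indices[user_indices == 1]
# ... translated back to original ids before querying the raw DataFrame.
original_user = id2user[1]
original_items = [id2item[i] for i in internal_items.tolist()]

print(original_user, original_items)  # 30 [500, 900, 700]

# Every translated (user, item) pair is now found in the raw data.
matches = data[(data["user"] == original_user) & data["item"].isin(original_items)]
```

With the real objects, the same pattern would be to pass each internal index through `data_info.id2user` or `data_info.id2item` first, and only then filter `data`.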

micheledemeo · 2020-10-13T13:05:39Z

Thanks 👍

micheledemeo closed this as completed Oct 13, 2020

massquantity mentioned this issue Dec 8, 2020

Problem installation #30

Open

massquantity mentioned this issue Jan 16, 2021

Simple Loading of the You Tube Recommendation Model #35

Closed
