
[QUESTION] Q about sequential models #615

Closed
mayaKaplansky opened this issue Dec 24, 2020 · 4 comments
Labels
question Further information is requested

Comments

@mayaKaplansky

Hi,
I would like to experiment with several sequential models.
I have a set of user sessions made up of clicks, and I want to predict the next click at every step of a session, not just the last one.
For example, a session has: click1, click2, click3, click4.
I want to predict click2 from click1, then click3 given click1 and click2, and so on.
With other models I tried, I had to build the training data so that each session is fed in click by click, meaning:
X1: click1
Y1: click2
X2: click1, click2
Y2: click3
X3: click1, click2, click3
Y3: click4

Do I need to do the same with your library, or is that unnecessary?
Many thanks!!
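The manual preparation described above amounts to expanding each session into one (prefix, next click) pair per position. A minimal sketch, assuming a session is just a Python list of click IDs; `expand_session` is a hypothetical helper for illustration, not part of RecBole:

```python
def expand_session(session):
    """Turn [c1, c2, ..., cn] into [([c1], c2), ([c1, c2], c3), ...]."""
    # The clicks before position i form the input; the click at position i is the target.
    return [(session[:i], session[i]) for i in range(1, len(session))]


for x, y in expand_session(["click1", "click2", "click3", "click4"]):
    print(x, "->", y)
# ['click1'] -> click2
# ['click1', 'click2'] -> click3
# ['click1', 'click2', 'click3'] -> click4
```

Note that copying every prefix like this makes memory grow quadratically with session length, since a session of n clicks produces prefixes totalling n(n-1)/2 clicks.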

@mayaKaplansky mayaKaplansky added the enhancement New feature or request label Dec 24, 2020
@ShanleiMu
Member

Please refer to #614.

@ShanleiMu ShanleiMu added question Further information is requested and removed enhancement New feature or request labels Dec 25, 2020
@chenyushuo
Collaborator

Data augmentation is always applied for sequential models. See the following code for details:

```python
# Method of SequentialDataset; requires `import numpy as np` at module level.
def prepare_data_augmentation(self):
    """Augmentation processing for sequential dataset.

    E.g., ``u1`` has purchase sequence ``<i1, i2, i3, i4>``,
    then after augmentation, we will generate three cases.

    ``u1, <i1> | i2``

    (Which means given user_id ``u1`` and item_seq ``<i1>``,
    we need to predict the next item ``i2``.)

    The other cases are below:

    ``u1, <i1, i2> | i3``

    ``u1, <i1, i2, i3> | i4``

    Returns:
        Tuple of ``self.uid_list``, ``self.item_list_index``,
        ``self.target_index``, ``self.item_list_length``.
        See :class:`SequentialDataset`'s attributes for details.

    Note:
        Actually, we do not really generate these new item sequences.
        One user's item sequence is stored only once in memory.
        We store the index (slice) of each item sequence after augmentation,
        which saves memory and accelerates a lot.
    """
    self.logger.debug('prepare_data_augmentation')
    if hasattr(self, 'uid_list'):
        return self.uid_list, self.item_list_index, self.target_index, self.item_list_length
    self._check_field('uid_field', 'time_field')
    max_item_list_len = self.config['MAX_ITEM_LIST_LENGTH']
    self.sort(by=[self.uid_field, self.time_field], ascending=True)
    last_uid = None
    uid_list, item_list_index, target_index, item_list_length = [], [], [], []
    seq_start = 0
    for i, uid in enumerate(self.inter_feat[self.uid_field].values):
        if last_uid != uid:
            # First interaction of a new user: no history yet, so no case is generated.
            last_uid = uid
            seq_start = i
        else:
            # Keep the history window at most MAX_ITEM_LIST_LENGTH items long.
            if i - seq_start > max_item_list_len:
                seq_start += 1
            uid_list.append(uid)
            item_list_index.append(slice(seq_start, i))
            target_index.append(i)
            item_list_length.append(i - seq_start)
    self.uid_list = np.array(uid_list)
    self.item_list_index = np.array(item_list_index)
    self.target_index = np.array(target_index)
    self.item_list_length = np.array(item_list_length)
    return self.uid_list, self.item_list_index, self.target_index, self.item_list_length
```
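To see what the stored slices look like, the same loop can be reproduced outside RecBole. This is a simplified, standalone re-implementation, not RecBole's API: it assumes the interactions are already sorted by user and time and are given as plain Python lists, and `augment` and `max_len` are hypothetical names standing in for the method and `MAX_ITEM_LIST_LENGTH`.

```python
def augment(uids, max_len):
    """Return (uid, history_slice, target_index, length) tuples.

    uids: user ID of each interaction, sorted by user then time.
    max_len: cap on the history length (stands in for MAX_ITEM_LIST_LENGTH).
    """
    cases = []
    last_uid = None
    seq_start = 0
    for i, uid in enumerate(uids):
        if uid != last_uid:
            # First interaction of a new user: nothing to predict from yet.
            last_uid = uid
            seq_start = i
        else:
            # The window grows by one per step, so one shift keeps it <= max_len.
            if i - seq_start > max_len:
                seq_start += 1
            cases.append((uid, slice(seq_start, i), i, i - seq_start))
    return cases


items = ["i1", "i2", "i3", "i4", "j1", "j2"]
uids = ["u1", "u1", "u1", "u1", "u2", "u2"]
for uid, hist, target, length in augment(uids, max_len=2):
    print(uid, items[hist], "|", items[target])
# u1 ['i1'] | i2
# u1 ['i1', 'i2'] | i3
# u1 ['i2', 'i3'] | i4
# u2 ['j1'] | j2
```

Because only `slice` objects and integer positions are kept, each user's sequence exists once in memory, which is the saving described in the docstring's Note.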

@mayaKaplansky
Author

That's perfect, thank you so much!!

@cramraj8

cramraj8 commented Dec 20, 2022

@chenyushuo

When I run `run_recbole` for GRU4Rec in the sequential setting, data augmentation is never run. I added print statements to check whether the data augmentation function is called (`self.data_augmentation()`), but it never was.

And because of that, I get the following error:

```
"/home/xxx/xxx/RecBole/RecBole/recbole/model/sequential_recommender/gru4rec.py", line 84, in forward
    seq_output = self.gather_indexes(gru_output, item_seq_len - 1)
  File "/home/xxx/xxx/RecBole/RecBole/recbole/model/abstract_recommender.py", line 174, in gather_indexes
    output_tensor = output.gather(dim=1, index=gather_index)
RuntimeError: index 122111 is out of bounds for dimension 1 with size 53
```

Any ideas on how to solve this, or am I missing a config setting?
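For context on the traceback: the failing call `self.gather_indexes(gru_output, item_seq_len - 1)` selects, for each row in the batch, the GRU output at position `item_seq_len - 1`, so every length must lie between 1 and the padded sequence size (53 in this traceback). An index of 122111 means a length value of 122112 reached the model, which cannot be a valid position. The pure-Python sketch below is a hypothetical helper, not RecBole code: it mimics that selection with lists in place of a `(batch, seq, hidden)` tensor and raises a readable error for out-of-range lengths, which may help when checking the length values in a batch.

```python
def gather_last_valid(output, seq_len):
    """Return output[b][seq_len[b] - 1] for every row b of a batch.

    output: one list of per-step values per row (stand-in for a
            (batch, seq, hidden) tensor).
    seq_len: number of valid steps in each row.
    """
    result = []
    for b, (row, length) in enumerate(zip(output, seq_len)):
        position = length - 1
        # A valid length must point inside the padded row.
        if not 0 <= position < len(row):
            raise IndexError(
                f"row {b}: index {position} is out of bounds "
                f"for sequence size {len(row)}"
            )
        result.append(row[position])
    return result


batch = [[10, 11, 12], [20, 21, 22]]
print(gather_last_valid(batch, [2, 3]))  # [11, 22]
try:
    gather_last_valid(batch, [2, 122112])
except IndexError as err:
    print(err)  # row 1: index 122111 is out of bounds for sequence size 3
```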

Projects
None yet
Development

No branches or pull requests

5 participants