[QUESTION] Q about sequential models #615

mayaKaplansky · 2020-12-24T19:19:33Z

Hi,
I would like to experiment with several sequential models.
I have a set of user sessions made up of clicks, and I want to predict the next click at every stage of a session.
For example, a session has: click1, click2, click3, click4.
I want to predict click2 from click1, then click3 given click1 and click2, and so on,
not just the last click.
With other models I examined, I had to build the training data by feeding each session click by click, meaning:
X1: click1
Y1: click2
X2: click1, click2
Y2: click3
X3: click1, click2, click3
Y3: click4

Should I do the same when using your library, or is there no need?
Many thanks!!
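The manual expansion described in the question can be sketched in plain Python as follows. `expand_session` is a hypothetical helper for illustration, not part of RecBole. Note that it copies every prefix, so total storage grows quadratically with session length.

```python
from typing import List, Tuple


def expand_session(session: List[str]) -> List[Tuple[List[str], str]]:
    """Turn one session into (prefix, next_click) training pairs.

    A session of n clicks yields n - 1 pairs: each prefix session[:t]
    is paired with the click that follows it, session[t].
    """
    return [(session[:t], session[t]) for t in range(1, len(session))]


pairs = expand_session(["click1", "click2", "click3", "click4"])
for x, y in pairs:
    print("X:", ", ".join(x), "| Y:", y)
```

This prints the same X1/Y1 through X3/Y3 pairs listed in the question.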

ShanleiMu · 2020-12-25T02:12:33Z

Please refer to #614.

chenyushuo · 2020-12-25T02:50:07Z

We always do data augmentation for sequential models. See the following code for details:

RecBole/recbole/data/dataset/sequential_dataset.py

Lines 40 to 94 in 90d5c3f

```python
def prepare_data_augmentation(self):
    """Augmentation processing for sequential dataset.

    E.g., ``u1`` has purchase sequence ``<i1, i2, i3, i4>``,
    then after augmentation, we will generate three cases.

    ``u1, <i1> | i2``

    (Which means given user_id ``u1`` and item_seq ``<i1>``,
    we need to predict the next item ``i2``.)

    The other cases are below:

    ``u1, <i1, i2> | i3``

    ``u1, <i1, i2, i3> | i4``

    Returns:
        Tuple of ``self.uid_list``, ``self.item_list_index``,
        ``self.target_index``, ``self.item_list_length``.

        See :class:`SequentialDataset`'s attributes for details.

    Note:
        Actually, we do not really generate these new item sequences.
        One user's item sequence is stored only once in memory.
        We store the index (slice) of each item sequence after augmentation,
        which saves memory and accelerates a lot.
    """
    self.logger.debug('prepare_data_augmentation')
    # Reuse the cached result if augmentation has already been done.
    if hasattr(self, 'uid_list'):
        return self.uid_list, self.item_list_index, self.target_index, self.item_list_length

    self._check_field('uid_field', 'time_field')
    max_item_list_len = self.config['MAX_ITEM_LIST_LENGTH']
    # Group interactions by user and order each user's interactions by time.
    self.sort(by=[self.uid_field, self.time_field], ascending=True)
    last_uid = None
    uid_list, item_list_index, target_index, item_list_length = [], [], [], []
    seq_start = 0
    for i, uid in enumerate(self.inter_feat[self.uid_field].values):
        if last_uid != uid:
            # First interaction of a new user: history only, never a target.
            last_uid = uid
            seq_start = i
        else:
            # Keep at most MAX_ITEM_LIST_LENGTH items of history.
            if i - seq_start > max_item_list_len:
                seq_start += 1
            # History is rows [seq_start, i); the target is row i.
            uid_list.append(uid)
            item_list_index.append(slice(seq_start, i))
            target_index.append(i)
            item_list_length.append(i - seq_start)

    self.uid_list = np.array(uid_list)
    self.item_list_index = np.array(item_list_index)
    self.target_index = np.array(target_index)
    self.item_list_length = np.array(item_list_length)
    return self.uid_list, self.item_list_index, self.target_index, self.item_list_length
```
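To see what the loop produces without loading a RecBole dataset, here is a minimal standalone sketch that mirrors it on plain Python lists. `augment` is a hypothetical helper for illustration, not a RecBole function; `max_len` plays the role of `MAX_ITEM_LIST_LENGTH`, and the user IDs are assumed to be sorted by user and time already (the `self.sort(...)` step).

```python
from typing import List, Optional, Tuple


def augment(uids: List[str], max_len: int) -> List[Tuple[str, slice, int]]:
    """Mirror of the augmentation loop on plain lists (hypothetical helper).

    Returns (uid, history_slice, target_index) triples. Each history is a
    slice into the shared item list, so no item sequence is copied.
    """
    samples: List[Tuple[str, slice, int]] = []
    last_uid: Optional[str] = None
    seq_start = 0
    for i, uid in enumerate(uids):
        if last_uid != uid:
            # First interaction of a new user: it starts the history
            # but is never used as a prediction target.
            last_uid = uid
            seq_start = i
        else:
            # Slide the window so the history holds at most max_len items.
            if i - seq_start > max_len:
                seq_start += 1
            samples.append((uid, slice(seq_start, i), i))
    return samples


uids = ["u1", "u1", "u1", "u1", "u2", "u2"]
items = ["i1", "i2", "i3", "i4", "i5", "i6"]
samples = augment(uids, max_len=2)
for uid, hist, target in samples:
    print(uid, items[hist], "->", items[target])
```

With `max_len=2`, the third sample for `u1` has history `['i2', 'i3']`: `i1` has dropped out through `seq_start += 1`. With a large enough `max_len`, `u1` yields exactly the three cases from the docstring. Each triple corresponds to one entry of `uid_list`, `item_list_index`, and `target_index`, and `hist.stop - hist.start` gives the matching `item_list_length`.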

mayaKaplansky · 2020-12-25T09:08:10Z

That's perfect, thank you so much!

cramraj8 · 2022-12-20T07:45:09Z

@chenyushuo

When I run `run_recbole` for GRU4Rec in the sequential setting, data augmentation is never executed. I put print statements in the scripts to check whether the data augmentation function is called (RecBole/recbole/data/dataset/sequential_dataset.py, line 49 in 0c7197b: `self.data_augmentation()`), but it never was.

Because of that, I get the following error:

```
  File "/home/xxx/xxx/RecBole/RecBole/recbole/model/sequential_recommender/gru4rec.py", line 84, in forward
    seq_output = self.gather_indexes(gru_output, item_seq_len - 1)
  File "/home/xxx/xxx/RecBole/RecBole/recbole/model/abstract_recommender.py", line 174, in gather_indexes
    output_tensor = output.gather(dim=1, index=gather_index)
RuntimeError: index 122111 is out of bounds for dimension 1 with size 53
```

Any ideas how I can solve this, or am I missing some config settings?
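The traceback can be read with a plain-Python stand-in for the gather step. In `gru4rec.py`, `forward` calls `self.gather_indexes(gru_output, item_seq_len - 1)`, which picks each sequence's output at its last real position. `gather_last` below is a hypothetical helper, not RecBole code, and it simplifies each position's output to a single number. Its bounds check spells out the condition the reported error violates: every length must lie between 1 and the padded sequence size (53 in the traceback). An index of 122111 means some entry of `item_seq_len` is far larger than that, so where that length field is filled is one place to look; this sketch does not identify the cause.

```python
from typing import List


def gather_last(outputs: List[List[float]], seq_len: List[int]) -> List[float]:
    """Return outputs[b][seq_len[b] - 1] for every row b of the batch.

    Each row is one zero-padded sequence; position seq_len[b] - 1 is its
    last real (non-padding) step. Each step's output is a single float
    here, a simplification of the hidden-state vectors in the model.
    """
    picked = []
    for row, n in zip(outputs, seq_len):
        # A valid length lies between 1 and the padded sequence size.
        if not 1 <= n <= len(row):
            raise IndexError(
                f"index {n - 1} is out of bounds for a sequence of size {len(row)}"
            )
        picked.append(row[n - 1])
    return picked


batch = [[0.1, 0.2, 0.0], [0.5, 0.0, 0.0]]  # two sequences padded to size 3
print(gather_last(batch, [2, 1]))  # [0.2, 0.5]
```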

mayaKaplansky added the "enhancement" (New feature or request) label Dec 24, 2020

ShanleiMu added the "question" (Further information is requested) label and removed the "enhancement" (New feature or request) label Dec 25, 2020

ShanleiMu closed this as completed Dec 26, 2020

Sherry-XLL added the dataset label Feb 7, 2023
