Non-session based custom dataset #48

halilergul1 · 2023-11-20T08:20:16Z

Hi,

I want to try to train this model on my custom transactional dataset. I wonder whether [GRU4Rec] works intrinsically with session-based data. I have only item_id, user_id and time dimensions. Can I treat users as sessions or should I create a session dimension to run this model? Thanks in advance.

hidasib · 2023-11-29T14:05:41Z

The short answer is yes, but there are two things to keep in mind.

In some cases user histories can be considered sequences, but in other cases they cannot. This depends on the domain, but also on what type of events you use. For example, purchase histories in general e-commerce (e.g., electronics) rarely exhibit sequential patterns, because the purchases are not directly related (e.g., the phone I buy today is only very loosely connected to the laptop I bought 6 months ago; those are two mostly unrelated problems from the user's perspective). On the other hand, if you take the item page view events of the same store, you'll probably see sequential patterns, because while I was looking for the laptop (or the phone) I went through a sequential process of finding the best one for me: I looked at different options, changed my preferences slightly along the way, etc. Also, the events were close to each other in time, so it is more likely that existing sequential behaviour can be observed. But this is not only about the timeframe; regularity can be another factor. If you have experience with the domain your data comes from, you can probably decide whether the observable user behaviour (given the event type, regularity, resolution, etc.) can be considered sequential. If you are unsure, one thing you can do is to run a separate hyperparameter optimization for GRU4Rec and for "FFN4Rec" from here (which is basically the same algorithm, but with the GRU layer replaced by a feedforward network). If you get very similar final evaluation scores, then there is probably no sequentiality in your data.
If your user histories are long, the BPTT version of GRU4Rec can probably give you better results. It is not in the repo at the moment. (The base version should still work.)

If your users' events can (and usually do) happen shortly after one another (e.g. item page views), you can sessionize user histories. For example, you can say that if more than 30-60 minutes pass between two subsequent events of the same user, you consider those to be different sessions. You can look at how we sessionize our data here (look for the *_preproc.py scripts).
Even if you use user histories, your train/test split should be time-based. Unfortunately, the public version doesn't really support starting your prediction from a non-zero hidden state, which is something you'd probably want in this case (i.e. training up to time T and predicting events after time T for the same users, which requires moving the hidden state to time T).
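
As a rough illustration of the last two points, here is a minimal sketch in plain Python. It is not taken from the repo's *_preproc.py scripts. The function names `sessionize` and `time_based_split`, the tuple layout, and the assumption that timestamps are in seconds are all hypothetical. The 30-minute gap is just the lower end of the range mentioned above, and assigning a session to train or test by the time of its last event is one possible convention, not necessarily the one used in the repo.

```python
def sessionize(events, max_gap_seconds=30 * 60):
    """Split each user's history into sessions at large time gaps.

    events: iterable of (user_id, item_id, timestamp) tuples,
            timestamps assumed to be in seconds.
    Returns a list of (session_id, user_id, item_id, timestamp),
    ordered by user and then by time.
    """
    ordered = sorted(events, key=lambda e: (e[0], e[2]))
    rows = []
    session_id = -1
    prev_user, prev_ts = None, None
    for user_id, item_id, ts in ordered:
        # Start a new session on a new user or after a long pause.
        if user_id != prev_user or ts - prev_ts > max_gap_seconds:
            session_id += 1
        rows.append((session_id, user_id, item_id, ts))
        prev_user, prev_ts = user_id, ts
    return rows


def time_based_split(rows, cutoff_ts):
    """Put sessions that end before cutoff_ts into train, the rest into test."""
    last_ts = {}
    for session_id, _user, _item, ts in rows:
        last_ts[session_id] = max(ts, last_ts.get(session_id, ts))
    train = [r for r in rows if last_ts[r[0]] < cutoff_ts]
    test = [r for r in rows if last_ts[r[0]] >= cutoff_ts]
    return train, test


# Toy data: (user_id, item_id, timestamp in seconds)
events = [
    ("u1", "a", 0),
    ("u1", "b", 600),    # 10 minutes later: same session
    ("u1", "c", 4200),   # 60 minutes later: new session
    ("u2", "a", 100),
]
sessions = sessionize(events)
train, test = time_based_split(sessions, cutoff_ts=1000)
```

With this toy data, the third event of `u1` starts a new session and is the only one that falls into the test part. The sketch does not address the hidden state issue described above: each test session would still start from a zero hidden state.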

halilergul1 · 2023-12-11T09:00:00Z

Thanks a lot for your detailed and helpful answer. My data is from the e-commerce domain. It is transactional data.
