
questions about the training process #3

Closed

lllmmmyyy opened this issue Mar 8, 2018 · 5 comments

Comments

@lllmmmyyy
hi,
I'm not familiar with Theano, so I have some questions about the training process.

According to the code in lines 919-936 of hgru4rec.py, it seems that the input length is set to 1 in each mini-batch, which means each mini-batch only contains data from a single time step. I am wondering: in this way, can the error backpropagate through time?

@mquad
Owner

mquad commented Mar 9, 2018

Hi,

we are not using BPTT in training the network (sequences in recommender systems are rather short, and BPTT didn't pay off the additional complexity). I suggest you take a look at this paper to get a better idea of the training process. It is essentially the same; HGRU just uses an additional layer to keep track of the user at each step of the sessions included in the mini-batch.
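To make the "one time step per mini-batch" idea concrete, here is a minimal sketch of session-parallel mini-batching, the scheme used by GRU4Rec-style training. This is an illustration, not code from hgru4rec.py: all names and the toy data are made up. Each batch holds the current item of several parallel sessions; when a session runs out, the next unused session takes its slot and a reset flag marks that the hidden state in that slot must be re-initialized. Because each update consumes a single step, gradients flow through one step only (no BPTT).

```python
import numpy as np

# Toy interaction log: session_id -> list of item ids (illustrative only).
sessions = {
    0: [10, 11, 12],
    1: [20, 21],
    2: [30, 31, 32, 33],
    3: [40, 41],
}

def session_parallel_batches(sessions, batch_size=2):
    """Yield (input_items, target_items, reset_mask) one time step at a time.

    Each mini-batch holds the *current* item of `batch_size` parallel
    sessions. When a session is exhausted, the next unused session takes
    its slot and `reset_mask` flags that the hidden state in that slot
    must be reset before the next step.
    """
    order = list(sessions)                              # sessions left to process
    slots = [order.pop(0) for _ in range(batch_size)]   # currently active sessions
    pos = [0] * batch_size                              # cursor inside each session
    reset = [True] * batch_size                         # fresh state at the start
    while True:
        inp, tgt = [], []
        for i, sid in enumerate(slots):
            seq = sessions[sid]
            inp.append(seq[pos[i]])
            tgt.append(seq[pos[i] + 1])                 # predict the next item
        yield np.array(inp), np.array(tgt), np.array(reset)
        reset = [False] * batch_size
        for i in range(batch_size):
            pos[i] += 1
            if pos[i] + 1 >= len(sessions[slots[i]]):   # session exhausted
                if not order:                           # no session left to refill: stop
                    return
                slots[i] = order.pop(0)
                pos[i] = 0
                reset[i] = True

for x, y, r in session_parallel_batches(sessions):
    print(x, y, r)
```

A real training loop would feed each `(x, y)` pair through the network once, carry the hidden state over to the next batch as plain data, and zero out the state wherever `r` is True, so the computation graph never spans more than one step.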

@lllmmmyyy
Author

Hi,
I've examined the training process in the paper "Session-based Recommendations with RNN", but I think there is a difference between the two papers. In HGRU, there is an additional layer that keeps track of each user across different sessions. Since your training process works with this additional layer, does it mean that most users in the dataset have only a few sessions?

@mquad
Copy link
Owner

mquad commented Mar 12, 2018

That's right, most users had few sessions (5 to 10). Nevertheless, in other experiments (not reported in the paper) we saw that the model also behaves well with users having more sessions (up to 20-30), even though training doesn't use BPTT. Interestingly, in the scenarios we have tested (similar to video recommendation), the gain of HGRU over GRU grows with the number of sessions in the user history.
Hope it helps,

Massimo
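The user-level layer discussed above can be sketched as a state handoff at session boundaries. This is a loose illustration of the idea, assuming made-up parameter names and a tanh update as a crude stand-in for the user-level GRU (the actual hgru4rec.py implementation differs): when a session ends, its final hidden state updates the user-level state, which then initializes the next session's hidden state instead of starting from zero.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy hidden size

# Illustrative parameters, not taken from hgru4rec.py.
W_user = rng.standard_normal((d, d)) * 0.1
W_init = rng.standard_normal((d, d)) * 0.1

def user_step(h_user, h_session_final):
    """At a session boundary, fold the finished session's final state
    into the user-level state (stand-in for the user-level GRU update)."""
    return np.tanh(h_user @ W_user + h_session_final)

def init_session(h_user):
    """Initialize the next session's hidden state from the user state."""
    return np.tanh(h_user @ W_init)

h_user = np.zeros(d)
for h_session_final in [rng.standard_normal(d) for _ in range(3)]:
    h_user = user_step(h_user, h_session_final)
    h_session = init_session(h_user)  # next session starts here, not at zero
```

This is why the user layer only needs one update per session rather than per item, and why it can help even without BPTT: the cross-session information travels through the carried-over user state.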

@lllmmmyyy
Author

Em, that's really interesting. Have you tried using BPTT in the HGRU when users have up to 20-30 sessions? If yes, does training without BPTT perform better than training with BPTT?

@mquad
Owner

mquad commented Mar 14, 2018

No, I did not, so I cannot help you with that, sorry.

@mquad mquad closed this as completed Apr 13, 2018