questions about the training process #3
Hi, we are not using BPTT to train the network (sequences in recommender systems are rather short, and BPTT did not pay off relative to the additional complexity). I suggest you take a look at this paper to get a better idea of the training process. It is essentially the same: HGRU just uses an additional layer to keep track of the user at each step of the sessions included in the mini-batch.
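For readers trying to picture this, here is a minimal NumPy sketch of session-parallel mini-batches with a per-slot user state. This is not the repository's Theano code; the toy data, `gru_step`, and all other names are illustrative stand-ins for the real GRU cells and embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 4

# Toy data: each user has a list of sessions; each session is a list of item ids.
users = {
    "u0": [[1, 2, 3], [4, 5]],
    "u1": [[6, 7], [8, 9, 10]],
}
emb = rng.normal(size=(16, HIDDEN))  # stand-in for a learned item embedding table

def gru_step(h, x):
    # Stand-in for a real GRU cell; any state/input mix illustrates the flow.
    return np.tanh(0.5 * h + 0.5 * x)

user_ids = list(users)
BATCH = len(user_ids)
session_h = np.zeros((BATCH, HIDDEN))       # session-level state, one row per slot
user_h = np.zeros((BATCH, HIDDEN))          # user-level state, one row per slot
cursor = [[uid, 0, 0] for uid in user_ids]  # per slot: user id, session idx, position

for step in range(5):  # 5 steps walk through every item of both toy users
    # Gather one item per slot: input length is 1 per mini-batch.
    batch_x = np.array([emb[users[uid][s][t]] for uid, s, t in cursor])

    # One forward step; in training, gradients flow through this step only.
    session_h = gru_step(session_h, batch_x)

    # Advance cursors. At a session boundary, fold the finished session's
    # final state into the user-level state and re-initialize the next
    # session's state from it -- this is the "additional layer that keeps
    # track of the user" in each slot of the mini-batch.
    for slot, (uid, s, t) in enumerate(cursor):
        if t + 1 < len(users[uid][s]):
            cursor[slot][2] = t + 1
        elif s + 1 < len(users[uid]):
            user_h[slot] = gru_step(user_h[slot], session_h[slot])
            session_h[slot] = user_h[slot]
            cursor[slot][1], cursor[slot][2] = s + 1, 0
        # (a finished user would be replaced by the next one; omitted here)
```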
Hi,
That's right, most users had few sessions (5-10). Nevertheless, in other experiments (not reported in the paper) we saw that the model also behaves well for users with more sessions (up to 20-30), even though training does not use BPTT. Interestingly, the gain of HGRU over GRU grows with the number of sessions in the user history in the scenarios we have tested (similar to video recommendation). Massimo
Hmm, that's really interesting. Have you tried using BPTT in the HGRU when users have up to 20-30 sessions? If so, does training without BPTT behave better than training with BPTT?
No, I did not, so I cannot help you with that, sorry.
Hi,
I'm not familiar with Theano, so I have some questions about the training process.
According to the code at lines 919-936 of hgru4rec.py, the input length is set to 1 in each mini-batch, which means each mini-batch only consists of data from one time step. I am wondering: in this way, can the error backpropagate through time?
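A reading of that setup, sketched below for clarity (in PyTorch rather than the repository's Theano, purely to keep the example short; all names are illustrative): the hidden state is carried between mini-batches as a plain value, so the computation graph is cut at every step and gradients cover only one transition, i.e. no BPTT.

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=8, hidden_size=8)
opt = torch.optim.SGD(cell.parameters(), lr=0.1)

h = torch.zeros(2, 8)            # batch of 2, state carried between steps
for step in range(10):
    x = torch.randn(2, 8)        # stand-in for the embedded items at this step
    h = cell(x, h)
    loss = h.pow(2).mean()       # stand-in for the actual ranking loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Detaching truncates the graph: the next step starts from h's value
    # only, so the error cannot propagate back through earlier time steps.
    h = h.detach()
```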