Learning with Marks #8
Comments
It could be that your data contains duplicate timestamps - two events with the same arrival time give a zero inter-event time, which produces NaNs. If this doesn't solve the problem, here are a few follow-up questions:
Thanks for the pointers - the data did indeed have duplicate timestamps, and I've cleaned those up now. I should note that my …
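(For reference, a minimal sketch of one way such duplicates can be dropped - assuming each sequence is a sorted 1-D numpy array of arrival times; the function name is just illustrative:)

import numpy as np

def drop_duplicate_times(arrival_times):
    # Keep an event only if it is strictly later than the previous one
    arrival_times = np.asarray(arrival_times)
    keep = np.concatenate([[True], np.diff(arrival_times) > 0])
    return arrival_times[keep]

print(drop_duplicate_times([0.5, 2.0, 2.0, 3.7]))  # -> [0.5 2.  3.7]

If the sequences carry marks, the same boolean mask would of course have to be applied to the marks as well.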
In general, there is nothing wrong with having very negative values. As you said, this just reflects a different scaling of the data. Simply rescaling the arrival times (or inter-event times) should fix this. I guess a good idea is to rescale the times such that the average inter-event time is equal to one. It's important, though, that you scale all the sequences by the same factor. A simple example to demonstrate the above point: imagine a uniform distribution on [0, c] - its density is 1/c everywhere, so for c < 1 the log-density is positive and the negative log-likelihood is negative; rescaling the times only changes c and shifts the loss by a constant. Do you still get NaNs now after removing the duplicates?
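(A minimal sketch of that rescaling, assuming arrival_times_per_seq is a list of sorted 1-D numpy arrays, one per sequence - the names are just illustrative:)

import numpy as np

inter_times_per_seq = [np.diff(t) for t in arrival_times_per_seq]
# One global scale factor, computed over *all* sequences
scale = np.mean(np.concatenate(inter_times_per_seq))
rescaled = [tau / scale for tau in inter_times_per_seq]  # average inter-event time is now ~1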
Yes, removing the duplicates got rid of the NaNs - thanks very much. I've rescaled such that the average delta is 1, although I do still see negative loss: some of my data points, even after scaling, are still very close together. The distribution of the inter-event times is very bimodal in my dataset, which maybe poses a problem - I'll reduce the number of mixture components as per #5 (comment). It's mentioned in the paper that you normalise the loss by subtracting the score of LogNormMix - is this already done in the code you have provided here? I see that model.log_prob ultimately ends up calling … Lastly, I'm wondering if you had implemented simulation/sampling with marks at some point as well? With reference to your response in #6 (comment), I guess it would need to draw from … Thank you very much again for your time!
Subtracting the loss of LogNormMix is done only for the visualization in Figure 3. As I said before, we can arbitrarily shift the loss values of all models by the same amount by rescaling the inter-event times, so the absolute value of the loss for each model is irrelevant - only the differences between the models matter (e.g. if two models have losses 200.1 and 200.5, we could change them to 0.1 and 0.5 by simple rescaling). In the case of marks, you would need to create a categorical distribution to sample the marks from.
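(To make the "shift by rescaling" point concrete - a one-line change of variables, which is a fact about densities rather than anything model-specific: dividing all inter-event times by the same constant c gives

\tau' = \tau / c \quad\Rightarrow\quad p'(\tau') = c\, p(c\,\tau') \quad\Rightarrow\quad \log p'(\tau') = \log p(\tau) + \log c,

so every event's log-likelihood shifts by the same constant \log c, identically for every model.)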
Ah okay, thanks for clarifying. Since it relates to learning and simulation specifically in the case of marks, I'll mention it here: I was able to use the code you provided in the other issue for simulation without marks, but I ran into some errors, which I'm also not entirely sure how to correct, when trying to sample from a model that has been trained with marks. Notably: … I don't think I'm resolving that correctly - any guidance is appreciated.
Here is the code that should work:

from torch.distributions import Categorical

next_in_time = torch.zeros(1, 1, 1)
next_mark_emb = torch.zeros(1, 1, general_config.mark_embedding_size)
h = torch.zeros(1, 1, history_size)
inter_times = []
marks = []
t_max = 1000

with torch.no_grad():
    while sum(inter_times) < t_max:
        # Feed the previous inter-event time and mark embedding to the RNN
        rnn_input = torch.cat([next_in_time, next_mark_emb], dim=-1)
        _, h = model.rnn.step(rnn_input, h)
        # Sample the next inter-event time from the conditional distribution given the history state
        tau = model.decoder.sample(1, h)
        inter_times.append(tau.item())
        # Transform the sampled time the same way as during training (log, then standardize)
        next_in_time = ((tau + 1e-8).log() - mean_in_train) / std_in_train
        # Sample the next mark from a categorical distribution over the mark logits
        mark_logits = model.mark_layer(h)
        mark_dist = Categorical(logits=mark_logits)
        next_in_mark = mark_dist.sample()
        marks.append(next_in_mark.item())
        next_mark_emb = model.rnn.mark_embedding(next_in_mark)
Great, thanks! I've managed to modify it slightly so that it works with an LSTM, although that raised one additional question, since the LSTM hidden state output is a tuple.
It's up to you to decide whether to use the hidden state or the output of the LSTM to obtain the conditional distribution. I don't have a strong intuition here; probably both versions should work equally well.
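(For anyone adapting the sampling loop above to an LSTM - a minimal self-contained sketch using plain torch.nn.LSTM rather than the repo's wrapper, with illustrative sizes: the state carried between steps is the (h, c) tuple, and either h or the step's output can be fed to the decoder:)

import torch

lstm = torch.nn.LSTM(input_size=8, hidden_size=16)
state = (torch.zeros(1, 1, 16), torch.zeros(1, 1, 16))  # LSTM state is an (h, c) tuple
x = torch.zeros(1, 1, 8)  # one step: (seq_len=1, batch=1, input_size)

with torch.no_grad():
    for _ in range(5):
        output, state = lstm(x, state)  # keep carrying the full (h, c) tuple between steps
        h = state[0]  # for a single step, output and h hold the same values
        # ... h (or output) would parametrize the conditional distribution here ...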
Thanks for the release of your paper and code.
In trying to implement learning with marks with the provided interactive notebook, adapting the remarks in the paper, I'm also running into some trouble. Based on appendix F.2, I assume it's a case of just adding the terms? model.log_prob in this case returns (time_log_prob, mark_nll, accuracy) - so, adapting the training loop, is it as simple as changing the loss to the sum of the time and mark terms, as sketched below?
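(Roughly what I have in mind - a sketch only, assuming model.log_prob returns the (time_log_prob, mark_nll, accuracy) tuple mentioned above, and reusing the notebook's optimizer/dataloader names, which may differ:)

for batch in dl_train:
    opt.zero_grad()
    time_log_prob, mark_nll, accuracy = model.log_prob(batch)
    # total NLL = time NLL + mark NLL (the two log-likelihood terms just add up)
    loss = -time_log_prob.mean() + mark_nll.mean()
    loss.backward()
    opt.step()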
As a side problem: when doing the above with my custom dataset (which conforms to the same formatting as the example datasets, i.e. arrival_times and marks), all loss terms are NaN. I'm wondering if you might have some insight as to why this might be occurring! When using the reddit dataset with the above modifications, I get non-zero loss terms for both log_prob and mark_nll.