
Combining data_pipeline and simple_example #7

Closed
hedgy123 opened this issue Apr 20, 2017 · 4 comments

@hedgy123

Hi Egil,

Thank you so much for making your code available! This is really great stuff.

So, in trying to understand better how it all works, I tried using the data extracted from tensorflow.log (as in your data_pipeline notebook) as input to the network (same config as in your simple_example). Unfortunately I got all NaNs as losses:

Model summary:

 init_alpha:  -785.866918162
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gru_1 (GRU)                  (None, 101, 1)            18        
_________________________________________________________________
dense_1 (Dense)              (None, 101, 2)            4         
_________________________________________________________________
activation_1 (Activation)    (None, 101, 2)            0         
=================================================================
Total params: 22.0
Trainable params: 22.0
Non-trainable params: 0.0  

Results of running model.fit:

Train on 72 samples, validate on 24 samples
Epoch 1/75
2s - loss: nan - val_loss: nan
....

I was wondering if you've tried doing the same experiment and if so, whether it worked for you? Thanks so much!

@ragulpr
Owner

ragulpr commented Apr 21, 2017

Hi there,
Thanks for reaching out! I need to be clearer about this: I haven't had time to join the two scripts together yet. I'll get back to you ASAP with a fuller answer, but for now:
init_alpha: -785.866918162 is an error (alpha must be > 0).

Note that for large magnitudes of alpha, the mean of the TTE is approximately the same as the more complex log-based estimate.
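As a rough sanity check, something along these lines (a sketch only; the variable names y_train, tte_mean_train and mean_u are assumptions mirroring the simple_example notebook, not guaranteed to match your code):

import numpy as np

# y_train[:, :, 0] = time to event, y_train[:, :, 1] = censoring indicator u (1 = observed).
tte_mean_train = np.nanmean(y_train[:, :, 0])
mean_u = np.nanmean(y_train[:, :, 1])

# Geometric-style initialization; for large tte_mean_train this is roughly tte_mean_train itself.
init_alpha = -1.0 / np.log(1.0 - 1.0 / (tte_mean_train + 1.0))
init_alpha = init_alpha / mean_u

assert init_alpha > 0, 'init_alpha must be positive; a negative value means the targets are off'
print('init_alpha:', init_alpha)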

Furthermore

  • Initialization is important. Gradients explode if you're too far off, and more censored data leads to a higher probability of exploding gradients initially.
  • The learning rate depends on the data and may need to be in magnitudes you didn't expect.
  • Are you feeding in masked steps? Variable-length sequences have no clean implementation at the moment; I haven't had time to get the masking layer to work. The current workaround: set n_timesteps = None and run one training step per input sequence with something like:

Note: not tested:

def epoch():
    # One gradient step per (unpadded) sequence; requires the model to be built with n_timesteps = None.
    for i in range(n_samples):
        # Slice with i:i+1 to keep the batch dimension: shape (1, seq_length[i], n_features).
        model.fit(x_train[i:i + 1, :seq_length[i], :],
                  y_train[i:i + 1, :seq_length[i], :],
                  epochs=1,
                  batch_size=1,
                  verbose=2)

But an even better initial debug mode is to simply transform the data to [n_non_masked_samples, 1, n_features] (i.e. feed in only the observed timesteps), fit a simple ANN on that, and only once that works test the RNN. A sketch of that transformation follows below.
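Something like this, as an untested sketch (the boolean mask construction is an assumption; use whatever marks padding in your data):

import numpy as np

# mask[i, t] is True where timestep t of sequence i is actually observed (not padding).
mask = ~np.isnan(x_train[:, :, 0])   # assumption: padding is marked with NaN in the first feature

x_flat = x_train[mask]               # shape: [n_non_masked_samples, n_features]
y_flat = y_train[mask]               # shape: [n_non_masked_samples, 2]

# Add back a length-1 time axis so the same output layer / loss can be reused.
x_flat = x_flat[:, None, :]          # [n_non_masked_samples, 1, n_features]
y_flat = y_flat[:, None, :]          # [n_non_masked_samples, 1, 2]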

Would love to see forks!

@ragulpr
Owner

ragulpr commented Apr 24, 2017

There are multiple reasons for NaNs to show up, but I just found a very important one:

shift_discrete_padded_features, which is supposed to hide the target, is currently broken and apparently doesn't. This means that if "event" is fed in as an input feature, the network can make a perfect prediction, causing exploding gradients.

I'm trying to fix it asap
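For intuition, this is roughly what the shift is meant to achieve (a hypothetical numpy sketch, not the actual shift_discrete_padded_features implementation):

import numpy as np

def shift_events_right(x, fill=0.0):
    # Lag the features by one step so the input at time t cannot contain
    # the event occurring at time t (only history up to t-1).
    # x has shape [n_sequences, n_timesteps, n_features].
    x_shifted = np.roll(x, shift=1, axis=1)
    x_shifted[:, 0, :] = fill  # the first step has no history; pad it
    return x_shifted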

@NataliaVConnolly

Hi Egil,

Thanks for the update! Here's a fork with the notebook Combined_data_pipeline_and_analysis in examples/keras:

  https://github.com/NataliaVConnolly/wtte-rnn-1

The last cell shows an example of training with just one input sequence. It does result in a non-NaN loss, although a very large one (I didn't optimize the initial alpha or the network config much).

Cheers,
Natalia (aka hedgy123 :))

@ragulpr
Owner

ragulpr commented May 6, 2017

@NataliaVConnolly Sorry for the wait. It took me some time to figure out what was wrong!

  • Too much censoring leads to instability. It works when using more frequent committers (<50% censoring); in the example I only use those who committed on at least 10 days. See the sketch after this list.
  • You train on one subject but initialize alpha using the mean over all subjects. This leads to a high probability of exploding gradients.
  • As mentioned above, if this was done before the fix of shift_discrete_padded_features, that would also lead to NaN (a perfect fit) after some training.
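As an illustration of the first point, one way to filter and re-initialize (an untested sketch; the exact filter in the example notebook is by commit activity, not by this per-subject censoring ratio):

import numpy as np

# Assumes y_train has shape [n_subjects, n_timesteps, 2] with
# y_train[:, :, 1] the censoring indicator u (1 = event observed at that step).
def censored_fraction(y):
    return 1.0 - np.nanmean(y[:, :, 1])

# Keep the less-censored subjects for a stable start (< 50% censoring);
# then compute init_alpha from this same subset, as in the earlier snippet.
keep = np.array([censored_fraction(y_train[i:i + 1]) < 0.5
                 for i in range(len(y_train))])
x_sub, y_sub = x_train[keep], y_train[keep]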

Check out the new data_pipeline and let me know if you have more questions! :)

ragulpr closed this as completed May 6, 2017
ragulpr referenced this issue May 6, 2017
- Check it out. 
- Poor performance atm 
- TODO will add masking/batchsize>1 support soon.