Hessian algorithm produces NaN values during the training procedure #119

Closed
itdxer opened this issue Sep 10, 2016 · 8 comments
@itdxer (Owner) commented Sep 10, 2016

RE #118 (comment)

------------------------------------------------
| Epoch # | Train err | Valid err | Time       |
------------------------------------------------
| 1       | 0.162159  | 0.095479  | 00:00:26   |
| 2       | 0.096515  | 0.048322  | 00:00:25   |
| 3       | 0.048930  | 0.024670  | 00:00:26   |
| 4       | 0.025032  | 0.014477  | 00:00:25   |
| 5       | 0.014700  | 0.010096  | 00:00:26   |
| 6       | 0.010240  | 0.008018  | 00:00:31   |
| 7       | 0.008117  | 0.006877  | 00:00:26   |
| 8       | 0.006950  | nan       | 00:00:25   |
| 9       | nan       | nan       | 00:00:26   |
@itdxer itdxer added the bug label Sep 10, 2016
@itdxer itdxer self-assigned this Sep 10, 2016
@itdxer itdxer added this to the Version 0.4.0 milestone Sep 10, 2016
@itdxer (Owner, Author) commented Sep 16, 2016

Hi @rmlopes,

I have two questions RE the problem with the Hessian algorithm:

  1. Are you able to share information about your network's structure, its parameters, and the input data dimensions?
  2. Can you reproduce the problem with the Hessian algorithm in your environment?

@rmlopes commented Sep 16, 2016

With the real data I get the error right away.

Main information

[ALGORITHM] Hessian

[OPTION] verbose = True
[OPTION] epoch_end_signal = None
[OPTION] show_epoch = 1
[OPTION] shuffle_data = True
[OPTION] train_end_signal = None
[OPTION] error = mse
[OPTION] addons = None
[OPTION] penalty_const = 1

[THEANO] Initializing Theano variables and functions.
[THEANO] Initialization finished successfully. It took 6.73 seconds

Start training

[TRAIN DATA] 97066 samples, feature shape: (4,)
[TEST DATA] 24267 samples, feature shape: (4,)
[TRAINING] Total epochs: 100

------------------------------------------------
| Epoch # | Train err | Valid err | Time       |
------------------------------------------------
| 1       | 0.080883  | nan       | 00:00:32   |
| 2       | nan       | nan       | 00:00:32   |

In the testing script, even with 100,000 samples, I cannot reproduce the NaN.
The config of the network is pretty simple:

from neupy import algorithms, layers

network = algorithms.Hessian(
    [
        layers.Input(4),      # 4 input features
        layers.Relu(16),
        layers.Relu(8),
        layers.Sigmoid(1),    # single output in [0, 1]
    ],
    error='mse',
    step=0.01,
    verbose=True,
    shuffle_data=True,
)

Concerning (2), do you mean whether I get the error with some other implementation of the Hessian algorithm? I am only trying NeuPy and Keras (to my surprise, the behavior varies a lot even with the same parameterization), and Keras does not have Hessian-based algorithms.
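
For context, a minimal synthetic testing script along those lines might look like the sketch below. The random data and the train() call are assumptions based on the log output above, not the exact script:

# Synthetic reproduction sketch (illustrative only; the shapes mirror the
# [TRAIN DATA]/[TEST DATA] log lines above, the values are random noise).
import numpy as np
from neupy import algorithms, layers

x_train, y_train = np.random.random((97066, 4)), np.random.random((97066, 1))
x_test, y_test = np.random.random((24267, 4)), np.random.random((24267, 1))

network = algorithms.Hessian(
    [layers.Input(4), layers.Relu(16), layers.Relu(8), layers.Sigmoid(1)],
    error='mse', step=0.01, verbose=True, shuffle_data=True,
)
network.train(x_train, y_train, x_test, y_test, epochs=100)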

@itdxer (Owner, Author) commented Sep 16, 2016

Yes, there might be differences between libraries due to different parameter initialization methods - http://neupy.com/docs/cheatsheet.html#parameter-initialization-methods.
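
For example, pinning the initialization explicitly in NeuPy would look roughly like the sketch below (the init classes come from the linked cheatsheet; exact class names may differ between versions):

# Sketch only: explicit weight initialization per layer, assuming the
# neupy.init module described in the cheatsheet linked above.
from neupy import algorithms, layers, init

network = algorithms.Hessian(
    [
        layers.Input(4),
        layers.Relu(16, weight=init.HeNormal()),       # He init is the usual choice for ReLU
        layers.Relu(8, weight=init.HeNormal()),
        layers.Sigmoid(1, weight=init.XavierNormal()),
    ],
    error='mse',
    verbose=True,
)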

One more question related to your dataset: what is the value range of each input feature? Is it between 0 and 1, or is it, for example, between -3 and 3?

@rmlopes commented Sep 16, 2016

I am using the RobustScaler implemented in sklearn, so the inputs are generally not in the [0,1] interval.
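
For reference, that preprocessing step presumably looks roughly like the following (variable names and data are illustrative, not the actual pipeline):

# Illustrative sketch of the described preprocessing, not the user's code.
# RobustScaler centers on the median and scales by the IQR, so the outputs
# are generally not bounded to the [0, 1] interval.
import numpy as np
from sklearn.preprocessing import RobustScaler

x_train = np.random.randn(1000, 4) * 5 + 10      # placeholder raw features
x_test = np.random.randn(250, 4) * 5 + 10

scaler = RobustScaler()
x_train_scaled = scaler.fit_transform(x_train)   # fit on training data only
x_test_scaled = scaler.transform(x_test)         # reuse the training statistics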

@itdxer (Owner, Author) commented Sep 18, 2016

@rmlopes, I've tried different ways to reproduce this problem, but none of them helped. I was thinking that the issue could be related to the inverse matrix computation, so I've changed the way the update is calculated (that way it should behave better in Theano after graph optimization). Can you try to run your code with this update?

pip install git+https://github.com/itdxer/neupy.git@release/v0.4.0
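
For intuition, the change described above amounts to something like the following NumPy sketch of the numerical idea; it is not the actual Theano code in NeuPy:

# Newton-type step with Levenberg-Marquardt style damping. Solving the
# linear system is usually better conditioned than forming an explicit
# inverse of the Hessian, which is one common source of NaN values.
import numpy as np

def newton_step(hessian, gradient, penalty_const=1.0):
    damped = hessian + penalty_const * np.eye(hessian.shape[0])
    return -np.linalg.solve(damped, gradient)

# Example: a well-conditioned 3x3 system.
print(newton_step(2.0 * np.eye(3), np.ones(3)))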

@rmlopes commented Sep 22, 2016

Seems to be fine now.

On another note, even though my problem is not NLP, the structure of the data is more similar to NLP than to images. How would you implement a CNN for this kind of data using NeuPy?

@itdxer (Owner, Author) commented Sep 22, 2016

As far as I understand, the first convolutional layer produces outputs with different dimensions for different filter sizes. Basically, that means you want to run 3 different convolutions in parallel and, after a few more layers, concatenate their outputs into one vector. If that's correct, then there are no simple building-block layers to construct this structure yet. Of course, you can construct any layer you want (docs), but it might be difficult.
For the new version (0.4.0) I'm planning a couple of new layers that should be useful in your case.
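
A sketch of that parallel-convolution idea using the Keras functional API (the user mentioned trying Keras; the filter counts, widths, and sequence length are illustrative assumptions):

# Three 1D convolutions with different filter widths run in parallel,
# each reduced to a fixed-size vector and then concatenated.
from keras.layers import Input, Conv1D, GlobalMaxPooling1D, Concatenate, Dense
from keras.models import Model

inputs = Input(shape=(100, 4))                    # sequence length x features (assumed)
branches = []
for width in (3, 4, 5):                           # three parallel convolutions
    x = Conv1D(16, width, activation='relu')(inputs)
    x = GlobalMaxPooling1D()(x)                   # fixed-size output per branch
    branches.append(x)
merged = Concatenate()(branches)                  # concatenate branch outputs
outputs = Dense(1, activation='sigmoid')(merged)
model = Model(inputs, outputs)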

@itdxer (Owner, Author) commented Sep 22, 2016

> Seems to be fine now.

It's great that this update fixed your problem.

@itdxer itdxer closed this as completed Oct 7, 2016