Hessian algorithm produces NaN values during the training procedure #119

Closed
itdxer opened this issue Sep 10, 2016 · 8 comments
@itdxer (Owner) commented Sep 10, 2016

RE #118 (comment)

------------------------------------------------
| Epoch # | Train err | Valid err | Time       |
------------------------------------------------
| 1       | 0.162159  | 0.095479  | 00:00:26   |
| 2       | 0.096515  | 0.048322  | 00:00:25   |
| 3       | 0.048930  | 0.024670  | 00:00:26   |
| 4       | 0.025032  | 0.014477  | 00:00:25   |
| 5       | 0.014700  | 0.010096  | 00:00:26   |
| 6       | 0.010240  | 0.008018  | 00:00:31   |
| 7       | 0.008117  | 0.006877  | 00:00:26   |
| 8       | 0.006950  | nan       | 00:00:25   |
| 9       | nan       | nan       | 00:00:26   |
@itdxer itdxer added the bug label Sep 10, 2016
@itdxer itdxer self-assigned this Sep 10, 2016
@itdxer itdxer added this to the Version 0.4.0 milestone Sep 10, 2016
@itdxer (Owner, Author) commented Sep 16, 2016

Hi @rmlopes,

I have two questions RE the problem with the Hessian algorithm:

  1. Are you able to share information about your network's structure, its parameters, and the input data dimensions?
  2. Can you reproduce the problem with the Hessian algorithm in your environment?

@rmlopes commented Sep 16, 2016

With the real data I get the error right away.

Main information

[ALGORITHM] Hessian

[OPTION] verbose = True
[OPTION] epoch_end_signal = None
[OPTION] show_epoch = 1
[OPTION] shuffle_data = True
[OPTION] train_end_signal = None
[OPTION] error = mse
[OPTION] addons = None
[OPTION] penalty_const = 1

[THEANO] Initializing Theano variables and functions.
[THEANO] Initialization finished successfully. It took 6.73 seconds

Start training

[TRAIN DATA] 97066 samples, feature shape: (4,)
[TEST DATA] 24267 samples, feature shape: (4,)
[TRAINING] Total epochs: 100

------------------------------------------------
| Epoch # | Train err | Valid err | Time       |
------------------------------------------------
| 1       | 0.080883  | nan       | 00:00:32   |
| 2       | nan       | nan       | 00:00:32   |

In the testing script, even with 100,000 samples, I cannot reproduce the NaN.
The config of the network is pretty simple:

from neupy import algorithms, layers

network = algorithms.Hessian(
    [
        layers.Input(4),      # 4 input features
        layers.Relu(16),
        layers.Relu(8),
        layers.Sigmoid(1),    # single output in [0, 1]
    ],
    error='mse',
    step=0.01,
    verbose=True,
    shuffle_data=True,
)

Concerning (2), do you mean whether I get the error with some other implementation of the Hessian algorithm? I am only trying NeuPy and Keras (to my surprise, the behavior varies a lot even with the same parameterization), and Keras does not have Hessian-based algorithms.
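
For context, a minimal synthetic testing script along those lines might look like the sketch below. The random data and the train() call are assumptions based on the log output above, not the exact script:

# Synthetic reproduction sketch (illustrative only; the shapes mirror the
# [TRAIN DATA]/[TEST DATA] log lines above, the values are random noise).
import numpy as np
from neupy import algorithms, layers

x_train, y_train = np.random.random((97066, 4)), np.random.random((97066, 1))
x_test, y_test = np.random.random((24267, 4)), np.random.random((24267, 1))

network = algorithms.Hessian(
    [layers.Input(4), layers.Relu(16), layers.Relu(8), layers.Sigmoid(1)],
    error='mse', step=0.01, verbose=True, shuffle_data=True,
)
network.train(x_train, y_train, x_test, y_test, epochs=100)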

@itdxer (Owner, Author) commented Sep 16, 2016

Yes, there might be differences between libraries due to different parameter initialization methods - http://neupy.com/docs/cheatsheet.html#parameter-initialization-methods.
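
For example, pinning the initialization explicitly in NeuPy would look roughly like the sketch below (the init classes come from the linked cheatsheet; exact class names may differ between versions):

# Sketch only: explicit weight initialization per layer, assuming the
# neupy.init module described in the cheatsheet linked above.
from neupy import algorithms, layers, init

network = algorithms.Hessian(
    [
        layers.Input(4),
        layers.Relu(16, weight=init.HeNormal()),       # He init is the usual choice for ReLU
        layers.Relu(8, weight=init.HeNormal()),
        layers.Sigmoid(1, weight=init.XavierNormal()),
    ],
    error='mse',
    verbose=True,
)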

One more question related to your dataset: what is the value range of each input feature? Is it between 0 and 1, or is it, for example, between -3 and 3?

@rmlopes commented Sep 16, 2016

I am using the RobustScaler implemented in sklearn, so the inputs are generally not in the [0,1] interval.
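
For reference, that preprocessing step presumably looks roughly like the following (variable names and data are illustrative, not the actual pipeline):

# Illustrative sketch of the described preprocessing, not the user's code.
# RobustScaler centers on the median and scales by the IQR, so the outputs
# are generally not bounded to the [0, 1] interval.
import numpy as np
from sklearn.preprocessing import RobustScaler

x_train = np.random.randn(1000, 4) * 5 + 10      # placeholder raw features
x_test = np.random.randn(250, 4) * 5 + 10

scaler = RobustScaler()
x_train_scaled = scaler.fit_transform(x_train)   # fit on training data only
x_test_scaled = scaler.transform(x_test)         # reuse the training statistics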

@itdxer (Owner, Author) commented Sep 18, 2016

@rmlopes, I've tried different ways to reproduce this problem, but none of them helped. I was thinking that the issue could be related to the inverse matrix computation, so I've changed the way the update is calculated (that way it should behave better in Theano after graph optimization). Can you try to run your code with this update?

pip install git+https://github.com/itdxer/neupy.git@release/v0.4.0
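
For intuition, the change described above amounts to something like the following NumPy sketch of the numerical idea; it is not the actual Theano code in NeuPy:

# Newton-type step with Levenberg-Marquardt style damping. Solving the
# linear system is usually better conditioned than forming an explicit
# inverse of the Hessian, which is one common source of NaN values.
import numpy as np

def newton_step(hessian, gradient, penalty_const=1.0):
    damped = hessian + penalty_const * np.eye(hessian.shape[0])
    return -np.linalg.solve(damped, gradient)

# Example: a well-conditioned 3x3 system.
print(newton_step(2.0 * np.eye(3), np.ones(3)))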

@rmlopes commented Sep 22, 2016

Seems to be fine now.

On another note, even though my problem is not NLP, the structure of the data is more similar to NLP than to images. How would you implement a CNN for this kind of data using NeuPy?

@itdxer (Owner, Author) commented Sep 22, 2016

As far as I understand, the first convolutional layer produces outputs with different dimensions for different filter sizes. Basically, that means you want to run 3 different convolutions in parallel and, after a few more layers, concatenate their outputs into one vector. If that's correct, then there are no simple building-block layers to construct this structure yet. Of course, you can construct any layer you want (docs), but it might be difficult.
For the new version (0.4.0) I'm planning a couple of new layers that should be useful in your case.
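
A sketch of that parallel-convolution idea using the Keras functional API (the user mentioned trying Keras; the filter counts, widths, and sequence length are illustrative assumptions):

# Three 1D convolutions with different filter widths run in parallel,
# each reduced to a fixed-size vector and then concatenated.
from keras.layers import Input, Conv1D, GlobalMaxPooling1D, Concatenate, Dense
from keras.models import Model

inputs = Input(shape=(100, 4))                    # sequence length x features (assumed)
branches = []
for width in (3, 4, 5):                           # three parallel convolutions
    x = Conv1D(16, width, activation='relu')(inputs)
    x = GlobalMaxPooling1D()(x)                   # fixed-size output per branch
    branches.append(x)
merged = Concatenate()(branches)                  # concatenate branch outputs
outputs = Dense(1, activation='sigmoid')(merged)
model = Model(inputs, outputs)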

@itdxer (Owner, Author) commented Sep 22, 2016

> Seems to be fine now.

It's great that this update fixed your problem.

@itdxer itdxer closed this as completed Oct 7, 2016