
How to check the training converges? #23

Closed
AminSuzani opened this issue May 23, 2014 · 11 comments


@AminSuzani

Hi,

I have a multivariate nonlinear regression problem and I am trying to solve it using deep neural networks. I use the code below for training. My X is 10000×40 and my Y is 10000×78. I was wondering how I can check a few things:
1. How do I know that the training converged?
2. How do I know what 'learning rate', 'momentum', and 'update_num' values it used by default?

e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(40, 100, 200, 300, 150, 78),
                        optimize='sgd',
                        activation='tanh')
e.run(train_set, train_set)
Y_predicted = e.network(X_test_minmax)

I tried using 'hf' instead of 'sgd'. It printed some performance variables for each iteration, but it was too slow for my application. The other problem is that when I write 'layerwise' instead of 'sgd', it gives me an error. Any kind of help is appreciated.

Thanks,
Amin
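
As a side note on question 2: one way to avoid depending on the defaults is to set the hyperparameters explicitly. A minimal sketch, assuming Experiment forwards keyword arguments such as learning_rate and momentum to the trainer, as it does for num_updates later in this thread; the values shown are illustrative, not the library defaults:

e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(40, 100, 200, 300, 150, 78),
                        optimize='sgd',
                        activation='tanh',
                        learning_rate=0.01,  # illustrative value, not the default
                        momentum=0.5,        # illustrative value, not the default
                        num_updates=128)     # illustrative value, not the default
e.run(train_set, train_set)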

AminSuzani changed the title from "How to check the training converges" to "How to check the training converges?" on May 23, 2014
@lmjohns3
Owner

You need to enable logging to see the output from many of the trainers (including the SGD trainer). See bug #19 for details.

Can you post a full traceback for the error you're getting from the layerwise trainer?
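
For reference, a minimal way to turn that output on, assuming the trainers emit their progress through Python's standard logging module:

import logging

# Send log records (including per-iteration training loss) to the console.
logging.basicConfig(level=logging.INFO)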

@AminSuzani
Author

Thanks, the layerwise training error was also resolved when I updated my packages. By enabling logging, I can see the convergence error on the screen while training. Is there a way to get this error value in the code (e.g., as an output of the run function)? I would like to write a loop that tries different parameters and picks the ones that yield the smallest error.

@AminSuzani
Author

I just realized that the layerwise error happens only when I use it on Windows. When I use layerwise training on Windows, it trains the first layer and then gives me the following error:

IOError: [Errno 2] No such file or directory: '/tmp/layerwise-150,300,300,78-h0.000000-n0.000000-d0.000000-w0.000000-1.pkl.gz'

However, the same code works fine on Linux. Apart from this, the default parameters also differ when I run the same code on Linux and Windows. I reinstalled Theano and theanets on Windows, but that did not solve the issue.

@kastnerkyle
Contributor

Looks like it could be Windows vs. Linux path separators: '/' vs. '\'. I only use Linux myself, so I probably can't provide much help.
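
If the hard-coded '/tmp' prefix in the path from the error above is the culprit, the portable idiom is to build temporary paths with tempfile.gettempdir(); a minimal sketch (with an abbreviated filename), not the actual theanets code:

import os
import tempfile

# tempfile.gettempdir() resolves to a platform-appropriate directory
# (e.g. /tmp on Linux, %TEMP% on Windows), unlike a hard-coded '/tmp'.
filename = os.path.join(tempfile.gettempdir(),
                        'layerwise-150,300,300,78-1.pkl.gz')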


@AminSuzani
Author

Thanks anyway. I just updated all Canopy packages, but the problem persists. It only happens with 'layerwise'; it works fine with 'hf' and 'sgd'. Here is the full traceback:

Traceback (most recent call last):
  File "deep_layerwise_gpu.py", line 107, in <module>
    e.run(train_set, train_set)
  File "c:\users\amin\appdata\local\enthought\canopy\user\lib\site-packages\theanets\main.py", line 214, in run
    cg_set=self.datasets['cg'])
  File "c:\users\amin\appdata\local\enthought\canopy\user\lib\site-packages\theanets\trainer.py", line 342, in train
    i))
  File "c:\users\amin\appdata\local\enthought\canopy\user\lib\site-packages\theanets\feedforward.py", line 282, in save
    handle = opener(filename, 'wb')
  File "C:\Users\Amin\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "C:\Users\Amin\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: '/tmp/layerwise-150,300,300,78-h0.000000-n0.000000-d0.000000-w0.000000-1.pkl.gz'

@kastnerkyle
Contributor

Can you provide a gist of your code?

@AminSuzani
Author

Here it is:

train_set = [X_minmax, Y_xyz_minmax]
e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(featuresNum, 300, 300, vertebNum * 3),
                        optimize='layerwise',
                        activation='tanh',
                        num_updates=3)
e.run(train_set, train_set)

@lmjohns3
Owner

lmjohns3 commented Jun 3, 2014

Because this apparently works on Linux, it looks to me like this is a problem trying to save/load from the temp directory on Windows.

However, it also looks to me like the whole process of saving/loading is from some older theanets code -- the current layerwise trainer does not try to do anything on disk. (Actually, this behavior was removed on 7 February, see c025646#diff-aa4bc02a676b29ad321853f71672f681L463)

@AminSuzani which version of theanets are you using? The most recent version, published just yesterday on PyPI, is 0.2.0.

@AminSuzani
Author

Thanks for your reply. You were right, that was an old version. I ran "pip install --upgrade" and it solved the issue. Previously I had used "pip uninstall" and then "pip install" again, but it seems that installed the old version again.

The other question that still remains is whether there is a way to get the training error (or any other convergence measure) in code. I do see it in the command prompt, but I need it in the code. I would like to be able to write a loop that trains the network with different parameters and automatically picks the ones that yield better convergence.

@lmjohns3
Owner

lmjohns3 commented Jun 4, 2014

I like the idea of providing the ongoing training error, but at the moment it's not returned during training. Could you file a separate github issue for this, so that we can close this one and keep track of the specific feature request?

Until I can get to the feature request, the Experiment#train method does already yield the current state of the trainer after each training iteration (use this method instead of Experiment#run). You could do something like this:

for _ in experiment.train(dataset):
    print(evaluate(experiment.network, dataset))

where evaluate is some function that takes in a network and computes some error estimate.
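
For instance, a hypothetical evaluate based on mean squared error, combined with the kind of parameter sweep asked about above. This assumes train_set is the [X, Y] pair from earlier in the thread, and the swept layer sizes are purely illustrative:

import numpy as np
import theanets

def evaluate(network, dataset):
    # Hypothetical helper: mean squared error of the network's
    # predictions against the dataset's targets.
    X, Y = dataset
    return np.mean((network(X) - Y) ** 2)

best_error, best_layers = float('inf'), None
for hidden in (100, 200, 300):  # illustrative sweep over one hyperparameter
    experiment = theanets.Experiment(theanets.feedforward.Regressor,
                                     layers=(40, hidden, 78),
                                     optimize='sgd',
                                     activation='tanh')
    for _ in experiment.train(train_set):
        pass  # each iteration yields the trainer's current state
    error = evaluate(experiment.network, train_set)
    if error < best_error:
        best_error, best_layers = error, (40, hidden, 78)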

@AminSuzani
Author

Thanks, I just filed a separate issue for this. Feel free to close this one.

Cheers,
Amin
