
How to check the training converges? #23

Closed
AminSuzani opened this issue May 23, 2014 · 11 comments


@AminSuzani

Hi,

I have a multivariate nonlinear regression problem and I am trying to solve it using deep neural networks. I use the code below for training. My X is 10000×40 and my Y is 10000×78. I was wondering how I can check a few things:
1. How do I know that the training converged?
2. How do I know what 'learning rate', 'momentum', and 'update_num' values it used by default?

e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(40, 100, 200, 300, 150, 78),
                        optimize='sgd',
                        activation='tanh')
e.run(train_set, train_set)
Y_predicted = e.network(X_test_minmax)

I tried using 'hf' instead of 'sgd'. It printed some performance variables for each iteration, but it was too slow for my application. The other problem is that when I write 'layerwise' instead of 'sgd', it gives me an error. Any kind of help is appreciated.

Thanks,
Amin
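
As a side note on question 2: one way to avoid depending on the defaults is to set the hyperparameters explicitly. A minimal sketch, assuming Experiment forwards keyword arguments such as learning_rate and momentum to the trainer, as it does for num_updates later in this thread; the values shown are illustrative, not the library defaults:

e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(40, 100, 200, 300, 150, 78),
                        optimize='sgd',
                        activation='tanh',
                        learning_rate=0.01,  # illustrative value, not the default
                        momentum=0.5,        # illustrative value, not the default
                        num_updates=128)     # illustrative value, not the default
e.run(train_set, train_set)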

AminSuzani changed the title from "How to check the training converges" to "How to check the training converges?" on May 23, 2014
@lmjohns3
Owner

You need to enable logging to see the output from many of the trainers (including the SGD trainer). See bug #19 for details.

Can you post a full traceback for the error you're getting from the layerwise trainer?
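
For reference, a minimal way to turn that output on, assuming the trainers emit their progress through Python's standard logging module:

import logging

# Send log records (including per-iteration training loss) to the console.
logging.basicConfig(level=logging.INFO)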

@AminSuzani
Author

Thanks, the layerwise training error was also resolved when I updated my packages. By enabling logging, I can see the convergence error on the screen while training. Is there a way to get this error value in the code (e.g., as an output of the run function)? I would like to write a loop that tries different parameters and picks the ones that yield the smallest error.

@AminSuzani
Author

I just realized that the layerwise error happens only when I use it on Windows. When I use layerwise training on Windows, it trains the first layer and then gives me the following error:

IOError: [Errno 2] No such file or directory: '/tmp/layerwise-150,300,300,78-h0.000000-n0.000000-d0.000000-w0.000000-1.pkl.gz'

However, the same code works fine on Linux. Apart from this, the default parameters also differ when I run the same code on Linux and Windows. I reinstalled Theano and theanets on Windows, but that did not solve the issue.

@kastnerkyle
Contributor

Looks like it could be Windows vs. Linux path separators: '/' vs. '\'. I only use Linux myself, so I probably can't provide much help.
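
If the hard-coded '/tmp' prefix in the path from the error above is the culprit, the portable idiom is to build temporary paths with tempfile.gettempdir(); a minimal sketch (with an abbreviated filename), not the actual theanets code:

import os
import tempfile

# tempfile.gettempdir() resolves to a platform-appropriate directory
# (e.g. /tmp on Linux, %TEMP% on Windows), unlike a hard-coded '/tmp'.
filename = os.path.join(tempfile.gettempdir(),
                        'layerwise-150,300,300,78-1.pkl.gz')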


@AminSuzani
Author

Thanks anyway. I just updated all Canopy packages, but the problem persists. It only happens with 'layerwise'; it works fine with 'hf' and 'sgd'. Here is the full traceback:

Traceback (most recent call last):
  File "deep_layerwise_gpu.py", line 107, in <module>
    e.run(train_set, train_set)
  File "c:\users\amin\appdata\local\enthought\canopy\user\lib\site-packages\theanets\main.py", line 214, in run
    cg_set=self.datasets['cg'])
  File "c:\users\amin\appdata\local\enthought\canopy\user\lib\site-packages\theanets\trainer.py", line 342, in train
    i))
  File "c:\users\amin\appdata\local\enthought\canopy\user\lib\site-packages\theanets\feedforward.py", line 282, in save
    handle = opener(filename, 'wb')
  File "C:\Users\Amin\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "C:\Users\Amin\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: '/tmp/layerwise-150,300,300,78-h0.000000-n0.000000-d0.000000-w0.000000-1.pkl.gz'

@kastnerkyle
Contributor

Can you provide a gist of your code?

@AminSuzani
Author

Here it is:

train_set = [X_minmax, Y_xyz_minmax]
e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(featuresNum, 300, 300, vertebNum * 3),
                        optimize='layerwise',
                        activation='tanh',
                        num_updates=3)
e.run(train_set, train_set)

@lmjohns3
Owner

lmjohns3 commented Jun 3, 2014

Because this apparently works on Linux, it looks to me like this is a problem trying to save/load from the temp directory on Windows.

However, it also looks to me like the whole process of saving/loading is from some older theanets code -- the current layerwise trainer does not try to do anything on disk. (Actually, this behavior was removed on 7 February, see c025646#diff-aa4bc02a676b29ad321853f71672f681L463)

@AminSuzani which version of theanets are you using? The most recent version, published just yesterday on PyPI, is 0.2.0.

@AminSuzani
Author

Thanks for your reply. You were right, that was an old version. I ran "pip install --upgrade" and it solved the issue. Previously I had used "pip uninstall" and then "pip install" again, but it seems that installed the old version again.

The other question that still remains is whether there is a way to get the training error (or any other convergence measure) in code. I do see it in the command prompt, but I need it in the code. I would like to be able to write a loop that trains the network with different parameters and automatically picks the ones that yield better convergence.

@lmjohns3
Owner

lmjohns3 commented Jun 4, 2014

I like the idea of providing the ongoing training error, but at the moment it's not returned during training. Could you file a separate github issue for this, so that we can close this one and keep track of the specific feature request?

Until I can get to the feature request, the Experiment#train method does already yield the current state of the trainer after each training iteration (use this method instead of Experiment#run). You could do something like this:

for _ in experiment.train(dataset):
    print(evaluate(experiment.network, dataset))

where evaluate is some function that takes in a network and computes some error estimate.
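
For instance, a hypothetical evaluate based on mean squared error, combined with the kind of parameter sweep asked about above. This assumes train_set is the [X, Y] pair from earlier in the thread, and the swept layer sizes are purely illustrative:

import numpy as np
import theanets

def evaluate(network, dataset):
    # Hypothetical helper: mean squared error of the network's
    # predictions against the dataset's targets.
    X, Y = dataset
    return np.mean((network(X) - Y) ** 2)

best_error, best_layers = float('inf'), None
for hidden in (100, 200, 300):  # illustrative sweep over one hyperparameter
    experiment = theanets.Experiment(theanets.feedforward.Regressor,
                                     layers=(40, hidden, 78),
                                     optimize='sgd',
                                     activation='tanh')
    for _ in experiment.train(train_set):
        pass  # each iteration yields the trainer's current state
    error = evaluate(experiment.network, train_set)
    if error < best_error:
        best_error, best_layers = error, (40, hidden, 78)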

@AminSuzani
Author

Thanks, I just filed a separate issue for this. Feel free to close this one.

Cheers,
Amin
