
Encog Core is not saving large format Neural Networks Correctly #38

Closed
PetrToman opened this issue Feb 1, 2012 · 18 comments

@PetrToman

EG files store the weight array differently for large-format networks, so that the weights do not end up on a single enormous line that cannot be read into memory. Training is failing because these networks are not being loaded or saved correctly; the end result is that most of the weight matrix is an array of zeros. Such a neural network is not trainable.
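
(A minimal sketch of the failure mode, in plain Java rather than Encog's actual persistence classes; the class and method names below are illustrative only. The weight array is written in fixed-size chunks, and a loader that only consumes the first line leaves the rest of the array at its default of 0.0 - the symptom described above.)

import java.util.ArrayList;
import java.util.List;

public class MultiLineWeightsSketch {

    // Write the weight array in fixed-size chunks so no single line becomes too long.
    static List<String> toLines(double[] weights, int perLine) {
        List<String> lines = new ArrayList<String>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < weights.length; i++) {
            sb.append(weights[i]);
            boolean endOfLine = (i + 1) % perLine == 0 || i == weights.length - 1;
            if (!endOfLine) {
                sb.append(',');
            } else {
                lines.add(sb.toString());
                sb.setLength(0);
            }
        }
        return lines;
    }

    // Rebuild the full array by consuming every line; a loader that stops after
    // the first line leaves the remaining entries as zeros.
    static double[] fromLines(List<String> lines, int totalWeights) {
        double[] weights = new double[totalWeights];
        int index = 0;
        for (String line : lines) {
            for (String token : line.split(",")) {
                weights[index++] = Double.parseDouble(token);
            }
        }
        return weights;
    }
}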

--- From original report ---
It seems that error reporting is broken in the latest Workbench (built from git sources), at least for RProp and SVMSearch: "Current Error" just hangs after a couple of iterations (and the chart is also frozen, if displayed). Interestingly, it works with QProp, for example. (I have no problems with Workbench 3.0.1, using the same data.)

@seemasingh
Contributor

I tried to reproduce this with Backprop and am not seeing anything. By "frozen", are you saying the iteration count is stuck as well? I trained the iris data set with Backprop on a very low learning rate, so it would take a while, and everything looked fine.

@seemasingh
Contributor

I also tried a random data set (from the workbench) with 9 inputs and 1 output, and trained it with RPROP. It will pretty much chew on that forever. It trained down to 32% and kept making marginal improvements from there, but I did not see a freeze.

@PetrToman
Author

Here's my streamlined data: http://dione.zcu.cz/~toman40/encog/data1.zip

RProp gets stuck at 169.86% despite increasing iterations. (Using Workbench 3.0.1 it goes down to 4.47% in just 112 iterations.)

@seemasingh
Contributor

Strange! But yes, I can reproduce what you are describing. It also trains quite well in 3.0.1. I will take a look, thanks!

@seemasingh
Contributor

Ah hah! I think I figured out what it is. See my note above. I will also create a unit test to cover saving large-format neural networks. This is also why my earlier test looked okay.

@seemasingh
Contributor

The following unit test, which I just checked in, demonstrates this issue. It will turn the build status yellow until I fix it, but clearly the following should work; it works on smaller networks.

public void testPersistLargeEG()
{
    BasicNetwork network = new BasicNetwork();
    network.addLayer(new BasicLayer(null,true,200));
    network.addLayer(new BasicLayer(new ActivationSigmoid(),true,200));
    network.addLayer(new BasicLayer(new ActivationSigmoid(),true,200));
    network.addLayer(new BasicLayer(new ActivationSigmoid(),false,200));
    network.getStructure().finalizeStructure();
    network.reset();

    EncogDirectoryPersistence.saveObject(EG_FILENAME, network);
    BasicNetwork network2 = (BasicNetwork)EncogDirectoryPersistence.loadObject(EG_FILENAME);

    double d = EngineArray.euclideanDistance(network.getStructure().getFlat().getWeights(), 
            network2.getStructure().getFlat().getWeights());

    Assert.assertTrue(d<0.01);
}
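
(The assertion simply checks that the weights read back from the .eg file are essentially identical to the weights that were saved; with the bug described above, most of the reloaded weight matrix is zero, so the Euclidean distance between the two flat weight arrays is large and the test fails.)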

@jeffheaton
Owner

That is not good! The Encog mainline is pretty much useless with any neural network that results in a multi-line weight matrix. I added that code not too long ago. I will take a look! Thanks to you both for all the info.

@ghost assigned jeffheaton on Feb 1, 2012
@seemasingh
Contributor

kk, all yours!

@jeffheaton
Owner

Okay, Seema, I checked in a fix for your unit test. All is green again. If there is an SVM issue, I believe it is a separate one. I am assigning this back to you for verification of the SVM side.

@ghost assigned seemasingh on Feb 2, 2012
@seemasingh
Contributor

Okay, I believe this is resolved. I was able to create a neural network (119->200->TANH->1->TANH) with the data you provided and get it to converge in a few hundred iterations with RPROP. Not every set of random starting weights does as well; some converge to a local minimum. I also did an SVM search. It took a while longer, and you often don't see any updates for a range of iterations, as it is simply not finding anything, but after around 100 iterations the SVM was below 100%.
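
(For reference, a minimal sketch of that setup, assuming Encog 3's standard API; trainingSet stands in for the pre-normalized training data already loaded from the .egb file, and the stopping criteria are illustrative rather than the exact values used.)

BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 119));
network.addLayer(new BasicLayer(new ActivationTANH(), true, 200));
network.addLayer(new BasicLayer(new ActivationTANH(), false, 1));
network.getStructure().finalizeStructure();
network.reset(); // random starting weights; some runs land in a local minimum

ResilientPropagation train = new ResilientPropagation(network, trainingSet);
int iteration = 0;
do {
    train.iteration();
    iteration++;
} while (train.getError() > 0.05 && iteration < 500); // a good run converges within a few hundred iterations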

@PetrToman
Author

Did you try this in the Workbench using the Analyst? I updated the sources and even added a debug output to EncogReadHelper, before the line

double[] t = NumberList.fromList(CSVFormat.EG_FORMAT, line);

to be sure I am using the version that Jeff fixed, but I still cannot get RProp to work...
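
(The check amounts to something like the following, with the print statement being the added debug output; this is illustrative only, not the actual diff.)

System.out.println("EncogReadHelper: weight line length = " + line.length()); // added debug output
double[] t = NumberList.fromList(CSVFormat.EG_FORMAT, line);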

@seemasingh
Contributor

Okay, I had not tried it in the Analyst. When those values are normalized, which they should be, I do get the same result. I did some digging and I believe it to be the weight initialization, NWR. There were some changes to that after 3.0.1. I created issue #41 to look at this, since what Jeff fixed here was a bug in its own right.

@PetrToman
Author

Interesting that you got it to work so easily. I can make it converge only if I use multiple methods and run the training again - see video: http://dione.zcu.cz/~toman40/encog/encog_training_bug.zip

@seemasingh
Contributor

The main reason I think it is the weight initialization is this: if I use the 3.0.1 workbench, use the Analyst, and train a neural network, it converges just fine, as you reported. I then take the .eg file (neural network) and the .egb file (normalized training data), copy them to a new project, and fire up 3.1. If I train with just these two (outside of the Analyst), it converges quite quickly, to exactly the error that the 3.0.1 Analyst reached. Yet if I now take 3.1, randomize the EG file, and retrain, it does terribly and quickly converges to a local minimum - a very high local minimum at that.

We really need some better graphics on the trainer so that you can actually see why a training run has stalled: hidden neurons shutting down, all of the gradients going to zero (local minimum), some hybrid of the two, etc. But that is another point. In this case it is a pure local minimum that it gets stuck on.

If you randomize a neural network with NWR (the default for the Analyst) in 3.0.1 and in 3.1 and look at a weight histogram, they are VERY different. Plus, just looking at it, I can tell the NWR logic is flawed: it does not touch every weight. So this is where I am going to look next. At the very least, you are causing me to find other issues on the way to what you are experiencing.
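
(A hypothetical helper for that kind of comparison, bucketing the flat weight array into a simple text histogram after randomization. The helper name and bucket layout are illustrative, not part of Encog; only getStructure().getFlat().getWeights() is the real API, as used in the unit test above.)

static void printWeightHistogram(BasicNetwork network, double min, double max, int buckets) {
    // Pull the flat weight array and count how many weights fall in each bucket.
    double[] weights = network.getStructure().getFlat().getWeights();
    int[] counts = new int[buckets];
    double width = (max - min) / buckets;
    for (double w : weights) {
        int b = (int) ((w - min) / width);
        if (b < 0) b = 0;
        if (b >= buckets) b = buckets - 1;
        counts[b]++;
    }
    for (int i = 0; i < buckets; i++) {
        System.out.printf("%6.2f .. %6.2f : %d%n", min + i * width, min + (i + 1) * width, counts[i]);
    }
}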

@PetrToman
Author

Training visualisation is a great idea - don't you want to create an issue for this?

I would also suggest adding integration tests with (reasonably) large data, perhaps one for each training method. It would add a couple of seconds to the build, but it should prevent breaking Encog's main functionality, as happened in this case. (You can use my data if you like.)
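
(A sketch of the kind of integration test being suggested, assuming JUnit and Encog 3's standard API. The data below is random stand-in data rather than the attached data set, and the final assertion is only a loose sanity check, not a tight error target.)

@Test
public void testLargeNetworkTrains() {
    // Random stand-in data: 50 rows, 119 inputs, 1 output.
    Random rnd = new Random(42);
    double[][] input = new double[50][119];
    double[][] ideal = new double[50][1];
    for (int i = 0; i < 50; i++) {
        for (int j = 0; j < 119; j++) {
            input[i][j] = rnd.nextDouble();
        }
        ideal[i][0] = rnd.nextDouble();
    }
    MLDataSet data = new BasicMLDataSet(input, ideal);

    BasicNetwork network = new BasicNetwork();
    network.addLayer(new BasicLayer(null, true, 119));
    network.addLayer(new BasicLayer(new ActivationTANH(), true, 200));
    network.addLayer(new BasicLayer(new ActivationTANH(), false, 1));
    network.getStructure().finalizeStructure();
    network.reset();

    // A short RPROP run; a regression in persistence or initialization
    // tends to show up as an error that never improves.
    ResilientPropagation train = new ResilientPropagation(network, data);
    double first = Double.NaN;
    for (int i = 0; i < 50; i++) {
        train.iteration();
        if (i == 0) {
            first = train.getError();
        }
    }
    Assert.assertTrue(train.getError() <= first);
}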

@seemasingh
Contributor

Agree on both points. The unit tests could definitely use a more advanced test case for larger neural networks.

Thanks! Yes I will add that data set.

@PetrToman
Author

Just a reminder: how about that 'new training visualisation' issue?

@seemasingh
Contributor

Sure, added issue #58.
