
Encog Core is not saving large format Neural Networks Correctly #38

Closed
PetrToman opened this issue Feb 1, 2012 · 18 comments

@PetrToman

EG files store the weight array differently for large-format networks, so that the weights do not end up on a single enormous line that cannot be read into memory. Training is failing because these networks are not being loaded or saved correctly; the end result is that most of the weight matrix is an array of zeros. Such a neural network is not trainable.
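
(A minimal sketch of the failure mode, in plain Java rather than Encog's actual persistence classes; the class and method names below are illustrative only. The weight array is written in fixed-size chunks, and a loader that only consumes the first line leaves the rest of the array at its default of 0.0 - the symptom described above.)

import java.util.ArrayList;
import java.util.List;

public class MultiLineWeightsSketch {

    // Write the weight array in fixed-size chunks so no single line becomes too long.
    static List<String> toLines(double[] weights, int perLine) {
        List<String> lines = new ArrayList<String>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < weights.length; i++) {
            sb.append(weights[i]);
            boolean endOfLine = (i + 1) % perLine == 0 || i == weights.length - 1;
            if (!endOfLine) {
                sb.append(',');
            } else {
                lines.add(sb.toString());
                sb.setLength(0);
            }
        }
        return lines;
    }

    // Rebuild the full array by consuming every line; a loader that stops after
    // the first line leaves the remaining entries as zeros.
    static double[] fromLines(List<String> lines, int totalWeights) {
        double[] weights = new double[totalWeights];
        int index = 0;
        for (String line : lines) {
            for (String token : line.split(",")) {
                weights[index++] = Double.parseDouble(token);
            }
        }
        return weights;
    }
}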

--- From original report ---
It seems that error reporting is broken in the latest Workbench (built from git sources), at least for RProp and SVMSearch: "Current Error" just hangs after a couple of iterations (and the chart is also frozen, if displayed). Interestingly, it works with QProp, for example. (I have no problems with Workbench 3.0.1, using the same data.)

@seemasingh
Contributor

I tried to reproduce this with Backprop and am not seeing anything. By "frozen", are you saying the iteration count is stuck as well? I trained the iris data set with Backprop on a very low learning rate, so it would take a while, and everything looked fine.

@seemasingh
Contributor

I also tried a random data set (from the workbench) with 9 inputs and 1 output, and trained it with RPROP. It will pretty much chew on that forever. It trained down to 32% and kept making marginal improvements from there, but I did not see a freeze.

@PetrToman
Author

Here's my streamlined data: http://dione.zcu.cz/~toman40/encog/data1.zip

RProp gets stuck at 169.86% despite increasing iterations. (Using Workbench 3.0.1 it goes down to 4.47% in just 112 iterations.)

@seemasingh
Contributor

Strange! But yes, I can reproduce what you are describing. It also trains quite well in 3.0.1. I will take a look, thanks!

@seemasingh
Contributor

Ah hah! I think I figured out what it is. See my note above. I will also create a unit test to cover saving large-format neural networks. This is also why my earlier test looked okay.

@seemasingh
Contributor

The following unit test, which I just checked in, demonstrates this issue. It will turn the build status yellow until I fix it, but clearly the following should work; it works on smaller networks.

public void testPersistLargeEG()
{
    BasicNetwork network = new BasicNetwork();
    network.addLayer(new BasicLayer(null,true,200));
    network.addLayer(new BasicLayer(new ActivationSigmoid(),true,200));
    network.addLayer(new BasicLayer(new ActivationSigmoid(),true,200));
    network.addLayer(new BasicLayer(new ActivationSigmoid(),false,200));
    network.getStructure().finalizeStructure();
    network.reset();

    EncogDirectoryPersistence.saveObject(EG_FILENAME, network);
    BasicNetwork network2 = (BasicNetwork)EncogDirectoryPersistence.loadObject(EG_FILENAME);

    double d = EngineArray.euclideanDistance(network.getStructure().getFlat().getWeights(), 
            network2.getStructure().getFlat().getWeights());

    Assert.assertTrue(d<0.01);
}
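
(The assertion simply checks that the weights read back from the .eg file are essentially identical to the weights that were saved; with the bug described above, most of the reloaded weight matrix is zero, so the Euclidean distance between the two flat weight arrays is large and the test fails.)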

@jeffheaton
Owner

That is not good! The Encog mainline is pretty much useless with any neural network that results in a multi-line weight matrix. I added that code not too long ago. I will take a look! Thanks to you both for all the info.

@ghost assigned jeffheaton on Feb 1, 2012
@seemasingh
Contributor

kk, all yours!

@jeffheaton
Owner

Okay, Seema, I checked in a fix for your unit test. All is green again. If there is an SVM issue, I believe it is a separate one. I am assigning this back to you for verification of the SVM side.

@ghost assigned seemasingh on Feb 2, 2012
@seemasingh
Contributor

Okay, I believe this is resolved. I was able to create a neural network (119->200->TANH->1->TANH) with the data you provided and get it to converge in a few hundred iterations with RPROP. Not every set of random starting weights does as well; some converge to a local minimum. I also did an SVM search. It took a while longer, and you often don't see any updates for a range of iterations, as it is simply not finding anything, but after around 100 iterations the SVM was below 100%.
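
(For reference, a minimal sketch of that setup, assuming Encog 3's standard API; trainingSet stands in for the pre-normalized training data already loaded from the .egb file, and the stopping criteria are illustrative rather than the exact values used.)

BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 119));
network.addLayer(new BasicLayer(new ActivationTANH(), true, 200));
network.addLayer(new BasicLayer(new ActivationTANH(), false, 1));
network.getStructure().finalizeStructure();
network.reset(); // random starting weights; some runs land in a local minimum

ResilientPropagation train = new ResilientPropagation(network, trainingSet);
int iteration = 0;
do {
    train.iteration();
    iteration++;
} while (train.getError() > 0.05 && iteration < 500); // a good run converges within a few hundred iterations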

@PetrToman
Author

Did you try this in the Workbench using the Analyst? I updated the sources and even added a debug output to EncogReadHelper, before the line

double[] t = NumberList.fromList(CSVFormat.EG_FORMAT, line);

to be sure I am using the version that Jeff fixed, but I still cannot get RProp to work...
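
(The check amounts to something like the following, with the print statement being the added debug output; this is illustrative only, not the actual diff.)

System.out.println("EncogReadHelper: weight line length = " + line.length()); // added debug output
double[] t = NumberList.fromList(CSVFormat.EG_FORMAT, line);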

@seemasingh
Contributor

Okay, I had not tried it in the Analyst. When those values are normalized, which they should be, I do get the same result. I did some digging and I believe it to be the weight initialization, NWR. There were some changes to that after 3.0.1. I created issue #41 to look at this, since what Jeff fixed here was a bug in its own right.

@PetrToman
Author

Interesting that you got it to work so easily. I can make it converge only if I use multiple methods and run the training again - see video: http://dione.zcu.cz/~toman40/encog/encog_training_bug.zip

@seemasingh
Contributor

The main reason I think it is the weight initialization is this: if I use the 3.0.1 workbench, use the Analyst, and train a neural network, it converges just fine, as you reported. I then take the .eg file (neural network) and the .egb file (normalized training data), copy them to a new project, and fire up 3.1. If I train with just these two (outside of the Analyst), it converges quite quickly, to exactly the error that the 3.0.1 Analyst reached. Yet if I now take 3.1, randomize the EG file, and retrain, it does terribly and quickly converges to a local minimum - a very high local minimum at that.

We really need some better graphics on the trainer so that you can actually see why a training run has stalled: hidden neurons shutting down, all of the gradients going to zero (local minimum), some hybrid of the two, etc. But that is another point. In this case it is a pure local minimum that it gets stuck on.

If you randomize a neural network with NWR (the default for the Analyst) in 3.0.1 and in 3.1 and look at a weight histogram, they are VERY different. Plus, just looking at it, I can tell the NWR logic is flawed: it does not touch every weight. So this is where I am going to look next. At the very least, you are causing me to find other issues on the way to what you are experiencing.
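
(A hypothetical helper for that kind of comparison, bucketing the flat weight array into a simple text histogram after randomization. The helper name and bucket layout are illustrative, not part of Encog; only getStructure().getFlat().getWeights() is the real API, as used in the unit test above.)

static void printWeightHistogram(BasicNetwork network, double min, double max, int buckets) {
    // Pull the flat weight array and count how many weights fall in each bucket.
    double[] weights = network.getStructure().getFlat().getWeights();
    int[] counts = new int[buckets];
    double width = (max - min) / buckets;
    for (double w : weights) {
        int b = (int) ((w - min) / width);
        if (b < 0) b = 0;
        if (b >= buckets) b = buckets - 1;
        counts[b]++;
    }
    for (int i = 0; i < buckets; i++) {
        System.out.printf("%6.2f .. %6.2f : %d%n", min + i * width, min + (i + 1) * width, counts[i]);
    }
}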

@PetrToman
Author

Training visualisation is a great idea - don't you want to create an issue for this?

I would also suggest adding integration tests with (reasonably) large data, perhaps one for each training method. It would add a couple of seconds to the build, but it should prevent breaking Encog's main functionality, as happened in this case. (You can use my data if you like.)
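
(A sketch of the kind of integration test being suggested, assuming JUnit and Encog 3's standard API. The data below is random stand-in data rather than the attached data set, and the final assertion is only a loose sanity check, not a tight error target.)

@Test
public void testLargeNetworkTrains() {
    // Random stand-in data: 50 rows, 119 inputs, 1 output.
    Random rnd = new Random(42);
    double[][] input = new double[50][119];
    double[][] ideal = new double[50][1];
    for (int i = 0; i < 50; i++) {
        for (int j = 0; j < 119; j++) {
            input[i][j] = rnd.nextDouble();
        }
        ideal[i][0] = rnd.nextDouble();
    }
    MLDataSet data = new BasicMLDataSet(input, ideal);

    BasicNetwork network = new BasicNetwork();
    network.addLayer(new BasicLayer(null, true, 119));
    network.addLayer(new BasicLayer(new ActivationTANH(), true, 200));
    network.addLayer(new BasicLayer(new ActivationTANH(), false, 1));
    network.getStructure().finalizeStructure();
    network.reset();

    // A short RPROP run; a regression in persistence or initialization
    // tends to show up as an error that never improves.
    ResilientPropagation train = new ResilientPropagation(network, data);
    double first = Double.NaN;
    for (int i = 0; i < 50; i++) {
        train.iteration();
        if (i == 0) {
            first = train.getError();
        }
    }
    Assert.assertTrue(train.getError() <= first);
}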

@seemasingh
Contributor

Agree on both points. The unit tests could definitely use a more advanced test case for larger neural networks.

Thanks! Yes I will add that data set.

@PetrToman
Author

Just a reminder: how about that 'new training visualisation' issue?

@seemasingh
Contributor

Sure, added issue #58.
