
Implement regularization #28

Closed
PetrToman opened this issue Jan 21, 2012 · 15 comments

@PetrToman

Hello,
please consider implementing regularization, as it is essential to deal with the overfitting problem.

I recommend watching the 12-minute video "Regularization and Bias/Variance" from lesson X (Advice for Applying Machine Learning) of the Stanford ML course, at https://class.coursera.org/ml/lecture/preview.

It would also be useful to enhance Encog Analyst - it could split data into 3 sets (training, cross validation, testing) and try to find the optimal regularization parameter automatically.
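For reference, L2 regularization ("weight decay") adds a penalty proportional to the squared weights to the cost function, which turns into an extra lambda * w term in each gradient step. A minimal, framework-free sketch in Java (class and method names here are illustrative, not Encog API):

```java
// Illustration only: one gradient-descent step with L2 regularization,
// applied to a plain weight array, independent of any Encog classes.
public class WeightDecayDemo {

    // w <- w - learningRate * (gradient + lambda * w)
    public static double[] step(double[] weights, double[] gradients,
                                double learningRate, double lambda) {
        double[] updated = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            updated[i] = weights[i]
                    - learningRate * (gradients[i] + lambda * weights[i]);
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] next = step(new double[] {1.0, -2.0},
                             new double[] {0.5, 0.5}, 0.1, 0.01);
        System.out.println(next[0] + " " + next[1]);
    }
}
```

The lambda * weights[i] term is what pulls large weights toward zero; with lambda = 0 this reduces to ordinary gradient descent.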

@seemasingh
Contributor

I will take a look at this to see whether we will include it in Encog 3.1 or 3.2. I am in the process of finalizing features for 3.1, as we want to release it soon and move to a code freeze. Definitely an important feature, though. Encog currently has two methods to combat overfitting: cross-validation and early stopping (new for 3.1). More info here, though these wiki pages are in need of expansion.

http://www.heatonresearch.com/wiki/Overfitting

@PetrToman
Author

Good! Early stopping may be useful too, but regularization should be more powerful. A basic implementation (leaving the Workbench aside) shouldn't be much of a problem, as the regularization term is applied after the gradients are computed.

@ghost

ghost commented Mar 21, 2012

I wrote a piece of Java code for the regularization: I implemented it as a Strategy.
I have only tested it with ResilientPropagation.
Feel free to use it and make remarks.

public class RegularizationStrategy implements Strategy {

    private double lambda; // Weight decay
    private MLTrain train;
    private double[] weights;

    public RegularizationStrategy(double lambda) {
        this.lambda = lambda;
    }

    @Override
    public void init(MLTrain train) {
        this.train = train;
    }

    @Override
    public void preIteration() {
        try {
            weights = ((Propagation) train).getFlatTraining()
                    .getNetwork().getWeights();
        } catch (Exception e) {
            weights = null;
        }
    }

    @Override
    public void postIteration() {
        if (weights != null) {
            double[] newWeights = ((Propagation) train).getFlatTraining()
                    .getNetwork().getWeights();
            for (int i = 0; i < newWeights.length; i++) {
                newWeights[i] -= lambda * weights[i];
            }
            ((Propagation) train).getFlatTraining()
                    .getNetwork().setWeights(newWeights);
        } else {
            System.err.println("Error in RegularizationStrategy, weights are null but should not be.");
        }
    }

}

@PetrToman
Author

poussevinm: I like the idea of implementing it as a Strategy. As for the regularization, I think the old values are not needed, so if I'm not mistaken (I haven't tested it), the above code can be simplified to:

public void postIteration() {
    double[] weights = ((Propagation) train).getFlatTraining()
                       .getNetwork().getWeights();

    for (int i = 0; i < weights.length; i++) {
        weights[i] += lambda * weights[i];   // also using +
    }
}

In Encog 3.1 the weights are copied to the GradientWorkers before postIteration() is called (see Propagation.iteration()), so I guess this code wouldn't work. I suggest introducing a new Strategy method, something like public void postGradient(), to resolve this.

@ghost

ghost commented Mar 22, 2012

My idea was that regularization adds a term to the cost function, and since the gradient is linear, you can apply the influence of regularization in a second step.
So I took the initial weights before they were modified by the part of the gradient computed from the training examples, and let that part of the gradient do its work.
Once it was done, I simply added the gradient of the regularization term.

This is why I needed the initial weights. It also means that the code does not depend on the way you compute the gradient on the training examples.
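The two-step idea can be sketched on plain arrays, outside any Encog types (the data-driven step below is a stand-in for whatever RPROP computes, and all names are illustrative):

```java
// Illustration of the two-step regularization idea: snapshot the weights,
// let the data-driven update run, then subtract lambda times the snapshot
// (the gradient of the L2 term). No Encog classes are used.
public class TwoStepDecay {

    public static double[] update(double[] weights, double[] dataStep,
                                  double lambda) {
        double[] snapshot = weights.clone(); // must be a copy, not an alias
        double[] result = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            // 1) apply the data-driven update, 2) apply the L2 term,
            // computed from the pre-update snapshot
            result[i] = weights[i] + dataStep[i] - lambda * snapshot[i];
        }
        return result;
    }
}
```

The key detail, as the discussion below shows, is that the snapshot must actually be a copy of the weights, not a second reference to the same array.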

@PetrToman
Author

Well, the problem is that weights == newWeights in postIteration(): the array is not cloned, but assigned by reference (try printing out the values).
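A minimal demonstration of this aliasing issue: in Java, assigning an array copies the reference, not the contents, so a "snapshot" taken this way sees every later modification.

```java
// Assigning an array in Java copies the reference, not the contents,
// so the "old" values are lost unless the array is cloned.
public class ArrayAliasingDemo {
    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = a;           // b is an alias of a, not a snapshot
        a[0] = 9.0;
        System.out.println(b[0]); // 9.0 -- the "old" weights changed too

        double[] c = a.clone();   // clone() makes an independent copy
        a[1] = 5.0;
        System.out.println(c[1]); // 2.0 -- the snapshot is preserved
    }
}
```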

@jeffheaton
Owner

Thanks for the contributed code, I will take a look.

@ghost

ghost commented Mar 27, 2012

I see your point, Petr. This is why I used the setWeights(double[]) method in my postIteration() method:
((Propagation) train).getFlatTraining().getNetwork().setWeights(newWeights);

Thanks for your attention to my code.
Do you want me to comment/document it?

@PetrToman
Author

My point was that weights doesn't actually keep the old values. Take a look at Jeff's code (1aa783d); I think that is the way you meant to implement it.

@ghost

ghost commented Mar 27, 2012

Ok, my bad. I see my mistake now.
Thanks.

@jeffheaton
Owner

Okay, I implemented this, with the code fix, in Encog 3.2. I have not played with it much yet. I also added issues #96 and #97 to make this easy to use in the Workbench.

@thomasj02

Actually, I think this code is still incorrect: you don't want to regularize the weights from bias inputs. It's not clear to me, though, how to tell whether a weight comes from a bias input when you have the flat representation.
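Once a weight-to-bias mapping exists, the fix itself is simple. A hypothetical sketch, assuming a boolean mask isBias[] has already been derived from the network's layer structure (how to derive it from Encog's flat weight array is exactly the open question here, and is not shown):

```java
// Hypothetical sketch: weight decay that skips bias weights.
// The isBias[] mask is assumed to be built elsewhere from the layer
// structure; mapping flat weight indices to bias connections is the
// unresolved part of this issue.
public class SelectiveDecay {

    public static void decayNonBias(double[] weights, boolean[] isBias,
                                    double lambda) {
        for (int i = 0; i < weights.length; i++) {
            if (!isBias[i]) {
                weights[i] -= lambda * weights[i]; // decay real weights only
            }
        }
    }
}
```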

@joetanto

joetanto commented Jun 8, 2015

I agree with @thomasj02: the bias terms must not be included when regularizing. I'd appreciate it if someone could fix that. Thank you.

@vincenzodentamaro

Well, allowing large biases gives our networks more flexibility in behaviour: large biases make it easier for neurons to saturate, which is sometimes desirable. So regularizing the biases is not necessary.

@jeffheaton
Owner

Since this was submitted, Encog has added dropout, L1 and L2.
