
LxReg #273

Closed

goodfeli opened this issue May 5, 2013 · 12 comments

@goodfeli
Contributor

goodfeli commented May 5, 2013

What is up with the LxReg class? Does it actually work? How are you meant to make the "variables" argument to the init method actually get driven by the training or monitoring data?

@vdumoulin
Member

I wrote this class to generalize L1 regularization, L2 regularization, and so on (hence the name LxReg), and to make it compatible with the whole cost framework. It's not a real 'cost', since it does not take the training or monitoring data into account, but it still composes with other costs, so you can have an expression like

cost = NLL + L2(weights)

I've used it a couple of times and it works well. You provide it with variables (say, your weight matrices) and the order 'x' of the regularization you want, and it computes the symbolic expression for you when you call it.
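
To make that concrete, here is a minimal sketch of the kind of symbolic expression such a cost builds, assuming Theano; the class name and signature below are illustrative, not the actual pylearn2 API:

import theano.tensor as T

class LxRegSketch(object):
    # Illustrative only: sums the elementwise |v| ** x over a list of variables.
    def __init__(self, variables, x=2):
        self.variables = variables  # e.g. a model's weight matrices
        self.x = x                  # order of the regularization (1 for L1, 2 for L2, ...)

    def __call__(self):
        # Symbolic penalty: sum_i T.sum(|v_i| ** x)
        return sum(T.sum(abs(v) ** self.x) for v in self.variables)

With something like that, the combined cost above is just nll_expr + LxRegSketch([W1, W2], x=2)().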

Maybe I should change the docstring to explain what it does more clearly?

@goodfeli
Contributor Author

Ah, OK. Have you just been using it to regularize parameters directly? i.e., put an L2 norm on the weights? Is there a risk of it silently doing the wrong thing if someone tried to use it to put an L1 penalty on something data-dependent, like the activations of an MLP? Should it enforce that things in the "variables" list are shared variables or something like that?

@vdumoulin
Member

Yes, only to regularize parameters directly. Currently there is no safeguard to prevent people from using it on something like the activations of an MLP; I figured that since it is obvious this 'cost' is meant as a regularizer, people would use it appropriately.

As long as requiring that the things in the 'variables' list be shared variables is not too restrictive (I can't think of a use case involving anything other than shared variables for now), I think it could be a good idea. It all depends on how much freedom we want the user to have. Maybe a warning saying "This cost was intended to be used on shared variables, such as weights in a neural net; make sure your use of data-dependent variables is intended" would be sufficient?
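
A hypothetical form of that safeguard (the helper name and exact wording are made up for illustration) could be:

import warnings
from theano.compile import SharedVariable

def warn_on_non_shared(variables):
    # Hypothetical check: warn rather than fail when an entry is not a
    # shared variable (e.g. an MLP activation slipped into 'variables').
    for v in variables:
        if not isinstance(v, SharedVariable):
            warnings.warn("This cost was intended to be used on shared "
                          "variables, such as weights in a neural net; "
                          "make sure your use of data-dependent variables "
                          "is intended.")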

@yoshua

yoshua commented May 15, 2013

What would go wrong if they used it as a penalty on activations? That may be a desirable thing from a machine learning point of view. What is the Theano-level problem?

--Yoshua


@vdumoulin
Member

That's a good point. I personally don't see any Theano-level problem with that.


@dwf
Contributor

dwf commented May 15, 2013

The question is how actual data gets plugged into the back-end of that Theano graph, and what code is responsible for constructing these variables, etc.

@vdumoulin
Member

I'm not sure I see what problems can arise from a Theano point of view. Can you explain further?

@lamblin
Member

lamblin commented Jun 17, 2013

Currently, if we used LxReg on activations, the likely result is a Theano error saying that some inputs to the training or monitoring function (depending on whether LxReg is used as a training cost or a monitoring cost) were not provided.
The reason is that all data-driven costs have to derive their expression from the data argument to their expr method, as that is the variable that will hold the actual data during training or monitoring.
This data argument conforms to the (space, source) data_specs defined in get_data_specs. In this case, LxReg.get_data_specs() returns (NullSpace(), ''), which means this cost uses no data at all. Without data, activations (for instance) cannot be expressed.
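
To make the mechanism concrete, here is a sketch of the contract described above, assuming pylearn2's Cost interface (expr(model, data) plus get_data_specs(model)); the details are illustrative:

from pylearn2.costs.cost import Cost
from pylearn2.space import NullSpace

class LxRegSketch(Cost):
    # Illustrative: a 'cost' that declares it consumes no data at all.
    def __init__(self, variables, x=2):
        self.variables = variables
        self.x = x

    def expr(self, model, data, **kwargs):
        # 'data' is ignored: the penalty depends only on the variables
        # captured at construction time (typically shared variables).
        return sum((abs(v) ** self.x).sum() for v in self.variables)

    def get_data_specs(self, model):
        # (NullSpace(), '') requests no data, so nothing data-dependent
        # (like activations) can appear in expr().
        return (NullSpace(), '')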

If some models need Lx regularization on data-driven quantities (whether they depend on inputs, targets, both, or something else), a new Cost class will probably have to be defined, either for that particular case or in a more general setting, though the general version could get complex quickly.
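
For contrast, a hypothetical data-driven variant would have to request data through its data_specs. Reusing the imports from the sketch above, and noting that the fprop call is an assumption about the model's interface rather than a fixed pylearn2 API:

class ActivationLxRegSketch(Cost):
    # Hypothetical: penalize quantities computed from the input data.
    def __init__(self, x=1):
        self.x = x

    def expr(self, model, data, **kwargs):
        # Here 'data' actually carries the inputs, matching get_data_specs.
        activations = model.fprop(data)  # assumed hook for the activations
        return (abs(activations) ** self.x).sum()

    def get_data_specs(self, model):
        # Request the model's input space and source so the training or
        # monitoring function knows to feed real data into this graph.
        return (model.get_input_space(), model.get_input_source())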

@vdumoulin
Member

Ok, I see the issue, thanks!

Would it be bad design to make LxReg compatible with data-driven quantities while still letting it compute non-data-driven quantities too?

@vdumoulin
Member

Has anyone had time to look at my pull request for this issue?

@goodfeli
Contributor Author

I'm back now and working through the PRs. See the PR for comments.

@vdumoulin
Member

Pull request was merged, closing the issue.
