RFC Expose a public method to compute the objective function #28169

Open
jeremiedbb opened this issue Jan 18, 2024 · 5 comments
Labels
Hard · Meta-issue · RFC

Comments

@jeremiedbb
Member

jeremiedbb commented Jan 18, 2024

I think it would be valuable for all estimators that optimize an objective function to expose a public method to compute it.
My main motivation is callbacks, for early stopping or monitoring, but I'm sure it would be useful in other contexts.

For this to be really practical, the signature should be the same across all estimators. What I have in mind is something like:

def objective_function(self, X, y=None, *, sample_weight=None, normalize=False):
    y_pred = self.predict(X)
    # or Xt = self.transform(X), because some transformers also optimize an objective function

    data_fit = <computation of the data fit term>
    penalization = <computation of the penalization term>

    if normalize:                     # allow returning a per-sample
        data_fit /= X.shape[0]        # objective function
        penalization /= X.shape[0]    # for convenience
        # X.shape[0] probably needs to be replaced by sample_weight.sum()

    return data_fit + penalization, data_fit, penalization
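To make the shape concrete, here is a hypothetical instantiation for a ridge-style penalized least squares model (coef_, intercept_ and alpha follow the usual scikit-learn naming, but this is only a sketch, not a proposed implementation):

import numpy as np

def objective_function(self, X, y=None, *, sample_weight=None, normalize=False):
    # Hypothetical sketch for a ridge-style model minimizing
    # ||y - Xw||^2 + alpha * ||w||^2; coef_, intercept_ and alpha follow
    # scikit-learn naming conventions, but this is illustrative only.
    y_pred = X @ self.coef_ + self.intercept_
    residuals = y - y_pred
    if sample_weight is None:
        data_fit = np.sum(residuals ** 2)
    else:
        data_fit = np.sum(sample_weight * residuals ** 2)
    penalization = self.alpha * np.dot(self.coef_, self.coef_)
    if normalize:
        n = X.shape[0] if sample_weight is None else sample_weight.sum()
        data_fit /= n
        penalization /= n
    return data_fit + penalization, data_fit, penalization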

If we want to compute the objective function on the training set during fitting, we could allow providing the current variables required to compute it at a given iteration, encapsulated in a single dict argument:

def objective_function(self, X, y=None, *, sample_weight=None, fit_state=None, normalize=False):
    if fit_state is None:
        y_pred = self.predict(X)
    else:
        y_pred = <compute y_pred using the information in fit_state>

    ...

where the content of fit_state would be estimator-specific, and detailed in the docstring of each estimator's objective_function.
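For illustration (the keys here are hypothetical, not an actual API), an NMF-like estimator might expose its current factor matrices in fit_state so the objective can be computed without re-running transform:

# Illustrative only: for NMF, the fit state at a given iteration might
# hold the current factor matrices W and H.
fit_state = {"W": W_current, "H": H_current}
total, data_fit, penalization = nmf.objective_function(X, fit_state=fit_state)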

jeremiedbb added the Hard, RFC and Meta-issue labels on Jan 18, 2024
@ogrisel
Member

ogrisel commented Jan 18, 2024

I think the normalize option should be True by default, to make the train and test set values comparable on the same scale even if the test set is 10x smaller than the training set.

We might not want to expose an estimator-specific fit_state argument as part of the public API of scikit-learn estimators. To achieve the same result, estimators could optionally implement a private _objective_function() with the freedom to accept estimator-specific kwargs with meaningful names.

When this private method is present, the callbacks would attempt to use it first, looking up the kwarg names from its signature in the fit state (or assuming they are present as attributes on self, updated at each iteration of fit). Alternatively, we keep the fit_state argument but only expose it in the private version of the method.

About the return value(s) of this method:

I think it should be more than a scalar or a fixed number of scalars: rather a dataclass (or namedtuple) that:

  • maybe communicates its name ("penalized_squared_error", "explained_variance", "inertia", ...),
  • communicates whether it is something to minimize or maximize,
  • provides access to the aggregate scalar value,
  • but also to the individual terms (the data fit, and each regularizer).

If we return a dataclass, we can have an HTML repr for notebooks.
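A minimal sketch of what such a container could look like (all names here are illustrative, not a proposal):

from dataclasses import dataclass, field

@dataclass
class ObjectiveResult:
    # Illustrative container for the value of an objective function.
    name: str                 # e.g. "penalized_squared_error", "inertia"
    greater_is_better: bool   # whether the objective is maximized
    value: float              # aggregate scalar value
    terms: dict = field(default_factory=dict)  # data fit and each regularizer

    def _repr_html_(self):
        # Hook used by Jupyter notebooks for rich display.
        rows = "".join(
            f"<tr><td>{k}</td><td>{v:g}</td></tr>" for k, v in self.terms.items()
        )
        return f"<b>{self.name}</b> = {self.value:g}<table>{rows}</table>"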

@jeremiedbb
Member Author

jeremiedbb commented Feb 22, 2024

I opened #28505, which implements it for NMF, to help move this discussion forward.

> I think it should be more than a scalar or a fixed number of scalars: rather a dataclass

I agree, and made it a dataclass.

> When this private method is present, the callbacks would attempt to use it first, looking up the kwarg names from its signature in the fit state

I think it's better to have it in the public method: callbacks are public and users can write custom callbacks, so they should not have to rely on a private method.

> We might not want to expose an estimator-specific fit_state argument as part of the public API of scikit-learn estimators. To achieve the same result, estimators could optionally implement a private _objective_function() with the freedom to accept estimator-specific kwargs with meaningful names.

Thinking more about it, I think it's better to have one keyword param per fitted parameter of the model instead of a single dict param. The callback documentation will mention that if a callback needs to compute the objective function on the train set, the fit_state dict that the callback receives should be used: model.objective_function(X, y, **fit_state).
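For example, a custom callback could then do something like this (the hook name on_fit_iter_end and the exact callback signature are hypothetical, since the callback API is still being designed):

class ObjectiveMonitor:
    # Hypothetical custom callback that records the training objective.
    def __init__(self):
        self.history = []

    def on_fit_iter_end(self, estimator, X, y, fit_state):
        # Unpack the fit state as keyword arguments to avoid
        # recomputing predictions from scratch at each iteration.
        result = estimator.objective_function(X, y, **fit_state)
        self.history.append(result)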

@ogrisel
Member

ogrisel commented Feb 22, 2024

> The callback documentation will mention that if a callback needs to compute the objective function on the train set, the fit_state dict that the callback receives should be used: model.objective_function(X, y, **fit_state).

I suspect there might be cases where fit_state contains things that are not useful to pass as kwargs to objective_function. Should we filter those out based on the signature of the objective function? Or make all objective_function signatures include an extra **ignored_kwargs (that sounds ugly)...
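The signature-based filtering could look roughly like this (a sketch, assuming fit_state is a plain dict; _filter_fit_state is a hypothetical helper):

import inspect

def _filter_fit_state(objective_function, fit_state):
    # Keep only the fit_state entries that the objective function accepts.
    params = inspect.signature(objective_function).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fit_state  # the function accepts **kwargs: pass everything
    return {k: v for k, v in fit_state.items() if k in params}

# usage: model.objective_function(X, y, **_filter_fit_state(model.objective_function, fit_state))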

@jeremiedbb
Member Author

That's a good point. I was planning to add a **kwargs param, but I understand that it's not a very good pattern.

For context, my motivation for being able to provide fit-state parameters is the case where a callback needs to compute the objective function on the training set. It doesn't apply to any other metric, and it doesn't apply to the validation set. And it's just to speed up the computation, avoiding a call to predict or transform.

So I'm wondering if we really need this param (or these params) at all. Maybe we can just accept that computing the objective function on the train set is only as fast as computing it on the validation set, even though we know it could be done faster. What do you think?

@ogrisel
Member

ogrisel commented Feb 23, 2024

> Maybe we can just accept that computing the objective function on the train set is only as fast as computing it on the validation set, even though we know it could be done faster. What do you think?

Indeed, let's start simple and consider adding API complexity only when it is really justified.
