Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor BaseEstimator's and Kernel's get_params and set_params #14175

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

alegonz
Copy link

@alegonz alegonz commented Jun 24, 2019

Reference Issues/PRs

Hi there, this is a PR for #13555 .

What does this implement/fix? Explain your changes.

  • Moved _get_param_names out of BaseEstimator as utility function now called get_param_names_from_constructor.
    • This is meant for reuse in gaussian_process.kernels.Kernel .
    • Added a SignatureError class to pass the signature from the utility function up to the appropriate exception message in BaseEstimator and gaussian_process.kernels.Kernel.
  • Modified gaussian_process.kernels.Kernel to use the new utility function and remove duplication.
  • Changed GenericUnivariateSelect._make_selector to use get_params instead of the new utility function so that it raises the proper error in case of a selector with inappropriate signature.
  • Also refactored get_params to call from a new _get_params_from method. This is so get_params can be easily overridden to take a signature of, for example, the parent class in custom subclasses of client code.

Any other comments?

  • Docstrings will be added later.
  • Other that raising SignatureError (subclassed from RuntimeError) these changes don't add or change current behavior, so I haven't added any new tests. If new tests might seem necessary, please let me know.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. There's a lot going on, so I hope you can accept a slow review!

sklearn/base.py Outdated
try:
param_names = get_param_names_from_constructor(self.__class__)
except SignatureError as e:
raise SignatureError("scikit-learn estimators should always "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still more or less duplicated with kernels. Should we share this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still more or less duplicated with kernels. Should we share this?

I left that duplication to keep the error message of each class (scikit-learn estimators..., scikit-learn kernels...). I thought of 1) having the same scikit-learn estimators... message for both, or 2) passing a string argument to get_param_names_from_constructor to edit the message, but both felt awkward.

Though looking at it again, I noticed that set_params is duplicated too, so maybe Kernel could just inherit from BaseEstimator . Are there any historical reasons as to why it does not does so? By inheriting It could also benefit from the pickling and repr stuff.

Also, Kernels do not include any nested estimator-like parameters right? get_params is only shallow which seems enough, but set_params can handle nested parameters, why is this?

@alegonz
Copy link
Author

alegonz commented Jun 26, 2019

Thanks. There's a lot going on, so I hope you can accept a slow review!

No worries! 👌

@jnothman
Copy link
Member

jnothman commented Jun 27, 2019 via email

@alegonz
Copy link
Author

alegonz commented Jun 28, 2019

Kernel does not have fit.

Oh, right. Forgot about that. I'm now realizing that even though Estimators are conceptually defined as having a fit method, BaseEstimator does not enforce this with, say, an @abstractmethod.

We can have something else that is a common base class.

Right, I'll see what I can do.

@alegonz
Copy link
Author

alegonz commented Jun 30, 2019

Just updated the PR with further refactoring. This time I extracted get_params and set_params into a GetSetParamsMixin, and changed BaseEstimator and Kernel to inherit from it.

While it looks like the pickling and repr stuff in BaseEstimator could also be used in Kernel, I decided to keep it minimal for the moment and only extract get/set_params.

Note that BaseEstimator and Kernel still override get_params and set_params. This it to preserve the current docstrings and error messages (also, Kernel.get_params is overridden to be shallow). I tried to avoid this and have a single docstring and error message for both cases, but couldn't think of a word that encompasses both estimators and kernels without being too broad. (The methods in GetSetParamsMixin are worded generically.)

sklearn/base.py Outdated
"""
try:
return super().get_params(deep=deep)
except SignatureError as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would be happier with super().__get_params(deep=deep, name='estimator') than the introduction of these new exception classes as sentinels.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll remove these exception classes and pass the name as an argument.

self.param_name = param_name


class SignatureError(RuntimeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this should be public

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I didn't think about that. Anyway, this will be gone as per the comment above.

@alegonz alegonz changed the title [WIP] Refactor BaseEstimator's _get_param_names and get_params Refactor BaseEstimator's and Kernel's get_params and set_params Jul 14, 2019
@alegonz
Copy link
Author

alegonz commented Jul 14, 2019

I removed the custom exception classes and rephrased the error messages.

I felt that, like the custom classes I had added, introducing a name='estimator' argument adds noise to the signature, and complicates things while adding little value. Both approaches are kinda overkill to just choose estimator or kernel in the error message. So I instead rephrased the error message of get_params and set_params to avoid these workarounds.

"specify their parameters in the signature"
" of their __init__ (no varargs)."
" %s doesn't follow this convention."
% self.__class__) from e
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The from e will show the problematic signature in the error raised by get_param_names_from_constructor.

" %s doesn't follow this convention."
% self.__class__) from e

def set_params(self, **params):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left to preserve the docstring.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just use the inherited method instead of overriding, but I left it as I wasn't sure if there are any rules regarding docstrings.

for arg in args:
params[arg] = getattr(self, arg, None)
return params
% self.__class__) from e

def set_params(self, **params):
Copy link
Author

@alegonz alegonz Jul 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@alegonz
Copy link
Author

alegonz commented Aug 9, 2019

@jnothman Just rebased onto latest master. The try except with the warning on AttributeError was also refactored :)

@jnothman
Copy link
Member

jnothman commented Aug 12, 2019 via email

@alegonz
Copy link
Author

alegonz commented Aug 13, 2019

Sorry I'm short on review time at the moment. Give us reasonable time to get to this and ping if you need to

No worries 👍
Just updating with the latest changes so it doesn't go stale :)

…tion.

This is meant to reuse it in gaussian_process.kernels.Kernel.
Also added SignatureError to pass the signature and print the proper
error message depending on the class.
changed BaseEstimator and gaussian_process.kernels.Kernel to
inherit from it.

Also added InvalidParameterError to pass the parameter name and
print the proper error message in set_params depending on the class.
@mglowacki100
Copy link

@alegonz
Hi, I'm dealing with the same issue and I checked your PR. Should I define get_params for my child classes or is it enough to inherit from your BaseEstimator? I'm asking because such drop-in replacement doesn't work for me with **kwargs (only parameters child class are returned).
Btw. now I use my own solution. Namely in child class I need only to define:

    def get_params(self, deep=True):
        """
        Get parameters.
        Based on: https://stackoverflow.com/questions/59248211
        """
        params = super().get_params(deep)
        cp = copy.copy(self)
        cp.__class__ = cp.__class__.__bases__[0]
        params.update(cp.__class__.get_params(cp, deep))
        return params

Subseqeuntly, all grandchildren have automagically proprer signature. However, there are a few disadvantages:

  • you can use abstract class with this trick
  • __bases__[0] makes assumptation about order in multiple inheritance
  • code looks tricky

I've looked at your solution, particularly clone function and it seems to be based on the same trick plus a lot of checks and MixIn due to kernels. Am I right or miss something?

@alegonz
Copy link
Author

alegonz commented Jan 3, 2020

@mglowacki100

Hi there. Sorry for the late reply.

Should I define get_params for my child classes or is it enough to inherit from your BaseEstimator?

It is not enough to inherit from BaseEstimator. You would need to also override get_params to gather the parameters of its class and any super-class you might want to add explicitly. So for example you could do (copy-pasted from Issue #13555):

class MySubclass(SomeSKLearnEstimator):
    def __init__(self, myparam=42, **kwargs):
        # Just pass **kwargs.
        # No duplicated code.
        super().__init__(**kwargs)
        self.myparam = myparam
        
    def get_params(self, deep=True):
        # Gotta override get_params. But it shouldn't be hard.
        
        # params from parent
        parent_names = self._get_param_names(super())
        params = self._get_params_from(parent_names, deep=deep)
        
        # Add my params
        names = self._get_param_names(self.__class__)
        params.update(self._get_params_from(names, deep=deep))
        # or just simply
        # params['myparam'] = self.myparam
        
        return params


x = MySubclass(myparam=42, param1=2, param2='bar')
print(x.get_params())
# {'param1': 2, 'param2': 'bar', 'myparam': 42}

Currently _get_param_names(cls), which does the job of extracting the parameters from the __init__ signature, is a classmethod, making the above hard to achieve. So, the purpose of this PR is 1) to refactor that so things like the above becomes possible, and 2) remove some duplicated code in sklearn.gaussian_process.kernels.py.

AFAIK, without this PR one is left with three options:

  1. Your approach, which I would not suggest for the reasons you stated. Also, I would add that that approach has another disadvantage of not being explicit on which super-classes its getting the parameters from.
  2. Maybe you could also hack _get_param_names to unbind it from the class, so you can pass to it any class you like (see this comment).
  3. Give up and duplicate the parameters on all the sub-classes :(

I've looked at your solution, particularly clone function and it seems to be based on the same trick plus a lot of checks and MixIn due to kernels. Am I right or miss something?

clone has been part of sklearn.base from long ago and this PR is not touching any of that.

@adrinjalali
Copy link
Member

@alegonz this looks pretty good, would you be able to rebase on main and have a fresh start here?

@alegonz
Copy link
Author

alegonz commented Mar 8, 2024

Hi @adrinjalali , sorry for the late reply.

Sure, I'd be happy to retake this PR. It's been a while since I opened this PR and the related issue so please give me some time to regain the context, re-study the code and address the rebase conflicts. Do you have any specific timeline for this PR?

@adrinjalali
Copy link
Member

@alegonz not specific timeline. Just wanted to know if we should close this or give it to somebody else. But if you're happy to have another go, that'd be perfect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants