[MNT] Refactor `.clone()` #281

tpvasconcelos · 2024-01-31T23:23:15Z

Reference Issues/PRs

#280

What does this implement/fix? Explain your changes.

Simplify .clone()'s implementation to 3 easy-to-understand steps:
1. clone self
2. if check_clone, run extra checks
3. return cloned object
Refactor extra clone checks into a private _check_clone() function
Refactor the private ._check() method into a private _check() function
- I think this is more idiomatic since the current ._check() implementation was not accessing any instance (self) attributes or methods. It would only recursively call itself again as self._clone(...), in practice making it a static method.
- I don't think this function should ever depend on any private instance attributes or ever mutate the state of the object being cloned.
- In practice, this is 100% equivalent in behaviour but I think it's a more intentional and idiomatic implementation.
- As a bonus (but probably not coincidentally), makes it more in line with sklearn's reference implementation

Does your contribution introduce a new dependency? If yes, which one?

What should a reviewer concentrate their feedback on?

Any other comments?

PR checklist

For all contributions

I've reviewed the project documentation on contributing
I've added myself to the list of contributors.
The PR title starts with either [ENH], [CI/CD], [MNT], [DOC], or [BUG] indicating whether
the PR topic is related to enhancement, CI/CD, maintenance, documentation, or a bug.

For code contributions

Unit tests have been added covering code functionality
Appropriate docstrings have been added (see documentation standards)
New public functionality has been added to the API Reference

codecov · 2024-01-31T23:25:47Z

Codecov Report

Attention: Patch coverage is 84.21053% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 84.62%. Comparing base (306958d) to head (5b81057).
Report is 9 commits behind head on main.

Files	Patch %	Lines
skbase/base/_base.py	84.21%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #281      +/-   ##
==========================================
- Coverage   85.07%   84.62%   -0.46%     
==========================================
  Files          45       45              
  Lines        3015     3050      +35     
==========================================
+ Hits         2565     2581      +16     
- Misses        450      469      +19

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

skbase/base/_base.py

skbase/tests/conftest.py

fkiraly · 2024-01-31T23:49:29Z

Nice, thanks!

Btw, I'm not sure I remember where the "check clone" code is coming from. I added the config to ignore it by default but still have it there just in case. I always thought it is sth from sklearn, but apparently not?

tpvasconcelos · 2024-02-01T00:21:57Z

@fkiraly thanks for taking a look at this already. I opened a draft PR just because I wanted to start typing some notes in the description.

Btw, I'm not sure I remember where the "check clone" code is coming from. I added the config to ignore it by default but still have it there just in case. I always thought it is sth from sklearn, but apparently not?

It’s a bit strange indeed. It doesn’t seem like the right place to run such checks. If these checks ever fail it’s because the BaseObject’s implementation is not compliant with the scikit estimator protocol which requires all params passed to the constructor to be set as instance variables “as is” during instantiation. Having these checks at runtime during a cloning call seems quite arbitrary and out of place (and unnecessary overhead). If someone wants to check whether their estimator is scikit-compliant, this should probably happen in a test suite and maybe a helper function could be provided (maybe already exists?). Also, the check only runs for self and never for inner/nested objects

fkiraly · 2024-02-01T00:41:55Z

It’s a bit strange indeed. It doesn’t seem like the right place to run such checks.

Ah, I remember!

I agree with you and that's why I turned it into an opt-in configurable feature.

When we refactored the base classes from sktime, we added some tests independent of sktime estimator types. The test test_clone_raises_error_for_nonconforming_objects was added, which checks for various error messages being raised.

I did not add that test (it was @RNKuhns), I believe they were added because back then sklearn had similar tests. This was at a time where BaseObject.clone still dispatched to sklearn clone directly.

Then we removed the sklearn dependency, I must have copied over code, that must have caused both tests and code to be in the same package - which must have been when I saw those strange checks and thought the same, seems out of place at runtime, instead tests should check that. But the old tests want that, so I added a config that makes the checks opt-in, and set that config in the test test_clone_raises_error_for_nonconforming_objects (instead of deleting both checks and the tests, which I thought was the "safer" bet, just in case this is still sth sklearn wants under who knows what circumstance).

tpvasconcelos · 2024-02-01T08:47:50Z

Thanks for the background!

What's your take, should this be deprecated and removed in a future version? Or you see value in leaving it in?

fkiraly · 2024-02-01T09:18:55Z

Hm, I would say, let's deprcate, given that it seems gone in sklearn - if a user sets the config, they should be pointed to a check_estimator-like utility instead.

This could be TestAllObjects.run_tests, or w could make a check_estimator like in sktime that points to it.

tpvasconcelos · 2024-02-21T13:25:08Z

@fkiraly For the reasons mentioned above, I think that moving the cloning logic to a separate function makes sense. In that regard, this PR would be ready to review and merge.

However, after doing a quick scan through the codebase, it seems like the use of the term 'estimator' is misused in a few places. I guess that abstracting BaseEstimator was the initial motivation for this framework but then you also abstracted it further into a BaseObject but never updated the explicit references to "estimator" in these base classes. Specifically I'm talking about references in the BaseObject and _FlagManager implementations, both of which should be agnostic to whether the actual implementation is an "estimator" or something else.

Some of these references are easy to fix (just replace "estimator" with something else {"base object"?}). See tpvasconcelos#2 for example. However, both BaseObject.clone_tags() and other methods in _FlagManager use the term estimator in their signature. Changing this would cause a breaking change.

Thoughts?

fkiraly · 2024-02-21T16:35:26Z

In that regard, this PR would be ready to review and merge.

Makes sense - do you want ot turn it done then, and resolve the QL issues?

However, after doing a quick scan through the codebase, it seems like the use of the term 'estimator' is misused in a few places.

Yes - the history is that, initially only estimators (with "fit") were inheriting from a sklearn BaseEstimator-like interface. This expanded as different kinds of object became necessary, i.e., without fit, in sktime - leading to the BaseObject abstraction. However, by that time, many docstrings and variable names already had "estimator" in them.

In creating skbase, we did quite some renaming to avoid confusion, but mostly focused on user or developer facing locations. Already that created a lot of problems, since renaming has to be consistent everywhere, and also it needs to be consistent with a refactor in sktime. I still half wish we hadn't renamed anything, see, e.g., the refactor sktime/sktime#4283 which fails for unclear reasons but likely because of renaming - comparing to other packages where using skbase from scratch has created no problems.

fkiraly · 2024-02-23T12:49:32Z

So, @tpvasconcelos, if you're done - could you remove the WIP tag and turn the PR to ready? Just to make sure we're not merging something incomplete.

tpvasconcelos · 2024-02-23T12:55:21Z

Opened!

tpvasconcelos · 2024-02-23T13:13:01Z

@fkiraly I'll open this PR once we close this one: #293

(I don't know how to daisy-chain PRs from fork repos)

fkiraly · 2024-02-24T23:28:50Z

(no major change, just moved _clone to the end for better readability)

…more general references (#293) #### Reference Issues/PRs Depends on #281 #### What does this implement/fix? Explain your changes. See #281 (comment) and relevant replies.

tpvasconcelos added 6 commits January 31, 2024 15:14

Decouple _clone method into _sklearn_clone stateless function

8aaeda3

Update comment

1623546

create _check_clone

6112c4d

Merge branch 'clone-mnt' into refactor-clone

bba8389

return clone from _check_clone

3a21948

rename _sklearn_clone to _clone

0414851

github-advanced-security bot found potential problems Jan 31, 2024

View reviewed changes

skbase/base/_base.py Dismissed Show resolved Hide resolved

remove return from _check_clone

e6d85a0

fkiraly reviewed Jan 31, 2024

View reviewed changes

skbase/tests/conftest.py Show resolved Hide resolved

fkiraly added the enhancement Adding new functionality label Jan 31, 2024

Merge branch 'main' into refactor-clone

54c77c9

tpvasconcelos changed the title ~~[WIP][MNT] Refactor .clone()~~ [MNT] Refactor .clone() Feb 23, 2024

Merge branch 'main' into refactor-clone

8585ccf

tpvasconcelos marked this pull request as ready for review February 23, 2024 12:54

tpvasconcelos mentioned this pull request Feb 23, 2024

[DOC] Replace use of "estimator" term in base object interfaces with more general references #293

Merged

6 tasks

move clone fun to end

5b81057

fkiraly mentioned this pull request Feb 24, 2024

[ENH] skbase clone vs sklearn clone - comparison and actions #280

Closed

fkiraly approved these changes Feb 25, 2024

View reviewed changes

fkiraly merged commit 580244e into sktime:main Feb 25, 2024
21 of 23 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MNT] Refactor `.clone()` #281

[MNT] Refactor `.clone()` #281

tpvasconcelos commented Jan 31, 2024 •

edited

Loading

codecov bot commented Jan 31, 2024 •

edited

Loading

fkiraly commented Jan 31, 2024

tpvasconcelos commented Feb 1, 2024

fkiraly commented Feb 1, 2024 •

edited

Loading

tpvasconcelos commented Feb 1, 2024

fkiraly commented Feb 1, 2024

tpvasconcelos commented Feb 21, 2024

fkiraly commented Feb 21, 2024

fkiraly commented Feb 23, 2024

tpvasconcelos commented Feb 23, 2024

tpvasconcelos commented Feb 23, 2024

fkiraly commented Feb 24, 2024

[MNT] Refactor .clone() #281

[MNT] Refactor .clone() #281

Conversation

tpvasconcelos commented Jan 31, 2024 • edited Loading

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Does your contribution introduce a new dependency? If yes, which one?

What should a reviewer concentrate their feedback on?

Any other comments?

PR checklist

For all contributions

For code contributions

codecov bot commented Jan 31, 2024 • edited Loading

Codecov Report

fkiraly commented Jan 31, 2024

tpvasconcelos commented Feb 1, 2024

fkiraly commented Feb 1, 2024 • edited Loading

tpvasconcelos commented Feb 1, 2024

fkiraly commented Feb 1, 2024

tpvasconcelos commented Feb 21, 2024

fkiraly commented Feb 21, 2024

fkiraly commented Feb 23, 2024

tpvasconcelos commented Feb 23, 2024

tpvasconcelos commented Feb 23, 2024

fkiraly commented Feb 24, 2024

[MNT] Refactor `.clone()` #281

[MNT] Refactor `.clone()` #281

tpvasconcelos commented Jan 31, 2024 •

edited

Loading

codecov bot commented Jan 31, 2024 •

edited

Loading

fkiraly commented Feb 1, 2024 •

edited

Loading