Skip to content

MAINT compatibility sklearn 1.6.0#1169

Merged
jeromedockes merged 5 commits intoskrub-data:mainfrom
glemaitre:sklearn_16_bis
Dec 10, 2024
Merged

MAINT compatibility sklearn 1.6.0#1169
jeromedockes merged 5 commits intoskrub-data:mainfrom
glemaitre:sklearn_16_bis

Conversation

@glemaitre
Copy link
Copy Markdown
Member

Alternative to #1135
closes #1135

Vendor the version of sklearn-compat (https://github.com/sklearn-compat/sklearn-compat).
This is still an RC but I wanted to give it a try.

We discussed with @jeromedockes to vendor the file instead of depending on the package.

@glemaitre
Copy link
Copy Markdown
Member Author

We will probably need to release 0.4.1 for this change when scikit-learn 1.6.0 will be released next week.


def __sklearn_tags__(self):
tags = super().__sklearn_tags__()
tags.transformer_tags = TransformerTags(preserves_dtype=[])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this class not inherit from TransformerMixin? Why not

Suggested change
tags.transformer_tags = TransformerTags(preserves_dtype=[])
tags.transformer_tags.preserves_dtype = []

to not override all the other tags from the parent classes?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get into trouble with the set_output for this one: #1135 (comment)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is not mentioned in the comment. @jeromedockes can probably offer the detailed perspective.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that comment as "we don't need it, we don't have to use it". Now with tags, you kinda better use them, and if there's something you don't like from that mixin, you can always override.

Copy link
Copy Markdown
Member Author

@glemaitre glemaitre Nov 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now with tags, you kinda better use them, and if there's something you don't like from that mixin, you can always override.

I recall that the discussion was around the set_output (we need @jeromedockes for the details but it might be related to avoiding any data copy or dtype conversion - keep the feature of the container library in short). We are fine having the tags, but inheriting from the TransformerMixin brings the all the infrastructure of set_output. Overwriting the inheritance is not easy while redefining the Tags that as in TransformerMixin is easy.

But I prefer to have @jeromedockes commenting to be sure that I don't say something unrelated.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If wrapping with set_output is adding copies, we should fix that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does not add copies, at least when the output is already in the type of container the configuration is asking for. (AFAICT)

@jeromedockes
Copy link
Copy Markdown
Member

in the nightly build,

  • the failures in the deduplicate module are due to upgrading to scipy 1.15 (I think changes in the scipy.cluster module but need to check)
  • the failures in other modules are due to pandas changing datetime dtypes in 3.0.0

I think it might be easier to deal with this in separate prs since it is not related to scikit-learn

Comment thread skrub/_sklearn_compat.py
@@ -0,0 +1,572 @@
"""Ease developer experience to support multiple versions of scikit-learn.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we

  • have the URL of where this file came from here
  • exclude it from codecov, pre-commit formatting etc. maybe we should have something like a vendor/ submodule?

@jeromedockes
Copy link
Copy Markdown
Member

Hi @glemaitre I just realized that after we discussed it again and decided the PR was ready to merge I went again and ... merged the wrong PR! 🤦 -- I merged #1135 instead of this one. I know both work but this one with scikit-learn compat was your recommendation right?

I guess the best way forward is to revert the merge? WDYT

@jeromedockes jeromedockes merged commit 97d5b72 into skrub-data:main Dec 10, 2024
jeromedockes pushed a commit to jeromedockes/skrub that referenced this pull request Dec 11, 2024
TheooJ pushed a commit that referenced this pull request Dec 11, 2024
* bump version after 0.4.0 (#1162)

* DOC Correct typo in TableReport docstring (#1168)

* Add codespell support (#1126)

* add python 3.13 (#1170)

* add sections for the next release's changelogs (#1183)

* Add a verbosity parameter to TableReport to control the printing of messages in stdout when opening summary report (#1182)

* add verbose parameter to print_display to  toggle on or off printing of progress messages when generating table report (#1188)

* DOC add example for Cramer V for column_associations (#1186)

* ENH adding alias "regression" and "classification" (#1180)

* Bump codecov/codecov-action from 5.0.7 to 5.1.1 in the actions group (#1191)

* remove use of matplotlib.rc_context (#1172)

* MAINT adapt for scikit-learn 1.6 (#1135)

* Revert "MAINT adapt for scikit-learn 1.6 (#1135)" (#1194)

This reverts commit 18af508.

* MAINT compatibility sklearn 1.6.0 (#1169)

* MAINT fix nightly builds (#1193)

* Updating contributing doc to add more detail on how to start  (#1185)

* FIX better display of object columns with multi-line repr in tablereport (#1196)

* update changelog

---------

Co-authored-by: Matt J. <matthieu.jouis@gmail.com>
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Co-authored-by: priscilla baah <66260270+priscilla-b@users.noreply.github.com>
Co-authored-by: Reshama Shaikh <reshama.stat@gmail.com>
Co-authored-by: mrastgoo <mojdeh.rastgoo@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Riccardo Cappuzzo <7548232+rcap107@users.noreply.github.com>
TheooJ pushed a commit that referenced this pull request Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants