Skip to content

ENH add possibility to have a callable for verbose_feature_names_out of ColumnTransformer#28934

Merged
adrinjalali merged 8 commits intoscikit-learn:mainfrom
MarcBresson:main
Jul 24, 2024
Merged

ENH add possibility to have a callable for verbose_feature_names_out of ColumnTransformer#28934
adrinjalali merged 8 commits intoscikit-learn:mainfrom
MarcBresson:main

Conversation

@MarcBresson
Copy link
Copy Markdown
Contributor

@MarcBresson MarcBresson commented May 2, 2024

What does this implement?

This brings the possibility to pass a callable to the verbose_feature_names_out parameter of ColumnTransformer. Instead of the new feature name being "transormer_name__feature_name", we could have "feature_name$this is amazing$TRANSFORMER_NAME".

Any other comments?

I have a few questions:

  • How do you type sklearn as there are no stub files ?
  • What is the version number I should put in .. versionchanged ?

In advance, thank you for your time.

@github-actions
Copy link
Copy Markdown

github-actions bot commented May 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a2bbf99. Link to the linter CI: here

@adrinjalali
Copy link
Copy Markdown
Member

cc @thomasjpfan

pretty sure we had the conversation about this at some point.

Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

@MarcBresson
Copy link
Copy Markdown
Contributor Author

MarcBresson commented May 13, 2024

Hey,

just added the possibility to use string format. I also added unit tests.

One of the test I created creates name clashing, and I remember seeing something about that somewhere in sklearn's doc. However, I can't seem to put my hand on it.

From the behaviour of ColumnTransformer, I know it is not important since it uses np arrays rather than pandas dataframe, but do you think we should issue a warning somewhere in the doc?

-- EDIT -- I think it is supposed to raise an error, but the test passed.

Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the test I created creates name clashing, and I remember seeing something about that somewhere in sklearn's doc.

Name clashing should raise an error.

I'm not decided on supporting both callable and string. If the string format can support 90% of the use cases, then I'm okay with just having the string formatting.

@adrinjalali
Copy link
Copy Markdown
Member

The added complexity here for callable is small. I agree with @thomasjpfan on string formats probably enough, but since the added complexity is small, I don't mind either way.

@MarcBresson
Copy link
Copy Markdown
Contributor Author

what are the next steps for this PR?

Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to the change log at doc/whats_new/v1.6.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@MarcBresson
Copy link
Copy Markdown
Contributor Author

Thank you! It is all done :)

thomasjpfan
thomasjpfan previously approved these changes Jul 2, 2024
Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasjpfan thomasjpfan dismissed their stale review July 2, 2024 19:43

Lint issue

@MarcBresson MarcBresson force-pushed the main branch 3 times, most recently from 0d94b69 to 2abd3bc Compare July 9, 2024 16:46
Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

printed as it is completed.

verbose_feature_names_out : bool, default=True
verbose_feature_names_out : bool, str or Callable[[str, str], str], default=True
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rendered version of this now looks like:

image

I think it makes sense to make a bullet point list for the 3 cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok it should be good. I'm gonna try to find how to compile the doc.

@MarcBresson MarcBresson force-pushed the main branch 5 times, most recently from f13055e to d4294da Compare July 16, 2024 13:21
@MarcBresson MarcBresson requested a review from adrinjalali July 16, 2024 15:31
MarcBresson and others added 2 commits July 19, 2024 15:27
…les in back quotes

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@MarcBresson
Copy link
Copy Markdown
Contributor Author

@adrinjalali don't hesitate to review. I will rebase my branch right before merging so you can still see that everything passes.

@adrinjalali adrinjalali merged commit 156ef1b into scikit-learn:main Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants