Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC Rework Importance of Feature Scaling example #25012

Merged
merged 35 commits into from
Jan 19, 2023

Conversation

ArturoAmorQ
Copy link
Member

Reference Issues/PRs

Fixes #12282.

What does this implement/fix? Explain your changes.

This example can benefit from a "tutorialization". In particular, this PR adds a section regarding how nearest neighbors is sensitive to scaling.

Any other comments?

Side effect: Implements notebook style as intended in #22406.

@glemaitre
Copy link
Member

Weird that the CIs did not start. I merged main in the branch to trigger them.

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 5 cent.

examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
ArturoAmorQ and others added 2 commits December 9, 2022 11:30
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@glemaitre glemaitre self-assigned this Dec 9, 2022
@glemaitre glemaitre removed their assignment Dec 9, 2022
@ArturoAmorQ
Copy link
Member Author

I think all of your comments have been addressed, @glemaitre and @lorentzenchr.
Please let me know what do you think of my proposed solutions.

@glemaitre
Copy link
Member

I will have to check the rendering but I think that the proposal is already an improvement.

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some nitpicks.

examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
)
scaled_X_train = scaler.fit_transform(X_train)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: We could show the mean value of each feature, or min and max.

examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved

# %%
# The need for regularization is higher (lower values of `C`) for the data
# that was not scaled before applying PCA. From the plot we can confirm that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which plot? Is it over- or underfitting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By plotting the validation curves I realized that the training and test accuracy overlap too much to make a proper statement about over- or underfitting for the scenario with no standardization.

output1
output2

I think that it is better to avoid mentioning over-/underfitting to keep the example as simple as possible.

examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only nitpicks. Otherwise LGTM.

examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
examples/preprocessing/plot_scaling_importance.py Outdated Show resolved Hide resolved
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre
Copy link
Member

I certainly broke the linter with my suggestion. Sorry @ArturoAmorQ

@glemaitre glemaitre merged commit 4b55dee into scikit-learn:main Jan 19, 2023
@ArturoAmorQ ArturoAmorQ deleted the scaling_importance branch January 19, 2023 16:12
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
adrinjalali pushed a commit that referenced this pull request Jan 24, 2023
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add more didactic scaling example?
4 participants