Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Deprecate WMinkowskiDistance & make MinkowskiDistance accept weights #21873

Merged
merged 14 commits into from Jan 20, 2022
Merged

[MRG] Deprecate WMinkowskiDistance & make MinkowskiDistance accept weights #21873

merged 14 commits into from Jan 20, 2022

Conversation

yarkhinephyo
Copy link
Contributor

Reference Issues/PRs

Resolves #21765

What does this implement/fix? Explain your changes.

  • Add FutureWarning to _dist_metrics.WMinkowskiDistance
  • Modify _dist_metrics.MinkowskiDistance to accept optional weights w
  • Validate the values in w
  • Update tests and documentation accordingly

Any other comments?

Nil

@yarkhinephyo yarkhinephyo changed the title [WIP] Deprecate WMinkowskiDistance & make MinkowskiDistance accept weights [MRG] Deprecate WMinkowskiDistance & make MinkowskiDistance accept weights Dec 4, 2021
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the follow-up PR. This is really appreciated. Please find some suggestions/comments to address below:

sklearn/metrics/_dist_metrics.pyx Outdated Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pyx Outdated Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pyx Outdated Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pyx Outdated Show resolved Hide resolved
sklearn/metrics/tests/test_dist_metrics.py Show resolved Hide resolved
sklearn/metrics/tests/test_dist_metrics.py Outdated Show resolved Hide resolved
doc/whats_new/v1.1.rst Outdated Show resolved Hide resolved
@yarkhinephyo yarkhinephyo changed the title [MRG] Deprecate WMinkowskiDistance & make MinkowskiDistance accept weights [WIP] Deprecate WMinkowskiDistance & make MinkowskiDistance accept weights Dec 5, 2021
@yarkhinephyo
Copy link
Contributor Author

Thank you so much for the detailed suggestions @ogrisel! I have incorporated the changes.

@yarkhinephyo yarkhinephyo changed the title [WIP] Deprecate WMinkowskiDistance & make MinkowskiDistance accept weights [MRG] Deprecate WMinkowskiDistance & make MinkowskiDistance accept weights Dec 5, 2021
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the updated PR looks good to me!

Can you please run a quick interactive benchmark using ipython or jupyter and the %timeit magic command to check that this does not cause a significant performance regression compared to main when computing unweighted pairwise Minkowski distances between two random arrays in a 10d dimensional space?

The extra if has_w: check is probably degrading the performance by up to 30% or so in very low dimensional space (e.g. 2d or 3d) but so be it.

@yarkhinephyo
Copy link
Contributor Author

Main branch's vs current branch's unweighted Minkowski -

from sklearn.utils import check_random_state
from sklearn.metrics import DistanceMetric

rng = check_random_state(0)
d = 10
X1 = rng.random_sample((100, d)).astype("float64", copy=False)
X2 = rng.random_sample((100, d)).astype("float64", copy=False)

dm = DistanceMetric.get_metric("minkowski", p=3)
%%timeit
dm.pairwise(X1, X2)
# Main branch, d = 2
1.2 ms ± 37.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Current branch, d = 2
1.31 ms ± 29.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Main branch, d = 10
3.79 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Current branch, d = 10
3.74 ms ± 84.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@ogrisel
Copy link
Member

ogrisel commented Dec 9, 2021

Great, so no performance problems. Thanks for benchmarking.

@ogrisel
Copy link
Member

ogrisel commented Dec 9, 2021

The sphinx warnings from https://165839-843222-gh.circle-artifacts.com/0/doc/_changed.html seems unrelated to this PR. Let me push a merge commit to check if they have been concurrently resolved in main.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution, @yarkhinephyo.

Here are a few comments.

sklearn/metrics/_dist_metrics.pyx Outdated Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pyx Show resolved Hide resolved
sklearn/metrics/tests/test_dist_metrics.py Outdated Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pyx Outdated Show resolved Hide resolved
doc/whats_new/v1.1.rst Outdated Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pyx Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pyx Outdated Show resolved Hide resolved
doc/whats_new/v1.1.rst Outdated Show resolved Hide resolved
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after formatting with black.

@jjerphan
Copy link
Member

jjerphan commented Jan 20, 2022

The errors on the CI here look unrelated to your PR.

Can you re-trigger the CI by adding an empty commit, @yarkhinephyo? This can be done by:

git commit --allow-empty -m "Just to trigger the CI"

Edit: I just did it to move forward.

@jjerphan jjerphan merged commit 49043fc into scikit-learn:main Jan 20, 2022
@jjerphan
Copy link
Member

It indeed just was a random failure on the CI.

mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022
…ights (scikit-learn#21873)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecate WMinkowskiDistance and make MinkowskiDistance accept weights directly
3 participants