Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1. score_pairs refactor #333

Merged
merged 22 commits into from
Oct 21, 2021

Conversation

mvargas33
Copy link
Contributor

See #329

In the aim of making metric learn compatible with similarity learners like OASIS and pseudo-distance Mahalanobis learners (all current learners) I take a step forward deprecating score_pairs with a FutureWarning, and replacing it with pair_score and pair_distance, as discussed in #329 .

After thinking a while between pairwise_similarity and pair_similarity names, I think that using ''pairwise'' gives the user the intuition of similarity between all pairs, so a matrix is expected as output, not a list. Just like sklearn's pairwise_distances (See here).

Right now it does not makes too much sense to have pair_distances and pair_similarity only for Mahalanobis learners because pair_similarity it's just the inverse of the distance. But, this PR is needed to proceed with #329 and then with #330.

As discussed before, this change makes the code very extendible for new types of learners, for instance a StringEdit learner could fit in the library seamlessly extending from BaseLearner regardless the algorithm learns a similarity or a pseudo-distance.

It also does not affect Classifiers that much, as they can work with pair_distance from now on.

In contrast to this proposal, using isinstance() for each kind of learner inside the current score_pairs does not look like a good solution, in my humble opinion.

Ps: If the discussion goes in favor, I'll check all current tests, one by one, to make sure they don't become broken.

@mvargas33
Copy link
Contributor Author

The general functionality is ready. What is left to be done is to produce a better documentation. I'm on it.

@mvargas33 mvargas33 changed the title [WIP] score_pairs refactor score_pairs refactor Oct 11, 2021
@bellet
Copy link
Member

bellet commented Oct 12, 2021

@mvargas33 is this ready to review?

@mvargas33
Copy link
Contributor Author

@mvargas33 is this ready to review?

Yes, it is ! The code and the docs are ready

@mvargas33 mvargas33 changed the title score_pairs refactor 1. score_pairs refactor Oct 13, 2021
Copy link
Contributor

@perimosocordiae perimosocordiae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done a full review yet, but overall I'm +1 for the idea.

metric_learn/base_metric.py Outdated Show resolved Hide resolved
metric_learn/base_metric.py Outdated Show resolved Hide resolved
Copy link
Member

@bellet bellet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a full review!

In general, please run spellcheck to avoid too many typos :-)

doc/introduction.rst Outdated Show resolved Hide resolved
doc/supervised.rst Outdated Show resolved Hide resolved
doc/supervised.rst Outdated Show resolved Hide resolved
doc/supervised.rst Outdated Show resolved Hide resolved
doc/supervised.rst Outdated Show resolved Hide resolved
test/test_pairs_classifiers.py Outdated Show resolved Hide resolved
test/test_pairs_classifiers.py Outdated Show resolved Hide resolved
@@ -148,7 +148,7 @@ def test_array_like_inputs(estimator, build_dataset, with_preprocessor):
pairs = np.array([[X[0], X[1]], [X[0], X[2]]])
pairs_variants, _ = generate_array_like(pairs)
for pairs_variant in pairs_variants:
estimator.score_pairs(pairs_variant)
estimator.pair_distance(pairs_variant)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all metric learners will have pair_distance implemented
so use pair_score instead
perhaps you can test pair_distance only when it is implemented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now its tested with pair_score, but it also checks that pair_distance can be used.

In case a learner does not have pair_distance, then the exception must match that it cannot be used with that specific learner.

But as this PR does not contemplate any learner non-mahalnobian, the exception msg is empty as a To-Do for #329

@@ -834,8 +834,8 @@ def test_error_message_tuple_size(estimator, _):

@pytest.mark.parametrize('estimator, _', metric_learners,
ids=ids_metric_learners)
def test_error_message_t_score_pairs(estimator, _):
"""tests that if you want to score_pairs on triplets for instance, it returns
def test_error_message_t_pair_distance(estimator, _):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before: pair_distance will not always be implemented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generalized with a try/catch. If there is an exception, it can only happen if pair_distance is not implemented. Did this in test_utils.py and test_sklearn_compat.py.

test/test_utils.py Outdated Show resolved Hide resolved
@mvargas33 mvargas33 requested a review from bellet October 19, 2021 14:44
Copy link
Member

@bellet bellet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mvargas33, still some small things to fix but overall looks good now!
@perimosocordiae do you want to take a pass before we merge?

doc/supervised.rst Outdated Show resolved Hide resolved
doc/weakly_supervised.rst Outdated Show resolved Hide resolved
Comment on lines 68 to 71
Returns the similarity score between pairs of points. Depending on the
algorithm, this method can return the learned similarity score between
pairs, or the opposite of the distance learned between pairs. The larger
the score, the more similar the pair. All learners have access to this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: this is a bit heavy. I would recommend simply:
"Returns the similarity score between pairs of points (the larger the score, the more similar the pair). For metric learners that learn a distance, the score is simply the opposite of the distance between pairs."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed and kept the last sentence "All learners have access to this method".

1D arrays and cannot use a preprocessor. Besides, the returned function
is independent of the metric learner and hence is not modified if the
metric learner is.
two points. The difference between `pair_score` and `pair_distance` is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, this is in the "see also" of pair_score, so it should only be "The difference with pair_score"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually of the old score_pairs, so I changed it for "The difference with score_pairs".

pair_score and pair_distance 's "See Also" are ok

metric_learn/base_metric.py Outdated Show resolved Hide resolved
metric_learn/base_metric.py Outdated Show resolved Hide resolved
test/test_mahalanobis_mixin.py Outdated Show resolved Hide resolved
metric_learn/base_metric.py Outdated Show resolved Hide resolved
Comment on lines 65 to 81
def test_raise_not_fitted_error_if_not_fitted(estimator, build_dataset,
with_preprocessor):
"""Test that a NotFittedError is raised if someone tries to use
score_pairs, decision_function, get_metric, transform or
pair_score, score_pairs, decision_function, get_metric, transform or
get_mahalanobis_matrix on input data and the metric learner
has not been fitted."""
input_data, labels, preprocessor, _ = build_dataset(with_preprocessor)
estimator = clone(estimator)
estimator.set_params(preprocessor=preprocessor)
set_random_state(estimator)
with pytest.raises(NotFittedError):
with pytest.raises(NotFittedError): # Remove in 0.8.0
estimator.score_pairs(input_data)
with pytest.raises(NotFittedError):
estimator.pair_score(input_data)
with pytest.raises(NotFittedError):
estimator.decision_function(input_data)
with pytest.raises(NotFittedError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to keep in mind for #329: some of these "raise_not_fitted" tests should actually be moved to base metric / Mahalanobis mixin / bilinear mixin tests. Indeed some of these methods will be present for all metric learners, some of them only for Mahalanobis but not bilinear (eg, transform), etc. The tests for pairs_classifiers (and other classifiers) should only be for methods specific to the associated class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was raised some time ago in #160 and #136 . The time to refactor the tests structure is finally here. I'll look forward to create another PR regarding only this.

@mvargas33 mvargas33 requested a review from bellet October 20, 2021 09:26
Copy link
Member

@bellet bellet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @mvargas33!
Let's wait a bit to see if others have some time to review before we merge.

test/test_base_metric.py Outdated Show resolved Hide resolved
test/test_sklearn_compat.py Outdated Show resolved Hide resolved
@bellet bellet merged commit aaf8d44 into scikit-learn-contrib:master Oct 21, 2021
@bellet
Copy link
Member

bellet commented Oct 21, 2021

Merged, thanks @mvargas33!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants