[MRG+2] Update repo to work with both new and old scikit-learn #313

Merged

Conversation

@wdevazelhes (Member) commented Apr 7, 2021

Fixes #311

  • I added the workaround suggested by @bellet here: Update metric-learn to work with the most recent scikit-learn when doing pytest #311 (comment), so that imports work both with sklearn < 0.24 and >= 0.24 (EDIT: actually the old import strategy was already deprecated in 0.22, so I compare the version to 0.22 instead)
  • I also added a Travis job to test against the old scikit-learn. I chose the configuration with Python 3.6, since it's the oldest Python we support and therefore the most likely to break, and with skggm, since using an extra package also seems more likely to surface problems. I could have added all the combinations (Python 3.6/3.7, with/without skggm), but the Travis test suite would then take quite some time.
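The version gate described above can be sketched as follows. This is a minimal illustration using packaging.version; sklearn_at_least is a hypothetical helper name for this sketch, not the name used in the PR:

```python
from packaging import version


def sklearn_at_least(installed, minimum='0.22.0'):
    """Return True when the installed scikit-learn version string is
    at least `minimum` (hypothetical helper, for illustration)."""
    return version.parse(installed) >= version.parse(minimum)


# The conditional import would then look like:
# if sklearn_at_least(sklearn.__version__):
#     from sklearn.utils._testing import set_random_state
# else:
#     from sklearn.utils.testing import set_random_state
print(sklearn_at_least('0.24.1'), sklearn_at_least('0.20.3'))
```

packaging.version handles pre-release and multi-digit components correctly, which naive string comparison of version numbers does not.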

@wdevazelhes (Member Author)

Actually, tests are failing because of an LMNN problem similar to the discussion in #309: a test runs LMNN on classes that have too few labels.
The fact that this problem happens in only some Travis jobs could be due to different random states, depending on the order/quantity of tests run in each particular job (with/without skggm). The change of library versions compared to before could maybe also impact the seed; otherwise I don't know why the tests were passing before. So I set the random seed in the failing test, so that all jobs should now consistently either fail or pass.
Once we have a consistent behaviour we can decide what to do: either adapt the test to use big enough classes, or adapt LMNN to accept classes with few labels.
I'll comment on #309 accordingly, since the decision should also impact the new LMNN PR. For instance, we could decide to merge it as is: the current LMNN is actually even stricter than the newer PR (it needs at least n_neighbors + 1 elements in each class, not just 2), so the new PR would handle more cases, and we could deal with singleton classes in another PR.

@wdevazelhes (Member Author)

Ok, so only the Travis job with the old scikit-learn is failing, because of the string representation of estimators; I'll look into that.
But at least fixing the random seed did fix the other tests (temporarily, since we should not depend on the random seed).

@wdevazelhes wdevazelhes changed the title [WIP] Update repo to work with both new and old scikit-learn [MRG] Update repo to work with both new and old scikit-learn Apr 12, 2021
@wdevazelhes wdevazelhes changed the title [MRG] Update repo to work with both new and old scikit-learn [WIP] Update repo to work with both new and old scikit-learn Apr 12, 2021


def remove_spaces(s):
return re.sub(r'\s+', '', s)


def sk_repr_kwargs(def_kwargs, nndef_kwargs):
Member Author

Here I thought it would be good to test the string repr for both the old (< 0.22) sklearn versions and the newer ones, so I made the expected string representation depend on the sklearn version.

@@ -121,7 +131,8 @@ def test_array_like_inputs(estimator, build_dataset, with_preprocessor):

# we subsample the data for the test to be more efficient
input_data, _, labels, _ = train_test_split(input_data, labels,
train_size=20)
Member Author

The test was failing here because of classes with too few labels in LMNN (see comments above), so in the end I created a toy example with a few more samples. I guess this makes sense, because the role of this particular test is not to test edge cases, but rather to check that array-like objects work with our estimators.

@wdevazelhes (Member Author)

I guess the PR is ready to merge :)

@wdevazelhes wdevazelhes changed the title [WIP] Update repo to work with both new and old scikit-learn [MRG] Update repo to work with both new and old scikit-learn Apr 12, 2021

def test_scml(self):
check_estimator(SCML_Supervised)
check_estimator(SCML_Supervised())
Member Author

Here scikit-learn returned an error saying that checks should be run on estimator instances, not classes.
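A minimal reproduction of that behaviour, using a stock scikit-learn estimator since SCML_Supervised lives outside scikit-learn (this sketch assumes scikit-learn >= 0.24, where passing a class became an error):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# Since scikit-learn 0.24, passing the class itself raises a TypeError...
try:
    check_estimator(LogisticRegression)
    class_accepted = True
except TypeError:
    class_accepted = False

# ...while passing an instance runs the full check suite.
check_estimator(LogisticRegression())
print('class accepted:', class_accepted)
```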

@terrytangyuan (Member) left a comment

Just a small comment. Otherwise LGTM

@@ -39,6 +39,18 @@ matrix:
- pytest test --cov;
after_success:
- bash <(curl -s https://codecov.io/bash)
- name: "Pytest python 3.6 with skggm + scikit-learn 0.20.3"
Member

Perhaps add a note here to clarify this additional test's purpose

Member Author

I agree, done

@wdevazelhes (Member Author)

Thanks for the review @terrytangyuan! I addressed your comment.

def strify(obj):
"""Function to add the quotation marks if the object
is a string"""
if type(obj) is str:
Contributor

if isinstance(obj, str)

Contributor

Depending on the inputs we expect to pass, it might be simpler to call json.dumps(obj) instead.

Member Author

That's right, thanks! Will do

Member Author

Actually there was a problem with booleans (json.dumps casts them to lowercase, i.e. json.dumps(True) returns true), but I found that repr does the same job without this problem.
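The difference is easy to check: json.dumps emits JSON literals, while repr keeps Python's spelling, which is what an estimator repr should show.

```python
import json

# json.dumps renders booleans as JSON literals (lowercase)...
dumped = json.dumps(True)   # 'true'
# ...whereas repr keeps the Python literal.
reprd = repr(True)          # 'True'

# For strings the two also differ in quoting style:
print(dumped, reprd, json.dumps('a'), repr('a'))
```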

Member Author

done

"""
if skversion >= version.parse('0.22.0'):
def_kwargs = ""
nndef_kwargs = eval(f"dict({nndef_kwargs})")
Contributor

Rather than using eval, could we pass actual dicts here instead? The values are all simple literals, so it should be easier that way.
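A sketch of that suggestion, with format_kwargs as a hypothetical helper name (the PR's actual helper may differ): accept a real dict and render it the way it appears in an estimator repr, with no eval involved.

```python
def format_kwargs(kwargs):
    """Render a kwargs dict the way it appears inside an estimator
    repr, e.g. {'n_components': 3} -> "n_components=3"."""
    return ', '.join(f'{key}={value!r}' for key, value in kwargs.items())


formatted = format_kwargs({'n_components': 3, 'verbose': True})
print(formatted)  # n_components=3, verbose=True
```

Using !r in the f-string gives the same quoting behaviour as repr, so string values come out with quotation marks and booleans keep their Python spelling.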

Member Author

Yes you're right, will do

Member Author

done

from sklearn.utils.testing import set_random_state
import sklearn
from packaging import version
if version.parse(sklearn.__version__) >= version.parse('0.22.0'):
Contributor

This seems like something we could do once during module initialization, then have a global constant like SKLEARN_AT_LEAST_0_22 that we could use.

Or make a sklearn_shims.py file that does the conditional import, and have everything else import from that.
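The suggested shim module could look roughly like this, a sketch under the assumption that only set_random_state needs the gate (the real file would cover every name that moved in 0.22):

```python
# sklearn_shims.py -- do the version comparison once, at import time,
# and let the rest of the package import the moved names from here.
import sklearn
from packaging import version

SKLEARN_AT_LEAST_0_22 = (version.parse(sklearn.__version__)
                         >= version.parse('0.22.0'))

if SKLEARN_AT_LEAST_0_22:
    # sklearn.utils.testing became private in 0.22
    from sklearn.utils._testing import set_random_state
else:
    from sklearn.utils.testing import set_random_state

__all__ = ['SKLEARN_AT_LEAST_0_22', 'set_random_state']
```

Callers then write `from .sklearn_shims import set_random_state` and never touch the version check themselves.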

Member Author

I agree, will do

Member Author

done

@wdevazelhes wdevazelhes changed the title [MRG] Update repo to work with both new and old scikit-learn [WIP] Update repo to work with both new and old scikit-learn Apr 13, 2021
@wdevazelhes (Member Author) commented Apr 13, 2021

Thanks for the review @perimosocordiae! I'll make the changes and set the PR back to [MRG] as soon as I finish.

from sklearn.utils.estimator_checks import is_public_parameter
from sklearn.metrics.scorer import get_scorer

__all__ = ['set_random_state', 'assert_warns_message', 'set_random_state',
Member Author

Writing these __all__ lines was necessary for flake8 not to report errors (otherwise it flags the imports as unused).

@wdevazelhes wdevazelhes changed the title [WIP] Update repo to work with both new and old scikit-learn [MRG+1] Update repo to work with both new and old scikit-learn Apr 13, 2021
@wdevazelhes (Member Author)

@perimosocordiae I addressed your comments, I guess the PR is now ready to merge :)

@bellet (Member) left a comment

+1 for merging, thanks @wdevazelhes!

@bellet (Member) commented Apr 15, 2021

Somehow we do not see all the checks (it happens sometimes); maybe do an empty commit to make sure, @wdevazelhes?

@wdevazelhes (Member Author) commented Apr 15, 2021

Thanks for the review @bellet!

I tried to do an empty commit but we still see only 1 line for the checks; indeed it's weird... Although clicking on "details" shows all the checks, so maybe GitHub changed how Travis checks are displayed?

@wdevazelhes wdevazelhes changed the title [MRG+1] Update repo to work with both new and old scikit-learn [MRG+2] Update repo to work with both new and old scikit-learn Apr 15, 2021
@wdevazelhes (Member Author)

@perimosocordiae, do you approve my updates after your review?

@bellet (Member) commented Apr 15, 2021

> Thanks for the review @bellet!
>
> I tried to do an empty commit but we still see 1 line for the check, indeed it's weird... Although clicking on details we see all checks, so maybe there was a change of display of travis checks on github?

We don't see test coverage though for some reason

@wdevazelhes (Member Author)

> We don't see test coverage though for some reason

Really? If I click on "show all checks" and then "details", here is what I see:

[image]

@perimosocordiae perimosocordiae merged commit 7eef7c6 into scikit-learn-contrib:master Apr 15, 2021
@perimosocordiae (Contributor)

Merged, thanks!

@bellet (Member) commented Apr 15, 2021

> Really? If I click on "show all checks" and then "details", here is what I see:
>
> [image]

Where do you see test coverage (codecov)?

@wdevazelhes (Member Author)

> Where do you see test coverage (codecov)?

Ah yes, that's right, sorry! Actually they sent me an email telling me that their Bash uploader was hacked: https://about.codecov.io/security-update/?utm_medium=email&utm_source=marketo&ajs_uid=2386266&ajs_event=Clicked%20Email&mkt_tok=MzMyLUxWWC03NDEAAAF8dJ5mOA5H1E5ORbGsBiNKeiALhIlihm9VPW70i0qLTSuToRz_FC1YHHXfXcvCiatj0yHQxAfkDKCri3j9ksaM0_8pH36YziLCqWSOcdIt
So I guess the problem is probably because of that; I'll look at whether they give any guidelines for fixing it.

@wdevazelhes (Member Author)

> Merged, thanks!

Thanks @perimosocordiae!
I'll try to look into the codecov problem noticed by @bellet (see above), to double-check that this PR indeed meets the coverage requirement.

@wdevazelhes (Member Author)

It seems the codecov is OK; here is what I see on codecov in the "pull" section:

[image]

@bellet (Member) commented Apr 15, 2021

>> Where do you see test coverage (codecov)?
>
> Ah yes, that's right, sorry! Actually they sent me an email telling me that their Bash uploader was hacked: https://about.codecov.io/security-update/?utm_medium=email&utm_source=marketo&ajs_uid=2386266&ajs_event=Clicked%20Email&mkt_tok=MzMyLUxWWC03NDEAAAF8dJ5mOA5H1E5ORbGsBiNKeiALhIlihm9VPW70i0qLTSuToRz_FC1YHHXfXcvCiatj0yHQxAfkDKCri3j9ksaM0_8pH36YziLCqWSOcdIt
> So I guess the problem is probably because of that; I'll look at whether they give any guidelines for fixing it.

Good catch! They have guidelines on what to do on the above page

@wdevazelhes (Member Author)

I opened a draft PR here #314, running their recommended env command, and it doesn't seem to leak any sensitive information (what env prints can be seen for instance here). (I was wondering about the SSH_CONNECTION variable that is printed, but I checked here and it contains no password, just IP addresses and port numbers.)

As a precaution, on app.codecov.io, I clicked on "Regenerate" for the "Repository Upload Token", but I don't think it even had any effect, since I searched for "codecov" in our code and nowhere do I see any token.

So I think we are safe.

However codecov still doesn't appear in the "all checks" of the test PR #314, so I think we can wait a little bit: they may still be fixing problems.
In the meantime we can still check the code coverage for the next PRs manually, directly on the codecov website.

@wdevazelhes wdevazelhes mentioned this pull request Apr 16, 2021

Successfully merging this pull request may close these issues.

Update metric-learn to work with the most recent scikit-learn when doing pytest
4 participants