[MRG+2] Update repo to work with both new and old scikit-learn #313

Merged

Conversation

@wdevazelhes (Member) commented Apr 7, 2021

Fixes #311

  • I added the workaround suggested by @bellet here: Update metric-learn to work with the most recent scikit-learn when doing pytest #311 (comment), so that imports work both with sklearn < 0.24 and >= 0.24 (EDIT: actually the old import strategy was already deprecated in 0.22, so I compare the version to 0.22 instead)
  • I also added a Travis job to test against the old scikit-learn. I chose the configuration with Python 3.6, since it's the oldest Python we support and therefore the most likely to break, and with skggm, since using an extra package also seems more likely to surface problems. I could have added all the combinations (Python 3.6/3.7, with/without skggm), but the Travis test suite would then take quite some time.
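The version gate described above can be sketched as follows. This is a minimal illustration using packaging.version; sklearn_at_least is a hypothetical helper name for this sketch, not the name used in the PR:

```python
from packaging import version


def sklearn_at_least(installed, minimum='0.22.0'):
    """Return True when the installed scikit-learn version string is
    at least `minimum` (hypothetical helper, for illustration)."""
    return version.parse(installed) >= version.parse(minimum)


# The conditional import would then look like:
# if sklearn_at_least(sklearn.__version__):
#     from sklearn.utils._testing import set_random_state
# else:
#     from sklearn.utils.testing import set_random_state
print(sklearn_at_least('0.24.1'), sklearn_at_least('0.20.3'))
```

packaging.version handles pre-release and multi-digit components correctly, which naive string comparison of version numbers does not.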

@wdevazelhes (Member Author)

Actually, tests are failing because of an LMNN problem similar to the discussion in #309: a test runs LMNN on classes that have too few labels.
The fact that this problem happens in only some Travis jobs could be due to different random states, depending on the order/quantity of tests run in each particular job (with/without skggm). The change of library versions compared to before could maybe also impact the seed; otherwise I don't know why the tests were passing before. So I set the random seed in the failing test, so that all jobs should now consistently either fail or pass.
Once we have a consistent behaviour we can decide what to do: either adapt the test to use big enough classes, or adapt LMNN to accept classes with few labels.
I'll comment on #309 accordingly, since the decision should also impact the new LMNN PR. For instance, we could decide to merge it as is: the current LMNN is actually even stricter than the newer PR (it needs at least n_neighbors + 1 elements in each class, not just 2), so the new PR would handle more cases, and we could deal with singleton classes in another PR.

@wdevazelhes (Member Author)

Ok, so only the Travis job with the old scikit-learn is failing, because of the string representation of estimators; I'll look into that.
But at least fixing the random seed did fix the other tests (temporarily, since we should not depend on the random seed).

@wdevazelhes wdevazelhes changed the title [WIP] Update repo to work with both new and old scikit-learn [MRG] Update repo to work with both new and old scikit-learn Apr 12, 2021
@wdevazelhes wdevazelhes changed the title [MRG] Update repo to work with both new and old scikit-learn [WIP] Update repo to work with both new and old scikit-learn Apr 12, 2021


def remove_spaces(s):
return re.sub(r'\s+', '', s)


def sk_repr_kwargs(def_kwargs, nndef_kwargs):
Member Author

Here I thought it would be good to test the string repr for both the old (< 0.22) sklearn versions and the newer ones, so I made the expected string representation depend on the sklearn version.

@@ -121,7 +131,8 @@ def test_array_like_inputs(estimator, build_dataset, with_preprocessor):

# we subsample the data for the test to be more efficient
input_data, _, labels, _ = train_test_split(input_data, labels,
train_size=20)
Member Author

The test was failing here because of classes with too few labels in LMNN (see comments above), so in the end I created a toy example with a few more samples. I guess this makes sense, because the role of this particular test is not to test edge cases, but rather to check that array-like objects work with our estimators.

@wdevazelhes (Member Author)

I guess the PR is ready to merge :)

@wdevazelhes wdevazelhes changed the title [WIP] Update repo to work with both new and old scikit-learn [MRG] Update repo to work with both new and old scikit-learn Apr 12, 2021

def test_scml(self):
check_estimator(SCML_Supervised)
check_estimator(SCML_Supervised())
Member Author

Here scikit-learn returned an error saying that checks should be run on estimator instances, not classes.
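A minimal reproduction of that behaviour, using a stock scikit-learn estimator since SCML_Supervised lives outside scikit-learn (this sketch assumes scikit-learn >= 0.24, where passing a class became an error):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# Since scikit-learn 0.24, passing the class itself raises a TypeError...
try:
    check_estimator(LogisticRegression)
    class_accepted = True
except TypeError:
    class_accepted = False

# ...while passing an instance runs the full check suite.
check_estimator(LogisticRegression())
print('class accepted:', class_accepted)
```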

@terrytangyuan (Member) left a comment

Just a small comment. Otherwise LGTM

@@ -39,6 +39,18 @@ matrix:
- pytest test --cov;
after_success:
- bash <(curl -s https://codecov.io/bash)
- name: "Pytest python 3.6 with skggm + scikit-learn 0.20.3"
Member

Perhaps add a note here to clarify this additional test's purpose

Member Author

I agree, done

@wdevazelhes (Member Author)

Thanks for the review @terrytangyuan! I addressed your comment.

def strify(obj):
"""Function to add the quotation marks if the object
is a string"""
if type(obj) is str:
Contributor

if isinstance(obj, str)

Contributor

Depending on the inputs we expect to pass, it might be simpler to call json.dumps(obj) instead.

Member Author

That's right, thanks! Will do

Member Author

Actually there was a problem with booleans (json.dumps casts them to lowercase, i.e. json.dumps(True) returns true), but I found that repr does the same job without this problem.
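The difference is easy to check: json.dumps emits JSON literals, while repr keeps Python's spelling, which is what an estimator repr should show.

```python
import json

# json.dumps renders booleans as JSON literals (lowercase)...
dumped = json.dumps(True)   # 'true'
# ...whereas repr keeps the Python literal.
reprd = repr(True)          # 'True'

# For strings the two also differ in quoting style:
print(dumped, reprd, json.dumps('a'), repr('a'))
```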

Member Author

done

"""
if skversion >= version.parse('0.22.0'):
def_kwargs = ""
nndef_kwargs = eval(f"dict({nndef_kwargs})")
Contributor

Rather than using eval, could we pass actual dicts here instead? The values are all simple literals, so it should be easier that way.
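A sketch of that suggestion, with format_kwargs as a hypothetical helper name (the PR's actual helper may differ): accept a real dict and render it the way it appears in an estimator repr, with no eval involved.

```python
def format_kwargs(kwargs):
    """Render a kwargs dict the way it appears inside an estimator
    repr, e.g. {'n_components': 3} -> "n_components=3"."""
    return ', '.join(f'{key}={value!r}' for key, value in kwargs.items())


formatted = format_kwargs({'n_components': 3, 'verbose': True})
print(formatted)  # n_components=3, verbose=True
```

Using !r in the f-string gives the same quoting behaviour as repr, so string values come out with quotation marks and booleans keep their Python spelling.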

Member Author

Yes you're right, will do

Member Author

done

from sklearn.utils.testing import set_random_state
import sklearn
from packaging import version
if version.parse(sklearn.__version__) >= version.parse('0.22.0'):
Contributor

This seems like something we could do once during module initialization, then have a global constant like SKLEARN_AT_LEAST_0_22 that we could use.

Or make a sklearn_shims.py file that does the conditional import, and have everything else import from that.
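The suggested shim module could look roughly like this, a sketch under the assumption that only set_random_state needs the gate (the real file would cover every name that moved in 0.22):

```python
# sklearn_shims.py -- do the version comparison once, at import time,
# and let the rest of the package import the moved names from here.
import sklearn
from packaging import version

SKLEARN_AT_LEAST_0_22 = (version.parse(sklearn.__version__)
                         >= version.parse('0.22.0'))

if SKLEARN_AT_LEAST_0_22:
    # sklearn.utils.testing became private in 0.22
    from sklearn.utils._testing import set_random_state
else:
    from sklearn.utils.testing import set_random_state

__all__ = ['SKLEARN_AT_LEAST_0_22', 'set_random_state']
```

Callers then write `from .sklearn_shims import set_random_state` and never touch the version check themselves.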

Member Author

I agree, will do

Member Author

done

@wdevazelhes wdevazelhes changed the title [MRG] Update repo to work with both new and old scikit-learn [WIP] Update repo to work with both new and old scikit-learn Apr 13, 2021
@wdevazelhes (Member Author) commented Apr 13, 2021

Thanks for the review @perimosocordiae! I'll make the changes and set the PR back to [MRG] as soon as I finish.

from sklearn.utils.estimator_checks import is_public_parameter
from sklearn.metrics.scorer import get_scorer

__all__ = ['set_random_state', 'assert_warns_message', 'set_random_state',
Member Author

Writing these __all__ lines was necessary for flake8 not to report errors (otherwise it flags the imports as unused).

@wdevazelhes wdevazelhes changed the title [WIP] Update repo to work with both new and old scikit-learn [MRG+1] Update repo to work with both new and old scikit-learn Apr 13, 2021
@wdevazelhes (Member Author)

@perimosocordiae I addressed your comments, I guess the PR is now ready to merge :)

@bellet (Member) left a comment

+1 for merging, thanks @wdevazelhes!

@bellet (Member) commented Apr 15, 2021

Somehow we do not see all the checks (it happens sometimes); maybe do an empty commit to make sure, @wdevazelhes?

@wdevazelhes (Member Author) commented Apr 15, 2021

Thanks for the review @bellet!

I tried to do an empty commit but we still see only 1 line for the checks; indeed it's weird... Although clicking on "details" shows all the checks, so maybe GitHub changed how Travis checks are displayed?

@wdevazelhes wdevazelhes changed the title [MRG+1] Update repo to work with both new and old scikit-learn [MRG+2] Update repo to work with both new and old scikit-learn Apr 15, 2021
@wdevazelhes (Member Author)

@perimosocordiae, do you approve my updates after your review?

@bellet (Member) commented Apr 15, 2021

> Thanks for the review @bellet!
>
> I tried to do an empty commit but we still see 1 line for the check, indeed it's weird... Although clicking on details we see all checks, so maybe there was a change of display of travis checks on github?

We don't see test coverage though for some reason

@wdevazelhes (Member Author)

> We don't see test coverage though for some reason

Really? If I click on "show all checks" and then "details", here is what I see:

[image]

@perimosocordiae perimosocordiae merged commit 7eef7c6 into scikit-learn-contrib:master Apr 15, 2021
@perimosocordiae (Contributor)

Merged, thanks!

@bellet (Member) commented Apr 15, 2021

> Really? If I click on "show all checks" and then "details", here is what I see:
>
> [image]

Where do you see test coverage (codecov)?

@wdevazelhes (Member Author)

> Where do you see test coverage (codecov)?

Ah yes, that's right, sorry! Actually they sent me an email telling me that their Bash uploader was hacked: https://about.codecov.io/security-update/?utm_medium=email&utm_source=marketo&ajs_uid=2386266&ajs_event=Clicked%20Email&mkt_tok=MzMyLUxWWC03NDEAAAF8dJ5mOA5H1E5ORbGsBiNKeiALhIlihm9VPW70i0qLTSuToRz_FC1YHHXfXcvCiatj0yHQxAfkDKCri3j9ksaM0_8pH36YziLCqWSOcdIt
So I guess the problem is probably because of that; I'll look at whether they give any guidelines for fixing it.

@wdevazelhes (Member Author)

> Merged, thanks!

Thanks @perimosocordiae!
I'll try to look into the codecov problem noticed by @bellet (see above), to double-check that this PR indeed meets the coverage requirement.

@wdevazelhes (Member Author)

It seems the codecov is OK; here is what I see on codecov in the "pull" section:

[image]

@bellet (Member) commented Apr 15, 2021

>> Where do you see test coverage (codecov)?
>
> Ah yes, that's right, sorry! Actually they sent me an email telling me that their Bash uploader was hacked: https://about.codecov.io/security-update/?utm_medium=email&utm_source=marketo&ajs_uid=2386266&ajs_event=Clicked%20Email&mkt_tok=MzMyLUxWWC03NDEAAAF8dJ5mOA5H1E5ORbGsBiNKeiALhIlihm9VPW70i0qLTSuToRz_FC1YHHXfXcvCiatj0yHQxAfkDKCri3j9ksaM0_8pH36YziLCqWSOcdIt
> So I guess the problem is probably because of that; I'll look at whether they give any guidelines for fixing it.

Good catch! They have guidelines on what to do on the above page

@wdevazelhes (Member Author)

I opened a draft PR here #314, running their recommended env command, and it doesn't seem to leak any sensitive information (what env prints can be seen for instance here). (I was wondering about the SSH_CONNECTION variable that is printed, but I checked here and it contains no password, just IP addresses and port numbers.)

As a precaution, on app.codecov.io, I clicked on "Regenerate" for the "Repository Upload Token", but I don't think it even had any effect, since I searched for "codecov" in our code and nowhere do I see any token.

So I think we are safe.

However codecov still doesn't appear in the "all checks" of the test PR #314, so I think we can wait a little bit: they may still be fixing problems.
In the meantime we can still check the code coverage for the next PRs manually, directly on the codecov website.

@wdevazelhes wdevazelhes mentioned this pull request Apr 16, 2021

Successfully merging this pull request may close these issues.

Update metric-learn to work with the most recent scikit-learn when doing pytest
4 participants