
[dask] Address flaky test_ranker tests #3819

Merged 1 commit into microsoft:master on Jan 22, 2021

Conversation

@ffineis (Contributor) commented Jan 22, 2021

Addresses issue #3817. The correlation between the Dask ranker's scores and a locally trained ranker's scores is stochastic, despite our best efforts to make the `_make_ranking` data deterministic and to use seed=42 throughout the DaskLGBM* tests.

I ran the test 100 times; the observed distribution of correlations is shown in the attached histogram (spearmanr_dist).

@StrikerRUS observed a run that achieved only 0.86 correlation, so this PR lowers the threshold to a more conservative 0.75.
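As a hedged illustration of what the relaxed check amounts to (this is not the actual test code; `local_scores` and `dask_scores` are hypothetical stand-ins for predictions from the two rankers, and `scipy.stats.spearmanr` is assumed for the rank correlation):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical stand-ins for predictions from a locally trained ranker and a
# Dask-trained ranker; in the real test these come from fitted LightGBM models.
rng = np.random.default_rng(42)
local_scores = rng.normal(size=100)
dask_scores = local_scores + rng.normal(scale=0.3, size=100)  # noisy agreement

# Spearman rank correlation between the two sets of scores.
corr, _ = spearmanr(local_scores, dask_scores)

# The test only requires the correlation to clear the conservative 0.75
# threshold, since the exact value varies from run to run.
assert corr > 0.75
```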

This is a stopgap; ideally there would be a way to distribute data deterministically to Dask workers in tests.
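One partial step in that direction (a minimal sketch under the assumption that the test data is wrapped in a `dask.array`; this is not the project's actual fix) is to build the collection with an explicit, fixed chunking so at least the partition layout never varies between runs:

```python
import numpy as np
import dask.array as da

# Deterministic source data, seeded as in the tests (seed=42).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))

# An explicit chunk size gives a fixed partition layout: here, two row blocks.
# Note that which *worker* each block lands on is still up to the scheduler,
# which is why fixed chunking alone does not fully determinize a
# multi-worker test.
dX = da.from_array(X, chunks=(50, 5))
assert dX.numblocks == (2, 1)
```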

@jameslamb jameslamb changed the title Address flaky test_ranker tests [dask] Address flaky test_ranker tests Jan 22, 2021
@jameslamb (Collaborator) commented:

Don't worry about the AppVeyor failures (https://ci.appveyor.com/project/guolinke/lightgbm/builds/37402833); they're unrelated, and I'll trigger a rebuild, which should fix them.

@jameslamb (Collaborator) left a comment:

Thanks! I agree with this stopgap for now, so that other non-Dask development on this project isn't blocked. Azure DevOps doesn't allow us to re-run individual failed jobs (you have to re-run ALL of them), so having a flaky test there is really expensive.

@StrikerRUS (Collaborator) commented:

@jameslamb

> Azure DevOps doesn't allow us to re-run individual failed jobs (you have to re-run ALL of them), so having a flaky test there is really expensive.

Actually, you can re-run just the failed jobs. It's really useful. 🙂 Thanks, Azure; and shame on GitHub Actions, where you really do have to re-run all jobs.

But I agree that any re-run costs a lot, because each job runs for ~15 minutes.

[screenshot: the Azure DevOps option to re-run failed jobs]

@StrikerRUS (Collaborator) left a comment:

@ffineis Thanks a lot for the prompt workaround!

Hope we will be able to find

> a way to distribute data deterministically to dask workers in tests.

@StrikerRUS StrikerRUS merged commit bf22a25 into microsoft:master Jan 22, 2021
@jameslamb (Collaborator) commented:

Guess my account on Azure DevOps doesn't have permission to do that, because I don't see that button. I tried to access recent failed jobs (while signed in with my GitHub account, using "sign in with GitHub") and keep getting kicked out with this error:

```
{"$id":"1","innerException":null,"message":"TF400813: The user 'Windows Live ID\JayLamb20@gmail.com' is not authorized to access this resource.","typeName":"Microsoft.TeamFoundation.Framework.Server.UnauthorizedRequestException, Microsoft.TeamFoundation.Framework.Server","typeKey":"UnauthorizedRequestException","errorCode":0,"eventId":3000}
```

@github-actions (bot) commented:

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023

3 participants