
[dask] merge local_predict tests into other tests (fixes #3833) #3842

Merged · 6 commits · Jan 25, 2021

Conversation

ShrillShrestha (Contributor)

#3833

@ghost commented Jan 24, 2021

CLA assistant check
All CLA requirements met.

@jameslamb (Collaborator) left a comment

Thanks for your interest in LightGBM and your contribution! Can you please explain why you needed to add an additional _create_data() call and an additional .fit() in the classifier test? That should not be necessary.

Comment on lines 136 to 139
X_1, y_1, w_1, dX_1, dy_1, dw_1 = _create_data(
objective='classification',
output='array'
)
Collaborator:

Why was this necessary? You should just use the dask_classifier defined below this. With this change, you'd only be doing the local predict on arrays each time, but we want to test on all of DataFrame, Array, and sparse matrix.
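The concern is that each parametrized data format should drive its own local-predict check, rather than a fresh arrays-only dataset. A rough stdlib sketch of that idea (the names here are hypothetical stand-ins, not the actual test code):

```python
# Hypothetical sketch: each parametrized 'output' format should drive both
# the distributed predict and the to_local() predict, instead of creating
# a fresh 'array'-only dataset inside the test body.
formats_covered = []

def run_local_predict_check(output):
    # stand-in for a parametrized test body using the same 'output' format
    formats_covered.append(output)

for output in ['array', 'dataframe', 'scipy_csr_matrix']:
    run_local_predict_check(output)
```

With this structure, every format gets local-predict coverage on each test run.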

Contributor Author:

Initially, I tried to reference the prediction inside test_classifier with p1_local = dask_classifier.to_local().predict(dX) for p1 of test_classifier_local_predict, but this line gave me two errors while running the tests, both saying ValueError: Expected 2D array, got scalar array instead.
I thought it was because of the difference we have in the output params in

X, y, w, dX, dy, dw = _create_data(
        objective='classification',
        output=output,
        centers=centers
    )

and

X, y, w, dX, dy, dw = _create_data(
        objective='classification',
        output='array'
    )

in test_classifier and test_classifier_local_predict, respectively.
I also tried reshaping the array, one variant at a time:

p1_local = dask_classifier.to_local().predict(dX.reshape(1, 1))
p1_local = dask_classifier.to_local().predict(dX.reshape(1, -1))
p1_local = dask_classifier.to_local().predict(dX.reshape(-1, 1))

but each attempt caused other test cases to fail as well.
Is there any way to resolve it? Can you help me with it? @jameslamb

Collaborator:

dask_classifier.to_local() produces a lightgbm.sklearn.LGBMClassifier. Pass it X (a non-Dask data structure) instead of dX (a Dask data structure) and it should work without needing this extra _create_data().
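The distinction here, a distributed wrapper whose to_local() returns a plain in-memory estimator, can be sketched with hypothetical stand-in classes (illustrative only, not the actual lightgbm implementation):

```python
# Hypothetical stand-ins (NOT the lightgbm API) illustrating the pattern:
# a distributed wrapper whose to_local() returns a plain estimator that
# expects in-memory data rather than Dask collections.

class LocalModel:
    """Stand-in for lightgbm.sklearn.LGBMClassifier."""
    def predict(self, X):
        # a local sklearn-style estimator expects an in-memory 2D structure
        if not isinstance(X, list):
            raise TypeError("local model expects in-memory data, not a Dask collection")
        return [0 for _ in X]

class DaskModel:
    """Stand-in for lightgbm.dask.DaskLGBMClassifier."""
    def to_local(self):
        # strips the Dask layer, leaving a plain local estimator
        return LocalModel()

# Pass the local X, not the Dask dX, to the local model:
preds = DaskModel().to_local().predict([[1.0], [2.0]])
```

Calling the stand-in with anything other than an in-memory list raises, mirroring the "Expected 2D array" failure seen when dX was passed.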

Contributor Author:

Thanks, it worked. I also renamed the variables. Can you tell me if I need any other changes?

Comment on lines 281 to 282
p2_local_predict = dask_regressor.to_local().predict(X)
s2_local_predict = dask_regressor.to_local().score(X, y)
Collaborator:

Suggested change:
- p2_local_predict = dask_regressor.to_local().predict(X)
- s2_local_predict = dask_regressor.to_local().score(X, y)
+ p2_local = dask_regressor.to_local().predict(X)
+ s2_local = dask_regressor.to_local().score(X, y)

Please simplify these names a bit (and elsewhere in the PR) by removing the "_predict" suffix

@jameslamb changed the title from "Merge test_<est>_local_predict and test_<est> tests for Dask module #…" to "[dask] merge local_predict tests into other tests (fixes #3833)" on Jan 25, 2021
@jameslamb self-requested a review January 25, 2021 05:11
@jameslamb (Collaborator) left a comment

I left a few other minor requested changes. Once you do this, I think this will be good to merge.

Really appreciate you taking the time to contribute!

Comment on lines 268 to 269
p2_local = dask_regressor.to_local().predict(X)
s2_local = dask_regressor.to_local().score(X, y)
Collaborator:

Suggested change:
- p2_local = dask_regressor.to_local().predict(X)
- s2_local = dask_regressor.to_local().score(X, y)
+ p1_local = dask_regressor.to_local().predict(X)
+ s1_local = dask_regressor.to_local().score(X, y)

Please call these p1_local and p2_local. We have a convention in these tests right now that p1 references predictions from a model trained with DaskLGBMRegressor and the p2 references things from lightgbm.sklearn.LGBMRegressor. I know that's confusing and we should probably change it, but for now this test should be consistent with the existing code.

@@ -142,9 +142,11 @@ def test_classifier(output, centers, client, listen_port):
local_listen_port=listen_port,
**params
)

Collaborator:

Suggested change: (remove the blank line)

Please revert this unrelated whitespace change.

@@ -153,12 +155,13 @@ def test_classifier(output, centers, client, listen_port):
p2 = local_classifier.predict(X)
p2_proba = local_classifier.predict_proba(X)
s2 = local_classifier.score(X, y)

Collaborator:

Please revert this unrelated whitespace change and add this line back.

@@ -428,6 +382,7 @@ def test_ranker(output, client, listen_port, group):
dask_ranker = dask_ranker.fit(dX, dy, sample_weight=dw, group=dg, client=client)
rnkvec_dask = dask_ranker.predict(dX)
rnkvec_dask = rnkvec_dask.compute()
rnkvec_local_1 = dask_ranker.to_local().predict(X)
Collaborator:

Suggested change:
- rnkvec_local_1 = dask_ranker.to_local().predict(X)
+ rnkvec_dask_local = dask_ranker.to_local().predict(X)

Please change this name so it's clearer how it differs from rnkvec_local.


# distributed and to-local scores should be the same.
assert_eq(rnkvec_dask, rnkvec_local)
assert_eq(rnkvec_dask, rnkvec_local_1)
Collaborator:

Suggested change:
- assert_eq(rnkvec_dask, rnkvec_local_1)
+ assert_eq(rnkvec_dask, rnkvec_dask_local)

@@ -300,10 +276,12 @@ def test_regressor(output, client, listen_port):
# Scores should be the same
if output != 'dataframe':
assert_eq(s1, s2, atol=.01)
assert_eq(s1, s2_local)
Collaborator:

Suggested change:
- assert_eq(s1, s2_local)
+ assert_eq(s1, s1_local)


# Predictions should be roughly the same
assert_eq(y, p1, rtol=1., atol=100.)
assert_eq(y, p2, rtol=1., atol=50.)
assert_eq(p1, p2_local)
Collaborator:

Suggested change:
- assert_eq(p1, p2_local)
+ assert_eq(p1, p1_local)

@jameslamb (Collaborator) left a comment

Thanks so much! I just added a commit resolving conflicts with some recently-merged stuff on master. I'll merge this once it builds.

We hope you'll contribute more to LightGBM in the future!

@ShrillShrestha (Contributor Author)

Thank you, @jameslamb, for your patience with me. I got to learn a lot from you just by working on this issue. I look forward to contributing more in the future. :)

@jameslamb self-requested a review January 25, 2021 17:29
@@ -276,12 +276,12 @@ def test_regressor(output, client, listen_port):
# Scores should be the same
if output != 'dataframe':
assert_eq(s1, s2, atol=.01)
assert_eq(s1, s2_local)
assert_eq(s1, s1_local)
Contributor Author:

I saw that one of the tests is failing. I checked the log, and this is the line that is giving an error. Is it because I didn't pass an atol keyword argument?

Collaborator:

Yes, I think so. I can see in the logs that the differences are very small, maybe small enough to be due to precision differences.

For example, from https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=8740&view=logs&j=c28dceab-947a-5848-c21f-eef3695e5f11&t=fa158246-17e2-53d4-5936-86070edbaacf

a_original = 0.9555812336527378, b_original = 0.9555698440069504

Please add atol=0.003 here for now. That will be enough to catch major errors. We have an open issue (#3835) to come back in the future and investigate how to close these gaps.
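As a quick sanity check (plain Python, not part of the test suite), the gap between the two logged scores above is far smaller than the suggested tolerance:

```python
# Scores copied from the CI log linked above
a_original = 0.9555812336527378  # score from the distributed model
b_original = 0.9555698440069504  # score from the to_local() model

diff = abs(a_original - b_original)
# diff is roughly 1.1e-5, so atol=0.003 comfortably absorbs it
assert diff < 0.003
```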

Collaborator:

Note that I already resolved conflicts with master on your branch. So you should pull this branch fresh before working on it.

git branch -D reduce-tests
git fetch origin reduce-tests
git checkout reduce-tests

Contributor Author:

I have made the change. It should work now.

@jameslamb (Collaborator) left a comment

Looks good, thank you very much for your contribution.

@jameslamb merged commit 02e4b79 into microsoft:master Jan 25, 2021
@ShrillShrestha (Contributor Author)

Thanks, @jameslamb for helping me out.

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023