
[dask] merge local_predict tests into other tests (fixes #3833) #3842

Merged · 6 commits · Jan 25, 2021

Conversation

ShrillShrestha (Contributor)

#3833

@ghost commented Jan 24, 2021

CLA assistant check
All CLA requirements met.

@jameslamb (Collaborator) left a comment

Thanks for your interest in LightGBM and your contribution! Can you please explain why you needed to add an additional _create_data() call and an additional .fit() in the classifier test? That should not be necessary.

Comment on lines 136 to 139
X_1, y_1, w_1, dX_1, dy_1, dw_1 = _create_data(
objective='classification',
output='array'
)
Collaborator:

Why was this necessary? You should just use the dask_classifier defined below this. With this change, you'd only be doing the local predict on arrays each time, but we want to test on all of DataFrame, Array, and sparse matrix.
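The concern is that each parametrized data format should drive its own local-predict check, rather than a fresh arrays-only dataset. A rough stdlib sketch of that idea (the names here are hypothetical stand-ins, not the actual test code):

```python
# Hypothetical sketch: each parametrized 'output' format should drive both
# the distributed predict and the to_local() predict, instead of creating
# a fresh 'array'-only dataset inside the test body.
formats_covered = []

def run_local_predict_check(output):
    # stand-in for a parametrized test body using the same 'output' format
    formats_covered.append(output)

for output in ['array', 'dataframe', 'scipy_csr_matrix']:
    run_local_predict_check(output)
```

With this structure, every format gets local-predict coverage on each test run.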

Contributor Author:

Initially, I tried to reference the prediction inside test_classifier with p1_local = dask_classifier.to_local().predict(dX) for p1 of test_classifier_local_predict, but this line gave me two errors while running the tests, both saying ValueError: Expected 2D array, got scalar array instead.
I thought it was because of the difference we have in the output params in

X, y, w, dX, dy, dw = _create_data(
        objective='classification',
        output=output,
        centers=centers
    )

and

X, y, w, dX, dy, dw = _create_data(
        objective='classification',
        output='array'
    )

in test_classifier and test_classifier_local_predict, respectively.
I also tried reshaping the array, one variant at a time:

p1_local = dask_classifier.to_local().predict(dX.reshape(1, 1))
p1_local = dask_classifier.to_local().predict(dX.reshape(1, -1))
p1_local = dask_classifier.to_local().predict(dX.reshape(-1, 1))

but each attempt caused other test cases to fail as well.
Is there any way to resolve it? Can you help me with it? @jameslamb

Collaborator:

dask_classifier.to_local() produces a lightgbm.sklearn.LGBMClassifier. Pass it X (a non-Dask data structure) instead of dX (a Dask data structure) and it should work without needing this extra _create_data().
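The distinction here, a distributed wrapper whose to_local() returns a plain in-memory estimator, can be sketched with hypothetical stand-in classes (illustrative only, not the actual lightgbm implementation):

```python
# Hypothetical stand-ins (NOT the lightgbm API) illustrating the pattern:
# a distributed wrapper whose to_local() returns a plain estimator that
# expects in-memory data rather than Dask collections.

class LocalModel:
    """Stand-in for lightgbm.sklearn.LGBMClassifier."""
    def predict(self, X):
        # a local sklearn-style estimator expects an in-memory 2D structure
        if not isinstance(X, list):
            raise TypeError("local model expects in-memory data, not a Dask collection")
        return [0 for _ in X]

class DaskModel:
    """Stand-in for lightgbm.dask.DaskLGBMClassifier."""
    def to_local(self):
        # strips the Dask layer, leaving a plain local estimator
        return LocalModel()

# Pass the local X, not the Dask dX, to the local model:
preds = DaskModel().to_local().predict([[1.0], [2.0]])
```

Calling the stand-in with anything other than an in-memory list raises, mirroring the "Expected 2D array" failure seen when dX was passed.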

Contributor Author:

Thanks, it worked. I also renamed the variables. Can you tell me if I need any other changes?

Comment on lines 281 to 282
p2_local_predict = dask_regressor.to_local().predict(X)
s2_local_predict = dask_regressor.to_local().score(X, y)
Collaborator:

Suggested change:
- p2_local_predict = dask_regressor.to_local().predict(X)
- s2_local_predict = dask_regressor.to_local().score(X, y)
+ p2_local = dask_regressor.to_local().predict(X)
+ s2_local = dask_regressor.to_local().score(X, y)

Please simplify these names a bit (and elsewhere in the PR) by removing the "_predict" suffix

@jameslamb changed the title from "Merge test_<est>_local_predict and test_<est> tests for Dask module #…" to "[dask] merge local_predict tests into other tests (fixes #3833)" on Jan 25, 2021
@jameslamb self-requested a review January 25, 2021 05:11
@jameslamb (Collaborator) left a comment

I left a few other minor requested changes. Once you do this, I think this will be good to merge.

Really appreciate you taking the time to contribute!

Comment on lines 268 to 269
p2_local = dask_regressor.to_local().predict(X)
s2_local = dask_regressor.to_local().score(X, y)
Collaborator:

Suggested change:
- p2_local = dask_regressor.to_local().predict(X)
- s2_local = dask_regressor.to_local().score(X, y)
+ p1_local = dask_regressor.to_local().predict(X)
+ s1_local = dask_regressor.to_local().score(X, y)

Please call these p1_local and p2_local. We have a convention in these tests right now that p1 references predictions from a model trained with DaskLGBMRegressor and the p2 references things from lightgbm.sklearn.LGBMRegressor. I know that's confusing and we should probably change it, but for now this test should be consistent with the existing code.

@@ -142,9 +142,11 @@ def test_classifier(output, centers, client, listen_port):
local_listen_port=listen_port,
**params
)

Collaborator:

Suggested change: (remove the blank line)

Please revert this unrelated whitespace change.

@@ -153,12 +155,13 @@ def test_classifier(output, centers, client, listen_port):
p2 = local_classifier.predict(X)
p2_proba = local_classifier.predict_proba(X)
s2 = local_classifier.score(X, y)

Collaborator:

Please revert this unrelated whitespace change and add this line back.

@@ -428,6 +382,7 @@ def test_ranker(output, client, listen_port, group):
dask_ranker = dask_ranker.fit(dX, dy, sample_weight=dw, group=dg, client=client)
rnkvec_dask = dask_ranker.predict(dX)
rnkvec_dask = rnkvec_dask.compute()
rnkvec_local_1 = dask_ranker.to_local().predict(X)
Collaborator:

Suggested change:
- rnkvec_local_1 = dask_ranker.to_local().predict(X)
+ rnkvec_dask_local = dask_ranker.to_local().predict(X)

Please change this name so it's clearer how it differs from rnkvec_local.


# distributed and to-local scores should be the same.
assert_eq(rnkvec_dask, rnkvec_local)
assert_eq(rnkvec_dask, rnkvec_local_1)
Collaborator:

Suggested change:
- assert_eq(rnkvec_dask, rnkvec_local_1)
+ assert_eq(rnkvec_dask, rnkvec_dask_local)

@@ -300,10 +276,12 @@ def test_regressor(output, client, listen_port):
# Scores should be the same
if output != 'dataframe':
assert_eq(s1, s2, atol=.01)
assert_eq(s1, s2_local)
Collaborator:

Suggested change:
- assert_eq(s1, s2_local)
+ assert_eq(s1, s1_local)


# Predictions should be roughly the same
assert_eq(y, p1, rtol=1., atol=100.)
assert_eq(y, p2, rtol=1., atol=50.)
assert_eq(p1, p2_local)
Collaborator:

Suggested change:
- assert_eq(p1, p2_local)
+ assert_eq(p1, p1_local)

@jameslamb (Collaborator) left a comment

Thanks so much! I just added a commit resolving conflicts with some recently-merged stuff on master. I'll merge this once it builds.

We hope you'll contribute more to LightGBM in the future!

@ShrillShrestha (Contributor Author)

Thank you, @jameslamb, for your patience with me. I got to learn a lot from you just by working on this issue. I look forward to contributing more in the future. :)

@jameslamb self-requested a review January 25, 2021 17:29
@@ -276,12 +276,12 @@ def test_regressor(output, client, listen_port):
# Scores should be the same
if output != 'dataframe':
assert_eq(s1, s2, atol=.01)
assert_eq(s1, s2_local)
assert_eq(s1, s1_local)
Contributor Author:

I saw that one of the tests is failing. I checked the log, and this is the line that is giving an error. Is it because I didn't pass an atol keyword argument?

Collaborator:

Yes, I think so. I can see in the logs that the differences are very small, maybe small enough to be due to precision differences.

For example, from https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=8740&view=logs&j=c28dceab-947a-5848-c21f-eef3695e5f11&t=fa158246-17e2-53d4-5936-86070edbaacf

a_original = 0.9555812336527378, b_original = 0.9555698440069504

Please add atol=0.003 here for now. That will be enough to catch major errors. We have an open issue (#3835) to come back in the future and investigate how to close these gaps.
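As a quick sanity check (plain Python, not part of the test suite), the gap between the two logged scores above is far smaller than the suggested tolerance:

```python
# Scores copied from the CI log linked above
a_original = 0.9555812336527378  # score from the distributed model
b_original = 0.9555698440069504  # score from the to_local() model

diff = abs(a_original - b_original)
# diff is roughly 1.1e-5, so atol=0.003 comfortably absorbs it
assert diff < 0.003
```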

Collaborator:

Note that I already resolved conflicts with master on your branch. So you should pull this branch fresh before working on it.

git branch -D reduce-tests
git fetch origin reduce-tests
git checkout reduce-tests

Contributor Author:

I have made the change. It should work now.

@jameslamb (Collaborator) left a comment

Looks good, thank you very much for your contribution.

@jameslamb merged commit 02e4b79 into microsoft:master Jan 25, 2021
@ShrillShrestha (Contributor Author)

Thanks, @jameslamb for helping me out.

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023