Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dask] allow parameter aliases for local_listen_port, num_threads, tree_learner (fixes #3671) #3789

Merged
merged 6 commits into from Jan 20, 2021

Conversation

jameslamb
Copy link
Collaborator

This PR closes #3671. It updates the code in lightgbm.dask so that it respects parameter aliases.

So, for example, as of this PR you can use tree_learner, tree, tree_type, or tree_learner_type to refer to the tree_learner parameter that controls which type of distributed training is performed.

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Jan 19, 2021

Hmmm, it doesn't look like GPU error is related to these Dask changes:

=================================== FAILURES ===================================
_________________ TestEngine.test_missing_value_handle_more_na _________________

self = <python_package_test.test_engine.TestEngine testMethod=test_missing_value_handle_more_na>

    def test_missing_value_handle_more_na(self):
        X_train = np.ones((100, 1))
        y_train = np.ones(100)
        trues = random.sample(range(100), 80)
        for idx in trues:
            X_train[idx, 0] = np.nan
            y_train[idx] = 0
        lgb_train = lgb.Dataset(X_train, y_train)
        lgb_eval = lgb.Dataset(X_train, y_train)
    
        params = {
            'metric': 'l2',
            'verbose': -1,
            'boost_from_average': False
        }
        evals_result = {}
        gbm = lgb.train(params, lgb_train,
                        num_boost_round=20,
                        valid_sets=lgb_eval,
                        verbose_eval=False,
                        evals_result=evals_result)
        ret = mean_squared_error(y_train, gbm.predict(X_train))
>       self.assertLess(ret, 0.005)
E       AssertionError: 0.008207460416181563 not less than 0.005

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for implementing my feature request! 🙂

Could please do the same with aliases here?

self.set_network(machines,
local_listen_port=params.get("local_listen_port", 12400),
listen_time_out=params.get("listen_time_out", 120),
num_machines=params.setdefault("num_machines", num_machines))

I think it OK to use the same PR for implementing network aliases in different placed of codebase. But feel free to disagree with me and say that it requires a separate PR or/and feature request.

python-package/lightgbm/dask.py Outdated Show resolved Hide resolved
python-package/lightgbm/dask.py Outdated Show resolved Hide resolved
python-package/lightgbm/dask.py Show resolved Hide resolved
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
@jameslamb
Copy link
Collaborator Author

Could please do the same with aliases here?

I'm happy to do this soon, but I'd like to do it as a separate PR. I have a preference for keeping the dask PRs small and quick, to prevent too much pain for @ffineis on #3708, and to make sure that the Dask interface keeps progressing towards something we're comfortable releasing with 3.2.0. You only get one chance to release code as "new" 😀

@StrikerRUS
Copy link
Collaborator

Looks like Dask tests are quite flaky:

/home/AzDevOps_azpcontainer/miniconda/envs/test-env/lib/python3.8/asyncio/tasks.py:490: TimeoutError

but I'd like to do it as a separate PR.

Sure, thanks! 👍

@jameslamb
Copy link
Collaborator Author

jameslamb commented Jan 20, 2021

Looks like Dask tests are quite flaky:

I'm realizing that too 😫 . @ffineis faced the same thing on #3708, and I think he's come up with a good solution for now on that PR. Also so see my link and comments in #3708 (comment) think we might get better reliability from switching to the @gen_cluster fixture from Dask distributed. I'm planning to test that once #3708 is merged.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

@jameslamb jameslamb merged commit d107872 into microsoft:master Jan 20, 2021
@jameslamb jameslamb deleted the fix/param-aliases branch January 20, 2021 17:21
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[dask] support parameter aliases in Dask API
2 participants