BUG: fixing circular import in daal4py/sklearnex device_offloading #1832

Merged: 35 commits merged on Jun 27, 2024

Conversation

@samir-nasibli (Contributor) commented May 13, 2024

Description

Fixes a circular import in daal4py/sklearnex device offloading.
The best way to resolve the issue is to refactor the code so the cycle cannot occur. This PR rethinks the design of the sklearnex/daal4py/onedal4py import modules and their dependencies.

Changes proposed in the PR:

  • add a _config module to onedal4py, exposing some of sklearnex's config settings at the onedal4py level
  • remove the circular import across daal4py/onedal4py/sklearnex:
    • remove the _device_offload module from daal4py, since after KMeans OOP #1770 and ENH: Adding Ridge Regression support into sklearnex.preview #1843 there is no need for GPU offloading via the daal4py sycl_context.
    • move most of the device-offload functionality to the onedal4py level.
    • create a _config module in onedal4py that exposes some sklearnex config settings at the onedal4py level, and reuse it at the sklearnex level
    • sklearnex now depends on the onedal4py _config and _device_offload modules (see the sketch after this list).
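
For illustration only, here is a minimal sketch of the layering this list describes (module boundaries and names are simplified, not the exact sklearnex/onedal API): the config state lives at the lowest level, the sklearnex-level setter writes into it, and the device-offload helper only reads it, so every import points downward and no cycle can form.

```python
# --- onedal-level _config (sketch): the lowest layer, imports nothing above it
_default_global_config = {
    "target_offload": "auto",
    "allow_fallback_to_host": False,
}

def _get_config():
    # Return a copy so callers cannot mutate shared state by accident.
    return dict(_default_global_config)

# --- sklearnex-level config (sketch): user-facing setter that pushes values
# down into the lower layer; the dependency points downward only.
def set_config(target_offload=None, allow_fallback_to_host=None):
    if target_offload is not None:
        _default_global_config["target_offload"] = target_offload
    if allow_fallback_to_host is not None:
        _default_global_config["allow_fallback_to_host"] = allow_fallback_to_host

# --- onedal-level device-offload helper (sketch): reads config from its own
# layer and never needs to import anything from sklearnex.
def _get_offload_target():
    return _get_config()["target_offload"]

set_config(target_offload="gpu")
print(_get_offload_target())  # -> "gpu"
```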

Note: It is expected that KMeans GPU validation on the main branch will not offload, since those tests rely on daal4py GPU offloading, and that call path is removed here.
#1770 is considered the fix for this.
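
For context, the underlying failure mode is Python's usual partial-initialization problem: two modules that import from each other at import time. A tiny self-contained reproduction with hypothetical module names (nothing to do with the real package layout):

```python
# pkg_a.py
from pkg_b import helper_b  # runs while pkg_a is still initializing

def helper_a():
    return "a -> " + helper_b()

# pkg_b.py
from pkg_a import helper_a  # pkg_a is only partially initialized at this point

def helper_b():
    return "b"

# Importing either module fails with something like:
#   ImportError: cannot import name 'helper_a' from partially initialized
#   module 'pkg_a' (most likely due to a circular import)
# The fix in this PR is the standard one: move the shared pieces (config and
# device-offload helpers) to the lowest layer so all imports point downward.
```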

TODO

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes, if necessary (updated in # - add PR number)
  • The unit tests pass successfully.
  • I have run it locally and tested the changes extensively.
  • I have resolved any merge conflicts that might occur with the base branch.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details)
  • I have added the respective label(s) to the PR if I have permission to do so.

@samir-nasibli added the bug (Something isn't working) label on May 13, 2024
@samir-nasibli (Contributor Author)

/intelci: run

@samir-nasibli (Contributor Author)

/intelci: run

@samir-nasibli marked this pull request as ready for review on June 8, 2024 23:44
@icfaust (Contributor) left a comment

In the case that we drop the daal4py oneapi interfaces, is this movement of device_offload necessary? Couldn't we just delete the sklearnex import of various aspects in daal4py/sklearn/_device_offload.py, and wouldn't that remove the circular import problem? If true, how far away are we from being able to do that? As far as I can see, none of the estimators imported from daal4py in sklearnex, outside of Ridge, can offload to GPU via daal4py. It's a question over aspects of:
Ridge
Lasso
ElasticNet
PairwiseDistances
train_test_split
roc_auc_score

or am I missing something?

In general just some minor naming convention stuff. Would like the thoughts of other reviewers on that.

Resolved review threads (outdated): daal4py/sklearn/_config.py (3), daal4py/sklearn/_device_offload.py
@samir-nasibli (Contributor Author)

In the case that we drop the daal4py oneapi interfaces, is this movement of device_offload necessary? Couldn't we just delete the sklearnex import of various aspects in daal4py/sklearn/_device_offload.py, and wouldn't that remove the circular import problem? If true, how far away are we from being able to do that? As far as I can see, none of the estimators imported from daal4py in sklearnex, outside of Ridge, can offload to GPU via daal4py. It's a question over aspects of: Ridge Lasso ElasticNet PairwiseDistances train_test_split roc_auc_score

or am I missing something?

In general just some minor naming convention stuff. Would like the thoughts of other reviewers on that.

@icfaust Thank you for the review!
I think there are several things mixed up here.
Perhaps this is due to a lack of clarity in my PR description. Let me bring some clarity.

  • This PR tries to keep everything as it is; it does not change anything in the device offloading flow.
  • For the oneapi interface we have onedal4py + sklearnex. If we drop daal4py, then all the necessary pieces will simply be moved to the sklearnex _device_offload module, since daal4py will no longer be needed.

Some minor updates are required, but overall the PR is ready.

@samir-nasibli (Contributor Author)

As far as I can see, none of the estimators imported from daal4py in sklearnex, outside of Ridge, can offload to GPU via daal4py. It's a question over aspects of: Ridge Lasso ElasticNet PairwiseDistances train_test_split roc_auc_score

or am I missing something?

If I understood the question correctly, and you are asking why we need the daal4py sycl_context: KMeans and other estimators from daal4py.sklearn that are used on the sklearnex main branch do their GPU offloading via sycl_context, so we need to leave it as is for now.

If anything needs to be dropped, I would like to do that reduction in the next steps.
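
For readers following along, the daal4py GPU offloading path referred to here is the sycl_context mechanism, roughly as below. This is a sketch only; the exact import path and availability depend on having a SYCL-enabled daal4py build, and this is the path the PR stops relying on in sklearnex.

```python
import numpy as np

# Available only in SYCL-enabled daal4py builds.
from daal4py.oneapi import sycl_context
from daal4py.sklearn.cluster import KMeans

X = np.random.default_rng(0).random((1000, 10), dtype=np.float32)

# Wrapping the call in a sycl_context pushes the daal4py-backed computation
# to the selected device; this is how KMeans on main currently reaches GPU.
with sycl_context("gpu"):
    KMeans(n_clusters=3, random_state=0).fit(X)
```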

@samir-nasibli (Contributor Author) commented Jun 10, 2024

Dear reviewers,
In my PR I tried to get rid of the circular imports, so I slightly changed the import hierarchy in the daal4py/onedal4py/sklearnex namespaces.
But @icfaust raised an interesting question.
If we look at this issue more broadly, should we retain the support_usm_ndarray functionality for daal4py at all?

Which estimators with the daal4py backend do we use for GPU offloading? Correct me if it is anything other than KMeans. KMeans is actively being moved out of preview (#1770), which should happen in the near future, perhaps before the next release. In that case, does it make sense to keep supporting usm_ndarray interop for daal4py at all? If not, then it makes sense to update this PR to move the pieces needed for device offloading to the onedal4py backend and reuse them in sklearnex.
Removing this functionality from daal4py._device_offload and adding it to onedal4py._device_offload also solves the circular imports problem.

After the discussion we will decide whether to integrate the PR as is, or update it for the onedal4py backend only.
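
To make the support_usm_ndarray question concrete: the interop layer under discussion is essentially a decorator that detects USM arrays among the arguments, converts them to host arrays before calling the native backend, and (in the real implementation) converts results back onto the originating device queue. A simplified, hypothetical sketch of that pattern, not the actual onedal code:

```python
from functools import wraps

import numpy as np

def support_usm_ndarray_sketch(func):
    # Convert any dpctl.tensor.usm_ndarray arguments to numpy before calling
    # ``func``. The real decorator also remembers the SYCL queue of the inputs
    # and converts the results back to USM memory on the same device.
    @wraps(func)
    def wrapper(*args, **kwargs):
        def to_host(x):
            # usm_ndarray exposes __sycl_usm_array_interface__; anything else
            # is passed through unchanged.
            if hasattr(x, "__sycl_usm_array_interface__"):
                import dpctl.tensor as dpt
                return dpt.to_numpy(x)
            return x

        args = tuple(to_host(a) for a in args)
        kwargs = {k: to_host(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)

    return wrapper

@support_usm_ndarray_sketch
def column_means(X):
    # Stand-in for a daal4py/onedal computation that expects host memory.
    return np.asarray(X).mean(axis=0)
```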

@samir-nasibli (Contributor Author) commented Jun 11, 2024

Dear reviewers,

I am returning this PR to development.
Generally I will remove all usm_ndarray support from daal4py and use sklearnex device offloading for daal4py calls. All device_offload primitives will be located in the onedal4py module.
Since KMeans is the only estimator still using daal4py's GPU offloading, only its tests are expected to fail until KMeans on onedal4py is integrated (#1770).

@samir-nasibli marked this pull request as draft on June 11, 2024 07:43
Review threads: daal4py/sklearn/linear_model/_coordinate_descent.py (2, outdated), daal4py/sklearn/linear_model/_ridge.py (outdated), sklearnex/linear_model/coordinate_descent.py (2), sklearnex/dispatcher.py
remove unnecessary docstring assignments
@samir-nasibli (Contributor Author)

/azp run CI

Azure Pipelines successfully started running 1 pipeline(s).

@samir-nasibli (Contributor Author)

/intelci: run

@samir-nasibli (Contributor Author) commented Jun 21, 2024

@ethanglaser @icfaust @Alexsandruss thank you for the review!
Dear reviewers, I have addressed all your comments. Please mark them as resolved where applicable.
Assuming green CI.

@ethanglaser (Contributor)

No major objections from my side at this point, looking good, seems like the queue/device offload will functionally remain the same. One remaining question I have is what separates functionality defined in onedal/_device_offload.py vs. sklearnex/_device_offload.py at this point? Seems like maybe these could all be moved to onedal4py. But if there is functionality in sklearnex for a reason then I guess not.

@samir-nasibli (Contributor Author)

No major objections from my side at this point, looking good, seems like the queue/device offload will functionally remain the same. One remaining question I have is what separates functionality defined in onedal/_device_offload.py vs. sklearnex/_device_offload.py at this point? Seems like maybe these could all be moved to onedal4py. But if there is functionality in sklearnex for a reason then I guess not.

Thank you!
I left sklearnex-specific functionality in the sklearnex._device_offload module. I think it should be fine now.
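
As a rough illustration of the split being described (names are illustrative, not the actual module contents): queue selection and array conversion sit at the onedal level, while the sklearnex level keeps only the logic that needs scikit-learn semantics, such as falling back to stock scikit-learn when the backend cannot handle a case.

```python
# onedal-level helper (sketch): knows about devices/queues, not about sklearn.
def resolve_queue(config):
    target = config.get("target_offload", "auto")
    return None if target == "auto" else target

# sklearnex-level dispatch (sketch): decides between the oneDAL backend and
# stock scikit-learn, reusing the lower-level helper.
def dispatch(config, onedal_call, sklearn_call, *args):
    queue = resolve_queue(config)
    if onedal_call is not None:
        return onedal_call(*args, queue=queue)
    return sklearn_call(*args)
```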

@samir-nasibli (Contributor Author)

/intelci: run

@ethanglaser (Contributor) left a comment

LGTM, good work on this and good overall review process, let's wait for internal CI to make sure

@samir-nasibli (Contributor Author)

/intelci: run

1 similar comment
@samir-nasibli (Contributor Author)

/intelci: run

@samir-nasibli (Contributor Author)

/intelci: run

@samir-nasibli (Contributor Author)

CI looks good. The failures are not related to the changes in this PR. Merging it.
Many thanks to the reviewers for the excellent work done.
