RAPIDS 24.06 Databricks Deployment Docs Update #373

Closed
Tracked by #374
jarmak-nv opened this issue Jun 3, 2024 · 7 comments

@jarmak-nv
Contributor

cuML now uses scikit-learn 1.5 following the merge of rapidsai/cuml#5851, which causes Databricks to fail because the newest version available in its containers is 1.3.

We will need to update the docs to add

pip install scikit-learn --upgrade

to init.sh

Otherwise, users will see an error similar to the one below:

    from cuml.common import logger as cuml_logger
  File "/databricks/python/lib/python3.9/site-packages/cuml/__init__.py", line 42, in <module>
    from cuml.explainer.kernel_shap import KernelExplainer
  File "/databricks/python/lib/python3.9/site-packages/cuml/explainer/__init__.py", line 17, in <module>
    from cuml.explainer.kernel_shap import KernelExplainer
  File "kernel_shap.pyx", line 28, in init cuml.explainer.kernel_shap
  File "/databricks/python/lib/python3.9/site-packages/cuml/linear_model/__init__.py", line 18, in <module>
    from cuml.linear_model.elastic_net import ElasticNet
  File "elastic_net.pyx", line 21, in init cuml.linear_model.elastic_net
  File "/databricks/python/lib/python3.9/site-packages/cuml/solvers/__init__.py", line 19, in <module>
    from cuml.solvers.qn import QN
  File "qn.pyx", line 39, in init cuml.solvers.qn
  File "/databricks/python/lib/python3.9/site-packages/cuml/metrics/__init__.py", line 45, in <module>
    from cuml.metrics.hinge_loss import hinge_loss
  File "hinge_loss.pyx", line 20, in init cuml.metrics.hinge_loss
  File "/databricks/python/lib/python3.9/site-packages/cuml/preprocessing/__init__.py", line 23, in <module>
    from cuml._thirdparty.sklearn.preprocessing import (
  File "/databricks/python/lib/python3.9/site-packages/cuml/_thirdparty/sklearn/preprocessing/__init__.py", line 6, in <module>
    from ._data import Binarizer
  File "/databricks/python/lib/python3.9/site-packages/cuml/_thirdparty/sklearn/preprocessing/_data.py", line 48, in <module>
    from sklearn.utils._indexing import resample
ModuleNotFoundError: No module named 'sklearn.utils._indexing'
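
A quick way to tell whether a given environment would hit this failure is to probe for the module named in the last line of the traceback before importing cuML. The following is a minimal sketch, not part of cuML or the RAPIDS docs; the helper name `module_available` is hypothetical, and it only relies on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located on the current path.

    Note: find_spec imports the parent packages of a dotted name,
    but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # A parent package (for example `sklearn` itself) is missing
        # or failed to import.
        return False


if __name__ == "__main__":
    # Module named in the traceback above.
    name = "sklearn.utils._indexing"
    print(name, "available:", module_available(name))
```

If this reports `False`, the installed scikit-learn (if any) does not provide the module that cuML tries to import in the traceback.
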
@jacobtomlinson
Member

jacobtomlinson commented Jun 3, 2024

Does cuml set 1.5 as a minimum version?

In init.sh in our docs we have

pip install --extra-index-url=https://pypi.nvidia.com \
    "cudf-cu11" \
    "cuml-cu11" \
    "dask-cudf-cu11" \
    "dask-cuda=={{rapids_version}}"

I would assume installing cuml would bump scikit-learn. Is that not the case?

@jarmak-nv
Contributor Author

Oh interesting - you're right!

scikit-learn isn't a hard dependency of cuML, but cuML now breaks on import without a recent version. Looks like this is actually a cuML issue.

@jarmak-nv
Contributor Author

jarmak-nv commented Jun 4, 2024

cuML now has a PR to remove the hard dependency for 24.06.

Databricks has scikit-learn 1.0.2 installed on live and 1.3 on the beta container. Installing cuML won't trigger an update on its own, so to ensure Databricks users get a good experience I think we should do an upgrade as part of init.sh.

That said, my initial plan of using --upgrade may be worse than pinning to the same version cuML uses, i.e. pip install scikit-learn==1.5
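
To make the version gap concrete, here is a rough sketch that compares an installed scikit-learn version against a minimum of 1.5. The 1.5 threshold is an assumption taken from the version mentioned in this thread, and `release_tuple` and `needs_upgrade` are hypothetical helpers. The parsing is deliberately simple and ignores pre-release suffixes; a real check would more likely use the `packaging` library.

```python
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """Turn '1.0.2' into (1, 0, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def needs_upgrade(installed: str, minimum: str = "1.5") -> bool:
    """Return True if `installed` is older than `minimum` (assumed: 1.5)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.5' and '1.5.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        print("scikit-learn is not installed")
    else:
        print(installed, "needs upgrade:", needs_upgrade(installed))
```

Under this check, both versions mentioned above (1.0.2 and 1.3) would be flagged as needing an upgrade.
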

@jacobtomlinson
Member

OK, thanks for confirming. Just to check, you are proposing we add something like the following to our docs?

pip install --extra-index-url=https://pypi.nvidia.com \
    "cudf-cu11" \
    "cuml-cu11" \
    "dask-cudf-cu11" \
    "dask-cuda=={{rapids_version}}" \
    "scikit-learn==1.5"

@jarmak-nv
Contributor Author

Yup! I figured this is the best place to do it since we already provide the init.sh. While users might technically have no problems on Databricks with the old version of scikit-learn, it's safest to upgrade it to prevent potential issues with cuML.

@taureandyernv
Contributor

@jarmak-nv @jacobtomlinson @aravenel this issue also affects Colab. Thanks for sharing, Ben!

@jacobtomlinson
Member

The fix in cuml means this change should no longer be needed.
