[Client][Dask] Dask Errors on Ray Client #16743
Comments
Is this a duplicate of
@ijrsvt I think that it might be!
@matthewdeng One workaround is to use the single-threaded scheduler: ray/python/ray/util/dask/scheduler.py, lines 385 to 460 in 54ce809
@ijrsvt I tried running the same script with the latest version of
I also tried setting the scheduler to
@matthewdeng from ray.util.dask import ray_dask_get_sync
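For readers following the workaround, here is a minimal sketch of how the single-threaded Dask-on-Ray scheduler might be selected. It uses the `ray_dask_get_sync` import quoted above together with Dask's `dask.config.set`. The helper name `use_sync_dask_on_ray_scheduler` is hypothetical, and setting the scheduler globally is an assumption: the thread does not show how it was actually configured.

```python
def use_sync_dask_on_ray_scheduler():
    """Make Dask use the single-threaded Dask-on-Ray scheduler by default."""
    # Imports are kept inside the function so this module can be loaded
    # even where Ray or Dask is not installed.
    import dask
    from ray.util.dask import ray_dask_get_sync  # import quoted in the thread

    # Assumption: the scheduler is set globally via dask.config.set; the
    # thread does not show how it was actually configured.
    dask.config.set(scheduler=ray_dask_get_sync)
```

The helper would be called once after connecting to the cluster and before any Dask computation, such as taking the length of the DataFrame.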
@clarkzinzow haha woops totally forgot that I had originally imported
It looks like the Dask-on-Ray data source for XGBoost-on-Ray hardcodes the multithreaded Dask-on-Ray scheduler here when persisting the collection, so it could still be the same bug. And the
@matthewdeng I'm able to reproduce this!
Actually this might be distinct from #16406 because neither I'm not able to follow where this attempt to subscript is coming from :(
@clarkzinzow do you have any insight into how to debug further?
My guess is that @matthewdeng As a workaround, can you try removing the persisting of the training data from the script?
@matthewdeng I just tried removing that, and it appears to work fine.
@matthewdeng another workaround is to do the following and keep the 'persist' call:
Nice, removing
With
@matthewdeng Sorry for the delay, but if you reproduce this, can you print the output of
@ijrsvt thanks for making the deserialization change, I now see that it's a credential error:
This is from running the latest commit in https://github.com/ray-project/ray/pull/16518/files. A few quick observations:
I am unable to reproduce the original problem
nor with
The other problem (with creds) is still there.
@matthewdeng Just to confirm: The main initial issue is resolved with Would it be okay to close & open a follow-up about the credentials issue?
@ijrsvt I'm okay with opening a separate issue to track the credentials issue, but would it be reasonable to keep this one open since I don't think we've actually fixed the root cause of the original scheduler issue?
I cannot reproduce the original issue. The problem with credentials is separate: AWS environment variables are not propagated properly.
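As an illustration of the credentials problem described above, one possible way to forward AWS settings explicitly is to collect them from the local environment first. This is only a sketch and not a fix proposed in the thread: the helper name `collect_aws_env` is hypothetical, and it assumes that every relevant variable starts with `AWS_`.

```python
import os


def collect_aws_env(environ=None):
    """Return a dict of the AWS_* variables found in the given environment."""
    # Fall back to the current process environment when none is supplied.
    source = os.environ if environ is None else environ
    # Assumption: all relevant credentials and settings use the AWS_ prefix.
    return {key: value for key, value in source.items() if key.startswith("AWS_")}
```

The resulting dict could then be handed to the cluster, for example through the `env_vars` field of Ray's `runtime_env`, assuming the Ray version in use supports it.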
@sasha-s I think it only arose when the Client Server was on a remote machine.
@matthewdeng The solution is to do
I guess is there some follow-up for this:
@richardliaw was not able to reproduce.
@richardliaw @matthewdeng Can we close this issue as the initial problem is solved by
I'll take a look!
@Yard1 Any updates on trying to implement the suggestion?
We are still trying to figure out the best way to take care of this. Ideally this would be taken care of on the Client's side.
@Yard1, would ensuring all calls to the
be okay?
Yeah, that should work actually. I'd be happy to test this!
@ijrsvt, thanks for closing the loop on this one!
What is the problem?
When running a simple Dask/XGBoost script using Ray Client, length computation of the Dask Dataframe ultimately errors out with the following:
Ray version and other system information (Python version, TensorFlow version, OS):
Reproduction (REQUIRED)
ray start --head on local machine.
dask-xgboost.py: