Ray MLDataset Integration #2230

Closed
chaokunyang opened this issue Jul 19, 2021 · 3 comments · Fixed by #2294

Comments

@chaokunyang
Contributor

Ray MLDataset is a distributed dataset built on Ray's ParallelIterator. Data in an MLDataset can be consumed by ML libraries on Ray, such as xgboost_on_ray or RaySGD, for distributed training.
It would be great if Mars supported converting a Mars DataFrame to a Ray MLDataset, so that Mars users could easily use Ray's ML libraries for distributed training.
Since records in a Ray MLDataset and chunks in a Mars DataFrame are both pandas.DataFrame objects, there should be no conversion cost between a Mars DataFrame and a Ray MLDataset.
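
To illustrate the "no conversion cost" point, here is a minimal sketch, not the Mars or Ray implementation. It assumes the integration only needs to hand existing chunks to a fixed number of shards, one chunk per record. The names `assign_chunks_to_shards` and `iter_shard` are hypothetical, and plain dicts stand in for the pandas.DataFrame chunks so the example has no dependencies.

```python
from typing import Any, Iterator, List, Sequence


def assign_chunks_to_shards(chunks: Sequence[Any], num_shards: int) -> List[List[Any]]:
    """Round-robin chunks across shards without copying or converting them."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    shards: List[List[Any]] = [[] for _ in range(num_shards)]
    for i, chunk in enumerate(chunks):
        # The chunk object itself is stored; no new data structure is built from it.
        shards[i % num_shards].append(chunk)
    return shards


def iter_shard(shard: List[Any]) -> Iterator[Any]:
    """Yield each chunk of a shard as one record, as a training worker might consume it."""
    yield from shard


# Hypothetical stand-ins for pandas.DataFrame chunks of a Mars DataFrame.
chunks = [{"chunk_id": i, "rows": list(range(i * 3, i * 3 + 3))} for i in range(5)]
shards = assign_chunks_to_shards(chunks, num_shards=2)
```

Each record a worker sees through `iter_shard` is the very same object as the original chunk, which is the property the proposal relies on: only the grouping changes, not the data.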

@chaokunyang
Contributor Author

@vcfgv

@ericl

ericl commented Jul 20, 2021

FYI, MLDataset is planned for deprecation in Ray; we're in the process of replacing it with just Dataset (once it leaves beta).

@chaokunyang
Contributor Author

> FYI, MLDataset is planned for deprecation in Ray; we're in the process of replacing it with just Dataset (once it leaves beta).

Thanks for the information. After some offline discussion, we decided to support MLDataset as well, because most of the work for supporting Ray Dataset and MLDataset is the same, and xgboost_ray/lightgbm_ray don't support Ray Dataset yet. MLDataset is also still useful for older versions of Ray that don't have Ray Dataset support.
