[Data] Remove out-of-date Data examples #40127

Merged
bveeramani merged 6 commits into ray-project:master from the remove-examples branch on Oct 17, 2023

Member

@bveeramani bveeramani commented Oct 4, 2023

Why are these changes needed?

This PR removes out-of-date or unhelpful Data examples.

The following examples are removed:

  • nyc_taxi_basic_processing, because it's too long and emphasizes tabular data (which is inconsistent with our positioning)
  • ocr_examples, because it's out-of-date. It'll get rewritten.
  • random-access, because random access datasets aren't maintained.
  • custom-datasource, because it uses deprecated APIs. It'll get rewritten.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Comment on lines 29 to 30
Computer Vision
---------------
Member Author

Thought that a heading didn't make sense if there's only one heading.

@bveeramani
Member Author

cc @angelinalg @emmyscode

@amogkam
Contributor

amogkam commented Oct 5, 2023

Thanks.

  1. Can we create issues to track the rewrites?
  2. For the batch training example, if we are no longer positioning Ray Data as a solution for Many Model Training, then we should update the corresponding documentation: https://docs.ray.io/en/latest/ray-overview/use-cases.html#many-model-training
  3. I think we should keep 1 tabular data + xgboost example

@ericl
Contributor

ericl commented Oct 5, 2023

Er, I don't think we've decided to do (2).

@amogkam
Contributor

amogkam commented Oct 5, 2023

ok then let's either keep the batch training example in a new section called Many Model Training or file an issue for rewrite if it needs to be rewritten

@bveeramani
Member Author

bveeramani commented Oct 5, 2023

> Er, I don't think we've decided to do (2).

@ericl to clarify, we're still positioning Ray Data as the recommended solution for many model training?

@ericl
Contributor

ericl commented Oct 5, 2023

> @ericl to clarify, we're still positioning Ray Data as the recommended solution for many model training?

Yes, nothing has changed here as far as I am aware, and we also don't have any other better solution here. This falls under the category of "generic bulk parallel processing".

@bveeramani
Member Author

Got it. Has many model training been a common use case? Even if we preface this example with a warning, I'm concerned that it'll cause confusion for the majority of users who perform regular training

@ericl
Contributor

ericl commented Oct 5, 2023

> Got it. Has many model training been a common use case? Even if we preface this example with a warning, I'm concerned that it'll cause confusion for the majority of users who perform regular training

Yes, it's a frequent ask. Why are we even adding a warning? As far as I know, nothing has changed here with regard to use cases and recommendations, so we shouldn't be making any material positioning changes.

@bveeramani
Member Author

We'd warn that, if you're doing regular training, you should look at the Ray Train examples. What do you think?

@ericl
Contributor

ericl commented Oct 5, 2023

I'd just make it clear that this is for when you are training thousands or more models, and when the training depends on different slices of data. You can add a link to Train for more monolithic training cases. No need to frame it as a warning, just need to set the context for the use case.

@amogkam
Contributor

amogkam commented Oct 5, 2023

Let's move it into a separate section called Many Model Training to make it clear. This matches the header we use in the Ray Use Cases section

bveeramani and others added 5 commits October 16, 2023 15:51
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
@bveeramani bveeramani requested a review from a team as a code owner October 17, 2023 00:35
@pcmoritz
Contributor

What is the timeline for rewriting the examples that are planned to be rewritten? We should make sure we don't leave a gap here (e.g. not having them for Ray 2.8) and also make sure the links will keep working.

@bveeramani
Member Author

bveeramani commented Oct 17, 2023

> What is the timeline for rewriting the examples that are planned to be rewritten? We should make sure we don't leave a gap here (e.g. not having them for Ray 2.8) and also make sure the links will keep working.

With the exception of the custom datasource example, we're rewriting them after the 2.8 branch cut but before the 2.8 release. We'll rewrite the custom datasource example once we finish #40296

@bveeramani bveeramani merged commit 56b72a5 into ray-project:master Oct 17, 2023
32 of 41 checks passed
@bveeramani bveeramani deleted the remove-examples branch October 17, 2023 22:06
jonathan-anyscale pushed a commit to jonathan-anyscale/ray that referenced this pull request Oct 26, 2023
This PR removes out-of-date or unhelpful Data examples.

The following examples are removed:

- nyc_taxi_basic_processing, because it's too long and emphasizes tabular data (which is inconsistent with our positioning)
- ocr_examples, because it's out-of-date. It'll get rewritten.
- random-access, because random access datasets aren't maintained.
- custom-datasource, because it uses deprecated APIs. It'll get rewritten.

---------

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Co-authored-by: Cheng Su <scnju13@gmail.com>
bveeramani added a commit that referenced this pull request Dec 12, 2023
#40127 removed the "Implementing a Custom Datasource" example because it used deprecated APIs. This PR introduces a new example that uses up-to-date APIs.

---------

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
bveeramani added a commit to bveeramani/ray that referenced this pull request Dec 12, 2023
…oject#41785)

ray-project#40127 removed the "Implementing a Custom Datasource" example because it used deprecated APIs. This PR introduces a new example that uses up-to-date APIs.

---------

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
architkulkarni pushed a commit that referenced this pull request Dec 12, 2023
#41821)

#40127 removed the "Implementing a Custom Datasource" example because it used deprecated APIs. This PR introduces a new example that uses up-to-date APIs.

---------

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
6 participants