[Data] Remove out-of-date Data examples #40127
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
doc/source/data/examples/index.rst
Computer Vision
---------------
Thought that a heading didn't make sense if there's only one heading.
Thanks.
Er, I don't think we've decided to do (2).
OK, then let's either keep the batch training example in a new section called Many Model Training, or file an issue to rewrite it if it needs to be rewritten.
@ericl to clarify, we're still positioning Ray Data as the recommended solution for many model training?
Yes, nothing has changed here as far as I am aware, and we also don't have any other better solution here. This falls under the category of "generic bulk parallel processing".
Got it. Has many model training been a common use case? Even if we preface this example with a warning, I'm concerned that it'll confuse the majority of users, who perform regular training.
Yes, it's a frequent ask. Why are we even adding a warning? As far as I know, nothing has changed with regard to use cases and recommendations, so we shouldn't be making any material positioning changes.
We'd warn that, if you're doing regular training, you should look at the Ray Train examples. What do you think?
I'd just make it clear that this is for when you are training thousands or more models, and when the training depends on different slices of data. You can add a link to Train for more monolithic training cases. No need to frame it as a warning; just set the context for the use case.
Let's move it into a separate section called Many Model Training to make it clear. This matches the header we use in the Ray Use Cases section.
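To make the "many model training" use case above concrete, here is a minimal, framework-free sketch: records are split into slices by a key, and an independent model is fit on each slice. Everything in it is hypothetical and illustrative (the record layout, fit_line, train_many_models, and the toy least-squares model); it is not Ray Data's API. In Ray Data, the per-group step would typically be expressed with Dataset.groupby(...).map_groups(...), which distributes the groups across workers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (group_key, x, y). In a real many-model
# workload the key might be a store or sensor ID, one model per key.
Record = Tuple[str, float, float]


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        # All x values are identical: fall back to a flat line at the mean.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def train_many_models(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Group records by key and fit one independent model per group."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for key, x, y in records:
        groups[key].append((x, y))
    # Groups don't depend on each other, so this loop is the part a
    # framework such as Ray Data would run in parallel.
    return {key: fit_line(points) for key, points in groups.items()}


if __name__ == "__main__":
    data = [
        ("store_a", 1.0, 2.0), ("store_a", 2.0, 4.0), ("store_a", 3.0, 6.0),
        ("store_b", 1.0, 1.0), ("store_b", 2.0, 1.5), ("store_b", 3.0, 2.0),
    ]
    for key, (slope, intercept) in sorted(train_many_models(data).items()):
        print(f"{key}: slope={slope:.2f}, intercept={intercept:.2f}")
```

Each slice gets its own model, which is what distinguishes this pattern from monolithic training on the whole dataset (the Ray Train case).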
What is the timeline for rewriting the examples that are planned to be rewritten? We should make sure we don't leave a gap here (e.g. not having them for Ray 2.8) and also make sure the links will keep working.
With the exception of the custom datasource example, we're rewriting them after the 2.8 branch cut but before the 2.8 release. We'll rewrite the custom datasource example once we finish #40296.
This PR removes out-of-date or unhelpful Data examples. The following examples are removed:
- nyc_taxi_basic_processing, because it's too long and emphasizes tabular data (which is inconsistent with our positioning)
- ocr_examples, because it's out-of-date. It'll get rewritten.
- random-access, because random access datasets aren't maintained.
- custom-datasource, because it uses deprecated APIs. It'll get rewritten.

---------
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Co-authored-by: Cheng Su <scnju13@gmail.com>
#40127 removed the "Implementing a Custom Datasource" example because it used deprecated APIs. This PR introduces a new example that uses up-to-date APIs. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
…oject#41785) ray-project#40127 removed the "Implementing a Custom Datasource" example because it used deprecated APIs. This PR introduces a new example that uses up-to-date APIs. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Why are these changes needed?
This PR removes out-of-date or unhelpful Data examples.
The following examples are removed:
- nyc_taxi_basic_processing, because it's too long and emphasizes tabular data (which is inconsistent with our positioning)
- ocr_examples, because it's out-of-date. It'll get rewritten.
- random-access, because random access datasets aren't maintained.
- custom-datasource, because it uses deprecated APIs. It'll get rewritten.
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.