[Data] Deprecate Reader
#40196
Merged
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
bveeramani
requested review from
ericl,
scv119,
c21,
amogkam,
scottjlee,
raulchen and
stephanie-wang
as code owners
October 6, 2023 23:09
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Can you provide some context in the PR description?
stephanie-wang
approved these changes
Nov 1, 2023
Looks good, modulo the TODOs!
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
This was referenced Nov 2, 2023
bveeramani
added a commit
that referenced
this pull request
Nov 3, 2023
This PR is part of a larger effort to clean up the Datasource interfaces. #40196 deprecated Reader. Accordingly, this PR removes _TorchDatasourceReader and _HuggingFaceDatasourceReader and moves their logic into the respective datasources. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
bveeramani
added a commit
that referenced
this pull request
Nov 3, 2023
This PR is part of a larger effort to clean up the Datasource interfaces. #40196 deprecated Reader. Accordingly, this PR removes _RangeDatasourceReader and moves its logic into RangeDatasource. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
bveeramani
added a commit
that referenced
this pull request
Nov 3, 2023
This PR is part of a larger effort to clean up the Datasource interfaces. #40196 deprecated Reader. Accordingly, this PR removes Reader implementations for Mongo, BigQuery, Databricks, and SQL, and moves their logic into the respective datasources. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
bveeramani
added a commit
that referenced
this pull request
Nov 6, 2023
This PR is part of a larger effort to clean up the Datasource interfaces. #40196 deprecated Reader. Accordingly, this PR removes _FileBasedDatasourceReader and updates the FileBasedDatasource subclasses. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
ujjawal-khare
pushed a commit
to ujjawal-khare-27/ray
that referenced
this pull request
Nov 29, 2023
tl;dr: Reader adds unnecessary complexity, so we're removing it. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
ujjawal-khare
pushed a commit
to ujjawal-khare-27/ray
that referenced
this pull request
Nov 29, 2023
This PR is part of a larger effort to clean up the Datasource interfaces. ray-project#40196 deprecated Reader. Accordingly, this PR removes _TorchDatasourceReader and _HuggingFaceDatasourceReader and moves their logic into the respective datasources. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
ujjawal-khare
pushed a commit
to ujjawal-khare-27/ray
that referenced
this pull request
Nov 29, 2023
This PR is part of a larger effort to clean up the Datasource interfaces. ray-project#40196 deprecated Reader. Accordingly, this PR removes _RangeDatasourceReader and moves its logic into RangeDatasource. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
ujjawal-khare
pushed a commit
to ujjawal-khare-27/ray
that referenced
this pull request
Nov 29, 2023
…40826) This PR is part of a larger effort to clean up the Datasource interfaces. ray-project#40196 deprecated Reader. Accordingly, this PR removes Reader implementations for Mongo, BigQuery, Databricks, and SQL, and moves their logic into the respective datasources. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
ujjawal-khare
pushed a commit
to ujjawal-khare-27/ray
that referenced
this pull request
Nov 29, 2023
This PR is part of a larger effort to clean up the Datasource interfaces. ray-project#40196 deprecated Reader. Accordingly, this PR removes _FileBasedDatasourceReader and updates the FileBasedDatasource subclasses. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Why are these changes needed?
tl;dr: Reader adds unnecessary complexity, so we're removing it.
We're deprecating Reader and moving its methods into Datasource. There are several reasons why:
- Reader introduces a pass-through layer.
- Reader subclasses contain circular dependencies.
- Datasource parameters are specified with implicit (and error-prone) keyword arguments. For example: ray/python/ray/data/datasource/csv_datasource.py, Lines 35 to 38 in 8504563
Going forward, you would implement Datasource.get_read_tasks and Datasource.estimate_inmemory_size directly rather than Reader.get_read_tasks and Reader.estimate_inmemory_size.
This change is part of a larger effort to clean up the Datasource-related interfaces. For more information, see the design document.
Related issue number
Towards #40296
Checks
- […] git commit -s) in this PR.
- […] scripts/format.sh to lint the changes in this PR.
- […] method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
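As a rough illustration of the "Going forward" guidance in the description, the sketch below shows a datasource that implements get_read_tasks and estimate_inmemory_size itself, with no separate Reader in between. It deliberately does not import Ray: ReadTask, Datasource, and RangeLikeDatasource here are hypothetical stand-ins, the signatures are assumptions, and the method names simply follow the wording of the description, so check the Ray Data API reference for the exact names and types before adapting it.

```python
from typing import Callable, List, Optional


class ReadTask:
    """Stand-in for a unit of read work (hypothetical, not Ray's class)."""

    def __init__(self, read_fn: Callable[[], List[int]]):
        self._read_fn = read_fn

    def __call__(self) -> List[int]:
        return self._read_fn()


class Datasource:
    """Stand-in base class with the two methods named in the description."""

    def get_read_tasks(self, parallelism: int) -> List[ReadTask]:
        raise NotImplementedError

    def estimate_inmemory_size(self) -> Optional[int]:
        raise NotImplementedError


class RangeLikeDatasource(Datasource):
    """Hypothetical datasource that yields the integers 0..n-1."""

    # Parameters are explicit constructor arguments, not keyword
    # arguments forwarded implicitly through a pass-through Reader.
    def __init__(self, n: int, bytes_per_item: int = 8):
        self._n = n
        self._bytes_per_item = bytes_per_item

    def get_read_tasks(self, parallelism: int) -> List[ReadTask]:
        # Split [0, n) into at most `parallelism` contiguous chunks.
        num_tasks = max(1, min(parallelism, self._n))
        chunk = max(1, -(-self._n // num_tasks))  # ceiling division
        tasks = []
        for start in range(0, self._n, chunk):
            stop = min(start + chunk, self._n)
            # Bind start/stop as defaults so each task keeps its own range.
            tasks.append(ReadTask(lambda s=start, e=stop: list(range(s, e))))
        return tasks

    def estimate_inmemory_size(self) -> Optional[int]:
        # Assumed fixed cost per item; a real source would measure or sample.
        return self._n * self._bytes_per_item


# Usage: run every task and collect the rows it produces.
ds = RangeLikeDatasource(n=10)
rows = [x for task in ds.get_read_tasks(parallelism=3) for x in task()]
```

Because the parameters live on the datasource's constructor, a misspelled argument fails immediately at construction time instead of being silently carried along as an unused keyword.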