[Data] Clean up Datasource abstractions #40296
bveeramani added a commit that referenced this issue Nov 3, 2023
This PR adds `_FileDatasink` and its user-facing subclasses `RowBasedFileDatasink` and `BlockBasedFileDatasink`. #40693 migrates `FileDatasource` implementations to the new APIs. These changes are part of a larger effort to clean up `Datasource` interfaces (#40296).
---------
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
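The split between a shared file-sink base class and row-based or block-based subclasses can be sketched roughly as follows. This is a minimal pure-Python illustration, not the Ray implementation: the class names `FileSinkSketch`, `RowBasedSinkSketch`, `BlockBasedSinkSketch`, `TextRowSink`, and `CsvBlockSink`, the in-memory `files` dictionary, the file-naming scheme, and the use of a list of dicts as a "block" are all assumptions made for the example.

```python
import io
from abc import ABC, abstractmethod
from typing import Any, BinaryIO, Dict, List

Row = Dict[str, Any]
Block = List[Row]  # assumption: a block is modeled as a plain list of rows


class FileSinkSketch(ABC):
    """Hypothetical shared base: iterates over blocks and names output files."""

    def __init__(self, file_format: str) -> None:
        self.file_format = file_format
        self.files: Dict[str, bytes] = {}  # in-memory stand-in for a filesystem

    def write(self, blocks: List[Block]) -> None:
        for block_index, block in enumerate(blocks):
            if block:  # skip empty blocks
                self.write_block(block, block_index)

    @abstractmethod
    def write_block(self, block: Block, block_index: int) -> None:
        ...


class RowBasedSinkSketch(FileSinkSketch):
    """Writes one file per row; subclasses only say how to encode a row."""

    def write_block(self, block: Block, block_index: int) -> None:
        for row_index, row in enumerate(block):
            buffer = io.BytesIO()
            self.write_row_to_file(row, buffer)
            name = f"{block_index:06}_{row_index:06}.{self.file_format}"
            self.files[name] = buffer.getvalue()

    @abstractmethod
    def write_row_to_file(self, row: Row, file: BinaryIO) -> None:
        ...


class BlockBasedSinkSketch(FileSinkSketch):
    """Writes one file per block; subclasses only say how to encode a block."""

    def write_block(self, block: Block, block_index: int) -> None:
        buffer = io.BytesIO()
        self.write_block_to_file(block, buffer)
        self.files[f"{block_index:06}.{self.file_format}"] = buffer.getvalue()

    @abstractmethod
    def write_block_to_file(self, block: Block, file: BinaryIO) -> None:
        ...


class TextRowSink(RowBasedSinkSketch):
    def write_row_to_file(self, row: Row, file: BinaryIO) -> None:
        file.write(str(row["text"]).encode("utf-8"))


class CsvBlockSink(BlockBasedSinkSketch):
    def write_block_to_file(self, block: Block, file: BinaryIO) -> None:
        header = ",".join(block[0].keys())
        lines = [header] + [",".join(str(v) for v in row.values()) for row in block]
        file.write("\n".join(lines).encode("utf-8"))
```

In this sketch a concrete sink overrides a single encoding method, and the base classes own iteration and file naming, which is one plausible reading of why the PR separates the private base class from the two user-facing subclasses.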
bveeramani added a commit that referenced this issue Nov 3, 2023
This PR is part of a larger effort to clean up Datasource interfaces (#40296). #40199 introduces a new `Datasink` abstraction, and this PR migrates the write-supporting database-related `Datasource`s (BigQuery and SQL) to the new API. The primary motivation for these changes is to reduce the complexity of our internal code base. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5.
---------
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
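To make the idea of a write-only sink for a database concrete, here is a small sketch that uses the standard-library `sqlite3` module. It is not the Ray `Datasink` API: the class `SqlSinkSketch`, the helper `run_write`, the hook names, and the tuple-based block type are hypothetical and chosen only to show write logic living in its own object rather than on a datasource.

```python
import sqlite3
from typing import Any, Iterable, List, Tuple

Block = List[Tuple[Any, ...]]  # assumption: a block is a list of row tuples


class SqlSinkSketch:
    """Hypothetical write-only sink: it knows how to insert rows, not how to read."""

    def __init__(self, connection: sqlite3.Connection, insert_sql: str) -> None:
        self.connection = connection
        self.insert_sql = insert_sql
        self.total_written = 0

    def on_write_start(self) -> None:
        # Hook for one-time setup before any block is written (nothing to do here).
        pass

    def write(self, blocks: Iterable[Block]) -> int:
        # Insert every row of every block, then commit once.
        written = 0
        cursor = self.connection.cursor()
        for block in blocks:
            cursor.executemany(self.insert_sql, block)
            written += len(block)
        self.connection.commit()
        return written

    def on_write_complete(self, write_results: List[int]) -> None:
        # Hook for aggregating per-call results after all writes succeed.
        self.total_written = sum(write_results)


def run_write(sink: SqlSinkSketch, blocks: Iterable[Block]) -> int:
    """Drive the sink through its lifecycle in a single process."""
    sink.on_write_start()
    result = sink.write(blocks)
    sink.on_write_complete([result])
    return sink.total_written
```

A usage example under the same assumptions: create an in-memory database with `sqlite3.connect(":memory:")`, create a two-column table, build `SqlSinkSketch(connection, "INSERT INTO movie VALUES (?, ?)")`, and pass it to `run_write` together with a list of blocks.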
bveeramani added a commit that referenced this issue Nov 3, 2023
This PR is part of a larger effort to clean up Datasource interfaces (#40296). #40691 added the new FileDatasink base class, and this PR migrates FileDatasource implementations to the new API. The primary motivation for these changes is to reduce the complexity of our internal code base. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5.
---------
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
bveeramani added a commit that referenced this issue Nov 3, 2023
This PR is part of a larger effort to clean up Datasource interfaces (#40296). #40691 added the new FileDatasink base class, and this PR migrates ParquetDatasource to the new API. The primary motivation for these changes is to reduce the complexity of our internal code base. For more information, see https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5.
---------
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
stephanie-wang pushed a commit that referenced this issue Nov 13, 2023
#40296 copied write-related code from Datasource implementations to Datasink implementations. As a result, existing Datasource implementations now contain unused write-related code. This PR removes it.
---------
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
https://docs.google.com/document/d/1Bqhbzvxv7liwpOhyBzRVy5tOzXdy-NiMSFa-6hupr18/edit#heading=h.rytitv546vx5
API changes
  Write-related
    - Datasink #40199
    - BlockWritePathProvider in favor of FilenameProvider #40297
    - FileDatasink #40691
  Read-related
    - Reader #40196
Migrations
  Write-related
    - FileDatasink subclasses #40693
    - Datasink subclasses #40817
    - ParquetDatasink #40818
  Read-related
    - FileBasedDatasource to new API #40900
    - Datasource subclasses #40858
    - Datasource subclasses #40826
    - RangeDatasource to updated API #40881
    - ParquetDatasource to new API #40902
Minor refactors
    - _fetch_metadata_parallel to file_meta_provider #40295
    - BlockWritePathProvider to separate file #40302
    - _resolve_paths_and_filesystem to util file #40304
    - do_write #40422