[Data][2/N] Move schema and meta_count to dataset#61349
[Data][2/N] Move schema and meta_count to dataset#61349bveeramani merged 14 commits intoray-project:masterfrom
schema and meta_count to dataset#61349Conversation
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
schema and meta_count to dataset
There was a problem hiding this comment.
Code Review
This pull request is a good step towards refactoring ExecutionPlan by moving the schema and meta_count methods to the Dataset class. The changes are well-contained and correctly update all relevant call sites. The new implementations in Dataset preserve the original logic while making the code cleaner, for example by using self.limit(1) in the schema method. The modifications across plan.py, dataset.py, dataset_repr.py, and base_trainer.py are consistent with this goal. Overall, this is a solid refactoring that improves code organization.
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
| schema = ds._base_schema(fetch_if_missing=False) | ||
| if count is None: | ||
| count = plan.meta_count() | ||
| count = ds._meta_count() |
There was a problem hiding this comment.
This temporary bidirectional coupling between Dataset and ExecutionPlan doesn't seem ideal. Are we planning to move get_plan_as_string, initial_num_blocks, and input_files to Dataset soon?
There was a problem hiding this comment.
yes this will go away soon
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
|
buildkite failure is some pip dependency resolution is too deep error. This PR doesn't change any setup.py or requirements.txt files, so I'll wait a bit, merge to latest master, and then retry CI. |
|
waiting on #62208 to be merged |
) ## Description Progress for removing `ExecutionPlan`. Moved `schema` and `meta_count` to the `Dataset` class. The operations for getting schema is separatated into two parts: `_base_schema` and `schema`. The motivation for this that for certain operations, we need the underlying schema class, while the public api returns the Ray wrapped schema. Currently there is a two-way binding between Dataset and ExecutionPlan due to repr operations. These will move to the Dataset in subsequent PRs. ## Related issues ray-project#60358 ## Additional information N/A --------- Signed-off-by: Daniel Shin <kyuseung1016@gmail.com> Co-authored-by: Balaji Veeramani <balaji@anyscale.com> Signed-off-by: Frank Mancina <fmancina@haproxy.com>

Description
Progress for removing
ExecutionPlan.Moved
schemaandmeta_countto theDatasetclass. The operations for getting schema is separatated into two parts:_base_schemaandschema. The motivation for this that for certain operations, we need the underlying schema class, while the public api returns the Ray wrapped schema.Currently there is a two-way binding between Dataset and ExecutionPlan due to repr operations. These will move to the Dataset in subsequent PRs.
Related issues
#60358
Additional information
N/A