[Data][2/N] Move `schema` and `meta_count` to dataset by kyuds · Pull Request #61349 · ray-project/ray

kyuds · 2026-02-26T08:16:52Z

Description

Progress for removing ExecutionPlan.

Moved schema and meta_count to the Dataset class. Schema retrieval is separated into two parts: _base_schema and schema. The motivation is that certain operations need the underlying schema class, while the public API returns the Ray-wrapped schema.
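To illustrate the split described above, here is a minimal sketch with toy classes. It is not Ray's implementation: `ToyDataset`, `WrappedSchema`, and the tuple-based `BaseSchema` are hypothetical stand-ins, and only the method names `_base_schema` and `schema` come from this PR.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for the underlying schema object:
# a tuple of (column name, type name) pairs.
BaseSchema = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class WrappedSchema:
    """Hypothetical user-facing wrapper around the underlying schema."""

    base: BaseSchema

    @property
    def names(self) -> List[str]:
        return [name for name, _ in self.base]


class ToyDataset:
    """Toy class for illustration only; not Ray's Dataset."""

    def __init__(self, base_schema: Optional[BaseSchema]):
        self._cached = base_schema

    def _base_schema(self) -> Optional[BaseSchema]:
        # Internal callers get the raw, unwrapped schema object.
        return self._cached

    def schema(self) -> Optional[WrappedSchema]:
        # Public callers get the wrapper, built on top of _base_schema().
        base = self._base_schema()
        return None if base is None else WrappedSchema(base)
```

In this sketch the public method is a thin layer over the internal one, so internal code that needs the raw schema object never has to unwrap anything.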

Currently there is a two-way binding between Dataset and ExecutionPlan due to repr operations. These will move to the Dataset in subsequent PRs.

Related issues

#60358

Additional information

N/A

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

gemini-code-assist

Code Review

This pull request is a good step towards refactoring ExecutionPlan by moving the schema and meta_count methods to the Dataset class. The changes are well-contained and correctly update all relevant call sites. The new implementations in Dataset preserve the original logic while making the code cleaner, for example by using self.limit(1) in the schema method. The modifications across plan.py, dataset.py, dataset_repr.py, and base_trainer.py are consistent with this goal. Overall, this is a solid refactoring that improves code organization.
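The review mentions `self.limit(1)` in the schema method. A rough sketch of that idea, assuming the schema can be inferred from a single row, might look like the following. `ToyDataset`, `iter_rows`, and the `rows_read` counter are hypothetical and are not Ray's API; only the `limit(1)` call is taken from the review text.

```python
from typing import Any, Dict, Iterator, List, Optional


class ToyDataset:
    """Hypothetical lazy dataset for illustration; not Ray's Dataset class."""

    def __init__(self, rows: List[Dict[str, Any]], limit: Optional[int] = None):
        self._rows = rows
        self._limit = limit
        self.rows_read = 0  # counts rows materialized on behalf of this instance

    def limit(self, n: int) -> "ToyDataset":
        # Lazy: only records the limit, reads nothing yet.
        return ToyDataset(self._rows, n)

    def iter_rows(self) -> Iterator[Dict[str, Any]]:
        stop = len(self._rows) if self._limit is None else self._limit
        for row in self._rows[:stop]:
            self.rows_read += 1
            yield row

    def schema(self) -> Optional[Dict[str, str]]:
        # Read at most one row and infer column names and Python type names from it.
        limited = self.limit(1)
        first = next(limited.iter_rows(), None)
        self.rows_read += limited.rows_read
        if first is None:
            return None
        return {name: type(value).__name__ for name, value in first.items()}
```

The point of the sketch is only that limiting to one row keeps schema inference cheap; how Ray actually infers and caches the schema is not shown in this thread.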


github-actions · 2026-03-13T00:50:25Z

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

bveeramani · 2026-03-25T18:58:21Z

+                    schema = ds._base_schema(fetch_if_missing=False)
                if count is None:
-                    count = plan.meta_count()
+                    count = ds._meta_count()


This temporary bidirectional coupling between Dataset and ExecutionPlan doesn't seem ideal. Are we planning to move get_plan_as_string, initial_num_blocks, and input_files to Dataset soon?

yes this will go away soon
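For context on the hunk quoted above: the repr path reads the schema with `fetch_if_missing=False` and takes the count from `ds._meta_count()`. A minimal sketch of that pattern, assuming both calls return `None` when nothing is cached, is shown below. The method names mirror the hunk, but the bodies, the `executions` counter, `build_repr`, and the output format are invented for illustration.

```python
from typing import Dict, Optional


class ToyDataset:
    """Hypothetical stand-in; method names mirror the hunk, bodies are invented."""

    def __init__(
        self,
        cached_schema: Optional[Dict[str, str]],
        cached_count: Optional[int],
    ):
        self._cached_schema = cached_schema
        self._cached_count = cached_count
        self.executions = 0  # how many times the (pretend) pipeline ran

    def _base_schema(self, fetch_if_missing: bool = True) -> Optional[Dict[str, str]]:
        if self._cached_schema is None and fetch_if_missing:
            self.executions += 1  # stands in for executing the dataset
            self._cached_schema = {"id": "int64"}
        return self._cached_schema

    def _meta_count(self) -> Optional[int]:
        # Metadata only: returns None rather than counting rows.
        return self._cached_count


def build_repr(ds: ToyDataset) -> str:
    # Building a repr should never trigger execution, so only cached values are used.
    schema = ds._base_schema(fetch_if_missing=False)
    count = ds._meta_count()
    count_str = "?" if count is None else str(count)
    if schema is None:
        schema_str = "Unknown schema"
    else:
        schema_str = "{" + ", ".join(f"{k}: {v}" for k, v in schema.items()) + "}"
    return f"Dataset(num_rows={count_str}, schema={schema_str})"
```

Here `build_repr` takes the dataset rather than a plan, which is the direction the reply above describes for the remaining repr helpers.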


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


kyuds · 2026-03-30T04:44:06Z

The Buildkite failure is a pip "dependency resolution is too deep" error. This PR doesn't change any setup.py or requirements.txt files, so I'll wait a bit, merge the latest master, and then retry CI.

kyuds · 2026-03-31T04:56:43Z

waiting on #62208 to be merged

## Description

Progress for removing `ExecutionPlan`.

Moved `schema` and `meta_count` to the `Dataset` class. Schema retrieval is separated into two parts: `_base_schema` and `schema`. The motivation is that certain operations need the underlying schema class, while the public API returns the Ray-wrapped schema.

Currently there is a two-way binding between Dataset and ExecutionPlan due to repr operations. These will move to the Dataset in subsequent PRs.

## Related issues

ray-project#60358

## Additional information

N/A

---------

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Frank Mancina <fmancina@haproxy.com>

move schema and metacount to dataset

b029901

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

kyuds requested review from a team as code owners February 26, 2026 08:16

kyuds added the go add ONLY when ready to merge, run all tests label Feb 26, 2026

kyuds changed the title ~~[Data][2/N] Move schema and meta_count to dataset~~ [Data][2/N] Move schema and meta_count to dataset Feb 26, 2026

kyuds requested a review from bveeramani February 26, 2026 08:18

gemini-code-assist bot reviewed Feb 26, 2026


cursor bot reviewed Feb 26, 2026


Comment thread python/ray/data/_internal/plan.py Outdated

Comment thread python/ray/data/dataset.py Outdated

reflect review comments

e7adc79

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

cursor bot reviewed Feb 26, 2026


Comment thread python/ray/data/_internal/plan.py Outdated

don't execute

3e9efb3

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

cursor bot reviewed Feb 26, 2026


Comment thread python/ray/data/_internal/plan.py Outdated

Comment thread python/ray/data/dataset.py Outdated

separate function out

ac49144

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

ray-gardener bot added the community-contribution Contributed by the community label Feb 26, 2026

github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Mar 13, 2026

kyuds removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Mar 17, 2026

Merge branch 'master' into schema-meta-move

1970214

bveeramani approved these changes Mar 25, 2026


Comment thread python/ray/data/dataset.py Outdated

Comment thread python/ray/data/dataset.py

merge master

5dab7ae

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

cursor bot reviewed Mar 28, 2026


Comment thread python/ray/data/dataset.py Outdated

kyuds added 2 commits March 28, 2026 15:13

streaming split handle

8b6c9cb

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

retrigger docs build

bdcb82a

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

bveeramani reviewed Mar 29, 2026


Comment thread python/ray/data/_internal/plan.py Outdated

kyuds added 2 commits March 28, 2026 23:36

Merge branch 'master' into schema-meta-move

e9ae6b6

reflect review comment

4799eae

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

kyuds requested a review from bveeramani March 29, 2026 06:42

cursor bot reviewed Mar 29, 2026


Comment thread python/ray/data/_internal/plan.py Outdated

fix bug

3273625

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

bveeramani approved these changes Mar 29, 2026


retrigger ci

176de53

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>

Merge branch 'master' into schema-meta-move

8b4ce6a

kyuds added the data Ray Data-related issues label Mar 30, 2026

Merge branch 'master' into schema-meta-move

0015c76

liulehui approved these changes Mar 31, 2026


bveeramani merged commit 6dadfdf into ray-project:master Mar 31, 2026
6 checks passed

kyuds deleted the schema-meta-move branch March 31, 2026 23:07

bveeramani merged 14 commits into ray-project:master from kyuds:schema-meta-move


3 participants
