@iamjustinhsu iamjustinhsu commented Apr 24, 2025

Why are these changes needed?

Allows users to manipulate the output column names.

Related issue number

Closes #52282

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@iamjustinhsu iamjustinhsu requested a review from a team as a code owner April 24, 2025 20:32
@iamjustinhsu iamjustinhsu changed the title Allow user column names read api [Data] Allow user column names read api Apr 24, 2025
@iamjustinhsu iamjustinhsu force-pushed the jhsu/read-api-column-names branch from a2f70e3 to 8926984 Compare April 24, 2025 21:58
@iamjustinhsu iamjustinhsu requested a review from bveeramani April 24, 2025 21:59
Signed-off-by: jhsu <jhsu@anyscale.com>
@iamjustinhsu iamjustinhsu force-pushed the jhsu/read-api-column-names branch from 640e621 to e2014b8 Compare April 24, 2025 23:21
@iamjustinhsu iamjustinhsu changed the title [Data] Allow user column names read api [WIP] [Data] Allow user column names read api Apr 25, 2025
Signed-off-by: jhsu <jhsu@anyscale.com>
@hainesmichaelc hainesmichaelc added the community-contribution label (Contributed by the community) Apr 28, 2025
@mascharkh mascharkh added the data (Ray Data-related issues) and usability labels Apr 28, 2025
Signed-off-by: jhsu <jhsu@anyscale.com>
@iamjustinhsu iamjustinhsu force-pushed the jhsu/read-api-column-names branch from 32d92a3 to 235c2f7 Compare April 29, 2025 22:41
Signed-off-by: jhsu <jhsu@anyscale.com>
…i-column-names

Signed-off-by: jhsu <jhsu@anyscale.com>
@iamjustinhsu iamjustinhsu changed the title [WIP] [Data] Allow user column names read api [Data] Allow user column names read api Apr 30, 2025
Comment on lines 166 to 167
Column name defaults to "item".

Here and elsewhere -- this isn't describing the example(s), so IMO makes more sense to move it above to the main description.

For example:

    """Creates a :class:`~ray.data.Dataset` from image files.

    Column name defaults to "image".

    ...

https://anyscale-ray--52587.com.readthedocs.build/en/52587/data/api/doc/ray.data.read_images.html#ray.data.read_images
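
A minimal sketch of the docstring layout suggested above, assuming a Google-style docstring with an `Examples:` section. `read_images_stub` is a hypothetical stand-in written for illustration, not the real `ray.data.read_images`; it only shows the default-column note sitting in the main description rather than among the examples.

```python
import inspect


def read_images_stub(paths):
    """Creates a dataset from image files.

    Column name defaults to "image".

    Examples:
        >>> rows = read_images_stub(["a.png"])
        >>> list(rows[0])
        ['image']
    """
    # Hypothetical stand-in: returns one row per path under the "image" key.
    return [{"image": path} for path in paths]


doc = inspect.getdoc(read_images_stub)
# The default-column note appears in the main description, ahead of the examples.
print(doc.index("Column name defaults") < doc.index("Examples:"))
```

With this layout, a reader sees the default column name before reaching any example, which seems to be the intent of the comment.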

[0, 0]])}, {'data': array([[2, 2],
[2, 2]])}]
Colum name defaults to data.

Suggested change
Colum name defaults to data.
Column name defaults to "data".

iamjustinhsu and others added 8 commits May 1, 2025 13:24
Signed-off-by: jhsu <jhsu@anyscale.com>
Signed-off-by: jhsu <jhsu@anyscale.com>
…i-column-names

Signed-off-by: jhsu <jhsu@anyscale.com>

## Why are these changes needed?
Explicitly document the default column names.

## Related issue number


## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: jhsu <jhsu@anyscale.com>
Signed-off-by: jhsu <jhsu@anyscale.com>
Signed-off-by: jhsu <jhsu@anyscale.com>
Signed-off-by: jhsu <jhsu@anyscale.com>
Signed-off-by: jhsu <jhsu@anyscale.com>
Signed-off-by: jhsu <jhsu@anyscale.com>
@aslonnie aslonnie removed request for a team May 3, 2025 05:47
…i-column-names

Signed-off-by: jhsu <jhsu@anyscale.com>
@iamjustinhsu iamjustinhsu changed the title [Data] Allow user column names read api [Data] user column names read api doc changes May 6, 2025
@bveeramani bveeramani enabled auto-merge (squash) May 6, 2025 16:38
@github-actions github-actions bot added the go label (add ONLY when ready to merge, run all tests) May 6, 2025
@bveeramani bveeramani disabled auto-merge May 6, 2025 17:35
bveeramani added 2 commits May 6, 2025 10:52
…roject/ray into jhsu/read-api-column-names

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani enabled auto-merge (squash) May 6, 2025 18:03
@bveeramani bveeramani merged commit 2bcea58 into master May 6, 2025
6 checks passed
@bveeramani bveeramani deleted the jhsu/read-api-column-names branch May 6, 2025 19:01

Labels

community-backlog, community-contribution (Contributed by the community), data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests), usability

Development

Successfully merging this pull request may close these issues.

[Data] allow user to specify column names in read APIs