
[Data] Compute Expressions-struct field_by_index #62560

Open: wanadzhar913 wants to merge 5 commits into ray-project:master from wanadzhar913:data/compute-expressions-struct


@wanadzhar913 wanadzhar913 commented Apr 13, 2026

Description

Add index-based field access for struct namespace operations in Ray Data.

Changes

  • Add struct.field_by_index(field_index: int) in python/ray/data/namespace_expressions/struct_namespace.py
  • Extend struct.__getitem__ to accept both:
    • str (existing behavior): col("s").struct["field"]
    • int (new): col("s").struct[0]
  • Keep type inference for Arrow-backed struct dtypes
  • Add key-type validation for .struct[...]
  • Update Expr.struct examples in python/ray/data/expressions.py to show index-based usage
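
The key handling listed above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `StructNamespaceSketch` is a hypothetical stand-in that wraps a plain Python dict instead of a Ray Data expression, and the error message and exception type are assumptions. It only mirrors how `.struct[...]` could route `str` keys to name-based access and `int` keys to `field_by_index`.

```python
from typing import Any, Dict, Union


class StructNamespaceSketch:
    """Hypothetical stand-in for the struct namespace, wrapping one struct value as a dict."""

    def __init__(self, value: Dict[str, Any]) -> None:
        self._value = value

    def field(self, name: str) -> Any:
        # Name-based access (the existing behavior).
        return self._value[name]

    def field_by_index(self, field_index: int) -> Any:
        # Position-based access; dict insertion order stands in for struct field order.
        name = list(self._value)[field_index]
        return self._value[name]

    def __getitem__(self, key: Union[str, int]) -> Any:
        # Dispatch on the key type and reject anything that is neither str nor int.
        if isinstance(key, str):
            return self.field(key)
        if isinstance(key, int) and not isinstance(key, bool):
            return self.field_by_index(key)
        raise TypeError(
            f"struct key must be a str or an int, got {type(key).__name__}"
        )


s = StructNamespaceSketch({"name": "a", "score": 1.5})
print(s["score"], s[0])
```

In the real API the same two spellings would be `col("s").struct["field"]` and `col("s").struct[0]`, returning expressions rather than values.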

Related issues

Related to #58674

Additional information

  • python -m pytest -v --doctest-modules python/ray/data/namespace_expressions/struct_namespace.py
  • python -m pytest -v python/ray/data/tests/expressions/test_namespace_struct.py
  • python -m pytest -v --doctest-modules python/ray/data/expressions.py -k 'struct and Expr'

@wanadzhar913 wanadzhar913 requested a review from a team as a code owner April 13, 2026 15:48
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces index-based access for struct fields in Ray Data expressions, allowing users to retrieve fields by their position using both bracket notation and a new field_by_index method. The feedback suggests refining type checks to explicitly exclude boolean values from being treated as integer indices and adding validation to ensure indices are non-negative, preventing potential mismatches with the underlying PyArrow compute functions.
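
The two checks suggested in this review could look something like the sketch below. `validate_field_index` is a hypothetical helper name, and the choice of `TypeError` and `ValueError` is an assumption, not something confirmed by the PR. The point it illustrates is that `bool` is a subclass of `int` in Python, so a plain `isinstance(x, int)` check would let `True` and `False` through as indices 1 and 0.

```python
def validate_field_index(field_index: object) -> int:
    """Hypothetical helper: accept only non-negative, non-bool integers."""
    # bool is a subclass of int, so it has to be rejected explicitly.
    if isinstance(field_index, bool) or not isinstance(field_index, int):
        raise TypeError(
            f"field index must be an int, got {type(field_index).__name__}"
        )
    # Reject negative values instead of letting them wrap around Python-style.
    if field_index < 0:
        raise ValueError(f"field index must be non-negative, got {field_index}")
    return field_index


for candidate in (0, 2, True, -1, "0"):
    try:
        print(repr(candidate), "->", validate_field_index(candidate))
    except (TypeError, ValueError) as exc:
        print(repr(candidate), "rejected:", exc)
```

Running the validation before the index reaches the compute layer keeps the error message in terms the caller used, rather than whatever the underlying function reports.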

Comment thread python/ray/data/namespace_expressions/struct_namespace.py
Comment thread python/ray/data/namespace_expressions/struct_namespace.py
@wanadzhar913
Author

Hi @goutamvenkat-anyscale, could you help review this PR when you have a moment? Thanks!

@ray-gardener ray-gardener Bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Apr 13, 2026
@wanadzhar913 wanadzhar913 force-pushed the data/compute-expressions-struct branch from a31c717 to b7c69fa Compare April 15, 2026 02:53
Signed-off-by: wanadzhar913 <adzhar.faiq@gmail.com>
…oject/ray/pull/62560/changes#r3074173858)

- Explicitly exclude boolean types when checking for integer indices to avoid confusing behavior (https://github.com/ray-project/ray/pull/62560/changes#r3074173838)

Signed-off-by: wanadzhar913 <adzhar.faiq@gmail.com>
@wanadzhar913 wanadzhar913 force-pushed the data/compute-expressions-struct branch from b7c69fa to c0edef4 Compare April 21, 2026 05:29
Comment thread python/ray/data/expressions.py Outdated
Comment thread python/ray/data/tests/expressions/test_namespace_struct.py Outdated
Contributor

@goutamvenkat-anyscale goutamvenkat-anyscale left a comment

Thanks for the PR. Left a few comments.

@goutamvenkat-anyscale goutamvenkat-anyscale added the @author-action-required label (The PR author is responsible for the next step. Remove tag to send back to the reviewer.) May 12, 2026
…67648 by parameterizing test cases

Signed-off-by: wanadzhar913 <adzhar.faiq@gmail.com>
…51559 by readding doctests

Signed-off-by: wanadzhar913 <adzhar.faiq@gmail.com>
Signed-off-by: wanadzhar913 <adzhar.faiq@gmail.com>
@wanadzhar913
Author

Thanks so much for reviewing my PR! I have resolved your comments.

Contributor

@goutamvenkat-anyscale goutamvenkat-anyscale left a comment

thanks for the change!

@goutamvenkat-anyscale goutamvenkat-anyscale added the go label (add ONLY when ready to merge, run all tests) May 18, 2026
@goutamvenkat-anyscale goutamvenkat-anyscale enabled auto-merge (squash) May 18, 2026 05:41
@wanadzhar913
Author

Hi @goutamvenkat-anyscale, I'm not sure whether the CI failure is related to this PR. Let me know if I need to change anything. Thanks so much!


Labels

  • @author-action-required: The PR author is responsible for the next step. Remove tag to send back to the reviewer.
  • community-contribution: Contributed by the community
  • data: Ray Data-related issues
  • go: add ONLY when ready to merge, run all tests

2 participants