[Data] Simplify execution callback lifecycle fix Diff #1 #60480

Merged
bveeramani merged 11 commits into ray-project:master from
limarkdcunha:task/simplify-execution-callback-lifecycle
Feb 24, 2026

Conversation

@limarkdcunha
Contributor

@limarkdcunha limarkdcunha commented Jan 24, 2026

Description

Larger Issue

Refactored the execution callback system from dynamic, per-job injection to static, eager initialization. DataContext now serves as a stateless registry of callback factories, enabling the StreamingExecutor to load all callbacks at startup. Each callback then self-configures or no-ops based on the runtime environment.
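A minimal sketch of the idea, not the actual Ray Data code: a registry stores callback factories (here, plain classes), the executor instantiates every registered callback at startup, and each callback decides for itself whether to act. All names below (`CallbackRegistry`, `ToyExecutor`, `ToyCheckpointCallback`, `checkpoint_path`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional


class ExecutionCallback:
    """Hypothetical base class; every hook defaults to a no-op."""

    def before_execution_starts(self, executor: "ToyExecutor") -> None:
        pass


class ToyCheckpointCallback(ExecutionCallback):
    """Self-configures from the executor, or does nothing if unconfigured."""

    def __init__(self) -> None:
        self.loaded_from: Optional[str] = None

    def before_execution_starts(self, executor: "ToyExecutor") -> None:
        if executor.checkpoint_path is None:
            return  # Checkpointing is not configured, so this is a no-op.
        self.loaded_from = executor.checkpoint_path


class CallbackRegistry:
    """Stands in for the context: holds factories, never live instances."""

    def __init__(self) -> None:
        self._factories: List[Callable[[], ExecutionCallback]] = []

    def register(self, factory: Callable[[], ExecutionCallback]) -> None:
        if factory not in self._factories:
            self._factories.append(factory)

    def factories(self) -> List[Callable[[], ExecutionCallback]]:
        return list(self._factories)


class ToyExecutor:
    """Eagerly builds one fresh instance of every registered callback."""

    def __init__(
        self, registry: CallbackRegistry, checkpoint_path: Optional[str] = None
    ) -> None:
        self.checkpoint_path = checkpoint_path
        self.callbacks = [factory() for factory in registry.factories()]

    def start(self) -> None:
        for callback in self.callbacks:
            callback.before_execution_starts(self)
```

Because the registry holds factories rather than instances, each executor gets its own callback objects, so no state leaks between jobs.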

Context

This is Part 1 of a four-part change set intended to resolve the issue described above.

Closes #60279.

@limarkdcunha
Contributor Author

@bveeramani review please

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the execution callback system in Ray Data, moving from a dynamic, per-job injection model to a static, eager initialization approach. The DataContext now serves as a registry for callback factories, and the StreamingExecutor loads all callbacks at startup. This simplifies the callback lifecycle and makes the system more declarative. The LoadCheckpointCallback is also updated to fit this new model, becoming a no-op when checkpointing is not configured. The changes are logical and well-implemented. I have one suggestion to improve the robustness of exception handling when loading callbacks from environment variables.
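The review does not spell out the suggested change, so the following is only one plausible shape for more robust loading, under stated assumptions: callbacks are listed in an environment variable as comma-separated `module.ClassName` paths, and a failed import is re-raised with the offending path attached. The variable name `EXAMPLE_EXECUTION_CALLBACKS` and the function name are hypothetical and are not taken from Ray.

```python
import importlib
import os
from typing import List

# Hypothetical variable name, used only for this sketch.
ENV_VAR = "EXAMPLE_EXECUTION_CALLBACKS"


def load_callback_classes_from_env(env_var: str = ENV_VAR) -> List[type]:
    """Resolve comma-separated 'module.ClassName' paths into classes."""
    raw = os.environ.get(env_var, "")
    classes: List[type] = []
    for path in (part.strip() for part in raw.split(",")):
        if not path:
            continue  # Tolerate empty entries such as a trailing comma.
        module_name, _, class_name = path.rpartition(".")
        try:
            module = importlib.import_module(module_name)
            classes.append(getattr(module, class_name))
        except (ImportError, AttributeError, ValueError) as exc:
            # Name the failing entry so a misconfiguration is easy to trace.
            raise ValueError(
                f"Could not load callback {path!r} from ${env_var}"
            ) from exc
    return classes
```

Catching only the exceptions that import and attribute lookup can raise, instead of a bare `except`, keeps unrelated failures visible.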

@bveeramani
Member

@owenowenisme would you mind doing a first pass on this? I think we should pay special attention to the checkpointing changes and ensuring there are no new correctness issues

Member

@bveeramani bveeramani left a comment

Some high-level feedback: two disadvantages of requiring all subclasses to share the same constructor are: (1) it makes it less explicit what the dependencies of that subclass are, and (2) it makes it harder to test the subclass because you can’t directly inject fake dependencies.

An alternative pattern would be to introduce a required factory class method like from_executor(cls, executor), and use that when constructing the callbacks. Then, subclasses can still define whatever dependencies they need in the constructor.
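A rough sketch of that pattern, with hypothetical class names and a hypothetical `checkpoint_path` attribute on the executor: the constructor names its real dependency, and `from_executor` is the only place that knows how to pull it out of an executor.

```python
from typing import Optional


class ToyExecutor:
    """Hypothetical stand-in for an executor that may carry checkpoint config."""

    def __init__(self, checkpoint_path: Optional[str] = None) -> None:
        self.checkpoint_path = checkpoint_path


class ExecutionCallback:
    """Hypothetical base class; subclasses must provide from_executor."""

    @classmethod
    def from_executor(cls, executor: ToyExecutor) -> "ExecutionCallback":
        raise NotImplementedError


class ToyCheckpointCallback(ExecutionCallback):
    def __init__(self, checkpoint_path: Optional[str]) -> None:
        # The dependency is explicit in the constructor signature.
        self.checkpoint_path = checkpoint_path

    @classmethod
    def from_executor(cls, executor: ToyExecutor) -> "ToyCheckpointCallback":
        # Extraction from the executor lives in one place.
        return cls(checkpoint_path=executor.checkpoint_path)

    def is_enabled(self) -> bool:
        return self.checkpoint_path is not None
```

In a test, the callback can be built directly, for example `ToyCheckpointCallback(checkpoint_path="/fake/path")`, without constructing an executor, while production code would go through `ToyCheckpointCallback.from_executor(executor)`.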

Also, make sure the other callback tests still pass? (If you haven’t already)

@limarkdcunha limarkdcunha force-pushed the task/simplify-execution-callback-lifecycle branch from 6f87e23 to 6ac4e42 Compare January 29, 2026 23:47
@limarkdcunha limarkdcunha marked this pull request as ready for review January 30, 2026 00:25
@limarkdcunha limarkdcunha requested a review from a team as a code owner January 30, 2026 00:25
@ray-gardener ray-gardener bot added data Ray Data-related issues community-contribution Contributed by the community labels Jan 30, 2026
@limarkdcunha limarkdcunha force-pushed the task/simplify-execution-callback-lifecycle branch from 9b0e0db to 7b8c9dc Compare February 3, 2026 16:55
@limarkdcunha
Contributor Author

@bveeramani @owenowenisme another review please

Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
@limarkdcunha limarkdcunha force-pushed the task/simplify-execution-callback-lifecycle branch from 824de72 to bb435d8 Compare February 20, 2026 16:44
@bveeramani bveeramani enabled auto-merge (squash) February 23, 2026 20:53
@github-actions github-actions bot disabled auto-merge February 23, 2026 21:24
@bveeramani bveeramani enabled auto-merge (squash) February 24, 2026 00:47
@github-actions github-actions bot disabled auto-merge February 24, 2026 00:50
@bveeramani bveeramani enabled auto-merge (squash) February 24, 2026 00:54
@bveeramani bveeramani added the go add ONLY when ready to merge, run all tests label Feb 24, 2026
@github-actions github-actions bot disabled auto-merge February 24, 2026 19:54
bveeramani and others added 2 commits February 24, 2026 12:06
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@bveeramani bveeramani merged commit 6750300 into ray-project:master Feb 24, 2026
6 checks passed
@limarkdcunha limarkdcunha deleted the task/simplify-execution-callback-lifecycle branch February 24, 2026 21:38
@limarkdcunha limarkdcunha self-assigned this Feb 24, 2026
bveeramani pushed a commit that referenced this pull request Feb 25, 2026
…d of instances (#61293)

Larger Issue
Refactored the execution callback system from dynamic, per-job injection
to static, eager initialization. DataContext now serves as a stateless
registry of callback factories, enabling the StreamingExecutor to load
all callbacks at startup. Each callback then self-configures or no-ops
based on the runtime environment.

### Context
This is Part 2 of a four-part change set intended to resolve the issue
described above.

Part 1 diff: #60480

Closes #60279.

---------

Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
Labels

community-contribution: Contributed by the community
data: Ray Data-related issues
go: add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Data] Simplify execution callback lifecycle

2 participants