[part 1/2] [train] Add metadata argument to Trainer #38481

ericl · 2023-08-15T23:11:02Z

Why are these changes needed?

This implements the feature: a metadata argument on the Trainer. Docs will follow in part 2; I'm splitting them out because merging part 1 alone will unblock other issues.

Related issue number

Part of #38288

Signed-off-by: Eric Liang <ekhliang@gmail.com>

ericl · 2023-08-15T23:11:20Z

python/ray/air/util/check_ingest.py

@@ -69,6 +71,8 @@ def make_train_loop(
        def train_loop_per_worker():
            import pandas as pd

+            print("Session metadata", train.get_context().get_metadata())


This just updates the example class to print the new session metadata.

ericl · 2023-08-15T23:12:24Z

python/ray/train/base_trainer.py


        def train_func(config):
+            assert metadata, metadata
+            # Propagate user metadata from the Trainer constructor.


I'm not very happy about this hack, but it's much cleaner than trying to propagate this dict through all the tune function wrapper layers.
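For readers following along, here is a minimal sketch of the trade-off described above, assuming the hack amounts to letting the inner function see the dict from its enclosing scope instead of passing it as an argument. The diff excerpt is truncated, so this is only one reading of it; `make_train_func` and `wrapper_layer` are hypothetical names, not Ray APIs.

```python
def make_train_func(metadata):
    """Build a train function that sees `metadata` via its closure."""

    def train_func(config):
        # `metadata` is captured from the enclosing scope, so none of the
        # wrapper layers below need to know that it exists.
        return {"config": config, "metadata": metadata}

    return train_func


def wrapper_layer(fn):
    """Stand-in for one function wrapper layer; it only forwards `config`."""

    def wrapped(config):
        return fn(config)

    return wrapped


# Two wrapper layers, neither of which has a metadata parameter.
train_func = wrapper_layer(wrapper_layer(make_train_func({"run": "demo"})))
result = train_func({"lr": 0.1})
print(result)
```

The alternative would be to add a metadata parameter to every wrapper's signature, which is the propagation the comment wants to avoid.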

pcmoritz · 2023-08-16T21:11:22Z

python/ray/train/base_trainer.py

+            try:
+                self.metadata = json.loads(json.dumps(self.metadata))
+            except Exception as e:
+                raise ValueError(


pcmoritz

Probably @justinvyu can comment more on the implementation (though it looks very simple) -- the API and tests look great to me :)

justinvyu

Thanks, generally looks good to me. Just one suggestion about the multi-rank metadata setting and some nits.

python/ray/train/_internal/session.py

python/ray/train/base_trainer.py

justinvyu · 2023-08-16T22:49:45Z

python/ray/train/_internal/session.py

+        # Set additional user metadata from the Trainer.
+        if persisted_checkpoint and self.metadata:
+            user_metadata = persisted_checkpoint.get_metadata()
+            for k, v in self.metadata.items():
+                # Update keys not already set by the user. This gives user-set keys
+                # precedence over keys set at the Trainer level.
+                if k not in user_metadata:
+                    user_metadata[k] = v
+            persisted_checkpoint.set_metadata(user_metadata)
+


We will be setting the metadata many times here, once for each worker. Can we guard this with a rank check (e.g. only set metadata on rank 0 worker) and only set metadata once?

The other caveat here is that other trainers (xgb, lgbm, sklearn) don't have Train workers calling train.report, so we can't access train.get_context().get_world_rank() there.

ericl · 2023-08-17

Hmm, I feel like this would be more brittle given we are now supporting not reporting from rank 0. I don't think the performance impact here is measurable.

justinvyu · 2023-08-17

Ok, not too big of a deal
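To make the precedence rule in the diff concrete, here is a small standalone sketch of the merge with plain dicts standing in for the checkpoint's metadata. `merge_metadata` is a hypothetical helper, not part of Ray; the real code reads and writes through `get_metadata()` and `set_metadata()` on the persisted checkpoint.

```python
def merge_metadata(checkpoint_metadata, trainer_metadata):
    """Fill in Trainer-level keys without overriding keys already present."""
    merged = dict(checkpoint_metadata)
    for key, value in trainer_metadata.items():
        # Keys already set on the checkpoint take precedence over
        # keys supplied at the Trainer level.
        if key not in merged:
            merged[key] = value
    return merged


user_set = {"epoch": 3, "owner": "user"}
trainer_level = {"owner": "trainer", "dataset": "v1"}

once = merge_metadata(user_set, trainer_level)
twice = merge_metadata(once, trainer_level)
print(once)
print(once == twice)
```

In this sketch the merge is idempotent: applying it again with the same Trainer-level dict leaves the result unchanged, so repeating it once per worker costs extra writes but does not alter the final metadata.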

python/ray/train/_internal/backend_executor.py

justinvyu

Thanks, lgtm!

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>

Signed-off-by: Victor <vctr.y.m@example.com>

ericl added 3 commits August 15, 2023 14:34

initial commit

849f3a8

fix checkpoint

b628bd4

add unit test

0532579

Signed-off-by: Eric Liang <ekhliang@gmail.com>

ericl assigned pcmoritz and justinvyu Aug 15, 2023

ericl commented Aug 15, 2023

update

8e7216c

Signed-off-by: Eric Liang <ekhliang@gmail.com>

ericl commented Aug 15, 2023

Merge remote-tracking branch 'upstream/master' into add-metadata

81ad5b3

pcmoritz reviewed Aug 16, 2023

pcmoritz approved these changes Aug 16, 2023

justinvyu reviewed Aug 16, 2023

ericl added 4 commits August 17, 2023 11:35

review comments

3f755d5

Signed-off-by: Eric Liang <ekhliang@gmail.com>

Merge remote-tracking branch 'upstream/master' into add-metadata

0854ca7

revert check ingest file, fix test

529d802

Signed-off-by: Eric Liang <ekhliang@gmail.com>

lint

6800d44

Signed-off-by: Eric Liang <ekhliang@gmail.com>

justinvyu approved these changes Aug 17, 2023

ericl added 2 commits August 17, 2023 14:32

fix meta

de50c0d

Merge remote-tracking branch 'upstream/master' into add-metadata

b550321

ericl merged commit a4b1340 into ray-project:master Aug 18, 2023
41 of 46 checks passed

arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023

[part 1/2] [train] Add metadata argument to Trainer (ray-project#38481)

f8cc8e4

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>

vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023

[part 1/2] [train] Add metadata argument to Trainer (ray-project#38481)

34e3182

Signed-off-by: Victor <vctr.y.m@example.com>
