[train] New persistence mode: Add storage type telemetry #39286

Merged: 6 commits, Sep 6, 2023

Conversation

justinvyu (Contributor)

Why are these changes needed?

This adds storage telemetry back for the new persistence codepath. Only an allowlist of default pyarrow filesystem implementations is tracked; every custom filesystem is reported as "custom".
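The tagging rule described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the function name `classify_storage_type` and its parameters are hypothetical, and only the set of tracked names is taken from the snippet under review.

```python
# Allowlist copied from the snippet reviewed in this PR.
TRACKED_STORAGE_TYPES = {"local", "mock", "s3", "gcs", "abfs", "hdfs"}


def classify_storage_type(fs_type_name: str, custom_filesystem_provided: bool) -> str:
    """Hypothetical helper: return the telemetry tag for a storage filesystem.

    Assumption: `fs_type_name` is the pyarrow filesystem type name and
    `custom_filesystem_provided` is True when the user passed their own
    `storage_filesystem`.
    """
    # Any user-supplied filesystem is reported as "custom", even if its
    # type name happens to be on the allowlist.
    if custom_filesystem_provided:
        return "custom"
    # Default filesystems are reported by name only when allowlisted.
    if fs_type_name in TRACKED_STORAGE_TYPES:
        return fs_type_name
    return "custom"
```

Under these assumptions, a default S3 filesystem is tagged "s3", while the same filesystem passed explicitly by the user is tagged "custom".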

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
…age_telemetry

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
python/ray/air/_internal/usage.py
- 'custom' = All other storage schemes, which includes ALL cases where a
custom `storage_filesystem` is provided.
"""
whitelist = {"local", "mock", "s3", "gcs", "abfs", "hdfs"}
Contributor: Is there also `gs` as an alternative for `gcs`?

Contributor Author: Nope, the pyarrow filesystem type name is always `gcs`.
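
The distinction in this exchange is between the URI scheme a user writes and the filesystem type name that gets reported. A rough sketch of that idea, assuming a hand-written scheme table: `SCHEME_TO_TYPE_NAME` and `telemetry_tag_for_uri` are hypothetical names, and the table is an illustrative assumption rather than something read from pyarrow or Ray.

```python
from urllib.parse import urlparse

# Hypothetical table: URI scheme -> filesystem type name used for telemetry.
# Assumption: both "gs" and "gcs" schemes resolve to the same "gcs" type name.
SCHEME_TO_TYPE_NAME = {
    "": "local",      # plain paths such as /tmp/ray_results
    "file": "local",
    "s3": "s3",
    "gs": "gcs",
    "gcs": "gcs",
    "hdfs": "hdfs",
}


def telemetry_tag_for_uri(uri: str) -> str:
    """Return the telemetry tag for a storage URI; unknown schemes are "custom"."""
    scheme = urlparse(uri).scheme
    return SCHEME_TO_TYPE_NAME.get(scheme, "custom")
```

Because the tag is derived from the type name rather than the scheme, an allowlist keyed on type names needs only the single entry `gcs`.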

Comment on lines +955 to +957
if _use_storage_context():
    air_usage.tag_storage_type(experiments[0].storage)

Contributor: How come this is tracked down here rather than where the TODO was?

Contributor Author: I need the Experiment object to be initialized before I can access the StorageContext.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
@matthewdeng matthewdeng merged commit 056ad5d into ray-project:master Sep 6, 2023
85 of 93 checks passed
jonathan-anyscale pushed a commit to jonathan-anyscale/ray that referenced this pull request Sep 6, 2023
matthewdeng pushed a commit to matthewdeng/ray that referenced this pull request Sep 7, 2023
GeneDer pushed a commit that referenced this pull request Sep 7, 2023
…39368)

* fix tune_hvd_keras (#39223)

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* [train] New persistence mode: Add backwards compatibility support for `local_dir` (#39282)

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* [train] Add TrainContext.get_storage (#39281)

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* [train] New persistence mode: Deprecate experimental distributed checkpointing configs (#39279)

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* [Release] Fix `air_example_dolly_v2_lightning_fsdp_finetuning` with large cpu mem head node (#39263)

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

* Remove all "ray.init" cell output in the example notebooks. (#39283)

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

* [train] New persistence mode: Add storage type telemetry (#39286)

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* [air-doc] Rework experiment tracking docs for Torch trainer. (#38684)

* Rework experiment tracking for DDP trainer.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* [docs] add Developer Guides landing page (#39296)

Signed-off-by: Matthew Deng <matt@anyscale.com>

* [2.7][Doc] Clean up more Ray Train examples (#39284)

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

* disable Train examples with authentication for now. (#39358)

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

* [air] move doc_code and examples into respective libraries (#39298)

Signed-off-by: Matthew Deng <matt@anyscale.com>

---------

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: Matthew Deng <matt@anyscale.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Co-authored-by: Yunxuan Xiao <yunxuanx@anyscale.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
harborn pushed a commit to harborn/ray that referenced this pull request Sep 8, 2023
jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023
…#39286)

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Jim Thompson <jimthompson5802@gmail.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
…#39286)

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Victor <vctr.y.m@example.com>
2 participants