[Tensorboard] Write summaries to S3 or GCS bucket #24468

Open
eisenjulian opened this issue Aug 16, 2019 · 16 comments
Assignees
Labels
feature A request for a proper, new feature. module: tensorboard triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@eisenjulian

🚀 Feature

When creating a SummaryWriter, specifying a path in an S3 or GCS bucket should write directly to the bucket instead of the local filesystem.

Motivation

In both TensorFlow and tensorboardX you can specify s3:// or gs:// paths as your logdir, which greatly simplifies distributed training and monitoring. You can also launch TensorBoard from your local machine or a notebook and point it directly at the bucket, so there is no need to launch a machine to share results: you just share the URL of the results inside the bucket.

Additional context

tensorboardX implementation https://github.com/lanpa/tensorboardX/blob/master/tensorboardX/record_writer.py#L57

@jeffreyksmithjr added the feature, module: tensorboard, and triaged labels on Aug 19, 2019
@lanpa
Collaborator

lanpa commented Nov 4, 2019

lanpa/tensorboardX#528

@lanpa
Collaborator

lanpa commented Nov 4, 2019

@orionr @sanekmelnikov Should we merge these two in 1.4?

@orionr
Contributor

orionr commented Nov 4, 2019

The torch.utils.tensorboard implementation uses a writer in core TensorBoard that supports GCS and S3 if TF is installed, and only S3 if it is not. Adding GCS in that second case would be great. You would need to add a new GCSFileSystem around https://github.com/tensorflow/tensorboard/blob/master/tensorboard/compat/tensorflow_stub/io/gfile.py#L206
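
For readers unfamiliar with the idea, here is a minimal sketch of prefix-based filesystem dispatch, which is the general pattern a new GCSFileSystem would plug into. Everything in it is an assumption made for illustration: the names `InMemoryBucketFileSystem`, `register_filesystem`, and `get_filesystem` are hypothetical and are not claimed to match the linked gfile.py, and the class keeps data in a dict where a real implementation would call a cloud storage client.

```python
from typing import Dict


class InMemoryBucketFileSystem:
    """Hypothetical stand-in for a GCS-backed filesystem.

    Blobs live in a dict; a real implementation would talk to a storage client.
    """

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def exists(self, path: str) -> bool:
        return path in self._blobs

    def append(self, path: str, data: bytes) -> None:
        # Event files grow over time, so appending is the common operation.
        self._blobs[path] = self._blobs.get(path, b"") + data

    def read(self, path: str) -> bytes:
        return self._blobs[path]


# Hypothetical registry mapping a URL prefix ("gs", "s3", ...) to a filesystem.
_REGISTERED: Dict[str, InMemoryBucketFileSystem] = {}


def register_filesystem(prefix: str, filesystem: InMemoryBucketFileSystem) -> None:
    _REGISTERED[prefix] = filesystem


def get_filesystem(path: str) -> InMemoryBucketFileSystem:
    # Paths without "://" map to the empty prefix (the local filesystem slot).
    prefix = path.split("://", 1)[0] if "://" in path else ""
    filesystem = _REGISTERED.get(prefix)
    if filesystem is None:
        raise ValueError(f"No filesystem registered for prefix {prefix!r}")
    return filesystem


EVENTS = "gs://example-bucket/run1/events.out"
register_filesystem("gs", InMemoryBucketFileSystem())
fs = get_filesystem(EVENTS)
fs.append(EVENTS, b"step-1;")
fs.append(EVENTS, b"step-2;")
```

With a registry like this, the writer never needs to know which backend it is talking to; supporting another store is a matter of registering one more prefix.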

@amatsukawa

@orionr if I do have tensorflow installed, what steps should I follow to make this work? Simply having it installed doesn't seem to do the trick: the gs://proj/... URL is just interpreted as the path gs: -> proj -> ... on local disk.
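
The symptom described above can be reproduced with the standard library alone. This is a small illustrative sketch, not TensorBoard code: `describe_logdir` is a hypothetical helper, and `gs://proj/runs/exp1` is a made-up logdir. It contrasts how a URL-aware reader splits the string with how plain path handling collapses it into nested directories.

```python
import posixpath
from urllib.parse import urlsplit


def describe_logdir(logdir: str) -> dict:
    """Show a logdir both as a URL and as a plain POSIX path."""
    parts = urlsplit(logdir)
    return {
        "scheme": parts.scheme,         # "gs", "s3", or "" for a plain path
        "bucket": parts.netloc,         # bucket name when a scheme is present
        "key": parts.path.lstrip("/"),  # object prefix inside the bucket
        # What path-only handling sees: the double slash collapses and
        # "gs:" becomes an ordinary directory name.
        "as_local_path": posixpath.normpath(logdir),
    }


info = describe_logdir("gs://proj/runs/exp1")
print(info)
```

If a directory literally named `gs:` shows up in the working directory, the writer in use did not recognize the scheme.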

@orionr
Contributor

orionr commented Mar 16, 2020

@amatsukawa, is everything installed in the same conda environment or virtualenv? You should be able to confirm that TensorBoard is returning tensorflow at https://github.com/tensorflow/tensorboard/blob/master/tensorboard/compat/__init__.py#L52
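
One way to check the environment question is to ask the running interpreter what it can see. The sketch below uses only the standard library; `package_report` is a hypothetical helper name. It assumes the distribution name matches the import name, which is not always true (for example, a TensorFlow variant published under a different distribution name), so it falls back to "unknown" in that case.

```python
import sys
from importlib import metadata, util
from typing import Dict, Iterable, Optional


def package_report(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each top-level module name to its version, "unknown", or None."""
    report: Dict[str, Optional[str]] = {}
    for name in names:
        if util.find_spec(name) is None:
            report[name] = None  # not importable from this interpreter
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution is registered under this name.
            report[name] = "unknown"
    return report


# Run this with the same interpreter that runs the training script.
print("interpreter prefix:", sys.prefix)
print(package_report(["tensorflow", "tensorboard", "torch"]))
```

A `None` entry for tensorflow means the interpreter that runs training cannot import it, even if it is installed in some other environment.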

@amatsukawa

After some tinkering, this seems to only work if you have tensorflow 2.1 installed, but not with tensorflow 1.14, which is what I had.

@amatsukawa

For folks encountering this later: after further experimentation, it seems this also works with the last v1 release of TF (v1.15), if you prefer to stay on TF1.

@JulianFerry

Would it be possible to have a version of this which doesn't require TensorFlow to be installed? Maybe an implementation with google-cloud-storage, since this is considerably lighter than TF. The existence of a backend could be checked when torch.utils.tensorboard is imported, for instance. What do you think?
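
A rough sketch of the import-time backend check suggested above, using only the standard library. The helper `first_importable` and the candidate ordering are assumptions made for illustration; this is not how torch.utils.tensorboard currently behaves.

```python
import importlib
from typing import Iterable, Optional


def first_importable(candidates: Iterable[str]) -> Optional[str]:
    """Return the first module name that imports cleanly, or None."""
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            continue  # backend not installed, try the next one
        return name
    return None


# Hypothetical preference order: the lighter client first, TensorFlow as fallback.
GCS_BACKEND_CANDIDATES = ("google.cloud.storage", "tensorflow")


def detect_gcs_backend() -> Optional[str]:
    # Deliberately a function rather than module-level code, so heavy
    # packages are only imported when a gs:// path is actually requested.
    return first_importable(GCS_BACKEND_CANDIDATES)
```

Deferring the check until a gs:// logdir is requested keeps the plain local-disk case free of any extra import cost.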

@LarsDu

LarsDu commented Mar 22, 2023

Any progress on this? Having to install TensorFlow solely to get logging to GCS is a bit ridiculous, yet somehow S3 is supported out of the box.

@orionr
Contributor

orionr commented Mar 22, 2023

I think if anybody is willing to do the work as detailed here, we'd be happy to take some PRs:

The torch.utils.tensorboard implementation uses a writer in core TensorBoard that supports GCS and S3 if TF is installed, and only S3 if it is not. Adding GCS in that second case would be great. You would need to add a new GCSFileSystem around https://github.com/tensorflow/tensorboard/blob/master/tensorboard/compat/tensorflow_stub/io/gfile.py#L206

cc @Reubend

@tarrade

tarrade commented Mar 25, 2023

@LarsDu, I was having the same issue with many tools that don't support S3 or GCS buckets. The solution I found on GCP is to use gcsfuse, so that the GCS bucket is seen as a local directory.

Many GCP tools already have gcsfuse pre-installed (Vertex AI pipelines...). I don't see performance issues when logging summaries for TensorBoard. I imagine an equivalent exists for S3 on AWS.

@LarsDu

LarsDu commented Mar 25, 2023

I wound up using tensorboardX for logging, since it's much lighter weight than a TensorBoard install with its many problematic dependencies.

@LarsDu

LarsDu commented Apr 6, 2023

Correction: tensorboardX also has problematic dependencies. I gave up and am trying to include tensorflow in my project, but this makes stakeholders incredibly nervous

@orionr
Contributor

orionr commented Apr 6, 2023

gcsfuse, as @tarrade called out, looks like a great option here, @LarsDu.

@LarsDu

LarsDu commented Apr 7, 2023

gcsfuse is even more problematic: there are permission settings that need to be considered for the Kubernetes cluster from which we run our experiments. Ultimately, I ended up simply installing tensorboard + tensorflow in our project.

I've filed this issue with tensorboard: tensorflow/tensorboard#6298

@Reubend

Reubend commented Apr 7, 2023

gcsfuse is probably not something we should include here since it's so specific to GCP, but PyTorch Lightning recently implemented fsspec inside of TB:
https://lightning.ai/docs/pytorch/stable/common/remote_fs.html
fsspec gives you support for S3, Azure, GCP, etc. "for free" by providing a generic interface to all of them. Maybe we could move the Lightning implementation here instead of keeping it a Lightning-specific feature?


9 participants