[Tensorboard] Write summaries to S3 or GCS bucket #24468
Comments
@orionr @sanekmelnikov Should we merge these two in 1.4? |
The |
@orionr if I do have tensorflow installed, what steps should I follow to make this work? Simply having it installed doesn't seem to do the trick, the |
@amatsukawa, is everything installed in the same conda environment or virtualenv? You should be able to confirm that TensorBoard is returning |
After some tinkering, this seems to only work if you have tensorflow 2.1 installed, but not with tensorflow 1.14, which is what I had. |
Just for folks encountering this later. After further experimentation, it seems this will work with the last v1 release of TF as well (v1.15), if you prefer to have TF1. |
Would it be possible to have a version of this which doesn't require TensorFlow to be installed? Maybe an implementation with |
Any progress on this? Having to install tensorflow solely to get logging to GCS is a bit ridiculous, yet somehow S3 is supported out of the box |
I think if anybody is willing to do the work as detailed here, we'd be happy to take some PRs: The torch.utils.tensorboard implementation uses a writer in core TensorBoard that supports GCS and S3 if TF is installed, and only S3 if it is not. Adding GCS in that second case would be great. You would need to add a new GCSFileSystem around https://github.com/tensorflow/tensorboard/blob/master/tensorboard/compat/tensorflow_stub/io/gfile.py#L206 cc @Reubend |
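For anyone picking this up, here is a rough sketch of the general shape such a change might take: a registry that maps a path prefix (`gs`, `s3`, or none for local paths) to a filesystem object, plus a new class for the missing prefix. This is not TensorBoard's actual code. `InMemoryBucketFileSystem`, `register_filesystem`, `get_filesystem`, and the bucket name `my-bucket` are all hypothetical names local to this example, and the class stores bytes in a dict instead of talking to GCS, so it only illustrates the dispatch pattern.

```python
import posixpath


class InMemoryBucketFileSystem:
    """Hypothetical stand-in for a bucket-backed filesystem (no real GCS calls)."""

    def __init__(self):
        # Maps a full path such as "gs://bucket/run/file" to its bytes.
        self._blobs = {}

    def exists(self, path):
        return path in self._blobs

    def append(self, path, data):
        # Event files are written incrementally, so a real backend would have
        # to emulate appends on top of whatever the object store offers.
        self._blobs[path] = self._blobs.get(path, b"") + data

    def read(self, path):
        return self._blobs[path]

    def join(self, base, *parts):
        # Bucket paths always use forward slashes, regardless of the host OS.
        return posixpath.join(base, *parts)


# Hypothetical registry: path prefix -> filesystem implementation.
_REGISTRY = {}


def register_filesystem(prefix, filesystem):
    _REGISTRY[prefix] = filesystem


def get_filesystem(path):
    # "gs://bucket/run" -> "gs"; a plain local path maps to the "" prefix.
    prefix = path.split("://", 1)[0] if "://" in path else ""
    if prefix not in _REGISTRY:
        raise ValueError(f"No filesystem registered for prefix {prefix!r}")
    return _REGISTRY[prefix]


# Usage: register the new prefix, then write through the registry.
register_filesystem("gs", InMemoryBucketFileSystem())
logdir = "gs://my-bucket/runs/exp1"  # hypothetical bucket and run name
fs = get_filesystem(logdir)
event_path = fs.join(logdir, "events.out.tfevents.0")
fs.append(event_path, b"record-1")
fs.append(event_path, b"record-2")
```

A real GCSFileSystem would replace the dict with calls to a GCS client and would need to match whatever method set the existing filesystem classes in that file expose.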
@LarsDu, I was having the same issue with many tools that don't support S3 or GCS buckets. The solution I found on GCP is to use gcsfuse so that the GCS bucket is seen as a local directory. Many GCP tools already have gcsfuse pre-installed (Vertex AI pipeline...). I don't see any performance issues when logging summaries for TensorBoard. I imagine the same exists for S3 on AWS. |
I wound up using tensorboardX for logging, since it's much lighter weight than a TensorBoard install with its many problematic dependencies. |
Correction: tensorboardX also has problematic dependencies. I gave up and am trying to include tensorflow in my project, but this makes stakeholders incredibly nervous |
gcsfuse is even more problematic: there are permission settings that need to be considered for the Kubernetes cluster from which we are running experiments. Ultimately, I ended up simply installing I've filed this issue with tensorboard: tensorflow/tensorboard#6298 |
gcsfuse is probably not something we should include here since it's so specific to GCP, but PyTorch Lightning recently implemented fsspec inside of TB: |
🚀 Feature
When creating a SummaryWriter, specifying a path in an S3 or GCS bucket should write directly to the bucket instead of the local filesystem.
Motivation
Both TensorFlow and tensorboardX let you specify s3:// or gs:// paths as your logdir, which greatly simplifies distributed training and monitoring. You can also launch TensorBoard from your local machine or a notebook pointing directly at the bucket, so there is no need to launch a machine to share results; you only need to share the URL of the results inside the bucket.
Additional context
tensorboardX implementation https://github.com/lanpa/tensorboardX/blob/master/tensorboardX/record_writer.py#L57