torch.utils.tensorboard.SummaryWriter should support s3 and other remote paths #36056

Closed
f4hy opened this issue Apr 6, 2020 · 6 comments

Comments

@f4hy

f4hy commented Apr 6, 2020

🚀 Feature

SummaryWriter currently takes a logdir, but only local filesystem paths work. Ideally one could have these logs written directly to remote storage such as S3 or HDFS. Currently, passing an S3 path such as SummaryWriter("s3://mybucket/") creates a folder named s3: in the working directory and nests a folder mybucket inside it.
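
The nested s3: folder is what one would expect if the string is handled as an ordinary relative path. A minimal sketch of that behaviour, using only the standard library and assuming a POSIX filesystem (the helper `create_as_local_path` is hypothetical and is not SummaryWriter's actual code):

```python
import os
import tempfile


def create_as_local_path(logdir, cwd):
    """Treat logdir as a plain path relative to cwd; return the directories created."""
    target = os.path.join(cwd, logdir)
    os.makedirs(target, exist_ok=True)
    created = []
    for root, dirs, _ in os.walk(cwd):
        for d in dirs:
            created.append(os.path.relpath(os.path.join(root, d), cwd))
    return sorted(created)


with tempfile.TemporaryDirectory() as tmp:
    # The URI scheme is not recognised; "s3:" simply becomes a directory name.
    print(create_as_local_path("s3://mybucket/", tmp))  # on Linux/macOS: ['s3:', 's3:/mybucket']
```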

Motivation

Training jobs are often run in containers whose local file system does not persist after the job is done. This requires every job to include logic to upload the resulting TensorBoard logs somewhere remote. In addition, the local file system where training occurs may not be accessible, so a TensorBoard instance cannot read those logs. If many experiments are running and one wishes to use TensorBoard to compare the results, the logs must be copied to external storage periodically. Writing directly to remote storage would simplify this and ensure logs are always captured.

Pitch

Ideally, specifying a path like s3://mybucket/ or hdfs://user/folder, or another remote storage URI, would simply work in the logdir parameter. However, credentials and other connection issues often complicate access to remote storage. So instead, allow a new parameter that takes a remote writer class conforming to some API, so that the user can control how files are written.
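
One way such a pluggable writer could look is sketched below. Everything here is hypothetical: `RemoteWriter`, `InMemoryWriter`, and `log_event` are illustrative names, not part of SummaryWriter or TensorBoard, and the in-memory backend only stands in for a real S3 or HDFS client.

```python
from typing import Dict, Protocol


class RemoteWriter(Protocol):
    """Hypothetical interface a user-supplied writer would implement."""

    def append(self, path: str, data: bytes) -> None: ...

    def flush(self) -> None: ...


class InMemoryWriter:
    """Stand-in backend; a real one would call an S3 or HDFS client here."""

    def __init__(self) -> None:
        self.pending: Dict[str, bytearray] = {}
        self.committed: Dict[str, bytes] = {}

    def append(self, path: str, data: bytes) -> None:
        # Buffer locally until flush() is called.
        self.pending.setdefault(path, bytearray()).extend(data)

    def flush(self) -> None:
        # Publish the full accumulated contents of every buffered file.
        for path, buf in self.pending.items():
            self.committed[path] = bytes(buf)


def log_event(writer: RemoteWriter, run: str, payload: bytes) -> None:
    """Append one serialized event to the run's event file."""
    writer.append(f"{run}/events.out", payload)


writer = InMemoryWriter()
log_event(writer, "exp1", b"step=1;")
log_event(writer, "exp1", b"step=2;")
writer.flush()
print(writer.committed["exp1/events.out"])  # b'step=1;step=2;'
```

With this shape, credentials and retry policy would live entirely inside the user's writer class, which is the control the pitch asks for.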

Alternatives

Alternatively, there could be some process by which the logdir is automatically synced to a remote storage location: logging would happen the same way it does now, but a helper thread would sync those logs to the remote storage every N seconds.
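
A rough sketch of that helper-thread idea, using only the standard library. `PeriodicSync` and `copy_to` are hypothetical names, and the upload step here just copies into another local directory; a real version would swap in an S3 or HDFS client call.

```python
import os
import shutil
import tempfile
import threading


def copy_to(dest_root):
    """Build an upload callable that copies into dest_root (stand-in for a remote store)."""
    def upload(path, rel):
        target = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)
    return upload


class PeriodicSync:
    """Re-upload files under logdir that changed since the last pass, every interval_s seconds."""

    def __init__(self, logdir, upload, interval_s=30.0):
        self.logdir = logdir
        self.upload = upload
        self.interval_s = interval_s
        self._seen = {}  # path -> (mtime, size) at last upload
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def sync_once(self):
        uploaded = []
        for root, _, files in os.walk(self.logdir):
            for name in files:
                path = os.path.join(root, name)
                stamp = (os.path.getmtime(path), os.path.getsize(path))
                if self._seen.get(path) != stamp:
                    rel = os.path.relpath(path, self.logdir)
                    self.upload(path, rel)
                    self._seen[path] = stamp
                    uploaded.append(rel)
        return sorted(uploaded)

    def _run(self):
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self.interval_s):
            self.sync_once()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self.sync_once()  # final pass so the last writes are not lost


logdir, remote = tempfile.mkdtemp(), tempfile.mkdtemp()
syncer = PeriodicSync(logdir, copy_to(remote), interval_s=30.0)
syncer.start()
with open(os.path.join(logdir, "events.out"), "wb") as f:
    f.write(b"event-data")
syncer.stop()
print(os.listdir(remote))  # ['events.out']
```

The final pass in `stop()` matters for short-lived containers, where the job may end before the next scheduled sync.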

@f4hy
Author

f4hy commented Apr 6, 2020

Note that tensorboardX does support this for both s3 and gcs so we could mirror the implementation there.

@f4hy
Author

f4hy commented Apr 6, 2020

Never mind. Using a fully updated tensorboard allowed writing directly to S3, as the writer in tensorboard supports this directly.

@f4hy f4hy closed this as completed Apr 6, 2020
@Dhaval08

Hello @f4hy
Could you please share the code snippet you used to directly write tensorboard logs to s3?
Thank you.

@vergilus

Hoping for HDFS support, because tensorboard supports direct HDFS access.

@PeterL1n
Contributor

> hoping for HDFS support. because tensorboard supports direct HDFS access.

HDFS is supported! You need to pip install both tensorflow and tensorboard. TensorFlow is needed to enable remote logging.

@yl-to
Contributor

yl-to commented Jan 30, 2023

I tried SummaryWriter with an S3 path, but it's still not working.

tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 's3' not implemented

Can anyone help here?
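
The error text comes from TensorFlow's filesystem layer rather than from SummaryWriter itself. One plausible cause, stated as an assumption and not a confirmed diagnosis for this report: newer TensorFlow releases ship the S3 filesystem in the separate tensorflow-io package, so the s3 scheme stays unregistered unless that package is installed and imported. A small sketch for checking which of the possibly relevant packages are present in an environment (the function name and the choice of packages are illustrative):

```python
import importlib.util


def remote_logging_report(modules=("tensorflow", "tensorflow_io", "boto3")):
    """Map each module name to whether it is importable, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


print(remote_logging_report())
```

If `tensorflow` is present but `tensorflow_io` is not, that would be consistent with the error above, though it does not prove it.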
