torch.utils.tensorboard.SummaryWriter should support s3 and other remote paths #36056
Comments
Nevermind. Using a fully updated tensorboard allowed for writing directly to |
Hello @f4hy |
Hoping for HDFS support, because tensorboard supports direct HDFS access. |
HDFS is supported! You need to pip install both tensorflow and tensorboard. TensorFlow is needed to enable remote logging. |
Tried SummaryWriter with an s3 path; it's still not working.
Can anyone help here? |
🚀 Feature
SummaryWriter currently takes a logdir, but only local filesystem paths work. Ideally one could have these logs written directly to remote storage such as s3 or hdfs. Currently, passing an s3 path such as
SummaryWriter("s3://mybucket/")
will create a folder in the working directory titled s3: and nest a folder mybucket in that.
Motivation
Training jobs are often run in containers whose local file system does not persist after the job is done. This requires all jobs to have logic to upload the resulting tensorboard logs somewhere remote. In addition, the local file system where the training is occurring may not be accessible, so a tensorboard instance cannot read those logs. If many experiments are running and one wishes to use tensorboard to compare the results, the logs must be copied to external storage periodically. Writing directly to remote storage would simplify this and ensure logs are always captured.
Pitch
Ideally, specifying a path like s3://mybucket/ or hdfs://user/folder, or other remote storage URIs, would simply work in the logdir parameter. However, credentials and other issues with connecting to remote storage can complicate things. So instead, allow a new parameter that takes a remote writer class conforming to some API, so that the user can control how files are written.
Alternatives
Alternatively, there could be some process by which the logdir is automatically synced to a remote storage location, so the logging would happen the same way it does now, but a helper thread would sync those logs to the remote storage every N seconds.
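A rough sketch of what such a sync helper could look like, using only the standard library. Nothing here is an existing PyTorch or tensorboard API: `LogdirSyncer`, `make_local_uploader`, and the `upload(path, key)` callable are hypothetical names made up for illustration. The uploader is user-supplied, so credentials and the choice of remote client stay under the user's control; the stand-in below just copies into another local directory.

```python
import shutil
import threading
from pathlib import Path
from typing import Callable, Dict, Tuple


class LogdirSyncer:
    """Hypothetical helper: every interval_s seconds, pass new or changed
    files under logdir to a user-supplied upload(path, key) callable."""

    def __init__(self, logdir, upload: Callable[[Path, str], None],
                 interval_s: float = 30.0):
        self.logdir = Path(logdir)
        self.upload = upload
        self.interval_s = interval_s
        # relative path -> (mtime_ns, size) recorded at the last upload
        self._seen: Dict[str, Tuple[int, int]] = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def sync_once(self) -> int:
        """Upload files whose mtime or size changed; return how many were sent."""
        uploaded = 0
        for path in sorted(self.logdir.rglob("*")):
            if not path.is_file():
                continue
            stat = path.stat()
            key = path.relative_to(self.logdir).as_posix()
            signature = (stat.st_mtime_ns, stat.st_size)
            if self._seen.get(key) != signature:
                self.upload(path, key)
                self._seen[key] = signature
                uploaded += 1
        return uploaded

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval_s):
            self.sync_once()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        self.sync_once()  # final pass so the last events are not lost


def make_local_uploader(dest_root):
    """Stand-in for a real remote client: copies files into dest_root."""
    dest_root = Path(dest_root)

    def upload(path: Path, key: str) -> None:
        target = dest_root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)

    return upload
```

In a training script this would be started next to a SummaryWriter pointed at a local logdir and stopped after the writer is closed. A real `upload` would call whatever S3 or HDFS client the job already has credentials for. Event files are appended to over time, so this sketch re-uploads the whole file whenever it changes.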