
Better way to store Tensorboard logs in Keepsake #580

Open
andreasjansson opened this issue Mar 19, 2021 · 3 comments
Labels
type/roadmap High-level goals. https://github.com/replicate/replicate/projects/1

Comments

@andreasjansson
Member

At the moment you have to store Tensorboard logs inside the `path` passed to `experiment.checkpoint`. This has two main drawbacks:

  • Checkpoint sizes grow with each checkpoint, because the Tensorboard logs keep expanding
  • To view the Tensorboard logs for the whole experiment, you have to check out the last checkpoint

We should probably have some way of tracking experiment-level files that change over the course of the experiment and aren't necessarily tied to specific checkpoints.

This could apply to use cases other than Tensorboard as well.
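To make the first drawback concrete, here is a minimal stand-in in plain Python (no Keepsake involved): an append-only event file lives inside the directory that gets snapshotted, so checkpoint N ends up containing every byte logged up to step N. The function name and sizes are illustrative only.

```python
import shutil
import tempfile
from pathlib import Path

def checkpoint_sizes(num_checkpoints, bytes_per_step=1024):
    """Simulate storing an append-only log file inside every checkpoint.

    Mirrors the current pattern: the Tensorboard log directory lives
    under the checkpoint path, so each checkpoint snapshot contains
    every byte logged so far.
    """
    root = Path(tempfile.mkdtemp())
    logs = root / "tensorboard-logs"
    logs.mkdir()
    event_file = logs / "events.out"
    sizes = []
    for step in range(1, num_checkpoints + 1):
        # Training appends to the Tensorboard event file...
        with event_file.open("ab") as f:
            f.write(b"x" * bytes_per_step)
        # ...and the whole log directory is copied into the checkpoint.
        ckpt = root / f"checkpoint-{step}"
        shutil.copytree(logs, ckpt / "tensorboard-logs")
        sizes.append(sum(p.stat().st_size
                         for p in ckpt.rglob("*") if p.is_file()))
    shutil.rmtree(root)
    return sizes
```

Each checkpoint's stored size grows linearly even though only `bytes_per_step` new bytes were actually logged between checkpoints.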

@andreasjansson andreasjansson added the type/roadmap High-level goals. https://github.com/replicate/replicate/projects/1 label Mar 19, 2021
@sjakkampudi

sjakkampudi commented Mar 19, 2021

One potential approach would be to add a post-experiment path parameter to `experiment.init`, alongside the current (pre-experiment) `path` parameter. The current `path` parameter is uploaded before the experiment runs; the new parameter would be uploaded once the experiment finishes running. Additionally, I think it would be good, though not strictly necessary, to trigger that upload if the experiment stops early for whatever reason — say a keyboard interrupt or some other forced early stop.
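The "upload even on early stop" part of this suggestion can be sketched in plain Python, independent of Keepsake. Here `upload` stands in for syncing the hypothetical post-experiment directory; none of these names are part of the Keepsake API — it's just the `try`/`finally` plus `atexit` pattern, made idempotent so the upload runs at most once:

```python
import atexit

def run_experiment(train_steps, upload):
    """Sketch: guarantee a final post-experiment upload even if
    training stops early (KeyboardInterrupt, exception, etc.).

    `upload` is a stand-in for syncing a hypothetical post-experiment
    path; the names here are illustrative, not Keepsake API.
    """
    state = {"uploaded": False}

    def finalize():
        # Idempotent: safe to call from both finally and atexit.
        if not state["uploaded"]:
            upload()
            state["uploaded"] = True

    atexit.register(finalize)  # covers normal interpreter exit
    try:
        for step in range(train_steps):
            pass  # training loop writes Tensorboard logs here
    finally:
        finalize()  # covers exceptions and keyboard interrupts
    return state["uploaded"]
```

A real implementation would also need to handle hard kills (SIGKILL cannot be intercepted), so the upload-on-checkpoint approach discussed below is more robust for crash scenarios.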

@pseeth

pseeth commented Mar 24, 2021

I've been keeping an eye on Keepsake for a few days now, and this would definitely be a great feature that would get me to go all in — especially if there's a way to monitor experiments from multiple computers in a single Tensorboard by syncing them. It'd be nice if Tensorboard logs could just be synced along with the checkpoints, rather than having a copy stored in each one. Then you could access a synced folder (which all experiments can reach) through Keepsake, via something like `keepsake.logs`, sync that folder to some machine periodically, and run Tensorboard on it.

@andreasjansson
Member Author

One potential design for this could be something along the lines of:

```python
experiment = keepsake.init(path=".", params=..., logs_path="./tensorboard-logs/")
```

where `logs_path` is stored in the experiment and synced to the same remote directory on each `experiment.checkpoint()` — as opposed to the checkpoint `path`, which is uploaded to a new checkpoint directory each time.

This would create a logs folder somewhere in the remote storage directory. On each `experiment.checkpoint()`, the local `logs_path` is uploaded to a remote logs folder that is shared across all checkpoints of the experiment. That could either naively re-upload the entire directory on each checkpoint, or be smarter and sync based on file hashes (though that's probably an optimization that could be added later).
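The hash-based sync optimization mentioned above could look roughly like this — a sketch, not Keepsake code: given the hashes already stored remotely, only new or changed files under the logs directory are selected for upload. `files_to_upload` and `remote_hashes` are hypothetical names.

```python
import hashlib
from pathlib import Path

def files_to_upload(local_dir, remote_hashes):
    """Hash-based sync sketch for a shared experiment-level logs folder.

    `remote_hashes` maps relative file paths to the SHA-256 digest of
    the copy already stored remotely; only files that are new or whose
    contents changed are returned. (The naive alternative re-uploads
    the whole directory on every checkpoint.)
    """
    local_dir = Path(local_dir)
    changed = []
    for path in sorted(local_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = str(path.relative_to(local_dir))
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if remote_hashes.get(rel) != digest:
            changed.append(rel)
    return changed
```

Hashing whole files is fine for typical Tensorboard event files; since they are append-only, an even cheaper heuristic would be comparing file sizes and uploading only the new tail.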

What do you think @bfirsh?
