[tune] tune autogenned dir names use commas, this messes with tensorboard #1580

Closed
vlad17 opened this issue Feb 22, 2018 · 9 comments

vlad17 commented Feb 22, 2018

I point TensorBoard at the ray_results dir on the head node, which holds my grid search, to view intermediate progress. Every trial ends up with a second, empty entry, likely because TensorBoard is confused by the commas in the dirnames that Tune creates:

[Screenshot from 2018-02-21 21-03-47: TensorBoard plot]

In the plot above, there is no red curve, for example, only a teal one.

@richardliaw
Contributor

Maybe "+" would be an OK replacement?
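A minimal sketch of what such a replacement could look like. The helper name `sanitize_dirname` is hypothetical and is not part of Tune; the example directory name is the one quoted later in this thread.

```python
def sanitize_dirname(name: str, replacement: str = "+") -> str:
    """Hypothetical helper: swap every comma in a directory name for `replacement`."""
    return name.replace(",", replacement)


# Trial directory name taken from this thread.
original = "ray_train_1_exp_name=seed-5,seed=5_2018-02-22_04-57-57s3n2cda4"
sanitized = sanitize_dirname(original)
print(sanitized)
```

Whether "+" is safe for every tool that reads these directories is a separate question that this sketch does not answer.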

Contributor

ericl commented Feb 22, 2018

Are you sure it isn't due to the nested directories? Commas seem to work fine for me.

Author

vlad17 commented Feb 22, 2018

What do you mean by the nested directories? Here's what that directory looks like:

(tensorflow_p36) ubuntu@ip-172-31-26-124:~/ray_results/swim-notdk$ tree ray_train_1_exp_name\=seed-5\,seed\=5_2018-02-22_04-57-57s3n2cda4/
ray_train_1_exp_name=seed-5,seed=5_2018-02-22_04-57-57s3n2cda4/
├── data
│   └── seed-5_swimmer
│       ├── 5
│       │   ├── 100
│       │   │   ├── openaigym.episode_batch.0.107795.stats.json
│       │   │   ├── openaigym.manifest.0.107795.manifest.json
│       │   │   ├── openaigym.video.0.107795.video000000.meta.json
│       │   │   └── openaigym.video.0.107795.video000000.mp4
│       │   ├── 100000
│       │   │   ├── openaigym.episode_batch.1.107795.stats.json
│       │   │   ├── openaigym.manifest.1.107795.manifest.json
│       │   │   ├── openaigym.video.1.107795.video000000.meta.json
│       │   │   └── openaigym.video.1.107795.video000000.mp4
│       │   ├── 200000
│       │   │   ├── openaigym.episode_batch.2.107795.stats.json
│       │   │   ├── openaigym.manifest.2.107795.manifest.json
│       │   │   ├── openaigym.video.2.107795.video000000.meta.json
│       │   │   └── openaigym.video.2.107795.video000000.mp4
│       │   ├── 300000
│       │   │   ├── openaigym.episode_batch.3.107795.stats.json
│       │   │   ├── openaigym.manifest.3.107795.manifest.json
│       │   │   ├── openaigym.video.3.107795.video000000.meta.json
│       │   │   └── openaigym.video.3.107795.video000000.mp4
│       │   ├── 400000
│       │   │   ├── openaigym.episode_batch.4.107795.stats.json
│       │   │   ├── openaigym.manifest.4.107795.manifest.json
│       │   │   ├── openaigym.video.4.107795.video000000.meta.json
│       │   │   └── openaigym.video.4.107795.video000000.mp4
│       │   ├── checkpoints
│       │   │   ├── checkpoint
│       │   │   ├── ddpg.ckpt-00000100.data-00000-of-00001
│       │   │   ├── ddpg.ckpt-00000100.index
│       │   │   ├── ddpg.ckpt-00000100.meta
│       │   │   ├── ddpg.ckpt-00250000.data-00000-of-00001
│       │   │   ├── ddpg.ckpt-00250000.index
│       │   │   ├── ddpg.ckpt-00250000.meta
│       │   │   ├── dynamics.ckpt-00000100.data-00000-of-00001
│       │   │   ├── dynamics.ckpt-00000100.index
│       │   │   ├── dynamics.ckpt-00000100.meta
│       │   │   ├── dynamics.ckpt-00250000.data-00000-of-00001
│       │   │   ├── dynamics.ckpt-00250000.index
│       │   │   ├── dynamics.ckpt-00250000.meta
│       │   │   ├── persistable_dataset.ckpt-00000100.data-00000-of-00001
│       │   │   ├── persistable_dataset.ckpt-00000100.index
│       │   │   ├── persistable_dataset.ckpt-00000100.meta
│       │   │   ├── persistable_dataset.ckpt-00250000.data-00000-of-00001
│       │   │   ├── persistable_dataset.ckpt-00250000.index
│       │   │   └── persistable_dataset.ckpt-00250000.meta
│       │   ├── events.out.tfevents.1519275510.ip-172-31-26-124
│       │   └── params.json
│       └── starttime.txt
├── events.out.tfevents.1519275510.ip-172-31-26-124
├── params.json
├── progress.csv
└── result.json

9 directories, 46 files

Also note that the commas need to be escaped in bash, so maybe that's reason enough to avoid them, unless changing that breaks a lot of things.

From the structure, it looks like the extra TensorBoard item is caused by an extra events file -- is that one written by Tune?
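TensorBoard treats each directory under the logdir that contains an events file as a separate run, which would explain two entries per trial here. The sketch below is only an illustration of that discovery rule, not TensorBoard's own code; `find_run_dirs` is a hypothetical helper, and the demo uses an abbreviated trial directory name and a made-up host suffix on the events files.

```python
import os
import tempfile


def find_run_dirs(logdir):
    """Return, relative to logdir, every directory that directly holds a tfevents file."""
    runs = []
    for root, _dirs, files in os.walk(logdir):
        if any("tfevents" in name for name in files):
            runs.append(os.path.relpath(root, logdir))
    return sorted(runs)


# Recreate the relevant part of the layout from the tree listing above.
with tempfile.TemporaryDirectory() as logdir:
    trial = os.path.join(logdir, "ray_train_1_exp_name=seed-5,seed=5")
    nested = os.path.join(trial, "data", "seed-5_swimmer", "5")
    os.makedirs(nested)
    for directory in (trial, nested):
        # Empty placeholder events files; only their names matter here.
        open(os.path.join(directory, "events.out.tfevents.1519275510.host"), "w").close()
    demo_runs = find_run_dirs(logdir)

for run in demo_runs:
    print(run)
```

Two directories are reported: the trial root and the nested data directory, one per events file.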

Contributor

richardliaw commented Feb 22, 2018

Yeah, the events file, result.json, and progress.csv are from Tune. How are you logging your own things (as in, what file path are you using)?

Author

vlad17 commented Feb 22, 2018

I'm writing my own TF events file into data/. If Tune starts plotting things I put in info (I think there's an issue for this), then I suppose there is no issue here and we can close this.

vlad17 closed this as completed Feb 22, 2018

Contributor

richardliaw commented Feb 22, 2018

Yeah, the corresponding PR is #1567; awesome.

Author

vlad17 commented Feb 22, 2018

Ah, actually I just realized an issue with this, @richardliaw. If I have my own TensorBoard logs (which presumably contain a lot more metrics than what I give you in info) and I point TensorBoard's logdir at ray_results, it will show both sets of curves, which is a little annoying. The bigger issue with commas in filenames is that TB's logdir argument itself uses commas to separate directories.

So I can't do, e.g., tensorboard --logdir ray_results/experiment/ray_train_1_exp_name=seed-5,seed=5_2018-02-22_04-57-57s3n2cda4/. I think this is a reasonable thing to want to do, so maybe some other separator really is needed. @ericl, thoughts?
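To make the clash concrete, here is a simplified model of comma-separated logdir parsing. It is an assumption-laden sketch, not TensorBoard's actual parser; `split_logdir_spec` is a hypothetical name.

```python
def split_logdir_spec(spec):
    """Naively split a comma-separated logdir spec into individual paths."""
    return [part for part in spec.split(",") if part]


# The single trial directory from the comment above.
path = "ray_results/experiment/ray_train_1_exp_name=seed-5,seed=5_2018-02-22_04-57-57s3n2cda4/"
parts = split_logdir_spec(path)
for part in parts:
    print(part)
```

Under this model the one directory is read as two paths, neither of which exists on disk.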

Contributor

ericl commented Feb 24, 2018

Hm I think you can quote the paths, e.g. tensorboard --logdir='/tmp','/proc'
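One thing worth checking with the quoting approach: a POSIX shell removes the quotes before the program sees its arguments, so the comma still arrives as part of a single argument. The sketch below uses Python's `shlex` module as an approximation of shell tokenization; it does not run TensorBoard and says nothing about how TensorBoard then interprets the value.

```python
import shlex

# Tokenize the command line roughly the way a POSIX shell would.
argv = shlex.split("tensorboard --logdir='/tmp','/proc'")
print(argv)
```

The quoted pieces are joined into one `--logdir=/tmp,/proc` argument, so whether quoting helps with a comma inside a directory name depends on the program's own parsing rather than on the shell.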

Author

vlad17 commented Feb 24, 2018

sg, it's moot anyway if at some point you autolaunch tb
