session run is gray in tensorboard->graph, and unknown device #26

Closed
elulue opened this issue Aug 17, 2018 · 2 comments

Comments

elulue commented Aug 17, 2018

Hi,
The py3 branch works fine on my PC. I'm using Ubuntu 18.04, Python 3.5 and TensorFlow 1.10 with a single Nvidia 1070 video card.
During training, GPU usage is around 30% while most of the video card memory is occupied. I'd like to see whether there is room to improve performance, so I went to TensorBoard.

But the device is shown as unknown when I check it in TensorBoard -> graph, and I also can't see the compute time.
Could you please let me know if there is any tip to fix it?
Thanks a lot.

image

elulue commented Aug 17, 2018

By the way, if I use tfdbg, I see this warning:
"WARNING:tensorflow:Failed to load partition graphs for device /job:localhost/replica:0/task:0/device:CPU:0 from disk. As a fallback, the client graphs will be used. This may cause mismatches in device names."

elulue commented Aug 18, 2018

I've found the root cause: TensorBoard doesn't record compute time and other runtime statistics by default.
The change below works for me:


                # Periodically run one step with full tracing so TensorBoard gets
                # runtime statistics (e.g. compute time) for the graph view.
                if np.mod(global_step, show_every_n_step) == 1:
                    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
                    run_metadata = tf.RunMetadata()

                    train_loss, train_mse, _, train_merged_sum = self.sess.run(
                        [self.loss, self.mse, self.optim, self.summary], train_data_feed,
                        options=run_options, run_metadata=run_metadata)

                    # Save the trace under a per-step tag next to the regular summary.
                    self.writer.add_run_metadata(run_metadata, 'step{}'.format(global_step))
                    self.writer.add_summary(train_merged_sum, global_step=global_step)

                else:
                    # Regular step: same ops, no tracing.
                    train_loss, train_mse, _, train_merged_sum = self.sess.run(
                        [self.loss, self.mse, self.optim, self.summary], train_data_feed)
                    self.writer.add_summary(train_merged_sum, global_step=global_step)
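
One thing to watch with the `np.mod(global_step, show_every_n_step) == 1` check: it decides which steps get a trace, and it never fires when `show_every_n_step` is 1. Below is a rough, TensorFlow-free sketch of just that gating logic. The helper names `should_trace` and `traced_steps` are made up for illustration, and it assumes `global_step` is a positive integer that starts at 1, which the snippet above does not confirm.

```python
def should_trace(global_step, show_every_n_step):
    """Same test as np.mod(global_step, show_every_n_step) == 1, for positive ints."""
    return global_step % show_every_n_step == 1


def traced_steps(total_steps, show_every_n_step, first_step=1):
    """Return the steps that would be run with full tracing.

    first_step=1 is an assumption about where the step counter starts.
    """
    last_step = first_step + total_steps
    return [step for step in range(first_step, last_step)
            if should_trace(step, show_every_n_step)]


if __name__ == "__main__":
    # With n = 4, steps 1, 5 and 9 are traced.
    print(traced_steps(10, 4))
    # With n = 1 the remainder is always 0, so nothing is traced.
    print(traced_steps(10, 1))
```

So if a trace on every step is wanted, comparing against 0 (or special-casing n = 1) may be needed; otherwise any n greater than 1 traces step 1, then every n-th step after it.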

elulue closed this as completed Aug 18, 2018