Merge plugin/0.2.1 code into master #345

guotuofeng · 2021-07-09T07:37:32Z

merge 0.2.1 bug fixes into master

* remove the Record Window End event if the duration is larger than 1 day to workaround the high resolution clock bug in pytorch profiler
* remove debug print
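The first bullet above can be illustrated with a minimal sketch. It is not the plugin's actual code: it assumes events are plain dicts in the Chrome trace event style, with a `name` field and a `dur` field in microseconds, and the helper name `drop_bogus_window_end` is hypothetical.

```python
# Sketch only: event layout and helper name are assumptions, not the plugin's API.
ONE_DAY_US = 24 * 60 * 60 * 1_000_000  # one day, in microseconds


def drop_bogus_window_end(events, max_dur_us=ONE_DAY_US):
    """Return events, skipping 'Record Window End' entries longer than max_dur_us."""
    return [
        e for e in events
        if not (e.get("name") == "Record Window End" and e.get("dur", 0) > max_dur_us)
    ]


events = [
    {"name": "aten::add", "ts": 100, "dur": 12},
    # An implausibly long duration, as a faulty clock reading might produce.
    {"name": "Record Window End", "ts": 200, "dur": 2 * ONE_DAY_US},
    {"name": "Record Window End", "ts": 300, "dur": 0},
]
cleaned = drop_bogus_window_end(events)
```

Only the event whose duration exceeds the threshold is dropped; a well-formed `Record Window End` event is kept.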

update doc and tooltip

* add notes of python version in case of circular import bug
* update wording

… is available in Tensorboard 2.0 (#323)

* workaround for negative gpu metrics from input json file
* refine

This commit makes `walk`-ing directories follow symlinks when searching for run data (on local filesystems, where it's supported!). This makes the plugin's search behavior consistent with that of tensorboard itself; using symlink trees to organize runs is one of the recommendations made in the tensorboard docs to have fine-grained control over the naming of runs and the location of the data [1]:

> TensorBoard walks log directories recursively; for finer-grained
> control, prefer using a symlink tree.

A unit test is added to validate the new behavior.

[1] https://github.com/tensorflow/tensorboard/blob/master/README.md#logdir--logdir_spec-legacy-mode
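The behavior described in this commit message can be sketched with the standard library. This is an illustration only: it uses `os.walk` as a stand-in for whatever walk routine the plugin calls, and the helper name `find_run_dirs` and the `.pt.trace.json` suffix are assumptions made for the example. It also assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile


def find_run_dirs(logdir, followlinks, suffix=".pt.trace.json"):
    """Return run directories (relative to logdir) that contain a trace file."""
    runs = []
    for root, _dirs, files in os.walk(logdir, followlinks=followlinks):
        if any(name.endswith(suffix) for name in files):
            runs.append(os.path.relpath(root, logdir))
    return sorted(runs)


with tempfile.TemporaryDirectory() as tmp:
    # The real data lives outside the logdir...
    real_run = os.path.join(tmp, "storage", "exp_001")
    os.makedirs(real_run)
    open(os.path.join(real_run, "worker0.pt.trace.json"), "w").close()

    # ...and the logdir is a symlink tree that gives the run a friendlier name.
    logdir = os.path.join(tmp, "logdir")
    os.makedirs(logdir)
    os.symlink(real_run, os.path.join(logdir, "baseline"), target_is_directory=True)

    found_with_links = find_run_dirs(logdir, followlinks=True)
    found_without_links = find_run_dirs(logdir, followlinks=False)
```

With `followlinks=False` the walk lists the symlinked directory but never descends into it, so the run is missed; with `followlinks=True` it is found under its symlink name.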

* return loading run status to frontend to fix a couple of bugs
* Add message when logdir has no runs (#343)
* Add message when logdir has no runs
* Correct Typography import
* fix test failure
* remove dead code

Co-authored-by: TomWildenhain-Microsoft <67606533+TomWildenhain-Microsoft@users.noreply.github.com>

gdankel · 2021-07-09T16:46:11Z

tb_plugin/docs/gpu_utilization.md

                      For example, a kernel with only one thread per block can’t fully utilize each SM. 

-* Est. Achieved Occupancy: The bigger, the better. The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm). 
+* Est. Achieved Occupancy: For most cases such as memory bandwidth bounded kernels, the higher the better. [Reference](http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0514-GTC2012-GPU-Performance-Analysis.pdf). The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm). 


The improvement in performance does not normally scale linearly after some point, but I am not sure how best to express this clearly.

guotuofeng · 2021-07-11

We will consider a clearer description in the next release.

gdankel · 2021-07-11

I had to fix a couple of linter warnings so I changed this as well to "a higher value often translates to better performance, especially when the starting value is very low." Sounds OK?

guotuofeng · 2021-07-12

Yes, it is OK.

gdankel · 2021-07-09T16:46:47Z

tb_plugin/examples/resnet50_profiler_api.py

    on_trace_ready=torch.profiler.tensorboard_trace_handler('./result', worker_name='worker0'),
    record_shapes=True,
-    profile_memory=True,
+    profile_memory=True,  # This will take 1 to 2 minutes. Setting it to False could greatly speedup.


I suggest we also add a similar comment to `with_stack`.

guotuofeng · 2021-07-11

Will do in the next release.

facebook-github-bot · 2021-07-09T16:50:07Z

@gdankel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-07-12T06:49:14Z

@gdankel merged this pull request in c0bc598.

guotuofeng and others added 13 commits June 17, 2021 14:19

remove the Record Window End event (#298)

a087b66

* remove the Record Window End event if the duration is larger than 1 day to workaround the high resolution clock bug in pytorch profiler
* remove debug print

Merge from branch tb_plugin (#303)

d6a5775

fix bug of kernel out of step (#305)

7455c31

Update readme (#306)

29c3c46

update doc and tooltip

rename error word spelling (#307)

3db8e75

fix typo in readme (#312)

ed81003

add notes of python version in case of circular import bug (#315)

50df0db

* add notes of python version in case of circular import bug
* update wording

update example to speedup (#316)

d51f607

Fix bugs happens with tensorboard 1.15 related to import errors which…

10ba327

… is available in Tensorboard 2.0 (#323)

workaround for negative gpu metrics from input json file (#330)

e828a2c

* workaround for negative gpu metrics from input json file
* refine

Merge branch 'master' into plugin/0.2

0332c84

facebook-github-bot added the cla signed label Jul 9, 2021

gdankel approved these changes Jul 9, 2021

facebook-github-bot closed this in c0bc598 Jul 12, 2021

facebook-github-bot added the Merged label Jul 12, 2021
