Skip to content

Conversation

@guotuofeng
Copy link
Contributor

merge 0.2.1 bug fixes into master

guotuofeng and others added 13 commits June 17, 2021 14:19
* remove the Record Window End event if the duration is larger than 1 day to workaround the high resolution clock bug in pytorch profiler

* remove debug print
update doc and tooltip
* add notes of python version in case of circular import bug

* update wording
* workaround for negative gpu metrics from input json file

* refine
This commit makes `walk`-ing directories follow symlinks when searching
for run data (on local filesystems, where it's supported!).

This makes the plugin's search behavior consistent with that of
tensorboard itself; using symlink trees to organize runs is one of the
recommendations made in the tensorboard docs to have fine-grained
control over the naming of runs and the location of the data [1]:

> TensorBoard walks log directories recursively; for finer-grained
> control, prefer using a symlink tree.

A unit test is added to validate the new behavior.

[1] https://github.com/tensorflow/tensorboard/blob/master/README.md#logdir--logdir_spec-legacy-mode
* return loading run status to frontend to fix a couple of bugs

* Add message when logdir has no runs (#343)

* Add message when logdir has no runs

* Correct Typography import

* fix test failure

* remove dead code

Co-authored-by: TomWildenhain-Microsoft <67606533+TomWildenhain-Microsoft@users.noreply.github.com>
For example, a kernel with only one thread per block can’t fully utilize each SM.

* Est. Achieved Occupancy: The bigger, the better. The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm).
* Est. Achieved Occupancy: For most cases such as memory bandwidth bounded kernels, the higher the better. [Reference](http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0514-GTC2012-GPU-Performance-Analysis.pdf). The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The improvements in performance does not normally scale linearly after some point, but not sure how to best express this in a clear way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will consider more clear description in next release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to fix a couple of linter warnings so I changed this as well to "a higher value often translates to better performance, especially when the starting value is very low." Sounds OK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is ok

on_trace_ready=torch.profiler.tensorboard_trace_handler('./result', worker_name='worker0'),
record_shapes=True,
profile_memory=True,
profile_memory=True, # This will take 1 to 2 minutes. Setting it to False could greatly speedup.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we also add a similar comment to with_stack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do in next release.

@facebook-github-bot
Copy link
Contributor

@gdankel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@gdankel merged this pull request in c0bc598.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants