TBE logging - efficiently enable 'always on' cache printing #2378

Closed · wants to merge 2 commits

damianr99

Summary:
When 'training.tbe.gather_uvm_cache_stats=True', always log UVM cache
activity. Logging frequency decreases on an exponential schedule as
training proceeds. This keeps logging overhead minimal while still
showing whether cache hit rates become non-stationary over long
training runs.

Reviewed By: sryap

Differential Revision: D54181312
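The exponential schedule described in the summary can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the class name `ExponentialLogSchedule`, the `base` parameter, and the choice to log at steps 1, base, base**2, ... are all assumptions.

```python
class ExponentialLogSchedule:
    """Hypothetical helper: decide whether to log cache stats at a given step.

    Logs at steps 1, base, base**2, ..., so the number of log events up to
    step N grows roughly like log_base(N) instead of linearly.
    """

    def __init__(self, base: int = 2) -> None:
        if base < 2:
            raise ValueError("base must be >= 2")
        self.base = base
        self.next_log_step = 1  # first step that triggers a log line

    def should_log(self, step: int) -> bool:
        """Return True if `step` (1-indexed, non-decreasing) should be logged."""
        if step < self.next_log_step:
            return False
        # Advance the threshold past `step`; if steps were skipped between
        # calls, several powers of `base` collapse into this one log line.
        while self.next_log_step <= step:
            self.next_log_step *= self.base
        return True


if __name__ == "__main__":
    schedule = ExponentialLogSchedule(base=2)
    logged = [s for s in range(1, 101) if schedule.should_log(s)]
    print(logged)  # [1, 2, 4, 8, 16, 32, 64]
```

With `base=2`, a 100-step run logs at steps 1, 2, 4, 8, 16, 32 and 64. The log volume grows logarithmically with training length, yet the schedule never stops sampling, so a drift in hit rate late in training still shows up in the logs.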

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D54181312

netlify bot commented Mar 2, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 171ebcd
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/65e7610ecd58b8000820b255
😎 Deploy Preview: https://deploy-preview-2378--pytorch-fbgemm-docs.netlify.app

damianr99 pushed a commit to damianr99/FBGEMM that referenced this pull request Mar 2, 2024
…2378)

damianr99 pushed a commit to damianr99/FBGEMM that referenced this pull request Mar 5, 2024
…2378)

damianr99 pushed a commit to damianr99/FBGEMM that referenced this pull request Mar 5, 2024
…2378)

Damian Reeves added 2 commits March 5, 2024 10:13
Summary:

Make it possible to interpret TBE logs when there are multiple TBEs on
a rank. This diff (1) prints the names of the tables contained in each
TBE, and (2) prefixes all TBE logging with a unique ID per TBE instance.

The ID is a UUID, so it stays unique even when there are many TBEs
across many ranks.

Reviewed By: carlbunny, henrylhtsang

Differential Revision: D54181313
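A minimal sketch of per-instance log prefixing, assuming Python's standard `logging` module. `ToyTBE`, `_PrefixAdapter`, `report_cache_stats`, and the table names in the example are hypothetical stand-ins, not FBGEMM classes or tables; only the idea (log the table names once, then prefix every message with a UUID for that instance) comes from the commit message.

```python
import logging
import uuid
from typing import List


class _PrefixAdapter(logging.LoggerAdapter):
    """Prepend the owning TBE's id to every message (hypothetical helper)."""

    def process(self, msg, kwargs):
        return f"[TBE {self.extra['tbe_id']}] {msg}", kwargs


class ToyTBE:
    """Stand-in for a TBE module; only the logging behavior is modeled."""

    def __init__(self, table_names: List[str]) -> None:
        # uuid4 is random, so ids stay unique across ranks without coordination.
        self.tbe_id = uuid.uuid4().hex
        self.table_names = list(table_names)
        self.log = _PrefixAdapter(logging.getLogger("tbe"), {"tbe_id": self.tbe_id})
        # (1) Record which tables this instance holds, once, at construction.
        self.log.info("Tables: %s", ", ".join(self.table_names))

    def report_cache_stats(self, hit_rate: float) -> None:
        # (2) Every later message carries the same per-instance prefix.
        self.log.info("UVM cache hit rate: %.3f", hit_rate)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    a = ToyTBE(["user_id", "item_id"])
    b = ToyTBE(["ad_id"])
    a.report_cache_stats(0.91)
    b.report_cache_stats(0.47)
```

Grepping the combined logs for one prefix then yields that instance's table list followed by all of its cache statistics, which matches the rationale in the commit message for choosing a UUID over a per-rank counter.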
…2378)

damianr99 pushed a commit to damianr99/FBGEMM that referenced this pull request Mar 5, 2024
…2378)

@facebook-github-bot

This pull request has been merged in f039544.
