Avoid marking every profile loop stop as Collection stage, use data available to mark errored stages. #977

Closed
wants to merge 1 commit into from

Conversation

sanrise
Contributor

@sanrise sanrise commented Aug 15, 2024

Summary:
We already collect the data needed to tell whether collection stopped because of `collectionDone` or `stopCollection`; the latter is set only when CUPTI stops abruptly, for example when it cannot find buffers.

In fact, we also record this in the internal Error Counters, so leverage that functionality within UST logging as well to denote a terminal stage within Kineto.
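
To illustrate the idea, here is a minimal sketch of choosing the logged stage from the two flags rather than always reporting Collection. Only `collectionDone` and `stopCollection` come from the description above; `LoopStopState`, `TerminalStage`, `stageForStop`, and the stage names are hypothetical and are not Kineto's actual declarations.

```cpp
#include <string>

// Hypothetical types for illustration; not Kineto's actual declarations.
struct LoopStopState {
  bool collectionDone = false;  // collection ran to completion
  bool stopCollection = false;  // set only when CUPTI stops abruptly
};

// Hypothetical stage names; the real logged stage names may differ.
enum class TerminalStage { Collection, CollectionError, NotStopped };

// Pick the stage to log from the flags that are already recorded,
// instead of reporting every profile loop stop as the Collection stage.
inline TerminalStage stageForStop(const LoopStopState& state) {
  if (state.stopCollection) {
    return TerminalStage::CollectionError;  // abrupt stop takes priority
  }
  if (state.collectionDone) {
    return TerminalStage::Collection;
  }
  return TerminalStage::NotStopped;
}

// Convert the stage to a string suitable for a log record.
inline std::string toString(TerminalStage stage) {
  switch (stage) {
    case TerminalStage::Collection:
      return "Collection";
    case TerminalStage::CollectionError:
      return "CollectionError";
    case TerminalStage::NotStopped:
      return "NotStopped";
  }
  return "Unknown";
}
```

In this sketch `stopCollection` is checked first, on the assumption that an abrupt stop should be reported as errored even if the done flag is also set.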

Differential Revision: D61226939

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D61226939

…vailable to mark errored stages. (pytorch#977)

Summary:
Pull Request resolved: pytorch#977

We already collect the data needed to tell whether collection stopped because of `collectionDone` or `stopCollection`; the latter is set only when CUPTI stops abruptly, for example when it cannot find buffers.

In fact, we also record this in the internal Error Counters, so leverage that functionality within UST logging as well to denote a terminal stage within Kineto.

Reviewed By: aaronenyeshi

Differential Revision: D61226939

@facebook-github-bot
Contributor

This pull request has been merged in 7d5e58f.

staugust pushed a commit to staugust/kineto that referenced this pull request Aug 27, 2024
…vailable to mark errored stages. (pytorch#977)

Summary:
Pull Request resolved: pytorch#977

We already collect the data needed to tell whether collection stopped because of `collectionDone` or `stopCollection`; the latter is set only when CUPTI stops abruptly, for example when it cannot find buffers.

In fact, we also record this in the internal Error Counters, so leverage that functionality within UST logging as well to denote a terminal stage within Kineto.

Reviewed By: aaronenyeshi

Differential Revision: D61226939

fbshipit-source-id: a4d5fa525d4457d44f0b959e4761b82de160152c