-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[Pytorch Edge] Support profiling kineto events from external source #64397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary: This diff exposes a way to add events to kineto profiler from external source. This can be a backend that executes a subgraph and wants to record this execution in kineto profiler. This diff also adds "backend" metadata to identify the backend an event would have executed on. Test Plan: test_lite_interpreter Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 3a28ee8 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…al source" Summary: This diff exposes a way to add events to kineto profiler from external source. This can be a backend that executes a subgraph and wants to record this execution in kineto profiler. This diff also adds "backend" metadata to identify the backend an event would have executed on. Test Plan: test_lite_interpreter Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D30710710](https://our.internmc.facebook.com/intern/diff/D30710710) Here is a chrome trace output for example delegated backend  [ghstack-poisoned]
Summary: This diff exposes a way to add events to kineto profiler from external source. This can be a backend that executes a subgraph and wants to record this execution in kineto profiler. This diff also adds "backend" metadata to identify the backend an event would have executed on. Test Plan: test_lite_interpreter Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 11526ec Pull Request resolved: #64397
@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Thanks
Though somebody from Kineto should review the Kineto changes.
ctx_ptr->startThreadId = at::RecordFunction::currentThreadId(); | ||
ctx_ptr->debug_handle = debug_handle; | ||
|
||
/* no support for input shapes now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why's this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly to avoid overhead of shape profiling.
} | ||
std::string evt_name(fn.name().str()); | ||
auto end_time = getTimeUs(); | ||
auto end_time = ctx->endUS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assume that this is always set?
} | ||
|
||
state_ptr->reportClientActivity(fn, kineto_ctx_ptr); | ||
kineto_ctx_ptr->endUS = getTimeUs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so that reportClientActivity
can assume the start and end time are already encoded in ctx. When backend's report the runtime per op they will use indirectly use reportClientActivity
. At the time of recording runtime of op, backends already have the duration/start and end time.
#endif // USE_KINETO | ||
}; | ||
|
||
TORCH_API void reportBackendEventToActiveKinetoProfiler( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment / documentation.
if (!state_ptr) { | ||
return; | ||
} | ||
auto ctx_ptr = std::make_unique<KinetoObserverContext>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is fine for the moment, I think we can optimize this a lot - it seems unnecessary to create the KinetoObserverContext in this case. In fact we can probably remove it from the profiler altogether...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know if you meant for the backend event reporting only or generally in the profiler itself but I completely agree. I thought the same that we dont need the ObserverContext. My understanding was that the observer context was used by callback where enter callback construct one, push some info and return the context. Then record_function calls exit callback with the context which populates more information. For the purposes of the API here, it is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some nits but in general this looks fine for now.
Definitely need to have a follow-up refactoring but that applies to the entire profiler.
…al source" Summary: This diff exposes a way to add events to kineto profiler from external source. This can be a backend that executes a subgraph and wants to record this execution in kineto profiler. This diff also adds "backend" metadata to identify the backend an event would have executed on. Test Plan: test_lite_interpreter Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D30710710](https://our.internmc.facebook.com/intern/diff/D30710710) Here is a chrome trace output for example delegated backend  [ghstack-poisoned]
CI Flow Status⚛️ CI FlowRuleset - Version:
You can add a comment to the PR and tag @pytorchbot with the following commands: # ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow For more information, please take a look at the CI Flow Wiki. |
Summary: This diff exposes a way to add events to kineto profiler from external source. This can be a backend that executes a subgraph and wants to record this execution in kineto profiler. This diff also adds "backend" metadata to identify the backend an event would have executed on. Test Plan: test_lite_interpreter Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: c6e9359 Pull Request resolved: #64397
@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…al source" Summary: This diff exposes a way to add events to kineto profiler from external source. This can be a backend that executes a subgraph and wants to record this execution in kineto profiler. This diff also adds "backend" metadata to identify the backend an event would have executed on. Test Plan: test_lite_interpreter Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D30710710](https://our.internmc.facebook.com/intern/diff/D30710710) Here is a chrome trace output for example delegated backend  [ghstack-poisoned]
Summary: This diff exposes a way to add events to kineto profiler from external source. This can be a backend that executes a subgraph and wants to record this execution in kineto profiler. This diff also adds "backend" metadata to identify the backend an event would have executed on. Test Plan: test_lite_interpreter Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 3942222 Pull Request resolved: #64397
Reverting this PR as it failed the bazel test (PR signal was red as well) on HUD |
This pull request has been reverted by c62ed96. To re-land this change, follow these steps. |
Is the failing build trigerred on the PRs? I remember seeing all green for this PR's CI. |
Stack from ghstack:
Summary:
This diff exposes a way to add events to kineto profiler from external
source.
This can be a backend that executes a subgraph and wants to record this
execution in kineto profiler.
This diff also adds "backend" metadata to identify the backend an event
would have executed on.
Test Plan:
test_lite_interpreter
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D30710710
Here is a chrome trace output for example delegated backend
