@kimishpatel kimishpatel commented Sep 1, 2021

Stack from ghstack:

Summary:
This diff exposes a way to add events to the Kineto profiler from an external
source.
This can be a backend that executes a subgraph and wants to record that
execution in the Kineto profiler.
This diff also adds "backend" metadata to identify the backend an event
executed on.

Test Plan:
test_lite_interpreter

Differential Revision: D30710710

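Here is the Chrome trace output for an example delegated backend:
![image](https://user-images.githubusercontent.com/17488981/131749023-71062b4c-9d1f-428a-9bb1-6bd45b02d194.png)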
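
To make the summary concrete, here is a minimal, self-contained sketch of how a delegated backend might report one timed event. It does not use the real PyTorch headers. `SketchProfiler`, `getCurrentProfiler`, `currentProfilerSlot`, `nowUs`, and `runDelegatedOp` are hypothetical names. The `recordBackendEvent` argument types follow the symbol shown in the CI linker log (three integers and two strings); the meaning and order of those arguments (start time, end time, debug handle, event name, backend name) is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one backend event, including the "backend" metadata.
struct BackendEvent {
  int64_t start_time_us;
  int64_t end_time_us;
  int64_t debug_handle;
  std::string event_name;
  std::string backend_name;
};

// Hypothetical stand-in for the profiler; not the real KinetoEdgeCPUProfiler.
class SketchProfiler {
 public:
  // Argument types mirror the linker log; their meaning is assumed.
  void recordBackendEvent(
      int64_t start_time_us,
      int64_t end_time_us,
      int64_t debug_handle,
      const std::string& event_name,
      const std::string& backend_name) {
    assert(end_time_us >= start_time_us);
    events_.push_back(
        {start_time_us, end_time_us, debug_handle, event_name, backend_name});
  }

  const std::vector<BackendEvent>& events() const { return events_; }

 private:
  std::vector<BackendEvent> events_;
};

// Holds the active profiler, or nullptr when profiling is off (assumed design).
SketchProfiler*& currentProfilerSlot() {
  static SketchProfiler* slot = nullptr;
  return slot;
}

SketchProfiler* getCurrentProfiler() { return currentProfilerSlot(); }

int64_t nowUs() {
  using namespace std::chrono;
  return duration_cast<microseconds>(steady_clock::now().time_since_epoch())
      .count();
}

// A delegated backend times its own op and reports it if a profiler is active.
int runDelegatedOp(int x, int64_t debug_handle) {
  const int64_t start = nowUs();
  const int result = x * 2;  // stand-in for the backend's real work
  const int64_t end = nowUs();
  if (SketchProfiler* profiler = getCurrentProfiler()) {
    profiler->recordBackendEvent(start, end, debug_handle, "mul", "test_backend");
  }
  return result;
}
```

With no active profiler, `runDelegatedOp` only does the work; once `currentProfilerSlot()` points at a `SketchProfiler`, each call appends one event tagged with the backend name.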

@facebook-github-bot added the `cla signed` and `oncall: jit` labels Sep 1, 2021

facebook-github-bot commented Sep 1, 2021

💊 CI failures summary and remediations

As of commit 3a28ee8 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-py3.6-gcc7-bazel-test / build-and-test (1/1)

Step: "Test"

2021-10-07T22:54:20.9439859Z bazel-out/k8-fastbuild/bin/_objs/jit_tests/test_backend_compiler_lib.pic.o:test_backend_compiler_lib.cpp:function torch::jit::BackendWithCompiler::execute(c10::IValue, c10::List<c10::IValue>): error: undefined reference to 'torch::jit::mobile::getCurrentEdgeProfiler()'
2021-10-07T22:54:20.9442293Z bazel-out/k8-fastbuild/bin/_objs/jit_tests/test_backend_compiler_lib.pic.o:test_backend_compiler_lib.cpp:function torch::jit::BackendWithCompiler::execute(c10::IValue, c10::List<c10::IValue>): error: undefined reference to 'torch::jit::mobile::getCurrentEdgeProfiler()'
2021-10-07T22:54:20.9446243Z bazel-out/k8-fastbuild/bin/_objs/jit_tests/test_backend_compiler_lib.pic.o:test_backend_compiler_lib.cpp:function torch::jit::BackendWithCompiler::execute(c10::IValue, c10::List<c10::IValue>): error: undefined reference to 'torch::jit::mobile::KinetoEdgeCPUProfiler::recordBackendEvent(long, long, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
2021-10-07T22:54:20.9448601Z collect2: error: ld returned 1 exit status
2021-10-07T22:54:20.9449473Z [2,855 / 2,863] 4 / 25 tests; 7 actions running; last test: //:memory_test
2021-10-07T22:54:20.9450300Z     Linking dataloader_test; 4s processwrapper-sandbox
2021-10-07T22:54:20.9451320Z     Testing //:enum_test; 2s processwrapper-sandbox
2021-10-07T22:54:20.9452425Z     Testing //:ordered_dict_test; 2s processwrapper-sandbox
2021-10-07T22:54:20.9453325Z     Testing //:misc_test; 2s processwrapper-sandbox
2021-10-07T22:54:20.9454042Z     Testing //:integration_test; 2s processwrapper-sandbox
2021-10-07T22:54:20.9454842Z     Compiling test/cpp/api/tensor.cpp; 2s processwrapper-sandbox
2021-10-07T22:54:20.9456113Z     Testing //:modules_test; 2s processwrapper-sandbox


@albanD removed their request for review September 1, 2021 21:28

@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

kimishpatel added a commit that referenced this pull request Sep 1, 2021
ghstack-source-id: 11526ec
Pull Request resolved: #64397
@raziel left a comment

LGTM Thanks

Though somebody from Kineto should review the Kineto changes.

ctx_ptr->startThreadId = at::RecordFunction::currentThreadId();
ctx_ptr->debug_handle = debug_handle;

/* no support for input shapes now?

Contributor

why's this?

Contributor Author

Mainly to avoid overhead of shape profiling.

}
std::string evt_name(fn.name().str());
- auto end_time = getTimeUs();
+ auto end_time = ctx->endUS;

Contributor

Can we assume that this is always set?

}

state_ptr->reportClientActivity(fn, kineto_ctx_ptr);
kineto_ctx_ptr->endUS = getTimeUs();

Contributor

Why move this here?

Contributor Author

This is so that reportClientActivity can assume the start and end times are already encoded in ctx. When backends report the runtime per op, they will indirectly use reportClientActivity. At the time of recording an op's runtime, backends already have the duration (start and end time).
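
The ordering argument can be sketched as follows. This is a simplified illustration under stated assumptions, not the profiler's actual code: `SketchContext`, `ReportedActivity`, `reportActivity`, `onOpExit`, and `onBackendEvent` are hypothetical names, and the clock value is passed in as a plain integer so the example stays deterministic. The point it shows is that once the end time is written into the context before reporting, the same reporting function serves both the regular exit path and a backend that measured its own times.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical context; the two fields loosely mirror the start/end times
// discussed in the thread above.
struct SketchContext {
  int64_t startUs = 0;
  int64_t endUs = 0;
};

struct ReportedActivity {
  std::string name;
  int64_t startUs;
  int64_t endUs;
};

// The reporter reads both timestamps from the context and never samples a
// clock itself, so callers decide where the times come from.
void reportActivity(
    const std::string& name,
    const SketchContext& ctx,
    std::vector<ReportedActivity>& out) {
  assert(ctx.endUs >= ctx.startUs);
  out.push_back({name, ctx.startUs, ctx.endUs});
}

// Path 1: a regular op. The exit hook stamps the end time first, then reports.
void onOpExit(
    const std::string& name,
    SketchContext& ctx,
    int64_t nowUs,
    std::vector<ReportedActivity>& out) {
  ctx.endUs = nowUs;
  reportActivity(name, ctx, out);
}

// Path 2: a backend that already measured its op. It fills in both times and
// reuses the same reporter without touching any clock.
void onBackendEvent(
    const std::string& name,
    int64_t startUs,
    int64_t endUs,
    std::vector<ReportedActivity>& out) {
  SketchContext ctx;
  ctx.startUs = startUs;
  ctx.endUs = endUs;
  reportActivity(name, ctx, out);
}
```

If the reporter sampled the clock itself, the backend path would record the time of reporting rather than the time the op actually finished.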

#endif // USE_KINETO
};

TORCH_API void reportBackendEventToActiveKinetoProfiler(

Contributor

Please add comment / documentation.

if (!state_ptr) {
return;
}
auto ctx_ptr = std::make_unique<KinetoObserverContext>();

Contributor

While this is fine for the moment, I think we can optimize this a lot - it seems unnecessary to create the KinetoObserverContext in this case. In fact we can probably remove it from the profiler altogether...

Contributor Author

I don't know if you meant this for backend event reporting only or for the profiler in general, but I completely agree. I also thought that we don't need the ObserverContext. My understanding is that the observer context is used by the callbacks: the enter callback constructs one, pushes some info into it, and returns the context. Then record_function calls the exit callback with the context, which populates more information. For the purposes of the API here, it is not needed.
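
A small sketch of the pattern described in this reply, under stated assumptions: `SketchObserverContext`, `SketchEvent`, `onEnter`, `onExit`, and `reportDirect` are hypothetical names, not the profiler's real types. It contrasts the callback pair, where a heap-allocated context carries state from enter to exit, with a direct reporting path where all values are known up front and no context is required.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical context carried between the enter and exit callbacks.
struct SketchObserverContext {
  int64_t startUs = 0;
  int64_t endUs = 0;
  uint64_t threadId = 0;
};

struct SketchEvent {
  std::string name;
  int64_t startUs;
  int64_t endUs;
  uint64_t threadId;
};

// Enter callback: constructs the context, stores what is known at entry,
// and hands it back to the caller.
std::unique_ptr<SketchObserverContext> onEnter(int64_t nowUs, uint64_t threadId) {
  auto ctx = std::make_unique<SketchObserverContext>();
  ctx->startUs = nowUs;
  ctx->threadId = threadId;
  return ctx;
}

// Exit callback: receives the same context, fills in the rest, emits the event.
void onExit(
    const std::string& name,
    SketchObserverContext& ctx,
    int64_t nowUs,
    std::vector<SketchEvent>& out) {
  ctx.endUs = nowUs;
  out.push_back({name, ctx.startUs, ctx.endUs, ctx.threadId});
}

// Direct path: everything is known at the call site, so no context (and no
// allocation) is needed.
void reportDirect(
    const std::string& name,
    int64_t startUs,
    int64_t endUs,
    uint64_t threadId,
    std::vector<SketchEvent>& out) {
  out.push_back({name, startUs, endUs, threadId});
}
```

Both paths produce the same kind of event; the direct one simply skips the intermediate object, which is the optimization the reviewer hints at.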

@gdankel left a comment

I have some nits, but in general this looks fine for now.
We definitely need a follow-up refactoring, but that applies to the entire profiler.

pytorch-probot bot commented Oct 5, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/3a28ee8661a33b29f0f2a1fad65d90b0c9529f00/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Triggered Workflows

| Workflow | Labels | Status |
| --- | --- | --- |
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-vulkan-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers | ✅ triggered |
| linux-xenial-py3.6-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/slow, ciflow/slow-gradcheck | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/win | ✅ triggered |

Skipped Workflows

| Workflow | Labels | Status |
| --- | --- | --- |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| puretorch-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

kimishpatel added a commit that referenced this pull request Oct 5, 2021
ghstack-source-id: c6e9359
Pull Request resolved: #64397
kimishpatel added a commit that referenced this pull request Oct 7, 2021
ghstack-source-id: 3942222
Pull Request resolved: #64397
janeyx99 commented Oct 9, 2021

Reverting this PR as it failed the bazel test (PR signal was red as well) on HUD

@facebook-github-bot

This pull request has been reverted by c62ed96. To re-land this change, follow these steps.

@kimishpatel
Contributor Author

> Reverting this PR as it failed the bazel test (PR signal was red as well) on HUD

Is the failing build triggered on PRs? I remember seeing all green for this PR's CI.

@facebook-github-bot deleted the gh/kimishpatel/78/head branch October 12, 2021 14:18
@janeyx99

Yes, it is. On your diff and in the HUD PR page, I see:
[image]
