Skip to content

Conversation

guyang3532
Copy link
Contributor

No description provided.

bertmaher and others added 30 commits March 29, 2021 07:43
Summary:
Pull Request resolved: #143

I'm not actually sure there's any potential negative consequence here,
but it's potentially harmful to use an anonymous namespace in a header, since
it creates a unique symbol for each translation unit which can lead to ODR
violations.

I happened to notice a warning for this while building OSS pytorch :-)

Reviewed By: gdankel

Differential Revision: D27383716

fbshipit-source-id: 0fffc93478bc044a543b44011943652dc13c75cd
Summary:
Pull Request resolved: #144

Accidentally used C++17 feature, and PyTorch still only supports C++14.

Reviewed By: ilia-cher

Differential Revision: D27422287

fbshipit-source-id: d5b72571f0272cb0cb25a395caf916ac3c16a4d3
Summary:
Pull Request resolved: #164

Plan is to deprecate ClientTraceActivity and use a single concrete data structure to log all client side activities :`GenericTraceActivity`. CTA has a lot of explicit fields for metadata that are rarely used which results in sparse definition of activities. Instead of adding a new field, encode the key/value as a string. Note, more type-safe and rich alternative is `std::map` but we are not using it for performance reason.

Reviewed By: gdankel

Differential Revision: D27298829

fbshipit-source-id: b7a82081f983c5502f6d53104fc2d01ddee77b08
Summary:
Pull Request resolved: pytorch/pytorch#55988

Pull Request resolved: #165

as part of the ClientTraceActivity -> GenericTraceActivity migration, move all the metadata fields to JSON encoded string

Reviewed By: gdankel

Differential Revision: D27340314

fbshipit-source-id: f55b77a779e4bda1fb8667cb4e0f4252b93af5ea
Summary:
Pull Request resolved: pytorch/pytorch#56209

Pull Request resolved: #172

in this diff of the stack, we remove the threadId field from the ClientTraceActivity as our step towards the deprecation

Reviewed By: ilia-cher

Differential Revision: D27662747

fbshipit-source-id: 040ba040390680a0fc63ddc8149c6fad940439fc
Summary:
Pull Request resolved: #173

Fix a case of invalid json when metadata is empty

Reviewed By: chaekit

Differential Revision: D27837639

fbshipit-source-id: 24c80671d8a15ac2d39096d401a5ec05b3339317
Summary:
Pull Request resolved: #176

In OSS not all builds support nodiscard
https://app.circleci.com/pipelines/github/pytorch/pytorch/305130/workflows/1885277b-9d54-4637-ab9f-7234be2f2ee2/jobs/12588162

Reviewed By: dzhulgakov

Differential Revision: D27872203

fbshipit-source-id: 711712f5b819fd4faf21da1c363b70960acf9518
Summary:
Pull Request resolved: #192

Add user-defined metadata api

Reviewed By: gdankel

Differential Revision: D27906278

fbshipit-source-id: 601de1e6f7ce61ed2cc62dd7d9a4bd3d4c955286
Summary:
Types defined as `struct` should be forward declared as struct

https://github.com/pytorch/kineto/blob/5bc9386b6d60c3b34b77961ea2900947103304b9/libkineto/include/TraceActivity.h#L21

Find while working on making PyTorch clang-tidy compliant

Pull Request resolved: #199

Reviewed By: ilia-cher

Differential Revision: D28092540

Pulled By: malfet

fbshipit-source-id: 428a5e762002f763f636957f8c207122f411f3c0
…(#56743)

Summary:
Pull Request resolved: pytorch/pytorch#56743

Pull Request resolved: #184

as part of the migration to ClientTraceActivity -> GenericTraceActivity, now that all CTA mirrors GTA's data structure, we can safely swap out the symbol name.

Reviewed By: gdankel

Differential Revision: D27353973

fbshipit-source-id: 7012c6524c3c75079029ac290c1dd722ac187ec5
Summary:
Pull Request resolved: pytorch/pytorch#57333

Pull Request resolved: #193

Expanding Kineto support to more platforms

Reviewed By: gdankel

Differential Revision: D27873669

fbshipit-source-id: 4a72a589f958440cbfff247751b7f4e1910a10c7
Summary:
Refactor for better readability.
Also add locks to ensure CUPTI threads and profiler threads don't interfere.

Reviewed By: chaekit

Differential Revision: D28051462

fbshipit-source-id: c8a4f75b47c20f8467145728d4b38da8bdfcdccd
Summary:
Fixes two issues:
1) Back-to-back collections hit concurrency issue in the observer.
2) Early exit due to running out of buffers during warmup is broken due to missing break statement.

Reviewed By: leitian, ilia-cher

Differential Revision: D28051464

fbshipit-source-id: 7ddc46a344af2a922ee8f7e904b1b4e48bfac773
Summary: I hit some jemalloc aborts (size mismatch) occasionally when freeing metadata strings. Not entirely sure I understand why but I have simplified the code and the issue has disappeared.

Reviewed By: chaekit

Differential Revision: D28051465

fbshipit-source-id: 699515409d6600f7aea14a96c1dac6fb9cad311e
Summary: CPU_OP was missing from the available activity types. It should be possible to profile only CPU ops, or to exclude them.

Reviewed By: chaekit

Differential Revision: D28051461

fbshipit-source-id: 2cfc3e4262b39ad08e38d7bf79541ce0cca97e60
Summary: The current warmup time is extremely conservative and not needed for the vast majority of workloads. Change the default to something more reasonable: 5s.

Reviewed By: chaekit

Differential Revision: D28051463

fbshipit-source-id: 3919342978e7137c0bd9d57536ee1e59eeb2f704
Summary:
Pull Request resolved: #201

Previously tracing buffers were being cleared frequently during warmup in order to stay under the buffer memory limit. But most use cases need at least 1-2s active trace time to capture multiple iterations and so clearing the buffers every 1s seems sufficient.

Reviewed By: chaekit

Differential Revision: D28094676

fbshipit-source-id: a6a99ff02c710587a467d1d607964939bf23f39e
Summary:
1. Update "warps per sm" to "blocks per sm".
2. Add "occupancy" per kernel.

Pull Request resolved: #185

Reviewed By: chaekit

Differential Revision: D28120846

Pulled By: gdankel

fbshipit-source-id: c7ce33b1421b60ae4323c66d38bba5eb175b105b
Summary:
Pull Request resolved: #205

Fix invalid json in case of empty metadata

Reviewed By: rohan-varma

Differential Revision: D28239004

fbshipit-source-id: 3e2930b1b80a5d5f6c6f92de7eee899729a048dc
Summary: while merging ClientTraceActivity and GenericTraceActivity, we accidentally adopted CTA's behavior of returning process id over its `device`. This causes GTA to show up in CPU timeline rather than associated GPU's

Reviewed By: gdankel

Differential Revision: D28196723

fbshipit-source-id: eb8330c14e7c43a470bb4df4811b80754d96535b
Summary:
Pull Request resolved: #207

Different platforms may use different aliases for clocks, fixing the
discrepancy by specifying system_clock directly

Reviewed By: gdankel

Differential Revision: D28272403

fbshipit-source-id: 79df6c3e67cf883ca73146a15a9de2ce226891ae
Summary:
Pull Request resolved: pytorch/pytorch#57835

Pull Request resolved: #208

Adding ability to save memory allocs/deallocs into the trace

Reviewed By: gdankel

Differential Revision: D28260915

fbshipit-source-id: d7905d38d7fac9750754ac1b293d3a1951590b5f
Summary:
Pull Request resolved: #210

Theoretical occupancy is used in Nvidia docs to mean the occupancy a kernel could achieve given sufficient input parallelism (grid size).

It's more interesting in a trace to know what occupancy is actually achieved. Then if it's low, theoretical occupancy can be studied for each kernel to estimate improvement from increasing grid size. But that's something we can do as part of a recommendation or tutorial.

Reviewed By: ilia-cher

Differential Revision: D28327605

fbshipit-source-id: 80b54a955fa2885dbbd45006c45f2fb0d9fca2d2
Summary:
Pull Request resolved: #211

Device properties are useful in any case, but especially for performing analysis on traces such as occupancy.

This patch is a re-implementation of #209

Reviewed By: ilia-cher

Differential Revision: D28337067

fbshipit-source-id: 87267d53bdc1c257db452319339518539b81efed
Summary:
Pull Request resolved: #216

Check the variable value before adding dep on cupti

Reviewed By: gdankel, malfet

Differential Revision: D28363222

fbshipit-source-id: 09f7da1e756ee35559e79ec7a2e5018223f5a12f
Summary:
SetThreadDescription is not available on some older runtimes/older
versions of Windows, which can lead to to following linker errors:
```
kineto.lib(ThreadUtil.cpp.obj) : error LNK2019: unresolved external symbol __imp_SetThreadDescription referenced in function "bool __cdecl libkineto::setThreadName(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?setThreadName@libkineto@YA_NAEBV?$basic_string@DU?$char_traits@D@std@V?$allocator@D@2@std@@z)
```

Use runtime linking method recommended in
See https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setthreaddescription

Pull Request resolved: #224

Reviewed By: ilia-cher

Differential Revision: D28404596

Pulled By: malfet

fbshipit-source-id: 01937821a131aeff562aed5ebf274eac74bd816e
Summary:
`daemonConfigLoaderFactory` is a function and therefore always non-null

Fixes following clang warning:
```
../third_party/kineto/libkineto/src/ConfigLoader.cpp:161:7: warning: address of function 'daemonConfigLoaderFactory' will always evaluate to 'true' [-Wpointer-bool-conversion]
  if (daemonConfigLoaderFactory && daemonConfigLoaderFactory()) {
      ^~~~~~~~~~~~~~~~~~~~~~~~~ ~~
```

Pull Request resolved: #225

Reviewed By: ilia-cher

Differential Revision: D28405389

Pulled By: malfet

fbshipit-source-id: 3930ab1acd6f2909602eada246d9ad01d9a923e3
Summary: Now that we're extracting SM count from CUDA API in the chrome trace logger, there is no longer any need for passing in the CUPTI API wrapper.

Reviewed By: ilia-cher

Differential Revision: D28579642

fbshipit-source-id: 75004e092d3bc9c26435a02ce1a2034c0ea70004
Summary:
Pull Request resolved: #251

Add a way to add support for multiple protocols and associated logger objects.

Reviewed By: ilia-cher

Differential Revision: D28579660

fbshipit-source-id: 497c45eca0529ba0620d635c6605fc0f20172d9c
Summary:
Pull Request resolved: #271

Change direction of dependency from ConfigLoader -> profiler to profiler -> ConfigLoader. This way, the profiler is able to lazily initialize config loader and also move towards the pluggable profiler design.

Reviewed By: xw285cornell

Differential Revision: D28802798

fbshipit-source-id: 56e88b223fe8fc5276500e1b9a19c602450ae6dc
gdankel and others added 8 commits June 4, 2021 17:57
Summary: Verbose logging was accidentally left enabled, revert.

Reviewed By: ilia-cher

Differential Revision: D28916054

fbshipit-source-id: 9dd3bd831190f246990f1fe17c04ea505ce219e5
Summary: S233925

Reviewed By: satgera, xw285cornell

Differential Revision: D28907829

fbshipit-source-id: 2224720c178fb885d8dccb8a38f50c856e48bdd7
Summary:
Pull Request resolved: pytorch/pytorch#59360

Pull Request resolved: #206

Replace ClientTraceActivity with GenericActivity.
In addition:
* Add a couple of new activity types for user annotations
* Simplify code for GPU-side user annotations
* Add accessor to containing trace span object in activities. Later we can replace this with a trace context / trace session object.
* Simplified MemoryTraceLogger
* Added early exit for cupti push/pop correlation ID

Reviewed By: ilia-cher

Differential Revision: D28231675

fbshipit-source-id: 7129f2493016efb4d3697094f24475e2c39e6e65
Summary:
Pull Request resolved: #202

# Activity Profiler Interface
Adds the child Activiity profiler interface and implementation. This interface can be used by libraries and frameworks to supply trace events to Kineto.
The first version only consolidates trace events and does not handle correlation yet.

## Details
* Add Activity Profiler interface header that includes both profiler and the profiler session.
   A session manages all the trace event data captured by the respective profiler
* Also had to move GenericTrace activity to includes dir as it used in the profiler session interface.
* Creates sessions and starts and stops them in the primary ActivityProfilers flow.
* First use-case is to integrate with Glow (Graph Lowering Compiler) , adding a glow_runtime event type for it.
https://github.com/pytorch/glow/blob/master/docs/Tracing.md

Reviewed By: gdankel

Differential Revision: D27601906

fbshipit-source-id: 38e608a3a1a6e1b9e69f80d07312e388ed7ad339
Summary:
Each `fopen()` must be followed by `fclose()`

Pull Request resolved: #281

Reviewed By: seemethere

Differential Revision: D29008280

Pulled By: malfet

fbshipit-source-id: 2171740a4f95bb980fe41cae1080f38458069836
Summary: Pull Request resolved: #284

Reviewed By: leitian, chaekit

Differential Revision: D29052618

Pulled By: gdankel

fbshipit-source-id: 9f38cdfc7c7e73f5f62844ef857ebe6fed46f30a
Summary:
Pull Request resolved: #291

In container setups, libkineto may not be able to read config file from the host file system, so add a path to retrieve it from the daemon.

Reviewed By: briancoutinho

Differential Revision: D28877610

fbshipit-source-id: a5f513524ac7afea14a7089677c5eff96b3530c6
@guyang3532 guyang3532 closed this Jun 15, 2021
@guyang3532 guyang3532 reopened this Jun 15, 2021
@guyang3532 guyang3532 merged commit 7b0a63b into plugin/0.2 Jun 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants