-
Notifications
You must be signed in to change notification settings - Fork 206
Merge master #293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Merge master #293
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Summary: Pull Request resolved: #143 I'm not actually sure there's any potential negative consequence here, but it's potentially harmful to use an anonymous namespace in a header, since it creates a unique symbol for each translation unit which can lead to ODR violations. I happened to notice a warning for this while building OSS pytorch :-) Reviewed By: gdankel Differential Revision: D27383716 fbshipit-source-id: 0fffc93478bc044a543b44011943652dc13c75cd
Summary: Pull Request resolved: #144 Accidentally used C++17 feature, and PyTorch still only supports C++14. Reviewed By: ilia-cher Differential Revision: D27422287 fbshipit-source-id: d5b72571f0272cb0cb25a395caf916ac3c16a4d3
Summary: Pull Request resolved: #164 Plan is to deprecate ClientTraceActivity and use a single concrete data structure to log all client side activities :`GenericTraceActivity`. CTA has a lot of explicit fields for metadata that are rarely used which results in sparse definition of activities. Instead of adding a new field, encode the key/value as a string. Note, more type-safe and rich alternative is `std::map` but we are not using it for performance reason. Reviewed By: gdankel Differential Revision: D27298829 fbshipit-source-id: b7a82081f983c5502f6d53104fc2d01ddee77b08
Summary: Pull Request resolved: pytorch/pytorch#55988 Pull Request resolved: #165 as part of the ClientTraceActivity -> GenericTraceActivity migration, move all the metadata fields to JSON encoded string Reviewed By: gdankel Differential Revision: D27340314 fbshipit-source-id: f55b77a779e4bda1fb8667cb4e0f4252b93af5ea
Summary: Pull Request resolved: pytorch/pytorch#56209 Pull Request resolved: #172 in this diff of the stack, we remove the threadId field from the ClientTraceActivity as our step towards the deprecation Reviewed By: ilia-cher Differential Revision: D27662747 fbshipit-source-id: 040ba040390680a0fc63ddc8149c6fad940439fc
Summary: Pull Request resolved: #173 Fix a case of invalid json when metadata is empty Reviewed By: chaekit Differential Revision: D27837639 fbshipit-source-id: 24c80671d8a15ac2d39096d401a5ec05b3339317
Summary: Pull Request resolved: #176 In OSS not all builds support nodiscard https://app.circleci.com/pipelines/github/pytorch/pytorch/305130/workflows/1885277b-9d54-4637-ab9f-7234be2f2ee2/jobs/12588162 Reviewed By: dzhulgakov Differential Revision: D27872203 fbshipit-source-id: 711712f5b819fd4faf21da1c363b70960acf9518
Summary: Pull Request resolved: #192 Add user-defined metadata api Reviewed By: gdankel Differential Revision: D27906278 fbshipit-source-id: 601de1e6f7ce61ed2cc62dd7d9a4bd3d4c955286
Summary: Types defined as `struct` should be forward declared as struct https://github.com/pytorch/kineto/blob/5bc9386b6d60c3b34b77961ea2900947103304b9/libkineto/include/TraceActivity.h#L21 Find while working on making PyTorch clang-tidy compliant Pull Request resolved: #199 Reviewed By: ilia-cher Differential Revision: D28092540 Pulled By: malfet fbshipit-source-id: 428a5e762002f763f636957f8c207122f411f3c0
…(#56743) Summary: Pull Request resolved: pytorch/pytorch#56743 Pull Request resolved: #184 as part of the migration to ClientTraceActivity -> GenericTraceActivity, now that all CTA mirrors GTA's data structure, we can safely swap out the symbol name. Reviewed By: gdankel Differential Revision: D27353973 fbshipit-source-id: 7012c6524c3c75079029ac290c1dd722ac187ec5
Summary: Pull Request resolved: pytorch/pytorch#57333 Pull Request resolved: #193 Expanding Kineto support to more platforms Reviewed By: gdankel Differential Revision: D27873669 fbshipit-source-id: 4a72a589f958440cbfff247751b7f4e1910a10c7
Summary: Refactor for better readability. Also add locks to ensure CUPTI threads and profiler threads don't interfere. Reviewed By: chaekit Differential Revision: D28051462 fbshipit-source-id: c8a4f75b47c20f8467145728d4b38da8bdfcdccd
Summary: Fixes two issues: 1) Back-to-back collections hit concurrency issue in the observer. 2) Early exit due to running out of buffers during warmup is broken due to missing break statement. Reviewed By: leitian, ilia-cher Differential Revision: D28051464 fbshipit-source-id: 7ddc46a344af2a922ee8f7e904b1b4e48bfac773
Summary: I hit some jemalloc aborts (size mismatch) occasionally when freeing metadata strings. Not entirely sure I understand why but I have simplified the code and the issue has disappeared. Reviewed By: chaekit Differential Revision: D28051465 fbshipit-source-id: 699515409d6600f7aea14a96c1dac6fb9cad311e
Summary: CPU_OP was missing from the available activity types. It should be possible to profile only CPU ops, or to exclude them. Reviewed By: chaekit Differential Revision: D28051461 fbshipit-source-id: 2cfc3e4262b39ad08e38d7bf79541ce0cca97e60
Summary: The current warmup time is extremely conservative and not needed for the vast majority of workloads. Change the default to something more reasonable: 5s. Reviewed By: chaekit Differential Revision: D28051463 fbshipit-source-id: 3919342978e7137c0bd9d57536ee1e59eeb2f704
Summary: Pull Request resolved: #201 Previously tracing buffers were being cleared frequently during warmup in order to stay under the buffer memory limit. But most use cases need at least 1-2s active trace time to capture multiple iterations and so clearing the buffers every 1s seems sufficient. Reviewed By: chaekit Differential Revision: D28094676 fbshipit-source-id: a6a99ff02c710587a467d1d607964939bf23f39e
Summary: 1. Update "warps per sm" to "blocks per sm". 2. Add "occupancy" per kernel. Pull Request resolved: #185 Reviewed By: chaekit Differential Revision: D28120846 Pulled By: gdankel fbshipit-source-id: c7ce33b1421b60ae4323c66d38bba5eb175b105b
Summary: Pull Request resolved: #205 Fix invalid json in case of empty metadata Reviewed By: rohan-varma Differential Revision: D28239004 fbshipit-source-id: 3e2930b1b80a5d5f6c6f92de7eee899729a048dc
Summary: while merging ClientTraceActivity and GenericTraceActivity, we accidentally adopted CTA's behavior of returning process id over its `device`. This causes GTA to show up in CPU timeline rather than associated GPU's Reviewed By: gdankel Differential Revision: D28196723 fbshipit-source-id: eb8330c14e7c43a470bb4df4811b80754d96535b
Summary: Pull Request resolved: #207 Different platforms may use different aliases for clocks, fixing the discrepancy by specifying system_clock directly Reviewed By: gdankel Differential Revision: D28272403 fbshipit-source-id: 79df6c3e67cf883ca73146a15a9de2ce226891ae
Summary: Pull Request resolved: pytorch/pytorch#57835 Pull Request resolved: #208 Adding ability to save memory allocs/deallocs into the trace Reviewed By: gdankel Differential Revision: D28260915 fbshipit-source-id: d7905d38d7fac9750754ac1b293d3a1951590b5f
Summary: Pull Request resolved: #210 Theoretical occupancy is used in Nvidia docs to mean the occupancy a kernel could achieve given sufficient input parallelism (grid size). It's more interesting in a trace to know what occupancy is actually achieved. Then if it's low, theoretical occupancy can be studied for each kernel to estimate improvement from increasing grid size. But that's something we can do as part of a recommendation or tutorial. Reviewed By: ilia-cher Differential Revision: D28327605 fbshipit-source-id: 80b54a955fa2885dbbd45006c45f2fb0d9fca2d2
Summary: Pull Request resolved: #211 Device properties are useful in any case, but especially for performing analysis on traces such as occupancy. This patch is a re-implementation of #209 Reviewed By: ilia-cher Differential Revision: D28337067 fbshipit-source-id: 87267d53bdc1c257db452319339518539b81efed
Summary: Pull Request resolved: #216 Check the variable value before adding dep on cupti Reviewed By: gdankel, malfet Differential Revision: D28363222 fbshipit-source-id: 09f7da1e756ee35559e79ec7a2e5018223f5a12f
Summary: SetThreadDescription is not available on some older runtimes/older versions of Windows, which can lead to to following linker errors: ``` kineto.lib(ThreadUtil.cpp.obj) : error LNK2019: unresolved external symbol __imp_SetThreadDescription referenced in function "bool __cdecl libkineto::setThreadName(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?setThreadName@libkineto@YA_NAEBV?$basic_string@DU?$char_traits@D@std@V?$allocator@D@2@std@@z) ``` Use runtime linking method recommended in See https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setthreaddescription Pull Request resolved: #224 Reviewed By: ilia-cher Differential Revision: D28404596 Pulled By: malfet fbshipit-source-id: 01937821a131aeff562aed5ebf274eac74bd816e
Summary: `daemonConfigLoaderFactory` is a function and therefore always non-null Fixes following clang warning: ``` ../third_party/kineto/libkineto/src/ConfigLoader.cpp:161:7: warning: address of function 'daemonConfigLoaderFactory' will always evaluate to 'true' [-Wpointer-bool-conversion] if (daemonConfigLoaderFactory && daemonConfigLoaderFactory()) { ^~~~~~~~~~~~~~~~~~~~~~~~~ ~~ ``` Pull Request resolved: #225 Reviewed By: ilia-cher Differential Revision: D28405389 Pulled By: malfet fbshipit-source-id: 3930ab1acd6f2909602eada246d9ad01d9a923e3
Summary: Now that we're extracting SM count from CUDA API in the chrome trace logger, there is no longer any need for passing in the CUPTI API wrapper. Reviewed By: ilia-cher Differential Revision: D28579642 fbshipit-source-id: 75004e092d3bc9c26435a02ce1a2034c0ea70004
Summary: Pull Request resolved: #251 Add a way to add support for multiple protocols and associated logger objects. Reviewed By: ilia-cher Differential Revision: D28579660 fbshipit-source-id: 497c45eca0529ba0620d635c6605fc0f20172d9c
Summary: Pull Request resolved: #271 Change direction of dependency from ConfigLoader -> profiler to profiler -> ConfigLoader. This way, the profiler is able to lazily initialize config loader and also move towards the pluggable profiler design. Reviewed By: xw285cornell Differential Revision: D28802798 fbshipit-source-id: 56e88b223fe8fc5276500e1b9a19c602450ae6dc
Summary: Verbose logging was accidentally left enabled, revert. Reviewed By: ilia-cher Differential Revision: D28916054 fbshipit-source-id: 9dd3bd831190f246990f1fe17c04ea505ce219e5
Summary: S233925 Reviewed By: satgera, xw285cornell Differential Revision: D28907829 fbshipit-source-id: 2224720c178fb885d8dccb8a38f50c856e48bdd7
Summary: Pull Request resolved: pytorch/pytorch#59360 Pull Request resolved: #206 Replace ClientTraceActivity with GenericActivity. In addition: * Add a couple of new activity types for user annotations * Simplify code for GPU-side user annotations * Add accessor to containing trace span object in activities. Later we can replace this with a trace context / trace session object. * Simplified MemoryTraceLogger * Added early exit for cupti push/pop correlation ID Reviewed By: ilia-cher Differential Revision: D28231675 fbshipit-source-id: 7129f2493016efb4d3697094f24475e2c39e6e65
Summary: Pull Request resolved: #202 # Activity Profiler Interface Adds the child Activiity profiler interface and implementation. This interface can be used by libraries and frameworks to supply trace events to Kineto. The first version only consolidates trace events and does not handle correlation yet. ## Details * Add Activity Profiler interface header that includes both profiler and the profiler session. A session manages all the trace event data captured by the respective profiler * Also had to move GenericTrace activity to includes dir as it used in the profiler session interface. * Creates sessions and starts and stops them in the primary ActivityProfilers flow. * First use-case is to integrate with Glow (Graph Lowering Compiler) , adding a glow_runtime event type for it. https://github.com/pytorch/glow/blob/master/docs/Tracing.md Reviewed By: gdankel Differential Revision: D27601906 fbshipit-source-id: 38e608a3a1a6e1b9e69f80d07312e388ed7ad339
Summary: Each `fopen()` must be followed by `fclose()` Pull Request resolved: #281 Reviewed By: seemethere Differential Revision: D29008280 Pulled By: malfet fbshipit-source-id: 2171740a4f95bb980fe41cae1080f38458069836
Summary: Pull Request resolved: #284 Reviewed By: leitian, chaekit Differential Revision: D29052618 Pulled By: gdankel fbshipit-source-id: 9f38cdfc7c7e73f5f62844ef857ebe6fed46f30a
Summary: Pull Request resolved: #291 In container setups, libkineto may not be able to read config file from the host file system, so add a path to retrieve it from the daemon. Reviewed By: briancoutinho Differential Revision: D28877610 fbshipit-source-id: a5f513524ac7afea14a7089677c5eff96b3530c6
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.