[profiler] Support top-level memory events #51421
Conversation
Summary: Mark memory events that did not happen within an operator context explicitly in the profiler output. This PR also adds an API to track memory events outside of, or partially overlapping with, the profiler scope.

Test Plan: python test/test_profiler.py -k test_memory_profiler

Pull Request resolved: #51421
💊 CI failures summary and remediations

As of commit 326f80c (more details on the Dr. CI page):

🚧 1 fixed upstream failure: these were probably caused by upstream breakages that were already fixed. Please rebase to pick up the upstream fixes.
c10/core/Allocator.cpp (outdated)

```
std::atomic<bool> global_memory_reporting_ {false};
}

void enableGlobalMemoryReporting(bool enable) {
  global_memory_reporting_ = true;
}
```

Should this be `global_memory_reporting_ = enable;` here?
torch/autograd/profiler.py (outdated)

```
@@ -1145,11 +1149,14 @@ def parse_kineto_results(result):
     cuda_memory_usage = 0
     if kineto_event.device_type() == DeviceType.CPU:
         # find the corresponding memory allocation events
-        for mem_record in mem_records:
+        for mem_record_idx in range(len(mem_records)):
```

Nit: `for mem_record_idx, mem_record in enumerate(mem_records):`

Alternatively, you can make `mem_records` consist of `[mem_record, flag]` pairs or NamedTuples, and then you can simplify the code in this and subsequent loops to `mem_record[1] = True` or `mem_record.flag = True`.
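As a runnable illustration of the flag-pair idea, here is a minimal sketch using plain Python stand-ins rather than the real Kineto record types (`MemRecord` and the interval logic below are hypothetical, invented for the example):

```python
from dataclasses import dataclass

# Hypothetical stand-in for the profiler's memory records.
@dataclass
class MemRecord:
    start_us: int
    nbytes: int

records = [MemRecord(0, 400), MemRecord(5, -400), MemRecord(20, 512)]

# Pair each record with a "matched" flag, as the comment suggests, so later
# loops can mark records directly instead of tracking indices separately.
mem_records = [[rec, False] for rec in records]

def mark_records_in_interval(start_us, end_us):
    """Flag records whose start time falls inside an operator's interval."""
    for mem_record in mem_records:
        if start_us <= mem_record[0].start_us < end_us:
            mem_record[1] = True

mark_records_in_interval(0, 10)   # pretend one operator ran from 0us to 10us
unmatched = [rec for rec, matched in mem_records if not matched]
```

After the per-operator pass, `unmatched` holds exactly the records no operator interval claimed, which is what a later top-level pass would consume.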
torch/autograd/profiler.py (outdated)

```
@@ -1188,6 +1195,30 @@ def parse_kineto_results(result):
                     k_evt.start_us(),
                     k_evt.start_us() + k_evt.duration_us())

+    # output top-level memory events
+    for mem_record_idx in range(len(mem_records)):
```

With the previous proposal this would become:

```
for mem_record in mem_records:
    if not mem_record[1]:
```
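Putting both suggestions together, a minimal runnable sketch of the top-level pass (plain Python tuples stand in for the real record and event types, which are assumptions here):

```python
# Each entry pairs a (kind, nbytes) record with a flag saying whether some
# operator interval already claimed it during the per-operator pass.
mem_records = [
    [("alloc", 400), True],    # matched to an operator above
    [("free", -400), False],   # top-level: outside any operator context
    [("alloc", 512), False],
]

# Output top-level memory events: any record no operator claimed becomes a
# synthetic "[memory]" entry, mirroring the row shown in the profiler table.
top_level_events = []
for mem_record in mem_records:
    if not mem_record[1]:
        top_level_events.append(("[memory]", mem_record[0][1]))
```

The flag spares the second loop from re-deriving which indices were already consumed, which is the simplification the reviewer is after.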
```
@@ -292,6 +292,10 @@ void ProfiledCPUMemoryReporter::Delete(void* ptr) {
     allocated = allocated_;
     nbytes = it->second;
     size_table_.erase(it);
+  } else {
```
are there any changes required in CUDAAllocator?
CUDACachingAllocator already saves block sizes
Summary: Mark memory events that did not happen within an operator context explicitly in the profiler output. This PR also adds an API to track memory events outside of, or partially overlapping with, the profiler scope.

Test Plan: python test/test_profiler.py -k test_memory_profiler

```
------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------  ------------
Name                Self CPU %    Self CPU      CPU total %   CPU total     CPU time avg  CPU Mem       Self CPU Mem  CUDA Mem      Self CUDA Mem  # of Calls
------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------  ------------
aten::rand          40.00%        10.000us      100.00%       25.000us      25.000us      400 b         0 b           0 b           0 b            1
aten::empty         24.00%        6.000us       24.00%        6.000us       6.000us       400 b         400 b         0 b           0 b            1
aten::uniform_      36.00%        9.000us       36.00%        9.000us       9.000us      0 b            0 b           0 b           0 b            1
[memory]            0.00%         0.000us       0.00%         0.000us       0.000us      -400 b         -400 b        -512 b        -512 b         2
------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------  ------------
```

Differential Revision: [D26166518](https://our.internmc.facebook.com/intern/diff/D26166518)
(We decided not to include the new API at the moment; will remove enable_global_memory_reporting.)
Please remove debug prints.
```
def test_memory_profiler(self):
    def run_profiler(tensor_creation_fn, metric):
        # collecting allocs / deallocs
        with _profile(profile_memory=True, record_shapes=True, use_kineto=kineto_available()) as prof:
```
We should make a follow-up item to either migrate or duplicate these tests with `profiler` instead of `autograd.profiler`.
All our autograd.profiler tests should use `use_kineto=kineto_available()` now, and we have both kineto-enabled and non-enabled CI builds; eventually, when autograd.profiler is deprecated and we always use kineto, we will just make the tests use the new, kineto-only API.
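The pattern described here, choosing profiler arguments from a runtime capability check so that one test body serves both CI variants, can be sketched without torch (`kineto_available` below is a stub standing in for the real helper in torch.autograd.profiler):

```python
def kineto_available():
    """Stub for the real availability check; a kineto-enabled build
    would report True and other builds False."""
    return True

def profiler_kwargs():
    # One set of kwargs serves both kineto-enabled and non-enabled CI
    # builds: the capability check decides the backend at runtime.
    return dict(profile_memory=True,
                record_shapes=True,
                use_kineto=kineto_available())

kwargs = profiler_kwargs()
```

Once the kineto-only API is the default, only `profiler_kwargs` needs to change, not every test body.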
test/test_profiler.py (outdated)

```
del y
gc.collect()
stats = prof.key_averages(group_by_input_shape=True)
print(stats.table(sort_by="cpu_memory_usage", row_limit=-1))
```
this looks like a debug print
test/test_profiler.py (outdated)

```
def create_mkldnn_tensor():
    return torch.rand(10, 10, dtype=torch.float32).to_mkldnn()

print("Running CPU test")
```
remove debug print
It was originally added on purpose (together with the profiler output); I think we have many tests that do this?
Idk, usually tests don't do this (they are tests), but there are some stray prints. Do you need this output?
only for debug, will remove
Summary (updated after review, with the new API sentence removed): Mark memory events that did not happen within an operator context explicitly in the profiler output. Test Plan: python test/test_profiler.py -k test_memory_profiler. Differential Revision: [D26166518](https://our.internmc.facebook.com/intern/diff/D26166518)
@ilia-cher merged this pull request in f1f9b04.
Stack from ghstack:
Summary:
Mark memory events that did not happen within an operator context
explicitly in the profiler output.
Test Plan:
python test/test_profiler.py -k test_memory_profiler
Differential Revision: D26166518