
[profiler] Support top-level memory events #51421

Closed
wants to merge 5 commits

Conversation

@ilia-cher (Contributor) commented Jan 31, 2021

Stack from ghstack:

Summary:
Explicitly mark memory events that did not occur within an operator context in the profiler output.

Test Plan:
python test/test_profiler.py -k test_memory_profiler

------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
              Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls
------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
        aten::rand        40.00%      10.000us       100.00%      25.000us      25.000us         400 b           0 b           0 b           0 b             1
       aten::empty        24.00%       6.000us        24.00%       6.000us       6.000us         400 b         400 b           0 b           0 b             1
    aten::uniform_        36.00%       9.000us        36.00%       9.000us       9.000us           0 b           0 b           0 b           0 b             1
          [memory]         0.00%       0.000us         0.00%       0.000us       0.000us        -400 b        -400 b        -512 b        -512 b             2
------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
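For illustration, the attribution step that decides whether a memory event lands under an operator row or under the top-level `[memory]` row can be sketched in plain Python. The names and structures below are hypothetical stand-ins, not the actual profiler internals:

```python
from collections import namedtuple

# hypothetical stand-ins for the profiler's internal records
MemRecord = namedtuple("MemRecord", ["ts_us", "nbytes"])
OpEvent = namedtuple("OpEvent", ["name", "start_us", "end_us"])

def attribute_memory(ops, mem_records):
    """Assign each memory record to an operator whose time interval
    contains it; records matching no operator are aggregated into the
    top-level [memory] bucket."""
    usage = {op.name: 0 for op in ops}
    usage["[memory]"] = 0
    for rec in mem_records:
        owner = "[memory]"  # default: outside any operator context
        for op in ops:
            if op.start_us <= rec.ts_us < op.end_us:
                owner = op.name
                break
        usage[owner] += rec.nbytes
    return usage

ops = [OpEvent("aten::empty", 0, 6), OpEvent("aten::uniform_", 6, 15)]
mem = [MemRecord(2, 400), MemRecord(20, -400)]
print(attribute_memory(ops, mem))
# → {'aten::empty': 400, 'aten::uniform_': 0, '[memory]': -400}
```

The deallocation at t=20µs falls outside both operator windows, so it shows up only in the `[memory]` row, mirroring the -400 b entry in the table above.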

Differential Revision: [D26166518](https://our.internmc.facebook.com/intern/diff/D26166518)

ilia-cher added a commit that referenced this pull request Jan 31, 2021
Summary:
Mark memory events that did not happen within an operator context
explicitly in the profiler output.
This PR also adds an API to track memory events outside of or partially
overlapping with the profiler scope.

Test Plan:
python test/test_profiler.py -k test_memory_profiler

ghstack-source-id: ef780c6b19287f0574dd9fee35ae54f9f0bf83b3
Pull Request resolved: #51421
@facebook-github-bot (Contributor) commented Jan 31, 2021

💊 CI failures summary and remediations

As of commit 326f80c (more details on the Dr. CI page):


  • 2/3 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)
  • 1/3 broken upstream at merge base 0c60922 on Feb 03 from 5:06pm to 11:46pm

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


Extra GitHub checks: 1 failed



std::atomic<bool> global_memory_reporting_ {false};
}

void enableGlobalMemoryReporting(bool enable) {
  global_memory_reporting_ = true;
Collaborator:

should this be `= enable` here?

@@ -1145,11 +1149,14 @@ def parse_kineto_results(result):
 cuda_memory_usage = 0
 if kineto_event.device_type() == DeviceType.CPU:
     # find the corresponding memory allocation events
-    for mem_record in mem_records:
+    for mem_record_idx in range(len(mem_records)):
Collaborator:

nit: `for mem_record_idx, mem_record in enumerate(mem_records):`
Alternatively, you can make mem_records consist of `[mem_record, flag]` pairs or NamedTuples, and then you can simplify the code in this and subsequent loops:
`mem_record[1] = True`, or `mem_record.flag = True`
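The nit above, replacing an index loop with `enumerate`, looks like this in isolation (toy data, not the profiler's actual record structures):

```python
# toy records: (kind, nbytes) tuples standing in for mem_records
mem_records = [("alloc", 400), ("free", -400)]

# index-based loop, as written in the diff
seen = [False] * len(mem_records)
for mem_record_idx in range(len(mem_records)):
    mem_record = mem_records[mem_record_idx]
    if mem_record[1] > 0:
        seen[mem_record_idx] = True

# equivalent, more idiomatic form suggested in the review
seen_enum = [False] * len(mem_records)
for mem_record_idx, mem_record in enumerate(mem_records):
    if mem_record[1] > 0:
        seen_enum[mem_record_idx] = True

print(seen, seen_enum)  # the two loops agree
```

`enumerate` yields the index and the element together, so the explicit `mem_records[mem_record_idx]` lookup disappears.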

@@ -1188,6 +1195,30 @@ def parse_kineto_results(result):
         k_evt.start_us(),
         k_evt.start_us() + k_evt.duration_us())

+    # output top-level memory events
+    for mem_record_idx in range(len(mem_records)):
Collaborator:

with the previous proposal this would become:

for mem_record in mem_records:
    if not mem_record[1]:
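Under that proposal, each entry carries its own "already matched" flag, so the top-level pass needs no index bookkeeping. A minimal sketch with hypothetical record shapes:

```python
# mem_records as [record, matched_flag] pairs, per the review suggestion;
# each record here is a toy (kind, ts_us, nbytes) tuple
mem_records = [[("alloc", 10, 400), False], [("alloc", 30, 512), False]]
op_intervals = [(5, 15)]  # operator (start_us, end_us) windows

# matching pass: flag records that fall inside some operator interval
for mem_record in mem_records:
    ts = mem_record[0][1]
    if any(start <= ts < end for start, end in op_intervals):
        mem_record[1] = True

# output top-level memory events: everything still unflagged
top_level = [mem_record[0] for mem_record in mem_records if not mem_record[1]]
print(top_level)  # → [('alloc', 30, 512)]
```

The mutable flag lives next to its record, so the matching loop and the top-level output loop stay in sync without a parallel index list.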

@@ -292,6 +292,10 @@ void ProfiledCPUMemoryReporter::Delete(void* ptr) {
     allocated = allocated_;
     nbytes = it->second;
     size_table_.erase(it);
+  } else {
Collaborator:

are there any changes required in CUDAAllocator?

Contributor Author:

CUDACachingAllocator already saves block sizes.

ilia-cher added a commit that referenced this pull request Feb 2, 2021
ghstack-source-id: 50b3f09f7a5cf4978f575f4f7a6d01e9f821666d
Pull Request resolved: #51421
@ilia-cher ilia-cher requested a review from ngimel February 2, 2021 14:38
@ilia-cher (Contributor Author):

(we decided not to include the new api at the moment, will remove enable_global_memory_reporting)

ilia-cher added a commit that referenced this pull request Feb 3, 2021
ghstack-source-id: eee1b6e1a71f210b575bd2e0c9ed65fe7d6c5e4a
Pull Request resolved: #51421
@ngimel (Collaborator) left a comment:

Please remove debug prints.

def test_memory_profiler(self):
    def run_profiler(tensor_creation_fn, metric):
        # collecting allocs / deallocs
        with _profile(profile_memory=True, record_shapes=True, use_kineto=kineto_available()) as prof:
Collaborator:

we should make a follow-up item to either migrate or duplicate the tests with profiler instead of autograd.profiler.

Contributor Author:

All our autograd.profiler tests should use use_kineto=kineto_available() now, and we have both kineto-enabled and non-kineto CI builds. Eventually, when autograd.profiler is deprecated and we always use kineto, we will switch the tests to the new, kineto-only API.

        del y
        gc.collect()
        stats = prof.key_averages(group_by_input_shape=True)
        print(stats.table(sort_by="cpu_memory_usage", row_limit=-1))
Collaborator:

this looks like a debug print

def create_mkldnn_tensor():
    return torch.rand(10, 10, dtype=torch.float32).to_mkldnn()

print("Running CPU test")
Collaborator:

remove debug print

Contributor Author:

It was originally added on purpose (together with the profiler output); I think we have many tests that do this?

Collaborator:

Idk, usually tests don't do this (they are tests), but there are some stray prints. Do you need this output?

Contributor Author:

only for debug, will remove

ilia-cher added a commit that referenced this pull request Feb 4, 2021
ghstack-source-id: 2857c59eb4ade824434727116c223a8a23a03f81
Pull Request resolved: #51421
@facebook-github-bot (Contributor):

@ilia-cher merged this pull request in f1f9b04.

@facebook-github-bot facebook-github-bot deleted the gh/ilia-cher/99/head branch February 7, 2021 15:21