[profiler] Support top-level memory events #51421
Conversation
Summary: Mark memory events that did not happen within an operator context explicitly in the profiler output. This PR also adds an API to track memory events outside of, or partially overlapping with, the profiler scope.

Test Plan: python test/test_profiler.py -k test_memory_profiler

Pull Request resolved: #51421
💊 CI failures summary and remediations

As of commit 326f80c (more details on the Dr. CI page):

🚧 1 fixed upstream failure: these were probably caused by upstream breakages that were already fixed. Please rebase to pick up the upstream fixes.
c10/core/Allocator.cpp (outdated)

```
std::atomic<bool> global_memory_reporting_ {false};
}

void enableGlobalMemoryReporting(bool enable) {
  global_memory_reporting_ = true;
}
```

Should this be `global_memory_reporting_ = enable;` here?
torch/autograd/profiler.py (outdated)

```
@@ -1145,11 +1149,14 @@ def parse_kineto_results(result):
     cuda_memory_usage = 0
     if kineto_event.device_type() == DeviceType.CPU:
         # find the corresponding memory allocation events
-        for mem_record in mem_records:
+        for mem_record_idx in range(len(mem_records)):
```

Nit: `for mem_record_idx, mem_record in enumerate(mem_records):`

Alternatively, you can make `mem_records` consist of `[mem_record, flag]` pairs or NamedTuples, and then you can simplify the code in this and subsequent loops to `mem_record[1] = True` or `mem_record.flag = True`.
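As a runnable illustration of the flag-pair idea, here is a minimal sketch using plain Python stand-ins rather than the real Kineto record types (`MemRecord` and the interval logic below are hypothetical, invented for the example):

```python
from dataclasses import dataclass

# Hypothetical stand-in for the profiler's memory records.
@dataclass
class MemRecord:
    start_us: int
    nbytes: int

records = [MemRecord(0, 400), MemRecord(5, -400), MemRecord(20, 512)]

# Pair each record with a "matched" flag, as the comment suggests, so later
# loops can mark records directly instead of tracking indices separately.
mem_records = [[rec, False] for rec in records]

def mark_records_in_interval(start_us, end_us):
    """Flag records whose start time falls inside an operator's interval."""
    for mem_record in mem_records:
        if start_us <= mem_record[0].start_us < end_us:
            mem_record[1] = True

mark_records_in_interval(0, 10)   # pretend one operator ran from 0us to 10us
unmatched = [rec for rec, matched in mem_records if not matched]
```

After the per-operator pass, `unmatched` holds exactly the records no operator interval claimed, which is what a later top-level pass would consume.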
torch/autograd/profiler.py (outdated)

```
@@ -1188,6 +1195,30 @@ def parse_kineto_results(result):
                     k_evt.start_us(),
                     k_evt.start_us() + k_evt.duration_us())

+    # output top-level memory events
+    for mem_record_idx in range(len(mem_records)):
```

With the previous proposal this would become:

```
for mem_record in mem_records:
    if not mem_record[1]:
```
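Putting both suggestions together, a minimal runnable sketch of the top-level pass (plain Python tuples stand in for the real record and event types, which are assumptions here):

```python
# Each entry pairs a (kind, nbytes) record with a flag saying whether some
# operator interval already claimed it during the per-operator pass.
mem_records = [
    [("alloc", 400), True],    # matched to an operator above
    [("free", -400), False],   # top-level: outside any operator context
    [("alloc", 512), False],
]

# Output top-level memory events: any record no operator claimed becomes a
# synthetic "[memory]" entry, mirroring the row shown in the profiler table.
top_level_events = []
for mem_record in mem_records:
    if not mem_record[1]:
        top_level_events.append(("[memory]", mem_record[0][1]))
```

The flag spares the second loop from re-deriving which indices were already consumed, which is the simplification the reviewer is after.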
```
@@ -292,6 +292,10 @@ void ProfiledCPUMemoryReporter::Delete(void* ptr) {
     allocated = allocated_;
     nbytes = it->second;
     size_table_.erase(it);
+  } else {
```
are there any changes required in CUDAAllocator?
CUDACachingAllocator already saves block sizes
Summary: Mark memory events that did not happen within an operator context explicitly in the profiler output. This PR also adds an API to track memory events outside of, or partially overlapping with, the profiler scope.

Test Plan: python test/test_profiler.py -k test_memory_profiler

```
------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------  ------------
Name                Self CPU %    Self CPU      CPU total %   CPU total     CPU time avg  CPU Mem       Self CPU Mem  CUDA Mem      Self CUDA Mem  # of Calls
------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------  ------------
aten::rand          40.00%        10.000us      100.00%       25.000us      25.000us      400 b         0 b           0 b           0 b            1
aten::empty         24.00%        6.000us       24.00%        6.000us       6.000us       400 b         400 b         0 b           0 b            1
aten::uniform_      36.00%        9.000us       36.00%        9.000us       9.000us      0 b            0 b           0 b           0 b            1
[memory]            0.00%         0.000us       0.00%         0.000us       0.000us      -400 b         -400 b        -512 b        -512 b         2
------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------  ------------
```

Differential Revision: [D26166518](https://our.internmc.facebook.com/intern/diff/D26166518)
(We decided not to include the new API at the moment; will remove enable_global_memory_reporting.)
Please remove debug prints.
```
def test_memory_profiler(self):
    def run_profiler(tensor_creation_fn, metric):
        # collecting allocs / deallocs
        with _profile(profile_memory=True, record_shapes=True, use_kineto=kineto_available()) as prof:
```
We should make a follow-up item to either migrate or duplicate these tests with `profiler` instead of `autograd.profiler`.
All our autograd.profiler tests should use `use_kineto=kineto_available()` now, and we have both kineto-enabled and non-enabled CI builds; eventually, when autograd.profiler is deprecated and we always use kineto, we will just make the tests use the new, kineto-only API.
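The pattern described here, choosing profiler arguments from a runtime capability check so that one test body serves both CI variants, can be sketched without torch (`kineto_available` below is a stub standing in for the real helper in torch.autograd.profiler):

```python
def kineto_available():
    """Stub for the real availability check; a kineto-enabled build
    would report True and other builds False."""
    return True

def profiler_kwargs():
    # One set of kwargs serves both kineto-enabled and non-enabled CI
    # builds: the capability check decides the backend at runtime.
    return dict(profile_memory=True,
                record_shapes=True,
                use_kineto=kineto_available())

kwargs = profiler_kwargs()
```

Once the kineto-only API is the default, only `profiler_kwargs` needs to change, not every test body.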
test/test_profiler.py (outdated)

```
del y
gc.collect()
stats = prof.key_averages(group_by_input_shape=True)
print(stats.table(sort_by="cpu_memory_usage", row_limit=-1))
```
this looks like a debug print
test/test_profiler.py (outdated)

```
def create_mkldnn_tensor():
    return torch.rand(10, 10, dtype=torch.float32).to_mkldnn()

print("Running CPU test")
```
remove debug print
It was originally added on purpose (together with the profiler output); I think we have many tests that do this?
Idk, usually tests don't do this (they are tests), but there are some stray prints. Do you need this output?
only for debug, will remove
Summary (updated after review, with the new API sentence removed): Mark memory events that did not happen within an operator context explicitly in the profiler output. Test Plan: python test/test_profiler.py -k test_memory_profiler. Differential Revision: [D26166518](https://our.internmc.facebook.com/intern/diff/D26166518)
@ilia-cher merged this pull request in f1f9b04.
Stack from ghstack:
Summary:
Mark memory events that did not happen within an operator context
explicitly in the profiler output.
Test Plan:
python test/test_profiler.py -k test_memory_profiler
Differential Revision: D26166518