Profiling allocator for mobile. #43951
Conversation
Summary:

AllocationPlan: Stores the sequence of allocations, their sizes, and the lifetime of each allocation. Along with this it also stores the total size, total_size, of the single memory blob required to satisfy all the allocations, as well as the offset within that blob corresponding to each allocation. Thus the allocation plan contains:
- allocation sizes
- allocation lifetimes
- allocation offsets
- total size

AllocationPlanner: Takes a pointer to the allocation plan and fills it in with the plan, i.e. sizes, lifetimes, offsets, and total size. This is done via WithProfileAllocationsGuard, which takes an AllocationPlan*, constructs an AllocationPlanner, and sets the thread-local allocation_planner to it. MobileCPUAllocator profiles allocations via allocation_planner. Within WithValidateAllocationsGuard, the allocations recorded in the allocation plan are validated.

CPUProfilingAllocator: The application owns the CPUProfilingAllocator. Using WithProfilingAllocatorGuard, it passes both the CPUProfilingAllocator and the AllocationPlan created earlier. The CPUProfilingAllocator then manages allocations and frees according to the plan. Allocations that are not managed by the CPUProfilingAllocator are routed through c10::alloc_cpu and c10::free_cpu.

Test Plan: cpu_profiling_allocator_test on mobile.
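The profiling phase described above can be sketched with simplified stand-ins. The type and guard names below mirror the PR's summary, but the bodies are illustrative mocks, not the actual c10 implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative stand-in for the PR's AllocationPlan.
struct AllocationPlan {
  std::vector<uint64_t> allocation_sizes;
  // allocation_lifetimes[X] = id of the first allocation made after X is freed.
  std::vector<uint64_t> allocation_lifetimes;
  std::vector<uint64_t> allocation_offsets;
  uint64_t total_size = 0;
};

// Records allocations/frees into a plan during the profiling run.
class AllocationPlanner {
 public:
  explicit AllocationPlanner(AllocationPlan* plan) : plan_(plan) {}
  void record_allocation(uint64_t size, const void* ptr) {
    ptr_to_id_[ptr] = plan_->allocation_sizes.size();
    plan_->allocation_sizes.push_back(size);
    // Assume "alive forever" until a matching free is observed.
    plan_->allocation_lifetimes.push_back(UINT64_MAX);
  }
  void record_free(const void* ptr) {
    auto it = ptr_to_id_.find(ptr);
    if (it == ptr_to_id_.end()) {
      return;  // allocated outside the guard's scope; not managed
    }
    // Lifetime ends at the id the *next* allocation will receive.
    plan_->allocation_lifetimes[it->second] = plan_->allocation_sizes.size();
  }
 private:
  AllocationPlan* plan_;
  std::unordered_map<const void*, uint64_t> ptr_to_id_;
};

thread_local AllocationPlanner* allocation_planner = nullptr;

// RAII guard that installs a thread-local planner, as
// WithProfileAllocationsGuard does in the PR.
struct WithProfileAllocationsGuard {
  explicit WithProfileAllocationsGuard(AllocationPlan* plan) : planner_(plan) {
    allocation_planner = &planner_;
  }
  ~WithProfileAllocationsGuard() { allocation_planner = nullptr; }
  AllocationPlanner planner_;
};
```

The allocator's allocate/free hooks would consult `allocation_planner` when it is non-null, which is how the mobile allocator routes profiling without changing its normal path.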
c10/core/CPUProfilingAllocator.h
Outdated
// Records size of each allocation by their sequential allocation ids.
std::vector<uint64_t> allocation_sizes;
// This maps one allocation id (X) to another allocation id (Y).
// Allocation X is alive until allocation Y. From allocation Y onwards
Does this mean that X can be used to satisfy Y, or not? From my read of the code, the answer is yes, so maybe we should document that Y is the id of the first allocation after X is freed.
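The reviewer's reading can be made explicit with a small, hypothetical helper (not part of the PR): allocation X's slot can be reused to satisfy allocation Y exactly when X is already freed by the time Y is made.

```cpp
#include <cstdint>
#include <vector>

// allocation_lifetimes[X] is the id of the first allocation after X is freed,
// so X's memory may be reused for allocation Y precisely when that id is <= Y.
bool can_reuse_for(const std::vector<uint64_t>& allocation_lifetimes,
                   uint64_t x, uint64_t y) {
  return allocation_lifetimes[x] <= y;
}
```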
I read the comment after the addition of "Thus Y is the id of the first allocation after X is freed." I felt that dreiss's first statement, "X can be used to satisfy Y" was the clearest explanation of the lifetimes here.
I can add that if it helps clarify, but the reason I did not is that we are not really creating aliases of allocation ids.
c10/core/CPUProfilingAllocator.h
Outdated
/*
 * Usage: Validate allocation plan made with WithProfileAllocationGuard
 * bool plan_validation_success, succes = true;
"success"
c10/core/CPUProfilingAllocator.h
Outdated
 * bool plan_validation_success, succes = true;
 * for (some number of representative inputs)
 * {
 *   WithValidateAllocationPlanGaurd(&plan, &plan_validation_success);
"Guard"
c10/core/CPUProfilingAllocator.h
Outdated
 * {
 *   WithValidateAllocationPlanGaurd(&plan, &plan_validation_success);
 *   module.forward(...);
 *   success = success && plan_validation_success;
"success" seems unnecessary here.
Also, why not throw on failure, rather than returning a bool?
Mainly so that the user can decide not to use ProfilingAllocator based on the result of success. The other option is to have the user run that in a try...catch block and make the same decision based on the outcome.
Exceptions are really slow, so if you expect this to fail on a regular basis, a bool return is much better.
c10/core/CPUProfilingAllocator.h
Outdated
 * WithProfilingAllocatorGuard allocator_guard(&profiling_allocator, &plan);
 * module.forward(...);
 * }
 * plan now contains allocation plan.
I think this last line of the comment is wrong.
c10/core/CPUProfilingAllocator.cpp
Outdated
// Nesting of allocation profiling does not seem meaningful.
TORCH_CHECK(allocation_planner == nullptr,
    "Nesting profiling allocations is not supported.");
planner_ = std::make_unique<AllocationPlanner>(plan);
Why allocate this on the heap?
I did not want AllocationPlanner to be default-constructible, so that it forces you to provide a valid plan pointer. Not sure if that is a good design, but that's what led to this.
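The design choice can be sketched without the heap: deleting the default constructor forces every planner to be born with a valid plan, and a deleted default constructor by itself does not require make_unique (an in-place member or std::optional in the guard would also work, which is the reviewer's point). Names mirror the PR; the bodies are minimal stand-ins:

```cpp
#include <cassert>
#include <type_traits>

struct AllocationPlan {};

class AllocationPlanner {
 public:
  AllocationPlanner() = delete;  // no planner without a plan
  explicit AllocationPlanner(AllocationPlan* plan) : plan_(plan) {
    assert(plan != nullptr && "a valid plan pointer is required");
  }
 private:
  AllocationPlan* plan_;
};

// The compiler enforces the invariant:
static_assert(!std::is_default_constructible<AllocationPlanner>::value,
              "must be constructed with a plan");
```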
c10/core/CPUProfilingAllocator.cpp
Outdated
  // Free being recorded was allocated outside of WithProfileAllocationGuard
  return;
}
auto id = (*it).second;
it->second
c10/core/CPUProfilingAllocator.cpp
Outdated
    const uint64_t size, const void* ptr) {
  if (allocation_id_ >= allocation_plan_->allocation_sizes.size() ||
      allocation_plan_->allocation_sizes[allocation_id_] != size) {
    return false;
Might want to throw here with details of the mismatch.
So for the usage I have in mind, I still want to return a boolean. Let me add a warning with details.
c10/core/CPUProfilingAllocator.cpp
Outdated
    allocation_plan_->allocation_offsets[i] +
        allocation_plan_->allocation_sizes[i];
Might want to extract this expression into a local variable.
auto limit = allocation_plan_->allocation_offsets[i] + allocation_plan_->allocation_sizes[i];
allocation_plan_->total_size = std::max(allocation_plan_->total_size, limit);
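The suggested rewrite amounts to the following self-contained computation (compute_total_size is a hypothetical free-function version, not the PR's member code): the blob must extend to the furthest byte any allocation touches.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

uint64_t compute_total_size(const std::vector<uint64_t>& allocation_offsets,
                            const std::vector<uint64_t>& allocation_sizes) {
  uint64_t total_size = 0;
  for (size_t i = 0; i < allocation_offsets.size(); ++i) {
    // The reviewer's suggested local variable: last byte this allocation uses.
    const uint64_t limit = allocation_offsets[i] + allocation_sizes[i];
    total_size = std::max(total_size, limit);
  }
  return total_size;
}
```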
c10/core/CPUProfilingAllocator.cpp
Outdated
std::vector<MemEvent> create_and_sort_mem_events(
    const std::vector<uint64_t>& allocation_sizes,
    const std::vector<uint64_t>& allocation_lifetimes) {
I didn't review this carefully since it's the lowest-risk part of this change.
I'd suggest following up with the Glow team to see if they have better algorithms for this.
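For context, one simple offset-assignment strategy over lifetimes is greedy first fit. This is a self-contained sketch of that general technique, not necessarily the PR's exact algorithm: allocation i lives over [i, lifetimes[i]) in allocation-id time, and each allocation is placed at the lowest offset not overlapping any allocation still live when it is made.

```cpp
#include <cstdint>
#include <map>
#include <vector>

std::vector<uint64_t> first_fit_offsets(
    const std::vector<uint64_t>& sizes,
    const std::vector<uint64_t>& lifetimes) {
  std::vector<uint64_t> offsets(sizes.size(), 0);
  for (uint64_t i = 0; i < sizes.size(); ++i) {
    // Byte intervals [start, end) of earlier allocations still live at time i,
    // ordered by start offset. (Rebuilt per step for clarity; O(n^2) overall.)
    std::map<uint64_t, uint64_t> live;
    for (uint64_t j = 0; j < i; ++j) {
      if (lifetimes[j] > i) live[offsets[j]] = offsets[j] + sizes[j];
    }
    uint64_t candidate = 0;
    for (const auto& blk : live) {
      if (candidate + sizes[i] <= blk.first) break;  // gap before blk fits
      if (blk.second > candidate) candidate = blk.second;  // skip past blk
    }
    offsets[i] = candidate;
  }
  return offsets;
}
```

With sizes {8, 8, 8} and the first allocation dying before the third is made, the third reuses the first's slot at offset 0, so the blob needs only 16 bytes instead of 24.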
@@ -156,13 +157,20 @@ class DefaultMobileCPUAllocator final : public at::Allocator {
// TODO: enable with better TLS support on mobile
// profiledCPUMemoryReporter().Delete(pointer);
auto allocator_ptr = GetThreadLocalCachingAllocator();
auto profiling_allocator_ptr = GetThreadLocalProfilingAllocator();
Is there any particular reason this lives as a separate concept from caching allocator? Given that there is never a situation where BOTH allocators are used, it seems like having two separate thread local allocators is unnecessary?
By the way, the use of thread local allocator here seems super questionable. If pointers escape from the thread local scope of the allocator, you will deallocate them with the wrong allocator here (which will probably cause a segfault). And we already have machinery for making sure you associate a correct deleter with each allocation (that's what DataPtr is for), so I'm not sure why you have to do it this way.
Is there any particular reason this lives as a separate concept from caching allocator? Given that there is never a situation where BOTH allocators are used, it seems like having two separate thread local allocators is unnecessary?
Thanks Ed for the comment. The idea with the profiling allocator is that you create a plan offline and use that plan at runtime, throwing an exception when it fails. This PR introduces only the first part, where the allocation plan does not have to be serialized with the model; later we want to introduce those changes so as to not have to run profiling and allocation planning on mobile. The caching allocator, on the other hand, is model-independent and allows the user to implement their own allocator by making the caching allocator a virtual class.
If pointers escape from the thread local scope of the allocator
Can you explain what you mean by this?
If you mean that a pointer is allocated within the thread-local scope but deallocated outside it, the correct behavior should be that such an allocation is not managed by the profiling allocator. If there is a bug that does not ensure that behavior we need to fix it, but the design is that only allocations that are allocated and freed within this thread-local scope are managed by the profiling allocator.
This PR introduces only the first part, where allocation plan does not have to be serialized with model but later we want to introduce those changes so as to not have to run profiling and allocation planning on mobile.
OK, I'll trust you on this. My general point was that if you only have one thread local allocator, you might be able to manage this at a higher level, in the selection of what allocator you register to the thread local at any given point. But maybe this is not so easy to do in the future.
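The "associate the correct deleter at allocation time" idea from this thread can be shown in plain C++ without any c10 types. DemoAllocator and OwnedData are illustrative stand-ins (c10's actual mechanism is DataPtr): the deleter is captured when the pointer is handed out, so the allocation is always returned to the allocator that produced it, even if the pointer outlives the thread-local scope.

```cpp
#include <cstdlib>
#include <functional>
#include <memory>

// A void pointer owning its own deleter, in the spirit of c10::DataPtr.
using OwnedData = std::unique_ptr<void, std::function<void(void*)>>;

struct DemoAllocator {
  OwnedData allocate(size_t n) {
    void* p = std::malloc(n);
    // Bind the deleter to *this* allocator now, rather than re-resolving a
    // thread-local allocator inside a static deleter at free time.
    return OwnedData(p, [this](void* q) { this->release(q); });
  }
  void release(void* p) {
    std::free(p);
    ++freed;  // counter just to make the routing observable
  }
  int freed = 0;
};
```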
if (allocator_ptr != nullptr) {
  data = allocator_ptr->allocate(alloc_size);
} else if (profiling_allocator_ptr != nullptr) {
  data = profiling_allocator_ptr->allocate(alloc_size);
Continuing from https://github.com/pytorch/pytorch/pull/43951/files#r492446863 the way to fix this problem is to change the API of the allocate indirection here to return a DataPtr instead of a void* pointer. Then the thread-local profiling allocator that was in scope at this time can provide an appropriate deleter (rather than having to do the lookup again upon deletion). You can then dispense with the deleter static method entirely.
So, as I said in my previous comments, the profiling allocator manages only those allocations which are allocated and freed within the scope. However, your point might still be valid, so I need to think about whether this is a better way to do the same thing. I will make changes accordingly.
Yes, I can believe that under your intended usage pattern, the failure case that I had described would not happen. But if that's the precondition, you are at the very least obligated to document the precondition (I don't think it is documented right now); and furthermore, the framework goes out of its way to make it easy for you to do things correctly even in the more general situation when allocations escape the scope. (In some sense, the latter point makes it more important to handle this correctly, as most people will assume your described precondition doesn't need to hold for all allocators!)
Yes, I can believe that under your intended usage pattern, the failure case that I had described would not happen.
To make sure we understand each other: what I mean is that if you use the profiling allocator and the utils to generate an allocation plan, and then use that plan for allocation, it will not manage allocations whose lifetimes extend outside the scope of the allocator. I will add comments to that effect, so that the user knows that not all allocations made inside the scope are managed by the profiling allocator.
In some sense, the latter point makes it more important to handle this correctly, as most people will assume your described precondition doesn't need to hold for all allocators.
I can agree with this. But when you say "handle this correctly" do you mean "handle correctly even when the precondition (that allocations with lifetimes outside the scope are not managed by the allocator) does not hold"?
What I mean is that if you use profiling allocator and utils to generate allocation plan and use that plan for allocation, it will not manage allocations whose lifetimes are outside the scope of the allocator.
Yes. Though, I thought I saw some comments that implied that the profiling process can also detect allocations that escape the scope (you assign them int max ending, right?)
But when you say "handle this correctly" do you mean that, "handle correctly even when the precondition (that allocations with lifetime outside the scope are not managed by the allocator) does not need to be held true"?
Yes.
Though, I thought I saw some comments that implied that the profiling process can also detect allocations that escape the scope (you assign them int max ending, right?)
That is correct.
c10/core/CPUProfilingAllocator.cpp
Outdated
MemBlock(uint64_t s, uint64_t e) : start_offset(s), end_offset(e) {}
bool operator<(const MemBlock& other) const {
  return (start_offset < other.start_offset &&
          end_offset <= other.start_offset);
Shouldn't a block be considered non-overlapping if it is zero size, even if start_offset == other.start_offset? In what sense is this < overload being used?
If this is true, then the test here can be as simple as end_offset <= other.start_offset, right? (Assuming that start_offset <= end_offset.)
So this is used to make sure that the allocation plan has not allocated overlapping regions. I am doing this by creating a binary tree into which insertions are made. Two allocations are considered overlapping, and thus compare false under < in both directions, when the comparison is made during insertion.
I think your observation is correct. Thanks for commenting. I will fix that.
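The simplified comparator discussed in this thread behaves as follows: with start_offset <= end_offset guaranteed, "a < b" reduces to a ending at or before b starts. Two overlapping blocks then compare neither a < b nor b < a, so std::set treats them as equivalent and refuses the second insertion, which is the overlap check the validation wants. (For genuinely overlapping inputs this is not a strict weak ordering, which is tolerable here only because an overlapping insertion is rejected immediately.)

```cpp
#include <cstdint>
#include <set>

struct MemBlock {
  uint64_t start_offset;
  uint64_t end_offset;
  MemBlock(uint64_t s, uint64_t e) : start_offset(s), end_offset(e) {}
  // Simplified test from the review: a < b iff a's bytes end before b begins.
  bool operator<(const MemBlock& other) const {
    return end_offset <= other.start_offset;
  }
};
```

Note this also answers the zero-size question: a block [5, 5) compares less-than [5, 9), so it is treated as non-overlapping even with equal start offsets.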
}
auto start_offset = allocation_offsets[i];
auto end_offset = allocation_offsets[i] + allocation_sizes[i];
if (!allocations.emplace(start_offset, end_offset).second) {
What exactly does this validation check? That no two allocations land in exactly the same spot? But they could if one of them got deallocated and reallocated, right? I'd imagine you want to check actual alloc/dealloc across time instead.
Thanks for the catch @dzhulgakov. I thought I covered this but clearly not. I will fix it.
class C10_API AllocationPlan {
 private:
  // Records size of each allocation by their sequential allocation ids.
  std::vector<uint64_t> allocation_sizes;
nit for future: in general a vector of structs is usually more readable than a bunch of parallel vectors
This pull request has been merged in a09e109.
Stack from ghstack:
Summary:
AllocationPlan:
Stores the sequence of allocations, their sizes, and the lifetime of each allocation. It also stores the total size, total_size, of a single memory blob required to satisfy all the allocations, along with each allocation's offset into that blob.
Thus the allocation plan contains:

- allocation sizes
- allocation lifetimes
- allocation offsets
- total size

AllocationPlanner: Takes a pointer to the allocation plan and fills it with the plan, i.e. sizes, lifetimes, offsets, and total size. This is done via WithProfileAllocationsGuard, which takes an AllocationPlan*, constructs an AllocationPlanner*, and sets the thread-local allocation_planner to it. MobileCPUAllocator profiles allocations via allocation_planner. In WithValidateAllocationsGuard, allocations profiled in the allocation plan are validated.
CPUProfilingAllocator:
The application owns the CPUProfilingAllocator. Using WithProfilingAllocatorGuard, it passes both the CPUProfilingAllocator and the AllocationPlan created earlier. CPUProfilingAllocator then manages allocations and frees according to the plan. Allocations not managed by CPUProfilingAllocator are routed through c10::alloc_cpu and c10::free_cpu.
Test Plan:
cpu_profiling_allocator_test on mobile.
Differential Revision: D23451019