
Profiling allocator for mobile. #43951

Closed
wants to merge 7 commits

Conversation

kimishpatel
Contributor

@kimishpatel kimishpatel commented Sep 1, 2020

Stack from ghstack:

Summary:
AllocationPlan:
Stores the sequence of allocations, their sizes, and the lifetime of each allocation. It also stores the total size, total_size, of a single memory blob required to satisfy all the allocations, and the offset of each allocation within that blob.
Thus the allocation plan contains:

  • allocation sizes
  • allocation lifetimes
  • allocation offsets
  • total size
AllocationPlanner: Takes a pointer to the allocation plan and fills it in with the plan, i.e. sizes, lifetimes, offsets, and total size. This is done via WithProfileAllocationsGuard, which takes an AllocationPlan*, constructs an AllocationPlanner, and sets the thread-local allocation_planner to it. MobileCPUAllocator profiles allocations via allocation_planner. In WithValidateAllocationsGuard, the allocations profiled in the allocation plan are validated.

CPUProfilingAllocator:
The application owns the CPUProfilingAllocator. Using WithProfilingAllocatorGuard, it passes both the CPUProfilingAllocator and the AllocationPlan created earlier. The CPUProfilingAllocator then manages allocations and frees according to the plan. Allocations that are not managed by the CPUProfilingAllocator are routed through c10::alloc_cpu and c10::free_cpu.
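
For concreteness, a rough sketch of the intended flow, pieced together from this description and the usage comments in the diff (the include path and exact guard signatures are assumptions, not verified against the merged headers):

#include <c10/mobile/CPUProfilingAllocator.h>

// Sketch: profile once, validate on representative inputs, then replay the plan.
template <typename Module, typename Inputs>
void run_with_profiling_allocator(Module& module, const Inputs& inputs) {
  c10::AllocationPlan plan;
  {
    // 1. Profile a run; MobileCPUAllocator records sizes and lifetimes into the plan.
    c10::WithProfileAllocationsGuard profile_guard(&plan);
    module.forward(inputs);
  }
  bool success = true;
  {
    // 2. Validate the plan against another representative run.
    bool plan_validation_success = false;
    c10::WithValidateAllocationPlanGuard validate_guard(&plan, &plan_validation_success);
    module.forward(inputs);
    success = success && plan_validation_success;
  }
  if (success) {
    // 3. Replay the plan: in-scope allocations are served from one pre-planned blob;
    //    anything else falls back to c10::alloc_cpu / c10::free_cpu.
    c10::CPUProfilingAllocator profiling_allocator;
    c10::WithProfilingAllocatorGuard allocator_guard(&profiling_allocator, &plan);
    module.forward(inputs);
  }
}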

Test Plan:
cpu_profiling_allocator_test on mobile.

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D23451019

kimishpatel added a commit that referenced this pull request Sep 1, 2020
ghstack-source-id: 8654824fc70498babdc2f52b0f54d50647ff8cfc
Pull Request resolved: #43951
@dr-ci

dr-ci bot commented Sep 1, 2020

💊 CI failures summary and remediations

As of commit 1f0cb92 (more details on the Dr. CI page):


Commit 1f0cb92 was recently pushed. Waiting for builds...



@codecov

codecov bot commented Sep 1, 2020

Codecov Report

❗ No coverage uploaded for pull request base (gh/kimishpatel/24/base@45ddeb5).
The diff coverage is n/a.


@@                    Coverage Diff                    @@
##             gh/kimishpatel/24/base   #43951   +/-   ##
=========================================================
  Coverage                          ?   67.91%           
=========================================================
  Files                             ?      384           
  Lines                             ?    49839           
  Branches                          ?        0           
=========================================================
  Hits                              ?    33849           
  Misses                            ?    15990           
  Partials                          ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 45ddeb5...87e1d08.

// Records size of each allocation by their sequential allocation ids.
std::vector<uint64_t> allocation_sizes;
// This maps one allocation id (X) to another allocation id (Y).
// Allocation X is alive until allocation Y. From allocation Y onwards
Contributor

Does this mean that X can be used to satisfy Y, or not? From my read of the code, the answer is yes, so maybe we should document that Y is the id of the first allocation after X is freed.

Contributor

I read the comment after the addition of "Thus Y is the id of the first allocation after X is freed." I felt that dreiss's first statement, "X can be used to satisfy Y" was the clearest explanation of the lifetimes here.

Contributor Author

I can add that if it helps clarify, but the reason I did not was that we are not really creating aliases of allocation ids.
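
To illustrate the lifetime mapping being discussed (hypothetical values, not taken from the PR):

#include <cstdint>
#include <vector>

// allocation_sizes[i]    : size in bytes of the i-th allocation
// allocation_lifetimes[i]: id of the first allocation made after i is freed
std::vector<uint64_t> allocation_sizes     = {1024, 512, 2048};
std::vector<uint64_t> allocation_lifetimes = {2,    3,   3};
// Allocation 0 is freed before allocation 2 is made, so under the reading
// above its memory may be reused to satisfy allocation 2.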


/*
* Usage: Validate allocation plan made with WithProfileAllocationGuard
* bool plan_validation_success, succes = true;
Contributor

"success"

* bool plan_validation_success, succes = true;
* for (some number of representative inputs)
* {
* WithValidateAllocationPlanGaurd(&plan, &plan_validation_success);
Contributor

"Guard"

* {
* WithValidateAllocationPlanGaurd(&plan, &plan_validation_success);
* module.forward(...);
* success = success && plan_validation_success;
Contributor

"success" seems unnecessary here.

Also, why not throw on failure, rather than returning a bool?

Contributor Author

Mainly so that the user can decide not to use ProfilingAllocator based on the result of success. The other option is to have the user run that in a try..catch block and make the same decision based on the outcome.

Contributor

Exceptions are really slow, so if you expect this to be failing on a regular basis, a bool return is much better.

* WithProfilingAllocatorGuard allocator_guard(&profiling_allocator, &plan);
* module.forward(...);
* }
* plan now contains allocation plan.
Contributor

I think this last line of the comment is wrong.

// Nesting of allocation profiling does not seem meanigful.
TORCH_CHECK(allocation_planner == nullptr,
"Nesting profiling allocations is not supported.");
planner_ = std::make_unique<AllocationPlanner>(plan);
Contributor

Why allocate this on the heap?

Contributor Author

I did not want AllocationPlanner to be default constructible, so that it forces you to provide a valid plan pointer. Not sure if that is a good design, but that's what led to this.

// Free being recorded was allocated outside of WithProfileAllocationGuard
return;
}
auto id = (*it).second;
Contributor

it->second

const uint64_t size, const void* ptr) {
if (allocation_id_ >= allocation_plan_->allocation_sizes.size() ||
allocation_plan_->allocation_sizes[allocation_id_] != size) {
return false;
Contributor

Might want to throw here with details of the mismatch.

Contributor Author

So for the usage I have in mind, I still want to return a boolean. Let me add a warning with details.

Comment on lines 268 to 269
allocation_plan_->allocation_offsets[i] +
allocation_plan_->allocation_sizes[i];
Contributor

Might want to extract this expression into a local variable.

Contributor

auto limit = allocation_plan_->allocation_offsets[i] + allocation_plan_->allocation_sizes[i];
allocation_plan_->total_size = std::max(allocation_plan_->total_size, limit);

Comment on lines 53 to 55
std::vector<MemEvent> create_and_sort_mem_events(
const std::vector<uint64_t>& allocation_sizes,
const std::vector<uint64_t>& allocation_lifetimes) {
Contributor

I didn't review this carefully since it's the lowest-risk part of this change.

I'd suggest following up with the Glow team to see if they have better algorithms for this.

kimishpatel added a commit that referenced this pull request Sep 14, 2020
ghstack-source-id: 847aea3a69bf269f148a2c54bddc71a5292be2c2
Pull Request resolved: #43951
@ezyang ezyang self-requested a review September 16, 2020 21:03
kimishpatel added a commit that referenced this pull request Sep 16, 2020
ghstack-source-id: 0b1cdb7a0ea108c5c48acc93556120eb4b5264e4
Pull Request resolved: #43951
kimishpatel added a commit that referenced this pull request Sep 17, 2020
ghstack-source-id: db838c2bba75a97d8f9954080f62ee99cb27176a
Pull Request resolved: #43951
@@ -156,13 +157,20 @@ class DefaultMobileCPUAllocator final : public at::Allocator {
// TODO: enable with better TLS support on mobile
// profiledCPUMemoryReporter().Delete(pointer);
auto allocator_ptr = GetThreadLocalCachingAllocator();
auto profiling_allocator_ptr = GetThreadLocalProfilingAllocator();
Contributor

Is there any particular reason this lives as a separate concept from caching allocator? Given that there is never a situation where BOTH allocators are used, it seems like having two separate thread local allocators is unnecessary?

Contributor

By the way, the use of thread local allocator here seems super questionable. If pointers escape from the thread local scope of the allocator, you will deallocate them with the wrong allocator here (which will probably cause a segfault). And we already have machinery for making sure you associate a correct deleter with each allocation (that's what DataPtr is for), so I'm not sure why you have to do it this way.

Contributor Author

Is there any particular reason this lives as a separate concept from caching allocator? Given that there is never a situation where BOTH allocators are used, it seems like having two separate thread local allocators is unnecessary?

Thanks Ed for the comment. The idea with the profiling allocator is that you create a plan offline and use that plan at runtime, throwing an exception when it fails. This PR introduces only the first part, where the allocation plan does not have to be serialized with the model, but later we want to introduce those changes so as to not have to run profiling and allocation planning on mobile. The caching allocator, on the other hand, is model independent, and it allows the user to implement their own allocator by making the caching allocator a virtual class.

Contributor Author

If pointers escape from the thread local scope of the allocator

Can you explain what you mean by this?

If you mean that a pointer is allocated within the thread-local scope but deallocated outside of it, the correct behavior should be that such an allocation is not managed by the profiling allocator. If there is a bug that does not ensure that behavior we need to fix it, but the design is that only allocations that are allocated and freed within this thread-local scope are managed by the profiling allocator.

Contributor

This PR introduces only the first part, where allocation plan does not have to be serialized with model but later we want to introduce those changes so as to not have to run profiling and allocation planning on mobile.

OK, I'll trust you on this. My general point was that if you only have one thread local allocator, you might be able to manage this at a higher level, in the selection of what allocator you register to the thread local at any given point. But maybe this is not so easy to do in the future.

if (allocator_ptr != nullptr) {
data = allocator_ptr->allocate(alloc_size);
} else if (profiling_allocator_ptr != nullptr) {
data = profiling_allocator_ptr->allocate(alloc_size);
Contributor

Continuing from https://github.com/pytorch/pytorch/pull/43951/files#r492446863 the way to fix this problem is to change the API of the allocate indirection here to return a DataPtr instead of a void* pointer. Then the thread local profiling allocator that was in scope at this time can provide an appropriate deleter (rather than having to do the lookup again upon deletion). You can then dispense with the deleter static method entirely.

Contributor Author

So as I said in my previous comments, the profiling allocator manages only those allocations which are allocated and freed within the scope. However, your point might still be valid, so I need to think about whether this is a better way to do the same thing. I will make changes accordingly.

Contributor

Yes, I can believe that under your intended usage pattern, the failure case that I had described would not happen. But if that's the precondition, you are at very least obligated to document the precondition (I don't think it is documented right now); and furthermore, the framework goes out of its way to make it easy for you to do things correctly even the more general situation when allocations escape the scope. (In some sense, the latter point makes it more important to handle this correctly, as most people will assume your described precondition doesn't need to hold for all allocators!)

Contributor Author

Yes, I can believe that under your intended usage pattern, the failure case that I had described would not happen.

To make sure we understand each other: what I mean is that if you use the profiling allocator and the utils to generate an allocation plan, and then use that plan for allocation, it will not manage allocations whose lifetimes extend outside the scope of the allocator. I will add comments to that effect, so that the user knows that not all allocations made inside the scope are managed by the profiling allocator.

In some sense, the latter point makes it more important to handle this correctly, as most people will assume your described precondition doesn't need to hold for all allocators.

I can agree with this. But when you say "handle this correctly", do you mean "handle it correctly even when the precondition (that allocations with lifetimes outside the scope are not managed by the allocator) does not hold"?

Contributor

What I mean is that if you use profiling allocator and utils to generate allocation plan and use that plan for allocation, it will not manage allocations whose lifetimes are outside the scope of the allocator.

Yes. Though, I thought I saw some comments that implied that the profiling process can also detect allocations that escape the scope (you assign them int max ending, right?)

But when you say "handle this correctly" do you mean that, "handle correctly even when the precondition (that allocations with lifetime outside the scope are not managed by the allocator) does not need to be held true"?

Yes.

Contributor Author

Though, I thought I saw some comments that implied that the profiling process can also detect allocations that escape the scope (you assign them int max ending, right?)

That is correct.
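
For illustration, a minimal sketch of the DataPtr-based approach suggested above (the helper names are hypothetical and this is not what the PR merged): the allocator in scope at allocation time bakes its own deleter into the returned DataPtr, so no allocator lookup is needed at free time.

#include <cstddef>
#include <c10/core/Allocator.h>

// Hypothetical deleter that hands the pointer back to the profiling allocator.
static void profiling_delete(void* ptr) {
  (void)ptr;  // ... return ptr to the profiling allocator that produced it ...
}

// Hypothetical allocate() returning a DataPtr that carries the matching deleter.
static c10::DataPtr profiling_allocate(std::size_t nbytes) {
  (void)nbytes;           // would be served from the pre-planned memory blob
  void* data = nullptr;   // would point into that blob
  return c10::DataPtr(data, data, &profiling_delete, c10::Device(c10::DeviceType::CPU));
}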

MemBlock(uint64_t s, uint64_t e) : start_offset(s), end_offset(e) {}
bool operator<(const MemBlock& other) const {
return (start_offset < other.start_offset &&
end_offset <= other.start_offset);
Contributor

Shouldn't a block be considered non-overlapping if it is zero size, even if start_offset == other.start_offset? In what sense is this < overload being used?

Contributor

If this is true, then the test here can be as simple as end_offset <= other.start_offset, right? (Assuming that start_offset <= end_offset)

Contributor Author

So this is used to make sure that the allocation plan has not allocated overlapping regions. I am doing this by creating a binary tree into which insertions are made. Two allocations that overlap will compare as equivalent, i.e. < returns false both ways, when the comparison is made for insertion.
I think your observation is correct. Thanks for commenting. I will fix that.
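
For reference, a sketch of the simplified comparator suggested above (illustrative only, not the merged code): assuming start_offset <= end_offset, a block compares less-than another only when it ends at or before the other begins, so overlapping blocks compare as equivalent and inserting an overlapping block into a std::set fails.

#include <cstdint>
#include <set>

struct MemBlock {
  uint64_t start_offset;
  uint64_t end_offset;
  MemBlock(uint64_t s, uint64_t e) : start_offset(s), end_offset(e) {}
  bool operator<(const MemBlock& other) const {
    // Entirely before the other block; overlapping blocks are equivalent.
    return end_offset <= other.start_offset;
  }
};

// std::set<MemBlock> blocks;
// blocks.emplace(0, 16);                   // inserted
// bool ok = blocks.emplace(8, 32).second;  // false: overlaps [0, 16)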

kimishpatel added a commit that referenced this pull request Oct 6, 2020
ghstack-source-id: 1c1b2d3d87130a57431a9455b4cb18a4935bdbd5
Pull Request resolved: #43951
}
auto start_offset = allocation_offsets[i];
auto end_offset = allocation_offsets[i] + allocation_sizes[i];
if (!allocations.emplace(start_offset, end_offset).second) {
Collaborator

What exactly does this validation check? That no two allocations land in exactly the same spot? But they could if one of them got deallocated and reallocated, right? I'd imagine that you want to check actual alloc/dealloc behavior across time instead.

Contributor Author

Thanks for the catch @dzhulgakov. I thought I covered this but clearly not. I will fix it.
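
For reference, a naive O(n^2) sketch of the kind of check being asked for here (assumed semantics from the comments above: allocation i is alive from id i until id allocation_lifetimes[i]; this is not the fix that was merged): two allocations may occupy overlapping offsets only if their lifetimes do not overlap.

#include <cstddef>
#include <cstdint>
#include <vector>

bool validate_no_live_overlap(
    const std::vector<uint64_t>& allocation_sizes,
    const std::vector<uint64_t>& allocation_lifetimes,
    const std::vector<uint64_t>& allocation_offsets) {
  const std::size_t n = allocation_sizes.size();
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      // Ids are assigned in allocation order, so j is allocated after i;
      // the two lifetimes overlap iff j is allocated before i is freed.
      const bool lifetimes_overlap = j < allocation_lifetimes[i];
      const bool offsets_overlap =
          allocation_offsets[i] < allocation_offsets[j] + allocation_sizes[j] &&
          allocation_offsets[j] < allocation_offsets[i] + allocation_sizes[i];
      if (lifetimes_overlap && offsets_overlap) {
        return false;
      }
    }
  }
  return true;
}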

class C10_API AllocationPlan {
private:
// Records size of each allocation by their sequential allocation ids.
std::vector<uint64_t> allocation_sizes;
Collaborator

nit for future: in general vector of structs is usually more readable than a bunch of vectors

@facebook-github-bot
Contributor

This pull request has been merged in a09e109.

@facebook-github-bot facebook-github-bot deleted the gh/kimishpatel/24/head branch October 10, 2020 14:17