
Profiling allocator for mobile. #43951

Closed
wants to merge 7 commits

Conversation

kimishpatel
Contributor

@kimishpatel kimishpatel commented Sep 1, 2020

Stack from ghstack:

Summary:
AllocationPlan:
Stores the sequence of allocations, their sizes, and the lifetime of each allocation. It also stores the total size, total_size, of a single memory blob required to satisfy all the allocations, and the offset of each allocation within that blob.
Thus the allocation plan contains:

  • allocation sizes
  • allocation lifetimes
  • allocation offsets
  • total size
AllocationPlanner: Takes a pointer to the allocation plan and fills it in with the plan, i.e. sizes, lifetimes, offsets, and total size. This is done via WithProfileAllocationsGuard, which takes an AllocationPlan*, constructs an AllocationPlanner, and sets the thread-local allocation_planner to it. MobileCPUAllocator profiles allocations via allocation_planner. In WithValidateAllocationsGuard, the allocations profiled in the allocation plan are validated.

CPUProfilingAllocator:
The application owns the CPUProfilingAllocator. Using WithProfilingAllocatorGuard, it passes both the CPUProfilingAllocator and the AllocationPlan created earlier. The CPUProfilingAllocator then manages allocations and frees according to the plan. Allocations that are not managed by the CPUProfilingAllocator are routed through c10::alloc_cpu and c10::free_cpu.
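
For concreteness, a rough sketch of the intended flow, pieced together from this description and the usage comments in the diff (the include path and exact guard signatures are assumptions, not verified against the merged headers):

#include <c10/mobile/CPUProfilingAllocator.h>

// Sketch: profile once, validate on representative inputs, then replay the plan.
template <typename Module, typename Inputs>
void run_with_profiling_allocator(Module& module, const Inputs& inputs) {
  c10::AllocationPlan plan;
  {
    // 1. Profile a run; MobileCPUAllocator records sizes and lifetimes into the plan.
    c10::WithProfileAllocationsGuard profile_guard(&plan);
    module.forward(inputs);
  }
  bool success = true;
  {
    // 2. Validate the plan against another representative run.
    bool plan_validation_success = false;
    c10::WithValidateAllocationPlanGuard validate_guard(&plan, &plan_validation_success);
    module.forward(inputs);
    success = success && plan_validation_success;
  }
  if (success) {
    // 3. Replay the plan: in-scope allocations are served from one pre-planned blob;
    //    anything else falls back to c10::alloc_cpu / c10::free_cpu.
    c10::CPUProfilingAllocator profiling_allocator;
    c10::WithProfilingAllocatorGuard allocator_guard(&profiling_allocator, &plan);
    module.forward(inputs);
  }
}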

Test Plan:
cpu_profiling_allocator_test on mobile.

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D23451019

kimishpatel added a commit that referenced this pull request Sep 1, 2020
ghstack-source-id: 8654824fc70498babdc2f52b0f54d50647ff8cfc
Pull Request resolved: #43951
@dr-ci

dr-ci bot commented Sep 1, 2020

💊 CI failures summary and remediations

As of commit 1f0cb92 (more details on the Dr. CI page):


Commit 1f0cb92 was recently pushed. Waiting for builds...



@codecov

codecov bot commented Sep 1, 2020

Codecov Report

❗ No coverage uploaded for pull request base (gh/kimishpatel/24/base@45ddeb5).
The diff coverage is n/a.


@@                    Coverage Diff                    @@
##             gh/kimishpatel/24/base   #43951   +/-   ##
=========================================================
  Coverage                          ?   67.91%           
=========================================================
  Files                             ?      384           
  Lines                             ?    49839           
  Branches                          ?        0           
=========================================================
  Hits                              ?    33849           
  Misses                            ?    15990           
  Partials                          ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 45ddeb5...87e1d08.

// Records size of each allocation by their sequential allocation ids.
std::vector<uint64_t> allocation_sizes;
// This maps one allocation id (X) to another allocation id (Y).
// Allocation X is alive until allocation Y. From allocation Y onwards
Contributor

Does this mean that X can be used to satisfy Y, or not? From my read of the code, the answer is yes, so maybe we should document that Y is the id of the first allocation after X is freed.

Contributor

I read the comment after the addition of "Thus Y is the id of the first allocation after X is freed." I felt that dreiss's first statement, "X can be used to satisfy Y" was the clearest explanation of the lifetimes here.

Contributor Author

I can add that if it helps clarify, but the reason I did not was that we are not really creating aliases of allocation ids.
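
To illustrate the lifetime mapping being discussed (hypothetical values, not taken from the PR):

#include <cstdint>
#include <vector>

// allocation_sizes[i]    : size in bytes of the i-th allocation
// allocation_lifetimes[i]: id of the first allocation made after i is freed
std::vector<uint64_t> allocation_sizes     = {1024, 512, 2048};
std::vector<uint64_t> allocation_lifetimes = {2,    3,   3};
// Allocation 0 is freed before allocation 2 is made, so under the reading
// above its memory may be reused to satisfy allocation 2.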


/*
* Usage: Validate allocation plan made with WithProfileAllocationGuard
* bool plan_validation_success, succes = true;
Contributor

"success"

* bool plan_validation_success, succes = true;
* for (some number of representative inputs)
* {
* WithValidateAllocationPlanGaurd(&plan, &plan_validation_success);
Contributor

"Guard"

* {
* WithValidateAllocationPlanGaurd(&plan, &plan_validation_success);
* module.forward(...);
* success = success && plan_validation_success;
Contributor

"success" seems unnecessary here.

Also, why not throw on failure, rather than returning a bool?

Contributor Author

Mainly so that the user can decide not to use ProfilingAllocator based on the result of success. The other option is to have the user run that in a try..catch block and make the same decision based on the outcome.

Contributor

Exceptions are really slow, so if you expect this to be failing on a regular basis, a bool return is much better.

* WithProfilingAllocatorGuard allocator_guard(&profiling_allocator, &plan);
* module.forward(...);
* }
* plan now contains allocation plan.
Contributor

I think this last line of the comment is wrong.

// Nesting of allocation profiling does not seem meanigful.
TORCH_CHECK(allocation_planner == nullptr,
"Nesting profiling allocations is not supported.");
planner_ = std::make_unique<AllocationPlanner>(plan);
Contributor

Why allocate this on the heap?

Contributor Author

I did not want AllocationPlanner to be default constructible, so that it forces you to provide a valid plan pointer. Not sure if that is a good design, but that's what led to this.

// Free being recorded was allocated outside of WithProfileAllocationGuard
return;
}
auto id = (*it).second;
Contributor

it->second

const uint64_t size, const void* ptr) {
if (allocation_id_ >= allocation_plan_->allocation_sizes.size() ||
allocation_plan_->allocation_sizes[allocation_id_] != size) {
return false;
Contributor

Might want to throw here with details of the mismatch.

Contributor Author

So for the usage I have in mind, I still want to return a boolean. Let me add a warning with details.

Comment on lines 268 to 269
allocation_plan_->allocation_offsets[i] +
allocation_plan_->allocation_sizes[i];
Contributor

Might want to extract this expression into a local variable.

Contributor

auto limit = allocation_plan_->allocation_offsets[i] + allocation_plan_->allocation_sizes[i];
allocation_plan_->total_size = std::max(allocation_plan_->total_size, limit);

Comment on lines 53 to 55
std::vector<MemEvent> create_and_sort_mem_events(
const std::vector<uint64_t>& allocation_sizes,
const std::vector<uint64_t>& allocation_lifetimes) {
Contributor

I didn't review this carefully since it's the lowest-risk part of this change.

I'd suggest following up with the Glow team to see if they have better algorithms for this.

kimishpatel added a commit that referenced this pull request Sep 14, 2020
ghstack-source-id: 847aea3a69bf269f148a2c54bddc71a5292be2c2
Pull Request resolved: #43951
@ezyang ezyang self-requested a review September 16, 2020 21:03
kimishpatel added a commit that referenced this pull request Sep 16, 2020
ghstack-source-id: 0b1cdb7a0ea108c5c48acc93556120eb4b5264e4
Pull Request resolved: #43951
kimishpatel added a commit that referenced this pull request Sep 17, 2020
ghstack-source-id: db838c2bba75a97d8f9954080f62ee99cb27176a
Pull Request resolved: #43951
@@ -156,13 +157,20 @@ class DefaultMobileCPUAllocator final : public at::Allocator {
// TODO: enable with better TLS support on mobile
// profiledCPUMemoryReporter().Delete(pointer);
auto allocator_ptr = GetThreadLocalCachingAllocator();
auto profiling_allocator_ptr = GetThreadLocalProfilingAllocator();
Contributor

Is there any particular reason this lives as a separate concept from caching allocator? Given that there is never a situation where BOTH allocators are used, it seems like having two separate thread local allocators is unnecessary?

Contributor

By the way, the use of thread local allocator here seems super questionable. If pointers escape from the thread local scope of the allocator, you will deallocate them with the wrong allocator here (which will probably cause a segfault). And we already have machinery for making sure you associate a correct deleter with each allocation (that's what DataPtr is for), so I'm not sure why you have to do it this way.

Contributor Author

Is there any particular reason this lives as a separate concept from caching allocator? Given that there is never a situation where BOTH allocators are used, it seems like having two separate thread local allocators is unnecessary?

Thanks Ed for the comment. The idea with the profiling allocator is that you create a plan offline and use that plan at runtime, throwing an exception when it fails. This PR introduces only the first part, where the allocation plan does not have to be serialized with the model, but later we want to introduce those changes so as to not have to run profiling and allocation planning on mobile. The caching allocator, on the other hand, is model independent, and it allows the user to implement their own allocator by making the caching allocator a virtual class.

Contributor Author

If pointers escape from the thread local scope of the allocator

Can you explain what you mean by this?

If you mean that a pointer is allocated within the thread-local scope but deallocated outside of it, the correct behavior should be that such an allocation is not managed by the profiling allocator. If there is a bug that does not ensure that behavior we need to fix it, but the design is that only allocations that are allocated and freed within this thread-local scope are managed by the profiling allocator.

Contributor

This PR introduces only the first part, where allocation plan does not have to be serialized with model but later we want to introduce those changes so as to not have to run profiling and allocation planning on mobile.

OK, I'll trust you on this. My general point was that if you only have one thread local allocator, you might be able to manage this at a higher level, in the selection of what allocator you register to the thread local at any given point. But maybe this is not so easy to do in the future.

if (allocator_ptr != nullptr) {
data = allocator_ptr->allocate(alloc_size);
} else if (profiling_allocator_ptr != nullptr) {
data = profiling_allocator_ptr->allocate(alloc_size);
Contributor

Continuing from https://github.com/pytorch/pytorch/pull/43951/files#r492446863 the way to fix this problem is to change the API of the allocate indirection here to return a DataPtr instead of a void* pointer. Then the thread local profiling allocator that was in scope at this time can provide an appropriate deleter (rather than having to do the lookup again upon deletion). You can then dispense with the deleter static method entirely.

Contributor Author

So as I said in my previous comments, the profiling allocator manages only those allocations which are allocated and freed within the scope. However, your point might still be valid, so I need to think about whether this is a better way to do the same thing. I will make changes accordingly.

Contributor

Yes, I can believe that under your intended usage pattern, the failure case that I had described would not happen. But if that's the precondition, you are at very least obligated to document the precondition (I don't think it is documented right now); and furthermore, the framework goes out of its way to make it easy for you to do things correctly even the more general situation when allocations escape the scope. (In some sense, the latter point makes it more important to handle this correctly, as most people will assume your described precondition doesn't need to hold for all allocators!)

Contributor Author

Yes, I can believe that under your intended usage pattern, the failure case that I had described would not happen.

To make sure we understand each other: what I mean is that if you use the profiling allocator and the utils to generate an allocation plan, and then use that plan for allocation, it will not manage allocations whose lifetimes extend outside the scope of the allocator. I will add comments to that effect, so that the user knows that not all allocations made inside the scope are managed by the profiling allocator.

In some sense, the latter point makes it more important to handle this correctly, as most people will assume your described precondition doesn't need to hold for all allocators.

I can agree with this. But when you say "handle this correctly", do you mean "handle it correctly even when the precondition (that allocations with lifetimes outside the scope are not managed by the allocator) does not hold"?

Contributor

What I mean is that if you use profiling allocator and utils to generate allocation plan and use that plan for allocation, it will not manage allocations whose lifetimes are outside the scope of the allocator.

Yes. Though, I thought I saw some comments that implied that the profiling process can also detect allocations that escape the scope (you assign them int max ending, right?)

But when you say "handle this correctly" do you mean that, "handle correctly even when the precondition (that allocations with lifetime outside the scope are not managed by the allocator) does not need to be held true"?

Yes.

Contributor Author

Though, I thought I saw some comments that implied that the profiling process can also detect allocations that escape the scope (you assign them int max ending, right?)

That is correct.
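
For illustration, a minimal sketch of the DataPtr-based approach suggested above (the helper names are hypothetical and this is not what the PR merged): the allocator in scope at allocation time bakes its own deleter into the returned DataPtr, so no allocator lookup is needed at free time.

#include <cstddef>
#include <c10/core/Allocator.h>

// Hypothetical deleter that hands the pointer back to the profiling allocator.
static void profiling_delete(void* ptr) {
  (void)ptr;  // ... return ptr to the profiling allocator that produced it ...
}

// Hypothetical allocate() returning a DataPtr that carries the matching deleter.
static c10::DataPtr profiling_allocate(std::size_t nbytes) {
  (void)nbytes;           // would be served from the pre-planned memory blob
  void* data = nullptr;   // would point into that blob
  return c10::DataPtr(data, data, &profiling_delete, c10::Device(c10::DeviceType::CPU));
}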

MemBlock(uint64_t s, uint64_t e) : start_offset(s), end_offset(e) {}
bool operator<(const MemBlock& other) const {
return (start_offset < other.start_offset &&
end_offset <= other.start_offset);
Contributor

Shouldn't a block be considered non-overlapping if it is zero size, even if start_offset == other.start_offset? In what sense is this < overload being used?

Contributor

If this is true, then the test here can be as simple as end_offset <= other.start_offset, right? (Assuming that start_offset <= end_offset)

Contributor Author

So this is used to make sure that the allocation plan has not allocated overlapping regions. I am doing this by creating a binary tree into which insertions are made. Two allocations that overlap will compare as equivalent, i.e. < returns false both ways, when the comparison is made for insertion.
I think your observation is correct. Thanks for commenting. I will fix that.
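
For reference, a sketch of the simplified comparator suggested above (illustrative only, not the merged code): assuming start_offset <= end_offset, a block compares less-than another only when it ends at or before the other begins, so overlapping blocks compare as equivalent and inserting an overlapping block into a std::set fails.

#include <cstdint>
#include <set>

struct MemBlock {
  uint64_t start_offset;
  uint64_t end_offset;
  MemBlock(uint64_t s, uint64_t e) : start_offset(s), end_offset(e) {}
  bool operator<(const MemBlock& other) const {
    // Entirely before the other block; overlapping blocks are equivalent.
    return end_offset <= other.start_offset;
  }
};

// std::set<MemBlock> blocks;
// blocks.emplace(0, 16);                   // inserted
// bool ok = blocks.emplace(8, 32).second;  // false: overlaps [0, 16)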

kimishpatel added a commit that referenced this pull request Oct 6, 2020
ghstack-source-id: 1c1b2d3d87130a57431a9455b4cb18a4935bdbd5
Pull Request resolved: #43951
}
auto start_offset = allocation_offsets[i];
auto end_offset = allocation_offsets[i] + allocation_sizes[i];
if (!allocations.emplace(start_offset, end_offset).second) {
Collaborator

What exactly does this validation check? That no two allocations land in exactly the same spot? But they could if one of them got deallocated and reallocated, right? I'd imagine that you want to check actual alloc/dealloc behavior across time instead.

Contributor Author

Thanks for the catch @dzhulgakov. I thought I covered this but clearly not. I will fix it.
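
For reference, a naive O(n^2) sketch of the kind of check being asked for here (assumed semantics from the comments above: allocation i is alive from id i until id allocation_lifetimes[i]; this is not the fix that was merged): two allocations may occupy overlapping offsets only if their lifetimes do not overlap.

#include <cstddef>
#include <cstdint>
#include <vector>

bool validate_no_live_overlap(
    const std::vector<uint64_t>& allocation_sizes,
    const std::vector<uint64_t>& allocation_lifetimes,
    const std::vector<uint64_t>& allocation_offsets) {
  const std::size_t n = allocation_sizes.size();
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      // Ids are assigned in allocation order, so j is allocated after i;
      // the two lifetimes overlap iff j is allocated before i is freed.
      const bool lifetimes_overlap = j < allocation_lifetimes[i];
      const bool offsets_overlap =
          allocation_offsets[i] < allocation_offsets[j] + allocation_sizes[j] &&
          allocation_offsets[j] < allocation_offsets[i] + allocation_sizes[i];
      if (lifetimes_overlap && offsets_overlap) {
        return false;
      }
    }
  }
  return true;
}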

class C10_API AllocationPlan {
private:
// Records size of each allocation by their sequential allocation ids.
std::vector<uint64_t> allocation_sizes;
Collaborator

nit for future: in general vector of structs is usually more readable than a bunch of vectors

@facebook-github-bot
Contributor

This pull request has been merged in a09e109.

@facebook-github-bot facebook-github-bot deleted the gh/kimishpatel/24/head branch October 10, 2020 14:17