
Conversation

ceciliapeng2011

Details:

  • item1
  • ...

Tickets:

  • ticket-id

using OutputAllocatorCPtr = std::shared_ptr<const OutputAllocator>;


class OutputMemoryMngr : public IMemoryMngr {
Owner

OutputMemoryMngr should inherit IMemoryMngrObserver to be used polymorphically with other IMemoryMngrObserver implementations.

Comment on lines 892 to 896
for (auto& edge : edge_clusters[box.id]) {
if (edge->getChild()->getType() == Type::Output) {
isOutGrp++;
break;
}
Owner

Simply use std::any_of to check whether the cluster contains an output edge.

if (output != outputNodesMap.end()) {
auto parentEdge = output->second->getParentEdgeAt(0);

if (graph->hasDynamicInput()) { // TODO: internal dynamism
Owner

To check whether the graph is dynamic, one may use its status.

if (isOutGrp) {
IE_ASSERT(isOutGrp==1); // reuse_io_tensors false
grpMemMngr =
std::make_shared<OutputMemoryMngr>(std::unique_ptr<MemoryMngrWithReuse>(new MemoryMngrWithReuse()));
Owner

This memory manager should be stored in the graph in a container of objects of the specific OutputMemoryMngr type, so that the infer requests can access those memory managers.

Comment on lines 122 to 124
if (outMemMngr != nullptr) {
outMemMngr->setMemDesc(desc);
}
Owner

As we discussed, this should not work that way. The memory descriptor may not reflect the memory size to be allocated; please consider inPlace concat chains. As agreed, the reallocation should happen using the size requested from the memory manager, via a wrapped OutputMemoryMngr::resize call.

Comment on lines 646 to 648
auto outblob = InferRequest::GetBlob(it.first);

outputAllocators[it.first] = std::make_shared<OutputAllocator>(outblob);
Owner

The dependency must be inverted. The blob should be initialized with the allocator, not vice versa.


void OutputMemoryMngr::setExtBuff(void* ptr, size_t size) {
if (m_allocator) {
return;
Owner

Please avoid silent action skipping. Either replace the memory pointer in the allocator, or throw.

namespace ov {
namespace intel_cpu {

class OutputAllocator : public Allocator {
Owner

Since we are going to use OutputMemoryMngr instead of Allocator, we do not need this class at all.

@ceciliapeng2011 ceciliapeng2011 requested a review from maxnick June 13, 2023 05:21
@ceciliapeng2011 ceciliapeng2011 marked this pull request as draft June 15, 2023 01:19
std::map<std::string, NodePtr> inputNodesMap;
std::map<std::string, NodePtr> outputNodesMap;

std::map<std::string, MemoryMngrPtr> outputNodesMemMngrMap;
Owner

It is better to use unordered_map. It also makes more sense to store the specific OutputMemoryMngr type to avoid type conversions.

dynBatch = newDynBatch;
}

Status getDynStatus() const {return status;}
Owner

Please rename it to simply getStatus.

Comment on lines 888 to 922
MemoryMngrPtr grpMemMngr;
grpMemMngr =
std::make_shared<DnnlMemoryMngr>(std::unique_ptr<MemoryMngrWithReuse>(new MemoryMngrWithReuse()));

// determine a group with outputs.
size_t isOutGrp = 0;
int64_t outBoxId = -1;
for (auto& box : group) {
if (std::any_of(
edge_clusters[box.id].begin(),
edge_clusters[box.id].end(),
[box](const ov::intel_cpu::EdgePtr edge) {
return edge->getChild()->getType() == Type::Output;
})) {
isOutGrp++;
outBoxId = box.id;
}
}
if (isOutGrp) {
IE_ASSERT(isOutGrp==1); // reuse_io_tensors false
grpMemMngr =
std::make_shared<OutputMemoryMngr>(grpMemMngr);
DEBUG_LOG(grpMemMngr);

// Store the output memory managers.
// So that, the infer requests can be able to get access to them.
for (auto& edge : edge_clusters[outBoxId]) {
const auto child = edge->getChild();
if (child->getType() == Type::Output) {
for (auto &output : outputNodesMap) {
if (output.second == child) outputNodesMemMngrMap[output.first] = grpMemMngr;
}
}
}
}
Owner

This part must be handled in a different way.
Even before the static memory initialization on L814 we have to do the following:

  1. Search for all the Output edges.
  2. Allocate all the found edges that have the status Edge::Status::NeedAllocation using the OutputMemoryMngr.
  3. Store the output memory managers used for initialization of the output edges to provide access from infer requests.

The rest of the code frame may remain the same, as those output edges now have the Allocated status and will be skipped during the static and dynamic memory initialization routines.

Author
@ceciliapeng2011 Jun 27, 2023

@maxnick There is one problem with your suggestion.
An output edge might share the memory of a base edge, in which case the output edge is marked "Status::NotAllocated". Such an output edge will slip through and won't use the OutputMemoryMngr.


const auto &currDesc = edges[0]->getMemory().getDesc();
if (currDesc.getShape().isStatic() && currDesc.getShape().getStaticDims() == newOutputShape)
if (currDesc.getShape().isStatic() && currDesc.getShape().getStaticDims() == newOutputShape && !forceUpdateShape)
Owner

The cornerstone of software development is the isolation of complexity. Using this flag we are exposing the complexity of the memory relationships to the node. Let's try to avoid such a design flaw.
It seems that we can handle the memory resize on the OutputMemoryDesc level. We can register the fact that the allocator was reset and call allocate either on demand (when the get pointer method is called) or when resize is called. That way we will not have to handle this nuance outside the memory manager implementation.

Author

@maxnick yes, the existing design is simply a quick try and needs optimizing.

Author
@ceciliapeng2011 Jun 27, 2023

There is a problem here if we allocate when the get pointer method is called:
the manager has no information on how much memory should be allocated, as there is no memdesc, no size from the previous allocator, etc., unless we invent a new interface in IMemoryMngr to get this information.

Another problem is that getRawPtr() is const, so it shouldn't change the object itself.

handle the memory resize on the OutputMemoryDesc level

I don't quite understand this well. The output edge may share memory with other edges, and each edge has its own memory object and memory desc. How do we let all of them know about the updates from the manager? Or can we assume there is no need to update the memdesc?

@maxnick maxnick force-pushed the in_place_dynamic branch 3 times, most recently from cd818cd to bfaef7c Compare June 22, 2023 17:05
@maxnick maxnick force-pushed the in_place_dynamic branch 2 times, most recently from 97b8873 to 8c87601 Compare July 10, 2023 16:03