initiate zero-copy output. #62
base: in_place_dynamic
Conversation
```cpp
using OutputAllocatorCPtr = std::shared_ptr<const OutputAllocator>;

class OutputMemoryMngr : public IMemoryMngr {
```
OutputMemoryMngr should inherit IMemoryMngrObserver to be used polymorphically with other IMemoryMngrObserver implementations.
src/plugins/intel_cpu/src/graph.cpp (outdated)
```cpp
for (auto& edge : edge_clusters[box.id]) {
    if (edge->getChild()->getType() == Type::Output) {
        isOutGrp++;
        break;
    }
```
Simply use std::any_of to check whether the cluster contains an output edge.
```cpp
if (output != outputNodesMap.end()) {
    auto parentEdge = output->second->getParentEdgeAt(0);

if (graph->hasDynamicInput()) { // TODO: internal dynamism
```
To check whether the graph is dynamic, one may use its status.
src/plugins/intel_cpu/src/graph.cpp (outdated)
```cpp
if (isOutGrp) {
    IE_ASSERT(isOutGrp == 1); // reuse_io_tensors false
    grpMemMngr =
        std::make_shared<OutputMemoryMngr>(std::unique_ptr<MemoryMngrWithReuse>(new MemoryMngrWithReuse()));
```
This memory manager should be stored in the graph in a container of objects of the specific OutputMemoryMngr type, so that the infer requests can access those memory managers.
```cpp
if (outMemMngr != nullptr) {
    outMemMngr->setMemDesc(desc);
}
```
As we discussed, this should not work that way. The memory descriptor may not reflect the memory size to be allocated; please consider inPlace concat chains. As agreed, the reallocation should happen using the size requested from the memory manager, via the wrapped OutputMemoryMngr::resize call.
```cpp
auto outblob = InferRequest::GetBlob(it.first);

outputAllocators[it.first] = std::make_shared<OutputAllocator>(outblob);
```
The dependency must be inverted. The blob should be initialized with the allocator, not vice versa.
```cpp
void OutputMemoryMngr::setExtBuff(void* ptr, size_t size) {
    if (m_allocator) {
        return;
```
Please avoid silent action skipping. Either replace the memory pointer in the allocator, or throw.
```cpp
namespace ov {
namespace intel_cpu {

class OutputAllocator : public Allocator {
```
Since we are going to use OutputMemoryMngr instead of Allocator, we do not need this class at all.
```cpp
std::map<std::string, NodePtr> inputNodesMap;
std::map<std::string, NodePtr> outputNodesMap;

std::map<std::string, MemoryMngrPtr> outputNodesMemMngrMap;
```
It is better to use unordered_map. It also makes more sense to store the specific OutputMemoryMngr type to avoid type conversions.
```cpp
    dynBatch = newDynBatch;
}

Status getDynStatus() const { return status; }
```
Please rename it to simply getStatus.
```cpp
MemoryMngrPtr grpMemMngr;
grpMemMngr =
    std::make_shared<DnnlMemoryMngr>(std::unique_ptr<MemoryMngrWithReuse>(new MemoryMngrWithReuse()));

// determine a group with outputs.
size_t isOutGrp = 0;
int64_t outBoxId = -1;
for (auto& box : group) {
    if (std::any_of(
        edge_clusters[box.id].begin(),
        edge_clusters[box.id].end(),
        [box](const ov::intel_cpu::EdgePtr edge) {
            return edge->getChild()->getType() == Type::Output;
        })) {
        isOutGrp++;
        outBoxId = box.id;
    }
}
if (isOutGrp) {
    IE_ASSERT(isOutGrp == 1); // reuse_io_tensors false
    grpMemMngr =
        std::make_shared<OutputMemoryMngr>(grpMemMngr);
    DEBUG_LOG(grpMemMngr);

    // Store the output memory managers,
    // so that the infer requests can get access to them.
    for (auto& edge : edge_clusters[outBoxId]) {
        const auto child = edge->getChild();
        if (child->getType() == Type::Output) {
            for (auto& output : outputNodesMap) {
                if (output.second == child) outputNodesMemMngrMap[output.first] = grpMemMngr;
            }
        }
    }
}
```
This part must be handled in a different way. Even before the static memory initialization on L814 we have to do the following:

- Search for all the Output edges.
- Allocate all the found edges with the status Edge::Status::NeedAllocation using the OutputMemoryMngr.
- Store the output memory managers used for initialization of the output edges, to provide access from infer requests.

The rest of the code frame may remain the same, as those output edges now have Allocated status and will be skipped during the static and dynamic memory initialization routines.
@maxnick There is one problem with your suggestion: an output edge might share memory with a base edge, in which case the output edge is marked Status::NotAllocated. Such an output edge will slip through and won't use OutputMemoryMngr.
```diff
  const auto &currDesc = edges[0]->getMemory().getDesc();
- if (currDesc.getShape().isStatic() && currDesc.getShape().getStaticDims() == newOutputShape)
+ if (currDesc.getShape().isStatic() && currDesc.getShape().getStaticDims() == newOutputShape && !forceUpdateShape)
```
The cornerstone of software development is the isolation of complexity. Using this flag, we expose the complexity of the memory relationship to the node. Let's try to avoid such a design flaw. It seems that we can handle the memory resize on the OutputMemoryDesc level: we can register the fact that the allocator was reset and call allocate either on demand (when get ptr is called) or when resize is called. That way we will not have to handle this nuance outside the memory manager implementation.
@maxnick yes, the existing design is simply a quick try and needs optimizing.
There is a problem here if we allocate when get ptr is called: there is no info in the manager on how much memory should be allocated, as there is no memdesc, no size from the previous allocator, etc., unless we invent a new interface in IMemoryMngr to get this information. Another problem is that getRawPtr() is const, so it shouldn't change the object itself.

> handle the memory resize on the OutputMemoryDesc level

I don't quite understand this. The output edge may share memory with other edges, and each edge has its own memory object and memory desc. How do we let all of them know about the updates from the manager? Or can we assume there is no need to update the memdesc?
Force-pushed from cd818cd to bfaef7c.
Force-pushed from 97b8873 to 8c87601.