[RISCV64] [GSoC] Integrate SHL eltwise ops into OV #25674
base: master
Conversation
Commits:

- [RISCV64] add shlExecutor option for eltwise node
- [RISCV64] fix some errors with eltwise executor
- [RISCV64] add constructor def and vector initialization for ShlTensor
- [RISCV64] remove redundant debug print
- [RISCV64] skip failed tests
- [RISCV64] change way of invoking kernels
- [RISCV64] set shapes for ShlTensor before SHL add
- [RISCV64] simplify eltwise kernel invocation
- [RISCV64] integrate other eltwise ops
- [RISCV64] fix tests for some eltwise and activation ops
- [RISCV64] integrate PRelu and LeakyRelu op and fix related tests
- [RISCV64] integrate Maximum and Minimum op for the need of some tests
- [RISCV64] fix some tests
- [RISCV64] fix inaccurate problem with LeakyReLu op
- [RISCV64] fix some tests
- [RISCV64] enable debugging for riscv64
- [RISCV64] update due to API changes
```cpp
void setShape(const VectorDims& shape) {
    get()->dim_count = shape.size();
    OPENVINO_ASSERT(get()->dim_count < MAX_DIM, "Shl supports shapes with rank less or equal to 8");
    for (int i = 0; i < get()->dim_count; ++i)
        get()->dim[i] = static_cast<int32_t>(shape[i]);
}
```
As we discussed offline, all setters must be in the private section for safer development.
The source: #23964 (comment)
I will make it private and only leave `cloneWithNewShape` public!
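The private-setter design being discussed could look roughly like the following self-contained sketch. `TensorWrapper` and its members are illustrative stand-ins, not the real `ShlTensor` API: the in-place `setShape` is private, and `cloneWithNewShape` is the only public way to obtain a differently shaped tensor.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the SHL tensor wrapper: the shape setter is
// private, and the only public way to change the shape is to make a copy.
class TensorWrapper {
public:
    explicit TensorWrapper(std::vector<int32_t> shape = {}) { setShape(shape); }

    // Public, safe API: returns a copy with the new shape applied, so a
    // tensor shared elsewhere can never be mutated behind the caller's back.
    TensorWrapper cloneWithNewShape(const std::vector<int32_t>& shape) const {
        TensorWrapper copy = *this;
        copy.setShape(shape);
        return copy;
    }

    const std::vector<int32_t>& shape() const { return dims_; }

private:
    // Private setter: only the constructor and cloneWithNewShape may call it.
    void setShape(const std::vector<int32_t>& shape) { dims_ = shape; }

    std::vector<int32_t> dims_;
};
```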
```cpp
EltwiseAttrs eltwiseAttrs;
std::shared_ptr<EltwiseExecutor> aclExecPtr = nullptr;
std::shared_ptr<EltwiseExecutor> eltwiseExecPtr = nullptr;
```
Minor: please remove the extra line
```cpp
}
else {
```
Minor: code style correction:

```cpp
} else {
```
```cpp
const void *post_ops_data_) override;

impl_desc_type getImplType() const override {
    return implType;
```
You can just return `impl_desc_type::shl` and not add the new attribute `impl_desc_type implType` to the class, because the value of this attribute cannot be changed: `ShlEltwiseExecutor` should always return `impl_desc_type::shl` as the implementation type.
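A minimal sketch of that suggestion, with illustrative stand-ins for the OpenVINO types: since the backend is fixed, the getter returns a constant and no member is stored.

```cpp
// Illustrative stand-ins for the real OpenVINO enum and executor class.
enum class impl_desc_type { ref, shl };

struct ShlEltwiseExecutorSketch {
    // No stored implType member: the backend is fixed, so the
    // implementation type is a constant return value.
    impl_desc_type getImplType() const { return impl_desc_type::shl; }
};
```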
```cpp
@@ -2317,6 +2317,42 @@ void Eltwise::initSupportedPrimitiveDescriptors() {
    }
    outputPrecision = filterPrecision(outputPrecision, forcedPrec);
} else {
#endif
#if defined(OV_CPU_WITH_SHL)
    const bool useShl = true;
```
In your case `useShl` is always true when `#if defined(OV_CPU_WITH_SHL)`. I think we can even remove `useShl`.
Because, as I said before, aarch64 has 2 backends for Eltwise op execution: JIT (just-in-time code generation) and ACL (Arm Compute Library), so the execution depends on the variable `useAcl`.
In our case we have only one backend, SHL (and the reference C++ impl, for sure), so we don't need to split the pipeline and we don't need the variable `useShl`.
I think this `useShl` variable is used to skip the following code that corresponds to the reference path, so I'm not sure about whether to delete this variable.
```cpp
switch (shlEltwiseAttrs.algorithm) {
case Algorithm::EltwiseAdd:
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, CSINN_RVV);
```
Could you please create a separate variable `shl_api = CSINN_RVV`?
If we support another API (C906 or something else), we will be able to quickly update the value.
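A hedged sketch of that idea; the enum here is a stand-in for SHL's real API enum, not the actual header. Naming the API once gives a single point of change for all params constructions.

```cpp
// Stand-in for SHL's API enum; the real values come from the SHL headers.
enum csinn_api_sketch { RVV_API, C906_API };

// One named constant used everywhere a params object is created;
// switching backends becomes a one-line edit.
constexpr csinn_api_sketch shl_api = RVV_API;
```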
```cpp
    break;
case Algorithm::EltwiseSubtract:
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, CSINN_RVV);
    setInitFunc(csinn_sub_init, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
```
Do we really need to create the separate method `setInitFunc` and the class attribute `init_func`?
As far as I understand, we call these functions only once and in the same method `Executor::init`, so there is no need to save these functions as a class attribute.
Then you can just do this:
```cpp
#define INIT(func) \
    OPENVINO_ASSERT((func) == CSINN_TRUE, "ShlEltwiseExecutor: failed to init");

case Algorithm::EltwiseSubtract:
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, CSINN_RVV);
    INIT(csinn_sub_init(srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get())));
    break;
```
What do you think?
Yeah, this is a very clever way to implement it: easy but effective. I agree that `init_func` no longer needs to exist in the class. Originally I implemented it this way to store `exec_func` during the `init` process, but seemingly `exec_func` can also be replaced in this way.
```cpp
    return true;
};

switch (eltwiseAttrs.algorithm) {
```
Firstly, we can check that we support the current op:

```cpp
if (!isEltwiseAlgorithmSupported()) {
    DEBUG_LOG("Eltwise algorithm ", algToString(eltwiseAttrs.algorithm), " is not supported");
    return false;
}
```

Then we can check precisions. At the moment we support only the fp32 precision, so we can make it simpler:

```cpp
constexpr auto supported_prec = ov::element::f32;
auto is_precision_supported = [supported_prec](const MemoryDescPtr& desc) { return desc->getPrecision() == supported_prec; };
if (std::all_of(srcDescs.cbegin(), srcDescs.cend(), is_precision_supported) &&
    std::all_of(dstDescs.cbegin(), dstDescs.cend(), is_precision_supported)) {
    DEBUG_LOG("ShlEltwise supports only f32");
    return false;
}
```
If changed in this way, some tests fail with the reason `Actual: ref_f32, expected: shl_f32`. As far as I can see, there are no obvious differences between the two implementations. I'm going to try to figure it out later.
Seems like I just missed the `!` in the condition, because my suggested condition reads: if all precisions are supported, the executor doesn't support this configuration. For sure, that's incorrect 😄
Please just try to add the negation to the condition and I think it will work.
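With the negation added, the check could look like this self-contained sketch; `Precision` and `supportsPrecisions` are illustrative stand-ins for `ov::element::Type` and the real `isSupported` logic.

```cpp
#include <algorithm>
#include <vector>

// Illustrative stand-in for OpenVINO's element type.
enum class Precision { f32, f16 };

// Sketch of the corrected condition: reject the configuration when NOT all
// source and destination precisions are f32.
bool supportsPrecisions(const std::vector<Precision>& src,
                        const std::vector<Precision>& dst) {
    constexpr auto supported = Precision::f32;
    auto ok = [supported](Precision p) { return p == supported; };
    // Note the negation: fail only when some precision is unsupported.
    if (!(std::all_of(src.cbegin(), src.cend(), ok) &&
          std::all_of(dst.cbegin(), dst.cend(), ok))) {
        return false;  // "ShlEltwise supports only f32"
    }
    return true;
}
```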
Thanks for the reminder! It is really hard to find a `!` problem! I will try it now!
```cpp
// SHL PRelu op only supports these two kinds of layout
if (!(supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NC1HWC0 ||
      supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NCHW)) {
```
Minor:

```cpp
// SHL PRelu op only supports these two kinds of layout
if (!one_of(supportedLayout, csinn_layout_enum::CSINN_LAYOUT_NC1HWC0, csinn_layout_enum::CSINN_LAYOUT_NCHW)) {
```
```cpp
switch (shlEltwiseAttrs.algorithm) {
case Algorithm::EltwiseAdd:
    setExecFunc(csinn_add, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseSubtract:
    setExecFunc(csinn_sub, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseMultiply:
    setExecFunc(csinn_mul, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseDivide:
    setExecFunc(csinn_div, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseMaximum:
    setExecFunc(csinn_maximum, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseMinimum:
    setExecFunc(csinn_minimum, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseExp:
    setExecFunc(csinn_exp, srcTensors[0].get(), dstTensors[0].get(), static_cast<csinn_siso_params*>(params->get()));
    break;
case Algorithm::EltwiseRelu:
    if (shlEltwiseAttrs.alpha == 0) {
        setExecFunc(csinn_relu, srcTensors[0].get(), dstTensors[0].get(), static_cast<csinn_relu_params*>(params->get()));
    }
    else {
        setExecFunc(csinn_leaky_relu, srcTensors[0].get(), dstTensors[0].get(), static_cast<csinn_relu_params*>(params->get()));
    }
    break;
case Algorithm::EltwisePrelu:
    setExecFunc(csinn_prelu, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_prelu_params*>(params->get()));
    break;
default:
    OPENVINO_THROW("Unsupported operation type for SHL Eltwise executor: ",
                   static_cast<int>(shlEltwiseAttrs.algorithm));
}
```
What about moving these setters to the `init` stage?
We already have a switch-case in `init`. Can we initialize `exec_func` there?
```cpp
class ShlEltwiseExecutor {
private:
    std::function<int()> exec_func;
};

bool ShlEltwiseExecutor::init(...) {
    switch (shlEltwiseAttrs.algorithm) {
    case Algorithm::EltwiseAdd:
        INIT(csinn_add_init(srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get())));
        exec_func = [&]() {
            return csinn_add(srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
        };
        break;
    .....
    }
}

void ShlEltwiseExecutor::exec(...) {
    for (size_t i = 0; i < src.size(); i++) {
        srcTensors[i] = srcTensors[i].cloneWithNewShape(src[i]->getDescPtr()->getShape().getStaticDims());
        srcTensors[i].setData(src[i]->getData());
    }
    for (size_t i = 0; i < dst.size(); i++) {
        dstTensors[i] = dstTensors[i].cloneWithNewShape(dst[i]->getDescPtr()->getShape().getStaticDims());
        dstTensors[i].setData(dst[i]->getData());
    }
    OPENVINO_ASSERT(exec_func() == CSINN_TRUE, "Executor failed to execute");
}
```
What do you think? Can we try to do so?
Then we don't need to implement `index_sequence`, `callFunc` and the other helpers.
Could you please think about that?
If it's possible, you can create a local variable `std::function<int()> init_func` and initialize it the same way. Then at the end of `Executor::init()` you can call `OPENVINO_ASSERT(init_func() == CSINN_TRUE, "Executor failed to init");`
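A compact sketch of that variant under stated assumptions: `initExecutor`, the integer algorithm codes, and the return values are hypothetical stand-ins for the real SHL calls. The point is that `init_func` lives only inside `init` and is asserted exactly once at the end.

```cpp
#include <functional>

// Hypothetical sketch: init_func is a local std::function set inside the
// switch and invoked exactly once at the end of init(); nothing is stored
// in the class.
bool initExecutor(int algorithm) {
    std::function<int()> init_func;
    switch (algorithm) {
    case 0:                              // stand-in for Algorithm::EltwiseAdd
        init_func = []() { return 1; };  // stand-in for csinn_add_init(...)
        break;
    default:
        return false;                    // unsupported algorithm
    }
    // Stand-in for OPENVINO_ASSERT(init_func() == CSINN_TRUE, ...)
    return init_func() == 1;
}
```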
Actually I used to initialize all the function pointers during the `init` stage before, and at that time everything went fine.
However, when the `setShape()` interface was hidden and replaced with `cloneWithNewShape()`, the former method stopped working. The detailed reasons are as follows:

- The method I use now stores all the params statically, which are used for later function invocation.
- As I store the smart pointers of `ShlTensor` and `IShlParams` as params, you can set some specific values inside these objects (like `data` or `shape`). In this way the former method still worked, as the content that the smart pointers pointed to remained at the same address.
- However, with the `cloneWithNewShape()` interface, the data that the original pointer points to is no longer valid, so we have to reset the params again in `exec` to ensure the right pointer.

Actually I have been looking for a way to implement a generic function pointer that can accept any types of arguments and return any type of value, just like `func` in the following Python code:

```python
def hello_world(param):
    return 1

func = hello_world
```

But I failed to find a way in C++ code, leading me to this little 'ugly' implementation. :(
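For what it's worth, a C++ lambda plus `std::function` gives essentially that Python-style behaviour: the capture/body absorbs the arguments, whatever their types, and the stored callable keeps one uniform signature. A small self-contained sketch (`add`, `half`, and `demo` are illustrative names, not project code):

```cpp
#include <functional>

int add(int a, int b) { return a + b; }
double half(double x) { return x / 2.0; }

// One std::function<int()> can wrap calls with completely different
// argument lists, because the lambda bakes the arguments in itself.
int demo() {
    std::function<int()> func;

    func = []() { return add(20, 22); };                  // wraps a 2-int call
    int first = func();                                   // 42

    func = []() { return static_cast<int>(half(6.0)); };  // wraps a double call
    return first + func();                                // 42 + 3
}
```

This is why helpers like `index_sequence` and `callFunc` become unnecessary: the argument types are erased at the point where the lambda is created.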
But I think that `index_sequence`, `callFunc` and `params` might be possible to remove from the class. I will give it a try!
`SHLExecutor` for Eltwise ops, like `ACLExecutor`, has an `init` method which is called in the `Eltwise::PrepareParams` stage with executor creation.

As you know, OpenVINO supports two shape types: static (known at model compilation stage) and dynamic (defined only at inference stage, when the user sets inputs to the model).

If the op has static shapes at model compilation, `Eltwise::PrepareParams` is called in `Eltwise::CreatePrimitive` when all shapes are initialized and known in descriptors. We call them `StaticDims`.

If the op has dynamic shapes at model compilation, `Eltwise::PrepareParams` is called in `Node::updateDynamicParams()` after `ShapeInference` (shape inference is the process of propagating the model's input shapes to its outputs: each op takes the parent output shapes and sets its own outputs by specific rules). It means that in the dynamic case `Eltwise::PrepareParams` also knows the `StaticDims`.

So we can initialize `ShlTensors` with `StaticDims` in `ShlExecutor::init` and there is no need to update them in `ShlExecutor::exec`. Thus, `ShlExecutor::exec` will update only the data pointers; the other information in the tensors is constant.

Because of that, I think we can try to initialize `exec_func` in the `ShlExecutor::init` method and just update the data pointers in the tensors and call this `exec_func` in `ShlExecutor::exec` 😃

What do you think?
Thanks for the detailed explanation! If all the shape information can be ensured in `ShlExecutor::init`, I think your implementation is indeed a better way! I will give it a try!
The lambda function used here causes a segfault during the test. What about initializing the `ShlTensor` with static shapes and storing the `exec_func` the original way during the `init` process?
I just found that instead of passing `staticDims` to the constructor of `ShlTensor`, using the `cloneWithNewShape()` interface does not cause the segfault problem, and it works now! :)
Details:

- Integrated eltwise ops from the `shl` lib: `Add`, `Sub`, `Mul`, `Div`, `Maximum`, `Minimum`, `Exp`, `ReLu`, `LeakyReLu` and `PReLu`
Tickets: