
[RISCV64] [GSoC] Integrate SHL eltwise ops into OV #25674

Open · wants to merge 5 commits into master
Conversation

@BHbean (Contributor) commented Jul 22, 2024

Details:

  • integrate eltwise ops from the SHL library: Add, Sub, Mul, Div, Maximum, Minimum, Exp, ReLU, LeakyReLU and PReLU

Tickets:

  • N/A

[RISCV64] add shlExecutor option for eltwise node

[RISCV64] fix some errors with eltwise executor

[RISCV64] add constructor def and vector initialization for ShlTensor

[RISCV64] remove redundant debug print

[RISCV64] skip failed tests

[RISCV64] change way of invoking kernels

[RISCV64] set shapes for ShlTensor before SHL add

[RISCV64] simplify eltwise kernel invocation

[RISCV64] integrate other eltwise ops

[RISCV64] fix tests for some eltwise and activation ops

[RISCV64] integrate PRelu and LeakyRelu op and fix related tests

[RISCV64] integrate Maximum and Minimum op for the need of some tests

[RISCV64] fix some tests

[RISCV64] fix accuracy problem with LeakyReLu op

[RISCV64] fix some tests

[RISCV64] enable debugging for riscv64

[RISCV64] update due to API changes
@BHbean BHbean requested review from a team as code owners July 22, 2024 15:59
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Jul 22, 2024
@sys-openvino-ci sys-openvino-ci added the ExternalPR External contributor label Jul 22, 2024
@a-sidorova a-sidorova self-assigned this Jul 23, 2024
@dmitry-gorokhov dmitry-gorokhov added this to the 2024.4 milestone Jul 23, 2024
Comment on lines 152 to 157
void setShape(const VectorDims& shape) {
    get()->dim_count = shape.size();
    OPENVINO_ASSERT(get()->dim_count <= MAX_DIM, "Shl supports shapes with rank less or equal to 8");
    for (int i = 0; i < get()->dim_count; ++i)
        get()->dim[i] = static_cast<int32_t>(shape[i]);
}
Contributor:
As we discussed offline, all setters must be in private sections for safer development.
Source: #23964 (comment)

Contributor Author (@BHbean), Jul 25, 2024:
I will make it private and only leave cloneWithNewShape public!
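
For context, a minimal sketch of what that refactor could look like (setShape and the csinn_tensor fields behind get() are from the diff above; the base-class name and everything else here are approximations, not the final PR code):

    class ShlTensor : public ShlStructure<csinn_tensor> {
    public:
        // Shape changes go through copies only, so a tensor shared between
        // callers can never be mutated in place.
        ShlTensor cloneWithNewShape(const VectorDims& shape) const {
            ShlTensor cloned(*this);
            cloned.setShape(shape);
            return cloned;
        }

    private:
        void setShape(const VectorDims& shape) {
            get()->dim_count = shape.size();
            OPENVINO_ASSERT(get()->dim_count <= MAX_DIM, "Shl supports shapes with rank less or equal to 8");
            for (int i = 0; i < get()->dim_count; ++i)
                get()->dim[i] = static_cast<int32_t>(shape[i]);
        }
    };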

EltwiseAttrs eltwiseAttrs;
std::shared_ptr<EltwiseExecutor> aclExecPtr = nullptr;
std::shared_ptr<EltwiseExecutor> eltwiseExecPtr = nullptr;

Contributor:
Minor: please remove the extra line

Comment on lines 2814 to 2815
}
else {
Contributor:
Minor: please fix the code style

Suggested change
-    }
-    else {
+    } else {

const void *post_ops_data_) override;

impl_desc_type getImplType() const override {
return implType;
Contributor:
You can just return impl_desc_type::shl instead of adding the new attribute impl_desc_type implType to the class, because the value of this attribute cannot change: ShlEltwiseExecutor should always return impl_desc_type::shl as its implementation type.
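
In other words, a sketch of the suggested simplification (the signature is taken from the diff above):

    impl_desc_type getImplType() const override {
        return impl_desc_type::shl;
    }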

@@ -2317,6 +2317,42 @@ void Eltwise::initSupportedPrimitiveDescriptors() {
}
outputPrecision = filterPrecision(outputPrecision, forcedPrec);
} else {
#endif
#if defined(OV_CPU_WITH_SHL)
const bool useShl = true;
Contributor:
In your case useShl is always true when OV_CPU_WITH_SHL is defined, so I think we can even remove useShl.
As I said before, aarch64 has two backends for Eltwise op execution: JIT (just-in-time code generation) and ACL (Arm Compute Library), so the execution there depends on the variable useAcl.
In our case we have only one backend, SHL (and the reference C++ implementation, for sure), so we don't need to split the pipeline and therefore don't need the variable useShl.

Contributor Author:
I think this useShl variable is used to skip the code that follows, which corresponds to the reference path, so I'm not sure whether to delete it.


switch (shlEltwiseAttrs.algorithm) {
case Algorithm::EltwiseAdd:
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, CSINN_RVV);
Contributor:
Could you please create a separate variable shl_api = CSINN_RVV?
If we support another API (c906 or something else), we will be able to update the value quickly.
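
A sketch of that suggestion (the variable name comes from the comment above; CSINN_C906 is only mentioned as a possible alternative value):

    constexpr auto shl_api = CSINN_RVV;  // single place to switch the SHL API, e.g. to CSINN_C906
    ...
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, shl_api);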

    break;
case Algorithm::EltwiseSubtract:
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, CSINN_RVV);
    setInitFunc(csinn_sub_init, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
Contributor:
Do we really need the separate method setInitFunc and the class attribute init_func?
As far as I understand, we call these functions only once, in the same method Executor::init, so there is no need to save them as a class attribute.

Then you can just do something like this:

#define INIT(func) \
    OPENVINO_ASSERT((func) == CSINN_TRUE, "ShlEltwiseExecutor: failed to init");

case Algorithm::EltwiseSubtract:
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, CSINN_RVV);
    INIT(csinn_sub_init(srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get())));
    break;

What do you think?

Contributor Author:
Yeah, this is a very clever approach: simple but effective. I agree that init_func no longer needs to exist in the class. Originally I implemented it this way in order to store exec_func during the init process, but seemingly exec_func can be replaced in the same way.

return true;
};

switch (eltwiseAttrs.algorithm) {
Contributor:
First, we can check that the current op is supported:

if (!isEltwiseAlgorithmSupported()) {
     DEBUG_LOG("Eltwise algorithm ", algToString(eltwiseAttrs.algorithm), " is not supported");
     return false;
}

Then we can check precisions. At the moment we support only fp32 precision, so we can keep it simple:

    constexpr auto supported_prec = ov::element::f32;
    auto is_precision_supported = [supported_prec](const MemoryDescPtr& desc) { return desc->getPrecision() == supported_prec; };
    if (std::all_of(srcDescs.cbegin(), srcDescs.cend(), is_precision_supported) &&
        std::all_of(dstDescs.cbegin(), dstDescs.cend(), is_precision_supported)) {
        DEBUG_LOG("ShlEltwise supports only f32");
        return false;
    }

Contributor Author:
If changed in this way, some tests fail with the reason Actual: ref_f32, expected: shl_f32. As far as I can see, there are no obvious differences between the two implementations. I'll try to figure it out later.

Contributor (@a-sidorova), Jul 26, 2024:
Seems like I just missed a '!' in the condition. As written, my suggested condition reads: if all precisions are supported, the executor doesn't support this configuration. For sure, that's incorrect 😄
Please try adding the negation to the condition and I think it will work.
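
For reference, the earlier snippet with the negation added (otherwise unchanged):

    constexpr auto supported_prec = ov::element::f32;
    auto is_precision_supported = [supported_prec](const MemoryDescPtr& desc) { return desc->getPrecision() == supported_prec; };
    if (!(std::all_of(srcDescs.cbegin(), srcDescs.cend(), is_precision_supported) &&
          std::all_of(dstDescs.cbegin(), dstDescs.cend(), is_precision_supported))) {
        DEBUG_LOG("ShlEltwise supports only f32");
        return false;
    }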

Contributor Author:
Thanks for the reminder! A missing '!' is really hard to spot! I will try it now!

Comment on lines 71 to 73
// SHL PRelu op only supports these two kinds of layout
if (!(supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NC1HWC0 ||
      supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NCHW)) {
Contributor:
Minor

Suggested change
-    // SHL PRelu op only supports these two kinds of layout
-    if (!(supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NC1HWC0 ||
-          supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NCHW)) {
+    // SHL PRelu op only supports these two kinds of layout
+    if (!one_of(supportedLayout, csinn_layout_enum::CSINN_LAYOUT_NC1HWC0, csinn_layout_enum::CSINN_LAYOUT_NCHW)) {

Comment on lines 181 to 217
switch (shlEltwiseAttrs.algorithm) {
case Algorithm::EltwiseAdd:
    setExecFunc(csinn_add, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseSubtract:
    setExecFunc(csinn_sub, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseMultiply:
    setExecFunc(csinn_mul, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseDivide:
    setExecFunc(csinn_div, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseMaximum:
    setExecFunc(csinn_maximum, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseMinimum:
    setExecFunc(csinn_minimum, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseExp:
    setExecFunc(csinn_exp, srcTensors[0].get(), dstTensors[0].get(), static_cast<csinn_siso_params*>(params->get()));
    break;
case Algorithm::EltwiseRelu:
    if (shlEltwiseAttrs.alpha == 0) {
        setExecFunc(csinn_relu, srcTensors[0].get(), dstTensors[0].get(), static_cast<csinn_relu_params*>(params->get()));
    } else {
        setExecFunc(csinn_leaky_relu, srcTensors[0].get(), dstTensors[0].get(), static_cast<csinn_relu_params*>(params->get()));
    }
    break;
case Algorithm::EltwisePrelu:
    setExecFunc(csinn_prelu, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_prelu_params*>(params->get()));
    break;
default:
    OPENVINO_THROW("Unsupported operation type for SHL Eltwise executor: ",
                   static_cast<int>(shlEltwiseAttrs.algorithm));
}
Contributor (@a-sidorova), Jul 25, 2024:
What about moving these setters to the init stage?
We already have a switch-case in init. Can we initialize exec_func there?

class ShlEltwiseExecutor {
private:
    std::function<int()> exec_func;
};

bool ShlEltwiseExecutor::init(...) {
    switch (shlEltwiseAttrs.algorithm) {
    case Algorithm::EltwiseAdd:
        INIT(csinn_add_init(srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get())));
        exec_func = [&]() {
            return csinn_add(srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
        };
        break;
    .....
}

void ShlEltwiseExecutor::exec(...) {
    for (size_t i = 0; i < src.size(); i++) {
        srcTensors[i] = srcTensors[i].cloneWithNewShape(src[i]->getDescPtr()->getShape().getStaticDims());
        srcTensors[i].setData(src[i]->getData());
    }
    for (size_t i = 0; i < dst.size(); i++) {
        dstTensors[i] = dstTensors[i].cloneWithNewShape(dst[i]->getDescPtr()->getShape().getStaticDims());
        dstTensors[i].setData(dst[i]->getData());
    }
    OPENVINO_ASSERT(exec_func() == CSINN_TRUE, "Executor failed to execute");
}

What do you think? Can we try doing so?

Then we wouldn't need to implement the index_sequence, callFunc and other helpers.

Could you please think about that?

Contributor:
If possible, you can create a local variable std::function<int()> init_func and initialize it in the same way. Then at the end of Executor::init() you can call OPENVINO_ASSERT(init_func() == CSINN_TRUE, "Executor failed to init");

Contributor Author:
Actually I used to initialize all the function pointers during the init stage, and at that time everything went fine.

However, now that the setShape() interface is hidden and replaced with cloneWithNewShape(), the former method no longer works. The detailed reasons are as follows:

  1. The method I use now stores all the params statically, for later function invocation.
  2. Since I store smart pointers to ShlTensor and IShlParams as params, specific values inside these objects (like data or shape) can still be set. The former method kept working under that scheme, because the content the smart pointers pointed to stayed at the same address.
  3. However, with the cloneWithNewShape() interface, the data the original pointer points to is no longer valid, so we have to reset the params again in exec to ensure the pointers are correct.

Actually I have been looking for a way to implement a generic function pointer that can accept arguments of any types and return a value of any type, just like func in the following Python code:

def hello_world(param):
    return 1

func = hello_world

But I failed to find a way to do this in C++, which led me to this slightly 'ugly' implementation. :(
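
For what it's worth, the closest C++ counterpart to the Python snippet is type erasure via a capturing lambda stored in a std::function, as suggested above: the differently typed arguments are frozen inside the lambda, and the stored callable is left with a single fixed signature. A minimal standalone sketch (the names here are illustrative, not from the PR):

    #include <functional>
    #include <iostream>

    // Two callees with completely different signatures.
    int add(int a, int b) { return a + b; }
    int negate(double x) { return static_cast<int>(-x); }

    int main() {
        // Both calls fit behind the same zero-argument signature,
        // because the arguments are captured when the lambda is created.
        std::function<int()> func = [] { return add(2, 3); };
        std::cout << func() << '\n';  // prints 5

        func = [] { return negate(4.5); };
        std::cout << func() << '\n';  // prints -4
        return 0;
    }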

Contributor Author:
But I think that index_sequence, callFunc and params could be removed from the class. I will give it a try!

Contributor:
SHLExecutor for Eltwise ops, like ACLExecutor, has an init method which is called at the Eltwise::PrepareParams stage, when the executor is created.
As you know, OpenVINO supports two shape types: static (known at model compilation stage) and dynamic (defined only at inference stage, when the user sets inputs on the model).
If the op has static shapes at model compilation, Eltwise::PrepareParams is called in Eltwise::CreatePrimitive, when all shapes are initialized and known in the descriptors. We call them StaticDims.
If the op has dynamic shapes at model compilation, Eltwise::PrepareParams is called in Node::updateDynamicParams() after shape inference (shape inference is the process of propagating the model's input shapes to its outputs: each op takes its parents' output shapes and sets its own outputs by op-specific rules). This means that in the dynamic case Eltwise::PrepareParams also knows the StaticDims.
So we can initialize the ShlTensors with StaticDims in ShlExecutor::init, and there is no need to update them in ShlExecutor::exec. Thus ShlExecutor::exec only has to update the data pointers; all other information in the tensors is constant.

Because of that, I think we can try to initialize exec_func in the ShlExecutor::init method, and in ShlExecutor::exec just update the data pointers in the tensors and call exec_func 😃

What do you think?
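
If that holds, exec could shrink to something like this sketch (setData, exec_func and the status check come from the earlier snippets; the exact signature is approximate):

    void ShlEltwiseExecutor::exec(const std::vector<MemoryCPtr>& src, const std::vector<MemoryPtr>& dst) {
        // Shapes were fixed once in init() from the StaticDims;
        // only the raw data pointers change between inference requests.
        for (size_t i = 0; i < src.size(); i++)
            srcTensors[i].setData(src[i]->getData());
        for (size_t i = 0; i < dst.size(); i++)
            dstTensors[i].setData(dst[i]->getData());
        OPENVINO_ASSERT(exec_func() == CSINN_TRUE, "ShlEltwiseExecutor: failed to execute");
    }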

Contributor Author:
Thanks for the detailed explanation! If all the shape information can be guaranteed in ShlExecutor::init, I think your implementation is indeed the better way! I will give it a try!

Contributor Author:
The lambda function used here causes a segfault during the tests. What about initializing the ShlTensor with static shapes and storing the exec_func the original way during the init process?

Contributor Author:
I just found that, instead of passing staticDims to the constructor of ShlTensor, using the cloneWithNewShape() interface does not cause the segfault, and it works now! :)
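
Roughly, the working variant binds the static shapes once during init via cloneWithNewShape(), along the lines of this sketch (the pattern is lifted from the exec snippet earlier; srcDescs/dstDescs as in the supports() snippet):

    // In ShlEltwiseExecutor::init(): fix the shapes once from the static dims.
    for (size_t i = 0; i < srcDescs.size(); i++)
        srcTensors[i] = srcTensors[i].cloneWithNewShape(srcDescs[i]->getShape().getStaticDims());
    for (size_t i = 0; i < dstDescs.size(); i++)
        dstTensors[i] = dstTensors[i].cloneWithNewShape(dstDescs[i]->getShape().getStaticDims());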

Labels: category: CPU (OpenVINO CPU plugin) · ExternalPR (External contributor)
Projects: None yet

4 participants