
[RISCV64] [GSoC] Integrate SHL eltwise ops into OV #25674

Open · wants to merge 5 commits into master
Conversation

@BHbean (Contributor) commented Jul 22, 2024

Details:

  • integrate eltwise ops from the SHL library: Add, Sub, Mul, Div, Maximum, Minimum, Exp, ReLU, LeakyReLU and PReLU

Tickets:

  • N/A

[RISCV64] add shlExecutor option for eltwise node

[RISCV64] fix some errors with eltwise executor

[RISCV64] add constructor def and vector initialization for ShlTensor

[RISCV64] remove redundant debug print

[RISCV64] skip failed tests

[RISCV64] change way of invoking kernels

[RISCV64] set shapes for ShlTensor before SHL add

[RISCV64] simplify eltwise kernel invocation

[RISCV64] integrate other eltwise ops

[RISCV64] fix tests for some eltwise and activation ops

[RISCV64] integrate PRelu and LeakyRelu op and fix related tests

[RISCV64] integrate Maximum and Minimum op for the need of some tests

[RISCV64] fix some tests

[RISCV64] fix accuracy problem with LeakyReLu op

[RISCV64] fix some tests

[RISCV64] enable debugging for riscv64

[RISCV64] update due to API changes
@BHbean BHbean requested review from a team as code owners July 22, 2024 15:59
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Jul 22, 2024
@sys-openvino-ci sys-openvino-ci added the ExternalPR External contributor label Jul 22, 2024
@a-sidorova a-sidorova self-assigned this Jul 23, 2024
@dmitry-gorokhov dmitry-gorokhov added this to the 2024.4 milestone Jul 23, 2024
Comment on lines 152 to 157
void setShape(const VectorDims& shape) {
    get()->dim_count = shape.size();
    OPENVINO_ASSERT(get()->dim_count <= MAX_DIM, "Shl supports shapes with rank less or equal to 8");
    for (int i = 0; i < get()->dim_count; ++i)
        get()->dim[i] = static_cast<int32_t>(shape[i]);
}
Contributor:
As we discussed offline, all setters must be in private sections for safer development.
Source: #23964 (comment)

Contributor Author (@BHbean), Jul 25, 2024:
I will make it private and only leave cloneWithNewShape public!
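
For context, a minimal sketch of what that refactor could look like (setShape and the csinn_tensor fields behind get() are from the diff above; the base-class name and everything else here are approximations, not the final PR code):

    class ShlTensor : public ShlStructure<csinn_tensor> {
    public:
        // Shape changes go through copies only, so a tensor shared between
        // callers can never be mutated in place.
        ShlTensor cloneWithNewShape(const VectorDims& shape) const {
            ShlTensor cloned(*this);
            cloned.setShape(shape);
            return cloned;
        }

    private:
        void setShape(const VectorDims& shape) {
            get()->dim_count = shape.size();
            OPENVINO_ASSERT(get()->dim_count <= MAX_DIM, "Shl supports shapes with rank less or equal to 8");
            for (int i = 0; i < get()->dim_count; ++i)
                get()->dim[i] = static_cast<int32_t>(shape[i]);
        }
    };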

EltwiseAttrs eltwiseAttrs;
std::shared_ptr<EltwiseExecutor> aclExecPtr = nullptr;
std::shared_ptr<EltwiseExecutor> eltwiseExecPtr = nullptr;

Contributor:
Minor: please remove the extra line

Comment on lines 2814 to 2815
}
else {
Contributor:
Minor: please fix the code style

Suggested change
-    }
-    else {
+    } else {

const void *post_ops_data_) override;

impl_desc_type getImplType() const override {
return implType;
Contributor:
You can just return impl_desc_type::shl instead of adding the new attribute impl_desc_type implType to the class, because the value of this attribute cannot change: ShlEltwiseExecutor should always return impl_desc_type::shl as its implementation type.
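
In other words, a sketch of the suggested simplification (the signature is taken from the diff above):

    impl_desc_type getImplType() const override {
        return impl_desc_type::shl;
    }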

@@ -2317,6 +2317,42 @@ void Eltwise::initSupportedPrimitiveDescriptors() {
}
outputPrecision = filterPrecision(outputPrecision, forcedPrec);
} else {
#endif
#if defined(OV_CPU_WITH_SHL)
const bool useShl = true;
Contributor:
In your case useShl is always true when OV_CPU_WITH_SHL is defined, so I think we can even remove useShl.
As I said before, aarch64 has two backends for Eltwise op execution: JIT (just-in-time code generation) and ACL (Arm Compute Library), so the execution there depends on the variable useAcl.
In our case we have only one backend, SHL (and the reference C++ implementation, for sure), so we don't need to split the pipeline and therefore don't need the variable useShl.

Contributor Author:
I think this useShl variable is used to skip the code that follows, which corresponds to the reference path, so I'm not sure whether to delete it.


switch (shlEltwiseAttrs.algorithm) {
case Algorithm::EltwiseAdd:
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, CSINN_RVV);
Contributor:
Could you please create a separate variable shl_api = CSINN_RVV?
If we support another API (c906 or something else), we will be able to update the value quickly.
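
A sketch of that suggestion (the variable name comes from the comment above; CSINN_C906 is only mentioned as a possible alternative value):

    constexpr auto shl_api = CSINN_RVV;  // single place to switch the SHL API, e.g. to CSINN_C906
    ...
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, shl_api);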

    break;
case Algorithm::EltwiseSubtract:
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, CSINN_RVV);
    setInitFunc(csinn_sub_init, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
Contributor:
Do we really need the separate method setInitFunc and the class attribute init_func?
As far as I understand, we call these functions only once, in the same method Executor::init, so there is no need to save them as a class attribute.

Then you can just do something like this:

#define INIT(func) \
    OPENVINO_ASSERT((func) == CSINN_TRUE, "ShlEltwiseExecutor: failed to init");

case Algorithm::EltwiseSubtract:
    params = ov::intel_cpu::make_unique<ShlDisoParams>(sess, CSINN_RVV);
    INIT(csinn_sub_init(srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get())));
    break;

What do you think?

Contributor Author:
Yeah, this is a very clever approach: simple but effective. I agree that init_func no longer needs to exist in the class. Originally I implemented it this way in order to store exec_func during the init process, but seemingly exec_func can be replaced in the same way.

return true;
};

switch (eltwiseAttrs.algorithm) {
Contributor:
First, we can check that the current op is supported:

if (!isEltwiseAlgorithmSupported()) {
     DEBUG_LOG("Eltwise algorithm ", algToString(eltwiseAttrs.algorithm), " is not supported");
     return false;
}

Then we can check precisions. At the moment we support only fp32 precision, so we can keep it simple:

    constexpr auto supported_prec = ov::element::f32;
    auto is_precision_supported = [supported_prec](const MemoryDescPtr& desc) { return desc->getPrecision() == supported_prec; };
    if (std::all_of(srcDescs.cbegin(), srcDescs.cend(), is_precision_supported) &&
        std::all_of(dstDescs.cbegin(), dstDescs.cend(), is_precision_supported)) {
        DEBUG_LOG("ShlEltwise supports only f32");
        return false;
    }

Contributor Author:
If changed in this way, some tests fail with the reason Actual: ref_f32, expected: shl_f32. As far as I can see, there are no obvious differences between the two implementations. I'll try to figure it out later.

Contributor (@a-sidorova), Jul 26, 2024:
Seems like I just missed a '!' in the condition. As written, my suggested condition reads: if all precisions are supported, the executor doesn't support this configuration. For sure, that's incorrect 😄
Please try adding the negation to the condition and I think it will work.
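
For reference, the earlier snippet with the negation added (otherwise unchanged):

    constexpr auto supported_prec = ov::element::f32;
    auto is_precision_supported = [supported_prec](const MemoryDescPtr& desc) { return desc->getPrecision() == supported_prec; };
    if (!(std::all_of(srcDescs.cbegin(), srcDescs.cend(), is_precision_supported) &&
          std::all_of(dstDescs.cbegin(), dstDescs.cend(), is_precision_supported))) {
        DEBUG_LOG("ShlEltwise supports only f32");
        return false;
    }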

Contributor Author:
Thanks for the reminder! A missing '!' is really hard to spot! I will try it now!

Comment on lines 71 to 73
// SHL PRelu op only supports these two kinds of layout
if (!(supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NC1HWC0 ||
      supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NCHW)) {
Contributor:
Minor

Suggested change
-    // SHL PRelu op only supports these two kinds of layout
-    if (!(supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NC1HWC0 ||
-          supportedLayout == csinn_layout_enum::CSINN_LAYOUT_NCHW)) {
+    // SHL PRelu op only supports these two kinds of layout
+    if (!one_of(supportedLayout, csinn_layout_enum::CSINN_LAYOUT_NC1HWC0, csinn_layout_enum::CSINN_LAYOUT_NCHW)) {

Comment on lines 181 to 217
switch (shlEltwiseAttrs.algorithm) {
case Algorithm::EltwiseAdd:
    setExecFunc(csinn_add, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseSubtract:
    setExecFunc(csinn_sub, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseMultiply:
    setExecFunc(csinn_mul, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseDivide:
    setExecFunc(csinn_div, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseMaximum:
    setExecFunc(csinn_maximum, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseMinimum:
    setExecFunc(csinn_minimum, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
    break;
case Algorithm::EltwiseExp:
    setExecFunc(csinn_exp, srcTensors[0].get(), dstTensors[0].get(), static_cast<csinn_siso_params*>(params->get()));
    break;
case Algorithm::EltwiseRelu:
    if (shlEltwiseAttrs.alpha == 0) {
        setExecFunc(csinn_relu, srcTensors[0].get(), dstTensors[0].get(), static_cast<csinn_relu_params*>(params->get()));
    } else {
        setExecFunc(csinn_leaky_relu, srcTensors[0].get(), dstTensors[0].get(), static_cast<csinn_relu_params*>(params->get()));
    }
    break;
case Algorithm::EltwisePrelu:
    setExecFunc(csinn_prelu, srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_prelu_params*>(params->get()));
    break;
default:
    OPENVINO_THROW("Unsupported operation type for SHL Eltwise executor: ",
                   static_cast<int>(shlEltwiseAttrs.algorithm));
}
Contributor (@a-sidorova), Jul 25, 2024:
What about moving these setters to the init stage?
We already have a switch-case in init. Can we initialize exec_func there?

class ShlEltwiseExecutor {
private:
    std::function<int()> exec_func;
};

bool ShlEltwiseExecutor::init(...) {
    switch (shlEltwiseAttrs.algorithm) {
    case Algorithm::EltwiseAdd:
        INIT(csinn_add_init(srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get())));
        exec_func = [&]() {
            return csinn_add(srcTensors[0].get(), srcTensors[1].get(), dstTensors[0].get(), static_cast<csinn_diso_params*>(params->get()));
        };
        break;
    .....
}

void ShlEltwiseExecutor::exec(...) {
    for (size_t i = 0; i < src.size(); i++) {
        srcTensors[i] = srcTensors[i].cloneWithNewShape(src[i]->getDescPtr()->getShape().getStaticDims());
        srcTensors[i].setData(src[i]->getData());
    }
    for (size_t i = 0; i < dst.size(); i++) {
        dstTensors[i] = dstTensors[i].cloneWithNewShape(dst[i]->getDescPtr()->getShape().getStaticDims());
        dstTensors[i].setData(dst[i]->getData());
    }
    OPENVINO_ASSERT(exec_func() == CSINN_TRUE, "Executor failed to execute");
}

What do you think? Can we try doing so?

Then we wouldn't need to implement the index_sequence, callFunc and other helpers.

Could you please think about that?

Contributor:
If possible, you can create a local variable std::function<int()> init_func and initialize it in the same way. Then at the end of Executor::init() you can call OPENVINO_ASSERT(init_func() == CSINN_TRUE, "Executor failed to init");

Contributor Author:
Actually I used to initialize all the function pointers during the init stage, and at that time everything went fine.

However, now that the setShape() interface is hidden and replaced with cloneWithNewShape(), the former method no longer works. The detailed reasons are as follows:

  1. The method I use now stores all the params statically, for later function invocation.
  2. Since I store smart pointers to ShlTensor and IShlParams as params, specific values inside these objects (like data or shape) can still be set. The former method kept working under that scheme, because the content the smart pointers pointed to stayed at the same address.
  3. However, with the cloneWithNewShape() interface, the data the original pointer points to is no longer valid, so we have to reset the params again in exec to ensure the pointers are correct.

Actually I have been looking for a way to implement a generic function pointer that can accept arguments of any types and return a value of any type, just like func in the following Python code:

def hello_world(param):
    return 1

func = hello_world

But I failed to find a way to do this in C++, which led me to this slightly 'ugly' implementation. :(
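
For what it's worth, the closest C++ counterpart to the Python snippet is type erasure via a capturing lambda stored in a std::function, as suggested above: the differently typed arguments are frozen inside the lambda, and the stored callable is left with a single fixed signature. A minimal standalone sketch (the names here are illustrative, not from the PR):

    #include <functional>
    #include <iostream>

    // Two callees with completely different signatures.
    int add(int a, int b) { return a + b; }
    int negate(double x) { return static_cast<int>(-x); }

    int main() {
        // Both calls fit behind the same zero-argument signature,
        // because the arguments are captured when the lambda is created.
        std::function<int()> func = [] { return add(2, 3); };
        std::cout << func() << '\n';  // prints 5

        func = [] { return negate(4.5); };
        std::cout << func() << '\n';  // prints -4
        return 0;
    }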

Contributor Author:
But I think that index_sequence, callFunc and params could be removed from the class. I will give it a try!

Contributor:
SHLExecutor for Eltwise ops, like ACLExecutor, has an init method which is called at the Eltwise::PrepareParams stage, when the executor is created.
As you know, OpenVINO supports two shape types: static (known at model compilation stage) and dynamic (defined only at inference stage, when the user sets inputs on the model).
If the op has static shapes at model compilation, Eltwise::PrepareParams is called in Eltwise::CreatePrimitive, when all shapes are initialized and known in the descriptors. We call them StaticDims.
If the op has dynamic shapes at model compilation, Eltwise::PrepareParams is called in Node::updateDynamicParams() after shape inference (shape inference is the process of propagating the model's input shapes to its outputs: each op takes its parents' output shapes and sets its own outputs by op-specific rules). This means that in the dynamic case Eltwise::PrepareParams also knows the StaticDims.
So we can initialize the ShlTensors with StaticDims in ShlExecutor::init, and there is no need to update them in ShlExecutor::exec. Thus ShlExecutor::exec only has to update the data pointers; all other information in the tensors is constant.

Because of that, I think we can try to initialize exec_func in the ShlExecutor::init method, and in ShlExecutor::exec just update the data pointers in the tensors and call exec_func 😃

What do you think?
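
If that holds, exec could shrink to something like this sketch (setData, exec_func and the status check come from the earlier snippets; the exact signature is approximate):

    void ShlEltwiseExecutor::exec(const std::vector<MemoryCPtr>& src, const std::vector<MemoryPtr>& dst) {
        // Shapes were fixed once in init() from the StaticDims;
        // only the raw data pointers change between inference requests.
        for (size_t i = 0; i < src.size(); i++)
            srcTensors[i].setData(src[i]->getData());
        for (size_t i = 0; i < dst.size(); i++)
            dstTensors[i].setData(dst[i]->getData());
        OPENVINO_ASSERT(exec_func() == CSINN_TRUE, "ShlEltwiseExecutor: failed to execute");
    }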

Contributor Author:
Thanks for the detailed explanation! If all the shape information can be guaranteed in ShlExecutor::init, I think your implementation is indeed the better way! I will give it a try!

Contributor Author:
The lambda function used here causes a segfault during the tests. What about initializing the ShlTensor with static shapes and storing the exec_func the original way during the init process?

Contributor Author:
I just found that, instead of passing staticDims to the constructor of ShlTensor, using the cloneWithNewShape() interface does not cause the segfault, and it works now! :)
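
Roughly, the working variant binds the static shapes once during init via cloneWithNewShape(), along the lines of this sketch (the pattern is lifted from the exec snippet earlier; srcDescs/dstDescs as in the supports() snippet):

    // In ShlEltwiseExecutor::init(): fix the shapes once from the static dims.
    for (size_t i = 0; i < srcDescs.size(); i++)
        srcTensors[i] = srcTensors[i].cloneWithNewShape(srcDescs[i]->getShape().getStaticDims());
    for (size_t i = 0; i < dstDescs.size(); i++)
        dstTensors[i] = dstTensors[i].cloneWithNewShape(dstDescs[i]->getShape().getStaticDims());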

Labels: category: CPU (OpenVINO CPU plugin) · ExternalPR (External contributor)
Projects: None yet

4 participants