
[RISCV64][SHL] Added FC FP32 executor #23964

Merged

Conversation

@a-sidorova (Contributor) commented Apr 10, 2024:

Details:

  • Reused FC RVV from SHL
  • The PR to SHL dev branch with accuracy fix for FC f32: openvinotoolkit/shl#3

Tickets:

  • N/A

TODO:

  • [x] Fix `execType: gemm_f32`
  • [x] Added a wrapper for `csinn_tensor` and `csinn_session` to allocate and deallocate these structures

Prerequisites:

  • [x] openvinotoolkit#23901
@github-actions bot added the category: CPU (OpenVINO CPU plugin), category: build (OpenVINO cmake script / infra), and category: dependency_changes (Pull requests that update a dependency file) labels on Apr 10, 2024
@a-sidorova force-pushed the feature/riscv64/fc_gemm_f32 branch 2 times, most recently from e318622 to ab0dff9 on April 13, 2024 07:41
@a-sidorova added the platform: risc-v (OpenVINO on RISC-V) label on Apr 13, 2024
@a-sidorova force-pushed the feature/riscv64/fc_gemm_f32 branch 2 times, most recently from f9f480f to 4519a74 on April 13, 2024 11:05
@a-sidorova marked this pull request as ready for review on April 13, 2024 12:03
@a-sidorova requested review from a team as code owners on April 13, 2024 12:03
@dmitry-gorokhov (Contributor):

@EgorDuplensky Could you please review the PR?

@EgorDuplensky (Contributor) left a comment:

Any plans regarding the tests?
Is there any RISCV emulator or something?

@a-sidorova (Contributor, Author):

> Any plans regarding the tests? Is there any RISCV emulator or something?

I used the current common FC tests.

As for emulators, I use QEMU built from the XuanTie GNU toolchain:

```sh
git clone https://github.com/T-head-Semi/xuantie-gnu-toolchain.git
cd xuantie-gnu-toolchain
./configure --prefix=/opt/riscv
make linux build-qemu

# run the cross-compiled test binary under the user-mode emulator
/opt/riscv/bin/qemu-riscv64 -cpu c910v ./ov_cpu_func_tests
```


This PR will be closed in a week because of 2 weeks of no activity.

@a-sidorova (Contributor, Author) commented Jun 28, 2024:

@EgorDuplensky rebased on the latest master and also added the following changes in the latest commit 12b9a9f:

  • Added SHL tests for FC
  • Disabled SHL FC execution if weights are not transposed, because I didn't find an API for repacking non-transposed weights in SHL. One idea is to disable this optimization in GraphOptimizer, but I'm not sure we need to do that for now 🤔

@github-actions bot removed the category: dependency_changes (Pull requests that update a dependency file) label on Jul 11, 2024
@EgorDuplensky (Contributor):

> @EgorDuplensky rebased on the latest master and also added the following changes in the latest commit 12b9a9f:
>
> • Added SHL tests for FC
> • Disabled SHL FC execution if weights are not transposed, because I didn't find an API for repacking non-transposed weights in SHL. One idea is to disable this optimization in GraphOptimizer, but I'm not sure we need to do that for now 🤔

Just wondering, is there any weights packing actually happening underneath, or is it just that the SHL FC does not support transposed weights?
Anyway, if the SHL FC weights are shape-agnostic, we could just run e.g. a reference transpose in the scope of the SHL executor constructor.
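For illustration, a minimal sketch of such a one-time reference transpose (the helper name and raw-pointer interface below are purely illustrative, not the plugin's actual API):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical one-time repack: weights stored as [K, N] are copied into the
// [N, K] layout expected by the SHL FC kernel. Something like this could run
// once in the executor constructor instead of disabling the transpose fusion.
static std::vector<float> transpose_weights_kn_to_nk(const float* src, std::size_t K, std::size_t N) {
    std::vector<float> dst(K * N);
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n)
            dst[n * K + k] = src[k * N + n];
    return dst;
}
```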

@dmitry-gorokhov added this to the 2024.4 milestone on Jul 12, 2024
src/plugins/intel_cpu/src/nodes/executors/shl/shl.hpp (review thread, outdated)
```cpp
wei.setData(memory.at(ARG_WEI)->getData());
dst.setData(memory.at(ARG_DST)->getData());

OPENVINO_ASSERT(csinn_fullyconnected(src.get(), dst.get(), wei.get(), bias.get(), params.get()) == CSINN_TRUE,
```
Contributor:

Why is the bias data handle not updated inside execute?

@a-sidorova (Contributor, Author):

Because the bias is constant data, it can be handled once in the executor constructor:

```cpp
bias = ShlTensor(sess,
                 memory.at(ARG_BIAS)->getDescPtr()->getShape().getStaticDims(),
                 precisionToShlDataType(biasDesc->getPrecision()),
                 getShlDataLayoutByMemoryDesc(biasDesc),
                 memory.at(ARG_BIAS)->getData());
```

Please correct me if I missed something.

@a-sidorova (Contributor, Author) commented Jul 17, 2024:

Discussed offline: aligned the wei and bias tensor behaviors. Now both tensors update their data pointers in execute and set their static shapes once in the constructor (see the sketch below).
282e5b5
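A rough sketch of that split at the level of the raw SHL descriptors (the `ShlTensor` wrapper from this PR hides these details; `csinn_alloc_tensor`, the `csinn_tensor` fields, and `CSINN_DTYPE_FLOAT32` come from the public SHL/CSI-NN API, everything else is illustrative):

```cpp
#include "csinn/csi_nn.h"  // SHL public header; the include path may differ between SHL versions

// Illustrative split: static shapes and dtypes are written to the descriptors
// once (constructor), while the data handles are refreshed before every
// execution, since the backing memory may be reallocated between inferences.
struct FcDescriptorsSketch {
    csinn_tensor* wei = nullptr;
    csinn_tensor* bias = nullptr;

    void init(csinn_session* sess, int OC, int IC) {  // "constructor" part
        wei = csinn_alloc_tensor(sess);
        wei->dim_count = 2;
        wei->dim[0] = OC;  // SHL FC expects [N, K] weights
        wei->dim[1] = IC;
        wei->dtype = CSINN_DTYPE_FLOAT32;

        bias = csinn_alloc_tensor(sess);
        bias->dim_count = 1;
        bias->dim[0] = OC;
        bias->dtype = CSINN_DTYPE_FLOAT32;
    }

    void setData(void* wei_ptr, void* bias_ptr) {  // "execute" part
        wei->data = wei_ptr;
        bias->data = bias_ptr;
    }
};
```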

@a-sidorova (Contributor, Author):

> > @EgorDuplensky rebased on the latest master and also added the following changes in the latest commit 12b9a9f:
> >
> > • Added SHL tests for FC
> > • Disabled SHL FC execution if weights are not transposed, because I didn't find an API for repacking non-transposed weights in SHL. One idea is to disable this optimization in GraphOptimizer, but I'm not sure we need to do that for now 🤔
>
> Just wondering, is there any weights packing actually happening underneath, or is it just that the SHL FC does not support transposed weights? Anyway, if the SHL FC weights are shape-agnostic, we could just run e.g. a reference transpose in the scope of the SHL executor constructor.

  1. SHL doesn't support transposed weights. I don't see any checks, transposing functions, or even relevant fields in csinn_fc_params, so I think the library supports only [N, K] weights and cannot transpose [K, N] to [N, K] itself. As an option, we could use shl_rvv_transpose_fp32 (with batch = 1) here instead of disabling FuseFCAndTransposeOnWeights.
  2. The SHL FC implementation repacks the (non-transposed) weights before execution in its initialization function. However, this function corrupts the original weights pointer (it writes the repacked weights to the same pointer), which leads to segfaults. To fix this, I moved the repacking to the execution function. Of course, that adds overhead to every execution. Now (after the review and a few months), I see how we can improve it: make a copy of the original weights, store it in the weights cache (if possible), and pass the copy to the initialization function for repacking, so the original weights are not corrupted. After that, FC executes with the already-packed weights and there is no need to repack them on each execution (see the sketch after this list).
    However, I'd suggest doing this in a separate PR so as not to delay the GSoC development and to allow the student to merge their own changes to the master branch.
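For reference, a minimal sketch of that improvement, assuming `csinn_fullyconnected_init` is the SHL entry point that performs the in-place repacking described above; the weights-cache integration is omitted and the helper itself is purely illustrative:

```cpp
#include <cstddef>
#include <vector>

#include "csinn/csi_nn.h"  // SHL public header; the include path may differ between SHL versions

// Illustrative fix: let the SHL init routine repack a *copy* of the constant
// weights, so the original weight memory is never overwritten and repacking
// happens once instead of on every execute() call.
static std::vector<float> prepack_fc_weights(csinn_tensor* src,
                                             csinn_tensor* dst,
                                             csinn_tensor* wei,
                                             csinn_tensor* bias,
                                             csinn_fc_params* params,
                                             const float* original_weights,
                                             std::size_t weight_count) {
    std::vector<float> packed(original_weights, original_weights + weight_count);
    wei->data = packed.data();  // init repacks this buffer in place
    csinn_fullyconnected_init(src, dst, wei, bias, params);
    return packed;              // keep alive: execute() reuses the repacked buffer
}
```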

cc @dmitry-gorokhov

@dmitry-gorokhov added this pull request to the merge queue on Jul 17, 2024
Merged via the queue into openvinotoolkit:master with commit 6a94559 on Jul 17, 2024
117 checks passed
@dmitry-gorokhov deleted the feature/riscv64/fc_gemm_f32 branch on July 17, 2024 15:51
spran180 pushed a commit to spran180/openvino that referenced this pull request Jul 27, 2024
github-merge-queue bot pushed a commit that referenced this pull request Aug 23, 2024
### Details:
 - *Added parallelism support for FC*
 - *Enabled OpenMP on rv64 by default*
 - *PR to oneDNN: openvinotoolkit/oneDNN#260*

### Tickets:
 - *N/A*

### Prerequisites:
- [x] #23901
- [x] #23964
- [x] #26175
Labels: category: build (OpenVINO cmake script / infra) · category: CPU (OpenVINO CPU plugin) · platform: risc-v (OpenVINO on RISC-V)