
[RISCV64][SHL] Added FC FP32 executor #23964

Merged

Conversation

@a-sidorova (Contributor) commented Apr 10, 2024:

Details:

  • Reused FC RVV from SHL
  • The PR to SHL dev branch with accuracy fix for FC f32: openvinotoolkit/shl#3

Tickets:

  • N/A

TODO:

  • [x] Fix `execType: gemm_f32`
  • [x] Added a wrapper for `csinn_tensor` and `csinn_session` to allocate and deallocate these structures

Prerequisites:

  • [x] openvinotoolkit#23901
@github-actions bot added the category: CPU (OpenVINO CPU plugin), category: build (OpenVINO cmake script / infra), and category: dependency_changes (Pull requests that update a dependency file) labels on Apr 10, 2024
@a-sidorova force-pushed the feature/riscv64/fc_gemm_f32 branch 2 times, most recently from e318622 to ab0dff9 on April 13, 2024 07:41
@a-sidorova added the platform: risc-v (OpenVINO on RISC-V) label on Apr 13, 2024
@a-sidorova force-pushed the feature/riscv64/fc_gemm_f32 branch 2 times, most recently from f9f480f to 4519a74 on April 13, 2024 11:05
@a-sidorova marked this pull request as ready for review on April 13, 2024 12:03
@a-sidorova requested review from a team as code owners on April 13, 2024 12:03
@dmitry-gorokhov (Contributor):

@EgorDuplensky Could you please review the PR?

@EgorDuplensky (Contributor) left a comment:

Any plans regarding the tests?
Is there any RISCV emulator or something?

@a-sidorova (Contributor, Author):

> Any plans regarding the tests? Is there any RISCV emulator or something?

I used the current common FC tests.

As for emulators, I use QEMU built from the XuanTie GNU toolchain:

```sh
git clone https://github.com/T-head-Semi/xuantie-gnu-toolchain.git
cd xuantie-gnu-toolchain
./configure --prefix=/opt/riscv
make linux build-qemu

# run the cross-compiled test binary under the user-mode emulator
/opt/riscv/bin/qemu-riscv64 -cpu c910v ./ov_cpu_func_tests
```


This PR will be closed in a week because of 2 weeks of no activity.

@a-sidorova (Contributor, Author) commented Jun 28, 2024:

@EgorDuplensky rebased on the latest master and also added the following changes in the latest commit 12b9a9f:

  • Added SHL tests for FC
  • Disabled SHL FC execution if weights are not transposed, because I didn't find an API for repacking non-transposed weights in SHL. One idea is to disable this optimization in GraphOptimizer, but I'm not sure we need to do that for now 🤔

@github-actions bot removed the category: dependency_changes (Pull requests that update a dependency file) label on Jul 11, 2024
@EgorDuplensky (Contributor):

> @EgorDuplensky rebased on the latest master and also added the following changes in the latest commit 12b9a9f:
>
> • Added SHL tests for FC
> • Disabled SHL FC execution if weights are not transposed, because I didn't find an API for repacking non-transposed weights in SHL. One idea is to disable this optimization in GraphOptimizer, but I'm not sure we need to do that for now 🤔

Just wondering, is there any weights packing actually happening underneath, or is it just that the SHL FC does not support transposed weights?
Anyway, if the SHL FC weights are shape-agnostic, we could just run e.g. a reference transpose in the scope of the SHL executor constructor.
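For illustration, a minimal sketch of such a one-time reference transpose (the helper name and raw-pointer interface below are purely illustrative, not the plugin's actual API):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical one-time repack: weights stored as [K, N] are copied into the
// [N, K] layout expected by the SHL FC kernel. Something like this could run
// once in the executor constructor instead of disabling the transpose fusion.
static std::vector<float> transpose_weights_kn_to_nk(const float* src, std::size_t K, std::size_t N) {
    std::vector<float> dst(K * N);
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n)
            dst[n * K + k] = src[k * N + n];
    return dst;
}
```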

@dmitry-gorokhov added this to the 2024.4 milestone on Jul 12, 2024
src/plugins/intel_cpu/src/nodes/executors/shl/shl.hpp (review thread, outdated)
```cpp
wei.setData(memory.at(ARG_WEI)->getData());
dst.setData(memory.at(ARG_DST)->getData());

OPENVINO_ASSERT(csinn_fullyconnected(src.get(), dst.get(), wei.get(), bias.get(), params.get()) == CSINN_TRUE,
```
Contributor:

Why is the bias data handle not updated inside execute?

@a-sidorova (Contributor, Author):

Because the bias is constant data, it can be handled once in the executor constructor:

```cpp
bias = ShlTensor(sess,
                 memory.at(ARG_BIAS)->getDescPtr()->getShape().getStaticDims(),
                 precisionToShlDataType(biasDesc->getPrecision()),
                 getShlDataLayoutByMemoryDesc(biasDesc),
                 memory.at(ARG_BIAS)->getData());
```

Please correct me if I missed something.

@a-sidorova (Contributor, Author) commented Jul 17, 2024:

Discussed offline: aligned the wei and bias tensor behaviors. Now both tensors update their data pointers in execute and set their static shapes once in the constructor (see the sketch below).
282e5b5
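A rough sketch of that split at the level of the raw SHL descriptors (the `ShlTensor` wrapper from this PR hides these details; `csinn_alloc_tensor`, the `csinn_tensor` fields, and `CSINN_DTYPE_FLOAT32` come from the public SHL/CSI-NN API, everything else is illustrative):

```cpp
#include "csinn/csi_nn.h"  // SHL public header; the include path may differ between SHL versions

// Illustrative split: static shapes and dtypes are written to the descriptors
// once (constructor), while the data handles are refreshed before every
// execution, since the backing memory may be reallocated between inferences.
struct FcDescriptorsSketch {
    csinn_tensor* wei = nullptr;
    csinn_tensor* bias = nullptr;

    void init(csinn_session* sess, int OC, int IC) {  // "constructor" part
        wei = csinn_alloc_tensor(sess);
        wei->dim_count = 2;
        wei->dim[0] = OC;  // SHL FC expects [N, K] weights
        wei->dim[1] = IC;
        wei->dtype = CSINN_DTYPE_FLOAT32;

        bias = csinn_alloc_tensor(sess);
        bias->dim_count = 1;
        bias->dim[0] = OC;
        bias->dtype = CSINN_DTYPE_FLOAT32;
    }

    void setData(void* wei_ptr, void* bias_ptr) {  // "execute" part
        wei->data = wei_ptr;
        bias->data = bias_ptr;
    }
};
```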

@a-sidorova (Contributor, Author):

> > @EgorDuplensky rebased on the latest master and also added the following changes in the latest commit 12b9a9f:
> >
> > • Added SHL tests for FC
> > • Disabled SHL FC execution if weights are not transposed, because I didn't find an API for repacking non-transposed weights in SHL. One idea is to disable this optimization in GraphOptimizer, but I'm not sure we need to do that for now 🤔
>
> Just wondering, is there any weights packing actually happening underneath, or is it just that the SHL FC does not support transposed weights? Anyway, if the SHL FC weights are shape-agnostic, we could just run e.g. a reference transpose in the scope of the SHL executor constructor.

  1. SHL doesn't support transposed weights. I don't see any checks, transposing functions, or even relevant fields in csinn_fc_params, so I think the library supports only [N, K] weights and cannot transpose [K, N] to [N, K] itself. As an option, we could use shl_rvv_transpose_fp32 (with batch = 1) here instead of disabling FuseFCAndTransposeOnWeights.
  2. The SHL FC implementation repacks the (non-transposed) weights before execution in its initialization function. However, this function corrupts the original weights pointer (it writes the repacked weights to the same pointer), which leads to segfaults. To fix this, I moved the repacking to the execution function. Of course, that adds overhead to every execution. Now (after the review and a few months), I see how we can improve it: make a copy of the original weights, store it in the weights cache (if possible), and pass the copy to the initialization function for repacking, so the original weights are not corrupted. After that, FC executes with the already-packed weights and there is no need to repack them on each execution (see the sketch after this list).
    However, I'd suggest doing this in a separate PR so as not to delay the GSoC development and to allow the student to merge their own changes to the master branch.
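For reference, a minimal sketch of that improvement, assuming `csinn_fullyconnected_init` is the SHL entry point that performs the in-place repacking described above; the weights-cache integration is omitted and the helper itself is purely illustrative:

```cpp
#include <cstddef>
#include <vector>

#include "csinn/csi_nn.h"  // SHL public header; the include path may differ between SHL versions

// Illustrative fix: let the SHL init routine repack a *copy* of the constant
// weights, so the original weight memory is never overwritten and repacking
// happens once instead of on every execute() call.
static std::vector<float> prepack_fc_weights(csinn_tensor* src,
                                             csinn_tensor* dst,
                                             csinn_tensor* wei,
                                             csinn_tensor* bias,
                                             csinn_fc_params* params,
                                             const float* original_weights,
                                             std::size_t weight_count) {
    std::vector<float> packed(original_weights, original_weights + weight_count);
    wei->data = packed.data();  // init repacks this buffer in place
    csinn_fullyconnected_init(src, dst, wei, bias, params);
    return packed;              // keep alive: execute() reuses the repacked buffer
}
```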

cc @dmitry-gorokhov

@dmitry-gorokhov added this pull request to the merge queue on Jul 17, 2024
Merged via the queue into openvinotoolkit:master with commit 6a94559 on Jul 17, 2024
117 checks passed
@dmitry-gorokhov deleted the feature/riscv64/fc_gemm_f32 branch on July 17, 2024 15:51
spran180 pushed a commit to spran180/openvino that referenced this pull request Jul 27, 2024
github-merge-queue bot pushed a commit that referenced this pull request Aug 23, 2024
### Details:
 - *Added parallelism support for FC*
 - *Enabled OpenMP on rv64 by default*
 - *PR to oneDNN: openvinotoolkit/oneDNN#260*

### Tickets:
 - *N/A*

### Prerequisites:
- [x] #23901
- [x] #23964
- [x] #26175
Labels: category: build (OpenVINO cmake script / infra) · category: CPU (OpenVINO CPU plugin) · platform: risc-v (OpenVINO on RISC-V)