POWER10: Add optimized dgemm kernel #9652
This patch uses the POWER10 matrix-multiply assist (MMA) feature to add a new DGEMM kernel.
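For readers unfamiliar with the approach: matrix-multiply assist instructions accumulate outer products (rank-1 updates) into a small register tile. Below is a minimal, portable sketch of that formulation, assuming a 4x2 double-precision tile. The names `Tile4x2`, `RankOneUpdate`, and `GemmTile` are hypothetical, the code uses no MMA builtins, and it is not the kernel added by this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4x2 accumulator tile of doubles (illustration only,
// not a type from MLAS).
using Tile4x2 = std::array<std::array<double, 2>, 4>;

// One rank-1 update: acc += a (4x1 column) * b (1x2 row).
inline void RankOneUpdate(Tile4x2& acc, const double a[4], const double b[2]) {
    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t j = 0; j < 2; ++j) {
            acc[i][j] += a[i] * b[j];
        }
    }
}

// Computes one 4x2 tile of A*B as a sum of K rank-1 updates.
// A is 4xK row-major with leading dimension lda;
// B is Kx2 row-major with leading dimension ldb.
inline Tile4x2 GemmTile(const double* A, std::size_t lda,
                        const double* B, std::size_t ldb,
                        std::size_t K) {
    Tile4x2 acc{};  // zero-initialized accumulator
    for (std::size_t k = 0; k < K; ++k) {
        const double a[4] = {A[0 * lda + k], A[1 * lda + k],
                             A[2 * lda + k], A[3 * lda + k]};
        const double b[2] = {B[k * ldb], B[k * ldb + 1]};
        RankOneUpdate(acc, a, b);
    }
    return acc;
}
```

A hardware kernel would keep the accumulator in registers and issue one instruction per update; the scalar loops here only show the arithmetic.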
|
@yufenglee Requesting review. In this PR I have also handled the common header changes that you commented on in the DGEMM PR last week. |
|
@yufenglee @snnn Requesting review. |
|
@yufenglee Just a reminder on this review. |
| MLAS_FLOAT64X2 ABroadcast[RowCount] | ||
| ) | ||
| { | ||
| ABroadcast[0] = vec_mergee (AElements[0], AElements[1]); |
nit: space
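For context on the line under review: as I understand the documented semantics, `vec_mergee` merges the even-numbered elements of its two inputs, so for two-lane vectors the result is lane 0 of each operand. A portable sketch of that behavior, assuming `MLAS_FLOAT64X2` is a two-lane double vector; `Float64x2` and `MergeEven` are hypothetical stand-ins, not MLAS or AltiVec names:

```cpp
#include <array>
#include <cassert>

// Stand-in for a two-lane double vector (assumed shape of MLAS_FLOAT64X2).
using Float64x2 = std::array<double, 2>;

// Portable emulation of an even-element merge for two-lane vectors:
// the result holds element 0 of `a` followed by element 0 of `b`.
inline Float64x2 MergeEven(const Float64x2& a, const Float64x2& b) {
    return {a[0], b[0]};
}
```

Under that assumption, `ABroadcast[0]` in the diff would hold the first element of `AElements[0]` and the first element of `AElements[1]`.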
|
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU TensorRT CI Pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed |
|
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list. |
|
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed |
|
Azure Pipelines successfully started running 6 pipeline(s). |
|
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU TensorRT CI Pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed |
|
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list. |
|
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed |
|
Azure Pipelines successfully started running 6 pipeline(s). |
|
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline |
|
Azure Pipelines successfully started running 10 pipeline(s). |
|
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline |
|
Azure Pipelines successfully started running 10 pipeline(s). |
|
@yufenglee Thanks for the review. |