@Qiyu8 Qiyu8 commented Jan 20, 2021

Introduction

This is the final part of #17049. The code is reduced by 69%, with no performance impact on the x86 platform and a performance improvement of about 14%~49% on ARM.

Benchmark

Here are the ASV benchmark results.

SSE2 enabled

· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.7-Cython
·· Building e4402bd8  for virtualenv-py3.7-Cython
·· Installing e4402bd8  into virtualenv-py3.7-Cython
· Running 28 total benchmarks (2 commits * 1 environments * 14 benchmarks)
[  0.00%] · For numpy commit 2908338b  (round 1/2):
[  0.00%] ·· Building for virtualenv-py3.7-Cython
[  0.00%] ·· Benchmarking virtualenv-py3.7-Cython
[  1.79%] ··· Running (bench_linalg.Einsum.time_einsum_contig_contig--)..............
[ 25.00%] · For numpy commit e4402bd8  (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.7-Cython
[ 25.00%] ·· Benchmarking virtualenv-py3.7-Cython
[ 26.79%] ··· Running (bench_linalg.Einsum.time_einsum_contig_contig--)..............
[ 50.00%] · For numpy commit e4402bd8  (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.7-Cython
[ 51.79%] ··· bench_linalg.Einsum.time_einsum_contig_contig                                                                                                                                                   ok
[ 51.79%] ··· =============== =========
                   dtype
              --------------- ---------
               numpy.float32   139±2μs
               numpy.float64   240±5μs
              =============== =========

[ 53.57%] ··· bench_linalg.Einsum.time_einsum_contig_outstride0 ok
[ 53.57%] ··· =============== =========
dtype
--------------- ---------
numpy.float32 130±1μs
numpy.float64 229±9μs
=============== =========

[ 55.36%] ··· bench_linalg.Einsum.time_einsum_mul ok
[ 55.36%] ··· =============== =============
dtype
--------------- -------------
numpy.float32 1.40±0.05ms
numpy.float64 2.74±0.03ms
=============== =============

[ 57.14%] ··· bench_linalg.Einsum.time_einsum_multiply ok
[ 57.14%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 65.5±0.7μs
numpy.float64 78.9±3μs
=============== ============

[ 58.93%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_contig ok
[ 58.93%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 42.1±1μs
numpy.float64 43.4±0.5μs
=============== ============

[ 60.71%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_outstride0 ok
[ 60.71%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 32.0±0.7μs
numpy.float64 33.0±2μs
=============== ============

[ 62.50%] ··· bench_linalg.Einsum.time_einsum_noncon_mul ok
[ 62.50%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 39.6±2μs
numpy.float64 46.3±3μs
=============== ==========

[ 64.29%] ··· bench_linalg.Einsum.time_einsum_noncon_multiply ok
[ 64.29%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 67.9±3μs
numpy.float64 78.3±4μs
=============== ==========

[ 66.07%] ··· bench_linalg.Einsum.time_einsum_noncon_outer ok
[ 66.07%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 10.7±0.1ms
numpy.float64 21.8±0.3ms
=============== ============

[ 67.86%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul ok
[ 67.86%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 65.3±1μs
numpy.float64 65.9±6μs
=============== ==========

[ 69.64%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul2 ok
[ 69.64%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 57.7±5μs
numpy.float64 63.5±2μs
=============== ==========

[ 71.43%] ··· bench_linalg.Einsum.time_einsum_outer ok
[ 71.43%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 24.7±0.3ms
numpy.float64 49.9±0.7ms
=============== ============

[ 73.21%] ··· bench_linalg.Einsum.time_einsum_sum_mul ok
[ 73.21%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 55.7±2μs
numpy.float64 59.5±3μs
=============== ==========

[ 75.00%] ··· bench_linalg.Einsum.time_einsum_sum_mul2 ok
[ 75.00%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 55.2±0.9μs
numpy.float64 57.7±2μs
=============== ============

[ 75.00%] · For numpy commit 2908338b (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.7-Cython
[ 75.00%] ·· Benchmarking virtualenv-py3.7-Cython
[ 76.79%] ··· bench_linalg.Einsum.time_einsum_contig_contig ok
[ 76.79%] ··· =============== =========
dtype
--------------- ---------
numpy.float32 149±7μs
numpy.float64 237±4μs
=============== =========

[ 78.57%] ··· bench_linalg.Einsum.time_einsum_contig_outstride0 ok
[ 78.57%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 133±8μs
numpy.float64 226±20μs
=============== ==========

[ 80.36%] ··· bench_linalg.Einsum.time_einsum_mul ok
[ 80.36%] ··· =============== =============
dtype
--------------- -------------
numpy.float32 1.31±0.01ms
numpy.float64 2.66±0.06ms
=============== =============

[ 82.14%] ··· bench_linalg.Einsum.time_einsum_multiply ok
[ 82.14%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 65.6±4μs
numpy.float64 79.2±4μs
=============== ==========

[ 83.93%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_contig ok
[ 83.93%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 42.2±1μs
numpy.float64 44.8±3μs
=============== ==========

[ 85.71%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_outstride0 ok
[ 85.71%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 34.3±5μs
numpy.float64 33.0±0.5μs
=============== ============

[ 87.50%] ··· bench_linalg.Einsum.time_einsum_noncon_mul ok
[ 87.50%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 38.8±0.7μs
numpy.float64 41.3±2μs
=============== ============

[ 89.29%] ··· bench_linalg.Einsum.time_einsum_noncon_multiply ok
[ 89.29%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 67.8±3μs
numpy.float64 78.2±4μs
=============== ==========

[ 91.07%] ··· bench_linalg.Einsum.time_einsum_noncon_outer ok
[ 91.07%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 10.8±0.2ms
numpy.float64 22.1±0.2ms
=============== ============

[ 92.86%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul ok
[ 92.86%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 59.9±2μs
numpy.float64 66.2±3μs
=============== ==========

[ 94.64%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul2 ok
[ 94.64%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 56.6±0.9μs
numpy.float64 70.6±7μs
=============== ============

[ 96.43%] ··· bench_linalg.Einsum.time_einsum_outer ok
[ 96.43%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 24.5±0.6ms
numpy.float64 49.2±0.9ms
=============== ============

[ 98.21%] ··· bench_linalg.Einsum.time_einsum_sum_mul ok
[ 98.21%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 56.5±1μs
numpy.float64 62.9±3μs
=============== ==========

[100.00%] ··· bench_linalg.Einsum.time_einsum_sum_mul2 ok
[100.00%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 55.1±0.8μs
numpy.float64 62.3±5μs
=============== ============

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

AVX2 enabled

· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.7-Cython
·· Building e4402bd8  for virtualenv-py3.7-Cython
·· Installing e4402bd8  into virtualenv-py3.7-Cython
· Running 28 total benchmarks (2 commits * 1 environments * 14 benchmarks)
[  0.00%] · For numpy commit 2908338b  (round 1/2):
[  0.00%] ·· Building for virtualenv-py3.7-Cython
[  0.00%] ·· Benchmarking virtualenv-py3.7-Cython
[  1.79%] ··· Running (bench_linalg.Einsum.time_einsum_contig_contig--)..............
[ 25.00%] · For numpy commit e4402bd8  (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.7-Cython
[ 25.00%] ·· Benchmarking virtualenv-py3.7-Cython
[ 26.79%] ··· Running (bench_linalg.Einsum.time_einsum_contig_contig--)..............
[ 50.00%] · For numpy commit e4402bd8  (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.7-Cython
[ 51.79%] ··· bench_linalg.Einsum.time_einsum_contig_contig                                                                                                                                                   ok
[ 51.79%] ··· =============== =========
                   dtype
              --------------- ---------
               numpy.float32   100±2μs
               numpy.float64   152±6μs
              =============== =========

[ 53.57%] ··· bench_linalg.Einsum.time_einsum_contig_outstride0 ok
[ 53.57%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 146±10μs
numpy.float64 244±2μs
=============== ==========

[ 55.36%] ··· bench_linalg.Einsum.time_einsum_mul ok
[ 55.36%] ··· =============== =============
dtype
--------------- -------------
numpy.float32 1.41±0.02ms
numpy.float64 2.66±0.03ms
=============== =============

[ 57.14%] ··· bench_linalg.Einsum.time_einsum_multiply ok
[ 57.14%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 79.9±2μs
numpy.float64 87.4±2μs
=============== ==========

[ 58.93%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_contig ok
[ 58.93%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 42.6±2μs
numpy.float64 41.6±0.6μs
=============== ============

[ 60.71%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_outstride0 ok
[ 60.71%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 41.0±0.9μs
numpy.float64 40.2±0.5μs
=============== ============

[ 62.50%] ··· bench_linalg.Einsum.time_einsum_noncon_mul ok
[ 62.50%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 49.2±0.4μs
numpy.float64 48.2±1μs
=============== ============

[ 64.29%] ··· bench_linalg.Einsum.time_einsum_noncon_multiply ok
[ 64.29%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 76.7±2μs
numpy.float64 83.3±2μs
=============== ==========

[ 66.07%] ··· bench_linalg.Einsum.time_einsum_noncon_outer ok
[ 66.07%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 10.4±0.1ms
numpy.float64 21.9±0.4ms
=============== ============

[ 67.86%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul ok
[ 67.86%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 68.6±2μs
numpy.float64 70.7±1μs
=============== ==========

[ 69.64%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul2 ok
[ 69.64%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 75.4±6μs
numpy.float64 71.7±1μs
=============== ==========

[ 71.43%] ··· bench_linalg.Einsum.time_einsum_outer ok
[ 71.43%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 24.1±0.3ms
numpy.float64 50.2±0.6ms
=============== ============

[ 73.21%] ··· bench_linalg.Einsum.time_einsum_sum_mul ok
[ 73.21%] ··· =============== ===========
dtype
--------------- -----------
numpy.float32 67.5±2μs
numpy.float64 71.0±10μs
=============== ===========

[ 75.00%] ··· bench_linalg.Einsum.time_einsum_sum_mul2 ok
[ 75.00%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 69.8±4μs
numpy.float64 69.6±2μs
=============== ==========

[ 75.00%] · For numpy commit 2908338b (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.7-Cython
[ 75.00%] ·· Benchmarking virtualenv-py3.7-Cython
[ 76.79%] ··· bench_linalg.Einsum.time_einsum_contig_contig ok
[ 76.79%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 101±5μs
numpy.float64 155±10μs
=============== ==========

[ 78.57%] ··· bench_linalg.Einsum.time_einsum_contig_outstride0 ok
[ 78.57%] ··· =============== =========
dtype
--------------- ---------
numpy.float32 146±7μs
numpy.float64 252±7μs
=============== =========

[ 80.36%] ··· bench_linalg.Einsum.time_einsum_mul ok
[ 80.36%] ··· =============== =============
dtype
--------------- -------------
numpy.float32 1.39±0.09ms
numpy.float64 2.63±0.04ms
=============== =============

[ 82.14%] ··· bench_linalg.Einsum.time_einsum_multiply ok
[ 82.14%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 78.6±2μs
numpy.float64 87.2±3μs
=============== ==========

[ 83.93%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_contig ok
[ 83.93%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 47.4±5μs
numpy.float64 53.9±9μs
=============== ==========

[ 85.71%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_outstride0 ok
[ 85.71%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 40.2±2μs
numpy.float64 41.6±4μs
=============== ==========

[ 87.50%] ··· bench_linalg.Einsum.time_einsum_noncon_mul ok
[ 87.50%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 41.3±3μs
numpy.float64 39.6±2μs
=============== ==========

[ 89.29%] ··· bench_linalg.Einsum.time_einsum_noncon_multiply ok
[ 89.29%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 77.5±4μs
numpy.float64 86.9±1μs
=============== ==========

[ 91.07%] ··· bench_linalg.Einsum.time_einsum_noncon_outer ok
[ 91.07%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 10.7±0.2ms
numpy.float64 21.9±0.1ms
=============== ============

[ 92.86%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul ok
[ 92.86%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 67.9±3μs
numpy.float64 70.6±2μs
=============== ==========

[ 94.64%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul2 ok
[ 94.64%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 68.3±1μs
numpy.float64 69.7±1μs
=============== ==========

[ 96.43%] ··· bench_linalg.Einsum.time_einsum_outer ok
[ 96.43%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 25.0±0.4ms
numpy.float64 49.1±0.5ms
=============== ============

[ 98.21%] ··· bench_linalg.Einsum.time_einsum_sum_mul ok
[ 98.21%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 70.0±3μs
numpy.float64 69.2±7μs
=============== ==========

[100.00%] ··· bench_linalg.Einsum.time_einsum_sum_mul2 ok
[100.00%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 69.9±1μs
numpy.float64 69.1±1μs
=============== ==========

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

NEON enabled

· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.7-Cython.
·· Building ebfed05a  for virtualenv-py3.7-Cython................................................
·· Installing ebfed05a  into virtualenv-py3.7-Cython.
· Running 28 total benchmarks (2 commits * 1 environments * 14 benchmarks)
[  0.00%] · For numpy commit efaf210f  (round 1/2):
[  0.00%] ·· Building for virtualenv-py3.7-Cython..................................................
[  0.00%] ·· Benchmarking virtualenv-py3.7-Cython
[  1.79%] ··· Running (bench_linalg.Einsum.time_einsum_contig_contig--)..............
[ 25.00%] · For numpy commit ebfed05a  (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.7-Cython..
[ 25.00%] ·· Benchmarking virtualenv-py3.7-Cython
[ 26.79%] ··· Running (bench_linalg.Einsum.time_einsum_contig_contig--)..............
[ 50.00%] · For numpy commit ebfed05a  (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.7-Cython
[ 51.79%] ··· bench_linalg.Einsum.time_einsum_contig_contig                                                       ok
[ 51.79%] ··· =============== =========
                   dtype               
              --------------- ---------
               numpy.float32   198±2μs 
               numpy.float64   353±5μs 
              =============== =========

[ 53.57%] ··· bench_linalg.Einsum.time_einsum_contig_outstride0 ok
[ 53.57%] ··· =============== =========
dtype
--------------- ---------
numpy.float32 224±2μs
numpy.float64 356±6μs
=============== =========

[ 55.36%] ··· bench_linalg.Einsum.time_einsum_mul ok
[ 55.36%] ··· =============== ==========
dtype
--------------- ----------
numpy.float32 487±10μs
numpy.float64 863±20μs
=============== ==========

[ 57.14%] ··· bench_linalg.Einsum.time_einsum_multiply ok
[ 57.14%] ··· =============== ===========
dtype
--------------- -----------
numpy.float32 115±0.5μs
numpy.float64 141±2μs
=============== ===========

[ 58.93%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_contig ok
[ 58.93%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 73.5±0.5μs
numpy.float64 74.0±0.7μs
=============== ============

[ 60.71%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_outstride0 ok
[ 60.71%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 59.6±0.2μs
numpy.float64 60.5±0.6μs
=============== ============

[ 62.50%] ··· bench_linalg.Einsum.time_einsum_noncon_mul ok
[ 62.50%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 70.1±0.4μs
numpy.float64 72.0±0.5μs
=============== ============

[ 64.29%] ··· bench_linalg.Einsum.time_einsum_noncon_multiply ok
[ 64.29%] ··· =============== =========
dtype
--------------- ---------
numpy.float32 115±2μs
numpy.float64 140±2μs
=============== =========

[ 66.07%] ··· bench_linalg.Einsum.time_einsum_noncon_outer ok
[ 66.07%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 2.73±0.1ms
numpy.float64 6.22±0.2ms
=============== ============

[ 67.86%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul ok
[ 67.86%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 99.3±0.4μs
numpy.float64 107±0.8μs
=============== ============

[ 69.64%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul2 ok
[ 69.64%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 99.0±0.3μs
numpy.float64 107±1μs
=============== ============

[ 71.43%] ··· bench_linalg.Einsum.time_einsum_outer ok
[ 71.43%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 10.8±0.4ms
numpy.float64 22.1±0.3ms
=============== ============

[ 73.21%] ··· bench_linalg.Einsum.time_einsum_sum_mul ok
[ 73.21%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 94.5±0.4μs
numpy.float64 98.1±0.3μs
=============== ============

[ 75.00%] ··· bench_linalg.Einsum.time_einsum_sum_mul2 ok
[ 75.00%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 94.3±0.7μs
numpy.float64 98.7±0.4μs
=============== ============

[ 75.00%] · For numpy commit efaf210f (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.7-Cython..
[ 75.00%] ·· Benchmarking virtualenv-py3.7-Cython
[ 76.79%] ··· bench_linalg.Einsum.time_einsum_contig_contig ok
[ 76.79%] ··· =============== =========
dtype
--------------- ---------
numpy.float32 204±6μs
numpy.float64 355±5μs
=============== =========

[ 78.57%] ··· bench_linalg.Einsum.time_einsum_contig_outstride0 ok
[ 78.57%] ··· =============== =========
dtype
--------------- ---------
numpy.float32 226±6μs
numpy.float64 356±4μs
=============== =========

[ 80.36%] ··· bench_linalg.Einsum.time_einsum_mul ok
[ 80.36%] ··· =============== =============
dtype
--------------- -------------
numpy.float32 789±30μs
numpy.float64 1.00±0.01ms
=============== =============

[ 82.14%] ··· bench_linalg.Einsum.time_einsum_multiply ok
[ 82.14%] ··· =============== ===========
dtype
--------------- -----------
numpy.float32 117±0.2μs
numpy.float64 145±3μs
=============== ===========

[ 83.93%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_contig ok
[ 83.93%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 74.2±0.4μs
numpy.float64 76.0±0.6μs
=============== ============

[ 85.71%] ··· bench_linalg.Einsum.time_einsum_noncon_contig_outstride0 ok
[ 85.71%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 59.4±0.3μs
numpy.float64 60.5±0.3μs
=============== ============

[ 87.50%] ··· bench_linalg.Einsum.time_einsum_noncon_mul ok
[ 87.50%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 72.0±0.2μs
numpy.float64 72.9±0.2μs
=============== ============

[ 89.29%] ··· bench_linalg.Einsum.time_einsum_noncon_multiply ok
[ 89.29%] ··· =============== ===========
dtype
--------------- -----------
numpy.float32 115±0.5μs
numpy.float64 140±0.8μs
=============== ===========

[ 91.07%] ··· bench_linalg.Einsum.time_einsum_noncon_outer ok
[ 91.07%] ··· =============== =============
dtype
--------------- -------------
numpy.float32 5.35±0.07ms
numpy.float64 7.07±0.2ms
=============== =============

[ 92.86%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul ok
[ 92.86%] ··· =============== ===========
dtype
--------------- -----------
numpy.float32 101±0.6μs
numpy.float64 109±0.5μs
=============== ===========

[ 94.64%] ··· bench_linalg.Einsum.time_einsum_noncon_sum_mul2 ok
[ 94.64%] ··· =============== ===========
dtype
--------------- -----------
numpy.float32 101±1μs
numpy.float64 109±0.7μs
=============== ===========

[ 96.43%] ··· bench_linalg.Einsum.time_einsum_outer ok
[ 96.43%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 17.7±0.3ms
numpy.float64 25.8±0.5ms
=============== ============

[ 98.21%] ··· bench_linalg.Einsum.time_einsum_sum_mul ok
[ 98.21%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 95.5±0.5μs
numpy.float64 101±0.8μs
=============== ============

[100.00%] ··· bench_linalg.Einsum.time_einsum_sum_mul2 ok
[100.00%] ··· =============== ============
dtype
--------------- ------------
numpy.float32 95.5±0.8μs
numpy.float64 98.8±0.5μs
=============== ============

   before           after         ratio
 [efaf210f]       [ebfed05a]
 <master>         <einsum-twooperands>
  • 1.00±0.01ms         863±20μs     0.86  bench_linalg.Einsum.time_einsum_mul(<class 'numpy.float64'>)
    
  •  25.8±0.5ms       22.1±0.3ms     0.86  bench_linalg.Einsum.time_einsum_outer(<class 'numpy.float64'>)
    
  •    789±30μs         487±10μs     0.62  bench_linalg.Einsum.time_einsum_mul(<class 'numpy.float32'>)
    
  •  17.7±0.3ms       10.8±0.4ms     0.61  bench_linalg.Einsum.time_einsum_outer(<class 'numpy.float32'>)
    
  • 5.35±0.07ms       2.73±0.1ms     0.51  bench_linalg.Einsum.time_einsum_noncon_outer(<class 'numpy.float32'>)
    

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
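
The `ratio` column in the comparison above is the after time divided by the before time, so a ratio of 0.51 means the run time dropped by about 49%. The following is a small sketch that recomputes the ratios from the timings in the table; the variable and function names are illustrative only and are not part of ASV.

```python
# (benchmark, before in microseconds, after in microseconds),
# copied from the NEON comparison table above.
results = [
    ("time_einsum_mul float64", 1000.0, 863.0),
    ("time_einsum_outer float64", 25800.0, 22100.0),
    ("time_einsum_mul float32", 789.0, 487.0),
    ("time_einsum_outer float32", 17700.0, 10800.0),
    ("time_einsum_noncon_outer float32", 5350.0, 2730.0),
]


def ratio(before, after):
    """Return after/before; values below 1.0 mean the new commit is faster."""
    return after / before


ratios = [round(ratio(before, after), 2) for _, before, after in results]
# Percentage of run time saved relative to the old commit.
reductions = [round((1 - ratio(before, after)) * 100) for _, before, after in results]

for (name, _, _), r, saved in zip(results, ratios, reductions):
    print(f"{name}: ratio {r:.2f}, about {saved}% less time")
```

The smallest and largest reductions computed this way are 14% and 49%, which is where the range quoted in the introduction comes from.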

System Info

           Arm                                                               x86
Hardware   KunPeng
Processor  ARMv8 2.6 GHz, 8 processors                                       Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz
OS         Linux ecs-9d50 4.19.36-vhulk1905.1.0.h276.eulerosv2r8.aarch64     Windows Server 2008 R2 Enterprise
Compiler   gcc (GCC) 7.3.0                                                   MSVC14.06

@Qiyu8 added the 01 - Enhancement and component: SIMD labels Jan 20, 2021
@eric-wieser changed the title from "Optimize the sub function two-operands by using SIMD." to "MAINT: einsum: Optimize the sub function two-operands by using SIMD." Jan 20, 2021
@seiko2plus seiko2plus self-assigned this Jan 20, 2021
@seiko2plus seiko2plus self-requested a review January 20, 2021 22:47

@seiko2plus seiko2plus left a comment

Well done, Thank you Chunlin!

@seiko2plus

The new code also improves accuracy because it uses FMA. We will have to dispatch AVX2 & FMA3 and AVX512F at runtime.
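
To illustrate the accuracy remark: a fused multiply-add computes `a*b + c` with a single rounding at the end, whereas a separate multiply and add rounds twice. The sketch below is not NumPy's SIMD code. It emulates the fused operation in exact rational arithmetic, and `fma_emulated` is a hypothetical helper written only for this illustration.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once."""
    # Fraction(float) is exact, so the product and the sum carry no rounding
    # error; float() then performs the single final rounding.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Unfused: the exact product 1 - 2**-60 is rounded to 1.0 before the add,
# so the small term is lost.
unfused = a * b + c

# Fused (emulated): the exact product survives until the single final rounding.
fused = fma_emulated(a, b, c)

print(unfused)  # 0.0
print(fused == -(2.0**-60))  # True
```

In a sum of products, as in einsum, such intermediate rounding errors accumulate over many terms, which is why fusing the multiply and the add can give slightly different and usually more accurate results.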

@charris charris merged commit b91f3c0 into numpy:master Jan 21, 2021

charris commented Jan 21, 2021

Thanks Chunlin.

@Qiyu8 Qiyu8 deleted the einsum-twooperands branch January 21, 2021 01:11

Qiyu8 commented Jan 21, 2021

Runtime dispatching for non-ufunc code was not recommended in the past. If it is acceptable now, we can discuss it on the mailing list.


ZiqiChai commented Mar 2, 2021

This is exactly what I'm looking for right now! And I just found it here! Thanks, Chunlin.

Labels: 01 - Enhancement, 03 - Maintenance, component: SIMD
