Optimizations #121

Merged
merged 30 commits into from
Oct 22, 2021

Conversation

chewxy (Member) commented Oct 18, 2021

No description provided.

coveralls commented Oct 18, 2021

Coverage increased (+0.04%) to 21.906% when pulling b3454bb on optimizations into 3039f42 on master.

… reduce allocations

Results vs prev:

```
benchmark                                              old ns/op     new ns/op     delta
BenchmarkSoftmax/(3,4)_Float64_axis_0-20               2237          2057          -8.05%
BenchmarkSoftmax/(3,4)_Float32_axis_0-20               2138          1920          -10.20%
BenchmarkSoftmax/(3,4)_Float64_axis_1-20               2112          1798          -14.87%
BenchmarkSoftmax/(3,4)_Float32_axis_1-20               2123          1844          -13.14%
BenchmarkSoftmax/(2,3,2)_Float64_axis_0-20             2236          1937          -13.37%
BenchmarkSoftmax/(2,3,2)_Float32_axis_0-20             2305          2040          -11.50%
BenchmarkSoftmax/(2,3,2)_Float64_axis_1-20             2167          1931          -10.89%
BenchmarkSoftmax/(2,3,2)_Float32_axis_1-20             2261          1884          -16.67%
BenchmarkSoftmax/(2,3,2)_Float64_axis_2-20             2119          2035          -3.96%
BenchmarkSoftmax/(2,3,2)_Float32_axis_2-20             2143          1846          -13.86%
BenchmarkSoftmax/(2,3,2)_Float64_axis_-1-20            2212          1821          -17.68%
BenchmarkSoftmax/(2,3,2)_Float32_axis_-1-20            2164          1930          -10.81%
BenchmarkSoftmax/(641,19,199)_Float64_axis_-1-20       36898948      36137745      -2.06%
BenchmarkSoftmax/(641,_19,_199)_Float32_axis_-1-20     35541861      35019509      -1.47%

benchmark                                              old allocs     new allocs     delta
BenchmarkSoftmax/(3,4)_Float64_axis_0-20               16             12             -25.00%
BenchmarkSoftmax/(3,4)_Float32_axis_0-20               16             12             -25.00%
BenchmarkSoftmax/(3,4)_Float64_axis_1-20               16             12             -25.00%
BenchmarkSoftmax/(3,4)_Float32_axis_1-20               16             12             -25.00%
BenchmarkSoftmax/(2,3,2)_Float64_axis_0-20             16             12             -25.00%
BenchmarkSoftmax/(2,3,2)_Float32_axis_0-20             16             12             -25.00%
BenchmarkSoftmax/(2,3,2)_Float64_axis_1-20             16             12             -25.00%
BenchmarkSoftmax/(2,3,2)_Float32_axis_1-20             16             12             -25.00%
BenchmarkSoftmax/(2,3,2)_Float64_axis_2-20             16             12             -25.00%
BenchmarkSoftmax/(2,3,2)_Float32_axis_2-20             16             12             -25.00%
BenchmarkSoftmax/(2,3,2)_Float64_axis_-1-20            16             12             -25.00%
BenchmarkSoftmax/(2,3,2)_Float32_axis_-1-20            16             12             -25.00%
BenchmarkSoftmax/(641,19,199)_Float64_axis_-1-20       17             13             -23.53%
BenchmarkSoftmax/(641,_19,_199)_Float32_axis_-1-20     17             13             -23.53%

benchmark                                              old bytes     new bytes     delta
BenchmarkSoftmax/(3,4)_Float64_axis_0-20               664           568           -14.46%
BenchmarkSoftmax/(3,4)_Float32_axis_0-20               616           520           -15.58%
BenchmarkSoftmax/(3,4)_Float64_axis_1-20               664           568           -14.46%
BenchmarkSoftmax/(3,4)_Float32_axis_1-20               616           520           -15.58%
BenchmarkSoftmax/(2,3,2)_Float64_axis_0-20             696           600           -13.79%
BenchmarkSoftmax/(2,3,2)_Float32_axis_0-20             648           552           -14.81%
BenchmarkSoftmax/(2,3,2)_Float64_axis_1-20             696           600           -13.79%
BenchmarkSoftmax/(2,3,2)_Float32_axis_1-20             648           552           -14.81%
BenchmarkSoftmax/(2,3,2)_Float64_axis_2-20             696           600           -13.79%
BenchmarkSoftmax/(2,3,2)_Float32_axis_2-20             648           552           -14.81%
BenchmarkSoftmax/(2,3,2)_Float64_axis_-1-20            696           600           -13.79%
BenchmarkSoftmax/(2,3,2)_Float32_axis_-1-20            648           552           -14.81%
BenchmarkSoftmax/(641,19,199)_Float64_axis_-1-20       19392926      19392912      -0.00%
BenchmarkSoftmax/(641,_19,_199)_Float32_axis_-1-20     9701448       9701351       -0.00%
```
…ocs. Results:

```
benchmark                                              old ns/op     new ns/op     delta
BenchmarkSoftmax/(3,4)_Float64_axis_0-20               2057          1619          -21.29%
BenchmarkSoftmax/(3,4)_Float32_axis_0-20               1920          1563          -18.59%
BenchmarkSoftmax/(3,4)_Float64_axis_1-20               1798          1508          -16.13%
BenchmarkSoftmax/(3,4)_Float32_axis_1-20               1844          1575          -14.59%
BenchmarkSoftmax/(2,3,2)_Float64_axis_0-20             1937          1836          -5.21%
BenchmarkSoftmax/(2,3,2)_Float32_axis_0-20             2040          1672          -18.04%
BenchmarkSoftmax/(2,3,2)_Float64_axis_1-20             1931          1704          -11.76%
BenchmarkSoftmax/(2,3,2)_Float32_axis_1-20             1884          1542          -18.15%
BenchmarkSoftmax/(2,3,2)_Float64_axis_2-20             2035          1558          -23.44%
BenchmarkSoftmax/(2,3,2)_Float32_axis_2-20             1846          1626          -11.92%
BenchmarkSoftmax/(2,3,2)_Float64_axis_-1-20            1821          1552          -14.77%
BenchmarkSoftmax/(2,3,2)_Float32_axis_-1-20            1930          1499          -22.33%
BenchmarkSoftmax/(641,19,199)_Float64_axis_-1-20       36137745      36795574      +1.82%
BenchmarkSoftmax/(641,_19,_199)_Float32_axis_-1-20     35019509      34759423      -0.74%

benchmark                                              old allocs     new allocs     delta
BenchmarkSoftmax/(3,4)_Float64_axis_0-20               12             10             -16.67%
BenchmarkSoftmax/(3,4)_Float32_axis_0-20               12             10             -16.67%
BenchmarkSoftmax/(3,4)_Float64_axis_1-20               12             10             -16.67%
BenchmarkSoftmax/(3,4)_Float32_axis_1-20               12             10             -16.67%
BenchmarkSoftmax/(2,3,2)_Float64_axis_0-20             12             10             -16.67%
BenchmarkSoftmax/(2,3,2)_Float32_axis_0-20             12             10             -16.67%
BenchmarkSoftmax/(2,3,2)_Float64_axis_1-20             12             10             -16.67%
BenchmarkSoftmax/(2,3,2)_Float32_axis_1-20             12             10             -16.67%
BenchmarkSoftmax/(2,3,2)_Float64_axis_2-20             12             10             -16.67%
BenchmarkSoftmax/(2,3,2)_Float32_axis_2-20             12             10             -16.67%
BenchmarkSoftmax/(2,3,2)_Float64_axis_-1-20            12             10             -16.67%
BenchmarkSoftmax/(2,3,2)_Float32_axis_-1-20            12             10             -16.67%
BenchmarkSoftmax/(641,19,199)_Float64_axis_-1-20       13             11             -15.38%
BenchmarkSoftmax/(641,_19,_199)_Float32_axis_-1-20     13             11             -15.38%

benchmark                                              old bytes     new bytes     delta
BenchmarkSoftmax/(3,4)_Float64_axis_0-20               568           528           -7.04%
BenchmarkSoftmax/(3,4)_Float32_axis_0-20               520           480           -7.69%
BenchmarkSoftmax/(3,4)_Float64_axis_1-20               568           528           -7.04%
BenchmarkSoftmax/(3,4)_Float32_axis_1-20               520           480           -7.69%
BenchmarkSoftmax/(2,3,2)_Float64_axis_0-20             600           552           -8.00%
BenchmarkSoftmax/(2,3,2)_Float32_axis_0-20             552           504           -8.70%
BenchmarkSoftmax/(2,3,2)_Float64_axis_1-20             600           552           -8.00%
BenchmarkSoftmax/(2,3,2)_Float32_axis_1-20             552           504           -8.70%
BenchmarkSoftmax/(2,3,2)_Float64_axis_2-20             600           552           -8.00%
BenchmarkSoftmax/(2,3,2)_Float32_axis_2-20             552           504           -8.70%
BenchmarkSoftmax/(2,3,2)_Float64_axis_-1-20            600           552           -8.00%
BenchmarkSoftmax/(2,3,2)_Float32_axis_-1-20            552           504           -8.70%
BenchmarkSoftmax/(641,19,199)_Float64_axis_-1-20       19392912      19392892      -0.00%
BenchmarkSoftmax/(641,_19,_199)_Float32_axis_-1-20     9701351       9701312       -0.00%
```
@chewxy chewxy merged commit ffcb1c7 into master Oct 22, 2021
@chewxy chewxy deleted the optimizations branch October 22, 2021 18:36
@chewxy chewxy restored the optimizations branch November 15, 2021 23:15