`array_chunks` (performance feature), `alloc` improvements, code modularisation. #3

ZaneHannanAU · 2021-07-08T11:55:05Z

Implements the algorithm using `array_chunks`/`chunks_exact` for branchless performance improvements and potentially reduced code size on constrained devices.
Improves the initial allocation for `String` and `Vec` to reduce reallocations at runtime.
Splits the runtime code into `decode`, `encode`, and test modules.
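
For readers unfamiliar with the approach, here is a minimal sketch of what a `chunks_exact`-based Base45 encoder with an up-front allocation could look like. It is not the code from this PR: the names `encode_sketch` and `ALPHABET` are made up for illustration, and only the alphabet and the 2-bytes-to-3-characters rule come from the Base45 specification.

```rust
/// The 45-character alphabet defined by the Base45 specification.
const ALPHABET: &[u8; 45] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:";

/// Hypothetical encoder sketch (assumption: not the crate's real implementation).
pub fn encode_sketch(input: &[u8]) -> String {
    // Every full 2-byte pair yields 3 characters and a trailing single byte
    // yields 2, so the exact output length is known before the loop runs.
    let cap = input.len() / 2 * 3 + (input.len() % 2) * 2;
    let mut out = String::with_capacity(cap);

    // `chunks_exact(2)` only yields slices of length 2, which gives the
    // compiler a chance to drop the bounds checks on `pair[0]` and `pair[1]`.
    let mut pairs = input.chunks_exact(2);
    for pair in &mut pairs {
        let n = u32::from(pair[0]) * 256 + u32::from(pair[1]);
        // Digits are emitted least significant first: n = c + 45*d + 45*45*e.
        out.push(ALPHABET[(n % 45) as usize] as char);
        out.push(ALPHABET[(n / 45 % 45) as usize] as char);
        out.push(ALPHABET[(n / 2025) as usize] as char);
    }

    // The leftover byte (if any) is handled once, outside the hot loop.
    if let [last] = pairs.remainder() {
        let n = u32::from(*last);
        out.push(ALPHABET[(n % 45) as usize] as char);
        out.push(ALPHABET[(n / 45) as usize] as char);
    }
    out
}
```

Calling `encode_sketch(b"AB")` gives `"BB8"`, matching the example in the specification. Whether the loop actually compiles to branch-free code depends on the compiler and target, so that part should be checked against the generated assembly rather than assumed.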

ZaneHannanAU · 2021-07-08T13:03:15Z

Note: benchmarks are currently untested. Running tests now.

ZaneHannanAU · 2021-07-09T21:55:29Z

... OK, so checking out main gives a performance loss of... a lot?

07:38:52 $ cargo criterion 
07:39:16
   Compiling base45 v3.0.0 (/home/zeen3/documents/projects/base45)
    Finished bench [optimized] target(s) in 16.86s

running 15 tests
test tests::decode_ab ... ignored
test tests::decode_base45 ... ignored
test tests::decode_fail ... ignored
test tests::decode_fail_out_of_range ... ignored
test tests::decode_hello ... ignored
test tests::decode_ietf ... ignored
test tests::decode_long_string ... ignored
test tests::encode_ab ... ignored
test tests::encode_base45 ... ignored
test tests::encode_emoji ... ignored
test tests::encode_hello ... ignored
test tests::encode_hello_from_buffer ... ignored
test tests::encode_ietf ... ignored
test tests::encode_long_string ... ignored
test tests::encode_unicode ... ignored

test result: ok. 0 passed; 0 failed; 15 ignored; 0 measured; 0 filtered out; finished in 0.00s

encode long string      time:   [6.2442 us 6.2594 us 6.2764 us]                                
                        change: [+693.71% +696.46% +699.17%] (p = 0.00 < 0.05)
                        Performance has regressed.

encode long string from buffer                                                                             
                        time:   [6.3520 us 6.3731 us 6.3981 us]
                        change: [+694.97% +697.78% +700.66%] (p = 0.00 < 0.05)
                        Performance has regressed.

decode long string      time:   [1.7816 us 1.7912 us 1.8025 us]                                
                        change: [+52.709% +54.116% +55.609%] (p = 0.00 < 0.05)
                        Performance has regressed.

07:40:30 $ git checkout array_chunks 
07:41:16
Switched to branch 'array_chunks'
07:41:16 $ cargo criterion
07:41:21
   Compiling base45 v3.1.0 (/home/zeen3/documents/projects/base45)
warning: use of deprecated function `encode::encode_from_buffer`: Equivalent to `encode`. Use `encode` instead.
   --> src/tests.rs:116:23
    |
116 |         let encoded = encode_from_buffer(&b[..]);
    |                       ^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(deprecated)]` on by default

warning: use of deprecated function `encode::encode_from_buffer`: Equivalent to `encode`. Use `encode` instead.
  --> src/tests.rs:94:9
   |
94 |         encode_from_buffer(vec![72, 101, 108, 108, 111, 33, 33]),
   |         ^^^^^^^^^^^^^^^^^^

warning: use of deprecated function `base45::encode_from_buffer`: Equivalent to `encode`. Use `encode` instead.
  --> benches/bench.rs:12:13
   |
12 |             base45::encode_from_buffer(black_box(vec![
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(deprecated)]` on by default

warning: 2 warnings emitted

warning: 1 warning emitted

    Finished bench [optimized] target(s) in 12.21s

running 21 tests
test tests::decode_ab ... ignored
test tests::decode_base45 ... ignored
test tests::decode_fail ... ignored
test tests::decode_fail_out_of_range ... ignored
test tests::decode_hello ... ignored
test tests::decode_ietf ... ignored
test tests::decode_long_string ... ignored
test tests::encode_ab ... ignored
test tests::encode_base45 ... ignored
test tests::encode_emoji ... ignored
test tests::encode_hello ... ignored
test tests::encode_hello_from_buffer ... ignored
test tests::encode_ietf ... ignored
test tests::encode_long_string ... ignored
test tests::encode_unicode ... ignored
test tests::bench_decode_quick_brown_fox ... bench:     113,692 ns/iter (+/- 1,843)
test tests::bench_encode_quick_brown_fox ... bench:      77,526 ns/iter (+/- 248)
test tests::bench_encode_random_0x10     ... bench:         276 ns/iter (+/- 8)
test tests::bench_encode_random_0x100    ... bench:       2,693 ns/iter (+/- 37)
test tests::bench_encode_random_0x1000   ... bench:      39,509 ns/iter (+/- 187)
test tests::bench_encode_random_0x10000  ... bench:     646,121 ns/iter (+/- 4,902)

test result: ok. 0 passed; 0 failed; 15 ignored; 6 measured; 0 filtered out; finished in 8.02s

encode long string      time:   [736.05 ns 737.40 ns 738.88 ns]                                
                        change: [-88.277% -88.234% -88.189%] (p = 0.00 < 0.05)
                        Performance has improved.

encode long string from buffer                                                                             
                        time:   [764.57 ns 766.43 ns 768.50 ns]
                        change: [-88.037% -87.983% -87.927%] (p = 0.00 < 0.05)
                        Performance has improved.

decode long string      time:   [1.1312 us 1.1349 us 1.1391 us]                                
                        change: [-38.033% -37.372% -36.763%] (p = 0.00 < 0.05)
                        Performance has improved.

07:42:41 $

Of course, there are some optimizations I could be running that I'm currently not, but I'm still working on them. Only a 1-2% improvement, I guess.
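
The deprecation warnings in the build output above come from Rust's built-in `#[deprecated]` attribute. A minimal sketch of how such a notice could be attached follows. The `impl AsRef<[u8]>` signature is an assumption, and the body of `encode` is a hex-encoding stand-in to keep the example short, not the crate's Base45 encoder; only the note text is taken from the warning itself.

```rust
/// Stand-in for the crate's `encode` (assumption: this hex-encodes instead of
/// producing Base45, purely to keep the example self-contained).
pub fn encode(input: impl AsRef<[u8]>) -> String {
    input
        .as_ref()
        .iter()
        .map(|b| format!("{:02X}", b))
        .collect()
}

/// Kept for backwards compatibility; callers get a compiler warning that
/// carries the note text below.
#[deprecated(note = "Equivalent to `encode`. Use `encode` instead.")]
pub fn encode_from_buffer(input: impl AsRef<[u8]>) -> String {
    // Simply forwards to `encode`, so both entry points behave identically.
    encode(input)
}
```

Existing callers keep compiling and only see a warning, which is why the tests and benchmark in the log still build.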

ZaneHannanAU · 2021-07-09T22:10:24Z

Old:

test result: ok. 0 passed; 0 failed; 15 ignored; 0 measured; 0 filtered out; finished in 0.00s

encode long string      time:   [6.2338 us 6.2485 us 6.2658 us]                                
                        change: [+766.60% +772.21% +778.09%] (p = 0.00 < 0.05)
                        Performance has regressed.

encode long string from buffer                                                                             
                        time:   [6.3149 us 6.3315 us 6.3496 us]
                        change: [+737.03% +740.47% +743.92%] (p = 0.00 < 0.05)
                        Performance has regressed.

decode long string      time:   [1.7251 us 1.7291 us 1.7335 us]                                
                        change: [+49.848% +50.557% +51.242%] (p = 0.00 < 0.05)
                        Performance has regressed.

New:

Switched to branch 'array_chunks'
08:03:58 $ cargo criterion
#...
test tests::bench_decode_quick_brown_fox ... bench:     131,063 ns/iter (+/- 819)
test tests::bench_encode_quick_brown_fox ... bench:      66,336 ns/iter (+/- 134)
test tests::bench_encode_random_0x10     ... bench:         246 ns/iter (+/- 1)
test tests::bench_encode_random_0x100    ... bench:       2,596 ns/iter (+/- 18)
test tests::bench_encode_random_0x1000   ... bench:      38,448 ns/iter (+/- 890)
test tests::bench_encode_random_0x10000  ... bench:     621,165 ns/iter (+/- 5,123)

test result: ok. 0 passed; 0 failed; 17 ignored; 6 measured; 0 filtered out; finished in 4.21s

encode long string      time:   [720.96 ns 722.17 ns 723.59 ns]                                
                        change: [-88.622% -88.543% -88.468%] (p = 0.00 < 0.05)
                        Performance has improved.

encode long string from buffer                                                                             
                        time:   [749.84 ns 751.25 ns 752.83 ns]
                        change: [-88.157% -88.107% -88.059%] (p = 0.00 < 0.05)
                        Performance has improved.

decode long string      time:   [1.1376 us 1.1402 us 1.1429 us]                                
                        change: [-34.342% -34.040% -33.743%] (p = 0.00 < 0.05)
                        Performance has improved.

08:05:14 $

with array_chunks:

08:07:01 $ cargo criterion --features array_chunks
#...
test tests::bench_decode_quick_brown_fox ... bench:     135,537 ns/iter (+/- 8,804)
test tests::bench_encode_quick_brown_fox ... bench:      66,740 ns/iter (+/- 292)
test tests::bench_encode_random_0x10     ... bench:         250 ns/iter (+/- 5)
test tests::bench_encode_random_0x100    ... bench:       2,569 ns/iter (+/- 27)
test tests::bench_encode_random_0x1000   ... bench:      38,377 ns/iter (+/- 698)
test tests::bench_encode_random_0x10000  ... bench:     629,568 ns/iter (+/- 38,156)

test result: ok. 0 passed; 0 failed; 17 ignored; 6 measured; 0 filtered out; finished in 2.63s

encode long string      time:   [723.99 ns 726.25 ns 728.91 ns]                                
                        change: [-0.1916% +0.2028% +0.5912%] (p = 0.32 > 0.05)
                        No change in performance detected.

encode long string from buffer                                                                             
                        time:   [744.31 ns 745.91 ns 747.66 ns]
                        change: [-1.1878% -0.7567% -0.3479%] (p = 0.00 < 0.05)
                        Change within noise threshold.

decode long string      time:   [1.1544 us 1.1571 us 1.1601 us]                                
                        change: [+1.0319% +1.4711% +1.9136%] (p = 0.00 < 0.05)
                        Performance has regressed.

08:08:20 $

Unsure how this holds up. Relatively speaking, there's not much left that needs work, I think...

With other potential performance boosts available (variable-length arrays on the stack), we could see further minor improvements in the long term, though.

ZaneHannanAU · 2021-07-11T01:53:39Z

vs branch main, no features:

test result: ok. 0 passed; 0 failed; 17 ignored; 0 measured; 0 filtered out; finished in 0.00s

encode long string      time:   [516.50 ns 517.81 ns 519.25 ns]                               
                        change: [-91.750% -91.712% -91.676%] (p = 0.00 < 0.05)
                        Performance has improved.

encode long string from buffer                                                                             
                        time:   [540.79 ns 542.17 ns 543.65 ns]
                        change: [-91.428% -91.391% -91.352%] (p = 0.00 < 0.05)
                        Performance has improved.

decode long string      time:   [530.35 ns 531.53 ns 532.82 ns]                                
                        change: [-69.520% -69.397% -69.264%] (p = 0.00 < 0.05)
                        Performance has improved.

I think a ~90% reduction in encode time and a >50% reduction in decode time is OK...
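
As a rough aid for reading these numbers: criterion reports the change in time per iteration, so a change of -91.7% means the new code takes about 8.3% of the old time. A small sketch of that conversion, with `speedup_from_change` being a made-up helper name:

```rust
/// Converts a criterion-style "change" percentage (negative means faster)
/// into a speedup factor, i.e. old time divided by new time.
/// Hypothetical helper for illustration only.
pub fn speedup_from_change(change_percent: f64) -> f64 {
    // new_time = old_time * (1 + change / 100), so old / new is the inverse.
    1.0 / (1.0 + change_percent / 100.0)
}
```

Plugging in the middle estimates above, -91.712% for encode works out to roughly a 12x speedup and -69.397% for decode to roughly 3.3x.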

believer · 2022-02-09T09:47:38Z

Sorry that I haven't looked at this, I've been away on parental leave! 😅 I just merged #2 and I don't know how that affects your PR

ZaneHannanAU · 2022-02-24T04:25:55Z

All good, shit happens. I basically just dumped everything, since I had rewritten the entire thing...

believer · 2023-06-30T14:03:33Z

Finally remembered this, thank you so much for the contribution! 🎉

# [3.1.0](v3.0.0...v3.1.0) (2023-06-30)

### Features

* `array_chunks` (performance feature), `alloc` improvements ([#3](#3)) ([9e0e933](9e0e933))
* performance improvements ([#2](#2)) ([63db6f2](63db6f2))

Zane Hannan added 2 commits July 8, 2021 21:47

Update Cargo.toml for array_chunks feature, improve/split decode/encode.

bac8b6d

Update Cargo.toml version & changelog to match.

73d5e06

ZaneHannanAU changed the title ~~array_chunks (performance feature), alloc improvements.~~ array_chunks (performance feature), alloc improvements, splits code. Jul 8, 2021

ZaneHannanAU changed the title ~~array_chunks (performance feature), alloc improvements, splits code.~~ array_chunks (performance feature), alloc improvements, code modularisation. Jul 8, 2021

ZaneHannanAU changed the title ~~array_chunks (performance feature), alloc improvements, code modularisation.~~ array_chunks (performance feature), alloc improvements, code modularisation. Jul 8, 2021

Zane Hannan added 3 commits July 8, 2021 23:11

Add benchmarking stuff

5b39c98

mildly improve microoptimisations

d43e8a3

Add deprecation notice of encode_from_buffer, as it is equal to `encode`.

b2f2f74

Microoptimisations for encoder, inline decoder. Test 0xff byte string. Looks weird but meh.

6060508

Zane Hannan added 3 commits July 10, 2021 08:24

fix wording

df4c4b9

cut the unnecessary alloc, halving decode time once again.

f7c5d82

improve alg once again

4d7d5d9

Merge branch 'main' into array_chunks

cc13985

believer added 6 commits June 30, 2023 15:46

chore: revert version update

d557f4b

Version updating is handled by semantic release in actions workflow

chore: revert changelog update

561c068

Also handled by semantic release

chore(deps): update dependencies

34b09b7

chore(lint): fix recommendations from clippy

63f44f1

chore: formatting fixes

fb8b2ff

chore: update checkout action

8e20768

believer merged commit 9e0e933 into opendevtools:main Jun 30, 2023
