array_chunks (performance feature), alloc improvements, code modularisation. #3

Merged: 16 commits into opendevtools:main on Jun 30, 2023

Conversation

ZaneHannanAU
Contributor

@ZaneHannanAU ZaneHannanAU commented Jul 8, 2021

  • Implements the algorithm using array_chunks/chunks_exact for branchless performance improvements, and potentially reduced code size for constrained devices.
  • Sizes the initial String and Vec allocations up front to reduce reallocations at runtime.
  • Splits runtime code into decode, encode, and test modules.
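As a rough illustration of the first two points, here is a minimal sketch of Base45 encoding and decoding built on `chunks_exact` with preallocated output. This is not the code from this PR: the names `encode_sketch`, `decode_sketch`, and `value_of` are hypothetical, the alphabet and test vectors are taken from RFC 9285, and the linear alphabet lookup in `value_of` is chosen for brevity rather than speed.

```rust
/// Base45 alphabet as defined in RFC 9285.
const ALPHABET: &[u8; 45] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:";

/// Hypothetical encoder sketch: every 2 input bytes become 3 characters,
/// and a trailing single byte becomes 2 characters.
pub fn encode_sketch(input: &[u8]) -> String {
    // Reserve the exact output length once, so the String never regrows.
    let mut out = String::with_capacity(input.len() / 2 * 3 + input.len() % 2 * 2);
    let chunks = input.chunks_exact(2);
    let tail = chunks.remainder();
    for pair in chunks {
        // Every chunk has exactly 2 bytes, so the indexing below is
        // statically in range and the loop body has no length checks of its own.
        let n = u16::from_be_bytes([pair[0], pair[1]]) as usize;
        out.push(ALPHABET[n % 45] as char);
        out.push(ALPHABET[n / 45 % 45] as char);
        out.push(ALPHABET[n / 2025] as char);
    }
    if let [b] = tail {
        let n = *b as usize;
        out.push(ALPHABET[n % 45] as char);
        out.push(ALPHABET[n / 45] as char);
    }
    out
}

/// Hypothetical helper: maps a character to its Base45 value (linear scan).
fn value_of(c: u8) -> Option<u32> {
    ALPHABET.iter().position(|&a| a == c).map(|i| i as u32)
}

/// Hypothetical decoder sketch: returns None on invalid characters,
/// out-of-range groups, or an invalid input length.
pub fn decode_sketch(input: &str) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len() / 3 * 2 + bytes.len() % 3 / 2);
    let chunks = bytes.chunks_exact(3);
    let tail = chunks.remainder();
    for t in chunks {
        let n = value_of(t[0])? + value_of(t[1])? * 45 + value_of(t[2])? * 2025;
        if n > 0xFFFF {
            return None; // a 3-character group must fit in two bytes
        }
        out.extend_from_slice(&(n as u16).to_be_bytes());
    }
    match tail {
        [] => {}
        [c, d] => {
            let n = value_of(*c)? + value_of(*d)? * 45;
            if n > 0xFF {
                return None; // a 2-character group must fit in one byte
            }
            out.push(n as u8);
        }
        _ => return None, // a single leftover character is never valid
    }
    Some(out)
}
```

With the RFC 9285 vectors, `encode_sketch(b"AB")` gives `"BB8"` and `decode_sketch("QED8WEX0")` gives the bytes of `"ietf!"`. The nightly-only `array_chunks` iterator would yield `&[u8; 2]` and `&[u8; 3]` arrays in place of slices; the sketch sticks to the stable `chunks_exact`.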

@ZaneHannanAU ZaneHannanAU changed the title array_chunks (performance feature), alloc improvements. array_chunks (performance feature), alloc improvements, splits code. Jul 8, 2021
@ZaneHannanAU ZaneHannanAU changed the title array_chunks (performance feature), alloc improvements, splits code. array_chunks (performance feature), alloc improvements, code modularisation. Jul 8, 2021
@ZaneHannanAU ZaneHannanAU changed the title array_chunks (performance feature), alloc improvements, code modularisation. array_chunks (performance feature), alloc improvements, code modularisation. Jul 8, 2021
@ZaneHannanAU
Contributor Author

Note: benchmarks are currently untested. Running tests now.

@ZaneHannanAU
Contributor Author

OK, so checking out main shows a large performance loss compared with this branch:

07:38:52 $ cargo criterion 
07:39:16
   Compiling base45 v3.0.0 (/home/zeen3/documents/projects/base45)
    Finished bench [optimized] target(s) in 16.86s

running 15 tests
test tests::decode_ab ... ignored
test tests::decode_base45 ... ignored
test tests::decode_fail ... ignored
test tests::decode_fail_out_of_range ... ignored
test tests::decode_hello ... ignored
test tests::decode_ietf ... ignored
test tests::decode_long_string ... ignored
test tests::encode_ab ... ignored
test tests::encode_base45 ... ignored
test tests::encode_emoji ... ignored
test tests::encode_hello ... ignored
test tests::encode_hello_from_buffer ... ignored
test tests::encode_ietf ... ignored
test tests::encode_long_string ... ignored
test tests::encode_unicode ... ignored

test result: ok. 0 passed; 0 failed; 15 ignored; 0 measured; 0 filtered out; finished in 0.00s

encode long string      time:   [6.2442 us 6.2594 us 6.2764 us]                                
                        change: [+693.71% +696.46% +699.17%] (p = 0.00 < 0.05)
                        Performance has regressed.

encode long string from buffer                                                                             
                        time:   [6.3520 us 6.3731 us 6.3981 us]
                        change: [+694.97% +697.78% +700.66%] (p = 0.00 < 0.05)
                        Performance has regressed.

decode long string      time:   [1.7816 us 1.7912 us 1.8025 us]                                
                        change: [+52.709% +54.116% +55.609%] (p = 0.00 < 0.05)
                        Performance has regressed.

07:40:30 $ git checkout array_chunks 
07:41:16
Switched to branch 'array_chunks'
07:41:16 $ cargo criterion
07:41:21
   Compiling base45 v3.1.0 (/home/zeen3/documents/projects/base45)
warning: use of deprecated function `encode::encode_from_buffer`: Equivalent to `encode`. Use `encode` instead.
   --> src/tests.rs:116:23
    |
116 |         let encoded = encode_from_buffer(&b[..]);
    |                       ^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(deprecated)]` on by default

warning: use of deprecated function `encode::encode_from_buffer`: Equivalent to `encode`. Use `encode` instead.
  --> src/tests.rs:94:9
   |
94 |         encode_from_buffer(vec![72, 101, 108, 108, 111, 33, 33]),
   |         ^^^^^^^^^^^^^^^^^^

warning: use of deprecated function `base45::encode_from_buffer`: Equivalent to `encode`. Use `encode` instead.
  --> benches/bench.rs:12:13
   |
12 |             base45::encode_from_buffer(black_box(vec![
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(deprecated)]` on by default

warning: 2 warnings emitted

warning: 1 warning emitted

    Finished bench [optimized] target(s) in 12.21s

running 21 tests
test tests::decode_ab ... ignored
test tests::decode_base45 ... ignored
test tests::decode_fail ... ignored
test tests::decode_fail_out_of_range ... ignored
test tests::decode_hello ... ignored
test tests::decode_ietf ... ignored
test tests::decode_long_string ... ignored
test tests::encode_ab ... ignored
test tests::encode_base45 ... ignored
test tests::encode_emoji ... ignored
test tests::encode_hello ... ignored
test tests::encode_hello_from_buffer ... ignored
test tests::encode_ietf ... ignored
test tests::encode_long_string ... ignored
test tests::encode_unicode ... ignored
test tests::bench_decode_quick_brown_fox ... bench:     113,692 ns/iter (+/- 1,843)
test tests::bench_encode_quick_brown_fox ... bench:      77,526 ns/iter (+/- 248)
test tests::bench_encode_random_0x10     ... bench:         276 ns/iter (+/- 8)
test tests::bench_encode_random_0x100    ... bench:       2,693 ns/iter (+/- 37)
test tests::bench_encode_random_0x1000   ... bench:      39,509 ns/iter (+/- 187)
test tests::bench_encode_random_0x10000  ... bench:     646,121 ns/iter (+/- 4,902)

test result: ok. 0 passed; 0 failed; 15 ignored; 6 measured; 0 filtered out; finished in 8.02s

encode long string      time:   [736.05 ns 737.40 ns 738.88 ns]                                
                        change: [-88.277% -88.234% -88.189%] (p = 0.00 < 0.05)
                        Performance has improved.

encode long string from buffer                                                                             
                        time:   [764.57 ns 766.43 ns 768.50 ns]
                        change: [-88.037% -87.983% -87.927%] (p = 0.00 < 0.05)
                        Performance has improved.

decode long string      time:   [1.1312 us 1.1349 us 1.1391 us]                                
                        change: [-38.033% -37.372% -36.763%] (p = 0.00 < 0.05)
                        Performance has improved.

07:42:41 $

Of course, there are some optimizations I could be applying that I'm currently not, but I'm still working on them. Probably only a 1-2% gain, I guess.

@ZaneHannanAU
Contributor Author

ZaneHannanAU commented Jul 9, 2021

Old:

test result: ok. 0 passed; 0 failed; 15 ignored; 0 measured; 0 filtered out; finished in 0.00s

encode long string      time:   [6.2338 us 6.2485 us 6.2658 us]                                
                        change: [+766.60% +772.21% +778.09%] (p = 0.00 < 0.05)
                        Performance has regressed.

encode long string from buffer                                                                             
                        time:   [6.3149 us 6.3315 us 6.3496 us]
                        change: [+737.03% +740.47% +743.92%] (p = 0.00 < 0.05)
                        Performance has regressed.

decode long string      time:   [1.7251 us 1.7291 us 1.7335 us]                                
                        change: [+49.848% +50.557% +51.242%] (p = 0.00 < 0.05)
                        Performance has regressed.

New:

Switched to branch 'array_chunks'
08:03:58 $ cargo criterion
#...
test tests::bench_decode_quick_brown_fox ... bench:     131,063 ns/iter (+/- 819)
test tests::bench_encode_quick_brown_fox ... bench:      66,336 ns/iter (+/- 134)
test tests::bench_encode_random_0x10     ... bench:         246 ns/iter (+/- 1)
test tests::bench_encode_random_0x100    ... bench:       2,596 ns/iter (+/- 18)
test tests::bench_encode_random_0x1000   ... bench:      38,448 ns/iter (+/- 890)
test tests::bench_encode_random_0x10000  ... bench:     621,165 ns/iter (+/- 5,123)

test result: ok. 0 passed; 0 failed; 17 ignored; 6 measured; 0 filtered out; finished in 4.21s

encode long string      time:   [720.96 ns 722.17 ns 723.59 ns]                                
                        change: [-88.622% -88.543% -88.468%] (p = 0.00 < 0.05)
                        Performance has improved.

encode long string from buffer                                                                             
                        time:   [749.84 ns 751.25 ns 752.83 ns]
                        change: [-88.157% -88.107% -88.059%] (p = 0.00 < 0.05)
                        Performance has improved.

decode long string      time:   [1.1376 us 1.1402 us 1.1429 us]                                
                        change: [-34.342% -34.040% -33.743%] (p = 0.00 < 0.05)
                        Performance has improved.

08:05:14 $

with array_chunks:

08:07:01 $ cargo criterion --features array_chunks
#...
test tests::bench_decode_quick_brown_fox ... bench:     135,537 ns/iter (+/- 8,804)
test tests::bench_encode_quick_brown_fox ... bench:      66,740 ns/iter (+/- 292)
test tests::bench_encode_random_0x10     ... bench:         250 ns/iter (+/- 5)
test tests::bench_encode_random_0x100    ... bench:       2,569 ns/iter (+/- 27)
test tests::bench_encode_random_0x1000   ... bench:      38,377 ns/iter (+/- 698)
test tests::bench_encode_random_0x10000  ... bench:     629,568 ns/iter (+/- 38,156)

test result: ok. 0 passed; 0 failed; 17 ignored; 6 measured; 0 filtered out; finished in 2.63s

encode long string      time:   [723.99 ns 726.25 ns 728.91 ns]                                
                        change: [-0.1916% +0.2028% +0.5912%] (p = 0.32 > 0.05)
                        No change in performance detected.

encode long string from buffer                                                                             
                        time:   [744.31 ns 745.91 ns 747.66 ns]
                        change: [-1.1878% -0.7567% -0.3479%] (p = 0.00 < 0.05)
                        Change within noise threshold.

decode long string      time:   [1.1544 us 1.1571 us 1.1601 us]                                
                        change: [+1.0319% +1.4711% +1.9136%] (p = 0.00 < 0.05)
                        Performance has regressed.

08:08:20 $

I'm unsure how much the array_chunks feature helps here; the results are within a couple of percent either way. Relatively speaking, I don't think much more work is needed...

With other potential performance boosts available (variable length arrays on stack), we could see more minor improvements in the long term, though.

@ZaneHannanAU
Contributor Author

vs branch main, no features:

test result: ok. 0 passed; 0 failed; 17 ignored; 0 measured; 0 filtered out; finished in 0.00s

encode long string      time:   [516.50 ns 517.81 ns 519.25 ns]                               
                        change: [-91.750% -91.712% -91.676%] (p = 0.00 < 0.05)
                        Performance has improved.

encode long string from buffer                                                                             
                        time:   [540.79 ns 542.17 ns 543.65 ns]
                        change: [-91.428% -91.391% -91.352%] (p = 0.00 < 0.05)
                        Performance has improved.

decode long string      time:   [530.35 ns 531.53 ns 532.82 ns]                                
                        change: [-69.520% -69.397% -69.264%] (p = 0.00 < 0.05)
                        Performance has improved.

I think a ~90% reduction in encode time and a >50% reduction in decode time is OK...
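For reference, the "change" figures above are relative changes against a baseline. A small sketch of that arithmetic follows; the function name `relative_change_percent` is hypothetical, and pairing the 6.2485 us "Old" encode result with the 517.81 ns result here is an assumption about which baseline the comparison used.

```rust
/// Relative change in percent: (new - old) / old * 100.
/// Negative values mean the new measurement is faster.
pub fn relative_change_percent(old_ns: f64, new_ns: f64) -> f64 {
    (new_ns - old_ns) / old_ns * 100.0
}

/// Speed-up factor: how many times faster the new measurement is.
pub fn speedup(old_ns: f64, new_ns: f64) -> f64 {
    old_ns / new_ns
}
```

Under that assumption, `relative_change_percent(6248.5, 517.81)` comes out at about -91.7%, in line with the reported encode change, and `speedup(6248.5, 517.81)` is roughly 12x.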

@believer
Member

believer commented Feb 9, 2022

Sorry that I haven't looked at this; I've been away on parental leave! 😅 I just merged #2, and I don't know how that affects your PR.

@ZaneHannanAU
Contributor Author

All good, shit happens. I just dumped everything in at once, since I had basically rewritten the entire thing...

@believer believer merged commit 9e0e933 into opendevtools:main Jun 30, 2023
@believer
Member

Finally remembered this, thank you so much for the contribution! 🎉

github-actions bot pushed a commit that referenced this pull request Jun 30, 2023
# [3.1.0](v3.0.0...v3.1.0) (2023-06-30)

### Features

* `array_chunks` (performance feature), `alloc` improvements ([#3](#3)) ([9e0e933](9e0e933))
* performance improvements ([#2](#2)) ([63db6f2](63db6f2))