Optimize character conversion #15

Merged

ManyTheFish
Contributor

@ManyTheFish ManyTheFish commented Apr 28, 2022

Add benchmarks and optimize character-converter.

Commits

Add Benchmarks

  test benches::simplified_to_traditional ... bench:     129,776 ns/iter (+/- 4,669)
  test benches::traditional_to_simplified ... bench:     121,232 ns/iter (+/- 2,866)

Avoid string allocations

  test benches::simplified_to_traditional ... bench:     112,347 ns/iter (+/- 5,848)
  test benches::traditional_to_simplified ... bench:     103,843 ns/iter (+/- 6,099)

Avoid iterating from the start of the string

  test benches::simplified_to_traditional ... bench:       94,790 ns/iter (+/- 625)
  test benches::traditional_to_simplified ... bench:       85,548 ns/iter (+/- 2,355)

Create slices instead of collecting chars into an allocated string

  test benches::simplified_to_traditional ... bench:      19,382 ns/iter (+/- 224)
  test benches::traditional_to_simplified ... bench:      16,697 ns/iter (+/- 358)
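The gain from the two commits above can be illustrated with a minimal sketch. It is not the crate's actual code, and both function names are hypothetical. The first version walks from the start of the string and collects chars into a new `String`; the second starts at a known byte offset and borrows a slice of the input.

```rust
/// Allocating version: iterates from the start of `text`, then collects
/// `len` chars beginning at char index `start` into a new `String`.
fn candidate_owned(text: &str, start: usize, len: usize) -> String {
    text.chars().skip(start).take(len).collect()
}

/// Borrowing version: returns a slice of up to `len` chars beginning at byte
/// offset `byte_start`, which must lie on a char boundary. Nothing is allocated.
fn candidate_slice(text: &str, byte_start: usize, len: usize) -> &str {
    let rest = &text[byte_start..];
    // Byte offset just past the `len`-th char, or the end of the string.
    let end = rest
        .char_indices()
        .nth(len)
        .map(|(i, _)| i)
        .unwrap_or(rest.len());
    &rest[..end]
}
```

Both functions return the same text for equivalent positions; only the second avoids the allocation and the scan from the beginning.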

Find by prefix using an FST

  test benches::bench_simplified_to_traditional ... bench:       2,590 ns/iter (+/- 135)
  test benches::bench_traditional_to_simplified ... bench:       2,568 ns/iter (+/- 21)
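As a rough illustration of finding by prefix, here is a sketch of longest-prefix lookup that walks forward through the input one byte at a time. It uses a small hand-rolled byte trie as a stand-in for the FST, so it shows the lookup idea only and not the `fst` crate's API; the `Trie` type and its methods are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the FST: a byte trie mapping keys to values.
#[derive(Default)]
struct Trie {
    children: HashMap<u8, Trie>,
    value: Option<String>,
}

impl Trie {
    fn insert(&mut self, key: &str, value: &str) {
        let mut node = self;
        for byte in key.bytes() {
            node = node.children.entry(byte).or_default();
        }
        node.value = Some(value.to_string());
    }

    /// Walks forward through `text` and returns the byte length of the
    /// longest key that is a prefix of `text`, together with its value.
    fn longest_prefix(&self, text: &str) -> Option<(usize, &str)> {
        let mut node = self;
        let mut best = None;
        for (i, byte) in text.bytes().enumerate() {
            match node.children.get(&byte) {
                Some(next) => node = next,
                None => break,
            }
            // Remember the most recent (longest) complete key seen so far.
            if let Some(value) = &node.value {
                best = Some((i + 1, value.as_str()));
            }
        }
        best
    }
}
```

A converter built on this would call `longest_prefix` on the remaining input, emit the value, and advance by the returned byte length, so each position is visited once.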

Add more benchmarks

  test benches::bench_simplified_is_simplified   ... bench:       2,086 ns/iter (+/- 47)
  test benches::bench_simplified_is_traditional  ... bench:         535 ns/iter (+/- 27)
  test benches::bench_simplified_to_simplified   ... bench:       2,454 ns/iter (+/- 18)
  test benches::bench_simplified_to_traditional  ... bench:       2,578 ns/iter (+/- 20)
  test benches::bench_traditional_is_simplified  ... bench:         546 ns/iter (+/- 25)
  test benches::bench_traditional_is_traditional ... bench:       1,961 ns/iter (+/- 20)
  test benches::bench_traditional_to_simplified  ... bench:       2,547 ns/iter (+/- 26)
  test benches::bench_traditional_to_traditional ... bench:       2,432 ns/iter (+/- 29)

Use Cow instead of always allocating a String

This avoids creating a new string when no characters change.

  test benches::bench_simplified_is_simplified   ... bench:       2,141 ns/iter (+/- 77)
  test benches::bench_simplified_is_traditional  ... bench:         550 ns/iter (+/- 37)
+ test benches::bench_simplified_to_simplified   ... bench:       2,262 ns/iter (+/- 12)
  test benches::bench_simplified_to_traditional  ... bench:       2,580 ns/iter (+/- 31)
  test benches::bench_traditional_is_simplified  ... bench:         572 ns/iter (+/- 10)
  test benches::bench_traditional_is_traditional ... bench:       2,009 ns/iter (+/- 19)
  test benches::bench_traditional_to_simplified  ... bench:       2,562 ns/iter (+/- 18)
+ test benches::bench_traditional_to_traditional ... bench:       2,210 ns/iter (+/- 38)
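The `Cow` approach could look roughly like the sketch below. It assumes a hypothetical `convert` function that takes a per-char `lookup` closure, which is a simplification of the crate's real lookup; the point is only that the input is borrowed as-is when nothing needs replacing.

```rust
use std::borrow::Cow;

/// Hypothetical converter: replaces chars for which `lookup` returns a
/// replacement, and borrows the input unchanged when there are none.
fn convert<'a>(text: &'a str, lookup: impl Fn(char) -> Option<char>) -> Cow<'a, str> {
    // Find the first char that has a replacement; without one, do not allocate.
    let first = match text.char_indices().find(|&(_, c)| lookup(c).is_some()) {
        Some((i, _)) => i,
        None => return Cow::Borrowed(text),
    };
    // Copy the untouched head once, then convert the remainder char by char.
    let mut out = String::with_capacity(text.len());
    out.push_str(&text[..first]);
    for c in text[first..].chars() {
        out.push(lookup(c).unwrap_or(c));
    }
    Cow::Owned(out)
}
```

This matches the benchmark pattern above: only the same-script cases (`simplified_to_simplified`, `traditional_to_traditional`) benefit, because those are the ones where nothing changes.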

Encode in a buffer instead of creating a string in is_script

+ test benches::bench_simplified_is_simplified   ... bench:         845 ns/iter (+/- 32)
+ test benches::bench_simplified_is_traditional  ... bench:         188 ns/iter (+/- 1)
  test benches::bench_simplified_to_simplified   ... bench:       2,260 ns/iter (+/- 18)
  test benches::bench_simplified_to_traditional  ... bench:       2,571 ns/iter (+/- 34)
+ test benches::bench_traditional_is_simplified  ... bench:         170 ns/iter (+/- 2)
+ test benches::bench_traditional_is_traditional ... bench:         787 ns/iter (+/- 6)
  test benches::bench_traditional_to_simplified  ... bench:       2,561 ns/iter (+/- 44)
  test benches::bench_traditional_to_traditional ... bench:       2,211 ns/iter (+/- 21)
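A minimal sketch of the buffer technique, assuming a lookup table keyed by string slices (a `HashSet<&str>` here, which is an assumption and not necessarily what `is_script` uses). Both function names are hypothetical.

```rust
use std::collections::HashSet;

/// Allocating check: `to_string` creates a heap `String` for every char.
fn is_known_allocating(table: &HashSet<&str>, c: char) -> bool {
    table.contains(c.to_string().as_str())
}

/// Buffered check: a char needs at most 4 bytes in UTF-8, so it can be
/// encoded into a stack buffer and looked up without any allocation.
fn is_known_buffered(table: &HashSet<&str>, c: char) -> bool {
    let mut buf = [0u8; 4];
    let encoded: &str = c.encode_utf8(&mut buf);
    table.contains(encoded)
}
```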

Use String::new() instead of to_string()

  test benches::bench_simplified_is_simplified   ... bench:         884 ns/iter (+/- 22)
  test benches::bench_simplified_is_traditional  ... bench:         194 ns/iter (+/- 4)
+ test benches::bench_simplified_to_simplified   ... bench:       2,230 ns/iter (+/- 27)
+ test benches::bench_simplified_to_traditional  ... bench:       2,532 ns/iter (+/- 19)
  test benches::bench_traditional_is_simplified  ... bench:         206 ns/iter (+/- 1)
  test benches::bench_traditional_is_traditional ... bench:         833 ns/iter (+/- 40)
+ test benches::bench_traditional_to_simplified  ... bench:       2,496 ns/iter (+/- 17)
+ test benches::bench_traditional_to_traditional ... bench:       2,212 ns/iter (+/- 32)

Use with_capacity instead of new when allocating the string

  test benches::bench_simplified_is_simplified   ... bench:         889 ns/iter (+/- 23)
  test benches::bench_simplified_is_traditional  ... bench:         190 ns/iter (+/- 1)
  test benches::bench_simplified_to_simplified   ... bench:       2,235 ns/iter (+/- 10)
+ test benches::bench_simplified_to_traditional  ... bench:       2,420 ns/iter (+/- 72)
  test benches::bench_traditional_is_simplified  ... bench:         197 ns/iter (+/- 3)
  test benches::bench_traditional_is_traditional ... bench:         871 ns/iter (+/- 32)
+ test benches::bench_traditional_to_simplified  ... bench:       2,399 ns/iter (+/- 22)
  test benches::bench_traditional_to_traditional ... bench:       2,177 ns/iter (+/- 17)
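For reference, the preallocation idea in isolation, as a sketch with a hypothetical `map_chars` helper. Using the input's byte length as the capacity is a guess that works well when input and output chars have similar encoded sizes.

```rust
/// Hypothetical output builder: reserves the input's byte length up front so
/// that pushing converted chars rarely has to regrow the buffer.
fn map_chars(text: &str, f: impl Fn(char) -> char) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        out.push(f(c));
    }
    out
}
```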

Use contains_key in is_script

The contains_key version expresses the behavior of the code better, despite the small performance decrease.

- test benches::bench_simplified_is_simplified   ... bench:       1,001 ns/iter (+/- 31)
- test benches::bench_simplified_is_traditional  ... bench:         217 ns/iter (+/- 3)
  test benches::bench_simplified_to_simplified   ... bench:       2,237 ns/iter (+/- 18)
  test benches::bench_simplified_to_traditional  ... bench:       2,425 ns/iter (+/- 22)
- test benches::bench_traditional_is_simplified  ... bench:         230 ns/iter (+/- 4)
- test benches::bench_traditional_is_traditional ... bench:         979 ns/iter (+/- 72)
  test benches::bench_traditional_to_simplified  ... bench:       2,406 ns/iter (+/- 56)
  test benches::bench_traditional_to_traditional ... bench:       2,189 ns/iter (+/- 20)

poke @Kerollmops

@ManyTheFish ManyTheFish force-pushed the optimize-character-converter branch 2 times, most recently from 6a56072 to b41ad5b Compare April 28, 2022 15:34
@ManyTheFish ManyTheFish marked this pull request as ready for review April 28, 2022 17:00
@sotch-pr35mac sotch-pr35mac self-assigned this Apr 28, 2022
@sotch-pr35mac
Owner

@ManyTheFish Please confirm the intention here is to merge this branch into sotch-pr35mac:master and not ManyTheFish:master.

@Kerollmops

Kerollmops commented Apr 29, 2022

@sotch-pr35mac, this is indeed what we want to do: merge these improvements into your main branch.

@ManyTheFish ManyTheFish force-pushed the optimize-character-converter branch from 4dfc85f to 982701b Compare May 2, 2022 11:29
@ManyTheFish ManyTheFish force-pushed the optimize-character-converter branch from 982701b to 38be7b4 Compare May 2, 2022 11:34
@ManyTheFish ManyTheFish force-pushed the optimize-character-converter branch 2 times, most recently from 758c99e to e86a035 Compare May 2, 2022 12:36
@ManyTheFish ManyTheFish force-pushed the optimize-character-converter branch from e86a035 to 404968a Compare May 2, 2022 12:39
@ManyTheFish
Contributor Author

Hello @sotch-pr35mac, I think the implementation is finished on our side. 🙂

Could you please review this PR in order to merge it?

Thanks!

@sotch-pr35mac
Owner

Thanks for letting me know. I'll take a look tonight.

Owner

@sotch-pr35mac sotch-pr35mac left a comment

Please squash your commits as well.

@@ -9,26 +9,28 @@ Turn Traditional Chinese script to Simplified Chinese script and vice-versa. Che
`` ```rust ``
Owner

☝️ This is a major breaking change; bump the version number to 2.0.0 above and elsewhere throughout.

use fst::raw::{Fst, Output};
use once_cell::sync::Lazy;

static T2S: Lazy<HashMap<String, String>> =
Owner

I really like the implementation with the FST, especially how clean it is to simply walk forward through it. After doing some very basic benchmarking on my laptop, I'm noticing a ~60x improvement in conversion post-initialization. Initialization, however, has increased by ~4x, to ~2.2s from ~0.56s on my machine. Initialization time is very important to me here, so I'd like to keep this at or around the previous time. It is also very important to me that the consumer is able to specify when to perform initialization.

So what I'd like to do to handle these tradeoffs is preprocess the FSTs into "profiles" the same way we have done for the HashMaps. Then we should add three initialization functions, one for the HashMaps, one for the FSTs, and one for both the HashMaps and FSTs. We'll probably need to implement a serialize trait for the FST, but after a cursory look around it doesn't look like that should be out of the question. I'm open to other suggestions on how to reduce the initialization latency, but as it stands the ~4x is just too high.
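One way to let the consumer choose when initialization happens is sketched below. It uses `std::sync::OnceLock` as a standard-library stand-in for `once_cell::sync::Lazy`, and the `load_t2s`, `init`, and `to_simplified` names are hypothetical; a real loader would deserialize a preprocessed profile rather than build a two-entry map.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

static T2S: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();

/// Hypothetical loader standing in for the expensive profile load.
fn load_t2s() -> HashMap<&'static str, &'static str> {
    HashMap::from([("頭", "头"), ("髮", "发")])
}

/// Lets the consumer pay the initialization cost at a moment of their choosing.
pub fn init() {
    T2S.get_or_init(load_t2s);
}

/// Falls back to lazy initialization on first use if `init` was never called.
pub fn to_simplified(c: &str) -> Option<&'static str> {
    T2S.get_or_init(load_t2s).get(c).copied()
}
```

With `once_cell`, the equivalent of `init` would be a call to `Lazy::force` on the static. Separate `init` functions per table would follow the same pattern, one static each.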

Contributor Author

Hey @sotch-pr35mac! Thank you for the review, I made the small changes you wanted.

However, I want to respond to some of your suggestions:

> I really like the implementation with the FST, especially how clean it is to simply walk forward through it. After doing some very basic benchmarking on my laptop, I'm noticing a ~60x improvement in conversion post-initialization. Initialization, however, has increased by ~4x, to ~2.2s from ~0.56s on my machine. Initialization time is very important to me here, so I'd like to keep this at or around the previous time. It is also very important to me that the consumer is able to specify when to perform initialization.
>
> So what I'd like to do to handle these tradeoffs is preprocess the FSTs into "profiles" the same way we have done for the HashMaps. Then we should add three initialization functions, one for the HashMaps, one for the FSTs, and one for both the HashMaps and FSTs. We'll probably need to implement a serialize trait for the FST, but after a cursory look around it doesn't look like that should be out of the question. I'm open to other suggestions on how to reduce the initialization latency, but as it stands the ~4x is just too high.

I understand your point. I would suggest moving the initialization into a build.rs so that the FST is built at compile time.

> Please squash your commits as well.

Can we avoid squashing them? I find it useful to link atomic commits to their performance gains.

Owner

> I understand your point. I would suggest moving the initialization into a build.rs so that the FST is built at compile time.

I'm open to that. Just to confirm my own understanding, you're suggesting that we add a build script to build the FSTs at compile time and serialize them out to a file that's later loaded with Lazy?
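The build-script flow described here can be sketched as follows, using only the standard library. The `write_profile` helper and the `t2s.profile` file name are hypothetical, and the tab-separated format is a placeholder: in the design under discussion, this is the step where the FST would be built and its bytes written out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical build-time step: sorts the pairs (FST builders also expect
/// keys in lexicographic order) and writes them to `<out_dir>/t2s.profile`.
fn write_profile(out_dir: &Path, mut pairs: Vec<(&str, &str)>) -> io::Result<PathBuf> {
    pairs.sort();
    let mut data = String::new();
    for (from, to) in pairs {
        data.push_str(from);
        data.push('\t');
        data.push_str(to);
        data.push('\n');
    }
    let path = out_dir.join("t2s.profile");
    fs::write(&path, data)?;
    Ok(path)
}

// In build.rs, `fn main` would call it with Cargo's output directory:
//     let out_dir = std::env::var("OUT_DIR").unwrap();
//     write_profile(Path::new(&out_dir), pairs).unwrap();
//
// The library would then embed the generated file at compile time and wrap
// the decoding step in a lazily initialized static:
//     static T2S_BYTES: &[u8] =
//         include_bytes!(concat!(env!("OUT_DIR"), "/t2s.profile"));
```

The expensive construction then runs once per build instead of once per process start, which is what the initialization timings above are sensitive to.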

> Can we avoid squashing them? I find it useful to link atomic commits to their performance gains.

Hmm... I see, I think that should be fine then.

@ManyTheFish
Contributor Author

Hey @sotch-pr35mac, I made the changes so that the FST is created at compile time. Could you rerun your initialization tests, please? 😊

@ManyTheFish ManyTheFish force-pushed the optimize-character-converter branch from 08d1674 to ce11684 Compare May 4, 2022 10:21
@sotch-pr35mac
Owner

Thanks for the quick turnaround, I will give it another look after work today.

Owner

@sotch-pr35mac sotch-pr35mac left a comment

Everything looks good

@sotch-pr35mac sotch-pr35mac merged commit 4b45edb into sotch-pr35mac:master May 7, 2022