
Improvements : Speed #3

Open
chris-ha458 opened this issue Sep 24, 2023 · 16 comments

@chris-ha458
Contributor

chris-ha458 commented Sep 24, 2023

As per our discussion in #2, the following speed improvements have been suggested:

  • calc coherence & mess in threads
  • or calc mess for plugins in threads (or some async?)
  • or something other...

The paths I had in mind were these:

  • Related to threads idea : use Rayon
    • Replace HashMap with the concurrent DashMap (rayon already supports std's HashMap, so this is not strictly necessary, but it might be worth looking into regardless)
  • Replace the hashing algorithm used in HashMap with FxHash, aHash, or HighwayHash
  • Replace sort() with sort_unstable() (done in #6, "Sorting was changed to unstable")
  • Identify preallocation opportunities. For instance, replace Vec::new() with Vec::with_capacity()
    • Most current new() calls cannot really preallocate because the final sizes are uncertain. The default growth strategy is optimized enough that, unless we have a strong idea of the memory access pattern, premature allocation is not helpful.

Many of these are low-hanging fruit, related to refactoring the code into idiomatic Rust.
For example, there are many for loops in this code. Iterator-based code is more idiomatic, easier to parallelize with rayon, and interacts better with allocation: pushing items from within a for loop can cause multiple allocations and copies, while collecting an iterator can allow fewer allocations. A sketch follows below.
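
As a hedged illustration (the functions and data here are hypothetical, not code from this repo), collecting an iterator lets the Vec size itself from the iterator's size_hint, while pushing inside a for loop may reallocate repeatedly:

```rust
// Hypothetical example, not code from this repo.
fn double_with_loop(input: &[u32]) -> Vec<u32> {
    let mut out = Vec::new(); // may reallocate several times as it grows
    for x in input {
        out.push(x * 2);
    }
    out
}

fn double_with_iterator(input: &[u32]) -> Vec<u32> {
    // collect() can use the iterator's size_hint to allocate once,
    // and slice iterators are also straightforward to parallelize with rayon.
    input.iter().map(|x| x * 2).collect()
}
```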

@nickspring
Owner

nickspring commented Sep 25, 2023

Thank you for the issue.

  • Rayon is undoubtedly a good idea for parallel iterations; it should probably be implemented behind a feature flag (see the sketch after this list).

  • Do you mean the hashing algorithm for LRU caching in functions (there is a lot of hashing)?

  • Why do you think that sort_unstable() will be faster than sort()? According to the documentation it uses a quicksort-based algorithm (sort uses a mergesort-based one). Mergesort has better worst-case time complexity than quicksort, hasn't it?
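
On the feature-flag point from the first bullet, a minimal sketch of how the split could look (the `parallel` feature name and `score_chunks` function are illustrative assumptions, not existing repo code; Cargo.toml would declare rayon as an optional dependency wired to the feature):

```rust
// Hypothetical sketch: the "parallel" feature and score_chunks() are
// assumptions for illustration, not part of this repo.
// Cargo.toml would contain:
//   [dependencies] rayon = { version = "1", optional = true }
//   [features]     parallel = ["dep:rayon"]

#[cfg(feature = "parallel")]
fn score_chunks(chunks: &[Vec<u8>]) -> Vec<f32> {
    use rayon::prelude::*;
    // Score chunks in parallel when the feature is enabled.
    chunks.par_iter().map(|c| c.len() as f32).collect()
}

#[cfg(not(feature = "parallel"))]
fn score_chunks(chunks: &[Vec<u8>]) -> Vec<f32> {
    // Sequential fallback with the same signature.
    chunks.iter().map(|c| c.len() as f32).collect()
}
```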

@nickspring
Owner

nickspring commented Sep 25, 2023

I tried to replace sort() with sort_unstable() and sort_by() with sort_unstable_by(). There is almost no difference in speed, at least for this performance test and the large_datasets benchmark. But unstable sorting can produce different results, so I'm not sure it makes sense to use it.

@chris-ha458
Contributor Author

Do you mean the hashing algorithm for LRU caching in functions (there is a lot of hashing)?

In Rust, HashMaps themselves can use an alternative hashing algorithm for speed or other reasons.
The default hasher is SipHash 1-3, which is DoS resistant but not fast. A sketch of swapping it out follows.
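
A minimal sketch of what swapping the hasher looks like (using the ahash crate as an example dependency):

```rust
use std::collections::HashMap;

// ahash::RandomState replaces the default SipHash 1-3 BuildHasher;
// the rest of the HashMap API is unchanged.
fn main() {
    let mut map: HashMap<&str, u32, ahash::RandomState> =
        HashMap::with_hasher(ahash::RandomState::new());
    map.insert("utf-8", 1);
    assert_eq!(map.get("utf-8"), Some(&1));
}
```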

Why do you think that sort_unstable() will be faster than sort()

This is what the documentation says:

When applicable, unstable sorting is preferred because it is generally faster than stable sorting and it doesn’t allocate auxiliary memory. See sort_unstable.

Also, sort() on Vec is a timsort variant and sort_unstable() is pdqsort.

Instability is only relevant if there is other data attached to the sort keys; at least for the sorted keys themselves it should yield the same result.

In case different results are not desirable, I think it would be a good idea to have a unit or integration test to verify the expected behavior, as sketched below.
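
A minimal sketch of such a test (the sample data is made up):

```rust
// For plain keys with no attached data, stable and unstable sorts
// must produce identical output.
#[test]
fn unstable_sort_matches_stable_sort_for_plain_keys() {
    let original = vec![5u32, 1, 4, 1, 3, 9, 2, 6];

    let mut stable = original.clone();
    stable.sort();

    let mut unstable = original;
    unstable.sort_unstable();

    assert_eq!(stable, unstable);
}
```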

@nickspring
Owner

OK, I will push sort_unstable() to the repo. Concerning the hashing function: we need to try changing it. There are a lot of hash operations because of the LRU cache, so it could help!

@nickspring
Owner

Hm, it looks like I tested it wrongly; unstable sort actually speeds things up by 10-20%.

@chris-ha458
Contributor Author

chris-ha458 commented Sep 26, 2023

Yes, I understand. I plan to work on hash function replacement at this time. Rayon will be implemented later, not a goal right now.
Originally posted by @nickspring in #8 (comment)

Some recommendations for hash functions:

  1. FxHash: I don't particularly like it, but I thought I'd share it, since it is often the first choice in this kind of situation.
  2. aHash: This is not stable (the hash values change from version to version) and only seeks to be the fastest.
  3. highway-rs: This is often overlooked, but it is very fast; it is not cryptographically certified, but it has gone through extensive testing and validation (the original paper and SMHasher). It also has sample code showing how one might use it:
```rust
use std::collections::HashMap;
use std::hash::BuildHasherDefault;
use highway::HighwayHasher;

fn main() {
    // A HashMap that hashes keys with HighwayHasher instead of the default SipHash.
    let mut map =
        HashMap::with_hasher(BuildHasherDefault::<HighwayHasher>::default());

    map.insert(1, 2);
    assert_eq!(map.get(&1), Some(&2));
}
```

If I had to choose, I would have chosen HighwayHash.
(No known hash vulnerabilities, unlike FxHash, aHash, xxHash, and wyhash. Much faster than any cryptographic hash and competitive with "good enough" non-cryptographic hashes. Pure idiomatic Rust implementation with SIMD intrinsic pathways.)

@nickspring
Owner

Actually, it's a good question. I tested the standard hasher on a big file and it was slow, so I added the blake3 hasher (blake3::hash). I'm not sure we need the fingerprint at all for comparison, as in the Python version; maybe we can just compare decoded_payload.

@chris-ha458
Contributor Author

#14 implements aHash. For pure speed with few other considerations, it is good enough.
It is an unstable hash, meaning there is no guarantee that hashes will be the same across platforms or versions.
This might become a problem if/when serde becomes a goal, but with the way the repo is used currently it won't be. We could implement HighwayHash or others when it does become a problem.

As such, further hasher investigation is lower priority in the short term.
To improve from here we would have to implement benchmarks comparing SipHash, FxHash, aHash, etc.; a sketch of what that could look like follows.
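
A minimal criterion-style sketch of such a benchmark (criterion as a dev-dependency, the key set, and fill_map are assumptions for illustration, not existing repo code):

```rust
// Hypothetical benchmark sketch; criterion and ahash are assumed dependencies.
use std::collections::HashMap;
use std::hash::BuildHasher;

use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Insert the same keys into a HashMap built with the given hasher.
fn fill_map<S: BuildHasher>(hasher: S, keys: &[String]) -> HashMap<&str, usize, S> {
    let mut map = HashMap::with_hasher(hasher);
    for (i, k) in keys.iter().enumerate() {
        map.insert(k.as_str(), i);
    }
    map
}

fn bench_hashers(c: &mut Criterion) {
    let keys: Vec<String> = (0..1000).map(|i| format!("key-{}", i)).collect();

    c.bench_function("siphash (std default)", |b| {
        b.iter(|| fill_map(std::collections::hash_map::RandomState::new(), black_box(&keys)))
    });
    c.bench_function("ahash", |b| {
        b.iter(|| fill_map(ahash::RandomState::new(), black_box(&keys)))
    });
}

criterion_group!(benches, bench_hashers);
criterion_main!(benches);
```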

@nickspring
Owner

Hashing is used only for caching, so it doesn't matter that it's unstable. Also, we don't serialize hashes via serde.
I chose aHash as it's already implemented in the cached library as a feature.

@nickspring
Owner

I did some experiments with Rayon. At this time it looks like it has high overhead, so I plan to try to rewrite the mess measurement code in a more efficient way (I have some ideas).

@chris-ha458
Contributor Author

I was looking into the two main loops in lib.rs, especially the named loops 'iana_encodings_loop and 'chunks_loop.

Unfortunately, as they are written, it does not seem like they can easily be turned into functional Rayon code (if at all); a sketch of the mismatch follows below.

I agree that some other parts of the code are not good candidates for Rayon due to overhead. Hope you find a good approach!
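
To illustrate the mismatch (a hedged, made-up loop shape, not the actual lib.rs code): control flow like continue 'label across loop levels is inherently sequential, and rayon's early-exit searches change or complicate the "first acceptable result, in order" semantics:

```rust
use rayon::prelude::*;

// Hypothetical shapes, not the real lib.rs loops.
fn first_ok_sequential(encodings: &[&str]) -> Option<String> {
    'encodings_loop: for enc in encodings {
        for chunk_len in 0..4usize {
            if enc.len() + chunk_len > 32 {
                // Cross-level control flow: skip to the next encoding.
                continue 'encodings_loop;
            }
        }
        // First acceptable encoding, in iteration order.
        return Some(enc.to_string());
    }
    None
}

fn any_ok_parallel(encodings: &[&str]) -> Option<String> {
    // find_any returns *some* acceptable encoding, not necessarily the
    // first in order; find_first restores order at extra coordination cost.
    encodings
        .par_iter()
        .find_any(|enc| (0..4usize).all(|chunk_len| enc.len() + chunk_len <= 32))
        .map(|enc| enc.to_string())
}
```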

@chris-ha458
Contributor Author

I'm exploring integrating encoding_rs instead of rust-encoding.

rust-encoding seems to be abandoned and unmaintained; encoding_rs is better supported.
However, they work in fundamentally different ways, and encoding_rs does not support many of the current constructs.

It would obviously take some time.
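
For reference, a minimal sketch of the encoding_rs calling convention (generic encoding_rs usage, not an integration plan for this repo):

```rust
use encoding_rs::Encoding;

fn main() {
    // Look up an encoding by its WHATWG label, then decode bytes in one shot.
    let encoding = Encoding::for_label(b"shift_jis").expect("unknown label");
    let bytes = [0x83, 0x65, 0x83, 0x58, 0x83, 0x67]; // "テスト" in Shift_JIS
    let (text, _actual_encoding, had_errors) = encoding.decode(&bytes);
    assert!(!had_errors);
    assert_eq!(text, "テスト");
}
```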

@nickspring
Owner

I'm exploring integrating encoding_rs instead of rust-encoding.

rust-encoding seems to be abandoned and unmaintained; encoding_rs is better supported. However, they work in fundamentally different ways, and encoding_rs does not support many of the current constructs.

It would obviously take some time.

Yes, it doesn't fit the concept of this library.

@nickspring
Owner

It looks like the new mess detector approach is faster: #31

--> A) charset-normalizer-rs Conclusions
   --> Accuracy: 97.1%
   --> Total time: 1.147958363s
   --> Avg time: 2.806743ms
   --> 50th: 1.044498ms
   --> 95th: 8.228578ms
   --> 99th: 19.302748ms

------------------------------
--> B) chardet Conclusions
   --> Faster than charset-normalizer-rs by 0.6 times
   --> Accuracy: 82.6%
   --> Total time: 2.083252561s
   --> Avg time: 5.093526ms
   --> 50th: 331.699µs
   --> 95th: 2.515793ms
   --> 99th: 16.465157ms

------------------------------
--> C) chardetng Conclusions
   --> Faster than charset-normalizer-rs by 0.9 times
   --> Accuracy: 90.7%
   --> Total time: 1.253477517s
   --> Avg time: 3.064736ms
   --> 50th: 821.598µs
   --> 95th: 8.578877ms
   --> 99th: 24.627836ms

However, I broke a lot of idiomatic code in ms.rs.
Tomorrow I will clean up the code (remove unused functions, rewrite tests for the new mess detector, refactor).

@chris-ha458
Contributor Author

From the list, only Rayon remains.
Unfortunately, the current implementation is streamlined enough that naive parallelization is not helpful.

If there are any ideas, let me know. In the meantime, feel free to close this issue.
I think it would become relevant again when there is a major change in this repo or in any upstream dependencies (most of which are also highly optimized).

@nickspring
Owner

Yes, I have an idea: to refactor and speed up the cd.rs module. I think it's possible.
