New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: implementation of SSE 4.2 compatible parsing (incl. utf8) #30
Conversation
Hi Sunny, For encapsulation given that it will span multiple modules we might want to have a avx and a sse42 sub module but that would be some housekeeping not a blocker :). It’s just a bit of thinking ahead for later since arm would also be a good target later on and with 3 different targets sub modules become a little cleaner. The conditional compilation can be a bit tricky ( and require some experimentation ) but you are on the right track with guarding the mod line. If you don’t have access to a avx capable host I can set you up with a user on a test box of mine. |
9524b04
to
69dacad
Compare
@Licenser thank you so much for the directions - I've done a bit of tweaking to demonstrate the parts that are avx and sse specific. I'm still new to rust modules, so I wasn't able to get the code organized exactly like I hoped with conditional compilation. I was able to hack it by using a symlink of I was hoping maybe you would know what to do when you see the patterns between the |
So there is a trick of getting a list of target features. For me the command would be:
For us with that knowledge we can use pre compiler guards to handle dependant compilation. if you have say stage1.rs
Note: not tested since I'm traveling and don't have full access to all my computers, sorry about that. |
Back from the travels, if you want any help with the dependant compilation I can jump in and make a PR against your branch with an example but it's up to you -I don't want to steal the learning opportunity. |
@Licenser oh nice! I am just hopping back onto this now & should have a revised PR for you in the next 2-3h if all goes well. I love the learning and I'll take any suggestions you have as well - the goal is to get it into a form you feel proud to merge... |
69dacad
to
e5e4b18
Compare
src/avx2_deser.rs
Outdated
|
||
impl<'de> Deserializer<'de> { | ||
#[cfg_attr(not(feature = "no-inline"), inline(always))] | ||
pub fn parse_str_(&mut self) -> Result<&'de str> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parse_str_ is the only method currently platform-specific in Deserializer (made it public to ease testing, might not have to be that way)
src/avx2_stage1.rs
Outdated
@@ -9,6 +9,8 @@ use std::arch::x86_64::*; | |||
|
|||
use std::mem; | |||
|
|||
pub const SIMDJSON_PADDING: usize = mem::size_of::<__m256i>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made constant architecture-specific, even though values are same
src/sse42_deser.rs
Outdated
|
||
impl<'de> Deserializer<'de> { | ||
#[cfg_attr(not(feature = "no-inline"), inline(always))] | ||
pub fn parse_str_(&mut self) -> Result<&'de str> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parse_str_ is the only method currently platform-specific in Deserializer (made it public to ease testing, might not have to be that way)
src/sse42_utf8check.rs
Outdated
*/ | ||
|
||
// all byte values must be no larger than 0xF4 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this implementation is largely the same as the AVX2 version, just half-width
src/sse42_utf8check.rs
Outdated
|
||
// all byte values must be no larger than 0xF4 | ||
#[cfg_attr(not(feature = "no-inline"), inline)] | ||
fn avxcheck_smaller_than_0xf4(current_bytes: __m128i, has_error: &mut __m128i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the avx* names are a misnomer, I held off on a renaming party (for now) thinking we'd need to change both places (AVX2 and SSE42)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ja with rust we have modules so we don't really need function prefixes :) good catch!
@Licenser ok, this is ready for you to take a look -- let me know what you think, thank you again for the tips! I'm still stuck a tiny bit on the conditional compilation -- I think it's close, just needs sensible defaulting so things like Interesting thing -- benchmarks tended to be within 10-20% of AVX2 (good news, "slow" is still fast on slow platforms; bad news, "fast" could probably be faster on fast platforms) |
Laptop benchmarks (not the best, but slightly interesting) AVX2
SSE42
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It all in all looks great! I would restructure it a bit to make it more rusty and less cy.
Basically add a avx2.rs
and move:
avx2_deser.rs
->avx2/deser.rs
avx2_stage1.rs
->avx2/stage1.rs
avx2_utf8check.rs
->avx2/utf8check.rs
(and the same for sse42 files)
I think that would clean the file structure up a bit and contain the different implementations to the degree that we only need a single compilation dependant line (or two) in lib.rs
to include either the sse42 or the avx2 version.
That plus moving from crate features to cpu features and I think this is done. :)
Cargo.toml
Outdated
@@ -46,6 +46,10 @@ harness = false | |||
|
|||
[features] | |||
default = ["swar-number-parsing", "serde_impl"] | |||
# AVX2 compatibility -- TODO, figure out default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those should not be features they are architecture dependancies. rust exposes them already and will pick the 'right' one depending on the compilation target.
Exposing them as features can lead to comilations that are either less performat when rust uses polyfills for them or straight out won't run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://doc.rust-lang.org/reference/conditional-compilation.html is the documentation. I think the right set would be:
#[cfg(feature = "avx2")]
and #[cfg(all(not(feature = "avx2"), feature = "sse4.2")]
(we need to exclude avx on the second one since avx2 CPUs usually (always?) support sse4.2 as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I will through it to 24h of fuzzing just to be sure and merge it if nothing breaks!
@Licenser thank you again! (snuck one commit by you for the generator compile-time constant) |
To make it compile on the test bench I had to add the following changes. Can we include them? Other then that the fuzzer is running :) diff --git a/simd-fuzz-target/Makefile b/simd-fuzz-target/Makefile
index 4d3c548..061b8bb 100644
--- a/simd-fuzz-target/Makefile
+++ b/simd-fuzz-target/Makefile
@@ -1,7 +1,11 @@
run: build
- RUSTFLAGS='-C codegen-units=1' cargo +nightly afl fuzz -i in -o out target/debug/simd-fuzz-target
+ RUSTFLAGS='-C codegen-units=1 -C target-cpu=native' cargo +nightly afl fuzz -i in -o out target/debug/simd-fuzz-target
build:
RUSTFLAGS='-C codegen-units=1' cargo +nightly afl build
+run-sse: build-sse
+ RUSTFLAGS='-C codegen-units=1 -C target-cpu=native -C target-feature=-avx2' cargo +nightly afl fuzz -i in -o out target/debug/simd-fuzz-target
+build-sse:
+ RUSTFLAGS='-C codegen-units=1 -C target-cpu=native -C target-feature=-avx2' cargo +nightly afl build
copy:
for from in `ls out/crashes/id*`; do to=`echo $$from | sed -e 's;out/crashes/id:;crash;' -e 's;,.*;.json;'`; cp $$from ../simdjson-rs/data/crash/$$to; done
diff --git a/src/lib.rs b/src/lib.rs
index 1723909..2516a0e 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -78,31 +78,31 @@ mod charutils;
#[macro_use]
mod macros;
mod error;
-mod stringparse;
mod numberparse;
mod parsedjson;
mod portability;
+mod stringparse;
#[cfg(target_feature = "avx2")]
mod avx2;
#[cfg(target_feature = "avx2")]
-const SIMDJSON_PADDING : usize = crate::avx2::stage1::SIMDJSON_PADDING;
-#[cfg(target_feature = "avx2")]
pub use crate::avx2::deser::*;
+#[cfg(target_feature = "avx2")]
+use crate::avx2::stage1::SIMDJSON_PADDING;
#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
mod sse42;
#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
-const SIMDJSON_PADDING : usize = crate::sse42::stage1::SIMDJSON_PADDING;
-#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
pub use crate::sse42::deser::*;
+#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
+use crate::sse42::stage1::SIMDJSON_PADDING;
mod stage2;
pub mod value;
use crate::numberparse::Number;
-use std::str;
use std::mem;
+use std::str;
pub use crate::error::{Error, ErrorType};
pub use crate::value::*;
diff --git a/src/stage2.rs b/src/stage2.rs
index 29ebe04..7bbb1da 100644
--- a/src/stage2.rs
+++ b/src/stage2.rs
@@ -1,7 +1,10 @@
#![allow(dead_code)]
+#[cfg(target_feature = "avx2")]
+use crate::avx2::stage1::SIMDJSON_PADDING;
use crate::charutils::*;
-use crate::{Deserializer, Error, ErrorType, Result, SIMDJSON_PADDING};
-//use crate::portability::*;
+#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
+use crate::sse42::stage1::SIMDJSON_PADDING;
+use crate::{Deserializer, Error, ErrorType, Result};
#[cfg_attr(not(feature = "no-inline"), inline(always))]
pub fn is_valid_true_atom(loc: &[u8]) -> bool { |
@Licenser thanks again - patch applied (modulo one trailing whitespace char after a colon character in the makefile)! |
@@ -102,7 +107,7 @@ pub trait BaseGenerator { | |||
// quote characters that gives us a bitmask of 0x1f for that | |||
// region, only quote (`"`) and backslash (`\`) are not in | |||
// this range. | |||
if is_x86_feature_detected!("avx2") { | |||
if AVX2_PRESENT { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even easier, put this around the block:
#[cfg(target_feature = "avx2")]
{
...
}
That way the code doesn't even get generated when we don't have avx2 present :)
Thanks! Some great work :D |
Greetings! Thank you so much for your amazing work with
simdjson-rs.
I did some work earlier this year on an SSE 4.2 port of simdjson,
and I'd love to gain more experience with Rust.
I humbly submit this code for your comment & consideration. I'm
not an expert in Rust feature detection & conditional compilation,
but hopefully this code gives an idea of what the SSE-compatible
code looks like.
If you think it is an interesting idea, I'd be happy to do whatever
work to get it in shape for possible merging.
The items that are still noticeable TODO:
Thank you again for your consideration!
Sincerely,
-Sunny Gleason