SIMD vectorization behind a feature flag #120
It would be interesting to see hermetic tests done for that vs iteration with a LUT. You need to:
And then you still need to do something about the remaining
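For comparison, the "iteration with a LUT" baseline can be sketched as a 256-entry table that marks the bytes ending a fast string scan (quote, backslash and control characters), consulted once per byte. This is a minimal illustration; `STOP` and `scan_lut` are hypothetical names, not json-rust's actual code.

```rust
/// Hypothetical lookup table: `true` for bytes that end the fast string scan
/// (`"`, `\` and control characters below 0x20).
const STOP: [bool; 256] = {
    let mut table = [false; 256];
    let mut i = 0;
    while i < 0x20 {
        table[i] = true;
        i += 1;
    }
    table[b'"' as usize] = true;
    table[b'\\' as usize] = true;
    table
};

/// Returns the index of the first byte at or after `start` that needs
/// attention, or `bytes.len()` if there is none.
fn scan_lut(bytes: &[u8], start: usize) -> usize {
    let mut i = start;
    // One table lookup per byte replaces three separate comparisons.
    while i < bytes.len() && !STOP[bytes[i] as usize] {
        i += 1;
    }
    i
}
```

For example, `scan_lut(br#"hello world" tail"#, 0)` returns 11, the index of the quote.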
I made a very rough test in json-rust, using this as test data:
`{"foo":"This is a super long string, omg so long, why is it so long? It is so long because it needs to contain many multiplications of 16 bytes in order for us to get any benefits from SIMD at all! That's why! It's kind of annoying like that, but hey, what can you do..."}`
Without SIMD:
With SIMD:
This is pretty much the best-case scenario for this kind of thing, but it works. The implementation is pretty simple: before I start the loop that checks bytes one by one, I do this:

```rust
// Skip whole 16-byte chunks that contain no control character (CT),
// backslash (BS) or quote (QU); the byte-by-byte loop handles the rest.
// `>> 4` divides the remaining length by 16 to get the number of full chunks.
for _ in 0 .. ($parser.length - $parser.index) >> 4 {
    let bytes = simd::u8x16::load($parser.source.as_bytes(), $parser.index);
    if (bytes.lt(CT_SIMD) | bytes.eq(BS_SIMD) | bytes.eq(QU_SIMD)).any() {
        break;
    }
    $parser.index += 16;
}
```

I'm going to run this against json-benchmark and will come back with numbers.
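The loop above depends on the old `simd` crate. A roughly equivalent sketch on stable Rust, assuming x86_64 and the SSE2 intrinsics in `std::arch::x86_64` (SSE2 is part of the x86_64 baseline), could look like this; `scan_simd` is a hypothetical name. SSE2 only has signed byte comparisons, so the control-character test uses the `max_epu8(x, 0x1F) == 0x1F` trick for an unsigned `x <= 0x1F`. Unlike the loop above, it uses `_mm_movemask_epi8` to return the exact position of the first special byte instead of breaking out to the byte loop.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns the index of the first `"`, `\` or control byte (< 0x20) in `bytes`,
/// or `bytes.len()` if there is none.
fn scan_simd(bytes: &[u8]) -> usize {
    let mut i = 0;
    // SSE2 is part of the x86_64 baseline, so no runtime feature check is needed.
    #[cfg(target_arch = "x86_64")]
    unsafe {
        let quote = _mm_set1_epi8(b'"' as i8);
        let backslash = _mm_set1_epi8(b'\\' as i8);
        let ctrl_max = _mm_set1_epi8(0x1F);
        while i + 16 <= bytes.len() {
            // Unaligned load: `i` may be any offset into the slice.
            let chunk = _mm_loadu_si128(bytes.as_ptr().add(i) as *const __m128i);
            let is_quote = _mm_cmpeq_epi8(chunk, quote);
            let is_bs = _mm_cmpeq_epi8(chunk, backslash);
            // Unsigned `chunk <= 0x1F`, computed as max(chunk, 0x1F) == 0x1F.
            let is_ctrl = _mm_cmpeq_epi8(_mm_max_epu8(chunk, ctrl_max), ctrl_max);
            // One bit per byte lane; a set bit marks a byte that needs attention.
            let mask = _mm_movemask_epi8(_mm_or_si128(_mm_or_si128(is_quote, is_bs), is_ctrl));
            if mask != 0 {
                // The lowest set bit is the first special byte in this chunk.
                return i + mask.trailing_zeros() as usize;
            }
            i += 16;
        }
    }
    // Byte-by-byte loop for the last < 16 bytes (or the whole input elsewhere).
    while i < bytes.len() {
        let b = bytes[i];
        if b == b'"' || b == b'\\' || b < 0x20 {
            return i;
        }
        i += 1;
    }
    i
}
```

On other targets the `#[cfg]`-gated block is compiled out and only the byte loop runs.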
Same here, even a little slower. It may be that unaligned u8x16 loads are slow: you don't have the part that establishes 16-byte alignment. Establishing alignment first would also mitigate the impact on parsing short strings. I would be interested in benchmarking two variants of your code: one that aligns to 16 bytes before the SIMD loop, like in RapidJSON, and one that goes 16 bytes past the first 16-byte alignment before starting SIMD. It is all about balancing a plain byte loop for short strings (which are most strings) against SIMD for long strings (which are slow otherwise).
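The two variants proposed above differ only in how many bytes the scalar prologue consumes before aligned 16-byte loads (`_mm_load_si128`) take over. Below is a minimal sketch, assuming x86_64 with SSE2 through `std::arch` (other targets just run the byte loop); `scan_aligned` and its `extra` parameter (0 for "align first", 16 for "align, then one more block") are hypothetical names, not json-rust code.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

fn is_special(b: u8) -> bool {
    b == b'"' || b == b'\\' || b < 0x20
}

/// Hypothetical helper: index of the first `"`, `\` or control byte, or
/// `bytes.len()`. `extra` must be 0 (start aligned SIMD at the first 16-byte
/// boundary) or 16 (scan one more block byte by byte first).
fn scan_aligned(bytes: &[u8], extra: usize) -> usize {
    assert_eq!(extra % 16, 0, "extra must keep the SIMD loop 16-byte aligned");
    let addr = bytes.as_ptr() as usize;
    // Distance from the start of the string to the next 16-byte boundary.
    let to_boundary = (16 - addr % 16) % 16;
    let prologue = (to_boundary + extra).min(bytes.len());
    let mut i = 0;
    // Scalar prologue: cheap for short strings, and it aligns `i` for the SIMD loop.
    while i < prologue {
        if is_special(bytes[i]) {
            return i;
        }
        i += 1;
    }
    // If this loop runs at all, `addr + i` is a multiple of 16, so the
    // aligned load `_mm_load_si128` is valid.
    #[cfg(target_arch = "x86_64")]
    unsafe {
        let quote = _mm_set1_epi8(b'"' as i8);
        let backslash = _mm_set1_epi8(b'\\' as i8);
        let ctrl_max = _mm_set1_epi8(0x1F);
        while i + 16 <= bytes.len() {
            let chunk = _mm_load_si128(bytes.as_ptr().add(i) as *const __m128i);
            let hits = _mm_or_si128(
                _mm_or_si128(_mm_cmpeq_epi8(chunk, quote), _mm_cmpeq_epi8(chunk, backslash)),
                // Unsigned `chunk <= 0x1F` via max(chunk, 0x1F) == 0x1F.
                _mm_cmpeq_epi8(_mm_max_epu8(chunk, ctrl_max), ctrl_max),
            );
            let mask = _mm_movemask_epi8(hits);
            if mask != 0 {
                return i + mask.trailing_zeros() as usize;
            }
            i += 16;
        }
    }
    // Scalar tail for the remaining < 16 bytes.
    while i < bytes.len() {
        if is_special(bytes[i]) {
            return i;
        }
        i += 1;
    }
    i
}
```

With `extra = 16`, a string must be at least 32 bytes long (up to 47, depending on where it starts in memory) before the SIMD loop runs at all, which is the short-string trade-off described above.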
I think the simd lib does the alignment via the `repr` annotation:

```rust
/// A SIMD vector of 16 `u8`s.
#[repr(simd)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Copy)]
pub struct u8x16(u8, u8, u8, u8, u8, u8, u8, u8,
                 u8, u8, u8, u8, u8, u8, u8, u8);
```

I've only been catching up with low-level stuff this year, so things like byte alignment are still semi-exotic to me. I've only looked this up, and the author of the answer is the author of the simd implementation.
I am talking about alignment of
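The distinction can be made concrete: `#[repr(simd)]` gives the vector type itself 16-byte alignment (as `#[repr(align(16))]` would), so a vector value stored in a local variable is aligned, but the address a load reads from depends on where the string sits in memory and on the current index. A small sketch with a hypothetical `U8x16` wrapper standing in for the `simd` crate's type:

```rust
/// Hypothetical stand-in for the `simd` crate's 16-byte vector type.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct U8x16([u8; 16]);

/// True if `p` lies on a 16-byte boundary (required for aligned loads).
fn is_16_aligned(p: *const u8) -> bool {
    p as usize % 16 == 0
}

/// Copies the 16 bytes starting at `index` into a vector value. The value is
/// 16-byte aligned because of its type; the source address is whatever
/// `src.as_ptr() + index` happens to be.
fn load_at(src: &[u8], index: usize) -> U8x16 {
    let mut lanes = [0u8; 16];
    lanes.copy_from_slice(&src[index..index + 16]);
    U8x16(lanes)
}
```

In other words, `align_of::<U8x16>()` is 16, but only one of any 16 consecutive starting offsets into a byte buffer lands on a 16-byte boundary.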
Gotcha! I assume casting the pointer to `usize`, taking it modulo 16, and then running over that many bytes with the regular check should be sufficient.
Yes. And then for the other test, that many bytes + 16 more.
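One detail worth pinning down: the number of bytes to check one at a time is the distance to the next multiple of 16, `(16 - addr % 16) % 16`, not `addr % 16` itself (the two only agree when `addr % 16` is 0 or 8). The standard library's `pointer::align_offset` computes the same distance, but its documentation allows a `usize::MAX` result, so explicit arithmetic is used here. A sketch of the prologue length for both variants; `prologue_len` is a hypothetical helper:

```rust
/// Hypothetical helper: how many bytes to check one at a time before aligned
/// 16-byte loads can start, for a string starting at address `addr` with
/// length `len`. `extra` is 0 for "align as early as possible" or 16 for
/// "that many bytes + 16 more". The result never exceeds `len`.
fn prologue_len(addr: usize, len: usize, extra: usize) -> usize {
    // `addr % 16` is how far past the previous boundary we are; the distance
    // to the next boundary is its complement, with 0 staying 0.
    let to_boundary = (16 - addr % 16) % 16;
    (to_boundary + extra).min(len)
}
```

For a string at address `0x1003`, `prologue_len(0x1003, 100, 0)` is 13 and `prologue_len(0x1003, 100, 16)` is 29; in both cases `0x1003` plus the prologue is a multiple of 16.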
I don't seem to get any statistically significant difference. I increased the string length to 3155 characters. No SIMD:
SIMD, no alignment:
SIMD, with alignment:
SIMD, with alignment offset by 1:
Pushed to
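Ok, got it, seems

```rust
// Copy 16 bytes from the current position into the vector. The byte-wise
// copy places no alignment requirement on the source address.
let mut bytes = mem::MaybeUninit::<simd::u8x16>::uninit();
ptr::copy_nonoverlapping(
    $parser.byte_ptr.offset($parser.index as isize),
    bytes.as_mut_ptr() as *mut u8,
    16
);
// All 16 bytes have been written, so the vector is fully initialized.
let bytes = bytes.assume_init();
```

Now I get the expected results. With alignment:
With alignment offset by 1:
The difference isn't staggering, but it's there.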
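Outside the macro, the copy-based load from the snippet above can be written with `MaybeUninit` (the replacement for the deprecated `mem::uninitialized`) plus `ptr::copy_nonoverlapping`, or more compactly with `ptr::read_unaligned`. Neither requires the source to be 16-byte aligned, and with optimizations both typically become a single unaligned load. A sketch with a hypothetical aligned `U8x16` stand-in for the `simd` crate's vector:

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical stand-in for a 16-byte SIMD vector (16-byte aligned type).
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
struct U8x16([u8; 16]);

/// Copies 16 bytes starting at `index` into a vector, byte by byte.
fn load_copy(src: &[u8], index: usize) -> U8x16 {
    assert!(index + 16 <= src.len(), "out of bounds");
    let mut out = MaybeUninit::<U8x16>::uninit();
    unsafe {
        // Byte-wise copy: no alignment requirement on the source address.
        ptr::copy_nonoverlapping(src.as_ptr().add(index), out.as_mut_ptr() as *mut u8, 16);
        // All 16 bytes of `out` have now been written.
        out.assume_init()
    }
}

/// The same load expressed as a single unaligned read.
fn load_read_unaligned(src: &[u8], index: usize) -> U8x16 {
    assert!(index + 16 <= src.len(), "out of bounds");
    unsafe { ptr::read_unaligned(src.as_ptr().add(index) as *const U8x16) }
}
```

The `MaybeUninit` form keeps the shape of the original snippet; `read_unaligned` says the same thing in one call.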
Wow, that's interesting!
SIMD is on track for stabilization 🎉
Has anyone done anything with this?
This recently became a thing: https://github.com/lemire/simdjson
I ported simdjson to Rust, and the performance boost SIMD gives is significant. Even skipping the tape tricks and adopting a more Rust-ergonomic data representation, it is up to 300% faster than vanilla Rust serde_json.
How has the discussion progressed so far? It's been 7 years already.
sonic-rs also shows promising results on json-benchmark. It's developed by ByteDance, and its APIs are very similar to serde_json's.
RapidJSON does quite a lot of this.