Optimize a scan of non-state-changing bytes with SSE2 instructions · nodejs/llparse@71da0d6

Optimize a scan of non-state-changing bytes with SSE2 instructions

This commit optimizes the scan of non-state-changing bytes using SSE2 instructions.

A [_mm_cmpestri](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cmpestri) operation appears to be quite slow
compared to an alternative approach that applies [_mm_shuffle_epi8](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shuffle_epi8)
to the low and high nibbles of the input and combines the two results with a bitwise AND, producing 16 LUT bytes in one go (it also involves a handful of other SSE2 operations,
all of which have good latency/throughput properties). The resulting 16 LUT bytes can then be analyzed (also vectorized) to get the index of the first byte, if any,
that changes the state. This is done by finding the first byte that maps to zero in the LUT.
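The lookup described above can be modeled in scalar form. The following is a minimal Python sketch, not the generated C code: the tables `A` and `B`, the function name `first_state_changing`, and the choice of `\r` and `\n` as the only state-changing bytes are all assumptions made for illustration. It assumes the two per-nibble lookups are combined with a bitwise AND, as the description states.

```python
# Hypothetical tables for an example where only '\r' (0x0D) and '\n' (0x0A)
# change the state. A is indexed by the low nibble, B by the high nibble.
A = [3] * 16
A[0xA] = A[0xD] = 2
B = [2] * 16
B[0] = 1


def first_state_changing(chunk: bytes) -> int:
    """Return the index of the first byte whose combined lookup is zero,
    or len(chunk) if no byte in the chunk changes the state."""
    assert len(chunk) <= 16  # one 128-bit register holds 16 bytes

    # Scalar stand-in for two nibble shuffles followed by a bitwise AND.
    looked_up = [A[b & 0x0F] & B[b >> 4] for b in chunk]

    # Build a bit mask with bit i set when byte i looked up to zero.
    # In vector code this would plausibly be a compare-with-zero followed
    # by a move-mask; that mapping is an assumption of this sketch.
    mask = 0
    for i, value in enumerate(looked_up):
        if value == 0:
            mask |= 1 << i

    if mask == 0:
        return len(chunk)
    # Position of the lowest set bit (a count-trailing-zeros operation).
    return (mask & -mask).bit_length() - 1


header_chunk = b"Host: example\r\n!"
plain_chunk = b"abcdefghijklmnop"
```

Here `first_state_changing(header_chunk)` locates the `\r` byte, while `first_state_changing(plain_chunk)` returns the chunk length, signalling that all 16 bytes can be skipped.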

The tricky part here is the following:

```
Find arrays A, B (uint8_t[16]) such that
* (A[i] & B[j]) == 0 if LUT[i | (j << 4)] == 0
* (A[i] & B[j]) != 0 if LUT[i | (j << 4)] != 0 // Note: no specific non-zero value is needed
for all i, j = 0..15.
```

To find `A` and `B` satisfying the above conditions, the [Z3](https://github.com/Z3Prover/z3) solver is used.
The npm package that wraps Z3 for use in TypeScript exposes an async API that the author of this change did not find particularly friendly, so another package (synckit)
was required to handle the async API of the z3 wrapper.
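To make the search problem concrete, here is a minimal Python sketch of a simple constructive approach. It is not the Z3-based search the commit uses, and the names `find_tables` and `satisfies` are hypothetical. It assumes the AND combination described earlier and gives each distinct high-nibble row pattern its own bit, so it only succeeds when there are at most 8 distinct non-empty patterns. A solver can also find assignments where one row uses several bits, which this sketch never tries.

```python
def find_tables(lut):
    """Try to build A (low nibble) and B (high nibble) for a 256-entry LUT.

    Returns (A, B), or None when more than 8 distinct non-empty row
    patterns exist and this simple one-bit-per-pattern scheme gives up."""
    A = [0] * 16
    B = [0] * 16
    bit_of_pattern = {}
    for hi in range(16):
        # Which low nibbles are non-zero in the LUT for this high nibble.
        pattern = tuple(lut[lo | (hi << 4)] != 0 for lo in range(16))
        if not any(pattern):
            continue  # B[hi] stays 0, so every byte in this row maps to 0
        if pattern not in bit_of_pattern:
            if len(bit_of_pattern) == 8:
                return None  # out of bits in a uint8_t
            bit_of_pattern[pattern] = 1 << len(bit_of_pattern)
        bit = bit_of_pattern[pattern]
        B[hi] = bit
        for lo in range(16):
            if pattern[lo]:
                A[lo] |= bit
    return A, B


def satisfies(lut, A, B):
    """Check the zero/non-zero condition for all i, j = 0..15."""
    return all(
        ((A[i] & B[j]) != 0) == (lut[i | (j << 4)] != 0)
        for i in range(16)
        for j in range(16)
    )


# Example LUT: every byte keeps the state except '\n' (0x0A) and '\r' (0x0D).
lut = [1] * 256
lut[0x0A] = lut[0x0D] = 0
A, B = find_tables(lut)
```

Running `satisfies(lut, A, B)` on the result confirms that the pair reproduces the zero pattern of the original 256-entry table.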

Using llhttp as a benchmark framework, this change yields the following improvements:

```
Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz

http: "seanmonstar/httparse" (C)
BEFORE: 8192.00 mb | 1456.72 mb/s | 2172811.81 ops/sec | 5.62 s
AFTER:  8192.00 mb | 1752.90 mb/s | 2614577.82 ops/sec | 4.67 s

~20% improvement

http: "nodejs/http-parser" (C)
BEFORE: 8192.00 mb | 1050.60 mb/s | 2118535.14 ops/sec | 7.80 s
AFTER:  8192.00 mb | 1167.42 mb/s | 2354101.76 ops/sec | 7.02 s

~11% improvement
```

For messages with more header fields, the numbers might be even more convincing.
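The improvement percentages in the benchmark output can be re-derived from the mb/s column. The short Python sketch below does that arithmetic; the helper name `improvement_percent` is hypothetical, and the inputs are copied from the BEFORE/AFTER lines above.

```python
def improvement_percent(before_mb_s, after_mb_s):
    """Relative throughput gain, in percent."""
    return (after_mb_s / before_mb_s - 1.0) * 100.0


# Throughput values (mb/s) taken from the benchmark output.
httparse = improvement_percent(1456.72, 1752.90)
http_parser = improvement_percent(1050.60, 1167.42)

print(f"seanmonstar/httparse: {httparse:.1f}%")
print(f"nodejs/http-parser:   {http_parser:.1f}%")
```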

ngrodzitski committed Oct 10, 2023

1 parent 4d7e352 commit 71da0d6

0 comments on commit `71da0d6`
