RFC: implementation of SSE 4.2 compatible parsing (incl. utf8) #30

sunnygleason · 2019-07-19T23:13:15Z

Greetings! Thank you so much for your amazing work with
simdjson-rs.

I did some work earlier this year on an SSE 4.2 port of simdjson,
and I'd love to gain more experience with Rust.

I humbly submit this code for your comment & consideration. I'm
not an expert in Rust feature detection & conditional compilation,
but hopefully this code gives an idea of what the SSE-compatible
code looks like.

If you think it is an interesting idea, I'd be happy to do whatever
work to get it in shape for possible merging.

The notable remaining TODO items:

utf8 validation (can port SSE version for conditional usage)
conditional compilation/feature detection

Thank you again for your consideration!

Sincerely,

-Sunny Gleason

Licenser · 2019-07-20T06:15:43Z

Hi Sunny,
That’s a great idea! Making simdjson-rs usable on more platforms is an absolute improvement. I’ll be traveling today so I’m restricted to flaky internet and a tablet so I can’t really try the code but from what I’ve seen it looks good.

For encapsulation, given that it will span multiple modules, we might want to have an avx and an sse42 submodule, but that would be some housekeeping, not a blocker :). It’s just a bit of thinking ahead, since ARM would also be a good target later on, and with 3 different targets submodules become a little cleaner.

The conditional compilation can be a bit tricky (and require some experimentation), but you are on the right track with guarding the mod line.

If you don’t have access to an AVX-capable host I can set you up with a user on a test box of mine.

sunnygleason · 2019-07-23T02:17:19Z

@Licenser thank you so much for the directions - I've done a bit of tweaking to demonstrate the parts that are avx and sse specific. I'm still new to rust modules, so I wasn't able to get the code organized exactly like I hoped with conditional compilation. I was able to hack it by using a symlink of lib.rs to the appropriate architecture-specific version.

I was hoping maybe you would know what to do when you see the patterns between the _avx2 and _sse42 versions of the files. Thank you again for your consideration, I look forward to any pointers you might have re: the modularization!

Licenser · 2019-07-23T12:24:12Z

So there is a trick of getting a list of target features. For me the command would be:

 $ rustc --target=x86_64-apple-darwin --print target-features

Available features for this target:
    16bit-mode                    - 16-bit mode (i8086).
    32bit-mode                    - 32-bit mode (80386).
    3dnow                         - Enable 3DNow! instructions.
    3dnowa                        - Enable 3DNow! Athlon instructions.
    64bit                         - Support 64-bit instructions.
    64bit-mode                    - 64-bit mode (x86_64).
    adx                           - Support ADX instructions.
    aes                           - Enable AES instructions.
    atom                          - Intel Atom processors.
    avx                           - Enable AVX instructions.
    avx2                          - Enable AVX2 instructions.
    avx512bitalg                  - Enable AVX-512 Bit Algorithms.
    avx512bw                      - Enable AVX-512 Byte and Word Instructions.
    avx512cd                      - Enable AVX-512 Conflict Detection Instructions.
    avx512dq                      - Enable AVX-512 Doubleword and Quadword Instructions.
    avx512er                      - Enable AVX-512 Exponential and Reciprocal Instructions.
    avx512f                       - Enable AVX-512 instructions.
    avx512ifma                    - Enable AVX-512 Integer Fused Multiple-Add.
    avx512pf                      - Enable AVX-512 PreFetch Instructions.
    avx512vbmi                    - Enable AVX-512 Vector Byte Manipulation Instructions.
    avx512vbmi2                   - Enable AVX-512 further Vector Byte Manipulation Instructions.
    avx512vl                      - Enable AVX-512 Vector Length eXtensions.
    avx512vnni                    - Enable AVX-512 Vector Neural Network Instructions.
    avx512vpopcntdq               - Enable AVX-512 Population Count Instructions.
    bmi                           - Support BMI instructions.
    bmi2                          - Support BMI2 instructions.
    cldemote                      - Enable Cache Demote.
    clflushopt                    - Flush A Cache Line Optimized.
    clwb                          - Cache Line Write Back.
    clzero                        - Enable Cache Line Zero.
    cmov                          - Enable conditional move instructions.
    cx16                          - 64-bit with cmpxchg16b.
    ermsb                         - REP MOVS/STOS are fast.
    f16c                          - Support 16-bit floating point conversion instructions.
    false-deps-lzcnt-tzcnt        - LZCNT/TZCNT have a false dependency on dest register.
    false-deps-popcnt             - POPCNT has a false dependency on dest register.
    fast-11bytenop                - Target can quickly decode up to 11 byte NOPs.
    fast-15bytenop                - Target can quickly decode up to 15 byte NOPs.
    fast-bextr                    - Indicates that the BEXTR instruction is implemented as a single uop with good throughput..
    fast-gather                   - Indicates if gather is reasonably fast..
    fast-hops                     - Prefer horizontal vector math instructions (haddp, phsub, etc.) over normal vector instructions with shuffles.
    fast-lzcnt                    - LZCNT instructions are as fast as most simple integer ops.
    fast-partial-ymm-or-zmm-write - Partial writes to YMM/ZMM registers are fast.
    fast-scalar-fsqrt             - Scalar SQRT is fast (disable Newton-Raphson).
    fast-shld-rotate              - SHLD can be used as a faster rotate.
    fast-variable-shuffle         - Shuffles with variable masks are fast.
    fast-vector-fsqrt             - Vector SQRT is fast (disable Newton-Raphson).
    fma                           - Enable three-operand fused multiple-add.
    fma4                          - Enable four-operand fused multiple-add.
    fsgsbase                      - Support FS/GS Base instructions.
    fxsr                          - Support fxsave/fxrestore instructions.
    gfni                          - Enable Galois Field Arithmetic Instructions.
    glm                           - Intel Goldmont processors.
    glp                           - Intel Goldmont Plus processors.
    idivl-to-divb                 - Use 8-bit divide for positive values less than 256.
    idivq-to-divl                 - Use 32-bit divide for positive values less than 2^32.
    invpcid                       - Invalidate Process-Context Identifier.
    lea-sp                        - Use LEA for adjusting the stack pointer.
    lea-uses-ag                   - LEA instruction needs inputs at AG stage.
    lwp                           - Enable LWP instructions.
    lzcnt                         - Support LZCNT instruction.
    macrofusion                   - Various instructions can be fused with conditional branches.
    merge-to-threeway-branch      - Merge branches to a three-way conditional branch.
    mmx                           - Enable MMX instructions.
    movbe                         - Support MOVBE instruction.
    movdir64b                     - Support movdir64b instruction.
    movdiri                       - Support movdiri instruction.
    mpx                           - Support MPX instructions.
    mwaitx                        - Enable MONITORX/MWAITX timer functionality.
    nopl                          - Enable NOPL instruction.
    pad-short-functions           - Pad short functions.
    pclmul                        - Enable packed carry-less multiplication instructions.
    pconfig                       - platform configuration instruction.
    pku                           - Enable protection keys.
    popcnt                        - Support POPCNT instruction.
    prefer-256-bit                - Prefer 256-bit AVX instructions.
    prefetchwt1                   - Prefetch with Intent to Write and T1 Hint.
    prfchw                        - Support PRFCHW instructions.
    ptwrite                       - Support ptwrite instruction.
    rdpid                         - Support RDPID instructions.
    rdrnd                         - Support RDRAND instruction.
    rdseed                        - Support RDSEED instruction.
    retpoline                     - Remove speculation of indirect branches from the generated code, either by avoiding them entirely or lowering them with a speculation blocking construct..
    retpoline-external-thunk      - When lowering an indirect call or branch using a `retpoline`, rely on the specified user provided thunk rather than emitting one ourselves. Only has effect when combined with some other retpoline feature..
    retpoline-indirect-branches   - Remove speculation of indirect branches from the generated code..
    retpoline-indirect-calls      - Remove speculation of indirect calls from the generated code..
    rtm                           - Support RTM instructions.
    sahf                          - Support LAHF and SAHF instructions.
    sgx                           - Enable Software Guard Extensions.
    sha                           - Enable SHA instructions.
    shstk                         - Support CET Shadow-Stack instructions.
    slm                           - Intel Silvermont processors.
    slow-3ops-lea                 - LEA instruction with 3 ops or certain registers is slow.
    slow-incdec                   - INC and DEC instructions are slower than ADD and SUB.
    slow-lea                      - LEA instruction with certain arguments is slow.
    slow-pmaddwd                  - PMADDWD is slower than PMULLD.
    slow-pmulld                   - PMULLD instruction is slow.
    slow-shld                     - SHLD instruction is slow.
    slow-two-mem-ops              - Two memory operand instructions are slow.
    slow-unaligned-mem-16         - Slow unaligned 16-byte memory access.
    slow-unaligned-mem-32         - Slow unaligned 32-byte memory access.
    soft-float                    - Use software floating point features..
    sse                           - Enable SSE instructions.
    sse-unaligned-mem             - Allow unaligned memory operands with SSE instructions.
    sse2                          - Enable SSE2 instructions.
    sse3                          - Enable SSE3 instructions.
    sse4.1                        - Enable SSE 4.1 instructions.
    sse4.2                        - Enable SSE 4.2 instructions.
    sse4a                         - Support SSE 4a instructions.
    ssse3                         - Enable SSSE3 instructions.
    tbm                           - Enable TBM instructions.
    tremont                       - Intel Tremont processors.
    vaes                          - Promote selected AES instructions to AVX512/AVX registers.
    vpclmulqdq                    - Enable vpclmulqdq instructions.
    waitpkg                       - Wait and pause enhancements.
    wbnoinvd                      - Write Back No Invalidate.
    x87                           - Enable X87 float instructions.
    xop                           - Enable XOP instructions.
    xsave                         - Support xsave instructions.
    xsavec                        - Support xsavec instructions.
    xsaveopt                      - Support xsaveopt instructions.
    xsaves                        - Support xsaves instructions.

Use +feature to enable a feature, or -feature to disable it.
For example, rustc -C -target-cpu=mycpu -C target-feature=+feature1,-feature2

For us avx (or avx2) and sse4.2 are the ones that matter.

With that knowledge we can use compile-time cfg guards to handle conditional compilation.

If you have, say, stage1/avx.rs and stage1/sse4.rs you can do something like:

stage1.rs

#[cfg(target_feature = "avx")]
mod avx;
#[cfg(target_feature = "avx")]
pub use avx::*;
// if we have avx we already included the above, and any avx cpu has sse4.2 too AFAIK
#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx")))]
mod sse4;
#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx")))]
pub use sse4::*;

// the pub use makes it still possible to `use stage1::*` from outside so we contain the guard.

Note: not tested since I'm traveling and don't have full access to all my computers, sorry about that.
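For readers following along, here is a minimal self-contained sketch of the same guard pattern. It is an illustration only, not the code from this PR: the inline modules stand in for the separate avx.rs / sse4.rs files, and the `BACKEND` constant and the `fallback` branch are hypothetical additions so the snippet compiles on any target.

```rust
// Exactly one of these three `simd` modules is compiled, depending on the
// target features the compiler was told to enable (e.g. via RUSTFLAGS).

#[cfg(target_feature = "avx2")]
mod simd {
    // Hypothetical marker constant, used here only to show which branch won.
    pub const BACKEND: &str = "avx2";
}

// AVX2-capable builds also have SSE 4.2, so exclude them explicitly.
#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
mod simd {
    pub const BACKEND: &str = "sse4.2";
}

// Assumption: a fallback branch, which the discussion above does not include.
#[cfg(not(any(target_feature = "avx2", target_feature = "sse4.2")))]
mod simd {
    pub const BACKEND: &str = "fallback";
}

// Re-export so callers never need to repeat the guards.
pub use simd::*;
```

Building with `RUSTFLAGS='-C target-cpu=native'` on an AVX2 machine should select the first branch; adding `-C target-feature=-avx2` should select the second.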

Licenser · 2019-07-27T15:58:59Z

Back from the travels. If you want any help with the conditional compilation I can jump in and make a PR against your branch with an example, but it's up to you - I don't want to steal the learning opportunity.

sunnygleason · 2019-07-27T18:28:22Z

@Licenser oh nice! I am just hopping back onto this now & should have a revised PR for you in the next 2-3h if all goes well. I love the learning and I'll take any suggestions you have as well - the goal is to get it into a form you feel proud to merge...

sunnygleason · 2019-07-27T20:15:18Z

src/avx2_deser.rs

+
+impl<'de> Deserializer<'de> {
+    #[cfg_attr(not(feature = "no-inline"), inline(always))]
+    pub fn parse_str_(&mut self) -> Result<&'de str> {


parse_str_ is the only method currently platform-specific in Deserializer (made it public to ease testing, might not have to be that way)

sunnygleason · 2019-07-27T20:15:52Z

src/avx2_stage1.rs

@@ -9,6 +9,8 @@ use std::arch::x86_64::*;

 use std::mem;

+pub const SIMDJSON_PADDING: usize = mem::size_of::<__m256i>();


made constant architecture-specific, even though values are same
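A small sketch of why the two values coincide, assuming the SSE 4.2 side defines its padding as twice the 128-bit register size (as the commit title "SIMDJSON_PADDING is 2x m128 size" suggests). The module layout here is illustrative, not the PR's actual file structure.

```rust
// Both constants come out as 32 bytes: one 256-bit register, or two
// 128-bit registers.

#[cfg(target_arch = "x86_64")]
pub mod avx2 {
    use std::arch::x86_64::__m256i;
    use std::mem;

    // One 256-bit vector: 32 bytes.
    pub const SIMDJSON_PADDING: usize = mem::size_of::<__m256i>();
}

#[cfg(target_arch = "x86_64")]
pub mod sse42 {
    use std::arch::x86_64::__m128i;
    use std::mem;

    // Two 128-bit vectors: 2 * 16 = 32 bytes.
    pub const SIMDJSON_PADDING: usize = 2 * mem::size_of::<__m128i>();
}
```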

src/lib.rs

sunnygleason · 2019-07-27T20:17:07Z

src/sse42_deser.rs

+
+impl<'de> Deserializer<'de> {
+    #[cfg_attr(not(feature = "no-inline"), inline(always))]
+    pub fn parse_str_(&mut self) -> Result<&'de str> {


parse_str_ is the only method currently platform-specific in Deserializer (made it public to ease testing, might not have to be that way)

sunnygleason · 2019-07-27T20:17:53Z

src/sse42_utf8check.rs

+ */
+
+// all byte values must be no larger than 0xF4
+


this implementation is largely the same as the AVX2 version, just half-width

sunnygleason · 2019-07-27T20:18:41Z

src/sse42_utf8check.rs

+
+// all byte values must be no larger than 0xF4
+#[cfg_attr(not(feature = "no-inline"), inline)]
+fn avxcheck_smaller_than_0xf4(current_bytes: __m128i, has_error: &mut __m128i) {


the avx* names are a misnomer, I held off on a renaming party (for now) thinking we'd need to change both places (AVX2 and SSE42)

Yeah, with Rust we have modules, so we don't really need function prefixes :) good catch!
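To illustrate the point about module namespacing, here is a hedged sketch of a half-width "no byte larger than 0xF4" check that lives in an `sse42` module and carries no `avx` prefix. It is a simplification, not the PR's function: the name `has_byte_above_0xf4` is hypothetical, and it returns a `bool` instead of accumulating into a `has_error` register.

```rust
#[cfg(target_arch = "x86_64")]
pub mod sse42 {
    use std::arch::x86_64::*;

    /// Returns true if any of the 16 bytes is larger than 0xF4.
    pub fn has_byte_above_0xf4(bytes: &[u8; 16]) -> bool {
        // SAFETY: the intrinsics used here are SSE2, which is part of the
        // x86_64 baseline, and the unaligned load reads exactly 16 bytes.
        unsafe {
            let current = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
            // Unsigned saturating subtract: lanes <= 0xF4 become 0.
            let excess = _mm_subs_epu8(current, _mm_set1_epi8(0xF4u8 as i8));
            // Lanes that are zero compare equal; all 16 set means no error.
            let zero_lanes = _mm_cmpeq_epi8(excess, _mm_setzero_si128());
            _mm_movemask_epi8(zero_lanes) != 0xFFFF
        }
    }
}
```

A 256-bit counterpart could then sit in an `avx2` module under the same function name, and callers would pick one through the module path.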

src/value/generator.rs

sunnygleason · 2019-07-27T20:21:51Z

@Licenser ok, this is ready for you to take a look -- let me know what you think, thank you again for the tips!

I'm still stuck a tiny bit on the conditional compilation -- I think it's close, just needs sensible defaulting so things like cargo test work with the right features etc.

Interesting thing -- benchmarks tended to be within 10-20% of AVX2 (good news, "slow" is still fast on slow platforms; bad news, "fast" could probably be faster on fast platforms)

sunnygleason · 2019-07-27T20:28:27Z

Laptop benchmarks (not the best, but slightly interesting)

AVX2

Benchmarking apache_builds/simd_json
Benchmarking apache_builds/simd_json: Warming up for 3.0000 s
Benchmarking apache_builds/simd_json: Collecting 100 samples in estimated 5.8026 s (20k iterations)
Benchmarking apache_builds/simd_json: Analyzing
apache_builds/simd_json time:   [234.49 us 235.77 us 237.01 us]
                        thrpt:  [512.13 MiB/s 514.83 MiB/s 517.64 MiB/s]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Benchmarking apache_builds/simd_json-owned
Benchmarking apache_builds/simd_json-owned: Warming up for 3.0000 s
Benchmarking apache_builds/simd_json-owned: Collecting 100 samples in estimated 7.8109 s (10k iterations)
Benchmarking apache_builds/simd_json-owned: Analyzing
apache_builds/simd_json-owned
                        time:   [733.70 us 736.96 us 740.80 us]
                        thrpt:  [163.85 MiB/s 164.70 MiB/s 165.43 MiB/s]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

SSE42

Benchmarking apache_builds/simd_json
Benchmarking apache_builds/simd_json: Warming up for 3.0000 s
Benchmarking apache_builds/simd_json: Collecting 100 samples in estimated 5.7037 s (20k iterations)
Benchmarking apache_builds/simd_json: Analyzing
apache_builds/simd_json time:   [250.20 us 251.54 us 252.96 us]
                        thrpt:  [479.84 MiB/s 482.55 MiB/s 485.13 MiB/s]
                 change:
                        time:   [+4.8700% +6.3486% +7.5335%] (p = 0.00 < 0.05)
                        thrpt:  [-7.0057% -5.9696% -4.6438%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Benchmarking apache_builds/simd_json-owned
Benchmarking apache_builds/simd_json-owned: Warming up for 3.0000 s
Benchmarking apache_builds/simd_json-owned: Collecting 100 samples in estimated 7.9369 s (10k iterations)
Benchmarking apache_builds/simd_json-owned: Analyzing
apache_builds/simd_json-owned
                        time:   [746.03 us 749.28 us 752.88 us]
                        thrpt:  [161.22 MiB/s 161.99 MiB/s 162.70 MiB/s]
                 change:
                        time:   [-0.6106% +0.5125% +1.4529%] (p = 0.36 > 0.05)
                        thrpt:  [-1.4321% -0.5099% +0.6144%]
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe

Licenser

It all in all looks great! I would restructure it a bit to make it more rusty and less cy.

Basically add an avx2.rs and move:

avx2_deser.rs -> avx2/deser.rs
avx2_stage1.rs -> avx2/stage1.rs
avx2_utf8check.rs -> avx2/utf8check.rs

(and the same for sse42 files)

I think that would clean the file structure up a bit and contain the different implementations to the degree that we only need a single conditional-compilation line (or two) in lib.rs to include either the sse42 or the avx2 version.

That plus moving from crate features to cpu features and I think this is done. :)

Licenser · 2019-07-28T09:48:57Z

Cargo.toml

@@ -46,6 +46,10 @@ harness = false

 [features]
 default = ["swar-number-parsing", "serde_impl"]
+# AVX2 compatibility -- TODO, figure out default


Those should not be features; they are architecture dependencies. Rust exposes them already and will pick the 'right' one depending on the compilation target.

Exposing them as features can lead to compilations that are either less performant, when Rust uses polyfills for them, or straight up won't run.

https://doc.rust-lang.org/reference/conditional-compilation.html is the documentation. I think the right set would be:

#[cfg(target_feature = "avx2")] and #[cfg(all(not(target_feature = "avx2"), target_feature = "sse4.2"))] (we need to exclude avx2 on the second one since avx2 CPUs usually (always?) support sse4.2 as well)

src/sse42_deser.rs

Licenser

This looks great! I will throw it at 24h of fuzzing just to be sure, and merge it if nothing breaks!

sunnygleason · 2019-07-28T15:07:42Z

@Licenser thank you again! (snuck one commit by you for the generator compile-time constant)

Licenser · 2019-07-28T16:44:59Z

To make it compile on the test bench I had to add the following changes. Can we include them? Other than that the fuzzer is running :)

diff --git a/simd-fuzz-target/Makefile b/simd-fuzz-target/Makefile
index 4d3c548..061b8bb 100644
--- a/simd-fuzz-target/Makefile
+++ b/simd-fuzz-target/Makefile
@@ -1,7 +1,11 @@
 run: build
-	RUSTFLAGS='-C codegen-units=1' cargo +nightly afl fuzz -i in -o out target/debug/simd-fuzz-target
+	RUSTFLAGS='-C codegen-units=1 -C target-cpu=native' cargo +nightly afl fuzz -i in -o out target/debug/simd-fuzz-target
 build: 
 	RUSTFLAGS='-C codegen-units=1' cargo +nightly afl build
+run-sse: build-sse
+	RUSTFLAGS='-C codegen-units=1 -C target-cpu=native -C target-feature=-avx2' cargo +nightly afl fuzz -i in -o out target/debug/simd-fuzz-target
+build-sse: 
+	RUSTFLAGS='-C codegen-units=1 -C target-cpu=native -C target-feature=-avx2' cargo +nightly afl build
 
 copy:
 	for from in `ls out/crashes/id*`; do to=`echo $$from | sed -e 's;out/crashes/id:;crash;' -e 's;,.*;.json;'`; cp $$from ../simdjson-rs/data/crash/$$to; done
diff --git a/src/lib.rs b/src/lib.rs
index 1723909..2516a0e 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -78,31 +78,31 @@ mod charutils;
 #[macro_use]
 mod macros;
 mod error;
-mod stringparse;
 mod numberparse;
 mod parsedjson;
 mod portability;
+mod stringparse;
 
 #[cfg(target_feature = "avx2")]
 mod avx2;
 #[cfg(target_feature = "avx2")]
-const SIMDJSON_PADDING : usize = crate::avx2::stage1::SIMDJSON_PADDING;
-#[cfg(target_feature = "avx2")]
 pub use crate::avx2::deser::*;
+#[cfg(target_feature = "avx2")]
+use crate::avx2::stage1::SIMDJSON_PADDING;
 
 #[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
 mod sse42;
 #[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
-const SIMDJSON_PADDING : usize = crate::sse42::stage1::SIMDJSON_PADDING;
-#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
 pub use crate::sse42::deser::*;
+#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
+use crate::sse42::stage1::SIMDJSON_PADDING;
 
 mod stage2;
 pub mod value;
 
 use crate::numberparse::Number;
-use std::str;
 use std::mem;
+use std::str;
 
 pub use crate::error::{Error, ErrorType};
 pub use crate::value::*;
diff --git a/src/stage2.rs b/src/stage2.rs
index 29ebe04..7bbb1da 100644
--- a/src/stage2.rs
+++ b/src/stage2.rs
@@ -1,7 +1,10 @@
 #![allow(dead_code)]
+#[cfg(target_feature = "avx2")]
+use crate::avx2::stage1::SIMDJSON_PADDING;
 use crate::charutils::*;
-use crate::{Deserializer, Error, ErrorType, Result, SIMDJSON_PADDING};
-//use crate::portability::*;
+#[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
+use crate::sse42::stage1::SIMDJSON_PADDING;
+use crate::{Deserializer, Error, ErrorType, Result};
 
 #[cfg_attr(not(feature = "no-inline"), inline(always))]
 pub fn is_valid_true_atom(loc: &[u8]) -> bool {

Licenser · 2019-07-28T17:23:30Z

looking good so far! :D

sunnygleason · 2019-07-29T01:37:29Z

@Licenser thanks again - patch applied (modulo one trailing whitespace char after a colon character in the makefile)!

Licenser · 2019-07-28T17:28:19Z

src/value/generator.rs

@@ -102,7 +107,7 @@ pub trait BaseGenerator {
            // quote characters that gives us a bitmask of 0x1f for that
            // region, only quote (`"`) and backslash (`\`) are not in
            // this range.
-            if is_x86_feature_detected!("avx2") {
+            if AVX2_PRESENT {


even easier, put this around the block:

#[cfg(target_feature = "avx2")] { ... }

That way the code doesn't even get generated when we don't have avx2 present :)
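The difference between the two kinds of detection discussed here can be sketched as follows. This is an illustration under assumptions, not the generator code: `AVX2_AT_COMPILE_TIME`, `avx2_at_runtime`, and `escape_backend` are hypothetical names.

```rust
/// Compile-time answer: true only if the crate was built with AVX2 enabled
/// (for example via `-C target-cpu=native` on an AVX2 machine).
pub const AVX2_AT_COMPILE_TIME: bool = cfg!(target_feature = "avx2");

/// Runtime answer: asks the CPU the program is currently running on.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn avx2_at_runtime() -> bool {
    is_x86_feature_detected!("avx2")
}

/// On other architectures the macro does not exist, so report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn avx2_at_runtime() -> bool {
    false
}

/// Picks a code path from the compile-time constant. Because the condition
/// is a constant, the optimizer can drop the branch that is not taken.
pub fn escape_backend() -> &'static str {
    if AVX2_AT_COMPILE_TIME {
        "avx2"
    } else {
        "scalar"
    }
}
```

With a `#[cfg(target_feature = "avx2")] { ... }` block, as suggested above, the unused path is not even type-checked, whereas `if AVX2_AT_COMPILE_TIME` still compiles both branches and leaves their removal to the optimizer.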

Licenser · 2019-07-29T08:49:37Z

Thanks! Some great work :D

sunnygleason added 4 commits July 22, 2019 14:44

feat: identify/separate AVX-specific classes

2745bf5

feat: symlink for lib_avx2.rs

cf3f6ed

feat: sse42 implementation (appears to have one bug in numeric processing)

fa1f922

fix: SIMDJSON_PADDING is 2x m128 size

7784f07

sunnygleason force-pushed the rfc-sse42-impl branch from 9524b04 to 69dacad Compare July 23, 2019 02:10

feat: refactor implementation for conditional compilation

e5e4b18

sunnygleason force-pushed the rfc-sse42-impl branch from 69dacad to e5e4b18 Compare July 27, 2019 20:13

sunnygleason changed the title ~~RFC: implementation of SSE 4.2 compatible parsing (utf8 TODO)~~ RFC: implementation of SSE 4.2 compatible parsing (incl. utf8) Jul 27, 2019

Licenser requested changes Jul 28, 2019

sunnygleason added 2 commits July 28, 2019 10:25

fix: remove avx/sse targets

5b3feb3

feat: improved modularity/rustiness

cef2529

Licenser approved these changes Jul 28, 2019

feat: improve generator detection (compile-time versus runtime)

a4d315a

feat: fuzzing targets for SSE, minor 'mod/use' reordering

c1d266b

Licenser approved these changes Jul 29, 2019

Licenser merged commit efd382c into simd-lite:master Jul 29, 2019

sunnygleason deleted the rfc-sse42-impl branch August 16, 2019 00:25
