Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a fast path for integer parsing #692

Merged
merged 1 commit into from
Nov 6, 2024

Conversation

casperisfine
Copy link

rb_cstr2inum isn't very fast because it handles tons of different scenarios, and also require a NULL terminated string which forces us to copy the number into a secondary buffer.

But since the parser already computed the length, we can much more cheaply do this with a very simple function as long as the number is small enough to fit into a native type (long long).

If the number is too long, we can fallback to the rb_cstr2inum slowpath.

Before:

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    40.000 i/100ms
                  oj    35.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    425.941 (± 1.9%) i/s    (2.35 ms/i) -      2.160k in   5.072833s
                  oj    349.617 (± 1.7%) i/s    (2.86 ms/i) -      1.750k in   5.006953s
          Oj::Parser    464.767 (± 1.7%) i/s    (2.15 ms/i) -      2.340k in   5.036381s
           rapidjson    382.413 (± 2.4%) i/s    (2.61 ms/i) -      1.938k in   5.070757s

Comparison:
                json:      425.9 i/s
          Oj::Parser:      464.8 i/s - 1.09x  faster
           rapidjson:      382.4 i/s - 1.11x  slower
                  oj:      349.6 i/s - 1.22x  slower

After:

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    46.000 i/100ms
                  oj    33.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    462.332 (± 3.2%) i/s    (2.16 ms/i) -      2.346k in   5.080504s
                  oj    351.140 (± 1.1%) i/s    (2.85 ms/i) -      1.782k in   5.075616s
          Oj::Parser    473.500 (± 1.3%) i/s    (2.11 ms/i) -      2.385k in   5.037695s
           rapidjson    395.052 (± 3.5%) i/s    (2.53 ms/i) -      1.989k in   5.042275s

Comparison:
                json:      462.3 i/s
          Oj::Parser:      473.5 i/s - same-ish: difference falls within error
           rapidjson:      395.1 i/s - 1.17x  slower
                  oj:      351.1 i/s - 1.32x  slower

`rb_cstr2inum` isn't very fast because it handles tons of
different scenarios, and also require a NULL terminated string
which forces us to copy the number into a secondary buffer.

But since the parser already computed the length, we can much more
cheaply do this with a very simple function as long as the number
is small enough to fit into a native type (`long long`).

If the number is too long, we can fallback to the `rb_cstr2inum`
slowpath.

Before:

```
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    40.000 i/100ms
                  oj    35.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    425.941 (± 1.9%) i/s    (2.35 ms/i) -      2.160k in   5.072833s
                  oj    349.617 (± 1.7%) i/s    (2.86 ms/i) -      1.750k in   5.006953s
          Oj::Parser    464.767 (± 1.7%) i/s    (2.15 ms/i) -      2.340k in   5.036381s
           rapidjson    382.413 (± 2.4%) i/s    (2.61 ms/i) -      1.938k in   5.070757s

Comparison:
                json:      425.9 i/s
          Oj::Parser:      464.8 i/s - 1.09x  faster
           rapidjson:      382.4 i/s - 1.11x  slower
                  oj:      349.6 i/s - 1.22x  slower
```

After:

```
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.0dev (2024-11-06T07:59:09Z precompute-hash-wh.. 7943f98a8a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    46.000 i/100ms
                  oj    33.000 i/100ms
          Oj::Parser    45.000 i/100ms
           rapidjson    39.000 i/100ms
Calculating -------------------------------------
                json    462.332 (± 3.2%) i/s    (2.16 ms/i) -      2.346k in   5.080504s
                  oj    351.140 (± 1.1%) i/s    (2.85 ms/i) -      1.782k in   5.075616s
          Oj::Parser    473.500 (± 1.3%) i/s    (2.11 ms/i) -      2.385k in   5.037695s
           rapidjson    395.052 (± 3.5%) i/s    (2.53 ms/i) -      1.989k in   5.042275s

Comparison:
                json:      462.3 i/s
          Oj::Parser:      473.5 i/s - same-ish: difference falls within error
           rapidjson:      395.1 i/s - 1.17x  slower
                  oj:      351.1 i/s - 1.32x  slower
```
@@ -1488,20 +1488,42 @@ enum {JSON_integer_en_main = 1};
#line 695 "parser.rl"


#define MAX_FAST_INTEGER_SIZE 18
static inline VALUE fast_parse_integer(char *p, char *pe)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't even a fast path to be honest, it's a very naive integer parsing function. Dunno if it's worth parsing multiple bytes at a time with __builtin_bswap*?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to a profile, in citm_catalog which is the benchmark with the most integers, we're spending 0.6% in fast_parse_integer, and 1.8% in the overall JSON_parse_integer.

So maybe not worth it.

@byroot byroot merged commit 3f950f2 into ruby:master Nov 6, 2024
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants