Skip to content

Conversation

@byroot
Copy link
Member

@byroot byroot commented Jan 31, 2025

The goal is to be able to dispatch to more optimized search implementations without having to duplicate the escaping code.

Somehow, this is a few % faster already:

== Encoding activitypub.json (52595 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after     2.257k i/100ms
Calculating -------------------------------------
               after     22.930k (± 1.3%) i/s   (43.61 μs/i) -    115.107k in   5.020814s

Comparison:
              before:    21604.0 i/s
               after:    22930.1 i/s - 1.06x  faster

== Encoding citm_catalog.json (500298 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   137.000 i/100ms
Calculating -------------------------------------
               after      1.397k (± 1.1%) i/s  (715.57 μs/i) -      6.987k in   5.000408s

Comparison:
              before:     1344.4 i/s
               after:     1397.5 i/s - 1.04x  faster

== Encoding twitter.json (466906 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   249.000 i/100ms
Calculating -------------------------------------
               after      2.464k (± 1.8%) i/s  (405.81 μs/i) -     12.450k in   5.054131s

Comparison:
              before:     2326.5 i/s
               after:     2464.2 i/s - 1.06x  faster

cc @samyron I think that should make integrating SIMD and bit twiddling algorithms easier.

The existence of a state struct should also allow to resume search without rescaning the same bytes and stay aligned like https://lemire.me/blog/2024/07/20/scan-html-even-faster-with-simd-instructions-c-and-c/

What do you think? Does this new refactoring makes things easier for you? (I think it does, but maybe you have a different opinion).

continue;
}
}
search_flush(state);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since search_escape is now responsible for flushing bytes that don't need escaping, it should allow to write an SIMD version that does both scanning and copying at once. e.g. load X bytes, and if none need to be escaped, directly copy them.

@samyron
Copy link
Contributor

samyron commented Jan 31, 2025

I think this looks great. The separation between the searching and the escaping should make the SIMD integration easier.

After this gets merged, I'll re-work the SIMD branches. I might close the existing PRs and create smaller more targeted PRs for each SIMD implementation.

@byroot
Copy link
Member Author

byroot commented Jan 31, 2025

Alright, I'll try to refactor convert_UTF8_to_ASCII_only_JSON in a similar way for consistency.

The goal is to be able to dispatch to more optimized search implementations
without having to duplicate the escaping code.

Somehow, this is a few % faster already:

```
== Encoding activitypub.json (52595 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after     2.257k i/100ms
Calculating -------------------------------------
               after     22.930k (± 1.3%) i/s   (43.61 μs/i) -    115.107k in   5.020814s

Comparison:
              before:    21604.0 i/s
               after:    22930.1 i/s - 1.06x  faster

== Encoding citm_catalog.json (500298 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   137.000 i/100ms
Calculating -------------------------------------
               after      1.397k (± 1.1%) i/s  (715.57 μs/i) -      6.987k in   5.000408s

Comparison:
              before:     1344.4 i/s
               after:     1397.5 i/s - 1.04x  faster

== Encoding twitter.json (466906 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   249.000 i/100ms
Calculating -------------------------------------
               after      2.464k (± 1.8%) i/s  (405.81 μs/i) -     12.450k in   5.054131s

Comparison:
              before:     2326.5 i/s
               after:     2464.2 i/s - 1.06x  faster
```
@byroot byroot force-pushed the refactor-convert-utf8 branch from 99a3aea to 8fb5ae8 Compare January 31, 2025 14:59
@byroot byroot merged commit 61cda86 into ruby:master Jan 31, 2025
33 checks passed
@byroot byroot deleted the refactor-convert-utf8 branch January 31, 2025 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants