perf(wasm): skip serde_json::Value round-trip in renderLatex#32

Merged
erweixin merged 1 commit into erweixin:main from EurFelux:perf/wasm-skip-json-value
Apr 7, 2026

Conversation

@EurFelux
Contributor

@EurFelux EurFelux commented Apr 7, 2026

Follow-up to the optimization (1) discussed in #30. Replaces the
to_value → sanitize_json_numbers → to_string pipeline with a
single in-place pass over the typed DisplayList, followed by a direct
serde_json::to_string.

What changed

Before:

let value = serde_json::to_value(&display_list)?;   // ① clone whole tree into Value
let sanitized = sanitize_json_numbers(value);       // ② recursively clone it again
serde_json::to_string(&sanitized)                   // ③ serialize Value tree

After:

let mut display_list = to_display_list(&layout_box);
sanitize_display_list(&mut display_list);           // single in-place walk, no allocation
serde_json::to_string(&display_list)                // direct typed serialize

sanitize_display_list walks DisplayList / DisplayItem / PathCommand
and clamps any non-finite f64 (NaN / Infinity) to 0.0. This preserves
the previous behavior — serde_json's default f64 serializer would
otherwise error on non-finite values, which is exactly why the original
code went through Value (whose from_f64 silently produces Null).
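
A minimal sketch of that in-place pass, using stand-in types for illustration (the real DisplayList / DisplayItem / PathCommand definitions live in ratex-types and carry more variants and fields):

```rust
// Stand-in types for illustration only; the real ratex-types
// definitions have more variants and fields.
enum PathCommand {
    MoveTo { x: f64, y: f64 },
    LineTo { x: f64, y: f64 },
}

struct DisplayItem {
    commands: Vec<PathCommand>,
}

struct DisplayList {
    items: Vec<DisplayItem>,
}

// Clamp a non-finite f64 (NaN / Infinity) to 0.0 in place.
fn clamp(v: &mut f64) {
    if !v.is_finite() {
        *v = 0.0;
    }
}

// Single mutable walk over the tree: no intermediate Value, no clones,
// no per-node Map/Vec allocations.
fn sanitize_display_list(list: &mut DisplayList) {
    for item in &mut list.items {
        for cmd in &mut item.commands {
            match cmd {
                PathCommand::MoveTo { x, y } | PathCommand::LineTo { x, y } => {
                    clamp(x);
                    clamp(y);
                }
            }
        }
    }
}

fn main() {
    let mut list = DisplayList {
        items: vec![DisplayItem {
            commands: vec![PathCommand::MoveTo { x: f64::NAN, y: 1.0 }],
        }],
    };
    sanitize_display_list(&mut list);
    match &list.items[0].commands[0] {
        PathCommand::MoveTo { x, y } => {
            assert_eq!(*x, 0.0);
            assert_eq!(*y, 1.0);
        }
        _ => unreachable!(),
    }
    println!("sanitized");
}
```

The key point is that the walk takes `&mut` borrows all the way down, so the clamp mutates the typed tree directly instead of rebuilding a Value mirror of it.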

Net effect on the hot path:

|  | Before | After |
| --- | --- | --- |
| Tree walks | 4 (to_value + sanitize + to_string + JSON.parse) | 2 (sanitize + to_string + JSON.parse) |
| Per-node Map/Vec allocations | yes (one per object/array) | none |
| Intermediate Value tree | yes | no |

Benchmark results

Measured via the harness from #30, parse+layout stage, 200 iterations + 20 warmup, Chrome. The benchmark is somewhat noisy on small formulas — relative trends are reliable, individual cells less so.

| Formula | KaTeX (ms) | RaTeX before | RaTeX after | Gap (after) |
| --- | --- | --- | --- | --- |
| simple add | 0.008 | ~0.038 | 0.033 | 0.025 |
| Euler | 0.011 | ~0.052 | 0.035 | 0.024 |
| Schrödinger | 0.019 | — | 0.040 | 0.021 |
| Einstein field eq. | 0.034 | — | 0.048 | 0.014 |
| long mixed | 0.072 | — | 0.094 | 0.022 |

Overall parse-stage speedup: 0.42× → 0.60× (sum of medians). Per-formula numbers are still <1×, but every row improved and the gain is structural rather than noise.

Full-render stage was already a RaTeX win and is not regressed by this change (it goes through the same renderLatex entry point, just with more downstream Canvas work).

What this doesn't fix — and why that's interesting

The most striking thing in the post-optimization numbers is that the gap between RaTeX and KaTeX is essentially constant at ~22 µs across all formulas, regardless of complexity. That is a very clean signal:

  • Simple formula gap: 0.025 ms
  • 5× more complex formula gap: 0.022 ms
  • 10× more complex formula gap: 0.022 ms

If the bottleneck were anything Rust-side (parse, layout, serialize), the gap would scale with formula size. It doesn't. So the remaining ~22 µs lives entirely in:

  1. wasm → JS string copy at the wasm-bindgen boundary
  2. JS-side JSON.parse fixed startup cost
  3. wasm-bindgen's Result<String, JsValue> glue

In other words, this PR has taken the Rust-side serialize cost about as low as it can go without changing the boundary protocol. Any further parse-stage improvement would need to attack the boundary itself — either via a binary protocol (Uint8Array + DataView / bincode) or by pushing the draw call entirely into wasm so the display list never crosses the boundary at all.
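
For reference, the binary-protocol direction could look roughly like the following sketch. This is a hypothetical, std-only illustration (no bincode, no wasm-bindgen glue, and nothing from the actual RaTeX codebase): it packs (x, y) pairs as little-endian f64s into a flat byte buffer, which on the JS side would arrive as a Uint8Array and be read with a DataView instead of JSON.parse.

```rust
// Hypothetical encoding for the binary-boundary idea: pack (x, y) pairs
// as little-endian f64s into a flat byte buffer (16 bytes per pair).
fn encode_points(points: &[(f64, f64)]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(points.len() * 16);
    for &(x, y) in points {
        buf.extend_from_slice(&x.to_le_bytes());
        buf.extend_from_slice(&y.to_le_bytes());
    }
    buf
}

// Decode helper, standing in for what the JS DataView reads would do.
fn decode_points(buf: &[u8]) -> Vec<(f64, f64)> {
    buf.chunks_exact(16)
        .map(|c| {
            let x = f64::from_le_bytes(c[0..8].try_into().unwrap());
            let y = f64::from_le_bytes(c[8..16].try_into().unwrap());
            (x, y)
        })
        .collect()
}

fn main() {
    let pts = vec![(1.5, -2.0), (0.0, 3.25)];
    let buf = encode_points(&pts);
    assert_eq!(buf.len(), 32);          // 2 pairs × 16 bytes
    assert_eq!(decode_points(&buf), pts);
    println!("round-trip ok");
}
```

The trade-off is exactly the one noted in #30: this removes the string copy and JSON.parse but adds a hand-maintained wire format on both sides of the boundary.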

I'm not proposing either of those in this PR — you already noted in #30 that the display list isn't large enough to justify a binary protocol, which is a fair call given the complexity cost. Just flagging the constant-gap observation as data for future reference: if parse-stage parity with KaTeX ever becomes a goal, the evidence points clearly at the FFI boundary as the only remaining lever.

(That said: I'd argue parse-stage parity matters less than it looks, because RaTeX's actual unique value lives outside the web target where KaTeX isn't even an option — see #31 for the longer version of that argument.)

Notes

  • I'll update README.md as a separate follow-up — this PR is intentionally scoped to the perf change so the diff is easy to review and revert if needed.
  • All existing tests pass (cargo test -p ratex-wasm, cargo test -p ratex-types).
  • The behavioral contract is unchanged: non-finite f64 values are still clamped to 0 before they reach JS.

cc @erweixin — thanks again for the project! 🐾

The previous pipeline did:
  to_value -> sanitize_json_numbers (recursive clone) -> to_string

For every render this walked the display list four times and allocated
a fresh Map/Vec at every node just to clamp NaN/Infinity. Replace it
with a single in-place pass that clamps non-finite f64 fields and then
calls serde_json::to_string directly on the typed display list.

Refs: erweixin#30
@erweixin
Owner

erweixin commented Apr 7, 2026

LGTM, thank you for the PR!

@erweixin erweixin merged commit 4562607 into erweixin:main Apr 7, 2026
1 check passed