
Fix compact_script dropping $raw$ custom-syntax bodies #1079

Merged
schungx merged 2 commits into rhaiscript:main from yuvalrakavy:fix/compact-script-raw-custom-syntax
Apr 10, 2026


@yuvalrakavy
Contributor

Summary

Engine::compact_script silently drops every character captured by custom syntax that uses the $raw$ marker (registered via register_custom_syntax_with_state_raw or register_custom_syntax_without_look_ahead_raw). The compacted output contains the custom-syntax keyword and surrounding non-raw tokens, but the raw body is gone.

For example, a grab { let x = 1; let y = 2; print(x + y); } script compacts to grab{ — body and closing brace both stripped. The compacted script then either fails to parse back or, worse, parses as a truncated no-op and only surfaces the corruption at a later recompile.

$block$, $expr$, $ident$, $symbol$, and $token$ are all unaffected — only $raw$ is broken. The plain register_custom_syntax API explicitly rejects $raw$ (src/api/custom_syntax.rs:254), so this only affects the raw-variant registration APIs.

Root cause

The _char_mode branch in TokenIterator::next (src/tokenizer.rs) pulls one character at a time via self.stream.get_next() and returns Token::UnprocessedRawChar(ch) early, before the normal token-compression block at the tail of next() runs. Raw characters therefore never reach the compressed buffer.

Fix

Inside the _char_mode branch, before returning the raw char, append it verbatim to control.compressed when compression is active. Appending verbatim (rather than going through the normal identifier-boundary spacing path) is correct because raw content is opaque to Rhai — there is no grammar to strip whitespace from, and the plugin's parse function may be counting characters in its state machine. The surrounding non-raw tokens are still compacted normally, so the output remains smaller than the input.

Fix is ~10 lines in src/tokenizer.rs. No change to public API.
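To make the root cause and the fix concrete, here is a minimal, self-contained model in Rust. It is not Rhai's code: `Tokenizer`, `next_token`, `compact`, and the `append_raw` switch are hypothetical stand-ins, the tokenizer only knows identifiers and single-character symbols, and the `grab { BODY }` parser is simulated by a brace counter. With `append_raw = false` raw characters leave through the early return and never reach the buffer; with `append_raw = true` each one is copied verbatim first, which is the shape of the fix described above.

```rust
// Toy model of the compaction bug and fix. All names are hypothetical
// stand-ins, not Rhai's real `TokenIterator` internals.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Symbol(char),
    UnprocessedRawChar(char),
}

struct Tokenizer<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
    /// Set by the simulated custom-syntax parser while it reads a raw body.
    char_mode: bool,
    /// `Some` while compaction is active.
    compressed: Option<String>,
    /// `false` reproduces the bug, `true` applies the fix.
    append_raw: bool,
}

impl<'a> Tokenizer<'a> {
    fn next_token(&mut self) -> Option<Token> {
        if self.char_mode {
            let ch = self.chars.next()?;
            // The fix: copy the raw char verbatim *before* the early return.
            if self.append_raw {
                if let Some(buf) = self.compressed.as_mut() {
                    buf.push(ch);
                }
            }
            return Some(Token::UnprocessedRawChar(ch));
        }
        while self.chars.peek().map_or(false, |c| c.is_whitespace()) {
            self.chars.next();
        }
        let c = self.chars.next()?;
        let token = if c.is_alphanumeric() {
            let mut word = c.to_string();
            while let Some(&n) = self.chars.peek() {
                if !n.is_alphanumeric() {
                    break;
                }
                word.push(n);
                self.chars.next();
            }
            Token::Ident(word)
        } else {
            Token::Symbol(c)
        };
        // Normal compression block at the tail: only non-raw tokens get here.
        if let Some(buf) = self.compressed.as_mut() {
            match &token {
                Token::Ident(w) => {
                    // Keep a space only where two identifiers would otherwise merge.
                    if buf.chars().last().map_or(false, char::is_alphanumeric) {
                        buf.push(' ');
                    }
                    buf.push_str(w);
                }
                Token::Symbol(s) => buf.push(*s),
                Token::UnprocessedRawChar(_) => unreachable!("raw chars return early"),
            }
        }
        Some(token)
    }
}

/// Simulated parser for `grab { BODY }`: after the first `{` it switches the
/// tokenizer to raw mode until the matching `}` has been consumed.
fn compact(src: &str, append_raw: bool) -> String {
    let mut t = Tokenizer {
        chars: src.chars().peekable(),
        char_mode: false,
        compressed: Some(String::new()),
        append_raw,
    };
    let mut depth = 0usize;
    while let Some(tok) = t.next_token() {
        match tok {
            Token::Symbol('{') => {
                t.char_mode = true;
                depth = 1;
            }
            Token::UnprocessedRawChar('{') => depth += 1,
            Token::UnprocessedRawChar('}') => {
                depth -= 1;
                if depth == 0 {
                    t.char_mode = false;
                }
            }
            _ => {}
        }
    }
    t.compressed.unwrap_or_default()
}
```

In this toy, `compact("grab { let x = 1; let y = 2; print(x + y); }", false)` yields `grab{`, matching the symptom in the summary, while passing `true` keeps the body verbatim and only drops the space before `{`, so the output is still shorter than the input.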

Test plan

  • Added test_compact_script_preserves_raw_custom_syntax_body to tests/custom_syntax.rs — registers a minimal grab { BODY } raw syntax, compacts a script, and asserts that the body tokens still appear in the compacted form and that the compacted script round-trips through compile.
  • Full cargo test --features internals passes — 0 failures across all test binaries, including the existing $raw$ tests (test_custom_syntax_raw, test_custom_syntax_raw2, test_custom_syntax_raw_interpolation).
  • Standalone minimal reproducer flips from exit 2 (bug) to exit 0 (fixed) with this patch applied.

Background

Discovered while using a timer NAME { BODY } custom syntax in a downstream project (home-automation store server). The timer keyword uses $raw$ to capture the body, and every stored script had its timer body dropped at validation time. The corruption only surfaced at cold load when the broken compact form failed to recompile, producing confusing "unexpected token" errors on scripts that had parsed cleanly moments earlier.

🤖 Generated with Claude Code

`Engine::compact_script` drops every character captured by custom
syntax that uses the `$raw$` marker. Root cause: the `_char_mode`
branch in `TokenIterator::next` pulls one character at a time via
`self.stream.get_next()` and returns `Token::UnprocessedRawChar(ch)`
early, before the normal token-compression block at the tail of
`next()` runs. The compressed buffer therefore never sees raw chars,
and the compacted output contains the custom-syntax keyword and
surrounding tokens but with the raw body silently stripped.

For example, a custom `grab { BODY }` syntax compacts to `grab{`
instead of `grab{BODY}` — the compacted script then either fails
to parse back or, worse, parses as a truncated no-op and only
surfaces the corruption at a later recompile.

Fix: inside the `_char_mode` branch, before returning the raw char,
append it verbatim to `control.compressed` when compression is
active. Appending verbatim (rather than going through the normal
identifier-boundary spacing path) is correct because raw content
is opaque to Rhai — there is no grammar to strip whitespace from,
and the plugin's parse function may be counting characters in its
state machine. The surrounding non-raw tokens are still compacted
normally, so the output remains smaller than the input.

Regression test: test_compact_script_preserves_raw_custom_syntax_body
in tests/custom_syntax.rs registers a minimal `grab { BODY }` raw
syntax, compacts a script, and asserts that the body tokens
(`let x`, `let y`, `print`) still appear in the compacted form
and that the compacted script round-trips through `compile`.

Discovered while using a `timer NAME { BODY }` custom syntax in a
downstream project: every stored script had its timer body dropped
at validation time, with the corruption only surfacing at cold
load when the broken compact form failed to recompile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yuvalrakavy
Contributor Author

I have fixed the compilation error (with no_object). It should pass tests now.

@schungx
Collaborator

schungx commented Apr 10, 2026

Great catch! I had completely forgotten about compact_script.

There is still one test failing under no_object...

`test_compact_script_preserves_raw_custom_syntax_body` uses `rhai::Map`
which is unavailable under the `no_object` feature, causing a CI build
failure. Add `#[cfg(not(feature = "no_object"))]` to skip the test in
that configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
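The gating pattern from this commit can be sketched without Rhai itself. Here `Map` is a stand-in alias for `rhai::Map` (an assumption for illustration), and `map_dependent_check` is a hypothetical test body. Because both carry the same `#[cfg(not(feature = "no_object"))]` attribute, a `no_object` build compiles neither, so nothing refers to the missing type.

```rust
// Stand-in for `rhai::Map` (assumed for illustration). Like the real type,
// it only exists when the `no_object` feature is off.
#[cfg(not(feature = "no_object"))]
type Map = std::collections::BTreeMap<String, i64>;

// Hypothetical test body gated with the same attribute, so a `no_object`
// build never compiles code that names `Map`.
#[cfg(not(feature = "no_object"))]
fn map_dependent_check() -> bool {
    let mut m = Map::new();
    m.insert("x".to_string(), 1);
    m.get("x") == Some(&1)
}
```

In the real test file the attribute presumably sits alongside `#[test]` on the test function itself, which removes the whole test from `no_object` builds rather than just its body.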
@schungx schungx merged commit 57ec703 into rhaiscript:main Apr 10, 2026
42 checks passed
@schungx schungx added the bug label Apr 10, 2026
@schungx
Collaborator

schungx commented Apr 10, 2026

Thanks, @yuvalrakavy for this PR.

Just purely out of curiosity, what are you doing with custom syntax?

@yuvalrakavy
Contributor Author

yuvalrakavy commented Apr 10, 2026 via email

@schungx
Copy link
Copy Markdown
Collaborator

schungx commented Apr 11, 2026

What a coincidence. My introduction to Rust started with writing the next version of my home automation system, which was developed 20 years ago on Windows with C# (I had been using .NET since its alpha version). I needed scripting to script events, so I picked up Rhai and eventually became its maintainer.

Would love to compare notes with you.

I didn't go as far as writing my own object store. I used MSSQL and later added support for MySQL. What do you see as the need to write your own object store instead of using an existing one?

@yuvalrakavy
Copy link
Copy Markdown
Contributor Author

Stephen — apologies for the slow reply. I wanted to give you a proper, informative answer rather than a hand-wave, and putting the write-up together took longer than I expected. What a lovely coincidence, by the way — I'd genuinely love to compare notes.
Funny parallel: around the same era you started writing your home automation system, I was picking up C# and writing — and still maintaining — control software for model trains. Different domain, same flavor of problem (typed object model, real-time events, hardware bridges, scripting for behavior), and a similarly long arc. The Store is in many ways the substrate I wish I'd had back then.
I actually wrote up the overview specifically with you in mind, as a first pass at the article we'll publish alongside the eventual open-source release. It's here:

https://gist.github.com/yuvalrakavy/cfb8a7169e029509e63da20663f26afb

The short version: the system is called the Store — a hierarchical, transactional object store in Rust, where schema, runtime state, automations, and UI logic all live in the same typed tree. Rhai is the scripting substrate everywhere: event handlers, typed store methods, instance/static methods, display formatters, timer bodies, MQTT topic handlers, scheduler conditions. Scripts appear inside an SDL (Schema Definition Language) file that also declares the class itself, so one file declares schema + behavior + reactivity.
A few Rhai-specific things you might find interesting:

on_missing_function is load-bearing for us. We don't pre-register class methods on the engine; instead, on every unresolved call we walk the target object's class chain at call time and dispatch to the appropriate fn / store fn. This is the hook I contributed upstream — it's what makes polymorphic, class-defined Rhai methods possible without forking the language. Thank you for merging it.
Custom timer syntax (the PR you just merged). It reads timer "name" every 1000 { ... }, plus after, after ... every, update, and kill. Implemented with $raw$ + a nested-brace state machine in the parser so the body is a real nested block. The @ in a name expands to self.id for per-instance identity. Scope variables persist across firings. I'd love your eyes on it when you have a moment.
Compile-once-cache-forever for every script; per-class modules merged at load; RhaiObject bridge so obj["Prop"], obj.children, obj.Method(...) all feel native; blocking-task adapters to bridge to Tokio without stalling.
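The nested-brace state machine mentioned above for the timer syntax needs to see every character of the body, which is also why the compaction fix copies raw characters verbatim. The sketch below is a hypothetical, Rhai-independent model of such a collector: `RawBlock`, `Step`, and `capture_body` are illustrative names, and Rhai's real `$raw$` parse callback has a different signature. It counts brace depth only and does not handle braces inside string literals or comments.

```rust
/// What the collector wants after seeing one character.
#[derive(Debug, PartialEq)]
enum Step {
    NeedMore,
    Done,
}

/// Hypothetical raw-body collector, fed the characters that follow the opening `{`.
#[derive(Default)]
struct RawBlock {
    depth: usize, // nesting depth of inner braces
    body: String, // captured text, excluding the outer block's closing `}`
}

impl RawBlock {
    fn feed(&mut self, c: char) -> Step {
        match c {
            // The outer block closes only when no inner block is open.
            '}' if self.depth == 0 => return Step::Done,
            '{' => self.depth += 1,
            '}' => self.depth -= 1,
            _ => {}
        }
        self.body.push(c);
        Step::NeedMore
    }
}

/// Returns the body and the byte offset just past the closing `}`,
/// or `None` if the input ends before the block closes.
fn capture_body(after_open_brace: &str) -> Option<(String, usize)> {
    let mut block = RawBlock::default();
    for (i, c) in after_open_brace.char_indices() {
        if block.feed(c) == Step::Done {
            return Some((block.body, i + c.len_utf8()));
        }
    }
    None
}
```

For example, `capture_body(" if a { b(); } c(); } rest")` returns the body `" if a { b(); } c(); "` together with the offset 21, so a caller can resume parsing at `" rest"`; an input whose outer block never closes yields `None`.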

The first application on top is HomeTouch — Lutron / DALI / DMX / HDL / CoolMaster drivers (each a separate MQTT bridge, never in-process), a sunset-aware scheduler, and touchscreens that are true thin clients streaming RFB from a server-side renderer fed by a live gRPC mirror of the object tree. The ancestor is Premise SYS from the early 2000s — which, from your description, sounds like it might be the same era of idea you're revisiting.
Would love to hear about your system. Happy to jump on a call whenever suits you.
Yuval

@schungx
Collaborator

schungx commented Apr 19, 2026

Wow, that was intense. Mine wasn't in the same league, although the devices we tackled were similar: Lutron, C-Bus, EIB, Midea (their earlier connectivity attempt; I worked with the engineer himself), HDL, CoolMaster, and, in my case, also the ELK security panel (I designed an abstraction of a security system, but eventually only implemented it for the ELK). Interesting, as HDL wasn't a very big name in the space at that time, so you started early with them (as did we). We also did streaming video (Hulu/Netflix didn't exist at that time) and infrared redirection (people still switched channels; Smart TVs didn't exist).

I originally used C# and VB.NET as the scripting languages, which was quite easy to do in .NET. Moving to Rust, I had to find a new scripting language. Lua was too hard to integrate at that point (who wants to spend days figuring out how to connect to an essentially C-based library instead of writing interesting code?), JavaScript I hate (I had already had too much of it; I was one of the very earliest adopters of TypeScript), and then I discovered Rhai. It was still at a relatively early stage of development, but it was simple and easy enough to use.

My system was relatively basic, but it was already enough for me to start a business on it in the 2000s. I still have clients paying for regular maintenance, although these days we don't build such systems anymore because cell phones are everywhere, and every electrical and electronic gadget has an App... We are still developing v2 (actually version 3; there are two earlier versions in C#) of the system in Rust, and we happened to choose MQTT for the unified message bus. Incidentally, my system worked exactly as you described: all devices and schema definitions (I called them "Zone" definitions, each room in a premise being a zone) reside in the same type tree and can be inspected or written to (which triggers actions translated via drivers to the actual devices). Scripts are used as triggers to tie the whole network together into a unified platform (although with no atomic updates... I deliberately used a fire-and-forget API model: you poll for the result and continue when it happens, or handle the time-out). So it seems we arrived at the same solutions at around the same time.

Just out of curiosity... what is the need for $raw$ for the block? Why not just use a $block$ and be done with it?

Also, are you using $raw$ for definition blocks (in another syntax)?


2 participants