Fix compact_script dropping $raw$ custom-syntax bodies #1079
schungx merged 2 commits into rhaiscript:main
Conversation
`Engine::compact_script` drops every character captured by custom
syntax that uses the `$raw$` marker. Root cause: the `_char_mode`
branch in `TokenIterator::next` pulls one character at a time via
`self.stream.get_next()` and returns `Token::UnprocessedRawChar(ch)`
early, before the normal token-compression block at the tail of
`next()` runs. The compressed buffer therefore never sees raw chars,
and the compacted output contains the custom-syntax keyword and
surrounding tokens but with the raw body silently stripped.
For example, a custom `grab { BODY }` syntax compacts to `grab{`
instead of `grab{BODY}` — the compacted script then either fails
to parse back or, worse, parses as a truncated no-op and only
surfaces the corruption at a later recompile.
Fix: inside the `_char_mode` branch, before returning the raw char,
append it verbatim to `control.compressed` when compression is
active. Appending verbatim (rather than going through the normal
identifier-boundary spacing path) is correct because raw content
is opaque to Rhai — there is no grammar to strip whitespace from,
and the plugin's parse function may be counting characters in its
state machine. The surrounding non-raw tokens are still compacted
normally, so the output remains smaller than the input.
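The early-return behaviour and the fix can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `ToyTokenizer`, `ToyToken`, `append_raw`, and `compact` are hypothetical names invented for illustration, not Rhai's real `TokenIterator` or `control.compressed`, and the toy ends raw mode at the first `}` where a real raw parser decides this itself.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Tokens produced by the toy tokenizer (a stand-in, not Rhai's `Token`).
#[derive(Debug, PartialEq)]
enum ToyToken {
    Word(String),
    Symbol(char),
    RawChar(char),
}

struct ToyTokenizer<'a> {
    stream: Peekable<Chars<'a>>,
    char_mode: bool,     // set by the "parser" while a raw body is being read
    append_raw: bool,    // true = behaviour with the fix, false = the bug
    compressed: String,  // the compacted output being accumulated
    last_was_word: bool, // drives identifier-boundary spacing
}

impl<'a> ToyTokenizer<'a> {
    fn new(script: &'a str, append_raw: bool) -> Self {
        Self {
            stream: script.chars().peekable(),
            char_mode: false,
            append_raw,
            compressed: String::new(),
            last_was_word: false,
        }
    }

    fn next_token(&mut self) -> Option<ToyToken> {
        if self.char_mode {
            let ch = self.stream.next()?;
            // The fix: record the raw char verbatim before the early return.
            if self.append_raw {
                self.compressed.push(ch);
            }
            return Some(ToyToken::RawChar(ch)); // never reaches the tail block
        }

        // Normal path: skip whitespace, then read a word or a single symbol.
        while self.stream.next_if(|c| c.is_whitespace()).is_some() {}
        let first = self.stream.next()?;
        let is_word = |c: char| c.is_alphanumeric() || c == '_';
        let token = if is_word(first) {
            let mut word = String::from(first);
            while let Some(c) = self.stream.next_if(|c| is_word(*c)) {
                word.push(c);
            }
            ToyToken::Word(word)
        } else {
            ToyToken::Symbol(first)
        };

        // Tail "compression" block: only normal tokens get here.
        match &token {
            ToyToken::Word(word) => {
                if self.last_was_word {
                    self.compressed.push(' '); // keep adjacent words apart
                }
                self.compressed.push_str(word);
                self.last_was_word = true;
            }
            ToyToken::Symbol(symbol) => {
                self.compressed.push(*symbol);
                self.last_was_word = false;
            }
            ToyToken::RawChar(_) => {}
        }
        Some(token)
    }
}

/// Plays the parser's role: `grab {` switches to raw mode, `}` ends it.
fn compact(script: &str, append_raw: bool) -> String {
    let mut tokenizer = ToyTokenizer::new(script, append_raw);
    let mut after_grab = false;
    while let Some(token) = tokenizer.next_token() {
        match token {
            ToyToken::Word(word) => after_grab = word == "grab",
            ToyToken::Symbol('{') if after_grab => {
                tokenizer.char_mode = true;
                after_grab = false;
            }
            ToyToken::RawChar('}') => tokenizer.char_mode = false,
            _ => after_grab = false,
        }
    }
    tokenizer.compressed
}
```

In this toy, `compact("grab { BODY }", false)` yields `grab{`, mirroring the reported truncation, while `compact("grab { BODY }", true)` keeps the raw characters, whitespace included.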
Regression test: test_compact_script_preserves_raw_custom_syntax_body
in tests/custom_syntax.rs registers a minimal `grab { BODY }` raw
syntax, compacts a script, and asserts that the body tokens
(`let x`, `let y`, `print`) still appear in the compacted form
and that the compacted script round-trips through `compile`.
Discovered while using a `timer NAME { BODY }` custom syntax in a
downstream project: every stored script had its timer body dropped
at validation time, with the corruption only surfacing at cold
load when the broken compact form failed to recompile.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
I have fixed the camera compilation error (with no_object). It should pass tests now |
|
Great catch! I already completely forgot There is still one test failing under |
`test_compact_script_preserves_raw_custom_syntax_body` uses `rhai::Map` which is unavailable under the `no_object` feature, causing a CI build failure. Add `#[cfg(not(feature = "no_object"))]` to skip the test in that configuration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
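The gating described in this commit message can be sketched roughly as below. This is an illustration only: the function names are hypothetical, the test body is a placeholder, and `BTreeMap` stands in for `rhai::Map` so the sketch needs no external crate.

```rust
/// Reports whether map-dependent items are compiled in this configuration.
/// (Hypothetical helper, present only to make the effect observable.)
fn map_tests_enabled() -> bool {
    cfg!(not(feature = "no_object"))
}

// The attribute removes the whole test item when `no_object` is enabled,
// so the map type is never referenced in that build.
#[cfg(not(feature = "no_object"))]
#[test]
fn map_dependent_test_is_compiled() {
    // `BTreeMap` is a stand-in for `rhai::Map` in this sketch.
    let mut captured = std::collections::BTreeMap::new();
    captured.insert("x".to_string(), 1_i64);
    assert_eq!(captured.get("x"), Some(&1));
}
```

Because the attribute strips the item before type checking, a build with `no_object` enabled never sees the reference to the unavailable type.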
|
Thanks, @yuvalrakavy for this PR. Just purely out of curiosity, what are you doing with custom syntax? |
|
Hi
Two things. I implemented timer functionality:
let name = "Hercules";
timer timer_name after 5000 {
    print(`It took 5 seconds to ${name} to get here`);
}
Or
timer "visit" every 5000 {
    print(`Every 5 seconds ${name} is visiting here`);
}
And a span for OpenTelemetry:
span "sever-junk-request" {
    …
}
The application itself is an object-oriented state store. Class methods are
Rhai scripts. This will be used as infrastructure for my house automation
system and will replace a system, written 20 years ago and running on
Windows, that inspired the architecture. The goal is to eventually make it
open source software.
Yuval
|
|
What a coincidence. My introduction to Rust started with me writing the next version of my home automation system, which was developed 20 years ago on Windows with C# (I started with .NET in its alpha version). I needed scripting to script events, so I picked up Rhai and eventually became its maintainer. Would love to compare notes with you. I didn't go all the way to writing my own object store. I used MSSQL and later moved to also support MySQL. What do you see as the need to write your own object store instead of using an existing one? |
|
Stephen, apologies for the slow reply. I wanted to give you a proper, informative answer rather than a hand-wave, and putting the write-up together took longer than I expected. What a lovely coincidence, by the way; I'd genuinely love to compare notes.
https://gist.github.com/yuvalrakavy/cfb8a7169e029509e63da20663f26afb
The short version: the system is called the Store, a hierarchical, transactional object store in Rust, where schema, runtime state, automations, and UI logic all live in the same typed tree. Rhai is the scripting substrate everywhere: event handlers, typed store methods, instance/static methods, display formatters, timer bodies, MQTT topic handlers, scheduler conditions. Scripts appear inside an SDL (Schema Definition Language) file that also declares the class itself, so one file declares schema + behavior + reactivity.
`on_missing_function` is load-bearing for us. We don't pre-register class methods on the engine; instead, on every unresolved call we walk the target object's class chain at call time and dispatch to the appropriate `fn` / `store fn`. This is the hook I contributed upstream; it's what makes polymorphic, class-defined Rhai methods possible without forking the language. Thank you for merging it.
The first application on top is HomeTouch: Lutron / DALI / DMX / HDL / CoolMaster drivers (each a separate MQTT bridge, never in-process), a sunset-aware scheduler, and touchscreens that are true thin clients streaming RFB from a server-side renderer fed by a live gRPC mirror of the object tree. The ancestor is Premise SYS from the early 2000s, which, from your description, sounds like it might be the same era of idea you're revisiting. |
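The class-chain walk described in this comment could look roughly like the sketch below. Everything here is an assumption made for illustration: `ClassDef`, `resolve_method`, `sample_classes`, and the class and method names are hypothetical, and the sketch does not use Rhai's actual missing-function hook or reflect how the Store is really implemented.

```rust
use std::collections::HashMap;

/// A class with an optional parent and its own script-defined methods.
struct ClassDef {
    parent: Option<String>,
    /// Method name -> script source (a plain string in this sketch).
    methods: HashMap<String, String>,
}

/// Walks from `start_class` up the parent chain and returns the first class
/// that defines `method`, together with that method's script source.
fn resolve_method<'a>(
    classes: &'a HashMap<String, ClassDef>,
    start_class: &str,
    method: &str,
) -> Option<(&'a str, &'a str)> {
    let mut current = classes.get_key_value(start_class);
    let mut hops = 0;
    while let Some((name, def)) = current {
        if let Some(body) = def.methods.get(method) {
            return Some((name.as_str(), body.as_str()));
        }
        hops += 1;
        if hops > classes.len() {
            return None; // guard against a cyclic parent chain
        }
        current = def
            .parent
            .as_deref()
            .and_then(|parent| classes.get_key_value(parent));
    }
    None
}

/// Builds a tiny hypothetical hierarchy: Dimmer -> Light -> Device.
fn sample_classes() -> HashMap<String, ClassDef> {
    let class = |parent: Option<&str>, methods: &[(&str, &str)]| ClassDef {
        parent: parent.map(str::to_string),
        methods: methods
            .iter()
            .map(|(name, body)| (name.to_string(), body.to_string()))
            .collect(),
    };
    let mut classes = HashMap::new();
    classes.insert("Device".to_string(), class(None, &[("describe", "\"generic device\"")]));
    classes.insert("Light".to_string(), class(Some("Device"), &[("turn_on", "this.level = 100")]));
    classes.insert("Dimmer".to_string(), class(Some("Light"), &[("turn_on", "this.level = 50")]));
    classes
}
```

Resolving at call time rather than pre-registering means an override in a subclass (here `Dimmer::turn_on`) shadows the parent's version without any per-class engine setup.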
|
Wow, that was intense. Mine wasn't in the same league, although the devices we tackled were similar: Lutron, C-Bus, EIB, Midea (their earlier attempt to connect; I worked with the engineer himself), HDL, CoolMaster, and for me also the ELK security panel (I designed an abstraction of a security system, but eventually only wrote for the ELK). Interesting, as HDL wasn't a very big name in the space at that time, so you started early with them (as did we). Also streaming video (Hulu/Netflix didn't exist at that time) and infrared redirection (people still switched channels; smart TVs didn't exist).
I originally used C# and VB.NET as the scripting language, which was quite easy to do in .NET. Moving to Rust, I had to find a new scripting language. Lua was too hard to integrate at that point (who wants to spend days figuring out how to connect to an essentially C-based library instead of writing interesting code?), JavaScript I hate (had too much of it already; I was one of the very earliest adopters of TypeScript), and then I discovered Rhai. It was still in a relatively early phase of development, but it was simple enough and easy enough to use.
My system was relatively basic, but it was already enough for me to start a business on it in the 2000s. I still have clients paying for regular maintenance, although these days we don't do such systems anymore because cell phones are everywhere, and every electrical and electronic gadget has an app... We are still developing v2 (actually version 3; there are two versions in C#) of the system in Rust, and incidentally choosing to use MQTT for the unified message bus.
Incidentally, my system worked exactly like you said: all devices and schema definitions (I called them "Zone" definitions, each room in a premises being a zone) reside on the same type tree and can be inspected or written to (which triggers actions translated via drivers to the actual devices). 
Scripts are used as triggers to tie the whole network together into a unified platform (although no atomic updates... I deliberately used an API model which is fire-and-forget... you poll for the result and continue when it happens or handle time-out). So it seems we came around to the same solutions at around the same time. Just out of curiosity... what is the need for Also, are you using |
Summary
`Engine::compact_script` silently drops every character captured by custom syntax that uses the `$raw$` marker (registered via `register_custom_syntax_with_state_raw` or `register_custom_syntax_without_look_ahead_raw`). The compacted output contains the custom-syntax keyword and surrounding non-raw tokens, but the raw body is gone.
For example, a `grab { let x = 1; let y = 2; print(x + y); }` script compacts to `grab{`, with body and closing brace both stripped. The compacted script then either fails to parse back or, worse, parses as a truncated no-op and only surfaces the corruption at a later recompile.
`$block$`, `$expr$`, `$ident$`, `$symbol$`, and `$token$` are all unaffected; only `$raw$` is broken. The plain `register_custom_syntax` API explicitly rejects `$raw$` (`src/api/custom_syntax.rs:254`), so this only affects the raw-variant registration APIs.
Root cause
The `_char_mode` branch in `TokenIterator::next` (`src/tokenizer.rs`) pulls one character at a time via `self.stream.get_next()` and returns `Token::UnprocessedRawChar(ch)` early, before the normal token-compression block at the tail of `next()` runs. Raw characters therefore never reach the `compressed` buffer.
Fix
Inside the `_char_mode` branch, before returning the raw char, append it verbatim to `control.compressed` when compression is active. Appending verbatim (rather than going through the normal identifier-boundary spacing path) is correct because raw content is opaque to Rhai: there is no grammar to strip whitespace from, and the plugin's parse function may be counting characters in its state machine. The surrounding non-raw tokens are still compacted normally, so the output remains smaller than the input.
Fix is ~10 lines in `src/tokenizer.rs`. No change to public API.
Test plan
Added `test_compact_script_preserves_raw_custom_syntax_body` to `tests/custom_syntax.rs`: registers a minimal `grab { BODY }` raw syntax, compacts a script, and asserts that the body tokens still appear in the compacted form and that the compacted script round-trips through `compile`.
`cargo test --features internals` passes: 0 failures across all test binaries, including the existing `$raw$` tests (`test_custom_syntax_raw`, `test_custom_syntax_raw2`, `test_custom_syntax_raw_interpolation`).
Background
Discovered while using a `timer NAME { BODY }` custom syntax in a downstream project (home-automation store server). The `timer` keyword uses `$raw$` to capture the body, and every stored script had its timer body dropped at validation time. The corruption only surfaced at cold load when the broken compact form failed to recompile, producing confusing "unexpected token" errors on scripts that had parsed cleanly moments earlier.
🤖 Generated with Claude Code
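To make concrete why a raw body has to be preserved character for character, here is a sketch of the kind of per-character state machine a raw body capture might use to find the end of a `{ ... }` body. It is an assumption for illustration only: `BraceCounter`, `Step`, and `capture_raw_body` are hypothetical names, and this is not the downstream project's parser nor the callback signature Rhai expects.

```rust
/// Result of feeding one character to the state machine.
#[derive(Debug, PartialEq)]
enum Step {
    NeedMore,
    Done,
}

/// Tracks brace depth and how many characters have been consumed so far.
struct BraceCounter {
    depth: usize,
    consumed: usize,
}

impl BraceCounter {
    fn new() -> Self {
        Self { depth: 0, consumed: 0 }
    }

    /// Consumes one raw character; reports `Done` when the outermost
    /// brace has just been closed.
    fn feed(&mut self, ch: char) -> Step {
        self.consumed += 1;
        match ch {
            '{' => self.depth += 1,
            '}' => self.depth = self.depth.saturating_sub(1),
            _ => {}
        }
        if ch == '}' && self.depth == 0 {
            Step::Done
        } else {
            Step::NeedMore
        }
    }
}

/// Returns the raw body (braces included) and the number of characters
/// consumed, or `None` if the input does not start with `{` or never closes.
fn capture_raw_body(input: &str) -> Option<(&str, usize)> {
    if !input.starts_with('{') {
        return None;
    }
    let mut machine = BraceCounter::new();
    for (index, ch) in input.char_indices() {
        if machine.feed(ch) == Step::Done {
            let end = index + ch.len_utf8();
            return Some((&input[..end], machine.consumed));
        }
    }
    None
}
```

A parser like this sees a different character count for `{ a }` (5) than for `{a}` (3), and sees no terminator at all if the closing brace is dropped, which is why the compacted form has to carry the raw characters unchanged.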