⚡ perf: reduce startup and eval overhead across engine and CLI #1785
Conversation
Merging this PR will improve performance by ×2.7
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ⚡ | eval_array_chained_operations |
9.2 ms | 3.9 ms | ×2.4 |
| ⚡ | eval_array_filter |
9.8 ms | 4.4 ms | ×2.2 |
| ⚡ | eval_nested_object_access |
6.7 ms | 1.3 ms | ×5.1 |
| ⚡ | eval_string_equality |
6.7 ms | 1.4 ms | ×4.8 |
| ⚡ | eval_array_fold |
7.4 ms | 2.1 ms | ×3.5 |
| ⚡ | eval_no_macro_large_program |
288.3 µs | 203.6 µs | +41.59% |
| ⚡ | eval_variable_assignment_chain |
672.4 µs | 606 µs | +10.96% |
| ⚡ | eval_array_map |
8 ms | 2.7 ms | ×3 |
| ⚡ | eval_nodes |
6.4 ms | 1 ms | ×6.3 |
| ⚡ | eval_yaml_parse |
7.3 ms | 2.1 ms | ×3.5 |
| ⚡ | parse_fibonacci |
180 µs | 93.1 µs | +93.25% |
| ⚡ | eval_csv_parse |
7.3 ms | 2 ms | ×3.6 |
| ⚡ | eval_object_field_access |
7.1 ms | 1.7 ms | ×4.2 |
| ⚡ | eval_pipeline_with_conditionals |
7 ms | 1.7 ms | ×4.2 |
| ⚡ | eval_foreach |
6.8 ms | 1.5 ms | ×4.6 |
| ⚡ | eval_select_h |
287.8 µs | 200.1 µs | +43.78% |
| ⚡ | eval_if_else_branching |
8.1 ms | 2.8 ms | ×2.9 |
| ⚡ | eval_json_parse |
6.9 ms | 1.6 ms | ×4.4 |
| ⚡ | eval_large_markdown_filtering |
25.2 ms | 20.1 ms | +25.66% |
| ⚡ | eval_macro_expansion_simple |
245.9 µs | 137.9 µs | +78.3% |
| ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing perf/startup-and-eval-improvements (7a82dc5) with main (1d5fde9)1
Footnotes
5a83ef4 to
efba1b0
Compare
There was a problem hiding this comment.
Pull request overview
This PR focuses on reducing startup and evaluation overhead by reusing parsed artifacts in both the CLI and the language engine.
Changes:
- Refactors
mq-runCLI execution flow and adds an optimization to compile the effective query once per batch when safe. - Introduces a thread-local cache for the built-in module parse result to reduce repeated builtin parsing cost.
- Adds a public
Engine::compile+Engine::eval_compiledAPI and updates docs/tests accordingly.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/mq-run/src/cli.rs | Refactors execution helpers and adds a batch fast-path that uses compiled programs. |
| crates/mq-lang/src/module.rs | Adds a thread-local cache intended to reuse parsed builtin module tokens/module across engines. |
| crates/mq-lang/src/lib.rs | Updates ast-json feature documentation to reflect new execution API. |
| crates/mq-lang/src/eval.rs | Adjusts evaluator env passing to reduce cloning (but still clones in hot paths). |
| crates/mq-lang/src/engine.rs | Adds compile/eval_compiled, updates examples/tests, and removes eval_ast. |
| crates/mq-lang/src/arena.rs | Adds extend_from_slice/as_slice helpers used by the builtin cache. |
Comments suppressed due to low confidence (1)
crates/mq-run/src/cli.rs:1215
write_fileis currently unused in this test module (no call sites found). Consider removing it or using it in tests to avoid dead-code warnings and keep the test helpers minimal.
#[test]
fn test_cli_null_input() {
let cli = Cli {
input: InputArgs {
input_format: Some(InputFormat::Null),
..Default::default()
},
| pub fn load_builtin(&mut self, token_arena: TokenArena) -> Result<Module, ModuleError> { | ||
| self.load(Module::BUILTIN_MODULE, BUILTIN_FILE, token_arena) | ||
| if self.loaded_modules.contains(Module::BUILTIN_MODULE.into()) { | ||
| return Err(ModuleError::AlreadyLoaded(Cow::Borrowed(Module::BUILTIN_MODULE))); | ||
| } | ||
|
|
||
| // Cache is only valid when both arenas are in their initial state (builtin | ||
| // module_id == 1, tokens right after the dummy EOF). Fall back to full parse otherwise. | ||
| let pristine = self.loaded_modules.len() == 1 && { | ||
| #[cfg(not(feature = "sync"))] | ||
| { | ||
| token_arena.borrow().len() == 1 | ||
| } | ||
| #[cfg(feature = "sync")] | ||
| { | ||
| token_arena.read().unwrap().len() == 1 | ||
| } | ||
| }; | ||
|
|
||
| if pristine { | ||
| let cached = | ||
| BUILTIN_CACHE.with(|cache| cache.borrow().as_ref().map(|c| (c.tokens.clone(), c.module.clone()))); | ||
|
|
||
| if let Some((tokens, module)) = cached { | ||
| { | ||
| #[cfg(not(feature = "sync"))] | ||
| token_arena.borrow_mut().extend_from_slice(&tokens); | ||
| #[cfg(feature = "sync")] | ||
| token_arena.write().unwrap().extend_from_slice(&tokens); | ||
| } | ||
| self.loaded_modules.alloc(Module::BUILTIN_MODULE.into()); | ||
| return Ok(module); | ||
| } | ||
| } |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
crates/mq-lang/src/eval.rs:296
- Same issue as above:
&Shared::clone(&self.env)here adds unnecessary cloning/temporary creation. Use&self.env(or reuse an&Shared<_>binding) to avoid refcount churn in the hot path.
} else {
self.eval_program(&nodes_program, values?.into(), &Shared::clone(&self.env))
.map(|value| {
| let values: Result<Vec<RuntimeValue>, InnerError> = input | ||
| .map(|runtime_value| match &runtime_value { | ||
| RuntimeValue::Markdown(node, _) => self.eval_markdown_node(&program, node), | ||
| _ => self | ||
| .eval_program(&program, runtime_value, Shared::clone(&self.env)) | ||
| .eval_program(&program, runtime_value, &Shared::clone(&self.env)) | ||
| .map_err(|e| e.into_inner_error()), |
| pub fn eval_compiled<I: Iterator<Item = RuntimeValue>>( | ||
| &mut self, | ||
| compiled: &CompiledProgram, | ||
| input: I, | ||
| ) -> MqResult { | ||
| self.evaluator | ||
| .eval(&program, input.into_iter()) | ||
| .eval(&compiled.program, input) | ||
| .map(|values| values.into()) | ||
| .map_err(|e| Box::new(error::Error::from_error("", e, self.evaluator.module_loader.clone()))) | ||
| .map_err(|e| { | ||
| Box::new(error::Error::from_error( | ||
| &compiled.source, | ||
| e, | ||
| self.evaluator.module_loader.clone(), | ||
| )) | ||
| }) | ||
| } |
| /// Compiles mq code into a [`CompiledProgram`] that can be evaluated multiple times. | ||
| /// | ||
| /// This is similar to `eval`, but takes an AST directly, skipping parsing. | ||
| /// The AST is typically obtained from deserializing a JSON AST. | ||
| /// Use this with `eval_compiled` to avoid re-parsing the same query for each input. | ||
| pub fn compile(&mut self, code: &str) -> Result<CompiledProgram, Box<error::Error>> { | ||
| if code.is_empty() { | ||
| return Ok(CompiledProgram { | ||
| source: String::new(), | ||
| program: vec![], | ||
| }); | ||
| } | ||
| let program = parse(code, Shared::clone(&self.token_arena))?; | ||
| Ok(CompiledProgram { | ||
| source: code.to_string(), | ||
| program, | ||
| }) | ||
| } | ||
|
|
||
| /// Evaluates a pre-compiled program against the given input. | ||
| /// | ||
| /// Use with `compile` to avoid re-parsing the same query for each input file, | ||
| /// or with a [`CompiledProgram`] constructed from a deserialized JSON AST (`ast-json` feature). | ||
| /// |
- Cache parsed builtin.mq tokens + Module in thread-local storage so each rayon worker thread parses builtin only once instead of once per file; adds Arena::extend_from_slice / as_slice helpers to support it - Add Engine::compile and Engine::eval_compiled for pre-parsed query reuse; CLI sequential mode now compiles the query once and reuses it across all files when all share the same effective query prefix - Change eval_program env parameter from owned Shared<SharedCell<Env>> to &Shared<SharedCell<Env>>, eliminating Rc clone on every loop iteration in for/foreach/map pipelines
Remove eval_ast (which was behind the ast-json feature gate and took Program by value) and unify into eval_compiled (&Program, no feature gate). Update the test and lib.rs docs accordingly.
…env) for improved efficiency
…eferences - Add AlreadyLoaded check at the top of load_builtin for explicit idempotency - Only use the thread-local cache when both arenas are in pristine state (loaded_modules.len()==1 and token_arena.len()==1); fall back to full parse otherwise to avoid broken offsets when other modules are loaded first - Handle AlreadyLoaded in Evaluator::load_builtin_module as Ok(()) so re-entrant calls (e.g. DAP adapter on re-launch) do not panic - Add tests for idempotent and non-pristine load scenarios
… error diagnostics
Bundle the original query source with the compiled AST so that runtime
errors surfaced via eval_compiled can display the relevant code snippet
and source range, matching the diagnostic quality of the eval path.
- Add CompiledProgram { source, program } with source()/program() accessors
- Implement From<Program> for CompiledProgram for ast-json deserialization use cases
- Change Engine::compile to return CompiledProgram
- Change Engine::eval_compiled to accept &CompiledProgram and thread source into from_error
Cover source preservation, program access, clone, empty-code compile, eval reuse across inputs, runtime error source propagation, and the From<Program> path that produces empty source in diagnostics.
Cover the key invariants of the thread-local BUILTIN_CACHE: module.rs (pristine_token_arena fixture required to hit the cache path): - arena size is identical whether tokens come from cache or fresh parse - module function/var/macro/module counts are consistent across both paths - BUILTIN_MODULE is always registered at loaded_modules index 1 - all cached tokens carry module_id == BUILTIN_MODULE_ID (so error diagnostics resolve to the builtin source file, not garbage) engine.rs: - sequential engines calling the same builtin functions produce identical results - eval_compiled on an engine that used the cache produces correct output (verifies token_id references in compiled programs remain valid after cache replay) - runtime error source_code is correctly preserved on a cache-using engine
Add two tests that check the error *location* (token offset + span), not just source_code, to catch token_id shifts introduced by cache replay: - test_builtin_cache_runtime_error_token_location_correct: for queries where the erroring identifier is at different offsets (0, 4, 9 bytes), confirm that location.offset() == query.find(ident) and the slice at that offset equals the expected identifier. - test_builtin_cache_and_fresh_parse_error_location_identical: two sequential engines must produce identical error.location, so that a cache-using engine cannot silently shift token_ids relative to a fresh-parse engine.
42b0c1f to
fd24083
Compare
Set source code on the module loader inside eval_compiled when the debugger feature is enabled, so token location info is available during evaluation of pre-compiled programs.
b2d9733 to
f8fb3a7
Compare
No description provided.