Create bytecode compiler and evaluation #147
Conversation
This is great. You're probably referring to #100, where I think compiling Rhai down to a bytecode format with an interpreter may make it run faster for the really demanding applications (e.g. games). Can I suggest you base your changes on https://github.com/schungx/rhai/tree/bytecodes, where I do most of the on-going work, to avoid merge conflicts later. I'll try to keep that branch up to date by occasionally merging in changes.
Do you plan on retaining the AST as a valid execution format and generating bytecodes from the AST, or compiling directly to bytecodes?
```rust
// TODO: Removed if !state.always_search
Expr::Variable(_, Some(index), _) =>
    self.instructions.push(GetVariable { index }),
// TODO: Is this ever used without eval?
```
You're right: `always_search` is only ever set to `true` when there is an `eval` call and the number of variables in the scope changes. But this is necessary if we ever use `eval`. Or do you plan to disallow `eval` calls when compiling to bytecodes? That can work also.
Sort of discussed below: disallowing `eval`, or backing out to the AST version of the code, seems like the simplest solution. That means outright disallowing it in functions that want the ability to yield, or that call functions with that ability, but as originally mentioned I'm not sure that's a particularly mainstream feature.
Not attached to it at all. I just put it in because it was dead simple in the beginning. Now I realize why people who work on JS optimizers curse `eval`.
I'm also wondering how much faster will bytecodes be as compared to AST-walking. Do you have any preliminary benchmarks?
Yes, I've been thinking about it also. Profiling runs with Rhai show that function-name/arg-types hashing consistently shows up at the top of IR counts, so dispatching to the correct function appears to be a hot-path bottleneck, although so far I haven't thought of a way to do it faster than hashing everything.
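To make the bottleneck concrete, here is a minimal sketch of the kind of per-call work being described: hashing the function name together with the argument types to find the right overload. The name `calc_fn_hash` and the exact hashing scheme are illustrative assumptions, not Rhai's actual implementation.

```rust
use std::any::TypeId;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative sketch: on every call, hash the function name plus the
// argument types to produce a dispatch key for the overload table.
fn calc_fn_hash(name: &str, arg_types: &[TypeId]) -> u64 {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    for t in arg_types {
        t.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let int_add = calc_fn_hash("add", &[TypeId::of::<i64>(), TypeId::of::<i64>()]);
    let float_add = calc_fn_hash("add", &[TypeId::of::<f64>(), TypeId::of::<f64>()]);
    // Each overload gets its own dispatch key.
    assert_ne!(int_add, float_add);
}
```

Doing this hashing on every single call is what makes it show up on profiles.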
Sure, will do. Have you considered making schungx/rhai the main repo for the language?
For my purposes I'll be compiling directly to bytecode. The ability to implement the aforementioned features is necessary, and I don't see how I could do that easily with an AST, not to mention that performance should be better. If you're asking from a language perspective, I assume AST compilation will stick around. At least for one-off scripts without loops I think it should be faster. It's certainly too early to think about killing it.
The idea is to just leave the entire function containing the eval in the ast-executor code. Calling an ast function from bytecode should be no harder than calling any other foreign function. That should drop all the extra variables before returning to bytecode.
I just quickly set up a version of the fibonacci benchmark in response and out of curiosity.
Considering that no effort has gone into optimizing it, and that there are some easy looking wins, I'm pretty happy with it being faster at all.
There are probably very few overloads of any given function on average, and small linear searches are typically faster than hashes, so something like this might work well to speed up the search (note that the hot path doesn't touch a `HashMap`):

```rust
struct FunctionsTable {
    function_names: HashMap<String, usize>,
    functions: Vec<Vec<FnType>>,
}

fn register_fn(table: &mut FunctionsTable, fn_name: String, f: FnType) {
    if let Some(&idx) = table.function_names.get(&fn_name) {
        table.functions[idx].push(f);
    } else {
        table.function_names.insert(fn_name, table.functions.len());
        table.functions.push(vec![f]);
    }
}

enum ExprOrBytecode {
    Call { fn_name_idx: usize, args: Vec<Dynamic> },
}

fn exec_call(table: &FunctionsTable, fn_name_idx: usize, args: Vec<Dynamic>) -> Result<Dynamic, Error> {
    for function in &table.functions[fn_name_idx] {
        if accepts(function, &args) {
            return Ok(function(args));
        }
    }
    Err(error)
}
```
Not really. I am mostly only tinkering based on the fabulous work of people before me.
Compiling directly to bytecode will probably be a large surgery on the parser. Optimizing machine-generated scripts with spliced-in constants also works more easily on an AST than on bytecodes. For your purposes, I'd assume you would be compiling your users' scripts into bytecodes and then persisting the bytecode stream so it can be reloaded later without re-parsing, so it might be more useful to keep the AST around.
That would then change the semantics of the language, because you're allowed to introduce new variables via `eval`.
And it looks like it is more consistent as well judging from the timing variations...
A linear search will have to compare the argument types one by one. In the pathological case of heavy overloading (e.g. arithmetic functions supporting all the integer types), it will be slower than hashing.
Yes it does. You need to map a function name to its index. It can be pre-mapped if using only one AST, but when using multiple ASTs plus user-loaded packages, you cannot predict the index of a function based on its name alone. In other words, you'll need to do a table lookup based on the name string. Of course, compiling to bytecodes assumes you won't add new functions later on, so you can freely do this optimization. Unless you want to give your users the freedom to add/mix functions in bytecode form, then you'll need to do the name lookups. One way to handle it is to cache the name-to-index mapping and only incur this cost once, until the functions library is changed; then the cache is invalidated.
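The caching idea above can be sketched as follows. Everything here (`FnResolver`, the generation counter) is a hypothetical illustration: resolve a name to its index once, and throw the cache away whenever the function library changes.

```rust
use std::collections::HashMap;

// Hypothetical sketch: lazy name-to-index cache invalidated by a
// generation counter that is bumped whenever the library changes.
struct FnResolver {
    names: Vec<String>,            // the function library's name table
    cache: HashMap<String, usize>, // name -> index, filled lazily
    generation: u64,               // bumped whenever `names` changes
    cached_generation: u64,        // generation the cache was built against
}

impl FnResolver {
    fn new(names: Vec<String>) -> Self {
        FnResolver { names, cache: HashMap::new(), generation: 0, cached_generation: 0 }
    }

    fn add_fn(&mut self, name: &str) {
        self.names.push(name.to_string());
        self.generation += 1; // lazily invalidates the cache
    }

    fn resolve(&mut self, name: &str) -> Option<usize> {
        if self.generation != self.cached_generation {
            self.cache.clear(); // library changed: drop the stale mapping
            self.cached_generation = self.generation;
        }
        if let Some(&idx) = self.cache.get(name) {
            return Some(idx);
        }
        let idx = self.names.iter().position(|n| n == name)?;
        self.cache.insert(name.to_string(), idx);
        Some(idx)
    }
}

fn main() {
    let mut r = FnResolver::new(vec!["print".to_string(), "debug".to_string()]);
    assert_eq!(r.resolve("debug"), Some(1));
    r.add_fn("custom");
    assert_eq!(r.resolve("custom"), Some(2)); // cache rebuilt after the change
    assert_eq!(r.resolve("missing"), None);
}
```

The string lookup is then paid once per library change rather than once per call.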
Ah, sorry, we're talking about different things. I don't particularly have a problem with compiling "source -> AST -> bytecode", maybe someday I will want to optimize that but for now it seems fine. I just don't expect to ever actually execute the AST. I'll probably persist the original source (for display and deduplication reasons) and persist/cache the bytecode... As long as I keep a version of the bytecode around I don't think I'll need the AST to reload it, but if it ends up being useful it's not an issue to store it as well.
Am I mistaken in thinking that eval inside a function can't introduce variables outside that function? Top level eval would mean backing out the top level "function" to the AST version just like everywhere else, but functions that don't themselves contain eval should still be fine?
True, and these are really common operators... Thinking more, it might be possible to have our cake and eat it too: at AST/bytecode-build time, decide whether to go the hash route or the linear-lookup route depending on the number of overloads. But since this isn't optimizing the most common type of call (arithmetic), it might not be worth the effort.
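That build-time decision could look roughly like this. The threshold, names, and representation are all assumptions for illustration, not a proposed final design:

```rust
use std::any::TypeId;
use std::collections::HashMap;

// Hypothetical cut-over point; real tuning would need measurement.
const LINEAR_THRESHOLD: usize = 4;

enum Dispatch {
    // Few overloads: small linear list of (argument types, function slot).
    Linear(Vec<(Vec<TypeId>, usize)>),
    // Many overloads: fall back to hashing the argument types.
    Hashed(HashMap<Vec<TypeId>, usize>),
}

fn build_dispatch(overloads: Vec<(Vec<TypeId>, usize)>) -> Dispatch {
    if overloads.len() <= LINEAR_THRESHOLD {
        Dispatch::Linear(overloads)
    } else {
        Dispatch::Hashed(overloads.into_iter().collect())
    }
}

fn lookup(d: &Dispatch, args: &[TypeId]) -> Option<usize> {
    match d {
        Dispatch::Linear(v) => v.iter().find(|(t, _)| t.as_slice() == args).map(|(_, i)| *i),
        Dispatch::Hashed(m) => m.get(args).copied(),
    }
}

fn main() {
    let tid = TypeId::of::<i64>();
    // Two overloads: stays a linear list.
    let few = build_dispatch(vec![(vec![tid, tid], 0), (vec![tid], 1)]);
    assert!(matches!(few, Dispatch::Linear(_)));
    assert_eq!(lookup(&few, &[tid]), Some(1));
    // Five overloads (distinguished by arity here): switches to hashing.
    let many = build_dispatch((1..=5).map(|n| (vec![tid; n], n)).collect());
    assert!(matches!(many, Dispatch::Hashed(_)));
    assert_eq!(lookup(&many, &[tid, tid, tid]), Some(3));
}
```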
That's the point. At top level, `eval` can introduce new variables into the scope.
Hmm, yes, it does require some sort of linking step that is going to take time O(code_size). That's probably a reasonable thing to do in an AST optimization pass, though. The bytecode version already has a pass like this.
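For concreteness, a linking pass of that shape might look like the following sketch. The `Op` variants and `link` function are hypothetical, standing in for whatever the real bytecode uses; the point is the single O(code_size) walk rewriting name-based calls into index-based calls:

```rust
use std::collections::HashMap;

enum Op {
    CallByName(String), // unlinked: must look the function up by name
    CallByIndex(usize), // linked: direct index into the functions table
}

// One pass over the bytecode: O(code_size) plus a hash lookup per call site.
fn link(code: &mut [Op], table: &HashMap<String, usize>) {
    for op in code.iter_mut() {
        if let Op::CallByName(name) = op {
            if let Some(&idx) = table.get(name.as_str()) {
                *op = Op::CallByIndex(idx); // unknown names stay unlinked
            }
        }
    }
}

fn main() {
    let mut table = HashMap::new();
    table.insert("print".to_string(), 7);
    let mut code = vec![
        Op::CallByName("print".to_string()),
        Op::CallByName("unknown".to_string()),
    ];
    link(&mut code, &table);
    assert!(matches!(code[0], Op::CallByIndex(7)));
    assert!(matches!(code[1], Op::CallByName(_)));
}
```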
And Fibonacci is probably not a good benchmark for your use case, as it is almost exclusively recursion. The workload will be on stack management, where you probably can't save much.
@gmorenz Unfortunately I added modules support, so you're going to have to deal with functions and variables qualified by modules (and possibly a way to load modules that are pre-compiled to bytecodes). I can help out with some of the infrastructure and plumbing after you get bytecodes to work without modules.
Ya, that's no problem. Some form of modules was on my wish list for this language anyway, and I definitely already have ideas about how to implement modules and linking of modules. Current work here is still on creating a clean and efficient index-chain model... it turns out that the semantics don't fit a stack machine particularly cleanly. It's close to done, but I got sidetracked on the get_mut idea.
Yes, array indexing is hairy, especially when the language doesn't have pointers... otherwise it is a piece of cake, but then you'd probably have to throw in a GC... The problem with implementing it with a stack machine is that you always need to keep a back-pointer to the original object somewhere down the stack so that, once you get all the index values on the stack, you can restart from the beginning and work your way up to the top of the stack. So some form of a frame pointer to keep a stack frame for the indexing operation...
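A minimal toy version of that frame-pointer idea, evaluating `a[1][0]` on a stack machine (all names and instructions here are illustrative, and it clones values rather than dealing with the mutation problem the discussion is really about):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Array(Vec<Value>),
}

enum Instr {
    PushVar(usize),    // push a copy of a scope variable
    PushInt(i64),      // push an index value
    IndexChain(usize), // pop N indices plus the base object, push the element
}

fn run(scope: &[Value], code: &[Instr]) -> Value {
    let mut stack: Vec<Value> = Vec::new();
    for instr in code {
        match instr {
            Instr::PushVar(slot) => stack.push(scope[*slot].clone()),
            Instr::PushInt(i) => stack.push(Value::Int(*i)),
            Instr::IndexChain(n) => {
                let n = *n;
                // The "frame pointer": the base object sits n slots below the top.
                let frame = stack.len() - n - 1;
                let mut current = stack[frame].clone();
                // Restart from the base and walk up through the index values.
                for idx_slot in frame + 1..=frame + n {
                    let Value::Int(i) = stack[idx_slot].clone() else {
                        panic!("index is not an integer")
                    };
                    let Value::Array(items) = current else { panic!("not an array") };
                    current = items[i as usize].clone();
                }
                stack.truncate(frame); // drop the whole indexing frame
                stack.push(current);
            }
        }
    }
    stack.pop().expect("program produced no value")
}

fn main() {
    // scope[0] = [[1, 2], [3, 4]]
    let scope = [Value::Array(vec![
        Value::Array(vec![Value::Int(1), Value::Int(2)]),
        Value::Array(vec![Value::Int(3), Value::Int(4)]),
    ])];
    // Toy bytecode for `a[1][0]`.
    let code = [Instr::PushVar(0), Instr::PushInt(1), Instr::PushInt(0), Instr::IndexChain(2)];
    assert_eq!(run(&scope, &code), Value::Int(3));
}
```

Assigning through such a chain, rather than reading, is where the real difficulty lies.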
As mentioned in schungx/11 I do not anticipate making time to finish this work anytime soon. Closing this for now. Feel free to re-use any part of this in any way if you so desire. Thanks for the warm welcome @schungx. |
Hi,
This is a first pass at creating a bytecode implementation for the language. I noticed you (offhand) discussed making a bytecode in comments on another issue, so I'm hoping that you will be interested in merging this once it's more complete. Definitely feel free to let me know if you'd rather parts of this were designed or implemented differently, or if you don't feel that it's an appropriate feature in the first place.
This isn't done, but it's at the point where I'm pretty sure I will finish it, and it's beginning to be at the point where I don't want to be accidentally duplicating work. Right now
cargo test --features="no_object no_index" -- --skip eval --skip test_timestamp --skip test_type_of

passes, and it hasn't been optimized at all.

My eventual goal for this is to use it in a game I'm writing to let players script their units' actions. I'm hoping that the bytecode will improve performance when repeatedly executing the same script, give me a deterministic measure of work done by counting instructions executed, and let me implement features where user scripts can act as a generator that yields values and pauses/resumes.
So far none of those goals have been achieved, however I think there is a pretty clear path forwards from this implementation to all of them. I'm also not sure if the parts needed to achieve the second two goals will really belong in the core language, but they could live behind feature flags, or I could maintain a fork, since the amount of code needed for them should be pretty small.
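The "deterministic measure of work" goal can be sketched with a fuel counter: the interpreter decrements fuel per instruction and pauses when it runs out, so the same script always stops at the same point. Everything below (the `Step` type, using plain integers as stand-in "instructions") is a hypothetical illustration, not this PR's design:

```rust
enum Step {
    Done(i64),
    OutOfFuel { pc: usize, acc: i64 }, // enough state saved to resume later
}

// Stand-in interpreter: each "instruction" just adds an integer to `acc`.
fn run(code: &[i64], mut pc: usize, mut acc: i64, mut fuel: u64) -> Step {
    while pc < code.len() {
        if fuel == 0 {
            return Step::OutOfFuel { pc, acc };
        }
        acc += code[pc]; // stand-in for executing one bytecode instruction
        pc += 1;
        fuel -= 1;
    }
    Step::Done(acc)
}

fn main() {
    let code = [1, 2, 3, 4];
    // Two units of fuel execute exactly two instructions...
    match run(&code, 0, 0, 2) {
        Step::OutOfFuel { pc, acc } => {
            assert_eq!((pc, acc), (2, 3));
            // ...and the saved state resumes deterministically.
            match run(&code, pc, acc, 100) {
                Step::Done(v) => assert_eq!(v, 10),
                _ => unreachable!(),
            }
        }
        _ => unreachable!(),
    }
}
```

The same pause-and-resume machinery is what a yielding generator would build on.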
TODO - In the immediate future
- `Position`s. Possibly with a second source map array in the bytecode struct.

TODO - But it could probably wait until after this is merged
- `Scope`, calling arbitrary functions, merging multiple pieces of bytecode, and so on.
- `String`s)

TODO - Potential future projects

TODO - But not in this pull request and maybe not in this repository
- `Yield(Dynamic, State)` return option that allows for storing the execution state and restarting as well.
- (like `Yield`, but without a value).
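The yield/pause return options in the last TODO could take roughly this shape. `Dynamic` and `State` are placeholders here for the engine's real value and saved-execution-state types, and `Pause` is a made-up name for the value-less variant:

```rust
type Dynamic = i64; // placeholder for the engine's dynamic value type

struct State {
    pc: usize, // placeholder for the full saved execution state
}

enum ExecResult {
    Return(Dynamic),       // normal completion
    Yield(Dynamic, State), // yield a value, keeping the state to resume from
    Pause(State),          // like Yield, but without a value
}

fn main() {
    let r = ExecResult::Yield(42, State { pc: 7 });
    match r {
        ExecResult::Yield(v, st) => {
            assert_eq!(v, 42);
            assert_eq!(st.pc, 7); // caller can stash `st` and resume later
        }
        ExecResult::Return(_) | ExecResult::Pause(_) => unreachable!(),
    }
}
```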