New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise builtin function dispatch #2529
Conversation
This is of course great. Let's make sure we're aware of issues though. First, this makes bindings development harder. Then, I wondered how you solve problem of checking if an object is a function, and see you didn't much. That would lead to problems like |
Hmm, I didn't think of that, that all builtin functions should have the same type. I wonder if this point is specified in the CPython specs...?
It would be easy to make it so there were only 2 distinct declaration macros, But perhaps it's good to make the DECLARE macros mirror the DEFINE ones, as it allows to independently change the data structures in the future if needed. |
There is |
...
I'm using |
I can easily change it so that there are only 2 variants of the structure (instead of 5), one for fixed args, the other for variable args.
Yes, unfortunatelly. I was trying to think of a tricky way to do this and one option is to change the "name" of builtin function types to "builtin_function" (instead of "function"). |
The fixed args with the union then? That would simplify things yes. On the other hand it's no big deal, I only use it in 2 or 3 common pieces of code, I wouldn't go through hoops to make things tricky and possibly ofuscated if there's not much benefits. |
I'd say it's not important, because Python's doesn't have a single type for all functions. If there're separate functions for builtin vs Python functions, then a user can robustly check a type only of their own functions, because any other function can be either builtin or Python. |
Yes, everyone who has C modules outside of core will need to do bunch of work. So, the sooner this merged, the sooner that work would complete. |
In order to have more fine-grained control over how builtin functions are constructed, the MP_DECLARE_CONST_FUN_OBJ macros are made more specific, with suffix of _0, _1, _2, _3, _VAR, _VAR_BETEEN or _KW. These names now match the MP_DEFINE_CONST_FUN_OBJ macros.
Builtin functions with a fixed number of arguments (0, 1, 2 or 3) are quite common. Before this patch the wrapper for such a function cost 3 machine words. After this patch it only takes 2, which can reduce the code size by quite a bit (and pays off even more, the more functions are added). It also makes function dispatch slightly more efficient in CPU usage, and furthermore reduces stack usage for these cases. On x86 and Thumb archs the dispatch functions are now tail-call optimised by the compiler. The bare-arm port has its code size increase by 76 bytes, but stmhal drops by 904 bytes. Stack usage by these builtin functions is decreased by 48 bytes on Thumb2 archs.
154b227
to
571e6f2
Compare
Builtin functions with a fixed number of arguments (0, 1, 2 or 3) are quite common. Before this patch the wrapper for such a function cost 3 machine words (uPy type, num args/kw, pointer to C function). After this patch it only takes 2 (uPy type, pointer to C function), which can reduce the code size by quite a bit (and pays off even more, the more functions are added). It also makes function dispatch slightly more efficient in CPU usage, and furthermore reduces stack usage for these cases. On x86 and Thumb archs the dispatch functions are now tail-call optimised by the compiler (but not on xtensa).
Stack usage by these builtin functions is decreased from 48 bytes to 0 on Thumb2 archs (due to tail-call opt), and is decreased from 48 to 16 bytes on xtensa.
Change in code size, in bytes, is:
As you can see the bare-arm and minimal builds increase, but all other "realistic" ports decrease by a lot. This is because they have a decent number of builtin functions which benefit from this optimisation. And the optimisation will pay off even more when more functions are added (as long as they have fixed number of args from 0-3).
To me this patch is definitely worth it, but I wanted to make a PR because it increases code size of the baseline ports.