Implement get_node with a get_node_raw #14384
Co-authored-by: SFENCE <sfence.software@gmail.com>
Added the comments.
LGTM otherwise
| | Time / ms | PR / ms |
|---|---|---|
| | 314.16 | 79.64 |
| | 290.42 | 81.25 |
| | 254.11 | 80.21 |
| | 193.72 | 80.7 |
| | 184.71 | 80.32 |
| | 329.17 | 80.05 |
| | 251.46 | 80.42 |
| | 301.68 | 79.2 |
| | 197.04 | 79.23 |
| | 197.5 | 80.82 |
| average | 251.4 | 80.2 |
| median | 252.8 | 80.3 |
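The summary rows can be re-derived from the ten samples. The snippet below is only a sanity check on the arithmetic, not part of the PR; the variable names `time_ms` and `pr_ms` are made up here to match the two column headers.

```lua
-- Samples copied from the table above (milliseconds).
local time_ms = {314.16, 290.42, 254.11, 193.72, 184.71,
	329.17, 251.46, 301.68, 197.04, 197.5}
local pr_ms = {79.64, 81.25, 80.21, 80.7, 80.32,
	80.05, 80.42, 79.2, 79.23, 80.82}

-- Arithmetic mean of a list of numbers.
local function mean(t)
	local sum = 0
	for _, v in ipairs(t) do
		sum = sum + v
	end
	return sum / #t
end

-- Median of a list of numbers; sorts a copy so the input is left untouched.
local function median(t)
	local s = {}
	for i, v in ipairs(t) do
		s[i] = v
	end
	table.sort(s)
	local n = #s
	local mid = math.floor(n / 2)
	if n % 2 == 1 then
		return s[mid + 1]
	end
	return (s[mid] + s[mid + 1]) / 2
end

print(string.format("average: %.1f | %.1f", mean(time_ms), mean(pr_ms)))
print(string.format("median:  %.1f | %.1f", median(time_ms), median(pr_ms)))
print(string.format("ratio of averages: %.1f", mean(time_ms) / mean(pr_ms)))
```

The rounded values agree with the table, and the ratio of the two averages comes out at roughly 3.1.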
I find it strange that there's this much difference. `pushnode` already uses Lua functions to construct the table from arguments (`core.set_push_node`). So apparently moving the table constructor to after "returning" from C++ makes such a big difference? Why's that?
In master we call Lua functions from C++ to read vectors and push node values. The C->Lua call overhead is probably quite big. Also, we always need to look up the functions to call first, which requires a bunch of Lua C API calls.
Works well. Cannot complain.
`get_node` is implemented in builtin with a `get_node_raw(x, y, z) -> content, param1, param2, pos_ok`.

On master, the C++ `get_node` calls Lua functions to push nodes and to read vectors. This is much faster if done in Lua.

Flamegraphs:
master:
PR:
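To make the description above concrete, here is a rough sketch of what a builtin wrapper around `get_node_raw` could look like. It is not the PR's actual code. The `core` table, the content IDs, and the stubbed `get_node_raw` are stand-ins so the snippet runs outside the engine, and resolving names through `core.get_name_from_content_id` is an assumption about how the content ID might be turned into a node name.

```lua
-- Hypothetical stand-in for the engine API so this runs in plain Lua.
local core = {}

-- Illustrative content IDs only; the real values live in the engine.
local names = {[0] = "default:stone", [126] = "air", [127] = "ignore"}

-- Stub with the signature from the description:
-- get_node_raw(x, y, z) -> content, param1, param2, pos_ok
function core.get_node_raw(x, y, z)
	if y > 100 then
		return 127, 0, 0, false -- pretend this position is not loaded
	end
	if y < 0 then
		return 0, 0, 0, true
	end
	return 126, 0, 0, true
end

function core.get_name_from_content_id(id)
	return names[id] or "unknown"
end

-- Cache the functions in locals so each call avoids a table lookup.
local get_node_raw = core.get_node_raw
local get_name = core.get_name_from_content_id

-- The node table is built on the Lua side from plain return values.
function core.get_node(pos)
	local content, param1, param2 = get_node_raw(pos.x, pos.y, pos.z)
	return {name = get_name(content), param1 = param1, param2 = param2}
end

-- Same as above, but returns nil when the position is not available.
function core.get_node_or_nil(pos)
	local content, param1, param2, pos_ok = get_node_raw(pos.x, pos.y, pos.z)
	if not pos_ok then
		return nil
	end
	return {name = get_name(content), param1 = param1, param2 = param2}
end

print(core.get_node({x = 0, y = -1, z = 0}).name)
print(core.get_node_or_nil({x = 0, y = 200, z = 0}))
```

The point of this shape is that the C++ side only takes three numbers and returns four plain values, so it never has to call back into Lua to read the position vector or to construct the node table.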
Rough benchmark results:
master: 450-850 ms
PR: around 270-300 ms (it's much more consistent, idk why)
Benchmark function adapted from #14225 by @sfence. (Added you as co-author. :))
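For readers who want a feel for how such a bulk measurement is structured, a minimal timing loop might look like the sketch below. It is not the benchmark from #14225. The `get_node` stub and the cube size are invented for illustration, and `os.clock` reports CPU time, which may differ from whatever timer the real benchmark uses.

```lua
-- Hypothetical stand-in for core.get_node so the loop runs in plain Lua.
local function get_node(pos)
	return {name = "air", param1 = 0, param2 = 0}
end

-- Calls get_node for every position in a size^3 cube and returns the
-- elapsed CPU time in milliseconds plus the number of "air" nodes seen.
local function bench_bulk_get_node(size)
	local pos = {x = 0, y = 0, z = 0} -- reused to avoid allocating per call
	local count = 0
	local t0 = os.clock()
	for x = 1, size do
		for y = 1, size do
			for z = 1, size do
				pos.x, pos.y, pos.z = x, y, z
				local node = get_node(pos)
				if node.name == "air" then
					count = count + 1
				end
			end
		end
	end
	local elapsed_ms = (os.clock() - t0) * 1000
	return elapsed_ms, count
end

local ms, count = bench_bulk_get_node(50)
print(string.format("%d calls in %.2f ms", count, ms))
```

Counting the results keeps the loop body from being trivially dead code, and running the loop several times gives a spread like the ranges quoted above.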
To do
This PR is Ready for Review.
How to test
/bench_bulk_get_node
`get_node` and `get_node_or_nil`?

Here are some instructions I've noted down at some point to disable CPU scaling, for better testing: