
Add bulk_get_node function #14225

Closed
sfence wants to merge 9 commits

Conversation

@sfence (Contributor) commented on Jan 7, 2024

To do

This PR is ready for review.

How to test

Run the /unittests chat command in the devtest game.
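For orientation, here is a minimal usage sketch of what the PR proposes, assuming bulk_get_node takes a list of positions and returns one node table per position in the same order (mirroring the existing core.bulk_set_node); the authoritative signature is in this PR's doc/lua_api.md changes:

-- Existing API: one engine call per position.
local pos_list = {
	vector.new(0, 1, 0),
	vector.new(0, 2, 0),
	vector.new(0, 3, 0),
}
for _, pos in ipairs(pos_list) do
	local node = minetest.get_node(pos) -- {name = ..., param1 = ..., param2 = ...}
end

-- Proposed API (return format assumed): one call for the whole list,
-- returning a list of node tables in input order.
local nodes = minetest.bulk_get_node(pos_list)
for i, node in ipairs(nodes) do
	-- nodes[i] corresponds to pos_list[i]
end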

@rubenwardy (Member) commented:

Please could you include benchmarks, as that is the justification for this feature.

@sfan5 added the "Feature ✨" label (PRs that add or enhance a feature) on Jan 8, 2024
@sfence (Contributor, author) commented on Jan 8, 2024

@rubenwardy
Benchmark added: 665 ms vs 531 ms on my machine for the current implementation.

@sfan5 (Member) left a comment:

Code looks good and CI is green.

@appgurueu (Contributor) left a comment:

The implementation looks reasonable. Just a thought: I wonder whether we could achieve more than a 1.3x speed-up (especially if we take building the input tables into account) if we changed the interface a bit; a rough sketch of these shapes follows the list:

  • Have you considered passing the input positions as three separate lists (tables) of x, y and z coordinates?
  • Alternatively, packed even tighter, we could pass the input positions as a list of hashed node positions.
  • Output could be three lists of node names / param1s / param2s, or maybe even content IDs / param1s / param2s like VoxelManip. This could also be bit-packed into one table, though that might make extraction of the values awkward or inefficient.
  • We could also consider a faster get_node interface which just pushes the three values to the stack rather than building a hash table each time.
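To make these shapes concrete, here is a rough sketch; every function below other than minetest.get_node and minetest.hash_node_position is hypothetical and not part of this PR or the engine API:

-- Input as three separate coordinate lists (hypothetical signature):
local xs = {0, 0, 0}
local ys = {1, 2, 3}
local zs = {0, 0, 0}
-- local names, param1s, param2s = minetest.bulk_get_node_xyz(xs, ys, zs)

-- Input packed as hashed node positions (minetest.hash_node_position exists):
local hashes = {}
for i = 1, #xs do
	hashes[i] = minetest.hash_node_position({x = xs[i], y = ys[i], z = zs[i]})
end
-- local content_ids, param1s, param2s = minetest.bulk_get_node_hashed(hashes)

-- A get_node variant returning three plain values instead of building a
-- table per call (hypothetical):
-- local name, param1, param2 = minetest.get_node_values(pos)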

@sfence (Contributor, author) commented on Jan 10, 2024

@appgurueu
Sounds like an idea for new Lua API functions.

@Zughy added the "Concept approved" label (approved by a core dev: PRs welcomed!) on Jan 15, 2024
@SmallJoker (Member) commented on Jan 17, 2024

(Sorry for the fuss; I deleted my erroneous comment.)

Some rough benchmarks with LuaJIT on x86_64. Left side is in ms, right side in µs.

Size=98³ (ms): loop bulk vmanip   Size=3³ (µs): loop bulk vmanip
  261 268 32     19 12 61
  260 164 33     17 12 156
  240 224 31     19 15 66
  165 144 28     17 12 62
  272 163 31     17 15 62
  165 154 32     19 14 60
  194 236 31     22 13 78
  260 164 29     80 14 88
  198 234 32     17 48 76
  163 142 29     16 38 66
  193 240 30     16 14 95
  264 167 29     16 14 94
  163 171 31     18 14 97
                 
(average) 215 190 31     23 18 82
median 198 167 31     17 14 76

That's roughly a 19% speed-up over regular minetest.get_node calls.
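For readers comparing the columns, a rough sketch of the three access patterns being measured (not the exact benchmark code; pmin, pmax and pos_list are placeholder inputs, and bulk_get_node's signature is assumed from this PR):

-- "loop": one engine call per position.
local function read_loop(pos_list)
	local out = {}
	for i = 1, #pos_list do
		out[i] = minetest.get_node(pos_list[i])
	end
	return out
end

-- "bulk": a single call with the whole position list (this PR).
local function read_bulk(pos_list)
	return minetest.bulk_get_node(pos_list)
end

-- "vmanip": copy the whole cuboid into a flat array of content IDs.
local function read_vmanip(pmin, pmax)
	local vm = minetest.get_voxel_manip(pmin, pmax)
	return vm:get_data() -- indexed via VoxelArea, no per-node table allocation
end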

@appgurueu (Contributor) commented on Jan 18, 2024

Okay, I finally got to benchmarking this. I'm afraid the benchmark seems to be flawed: minetest.bulk_get_node seems to be getting an advantage from running second. You can verify this by swapping the two sections in the benchmark: if minetest.get_node runs second instead, it wins, and by a similar factor. I conjecture that this is probably due to caching effects (possibly at both the software and hardware level):

[Screenshot of benchmark output, 2024-01-18]

(Furthermore, the current code does not store minetest.get_node in a local variable, but this does not seem to have a major impact. The bulk_set_node benchmark suffers from the same flaws, though the order of operations should be less relevant there since there is a proper "warmup" phase.)

The changes I made for benchmarking:

diff --git a/games/devtest/mods/benchmarks/init.lua b/games/devtest/mods/benchmarks/init.lua
index 75d959c3d..cf4e4176e 100644
--- a/games/devtest/mods/benchmarks/init.lua
+++ b/games/devtest/mods/benchmarks/init.lua
@@ -133,15 +133,16 @@ minetest.register_chatcommand("bench_bulk_get_node", {
                minetest.chat_send_player(name, "Benchmarking minetest.bulk_get_node...")
 
                local start_time = minetest.get_us_time()
+               minetest.bulk_get_node(pos_list, {name = "mapgen_stone"})
+               local middle_time = minetest.get_us_time()
+               local get_node = minetest.get_node
                for i=1,#pos_list do
-                       minetest.get_node(pos_list[i])
+                       get_node(pos_list[i])
                end
-               local middle_time = minetest.get_us_time()
-               minetest.bulk_get_node(pos_list, {name = "mapgen_stone"})
                local end_time = minetest.get_us_time()
                local msg = string.format("Benchmark results: minetest.get_node loop: %.2f ms; minetest.bulk_get_node: %.2f ms",
-                       ((middle_time - start_time)) / 1000,
-                       ((end_time - middle_time)) / 1000
+                       ((end_time - middle_time)) / 1000,
+                       ((middle_time - start_time)) / 1000
                )
                return true, msg
        end,

I'm not sure that "bulking" by means of a list of node positions as parameters / nodes as return value is the right way to optimize this. It reduces the Lua↔C context-switch overhead, but that overhead isn't all that large, and we pay for it by having to store the nodes we return in a table. (In the future, a "bulk" API could do further "batching" optimizations, but currently this isn't being done. Hence there is a case to be made for introducing a "bulk" API both for modder convenience and for the prospective future performance improvements it would enable.)

If we want to optimize getting nodes, there are other approaches worth considering (see #14225 (review)).

TL;DR:

  • The current benchmarks are unfortunately flawed. As it stands, I see no reason to assume a significant performance improvement.
  • This PR might still have merit for modder convenience and future performance improvements.

@appgurueu added the "Action / change needed" label (code still needs changes / more information requested) on Jan 18, 2024
@sfence (Contributor, author) commented on Jan 19, 2024

Benchmark updated.
My samples (x86_64, Linux, LuaJIT, /bench_bulk_get_node), times in ms:

get_node bulk_get_node ratio
mean 586.799353901595 450.751030768249 1.30182587248102
average 590.270625 459.245625 1.28530484095521
584.5 506.81
680.22 490.4
642.22 374.14
509.17 542.02
551.68 508.02
557.92 579.47
507.39 532.36
649.84 379.67
615.15 370.21
492.51 491.43
620.45 369.11
499.58 674.21
707.78 378.6
603.2 376.1
597.74 377.59
624.98 397.79

@sfence (Contributor, author) commented on Jan 19, 2024

Storing get_node in a local variable does not appear to have a significant effect on my machine (times in ms):

get_node bulk_get_node ratio
mean 547.849938348686 425.677725965754 1.28700635464483
average 553.707777777778 433.313333333333 1.27784615700212
638.78 428.49
709.79 416.42
470.31 644.37
592.57 374.84
586.63 346.03
469 478.32
479.28 345.91
473.51 397.21
563.5 468.23

@Desour (Member) commented on Jan 19, 2024

FYI, most of the overhead of get_node and bulk_get_node comes from reading vectors and pushing nodes, because that involves C++ -> Lua calls. If you do something like this, for example, you get something faster than bulk_get_node: Desour@c40ccd2
#12438 should be adopted.

bulk_get_node might still be OK to have though, API-wise.

@Zughy removed the "Action / change needed" label on Jan 21, 2024
@sfan5 (Member) commented on Feb 16, 2024

So are we doing this or not?

@appgurueu (Contributor) commented:

I think we can add this as a convenience feature (which might lend itself better to future optimization work than get_node), but we should probably remove the performance hint / advice from the docs for now.

@grorp (Member) commented on Feb 18, 2024

If we add this, I think we should go a step further: we should explicitly state that there is no performance benefit, to keep modders from doing premature optimisation.

@sfence (Contributor, author) commented on Feb 23, 2024

Removed the performance hint from the docs.

@Desour (Member) commented on Mar 3, 2024

Rebase needed.

Also, with #14384 merged, this feature is now somewhat obsolete. @sfence Thoughts?

@Desour added the "Rebase needed" and "Action / change needed" labels on Mar 3, 2024
@rubenwardy (Member) commented:

I don't think this offers a meaningful usability improvement, so without the performance improvement I'd say this is not needed.

@Zughy (Member) commented on Mar 3, 2024

@sfence do you still see a reason for this PR or a way to make it faster? See today's IRC log https://irc.minetest.net/minetest-dev/2024-03-03#i_6156853

@sfence (Contributor, author) commented on Mar 3, 2024

I created this PR to fix #14110. If it is no longer of interest, please close #14110 as "won't add".

This PR has performance benefits compared with core.get_node in most cases, as the benchmark shows. The problem is that the performance benefit appears to differ between machines.

@sfence (Contributor, author) commented on Mar 3, 2024

From an API point of view, it would be nice to have a complementary function to core.bulk_set_node.

But overall, the performance benefits of #14384 are much better.

@sfan5 (Member) commented on Mar 15, 2024

Considering this as rejected, then.

@sfan5 closed this on Mar 15, 2024
Labels: Action / change needed · Concept approved · Feature ✨ · One approval ✅ ◻️ · Possible close · Rebase needed · @ Script API

Linked issue (may be closed by this PR): lua api: Function to get adjacent nodes.

10 participants