
Add bulk_get_node function #14225

Closed
sfence wants to merge 9 commits

Conversation

@sfence (Contributor) commented on Jan 7, 2024

To do

This PR is ready for review.

How to test

Run the /unittests chat command in the devtest game.
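For orientation, here is a minimal usage sketch of what the PR proposes, assuming bulk_get_node takes a list of positions and returns one node table per position in the same order (mirroring the existing core.bulk_set_node); the authoritative signature is in this PR's doc/lua_api.md changes:

-- Existing API: one engine call per position.
local pos_list = {
	vector.new(0, 1, 0),
	vector.new(0, 2, 0),
	vector.new(0, 3, 0),
}
for _, pos in ipairs(pos_list) do
	local node = minetest.get_node(pos) -- {name = ..., param1 = ..., param2 = ...}
end

-- Proposed API (return format assumed): one call for the whole list,
-- returning a list of node tables in input order.
local nodes = minetest.bulk_get_node(pos_list)
for i, node in ipairs(nodes) do
	-- nodes[i] corresponds to pos_list[i]
end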

@rubenwardy (Member) commented:

Please could you include benchmarks, as that is the justification for this feature.

@sfan5 added the "Feature ✨" label (PRs that add or enhance a feature) on Jan 8, 2024
@sfence (Contributor, author) commented on Jan 8, 2024

@rubenwardy
Benchmark added: 665 ms vs 531 ms on my machine for the current implementation.

@sfan5 (Member) left a comment:

Code looks good and CI is green.

@appgurueu (Contributor) left a comment:

The implementation looks reasonable. Just a thought: I wonder whether we could achieve more than a 1.3x speed-up (especially if we take building the input tables into account) if we changed the interface a bit; a rough sketch of these shapes follows the list:

  • Have you considered passing the input positions as three separate lists (tables) of x, y and z coordinates?
  • Alternatively, packed even tighter, we could pass the input positions as a list of hashed node positions.
  • Output could be three lists of node names / param1s / param2s, or maybe even content IDs / param1s / param2s like VoxelManip. This could also be bit-packed into one table, though that might make extraction of the values awkward or inefficient.
  • We could also consider a faster get_node interface which just pushes the three values to the stack rather than building a hash table each time.
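To make these shapes concrete, here is a rough sketch; every function below other than minetest.get_node and minetest.hash_node_position is hypothetical and not part of this PR or the engine API:

-- Input as three separate coordinate lists (hypothetical signature):
local xs = {0, 0, 0}
local ys = {1, 2, 3}
local zs = {0, 0, 0}
-- local names, param1s, param2s = minetest.bulk_get_node_xyz(xs, ys, zs)

-- Input packed as hashed node positions (minetest.hash_node_position exists):
local hashes = {}
for i = 1, #xs do
	hashes[i] = minetest.hash_node_position({x = xs[i], y = ys[i], z = zs[i]})
end
-- local content_ids, param1s, param2s = minetest.bulk_get_node_hashed(hashes)

-- A get_node variant returning three plain values instead of building a
-- table per call (hypothetical):
-- local name, param1, param2 = minetest.get_node_values(pos)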

@sfence (Contributor, author) commented on Jan 10, 2024

@appgurueu
Sounds like an idea for new Lua API functions.

@Zughy added the "Concept approved" label (approved by a core dev: PRs welcomed!) on Jan 15, 2024
@SmallJoker (Member) commented on Jan 17, 2024

(Sorry for the fuss; I deleted my erroneous comment.)

Some rough benchmarks with LuaJIT on x86_64. Left side is in ms, right side in µs.

Size=98³ (ms): loop bulk vmanip   Size=3³ (µs): loop bulk vmanip
  261 268 32     19 12 61
  260 164 33     17 12 156
  240 224 31     19 15 66
  165 144 28     17 12 62
  272 163 31     17 15 62
  165 154 32     19 14 60
  194 236 31     22 13 78
  260 164 29     80 14 88
  198 234 32     17 48 76
  163 142 29     16 38 66
  193 240 30     16 14 95
  264 167 29     16 14 94
  163 171 31     18 14 97
                 
(average) 215 190 31     23 18 82
median 198 167 31     17 14 76

That's roughly a 19% speed-up over regular minetest.get_node calls.
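For readers comparing the columns, a rough sketch of the three access patterns being measured (not the exact benchmark code; pmin, pmax and pos_list are placeholder inputs, and bulk_get_node's signature is assumed from this PR):

-- "loop": one engine call per position.
local function read_loop(pos_list)
	local out = {}
	for i = 1, #pos_list do
		out[i] = minetest.get_node(pos_list[i])
	end
	return out
end

-- "bulk": a single call with the whole position list (this PR).
local function read_bulk(pos_list)
	return minetest.bulk_get_node(pos_list)
end

-- "vmanip": copy the whole cuboid into a flat array of content IDs.
local function read_vmanip(pmin, pmax)
	local vm = minetest.get_voxel_manip(pmin, pmax)
	return vm:get_data() -- indexed via VoxelArea, no per-node table allocation
end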

@appgurueu (Contributor) commented on Jan 18, 2024

Okay, I finally got to benchmarking this. I'm afraid the benchmark seems to be flawed: minetest.bulk_get_node seems to be getting an advantage from running second. You can verify this by swapping the two sections in the benchmark: if minetest.get_node runs second instead, it wins, and by a similar factor. I conjecture that this is probably due to caching effects (possibly at both the software and hardware level):

[Screenshot of benchmark output, 2024-01-18]

(Furthermore, the current code does not store minetest.get_node in a local variable, but this does not seem to have a major impact. The bulk_set_node benchmark suffers from the same flaws, though the order of operations should be less relevant there since there is a proper "warmup" phase.)

The changes I made for benchmarking:

diff --git a/games/devtest/mods/benchmarks/init.lua b/games/devtest/mods/benchmarks/init.lua
index 75d959c3d..cf4e4176e 100644
--- a/games/devtest/mods/benchmarks/init.lua
+++ b/games/devtest/mods/benchmarks/init.lua
@@ -133,15 +133,16 @@ minetest.register_chatcommand("bench_bulk_get_node", {
                minetest.chat_send_player(name, "Benchmarking minetest.bulk_get_node...")
 
                local start_time = minetest.get_us_time()
+               minetest.bulk_get_node(pos_list, {name = "mapgen_stone"})
+               local middle_time = minetest.get_us_time()
+               local get_node = minetest.get_node
                for i=1,#pos_list do
-                       minetest.get_node(pos_list[i])
+                       get_node(pos_list[i])
                end
-               local middle_time = minetest.get_us_time()
-               minetest.bulk_get_node(pos_list, {name = "mapgen_stone"})
                local end_time = minetest.get_us_time()
                local msg = string.format("Benchmark results: minetest.get_node loop: %.2f ms; minetest.bulk_get_node: %.2f ms",
-                       ((middle_time - start_time)) / 1000,
-                       ((end_time - middle_time)) / 1000
+                       ((end_time - middle_time)) / 1000,
+                       ((middle_time - start_time)) / 1000
                )
                return true, msg
        end,

I'm not sure that "bulking" by means of a list of node positions as parameters / nodes as return value is the right way to optimize this. It reduces the Lua↔C context-switch overhead, but that overhead isn't all that large, and we pay for it by having to store the nodes we return in a table. (In the future, a "bulk" API could do further "batching" optimizations, but currently this isn't being done. Hence there is a case to be made for introducing a "bulk" API both for modder convenience and for the prospective future performance improvements it would enable.)

If we want to optimize getting nodes, there are other approaches worth considering (see #14225 (review)).

TL;DR:

  • The current benchmarks are unfortunately flawed. As it stands, I see no reason to assume a significant performance improvement.
  • This PR might still have merit for modder convenience and future performance improvements.

@appgurueu added the "Action / change needed" label (code still needs changes / more information requested) on Jan 18, 2024
@sfence (Contributor, author) commented on Jan 19, 2024

Benchmark updated.
My samples (x86_64, Linux, LuaJIT, /bench_bulk_get_node), times in ms:

get_node bulk_get_node ratio
mean 586.799353901595 450.751030768249 1.30182587248102
average 590.270625 459.245625 1.28530484095521
584.5 506.81
680.22 490.4
642.22 374.14
509.17 542.02
551.68 508.02
557.92 579.47
507.39 532.36
649.84 379.67
615.15 370.21
492.51 491.43
620.45 369.11
499.58 674.21
707.78 378.6
603.2 376.1
597.74 377.59
624.98 397.79

@sfence (Contributor, author) commented on Jan 19, 2024

Storing get_node in a local variable does not appear to have a significant effect on my machine (times in ms):

get_node bulk_get_node ratio
mean 547.849938348686 425.677725965754 1.28700635464483
average 553.707777777778 433.313333333333 1.27784615700212
638.78 428.49
709.79 416.42
470.31 644.37
592.57 374.84
586.63 346.03
469 478.32
479.28 345.91
473.51 397.21
563.5 468.23

@Desour (Member) commented on Jan 19, 2024

FYI, most of the overhead of get_node and bulk_get_node comes from reading vectors and pushing nodes, because that involves C++ -> Lua calls. If you do something like this, for example, you get something faster than bulk_get_node: Desour@c40ccd2
#12438 should be adopted.

bulk_get_node might still be OK to have though, API-wise.

@Zughy removed the "Action / change needed" label on Jan 21, 2024
@sfan5 (Member) commented on Feb 16, 2024

So are we doing this or not?

@appgurueu (Contributor) commented:

I think we can add this as a convenience feature (which might lend itself better to future optimization work than get_node), but we should probably remove the performance hint / advice from the docs for now.

@grorp (Member) commented on Feb 18, 2024

If we add this, I think we should go a step further: we should explicitly state that there is no performance benefit, to keep modders from doing premature optimisation.

@sfence (Contributor, author) commented on Feb 23, 2024

Removed the performance hint from the docs.

@Desour (Member) commented on Mar 3, 2024

Rebase needed.

Also, with #14384 merged, this feature is now somewhat obsolete. @sfence Thoughts?

@Desour added the "Rebase needed" and "Action / change needed" labels on Mar 3, 2024
@rubenwardy (Member) commented:

I don't think this offers a meaningful usability improvement, so without the performance improvement I'd say this is not needed.

@Zughy (Member) commented on Mar 3, 2024

@sfence do you still see a reason for this PR or a way to make it faster? See today's IRC log https://irc.minetest.net/minetest-dev/2024-03-03#i_6156853

@sfence (Contributor, author) commented on Mar 3, 2024

I created this PR to fix #14110. If it is no longer of interest, please close #14110 as "won't add".

This PR has performance benefits compared with core.get_node in most cases, as the benchmark shows. The problem is that the performance benefit appears to differ between machines.

@sfence (Contributor, author) commented on Mar 3, 2024

From an API point of view, it would be nice to have a complementary function to core.bulk_set_node.

But overall, the performance benefits of #14384 are much better.

@sfan5 (Member) commented on Mar 15, 2024

Considering this as rejected, then.

@sfan5 closed this on Mar 15, 2024
Labels: Action / change needed · Concept approved · Feature ✨ · One approval ✅ ◻️ · Possible close · Rebase needed · @ Script API

Linked issue (may be closed by this PR): lua api: Function to get adjacent nodes.

10 participants