Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend table allocation options (e.g. table fill & resize) #172

Closed
Hexcede opened this issue Nov 7, 2021 · 4 comments
Closed

Extend table allocation options (e.g. table fill & resize) #172

Hexcede opened this issue Nov 7, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@Hexcede
Copy link

Hexcede commented Nov 7, 2021

Currently, luau implements two very nice functions for table allocation: table.move & table.create
However, luau doesn't offer much support at all for doing our own reallocations of tables, such as filling and shrinking/growing (resizing). table.move and table.create both are useful for allocating new data, and rearranging existing data, however, these still limit the prior two use cases with overhead.

All of these do one-time allocations, and perform bulk table changes, which is important because it allows for reducing the computational cost of a lot of code in a very general way. (Which especially applies to porting many lower level algorithms or assembly langs into luau, but also applies directly to DOD code which benefits the most)

I also got a little carried away with my nerding, decided to embrace it and pretty much wrote a whole entire RFC (a wack one) for this, even though it's probably really excessive. I will probably make a PR since the implementation of the two proposed functions should be quick. Not sure though.

Pretty much an RFC??

Proposed APIs

  • table.resize(table, newSize) - Resizes the input array to the input size
    Possibly just a wrapper to luaH_resizearray with extra sanity and validation?
  • table.fill(table, indexA, indexB, value) - Writes value from indexA, to indexB into the array, resizing it if necessary.

Considered alternatives

I considered table.move for trying to perform resizes, but it will only resize a table to allow for all of the elements being moved to fit (plus it's not particularly elegant). This does not allow us to grow tables, or shrink them (which, in comparison would be a bit ambiguous and could be reasonable to address separate to growing). It also does not allow us to do bulk writes to existing tables without allocation overhead (fill).

While table.create has the desired fill behaviour for tables and could be combined with table.move to get a decently close alternative, it allocates a brand new table, which is probably not something we would really like when we might want to be dynamically resizing a table for DOD, say, a big list of debris data. We basically double the amount of writes, and allocate a new, potentially large table.

Justification of incompleteness

The implementation of both of these behaviours would make the implementations of both other functions possible to write direct pure luau ports of. For example, table.create & table.move both essentially can be thought of as utilizing both underlying features.

This might make the two seem redundant, but I would argue that because all four of these behaviours exist in other places as their own instructions (e.g. WASM) as well as the justification that the reduced overhead is desirable for where any of these functions would be desired to be used.

Because both the table.move & table.create functions don't have direct pure lua ports in luau without requiring excessive allocations, I would argue that this justifies incompleteness, considering that the use cases of the proposed functions are general and used in luau as general utilities, which are non specific.

Example in WASM

There are two cases I found being used in a lot of lower level algorithms which also apply generally, table filling, and table shrinking/growing (resizing), for example, WebAssembly has a comparable table system to lua. This system actually has two instructions that luau has, table.move is essentially a 1:1 implementation of WASM's table.copy instruction, and table.create is essentially a 1:1 implementation of WASM's table.init instruction

WASM offers both the ability to fill, and the ability to (limitedly) resize tables.

@Hexcede Hexcede added the enhancement New feature or request label Nov 7, 2021
@zeux
Copy link
Collaborator

zeux commented Nov 10, 2021

WASM is a bit different because there you can't grow the table by simply writing to it.

These aren't difficult to implement, but I'd like to hear the motivation from the ergonomics or performance perspective. For example, when we were adding table.create, the discussion was essentially:

  • it helps performance because any time there's a loop that table.insert's values into a fresh table and you can estimate the capacity, it significantly cuts down on reallocation cost; this is very common!
  • it helps ergonomics because there's common cases in algorithms when you initialize a table with zeroes; this is not as common as 1
  • while ergonomics can be solved with a wrapper function in user code, performance can not be.

table.move is an interesting example because we added this as a Lua 5.3 port to align with the upstream library, so there wasn't a specific motivation in performance. Subsequently we haven't seen this being used as often as I'd expect, and we even had some period of time while an optimal loop that copied values was actually faster than table.move :)

So for either of these to be added I think it's important to know what type of code would leverage these and how, because it'll make the case for including or not including these easier. table.resize in particular is complicated because it's effectively a no-op semantically if the length is larger than internal array size and if the length is smaller it's more difficult to implement (does table.resize(0) wipe out all numeric keys larger than 0?), which means we should make sure that it solves a need that commonly occurs in practice.

@Hexcede
Copy link
Author

Hexcede commented Nov 10, 2021

WASM is a bit different because there you can't grow the table by simply writing to it.

These aren't difficult to implement, but I'd like to hear the motivation from the ergonomics or performance perspective. For example, when we were adding table.create, the discussion was essentially:

  • it helps performance because any time there's a loop that table.insert's values into a fresh table and you can estimate the capacity, it significantly cuts down on reallocation cost; this is very common!
  • it helps ergonomics because there's common cases in algorithms when you initialize a table with zeroes; this is not as common as 1
  • while ergonomics can be solved with a wrapper function in user code, performance can not be.

table.move is an interesting example because we added this as a Lua 5.3 port to align with the upstream library, so there wasn't a specific motivation in performance. Subsequently we haven't seen this being used as often as I'd expect, and we even had some period of time while an optimal loop that copied values was actually faster than table.move :)

So for either of these to be added I think it's important to know what type of code would leverage these and how, because it'll make the case for including or not including these easier. table.resize in particular is complicated because it's effectively a no-op semantically if the length is larger than internal array size and if the length is smaller it's more difficult to implement (does table.resize(0) wipe out all numeric keys larger than 0?), which means we should make sure that it solves a need that commonly occurs in practice.

This is all makes sense to me. I'll try and do a mock of your bullets for both functions as well as giving some specific generalized use cases...
Overall, these functions would benefit data oriented design the most for sure.

For table fill:
Key property of its use cases: Deterministic bulk table changes
Key scenarios it benefits: Large quantities of small, repetitive & high frequency writes. Small quantities of large, repetitive & low frequency writes. (Comparable to table.concat vs string.repeat)
Common cases: Primarily data & image processing, assembly/bytecode emulation.

  • It helps ergonomics because there are common cases in algorithms when you may need to clear, reset, or update data in quantity. Flood algorithms, compression algorithms, flow grids, and memory emulation would benefit from this.
  • It also helps ergonomics & readability by condensing repetitive writes into an easy to understand form.
  • It slightly benefits performance for repetitive, small writes by reducing the number of variable allocations & operations (which are likely to be done in bulk where it best applies). This is commonly applicable in data processing (especially in compression, as repetition is effectively the whole purpose) where its expected there will be a large quantity of small writes.
  • Low overhead

For table resize:
Key property of its use cases: Non deterministic bulk table changes
Key scenarios it benefits: Low frequency writes. Small quantities of large, unique writes. Predictable frequencies, quantities & phases of writes.
Common cases: Highly dynamic environments. (Networking, dynamic/interactive games, etc)

  • It helps performance when adding lots of new data to existing tables. This follows table.create loosely, benefiting large quantities over small ones, but applies more to dynamic environments where size is likely to change very frequently, whereas table.create applies to static ones where size is expected to be mostly static. This is common in any sort of code involving dynamic bodies where many items will be inserted and removed at once.
  • This applies to scenarios where there are very predictable events where things are likely to be deleted or created all at once.
  • For example, node based pathfinding algorithms such as A* needs to manage a large node array, and different areas of a game map might be activated or deactivated based on the states of different doors in the world.
  • High overhead if used incorrectly

@zeux
Copy link
Collaborator

zeux commented Aug 15, 2022

Coming back to this, it would be interesting to see examples of code where either resize or fill would be useful. Ideally we'd look at some algorithm implementation that we can then benchmark with or without first-class functions like this to determine how important these would be. Overall I'm hesitant to expose resize as it raises too many questions about the behavior in edge cases, but fill would be a more straightforward addition because it can be formulated in terms of simple table assignment - the key question being, what's an example of an algorithm where it improves performance.

@zeux
Copy link
Collaborator

zeux commented Aug 18, 2022

I'm going to close this for now, feel free to suggest examples of code where filling a table with the same value is important wrt performance and then we can reevaluate.

@zeux zeux closed this as not planned Won't fix, can't repro, duplicate, stale Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

No branches or pull requests

2 participants