
Suggestion for even higher performance gains #4

Open
cipharius opened this issue Feb 28, 2024 · 3 comments

Comments

@cipharius

Looking at your current encode/decode logic, the system seems neat and could fairly easily be adjusted to drop the need for a dynamically growing buffer object by allocating a buffer of exactly the right size.

I think this could be achieved by adding a new method called size to the dataTypeInterface<T>, which would compute and return the serialized size of the value it represents.

Not only would this greatly speed up your encoding/decoding process, it would also simplify the code by removing the explicit memory management via bufferWriter.alloc that currently scatters dynamic allocations across the codebase.
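A minimal sketch of the idea in Luau, assuming a hypothetical dataTypeInterface shape with size and write functions (the names u32, str, encode, and the interface layout are illustrative assumptions, not the library's actual API):

```lua
--!strict
-- Hypothetical sketch: each data type reports its serialized size, so the
-- encoder can sum sizes and allocate one exact-size buffer up front.

local u32 = {
	size = function(_value: number): number
		return 4
	end,
	write = function(b: buffer, offset: number, value: number): number
		buffer.writeu32(b, offset, value)
		return offset + 4
	end,
}

local str = {
	size = function(value: string): number
		return 4 + #value -- 4-byte length prefix plus the string bytes
	end,
	write = function(b: buffer, offset: number, value: string): number
		buffer.writeu32(b, offset, #value)
		buffer.writestring(b, offset + 4, value)
		return offset + 4 + #value
	end,
}

-- Encode a packet: first pass sums sizes, second pass writes into a
-- single buffer, so no reallocation ever happens mid-encode.
local function encode(types, values): buffer
	local total = 0
	for i, t in types do
		total += t.size(values[i])
	end

	local b = buffer.create(total) -- one allocation of exactly the right size
	local offset = 0
	for i, t in types do
		offset = t.write(b, offset, values[i])
	end
	return b
end
```

The key trade-off is one extra pass over the values to compute sizes, in exchange for never paying for a grow-and-copy of the output buffer.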

The same idea could also be applied to the server and client queueing logic. Instead of appending each new packet's buffer to a pending buffer, keep all the pending buffers in a table, which becomes your queue. Then, right before sending, compute the total packet size, allocate one large buffer, and copy each queued buffer's contents into it.
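The queueing variant could look roughly like this in Luau (a sketch under the same assumptions; queue and flush are hypothetical names, and the real replication layer would carry more state):

```lua
--!strict
-- Sketch of the proposed queueing change: pending packets stay as
-- separate buffers in a table, and are merged into one buffer at send time.

local pending: { buffer } = {}

local function queue(packet: buffer)
	table.insert(pending, packet)
end

local function flush(): buffer
	-- First pass: compute the total size of all queued packets.
	local total = 0
	for _, packet in pending do
		total += buffer.len(packet)
	end

	-- Single allocation, then copy each packet in sequence.
	local merged = buffer.create(total)
	local offset = 0
	for _, packet in pending do
		buffer.copy(merged, offset, packet)
		offset += buffer.len(packet)
	end

	table.clear(pending)
	return merged
end
```

Each queue call does one table insert instead of a buffer grow-and-copy, and flush pays exactly one allocation plus one copy per packet.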

I use a similar method in my Luau msgpack implementation, and the fact that it can outperform the native JSON encode/decode methods shows that one can now actually reason about memory allocation patterns in Luau and do a better job than native code that most likely neglected these concerns.

@ffrostfall
Owner

This method is faster and performs better without native code generation; with native code generation enabled, however, buffers outperform tables and functions.

I used to use an approach like this, and it works well w/o codegen enabled. But the second you enable codegen, things change.

I'll likely be using this approach on the client, and the resizing approach on the server.

@cipharius
Author

I don't see how native codegen could change the memory access and allocation patterns. I am not suggesting using a table for the data itself, but a single buffer that is allocated once instead of being reallocated multiple times.

Memory reallocation has significant performance implications even in compiled code, whether C, C++, or Rust.

@ffrostfall
Owner

Because using purely buffers is faster than using tables alongside buffers. That's all.

Tables also need to allocate their own memory. I'll leave this open because the point is accurate: this is the better method when you account for the overhead of creating buffers, but in native codegen there is no such overhead.
