@JukkaL JukkaL (Collaborator) commented Oct 31, 2025

A long integer (one that doesn't fit in the 4-byte encoding) will now be encoded like this:

  • initial header byte
  • short integer (1-4 bytes) encoding the number of data bytes and the sign
  • variable-length sequence of data bytes holding the absolute value of the integer (all bits are used)

For example, a 32-bit integer can now always be encoded using at most 6 bytes (+ type tag).
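To make the layout concrete, here is a minimal Python sketch of the three-part structure described above. It is not the code from this PR: the header value `LONG_INT_HEADER`, the way the byte count and sign are packed together (`nbytes << 1 | sign`), the little-endian byte order, and the base-128 varint used as a stand-in for the short-integer encoding are all assumptions chosen for illustration.

```python
# Hypothetical sketch of a header + (size, sign) + magnitude layout.
# None of the constants or packing choices below are taken from the PR.

LONG_INT_HEADER = 0xFF  # assumed marker byte for "long integer follows"


def encode_short(n: int) -> bytes:
    """Stand-in for the short-integer encoding: a base-128 varint."""
    if n < 0:
        raise ValueError("stand-in short encoding is non-negative only")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_short(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at data[pos:]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return result, pos


def encode_long(value: int) -> bytes:
    magnitude = abs(value)
    # Use every bit of every data byte: just enough bytes for the magnitude.
    nbytes = max(1, (magnitude.bit_length() + 7) // 8)
    # Assumed packing: byte count in the high bits, sign in the lowest bit.
    size_and_sign = (nbytes << 1) | (1 if value < 0 else 0)
    return (
        bytes([LONG_INT_HEADER])
        + encode_short(size_and_sign)
        + magnitude.to_bytes(nbytes, "little")
    )


def decode_long(data: bytes) -> int:
    if data[0] != LONG_INT_HEADER:
        raise ValueError("not a long-integer encoding")
    size_and_sign, pos = decode_short(data, 1)
    nbytes = size_and_sign >> 1
    magnitude = int.from_bytes(data[pos:pos + nbytes], "little")
    return -magnitude if size_and_sign & 1 else magnitude
```

Under these assumptions, a value whose magnitude needs 4 data bytes takes 1 header byte, 1 size-and-sign byte, and 4 data bytes, which matches the 6-byte bound mentioned above (the type tag is not modeled here).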

This is optimized for size efficiency, not performance, since large integers are not expected to be a performance bottleneck. Still, having a size-efficient format makes it easier to improve performance in the future without changing the encoding.

The header byte has a few unused bits which could be used to slightly improve efficiency, but I decided that it's not worth the extra complexity.

@JukkaL JukkaL requested a review from ilevkivskyi October 31, 2025 14:54
@ilevkivskyi ilevkivskyi (Member) left a comment

LG, thanks!

@JukkaL JukkaL merged commit 7213139 into master Oct 31, 2025
13 checks passed
@JukkaL JukkaL deleted the serialize-long-int branch October 31, 2025 16:01