[Breaking change] Support 64-bit length, add various types and typed containers #311
This is a proposal for a number of modifications to the current spec. Part 1 breaks backward compatibility (specifically, it deprecates the old `fixext` formats), but I think it's well worth it: it will drastically improve space and time efficiency in certain applications. The main modifications are as follows.

## Top-level types (Part 1)
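### Add variable length `fixext`

`fixext 1`, `fixext 2`, `fixext 4`, `fixext 8` and `fixext 16` are unified into a more compact format, as proposed in #310. Benefits include a smaller encoding for small ext data and 4 freed type codes. For example, to store 3 bytes of ext data we currently need either `0xc7` + `0x03` + type byte + 3-byte payload, or `fixext 4` (`0xd6`) + type byte + 4-byte padded payload; both take 6 bytes. With the proposed format it only needs 5 bytes (about 16.7% less).

### Add complex numbers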
The two most common complex types are added: 64-bit (a pair of `float 32` values) and 128-bit (a pair of `float 64` values).

Complex numbers are the only primitive type missing from the msgpack format. They are natively supported by most general-purpose and scientific programming languages. Adding them as top-level types will help serialize scientific data together with the typed containers proposed below.
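A minimal Python sketch of what the two complex payloads could look like, assuming the real part comes first and both parts are stored as big-endian IEEE 754 floats, like msgpack's existing `float 32` and `float 64`. The function names are hypothetical, and the header and ext type byte are left out because no specific codes are assumed here.

```python
import struct

# Hypothetical helpers. Assumed payload layout: real part first, then the
# imaginary part, both big-endian IEEE 754 (not taken from the proposal).


def pack_complex64(z: complex) -> bytes:
    """64-bit complex: two binary32 values.

    Parts are rounded to float32; struct raises OverflowError if a part is
    outside the float32 range.
    """
    return struct.pack(">ff", z.real, z.imag)


def unpack_complex64(payload: bytes) -> complex:
    real, imag = struct.unpack(">ff", payload)
    return complex(real, imag)


def pack_complex128(z: complex) -> bytes:
    """128-bit complex: two binary64 values (lossless for Python's complex)."""
    return struct.pack(">dd", z.real, z.imag)


def unpack_complex128(payload: bytes) -> complex:
    real, imag = struct.unpack(">dd", payload)
    return complex(real, imag)
```

The resulting 8- and 16-byte payloads are the same sizes as `fixext 8` and `fixext 16` payloads, so each value carries only the container's small header on top of its raw float data.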
### Add `bin 64` and `ext 64`

This feature has been requested by many people (#214, #190, #268). 64-bit lengths are added to `bin` and `ext`, which should cover most needs.

In modern computers, RAM is usually larger than 4 GB (and can reach terabytes in data centers), so loading all data into memory is very common. Chunking the data is inconvenient and costs performance when large data is stored. Moreover, the msgpack spec currently says nothing about how to chunk data. With the 4 type codes freed by the variable length `fixext`, 64-bit lengths can easily be added to the specification.

In my opinion, msgpack is very simple and clean, and it can also be used to store large data, serving more needs than network communication alone.
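To make the header layouts concrete, here is a minimal Python sketch that picks the smallest `bin` or `ext` header for a payload length. The `bin 8/16/32`, `ext 8/16/32` and `fixext` codes are those of the current msgpack spec. The 64-bit variants are assumed to follow the 32-bit layout with an 8-byte big-endian length, and their codes are passed in as hypothetical parameters (`bin64_code`, `ext64_code`) because no specific codes are assumed here. For example, a 3-byte ext payload gets the 3-byte `ext 8` header, 6 bytes in total.

```python
import struct

# fixext codes from the current msgpack spec (payload size -> code).
FIXEXT_CODES = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}


def bin_header(length: int, bin64_code: int) -> bytes:
    """Smallest bin header for `length` bytes.

    bin64_code is a HYPOTHETICAL placeholder for the proposed bin 64 code.
    """
    if length < 2**8:
        return b"\xc4" + struct.pack(">B", length)              # bin 8
    if length < 2**16:
        return b"\xc5" + struct.pack(">H", length)              # bin 16
    if length < 2**32:
        return b"\xc6" + struct.pack(">I", length)              # bin 32
    if length < 2**64:
        return bytes([bin64_code]) + struct.pack(">Q", length)  # assumed bin 64
    raise ValueError("length does not fit in 64 bits")


def ext_header(length: int, ext_type: int, ext64_code: int) -> bytes:
    """Smallest ext header for `length` payload bytes under the current spec,
    extended with an assumed ext 64.

    ext64_code is a HYPOTHETICAL placeholder for the proposed ext 64 code.
    """
    type_byte = struct.pack(">b", ext_type)  # signed 8-bit; struct.error if out of range
    if length in FIXEXT_CODES:
        return bytes([FIXEXT_CODES[length]]) + type_byte            # fixext N
    if length < 2**8:
        return b"\xc7" + struct.pack(">B", length) + type_byte      # ext 8
    if length < 2**16:
        return b"\xc8" + struct.pack(">H", length) + type_byte      # ext 16
    if length < 2**32:
        return b"\xc9" + struct.pack(">I", length) + type_byte      # ext 32
    if length < 2**64:
        return bytes([ext64_code]) + struct.pack(">Q", length) + type_byte  # assumed ext 64
    raise ValueError("length does not fit in 64 bits")
```

Under these assumptions, `bin 64` costs 9 header bytes and `ext 64` costs 10, versus 5 and 6 for the 32-bit forms, which is negligible next to payloads larger than 4 GiB.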
## More ext types (Part 2)
### Add bigint and bigfloat

This proposal is adapted from #249 and fixes #206 and #292. Only arbitrary-precision integers and floating-point numbers are added; in my opinion, large decimal and fraction types are rarely needed.
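A minimal Python sketch of an arbitrary-precision integer payload, assuming a two's-complement big-endian layout whose length comes from the ext header. This layout is an assumption and may differ from the one in #249; bigfloat is not sketched. The same helper with a fixed 16-byte width yields an `int 128` payload, and `struct`'s half-precision `'e'` format gives a `float 16` payload (the Python standard library has no `float 128`). All function names are hypothetical.

```python
import struct
from typing import Optional


def encode_bigint(n: int, width: Optional[int] = None) -> bytes:
    """Two's-complement, big-endian integer payload (assumed layout).

    width=None picks the shortest encoding; width=16 gives an int 128
    payload. Raises OverflowError if n does not fit in `width` bytes.
    """
    if width is None:
        # For negatives, ~n == -n - 1 (so -1 -> 0 and -128 -> 127), which makes
        # bit_length() count magnitude bits; one sign bit is added, rounded up
        # to whole bytes.
        magnitude = n if n >= 0 else ~n
        width = magnitude.bit_length() // 8 + 1
    return n.to_bytes(width, "big", signed=True)


def decode_bigint(payload: bytes) -> int:
    return int.from_bytes(payload, "big", signed=True)


def encode_float16(x: float) -> bytes:
    """IEEE 754 binary16, big-endian (assumed byte order); OverflowError if too large."""
    return struct.pack(">e", x)
```

With the 2 extra bytes per value described for the variable length `fixext`, an `int 128` would occupy 18 bytes and a `float 16` would occupy 4.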
`int 128`, `float 16` and `float 128` are also proposed using this format; thanks to the variable length `fixext`, each needs only 2 extra bytes.

### Add UUID
UUIDs are widely used nowadays, and in my opinion officially supporting them by assigning an extension type is a good idea. This will fix #222 and #239.
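Because a UUID is exactly 16 bytes, its ext payload can simply be the RFC 4122 byte order that Python's `uuid` module exposes as `UUID.bytes`. A minimal sketch, assuming that byte order; the function names are hypothetical and the ext type code is left out.

```python
import uuid


def uuid_to_payload(u: uuid.UUID) -> bytes:
    # UUID.bytes is the 16-byte big-endian form defined by RFC 4122.
    return u.bytes


def uuid_from_payload(payload: bytes) -> uuid.UUID:
    if len(payload) != 16:
        raise ValueError("a UUID payload must be exactly 16 bytes")
    return uuid.UUID(bytes=payload)
```

Such a payload already fits `fixext 16` under the current spec (18 bytes in total), so assigning a type code is about interoperability rather than size.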
With UUID, bigint and bigfloat supported, there are 4 ext types left within the `fixext` capacity, which can be used in the future.

### Add typed containers
Motivated by #267 and #268, I added support for typed containers, specifically typed arrays, typed maps and typed n-d arrays. Typed containers remove the overhead of a type byte per element and allow zero-copy access. The "structured array" proposed in #267 is not added, since it is a lot more complicated for parsers to implement than the formats proposed in this PR.
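To put the per-element overhead in numbers: in a regular msgpack array, every `float 32` element carries its own `0xca` type byte and therefore takes 5 bytes, whereas a typed float32 array needs only 4 bytes per element. A small Python sketch of that arithmetic under the current spec; the typed container's own header is left out because its layout depends on the proposed format, and both function names are hypothetical.

```python
def plain_float32_array_size(n: int) -> int:
    """Bytes for a regular msgpack array of n `float 32` values (current spec)."""
    if n <= 15:
        header = 1  # fixarray
    elif n < 2**16:
        header = 3  # array 16
    elif n < 2**32:
        header = 5  # array 32
    else:
        raise ValueError("too many elements for a msgpack array")
    return header + 5 * n  # each element: 0xca type byte + 4-byte float


def typed_float32_body_size(n: int) -> int:
    """Element bytes of a typed float32 array, excluding the container's overhead."""
    return 4 * n
```

For 1,000 elements this gives 5,003 bytes for a regular array versus 4,000 bytes of element data in a typed array, about 20% less before counting the typed container's own header.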
Note that the number of elements in a container is not stored explicitly in the proposed format; it should be calculated as

`(payload size - overhead size) / (element size)`
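Since the element count is derived rather than stored, a decoder has to check that the division is exact. A minimal Python sketch, assuming a layout where the container's overhead bytes (whatever they encode) come first and the elements follow as big-endian `float 32` values; `element_count` and `decode_float32_array` are hypothetical names.

```python
import struct
from typing import List


def element_count(payload_size: int, overhead_size: int, element_size: int) -> int:
    """(payload size - overhead size) / element size, rejecting inconsistent sizes."""
    body = payload_size - overhead_size
    if body < 0 or body % element_size != 0:
        raise ValueError("payload size is inconsistent with the element size")
    return body // element_size


def decode_float32_array(payload: bytes, overhead_size: int) -> List[float]:
    """Decode a typed float32 array, ASSUMING the overhead bytes come first
    and the elements are big-endian binary32 values."""
    n = element_count(len(payload), overhead_size, 4)
    return list(struct.unpack(f">{n}f", payload[overhead_size:]))
```

Rejecting a non-zero remainder is a cheap sanity check that partly replaces comparing against a stored count, for example when a payload has been truncated.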
This is a big proposal; comments, suggestions and modifications are welcome!