ADR 002: Length-Agnostic Implicit Array Encoding for Integer and Float #9

quickwritereader · 2026-02-11T22:39:13Z

quickwritereader
Feb 11, 2026
Maintainer

ADR 002: Length-Agnostic Implicit Array Encoding

Status

Proposed / Draft

Context

PackOS aims for extreme compactness. Conventional length-prefixed arrays (storing the count as a uint32 or uint64) add unnecessary bytes. We require a format for Integer and Floating-point arrays that minimizes metadata while remaining fully compatible with the chunking model defined in ADR 001.

Decision

We introduce a unified array format for Type 1 (TypeInteger) and Type 3 (TypeFloating). The distinction between a scalar and an array is derived purely from the payload size indicated in the 13-bit header.

1. Encoding Rules

Scalar Mode: If the payload is 8 bytes or less, the data is treated as a single scalar.
Array Mode: If the payload is greater than 8 bytes, the first byte is the Element Size Indicator (1, 2, 4, or 8 bytes).
Payload: Raw packed binary values follow the indicator byte.

2. Implicit Count Logic
The element count is not stored in the binary. It is calculated at runtime by the decoder using this logic:

count = (payloadSize - 1) / elementSize
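The count formula above can be sketched in Go as follows. This is a minimal illustration, not PackOS's actual decoder: the name `elementCount`, the constant `scalarMaxBytes`, and the remainder check (rejecting bodies that are not a whole number of elements) are assumptions of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// scalarMaxBytes is the largest payload still treated as a scalar,
// following the encoding rules in section 1.
const scalarMaxBytes = 8

// elementCount derives the element count of an array-mode payload.
// It returns an error for scalar payloads, invalid indicator bytes,
// or bodies that are not a whole multiple of the element size
// (the last check is an assumption of this sketch).
func elementCount(payload []byte) (int, error) {
	if len(payload) <= scalarMaxBytes {
		return 0, errors.New("payload is a scalar, not an array")
	}
	elemSize := int(payload[0])
	switch elemSize {
	case 1, 2, 4, 8:
		// valid element size indicator
	default:
		return 0, fmt.Errorf("invalid element size indicator %d", elemSize)
	}
	body := len(payload) - 1
	if body%elemSize != 0 {
		return 0, fmt.Errorf("body of %d bytes is not a multiple of %d", body, elemSize)
	}
	return body / elemSize, nil
}

func main() {
	// One indicator byte followed by three 4-byte elements.
	payload := append([]byte{4}, make([]byte, 12)...)
	n, err := elementCount(payload)
	fmt.Println(n, err) // 3 <nil>
}
```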

3. Memory Alignment & Implementation Note
While the format is highly compact, it is important to note:

Platform Variance: C and some hardware platforms that support unaligned access may allow casting the payload directly and traversing it as an array, but this is not a universal guarantee. Copying the payload into a properly typed array is the safe option when the array will be used elsewhere; it also avoids keeping a large buffer alive just because one array chunk points into it.
Implementation Safety: In languages like Go, we prioritize safety and portability. We will use proper encoding/decoding methods (such as encoding/binary) to handle byte-to-type conversion, ensuring alignment and endianness are correctly managed regardless of the host architecture.
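A minimal sketch of the copy-and-convert approach described above, for a float32 array. The helper names `encodeFloat32Array` and `decodeFloat32Array` are hypothetical, and little-endian byte order is an assumption: the ADR text does not state which byte order PackOS uses.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// encodeFloat32Array writes the 1-byte element size indicator (4)
// followed by the packed values. Little-endian is assumed here.
func encodeFloat32Array(vals []float32) []byte {
	buf := make([]byte, 1+4*len(vals))
	buf[0] = 4
	for i, v := range vals {
		binary.LittleEndian.PutUint32(buf[1+i*4:], math.Float32bits(v))
	}
	return buf
}

// decodeFloat32Array copies an array-mode payload into a fresh []float32,
// so the result neither depends on alignment nor pins the source buffer.
func decodeFloat32Array(payload []byte) ([]float32, error) {
	if len(payload) <= 8 || payload[0] != 4 {
		return nil, errors.New("not an array payload with 4-byte elements")
	}
	body := payload[1:]
	if len(body)%4 != 0 {
		return nil, errors.New("body is not a whole number of elements")
	}
	out := make([]float32, len(body)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(body[i*4:]))
	}
	return out, nil
}

func main() {
	payload := encodeFloat32Array([]float32{1.5, -2, 0.25})
	vals, err := decodeFloat32Array(payload)
	fmt.Println(vals, err) // [1.5 -2 0.25] <nil>
}
```

Because the values are read through `encoding/binary` rather than by reinterpreting the byte slice, the same code behaves identically on hosts of either endianness.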

4. Integration with ADR 001
If a numerical array exceeds the Adjustable Oversize Limit (default 4 KB), it is wrapped in a TypeExtendedTagContainer. The array is split across segments, and the implicit count logic is applied locally to each segment's payload.
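To illustrate how the implicit count applies per segment, the sketch below computes how many elements each segment would carry. It assumes that every segment payload begins with its own 1-byte indicator and that the limit covers the whole segment payload; neither detail is confirmed by the ADR text, and `segmentCounts` is a hypothetical helper.

```go
package main

import "fmt"

// segmentCounts returns the number of elements each segment would hold
// when totalElems elements of elemSize bytes are split under a byte limit.
// Assumption: each segment payload is 1 indicator byte plus whole elements.
func segmentCounts(totalElems, elemSize, limit int) []int {
	if elemSize <= 0 {
		return nil
	}
	perSeg := (limit - 1) / elemSize // same formula as the implicit count
	if perSeg < 1 {
		return nil // limit too small to hold even one element
	}
	var counts []int
	for totalElems > 0 {
		n := perSeg
		if totalElems < n {
			n = totalElems
		}
		counts = append(counts, n)
		totalElems -= n
	}
	return counts
}

func main() {
	// 1500 float64 values under the default 4 KB limit.
	fmt.Println(segmentCounts(1500, 8, 4096)) // [511 511 478]
}
```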

Consequences

Pros: 1-byte overhead for arrays of any length; removes redundant count fields; highly efficient for bulk numerical processing.
Cons: Requires a simple division operation during the initial decode of an array chunk; necessitates careful implementation in high-level languages to avoid alignment faults.

heartical · 2026-03-23T21:14:08Z

heartical
Mar 23, 2026

#10

Look at this ❤️

3 replies

quickwritereader Apr 8, 2026
Maintainer Author

thanks a lot, I will try to review it this weekend.

quickwritereader Apr 8, 2026
Maintainer Author

I checked. The array implementation is great. We may just need to redo integer array decoding for each type, the way you did for float (float32 and float64).

quickwritereader Apr 8, 2026
Maintainer Author

I also viewed (but did not test) the extended implementation. It's good and straightforward.
But I think the ADR intends it to be triggered for root and nested containers,
so that while decoding, all or part of the elements within the same level can be accessed directly.
That means that while encoding, it should promote to extendedContainer,
and while encoding we should also be able to patch/replace nextOffsets.
Somehow we should know this triplet: [parent segment where our nextOffset is, nextOffset address within that parent segment, its actual segment],
so that while building the full buffer or file store, we can replace those offsets with real links or offsets.

So, if we modify your solution with the triplet I mentioned above, IMO we gain more flexibility in choosing the tradeoff between BFS-style and DFS-style access.

imagine we have

{
     int(0xaabbccdd),
     int(0xbbccddee),
     tuple {
           something large [8kb]
     },
     int(0xdeaddead),
     int(0xabdeabde)
}

When we pack the above,
it should promote the tuple to the extended type and hold only the allowed size, in our case 4kb:
======start========
//headers, offset, size
int  0  4
int  4  4
extendedContainer  8   4kb+4
int    4kb+12   4
int    4kb+16   4

// values
0xaabbccdd
0xbbccddee
[FFFFFFFF]  //marks a new container (these 4 bytes may be removed; if we keep them we get extra validation logic, if we remove them we must store/send our segments in order)
[4kb+20] //next_continuation_address
[actual type] 

.....its internal data
....
...
0xdeaddead
0xabdeabde
=====first pack segment ends here======

====2nd pack start=======
//headers
extendedContainer  0   4kb

//values
[real_offset]   // for validation/checking
[0x00000000]  //no continuation
[actual type]
.....the rest internal data
....
...
====2nd pack ends here=======

quickwritereader · 2026-04-09T19:23:50Z

quickwritereader
Apr 9, 2026
Maintainer Author

@heartical I merged it, but it will not be propagated to main in its current form.
Could you redo it the way I asked? First we should write the encoding side, then decoding, and so on.
Read my comments above.
Also, do not rush if something is unclear; ask before doing.
Your work had good insights, that's why I kept it on the experimental branch.
I did not want to force cascading changes on you; that's why I want you to start the redo with encode (put access) alone first.

0 replies

quickwritereader · 2026-04-09T23:03:00Z

quickwritereader
Apr 9, 2026
Maintainer Author

@heartical I added simple test pseudo code
#10 (comment)

If you have time, let's rewrite encode first this way: first strings (bytes) + int/float arrays (which you implemented),
then tuple,
then map,
and then root (which is tuple-like) to achieve nesting.

For maps, we may add an extra restriction, e.g. key strings must be shorter than some length (1kb or 2kb).
Also, for file serialization we may write chunk segments directly as files, unordered, using offsets as filenames.
So there are plenty of opportunities and optimizations.
It means that in reality we can also store extended container segments through a callback function:
func store(parentId, patchSegmentId, nextAddressPatchOffset int, packedBuffer []byte) int // returns segmentId

and decide how to store them externally: in a tree, which makes BFS-style offset concatenation easier, or as direct file writes named by offset (filename.offset.data), which by default is reversed DFS order. In the latter case the triplets can be stored the ordinary way, without keeping buffers as segments; the root file will be the last one, and we can name it filename.root.data.
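One possible reading of the callback signature above, sketched as an in-memory store that records the patch triplets and resolves them when the full buffer is built. Everything here is hypothetical: the `memStore` and `patch` types, the convention that a negative `patchSegmentID` means "nothing to patch", the 4-byte little-endian offset width, and laying segments out in id order. `parentID` is accepted to match the signature but is unused in this sketch.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// patch records where one stored segment must receive the final offset of another.
type patch struct {
	patchSegmentID int // segment that holds the nextOffset placeholder
	patchOffset    int // byte position of the placeholder inside that segment
	targetID       int // segment the placeholder should point to
}

// memStore is a hypothetical in-memory backend for the store callback.
type memStore struct {
	segments [][]byte
	patches  []patch
}

// store copies the packed buffer, remembers the patch triplet if one is
// requested, and returns the new segment's id.
func (m *memStore) store(parentID, patchSegmentID, nextAddressPatchOffset int, packedBuffer []byte) int {
	id := len(m.segments)
	m.segments = append(m.segments, append([]byte(nil), packedBuffer...))
	if patchSegmentID >= 0 {
		m.patches = append(m.patches, patch{patchSegmentID, nextAddressPatchOffset, id})
	}
	return id
}

// concat lays segments out in id order and overwrites each placeholder
// with the absolute offset of its target segment.
func (m *memStore) concat() []byte {
	starts := make([]int, len(m.segments))
	var out []byte
	for i, s := range m.segments {
		starts[i] = len(out)
		out = append(out, s...)
	}
	for _, p := range m.patches {
		at := starts[p.patchSegmentID] + p.patchOffset
		binary.LittleEndian.PutUint32(out[at:], uint32(starts[p.targetID]))
	}
	return out
}

func main() {
	var m memStore
	// First segment: a 4-byte placeholder sits at bytes 1..4.
	first := m.store(-1, -1, 0, []byte{0xAA, 0, 0, 0, 0, 0xBB})
	// Continuation segment: ask for the placeholder in `first` to be patched.
	m.store(first, first, 1, []byte{0xCC, 0xDD})
	fmt.Printf("% x\n", m.concat()) // aa 06 00 00 00 bb cc dd
}
```

A file-based backend could implement the same method by writing each buffer to its own file and keeping only the triplets in memory, which is the direction the comment above hints at.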

After encoding, decoding will be more straightforward.

0 replies

quickwritereader · 2026-04-10T01:10:42Z

quickwritereader
Apr 10, 2026
Maintainer Author

@heartical I rebased ADR-02 into main, but ADR-01 will stay in experimental for now because it is inconsistent with it. We need to redo it gradually, starting with encoding. That's why it needs to be a [WIP] (work in progress) PR for encoding first, then decoding later.

thanks a lot

0 replies
