refactor(experimental): pre-build the byte array before encoding with codecs (#1865)

_Note this is a draft PR focusing on `codecs-core` to gather feedback; I will continue to update the other libraries afterwards._

This PR updates the codecs API such that both the encoding and decoding functions **have access to the entire byte array**. Let's first look at the changes this PR introduces and then see why they are valuable.

## API Changes

- **Encode**: The `Encoder` type contains a new function `write`. Contrary to the `encode` function, which creates a new `Uint8Array` and returns it directly, the `write` function updates the provided `bytes` argument at the provided `offset`. It then returns the next offset that should be written to.

  ```ts
  // Before
  type Encoder<T> = {
    encode: (value: T) => Uint8Array;
    // ...
  };

  // After
  type Encoder<T> = {
    encode: (value: T) => Uint8Array;
    write: (value: T, bytes: Uint8Array, offset: Offset) => Offset;
    // ...
  };
  ```

  A new `createEncoder` function is provided to automatically fill the `encode` function from the `write` function.

  ```ts
  const myU8Encoder = createEncoder({
    fixedSize: 1,
    write: (value: number, bytes: Uint8Array, offset: Offset) => {
      bytes.set([value], offset);
      return offset + 1;
    },
  });
  ```

- **Decode**: The `decode` function was already following a similar approach by using offsets. The newly added `read` function takes over this responsibility. The only difference is that the offset is now a mandatory argument, to stay consistent with the `write` function. The `decode` function becomes syntactic sugar for accessing the value directly.

  ```ts
  // Before
  type Decoder<T> = {
    decode: (bytes: Uint8Array, offset?: Offset) => [T, Offset];
    // ...
  };

  // After
  type Decoder<T> = {
    decode: (bytes: Uint8Array, offset?: Offset) => T;
    read: (bytes: Uint8Array, offset: Offset) => [T, Offset];
    // ...
  };
  ```

  Similarly to the `Encoder` changes, a new `createDecoder` function is provided to fill the `decode` function using the `read` function.
  ```ts
  const myU8Decoder = createDecoder({
    fixedSize: 1,
    read: (bytes: Uint8Array, offset: Offset) => {
      return [bytes[offset], offset + 1];
    },
  });
  ```

- **Sizes**: Because we now need to pre-build the entire byte array that will be encoded, we need a way to find the variable size of a given value. We introduce a new `variableSize` function and narrow the types so that it can only be provided when `fixedSize` is `null`.

  ```ts
  // Before
  type Encoder<T> = {
    fixedSize: number | null;
    maxSize: number | null;
  };

  // After
  type Encoder<T> = { ... } & (
    | { fixedSize: number }
    | {
        fixedSize: null;
        variableSize: (value: T) => number;
        maxSize?: number;
      }
  );
  ```

  We do something similar for the `Decoder`, except that this one doesn't need to know about the variable size (it would make no sense, as the type parameter `T` for a decoder refers to the decoded type and not the type to encode).

  ```ts
  // Before
  type Decoder<T> = {
    fixedSize: number | null;
    maxSize: number | null;
  };

  // After
  type Decoder<T> = { ... } & (
    | { fixedSize: number }
    | { fixedSize: null; maxSize?: number }
  );
  ```

- **Description**: This PR takes this refactoring opportunity to remove the `description` attribute from the codecs API, which brings little value to the end user.

## Why?

- **Consistent API**: The implementations of the `encode` / `decode` and `write` / `read` functions are now consistent with each other. Before, one was using offsets to navigate through an entire byte array while the other was returning and merging byte arrays together. Now they both use offsets to navigate the byte array to encode or decode.
- **Performant API**: By pre-building the byte array once, we avoid creating multiple instances of byte arrays and merging them together.
- **Non-linear serialisation**: The main reason it's important for the `encode` method to have access to the entire encoded byte array is that it allows us to offer more complex codec primitives that are able to jump back and forth within the buffer.
  Without it, we are locking ourselves into only supporting serialisation strategies that are read linearly, which isn't always the case. For instance, imagine the size of an array is stored at the very beginning of the account whereas the items themselves are stored at the end. Because we now have the full byte array when encoding, we can push the size at the beginning whilst inserting all the items at the requested offset. We could even offer a `getOffsetCodec` (or similar) that allows us to shift the offset forward or backward to compose more complex, non-linear data structures. This would be simply impossible with the current format of the `encode` function.
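To make the non-linear point concrete, here is a minimal, self-contained sketch of the idea. The simplified `createEncoder` helper and the `u8ArrayEncoder` example below are hypothetical illustrations of the API shape described above, not the PR's actual implementation: the encoder writes its items first, then jumps back to offset 0 to fill in a length prefix, which is only possible because `write` sees the whole pre-built byte array.

```typescript
type Offset = number;

// Simplified, hypothetical shape of the write-based Encoder type.
type Encoder<T> = {
  fixedSize: number | null;
  variableSize?: (value: T) => number;
  write: (value: T, bytes: Uint8Array, offset: Offset) => Offset;
  encode: (value: T) => Uint8Array;
};

// Sketch of a createEncoder helper: derives `encode` from `write`
// by pre-allocating the full byte array exactly once.
function createEncoder<T>(encoder: Omit<Encoder<T>, "encode">): Encoder<T> {
  return {
    ...encoder,
    encode: (value: T) => {
      const size = encoder.fixedSize ?? encoder.variableSize!(value);
      const bytes = new Uint8Array(size);
      encoder.write(value, bytes, 0);
      return bytes;
    },
  };
}

// A u32-length-prefixed array of u8s: the items are written first,
// then we jump backwards to write the length prefix at the start.
const u8ArrayEncoder = createEncoder<number[]>({
  fixedSize: null,
  variableSize: (items) => 4 + items.length,
  write: (items, bytes, offset) => {
    const start = offset;
    offset += 4; // Reserve room for the u32 length prefix.
    for (const item of items) {
      bytes[offset] = item;
      offset += 1;
    }
    // Non-linear step: go back and fill in the length (little-endian).
    new DataView(bytes.buffer, bytes.byteOffset).setUint32(
      start,
      items.length,
      true,
    );
    return offset;
  },
});

console.log(u8ArrayEncoder.encode([7, 8, 9]));
// → Uint8Array [3, 0, 0, 0, 7, 8, 9]
```

With the previous `encode`-only signature, the length prefix would have had to be produced before the items and concatenated, since each codec only ever saw its own freshly created byte array.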