# refactor(experimental): pre-build the byte array before encoding with codecs (#1865)

_Note this is a draft PR focusing on `codecs-core` to gather feedback, I will continue to update the other libraries afterwards._

This PR updates the codecs API so that both the encoding and decoding functions **have access to the entire byte array**. Let’s first look at the changes this PR introduces and then see why they are valuable.

## API Changes

- **Encode**: The `Encoder` type contains a new function `write`. Unlike the `encode` function, which creates a new `Uint8Array` and returns it directly, the `write` function updates the provided `bytes` argument at the provided `offset` and returns the next offset that should be written to.

  ```ts
  // Before
  type Encoder<T> = {
    encode: (value: T) => Uint8Array;
    // ...
  };

  // After
  type Encoder<T> = {
    encode: (value: T) => Uint8Array;
    write: (value: T, bytes: Uint8Array, offset: Offset) => Offset;
    // ...
  };
  ```
  
  A new `createEncoder` function is provided to automatically derive the `encode` function from the `write` function.

  ```ts
  const myU8Encoder = createEncoder({
    fixedSize: 1,
    write: (value: number, bytes: Uint8Array, offset: Offset) => {
      bytes.set([value], offset);
      return offset + 1;
    },
  });
  ```
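  To make the relationship between `encode` and `write` concrete, here is a minimal, self-contained sketch of how a fixed-size `createEncoder` could derive `encode` from `write`. The types and implementation below are assumptions for illustration, not the library's actual internals.

  ```ts
  // Sketch only: `Offset` and `FixedSizeEncoderInput` are assumed shapes,
  // not the real library types.
  type Offset = number;

  type FixedSizeEncoderInput<T> = {
    fixedSize: number;
    write: (value: T, bytes: Uint8Array, offset: Offset) => Offset;
  };

  function createEncoder<T>(input: FixedSizeEncoderInput<T>) {
    return {
      ...input,
      // Pre-build the byte array once, then delegate to `write`.
      encode: (value: T): Uint8Array => {
        const bytes = new Uint8Array(input.fixedSize);
        input.write(value, bytes, 0);
        return bytes;
      },
    };
  }

  const myU8Encoder = createEncoder({
    fixedSize: 1,
    write: (value: number, bytes, offset) => {
      bytes.set([value], offset);
      return offset + 1;
    },
  });

  myU8Encoder.encode(42); // → Uint8Array [42]
  ```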
  

- **Decode**: The `decode` function already followed a similar approach by using offsets. The newly added `read` function takes over this responsibility; the only difference is that the offset is now a mandatory argument, for consistency with the `write` function. The `decode` function becomes syntactic sugar for accessing the decoded value directly.

  ```ts
  // Before
  type Decoder<T> = {
    decode: (bytes: Uint8Array, offset?: Offset) => [T, Offset];
    // ...
  };

  // After
  type Decoder<T> = {
    decode: (bytes: Uint8Array, offset?: Offset) => T;
    read: (bytes: Uint8Array, offset: Offset) => [T, Offset];
    // ...
  };
  ```

  Similar to the `Encoder` changes, a new `createDecoder` function is provided to derive the `decode` function from the `read` function.

  ```ts
  const myU8Decoder = createDecoder({
    fixedSize: 1,
    read: (bytes: Uint8Array, offset: Offset) => {
      return [bytes[offset], offset + 1];
    },
  });
  ```
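  Again to make the "syntactic sugar" claim concrete, here is a minimal sketch of how `createDecoder` could derive `decode` from `read` by simply dropping the returned offset. The types below are assumptions for illustration, not the library's actual internals.

  ```ts
  // Sketch only: `Offset` and `DecoderInput` are assumed shapes,
  // not the real library types.
  type Offset = number;

  type DecoderInput<T> = {
    fixedSize: number;
    read: (bytes: Uint8Array, offset: Offset) => [T, Offset];
  };

  function createDecoder<T>(input: DecoderInput<T>) {
    return {
      ...input,
      // `decode` calls `read` and keeps only the value.
      decode: (bytes: Uint8Array, offset: Offset = 0): T => {
        const [value] = input.read(bytes, offset);
        return value;
      },
    };
  }

  const myU8Decoder = createDecoder({
    fixedSize: 1,
    read: (bytes, offset): [number, Offset] => [bytes[offset], offset + 1],
  });

  myU8Decoder.decode(new Uint8Array([42])); // → 42
  ```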

- **Sizes**: Because we now need to pre-build the entire byte array that will be encoded, we need a way to find the variable size of a given value. We introduce a new `variableSize` function and narrow the types so that it can only be provided when `fixedSize` is `null`.

  ```ts
  // Before
  type Encoder<T> = {
    fixedSize: number | null;
    maxSize: number | null;
  }

  // After
  type Encoder<T> = { ... } & (
    | { fixedSize: number; }
    | { fixedSize: null; variableSize: (value: T) => number; maxSize?: number }
  )
  ```
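  For instance, a variable-size encoder under this shape might look like the sketch below. The codec itself (a naive ASCII string encoder) is made up for this example; the point is that `variableSize` lets callers pre-allocate the full byte array before calling `write`.

  ```ts
  // Hypothetical variable-size encoder, assumed for illustration.
  type Offset = number;

  const myAsciiEncoder = {
    fixedSize: null,
    // Reports how many bytes a given value needs.
    variableSize: (value: string) => value.length,
    write: (value: string, bytes: Uint8Array, offset: Offset): Offset => {
      const charCodes = [...value].map(char => char.charCodeAt(0) & 0xff);
      bytes.set(charCodes, offset);
      return offset + charCodes.length;
    },
  };

  // Pre-build the byte array using `variableSize`, then write into it.
  const value = 'abc';
  const bytes = new Uint8Array(myAsciiEncoder.variableSize(value));
  myAsciiEncoder.write(value, bytes, 0);
  // bytes is now Uint8Array [97, 98, 99]
  ```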

  We do something similar for the `Decoder`, except that it doesn’t need to know about the variable size (this would make no sense, as the type parameter `T` for a decoder refers to the decoded type, not the type to encode).

  ```ts
  // Before
  type Decoder<T> = {
    fixedSize: number | null;
    maxSize: number | null;
  }

  // After
  type Decoder<T> = { ... } & (
    | { fixedSize: number; }
    | { fixedSize: null; maxSize?: number }
  )
  ```

- **Description**: This PR also takes the refactoring opportunity to remove the `description` attribute from the codecs API, which brings little value to the end user.

## Why?

- **Consistent API**: The `encode` / `decode` and `write` / `read` functions are now implemented consistently with each other. Before, one used offsets to navigate an entire byte array while the other returned and merged byte arrays together. Now both use offsets to navigate the byte array being encoded or decoded.

- **Performant API**: By pre-building the byte array once, we avoid creating multiple byte array instances and merging them together.

- **Non-linear serialisation**: The main reason it’s important for the `encode` method to have access to the entire encoded byte array is that it allows us to offer more complex codec primitives that can jump back and forth within the buffer. Without it, we lock ourselves into serialisation strategies that are read linearly, which isn’t always the case. For instance, imagine the size of an array is stored at the very beginning of the account whereas the items themselves are stored at the end. Because we now have the full byte array when encoding, we can write the size at the beginning whilst inserting the items at the requested offset. We could even offer a `getOffsetCodec` (or similar) that shifts the offset forward or backward to compose more complex, non-linear data structures. This would simply be impossible with the previous format of the `encode` function.
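The non-linear pattern described above can be sketched as follows: the array's size prefix lives at the start of the buffer, the items follow it, and the writer jumps back to fill in the prefix. `writeU8` and `writeU8Array` are hypothetical helpers written for this example, not part of the library's API.

```ts
// Sketch only: hypothetical helpers illustrating non-linear writes.
type Offset = number;

const writeU8 = (value: number, bytes: Uint8Array, offset: Offset): Offset => {
    bytes.set([value], offset);
    return offset + 1;
};

const writeU8Array = (items: number[], bytes: Uint8Array, offset: Offset): Offset => {
    // Write the items first, starting one byte past the size prefix...
    let nextOffset = offset + 1;
    for (const item of items) {
        nextOffset = writeU8(item, bytes, nextOffset);
    }
    // ...then jump back to the original offset to write the size prefix.
    writeU8(items.length, bytes, offset);
    return nextOffset;
};

const bytes = new Uint8Array(4);
writeU8Array([7, 8, 9], bytes, 0);
// bytes is now Uint8Array [3, 7, 8, 9]: the size prefix followed by the items
```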
lorisleiva committed Dec 1, 2023
1 parent a7c51e0 commit 7800e3b
Showing 12 changed files with 368 additions and 337 deletions.
40 changes: 23 additions & 17 deletions — packages/codecs-core/src/__tests__/__setup__.ts

```diff
@@ -1,31 +1,37 @@
-import { Codec } from '../codec';
+import { Codec, createCodec } from '../codec';
 
 export const b = (s: string) => base16.encode(s);
 
-export const base16: Codec<string> = {
-    decode(bytes, offset = 0) {
+export const base16: Codec<string> = createCodec({
+    fixedSize: null,
+    read(bytes, offset) {
         const value = bytes.slice(offset).reduce((str, byte) => str + byte.toString(16).padStart(2, '0'), '');
         return [value, bytes.length];
     },
-    description: 'base16',
-    encode(value: string) {
+    variableSize: (value: string) => Math.ceil(value.length / 2),
+    write(value: string, bytes, offset) {
         const matches = value.toLowerCase().match(/.{1,2}/g);
-        return Uint8Array.from(matches ? matches.map((byte: string) => parseInt(byte, 16)) : []);
+        const hexBytes = matches ? matches.map((byte: string) => parseInt(byte, 16)) : [];
+        bytes.set(hexBytes, offset);
+        return offset + hexBytes.length;
     },
-    fixedSize: null,
-    maxSize: null,
-};
+});
 
 export const getMockCodec = (
     config: {
         defaultValue?: string;
-        description?: string;
         size?: number | null;
-    } = {},
-) => ({
-    decode: jest.fn().mockReturnValue([config.defaultValue ?? '', 0]),
-    description: config.description ?? 'mock',
-    encode: jest.fn().mockReturnValue(new Uint8Array()),
-    fixedSize: config.size ?? null,
-    maxSize: config.size ?? null,
-});
+    } = {}
+) =>
+    createCodec({
+        fixedSize: config.size ?? null,
+        maxSize: config.size ?? undefined,
+        read: jest.fn().mockReturnValue([config.defaultValue ?? '', 0]),
+        variableSize: jest.fn().mockReturnValue(config.size ?? 0),
+        write: jest.fn().mockReturnValue(0),
+    }) as Codec<unknown> & {
+        readonly read: jest.Mock;
+        readonly variableSize: jest.Mock;
+        readonly write: jest.Mock;
+    };
```
66 changes: 32 additions & 34 deletions — packages/codecs-core/src/__tests__/codec-test.ts

```diff
@@ -1,75 +1,73 @@
-import { Codec, Decoder, Encoder } from '../codec';
+import { Codec, createCodec, createDecoder, createEncoder, Decoder, Encoder } from '../codec';
 
 describe('Encoder', () => {
     it('can define Encoder instances', () => {
-        const myEncoder: Encoder<string> = {
-            description: 'myEncoder',
-            encode: (value: string) => {
-                const bytes = new Uint8Array(32).fill(0);
+        const myEncoder: Encoder<string> = createEncoder({
+            fixedSize: 32,
+            write: (value: string, bytes, offset) => {
                 const charCodes = [...value.slice(0, 32)].map(char => Math.min(char.charCodeAt(0), 255));
-                bytes.set(new Uint8Array(charCodes));
-                return bytes;
+                bytes.set(charCodes, offset);
+                return offset + 32;
             },
-            fixedSize: 32,
-            maxSize: 32,
-        };
+        });
 
-        expect(myEncoder.description).toBe('myEncoder');
         expect(myEncoder.fixedSize).toBe(32);
         expect(myEncoder.maxSize).toBe(32);
 
         const expectedBytes = new Uint8Array(32).fill(0);
         expectedBytes.set(new Uint8Array([104, 101, 108, 108, 111]));
         expect(myEncoder.encode('hello')).toStrictEqual(expectedBytes);
+
+        const writtenBytes = new Uint8Array(32).fill(0);
+        expect(myEncoder.write('hello', writtenBytes, 0)).toBe(32);
+        expect(writtenBytes).toStrictEqual(expectedBytes);
     });
 });
 
 describe('Decoder', () => {
     it('can define Decoder instances', () => {
-        const myDecoder: Decoder<string> = {
-            decode: (bytes: Uint8Array, offset = 0) => {
+        const myDecoder: Decoder<string> = createDecoder({
+            fixedSize: 32,
+            read: (bytes: Uint8Array, offset) => {
                 const slice = bytes.slice(offset, offset + 32);
                 const str = [...slice].map(charCode => String.fromCharCode(charCode)).join('');
                 return [str, offset + 32];
             },
-            description: 'myDecoder',
-            fixedSize: 32,
-            maxSize: 32,
-        };
+        });
 
-        expect(myDecoder.description).toBe('myDecoder');
         expect(myDecoder.fixedSize).toBe(32);
         expect(myDecoder.maxSize).toBe(32);
-        expect(myDecoder.decode(new Uint8Array([104, 101, 108, 108, 111]))).toStrictEqual(['hello', 32]);
+
+        expect(myDecoder.decode(new Uint8Array([104, 101, 108, 108, 111]))).toBe('hello');
+        expect(myDecoder.read(new Uint8Array([104, 101, 108, 108, 111]), 0)).toStrictEqual(['hello', 32]);
     });
 });
 
 describe('Codec', () => {
     it('can define Codec instances', () => {
-        const myCodec: Codec<string> = {
-            decode: (bytes: Uint8Array, offset = 0) => {
+        const myCodec: Codec<string> = createCodec({
+            fixedSize: 32,
+            read: (bytes: Uint8Array, offset) => {
                 const slice = bytes.slice(offset, offset + 32);
                 const str = [...slice].map(charCode => String.fromCharCode(charCode)).join('');
                 return [str, offset + 32];
             },
-            description: 'myCodec',
-            encode: (value: string) => {
-                const bytes = new Uint8Array(32).fill(0);
+            write: (value: string, bytes, offset) => {
                 const charCodes = [...value.slice(0, 32)].map(char => Math.min(char.charCodeAt(0), 255));
-                bytes.set(new Uint8Array(charCodes));
-                return bytes;
+                bytes.set(charCodes, offset);
+                return offset + 32;
             },
-            fixedSize: 32,
-            maxSize: 32,
-        };
+        });
 
-        expect(myCodec.description).toBe('myCodec');
         expect(myCodec.fixedSize).toBe(32);
         expect(myCodec.maxSize).toBe(32);
 
         const expectedBytes = new Uint8Array(32).fill(0);
         expectedBytes.set(new Uint8Array([104, 101, 108, 108, 111]));
         expect(myCodec.encode('hello')).toStrictEqual(expectedBytes);
-        expect(myCodec.decode(new Uint8Array([104, 101, 108, 108, 111]))).toStrictEqual(['hello', 32]);
+
+        const writtenBytes = new Uint8Array(32).fill(0);
+        expect(myCodec.write('hello', writtenBytes, 0)).toBe(32);
+        expect(writtenBytes).toStrictEqual(expectedBytes);
+
+        expect(myCodec.decode(new Uint8Array([104, 101, 108, 108, 111]))).toBe('hello');
+        expect(myCodec.read(new Uint8Array([104, 101, 108, 108, 111]), 0)).toStrictEqual(['hello', 32]);
     });
 });
```
84 changes: 29 additions & 55 deletions — packages/codecs-core/src/__tests__/combine-codec.ts

```diff
@@ -1,89 +1,63 @@
-import { Codec, Decoder, Encoder } from '../codec';
+import { Codec, createDecoder, createEncoder, Decoder, Encoder } from '../codec';
 import { combineCodec } from '../combine-codec';
 
 describe('combineCodec', () => {
-    const mockEncode: Encoder<number>['encode'] = () => new Uint8Array([]);
-    const mockDecode: Decoder<number>['decode'] = (_bytes: Uint8Array, offset = 0) => [42, offset];
-
     it('can join encoders and decoders with the same type', () => {
-        const u8Encoder: Encoder<number> = {
-            description: 'u8',
-            encode: (value: number) => new Uint8Array([value]),
+        const u8Encoder: Encoder<number> = createEncoder({
             fixedSize: 1,
-            maxSize: 1,
-        };
+            write: (value: number, buffer, offset) => {
+                buffer.set([value], offset);
+                return offset + 1;
+            },
+        });
 
-        const u8Decoder: Decoder<number> = {
-            decode: (bytes: Uint8Array, offset = 0) => [bytes[offset], offset + 1],
-            description: 'u8',
+        const u8Decoder: Decoder<number> = createDecoder({
             fixedSize: 1,
-            maxSize: 1,
-        };
+            read: (bytes: Uint8Array, offset = 0) => [bytes[offset], offset + 1],
+        });
 
         const u8Codec: Codec<number> = combineCodec(u8Encoder, u8Decoder);
 
-        expect(u8Codec.description).toBe('u8');
         expect(u8Codec.fixedSize).toBe(1);
         expect(u8Codec.maxSize).toBe(1);
         expect(u8Codec.encode(42)).toStrictEqual(new Uint8Array([42]));
-        expect(u8Codec.decode(new Uint8Array([42]))).toStrictEqual([42, 1]);
+        expect(u8Codec.decode(new Uint8Array([42]))).toBe(42);
     });
 
     it('can join encoders and decoders with different but matching types', () => {
-        const u8Encoder: Encoder<number | bigint> = {
-            description: 'u8',
-            encode: (value: number | bigint) => new Uint8Array([Number(value)]),
+        const u8Encoder: Encoder<number | bigint> = createEncoder({
             fixedSize: 1,
-            maxSize: 1,
-        };
+            write: (value: number | bigint, buffer, offset) => {
+                buffer.set([Number(value)], offset);
+                return offset + 1;
+            },
+        });
 
-        const u8Decoder: Decoder<bigint> = {
-            decode: (bytes: Uint8Array, offset = 0) => [BigInt(bytes[offset]), offset + 1],
-            description: 'u8',
+        const u8Decoder: Decoder<bigint> = createDecoder({
             fixedSize: 1,
-            maxSize: 1,
-        };
+            read: (bytes: Uint8Array, offset = 0) => [BigInt(bytes[offset]), offset + 1],
+        });
 
         const u8Codec: Codec<number | bigint, bigint> = combineCodec(u8Encoder, u8Decoder);
 
-        expect(u8Codec.description).toBe('u8');
         expect(u8Codec.fixedSize).toBe(1);
         expect(u8Codec.maxSize).toBe(1);
         expect(u8Codec.encode(42)).toStrictEqual(new Uint8Array([42]));
         expect(u8Codec.encode(42n)).toStrictEqual(new Uint8Array([42]));
-        expect(u8Codec.decode(new Uint8Array([42]))).toStrictEqual([42n, 1]);
+        expect(u8Codec.decode(new Uint8Array([42]))).toBe(42n);
     });
 
-    it('cannot join encoders and decoders with sizes or descriptions', () => {
+    it('cannot join encoders and decoders with different sizes', () => {
         expect(() =>
             combineCodec(
-                { description: 'u8', encode: mockEncode, fixedSize: 1, maxSize: 1 },
-                { decode: mockDecode, description: 'u8', fixedSize: 2, maxSize: 1 },
-            ),
+                createEncoder({ fixedSize: 1, write: jest.fn() }),
+                createDecoder({ fixedSize: 2, read: jest.fn() })
+            )
         ).toThrow('Encoder and decoder must have the same fixed size, got [1] and [2]');
 
         expect(() =>
             combineCodec(
-                { description: 'u8', encode: mockEncode, fixedSize: 1, maxSize: 1 },
-                { decode: mockDecode, description: 'u8', fixedSize: 1, maxSize: null },
-            ),
-        ).toThrow('Encoder and decoder must have the same max size, got [1] and [null]');
-
-        expect(() =>
-            combineCodec(
-                { description: 'u8', encode: mockEncode, fixedSize: 1, maxSize: 1 },
-                { decode: mockDecode, description: 'u16', fixedSize: 1, maxSize: 1 },
-            ),
-        ).toThrow('Encoder and decoder must have the same description, got [u8] and [u16]');
-    });
-
-    it('can override the description of the joined codec', () => {
-        const myCodec = combineCodec(
-            { description: 'u8', encode: mockEncode, fixedSize: 1, maxSize: 1 },
-            { decode: mockDecode, description: 'u16', fixedSize: 1, maxSize: 1 },
-            'myCustomDescription',
-        );
-
-        expect(myCodec.description).toBe('myCustomDescription');
+                createEncoder({ fixedSize: null, maxSize: 1, variableSize: jest.fn(), write: jest.fn() }),
+                createDecoder({ fixedSize: null, read: jest.fn() })
+            )
+        ).toThrow('Encoder and decoder must have the same max size, got [1] and [undefined]');
     });
 });
```
