Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce literal enum codec. #2296

Closed
lithdew opened this issue Mar 12, 2024 · 18 comments · Fixed by #2394
Closed

Introduce literal enum codec. #2296

lithdew opened this issue Mar 12, 2024 · 18 comments · Fixed by #2394
Assignees
Labels
enhancement New feature or request

Comments

@lithdew
Copy link

lithdew commented Mar 12, 2024

As of right now, we have both getScalarEnumCodec and getDataEnumCodec for specifying enums.

It would be useful to be able to specify enum codecs with string/number literals as its input/output, similar to Zod's z.enum([]).

Example:

const currencyCodec = getLiteralEnumCodec(["sol", "like", "usdc"]);
const bytes = currencyCodec.encode("sol"); // 0x00
const decoded = currencyCodec.decode(Uint8Array.of(0x01)); // "sol" | "like" | "usdc"
@lithdew lithdew added the enhancement New feature or request label Mar 12, 2024
@lorisleiva
Copy link
Collaborator

Oh that's interesting. Technically you can use getScalarEnumCodec with an enum that uses string variants to achieve a similar result but that would be a nice way to bypass the enum lookup object altogether.

Thanks for all your codec feedback btw, it's much appreciated.

@lithdew
Copy link
Author

lithdew commented Mar 13, 2024

Right - to bypass the enum lookup object altogether, the code is a bit manual. Might you have any suggestions on improving how the codec is mapped?

export enum Currency {
  sol,
  like,
  usdc,
}

export const CURRENCY_CODEC = mapCodec(
  getScalarEnumCodec(Currency),
  (v: "sol" | "like" | "usdc") => {
    switch (v) {
      case "sol":
        return Currency.sol;
      case "like":
        return Currency.like;
      case "usdc":
        return Currency.usdc;
    }
  },
  (v: ScalarEnumFrom<typeof Currency>) => {
    switch (v) {
      case 0:
      case Currency.sol:
        return "sol";
      case 1:
      case Currency.like:
        return "like";
      case 2:
      case Currency.usdc:
        return "usdc";
    }

    throw new Error("Invalid currency");
  }
);

@lorisleiva
Copy link
Collaborator

Oh no I meant that you can use a scalar enum with string variants like so:

export enum Currency {
    sol = 'sol',
    like = 'like',
    usdc = 'usdc',
}

const codec = getScalarEnumCodec(Currency);
codec.encode('sol'); // 0x00
codec.decode(new Uint8Array([1])); // 'like'

@lithdew
Copy link
Author

lithdew commented Mar 14, 2024

Ah got it. Thought the enum had to map to numbers as there are some cases where we can't rely on the enum's key ordering to figure out an enum's byte representation/value (if, say, we wanted to define a scalar enum codec schema in JSON).

Based on the way the types are defined for getScalarEnum also btw, noticed that an object could be passed in as well.

const codec = getScalarEnumCodec({
  sol: "sol",
  like: "like",
  usdc: "usdc"
});

@lorisleiva
Copy link
Collaborator

Knowing that, do you still think there is a need for getLiteralEnumCodec(["sol", "like", "usdc"])?

@lithdew
Copy link
Author

lithdew commented Mar 14, 2024

Knowing that, do you still think there is a need for getLiteralEnumCodec(["sol", "like", "usdc"])?

Hmm I still think it would be a nice-to-have - Zod for example has separate nativeEnum and enum validators.

I think it would be better for getScalarEnumCodec to encode enum literals directly to the values it maps to (meaning the enum's values must be of a literal type that can be encoded to bytes).

@lorisleiva
Copy link
Collaborator

Do you mean this?

export enum Currency {
    sol,
    like,
    usdc,
}

const codec = getScalarEnumCodec(Currency);
codec.encode('sol'); // 0x00
codec.encode('like'); // 0x01
codec.encode('usdc'); // 0x02

Because that's already possible. It's just that when you decode, you'll get the values of the enum.

codec.decode(new Uint8Array([0])); // Currency.sol (same as 0)
codec.decode(new Uint8Array([1])); // Currency.like (same as 1)
codec.decode(new Uint8Array([2])); // Currency.usdc (same as 2)

Here's another example.

@lithdew
Copy link
Author

lithdew commented Mar 14, 2024

I mean like this:

import { getScalarEnumCodec } from "@solana/web3.js";

export enum Currency {
    sol = 3,
    like = 5,
    usdc = 9,
}

const codec = getScalarEnumCodec(Currency);
console.log(codec.encode('sol')); // Should be 0x03
console.log(codec.encode('like')); // Should be 0x05
console.log(codec.encode('usdc')); // Should be 0x09

@lorisleiva
Copy link
Collaborator

Oh wait is that not the case?! If so that's a bug then. Do you want to raise a separate issue for it?

@lithdew
Copy link
Author

lithdew commented Mar 14, 2024

Yep, will file it. For the case of this issue then, what do you think about having getScalarEnumCodec being able to take in an array of literals then vs. having a separate getLiteralEnumCodec?

@lorisleiva
Copy link
Collaborator

I think it'd be cleaner both internally and for the end-user to split them up into two separate codecs. I think something like getStringUnionCodec would better describe what the function does as well. Wdyt?

@lithdew
Copy link
Author

lithdew commented Mar 15, 2024

I think it'd be cleaner both internally and for the end-user to split them up into two separate codecs. I think something like getStringUnionCodec would better describe what the function does as well. Wdyt?

This function could support other types of literals (hence getLiteralEnumCodec). i.e. numbers, symbols, booleans, etc. The discriminator for an enum literal would depend on its index in the array provided to getLiteralEnumCodec.

@lorisleiva
Copy link
Collaborator

That's a good point! I think with the current naming convention getLiteralEnumCodec makes the most sense.

But all your feedback has gotten me thinking a bit more about it and I think it may be confusing for us to use the term Enum when the encoded type isn't actually one.

Therefore, I've been thinking about the following refactoring.

// Current API.
getScalarEnumCodec();
getDataEnumCodec();
getLiteralEnumCodec(); // Soon.

// Proposed API.
getEnumCodec(); // Because that's technically the only one that uses enums.
getStructUnionCodec(); // With a new option that enables you to configure the discriminant.
getLiteralUnionCodec();
getUnionCodec(); // (optionally) a more generic helper that the previous two can use under the hood.

@lithdew
Copy link
Author

lithdew commented Mar 15, 2024

If we are to reference how types are named in C or Zig, the naming convention would be the following:

getEnumCodec(); // Enums are available in C and Zig. Should support both JavaScript unions and an array of literals.
getDiscriminatedUnionCodec(); // Discriminated unions are available in C and Zig.

@lorisleiva
Copy link
Collaborator

lorisleiva commented Mar 15, 2024

Since the encoded/decoded types are actually represented as TypeScript types, I think it makes more sense to follow a naming convention that's idiomatic with the JavaScript/TypeScript language instead of C, Rust or Zig.

It is up to the designer of these codecs to decide how they want to represent these byte arrays in something that fits in TypeScript land.

Related: I'm also thinking of renaming getStructCodec to getObjectCodec to stay consistent in that regard (See discord message). So the proposed API here would become getObjectUnionCodec instead of getStructUnionCodec, or getDiscriminatedUnionCodec as you suggested since there will be a common discriminator attribute to these objects.

@lithdew
Copy link
Author

lithdew commented Mar 18, 2024

Hmm if we are to follow TypeScript's naming conventions, what about these?

getRecordCodec(); // Represents a Record<string, any>.
getLiteralEnumCodec(); // Represents a literal enum expression.
getDiscriminatedUnionCodec(); // Represents a union that is discriminated by an expression.

@lorisleiva lorisleiva self-assigned this Mar 22, 2024
lorisleiva added a commit that referenced this issue Mar 27, 2024
…ionCodec (#2382)

This PR renames the concept of a "Data Enum" to what it actually is: a "Discriminated Union".

See #2296.
lorisleiva added a commit that referenced this issue Mar 27, 2024
)

Now that "Data Enum" is called "Discriminated Union", "Scalar Enum" can also be appropriately renamed as "Enum".

See #2296.
lorisleiva added a commit that referenced this issue Apr 2, 2024
…ta-structures (#2394)

This PR adds a new `getLiteralUnionCodec` that functions similarly to `getEnumCodec` but uses TypeScript unions to describe all possible values.

```ts
const codec = getLiteralUnionCodec(['left', 'right', 'up', 'down']);
// ^? FixedSizeCodec<"left" | "right" | "up" | "down">

const bytes = codec.encode('left'); // 0x00
const value = codec.decode(bytes); // 'left'
```

Fixes #2296
@lorisleiva
Copy link
Collaborator

In the end, I went for:

getEnumCodec();
getLiteralEnumCodec();
getDiscriminatedUnionCodec();

As you suggested initially.

Copy link
Contributor

Because there has been no activity on this issue for 7 days since it was closed, it has been automatically locked. Please open a new issue if it requires a follow up.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
2 participants