Introduce untagged variants. #6103

Merged
merged 31 commits into from
Apr 8, 2023

cristianoc
Collaborator

@cristianoc cristianoc commented Mar 30, 2023

Introduce untagged variants (see forum discussion).

In this guide, we'll explain how to work with untagged variants in ReScript using the @unboxed attribute. This new feature allows you to eliminate the need for tags when working with different types of data, providing better JavaScript interop. We'll go through several examples to help you understand how to use this feature.

  1. Defining untagged variants:
    To define an untagged variant, use the @unboxed attribute followed by the type definition. For example:
@unboxed
type t = A | I(int) | S(string)
  2. Pattern matching with untagged variants:
    You can use pattern matching with untagged variants just like with ordinary variants. Here's an example:
let classify = x =>
  switch x {
  | I(_) => "An integer"
  | S(s) => "A string" ++ s
  | A => "A"
  }
  3. Working with nested types and recursive types:
    You can use untagged variants with nested types and recursive types, as shown in the following examples:
module ListWithTuples = {
  @unboxed
  type rec t<'a> = | @as(undefined) Empty | Cons(('a, t<'a>))
}

module ListWithObjects = {
  @unboxed
  type rec t<'a> = | @as(null) Empty | Cons({hd: 'a, tl: t<'a>})
}
  4. Overlapping cases:
    In some cases, you might have overlapping cases like a string or a number being used as both a literal and a payload. You can handle these cases using untagged variants as well:
module OverlapString = {
  @unboxed
  type enum = One | Two | Three | FutureAddedValue(string)
}

module OverlapNumber = {
  @unboxed
  type enum = | @as(1.0) One | Two | Three | FutureAddedValue(float)
}

module OverlapObject = {
  @unboxed
  type enum = | @as(null) One | Two | Three | Object({x: int})
}

These examples should help you understand how to work with untagged variants in ReScript using the @unboxed attribute. The key takeaway is that this new feature provides a simpler and more expressive way to handle different types of data without using tags, while still maintaining the necessary level of type safety and pattern matching capabilities.
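
To make the runtime picture concrete, here is a hand-written JavaScript sketch of how values of the first example type might look at runtime and how `classify` could be written with plain checks. It is an illustration, not the compiler's actual output. It assumes that the payload-less case `A` is stored as the string "A", and that `I` and `S` payloads are stored as the bare number and the bare string.

```javascript
// Hand-written sketch, not compiler output.
// Assumed representation of `type t = A | I(int) | S(string)`:
//   A      -> "A"   (assumption: a payload-less case is stored as its name)
//   I(42)  -> 42
//   S("x") -> "x"
function classify(x) {
  // Block check for the int payload.
  if (typeof x === "number") {
    return "An integer";
  }
  // Literal check for the payload-less case.
  if (x === "A") {
    return "A";
  }
  // Remaining case: the string payload.
  return "A string" + x;
}

console.log(classify(42));
console.log(classify("A"));
console.log(classify("hello"));
```

Under these assumptions no wrapper object or tag field is needed. The cost is the overlap described in item 4: a string payload equal to "A" cannot be told apart from the literal case.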


Semi-formal description

@unboxed
type t = literals | blocks
  • Literals are: true, 0, 3.14, "abc", A (a variant case with no payload), null, undefined.

  • Block kinds are: int, float, string, object, array, unknown. Here object covers cases with multiple payloads, as in X(int, string), and inline records, as in X({x: int, y: string}); array is a single payload of array type; unknown is any other single-payload case whose type is not one of the three base types (int, float, string).

Restriction on blocks:

  • If there is an unknown, then it must be the only case in the blocks.
  • At most one each of object or array.

Runtime checks that need expressing:

  • isSomeLiteralCase checks if the value is one of the literals and not one of the blocks.
  • isLiteralCase matches individual literals using ===.
  • isBlockCase matches individual blocks using typeof.
    The unknown case always ends up in the final else ..., so it does not need any specific runtime type checks.

Some details by example:

  • isSomeLiteralCase(true | 0 | A | C({x:int})) checks if the value is one of the literals (true, 0, A) and not one of the blocks (C({x:int})). Expressed as typeof x !== "object".

  • isBlockCase(int) is typeof x === "number", and isBlockCase(C({x:int, y:string})) is typeof x === "object".
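
The checks above can be sketched as ordinary JavaScript predicates. The helper names mirror the ones used in this description, but the functions themselves are hypothetical: they are written out here only to show the shape of the tests for the case set true | 0 | A | C({x:int}), and they assume that `A` is stored as the string "A".

```javascript
// Hypothetical helpers for the case set: true | 0 | A | C({x: int}).
// Assumption: the payload-less case A is stored as the string "A".

// True when the value is one of the literals and not the object block.
const isSomeLiteralCase = (x) => typeof x !== "object";

// Individual literals are matched with strict equality.
const isLiteralCase = (x, literal) => x === literal;

// Individual blocks are matched with typeof.
const isObjectBlock = (x) => typeof x === "object";

function describe(x) {
  if (isSomeLiteralCase(x)) {
    if (isLiteralCase(x, true)) return "true";
    if (isLiteralCase(x, 0)) return "0";
    return "A";
  }
  // Only the object block is left, so no further check is required.
  return "C with x = " + x.x;
}

console.log(describe(true));
console.log(describe(0));
console.log(describe("A"));
console.log(describe({ x: 3 }));
```

The last branch plays the role of the final else: once the literals are excluded, the one remaining case needs no test of its own.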

@cristianoc cristianoc changed the title from "Towards prototyping untagged variants." to "Introduce untagged variants." Apr 3, 2023
@cristianoc cristianoc requested review from cknitt and zth April 3, 2023 10:50
@cristianoc
Collaborator Author

cristianoc commented Apr 7, 2023

Unmarshalling Binary Data and Computing the Sum of an OCaml List

In this post, we'll explore how to read binary marshalled data produced by a native OCaml program and perform high-level list operations on it. Here's a step-by-step explanation of the code discussed in the conversation above:

First, we have a native OCaml program that creates a list of integers, marshals it to a string, and saves that string into aaa.marshal:

let foo v =
  let s = Marshal.to_string v [Compat_32] in
  let ch = open_out_bin "aaa.marshal" in
  let () = output_string ch s in
  close_out ch

foo [1;2;3;4;5]

Next, we read the marshalled file, unmarshal it, and pass it to the sum function. The sum function operates at a high level on lists, but its definition uses the runtime representation that corresponds to OCaml's runtime representation for lists:

let s = caml_read_file_content("./aaa.marshal")

@unboxed
type rec myList<'a> = | @as(0) Empty | Cons((unknown, int, myList<'a>))

let v: myList<int> = unmarshal(s)

let rec sum = l =>
  switch l {
  | Empty => 0
  | Cons((_, i, l)) => i + sum(l)
  }

Js.log2("v", v)
Js.log2("sum:", sum(v))

To see what the runtime representation looks like, the first log gives:

v [ 0, 1, [ 0, 2, [ 0, 3, [Array] ] ] ]

v is a nested array where the first element is always 0, the second element is the integer, and the third element is the rest of the list.

The sum function walks the runtime representation directly and gives the correct sum 15.

This approach demonstrates a neat technique for working with OCaml's runtime representation to manipulate lists while maintaining high-level abstractions.
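
For readers who want to see the traversal without the ReScript types, here is a minimal JavaScript sketch of the same walk over the nested-array shape shown in the log. The value `v` below is written by hand to match that shape rather than produced by unmarshalling, and it assumes the empty list is the number 0, as the `@as(0)` annotation on `Empty` suggests.

```javascript
// Hand-written value in the logged shape: [tag, head, tail].
// Assumption: the empty list is the literal 0.
const v = [0, 1, [0, 2, [0, 3, [0, 4, [0, 5, 0]]]]];

function sum(l) {
  // Empty is the literal 0; anything else is a Cons block.
  if (l === 0) {
    return 0;
  }
  // Skip the first slot, then read the integer and the rest of the list.
  const [, head, tail] = l;
  return head + sum(tail);
}

console.log("sum:", sum(v));
```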

@cometkim
Contributor

Can we use Array.isArray(x) instead of x instanceof Array? It is 2 bytes shorter and more accurate

@cristianoc
Collaborator Author

> Can we use Array.isArray(x) instead of x instanceof Array? It is 2 bytes shorter and more accurate

Perhaps, but we need some measurements first. Perf etc.

@cometkim
Contributor

cometkim commented Apr 10, 2023

Array.isArray is actually faster.

just tested on Node.js v18.14.0 (V8 v10.2)

x instanceof Array x   155,610,377 ops/sec ±0.29% (101 runs sampled)
Array.isArray(x)   x 1,032,780,516 ops/sec ±0.16% (101 runs sampled)
test.mjs
import b from 'benchmark';
const { Benchmark } = b;

const suite = new Benchmark.Suite();

const x = [1, 'a', null];

suite.add('x instanceof Array', () => {
  void (x instanceof Array);
});

suite.add('Array.isArray(x)', () => {
  void (Array.isArray(x));
});

suite.on('cycle', (event) => {
  console.log(event.target.toString());
});

suite.run();

@cometkim
Contributor

cometkim commented Apr 10, 2023

Tested using another framework on Node.js and Bun

❯ node test2.mjs
┌─────────┬──────────────────────┬─────────────────────┬────────────────────────┐
│ (index) │      Task Name       │  Average Time (ps)  │     Variance (ps)      │
├─────────┼──────────────────────┼─────────────────────┼────────────────────────┤
│    0    │ 'x instanceof Array' │ 0.05411331629941917 │ 0.00014958788020771055 │
│    1    │  'Array.isArray(x)'  │ 0.05541524928361437 │ 0.0002851362270896031  │
└─────────┴──────────────────────┴─────────────────────┴────────────────────────┘
❯ bun test2.mjs
[
  {
    "Task Name": "x instanceof Array",
    "Average Time (ps)": 0.06340699361350076,
    "Variance (ps)": 0.0005180995121908796
  }, {
    "Task Name": "Array.isArray(x)",
    "Average Time (ps)": 0.059773168855239485,
    "Variance (ps)": 0.00020890229653848768
  }
]
test2.mjs
import { Bench } from 'tinybench';

const bench = new Bench();

const x = [1, 'a', null];

bench.add('x instanceof Array', () => {
  void (x instanceof Array);
});

bench.add('Array.isArray(x)', () => {
  void (Array.isArray(x));
});

await bench.warmup();
await bench.run();

console.table(
  bench.tasks.map(({ name, result }) => ({
    'Task Name': name,
    'Average Time (ps)': result?.mean * 1000,
    'Variance (ps)': result?.variance * 1000,
  })),
);

@cristianoc
Collaborator Author

> Tested using another framework on Node.js and Bun

Thanks!
Here's a PR: #6121
I guess there are no compatibility issues.
Anything else to check?

@cometkim
Contributor

Good! I think it's OK

cristianoc added a commit that referenced this pull request Apr 10, 2023
@cristianoc
Collaborator Author

Merged.

@TheSpyder
Contributor

TheSpyder commented Apr 10, 2023

The other reason to not use instanceof in code like this - ever - is handling values created in another window. We run into this all the time with TinyMCE where the default editor uses an iframe to host document content, and that has led to some interesting changes in rescript-webapi.

There are also edge cases like new String('foo'), although I don't know if ReScript will run into that one. But given that the goal is to support TS bindings, where weird values may come up, I'd like to offer our battle-tested "type of" code that has evolved to deal with every edge case JS can throw at us:
https://github.com/tinymce/tinymce/blob/develop/modules/katamari/src/main/ts/ephox/katamari/api/Type.ts

What may be particularly useful in future is the generic is function, which handles constructors like Date and RegExp (these came up on the forum) in a cross-window-compatible way. Here are our tests that hopefully help to explain our use cases:
https://github.com/tinymce/tinymce/blob/develop/modules/katamari/src/test/ts/atomic/api/type/TypeTest.ts

I will admit that some of this relies on modern browsers, so if IE support is still important for some reason there is a less capable but IE-compatible version from our previous release.
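
The cross-window problem can be reproduced outside a browser. The sketch below is an illustration only: it uses Node's built-in `vm` module to create an array in a separate context, which stands in for an array coming from another window or iframe, and it also shows the `new String('foo')` edge case.

```javascript
const vm = require("node:vm");

// An array created in a separate context has its own Array constructor,
// much like an array that comes from another window or iframe.
const foreignArray = vm.runInNewContext("[1, 2, 3]");

// instanceof compares against this context's Array, so the check fails.
const viaInstanceof = foreignArray instanceof Array;

// Array.isArray does not depend on which context created the array.
const viaIsArray = Array.isArray(foreignArray);

// A boxed string is an object, so a plain typeof check misses it.
const boxedTypeof = typeof new String("foo");

console.log({ viaInstanceof, viaIsArray, boxedTypeof });
```

In this setup `instanceof` reports false for a genuine array while `Array.isArray` reports true.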

@cristianoc
Collaborator Author

That's great, thanks.
Curious: array is the only one where a deeper check is offered. Any reason for that?
Asking as in future we might want to check object shapes (possibly).

@TheSpyder
Contributor

That’s isArrayOf - checking if every element in an array is the same type is fairly easy.

We don’t have a generic way to check object shapes; for everything TypeScript we trust the type system. It does come up in two places, though:

  • editor configuration, where so few nested objects are used we just built specific validation for them
  • real-time collaboration, where we used a combination of protobuf bindings and atdgen.

We do have object equality code but I don’t think that’s really what you’re looking for 🤔
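
As a rough illustration of why the element-wise array check is easy, here is a minimal sketch of such a helper. It is hypothetical: it borrows the name isArrayOf from the comment above, but the signature and the `isString` predicate are assumptions made for this example, not a copy of the TinyMCE code.

```javascript
// Hypothetical helper: true when `value` is an array and every element
// satisfies the supplied predicate.
const isArrayOf = (value, pred) => Array.isArray(value) && value.every(pred);

// Example predicate (assumed for illustration).
const isString = (x) => typeof x === "string";

console.log(isArrayOf(["a", "b"], isString));
console.log(isArrayOf(["a", 1], isString));
console.log(isArrayOf("ab", isString));
```

A generic object-shape check has no equally small counterpart, because it would need a description of the expected fields and their types.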

@cristianoc
Collaborator Author

> That’s isArrayOf - checking if every element in an array is the same type is fairly easy.
>
> We don’t have a generic way to check object shapes, for everything TypeScript we trust the type system, but it does come up in two places:
>
>   • editor configuration, where so few nested objects are used we just built specific validation for them
>   • real-time collaboration, where we used a combination of protobuf bindings and atdgen.
>
> We do have object equality code but I don’t think that’s really what you’re looking for 🤔

My interest is in the use cases you found. And the reply answers it perfectly.
