
Implement ocapn ByteArray as a passStyleOf recognized immutable byte array. #1331

Open
kriskowal opened this issue Oct 18, 2022 · 31 comments · May be fixed by #1538 or #2311
Labels
enhancement New feature or request kriskowal-review-2024-01 Issues that kriskowal wants to bring to the attention of the team for review as of January, 2024

Comments

@kriskowal
Member

Our current best solution for carrying bytes is a Uint8Array, which we use pervasively for hardened streams. This is one of the reasons that we treat Typed Arrays like Map for the purposes of hardening. The content is mutable but the API surface is tamper-proof. However, the content is readable and writable by both the sender and receiver over time, so like a Map, TypedArrays are not passable. That makes them a poor foundation for remote streams in an ocap discipline.

I propose that we need an analogous solution to the CopyMap and MapStore for transmission of immutable byte strings, tentatively CopyBytes and BytesStore, which would be passable and provide a firmer foundation for streams (async iterators of byte strings). CopyBytes should mesh well with TypedArrays, particularly Uint8Array and ArrayBuffer. To that end, it may be necessary for them to be Proxies, such that uint8Array.set(copyBytes) is sensible. Constructing an immutable copy of the content of an array-like should be possible, like copyBytes(uint8Array) or makeCopyBytes([0, 1, 2, …, 127]). CopyBytes would be backed by a TypedArray but would attenuate mutability. Slicing CopyBytes should be trivial. As with glyph strings, ropes are a possible optimization in the fullness of time.
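A rough sketch of the shape this could take, for concreteness. All names here (makeCopyBytes, toUint8Array) are tentative illustrations of the proposal, not an existing endo API:

```javascript
// Tentative sketch only: makeCopyBytes, slice, and toUint8Array are
// illustrative names, not an existing API.
const makeCopyBytes = arrayLike => {
  const bytes = Uint8Array.from(arrayLike); // private, never-exposed snapshot
  return Object.freeze({
    byteLength: bytes.length,
    // Slicing yields another immutable handle over the requested subrange.
    slice: (start, end) => makeCopyBytes(bytes.subarray(start, end)),
    // Copying out yields a fresh, independently mutable Uint8Array.
    toUint8Array: () => Uint8Array.from(bytes),
  });
};
```

With this shape, uint8Array.set(copyBytes.toUint8Array()) works without proxies, at the cost of a copy on the way out; the proxy idea above would avoid that copy.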

As a critical optimization, we often use ring buffers for byte streams. We could conceivably also support another type for revocable copy bytes, that would permit a stream to reclaim the underlying byte range when the consumer signals it’s ready for another subrange.

@kriskowal kriskowal added the enhancement New feature or request label Oct 18, 2022
@kriskowal
Member Author

attn @erights @gibson042 @mhofman

@mhofman
Contributor

mhofman commented Oct 18, 2022

I'm extremely concerned by the performance overhead of such a user-land abstraction. We should pursue a standards-track approach that allows engines to avoid array buffer copies and imposes minimal overhead on read access.

One such proposal at stage 1 is @Jack-Works's Limited ArrayBuffer

@kriskowal
Member Author

kriskowal commented Oct 18, 2022

Shimming a standards track solution would be good.

Ah, memories. (My abandoned hobbies include a decade old fantasy of JavaScript having byte strings.)

@erights
Contributor

erights commented Oct 18, 2022

Attn @phoddie

@phoddie

phoddie commented Oct 18, 2022

If you'd like an efficient read-only ArrayBuffer, that code is already in xst as part of our Secure ECMAScript / Hardened JavaScript work. This fragment...

const bytes = Uint8Array.of(1, 2, 3);
petrify(bytes.buffer);
bytes[2] = 1;

...throws TypeError: ?: read-only buffer.

Yes, a fully standard solution would be welcome.

@kriskowal
Member Author

A petrified buffer would certainly be eligible for passing over a CapTP.

Shimming petrify would, I think, require us to replace the TypedArray constructors and present all typed arrays as proxies, such that we could emulate petrification of a TypedArray with an ArrayBuffer shared by other TypedArrays.

@phoddie

phoddie commented Oct 18, 2022

This may be what you mean, but... I wouldn't suggest shimming petrify as it is extremely broad & general. Instead, have a focused way to make an ArrayBuffer read-only. The implementation of that, when running on XS, would petrify instead of using Proxy and the like, and so avoid oodles of runtime overhead.

@erights
Contributor

erights commented Oct 18, 2022

To shim making an ArrayBuffer itself read-only, we'd need a way to actually make an ArrayBuffer read-only in place. The only thing we know how to shim would be something that returns what seems to be a new read-only ArrayBuffer.

Same for a TypedArray.
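That shimmable shape might look like this minimal sketch (readOnlyArrayBuffer is a hypothetical name; a real shim would have to cover much more of the ArrayBuffer surface):

```javascript
// Hypothetical sketch: freeze-in-place is impossible, so the shim returns a
// new object that merely behaves like a read-only ArrayBuffer.
const readOnlyArrayBuffer = buffer => {
  const snapshot = buffer.slice(0); // private copy; the original stays mutable
  return Object.freeze({
    byteLength: snapshot.byteLength,
    // Query-only access: each call hands out a fresh mutable copy.
    slice: (start, end) => snapshot.slice(start, end),
  });
};
```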

@kriskowal
Member Author

kriskowal commented Oct 18, 2022

Aye, I lay it out like this because oodles of runtime overhead might not be acceptable, even in an intermediate step toward a standard. But if petrify had standards ambitions, we could shim the intersection of petrify and TypedArrays, throwing for anything we can’t petrify. Or we could anticipate and shim a world where TypedArray can be frozen. Either of these strategies would require us to replace the TypedArray and ArrayBuffer constructors such that they return proxies and deny all programs thereafter direct access to TypedArray and ArrayBuffer instances, such that freezing one façade of an ArrayBuffer would freeze all the façades of the same ArrayBuffer.

The web platform’s solution to this problem is opaque Blob objects. If you want to see the contents, you have to copy them into a TypedArray. We could certainly harden and make blobs passable too without doing anything to TypedArrays.

@phoddie

phoddie commented Oct 18, 2022

To shim making an ArrayBuffer itself read-only, we'd need a way to actually make an ArrayBuffer read-only in place. The only thing we know how to shim would be something that returns what seems to be a new read-only ArrayBuffer.

Y'all are the experts in patching things into a web-ish runtime. My goal is to avoid an ugly optimization problem around Proxy in XS when it already supports read-only buffers efficiently. If it is a problem that petrify does too much here, the functionality can easily be repackaged in a call that is focused on the needs of this particular problem.

Maybe side-stepping the problem with Blob is pragmatic, though there's more data copying. Perhaps implementing Blob with a petrified ArrayBuffer would allow Blob.prototype.arrayBuffer() to be almost free?

@erights
Contributor

erights commented Oct 18, 2022

replace the TypedArray and ArrayBuffer constructors

Doh! I'm embarrassed that I didn't think of this. Thanks!

Ok, I just read about Blobs at https://developer.mozilla.org/en-US/docs/Web/API/Blob . Aside from their non-JS stream() method, which we can omit from our whitelist, they seem like perfectly fine forms of CopyBytes. When we do add CopyBytes to our system, we could call it CopyBlob instead, and have passStyleOf certify that a Blob is a CopyBytes when it is hardened and has no surprising properties.

This would take the pressure off of needing to shim making an ArrayBuffer or TypedArray read-only in place.

@erights
Contributor

erights commented Oct 18, 2022

Maybe side-stepping the problem with Blob is pragmatic, though there's more data copying. Perhaps implementing Blob with a petrified ArrayBuffer would allow Blob.prototype.arrayBuffer() to be almost free?

I like that!

@kriskowal
Member Author

Maybe side-stepping the problem with Blob is pragmatic, though there's more data copying. Perhaps implementing Blob with a petrified ArrayBuffer would allow Blob.prototype.arrayBuffer() to be almost free?

I think the expectation of Blob.prototype.arrayBuffer() is that it would produce a mutable array buffer. It could be nearly-free if ArrayBuffers have a copy-on-write optimization.

@phoddie

phoddie commented Oct 18, 2022

Yea, I had the same thought walking back from coffee just now. Alas.

@mhofman
Contributor

mhofman commented Oct 19, 2022

We could add a non-standard option bag to arrayBuffer({readonly: true}), but that's getting kinda verbose for what would likely be the default case in a Hardened JS environment. I honestly doubt much code would break if the default was a readonly buffer, but still.

I actually think copy-on-write array buffers would be a good general solution on top of being a nice optimization. First, it would likely apply to non-Blob use cases. For example, doing an arrayBuffer.slice() could return a copy-on-write buffer (obviously making the original arrayBuffer copy-on-write as well). At that point we could make construction of a Blob similarly copy-free with a copy-on-write guard on the original buffer, aka a Blob could be a user-land concept that simply does the following (simplified to single array buffer input):

class Blob {
  #buffer;

  constructor([inputBuffer]) {
    // With copy-on-write buffers, this guard copy would be nearly free.
    this.#buffer = inputBuffer.slice(0);
  }

  async arrayBuffer() {
    return this.#buffer.slice(0);
  }
}

@mhofman
Contributor

mhofman commented Nov 29, 2022

I mentioned Copy-on-Write in the context of transfer() in the November Plenary. I remain convinced that CoW for ArrayBuffers would solve a lot of performance issues, at the expense, I understand, of implementation complexity.

@erights
Contributor

erights commented Apr 13, 2023

See #1538

@dckc
Collaborator

dckc commented May 24, 2023

Interesting bit of related work: Hex template literal | DeviceScript

example:

const buf: Buffer = hex`00 ab 12 2f 00`

h/t @turadg

@phoddie

phoddie commented May 24, 2023

FWIW - the Moddable SDK has long done something like that for UUIDs to simplify BLE code. This generates an ArrayBuffer with the bytes of the UUID:

const CHARACTERISTIC_UUID = uuid`9CF53570-DDD9-47F3-BA63-09ACEFC60415`;

@dckc
Collaborator

dckc commented Nov 20, 2023

motivation: I was trying to do an isolated signer as an endo app, but I hit:

Error#2: Cannot pass mutable typed arrays like Uint8Array(33) [...]

So I had to put the signer and the network client in the same vat. :-/

hdWallet.js 0130695
dckc/finquick#64

@kriskowal kriskowal added the kriskowal-review-2024-01 Issues that kriskowal wants to bring to the attention of the team for review as of January, 2024 label Jan 8, 2024
@kriskowal
Member Author

My latest batch of thoughts on this problem:

We need a solution that is a good balance between:

  • performance, no unnecessary copying
  • eval twins: supports multiple instances of endo modules in a shared realm
  • something I’m not remembering right now

We might have to take a page from the promise.then distinguished method trick, but we can use a symbol-named method. This would allow eval twins to assimilate CopyBytes and still keep a private WeakMap of known-to-be-unmodifiable ArrayBuffers. Consider this sketch:

const copyBytesSymbol = Symbol.for('copyBytesInto');
// Maps a CopyBytes handle to its private, unmodifiable Uint8Array.
const fixedBytes = new WeakMap();
// Memoizes the CopyBytes handle already made for a given source array.
const knownCopyBytes = new WeakMap();

const copyBytesPrototype = {
  [copyBytesSymbol](target) {
    target.set(fixedBytes.get(this), 0);
  }
};

const makeCopyBytes = array => {
  let copyBytes = knownCopyBytes.get(array);
  if (copyBytes !== undefined) {
    return copyBytes;
  }
  copyBytes = Object.create(copyBytesPrototype);
  fixedBytes.set(copyBytes, Uint8Array.from(array)); // private snapshot
  knownCopyBytes.set(array, copyBytes);
  return harden(copyBytes);
};

const copyBytesInto = (target, source) => {
  const internal = fixedBytes.get(source);
  if (internal !== undefined) {
    // Recognized in this domain: copy the private bytes out directly.
    target.set(internal, 0);
  } else if (source[copyBytesSymbol] !== undefined) {
    // An eval twin's CopyBytes: assimilate via the distinguished method.
    assert.typeof(source[copyBytesSymbol], 'function');
    source[copyBytesSymbol](target);
  }
};

@erights
Contributor

erights commented Jun 3, 2024

What "promise.then distinguished method trick"?

@kriskowal
Member Author

What "promise.then distinguished method trick"?

Using a distinguished method like "then" to recognize an object that an eval-twin can assimilate into its own domain by invocation of that method. One could call it the “Thenable Pattern”.
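For reference, the pattern in miniature: Promise.resolve assimilates any object with a then method, regardless of which realm or eval twin constructed it.

```javascript
// An object from a "foreign" domain: not a genuine local Promise, but it
// carries the distinguished method name, then.
const foreignThenable = {
  then(resolve) {
    resolve(42);
  },
};

// The local Promise machinery recognizes it by the method name alone and
// assimilates it by invoking then; no instanceof agreement is needed.
Promise.resolve(foreignThenable).then(value => {
  console.log(value); // 42
});
```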

@erights
Contributor

erights commented Jun 3, 2024

The eval twin problem explained at #1583 seems fatal to having passStyleOf certify as passable any object with non-builtin methods. That's why all our passable types are operated on by functions rather than having their own methods. (Attn @mhofman you know why ;) )

@kriskowal
Member Author

So Blob is an eligible representation because all its methods are builtins on Node.js and the web platform, and we can ask Moddable to implement Blob on XS?

// uncurryThis(fn) is the usual (receiver, ...args) => Reflect.apply(fn, receiver, args)
const blobArrayBuffer = uncurryThis(Blob.prototype.arrayBuffer);
const arrayBuffer = await blobArrayBuffer(new Blob(['hello, world']));

@phoddie

phoddie commented Jun 4, 2024

We've avoided implementing Blob for a very long time. I wouldn't be sad for that to extend indefinitely. At least you aren't asking about Buffer.

That said, it should mostly be possible, with the exception of stream(). The API surface area isn't that big. We'd need to be thorough in testing to ensure the API is being faithfully emulated.

@erights
Contributor

erights commented Jun 5, 2024

Ok, I tried learning about Blob by writing a minimal toy shim and adding it to #1538 at src/ses/blob-shim.js. Compare with the pass-style/src/byteArray.js in the same PR. Both encapsulate an actual ArrayBuffer to hold their data. I don’t love either. But some problems I see with Blob:

  • There is no way to get the contents out synchronously. The only ways to get the contents out are arrayBuffer() and text(), both of which return a promise for the contents in the requested form.
  • whitelisting Blob as part of the, so to speak, language, to avoid the eval-twin problem would bring in, via the stream() method, ReadableStream. To avoid whitelisting ReadableStream we’d omit stream(), making this different than the “standard” Blob.
  • Blobs have a mime type string. ocapn ByteArrays do not. Clearly not a fatal problem, but whatever we do here will surprise someone.

The ByteArray class at pass-style/src/byteArray.js is effectively an immutable subclass of ArrayBuffer. The ByteArray.slice instance method returns an actual ArrayBuffer, thereby getting the contents out synchronously. Because it doesn’t add semantics to ArrayBuffer, it has no distracting mime type.

The problem holding ByteArray back is the eval twin problem. If we’re willing to consider whitelisting Blob as a ses builtin, then we should discuss whether we could whitelist ByteArray (under some name) similarly, avoiding the same eval twin problem. If we do, we should also consider proposing such an immutable variant of ArrayBuffer to tc39. We cannot propose Blob to tc39 because of stream(). An immutable variant of ArrayBuffer has no such problem.
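A standalone caricature of that shape, showing only the synchronous copy-out (this sketch does not inherit from ArrayBuffer.prototype as the real pass-style/src/byteArray.js does):

```javascript
// Sketch only: an immutable ArrayBuffer-like holder whose slice method gets
// the contents out synchronously as a real, fresh ArrayBuffer.
class ByteArraySketch {
  #buffer;

  constructor(source) {
    this.#buffer = source.slice(0); // snapshot; later source mutations don't leak in
  }

  get byteLength() {
    return this.#buffer.byteLength;
  }

  slice(start = 0, end = this.#buffer.byteLength) {
    return this.#buffer.slice(start, end);
  }
}
```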

@kriskowal
Member Author

The eval twin problem explained at #1583 seems fatal to having passStyleOf certify as passable any object with non-builtin methods. That's why all our passable types are operated on by functions rather than having their own methods. (Attn @mhofman you know why ;) )

To be clear, I’m not proposing that passStyleOf would certify an object as being pass-style=CopyBytes, with methods that are safe to use, but suggesting that we (maybe) could use a distinguished method to convert it into a handle to the immutable content it stored at the time it was first recognized by passStyleOf, thereafter treated as an opaque handle to that data.

To that end, maybe we could even adopt a TypedArray, whose contents we could at least capture with built-in methods.

@erights
Contributor

erights commented Jun 5, 2024

For a new data type with a new method, I see only these choices:

  • it cannot be passable, because passStyleOf cannot certify it due to the eval twin problem. This prevents us from extending marshal to unserialize an ocapn ByteArray to anything in place, which is a non-starter.
  • we adopt the new data type as an honorable builtin at the ses level, adding it to the whitelisted shared globals
  • we ensure the absence of eval twins. I think we need to do this long term. But after talking to @kriskowal about it, I'm convinced the term is too long to block ocap conformance on.

This leaves us with the adoption option. I don't see any other choice.

I don't see a way to do an ocapn ByteArray as a Passable classified as such by passStyleOf without introducing a new data type with a new method, since none of the existing language builtins for binary data can be made immutable in place. The ByteArray at pass-style/src/byteArray.js inherits from ArrayBuffer.prototype and tries to act like an immutable variant/subclass of ArrayBuffer, by overriding only the query-only methods to work. We could do likewise for an immutable variant/subclass of Uint8Array. But in both cases, there's no way to make the methods inherited from the shared super prototype work, unless we do a ses-initialization-time repair to replace the query-only methods with ones that detect whether they're applied to one of these immutable variants.

The problem with starting with Uint8Array instead is indexed property access. The only way to support this on an immutable Uint8Array variant is to make it a proxy, which I do not want to do just to get to ocapn conformance.
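The indexed-access constraint can be seen concretely: an ordinary object cannot intercept assignments like view[i] = x; only a Proxy's traps can, and that trap machinery is exactly the overhead at issue (makeImmutableView is a hypothetical illustration):

```javascript
// Hypothetical illustration: an immutable Uint8Array facade has to be a
// Proxy, because only a Proxy's set trap can veto indexed assignment.
const makeImmutableView = typedArray =>
  new Proxy(typedArray, {
    set: () => false,            // veto view[i] = x
    defineProperty: () => false, // veto defineProperty tampering
  });
```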

So, I'd like to start by proposing the ByteArray at pass-style/src/byteArray.js as a candidate to move into ses as an honorary new whitelisted shared-global class.

@erights erights changed the title CopyBytes, BytesStore implement ocapn ByteArray as a passStyleOf recognized immutable byte array. Jun 5, 2024
@erights erights changed the title implement ocapn ByteArray as a passStyleOf recognized immutable byte array. Implement ocapn ByteArray as a passStyleOf recognized immutable byte array. Jun 5, 2024
@erights
Contributor

erights commented Jun 5, 2024

I changed the title for clarity. But if it is too narrow for what this issue is intended for, change it back and we can start a separate narrower issue.

@mhofman
Contributor

mhofman commented Jun 6, 2024

I don't think we should adopt the Blob API wholesale. As stated it has major drawbacks like being asynchronous only, entraining the complex Web Streams API, and carrying mime type meta information.

The ByteArray defined at pass-style/src/byteArray.js is roughly what I had in mind for a Blob-like object that internally holds the binary data, with a method to get a copy of said binary data as a normal ArrayBuffer (or maybe Uint8Array). Where I think that ByteArray is mistaken is in pretending it is an ArrayBuffer by inheriting from its prototype. It should just wrap the binary data, provide the slice method and byteLength property, and that's it.

My understanding of the problem in #1583 is that we can't trust an object with methods to behave as inert copy data unless we can recognize it, and that we can't recognize objects because of eval twins. I would however say we're already in a world where we need to recognize objects: for plain objects and arrays, unless we have a predicate to test for exotic proxy behavior, we shouldn't trust these either to be copy passable.

Maybe we can take the approach of promises and add a static method (or even just rely on the constructor, if we drop validation) to ByteArray that adopts another ByteArray? And if the argument is already a recognized ByteArray it returns it?

const bytes = Uint8Array.from([1,2,3,4,5,6,7,8]);
const ba1 = ByteArray(bytes);
const ba1Bytes = ba1.slice();
assert(ba1Bytes !== bytes); // However the contents are the same
const ba2 = ByteArray(ba1);
assert(ba2 === ba1); 

That of course assumes like promises that there is a "genuine" / intrinsic ByteArray, but I think it's fine for passStyle installed as a sort of trusted shim to define this. We already need to switch to a more centralized passStyleOf anyway because of the problem of proxy recognition. But I don't think we automatically need to move ByteArray into the ses shim layer, passStyleOf can still lookup a global ByteArray and use it to validate objects that claim to be that type.
