Hide internal properties from Ecmascript code #979

svaarala · 2016-09-28T23:25:07Z

Draft of an approach where internal properties are hidden from user Ecmascript code even when the correct (internal) key is used. Internal properties can only be accessed using the C API, which should fulfill sandboxing requirements for protecting the internal properties securely.

Related potential change: remove the global and thread stashes, because internal properties on the global object and the thread object can be used instead (and will be inaccessible from Ecmascript code).

svaarala · 2016-09-29T01:42:23Z

This prototype branch is now rebased, but it's probably easiest to figure out the ES6 Symbol approach (#980) and merge basic Symbol support first.

fatcerberus · 2016-09-29T01:47:27Z

+1 for Symbol support. :)

fatcerberus · 2016-09-29T03:59:07Z

Following up on #980 (comment), it might make sense, if this is not already the case, to add a compiler option which would disable String.fromBuffer() and similar sandboxing-unfriendly functions. Of course then we get into the can of worms of what is a potential sandboxing issue and what isn't...

That said, Duktape implements Node.js Buffer, which has a native "buffer to string" primitive (Buffer#toString()) as part of its API contract. So maybe this wouldn't solve the issue after all...

svaarala · 2016-09-29T04:08:10Z

Buffer.toString() is actually not an issue if it handles decoding correctly; the result would always be a standard string which would never be confused with an internal one. However, the current Buffer.toString() is 1:1 from the buffer to the internal string representation which is an issue (and further exposes the internal string representation to user code).

Adding some kind of "sandboxing safe" option would be possible but it's maybe better implemented as a configure.py profile because there may be multiple features that need to be controlled. But even when doing so, every C binding written by the user potentially allows a 1:1 buffer-to-string conversion by accident, so this pull - or some other solution - is still necessary for better sandboxing.

One possible alternative solution would be an actual symbol type, and make the internal properties symbols that cannot be created via buffers. That would definitely be a more clean solution conceptually, but I'm not sure if footprint agrees ;)

fatcerberus · 2016-09-29T04:10:12Z

I do wonder how much of a footprint issue a new tagged type would actually be, considering all the special casing that now needs to be added to string handling... ;-)

svaarala · 2016-09-29T04:11:20Z

There isn't much in the way of special casing: handling of internal strings is already in place. Symbols deviate from that only in a few locations.

svaarala · 2016-09-29T04:13:03Z

Also just a tagged type wouldn't actually be enough: object property table keys are a list of untagged duk_hstring pointers. That space would need to be expanded one way or another, either using a bit in the 8-bit per-property flags field or expanding the key to include a flag somehow (awkward). There are most likely other places where a similar issue exists.

svaarala · 2016-09-29T04:18:10Z

My rough guess would be that a separate tagged type would be around 4-5kB larger: it would affect internals here and there for probably around 1-2kB, and it would require a new type tag in the API with all the associated API calls. That's of course just an informed guess :) For the stripped build it would mean roughly a 4% increase of footprint which is not huge but still quite large considering the RegExp engine is less than 10kB total.

It really is an honest design trade-off with several viable choices. I tend to favor low footprint choices because footprint caused by code structures is very difficult to rein in.

fatcerberus · 2016-09-29T04:21:20Z

Hey, as long as Ecmascript compliance is honored, I'm happy with whatever implementation you decide is best. :)

svaarala · 2016-09-29T12:46:35Z

One middle-of-the-way implementation approach I've toyed around with:

Allow two duk_hstrings with the exact same byte sequence but different "internal" flag (now "symbol") to exist. String interning would treat them differently based on the flag.
Because property code ultimately looks up a property using the duk_hstring * reference, this would allow a plain \xFFfoo to have a different property slot from \xFFfoo with a symbol flag set. In essence, they'd be separate keys despite having the same byte representation.
A buffer-to-string operation would only create plain strings. Similarly for duk_push_string() etc. There'd be a separate duk_push_symbol() to create duk_hstrings with a certain byte sequence and the internal symbol flag set.

The upside of this is that a new tagged or API type wouldn't be needed, but symbols and strings would still be entirely separate and you couldn't create symbols even via custom buffer operations. Footprint-wise this would work very well.
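To make the interning idea concrete, here is a minimal sketch of a string table where the flag is part of the key identity. Everything in it is hypothetical: `toy_hstring`, `toy_intern()`, `toy_make_string()` and `toy_make_symbol()` are illustrative stand-ins (the last two loosely mirror `duk_push_string()` and the proposed `duk_push_symbol()`), not Duktape internals, and a linked list is used instead of a real hash table for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for duk_hstring: bytes plus a "symbol" flag. */
typedef struct toy_hstring {
    struct toy_hstring *next;  /* intern table chain */
    size_t len;                /* byte length, excluding the NUL terminator */
    int is_symbol;             /* part of the interning identity */
    unsigned char data[];      /* len bytes plus a NUL terminator */
} toy_hstring;

static toy_hstring *toy_strtable = NULL;

/* Return the unique entry for (bytes, len, is_symbol), or NULL on
 * allocation failure. Two entries with identical bytes but different
 * flags are distinct objects, so they compare unequal by pointer. */
static toy_hstring *toy_intern(const void *bytes, size_t len, int is_symbol) {
    toy_hstring *h;

    assert(bytes != NULL);
    for (h = toy_strtable; h != NULL; h = h->next) {
        if (h->len == len && h->is_symbol == is_symbol &&
            memcmp(h->data, bytes, len) == 0) {
            return h;
        }
    }
    h = (toy_hstring *) malloc(sizeof(*h) + len + 1);
    if (h == NULL) {
        return NULL;
    }
    h->len = len;
    h->is_symbol = is_symbol;
    memcpy(h->data, bytes, len);
    h->data[len] = 0;
    h->next = toy_strtable;
    toy_strtable = h;
    return h;
}

/* Plain strings: the only kind a buffer-to-string operation could create. */
static toy_hstring *toy_make_string(const void *bytes, size_t len) {
    return toy_intern(bytes, len, 0);
}

/* Symbols: only reachable through a dedicated call. */
static toy_hstring *toy_make_symbol(const void *bytes, size_t len) {
    return toy_intern(bytes, len, 1);
}
```

Because property lookup compares interned pointers, a plain `\xFFfoo` and a symbol `\xFFfoo` made this way would land in different property slots.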

For C code it'd be a little bit awkward. In general as a C coder I'd prefer strings and symbols to be strings with no internal NULs. This would allow me to pass them around as const char *s, to use C literals for them in #defines etc. In this approach there wouldn't be a new API type for symbols, but separate calls would still be needed for creating symbols. Also if you wanted to use duk_get_prop_string() you couldn't use that with a symbol (the implicitly interned string would be of the "plain" variety) so another binding would be needed for symbols.

This is a downside common to any approach introducing an actual symbol type though. On the other hand it can also be considered an upside because then C code won't accidentally mix property and symbol lookups either. That can be achieved in other ways too, e.g. with this pull, duk_get_prop_string() and friends could be changed so that they never operate on symbol strings (just like Ecmascript code). One would then need a few calls for "raw" property access which would allow internal keys.
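A rough sketch of that split between filtered and "raw" access, under the assumption that internal keys are recognized by a leading `\xFF` byte. The `toy_*` names and the flat property array are hypothetical and only model the behavior; they are not Duktape API calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_INTERNAL_PREFIX 0xFF  /* assumed marker byte for internal keys */

typedef struct {
    const char *key;
    int value;
} toy_prop;

typedef struct {
    const toy_prop *props;
    size_t count;
} toy_object;

static int toy_key_is_internal(const char *key) {
    assert(key != NULL);
    return (unsigned char) key[0] == TOY_INTERNAL_PREFIX;
}

/* "Raw" lookup: accepts any key, including internal ones. */
static const int *toy_get_prop_raw(const toy_object *obj, const char *key) {
    size_t i;

    assert(obj != NULL && key != NULL);
    for (i = 0; i < obj->count; i++) {
        if (strcmp(obj->props[i].key, key) == 0) {
            return &obj->props[i].value;
        }
    }
    return NULL;
}

/* Ordinary lookup: behaves as if internal keys did not exist, which is
 * how user code would see the object. */
static const int *toy_get_prop(const toy_object *obj, const char *key) {
    if (toy_key_is_internal(key)) {
        return NULL;
    }
    return toy_get_prop_raw(obj, key);
}
```

With this shape, the ordinary call returns NULL for an internal key even when the exact byte sequence is supplied, while the raw variant still finds it.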

fatcerberus · 2016-09-29T14:32:12Z

On the other hand, C code may want guaranteed unique symbols, and in that case it would have to get a symbol through the API (the equivalent of calling Symbol("name") in Ecmascript) either way. This would even be a natural thing to do, so that Ecmascript code using Symbol.for() can't accidentally overwrite the host's control information.

svaarala · 2016-09-29T15:39:22Z

Sure it'd be useful to have an API to create a unique symbol. But it could then behave like any other string from the API perspective. Or not, depending on which approach is used.

fatcerberus · 2016-09-29T15:44:40Z

Of course, I was just pointing out that that's a weak argument in favor of "symbols == strings" because the C code may want to create unique symbols through the API in either case. There are still other benefits, of course.

fatcerberus · 2016-09-29T15:45:59Z

By the way, I'm not always opposing you with my arguments, sometimes I just like to play devil's advocate. :)

svaarala · 2016-09-29T15:47:55Z

Yeah, but my main concern with the API is not really just creation of the symbols - but for example:

The ability to use plain C strings as symbols (e.g. #define MY_SYMBOL ("\xa0" "foo")) which isn't possible if they're distinct types,
The duplication of all string helpers, e.g. duk_get_prop_string() needs a duk_get_prop_symbol() variant, same for duk_get_global_string(), etc. Also all pushers, type checkers, etc will spawn a bunch of new calls.

Symbol creation is by far the smallest concern :) And it'd actually be a useful API even now: it would allow hiding the \xFF prefix from user code that didn't want to specifically deal with it.

fatcerberus · 2016-09-29T15:51:03Z

Agreed, those were the "other benefits" I mentioned. :)

svaarala · 2016-09-29T15:51:18Z

Just as a side note, in this example:

#define MY_SYMBOL ("\xa0" "fooSymbol")

one could:

#define MY_SYMBOL DUK_MAKE_SYMBOL("fooSymbol")

Similarly, for existing internal properties:

#define MY_INT_PROP DUK_MAKE_INTPROP("fooBar")  /* -> "\xFF" "fooBar" */

This would hide the concrete prefix from user code, and would also avoid any potential issues with hex escape ambiguity (which makes it necessary to define the string in parts). This wouldn't need to be explained to users.
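As a sketch of how such macros could be defined with adjacent string literal concatenation: the `EX_MAKE_SYMBOL` and `EX_MAKE_INTPROP` names below are local, hypothetical definitions that mirror the proposed `DUK_MAKE_SYMBOL` and `DUK_MAKE_INTPROP`, and the `\xa0` and `\xFF` prefixes are simply the ones used in the examples above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical local macros; the argument must be a string literal.
 * The prefix and the name are separate literals, so the hex escape
 * ends at the closing quote. Written as one literal, "\xa0fooSymbol"
 * would treat the 'f' as a third hex digit of the escape. */
#define EX_MAKE_SYMBOL(name)   ("\xa0" name)
#define EX_MAKE_INTPROP(name)  ("\xFF" name)

#define MY_SYMBOL    EX_MAKE_SYMBOL("fooSymbol")
#define MY_INT_PROP  EX_MAKE_INTPROP("fooBar")

/* Check the first byte of a key against an expected prefix byte. */
static int ex_has_prefix(const char *s, unsigned char prefix) {
    assert(s != NULL);
    return (unsigned char) s[0] == prefix;
}
```

The results are ordinary `const char *` literals with no internal NULs, so they can be passed to any call that takes a C string.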

svaarala · 2016-12-19T19:46:24Z

I'll drop this from 2.0.0 until the symbol typing issues have been resolved.

svaarala added this to the v2.0.0 milestone Sep 28, 2016

svaarala force-pushed the hide-internal-properties branch from 5ee08a1 to e0eed06 Compare September 28, 2016 23:38

svaarala mentioned this pull request Sep 29, 2016

Add documentation for ES6 Symbol implementation #980

Merged

svaarala mentioned this pull request Sep 29, 2016

Add ES6 Symbol support for Duktape 2.x #981

Closed

12 tasks

svaarala mentioned this pull request Sep 30, 2016

Implement WHATWG Encoding API support #975

Merged

7 tasks

svaarala mentioned this pull request Oct 10, 2016

Remove default bindings converting a buffer 1:1 to an internal string representation #1005

Closed

11 tasks

This was referenced Oct 20, 2016

Add internal support for [[Get]]/[[Set]] receiver #1027

Closed

Implement Reflect built-in (ES6+) #1025

Merged

svaarala mentioned this pull request Nov 2, 2016

Remove internal support for codepoints above U+10FFFF #983

Open

8 tasks

svaarala force-pushed the hide-internal-properties branch from e0eed06 to 855c36f Compare November 16, 2016 19:12

svaarala mentioned this pull request Nov 28, 2016

Add initial ES6 symbol support #982

Merged

10 tasks

svaarala removed this from the v2.0.0 milestone Dec 19, 2016
