Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hide internal properties from Ecmascript code #979

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

svaarala
Copy link
Owner

@svaarala svaarala commented Sep 28, 2016

Draft of an approach where internal properties are hidden from user Ecmascript code even when the correct (internal) key is used. Internal properties can only be accessed using the C API, which should fulfill sandboxing requirements for protecting the internal properties securely.

  • Related potential change: remove global and thread stashes, because internal properties on the global object and the thread object can be used instead (and will be inaccessible from Ecmascript code)

@svaarala
Copy link
Owner Author

This prototype branch is now rebased but it's probably easiest to first figure out ES6 Symbol approach #980 and merge in basic Symbol support first.

@fatcerberus
Copy link
Contributor

+1 for Symbol support. :)

@fatcerberus
Copy link
Contributor

fatcerberus commented Sep 29, 2016

Following up on #980 (comment), it might make sense, if this is not already the case, to add a compiler option which would disable String.fromBuffer() and similar sandboxing-unfriendly functions. Of course then we get into the can of worms of what is a potential sandboxing issue and what isn't...

That said, Duktape implements Node.js Buffer, which has a native "buffer to string" primitive (Buffer#toString()) as part of its API contract. So maybe this wouldn't solve the issue after all...

@svaarala
Copy link
Owner Author

Buffer.toString() is actually not an issue if it handles decoding correctly; the result would always be a standard string which would never be confused with an internal one. However, the current Buffer.toString() is 1:1 from the buffer to the internal string representation which is an issue (and further exposes the internal string representation to user code).

Adding some kind of "sandboxing safe" option would be possible but it's maybe better implemented as a configure.py profile because there may be multiple features that need to be controlled. But even when doing so, every C binding written by the user potentially allows a 1:1 buffer-to-string conversion by accident, so this pull - or some other solution - is still necessary for better sandboxing.

One possible alternative solution would be an actual symbol type, and make the internal properties symbols that cannot be created via buffers. That would definitely be a more clean solution conceptually, but I'm not sure if footprint agrees ;)

@fatcerberus
Copy link
Contributor

I do wonder how much of a footprint issue a new tagged type would actually be, considering all the special casing that now needs to be added to string handling... ;-)

@svaarala
Copy link
Owner Author

There isn't much in the way of special casing: handling of internal strings is already in place. Symbols deviate from that only in a few locations.

@svaarala
Copy link
Owner Author

Also just a tagged type wouldn't actually be enough: object property table keys are a list of untagged duk_hstring pointers. That space would need to be expanded one way or another, either using a bit in the 8-bit per-property flags field or expanding the key to include a flag somehow (awkward). There are most likely other places where a similar issue exists.

@svaarala
Copy link
Owner Author

My rough guess would be that a separate tagged type would be around 4-5kB larger: it would affect internals here and there for probably around 1-2kB, and it would require a new type tag in the API with all the associated API calls. That's of course just an informed guess :) For the stripped build it would mean roughly a 4% increase of footprint which is not huge but still quite large considering the RegExp engine is less than 10kB total.

It really is an honest design trade-off with several viable choices. I tend to favor low footprint choices because footprint caused by code structures is very difficult to rein in.

@fatcerberus
Copy link
Contributor

Hey, as long as Ecmascript compliance is honored, I'm happy with whatever implementation you decide is best. :)

@svaarala
Copy link
Owner Author

One middle-of-the-way implementation approach I've toyed around:

  • Allow two duk_hstrings with the exact same byte sequence but different "internal" flag (now "symbol") to exist. String interning would treat them differently based on the flag.
  • Because property code ultimately looks up a property using the duk_hstring * reference, this would allow a plain \xFFfoo to have a different property slot from \xFFfoo with a symbol flag set. In essence, they'd be separate keys despite having the same byte representation.
  • A buffer-to-string operation would only create plain strings. Similarly for duk_push_string() etc. There'd be a separate duk_push_symbol() to create duk_hstrings with a certain byte sequence and the internal symbol flag set.

The upside of this that a new tagged or API type wouldn't needed but symbols and strings are still entirely separate and you can't create them even via custom buffer operations. Footprint-wise this would work very well.

For C code it'd be a little bit awkward. In general as a C coder I'd prefer strings and symbols to be strings with no internal NULs. This would allow me to pass them around as const char *s, to use C literals for them in #defines etc. In this approach there wouldn't be a new API type for symbols, but separate calls would still be needed for creating symbols. Also if you wanted to use duk_get_prop_string() you couldn't use that with a symbol (the implicitly interned string would be of the "plain" variety) so another binding would be needed for symbols.

This is a downside common with any approach introducing an actual symbol type though. On the other hand it can also be considered an upside because then also C code won't accidentaly mix property and symbol lookups. That can be achieved in other ways too, e.g. with this pull, duk_get_prop_string() and friends could be changed so that they never operate on symbol strings (just like Ecmascript code). One would then need a few calls for "raw" property access which would allow internal keys.

@fatcerberus
Copy link
Contributor

On the other hand, C code may want guaranteed unique symbols, and in that case it would have to get a symbol through the API (the equivalent of calling Symbol("name") in Ecmascript) either way. This would even be a natural thing to do, so that Ecmascript code using Symbol.for() can't accidentally overwrite the host's control information.

@svaarala
Copy link
Owner Author

Sure it'd be useful to have an API to create a unique symbol. But it could then behave like any other string from the API perspective. Or not, depending on which approach is used.

@fatcerberus
Copy link
Contributor

Of course, I was just pointing out that that's a weak argument in favor of "symbols == strings" because the C code may want to create unique symbols through the API in either case. There are still other benefits, of course.

@fatcerberus
Copy link
Contributor

By the way, I'm not always opposing you with my arguments, sometimes I just like to play devil's advocate. :)

@svaarala
Copy link
Owner Author

Yeah, but may main concern with the API is not really just creation of the symbols - but for example:

  • The ability to use plain C strings as symbols (e.g. #define MY_SYMBOL ("\xa0" "foo")) which isn't possible if they're distinct types,
  • The duplication of all string helpers, e.g. duk_get_prop_string() needs a duk_get_prop_symbol() variant, same for duk_get_global_string(), etc. Also all pushers, type checkers, etc will spawn a bunch of new calls.

Symbol creation is by far the smallest concern :) And it'd actually be a useful API even now: it would allow hiding the \xFF prefix from user code that didn't want to specifically deal with it.

@fatcerberus
Copy link
Contributor

Agreed, those were the "other benefits" I mentioned. :)

@svaarala
Copy link
Owner Author

Just as a side note, in this example:

#define MY_SYMBOL ("\xa0" "fooSymbol")

one could:

#define MY_SYMBOL DUK_MAKE_SYMBOL("fooSymbol")

Similarly, for existing internal properties:

#define MY_INT_PROP DUK_MAKE_INTPROP("fooBar")  /* -> "\xFF" "fooBar" */

This would hide the concrete prefix from user code, and would also avoid any potential issues with hex escape ambiguity (which makes it necessary to define the string in parts). This wouldn't need to be explained to users.

Draft of an approach where internal properties are hidden from user
Ecmascript code even when the correct (internal) key is used.
Internal properties can only be accessed using the C API, which
should fulfill sandboxing requirements for protecting the internal
properties securely.
@svaarala
Copy link
Owner Author

I'll drop this from 2.0.0 until the symbol typing issues have been resolved.

@svaarala svaarala removed this from the v2.0.0 milestone Dec 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants