Challenges with ES6 symbols #2012

Closed
JsonFreeman opened this issue Feb 11, 2015 · 12 comments · Fixed by #15473

@JsonFreeman
Contributor

I'd like to outline some of the thoughts the team has had about supporting custom ES6 symbols. This is focused on the proposals we have come up with, as well as the challenges associated with them.

The motivation is something like this:

var s = Symbol('hello');
var obj = {
   [s]: ""
};
obj[s]; // gives type string

We have had two genres of proposal here, the entity approach and the type approach.

  1. Entity approach: Use the compiler's entities as an approximation for the symbol value. In the above example, we would say that the variable s is the entity that the property key [s] represents. Any key referring to s is the same key, and any other way of referring to the value of Symbol('hello') is different.
  2. Type approach: Creating a symbol manufactures a type that uniquely represents that symbol value. This would involve introducing a new kind of type constructor. It is potentially far more complex than the entity approach, but in principle it would be more accurate.

There are 4 general challenges with these approaches:

  1. Symbol aliasing: If you have two variables pointing at the same symbol, you want them to be the same. This is a drawback for the entity approach, but the type approach gets this right.
  2. Symbol reassignment: If you have one variable that gets assigned two different symbols at different times, you want them to be different. This is a drawback for both approaches.
  3. Solidifying symbol types: If there is a way to specify that something manufactures a new symbol type, there must also be a mechanism for what happens when the symbol gets assigned to something (solidified). Drawback for the type approach.
  4. The compiler is architected so that the properties of a type must be determined before analyzing relationships among types. With symbols, figuring out what properties a type has might require analyzing complex relationships among types. This can lead to circularities. This is a drawback for both approaches, but is more severe for the type approach, and the severity tends to increase the more general the feature support is.

Issues 1, 2, and 3 are really just specific cases of a more general problem. Namely, there is no 1-to-1 correspondence between static entities in the code (declarations, expressions) and runtime values. And because symbol identity is based on runtime values, tracking it statically requires exactly such a 1-to-1 correspondence with static entities.
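To make problems 1 and 2 concrete at the runtime level, here is a minimal sketch. The function names and the Map-based store are illustrative assumptions and are not part of any proposal in this thread.

```typescript
// Problem 1 (aliasing): two static bindings refer to one runtime symbol.
function aliasingDemo(): boolean {
    const original = Symbol("hello");
    const alias = original;
    const store = new Map<symbol, string>();
    store.set(original, "value");
    // A lookup through the alias hits the same key.
    return store.get(alias) === "value";
}

// Problem 2 (reassignment): one static binding holds two runtime symbols over time.
function reassignmentDemo(): boolean {
    let current: symbol = Symbol("hello");
    const store = new Map<symbol, string>();
    store.set(current, "value");
    current = Symbol("hello"); // same description, different identity
    // The binding is unchanged, but the key it now holds is new.
    return store.has(current);
}
```

In the first function a purely entity-based analysis would see two different keys where there is one; in the second it would see one key where there are two.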

I think the best antidote for these issues is to limit the support for symbol properties to situations where we know there is a 1-to-1 correspondence between values and static entities. There are two general approaches we can explore, which share this 1-to-1 philosophy. They both use the entity approach mentioned above as a base:

  1. Support symbols exactly as a user would write them in ES6. This involves identifying usages where this 1-to-1 correspondence would hold, and only supporting symbols in those contexts. For instance, we might give special preference to const declarations, or disallow using symbols inside a loop or across a function boundary.
  2. Add TypeScript syntax (perhaps a new kind of declaration), for which we enforce/ensure that the correspondence is 1-to-1, and recommend that users use symbols in the way prescribed by TypeScript with these new forms.

Obviously, neither of these is a complete proposal, but this is intended to start a discussion directed at finding a solution that is both feasible and satisfies users' needs.

@JsonFreeman JsonFreeman added the Discussion Issues which may not have code impact label Feb 11, 2015
@rbuckton
Member

The entity approach may work better if we only support const declarations for symbols, as we can guarantee the declaration doesn't change, and treat any assignment of the symbol to another const declaration as the same symbol:

const s = Symbol('hello');
var obj = {
  [s]: ""
};
const s1 = s;
obj[s1]; // gives type string

@yortus
Contributor

yortus commented Feb 12, 2015

@JsonFreeman to my mind, the challenge of ES6 symbol typing is analogous to that of string literal types (discussed in #1003), both cases being useful examples of the general concept of 'singleton types'.

In both cases, you have literal values that could usefully be treated as types by the compiler, for things like checking assignment compatibility, narrowing types in type guards, inferring types, etc. And in both cases you have the problem of the runtime system providing many ways for running code to violate the assumptions that the compiler is relying on for its checks.

I'd love to see this approached as a special case of 'singleton types' more generally, so that static type checking could be enhanced with things like ES6 symbols (eg for more accurate indexer typing), const literal strings (eg for tagged unions), and const enum values (eg for use in type guards).

const seems to be a good way to tell the compiler when it can potentially treat values (such as symbols and string literals) as singleton types. For non-const literal values, the compiler would have to fall back to the literal's more general type (Symbol, string, number, etc). Pity there's nothing as simple as const for marking properties of an object.
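As an illustration of the tagged-union use case mentioned above, here is a small sketch written against current TypeScript, where string literal types are available. The names Shape and area are made up for this example.

```typescript
// Each member of the union is identified by a singleton (string literal) type on its tag.
type Shape =
    | { kind: "circle"; radius: number }
    | { kind: "square"; side: number };

function area(shape: Shape): number {
    if (shape.kind === "circle") {
        // Narrowed by the literal tag: only the circle member has a radius.
        return Math.PI * shape.radius * shape.radius;
    }
    // The only remaining member is the square.
    return shape.side * shape.side;
}
```

The comparison against the literal "circle" acts as the type guard, which is the kind of checking singleton types make possible.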

@JsonFreeman
Contributor Author

@rbuckton, the const constraint is nice, as it eliminates problem 2 (reassignment). I would say that if we did this, you could only use this type if you reference the const binding directly (as an identifier or dotted name); you cannot first assign it to a var and then access the var.

Problem 1 (aliasing) still remains, but to fix that, we might employ singleton types, as @yortus suggests. This would allow a value/type from one const symbol to flow to another const symbol.

I also think it is pretty likely people will have a class/module that contains properties whose values are particular symbols. So we would want to allow this for export const, and possibly even static declarations on a class (although unfortunately there is no way to mark those as const).

We still have to overcome problem 4, which is an implementation strategy issue really.

@yortus, I think you are right at least in theory, and there does seem to be a beautiful parallel. Being able to express the types particular to certain symbols would make the system more powerful. I'd want to see it worked out for strings and const enum members first, as those are better understood.

One complication with the singleton types approach is that it is very common to create a symbol using Symbol(). Unlike a string literal expression, this expression, even in the same code location, can have a different value. It could be used in a function that is called multiple times, or used in a loop. The const constraint should probably help with that.
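A short sketch of that complication, with hypothetical helper names: the same Symbol() expression at a single source location yields a fresh value on every call or loop iteration.

```typescript
// One source location, a new symbol per call.
function makeKey(): symbol {
    return Symbol("key");
}

// One source location inside a loop, a new symbol per iteration.
function keysFromLoop(count: number): symbol[] {
    const keys: symbol[] = [];
    for (let i = 0; i < count; i++) {
        keys.push(Symbol("key"));
    }
    return keys;
}

// Counts how many distinct runtime symbols an array holds.
function countDistinct(keys: symbol[]): number {
    return new Set<symbol>(keys).size;
}
```

A string literal at the same location would evaluate to an identical value each time, which is why the two cases cannot be treated alike.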

You also mention that it's a pity const does not work for properties of an object, but I think in some cases that's a good thing. For example, you wouldn't want to make a property of an interface a specific symbol type, because it doesn't refer to a particular value, it's more just a slot. But for object literals, perhaps you are right.

@yortus
Contributor

yortus commented Feb 13, 2015

@JsonFreeman It makes me think of how some type systems (C++, C#, Java for example) provide special treatment for so-called compile-time constants. What each language considers to be a compile-time constant is subject to practical considerations about the kinds of language expressions the compiler is willing/able to evaluate at compile-time.

In this light, one could think of the expression Symbol.iterator as a compile-time constant, tied to a distinct identity value that the compiler can keep track of, whereas the expression Symbol('foo') creates a constant value, but not a compile-time constant, so the compiler would just treat that as having the general type Symbol. (But perhaps const sym = Symbol('foo') could be the way to introduce new compile-time constant symbols).

The same distinction applies to other constant expressions. For example, the expression 'foo' can be treated by the compiler as a compile-time constant and tracked at compile-time, whereas the expression 'f' + 'o' + 'o' creates a constant value but not a compile-time constant (although it could be if tsc had the compile-time evaluation facilities). So:

// Compile-time constants
const sym1 = Symbol.iterator; // sym1 has singleton type with identity Symbol.iterator
const str1 = 'foo'; // str1 has singleton type with identity 'foo'

// Not compile-time constants
var sym2 = Symbol('foo'); // sym2 has type Symbol
var str2 = 'f' + 'o' + 'o'; // str2 has type string

Maybe this distinction would help in tracking the types of expressions across loop iterations and function boundaries. References to compile-time constants can safely preserve their identity in loops and across functions. Everything else falls back to the existing type system, using the more general types like Symbol, string, number, etc.
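For the string half of this distinction, current TypeScript behaves much as described: a const initialized with a literal keeps a literal type, while a concatenation is typed as plain string. The sketch below checks this at the type level; IsStringLiteral and the variable names are hypothetical helpers for this example.

```typescript
const literalFoo = "foo";            // inferred type: "foo"
const computedFoo = "f" + "o" + "o"; // inferred type: string

// Resolves to true only when T is a specific string literal type rather than plain string.
type IsStringLiteral<T extends string> = string extends T ? false : true;

// These annotations only compile if the inferred types are as stated above.
const literalIsTracked: IsStringLiteral<typeof literalFoo> = true;
const computedIsTracked: IsStringLiteral<typeof computedFoo> = false;
```

At runtime the two strings are equal; only the compile-time tracking differs.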

I have no idea how practical this would be given the existing compiler architecture. In theory at least, it just means having a way to recognise compile-time constants in the form of (a) literals such as 'foo' and Symbol.iterator, and (b) constructions that preserve compile-time identities.

@JsonFreeman
Contributor Author

@yortus, yes I like your way of explaining it. Tracking the constant value Symbol.iterator in the compiler could be just like tracking the constant value introduced by a string literal. And if we had a way to express singleton types for string literals, we could also have such type denotations for "symbol literals".

I think treating const sym = Symbol('foo') as a way to introduce a symbol constant is not a bad idea. In terms of your point about constant-preserving expressions, I believe it would be best to start conservative and to throw away the constant tracking for every use of sym, except when used directly as a property key, or assigned to another const. So the following would work:

const sym = Symbol();
const sym2 = sym;
var obj = {
    [sym]: 0
};
obj[sym2]; // type number

But as soon as you need to do anything with sym, other than use it as a property key or assign it to another const, its type is the generic symbol (btw the keyword is spelled in lowercase).
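For comparison, TypeScript later gained unique symbol types (in 2.7, to the best of my knowledge), and they behave close to this conservative rule. The sketch below assumes a compiler with that feature; the names viaAlias and widened are illustrative.

```typescript
const sym = Symbol();
const sym2 = sym;            // const alias: still tied to the identity of sym
const obj = {
    [sym]: 0
};
const viaAlias: number = obj[sym2]; // property found through the alias

let widened = sym;           // mutable binding: the type widens to plain symbol
widened = Symbol();          // allowed only because widened is just a symbol
```

The last assignment would be rejected if widened had kept the identity of sym, so it doubles as a check that the widening happens.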

TypeScript does actually have constant preserving expressions evaluated at compile time, but not in general. It is only for arithmetic operations in initializers of const enum members. So this infrastructure could in principle be extended to work for string concatenation as you suggest, and const assignment for symbols.
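A minimal sketch of the existing infrastructure referred to here: arithmetic in const enum initializers is evaluated at compile time. The enum and member names are made up for this example.

```typescript
const enum Permission {
    None = 0,
    Read = 1 << 0,            // evaluated by the compiler to 1
    Write = 1 << 1,           // evaluated by the compiler to 2
    ReadWrite = Read | Write  // refers to earlier members, evaluated to 3
}

// Uses of a const enum member are replaced by the computed constant in the emitted code.
const readWriteValue: number = Permission.ReadWrite;
```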

I think this would be a good start, particularly if we can get this working for strings first. Although, I have a feeling people will expect symbols to be tracked in arbitrary object properties. I think that would be infeasible, as properties can be modified in very non-local ways. But the problem is reasonably scoped for const declarations, no pun intended.

@yortus
Contributor

yortus commented Feb 13, 2015

@JsonFreeman

> Although, I have a feeling people will expect symbols to be tracked in arbitrary object properties. I think that would be infeasible, as properties can be modified in very non-local ways.

Agreed, with the combination of dynamic mutable objects and general control flow, information tying types to specific compile-time constant values is going to be lost very quickly, in all but some very specific circumstances. I guess the question is really about how much mileage the compiler can get from those very specific circumstances. Better type-level reasoning about well-known symbols and tagged unions would alone be huge wins IMHO.

Everything else you've said makes this approach sound promising. What about problem 4 you mentioned up in the OP?

@JsonFreeman
Contributor Author

const declarations do not have much risk for circularity, but there is still an architectural issue here. The main problem is that the compiler currently does name binding in one pass, but this would not work in the following case:

function foo() {
    var obj = {
        [s]: 0
    };
}
const s = Symbol();

We would essentially need to bind names breadth first instead of depth first. We would also need to pull the resolveName function out of the checker to make it "typeless" so that the binder could use it. This would be a very disruptive change. Perhaps @ahejlsberg knows more about what would be involved here.

Anders, we are talking about what it would take to support user defined symbols (as well-typed property keys) if the symbols are bound to const declarations. We'd need the binder to have access to resolveName, and we'd need to make the binding breadth first so that the symbols are all available at the right time.
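For reference, the ordering case above is accepted by TypeScript compilers that support unique symbol types, as far as I can tell. The sketch below is the same example made runnable under that assumption; fooResult is a name added for illustration.

```typescript
function foo() {
    const obj = {
        [s]: 0 // s is declared later in the file, but foo only runs after that declaration
    };
    return obj[s];
}
const s = Symbol();

// Calling foo after s is initialized works; the result is typed as number.
const fooResult: number = foo();
```

Calling foo before the const line would still fail at runtime, because s would be in its temporal dead zone.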

@coreh

coreh commented Feb 16, 2015

Hello there, I would like to propose the following way of implementing this feature:

Instead of special casing const s = Symbol(), a new keyword (symbol) and language construct would be introduced. The construct would be declared as:

symbol Foo;

On code generation this would be mapped to ES6's Symbol(). (Just like class is currently mapped to a constructor function and its .prototype, and module is mapped to an object literal.)
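As a rough sketch of that mapping, the block below shows one plausible emit for the proposed declaration. Passing the declared name as the symbol's description is an assumption of this example; the proposal only says the construct maps to Symbol(). The names holder and fooLabel are illustrative.

```typescript
// Hypothetical source under this proposal (not valid TypeScript syntax):
//     symbol Foo;
// One plausible ES6 emit, assuming the declared name doubles as the description:
const Foo = Symbol("Foo");

// The emitted binding can be used as a property key like any other symbol.
const holder = {
    [Foo]: 42
};
const fooLabel: string = Foo.toString();
```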

Symbols declared with this special syntax would then be statically verified and have strong typing guarantees, while symbols created manually with Symbol() — regardless of whether they're const, let or var — would function like a weakly typed reflection mechanism, and always map to any type when used as keys for object property access.

You would be able to export symbol constructs normally like you do with class and module. You would also be able to declare them "nominally" for external modules in your .d.ts files, for compatibility with third party non-TypeScript libraries:

declare module "bar" {
  export symbol Foo;
}

Just like functions, their names would be bound and available regardless of declaration order. So @JsonFreeman's example above would be perfectly valid, and would become:

function foo() {
    var obj = {
        [s]: 0
    };
}
symbol s;

Assigning TypeScript symbols to variables would be allowed (since it's useful for reflection), but would just yield the underlying ES6 value for them. If you did:

symbol s;
var s2 = s;

s would be a native TypeScript symbol construct, while s2 would be of the ES6 symbol type. With that in mind:

var foo: { [s]: number; }; // valid declaration
var bar: { [s2]: number; }; // invalid declaration

The keyword symbol would also need to be allowed as a type, for consistency with the existing number and string types. When used like this it would represent the underlying ES6 symbol type, and not the TypeScript symbol construct.

var bar: symbol;

@JsonFreeman
Contributor Author

@coreh, that is quite an interesting idea. One very nice thing about it is that users may have trouble internalizing that const s = Symbol() behaves specially, whereas adding a new keyword / declaration kind is a very clear statement that something is treated specially.

I would further propose that these declarations be block scoped (maybe that just follows because they are emitted as const). Another very important thing is that when they are returned from a function, they no longer retain their identity as a well typed symbol. That's because each function invocation will produce a new symbol.

This still exhibits problem 4 explained above though. We would still need a resolveName that is divorced from the checker, and we'd still need to make the binding breadth first.

@coreh

coreh commented Feb 16, 2015

> I would further propose that these declarations be block scoped (maybe that just follows because they are emitted as const). Another very important thing is that when they are returned from a function, they no longer retain their identity as a well typed symbol. That's because each function invocation will produce a new symbol.

Interesting, I hadn't thought about the use case where they are declared inside a function. If this adds a lot of complexity to the implementation though, I think it would be acceptable for symbol declarations to be restricted only to module scope, like class declarations.

> This still exhibits problem 4 explained above though. We would still need a resolveName that is divorced from the checker, and we'd still need to make the binding breadth first.

I'm not entirely familiar with the internal workings of the type checker but if I understand correctly, problem no. 4 would be caused by code like this?

module A {
  export function f(): { [B.s]: string } {
    // ...
  }
  export symbol s;
}

module B {
  export function f(): { [A.s]: string } {
    // ...
  }
  export symbol s;
}

@JsonFreeman
Contributor Author

Allowing symbol declarations only in modules would certainly make things simpler. Modules have the nice property that everything nested directly inside them happens exactly once, so execution and static entities are 1-to-1.

For your example, are you trying to show a circular symbol reference? I think if we only allow them in modules, there can be no circular references because the symbols would be named with identifiers and not symbols. But I am not even sure that circularity is an issue at all, as I have yet to come up with an example myself.
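A runnable approximation of the cross-module example, written with export const and today's namespace keyword in place of the proposed symbol declaration. It assumes a compiler with unique symbol support; fromA and fromB are names added for this sketch.

```typescript
namespace A {
    export const s = Symbol("A.s");
    // The return type is keyed by a symbol declared in the other namespace.
    export function f(): { [B.s]: string } {
        return { [B.s]: "from A" };
    }
}

namespace B {
    export const s = Symbol("B.s");
    export function f(): { [A.s]: string } {
        return { [A.s]: "from B" };
    }
}

// Both namespaces are initialized before either function is called.
const fromA: string = A.f()[B.s];
const fromB: string = B.f()[A.s];
```

The mutual references sit between two different symbols, each named by an identifier, so neither symbol's identity depends on the other.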

There are a number of architectural changes that would be required for this to be possible:

  • Binding must become breadth first, so that an outer scope with a symbol definition will be all bound by the time it's referenced in an inner scope.
  • Looking up a name in a scope must be extracted from the type checker so that the binder can use it as well, to find symbols that are being used as property keys.
  • Resolving a dotted name would also have to be extracted from the type checker for similar reasons (this is the one that is most concerning to me in terms of feasibility).
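The first bullet can be illustrated with a toy model. Everything below is hypothetical: ToyStatement, bindDepthFirst, and bindBreadthFirst are invented for this sketch and do not mirror the real binder or checker. The model only contrasts binding in source order with binding a whole scope before entering nested scopes.

```typescript
// A scope body is a list of declarations, computed-key uses, and nested scopes.
type ToyStatement =
    | { kind: "declare"; name: string }
    | { kind: "key"; name: string }
    | { kind: "scope"; body: ToyStatement[] };

// Single pass in source order: nested scopes are entered as soon as they are met.
function bindDepthFirst(body: ToyStatement[], visible: string[] = []): string[] {
    const names = visible.slice();
    let unresolved: string[] = [];
    for (const st of body) {
        if (st.kind === "declare") {
            names.push(st.name);
        } else if (st.kind === "key") {
            if (names.indexOf(st.name) === -1) unresolved.push(st.name);
        } else {
            unresolved = unresolved.concat(bindDepthFirst(st.body, names));
        }
    }
    return unresolved;
}

// Outer scope first: every declaration in a scope is bound before nested scopes are visited.
function bindBreadthFirst(body: ToyStatement[], visible: string[] = []): string[] {
    const names = visible.slice();
    for (const st of body) {
        if (st.kind === "declare") names.push(st.name);
    }
    let unresolved: string[] = [];
    for (const st of body) {
        if (st.kind === "key" && names.indexOf(st.name) === -1) unresolved.push(st.name);
        if (st.kind === "scope") unresolved = unresolved.concat(bindBreadthFirst(st.body, names));
    }
    return unresolved;
}

// Models: function foo() { var obj = { [s]: 0 }; }  const s = Symbol();
const toyProgram: ToyStatement[] = [
    { kind: "scope", body: [{ kind: "key", name: "s" }] },
    { kind: "declare", name: "s" }
];
```

On toyProgram the source-order pass reports s as unresolved when it reaches the key inside the nested scope, while the outer-scope-first pass resolves it.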

In principle these should all be doable, but they are all somewhat drastic and may interact with parts of the compilation process in ways that I'm not foreseeing. I think in some ways these concerns are more worrisome than the language design issues we have been discussing, as the latter can always be simplified by scoping the feature.

@JsonFreeman
Contributor Author

After some more discussion, we have decided that it would be best to revisit this after seeing how people use this feature in practice in ES6. Given the discussion in this thread, it seems like we are aware of what we think will be feasible, and where the challenges will be. Once we have key use cases in mind, I think we can apply this knowledge more purposefully.
