by easily upgrading bags of properties to instances of classes.
See API to find out how to use the draft implementation.
JavaScript provides two main ways to represent structured data.
// Named Types
class Point {
  constructor({ x, y }) {
    this.x = x;
    this.y = y;
  }
}
let myInstance = new Point({ x: 1, y: 2 });
// Bags of Properties
let myBag = { x: 1, y: 2 };
It is more convenient to use an instance of a well-defined class, but it is easier to create a bag of properties.
Some frameworks define APIs in terms of bags of properties:
- MongoDB's query language:
db.foo.findAndModify({query: {_id: 123, available: {$gt: 0}}})
- Babel AST builders produce values like
{ type: 'BinaryExpression', operator: '+', ... }
- sanitize-html takes policy objects like
{ allowedTags: [ 'b', 'i' ], ... }
- hapi uses routing rules like
{ method: 'GET', path: '/', config: ... }
- Many APIs document configuration and option bundles in JavaScript object syntax.
Classes provide a natural place to check invariants, and work with instanceof to provide easy is-a checks.
Bags of properties are hard to check early, and JSON object forgery attacks exploit the fact that libraries can't rely on user code to endorse the bag as being appropriate to use in a particular way.
JSON.parse makes it easy to unintentionally turn untrustworthy strings into untrustworthy objects which has led to problems when key pieces of infrastructure are less suspicious of objects than of strings.
...
duck typing is a terrible basis for authorization decisions
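To make the forgery risk concrete, here is a minimal sketch. The class `HttpsUrl` and the helpers `looksLikeUrl` and `isHttpsUrl` are hypothetical names invented for this illustration. It contrasts a duck-typed check, which a parsed JSON string can satisfy, with an instanceof check, which only a value built by the constructor can satisfy.

```javascript
// Hypothetical class: the constructor is the one place the invariant is checked.
class HttpsUrl {
  constructor(href) {
    if (typeof href !== 'string' || !href.startsWith('https://')) {
      throw new TypeError('HttpsUrl requires an https:// URL');
    }
    this.href = href;
  }
}

// Duck-typed check: any object with a string `href` passes.
const looksLikeUrl = (value) =>
  value !== null && typeof value === 'object' && typeof value.href === 'string';

// Nominal check: only values created by the constructor pass.
const isHttpsUrl = (value) => value instanceof HttpsUrl;

// An attacker-controlled string parses into a bag with the expected shape.
const forged = JSON.parse('{"href":"javascript:alert(1)"}');
const genuine = new HttpsUrl('https://example.com/');

console.log(looksLikeUrl(forged), isHttpsUrl(forged), isHttpsUrl(genuine));
```

The forged bag passes the shape check but never went through the constructor, so it fails the instanceof check.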
This proposal seeks to bridge bags of properties and class types so that it is convenient to create instances of well-defined classes, which makes it more transparent to consumers of an object how to use it safely.
A developer might see
let myMessage = {
  body: 'Hello, World!',
  timestamp: Date.now(),
  recipient: ['j.friendly@example.com']
};
let expiry = {
  type: 'Instant',
  timestamp: Date.now()
};
let attachment = {
  body: 'SSA8MyBkdWNrcyE=',
  encoding: 'base64',
  type: 'text/plain',
  modified: {
    type: 'Instant',
    timestamp: 1533912060207
  }
};
and mentally map those to three different concepts: an email message, an instant in time, and some kind of file.
Additionally, the developer might deduce that the body fields of messages and attachments might be attacker-controlled elsewhere, and that the type: 'Instant' is boilerplate.
The JavaScript engine can't.
More problematically, which fields are attacker-controlled is apparent in the code here, but not to downstream code that merges, combines, or uses properties.
Hereafter, "duck type" refers to these informal types. Note: this is a narrower definition than readers may be familiar with: structural typing but without formally defined "structure."
TypeScript lets us bring duck types into the type system with index types and literal types.
interface Message {
  body: String,        // Unfiltered HTML
  timestamp?: Number,  // ? means Optional
  recipient: AddressSpec
}
interface Instant {
  type: 'Instant',     // Literal type
  timestamp: Number
}
interface TypedContent {
  body: String,
  encoding: Encoding,
  type: MimeType,
  modified?: Instant
}
Given a description like this, TypeScript can look at let x: T = { key: value } and decide whether { key: value } is really a T.
Converting existing projects to TypeScript is not trivial though, nor is adding the right : T to every creation of an Object via { ... }.
The rest of this document explains how an operator, tentatively called unduck, might:
- Collect type descriptions like the interfaces above,
- Pick an appropriate class type given a bag of properties,
- Assemble arguments to the class's constructor from the bag of properties,
- Distinguish between bags from an external source and bags created by trusted user code,
- Respect scopes by not assuming that all modules are interested in constructing all duckable types.
First we need to put the information that TypeScript uses to identify problems with interfaces in a form that we can use in JavaScript.
Below we will use 🐥 as a shorthand for "from duck" or "deduck". (🐥 is actually "Front-Facing Baby Chick" but the author thinks it looks like a duckling and, more importantly, is more adorable than 🦆.)
(The author knows that 🐥 is not a valid JavaScript IdentifierName. 🐥 is a placeholder for bike-shedding to happen at a later date and stands out nicely in code samples.)
let 🐥 = global.🐥;
🐥 = 🐥.withTypes({
  classType: class Point2D {
    constructor(x, y) {
      this.x = +x;
      this.y = +y;
      if (isNaN(this.x) || isNaN(this.y)) {
        throw new TypeError('Invalid numeric input');
      }
    }
  },
  properties: {
    'x': {
      type: Number,
      required: true  // the default
    },
    'y': {
      type: Number
    },
    'type': {
      value: 'Point2D'
    }
  },
  toConstructorArguments({ x, y }) { return [ x, y ] }
});
Duck property descriptors can also specify:
- Whether to recursively unduck the property value if it is an object. Defaults to true.
- A custom value converter which takes (value, trusted, notApplicable) and returns notApplicable to indicate that the type is not applicable. See the duck hunt algorithm below.
Babel internally uses type definitions that contain similar information.
A duck pond is a set of type relationships.
The code above creates a local variable, 🐥, by deriving from a global 🐥, and registers a type relationship with it.
By assigning to 🐥 in module scope, the developer can add type relationships that affect calls to 🐥(...) in that module.
The important thing about a duck pond is that we can derive from it a decision tree to relate a bag of properties to a class instance, and derive arguments to that class's constructor.
The duck hunt algorithm takes a bag of properties and a pond, then:
- Applies a decision tree to narrow the set of applicable type relationships to the maximal subset of the pond such that the bag of properties
  - has all required properties,
  - has no property that is neither required nor optional,
  - has no property whose value does not match a required value (see value in the property descriptor above),
  - has no property whose value does not pass a corresponding type guard.
- For any properties that are recursively deduckable by any applicable type relationship, recursively deduck them. If any is reference identical to an object that is still in progress, fail.
- Call toConstructorArguments for each applicable type relationship.
- Await all the results from toConstructorArguments. For each, if the result is not an array, then remove the type relationship from the applicable set.
- Fail if there is not exactly one applicable type relationship.
- Return the result of applying the applicable type relationship's classType's constructor to the sole toConstructorArguments result.
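The steps above can be sketched in plain JavaScript. This is a simplified illustration under stated assumptions, not the draft implementation: it is synchronous rather than awaiting results, it ignores custom converters and the trusted flag, and the names `makePond`, `hunt`, `shapeMatches`, `passesGuard`, and the `recurse` descriptor key are all hypothetical. It also assumes that a `type` of Number or String tests for primitives and that any other `type` is tested with instanceof.

```javascript
// Sketch only: every helper name below is hypothetical.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// A "bag" here is a plain object, such as an object literal or JSON.parse output.
const isBag = (v) =>
  v !== null && typeof v === 'object' &&
  Object.getPrototypeOf(v) === Object.prototype;

// Assumed meaning of `type` in a property descriptor.
function passesGuard(type, value) {
  if (type === Number) return typeof value === 'number';
  if (type === String) return typeof value === 'string';
  return value instanceof type;
}

// Step 1: does the bag fit this relationship's property descriptors?
function shapeMatches(bag, { properties }) {
  // No property that is neither required nor optional.
  if (!Object.keys(bag).every((key) => hasOwn(properties, key))) return false;
  return Object.entries(properties).every(([key, desc]) => {
    if (!hasOwn(bag, key)) return desc.required === false; // required by default
    if (hasOwn(desc, 'value') && bag[key] !== desc.value) return false;
    if (hasOwn(desc, 'type') && !passesGuard(desc.type, bag[key])) return false;
    return true;
  });
}

function hunt(bag, relationships, inProgress) {
  if (!isBag(bag)) throw new TypeError('Expected a bag of properties');
  if (inProgress.has(bag)) throw new TypeError('Bag refers to itself');
  inProgress.add(bag);
  try {
    const applicable = relationships.filter((rel) => shapeMatches(bag, rel));
    // Step 2: recursively unduck nested bags unless every applicable
    // descriptor opts out with `recurse: false`.
    const upgraded = Object.create(null);
    for (const [key, value] of Object.entries(bag)) {
      const recurse = isBag(value) &&
        applicable.some((rel) => rel.properties[key].recurse !== false);
      upgraded[key] = recurse ? hunt(value, relationships, inProgress) : value;
    }
    // Steps 3 and 4: collect constructor arguments, dropping non-array results.
    const candidates = applicable
      .map((rel) => ({ rel, args: rel.toConstructorArguments(upgraded) }))
      .filter(({ args }) => Array.isArray(args));
    // Step 5: exactly one relationship must remain.
    if (candidates.length !== 1) {
      throw new TypeError(
        `Expected exactly one applicable type, found ${candidates.length}`);
    }
    // Step 6: construct the instance.
    const [{ rel, args }] = candidates;
    return new rel.classType(...args);
  } finally {
    inProgress.delete(bag);
  }
}

// A pond is a callable plus withTypes, which derives a new pond
// and leaves the original unchanged.
function makePond(relationships = []) {
  const pond = (bag) => hunt(bag, relationships, new Set());
  pond.withTypes = (...added) => makePond([...relationships, ...added]);
  return pond;
}

class Point2D { constructor(x, y) { this.x = x; this.y = y; } }
class LineSegment { constructor(start, end) { this.start = start; this.end = end; } }

const unduck = makePond().withTypes(
  {
    classType: Point2D,
    properties: { x: { type: Number }, y: { type: Number } },
    toConstructorArguments: ({ x, y }) => [x, y],
  },
  {
    classType: LineSegment,
    properties: { start: {}, end: {} },
    toConstructorArguments: ({ start, end }) => [start, end],
  });

const segment = unduck({ start: { x: 50, y: 25 }, end: { x: 25, y: 50 } });
```

Because nested bags are upgraded before the outer constructor runs, `segment.start` is already a `Point2D` when `LineSegment` receives it. A bag that matches no relationship, or more than one, fails rather than guessing.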
To turn a nested bag of properties into a value, simply initialize your duck pond as above, and then call the autoduck operator.
import * as ShapesLibrary from 'ShapesLibrary';
// Maybe libraries provide a way to register their duckable types.
let 🐥 = ShapesLibrary.fillPond(global.🐥);
let myTriangle = 🐥({
  path: {
    points: [
      {
        start: { x: 50, y: 25 },
        end: { x: 25, y: 50 },
      },
      { ... },
      { ... }
    ]
  }
});
Compare that to a use of explicit type names:
import { Shape, Path, LineSegment, Point } from 'ShapesLibrary';
let myTriangle = new Shape(
  new Path(
    new LineSegment(
      new Point(50, 25),
      new Point(25, 50)),
    new LineSegment(...),
    new LineSegment(...)));
Having written lots of Java and C++, the author does not find the latter code sample hard to read, and doesn't find the import and setup code onerous.
But novice programmers do seem to find bags-of-properties style APIs easy to learn and use.
Being able to produce well-governed object graphs like the latter gives API authors more choices.
If a project's developers are comfortable reasoning about type hierarchies and how they compose, then there's no need for duck types.
If you have to choose between bags of properties and auto-ducking, getting developers in the habit of using 🐥 gives a small number of type maintainers the ability to see that type invariant checks happen early and that downstream code can use instanceof to check its inputs, especially those values that imply that a property is safe to use in a sensitive context.
Application code shouldn't naively convert any bag of properties to an object. "JSON object forgery" (mentioned previously) explains why not.
JSON.parse makes it easy to unintentionally turn untrustworthy strings into untrustworthy objects.
A duck property descriptor's optional convert method can convert values from outside a trust boundary to ones suitable to use inside a trust boundary if it knows whether the author considers the input trustworthy. This could apply sanitizers, restrict to plain strings instead of recursing, or just not upgrade to a contract type.
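As a rough illustration of the first option, applying a sanitizer, a converter for a body property might look like the sketch below. The (value, trusted, notApplicable) signature comes from the descriptor list earlier in this document. The `bodyDescriptor` object, the `NOT_APPLICABLE` sentinel, and the escaping rule are assumptions made for this example.

```javascript
// Hypothetical sentinel; the text says only that a converter is handed one.
const NOT_APPLICABLE = Symbol('notApplicable');

// Hypothetical descriptor for a `body` property.
const bodyDescriptor = {
  convert(value, trusted, notApplicable) {
    // A non-string body means this type does not apply to the bag.
    if (typeof value !== 'string') return notApplicable;
    // Input the author vouches for crosses the boundary unchanged.
    if (trusted) return value;
    // Untrusted input is escaped before it crosses the boundary.
    return value.replace(/[&<>"']/g, (c) => `&#${c.charCodeAt(0)};`);
  },
};

const fromOutside = bodyDescriptor.convert('<i>hi</i>', false, NOT_APPLICABLE);
const fromInside = bodyDescriptor.convert('<i>hi</i>', true, NOT_APPLICABLE);
```

Here `fromInside` is returned unchanged while `fromOutside` has its markup characters replaced with numeric character references.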
There are two patterns that might make it easy to audit decisions about whether an object is trustworthy.
- 🐥.☢ (read danger duck) could indicate that an input is dangerous.
- Alternatively, 🐥.☮ (read peace duck) could indicate that the author trusts the input.
The latter makes the form that is easiest to type default to safe, which is preferable. Either, if named consistently, makes it easy to enumerate calls that might need auditing.
[
  🐥({ foo: 'bar' }),
  🐥.☢(JSON.parse(untrustedString))
]
// or
[
  🐥.☮({ foo: 'bar' }),
  🐥(JSON.parse(untrustedString))
]
This replaces one intractable problem, locally reasoning about the structure of external inputs, with two simpler ones. The author of this code does not need to reason about the content of untrustedString. A type author can specify one conversion to ferry safe values inside the trust boundary, and the author of the code above only needs to reason about which variables originate inside the trust boundary.
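The safe-by-default "peace duck" variant could be wired up roughly as follows. This is a sketch with hypothetical names (`makeUnduck`, `unduck.trusted`, `TrustedHtml`). Because the emoji placeholders are not valid identifiers, `unduck.trusted` stands in for the peace duck form. The conversion shown takes the "do not upgrade to a contract type" option for untrusted input.

```javascript
// Hypothetical contract type: holding one implies the HTML is safe to render.
class TrustedHtml {
  constructor(html) { this.html = html; }
}

// Hypothetical factory: the plain call treats input as untrusted, and the
// explicitly named `.trusted` call marks input the author vouches for.
function makeUnduck(convertBag) {
  const unduck = (bag) => convertBag(bag, /* trusted= */ false);
  unduck.trusted = (bag) => convertBag(bag, /* trusted= */ true);
  return unduck;
}

// The type author writes the conversion once.
const unduck = makeUnduck(({ body }, trusted) =>
  (trusted ? new TrustedHtml(body) : String(body)));

// Call sites only say where the value came from.
const fromCode = unduck.trusted({ body: '<b>Hello</b>' });
const fromWire = unduck(JSON.parse('{"body":"<script>x</script>"}'));
```

An auditor can search for `.trusted(` to list every place where a developer vouched for an input. Everything else stays a plain string that downstream instanceof checks will not mistake for `TrustedHtml`.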
Given a codebase that uses bags of properties extensively, I might expect migration to happen piecemeal:
- Developers pick an API that takes bags of properties.
- Configure it to require class types as inputs, or to report when they're not.
- Put 🐥(...) around object constructors, run tests, tweak, and repeat until tests run green.
- Repeat with another API that ducks.
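The second step, requiring class types or reporting when they are absent, could be as small as the guard sketched below. The function `expectInstance` and its `enforce` and `report` options are hypothetical names invented for this example.

```javascript
// Hypothetical migration guard: report first, then enforce once tests are green.
function expectInstance(value, classType,
                        { enforce = false, report = console.warn } = {}) {
  if (value instanceof classType) return true;
  const message = `Expected an instance of ${classType.name}`;
  if (enforce) throw new TypeError(message);
  report(message);
  return false;
}

class Point {
  constructor(x, y) { this.x = x; this.y = y; }
}

// During migration, collect reports instead of failing.
const reports = [];
const bagOk = expectInstance({ x: 1, y: 2 }, Point, { report: (m) => reports.push(m) });
const instanceOk = expectInstance(new Point(1, 2), Point);
```

Starting in reporting mode lets a team find the call sites that still pass bags. Switching `enforce` to true afterwards turns the same check into a hard requirement.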
As noted before, without rewriting code to call the appropriate new ClassName, maintainers and security auditors get the benefits of:
- constructors that check type invariants at new time,
- having a place to put code that coerces untrusted structured inputs to trustworthy structured values.