Skip to content

Assembles instances of class types from bags of properties.

License

Notifications You must be signed in to change notification settings

mikesamuel/unduck

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

30 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Building castles out of ducks

by easily upgrading bags of properties to instances of classes.

See API to find out how to use the draft implementation.

Build Status Dependencies Status npm Coverage Status Known Vulnerabilities

Problem: Juggling ducks is hard

JavaScript provides two main ways to represent structured data.

// Named Types
class Point {
  constructor({ x, y }) {
    this.x = x;
    this.y = y;
  }
}
let myInstance = new Point({ x: 1, y: 2 });


// Bags of Properties
let myBag = { x: 1, y: 2 };

It is more convenient to use an instance of a well-defined class, but it is easier to create a bag of properties.

Some frameworks define APIs in terms of bags of properties:

  • MongoDB's query language: db.foo.findAndModify({query: {_id: 123, available: {$gt: 0}}})
  • Babel AST builders produce values like { type: 'BinaryExpression', operator: '+', ... }
  • sanitize-html takes policy objects like { allowedTags: [ 'b', 'i' ], ... }
  • hapi uses routing rules like { method: 'GET', path: '/', config: ... }
  • Many APIs document configuration and option bundles in JavaScript object syntax.

Classes provide a natural place to check invariants, and work with instanceof to provide easy is-a checks.

Bags of properties are hard to check early, and JSON object forgery attacks exploit the fact that libraries can't rely on user code to endorse the bag as being appropriate to use in a particular way.

JSON.parse makes it easy to unintentionally turn untrustworthy strings into untrustworthy objects which has led to problems when key pieces of infrastructure are less suspicious of objects than of strings.

...

duck typing is a terrible basis for authorization decisions

This proposal seeks to bridge bags of properties with class types so that it is convenient to create instances of well-defined classes making it more transparent to consumers of the object how to use them safely.

What is obvious to a developer is not to the JS engine.

A developer might see

let myMessage = {
  body: 'Hello, World!',
  timestamp: Date.now(),
  recipient: ['j.friendly@example.com']
};

let expiry = {
  type: 'Instant',
  timestamp: Date.now()
};

let attachment = {
  body: 'SSA8MyBkdWNrcyE=',
  encoding: 'base64',
  type: 'text/plain',
  modified: {
    type: 'Instant',
    timestamp: 1533912060207
  }
};

and mentally map those to three different concepts: an email message, an instant in time, and some kind of file.

Additionally, the developer might deduce that the body fields of messages and attachments might be attacker-controlled elsewhere, and that the type: 'Instant' is boilerplate.

The JavaScript engine can't.

More problematically, the difference between which fields are attacker controlled is apparent in the code here, but not to downstream code that merges, combines, or uses properties.

Duck typing

Hereafter, "duck type" refers to these informal types. Note: this is a narrower definition than readers may be familiar with: structural typing but without formally defined "structure."

TypeScript lets us bring duck types into the type system with index types and literal types.

interface Message {
  body: String,  // Unfiltered HTML
  timestamp?: Number,  // ? means Optional
  recipient: AddressSpec
}

interface Instant {
  type: 'Instant',  // Literal type
  timestamp: Number
}

interface TypedContent {
  body: String,
  encoding: Encoding,
  type: MimeType,
  modified?: Instant
}

Given a description like this, TypeScript can look at let x: T = { key: value } and decide whether { key: value } is really a T.

Converting existing projects to TypeScript is not trivial though, nor is adding the right : T to every creation of an Object via { ... }.

The rest of this document explains how an operator, tentatively called unduck, might:

  • Collect type descriptions like the interfaces above,
  • Pick an appropriate class type given a bag of properties,
  • Assemble arguments to the class's constructor from the bag of properties,
  • Distinguish between bags from an external source and bags created by trusted user code,
  • Respect scopes by not assuming that all modules are interested in constructing all duckable types.

Duck calls

First we need to put the information that TypeScript uses to identify problems with interfaces in a form that we can use in JavaScript.

Below we will use ๐Ÿฅ as a shorthand for from duck or deduck. (๐Ÿฅ is actually "front-Facing Baby Chick" but the author thinks it looks like a duckling and, more importantly, is more adorable than ๐Ÿฆ†.)

(The author knows that ๐Ÿฅ is not a valid JavaScript IdentifierName. ๐Ÿฅ is a placeholder for bike-shedding to happen at a later date and stands out nicely in code samples.)

let ๐Ÿฅ = global.๐Ÿฅ;
๐Ÿฅ = ๐Ÿฅ.withTypes({
  classType: class Point2D {
    constructor(x, y) {
      this.x = +x;
      this.y = +y;
      if (isNaN(this.x) || isNaN(this.y)) {
        throw new TypeError('Invalid numeric input');
      }
    }
  },
  properties: {
    'x': {
      type: Number,
      required: true  // the default
    },
    'y': {
      type: Number
    },
    'type': {
      value: 'Point2D'
    }
  },
  toConstructorArguments({ x, y }) { return [ x, y ] }
});

Duck property descriptors can also specify:

  • Whether to recursively unduck the property value if it is an object. Defaults to true.
  • A custom value converter which takes (value, trusted, notApplicable) and returns notApplicable to indicate that the type is not applicable. See the duck hunt algorithm below.

Babel internally uses type definitions that contain similar information.

Duck ponds

A duck pond is a set of type relationships.

The code above creates a local variable, ๐Ÿฅ, by deriving from a global ๐Ÿฅ, and registers a type relationship with it.

By assigned to ๐Ÿฅ in a module scope, the developer can add type relationships which will affect calls to ๐Ÿฅ(...) in that module.

The duck hunt algorithm

The important thing about a duck pond is that we can derive from it a decision tree to relate a bag of properties to a class instance, and derive arguments to that class's constructor.

The duck hunt algorithm takes a bag of properties and a pond, then:

  1. Applies a decision tree to narrow the set of applicable type relationships to the maximal subset of the pond such that the bag of properties
    • has all required properties,
    • has no property that is neither required nor optional,
    • has no property whose value does not match a required value (See value in the property descriptor above),
    • has no property whose value that does not pass a corresponding type guard.
  2. For any properties that are recursively deduckable by any applicable type relationship, recursively deduck them. If any is reference identical to an object that is still in progress, fail.
  3. Call toConstructorArguments for each applicable type relationship.
  4. Await all the results from toConstructorArguments. For each, if the result is not an array, then remove the type relationship from the applicable set.
  5. Fail if there is not exactly one applicable type relationship.
  6. Return the result of applying the applicable type relationship's classType's constructor to the sole toConstructorArguments result.

How to make ducks?

To turn a nested bag of properties into a value, simply initialize your duck pond as above, and then call the autoduck operator.

import * as ShapesLibrary from 'ShapesLibrary';

// Maybe libraries provide a way to register their duckable types.
let ๐Ÿฅ = ShapesLibrary.fillPond(global.๐Ÿฅ);

let myTriangle = ๐Ÿฅ({
  path: {
    points: [
      {
        start: { x: 50, y: 25 },
        end: { x: 25, y: 50 },
      },
      { ... },
      { ... }
    ]
  }
});

Compare that to a use of explicit type names:

import { Shape, Path, LineSegment, Point } from 'ShapesLibrary';

let myTriangle = new Shape(
  new Path(
    new LineSegment(
        new Point(50, 25),
        new Point(25, 50)),
    new LineSegment(...),
    new LineSegment(...)));

To duck or not to duck

Having written lots of Java and C++, the author does not find the latter code sample hard to read, and doesn't find the import and setup code onerous.

But novice programmers do seem to find bags-of-properties style APIs easy to learn and use.

Being able to produce well-governed object graphs like the latter gives API authors more choices.

If a project's developers are comfortable reasoning about type hierarchies and how they compose, then there's no need for duck types.

If you have to choose between bags of properties and auto-ducking, getting developers in the habit of using ๐Ÿฅ gives a small number of type maintainers the ability to see that type invariants checks happen early and that downstream code can use instanceof to check their inputs, especially those values that imply that a property is safe to use in a sensitive context.

Danger duck

Application code shouldn't naively convert any bag of properties to an object. "JSON object forgery" (mentioned previously) explains why not.

JSON.parse makes it easy to unintentionally turn untrustworthy strings into untrustworthy objects.

A duck property descriptor's optional convert method can convert values from outside a trust boundary to ones suitable to use inside a trust boundary if it knows whether the author considers the input trustworthy. This could apply sanitizers, restrict to plain strings instead of recursing, or just not upgrade to a contract type:

There are two patterns that might make it easy to audit decisions about whether an object is trustworthy.

  • ๐Ÿฅ.โ˜ข (read danger duck) could indicate that an input is dangerous.
  • Alternatively, ๐Ÿฅ.โ˜ฎ (read peace duck) could indicate that the author trusts the input.

The latter makes the easiest to type default to safe which is preferable. Either, if named consistently, make it easy to enumerate calls that might need auditing.

[
  ๐Ÿฅ({ foo: 'bar' }),
  ๐Ÿฅ.โ˜ข(JSON.parse(untrustedString))
]

// or

[
  ๐Ÿฅ.โ˜ฎ({ foo: 'bar' }),
  ๐Ÿฅ(JSON.parse(untrustedString))
]

This replaces one intractable problem, locally reasoning about the structure of external inputs, with two simpler ones. The author of this code does not need to reason about the content of untrustedString. A type author can specify one conversion to ferry safe values inside the trust boundary, and the author of the code above only needs to reason about which variables originate inside the trust boundary.

Duck Migration

Given a codebase that uses bags of properties extensively, I might expect migration to happen piecemeal:

  1. Developers pick an API that takes bags of properties.
  2. Configure it to require class types as inputs, or to report when they're not.
  3. Put ๐Ÿฅ(...) around object constructors, run tests, tweak, and repeat until tests run green.
  4. Repeat with another API that ducks.

As noted before, without rewriting code to call the appropriate new ClassName, maintainers and security auditors get the benefits of:

  • constructors that check type invariants at new time,
  • having a place to put code that coerces untrusted structured inputs to trustworthy structured values.

About

Assembles instances of class types from bags of properties.

Resources

License

Stars

Watchers

Forks

Packages

No packages published