This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Dumping an intermediate representation #2493

Closed
wants to merge 7 commits into from

Contributor

@NTillmann NTillmann commented Aug 28, 2018

Release notes: None

This resolves #1944.

The functionality that Prepack provides can be seen as being done by two separate engines:

  1. There's the "front-end", which does symbolic execution / abstract interpretation,
    computing "effects" / "generator trees" for the global code and optimized functions.
  2. There's the "back-end", which takes this intermediate representation, computes what's
    reachable, and then turns it into a new executable program by performing transformations
    such as breaking cycles.

Before, Prepack developers had very few tools available to understand what goes on between the
front-end and back-end. Usually, they would just look at the final output and imagine what the
intermediate representation might have been, or maybe use the debugger to poke around
memory locations or invoke helper functions to inspect live state, hoping the debugger doesn't crash along the way.

This PR (and other associated PRs such as #2490, #2491) tries to improve that state by turning the intermediate representation into a first-class data structure that is printable in a human-readable textual form. This will help with...

  • understanding what's going on
  • enabling different back-ends by ensuring that there's a first-class intermediate representation
  • enabling future transformations on a well-defined intermediate data structure, ideally breaking up what the current serializer implementation does
  • new ways of testing, e.g. via snapshots of the intermediate representation, or
    (once there is a parser) feeding hand-written IR into the back-end
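
The snapshot-testing idea in the last bullet could look roughly like the following. This is only a sketch under assumptions: `diffIRSnapshot` is a hypothetical helper, not part of Prepack's test-runner, and it compares two IR dumps as plain text, line by line.

```javascript
// Hypothetical helper: compares a stored IR snapshot with a freshly dumped IR text.
// Returns one record per differing line; an empty array means the snapshot matches.
function diffIRSnapshot(expectedText, actualText) {
  const expected = expectedText.split("\n");
  const actual = actualText.split("\n");
  const differences = [];
  const lineCount = Math.max(expected.length, actual.length);
  for (let i = 0; i < lineCount; i++) {
    if (expected[i] !== actual[i]) {
      // Lines are reported 1-based; a missing line shows up as undefined.
      differences.push({ line: i + 1, expected: expected[i], actual: actual[i] });
    }
  }
  return differences;
}
```

A test harness could fail whenever the returned array is non-empty and print the differing lines to show what changed in the IR.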

Example 1

For example, for this program...

let x = __abstract("number", "(x)");
if (x > 42) {
  let t = Date.now();
  let u = 2 + 3
  console.log(t + u);
} else {
  global.result = x;
}

... the IR would currently look like this:

(entry point): "main(#0)"
  * value#0 = ">"(@"(x)", 42)
  if value#0
    then: "evaluateNodeForEffects(#4)"
      path conditions value#0
      _$0 := ABSTRACT_FROM_TEMPLATE<template source @"global.Date.now()">[isPure]
      * value#1 = "+"(5, _$0)
      CONSOLE_LOG("log", value#1)
    else: "evaluateNodeForEffects(#11)"
      * value#2 = "!"(value#0)
      path conditions value#2
      GLOBAL_ASSIGNMENT(@"(x)", "result")

Notes:

  • Indentation reflects structure of the generator tree
  • Each line encodes some information. Some of the information is atemporal, e.g.
    • * value#N = f(args) // atemporal abstract value
  • Some of the information is temporal, e.g.
    • _$N := op-type<op-data>(args)[metadata] // generator entry that defines a temporal value
      • op-data contains various essential data
      • args tends to be a projection of data capturing all values that must be visited, but it's not consistent
      • metadata is information that helps the visitor compute minimal reachability, but it's not semantically relevant.
    • op-type<op-data>(args)[metadata] // generator entry that does not define a temporal value
    • if ... then ... else.
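
To make the line format in these notes concrete, here is a minimal sketch of a printer that produces lines of this shape. The entry objects (`kind`, `id`, `op`, `opType`, `data`, `args`, `metadata`) and both function names are assumptions made for illustration; Prepack's actual generator entry classes are different.

```javascript
// Formats one entry using the line shapes described in the notes above.
// The entry layout is hypothetical.
function formatEntry(entry) {
  const args = entry.args || [];
  const metadata = entry.metadata || [];
  if (entry.kind === "atemporal") {
    // * value#N = f(args), with the operator quoted as in Example 1
    return `* ${entry.id} = ${JSON.stringify(entry.op)}(${args.join(", ")})`;
  }
  // [_$N := ]op-type<op-data>(args)[metadata]; optional parts are omitted when empty
  const target = entry.id !== undefined ? `${entry.id} := ` : "";
  const data = entry.data !== undefined ? `<${entry.data}>` : "";
  const argList = args.length > 0 ? `(${args.join(", ")})` : "";
  const meta = metadata.length > 0 ? `[${metadata.join(", ")}]` : "";
  return `${target}${entry.opType}${data}${argList}${meta}`;
}

// Prints the entries of a generator; indentation grows with nesting depth,
// and "if" entries recurse into their two child generators.
function printGenerator(generator, depth = 0, lines = []) {
  const indent = "  ".repeat(depth);
  for (const entry of generator.entries) {
    if (entry.kind === "if") {
      lines.push(`${indent}if ${entry.condition}`);
      lines.push(`${indent}  then: ${JSON.stringify(entry.consequent.name)}`);
      printGenerator(entry.consequent, depth + 2, lines);
      lines.push(`${indent}  else: ${JSON.stringify(entry.alternate.name)}`);
      printGenerator(entry.alternate, depth + 2, lines);
    } else {
      lines.push(indent + formatEntry(entry));
    }
  }
  return lines;
}
```

Keeping the per-entry formatting separate from the tree walk means a future parser would only need to invert `formatEntry` line by line and recover nesting from indentation.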

Example 2

(function() {
    let obj = {};
    obj.p = obj;
    global.result = obj;
})();

=>

(entry point): "main(#0)"
  * object#14 = ObjectValue(properties [p], $Prototype @"Object.prototype")
  * object#14.p = PropertyBinding(descriptor PropertyDescriptor(writable, enumerable, configurable, value object#14))
  GLOBAL_ASSIGNMENT(object#14, "result")
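
The dump above stays finite even though obj refers to itself, because the object is declared once under an id and its property then refers to that id. A minimal sketch of that idea on plain JavaScript objects follows; `dumpObjectGraph` and its output format are hypothetical, only loosely modelled on the dump, and are not Prepack's printer.

```javascript
// Prints a plain object graph, declaring each object once under a fresh id.
// Object-valued properties are printed as references to ids, so cycles terminate.
function dumpObjectGraph(root) {
  const ids = new Map(); // object -> "object#N"
  const lines = [];
  const describe = value => {
    if (typeof value !== "object" || value === null) return JSON.stringify(value);
    if (ids.has(value)) return ids.get(value); // already declared: emit a reference only
    const id = `object#${ids.size}`;
    ids.set(value, id); // register before visiting properties so self-references resolve
    const keys = Object.keys(value);
    lines.push(`* ${id} = ObjectValue(properties [${keys.join(", ")}])`);
    for (const key of keys) {
      // A nested object is declared (its lines pushed) before the binding that refers to it.
      lines.push(`* ${id}.${key} = ${describe(value[key])}`);
    }
    return id;
  };
  describe(root);
  return lines;
}
```

For the cyclic object of Example 2 this yields two lines, the declaration of object#0 and the binding object#0.p = object#0.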

Example 3

Things get slightly ugly with functions.

function f() { }

=>

(entry point): "main(#0)"
  * declEnv#1 = DeclarativeEnvironmentRecord()
  * globEnv#2 = GlobalEnvironmentRecord($DeclarativeRecord declEnv#1, $ObjectRecord declEnv#1, $VarNames [f], $GlobalThisValue global)
  * lexEnv#0 = LexicalEnvironment(destroyed, environment record globEnv#2)
  * func#13 = ECMAScriptSourceFunctionValue($ConstructorKind base, $ThisMode global, $FunctionKind normal, $FormalParameters 0, $Environment lexEnv#0, properties [arguments, length, caller, prototype, name], $Prototype @"Function.prototype")
  * func#13.arguments = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value undefined))
  * func#13.length = PropertyBinding(descriptor PropertyDescriptor(configurable, value 0))
  * func#13.caller = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value undefined))
  * object#14 = ObjectValue(properties [constructor], $Prototype @"Object.prototype")
  * object#14.constructor = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value func#13))
  * func#13.prototype = PropertyBinding(descriptor PropertyDescriptor(writable, value object#14))
  * func#13.name = PropertyBinding(descriptor PropertyDescriptor(configurable, value "f"))
  GLOBAL_ASSIGNMENT(func#13, "f")

Example 4

Things get really ugly with optimized functions, but that's what it is right now:

function f() { return 2 + 5; }
__optimize(f);

=>

(entry point): "main(#0)"
  * declEnv#1 = DeclarativeEnvironmentRecord()
  * globEnv#2 = GlobalEnvironmentRecord($DeclarativeRecord declEnv#1, $ObjectRecord declEnv#1, $VarNames [f], $GlobalThisValue global)
  * lexEnv#0 = LexicalEnvironment(destroyed, environment record globEnv#2)
  * func#13 = ECMAScriptSourceFunctionValue($ConstructorKind base, $ThisMode global, $FunctionKind normal, $FormalParameters 0, $Environment lexEnv#0, properties [arguments, length, caller, prototype, name], $Prototype @"Function.prototype")
  * func#13.arguments = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value undefined))
  * func#13.length = PropertyBinding(descriptor PropertyDescriptor(configurable, value 0))
  * func#13.caller = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value undefined))
  * object#15 = ObjectValue(properties [constructor], $Prototype @"Object.prototype")
  * object#15.constructor = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value func#13))
  * func#13.prototype = PropertyBinding(descriptor PropertyDescriptor(writable, value object#15))
  * func#13.name = PropertyBinding(descriptor PropertyDescriptor(configurable, value "f"))
  GLOBAL_ASSIGNMENT(func#13, "f")
=== optimized function func#13
  (entry point): "AdditionalFunctionEffects(#12)"
    RETURN(7)
  * object#16 = ArgumentsExotic(properties [length, callee], symbols [@"Symbol.iterator"], $Prototype @"Object.prototype")
  * object#16.length = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value 0))
  * declEnv#1 = DeclarativeEnvironmentRecord()
  * globEnv#2 = GlobalEnvironmentRecord($DeclarativeRecord declEnv#1, $ObjectRecord declEnv#1, $VarNames [f], $GlobalThisValue global)
  * lexEnv#0 = LexicalEnvironment(destroyed, environment record globEnv#2)
  * func#13 = ECMAScriptSourceFunctionValue($ConstructorKind base, $ThisMode global, $FunctionKind normal, $FormalParameters 0, $Environment lexEnv#0, properties [arguments, length, caller, prototype, name], $Prototype @"Function.prototype")
  * func#13.arguments = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value undefined))
  * func#13.length = PropertyBinding(descriptor PropertyDescriptor(configurable, value 0))
  * func#13.caller = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value undefined))
  * object#15 = ObjectValue(properties [constructor], $Prototype @"Object.prototype")
  * object#15.constructor = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value func#13))
  * func#13.prototype = PropertyBinding(descriptor PropertyDescriptor(writable, value object#15))
  * func#13.name = PropertyBinding(descriptor PropertyDescriptor(configurable, value "f"))
  * object#16.callee = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value func#13))
  * object#16.@"Symbol.iterator" = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value @"Array.prototype.values"))
  * object#16.$Prototype = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value @"Object.prototype"))
  * object#16.$Extensible = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value true))
  * object#16._isPartial = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#16._isLeaked = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#16._isSimple = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#16._simplicityIsTransitive = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#16._isFinal = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#17 = ObjectValue($Prototype null)
  * object#17.$Prototype = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value null))
  * object#17.$Extensible = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value true))
  * object#17._isPartial = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#17._isLeaked = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#17._isSimple = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#17._simplicityIsTransitive = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#17._isFinal = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#18 = ObjectValue(properties [next], $Prototype @"([][Symbol.iterator]().__proto__.__proto__)")
  * func#19 = NativeFunctionValue(properties [length, name], $Prototype @"Function.prototype")
  * func#19.length = PropertyBinding(descriptor PropertyDescriptor(configurable, value 0))
  * func#19.name = PropertyBinding(descriptor PropertyDescriptor(configurable, value "next"))
  * object#18.next = PropertyBinding(descriptor PropertyDescriptor(writable, configurable, value func#19))
  * object#18.$Prototype = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value @"([][Symbol.iterator]().__proto__.__proto__)"))
  * object#18.$Extensible = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value true))
  * object#18._isPartial = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#18._isLeaked = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#18._isSimple = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#18._simplicityIsTransitive = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#18._isFinal = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * object#18.$IteratedList = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(some array))
  * func#19.$Prototype = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value @"Function.prototype"))
  * func#19.$Extensible = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value true))
  * func#19._isPartial = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * func#19._isLeaked = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * func#19._isSimple = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * func#19._simplicityIsTransitive = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  * func#19._isFinal = PropertyBinding(internal slot, descriptor InternalSlotDescriptor(value false))
  modified property bindings: [object#16.$Prototype, object#16.$Extensible, object#16._isPartial, object#16._isLeaked, object#16._isSimple, object#16._simplicityIsTransitive, object#16._isFinal, object#17.$Prototype, object#17.$Extensible, object#17._isPartial, object#17._isLeaked, object#17._isSimple, object#17._simplicityIsTransitive, object#17._isFinal, object#16.length, object#16.@"Symbol.iterator", object#16.callee, object#18.$Prototype, object#18.$Extensible, object#18._isPartial, object#18._isLeaked, object#18._isSimple, object#18._simplicityIsTransitive, object#18._isFinal, object#18.$IteratedList, func#19.$Prototype, func#19.$Extensible, func#19._isPartial, func#19._isLeaked, func#19._isSimple, func#19._simplicityIsTransitive, func#19._isFinal, func#19.length, func#19.name, object#18.next]
  created objects: [object#16, object#17, object#18, func#19]
  result: SimpleNormalCompletion(value 7)

Future work (not for this PR)

There are still a good number of things left to do. In particular:

  • further simplify printing to make it more readable (and writable)
  • further extend printed format to make it round-trippable
  • some details of the current IR are really just artefacts of 2 years of hacking. The IR is in need of some additional
    rounds of refactoring and simplification.
    • Particularly problematic / overly complicated are invariants INVARIANT, FULL_INVARIANT_ABSTRACT, FOR_IN, REACT_SSR_TEMPLATE_LITERAL
    • In TemporalOperationEntry, there's some duplication going on with args and data. Consider eliminating args and deriving this information when needed from data.
    • OperationDescriptorData could use some structure, or a (sub)type hierarchy.
    • There are already dedicated generator entry classes for some operations, and then there is TemporalOperationEntry with its data dumping ground for everything else. This is all a bit arbitrary and should be unified.
    • ...there's much more cruft.

In a way, this PR just provides yet another way of dumping values and generators. Some of the other existing ways should be consolidated or killed.

Added option --ir to test-runner to activate (and test) IR dumping.

@trueadm trueadm added the WIP This pull request is a work in progress and not ready for merging. label Aug 28, 2018
@NTillmann NTillmann force-pushed the TextPrinter branch 2 times, most recently from 34ed021 to 38c8203 Compare August 28, 2018 17:21
Contributor

@hermanventer hermanventer left a comment

Looks good so far.

@@ -365,6 +373,20 @@ function run(
flags
);
if (heapGraphFilePath !== undefined) resolvedOptions.heapGraphFormat = "DotLanguage";
if (dumpIRFilePath !== undefined) {
resolvedOptions.onExecute = realm => {
Contributor

I find the "onExecute" name a bit surprising. This is called when Abstract Interpretation (execution) is completed, right?

Contributor Author

Right, name matches onParse, which is called after parsing...

Contributor

Both names are confusing. If anything, they should get the prefix "after" or something similar to show that they happen after the action. We might as well get the naming right for both in this PR?
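
The timing being discussed can be sketched as follows. Only the hook names onParse and onExecute come from this thread; `runPipeline` and the stand-in `ast` and `realm` objects are hypothetical and do not reflect Prepack's real driver.

```javascript
// Hypothetical two-phase driver showing when each hook fires.
function runPipeline(source, options = {}) {
  const ast = { type: "Program", source }; // stand-in for real parsing
  // Despite its name, onParse runs once parsing has finished.
  if (options.onParse !== undefined) options.onParse(ast);
  const realm = { ast, executed: true }; // stand-in for abstract interpretation
  // Likewise, onExecute runs once execution has finished, which is when IR can be dumped.
  if (options.onExecute !== undefined) options.onExecute(realm);
  return realm;
}
```

Since both callbacks run after their phase completes, names along the lines of afterParse and afterExecute would describe the same behaviour more directly.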

let dataTexts = [];
if (args.length > 0) dataTexts.push(`args ${this.describeValues(args)}`);
if (data.unaryOperator !== undefined) dataTexts.push(data.unaryOperator); // used by UNARY_EXPRESSION
if (data.binaryOperator !== undefined) dataTexts.push(data.binaryOperator); // used by BINARY_EXPRESSION
Contributor

Presumably these are mutually exclusive and this if can be inside an else.

Contributor Author

@NTillmann NTillmann Aug 28, 2018

Lots of invariants over this data structure could be enforced somewhere and then assumed here. But that's for another PR... Here, I'll make the entries unique (there should be a parser one day).
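
One way to make the entries unique, so that a future parser could tell them apart, is to label each one with its field name. The sketch below is an assumption about what that could look like, not the change that landed; `describeOperationData` is a hypothetical stand-alone function, and only the field names unaryOperator and binaryOperator come from the snippet under review.

```javascript
// Hypothetical variant: every defined field is printed as "<field name> <value>",
// so two fields holding the same text can no longer be confused.
function describeOperationData(data) {
  const dataTexts = [];
  // Sorting the keys keeps the output stable regardless of property insertion order.
  for (const key of Object.keys(data).sort()) {
    if (data[key] !== undefined) dataTexts.push(`${key} ${JSON.stringify(data[key])}`);
  }
  return dataTexts.join(", ");
}
```

With labelled entries, the printer does not need to rely on the fields being mutually exclusive.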

Contributor

@trueadm trueadm left a comment

This is a great step in the right direction and is going to be invaluable for working with the Prepack internals. Maybe we want to add tests that check this output at a later point?

Release notes: None

The functionality that Prepack provides can be seen as being done by two separate engines:
1) There's the "front-end", which does symbolic execution / abstract interpretation,
   computing "effects" / "generator trees" for the global code and optimized functions.
2) There's the "back-end", which takes this intermediate representation, computes what's
   reachable, and then turns it into a new executable program by performing transformations
   such as breaking cycles.

Prepack developers have few tools available to understand what goes on between the
front-end and back-end. Usually, they just look at the final output, imagine what the
intermediate representation might have been, or maybe use the debugger to poke around
memory locations.

This PR (and other associated PRs) tries to improve that state by turning the intermediate
representation into a first-class printable and human-readable data structure. This will help...
- in understanding what's going on
- enabling different back-ends by ensuring that there's a first-class intermediate representation
- enabling future transformations on a well-defined intermediate data structure
- new ways of testing, e.g. via snapshots of the intermediate representation, or
  (once there is a parser) feeding hand-written IR into the back-end

There are still a good number of things left to do. In particular:
- discovering and printing (nested) optimized functions
- printing the structure of abstract values and objects
- some form of testing
- the current IR is really just an artefact of 2 years of hacking. It is in need of some additional
  rounds of refactoring.
Add mode --ir to test-runner, where it records and prints the IR whenever it would also print generated code.
@NTillmann NTillmann removed the WIP This pull request is a work in progress and not ready for merging. label Aug 29, 2018
@NTillmann NTillmann changed the title [WIP] Dumping an intermediate representation Dumping an intermediate representation Aug 29, 2018

@facebook-github-bot facebook-github-bot left a comment

NTillmann is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Development

Successfully merging this pull request may close these issues.

Create serialization format for Effects/completion/generators
4 participants