Skip to content
This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Add serializer mode for test262 execution #2290

Closed

Conversation

calebmer
Copy link
Contributor

I want to better understand the bugs we have in abstract evaluation. One of my ideas to do this is to replace all the literals in test262 tests with abstract values and see what our test coverage is. The first step to do this is to serialize the test262 sources after running them in Prepack and checking if their generated JavaScript runs correctly. This PR does that by adding a --serializer flag.

With my methodology, all test262 harnesses are serialized every time since they are in the global scope. This unfortunately seems to be unavoidable since doing otherwise would change program semantics and break the tests.

Running time yarn test-test262 takes about 2 minutes and has a 98% pass rate. Running time yarn test-test262 --serializer takes about 5 minutes and has a 95% pass rate. Here’s a diff of the two results.

Here is a selection of some of the bugs in the serializer caught by running test262 with the Prepack serializer. I might open issues for these, but they can be a bit pedantic. I might fix some of them depending on how important they are to the React code we want to compile.

Notably we have:

  • Lots of invariants being triggered. I particularly saw a lot of this invariant. (An example of this is 07.md.)
  • Lots of values that are visited, but not serialized. (An example of this is 01.md.)
  • Some incorrect outputs only caught by executing code. Could not be caught by static analysis. Particularly around class serialization.

I’ll build off this PR to get coverage for the abstract evaluator next, but I found these results interesting on their own so decided to do this in two PRs.

@NTillmann
Copy link
Contributor

NTillmann commented Jul 18, 2018

Regarding "all native function values should be intrinsics": There was a discussion on that here #1285 --- maybe that issue got closed prematurely. I don't think this is terribly relevant, might just be triggered by some common test code path.

@NTillmann
Copy link
Contributor

Lots of values that are visited, but not serialized.

That's not good. However, your examples seem to all involve classes, which wouldn't surprise me as that's an undertested feature (it's not ES5), and comes with an open issue mentioning some of the things that need to happen, possibly unrelated to the visiting/serialization bug you found: #1317.

@NTillmann
Copy link
Contributor

Some incorrect outputs only caught by executing code. Could not be caught by static analysis. Particularly around class serialization.

Hm, yes, classes... At the very least, Prepack should give up when encountering not actually supported class features. Great to see that we now have a way to detect them!

@calebmer
Copy link
Contributor Author

“Lots” might be a bit of an exaggeration as you note, @NTillmann. More like “lots” relative to 5% of tests that failed. A 95% pass rate is pretty good.

@NTillmann
Copy link
Contributor

NTillmann commented Jul 18, 2018

What's the change in pass rate when you exclude tests that involve classes?

}

const context = vm.createContext({
// NOTE: Workaround since Prepack serializes code that expects a global `TypedArray` class which does not exist per
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it exists in JSC? Anyway, sounds like a bug that deserves an issue number and a TODO...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2292 👍

return "Recover";
},
onParse: () => {
// TODO(calebmer): Turn all literals into abstract values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess behind some other option(s) that would allow to different levels of fuzzing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Copy link
Contributor

@NTillmann NTillmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that it only takes 5 minutes with the serializer in the loop.

@calebmer
Copy link
Contributor Author

What's the change in pass rate when you exclude tests that involve classes?

95% still. I added a basic check to exclude source files that include class. ES6 passes rise from 91% to 93% though.

=== RESULTS ===
Passes: 16238 / 16949 (95%)
ES5 passes: 11501 / 11873 (96%)
ES6 passes: 4737 / 5076 (93%)
Skipped: 14206
Timeouts: 0

I also opened some issues for problems I thought were actionable and not class related. (Since class serialization is known to have issues.)

None seem super related to React code, but #2293 might be. Haven’t audited all the NativeFunctionValues this effects.

Copy link

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@calebmer is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@calebmer calebmer deleted the test262-serializer-mode branch July 19, 2018 16:20
facebook-github-bot pushed a commit that referenced this pull request Aug 1, 2018
Summary:
I extended the `--serializer` command line argument I added in #2290 to now support `--serializer abstract-scalar`. What this mode does is it converts all boolean, string, number, and symbol literals into abstract values. I did not choose to extend this logic to object and array literals just yet since scalars alone showed some interesting results.

What I really want here is a review of the results.

Full suite execution results are real bad. **18%** pass rate. I dug a bit into why.

```
=== RESULTS ===
Passes: 3356 / 17780 (18%)
ES5 passes: 2276 / 12045 (18%)
ES6 passes: 1080 / 5735 (18%)
Skipped: 13375
Timeouts: 28
```

I was mostly interested in the runtime failures we see since that means Prepack is serializing invalid code. However, I found ~14k failures in the Prepack stage (more on this in a bit) and ~3k failures in the runtime stage. This means ~80% of tests _fail to compile_ with this abstract transformation applied.

Why are these tests failing? I took the first 4 items of the stack traces from errors thrown in the Prepack stage, sorted, and ranked them. [Here’s the result.](https://gist.github.com/calebmer/29e27613325fd99fa04be7ab4a9641c0) The top 5 with thousands of hits are:

```
7538 of:
    at AbstractValue.throwIfNotConcrete (/Users/calebmer/prepack/src/values/AbstractValue.js:536:11)
    at ToImplementation.ToStringPartial (/Users/calebmer/prepack/src/methods/to.js:717:69)
    at NativeFunctionValue._index.NativeFunctionValue [as callback] (/Users/calebmer/prepack/src/intrinsics/ecma262/String.js:34:37)
    at NativeFunctionValue.callCallback (/Users/calebmer/prepack/src/values/NativeFunctionValue.js:121:12)

4595 of:
    at AbstractValue.throwIfNotConcrete (/Users/calebmer/prepack/src/values/AbstractValue.js:536:11)
    at NativeFunctionValue.func.defineNativeMethod [as callback] (/Users/calebmer/prepack/src/intrinsics/ecma262/Object.js:328:41)
    at NativeFunctionValue.callCallback (/Users/calebmer/prepack/src/values/NativeFunctionValue.js:121:12)
    at functionCall (/Users/calebmer/prepack/src/methods/call.js:308:26)

1454 of:
    at AbstractValue.throwIfNotConcrete (/Users/calebmer/prepack/src/values/AbstractValue.js:536:11)
    at NativeFunctionValue.func.defineNativeMethod [as callback] (/Users/calebmer/prepack/src/intrinsics/ecma262/Object.js:364:41)
    at NativeFunctionValue.callCallback (/Users/calebmer/prepack/src/values/NativeFunctionValue.js:121:12)
    at functionCall (/Users/calebmer/prepack/src/methods/call.js:308:26)

1351 of:
    at invariant (/Users/calebmer/prepack/src/invariant.js:18:15)
    at EvalPropertyNamePartial (/Users/calebmer/prepack/src/evaluators/ObjectExpression.js:59:7)
    at _default (/Users/calebmer/prepack/src/evaluators/ObjectExpression.js:80:21)
    at LexicalEnvironment.evaluateAbstract (/Users/calebmer/prepack/src/environment.js:1368:20)

1053 of:
    at AbstractValue.throwIfNotConcrete (/Users/calebmer/prepack/src/values/AbstractValue.js:536:11)
    at NativeFunctionValue.obj.defineNativeMethod [as callback] (/Users/calebmer/prepack/src/intrinsics/ecma262/ObjectPrototype.js:35:39)
    at NativeFunctionValue.callCallback (/Users/calebmer/prepack/src/values/NativeFunctionValue.js:121:12)
    at functionCall (/Users/calebmer/prepack/src/methods/call.js:308:26)
```

This means there may be some low hanging fruit.

Here are my questions for you.

- Did you expect results like this?
- What is our ideal test262 pass rate with this transformation applied?
- What happens to React Compiler or other projects when these errors are thrown? (As I understand it, we bail out and don’t optimize the code, but do optimize the code around it.)
- Do you think my methodology is flawed?

It’s also possible that something in my methodology is wrong, but I didn’t spend much time investigating these failures as I spent investigating the failures I found in #2290.

My goal with this test suite is to build an understanding of what “correctness” for the React Compiler against all JavaScript code looks like. (Not just the few bundles we’ve selected to look at.) I don’t think these results suggest that we only safely compile 18% of the language, but it’s a data point. I’ll be looking into fixing a selection of these issues to better understand their nature or if I need to change methodologies.
Pull Request resolved: #2297

Differential Revision: D9120572

Pulled By: calebmer

fbshipit-source-id: b394f1e8da034c9985366010e3e63fd55fd94168
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants