This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Use integration api to run Test262 tests #1122

Closed
wants to merge 2 commits

Conversation

leobalter
Contributor

This method provides an alternative way to run Test262 through a
stream of compiled test contents for each scenario (default, strict
mode), including all the necessary harness contents.
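
For context, a rough sketch of what consuming such a stream could look
like is shown below; the package name, the field names, and the output
are assumptions made for illustration, not the exact API used in this PR.

const TestStream = require("test262-stream"); // assumed package name
const results = [];

const stream = new TestStream("test/test262");
stream.on("data", test => {
  // test.file, test.scenario, test.contents, test.attrs are assumed field names
  results.push({ file: test.file, scenario: test.scenario, size: test.contents.length });
});
stream.on("end", () => console.log(`${results.length} test scenarios streamed`));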

The logic that determines whether an execution succeeded or failed is
similar to the original one - from scripts/test262-runner.js - and
improvements should be done as follow-ups.

The filtering system is now driven by a YAML configuration file,
using test metadata and path locations. The filter is exposed as an
object that can be extended with further logic, while still offering
a single point to check for filtering.
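
As a loose illustration of that single filter object (the config file
name and fields below are hypothetical, not the ones actually used here):

// Sketch only: build one filter object from a YAML config listing excluded
// path prefixes and excluded frontmatter features.
const fs = require("fs");
const yaml = require("js-yaml"); // assumed dependency

function createFilter(configPath) {
  const config = yaml.load(fs.readFileSync(configPath, "utf8"));
  return {
    // Returns true when a test should be skipped.
    skip(test) {
      if ((config.paths || []).some(prefix => test.file.startsWith(prefix))) return true;
      if ((config.features || []).some(f => (test.attrs.features || []).includes(f))) return true;
      return false;
    },
  };
}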

The report system also requires improvements that should be done as
follow-ups. For now it provides a report for each folder plus the
total results, although the results data contain enough information
to expand the report considerably.

Some further improvements are expected and planned, but this work
should be ready for an initial round of review feedback.

@leobalter
Contributor Author

I'm looking for feedback on this PR.

It works, but as noted, it could use some further improvements. Any feedback will be really helpful for prioritizing the next steps.

cc @sebmarkbage

@hermanventer
Contributor

It would be great if you could fix the lint and Flow errors before I look at it in detail. Also, the runner itself complains that the Node version is not supported, and there seems to be an out-of-memory situation as well.

@leobalter
Contributor Author

Sure thing. I was running the tests on only a small subset of files, and I've now noticed what may be an uncaught timeout issue.

I'll push a fix soon.

@leobalter
Contributor Author

It should now pass the lint and Flow check tasks.

@NTillmann
Contributor

This new way of running Test262 seems to run far fewer tests than we used to. Is the plan to bring it up to parity with what we had before?

@leobalter
Contributor Author

@NTillmann I'm currently running only a subset of the test files while I try to fix an unexpected memory leak that shows up when we run the tests against everything.

You can find the subset definition here: https://github.com/facebook/prepack/pull/1122/files#diff-4ebecfb92903473e83654d42ad5c892cR299

That line can be removed to run the tests for every folder.

This is a work in progress and I'm still working on planned improvements, but due to its size I wanted to open it as a PR here so we can prioritize as necessary.

@hermanventer
Contributor

Is there a way to exclude tests for errors that should be caught during parsing? For example all tests that expect syntax errors.

  return new Result(true);
} catch (err) {
  if (err.value && err.value.$Prototype && err.value.$Prototype.intrinsicName === "SyntaxError.prototype") {
    // Skip test
Contributor

I'm not clear on the intent of this check. Perhaps expand the comment with the rationale?

Contributor Author

This is not very clear to me either. I reused most of the logic from the original test execution.

https://github.com/facebook/prepack/blob/master/scripts/test262-runner.js#L994-L995

My plan was to tackle the stream events first, so I haven't yet done a proper refactoring of the logic that interprets the execution result. This is definitely a place for improvement, but as far as I can tell, this is a way to skip tests that fail with an early SyntaxError. Does that sound right?

Contributor

We do want to skip tests that check for early SyntaxErrors, because dealing with those is out of scope for Prepack. Ideally, there would be a tag in the tests that we can use to exclude them. Absent such a tag, the old framework resorts to running the test and then ignoring the result if it fails with a SyntaxError. It seems from the output of the test run that many tests are still visibly failing because of late SyntaxError detection, so something needs to get fixed here.

Contributor Author

Sure! We have the negative tags in the frontmatter metadata, and I'm using those for the filtering.

With that filter in place, I can work on improving the verification process for the execution results.

@leobalter
Contributor Author

> Is there a way to exclude tests for errors that should be caught during parsing? For example all tests that expect syntax errors.

I can add a filter for negative tests. I've been thinking that a separation between runtime and early errors in the negative flags of the YAML config file would be a fine way to start:

negative:
  - runtime # filter all the runtime errors specified by the negative metadata flag 
  - early # filter all the tests for early errors

What do you think? I'll push a patch for this today.
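
Purely as a sketch of how those two entries could map to a test's
frontmatter (the phase names are assumptions, not a statement about the
exact Test262 metadata values):

// Hypothetical check: given the negative list from the YAML config
// (e.g. ["early"] or ["runtime", "early"]) and a test's parsed frontmatter,
// decide whether the test should be filtered out.
function isFilteredNegative(configNegative, attrs) {
  if (!attrs.negative) return false;
  // Assumed convention: "early" covers errors detected before execution,
  // everything else counts as a runtime error.
  const kind = attrs.negative.phase === "early" ? "early" : "runtime";
  return configNegative.includes(kind);
}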

@leobalter
Contributor Author

OK, I pushed a custom-arguments feature with most of the options already present in the old test262-runner file.

The new commits also use the negative filter for early errors.

Everything passes the Flow check and ESLint locally. I need to wait a bit longer for the CI results.

@leobalter
Contributor Author

I squashed and rebased the commits, fixing a conflict in the yarn.lock file.

@leobalter
Contributor Author

Please let me know if there is any other feedback on this work. When it's ready to land, I can fix the yarn.lock conflict, rebase, and push again.

@hermanventer
Contributor

The filtering feature looks great and I would love to see us making use of it. Unfortunately, this test runner in its current form has quite a way to go before it gets to feature parity with the existing test runner.

Perhaps you can just add this as an additional test runner (running against a different clone of the test262 repository) for the time being. We can then gradually add the missing functionality and make the final switch over only when the development workflow will not be impacted by it.

package.json Outdated
@@ -34,7 +34,7 @@
"test-serializer": "babel-node --stack_trace_limit=200 --stack_size=10000 scripts/test-runner.js",
"test-serializer-with-coverage": "./node_modules/.bin/istanbul cover ./lib/test-error-handler.js --dir coverage.error && ./node_modules/.bin/istanbul cover ./lib/test-runner.js && ./node_modules/.bin/remap-istanbul -i coverage.error/coverage.json -i coverage/coverage.json -o coverage-sourcemapped -t html",
"test-sourcemaps": "babel-node scripts/generate-sourcemaps-test.js && bash < scripts/test-sourcemaps.sh",
"test-test262": "babel-node scripts/test262-runner.js",
Contributor

Let's keep this.

package.json Outdated
@@ -34,7 +34,7 @@
"test-serializer": "babel-node --stack_trace_limit=200 --stack_size=10000 scripts/test-runner.js",
"test-serializer-with-coverage": "./node_modules/.bin/istanbul cover ./lib/test-error-handler.js --dir coverage.error && ./node_modules/.bin/istanbul cover ./lib/test-runner.js && ./node_modules/.bin/remap-istanbul -i coverage.error/coverage.json -i coverage/coverage.json -o coverage-sourcemapped -t html",
"test-sourcemaps": "babel-node scripts/generate-sourcemaps-test.js && bash < scripts/test-sourcemaps.sh",
"test-test262": "babel-node scripts/test262-runner.js",
"test-test262": "babel-node scripts/test262.js",
Contributor

name this test-test262-new

@leobalter
Contributor Author

> Unfortunately, this test runner in its current form has quite a way to go before it gets to feature parity with the existing test runner.

Would you mind listing what is missing for feature parity from your perspective?

While I work on separating the two runners, I'd be interested in eventually migrating fully to the new one.

@leobalter force-pushed the test262 branch 2 times, most recently from 846e818 to a6617a6 on November 10, 2017 at 19:42
@hermanventer
Contributor

hermanventer commented Nov 10, 2017

The most important pain point is the lack of concurrency. It is important that on a beefy laptop the test runner uses all available resources. On my laptop the entire suite runs in 1.30 mins.

Next, it is important to control console spew. When running the tests for validation purposes I want to see at most a concise summary of what failed. Preferably (and this is not realized in the current runner, but could be) I want to hear only about new failures.

Lastly, it must be easy to run an individual test in a single thread in such a way that VSCode can run it in debug mode without much fuss.
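
As an illustration of the concurrency point only (the worker script path
and the message shape are made up for this sketch):

// Sketch: split the test files across one forked child process per CPU core.
// scripts/test262-worker.js is a hypothetical worker entry point that runs
// the files it receives and sends back an array of results.
const os = require("os");
const { fork } = require("child_process");

function runInParallel(testFiles) {
  const workers = os.cpus().length;
  const chunks = Array.from({ length: workers }, () => []);
  testFiles.forEach((file, i) => chunks[i % workers].push(file));

  return Promise.all(
    chunks.map(
      chunk =>
        new Promise((resolve, reject) => {
          const child = fork("scripts/test262-worker.js");
          child.on("message", results => {
            child.kill();
            resolve(results);
          });
          child.on("error", reject);
          child.send(chunk);
        })
    )
  ).then(perWorker => [].concat(...perWorker));
}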

@hermanventer
Contributor

Using the updated version of test/test262 causes the existing test runner to skip many more tests, so the overall run fails because it still expects to run (and pass) many more tests. You can either update the expected number of passing tests, or use a separate directory for the updated test262.

@hermanventer
Contributor

Are you planning to do any more work on this?

@leobalter
Contributor Author

> Are you planning to do any more work on this?

Sure thing. We are finally past the TC39 meeting and I'm dedicating some time this week to work on this branch.

The es5id and es6id tags are legacy flags from Test262. They are mostly used to mark features implemented in ES5.1 or ES6 that have not been updated in later spec editions, and they hold a reference to the section number of the respective spec edition. New tests are supposed to carry only the esid tag, which matches sections by their HTML anchor ids.

With the recent updates to Test262 these numbers are changing, so the threshold for failures on tests with the es5id and es6id tags needs to reflect the same change.
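
For readers unfamiliar with these tags, a made-up example of the
difference (neither header is a real test from the suite):

// Older-style test header, pinned to a section number of a specific edition:
/*---
es6id: 22.1.3.1
description: hypothetical example of a legacy section-number tag
---*/

// Newer-style test header, using the spec's HTML anchor id instead:
/*---
esid: sec-array.prototype.concat
description: hypothetical example of an esid tag
---*/
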
@leobalter
Contributor Author

I squashed the previous commits and added f5f0a6a to fix the CI run. The tests are running fine on my end, and I believe CI should be green soon.

> Using the updated version of test/test262 causes the existing test runner to skip many more tests, so the overall run fails because it still expects to run (and pass) many more tests. You can either update the expected number of passing tests, or use a separate directory for the updated test262.

I just reused the same directory for Test262 to avoid extra noise. Beyond the filters, I expect fewer es5id and es6id tags over time, since improved or refactored tests may not reuse them. Test262 could definitely use a better flag system for features added in a specific edition or timeframe, but those legacy tags are not the best thing to maintain in Test262.

Meanwhile, I have some thoughts based on your feedback that we can follow up on, probably after this PR.

While not faster, the new test262 runner makes it easy to define specific folders to run the tests from. Using workers is definitely interesting from a development point of view; getting faster results is really a nice-to-have feature.

One thing we could also do is work together to define and design the desired output for the runner. I'm sure a few conversations would let us figure out what is most human-friendly for consuming a large set of test results.

I'm totally game, if time allows, to have the test runner iterate only over new or modified tests, since the results are already exported and we can reuse them. This should also sit behind a flag (or a parameter), as new versions of Prepack would probably require a fresh run through all the tests.

@hermanventer
Contributor

Running the full test suite in parallel is an important scenario for us as we often tweak the interpreter to deal with abstract values. We also would like to improve the interpreter to the point where it can truly be a reference implementation for the specification. All this is easier if we can quickly zero in on any test that starts failing because of a change we made.

Probably the best way to get there is to have a baseline file recording the expected failures and then have the runner display only the differences from the baseline. Some care will be needed to make the results deterministic even when running in parallel.
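
A minimal sketch of that idea, with a made-up baseline file name:

// Sketch only: read a checked-in list of expected failures (one test path
// per line) and report just the failures that are not in the baseline.
// Sorting keeps the output deterministic even when results arrive from
// parallel workers.
const fs = require("fs");

function reportNewFailures(currentFailures, baselinePath = "test262-expected-failures.txt") {
  const baseline = new Set(
    fs.readFileSync(baselinePath, "utf8").split("\n").filter(Boolean)
  );
  const fresh = currentFailures.filter(name => !baseline.has(name)).sort();
  fresh.forEach(name => console.log(`NEW FAILURE: ${name}`));
  return fresh;
}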

@facebook-github-bot left a comment

@hermanventer is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot left a comment

@hermanventer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
