This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Use integration api to run Test262 tests #1122

Closed
wants to merge 2 commits

Conversation

leobalter
Contributor

This method provides an alternative way to run Test262 through a
stream of compiled test contents for each scenario (default, strict
mode), including all the necessary harness contents.
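
For context, a rough sketch of what consuming such a stream could look
like is shown below; the package name, the field names, and the output
are assumptions made for illustration, not the exact API used in this PR.

const TestStream = require("test262-stream"); // assumed package name
const results = [];

const stream = new TestStream("test/test262");
stream.on("data", test => {
  // test.file, test.scenario, test.contents, test.attrs are assumed field names
  results.push({ file: test.file, scenario: test.scenario, size: test.contents.length });
});
stream.on("end", () => console.log(`${results.length} test scenarios streamed`));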

The logic that determines whether an execution succeeded or failed is
similar to the original one - from scripts/test262-runner.js - and
improvements should be done as follow-ups.

The filtering system is now driven by a YAML configuration file,
using test metadata and path locations. The filter is exposed as an
object that can be extended with further logic, while still offering
a single point to check for filtering.
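
As a loose illustration of that single filter object (the config file
name and fields below are hypothetical, not the ones actually used here):

// Sketch only: build one filter object from a YAML config listing excluded
// path prefixes and excluded frontmatter features.
const fs = require("fs");
const yaml = require("js-yaml"); // assumed dependency

function createFilter(configPath) {
  const config = yaml.load(fs.readFileSync(configPath, "utf8"));
  return {
    // Returns true when a test should be skipped.
    skip(test) {
      if ((config.paths || []).some(prefix => test.file.startsWith(prefix))) return true;
      if ((config.features || []).some(f => (test.attrs.features || []).includes(f))) return true;
      return false;
    },
  };
}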

The report system also requires improvements that should be done as
follow-ups. For now it provides a report for each folder plus the
total results, although the results data contain enough information
to expand the report considerably.

Some further improvements are expected and planned, but this work
should be ready for an initial round of review feedback.

@leobalter
Contributor Author

I'm looking for feedback on this PR.

It works, but as noted, it could use some further improvements. Any feedback will be really helpful for prioritizing the next steps.

cc @sebmarkbage

@hermanventer
Contributor

It would be great if you could fix the lint and Flow errors before I look at it in detail. Also, the runner itself complains that the Node version is not supported, and there seems to be an out-of-memory situation as well.

@leobalter
Contributor Author

Sure thing. I was running the tests on only a small subset of files, and I've now noticed what may be an uncaught timeout issue.

I'll push a fix soon.

@leobalter
Contributor Author

It should now pass the lint and Flow check tasks.

@NTillmann
Contributor

This new way of running Test262 seems to run far fewer tests than we used to. Is the plan to bring it up to parity with what we had before?

@leobalter
Contributor Author

@NTillmann I'm currently running only a subset of the test files while I try to fix an unexpected memory leak that shows up when we run the tests against everything.

You can find the subset definition here: https://github.com/facebook/prepack/pull/1122/files#diff-4ebecfb92903473e83654d42ad5c892cR299

That line can be removed to run the tests for every folder.

This is a work in progress and I'm still working on planned improvements, but due to its size I wanted to open it as a PR here so we can prioritize as necessary.

@hermanventer
Contributor

Is there a way to exclude tests for errors that should be caught during parsing? For example all tests that expect syntax errors.

  return new Result(true);
} catch (err) {
  if (err.value && err.value.$Prototype && err.value.$Prototype.intrinsicName === "SyntaxError.prototype") {
    // Skip test
Contributor

I'm not clear on the intent of this check. Perhaps expand the comment with the rationale?

Contributor Author

This is not very clear to me either. I reused most of the logic from the original test execution.

https://github.com/facebook/prepack/blob/master/scripts/test262-runner.js#L994-L995

My plan was to tackle the stream events first, so I haven't yet done a proper refactoring of the logic that interprets the execution result. This is definitely a place for improvement, but as far as I can tell, this is a way to skip tests that fail with an early SyntaxError. Does that sound right?

Contributor

We do want to skip tests that check for early SyntaxErrors, because dealing with those is out of scope for Prepack. Ideally, there would be a tag in the tests that we can use to exclude them. Absent such a tag, the old framework resorts to running the test and then ignoring the result if it fails with a SyntaxError. It seems from the output of the test run that many tests are still visibly failing because of late SyntaxError detection, so something needs to get fixed here.

Contributor Author

Sure! We have the negative tags in the frontmatter metadata, and I'm using those for the filtering.

With that filter in place, I can work on improving the verification process for the execution results.

@leobalter
Contributor Author

> Is there a way to exclude tests for errors that should be caught during parsing? For example all tests that expect syntax errors.

I can add a filter for negative tests. I've been thinking that a separation between runtime and early errors in the negative flags of the YAML config file would be a fine way to start:

negative:
  - runtime # filter all the runtime errors specified by the negative metadata flag 
  - early # filter all the tests for early errors

What do you think? I'll push a patch for this today.
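
Purely as a sketch of how those two entries could map to a test's
frontmatter (the phase names are assumptions, not a statement about the
exact Test262 metadata values):

// Hypothetical check: given the negative list from the YAML config
// (e.g. ["early"] or ["runtime", "early"]) and a test's parsed frontmatter,
// decide whether the test should be filtered out.
function isFilteredNegative(configNegative, attrs) {
  if (!attrs.negative) return false;
  // Assumed convention: "early" covers errors detected before execution,
  // everything else counts as a runtime error.
  const kind = attrs.negative.phase === "early" ? "early" : "runtime";
  return configNegative.includes(kind);
}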

@leobalter
Contributor Author

OK, I pushed a custom-arguments feature with most of the options already present in the old test262-runner file.

The new commits also use the negative filter for early errors.

Everything passes the Flow check and ESLint locally. I need to wait a bit longer for the CI results.

@leobalter
Contributor Author

I squashed and rebased the commits, fixing a conflict in the yarn.lock file.

@leobalter
Contributor Author

Please let me know if there is any other feedback on this work. When it's ready to land, I can fix the yarn.lock conflict, rebase, and push again.

@hermanventer
Contributor

The filtering feature looks great and I would love to see us making use of it. Unfortunately, this test runner in its current form has quite a way to go before it gets to feature parity with the existing test runner.

Perhaps you can just add this as an additional test runner (running against a different clone of the test262 repository) for the time being. We can then gradually add the missing functionality and make the final switch over only when the development workflow will not be impacted by it.

package.json Outdated
@@ -34,7 +34,7 @@
"test-serializer": "babel-node --stack_trace_limit=200 --stack_size=10000 scripts/test-runner.js",
"test-serializer-with-coverage": "./node_modules/.bin/istanbul cover ./lib/test-error-handler.js --dir coverage.error && ./node_modules/.bin/istanbul cover ./lib/test-runner.js && ./node_modules/.bin/remap-istanbul -i coverage.error/coverage.json -i coverage/coverage.json -o coverage-sourcemapped -t html",
"test-sourcemaps": "babel-node scripts/generate-sourcemaps-test.js && bash < scripts/test-sourcemaps.sh",
"test-test262": "babel-node scripts/test262-runner.js",
Contributor

Let's keep this.

package.json Outdated
@@ -34,7 +34,7 @@
"test-serializer": "babel-node --stack_trace_limit=200 --stack_size=10000 scripts/test-runner.js",
"test-serializer-with-coverage": "./node_modules/.bin/istanbul cover ./lib/test-error-handler.js --dir coverage.error && ./node_modules/.bin/istanbul cover ./lib/test-runner.js && ./node_modules/.bin/remap-istanbul -i coverage.error/coverage.json -i coverage/coverage.json -o coverage-sourcemapped -t html",
"test-sourcemaps": "babel-node scripts/generate-sourcemaps-test.js && bash < scripts/test-sourcemaps.sh",
"test-test262": "babel-node scripts/test262-runner.js",
"test-test262": "babel-node scripts/test262.js",
Contributor

name this test-test262-new

@leobalter
Contributor Author

> Unfortunately, this test runner in its current form has quite a way to go before it gets to feature parity with the existing test runner.

Would you mind listing what is missing for feature parity from your perspective?

While I work on separating the two runners, I'd be interested in eventually migrating fully to the new one.

@leobalter force-pushed the test262 branch 2 times, most recently from 846e818 to a6617a6 on November 10, 2017 at 19:42
@hermanventer
Contributor

hermanventer commented Nov 10, 2017

The most important pain point is the lack of concurrency. It is important that on a beefy laptop the test runner uses all available resources. On my laptop the entire suite runs in 1.30 mins.

Next, it is important to control console spew. When running the tests for validation purposes I want to see at most a concise summary of what failed. Preferably (and this is not realized in the current runner, but could be) I want to hear only about new failures.

Lastly, it must be easy to run an individual test in a single thread in such a way that VSCode can run it in debug mode without much fuss.
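
As an illustration of the concurrency point only (the worker script path
and the message shape are made up for this sketch):

// Sketch: split the test files across one forked child process per CPU core.
// scripts/test262-worker.js is a hypothetical worker entry point that runs
// the files it receives and sends back an array of results.
const os = require("os");
const { fork } = require("child_process");

function runInParallel(testFiles) {
  const workers = os.cpus().length;
  const chunks = Array.from({ length: workers }, () => []);
  testFiles.forEach((file, i) => chunks[i % workers].push(file));

  return Promise.all(
    chunks.map(
      chunk =>
        new Promise((resolve, reject) => {
          const child = fork("scripts/test262-worker.js");
          child.on("message", results => {
            child.kill();
            resolve(results);
          });
          child.on("error", reject);
          child.send(chunk);
        })
    )
  ).then(perWorker => [].concat(...perWorker));
}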

@hermanventer
Contributor

Using the updated version of test/test262 causes the existing test runner to skip many more tests, so the overall run fails because it still expects to run (and pass) many more tests. You can either update the expected number of passing tests, or use a separate directory for the updated test262.

@hermanventer
Contributor

Are you planning to do any more work on this?

@leobalter
Contributor Author

> Are you planning to do any more work on this?

Sure thing. We are finally past the TC39 meeting and I'm dedicating some time this week to work on this branch.

The es5id and es6id tags are legacy flags from Test262. They are mostly used to mark features implemented in ES5.1 or ES6 that have not been updated in later spec editions, and they hold a reference to the section number of the respective spec edition. New tests are supposed to carry only the esid tag, which matches sections by their HTML anchor ids.

With the recent updates to Test262 these numbers are changing, so the threshold for failures on tests with the es5id and es6id tags needs to reflect the same change.
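
For readers unfamiliar with these tags, a made-up example of the
difference (neither header is a real test from the suite):

// Older-style test header, pinned to a section number of a specific edition:
/*---
es6id: 22.1.3.1
description: hypothetical example of a legacy section-number tag
---*/

// Newer-style test header, using the spec's HTML anchor id instead:
/*---
esid: sec-array.prototype.concat
description: hypothetical example of an esid tag
---*/
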
@leobalter
Contributor Author

I squashed the previous commits and added f5f0a6a to fix the CI run. The tests are running fine on my end, and I believe CI should be green soon.

> Using the updated version of test/test262 causes the existing test runner to skip many more tests, so the overall run fails because it still expects to run (and pass) many more tests. You can either update the expected number of passing tests, or use a separate directory for the updated test262.

I just reused the same directory for Test262 to avoid extra noise. Beyond the filters, I expect fewer es5id and es6id tags over time, since improved or refactored tests may not reuse them. Test262 could definitely use a better flag system for features added in a specific edition or timeframe, but those legacy tags are not the best thing to maintain in Test262.

Meanwhile, I have some thoughts based on your feedback that we can follow up on, probably after this PR.

While not faster, the new test262 runner makes it easy to define specific folders to run the tests from. Using workers is definitely interesting from a development point of view; getting faster results is really a nice-to-have feature.

One thing we could also do is work together to define and design the desired output for the runner. I'm sure a few conversations would let us figure out what is most human-friendly for consuming a large set of test results.

I'm totally game, if time allows, to have the test runner iterate only over new or modified tests, since the results are already exported and we can reuse them. This should also sit behind a flag (or a parameter), as new versions of Prepack would probably require a fresh run through all the tests.

@hermanventer
Contributor

Running the full test suite in parallel is an important scenario for us as we often tweak the interpreter to deal with abstract values. We also would like to improve the interpreter to the point where it can truly be a reference implementation for the specification. All this is easier if we can quickly zero in on any test that starts failing because of a change we made.

Probably the best way to get there is to have a baseline file recording the expected failures and then have the runner display only the differences from the baseline. Some care will be needed to make the results deterministic even when running in parallel.
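
A minimal sketch of that idea, with a made-up baseline file name:

// Sketch only: read a checked-in list of expected failures (one test path
// per line) and report just the failures that are not in the baseline.
// Sorting keeps the output deterministic even when results arrive from
// parallel workers.
const fs = require("fs");

function reportNewFailures(currentFailures, baselinePath = "test262-expected-failures.txt") {
  const baseline = new Set(
    fs.readFileSync(baselinePath, "utf8").split("\n").filter(Boolean)
  );
  const fresh = currentFailures.filter(name => !baseline.has(name)).sort();
  fresh.forEach(name => console.log(`NEW FAILURE: ${name}`));
  return fresh;
}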

@facebook-github-bot left a comment

@hermanventer is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot left a comment

@hermanventer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
