# 🚀 Feature Request
Allow a custom Playwright reporter to influence the exit code by modifying `result.status` in `onTestEnd()` or in `onEnd()`. Currently, Playwright's reporter API is effectively read-only: mutating results has no effect on the runner's exit code, which is computed independently after all reporters run.
Specifically, we'd like one of:
**Option A (preferred):** Make `result.status` writable in `onTestEnd(test, result)`, with the runner respecting the final value when computing the exit code.
**Option B:** Make `result.status` writable in `onEnd(result)`, allowing a reporter to downgrade the overall suite status before the exit code is determined.
**Option C:** A built-in config option (e.g., `softFailFile: '.flaky-tests.json'`) that accepts a list of test names whose failures should not affect the exit code.
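For concreteness, here is a hypothetical sketch of what Option C could look like in `playwright.config.ts`. `softFailFile` does not exist today; the name and shape are the proposal, not current API:

```ts
// playwright.config.ts -- hypothetical sketch of Option C
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Proposed option (does not exist today): tests listed in this file still
  // run and report their failures, but do not affect the exit code.
  // @ts-expect-error -- softFailFile is the proposed option, not in current types
  softFailFile: ".flaky-tests.json",
});
```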
Additionally, for reliable test matching at scale, we'd benefit from a stable, deterministic, human-readable test identifier: something like `titlePath().slice(1).join(' › ')` (the describe chain + test name, without the file path). The current matching mechanisms (`grep`/`grepInvert` regex, `titlePath()` with file path, `testId` hash) are either fragile, path-dependent, or opaque.
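As an illustration, such an identifier is a one-line helper over the existing API (a sketch; `stableTestId` is our name for it, and `slice(1)` mirrors the proposal above):

```ts
import type { TestCase } from "@playwright/test/reporter";

// Hypothetical helper: a human-readable, path-independent identifier.
// How many leading titlePath() entries to drop depends on how the
// root/project/file levels are encoded; slice(1) mirrors the proposal.
function stableTestId(test: TestCase): string {
  return test.titlePath().slice(1).join(" › ");
}
```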
## Example
```ts
import type { Reporter, TestCase, TestResult } from "@playwright/test/reporter";

class SoftFailReporter implements Reporter {
  // Known-flaky test identifiers, from a JSON file generated by CI
  private flakyTests = loadFlakyTestList();

  onTestEnd(test: TestCase, result: TestResult) {
    if (
      result.status === "failed" &&
      this.flakyTests.has(test.titlePath().slice(1).join(" › "))
    ) {
      // Capture real failure data to a separate artifact for telemetry
      this.recordFlakyFailure(test, result);
      // Downgrade the failure so it doesn't affect the exit code
      result.status = "passed";
    }
  }
}
```
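For context, a reporter like this plugs in through the existing `reporter` config option (the file name is illustrative):

```ts
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  reporter: [["./soft-fail-reporter.ts"], ["list"]],
});
```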
This reporter would:
- Load a list of known-flaky test names from a JSON file (generated by a CI pipeline querying test telemetry)
- Let all tests run normally under real conditions
- When a known-flaky test fails, capture the real failure to a separate artifact (for telemetry and auto-re-enable decisions)
- Rewrite the status so the failure doesn't break the build or pollute console output
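For completeness, here is a sketch of the two helpers the example leaves undefined. `loadFlakyTestList` and `recordFlakyFailure` are hypothetical (shown as free functions), and the file paths and JSON shapes are assumptions:

```ts
import * as fs from "fs";
import type { TestCase, TestResult } from "@playwright/test/reporter";

// Assumed shape: a JSON array of identifiers emitted by a CI telemetry job,
// e.g. ["Checkout › applies discount code", "Search › paginates results"]
function loadFlakyTestList(path = ".flaky-tests.json"): Set<string> {
  if (!fs.existsSync(path)) return new Set();
  return new Set(JSON.parse(fs.readFileSync(path, "utf-8")) as string[]);
}

// Append the real failure to a JSONL artifact so telemetry still sees it.
function recordFlakyFailure(
  test: TestCase,
  result: TestResult,
  out = "flaky-failures.jsonl",
) {
  const record = {
    id: test.titlePath().slice(1).join(" › "),
    status: result.status,
    retry: result.retry,
    errors: result.errors.map((e) => e.message),
  };
  fs.appendFileSync(out, JSON.stringify(record) + "\n");
}
```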
## Motivation
We manage a large monorepo (~200 packages, thousands of tests, 50+ engineers) and have struggled with flaky test management at scale:
- Skipping flaky tests creates a blind spot: skipped tests aren't validated under real conditions. Our rate of disabling tests has outpaced the rate at which developers can fix and re-enable them.
- Running skipped tests in a quarantine pipeline doesn't work either: these tests run in isolation, pass in a vacuum, get re-enabled, and promptly fail again under real CI conditions (CPU pressure, parallel execution, real network calls, etc.).
- The approach that works (which we've built for Jest) is to always run all tests under real conditions, but suppress the build-breaking consequence of known-flaky failures. The test still executes, the failure is still recorded in artifacts, but the exit code stays 0. This gives us:
  - Ground-truth signal: flaky tests run under the same conditions as every other test
  - Confident auto-re-enabling: if a test passes N consecutive runs under full load, we can safely remove it from the flaky list
  - No build disruption: known-flaky failures don't block PRs or CI
In Jest, this works because the reporter API passes mutable `TestResult` and `AggregatedResult` objects. We'd love to have the same capability in Playwright.
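For reference, a minimal sketch of that Jest-side pattern (types from `@jest/test-result`; `isKnownFlaky` stands in for our flaky-list lookup, and real code also adjusts suite-level counters):

```ts
import type { AggregatedResult } from "@jest/test-result";

declare function isKnownFlaky(fullName: string): boolean; // stand-in for our lookup

class JestSoftFailReporter {
  // Jest passes the mutable aggregated result to reporter hooks.
  onRunComplete(_contexts: unknown, results: AggregatedResult): void {
    for (const suite of results.testResults) {
      for (const assertion of suite.testResults) {
        if (assertion.status === "failed" && isKnownFlaky(assertion.fullName)) {
          assertion.status = "passed"; // downgrade the known-flaky failure
          suite.numFailingTests--;
          results.numFailedTests--;
        }
      }
    }
    // Recompute overall success so the run exits 0 when only flaky tests failed.
    results.success = results.numFailedTests === 0;
  }
}
```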
## Prior art
- Jest: mutable `AggregatedResult` in reporter hooks (we use this today)
- pytest: `@pytest.mark.xfail(strict=False)` marks expected failures that don't fail the suite
- RSpec: `pending` blocks record failures without failing the suite
We are a team at Microsoft and would be happy to submit a PR implementing this if the team is aligned on the approach. Happy to discuss the best design.