[lexical-playground] Chore: Audit and de-flake the e2e suite (remove all @flaky tags) #8585
Merged
Audited all 32 @flaky e2e tests against the CI flaky-job configuration: static playground build served on :4000, collab websocket on :1234, chromium + firefox × {rich-text, plain-text, rich-text-with-collab}, --repeat-each=10 --retries=0 --workers=4, plus a --repeat-each=20 deep collab pass. Each test got ~40-80 runs. (An initial run against the live vite dev server on :3000 showed ~80 failures, but that was environmental noise from on-the-fly compilation under load; against the CI-equivalent static build, only the tests below ever failed.)

Removed @flaky from the 26 tests that never failed across the entire audit. Kept @flaky on these 6, which still fail intermittently (fixed separately):
- ClearFormatting: Should preserve the default styling of hashtags and mentions
- TextFormatting: Regression facebook#2523 can toggle format across a decorator
- Toolbar: Insert image caption + table
- Tables: Select multiple merged cells (selection expands to a rectangle)
- Tables: Can align text using Table selection
- ListsCopyAndPaste: Copy and paste of partial list items into the list

The large diff is prettier re-indenting test bodies after the options argument was dropped; there are no semantic changes to any test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se, collab connect wait)

Investigated the still-@flaky tests. Root causes and fixes:
- exposeLexicalEditor (collab setup): the wait for the right frame's ".action-button.connect" to read "Disconnect" used the default 5s expect timeout. Under parallel load, the shared y-websocket connect exceeds 5s, which was the dominant source of collab @flaky failures across the whole suite. Bumped that wait (and the editor visibility check) to 30s.
- ClearFormatting "Should preserve the default styling of hashtags and mentions": waited for any "@luke" typeahead item, then pressed Enter. While "@luke" is still being typed, the partial query "@lu" also matches "Agent Kallus" (kal-LU-s), which sorts earlier and is highlighted, so Enter selected it. Now waits for "Luke Skywalker" to be the aria-selected option. Verified 30/30 on firefox+collab (was ~25% failure). Tag removed.
- Toolbar "Insert image caption + table": the image renders behind React.Suspense (fallback={null}) and only appears after the asset loads, which can exceed assertHTML's 5s window under load. Now waits for ".editor-image img" (30s) before asserting. Tag removed.

The remaining 4 @flaky tests (TextFormatting facebook#2523, Tables "Select multiple merged cells", Tables "Can align text using Table selection", ListsCopyAndPaste "partial list items into the list") only fail via a rarer, shared stall while booting the collab right iframe (~1/60, chromium+collab only). The 30s connect wait reduces this stall but does not eliminate it, so these tests stay tagged pending a harness-level fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
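To illustrate the ClearFormatting race, here is a minimal TypeScript sketch. It assumes a typeahead that filters a hypothetical candidate list (`CANDIDATES`) by case-insensitive substring match and highlights the first match in list order. This is not the playground's actual mentions plugin. The sketch records which option Enter would pick after each keystroke of "luke".

```typescript
// Hypothetical model of a mentions typeahead: filter by case-insensitive
// substring, keep list order, highlight the first match. The candidate list
// and ordering are assumptions, not the playground's real data.
const CANDIDATES: readonly string[] = [
  "Agent Kallus",
  "Leia Organa",
  "Luke Skywalker",
];

// All candidates whose name contains the query, in list order.
function filterMentions(query: string): string[] {
  const q = query.toLowerCase();
  return CANDIDATES.filter((name) => name.toLowerCase().includes(q));
}

// The option Enter would select: the first (highlighted) match, if any.
function highlighted(query: string): string | undefined {
  return filterMentions(query)[0];
}

// Simulate typing one character at a time and record the highlighted option
// after each keystroke.
function highlightedWhileTyping(
  text: string,
): Array<[string, string | undefined]> {
  const steps: Array<[string, string | undefined]> = [];
  for (let i = 1; i <= text.length; i++) {
    const query = text.slice(0, i);
    steps.push([query, highlighted(query)]);
  }
  return steps;
}

for (const [query, pick] of highlightedWhileTyping("luke")) {
  console.log(`@${query} -> ${pick ?? "(no match)"}`);
}
// @l -> Agent Kallus
// @lu -> Agent Kallus
// @luk -> Luke Skywalker
// @luke -> Luke Skywalker
```

In this model, pressing Enter at the "@lu" step picks "Agent Kallus". That is why the fix waits for "Luke Skywalker" specifically to be the aria-selected option instead of waiting for any option to appear.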
…alls; untag remaining @flaky tests

The remaining @flaky tests (TextFormatting facebook#2523, Tables "Select multiple merged cells", Tables "Can align text using Table selection", ListsCopyAndPaste "partial list items into the list") only failed via a shared, rare (~1/60, chromium+collab) collab-setup stall. Under parallel load, one split-view iframe occasionally fails to boot or activate collab within the timeout, so its ".action-button.connect" toolbar button never appears and initialize() fails. This affects every collab test, not just these four.

exposeLexicalEditor now retries the collab-frame readiness check and reloads the page between attempts (up to 3×, 15s each), so a transient boot/connect hiccup during setup recovers instead of failing the test.

Validated under stress against the static build on :4000: 200/200 chromium+collab and 120/120 firefox+collab with retries=0 (previously ~3 failures in a comparable sample). With the stall handled, all four tests are stable, so their @flaky tags are removed. No @flaky tags remain in the e2e suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
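The retry-with-reload pattern described above can be sketched in TypeScript as follows. This is a hedged illustration and not the actual exposeLexicalEditor code. `retryWithReload` is a hypothetical helper, and reading "up to 3×" as three total attempts is an assumption. The readiness check and the reload are passed in as callbacks so the logic runs without a browser. In a Playwright harness, they would presumably wrap the wait on ".action-button.connect" and `page.reload()`.

```typescript
// Hypothetical helper: run a readiness check up to `attempts` times, reloading
// between failed attempts. Returns the 1-based attempt that succeeded, or
// rethrows the last error once every attempt has failed.
async function retryWithReload(
  check: (timeoutMs: number) => Promise<void>,
  reload: () => Promise<void>,
  attempts = 3,
  timeoutMs = 15_000,
): Promise<number> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await check(timeoutMs);
      return attempt; // ready: report which attempt succeeded
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await reload(); // transient stall: start the page over and retry
      }
    }
  }
  throw lastError; // every attempt failed: surface the last failure
}

// Usage with a simulated frame that stalls on its first check only.
let checks = 0;
let reloads = 0;
const simulatedCheck = async (_timeoutMs: number): Promise<void> => {
  checks++;
  if (checks === 1) throw new Error("connect button never appeared");
};
const simulatedReload = async (): Promise<void> => {
  reloads++;
};

retryWithReload(simulatedCheck, simulatedReload).then((attempt) => {
  console.log(`ready on attempt ${attempt} after ${reloads} reload(s)`);
  // ready on attempt 2 after 1 reload(s)
});
```

Rethrowing the last error after the final attempt keeps genuine setup failures visible, while a single transient stall costs only one reload.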
Removing the last @flaky tags left the flaky CI jobs running `playwright test --grep "@flaky"`, which now matches zero tests and fails with "No tests found". Pass --pass-with-no-tests on the flaky invocations so these jobs become a no-op success, while still running normally if a @flaky test is reintroduced later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
potatowagon approved these changes (May 29, 2026)
zurfyx approved these changes (May 29, 2026)
etrepum pushed a commit to etrepum/lexical that referenced this pull request (May 30, 2026)
All @flaky tags were removed in facebook#8585 and the suite has been de-flaked, so the machinery for splitting "flaky" tests out of CI is dead weight that only invites re-adding flaky tests. Remove it:
- call-e2e-all-tests.yml: drop the (already `if: false`) `flaky` job.
- call-e2e-test.yml: drop the `flaky` input, `continue-on-error`, the `--grep`/`--grep-invert "@flaky"` (+ `--pass-with-no-tests`) toggles, and the `flaky` segment of the artifact name. Steps now just run the suite.
- package.json: drop `--grep-invert "@flaky"` from the `*-ci-*` e2e scripts.

No test carried an actual `@flaky` tag, so this is behavior-preserving.
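A simplified TypeScript model shows why dropping `--grep-invert "@flaky"` is behavior-preserving. It assumes tests are selected purely by regex-matching a title string. Real Playwright matches against a longer string that also includes things like the file and describe titles. `grep` and `grepInvert` are hypothetical helpers, and the titles are examples taken from the suite.

```typescript
// Simplified model (assumption): a test is kept or dropped by testing a regex
// against its title. No `g` flag, so RegExp.test carries no state between calls.
function grep(titles: readonly string[], pattern: RegExp): string[] {
  return titles.filter((title) => pattern.test(title));
}

function grepInvert(titles: readonly string[], pattern: RegExp): string[] {
  return titles.filter((title) => !pattern.test(title));
}

const FLAKY = /@flaky/;

// Example titles after the de-flake: none carries the @flaky tag.
const titles: readonly string[] = [
  "ClearFormatting: Should preserve the default styling of hashtags and mentions",
  "Toolbar: Insert image caption + table",
  "Tables: Can align text using Table selection",
];

// Flaky job: selects nothing, hence "No tests found" without --pass-with-no-tests.
const flakyJob = grep(titles, FLAKY);
// Main job: keeps every test, exactly as if no filter were passed.
const mainJob = grepInvert(titles, FLAKY);

console.log(flakyJob.length, mainJob.length === titles.length); // 0 true
```

Once no title matches, `grep` selects the empty set and `grepInvert` selects everything. So removing the flaky job and the invert filter cannot change which tests run.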
Description
Cleans up the Playwright e2e suite: removes stale browser workarounds, audits every `@flaky` test, fixes the ones that are genuinely flaky, and removes all `@flaky` tags. No library/product code is changed; this is test-suite maintenance only.

1. Drop stale Firefox workarounds (follow-up to the selection fix in #8582)
- Dropped the Firefox `test.skip` on "Move left from last node in RTL" (Bug: Selection movement incorrect for RTL going back from last node in paragraph #7775). The test passes on the current Firefox.
- Dropped Firefox branches in `Navigation.spec` whose values were already identical to the default.
- Other Firefox branches were verified still-needed (inline-node caret positions, `selectAll` root selection, decorator focus) or left alone because they encode `IS_WINDOWS`-specific behavior that can't be verified from Linux.

2. Audit every `@flaky` test (32 total)
Ran the suite against the CI-equivalent setup (static playground build on `:4000` + collab websocket on `:1234`, chromium + firefox × {rich-text, plain-text, rich-text-with-collab}, `--retries=0`, `--repeat-each` 10–50). Note: running against the live dev server inflates flakiness massively, so a built static server is required to get representative results.

3. Fix the genuinely-flaky tests
- `exposeLexicalEditor` waited for the right frame's connect button with the default 5s timeout. Under parallel load, a split-view iframe occasionally fails to connect, or to boot/activate collab at all (~1/60, chromium+collab). The readiness check now retries with a `page.reload()` between attempts (3×, 15s each), so a transient setup hiccup recovers instead of failing the test. This hardens every collab test.
- `ClearFormatting` mention typeahead: the test pressed Enter on a partial-query menu (`@Lu` matches "Agent Kallus", which is highlighted first), selecting the wrong mention. It now waits for "Luke Skywalker" to be the `aria-selected` option.
- `Toolbar` image insert: the image renders behind `React.Suspense` (`fallback={null}`) and only appears after the asset loads, which can exceed the 5s assert under load. The test now waits for `.editor-image img` before asserting.

Result: no `@flaky` tags remain in the e2e suite, and the collab test harness is materially more robust.

Test plan
Automated: Playwright, against the CI-equivalent static build (`:4000`) + collab server (`:1234`), `--retries=0`:

Before
- `ClearFormatting` "default styling of hashtags and mentions": ~25% failure on firefox + collab (wrong mention selected).
- `Toolbar` "Insert image caption + table": intermittent empty image decorator.
- … (`exposeLexicalEditor` timeout waiting for the connect button) under `--repeat-each` stress: 3 failures in a 180-run chromium+collab sample.

After
- `ClearFormatting`: 30/30 firefox + collab.
- … (`--repeat-each=50/30`, `--retries=0`).
- The `@flaky` set across all 6 CI configs: stable. `playwright test --list` parses all 2124 tests.