test: polish test case to support strict mode #1012
Draft
wenytang-ms wants to merge 2 commits into
The CI Windows-UI and Linux-UI jobs on PR #1012 were failing because the per-step LLM verification (autotest 0.7.1) downgraded several passing steps to failures based on screenshot misinterpretation, even though deterministic verification (verifyTreeItem / verifyEditorTab / waitForLanguageServer) succeeded.

Root cause: `step.verify` triggers an LLM comparison of BEFORE/AFTER screenshots. When the action is a state check, a transient input close, or an async refactor whose UI hasn't settled by capture time, the screenshots are unfit for a clean transition judgment, and the LLM returns false negatives non-deterministically (different verdicts on identical UI between runs).

Fix: drop `verify:` from steps where the LLM is structurally unreliable, and keep it on steps with a clear visible transition:

- `ls-ready`: `waitForLanguageServer` is itself the deterministic readiness check; the AFTER screenshot often shows the very next state ("Java: Building - 0%"), which the LLM misreads as "not ready".
- `enter-class-name`, `enter-package-name`, `enter-new-name`: `fillQuickInput`/`fillAnyInput` close the input on submit, so the AFTER screenshot has no visible evidence of the entered text.
- `wait-package-creation`: the package is created under a collapsed tree node; no visible change.
- `handle-rename-dialog`, `handle-refactor-preview`: best-effort optional steps; the UI element is often absent, making BEFORE == AFTER.
- `verify-deleted`: the deterministic `verifyTreeItem visible:false` check is authoritative; the tree refresh may lag behind the AFTER screenshot.
- `verify-new-class-tab`, `verify-renamed-tab`: state-check steps with `verifyEditorTab`; BEFORE == AFTER at steady state, which a strict LLM misreads as "no transition".
- `verify-project-node`, `verify-package`, `verify-app-class`, `verify-revealed`: state-check steps with `verifyTreeItem`.
- `unlink-editor`, `relink-editor`: toggle a setting; no user-visible UI change in the screenshot.
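The "downgrade" described in the root cause can be sketched in a few lines of TypeScript. This assumes a simplified, hypothetical result model (`StepResult`, `deterministicPassed`, `llmVerdict`, `finalStatus`, `isLlmDowngrade`, `summarize` are illustrative names, not autotest's API): a step passes only if its deterministic checks pass and, when the step carries `verify:`, the LLM verdict also passes. The sample results are made up, not CI data.

```typescript
// Hypothetical per-step result; field names are assumptions for illustration.
interface StepResult {
  id: string;
  deterministicPassed: boolean;
  // Undefined when the step has no `verify:`, so no LLM judgment is made.
  llmVerdict?: boolean;
}

// A step passes only if deterministic checks pass and, when an LLM verdict
// exists, the LLM agrees.
function finalStatus(r: StepResult): boolean {
  return r.deterministicPassed && (r.llmVerdict ?? true);
}

// A "downgrade": deterministic verification passed, but the LLM verdict
// turned the step into a failure.
function isLlmDowngrade(r: StepResult): boolean {
  return r.deterministicPassed && r.llmVerdict === false;
}

function summarize(results: StepResult[]) {
  const passed = results.filter(finalStatus).length;
  const downgrades = results.filter(isLlmDowngrade).map((r) => r.id);
  return { passed, total: results.length, downgrades };
}

// Made-up sample: the same passing steps, judged with and without `verify:`.
const withVerify: StepResult[] = [
  // Steady-state tab check: BEFORE == AFTER, so a strict LLM says "no transition".
  { id: "verify-renamed-tab", deterministicPassed: true, llmVerdict: false },
  { id: "verify-deleted", deterministicPassed: true, llmVerdict: true },
];
const withoutVerify: StepResult[] = [
  { id: "verify-renamed-tab", deterministicPassed: true },
  { id: "verify-deleted", deterministicPassed: true },
];

console.log(summarize(withVerify)); // 1 of 2 pass, one downgrade
console.log(summarize(withoutVerify)); // 2 of 2 pass, no downgrades
```

Under this model, removing `verify:` from a step whose deterministic check already passed can only remove false negatives; it cannot hide a deterministic failure.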
Also extended `wait-delete` from 3 to 6 seconds so the AFTER screenshot has more time to reflect the tree refresh, and added a comment on `wait-after-open` explaining why it must remain LLM-only: the tree's expanded children include `AppToRename`, so `verifyTreeItem visible:false` is not applicable; the actual assertion is "tree state unchanged after opening the file with link-with-editor off".

Validated locally with the same Azure OpenAI o4-mini deployment used by CI: 7 consecutive `autotest run-all` invocations, the last two clean (62/62, zero LLM downgrades, zero parse errors).

Also added `.env`, `.env.*`, and `test-results/**` to `.vscodeignore` so local autotest artifacts aren't bundled into the published VSIX.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
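The step-selection criterion in this PR can be encoded as a small rule. This is a sketch under stated assumptions: `PlanStep`, `visibleTransition`, `llmOnlyAssertion`, and `keepLlmVerify` are hypothetical names, not autotest's plan schema. The rule keeps `verify:` only when the step produces a clear, settled visible transition, or when the LLM comparison is the only available assertion (as with `wait-after-open`).

```typescript
// Hypothetical model of a test-plan step; not the autotest schema.
interface PlanStep {
  id: string;
  // True when BEFORE/AFTER screenshots should differ in an obvious,
  // settled way (e.g. a new tree node or editor tab appears).
  visibleTransition: boolean;
  // True when no deterministic helper (verifyTreeItem, verifyEditorTab,
  // waitForLanguageServer) can express the assertion, so the LLM
  // comparison is the only check.
  llmOnlyAssertion: boolean;
}

// Keep `verify:` (LLM screenshot comparison) only where it is meaningful.
function keepLlmVerify(step: PlanStep): boolean {
  return step.visibleTransition || step.llmOnlyAssertion;
}

// Steps that dropped `verify:` in this PR; per the rationale above, none
// gives the LLM a reliable visible transition to judge.
const droppedIds: string[] = [
  "ls-ready",
  "enter-class-name",
  "enter-package-name",
  "enter-new-name",
  "wait-package-creation",
  "handle-rename-dialog",
  "handle-refactor-preview",
  "verify-deleted",
  "verify-new-class-tab",
  "verify-renamed-tab",
  "verify-project-node",
  "verify-package",
  "verify-app-class",
  "verify-revealed",
  "unlink-editor",
  "relink-editor",
];

const steps: PlanStep[] = [
  ...droppedIds.map((id) => ({ id, visibleTransition: false, llmOnlyAssertion: false })),
  // Tree should stay unchanged and `verifyTreeItem visible:false` does not
  // apply, so the LLM comparison remains the only assertion.
  { id: "wait-after-open", visibleTransition: false, llmOnlyAssertion: true },
];

for (const step of steps) {
  console.log(`${step.id}: ${keepLlmVerify(step) ? "keep verify" : "drop verify"}`);
}
```

Annotating each step with these two flags makes the decision reviewable: a new step keeps `verify:` only if someone can argue that its screenshots show a settled change or that nothing deterministic can check it.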
No description provided.