Use Python `backslashreplace` to avoid UNRESOLVED tests #4366

StephanTLavavej · 2024-02-03T12:38:29Z

Followup to #2145, which added a two-step process for decoding compiler and test output. See Python's Unicode documentation and bytes.decode documentation. Our decodeOutput(bytes) first tries bytes.decode(), i.e. bytes.decode(encoding='utf-8', errors='strict'), to handle EDG's UTF-8 output. 'strict' asks it to throw a UnicodeDecodeError (derived from UnicodeError). If that happens, we assume we're looking at MSVC's output in the active codepage, so we fall back to locale-aware decoding.

The problem was that this fallback was still 'strict', so if the test emitted unrecognized characters, we'd get another UnicodeDecodeError, this time uncaught. That causes the test to be reported as UNRESOLVED. It originally had incredibly confusing output, which #4323 improved by printing the contents of the UnicodeDecodeError.
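To make the failure mode concrete, here is a minimal sketch of the pre-fix, two-step decoding. It is not the actual `tests/utils/stl/util.py` code: the name `decode_output_strict` is hypothetical, and `cp1252` is an assumed stand-in for "the active codepage" so the example behaves the same on any machine.

```python
def decode_output_strict(data: bytes, fallback_encoding: str = "cp1252") -> str:
    """Hypothetical sketch of the pre-fix behavior; not the harness's real code."""
    try:
        # Step 1: UTF-8 with errors='strict' (the defaults of bytes.decode()).
        return data.decode()
    except UnicodeError:
        # Step 2: fall back to the codepage, but still with errors='strict'.
        # A byte with no mapping in the codepage raises a second UnicodeDecodeError,
        # and nothing catches this one.
        return data.decode(fallback_encoding)


# Valid UTF-8 succeeds in step 1; valid cp1252 succeeds in step 2.
print(decode_output_strict(b"caf\xc3\xa9"))
print(decode_output_strict(b"caf\xe9"))

# 0x8d is invalid as UTF-8 and unmapped in Python's cp1252 codec, so both steps fail.
try:
    decode_output_strict(b"\x8d")
except UnicodeDecodeError as e:
    print("second decode failed:", e)
```

In this sketch the caller has to catch the second exception itself; in the harness, that uncaught exception is what turned into UNRESOLVED.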

How can we get a test printing unrecognized characters? I validated my fix with a deterministic puts("\x8d");. (I don't think it's worth adding a test to validate this part of the test harness.) We originally encountered it sporadically, due to heap corruption tracked by #4268. Due to an STL bug, those tests reliably corrupt the heap. Then, through a still-mysterious chain of events, something is damaged as the UCRT attempts to print its "HEAP CORRUPTION DETECTED" message: when printing the block type (usually "Normal"), it occasionally prints garbage memory contents. (We did corrupt the heap, after all.)

By requesting 'backslashreplace' during the fallback conversion, we avoid uncaught exceptions here. This will allow the test to pass (if it simply happened to print bizarre characters during its execution) or fail (if it printed bizarre characters during its plunge into Mount Doom), with readable output captured in the logs thanks to the backslash escaping.
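A minimal sketch of the idea, under the same caveats: `decode_output` is a hypothetical name, and `cp1252` is an assumed stand-in for the active codepage rather than what the harness actually queries. The only change from a strict fallback is the `errors='backslashreplace'` argument, which Python supports for decoding.

```python
def decode_output(data: bytes, fallback_encoding: str = "cp1252") -> str:
    """Hypothetical sketch of the fixed behavior; not the harness's real code."""
    try:
        # Step 1 is unchanged: strict UTF-8, so a failure still signals
        # "this is probably not UTF-8 output".
        return data.decode()
    except UnicodeError:
        # Step 2 no longer raises: undecodable bytes become \xNN escapes.
        return data.decode(fallback_encoding, errors="backslashreplace")


# The garbage byte survives as a readable escape instead of raising.
print(decode_output(b"Normal\x8d"))  # prints Normal\x8d
```

Bytes that the codepage can decode are unaffected; only the undecodable ones are escaped, so the log shows exactly which bytes the test emitted.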

I considered also marking #4268's heap-corrupting tests as SKIPPED instead of FAIL. However, they do seem to be reliably failing with detected heap corruption; the only sporadic part was whether they corrupted the heap badly enough to damage the UCRT's message. Unless and until we start seeing sporadic "unexpected passes" here, I am inclined to leave them marked as FAIL. Thus #4308 will be truly fixed, as we'll no longer encounter sporadic test run failures due to UNRESOLVED (the affected tests will FAIL, but that's expected, so the test run as a whole will pass).

StephanTLavavej · 2024-02-05T18:19:11Z

I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

CaseyCarter

"Approve with suggestions" that we can address in a followup.

CaseyCarter · 2024-02-05T21:49:15Z

tests/utils/stl/util.py

@@ -77,8 +77,9 @@ def decodeOutput(bytes):
    try:
        return bytes.decode()
    except UnicodeError:
+        # Use 'backslashreplace' to avoid throwing another exception for unrecognized characters.


I'm not fond of "unrecognized characters", which suggests that the transcoder doesn't understand the source character set; the issue here is that tests sometimes emit byte sequences that aren't valid encoded characters. I'd prefer something like "when tests emit garbage bytes."

This is absolutely not worth resetting testing.

Use Python backslashreplace to avoid UNRESOLVED tests

e8091ce

StephanTLavavej added the test Related to test code label Feb 3, 2024

StephanTLavavej requested a review from a team as a code owner February 3, 2024 12:38

github-actions bot added this to Initial Review in Code Reviews Feb 3, 2024

StephanTLavavej moved this from Initial Review to Final Review in Code Reviews Feb 3, 2024

StephanTLavavej self-assigned this Feb 5, 2024

StephanTLavavej changed the title ~~Use Python backslashreplace to avoid UNRESOLVED tests~~ Use Python `backslashreplace` to avoid UNRESOLVED tests Feb 5, 2024

CaseyCarter approved these changes Feb 5, 2024

CaseyCarter moved this from Final Review to Ready To Merge in Code Reviews Feb 5, 2024

StephanTLavavej merged commit 81314e3 into microsoft:main Feb 6, 2024
37 checks passed

Code Reviews automation moved this from Ready To Merge to Done Feb 6, 2024

StephanTLavavej deleted the magic-decoder-ring branch February 6, 2024 08:56

StephanTLavavej mentioned this pull request Feb 15, 2024

Toolset update: VS 2022 17.10 Preview 1 #4392

Merged
