CI: wpt upstreamer fails when git process output contains invalid UTF-8 #29620
Labels
A-build
Related to or part of the build process
Comments
mrobinson added a commit to mrobinson/servo that referenced this issue on Apr 12, 2023
The output of `git diff` and `git show` can include non-UTF8 text or binary data, so the WPT upstream script should handle this properly. Fixes servo#29620.
bors-servo added a commit that referenced this issue on Apr 12, 2023
Handle non-UTF8 diffs in the WPT upstream script
The output of `git diff` and `git show` can include non-UTF8 text or binary data, so the WPT upstream script should handle this properly. Fixes #29620.
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #29620
- [x] There are tests for these changes
Out of curiosity, I tried running the new upstreamer (#29601) against #29594 with a merge from master, which is currently more challenging for the script because it applies the commits one at a time.
https://github.com/servo/servo/actions/runs/4675613330/jobs/8280932770?pr=29594
As expected, the script processed a lot of WPT changes made outside my PR, but it failed on them in an interesting way:
This is because LocalGitRepo.run assumes that the output can be decoded as UTF-8, but git-diff(1) and git-show(1) output can contain arbitrary high bytes unless git gives up and decides the file is binary.
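
To illustrate the failure mode, here is a minimal sketch of a subprocess wrapper that returns raw bytes. The `run` helper below is hypothetical and is not the actual `LocalGitRepo.run`; a small Python child process stands in for `git show` so the example does not depend on a repository.

```python
import subprocess
import sys


def run(cmd):
    """Hypothetical helper: run a command and return its stdout as raw bytes."""
    # Without text=True or encoding=..., subprocess leaves stdout undecoded.
    return subprocess.run(cmd, capture_output=True, check=True).stdout


# Stand-in for `git show`: a child process that emits a lone 0xA2 byte.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'price: 5\\xa2\\n')",
]
output = run(child)

# Strict UTF-8 decoding of that output fails, which is the kind of
# error an eager decode inside the wrapper would raise.
try:
    output.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(output, decoded_ok)
```

Keeping the return value as bytes lets each caller decide whether and how to decode it.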
We should leave git-diff(1) and git-show(1) output as bytes where possible, and if we really need to process them as str (for example, to find `---` and `+++`), we should consider using surrogateescape to treat it as ASCII with opaque high bytes.
We should also check that we handle weird paths correctly, even though Windows and macOS support means we won’t need to deal with any invalid UTF-8. Most git commands like `git status` and `git diff` will use a special double-quote-octal escaping for paths with any high bytes (or double quotes), whereas `-z` modes like `git status -z` and `git diff --numstat -z` will happily spit out arbitrary high bytes.
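
The surrogateescape suggestion above could look roughly like the sketch below. The helper names (`decode_opaque`, `encode_opaque`, `header_lines`) and the sample diff are made up for illustration and are not taken from the upstreamer script.

```python
def decode_opaque(data):
    """Decode bytes as ASCII, mapping each high byte to a lone surrogate."""
    return data.decode("ascii", errors="surrogateescape")


def encode_opaque(text):
    """Inverse of decode_opaque: restore the original bytes exactly."""
    return text.encode("ascii", errors="surrogateescape")


def header_lines(diff):
    """Return the lines of a diff that start with '--- ' or '+++ '.

    Caveat: a removed or added content line that itself begins with
    '-- ' or '++ ' would also match; this sketch does not handle that.
    """
    text = decode_opaque(diff)
    return [line for line in text.split("\n") if line.startswith(("--- ", "+++ "))]


# Hypothetical diff whose body contains 0xA2, which is not valid UTF-8 here.
SAMPLE_DIFF = (
    b"--- a/tests/wpt/example.html\n"
    b"+++ b/tests/wpt/example.html\n"
    b"@@ -1 +1 @@\n"
    b"-price: 5\xa2\n"
    b"+price: 6\xa2\n"
)

headers = header_lines(SAMPLE_DIFF)
round_trip = encode_opaque(decode_opaque(SAMPLE_DIFF))
print(headers)
```

Because the round trip is lossless, the str form can be searched or split and then written back out without corrupting the high bytes.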
Inspecting the output with utf8check:
The three � characters are A2h bytes, which are valid in Windows-1252 but not UTF-8.
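
A quick way to confirm this in Python (a standalone check, not part of the script):

```python
raw = b"\xa2\xa2\xa2"

# In Windows-1252, 0xA2 is the cent sign (U+00A2).
as_cp1252 = raw.decode("cp1252")

# In UTF-8, 0xA2 is a continuation byte and cannot start a character,
# so strict decoding raises and lenient decoding yields U+FFFD.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

replaced = raw.decode("utf-8", errors="replace")

print(as_cp1252, strict_ok, replaced)
```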