Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

feature(xtask): add encoding detection to compare #1907

Merged
merged 1 commit into from
Jan 7, 2022

Conversation

leops
Copy link
Contributor

@leops leops commented Jan 7, 2022

Summary

On PowerShell the default behavior for pipes it to output an UTF-16 encoded file, so cargo xtask coverage --json > results.json will create a file that can't be decoded directly by cargo xtask compare since serde_json and Rust in general assumes an UTF-8 encoding.

This adds a simple encoding detection logic to the loading of test results file that detects (and skips over) Unicode Byte Order Mark, and re-encodes UTF-16 content into UTF-8 before handing it to serde_json

@github-actions
Copy link

github-actions bot commented Jan 7, 2022

Test262 comparison coverage results on windows-latest

Test result main count This PR count Difference
Total 44951 44951 0
Passed 42866 42866 0
Failed 2084 2084 0
Panics 1 1 0
Coverage 95.36% 95.36% 0.00%

@Boshen
Copy link
Contributor

Boshen commented Jan 7, 2022

Perhaps it is better to configure your Powershell to output utf-8 instead? I see a lot of people do this on stackoverflow 😃

@ematipico
Copy link
Contributor

I'm torned here, both arguments are valid. Although, so far we haven't had this issue even though we have people that use Windows as their primary OS. So my question is, how likely people would encounter this issue?

We should also highlight the fact that these commands are there for us maintainers/contributors. If this error might occur often, especially for contributors, maybe it's worth fixing it.

Although this is also an information that we can add inside the CONTRIBUTING.md section. What do you think, team?

@MichaReiser
Copy link
Contributor

I'm in favour of this change. This is an issue for everyone using the "classic" Windows PowerShell (not the cross-platform version) as explained in the stack overflow article that @Boshen linked.

It only adds little complexity and eases the setup of many. I also don't think that it should require engineers to change their dev tooling to work on Rome because they may then end up in a situation where they need UTF8 for Rome but UTF16 for some other project they're working on.

@leops
Copy link
Contributor Author

leops commented Jan 7, 2022

I think it's probably a better solution to use a shell that emits UTF-8 files, but if anyone like me is also using PowerShell because it's the default shell that comes with Windows, then even the latest Windows 11 seems to ship with Windows PowerShell (5.1) which always emits a BOM that serde will fail on. Configuring Powershell is not that easy because besides creating a profile script you also need to change the execution policy to allow it to run, and if you go in there you're just better off installing PowerShell Core or a completely different shell anyway. Overall this just an annoying speed bump for the Windows-based contributor experience, not a major issue but it just adds unnecessary friction for first-time contributors.

This adds a simple encoding detection logic to the loading of test results file that detects (and skips over) Unicode Byte Order Mark, and re-encodes UTF-16 content into UTF-8 before handing it to serde_json
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Jan 7, 2022

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: b5df252
Status: ✅  Deploy successful!
Preview URL: https://2b7ff3a5.tools-8rn.pages.dev

View logs

@leops leops merged commit d448edb into main Jan 7, 2022
@leops leops deleted the feature/compare-unicode branch January 7, 2022 13:11
file.read_to_end(&mut buffer)
.unwrap_or_else(|err| panic!("Can't read the file of the {} results: {:?}", name, err));

enum FileEncoding {
Copy link
Contributor

@xunilrj xunilrj Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering here if we can use something like: https://github.com/kena0ki/aconv or https://gitlab.com/philbooth/unicode-bom

@xunilrj
Copy link
Contributor

xunilrj commented Jan 7, 2022

I'm in favour of this change [..] eases the setup of many.

+1

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants