proposal: cmd/benchstat: support optional ascii-only output #64210

Closed
extemporalgenome opened this issue Nov 16, 2023 · 11 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal
Milestone

Comments

@extemporalgenome
Contributor

extemporalgenome commented Nov 16, 2023

Some runes in the benchstat output, particularly µ and ±, but also superscript numbers (¹, ², etc.), may not render well in some terminals, or when processed by tools such as less when LANG is not properly configured.

I propose that, either via a CLI flag or environment variable detection, we instead output u in place of µ and either +- or ~ in place of ±.
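For illustration, the substitution could be sketched as a post-processing step applied to benchstat's text output. This is only a sketch under assumptions: `asciiFallback` is a hypothetical helper and not part of benchstat, and the `^1`-style rendering of superscript digits is an invented choice, since the proposal only names u and +- (or ~).

```go
package main

import (
	"fmt"
	"strings"
)

// asciiReplacer maps the non-ASCII runes named in the proposal to ASCII
// stand-ins. The superscript mapping ("^1" and so on) is an assumption made
// for this sketch; the proposal does not specify one.
var asciiReplacer = strings.NewReplacer(
	"µ", "u", // U+00B5 MICRO SIGN
	"μ", "u", // U+03BC GREEK SMALL LETTER MU, in case a tool emits this form
	"±", "+-",
	"¹", "^1",
	"²", "^2",
	"³", "^3",
)

// asciiFallback is a hypothetical helper, not a benchstat API. It rewrites
// one line of output using the replacements above.
func asciiFallback(line string) string {
	return asciiReplacer.Replace(line)
}

func main() {
	fmt.Println(asciiFallback("1.234µs ± 2% ¹"))
	// prints: 1.234us +- 2% ^1
}
```

A filter like this can sit outside benchstat, which keeps the tool's own output unchanged for terminals that handle UTF-8.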

@gopherbot gopherbot added this to the Proposal milestone Nov 16, 2023
@ianlancetaylor ianlancetaylor added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 2, 2023
@rsc
Contributor

rsc commented Dec 13, 2023

/cc @aclements

@rsc
Contributor

rsc commented Dec 14, 2023

Personally I'm unconvinced. I run terminals and other tools from 30 years ago and they can print these characters.

@rsc
Contributor

rsc commented Dec 14, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@aclements
Member

I agree with @rsc. Terminal environments have had good support for Unicode for a long time. If something is misconfigured resulting in broken Unicode support, it would seem better to fix that configuration than to have every tool that may print Unicode work around it. Is there a concrete issue that this causes?

@zephyrtronium
Contributor

Anecdotally, I've found that PowerShell's insistence on UTF-16 for pipes causes strange behaviors with benchstat's non-ASCII characters when working on Windows. If I pipe benchstat into clip.exe and then paste into an editor, the special characters are replaced with a ? per byte of multibyte UTF-8 sequences. My solution has generally been to avoid touching benchmarks in Windows. (More so because getting benchstat to work in the first place requires re-encoding the benchmark outputs from UTF-16LE CRLF to UTF-8 LF, which is more effort than just launching WSL.)
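The re-encoding step mentioned here (UTF-16LE with CRLF to UTF-8 with LF) can be sketched with the standard library alone. This is a minimal sketch, not a recommended workflow: `utf16leToUTF8` is a hypothetical helper, and it assumes the input is UTF-16LE, optionally starting with a byte order mark.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
	"unicode/utf16"
)

// utf16leToUTF8 is a hypothetical helper. It decodes UTF-16LE bytes, drops a
// leading byte order mark if present, and converts CRLF line endings to LF.
// A trailing odd byte, if any, is ignored.
func utf16leToUTF8(b []byte) []byte {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:]))
	}
	if len(units) > 0 && units[0] == 0xFEFF {
		units = units[1:] // strip the BOM
	}
	s := string(utf16.Decode(units))
	return []byte(strings.ReplaceAll(s, "\r\n", "\n"))
}

func main() {
	// BOM, then "a", CR, LF, "µ", all as little-endian 16-bit code units.
	in := []byte{0xff, 0xfe, 0x61, 0x00, 0x0d, 0x00, 0x0a, 0x00, 0xb5, 0x00}
	fmt.Printf("%q\n", utf16leToUTF8(in))
	// prints: "a\nµ"
}
```

In practice the input bytes would come from the redirected benchmark file, for example via os.ReadFile, and the result would be written back out before running benchstat on it.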

@aclements
Member

That actually sounds like a possible Go runtime bug to me. For console output, the runtime transparently reencodes UTF-8 as UTF-16, but this sounds like another place where it should potentially be doing that. On the other hand, pipes are also used for non-text data, so it might be impossible for the runtime to transparently tell what the right thing is.

@zephyrtronium, I'm also curious how you're collecting benchmark results such that they're in UTF-16LE.

@zephyrtronium
Contributor

> I'm also curious how you're collecting benchmark results such that they're in UTF-16LE.

If I run go test -short -bench . -count 6 >new.bench or go test -short -bench . -count 6 | Tee-Object tee.bench in PowerShell on Windows, the resulting file is encoded as UTF-16LE CRLF. Given "the runtime transparently reencodes UTF-8 as UTF-16," it sounds like that's working as intended.

If I pass those UTF-16 files directly into benchstat, it gives no output, since it expects UTF-8. If I re-encode them to UTF-8 with LF line endings, then:

  • benchstat new.bench tee.bench works as expected and prints correctly. I can copy manually from my terminal and the non-ASCII characters are preserved. Manually copying gets harder when there are more than a few benchmarks and several categories of metrics.
  • benchstat new.bench tee.bench >benchstat.bench again encodes as UTF-16LE, and it mangles non-ASCII characters pretty severely. E.g., │ becomes Γöé (93 03 f6 00 e9 00 per hexdump) and ± becomes ┬▒ (2c 25 92 25).
  • benchstat new.bench tee.bench | clip.exe puts a result in my clipboard which has a U+003F ? per byte of non-ASCII UTF-8. I lean toward calling that a clip.exe problem rather than a benchstat one, but it's still inconvenient.
  • benchstat new.bench tee.bench | clip.exe in a WSL2 terminal (and using benchstat built for Linux) copies text that reproduces the mangled characters from the Windows >benchstat.bench case.
  • benchstat new.bench tee.bench >benchstat.bench in WSL writes a file with the correct data, since it's end-to-end UTF-8.
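The hexdump bytes reported in the list above are consistent with one specific failure mode: each byte of the UTF-8 output is read as a character in code page 437, and the result is then written as UTF-16LE. The thread does not state which code page is involved, so CP437 is an assumption inferred from the bytes. The sketch below reproduces the reported values for │ (U+2502, UTF-8 e2 94 82) and ± (U+00B1, UTF-8 c2 b1). The `cp437` table is deliberately partial and covers only the bytes needed here, and `mangle` and `utf16leHex` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// cp437 is a partial code page 437 table covering only the bytes that occur
// in the UTF-8 encodings of "│" and "±".
var cp437 = map[byte]rune{
	0x82: 'é',
	0x94: 'ö',
	0xB1: '▒',
	0xC2: '┬',
	0xE2: 'Γ',
}

// mangle reinterprets each byte of a UTF-8 string as a CP437 character.
// Bytes missing from the partial table become '?'.
func mangle(s string) string {
	out := make([]rune, 0, len(s))
	for i := 0; i < len(s); i++ {
		b := s[i]
		if b < 0x80 {
			out = append(out, rune(b)) // ASCII is the same in both encodings
			continue
		}
		r, ok := cp437[b]
		if !ok {
			r = '?'
		}
		out = append(out, r)
	}
	return string(out)
}

// utf16leHex renders a string as space-separated UTF-16LE bytes in hex.
func utf16leHex(s string) string {
	var parts []string
	for _, u := range utf16.Encode([]rune(s)) {
		parts = append(parts, fmt.Sprintf("%02x %02x", byte(u), byte(u>>8)))
	}
	return strings.Join(parts, " ")
}

func main() {
	for _, s := range []string{"│", "±"} {
		m := mangle(s)
		fmt.Printf("%s -> %s (%s)\n", s, m, utf16leHex(m))
	}
	// prints:
	// │ -> Γöé (93 03 f6 00 e9 00)
	// ± -> ┬▒ (2c 25 92 25)
}
```

If this reading is right, the damage happens when the shell decodes the program's byte stream with the wrong encoding, before anything is written to the file.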

(Hopefully this isn't getting too far off-topic. Let me know if I should open another issue.)

@rsc
Contributor

rsc commented Jan 10, 2024

@zephyrtronium Please do open another issue. I've reproduced what you are saying under PowerShell, but not under cmd.exe. It sounds like PowerShell is doing very very strange things.

@rsc
Contributor

rsc commented Jan 10, 2024

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Jan 18, 2024

Created #65157 for the PowerShell / UTF-16 issue.

@rsc
Contributor

rsc commented Jan 19, 2024

No change in consensus, so declined.
— rsc for the proposal review group

Projects
Status: Declined

6 participants