runtime: redirect in powershell creates utf-16 files #65157

Closed
rsc opened this issue Jan 18, 2024 · 9 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@rsc
Contributor

rsc commented Jan 18, 2024

In #64210, @zephyrtronium wrote:


          > I'm also curious how you're collecting benchmark results such that they're in UTF-16LE.

If I run go test -short -bench . -count 6 >new.bench or go test -short -bench . -count 6 | Tee-Object tee.bench in PowerShell on Windows, the resulting file is encoded as UTF-16LE CRLF. Given "the runtime transparently reencodes UTF-8 as UTF-16," it sounds like that's working as intended.

If I pass those UTF-16 files directly into benchstat, it gives no output, since it expects UTF-8. If I re-encode them to UTF-8 with LF line endings, then:

  • benchstat new.bench tee.bench works as expected and prints correctly. I can copy manually from my terminal and the non-ASCII characters are preserved. Manually copying gets harder when there are more than a few benchmarks and several categories of metrics.
  • benchstat new.bench tee.bench >benchstat.bench again encodes as UTF-16LE, and it mangles non-ASCII characters pretty severely. E.g., │ becomes Γöé (93 03 f6 00 e9 00 per hexdump) and ± becomes ┬▒ (2c 25 92 25).
  • benchstat new.bench tee.bench | clip.exe puts a result in my clipboard which has a U+003F ? per byte of non-ASCII UTF-8. I lean toward calling that a clip.exe problem rather than a benchstat one, but it's still inconvenient.
  • benchstat new.bench tee.bench | clip.exe in a WSL2 terminal (and using benchstat built for Linux) copies text that reproduces the mangled characters from the Windows >benchstat.bench case.
  • benchstat new.bench tee.bench >benchstat.bench in WSL writes a file with the correct data, since it's end-to-end UTF-8.
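The re-encoding step mentioned above (UTF-16LE with CRLF to UTF-8 with LF) can be sketched in Go using only the standard library. This is a minimal illustration, not something from the thread: the function name utf16LEToUTF8 is hypothetical, and the input is assumed to be well-formed UTF-16LE with an optional leading byte order mark.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// utf16LEToUTF8 (hypothetical helper) decodes UTF-16LE bytes, with or
// without a leading BOM, into UTF-8 and normalizes CRLF to LF.
func utf16LEToUTF8(b []byte) []byte {
	// Drop the UTF-16LE byte order mark (FF FE) if present.
	b = bytes.TrimPrefix(b, []byte{0xff, 0xfe})
	// Reassemble little-endian 16-bit code units; a trailing odd byte is ignored.
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:]))
	}
	// utf16.Decode handles surrogate pairs; the string conversion yields UTF-8.
	out := []byte(string(utf16.Decode(units)))
	return bytes.ReplaceAll(out, []byte("\r\n"), []byte("\n"))
}

func main() {
	// "ok ±" followed by CRLF, as a BOM-prefixed UTF-16LE byte sequence.
	sample := []byte{0xff, 0xfe, 'o', 0, 'k', 0, ' ', 0, 0xb1, 0, '\r', 0, '\n', 0}
	fmt.Printf("%q\n", utf16LEToUTF8(sample))
}
```

To convert a real file, the same function could be wrapped with os.ReadFile and os.WriteFile. Writing the result directly to a file, rather than redirecting stdout, avoids having an older PowerShell re-encode the output again.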

(Hopefully this isn't getting too far off-topic. Let me know if I should open another issue.)

Originally posted by @zephyrtronium in #64210 (comment)

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jan 18, 2024
@rsc
Contributor Author

rsc commented Jan 18, 2024

It appears this is standard powershell and not Go:
https://superuser.com/questions/961697/redirecting-output-in-powershell-produces-utf-16-encoded-text

@rsc rsc closed this as completed Jan 18, 2024
@rsc rsc reopened this Jan 18, 2024
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 19, 2024
@cagedmantis cagedmantis added this to the Backlog milestone Jan 19, 2024
@cagedmantis
Contributor

cc @golang/runtime

@thepudds
Contributor

FWIW, there is a new feature of PowerShell that greatly reduces the mangling that PowerShell does of pipes and redirects for native commands (such as Go binaries):

PowerShell/PowerShell#17857

https://learn.microsoft.com/en-us/powershell/scripting/learn/experimental-features?view=powershell-7.4#psnativecommandpreservebytepipe

The interop between PowerShell and native binaries is all a bit cursed, but I suspect (hope?) that feature might address this issue. It was initially an experimental feature, but now is considered a "mainstream" feature as of PowerShell 7.4, I think.

I tried it just now in PowerShell 7.4.1 (released at the end of 2023), and it seemed to address at least the go test -short -bench . -count 6 >new.bench portion of what @zephyrtronium reported. After go test -bench . > old-ps.bench in an older version of PowerShell, hexdump -C showed the byte order mark and what looked to be UTF-16 encoding, whereas the output file looked correct using PowerShell 7.4.1...

...but it was just a quick test and would be nice for someone else to confirm.
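For anyone repeating that check without hexdump, the byte order mark test can be sketched in Go. This is a minimal illustration under stated assumptions: sniffBOM is a hypothetical name, it inspects only the leading bytes, and it does not distinguish a UTF-32LE mark (FF FE 00 00) from a UTF-16LE one.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffBOM (hypothetical helper) reports which Unicode byte order mark,
// if any, starts b. It checks only the leading two or three bytes.
func sniffBOM(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte{0xef, 0xbb, 0xbf}):
		return "UTF-8 BOM"
	case bytes.HasPrefix(b, []byte{0xff, 0xfe}):
		return "UTF-16LE BOM"
	case bytes.HasPrefix(b, []byte{0xfe, 0xff}):
		return "UTF-16BE BOM"
	}
	return "no BOM"
}

func main() {
	// The start of "goos" as an older PowerShell redirect would write it.
	fmt.Println(sniffBOM([]byte{0xff, 0xfe, 'g', 0, 'o', 0}))
	// The same text as plain UTF-8, which is what benchstat expects.
	fmt.Println(sniffBOM([]byte("goos: windows")))
}
```

Passing the first few bytes of a file read with os.ReadFile to sniffBOM would give the same signal as looking for ff fe at the start of the hexdump -C output.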

@Jorropo

This comment was marked as duplicate.

@mknyszek
Contributor

In compiler/runtime triage, we're not sure if there's anything actionable here. Is there something we should be doing to work around this? Or is this issue open just as a notice? Thanks.

@thepudds
Contributor

thepudds commented Feb 1, 2024

Hi @zephyrtronium, are you able to confirm if the original problem you reported is resolved in PowerShell 7.4?

@thepudds thepudds added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Feb 1, 2024
@gopherbot
Contributor

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@gopherbot gopherbot closed this as not planned Mar 1, 2024
@zephyrtronium
Contributor

Sorry, I've been meaning to get to this, but I haven't had access to my Windows PC for a while.

@github-project-automation github-project-automation bot moved this from Todo to Done in Go Compiler / Runtime Mar 1, 2024
@zephyrtronium
Contributor

I finally found time to test with PowerShell 7.4.1. Using this program:

package main

import "fmt"

func main() {
	fmt.Println("│±")
}

  • go run nascii.go prints normally.
  • go run nascii.go >out.txt consistently writes │± as UTF-8 (e2 94 82 c2 b1 0a) to the file.
  • go run nascii.go | clip.exe consistently writes Γöé┬▒ to the clipboard.
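The mangled output is consistent with each UTF-8 byte being read as a separate character in an OEM code page. The sketch below assumes that code page is CP437, which the thread does not state but which matches the reported bytes. The names cp437, mangle, and utf16LEHex are hypothetical, and the table covers only the five bytes involved.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// cp437 is a partial table (assumption: the console uses code page 437).
// It maps only the five UTF-8 bytes of "│±" to their CP437 characters.
var cp437 = map[byte]rune{
	0xe2: 'Γ', 0x94: 'ö', 0x82: 'é', // the three bytes of "│" (U+2502)
	0xc2: '┬', 0xb1: '▒', // the two bytes of "±" (U+00B1)
}

// mangle reinterprets each UTF-8 byte of s as one CP437 character.
// Bytes missing from the partial table pass through as-is, which is
// only correct for ASCII.
func mangle(s string) string {
	out := make([]rune, 0, len(s))
	for i := 0; i < len(s); i++ {
		if r, ok := cp437[s[i]]; ok {
			out = append(out, r)
		} else {
			out = append(out, rune(s[i]))
		}
	}
	return string(out)
}

// utf16LEHex formats s as space-separated UTF-16LE bytes in hex.
func utf16LEHex(s string) string {
	var sb strings.Builder
	for i, u := range utf16.Encode([]rune(s)) {
		if i > 0 {
			sb.WriteByte(' ')
		}
		fmt.Fprintf(&sb, "%02x %02x", byte(u), byte(u>>8))
	}
	return sb.String()
}

func main() {
	m := mangle("│±")
	fmt.Println(m)             // the mangled text
	fmt.Println(utf16LEHex(m)) // its UTF-16LE bytes
}
```

Under that assumption the program reproduces both the Γöé┬▒ text and the hexdump bytes quoted earlier in the thread (93 03 f6 00 e9 00 and 2c 25 92 25), which suggests the mangling happens in the shell's decoding step rather than in the Go program's output.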

PowerShell 7.4 seems to make output work identically between PowerShell and bash. I'd call that correct.

Given that Windows still ships with PowerShell 5.1, it's a bit inconvenient to have to install a new version to make this work when I don't otherwise need the upgrade. However, I can't say it matters very often to me, so the overall inconvenience is negligible.

8 participants