wsl.exe outputting unicode to stdout #4607

Closed
RoguePointer80 opened this issue Oct 17, 2019 · 25 comments
Labels
needs-investigation likely actionable and/or needs more investigation workaround-available

Comments

@RoguePointer80


  • Your Windows build number: (Type ver at a Windows Command Prompt)
    Microsoft Windows [Version 10.0.18362.418]

  • What you're doing and what's happening: (Copy&paste the full set of specific command-line steps necessary to reproduce the behavior, and their output. Include screen shots if that helps demonstrate the problem.)

C:\Users\frivard>wsl --list
Windows Subsystem for Linux Distributions:
Ubuntu-16.04 (Default)
cmf-src.cafdb8b176f9
Ubuntu-18.04

and when trying to filter using "findstr" :

C:\Users\frivard>wsl --list | findstr Ubuntu

C:\Users\frivard>

Nothing.

  • What's wrong / what should be happening instead:
    The output of wsl.exe --list seems to be UTF-16 without BOM, so the output is not respecting the specified codepage of the system, and thus the "findstr" does not understand the input.
    The output of wsl.exe --list should be in the system codepage.

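To make the findstr failure concrete, here is a minimal Python sketch (not from the original report). It assumes, as the report suggests, that the listing is UTF-16LE without a BOM; the sample text is copied from the output above. A byte-oriented search for the ASCII bytes of "Ubuntu" cannot match because every character is followed by a NUL byte.

```python
# Sample listing taken from the report; the UTF-16LE encoding is the report's hypothesis.
listing = "Windows Subsystem for Linux Distributions:\r\nUbuntu-16.04 (Default)\r\n"

utf16_bytes = listing.encode("utf-16-le")  # no BOM is emitted for "utf-16-le"
ascii_bytes = listing.encode("ascii")      # what a byte-oriented tool would expect
needle = "Ubuntu".encode("ascii")

print(utf16_bytes[:8])        # b'W\x00i\x00n\x00d\x00'
print(needle in utf16_bytes)  # False: the NUL bytes break up the sequence
print(needle in ascii_bytes)  # True
```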

@0xbadfca11

The WslRegisterDistribution API allows Unicode distribution names that fall outside the system codepage (although wsl --import does not).
If the output depended on the system codepage, there would be a new issue where wsl --list does not display such names correctly.
image

@therealkenc
Collaborator

#4180 #4456. I don't actually know (for sure) if a BOM would make findstr.exe happy or not. I suspect it wouldn't, absent evidence to the contrary. Best I can tell, this is by-design behavior in findstr.exe.

@RoguePointer80
Author

@therealkenc I agree with you, a BOM wouldn't change anything. I mentioned it just to be more precise as to the current output of the command.

@RoguePointer80
Author

@0xbadfca11 You might have misunderstood the meaning of my bug report. I will try to say it again in a different way:

This is not about removing Unicode support, and not about supporting Asian languages.

What this is about is interaction with other command-line tools. As wsl.exe is a command-line tool, I expect it to behave and interact with other tools like findstr in a natural way. If you can do dir | findstr "蹴鞠", then you should also be able to do wsl.exe --list | findstr "蹴鞠" .

The problem is also present when piping the output to grep.exe , but I did not specify it because some observers might argue that it is a third-party tool that might not support unicode properly. Whereas findstr is a built-in command that we all know works well.

@riverar

riverar commented Nov 4, 2019

findstr doesn't support multi-byte Unicode codepoints, which is what WSL outputs. Agreed, it's strange that WSL does not respect the terminal's codepage, which breaks these common scenarios.

@jwbrase

jwbrase commented Nov 16, 2019

I think part of the issue is that wsl's entire purpose is to provide interoperability with Linux applications, and pretty much everything in the *nix sphere these days expects UTF8. WSL could easily find its output being piped either into a Windows program expecting the system codepage or a Linux program expecting UTF8, and outputting probably breaks the fewest things on either side.

@Greg-T8

Greg-T8 commented Oct 28, 2020

I'm running into what I believe is a similar issue. I'm creating a PowerShell script that reads the output of the WSL command. I want to manipulate the resulting text using the -replace operator, but the -replace operator isn't working properly with the output of the WSL command.

The example below compares the output of the first line of the WSL --list command ($a[0]) with the output of a standard string using the same characters.

You can see how the replace operator is unable to replace the string "Windows" with "Linux" with the WSL command output. However, the replace operator is able to replace "Windows" with "Linux" within the standard string.

Furthermore, you can see that the WSL string length is twice as long as the standard string length. I show evidence of this in the last two lines:

image
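The doubled length is what you would expect if UTF-16 bytes are being decoded one byte per character. The Python sketch below illustrates that reading; it is an illustration only, with `latin-1` standing in (as an assumption) for whatever single-byte codepage the console is using.

```python
# First line of the `wsl --list` output, encoded the way wsl.exe appears to emit it.
raw = "Windows Subsystem for Linux Distributions:".encode("utf-16-le")

wrong = raw.decode("latin-1")    # one character per byte: NULs end up interleaved
right = raw.decode("utf-16-le")  # two bytes per character

print(len(wrong), len(right))                        # 84 42
print(wrong.replace("Windows", "Linux") == wrong)    # True: nothing was replaced
print(right.replace("Windows", "Linux"))             # Linux Subsystem for Linux Distributions:
```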

@falloutphil

falloutphil commented Oct 28, 2020

I'm running into what I believe is a similar issue. I'm creating a PowerShell script that reads the output of the WSL command. I want to manipulate the resulting text using the -replace operator, but the -replace operator isn't working properly with the output of the WSL command.

The example below compares the output of the first line of the WSL --list command ($a[0]) with the output of a standard string using the same characters.

I had a similar requirement: to turn the output of wsl -l -v into an array of values for a specific distribution, e.g. Ubuntu-20.04. See below for how to manipulate the console output in PowerShell. The contents of distroArray (Name, State, Version) can then be used as regular PowerShell strings/integers.

I consider this a bug, or at least unexpected output format - but for now changing the console encoding seems to work as a workaround:

$console = ([console]::OutputEncoding)
[console]::OutputEncoding = New-Object System.Text.UnicodeEncoding
$distroArray = (wsl -l -v | Select-String -SimpleMatch 'Ubuntu-20.04') -split '\s+'
[console]::OutputEncoding = $console
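For anyone doing the same thing outside PowerShell, here is a rough Python sketch of the same idea: decode the captured bytes as UTF-16LE, then split each row into name, state and version. The function name `parse_wsl_list_verbose` is made up for illustration, and the column layout (a header row, with `*` marking the default distribution) is an assumption about what `wsl -l -v` prints, so check it against your own output.

```python
def parse_wsl_list_verbose(raw: bytes):
    """Parse captured `wsl -l -v` bytes into dicts (hypothetical helper)."""
    text = raw.decode("utf-16-le")  # assumes the default (non WSL_UTF8) output
    rows = []
    for line in text.splitlines()[1:]:  # assumes the first line is the header
        stripped = line.strip()
        if not stripped:
            continue
        is_default = stripped.startswith("*")  # assumed marker for the default distro
        fields = stripped.lstrip("*").split()
        if len(fields) != 3:
            continue  # skip anything that does not look like NAME STATE VERSION
        name, state, version = fields
        rows.append({"name": name, "state": state,
                     "version": int(version), "default": is_default})
    return rows


# Synthetic sample in the assumed layout, not real captured output.
sample = ("  NAME            STATE           VERSION\r\n"
          "* Ubuntu-20.04    Running         2\r\n"
          "  Ubuntu-18.04    Stopped         1\r\n").encode("utf-16-le")
print(parse_wsl_list_verbose(sample))
```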

@Greg-T8

Greg-T8 commented Oct 28, 2020 via email

@MookThompson

This issue has just wasted a couple of hours of my time as well. I'm trying to process the list of distros using the standard CMD FOR /F command. A simple repro for this is:
FOR /F %i IN ('wsl -l -q') DO echo %i
which just echoes the first character of each distro name, which is less than helpful.

Piping the output of 'wsl -l' to the standard 'more' CMD also fails, e.g.:
wsl -l | more
spews out a helpfully-paginated output containing every individual character of the wsl command output on a separate line.

@francogp

I have the same problem.
wsl --list | findstr Ubuntu
returns nothing in both PowerShell and cmd.

@Hegi

Hegi commented Apr 8, 2021

@francogp check out falloutphil's answer. After changing the encoding to Unicode, things should fall into place.

@NotTheDr01ds

NotTheDr01ds commented Aug 25, 2021

Also, if anyone needs to parse the output of wsl.exe from within WSL, try the following to get rid of the null characters (as well as the line endings):

wsl -l -q | iconv -f UTF16 | tr -d '\r'

Would love to see this fixed. It really makes parsing/automating very difficult with wsl.exe.

@codecat555

In my application, I don't have a console so I can't use the helpful workaround mentioned above. How can I decode the output properly for all cases?

That is, suppose I have a function to run a wsl command given as a string by the caller. When it is invoked with one command, e.g. wsl -l, the output will be encoded utf-16. When it is invoked with another command, e.g. wsl date, the output will most likely not be encoded with utf-16. How can I determine which decoding is correct for any given bit of output?

One solution would be for the calling context to signal which decoding to use, since it "knows" what kind of command it's running. Ok, but the error case is different. Sometimes, wsl will return errors in one encoding and sometimes in the other, depending on where that error originated. And, in this case, there is no context available to help determine the encoding.

For example, the other day an odd looking error popped up in my log -

T\x00h\x00e\x00 \x00W\x00i\x00n\x00d\x00o\x00w\x00s\x00 \x00S\x00u\x00b\x00s\x00y\x00s\x00t\x00e\x00m\x00 \x00f\x00o\x00r\x00 \x00L\x00i\x00n\x00u\x00x\x00 \x00i\x00n\x00s\x00t\x00a\x00n\x00c\x00e\x00 \x00h\x00a\x00s\x00 \x00t\x00e\x00r\x00m\x00i\x00n\x00a\x00t\x00e\x00d\x00.\x00\r\x00\r\x00\n\x00'

With some editing, I found that it reads The Windows Subsystem for Linux instance has terminated. This was important information for me and I would like to log it properly, along with the normal utf-8 encoded errors that happen more regularly. How can I tell which encoding is most appropriate for a given error string?
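One possible approach, sketched in Python below, is to guess from the bytes themselves. This is a heuristic and not an official answer: UTF-8 text from Linux commands should essentially never contain NUL bytes, whereas UTF-16LE text that is mostly ASCII has a NUL after nearly every character. The helper name `decode_wsl_output` is hypothetical, and the heuristic can misfire on binary output or on odd edge cases.

```python
def decode_wsl_output(raw: bytes) -> str:
    """Heuristically decode bytes captured from wsl.exe (hypothetical helper)."""
    if b"\x00" in raw:
        # NUL bytes suggest UTF-16LE, i.e. a message produced by wsl.exe itself.
        try:
            text = raw.decode("utf-16-le")
        except UnicodeDecodeError:
            # Odd length or invalid surrogates: fall back rather than fail.
            text = raw.decode("utf-8", errors="replace")
    else:
        # No NULs: treat it as UTF-8 output from a Linux process.
        text = raw.decode("utf-8", errors="replace")
    # Drop carriage returns (the logged error above ends in CR CR LF).
    return text.replace("\r", "")


error = "The Windows Subsystem for Linux instance has terminated.\r\r\n".encode("utf-16-le")
print(repr(decode_wsl_output(error)))
print(repr(decode_wsl_output(b"Thu Jan  1 00:00:00 UTC 1970\n")))
```

Where it is an option, setting the environment so that only one encoding is ever produced avoids the need to guess at all.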

@infiRD

infiRD commented Nov 28, 2021

I am experiencing the same problem. After wasting a couple of hours on it, I found that @falloutphil's suggestion works. Putting the following function into my powershell_profile file seems to fix the wsl command for me:

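function wsl {
    # Collect pipeline input (if any) so it can be forwarded to wsl.exe.
    begin { $pipe_in = @() }
    process { if ($null -ne $_) { $pipe_in += "$_" } }
    end {
        # Temporarily read wsl.exe's output as UTF-16; always restore the old encoding.
        $console = [console]::OutputEncoding
        [console]::OutputEncoding = New-Object System.Text.UnicodeEncoding
        try {
            # Resolve the real wsl.exe so this function does not call itself.
            $wsl_cmd = Get-Command -CommandType Application wsl | Select-Object -First 1 | Select-Object -ExpandProperty Source
            # Use the call operator with splatting instead of Invoke-Expression,
            # so arguments containing spaces or quotes are passed through intact.
            if ($pipe_in.Count -gt 0) {
                $pipe_in | & $wsl_cmd @Args
            } else {
                & $wsl_cmd @Args
            }
        } finally {
            [console]::OutputEncoding = $console
        }
    }
}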

@s2005

s2005 commented Jun 18, 2022

@NotTheDr01ds

Thank you!

Based on your solution and the recipe from "Make a Bash alias that takes a parameter?",
I have solved it for myself by adding an alias:
alias wsl='f(){ wsl.exe "$@" | iconv -f UTF16 | tr -d "\r" ; unset -f f; }; f'

so now
wsl -l -v | grep 'Ubuntu'
works as expected.

@c0d3h4x0r

Using nothing more than a cmd.exe shell and built-in Windows 10 commands, how can I make the following work as expected?

wsl.exe --status | findstr /C:"Kernel version: 5.10.16"

I cannot use Bash or PowerShell for this, and iconv is not a built-in command on Windows 10. Because wsl.exe outputs in some non-standard Unicode format, findstr is unable to match against its output correctly. Note that there are no Unicode-specific characters involved in what I'm trying to do here, so findstr should be able to match that string just fine.

@RoguePointer80
Author

Wow, I am amazed that after nearly 3 years, this seemingly simple bug is still present. Any chance you can open-source the command-line tools (wsl.exe), so we can submit a PR to fix it ourselves?

@NotTheDr01ds

NotTheDr01ds commented Jun 22, 2022

@c0d3h4x0r

Using nothing more than a cmd.exe shell and built-in Windows 10 commands, how can I make the following work as expected?

It's ugly, but using a RegEx findstr /R, you can match the spurious (erroneous) null characters by placing an extra . between each and every other character (including spaces and other periods/decimals):

wsl.exe --status | findstr /R /C:"K.e.r.n.e.l. .v.e.r.s.i.o.n.:. .5.\..1.0.\..1.6."

Curious why PowerShell isn't an option for this since it's available pretty much anywhere WSL runs. I'm sure you have your reason; just curious ;-).
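Writing those dotted patterns by hand gets tedious, so here is a small Python sketch that generates one. The function name `dotted_findstr_pattern` is made up for illustration. It follows the trick described above (a `.` after every character to absorb the NUL byte) and backslash-escapes the characters that findstr /R treats as special.

```python
# Characters with special meaning in findstr /R patterns.
FINDSTR_META = set(".*^$[]\\")


def dotted_findstr_pattern(text: str) -> str:
    """Build a findstr /R pattern that tolerates a NUL after each character."""
    parts = []
    for ch in text:
        literal = "\\" + ch if ch in FINDSTR_META else ch
        parts.append(literal + ".")  # the trailing "." matches the interleaved NUL
    return "".join(parts)


print(dotted_findstr_pattern("Kernel version: 5.10.16"))
# K.e.r.n.e.l. .v.e.r.s.i.o.n.:. .5.\..1.0.\..1.6.
```

The result is meant to be pasted into `findstr /R /C:"..."`. Text containing double quotes or cmd metacharacters such as `%` would need extra quoting that this sketch does not attempt.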

@NotTheDr01ds

This now appears to be fixed based on an opt-in environment variable in the latest Preview release 0.64.0.

Simply adding the environment variable WSL_UTF8=1 will cause it to work properly. I assume the opt-in variable is required so that older code with workarounds in place won't inadvertently break.

PowerShell:

$env:WSL_UTF8=1
wsl --list | findstr Ubuntu

Note that if you are running wsl.exe from inside WSL, you'll need to also add WSL_UTF8 to the WSLENV list. For example:

export WSL_UTF8=1
export WSLENV="$WSLENV:WSL_UTF8"
wsl.exe -l -v | grep -i Ubuntu
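The same opt-in can be applied from a program that launches wsl.exe without a console. The Python sketch below is one way to do it, assuming WSL 0.64.0 or later as described above; `build_wsl_env` and `list_distros` are illustrative names, not part of any WSL tooling. Adding the variable to WSLENV only matters when the caller itself runs inside WSL, and should be harmless otherwise.

```python
import os
import subprocess


def build_wsl_env(base=None):
    """Return a copy of the environment with WSL_UTF8=1 set (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    env["WSL_UTF8"] = "1"
    # WSLENV lists the variables shared across the Windows/Linux boundary.
    shared = [name for name in env.get("WSLENV", "").split(":") if name]
    if "WSL_UTF8" not in shared:
        shared.append("WSL_UTF8")
    env["WSLENV"] = ":".join(shared)
    return env


def list_distros():
    """Run `wsl.exe -l -q` with WSL_UTF8=1 and return the distribution names."""
    result = subprocess.run(["wsl.exe", "-l", "-q"], env=build_wsl_env(),
                            capture_output=True, check=True)
    lines = result.stdout.decode("utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Calling `list_distros()` requires wsl.exe on the PATH. On older WSL versions the variable is presumably ignored, in which case the UTF-8 decode would give garbled names.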

@OneBlue
Collaborator

OneBlue commented Aug 2, 2022

As @NotTheDr01ds said, the 0.64.0 release introduces the WSL_UTF8 variable, which forces UTF-8 output on stdout.

Unfortunately, we cannot make it the default behavior since other programs depend on it, but users can opt-in to the new behavior by setting WSL_UTF8=1.

@henrik-jensen

henrik-jensen commented Sep 1, 2022

As @NotTheDr01ds said the 0.64.0 release introduces the WSL_UTF8 variable, which forces UTF8 output on stdout.

Unfortunately, we cannot make it the default behavior since other programs depend on it, but users can opt-in to the new behavior by setting WSL_UTF8=1.

Windows is and has always been so fucked up in the age of the internet. Some very bad decisions were made back there in Redmond in the early '90s.
PS: I mostly and happily use Windows (Visual Studio is GOD), but am often forced to use Linux wsl solutions because of Windows limitations. (thx wsl)

@jwbrase

jwbrase commented Sep 2, 2022 via email

@louigi600

What the fuck: to get a bat file to export all my wsl instances, I needed to run wsl to parse wsl's crappy output:
for /f "tokens=2 delims==" %%I in ('wmic os get localdatetime /format:list') do set datetime=%%I
set filedate=%datetime:~0,4%%datetime:~4,2%%datetime:~6,2%
wsl bash -c "wsl.exe -l -q | iconv -f UTF16 | tr -d '\r' > mk_bck.conf"
for /f "delims=" %%a in (mk_bck.conf) do (
wsl -t %%a
wsl --export %%a - | 7z a -tgzip %%a_%filedate%.tar.gz -si
)

@BatmanAoD

@OneBlue

Unfortunately, we cannot make it the default behavior since other programs depend on it, but users can opt-in to the new behavior by setting WSL_UTF8=1.

I can understand why changing the default encoding to always be UTF-8 might break existing programs, but why doesn't wsl.exe --list (and --help and --version) respect the system codepage? Opting for UTF-8 everywhere (this way) via the Region settings for "non-unicode" programs (??) is still a "beta" feature, so in cases where a user has enabled it, it seems reasonable to expect wsl.exe to respect it.
