Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: readConsole corrupts multibyte characters on Windows #17939

Closed
mattn opened this issue Nov 16, 2016 · 12 comments
Closed

os: readConsole corrupts multibyte characters on Windows #17939

mattn opened this issue Nov 16, 2016 · 12 comments

Comments

@mattn
Copy link
Member

@mattn mattn commented Nov 16, 2016

What version of Go are you using (go version)?

go version devel +5af7553 Wed Nov 9 00:21:04 2016 +0000 windows/amd64

What operating system and processor architecture are you using (go env)?

Windows 7 64bit

What did you do?

package main

import (
    "fmt"
    "os"
)

func main() {
    p := make([]byte, 100)
    fmt.Printf("> ")
    for {
        n, err := os.Stdin.Read(p)
        fmt.Printf("[%d %q %v]\n> ", n, p[:n], err)
    }
}

#17427 (comment)

go run and type hello\n on the prompt >.

What did you expect to see?

output [7 "hello\r\n" <nil>]

What did you see instead?

[1 "h" <nil>]
[1 "e" <nil>]
[1 "l" <nil>]
[1 "l" <nil>]
[1 "o" <nil>]
[1 "\r" <nil>]
[1 "\n" <nil>]

When 100 bytes buffer is given, readConsole should be filled with hello\r\n on console. Because GetConsoleMode() is ENABLE_LINE_INPUT and if call ReadFile with enough buffer, it fill as hello\r\n.

https://go-review.googlesource.com/#/c/31114/ will fix this.

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 16, 2016

CL https://golang.org/cl/31114 mentions this issue.

@quentinmit
Copy link
Contributor

@quentinmit quentinmit commented Nov 16, 2016

This doesn't seem like a bug to me; Read is not guaranteed to read all the available bytes. I'll mark it as a possible fix for the future since it would help with optimization, but you should always remember that Read is free to not fill the buffer whenever it wants.

@quentinmit quentinmit added the NeedsFix label Nov 16, 2016
@quentinmit quentinmit added this to the Unplanned milestone Nov 16, 2016
@mattn
Copy link
Member Author

@mattn mattn commented Nov 16, 2016

But it should call ReadFile with enough bytes because ReadFile break half of multi-byte unread.

#17427 (comment)

When the leading byte is read, trailing byte(s) never be read. This is clearly with C code.

#17427 (comment)

So this should be fixed until next release.

@hirochachacha
Copy link
Contributor

@hirochachacha hirochachacha commented Nov 17, 2016

The second example looks bug to me and it is reproducible.
I used '%x' instead of '%q' and saw following results:

> ああ<enter>
[3 efbd82 nil]
[1 0d nil]
[1 0a nil]

Note that 'あ' = [e3 81 82], '\r' = [0d], '\n' = [0a].
Although io.Reader don't prohibit partial read, it should return:

> ああ<enter>
[3 e38182 nil]
[3 e38182 nil]
[1 0d nil]
[1 0a nil]

Also here are results in go1.7.3:

> ああ<enter>
[8 e38182e381820d0a nil]

So this is a regression. 1af769d

Even worse, my input was swallowed when:

> あ<enter> // os.Stdin.Read never return in this case
@rsc rsc modified the milestones: Go1.8, Unplanned Nov 17, 2016
@quentinmit
Copy link
Contributor

@quentinmit quentinmit commented Nov 17, 2016

Indeed, that looks like a bug to me. Apologies for misinterpreting the report before; I thought it was only that reads were being broken up into multiple calls. If we're dropping bytes, that's very bad and could be a release blocker. Thanks for the sample output.

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Nov 18, 2016

The second example looks bug to me and it is reproducible.

@hirochachacha please add your example to TestReadStdin in os package to see what is wrong, and report here. Thank you.

Alex

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Nov 18, 2016

The second example looks bug to me and it is reproducible.

@hirochachacha please add your example to TestReadStdin in os package, to see what is going on, and report here. Thank you.

Alex

@hirochachacha
Copy link
Contributor

@hirochachacha hirochachacha commented Nov 18, 2016

@alexbrainman I tried, but it's not reproducible there.
Anyway, I understand what problem is.

diff --git a/src/os/file_windows.go b/src/os/file_windows.go
index 8f2d4d3..cb806bd 100644
--- a/src/os/file_windows.go
+++ b/src/os/file_windows.go
@@ -232,6 +232,7 @@ func (f *File) readOneUTF16FromConsole() (uint16, error) {
        for {
                var nmb uint32
                err := readFile(f.fd, buf[:], &nmb, nil)
+               println("nmb:", nmb, "byte:", buf[0])
                if err != nil {
                        return 0, err
                }

then,

> あ
nmb: 2 byte: 130   // 0x82
nmb: 1 byte: 13    // 0x0d
nmb: 1 byte: 10    // 0x0a

Note that 'あ' = [82 a0] in CP932 encoding.

Current code uses var buf [1]byte, while nmb is 2, a remaining byte (0xa0) is truncated.

I don't know https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx
should guarantee that nNumberOfBytesToRead >= lpNumberOfBytesRead or not. At least we expected that.

@quentinmit quentinmit changed the title os: readConsole don't fill buffer on Windows os: readConsole corrupts multibyte characters on Windows Nov 18, 2016
@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Nov 19, 2016

I understand what problem is.

Yes. It looks weird. We ask for 1 byte, but ReadFile says that it returned returned 2 bytes. I wonder if ReadFile actually have written correct bytes into the buffer (actually second byte past the end of our buffer). Also, please, tell us if ReadFile returns error. Change your code to:

diff --git a/src/os/file_windows.go b/src/os/file_windows.go
index 8f2d4d3..5b0e0c8 100644
--- a/src/os/file_windows.go
+++ b/src/os/file_windows.go
@@ -226,12 +226,13 @@ func (f *File) copyReadConsoleBuffer(buf []byte) (n int, err error) {
 // readOneUTF16FromConsole reads single character from console,
 // converts it into utf16 and return it to the caller.
 func (f *File) readOneUTF16FromConsole() (uint16, error) {
-   var buf [1]byte
+   var buf [2]byte
    mbytes := make([]byte, 0, 4)
    cp := getCP()
    for {
        var nmb uint32
-       err := readFile(f.fd, buf[:], &nmb, nil)
+       err := readFile(f.fd, buf[:1], &nmb, nil)
+       println("nmb:", nmb, " err=", err, "byte1:", buf[0], "byte2:", buf[1])
        if err != nil {
            return 0, err
        }

and tell us what it prints. Thank you.

Alex

@hirochachacha
Copy link
Contributor

@hirochachacha hirochachacha commented Nov 19, 2016

Here you are.

> あ<enter>
nmb: 2 err= (0x0, 0x0) byte1: 130 byte2: 0
nmb: 1 err= (0x0, 0x0) byte1: 13 byte2: 0
nmb: 1 err= (0x0, 0x0) byte1: 10 byte2:0

Unfortunately, readFile doesn't return error even if truncation occurred.
I cannot find the way to recover from truncation. So I think mattn's approach is right.

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Nov 19, 2016

Here you are.

Thank you. I will think about this.

Alex

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 22, 2016

CL https://golang.org/cl/33451 mentions this issue.

@gopherbot gopherbot closed this in 610d522 Nov 29, 2016
@golang golang locked and limited conversation to collaborators Nov 29, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.