
Reading stream with json.NewDecoder fails with "invalid character '\r' in string literal" if large base64 []byte present #39874

Closed
fxtentacle opened this issue Jun 26, 2020 · 2 comments


@fxtentacle fxtentacle commented Jun 26, 2020

Dear Go team,

first off, thank you for this amazingly productive language :) I'm writing you because I believe that I have found an issue with the streaming JSON decoder in the standard library.

This panics with "invalid character '\r' in string literal":

package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

type RequestDataItem struct {
	Name string
	Data []byte
}

type Request struct {
	Data []RequestDataItem
}

func main() {
	dec := json.NewDecoder(os.Stdin)
	for {
		var request Request
		err := dec.Decode(&request)
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		fmt.Println("OK")
	}
}

if presented with this input data:

{"Data":[{"Name":"test","Data":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=="}]}

In contrast, this main function (which needs the "bufio" import) works, because the JSON is accumulated line by line in the scanner before parsing:

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		var request Request
		// json.Unmarshal never returns io.EOF; the scanner loop
		// already stops at end of input.
		if err := json.Unmarshal(scanner.Bytes(), &request); err != nil {
			panic(err)
		}
		fmt.Println("OK")
	}
}

Example data that works in both cases would be

{"Data":[{"Name":"test","Data":"dGVzdAo="}]}

In my opinion, the issue is the space character appended here:

if stateEndValue(&dec.scan, ' ') == scanEnd {

in combination with the input being read in 512-byte chunks here:

const minRead = 512

So while parsing with the streaming JSON decoder, it will only read up until

{"Data":[{"Name":"test","Data":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

(or exactly 512 bytes) and then fail, because it erroneously tries to base64-decode the "A[..]AA " string produced by appending a space to the unfinished base64 data.

As expected, changing

const minRead = 512

to

const minRead = 5120

fixes the issue for my test case here, but of course it will still break for larger JSON payloads.


@seankhliao seankhliao commented Jun 26, 2020

Please fill out the issue template when filing reports.

This is more likely a limitation of your terminal emulator than an issue with encoding/json. Related: https://unix.stackexchange.com/questions/204815/terminal-does-not-accept-pasted-or-typed-lines-of-more-than-1024-characters


@fxtentacle fxtentacle commented Jun 26, 2020

Thank you for pointing that out. I agree that this is most likely an issue with my terminal when data is read from it with a small buffer, like the 512-byte default that json.NewDecoder uses.

Replacing

dec := json.NewDecoder(os.Stdin)

with

dec := json.NewDecoder(bufio.NewReaderSize(os.Stdin, 2048))

also fixes the problem.

In other words, the exact same terminal emulator works if I force Go to read from it with a larger 2048 byte buffer.

I'll close this issue, then. Let's hope that the next person observing this problem finds this thread and can pinpoint the cause more quickly than I did.

@fxtentacle fxtentacle closed this Jun 26, 2020