encoding/json: how to marshal with unicode escape? #39137

Closed
cupen opened this issue May 19, 2020 · 9 comments

@cupen cupen commented May 19, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.2 linux/amd64

What did you expect to see?

It would be useful for the json package to provide an API for marshaling strings with Unicode escapes.

unicode escape
https://tools.ietf.org/html/rfc7159#section-7

package main

import (
	"encoding/json"
	"fmt"
)

type Object struct {
	Name string
}

func main() {
	obj := Object{Name: "哇呀呀"}
	// json.MarshalUnicodeEscape is the proposed API; it does not exist in encoding/json.
	line, _ := json.MarshalUnicodeEscape(obj)
	fmt.Println(string(line))
}

{"Name": "\u54c7\u5440\u5440"}
@cupen cupen changed the title json: how to marshal with unicode escape? encoding/json: how to marshal with unicode escape? May 19, 2020
Member

@mvdan mvdan commented May 19, 2020

Before we start talking about new API, we should first talk about why you need to do that in the first place. You shouldn't need to escape non-ASCII letters in json.

Author

@cupen cupen commented May 19, 2020

@mvdan I have several runtime environments: Go, Python 2, and a JS runtime (a V8 app). They use different default character encodings: UTF-8 (Go, Python 2) and UCS-2 (the JS runtime). JSON text has to be converted from UTF-8 to UCS-2, or from UCS-2 to UTF-8, whenever it is transferred between them, which is error-prone.

I think Unicode escaping is meant for this case. It is an ASCII-safe and valid JSON string encoding.

Member

@mvdan mvdan commented May 19, 2020

That seems like a very narrow edge case, and I don't think it should be the json package's job to support producing ASCII output alone.

You could consider having a named string type that implements MarshalJSON and replaces all non-ASCII characters with escape codes, if you want. That should be under twenty lines of extra Go code.

@mvdan mvdan added NeedsDecision and removed WaitingForInfo labels May 19, 2020
@mvdan mvdan added this to the Backlog milestone May 19, 2020
Author

@cupen cupen commented May 19, 2020

Yeah, I see what you mean. Here is an example: https://play.golang.org/p/YVSQzad2Z2r

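type UnicodeEscape string

// MarshalJSON returns the string quoted with non-ASCII characters escaped.
// Caveat: strconv.QuoteToASCII emits Go escape syntax, so characters above
// U+FFFF (\U0001f604) and some control characters (\a, \v, \x7f) produce
// output that is not valid JSON. Printable text within the BMP works.
func (ue UnicodeEscape) MarshalJSON() ([]byte, error) {
	text := strconv.QuoteToASCII(string(ue))
	return []byte(text), nil
}

type Object struct {
	Name UnicodeEscape
}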

But I would like to think of Unicode escaping as something the JSON spec itself provides, not a hack or a monkey-patch only for Python, JS (V8), or others.

BTW: json.MarshalUnicodeEscape is an ugly name; maybe I could add a new encoder option for it instead.

Contributor

@seankhliao seankhliao commented May 19, 2020

Author

@cupen cupen commented May 20, 2020

@seankhliao Yes, it says that a JSON text can be encoded in UTF-8, UTF-16, or UTF-32, while a JSON string can be written in UTF-8, UTF-16, UTF-32, or with Unicode escapes. Sorry if this sounds like a word game. 😄
https://tools.ietf.org/html/rfc7159#section-7

e.g.:

  • JSON string: "\u54c7\u5440\u5440", a JSON value of string type.
  • JSON text: {"Name": "\u54c7\u5440\u5440"}, which contains all of the JSON elements: field names, field values, and the punctuation ,:"{}.

The JSON text {"Name": "\u54c7\u5440\u5440"} is ASCII-safe no matter which UTF encoding is used.
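
Editor's note: the point that the escaped and unescaped forms are the same JSON value can be checked with a small sketch like the one below. The helper name decodeName is made up for illustration; only json.Unmarshal is a real encoding/json call.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Object struct {
	Name string
}

// decodeName is a hypothetical helper: it unmarshals a JSON text and
// returns the decoded Name field.
func decodeName(text string) (string, error) {
	var obj Object
	if err := json.Unmarshal([]byte(text), &obj); err != nil {
		return "", err
	}
	return obj.Name, nil
}

func main() {
	escaped := `{"Name": "\u54c7\u5440\u5440"}` // ASCII bytes only
	raw := `{"Name": "哇呀呀"}`                    // UTF-8 encoded characters

	a, errA := decodeName(escaped)
	b, errB := decodeName(raw)
	if errA != nil || errB != nil {
		panic("decode failed")
	}
	fmt.Println(a == b) // both texts decode to the same Go string
}
```

The decoder treats both spellings identically, so escaping only changes the bytes on the wire, not the value the receiver sees.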


@networkimprov networkimprov commented May 20, 2020

Maybe you could consider a `json:"ascii"` tag for this case?

Member

@mvdan mvdan commented May 28, 2020

I think the few lines of code shown in #39137 (comment) are a completely acceptable solution to this. The json API should only cover common issues and needs. Trying to avoid utf-8 altogether in favor of ascii with unicode escapes certainly feels like an edge case that we shouldn't cover, especially given how easy it is to do with MarshalJSON.

Author

@cupen cupen commented Jun 2, 2020

OK, I'll do it with a non-standard library.

@cupen cupen closed this Jun 2, 2020