Support for UTF-16 surrogate pair encoded emojis #279

Open
tech4him1 opened this issue Sep 3, 2017 · 5 comments · May be fixed by #1029


tech4him1 commented Sep 3, 2017

Currently, if I try to parse YAML data containing Unicode emojis split into UTF-16 surrogate pairs (e.g. U+1F468 written as \uD83D\uDC68 in YAML), go-yaml returns the error "found invalid Unicode character escape code".

According to the YAML spec, parsers are supposed to support UTF-8 and UTF-16, including surrogate pairs:
http://www.yaml.org/spec/1.2/spec.html#id2770814
http://www.yaml.org/spec/1.2/spec.html#id2771184

This looks intentional. Are you planning on supporting these or not?

yaml/scannerc.go, lines 2443 to 2447 in 25c4ec8:

    if (value >= 0xD800 && value <= 0xDFFF) || value > 0x10FFFF {
        yaml_parser_set_scanner_error(parser, "while parsing a quoted scalar",
            start_mark, "found invalid Unicode character escape code")
        return false
    }
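
The check above rejects every escape value in the surrogate range, including a high surrogate that is immediately followed by a valid low surrogate. As a rough sketch of the alternative (an assumption about one possible approach, not the go-yaml scanner and not the code in any pull request), a decoder could pair the two escapes before validating. The function combineEscapes and the error variable are hypothetical names; the input is the list of numeric values already read from consecutive \uXXXX escapes.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf16"
)

// errInvalidEscape mirrors the wording of the scanner error quoted above.
var errInvalidEscape = errors.New("found invalid Unicode character escape code")

// combineEscapes is a hypothetical helper (not go-yaml code). It takes the
// numeric values of consecutive escapes and returns the decoded runes,
// merging each high surrogate with the low surrogate that follows it.
// Lone or misordered surrogates are still rejected.
func combineEscapes(values []rune) ([]rune, error) {
	out := make([]rune, 0, len(values))
	for i := 0; i < len(values); i++ {
		v := values[i]
		switch {
		case v < 0 || v > 0x10FFFF:
			return nil, errInvalidEscape
		case !utf16.IsSurrogate(v):
			out = append(out, v) // ordinary code point, keep as-is
		case i+1 < len(values):
			// DecodeRune returns U+FFFD unless v is a high surrogate
			// and the next value is a low surrogate.
			r := utf16.DecodeRune(v, values[i+1])
			if r == 0xFFFD {
				return nil, errInvalidEscape
			}
			out = append(out, r)
			i++ // the low surrogate has been consumed
		default:
			return nil, errInvalidEscape // surrogate with nothing after it
		}
	}
	return out, nil
}

func main() {
	runes, err := combineEscapes([]rune{0xD83D, 0xDC68})
	fmt.Println(string(runes), err)
}
```

A real scanner would also have to look ahead in the input for the second \u escape and report the right mark on failure, which this sketch leaves out.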

@Destroy666x

Still not supported 😞


andreif commented Mar 18, 2024

I had a similar problem somewhere in my pipeline and thought it was because of this issue, but it looks like it works? Or am I reading it wrong:

Screenshot 2024-03-18 at 09 31 21


andreif commented Mar 18, 2024

Screenshot 2024-03-18 at 12 05 40


andreif commented Mar 19, 2024

It looks like it almost works: it does if the data is just a string, but not if the string is inside an array or object.

Screenshot 2024-03-19 at 08 54 03

steveh added a commit to steveh/yaml that referenced this issue Apr 8, 2024
I'm new to this code base, so I have likely implemented this in a way that isn't ideal, but hopefully it's enough of a starting point.

References:

* https://russellcottrell.com/greek/utilities/SurrogatePairCalculator.htm
* https://mathiasbynens.be/notes/javascript-unicode
* readerc.go

Fixes go-yaml#279
@steveh steveh linked a pull request Apr 8, 2024 that will close this issue

steveh commented Apr 8, 2024

I've created a PR to add support for surrogate pairs: #1029
