The behavior of "\u1f917" is also correct, though surprising. The JSON spec (RFC 7159), in section 7 (Strings), states:
Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point
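For comparison, here is a minimal sketch of the same input run through Python, assuming its standard `json` module follows the RFC on this point as well. The variable names are only illustrative.

```python
import json

# The JSON text "\u1f917": the \u escape consumes exactly four hex digits
# (1f91); the trailing "7" is then read as an ordinary character.
short_escape = json.loads('"\\u1f917"')

# Equivalent of jq's `explode`: list the code points of the string.
code_points = [ord(ch) for ch in short_escape]

print(code_points)                      # [8081, 55]
print([hex(cp) for cp in code_points])  # ['0x1f91', '0x37']
```

This gives the same two code points that jq reports for `"\u1f917" | explode`.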
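So "\u1f917" is read as \u1f91 (ᾑ, U+1F91) followed by a literal 7, which is exactly what explode reports (8081 is 0x1F91 and 55 is the digit 7). 🤗 (U+1F917) is outside the BMP, so there's a different escape format to use, which the RFC goes on to describe: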
To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a 12-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E"
Following the UTF-16 instructions for constructing the surrogate pair (it's the UTF-16BE format in your link) for U+1F917 gives us this pair: \ud83e\udd17.
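As a rough sketch of that construction in Python: subtract 0x10000 from the code point, put the top 10 bits of the result on 0xD800 and the bottom 10 bits on 0xDC00. The helper names `surrogate_pair` and `json_escape` are made up for illustration and are not part of jq or any library.

```python
import json
from typing import Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Return the UTF-16 (high, low) surrogates for a code point outside the BMP."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def json_escape(code_point: int) -> str:
    """Build the 12-character JSON escape for a non-BMP code point."""
    high, low = surrogate_pair(code_point)
    return f"\\u{high:04x}\\u{low:04x}"


escaped = json_escape(0x1F917)
print(escaped)  # \ud83e\udd17

# Round trip: a JSON parser turns the pair back into one code point.
decoded = json.loads(f'"{escaped}"')
print(ord(decoded))  # 129303
```

The same helper gives (0xD834, 0xDD1E) for the RFC's G clef example, U+1D11E.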
Describe the bug
http://www.unicode-symbol.com/u/1F917.html gives details about 🤗 - hugging face (U+1F917).
jq can handle it in the sense that:
$ jq -n '"🤗"'
"🤗"
However, jq seems to be quite confused about the details:
$ jq --version
jq-master-2e01ff1 # Release jq-1.6 of Nov 1, 2018
$ jq -n '"🤗" | explode'
[
129303
]
$ jq -n '[12903] | implode'
"㉧"
The following also does not look right:
$ jq -n '"\u1f917" | explode'
[
8081,
55
]
$ jq -n '[8081,55] | implode'
"ᾑ7"