Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gltfio: enable escaped unicode for node name #7989

Merged
merged 2 commits into from
Jul 25, 2024
Merged

Conversation

poweifeng
Copy link
Contributor

Fixes #7846

@poweifeng poweifeng added the internal Issue/PR does not affect clients label Jul 23, 2024
@poweifeng poweifeng added internal Issue/PR does not affect clients and removed internal Issue/PR does not affect clients labels Jul 23, 2024
size_t cur = 0, idx = 0;
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
while ((idx = sImpl.find("\\u", cur)) != std::string::npos) {
auto endIdx = sImpl.find_first_not_of("0123456789abcdefABCDEF", idx + 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to limit this to only 6 characters. Otherwise we could have input like this which causes a std::range_error: wstring_convert: to_bytes error:

\u6211\u53EBBenjamin

(should be 我叫Benjamin)

If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point.

See https://www.ietf.org/rfc/rfc4627.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the spec link. This makes things a lot easier to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. My other recommendation is that to_bytes can throw a std::range_error error if the unicode point is not valid, but not sure if we care about handling that case since that would mean the string is malformed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that's more of the user's responsibility. Let's discuss again if/when someone files a bug on it.

@poweifeng poweifeng enabled auto-merge (squash) July 25, 2024 08:44
@poweifeng poweifeng disabled auto-merge July 25, 2024 08:45
@poweifeng poweifeng enabled auto-merge (squash) July 25, 2024 08:45
@poweifeng poweifeng changed the title gltfio: enable unicode for node name gltfio: enable escaped unicode for node name Jul 25, 2024
@poweifeng poweifeng merged commit 7441e87 into main Jul 25, 2024
11 checks passed
@poweifeng poweifeng deleted the pf/gltfio-fix-unicode branch July 25, 2024 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal Issue/PR does not affect clients
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Treatment of escaped vs unescaped Unicode character in JSON strings
3 participants