Hex Entities Incorrectly Converted to Unicode #140

Closed
joeyhoer opened this issue Sep 24, 2013 · 6 comments

@joeyhoer

While optimizing an SVG font file, I ran across an interesting issue with the encoding of hex entities.

All of the hex entities were converted to their Unicode counterparts (a great optimization), but they were encoded incorrectly. For example, the hex entity &#x1f525; should have been converted to 🔥 (U+1F525), but instead it was converted to the character U+F525. Upon further inspection, I concluded that the hex entity for the resulting character was &#xf525;.

It appears that the 1 has been stripped from the hex entity before it was converted, leading to an incorrect conversion.

Here is a test: svgo -s "<svg>&#x1f525;</svg>" -o -

@isaacs

isaacs commented Jan 2, 2014

Astral plane chars are going to fail because it's calling String.fromCharCode on the parsed int, which only looks at the two right-most bytes. What you want is for the string to be "\ud83d\udd25", with the surrogate pair split up.

I'd accept a patch that properly split up surrogate pairs before calling String.fromCharCode().

For example:

> fire = String.fromCharCode(0xd83d) + String.fromCharCode(0xdd25)
////// Github won't let me post this.  But it prints out the actual fire glyph on my terminal.
> fire.length
2
> fire.charCodeAt(0)
55357
> fire.charCodeAt(1)
56613
> mojibake = String.fromCharCode(0x1f525)
''
> mojibake.length
1
> mojibake.charCodeAt(0)
62757
> mojibake.charCodeAt(0).toString(16)
'f525'
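
The surrogate-pair split described above could be sketched roughly as follows. This is only an illustration: `codePointToString` is a hypothetical helper name, not a function from sax or svgo, and it assumes the input is an integer that has already been parsed from the entity.

```javascript
// Hypothetical helper (not part of sax or svgo): turn a Unicode code point
// into a JavaScript string, splitting astral code points (above U+FFFF)
// into a UTF-16 surrogate pair before calling String.fromCharCode.
function codePointToString(codePoint) {
  if (codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError('Invalid code point: ' + codePoint);
  }
  if (codePoint <= 0xffff) {
    // Basic Multilingual Plane: a single UTF-16 code unit is enough.
    return String.fromCharCode(codePoint);
  }
  // Astral plane: subtract 0x10000, then spread the remaining 20 bits
  // over a high (lead) and a low (trail) surrogate.
  var offset = codePoint - 0x10000;
  var high = 0xd800 + (offset >> 10);   // top 10 bits
  var low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return String.fromCharCode(high, low);
}
```

With this sketch, `codePointToString(0x1f525)` yields the same two code units as the `fire` string in the transcript above (0xd83d followed by 0xdd25), rather than the single truncated unit 0xf525.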

@joeyhoer
Author

joeyhoer commented Jan 2, 2014

Seems as if we both came to the same conclusion. You may want to move your comment over to the other issue (in the sax project), since it's a little more pertinent there. I think the best solution would be to use the String.fromCodePoint method, which will need to be shimmed.

Anyhow, this seems to be an issue in the XML/SVG parser that is manifesting in some cases within SVGO.
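
For illustration, decoding numeric entities with String.fromCodePoint might look something like the sketch below. `decodeNumericEntities` is a hypothetical name and this is not how sax implements it; the sketch also assumes a runtime where String.fromCodePoint is available natively (it was standardized in ES2015) or has been shimmed.

```javascript
// Hypothetical helper (not the sax implementation): replace hex and decimal
// numeric character references with the characters they stand for.
// Assumes String.fromCodePoint exists (native in ES2015+, or shimmed).
function decodeNumericEntities(text) {
  return text.replace(/&#(?:x([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave references outside the Unicode range untouched instead of
    // letting String.fromCodePoint throw a RangeError.
    if (codePoint > 0x10ffff) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}
```

Unlike String.fromCharCode, String.fromCodePoint takes the full code point, so `decodeNumericEntities('<svg>&#x1f525;</svg>')` keeps the leading 1 and produces a string containing the surrogate pair "\ud83d\udd25".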

@isaacs

isaacs commented Jan 2, 2014

Oh, whoops, I commented on the wrong place :) Will repost over there.

@isaacs

isaacs commented Jan 2, 2014

Fixed by sax 0.6.0. Thanks to @mathiasbynens for writing a String.fromCodePoint shim.

@deepsweet
Member

ok, thank you guys!
i'll update svgo soon.

ghost assigned deepsweet Jan 2, 2014
deepsweet pushed a commit that referenced this issue Jan 2, 2014

@maxkarkowski

maxkarkowski commented Jun 26, 2019

this does not seem to work in attributes, e.g.:
<glyph unicode="&#xe898;" glyph-name="info" />
converts to
<glyph unicode="" glyph-name="info"/>
