Support higher Unicode ranges for Char.fromCode #687

Closed: pull request to merge 1 commit into base: master

4 participants
@Janiczek
Contributor

Janiczek commented Aug 9, 2016

> Char.fromCode 0x1F0A1 == Char.fromCode 0xF0A1
True : Bool

> Char.fromCode 0x1F0A1
'' : Char

> Char.fromCode 0xF0A1
'' : Char

I'm trying to get 0x1F0A1 = 🂡, but I'm getting 0xF0A1 (a private-use character) instead.
That's because Elm uses JavaScript's String.fromCharCode, which only handles 16-bit code units and so truncates code points above 0xFFFF.
To fix that, String.fromCodePoint should be used.
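
The truncation can be reproduced in a plain JavaScript console. This is a minimal sketch, assuming an engine that provides String.fromCodePoint (ES2015); the variable names are only for illustration.

```javascript
// String.fromCharCode keeps only the low 16 bits of each argument,
// so 0x1F0A1 is silently truncated to 0xF0A1.
const truncated = String.fromCharCode(0x1F0A1);
const privateUse = String.fromCharCode(0xF0A1);
console.log(truncated === privateUse); // true

// String.fromCodePoint (ES2015) emits a UTF-16 surrogate pair instead.
const card = String.fromCodePoint(0x1F0A1);
console.log(card.length); // 2 (two UTF-16 code units)
console.log(card === "\uD83C\uDCA1"); // true
```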

[Screenshot: js_console]

As a side note, maybe toCode could also be updated to use String.prototype.codePointAt instead of String.prototype.charCodeAt?
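
The same asymmetry shows up when reading a character back. This is a small sketch, again assuming an ES2015 engine; `aceOfSpades` is just an illustrative name.

```javascript
// U+1F0A1 written out as its UTF-16 surrogate pair.
const aceOfSpades = "\uD83C\uDCA1";

// charCodeAt returns only the first code unit (the high surrogate).
const firstUnit = aceOfSpades.charCodeAt(0);
console.log(firstUnit.toString(16)); // "d83c"

// codePointAt (ES2015) combines the pair into the full code point.
const fullCodePoint = aceOfSpades.codePointAt(0);
console.log(fullCodePoint.toString(16)); // "1f0a1"
```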

process-bot commented Aug 9, 2016

Thanks for the pull request! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

Member

lukewestby commented Aug 11, 2016

Unfortunately, String.fromCodePoint isn't well supported by browsers, so it would need to be polyfilled. Even a low-level polyfill of this function is a lot slower than the native implementation, as demonstrated by the benchmark below:

http://jsbin.com/taraguwaja/1/edit?js,console

@lukewestby lukewestby added the request label Aug 20, 2016

Member

evancz commented Mar 25, 2017

Fixed in elm-lang@bf16ca8 with an implementation that is cross-browser and should be reasonably fast. In the BMP, it usually adds just one check, so there is not much additional overhead.

@lukewestby, most polyfills have lots of code to handle all the bad data that could be passed in. What does .codePointAt do on objects? On arrays? Etc. In this case I just learned a lot about UTF-16 for some recent parser work, and it turns out that it's not so hard to sort out.

If folks are curious, I used the formulas from here for surrogate pairs.
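
For readers who want to see the arithmetic, the standard UTF-16 surrogate-pair formulas can be sketched as follows. This is an illustration only and not necessarily the code in the referenced commit: `fromCodeSketch` and `toCodeSketch` are hypothetical names, and the U+FFFD fallback for out-of-range input is an assumption of the sketch.

```javascript
// Hypothetical helper: build a one-character string from a code point
// using only String.fromCharCode, which every browser provides.
function fromCodeSketch(code) {
  // Out-of-range input falls back to U+FFFD (an assumption of this sketch).
  if (code < 0 || code > 0x10FFFF) {
    return "\uFFFD";
  }
  // BMP code points need no surrogate arithmetic.
  if (code <= 0xFFFF) {
    return String.fromCharCode(code);
  }
  // Supplementary planes: split the 20-bit offset into two 10-bit halves.
  const offset = code - 0x10000;
  const high = 0xD800 + Math.floor(offset / 0x400);
  const low = 0xDC00 + (offset % 0x400);
  return String.fromCharCode(high, low);
}

// Hypothetical inverse: read the code point at the start of a string.
// Assumes a well-formed string (a high surrogate is followed by a low one).
function toCodeSketch(str) {
  const high = str.charCodeAt(0);
  // Not a high surrogate: the code unit is the code point.
  if (high < 0xD800 || high > 0xDBFF) {
    return high;
  }
  const low = str.charCodeAt(1);
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

console.log(fromCodeSketch(0x1F0A1) === "\uD83C\uDCA1"); // true
console.log(toCodeSketch("\uD83C\uDCA1").toString(16)); // "1f0a1"
```

Because only String.fromCharCode and charCodeAt are used, a sketch like this does not depend on String.fromCodePoint being available in the browser.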

@evancz evancz closed this Mar 25, 2017
