New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix s-reverse for Unicode combining characters. #58

Merged
merged 1 commit into from Jul 13, 2014

Conversation

Projects
None yet
3 participants
@skeeto
Contributor

skeeto commented Jun 11, 2014

When reversing a Unicode string, care must be taken to reverse characters with their combining marks as a single unit, maintaining their order.

Before this patch, s-reverse does this (NFD normalized):

(s-reverse "naïve")
;; => "ev̈ian"
@Fuco1

This comment has been minimized.

Show comment
Hide comment
@Fuco1

Fuco1 Jul 12, 2014

Contributor

This has bitten me before with Russian and some Indian scripts. Can we get this in, pretty please? :)

Lists really are a terrible data-structure to hold text. I wonder when we'll move away from that.

Contributor

Fuco1 commented Jul 12, 2014

This has bitten me before with Russian and some Indian scripts. Can we get this in, pretty please? :)

Lists really are a terrible data-structure to hold text. I wonder when we'll move away from that.

@Fuco1

This comment has been minimized.

Show comment
Hide comment
@Fuco1

Fuco1 Jul 12, 2014

Contributor

@skeeto Would it be possible to test if the string is unibyte and in that case use the old implementation? It should be a bit faster without doing all the checks and so. I believe any text with combining characers will be stored as multibyte, is that right? (I'm no unicode expert)

See multibyte-string-p for the testing.

Contributor

Fuco1 commented Jul 12, 2014

@skeeto Would it be possible to test if the string is unibyte and in that case use the old implementation? It should be a bit faster without doing all the checks and so. I believe any text with combining characers will be stored as multibyte, is that right? (I'm no unicode expert)

See multibyte-string-p for the testing.

Fix s-reverse for Unicode combining characters.
When reversing a Unicode (multibyte) string, care must be taken to
reverse characters with their combining marks as a single unit,
maintaining their order.

* https://github.com/mathiasbynens/esrever
* http://www.cl.cam.ac.uk/~mgk25/unicode.html
@skeeto

This comment has been minimized.

Show comment
Hide comment
@skeeto

skeeto Jul 12, 2014

Contributor

Yup, if it's unibyte then the plain old reverse works perfectly fine. I
just updated my branch so that it tests the string with
multibyte-string-p, using the original reverse definition when it can.
Emacs prefers to make strings unibyte when possible, so hopefully this
does actually make it faster.

Contributor

skeeto commented Jul 12, 2014

Yup, if it's unibyte then the plain old reverse works perfectly fine. I
just updated my branch so that it tests the string with
multibyte-string-p, using the original reverse definition when it can.
Emacs prefers to make strings unibyte when possible, so hopefully this
does actually make it faster.

@magnars

This comment has been minimized.

Show comment
Hide comment
@magnars

magnars Jul 13, 2014

Owner

Thanks!

Owner

magnars commented Jul 13, 2014

Thanks!

magnars added a commit that referenced this pull request Jul 13, 2014

Merge pull request #58 from skeeto/fix-reverse
Fix s-reverse for Unicode combining characters.

@magnars magnars merged commit cb3e499 into magnars:master Jul 13, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment