Unicode 4 byte chars are displayed as ?? #842

Closed
fourplusone opened this issue Jul 5, 2012 · 3 comments
Comments

@fourplusone
Contributor

Paste any Unicode character above U+FFFF into a pad (like https://gist.github.com/3055654). It will be turned into '??'.

It seems like this function in contentcollector.js

function sanitizeUnicode(s)
{
  return s.replace(/[\uffff\ufffe\ufeff\ufdd0-\ufdef\ud800-\udfff]/g, '?');
}

is responsible for this behavior.
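
To illustrate why exactly two question marks appear, here is a minimal sketch that runs the function quoted above on 𝕄 (U+1D544). JavaScript strings are sequences of UTF-16 code units, so a character above U+FFFF is stored as a surrogate pair, and both halves fall inside the `\ud800-\udfff` range of the regex. The variable name `m` is only for illustration.

```javascript
// The function quoted above, unchanged apart from formatting.
function sanitizeUnicode(s) {
  return s.replace(/[\uffff\ufffe\ufeff\ufdd0-\ufdef\ud800-\udfff]/g, '?');
}

// U+1D544 (𝕄) written as its UTF-16 surrogate pair: high \ud835, low \udd44.
var m = '\ud835\udd44';

console.log(m.length);           // 2 code units for one visible character
console.log(sanitizeUnicode(m)); // each code unit is replaced separately: '??'
```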

However, socket.io also seems to have problems handling chars > 0xFFFF.

Any ideas how to fix this?

@JohnMcLear
Member

Dupe of #526

@JohnMcLear
Member

If you comment out the return UNorm.nfc(s).replace... line, then Etherpad will store, accept, receive and transmit the 𝕄 char. It looks like perhaps our replace is too strict there.
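
One possible way to loosen the replace is sketched below. This is an assumption for discussion, not the fix Etherpad adopted: the function name `sanitizeUnicodeKeepPairs` is hypothetical, the unorm normalization step is left out, and the sketch does nothing about the socket.io problem mentioned above. The idea is to match a valid surrogate pair first and return it untouched, so that only lone surrogates and the original noncharacters become '?'.

```javascript
// Hypothetical variant of sanitizeUnicode (not the project's actual code).
// The first alternative captures a well-formed surrogate pair; the second
// is the original character class, which now only catches what is left over.
function sanitizeUnicodeKeepPairs(s) {
  var re = /([\ud800-\udbff][\udc00-\udfff])|[\uffff\ufffe\ufeff\ufdd0-\ufdef\ud800-\udfff]/g;
  return s.replace(re, function (match, pair) {
    // `pair` is defined only when the first alternative matched.
    return pair ? pair : '?';
  });
}

var kept = sanitizeUnicodeKeepPairs('a\ud835\udd44b');   // 𝕄 between a and b survives
var loneHigh = sanitizeUnicodeKeepPairs('\ud835x');      // unpaired high surrogate
var loneLow = sanitizeUnicodeKeepPairs('\udd44x');       // unpaired low surrogate
var bom = sanitizeUnicodeKeepPairs('\ufeffa');           // U+FEFF is still replaced

console.log(kept === 'a\ud835\udd44b', loneHigh, loneLow, bom);
```

Note that this sketch lets every supplementary-plane code point through, including noncharacters such as U+1FFFE, which a stricter version might still want to filter.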

@JohnMcLear
Member

After updating unorm and moving lib/unorm.js to index.js, the same problem persists.
