old versions
rdipardo edited this page Apr 20, 2026
Note
This version can decode Unicode scalar values between U+010000 and U+10FFFF directly, but literal Unicode characters are still encoded as surrogate pairs (see below).
Decoding works the same way as for any other Unicode character; see here and here.
Previous versions of HTML Tag represented Unicode text in UTF-16, the same encoding traditionally used by the Windows operating system and hence by Notepad++.
A single UTF-16 code unit can hold a maximum value of 0xFFFF, or 16 consecutive 1 bits. Code points above 0xFFFF can still be represented by using two code units that, taken together, form a surrogate pair.
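The split into a surrogate pair follows the UTF-16 rules of the Unicode Standard (nothing here is specific to HTML Tag). For a code point $U$ between U+010000 and U+10FFFF, subtract 0x10000 to get a 20-bit value $U'$; its top 10 bits go into the high surrogate and its bottom 10 bits into the low surrogate:

```math
\begin{aligned}
U' &= U - \mathtt{0x10000}, \qquad 0 \le U' \le \mathtt{0xFFFFF} \\
\text{high} &= \mathtt{0xD800} + \lfloor U' / \mathtt{0x400} \rfloor \\
\text{low} &= \mathtt{0xDC00} + (U' \bmod \mathtt{0x400}) \\
U &= \mathtt{0x10000} + (\text{high} - \mathtt{0xD800}) \cdot \mathtt{0x400} + (\text{low} - \mathtt{0xDC00})
\end{aligned}
```

For U+1F36A, $U' = \mathtt{0xF36A}$, so high = 0xD800 + 0x3C = 0xD83C and low = 0xDC00 + 0x36A = 0xDF6A, which is the pair `\uD83C\uDF6A`.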
As an example:
- Make sure HTML Tag is at least version 1.4
- Paste this emoji into a new buffer: 🍪 (U+1F36A)
- Select the emoji and run the Encode JS command; the cookie will be broken into the escape sequences `\uD83C\uDF6A`
- Select all the text, run the Decode JS command, and confirm that the cookie appears again
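The Encode JS / Decode JS round trip above can be approximated in a few lines of Python 3. This is a minimal sketch, not HTML Tag's own code: `encode_js` and `decode_js` are hypothetical names, and escaping every non-ASCII character (and nothing else) is an assumption about which characters the plugin actually escapes. Characters above U+FFFF come out as two `\uXXXX` escapes, one per UTF-16 code unit, and the decoder joins a high/low pair back into one character.

```python
import re

# One JavaScript-style escape: a backslash, "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def encode_js(text: str) -> str:
    """Hypothetical encoder: escape every non-ASCII character as \\uXXXX.

    Characters above U+FFFF become two escapes, one per UTF-16 code unit.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        data = ch.encode("utf-16-be")  # 2 bytes, or 4 bytes for a surrogate pair
        for i in range(0, len(data), 2):
            out.append(f"\\u{int.from_bytes(data[i:i + 2], 'big'):04X}")
    return "".join(out)


def decode_js(text: str) -> str:
    """Hypothetical decoder: replace \\uXXXX escapes, joining surrogate pairs."""
    out = []
    pos = 0
    while pos < len(text):
        m = _ESCAPE.match(text, pos)
        if m is None:
            out.append(text[pos])  # ordinary character, copied unchanged
            pos += 1
            continue
        unit = int(m.group(1), 16)
        pos = m.end()
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate: look for a low partner
            nxt = _ESCAPE.match(text, pos)
            if nxt is not None:
                low = int(nxt.group(1), 16)
                if 0xDC00 <= low <= 0xDFFF:
                    out.append(chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)))
                    pos = nxt.end()
                    continue
        # A BMP character, or a surrogate without a partner (left unpaired).
        out.append(chr(unit))
    return "".join(out)


print(encode_js("\U0001F36A"))                      # \uD83C\uDF6A
print(decode_js("\\uD83C\\uDF6A") == "\U0001F36A")  # True
```

Decoding scans escape by escape rather than with a single regex substitution, so that a high surrogate is only combined when a valid low surrogate immediately follows it.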
To decode any Unicode character between U+010000 and U+10FFFF, you will need to:
- Find the “high” and “low” surrogate for the character. You can use an online tool, or implement an algorithm in the programming language of your choice
- Type or paste the high surrogate, followed by the low surrogate, both in your preferred escape format
- Run the Decode JS command after selecting the pair, or after placing the caret beside them
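If you would rather script the first step than use an online tool, the surrogates can be computed with two shifts and a mask. A minimal Python 3.9+ sketch, assuming `\uXXXX` is your preferred escape format; `surrogate_pair` and `js_escape_pair` are hypothetical helper names, not part of HTML Tag.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Hypothetical helper: split U+010000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError(f"U+{code_point:04X} does not need a surrogate pair")
    offset = code_point - 0x10000     # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def js_escape_pair(code_point: int) -> str:
    """Hypothetical helper: format the pair as two \\uXXXX escapes (assumed format)."""
    high, low = surrogate_pair(code_point)
    return f"\\u{high:04X}\\u{low:04X}"


print(js_escape_pair(0x1F36A))   # \uD83C\uDF6A
print(js_escape_pair(0x10FFFF))  # \uDBFF\uDFFF
```

Paste the printed text into Notepad++, then select it (or place the caret beside it) and run Decode JS as described above.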