old versions

rdipardo edited this page Apr 20, 2026 · 1 revision

“How do I…”

…(de/en)code entities or Unicode characters with code points between U+010000 and U+10FFFF?

HTML Tag 1.5.0 (Unicode only)

Note

This version can decode Unicode scalars between U+010000 and U+10FFFF, but literal characters in that range are still encoded as surrogate pairs (see below).

Decoding works the same as for any other Unicode character; see here and here.

HTML Tag <= 1.4.4

Previous versions of HTML Tag represent Unicode text in UTF-16 encoding. This is the same encoding traditionally used by the Windows operating system, and hence by Notepad++.

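A single code unit in a UTF-16 string can have a maximum value of 0xFFFF, or 16 consecutive 1 bits. Code points above 0xFFFF can still be represented, using two code units that, taken together, form a surrogate pair: a “high” surrogate in the range 0xD800 to 0xDBFF, followed by a “low” surrogate in the range 0xDC00 to 0xDFFF.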
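To make the pairing concrete, here is a minimal Python sketch of how two 16-bit values combine into one code point. It follows the standard UTF-16 arithmetic and is not taken from the plugin's source; the function name `combine_surrogates` is made up for this illustration.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point.

    Hypothetical helper for illustration; not part of HTML Tag.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits of payload; the 0x10000 offset that was
    # subtracted when the pair was created is added back here.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(hex(combine_surrogates(0xD83C, 0xDF6A)))  # 0x1f36a
```

Because each half contributes 10 bits, a pair covers exactly the 2^20 code points from U+010000 to U+10FFFF.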

As an example:

  • Make sure HTML Tag is at least version 1.4
  • Paste this emoji into a new buffer: 🍪 (U+1F36A)
  • Select the emoji and run the Encode JS command; the cookie will be broken into the escape sequences \uD83C\uDF6A
  • Select all the text, run the Decode JS command, and confirm that the cookie appears again
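If you want to cross-check the result of the steps above outside the editor, the following Python sketch reproduces the same escape sequences with the built-in UTF-16 codec. The helper names (`utf16_units`, `js_escape`, `from_units`) are hypothetical, and unlike the plugin command this simplified version escapes every character it is given.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of `text` as integers."""
    raw = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def js_escape(text: str) -> str:
    """Write every UTF-16 code unit of `text` as a \\uXXXX escape."""
    return "".join(f"\\u{unit:04X}" for unit in utf16_units(text))


def from_units(units: list[int]) -> str:
    """Rebuild the text from its UTF-16 code units."""
    raw = b"".join(unit.to_bytes(2, "big") for unit in units)
    return raw.decode("utf-16-be")


cookie = "\U0001F36A"
print(js_escape(cookie))  # \uD83C\uDF6A
print(from_units([0xD83C, 0xDF6A]) == cookie)  # True
```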

To decode any Unicode character between U+010000 and U+10FFFF, you will need to:

  1. Find the “high” and “low” surrogate for the character. You can use an online tool, or implement an algorithm in the programming language of your choice
  2. Type or paste the high surrogate, followed by the low surrogate, both in your preferred escape sequence format
  3. Run the Decode JS command after selecting the pair, or after placing the caret beside them
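For step 1, one possible way to find the surrogates is sketched below in Python. It applies the standard UTF-16 formula and formats the result as a JavaScript-style escape; the function name `to_surrogate_escape` is an assumption made for this example, and other escape formats would need a different format string.

```python
def to_surrogate_escape(code_point: int) -> str:
    """Return the \\uXXXX\\uXXXX surrogate pair for a supplementary code point.

    Hypothetical helper for illustration; not part of HTML Tag.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be between U+010000 and U+10FFFF")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return f"\\u{high:04X}\\u{low:04X}"


print(to_surrogate_escape(0x1F36A))  # \uD83C\uDF6A
```

The printed text can then be pasted into a buffer and decoded as described in steps 2 and 3.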
