Encode/decode JS entities works on one byte at a time and is not reversible #1

rdipardo · 2023-05-10T01:52:19Z

Updated to Notepad++ 8.3.3 x64
HtmlTag was missing after update
Re-installed HtmlTag

Test:
Following characters require encoding: ä ö ü ß
After encoding:
Following characters require encoding: Ã¤ Ã¶ Ã¼ ÃŸ
After decoding encoded text:
Following characters require encoding: Ã¤ Ã¶ Ã¼ ÃŸ

rdipardo · 2023-05-10T01:52:20Z

The decoding algorithm can only handle single-byte sequences. So, this works:

\u00E4 \u00F6 \u00FC \u00DF (decode =>) ä ö ü ß

But this is broken:

ä ö ü ß (encode =>) \u00C3\u00A4 \u00C3\u00B6 \u00C3\u00BC \u00C3\u0178

In a UTF-8 file, each of these characters occupies 2 bytes, and the algorithm encodes each byte separately.
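To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is not the plugin's code: the function names are hypothetical, and reading the UTF-8 bytes as Windows-1252 is an assumption, chosen because it reproduces the `\u0178` seen in the output above for the byte 0x9F.

```python
def encode_per_byte(text: str) -> str:
    """Mimic the reported bug: escape every byte of the UTF-8 form separately.

    Assumption: the bytes are misread as Windows-1252 characters. A few byte
    values (e.g. 0x81) have no Windows-1252 mapping and would raise here.
    """
    misread = text.encode("utf-8").decode("cp1252")
    return "".join(c if ord(c) < 0x80 else f"\\u{ord(c):04X}" for c in misread)


def encode_per_code_point(text: str) -> str:
    """Escape each character once, using its Unicode code point (BMP only)."""
    return "".join(c if ord(c) < 0x80 else f"\\u{ord(c):04X}" for c in text)


print(encode_per_byte("ä ö ü ß"))
# \u00C3\u00A4 \u00C3\u00B6 \u00C3\u00BC \u00C3\u0178
print(encode_per_code_point("ä ö ü ß"))
# \u00E4 \u00F6 \u00FC \u00DF
```

Only the second form decodes back to the original text, which is why the round trip in the report fails.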

That's a limitation of the original author's design (based on pre-Unicode Notepad++). It affects both 32- and 64-bit versions.

As I said in 82f9b0e,

> More work still needed before utf8mb4 can be encoded *correctly*

Fixing this will be part of that overall task.
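For context on the utf8mb4 case: characters that need 4 bytes in UTF-8 lie above U+FFFF, so a `\uXXXX` escape can only represent them as a UTF-16 surrogate pair. The following Python sketch shows one reversible scheme. It is an illustration, not the fix that was later committed, and `encode_js_entities` and `decode_js_entities` are hypothetical names.

```python
import re


def encode_js_entities(text: str) -> str:
    """Escape non-ASCII characters as \\uXXXX, using surrogate pairs above U+FFFF."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                    # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")        # one escape per BMP code point
        else:
            cp -= 0x10000                     # split into a UTF-16 surrogate pair
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)


def decode_js_entities(text: str) -> str:
    """Reverse encode_js_entities, recombining surrogate pairs."""
    units = re.sub(r"\\u([0-9A-Fa-f]{4})",
                   lambda m: chr(int(m.group(1), 16)), text)
    # A round trip through UTF-16 merges each high/low surrogate pair.
    return units.encode("utf-16", "surrogatepass").decode("utf-16")


print(encode_js_entities("ß 😀"))   # \u00DF \uD83D\uDE00
```

With this scheme, `decode_js_entities(encode_js_entities(s))` returns `s` for any well-formed string, which is the reversibility the issue title asks for.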

rdipardo · 2023-05-10T01:52:21Z

If you're running at least Windows 10, here's a way to resolve this issue for the time being:

1. Go to the Control Panel, then "Clock and Region", and select "Change date, time, or number formats"
2. Click the "Administrative" tab
3. Click "Change system locale..."
4. Check the box labelled "Beta: Use Unicode UTF-8 for worldwide language support" (a reboot is required)

Here is N++ 8.3.3 (64-bit) on Windows 10 21H2, with the updated system encoding:

[screenshot]

The plugin is most likely calling a standard library function that uses the system's default encoding. What it should do is encode the document's text as Unicode every time, not rely on Windows.
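As a rough illustration of that difference (in Python rather than the plugin's own language, with hypothetical function names), compare decoding the document's bytes with the machine's default encoding against decoding them with an encoding stated explicitly:

```python
import locale


def decode_with_system_default(raw: bytes) -> str:
    """Result depends on the machine: e.g. cp1252 on many Windows setups."""
    return raw.decode(locale.getpreferredencoding(False), errors="replace")


def decode_with_document_encoding(raw: bytes, document_encoding: str = "utf-8") -> str:
    """Result depends only on the document, never on the system locale."""
    return raw.decode(document_encoding)


raw = "ä ö ü ß".encode("utf-8")
print(decode_with_document_encoding(raw))   # ä ö ü ß
print(decode_with_system_default(raw))      # varies with the system locale
```

This would also explain why the UTF-8 locale workaround helps: it changes what the system default resolves to, without changing the plugin.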

rdipardo · 2023-05-10T01:52:22Z

Original comment by Björn Klug (Bitbucket: [Björn Klug](https://bitbucket.org/Björn Klug/workspace/repositories)).

I tried your workaround (the "Beta: Use Unicode UTF-8 for worldwide language support" checkbox), but it broke all my MS Access 2010 applications, so that is not a viable solution for me.

Since I’m using this plugin quite frequently, I’d be very interested in your estimate of when this bug will be fixed.

[EDIT] Just found the download of version 1.2.2 at https://bitbucket.org/rdipardo/htmltag/downloads/ which works fine again. Thanks!

rdipardo · 2023-05-10T01:56:00Z

Fixed in d2189a1

rdipardo added the enhancement and major labels May 10, 2023

rdipardo closed this as completed May 10, 2023
