Description
Testcase:
Array.from({length:256},(x,i)=>[i,new TextDecoder('windows-1252').decode(Uint8Array.of(i)).codePointAt(0)]).filter(([a,b])=>a!==b).map(x=>x.join())
Node.js decodes windows-1252 as if every byte mapped to the Unicode code point of the same value (the code above returns an empty array, i.e. zero differences), which is not correct.
E.g. Node.js:
> new TextDecoder('windows-1252').decode(Uint8Array.of(128)).codePointAt(0)
128
> new TextDecoder('windows-1252').decode(Uint8Array.of(130)).codePointAt(0)
130
> new TextDecoder('windows-1252').decode(Uint8Array.of(131)).codePointAt(0)
131
> new TextDecoder('windows-1252').decode(Uint8Array.of(159)).codePointAt(0)
159
Browsers (expected):
> new TextDecoder('windows-1252').decode(Uint8Array.of(128)).codePointAt(0)
8364
> new TextDecoder('windows-1252').decode(Uint8Array.of(130)).codePointAt(0)
8218
> new TextDecoder('windows-1252').decode(Uint8Array.of(131)).codePointAt(0)
402
> new TextDecoder('windows-1252').decode(Uint8Array.of(159)).codePointAt(0)
376
This also directly contradicts the doc (which is aware that windows-1252 and Latin1 are different):
Lines 229 to 234 in 7643c2a
Modern Web browsers follow the [WHATWG Encoding Standard][] which aliases
both `'latin1'` and `'ISO-8859-1'` to `'win-1252'`. This means that while doing
something like `http.get()`, if the returned charset is one of those listed in
the WHATWG specification it is possible that the server actually returned
`'win-1252'`-encoded data, and using `'latin1'` encoding may incorrectly decode
the characters.
It's also a regression, present since v20.18.3 and v22.13.0.
Node.js <=20.18.2 and v22 <=22.12.0 behave correctly.
Both 20.x and 22.x regressed this year, after they were labeled as LTS; 20.x regressed during Maintenance.
Whatever caused this in 20/22 should be reverted.
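For reference, the expected behaviour can be reproduced without relying on TextDecoder. The following is a minimal sketch, not Node.js's implementation: the names `WINDOWS_1252_HIGH`, `decodeWindows1252` and `expectedDifferences` are hypothetical, and the table is assumed to match the 0x80 to 0x9F range of the WHATWG windows-1252 index (all other bytes map to the code point of the same value).

```javascript
// Code points for bytes 0x80..0x9F, assumed to follow the WHATWG
// windows-1252 index. Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D keep
// their own value (C1 controls); the rest differ from Latin1.
const WINDOWS_1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Hypothetical helper: decode a Uint8Array (or any iterable of bytes)
// as windows-1252 using the table above.
function decodeWindows1252(bytes) {
  let out = '';
  for (const b of bytes) {
    const codePoint =
      b >= 0x80 && b <= 0x9f ? WINDOWS_1252_HIGH[b - 0x80] : b;
    // All mapped code points are in the BMP, so fromCharCode is enough.
    out += String.fromCharCode(codePoint);
  }
  return out;
}

// Same shape as the testcase in this report, but using the helper:
// lists every byte whose decoded code point differs from the byte value.
function expectedDifferences() {
  return Array.from({ length: 256 }, (x, i) => [
    i,
    decodeWindows1252(Uint8Array.of(i)).codePointAt(0),
  ])
    .filter(([a, b]) => a !== b)
    .map((x) => x.join());
}
```

With this sketch, `expectedDifferences()` yields 27 entries (starting with `'128,8364'` and ending with `'159,376'`), whereas the affected Node.js versions yield none for the TextDecoder-based testcase.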