Simplified Chinese #3

dan2468 · 2018-01-22T08:33:19Z

Should ftfy be able to fix “ÓůÓĂČíĽţ(YYRJ)”?
(It should be a person’s name in a Chinese script.)

rspeer · 2018-01-23T17:03:00Z

The text is "御用软件(YYRJ)", right? (That's what you get by encoding the garbled text as Windows-1250 and decoding the resulting bytes as GBK.)
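For reference, the round trip described here can be reproduced with Python's standard codecs. This is only a sketch: it assumes the original bytes were GBK and were wrongly decoded as Windows-1250 (`cp1250`), and the helper names `garble` and `repair` are made up for illustration. It is not how ftfy works internally.

```python
# Assumption: the source bytes were GBK and were wrongly decoded as Windows-1250.
original = "御用软件(YYRJ)"


def garble(text):
    """Simulate the mistake: GBK bytes read as if they were Windows-1250."""
    return text.encode("gbk").decode("cp1250")


def repair(text):
    """Undo the mistake: recover the bytes via Windows-1250, then read them as GBK."""
    return text.encode("cp1250").decode("gbk")


garbled = garble(original)
repaired = repair(garbled)

print(garbled)               # the mojibake form of the text
print(repaired == original)  # whether the round trip recovered the text
```

The ASCII part "(YYRJ)" survives untouched because both encodings agree on ASCII bytes; only the two-byte GBK sequences get turned into pairs of Eastern European letters.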

This is a similar case to #4, but because GBK is a multi-byte character set, it is at least conceivable that the ftfy library could deal with it.

The problem is the decoding as Windows-1250, the Eastern European encoding that's giving you letters like ů. It often creates a mess of ambiguity (as it does in #4) by being too similar to ISO-8859-2. I don't think ftfy will ever be able to disentangle Windows-1250 from arbitrary other encodings for that reason. Do you have any control over your data source that's decoding text from numerous different languages as if it were Windows-1250?
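To get a rough feel for how much Windows-1250 and ISO-8859-2 overlap, one can compare how Python's built-in codecs decode each high byte. This is a sketch using the standard `cp1250` and `iso8859_2` codecs; the helper name `decodes_identically` is hypothetical, and bytes that are undefined in either codec are simply counted as different.

```python
def decodes_identically(byte_value):
    """Return True if a single byte maps to the same character in both codecs."""
    raw = bytes([byte_value])
    try:
        return raw.decode("cp1250") == raw.decode("iso8859_2")
    except UnicodeDecodeError:
        # The byte is unassigned in one of the codecs; treat it as a difference.
        return False


high_bytes = range(0x80, 0x100)
same = [b for b in high_bytes if decodes_identically(b)]
different = [b for b in high_bytes if not decodes_identically(b)]

print(f"identical: {len(same)}, different: {len(different)}")
print("differing bytes:", " ".join(f"{b:02X}" for b in different))
```

The letters in the 0xC0 to 0xFF range are shared by both encodings, so text made up of those bytes looks the same under either one, and a fixer has nothing to distinguish them by.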
