
Word filtered HTML doesn't convert accented characters to UTF-8 #512

@pdo2641

I saved a Word 2016 document as filtered HTML, with the encoding set to UTF-8. I also tried Latin-1.

I used the following Tidy config:
anchor-as-name: yes
clean: yes
bare: yes
drop-proprietary-attributes: yes
word-2000: no
wrap: 144
vertical-space: yes
input-encoding: latin1
output-encoding: utf8

For input-encoding I also tried 1252 and utf8 in place of latin1.
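When it is unclear which input-encoding value matches the saved file, one quick check is whether the file's bytes are valid UTF-8. The sketch below is a heuristic only. The function name `suggest_input_encoding` is hypothetical, and the Windows-1252 fallback is an assumption that Word saved the file in its default Western code page. `utf8` and `win1252` are Tidy's names for those encodings.

```python
def suggest_input_encoding(data: bytes) -> str:
    """Guess a Tidy input-encoding value for raw HTML bytes (heuristic only)."""
    try:
        # Valid UTF-8 (including plain ASCII) is safe to read as utf8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 output from Word is Windows-1252.
        return "win1252"
    return "utf8"
```

For example, `suggest_input_encoding("ö".encode("utf-8"))` returns `"utf8"`, while the single Windows-1252 byte for "ö" (`b"\xf6"`) is not valid UTF-8 and returns `"win1252"`.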

In every case, ö, ñ, and the other accented characters common in German and Spanish text came out as garbage characters or as a boxed question mark instead of proper UTF-8. I tried various combinations of input-encoding, output-encoding, clean, and bare, with no improvement.
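Both symptoms look like the typical result of an encoding mismatch between the saved file and input-encoding. This is an assumption, not a confirmed diagnosis of this issue. The hypothetical helpers below reproduce each case in plain Python, without Tidy.

```python
def utf8_read_as_latin1(text: str) -> str:
    """UTF-8 bytes misread as Latin-1: each accented letter becomes two characters."""
    return text.encode("utf-8").decode("latin-1")


def latin1_read_as_utf8(text: str) -> str:
    """Latin-1 bytes misread as UTF-8: invalid bytes become U+FFFD."""
    return text.encode("latin-1").decode("utf-8", errors="replace")
```

`utf8_read_as_latin1("ö ñ")` returns `"Ã¶ Ã±"` (the "garbage" case). `latin1_read_as_utf8("ö ñ")` returns two U+FFFD replacement characters, which many fonts draw as a boxed or diamond question mark.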

Please advise.
