Incorrectly changing — (em dash [Alt+0151]) to a hyphen (-). #885

@JustAGuyTryingToCodeSomething

Description

When cleaning up an MS Word document, this input:

<p class=CSP-ChapterBodyText><span lang=EN-US style='font-size:14.0pt'>“I thought—”</span></p>

is converted to

	<p>
	"I thought-"
	</p>

_If you look closely, the — em dash in the original text has been changed to a hyphen. This introduces a grammatical error into the text; the original em dash should be preserved._
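To make the substitution easier to confirm on a larger document, here is a minimal Python sketch that counts which typographic characters are present in the input but missing from the output. The function name `lost_typography` and the character table are illustrative assumptions, not part of Tidy; the two sample strings are the input and output shown above.

```python
# Characters Word commonly emits, mapped to a readable name.
# This table is an illustrative assumption, not an exhaustive list.
TYPOGRAPHIC = {
    "\u2014": "em dash",
    "\u2013": "en dash",
    "\u201c": "left double quote",
    "\u201d": "right double quote",
    "\u2018": "left single quote",
    "\u2019": "right single quote",
}


def lost_typography(original: str, tidied: str) -> dict:
    """Return how many occurrences of each typographic character disappeared."""
    lost = {}
    for ch, name in TYPOGRAPHIC.items():
        missing = original.count(ch) - tidied.count(ch)
        if missing > 0:
            lost[name] = missing
    return lost


# The input and output from this report.
before = (
    "<p class=CSP-ChapterBodyText><span lang=EN-US "
    "style='font-size:14.0pt'>\u201cI thought\u2014\u201d</span></p>"
)
after = '<p>\n"I thought-"\n</p>'

print(lost_typography(before, after))
# → {'em dash': 1, 'left double quote': 1, 'right double quote': 1}
```

For real files, the same function could be fed the contents of `original.html` and `tidied.html`, read with `encoding="utf-8"` to match the `input-encoding` and `output-encoding` settings below.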

I've included my config file and how I run the tool below. I'm not sure how to check, but I'm 99% sure this is version 5.6.0.
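On checking the version: the Tidy command line accepts a `-version` flag that prints a one-line banner. A small Python sketch for capturing it follows; the helper names are hypothetical, and the exact banner wording (assumed here to look like `HTML Tidy for Windows version 5.6.0`) may differ between builds.

```python
import re
import subprocess


def parse_tidy_version(banner: str):
    """Extract a dotted version such as '5.6.0' from a banner line, or None."""
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", banner)
    return match.group(1) if match else None


def tidy_version(exe: str = "tidy"):
    """Run `<exe> -version` and return the parsed version string, or None.

    `exe` is the path to the Tidy binary, e.g. r".\tidy.exe" on Windows.
    """
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return parse_tidy_version(result.stdout)


# Parsing an assumed sample banner (not captured from a real run):
print(parse_tidy_version("HTML Tidy for Windows version 5.6.0"))
```

Running `.\tidy.exe -version` directly in the shell gives the same information without any script.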

Note: The change of the quote characters is interesting, and I'd be interested to know why that happens, but I'm not reporting that as an issue here.

`.\tidy.exe -config .\htmltidy.config -o .\tidied.html .\original.html`

add-xml-space:               no
add-meta-charset:            no
anchor-as-name:              yes
ascii-chars:                 no
assume-xml-procins:          no
bare:                        yes
break-before-br:             no
char-encoding:               utf8
clean:                       no
coerce-endtags:              yes
add-xml-decl:                no
css-prefix:                  c
custom-tags:                 no
decorate-inferred-ul:        no
doctype:                     auto
drop-empty-elements:         yes
drop-empty-paras:            yes
drop-proprietary-attributes: no
enclose-block-text:          no
enclose-text:                no
escape-cdata:                no
fix-backslash:               yes
escape-scripts:              yes
fix-bad-comments:            no
fix-style-tags:              yes
fix-uri:                     yes
force-output:                no
gdoc:                        no
gnu-emacs:                   no
hide-comments:               yes
indent:                      yes
indent-attributes:           no
indent-cdata:                no
indent-spaces:               4
indent-with-tabs:            yes
input-encoding:              utf8
input-xml:                   no
join-classes:                no
keep-tabs:                   no
keep-time:                   no
literal-attributes:          no
join-styles:                 yes
logical-emphasis:            no
lower-literals:              yes
markup:                      yes
merge-divs:                  auto
merge-emphasis:              yes
merge-spans:                 auto
mute-id:                     no
ncr:                         yes
new-blocklevel-tags:         no
omit-optional-tags:          no
output-bom:                  auto
output-encoding:             utf8
output-html:                 no
output-xhtml:                no
output-xml:                  no
preserve-entities:           no
punctuation-wrap:            no
quiet:                       no
quote-ampersand:             yes
quote-marks:                 no
quote-nbsp:                  yes
repeated-attributes:         keep-last
replace-color:               no
show-body-only:              no
show-errors:                 6
show-info:                   yes
show-meta-change:            no
show-warnings:               yes
skip-nested:                 yes
sort-attributes:             none
strict-tags-attributes:      no
tab-size:                    8
tidy-mark:                   yes
uppercase-attributes:        no
uppercase-tags:              no
vertical-space:              no
warn-proprietary-attributes: yes
word-2000:                   yes
wrap:                        68
wrap-asp:                    yes
wrap-attributes:             no
wrap-jste:                   yes
wrap-php:                    yes
write-back:                  no
wrap-script-literals:        no
wrap-sections:               yes
