What is the true purpose and use case of the --bare option? #896
@JustAGuyCoding wow, thanks for the Take the
So, as suggested in #885, I think the following patches need to be implemented to completely separate the functionality of
diff --git a/include/tidyenum.h b/include/tidyenum.h
index 3daee5b..e3fa793 100644
--- a/include/tidyenum.h
+++ b/include/tidyenum.h
@@ -610,7 +610,7 @@ typedef enum
TidyLiteralAttribs, /**< If true attributes may use newlines */
TidyLogicalEmphasis, /**< Replace i by em and b by strong */
TidyLowerLiterals, /**< Folds known attribute values to lower case */
- TidyMakeBare, /**< Make bare HTML: remove Microsoft cruft */
+ TidyMakeBare, /**< Replace smart quotes, em dashes, etc with ASCII. */
TidyMakeClean, /**< Replace presentational clutter by style rules */
TidyMark, /**< Add meta element indicating tidied doc */
TidyMergeDivs, /**< Merge multiple DIVs */
diff --git a/src/clean.c b/src/clean.c
index e96dd3f..059e9da 100644
--- a/src/clean.c
+++ b/src/clean.c
@@ -1890,8 +1890,7 @@ void TY_(CleanWord2000)( TidyDocImpl* doc, Node *node)
if ( nodeIsHTML(node) )
{
/* check that it's a Word 2000 document */
- if ( !TY_(GetAttrByName)(node, "xmlns:o") &&
- !cfgBool(doc, TidyMakeBare) )
+ if ( !TY_(IsWord2000) (doc) )
return;
/* Output proprietary attributes to maintain errout compatability
diff --git a/src/language_en.h b/src/language_en.h
index 60bde02..eab5567 100644
--- a/src/language_en.h
+++ b/src/language_en.h
@@ -786,9 +786,9 @@ static languageDefinition language_en = { whichPluralForm_en, {
- The strings "Tidy" and "HTML Tidy" are the program name and must not
be translated. */
TidyMakeBare, 0,
- "This option specifies if Tidy should strip Microsoft specific HTML "
- "from Word 2000 documents, and output spaces rather than non-breaking "
- "spaces where they exist in the input. "
+ "This option specifies if Tidy should replace smart quotes and em dashes with "
+ "ASCII, and output spaces rather than non-breaking "
+ "spaces, where they exist in the input. "
},
{/* Important notes for translators:
- Use only <code></code>, <var></var>, <em></em>, <strong></strong>, and
This seems to clear up the issue of when
Simply, I am perfectly happy to leave them as UTF-8, every time, everywhere... smart quotes being E2 80 9C, E2 80 9D, and em dash E2 80 94.
And these patches allow the
I still waver on whether there should even be the 'IsWord2000' filter left there at all... but can leave that for another time...
That is corrected in one of the patches, but there could be alternate wording, even in other places not addressed... I will try to find the time to add a PR for this issue... unless someone beats me to it... Look forward to further feedback, comments, even alternate patches, other code, etc, to help in finalising this into a PR... thanks...
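For anyone wanting to see what the byte sequences quoted in this comment mean in practice, here is a minimal C sketch of folding them to ASCII. It is only an illustration and is not Tidy's implementation: the function name `fold_punct_to_ascii` and the mapping table are assumptions made up for this example, and the no-break space entry (C2 A0) is added only because the option text mentions outputting spaces rather than non-breaking spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of Tidy: copies UTF-8 text from src to dst,
   replacing a few multi-byte punctuation sequences with one ASCII character.
   dst must hold at least strlen(src) + 1 bytes; the output never grows.
   Returns the number of bytes written, excluding the terminating NUL. */
size_t fold_punct_to_ascii(const char *src, char *dst)
{
    /* Illustrative mapping only; the real option may cover more characters. */
    static const struct { const char *utf8; char ascii; } map[] = {
        { "\xE2\x80\x9C", '"' },  /* U+201C left double quotation mark  */
        { "\xE2\x80\x9D", '"' },  /* U+201D right double quotation mark */
        { "\xE2\x80\x94", '-' },  /* U+2014 em dash                     */
        { "\xC2\xA0",     ' ' },  /* U+00A0 no-break space              */
    };
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        size_t i;
        int matched = 0;

        /* Try each known sequence at the current position. */
        for (i = 0; i < sizeof map / sizeof map[0]; ++i) {
            size_t len = strlen(map[i].utf8);
            if (strncmp(src, map[i].utf8, len) == 0) {
                dst[out++] = map[i].ascii;
                src += len;
                matched = 1;
                break;
            }
        }

        /* Anything else is copied through byte for byte. */
        if (!matched)
            dst[out++] = *src++;
    }

    dst[out] = '\0';
    return out;
}
```

Called on a buffer holding E2 80 9C, `hi`, E2 80 9D, a space, E2 80 94, a space and `ok`, this sketch writes `"hi" - ok`. All other UTF-8 bytes pass through unchanged, which is the "leave them as UTF-8" behaviour when no folding is wanted.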
@JustAGuyCoding have now created the
To the Then you can view, review, test, etc, the changes... Have I missed any references in the docs, or in the code, anywhere... that perpetuate any direct tie between Will shortly get around to setting up the
@JustAGuyCoding, have now created the PR #898... Look forward to further feedback, comments, even alternate patches, other code, etc, to help complete the PR... thanks...
* Is. #896 - make 'bare' docs match code
* Is. #487 #462 add warn msg and do not get stuck until eof

The warning message could perhaps be better worded, and maybe there should be another msg when a '>' is encountered while looking for a ']' in a MS Word section, and perhaps the section should be discarded... And perhaps it should be an error, to force the user to fix... But the fix is good as it is, and these issues can be dealt with later... And this fix is piggybacked on this PR, but it is likewise related to the 'word-2000' option...
@JustAGuyCoding have now merged #898, fixing the docs to conform to the actual code, so closing this... thanks...
What is the true purpose of the --bare option? Tidy help says it's to strip out smart quotes, em dashes, etc. More information under -help-option indicates it's for cleaning up MSWord documents. These two descriptions of the option don't seem the same:
tidy --help
-bare, -b strip out smart quotes and em dashes, etc.
tidy -help-option bare
This option specifies if Tidy should strip Microsoft specific HTML from Word
2000 documents, and output spaces rather than non-breaking spaces where they
exist in the input.
After briefly reading the documentation at https://api.html-tidy.org/tidy/quickref_5.6.0.html I thought bare could be used to clean up MSWord documents but was surprised when it substituted hyphens for em-dashes.
Side note: There is also a word-2000 option which is for cleaning MSWord documents. This seems tailored for MSWord WebPage exports, as opposed to MSWord WebPage filtered exports.
Related issues:
#885