Trailing backspace removed #856

Closed
petko opened this issue Jan 26, 2020 · 6 comments

petko commented Jan 26, 2020

Why are trailing spaces like the following removed by TidyHtml?

<body>
<p>text </p>
</body>

The trailing space after text will be removed. If it is replaced with &nbsp; it will be preserved.

What config option determines that and is it possible to leave the spaces intact?

geoffmcl (Contributor) commented

@petko thank you for your issue...

But I do not quite understand what you are asking... please explain...

Part of what Tidy does is in the name... to tidy means that DOM-like text nodes are trimmed of trailing spaces... and indeed of leading and other redundant spaces... though there are exceptions...

Is there an option to preserve this space... in some XHTML cases, yes, to some extent... but otherwise NO...

The important point is that such spaces do not change the browser rendering of the paragraph...

So one is forced to ask, why do you want this space preserved in the output... what is the use case?

In testing, I have expanded your sample into my in_856.html to include other <p> cases, where trailing/leading/inter-word spaces are removed, where required, to conform to a browser rendering of this sample... but are not touched or removed in other cases, like the <pre> block...

But maybe I do not understand your issue... please elaborate... thanks...

petko commented Feb 3, 2020

I am migrating some texts that are used in a WYSIWYG text editor. In the old version of the software they are stored as RTF files, and in the new one as HTML (after cleaning the produced files using HTMLTidy). Such texts sometimes contain leading or trailing whitespace, which is deliberately put there by the user.

So when they are imported into the new software, they are no longer the same texts; they are missing that whitespace.

geoffmcl commented Feb 9, 2020

@petko thanks for the further feedback...

Ok, you changed the version of your text editor, but I still do not understand what you want tidy to do... or not do... and, just out of curiosity, what editor is that... what os?

All versions of tidy remove unused 'spacey' characters... from HTML... where allowed, where possible... as part of the cleaning process... it has done that since its creation...

There is little chance of adding an option to prevent that... even if fully defined...

More to the point, what would be the use case for such an option... what are its full specs... etc...

"Such texts sometimes contain leading or trailing whitespace, which are deliberately put there by the user."

I assume that by user here you mean the creator of the HTML code... why would they deliberately do that?... and want it kept?... seems unreasonable...

Hmmm, as you point out, if it is replaced with &nbsp; it will be preserved... an HTML user/generator/editor knows, or should know, that... so does tidy...

Still do not understand your issue... please elaborate... thanks...

petko commented Feb 10, 2020

My initial confusion comes from the fact that https://validator.w3.org, with the Clean HTML option enabled, does not remove these spaces. But it probably uses an older or alternative version of HTML Tidy.

As for &nbsp; being preserved while a plain space is not, I admit that I don't know the reason behind this. Until now I've had the impression that a single space and &nbsp; are equivalent, but I am probably wrong.
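
For anyone with the same question, the difference between the two characters can be sketched in Python. This is only an illustration, not Tidy's actual trimming code: it assumes the HTML standard's definition of ASCII whitespace (tab, LF, FF, CR, space), and the name `HTML_WHITESPACE` is made up for the example.

```python
import html

# ASCII whitespace as defined by the HTML standard.
# U+00A0 NO-BREAK SPACE is deliberately not part of this set.
HTML_WHITESPACE = " \t\n\f\r"

plain = "text "                     # ends with U+0020 SPACE
nbsp = html.unescape("text&nbsp;")  # ends with U+00A0 NO-BREAK SPACE

# Trimming HTML whitespace drops the ordinary space but keeps the no-break space.
trimmed_plain = plain.rstrip(HTML_WHITESPACE)
trimmed_nbsp = nbsp.rstrip(HTML_WHITESPACE)

print(repr(trimmed_plain))
print(repr(trimmed_nbsp))
```

Note that Python's bare `str.strip()` would also remove U+00A0, which is why the sketch passes an explicit character set.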

I will just replace spaces with &nbsp; in that case.
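
A minimal sketch of that workaround, assuming it is applied to the text content of each element before the HTML is assembled and passed to Tidy. The function name `protect_edge_spaces` is hypothetical and is not part of Tidy.

```python
def protect_edge_spaces(text: str) -> str:
    """Replace leading and trailing U+0020 spaces with &nbsp; entities.

    Inner spaces are left untouched. Hypothetical helper for illustration.
    """
    lead = len(text) - len(text.lstrip(" "))
    if lead == len(text):
        # Empty string, or nothing but spaces: convert every space.
        return "&nbsp;" * lead
    trail = len(text) - len(text.rstrip(" "))
    core = text[lead:len(text) - trail]
    return "&nbsp;" * lead + core + "&nbsp;" * trail


paragraph = "<p>" + protect_edge_spaces("text ") + "</p>"
print(paragraph)
```

The trade-off is that the stored text is no longer identical to the original: a no-break space is a different character and also prevents line wrapping at that position.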

geoffmcl commented Apr 9, 2021

@petko, seems no actual issue for tidy identified... that I can see... and no new comments in over a year...

So am closing this...

Please add further feedback, if I am wrong... or open a new issue... thanks...

geoffmcl closed this as completed on Apr 9, 2021

pozemka commented May 19, 2021

Hello @geoffmcl !
Sorry for bumping an old issue, but it seems I have another example of why spaces sometimes need to be preserved.
I use tidy as a "preprocessor" for HTML data pasted from the clipboard. For example, when copying data from Google Sheets:
[image] (I selected the text to make the space visible)
The clipboard receives this HTML:

<body>
<!--StartFragment--><style type="text/css"><!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}--></style><span style="font-size:10pt;font-family:Arial;font-style:normal;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Will it eat \nspace here&quot;}" data-sheets-userformat="{&quot;2&quot;:513,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0}">Will it eat <br/>space here</span><!--EndFragment-->
</body>
</html> 

Tidy is used to prepare valid XML for parsing. After tidy I get this:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy for HTML5 for Windows version 5.7.45" />
<title></title>

<style type="text/css">
/*<![CDATA[*/
<!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}-->
/*]]>*/
</style>
</head>
<body>
<!--StartFragment--><span style="font-size:10pt;font-family:Arial;font-style:normal;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Will it eat \nspace here&quot;}" data-sheets-userformat="{&quot;2&quot;:513,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0}">Will it eat<br />
space here</span><!--EndFragment-->
</body>
</html>

The space in "Will it eat␣<br />space here" is missing. Visually it renders identically, but the string can be used, for example, as a key for future translation, so keeping every character is quite important.

The question is: is it perhaps still possible to tell tidy to keep such spaces? Or maybe you have some thoughts or workarounds?
Thank you.
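
One possible workaround for this particular sample, sketched in Python: in the Tidy output above, the `data-sheets-value` attribute still carries the string with its space intact, so the original text could be read from there instead of from the text nodes. This is only a sketch under assumptions: the attribute format is not documented here, the key `"2"` is taken from this one sample, and `SheetsValueParser` is a made-up name.

```python
import json
from html.parser import HTMLParser


class SheetsValueParser(HTMLParser):
    """Collect decoded data-sheets-value attributes from any element."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-sheets-value" and value:
                # HTMLParser has already unescaped &quot; to a double quote,
                # so the attribute value is plain JSON at this point.
                self.values.append(json.loads(value))


# Shortened copy of the Tidy output from this comment (style attribute omitted).
fragment = r'''<span data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Will it eat \nspace here&quot;}">Will it eat<br />
space here</span>'''

parser = SheetsValueParser()
parser.feed(fragment)
parser.close()

# Assumption: key "2" holds the cell's string value, as in this sample.
original_text = parser.values[0]["2"]
print(repr(original_text))
```

This only helps for clipboard data that carries such an attribute; it does not change what Tidy does to the text nodes themselves.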
