
drop-empty-elements is not removing empty Table elements #923

Closed
rovo79 opened this issue Mar 5, 2021 · 5 comments

@rovo79

rovo79 commented Mar 5, 2021

Running tidy 5.6 on Mac OS, using the CLI.

Command run:
`tidy test-output.html > droptables.html --drop-empty-elements yes`

In the HTML, there are empty table elements like `<td></td>`.

It doesn't remove them.

@geoffmcl
Contributor

geoffmcl commented Mar 6, 2021

@rovo79, thank you for the issue, but at this time I do not understand what you want or expect...

In markdown-formatted issues like this one, you need to use backticks to escape the likes of `<td></td>`, or three backticks to escape a block of HTML. Read some markdown help, like https://guides.github.com/features/mastering-markdown/, and others...

But in HTML, table elements like `<td></td>` are important and significant, and usually cannot be dropped! Each one marks a new, empty cell in its column, like:

<table border="2">
<tr>
<th>Col 1</th><th>Col 2</th><th>Col 3</th>
</tr>
<tr>
<td>Yes</td><td></td><td>No</td>
</tr>
<tr>
<td></td><td>Empty</td><td></td>
</tr>
<tr>
<td></td><td></td><td></td><td>4</td>
</tr>
</table>

You will note that tidy not only keeps the empty elements, but does not even warn about the last row overrunning into a 4th column...
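Since tidy does not flag that overrun, one way to spot it yourself is to add up each row's cells, counting a `colspan` as that many columns. The following is a minimal sketch, not part of tidy: the helper names `row_widths` and `overrunning_rows` are hypothetical, it uses only Python's standard-library `html.parser`, it takes the first row as the expected width, and it ignores `rowspan` and nested tables.

```python
from html.parser import HTMLParser


class RowWidthCounter(HTMLParser):
    """Record each <tr>'s effective width: the sum of its cells' colspan
    values, with a missing or invalid colspan counted as 1.
    Sketch only: rowspan and nested tables are not handled."""

    def __init__(self):
        super().__init__()
        self.widths = []  # one entry per <tr>, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.widths.append(0)
        elif tag in ("td", "th") and self.widths:
            span = dict(attrs).get("colspan") or "1"
            self.widths[-1] += max(1, int(span)) if span.isdigit() else 1


def row_widths(html):
    parser = RowWidthCounter()
    parser.feed(html)
    parser.close()
    return parser.widths


def overrunning_rows(html):
    """Return (row_index, width) for every row wider than the first row."""
    widths = row_widths(html)
    if not widths:
        return []
    expected = widths[0]
    return [(i, w) for i, w in enumerate(widths) if w > expected]


SAMPLE = """<table border="2">
<tr>
<th>Col 1</th><th>Col 2</th><th>Col 3</th>
</tr>
<tr>
<td>Yes</td><td></td><td>No</td>
</tr>
<tr>
<td></td><td>Empty</td><td></td>
</tr>
<tr>
<td></td><td></td><td></td><td>4</td>
</tr>
</table>"""

print(row_widths(SAMPLE))        # [3, 3, 3, 4]
print(overrunning_rows(SAMPLE))  # [(3, 4)]
```

Note that the empty cells still count toward each row's width, which is why they cannot simply be dropped.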

Yes, our docs for drop-empty-elements only say "This option specifies if Tidy should discard empty elements." Maybe we could add "if possible", or something like that, to indicate that it does not apply to all empty elements...

Please explain what you think is wrong... thanks...

Meantime I am closing this issue, but will re-open it if an issue is exposed or identified... thanks...

@rovo79
Author

rovo79 commented Mar 23, 2021

@geoffmcl I appreciate your taking the time to look at and decipher my inquiry. I meant this more as a support request. I agree with you that simply removing all empty elements in an HTML table could be problematic. In my case, the editors I'm working with utilize blank `<td>` elements in their Word Doc to aid with formatting the layout of their Tables in a very specific way. This leaves me with nonstandard HTML tables. I'm just trying to find an automated way to clean them out.

@geoffmcl
Contributor

@rovo79, thanks for the further feedback... but there are some points to consider...

Fact: one of the purposes of tidy is to clean HTML code without changing how the browser renders it... So, for sure, we would not accept a Feature Request (FR) that ran contrary to that purpose and broke that stated aim... and that seems to be what you are asking, but maybe I got it wrong...

You state that some (unnamed) editors "utilize blank <td> elements in their Word Doc to aid with formatting the layout of their Tables in a very specific way." That seems to be exactly the point. If such blank table elements were removed, it seems very likely that the browser rendering, formatting, view, ... would change as well... so tidy should try not to be a party to that...

But this conversation is missing specific sample HTML code...

Maybe if you added a small sample, showed what tidy (which version) presently does using a config of A, B, C..., and then showed what you expect, it might help me understand whether there is a valid FR here... thanks...

OT: Regarding an automated way to clean them out, perhaps open a unix terminal and use tools like sed to remove them before passing the file to tidy... just a suggestion...

@rovo79
Author

rovo79 commented Mar 26, 2021

Thanks very much. I'll read up on SED.

As for an example of what I'm working with:

<table>
    <tbody>
        <tr><td colspan="7"></td></tr>
        <tr><td colspan="7"><p>Table 1.</p></td></tr>
        <tr><td colspan="7"><p>Table Title</p></td></tr>
        <tr><td></td><td colspan="2"><p>Col 1</p></td><td colspan="2"><p>Col 2</p></td><td colspan="2"><p>Col 3</p></td></tr>
        <tr><td><p>Row 1</p></td><td><p>500</p></td><td></td><td><p>600</p></td><td></td><td><p>700</p></td><td></td></tr>
        <tr><td><p>Row 2</p></td><td><p>800</p></td><td></td><td><p>900</p></td><td></td><td><p>1000</p></td><td></td></tr>
        <tr><td><p>Row 3</p></td><td><p>1100</p></td><td></td><td><p>1200</p></td><td></td><td><p>1300</p></td><td></td></tr>
        <tr><td colspan="7"><p>Source: somewhere out there.</p></td></tr>
        <tr><td colspan="7"></td></tr>
    </tbody>
</table>
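For a sample like this, a narrower pre-pass than deleting every empty cell would be to remove only the rows in which every cell is empty, such as the first and last rows here. The following is a hypothetical, regex-based Python sketch (`drop_empty_rows` is an illustrative name, not a tidy option). Regular expressions are not an HTML parser, so it assumes simple markup like this sample. Even this may change the rendering slightly, since an empty row can still take up some height, which is the concern raised above.

```python
import re

# Hypothetical pre-pass: match a <tr> whose only content is one or more
# empty <td>/<th> cells (attributes such as colspan allowed, whitespace
# between cells allowed), together with the row's indentation and newline.
# Assumes simple markup: no attributes on <tr>, no ">" inside attribute
# values, and no comments inside the row.
EMPTY_ROW = re.compile(
    r"[ \t]*<tr>\s*(?:<t[dh]\b[^>]*>\s*</t[dh]>\s*)+</tr>[ \t]*\n?",
    re.IGNORECASE,
)


def drop_empty_rows(html):
    """Remove rows in which every cell is empty; other rows are untouched."""
    return EMPTY_ROW.sub("", html)


SAMPLE = """<table>
    <tbody>
        <tr><td colspan="7"></td></tr>
        <tr><td colspan="7"><p>Table 1.</p></td></tr>
        <tr><td colspan="7"><p>Table Title</p></td></tr>
        <tr><td></td><td colspan="2"><p>Col 1</p></td><td colspan="2"><p>Col 2</p></td><td colspan="2"><p>Col 3</p></td></tr>
        <tr><td><p>Row 1</p></td><td><p>500</p></td><td></td><td><p>600</p></td><td></td><td><p>700</p></td><td></td></tr>
        <tr><td><p>Row 2</p></td><td><p>800</p></td><td></td><td><p>900</p></td><td></td><td><p>1000</p></td><td></td></tr>
        <tr><td><p>Row 3</p></td><td><p>1100</p></td><td></td><td><p>1200</p></td><td></td><td><p>1300</p></td><td></td></tr>
        <tr><td colspan="7"><p>Source: somewhere out there.</p></td></tr>
        <tr><td colspan="7"></td></tr>
    </tbody>
</table>"""

cleaned = drop_empty_rows(SAMPLE)
print(SAMPLE.count("<tr>"), cleaned.count("<tr>"))  # 9 7
```

Only the two full-width empty rows are removed; the spacer `<td></td>` cells inside the data rows are kept, so every remaining row still spans 7 columns.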

@geoffmcl
Contributor

@rovo79, wow, what an example... but you do not show what you expect to achieve after processing with tidy, sed, or any other tool???

It is a perfect example of how difficult this would be, even though just deleting the empty `<td></td>` can be done in a jiffy...

Save the sample HTML as in_925.html, run `$ sed 's/<td><\/td>//g' in_925.html > in_925-1.html`, and load both in a browser... see how broken in_925-1.html is...
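The same experiment can be approximated without a browser. The sketch below applies the identical substitution with Python's `re.sub` and compares each row's effective width (its cells, with a `colspan` counted as that many columns) before and after. The helper names are hypothetical, and `rowspan` is ignored, which is fine here because the sample uses none.

```python
import re
from html.parser import HTMLParser


class RowWidths(HTMLParser):
    """Record each <tr>'s effective width: the sum of its cells' colspan
    values, with a missing or invalid colspan counted as 1."""

    def __init__(self):
        super().__init__()
        self.widths = []  # one entry per <tr>, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.widths.append(0)
        elif tag in ("td", "th") and self.widths:
            span = dict(attrs).get("colspan") or "1"
            self.widths[-1] += max(1, int(span)) if span.isdigit() else 1


def row_widths(html):
    parser = RowWidths()
    parser.feed(html)
    parser.close()
    return parser.widths


def naive_strip(html):
    # Same substitution as: sed 's/<td><\/td>//g'
    return re.sub(r"<td></td>", "", html)


SAMPLE = """<table>
    <tbody>
        <tr><td colspan="7"></td></tr>
        <tr><td colspan="7"><p>Table 1.</p></td></tr>
        <tr><td colspan="7"><p>Table Title</p></td></tr>
        <tr><td></td><td colspan="2"><p>Col 1</p></td><td colspan="2"><p>Col 2</p></td><td colspan="2"><p>Col 3</p></td></tr>
        <tr><td><p>Row 1</p></td><td><p>500</p></td><td></td><td><p>600</p></td><td></td><td><p>700</p></td><td></td></tr>
        <tr><td><p>Row 2</p></td><td><p>800</p></td><td></td><td><p>900</p></td><td></td><td><p>1000</p></td><td></td></tr>
        <tr><td><p>Row 3</p></td><td><p>1100</p></td><td></td><td><p>1200</p></td><td></td><td><p>1300</p></td><td></td></tr>
        <tr><td colspan="7"><p>Source: somewhere out there.</p></td></tr>
        <tr><td colspan="7"></td></tr>
    </tbody>
</table>"""

print(row_widths(SAMPLE))               # [7, 7, 7, 7, 7, 7, 7, 7, 7]
print(row_widths(naive_strip(SAMPLE)))  # [7, 7, 7, 6, 4, 4, 4, 7, 7]
```

Rows 4 to 7 lose their spacer cells and shrink from 7 columns to 6 or 4, so their content no longer lines up under the column headers. The fully empty `<td colspan="7"></td>` rows survive, because the pattern only matches cells without attributes.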

Some images might make it clear: the first is just the above sample, with all columns nicely aligned; the 2nd is with the border <style> turned ON (remove the leading //); and the 3rd is the broken table rendering...

Image: is-925-1-3

Again, it comes back to: what do you expect?

And some images would also help... thanks...

PS: And sorry, I got the issue number wrong: it should be 923, not 925... `$ sed 's/925/923/g' issue-923.txt`...
