CLI option to stop insertion/deletion of tags #682

Closed
codeniko opened this issue Feb 21, 2018 · 4 comments

@codeniko

Hi, I use Jekyll for templating, and the Jekyll-generated HTML files handle spacing and indentation terribly. I'd like to use tidy to clean up just the spacing/indentation and remove comments, but tidy seems to be packed with destructive defaults that alter the normal behavior of pages. After scanning the man page several times, I've disabled as many as I could for my simple use case, but I can't find the option to stop tidy from inserting implicit tags and discarding unexpected tags.

Knowingly against spec, I happen to have a <div> element inside of a <label> that I use to trigger a checkbox to hide/show another div (a menu). Being that it's a menu used for navigation, I'd rather not use javascript to handle this in case a user happens to disable it.

I'm currently running tidy for Apple macOS version 5.7.3 with the following options
--merge-divs no --merge-spans no --enclose-block-text no --enclose-text no --coerce-endtags no --hide-comments yes --wrap 0 --tidy-mark no --drop-empty-elements no --drop-empty-paras no -indent

I'm still getting the following 3 warnings, the last two being the ones I wish to disable.

line 56 column 7 - Warning: missing </label> before <div>
line 57 column 39 - Warning: inserting implicit <label>
line 58 column 7 - Warning: discarding unexpected </label>

This behavior is weird as the output code contains two of the same exact labels with the same for attribute value: original label outside but no longer wrapped around the div, and a newly generated one inside the div. This ultimately breaks my pages.

Please let me know if there exists cli options to disable said insertion and deletion. Thank you!

@geoffmcl
Contributor

geoffmcl commented Mar 2, 2018

@codeniko re-read this several times, and still a little unsure exactly what you are asking...

Given that the purpose of tidy is to tidy, and where it can fix a document, there is no option to sort of prevent that from happening...

In this specific given case, as you point out, Knowingly against spec, you want to add a <div> inside a <label> element, and there is no way to configure tidy to ignore this... so it does its best to fix the problem it sees... to try to only output valid html...

Passing a small sample to the W3C validator gives an error:

Error: Element div not allowed as child of element label in this context.
From line 9, column 25; to line 9, column 41
<label for="markuplang"><div class="lab">HTML</div></label>
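The same nesting problem can be flagged locally without rewriting anything. Below is a minimal sketch using Python's standard `html.parser`; the class name `LabelNestingChecker`, the helper `find_block_in_label`, and the `BLOCK_TAGS` subset are illustrative assumptions, not part of tidy or the W3C validator, and the tag list is deliberately incomplete.

```python
from html.parser import HTMLParser

# Illustrative subset of elements that are not phrasing content.
# This is an assumption for the sketch, not the full list from the HTML spec.
BLOCK_TAGS = {"div", "p", "ul", "ol", "table", "section", "form", "h1", "h2", "h3"}


class LabelNestingChecker(HTMLParser):
    """Collect (line, column, tag) for block-level tags opened inside <label>."""

    def __init__(self):
        super().__init__()
        self.label_depth = 0
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self.label_depth += 1
        elif self.label_depth > 0 and tag in BLOCK_TAGS:
            line, col = self.getpos()
            self.problems.append((line, col, tag))

    def handle_endtag(self, tag):
        if tag == "label" and self.label_depth > 0:
            self.label_depth -= 1


def find_block_in_label(html):
    """Report offending tags only; the input is never modified."""
    checker = LabelNestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Run on the sample line above, this reports the `<div>` as a problem; with a `<span>` in its place it reports nothing.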

And yes, with the option --drop-empty-elements no tidy will output two labels... not particularly good, but it is the consequence of forcing that option to no...

So sorry, at the moment can not see any problem with tidy...

Look forward to further feedback... thanks...

@geoffmcl geoffmcl added this to the 5.7 milestone Mar 2, 2018
@codeniko
Author

codeniko commented Mar 2, 2018

Hi @geoffmcl, thanks for getting back to me. I apologize if I wasn't clear in the beginning. I'll give this another shot.

> Given that the purpose of tidy is to tidy, and where it can fix a document, there is no option to sort of prevent that from happening...

The word tidy is ambiguous in its current context. Given how featureful tidy is, I'm sure you're aware that there are many ways to tidy up HTML. I know you know this given the plethora of options that tidy already has. I'm only interested in a subset of those features to "tidy" up my files. That said, it may be useful to have one more cli option.

This is a feature request. My use case is simple and, I'm sure, fairly common. I would like to use tidy only to tidy up formatting (proper nesting indentation, trimming excess whitespace, etc.), but not to fix errors (though listing the errors like a linter is still useful). Right now, fixing of errors is baked into tidy, and I cannot do one without the other. I would especially want to disable such a feature when it does not adequately fix the issue, as in the output I mentioned earlier.

My request is for a CLI option that stops tidy from fixing HTML errors, without necessarily suppressing the reports that these warnings/errors exist.

Here is some insight to what I'm doing. I made a build shell script doing the following steps:

  1. Generate HTML files with Jekyll
  2. Minify generated CSS and JS files
  3. Run generated HTML files through tidy, with the intent to just fix formatting
  4. Deploy all files and assets to firebase.

As you see, the pipeline is automated and I cannot risk using a tool I don't have confidence in that it won't break my code.

@geoffmcl
Contributor

@codeniko was hoping others would chime in on this question...

Certainly, you should not use tidy if you do not "have confidence in that it won't break your code." ... sorry you feel that...

If you have a specific case where tidy outputs invalid html, then please raise it as an issue, with small sample... thanks...

If your pipeline includes an invalid html generator, then I would get that fixed, before passing it to tidy... like putting a <div> in a <label> is out, etc... the code should pass the W3C validator... use the nu api as part of your pipeline... generate clean html...
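As a rough illustration of wiring the Nu checker into a build step, here is a sketch in Python using only the standard library. It assumes the public endpoint `https://validator.w3.org/nu/?out=json` and its JSON `messages` list; the function names and the `User-Agent` string are hypothetical, and a real pipeline should respect the service's usage policy or run a local checker instance.

```python
import json
import urllib.request

# Assumed public Nu HTML Checker endpoint, asking for JSON output.
NU_ENDPOINT = "https://validator.w3.org/nu/?out=json"


def build_nu_request(html_text):
    """Build (but do not send) a POST request carrying the document body."""
    return urllib.request.Request(
        NU_ENDPOINT,
        data=html_text.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "build-script-html-check",  # hypothetical identifier
        },
        method="POST",
    )


def extract_errors(nu_json_text):
    """Return the text of every message of type 'error' in a checker response."""
    report = json.loads(nu_json_text)
    return [
        m.get("message", "")
        for m in report.get("messages", [])
        if m.get("type") == "error"
    ]


def check_html(html_text, timeout=30):
    """Send the document to the checker and return its errors (needs network)."""
    with urllib.request.urlopen(build_nu_request(html_text), timeout=timeout) as resp:
        return extract_errors(resp.read().decode("utf-8"))
```

A build script could call `check_html` on each generated page and stop when the returned list is non-empty.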

If you want tidy html, nothing changed, except output formatting, then this could be written in minutes, in say perl... and maybe there are other linters that do this for html...
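To make the "formatting only" idea concrete, here is a small sketch in Python rather than perl. It is not how tidy works: the names `Reindenter` and `reindent` are hypothetical, it assumes every non-void element has an explicit closing tag, and it puts each node on its own line, which can add whitespace between inline elements. It copies start tags as written and never inserts, moves, or drops a tag.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
RAW_TAGS = {"pre", "script", "style", "textarea"}  # whitespace is significant here


class Reindenter(HTMLParser):
    """Re-emit markup one node per line, indented by nesting depth."""

    def __init__(self, indent="  "):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. as written
        self.indent = indent
        self.depth = 0
        self.raw_depth = 0
        self.pending = []  # text and character references not yet emitted
        self.lines = []

    def _emit(self, text):
        if self.raw_depth:
            self.lines[-1] += text  # inside <pre> etc.: append verbatim
        else:
            self.lines.append(self.indent * self.depth + text)

    def _flush(self):
        text = "".join(self.pending)
        self.pending = []
        if not self.raw_depth:
            text = re.sub(r"[ \t\r\n]+", " ", text).strip(" ")
        if text:
            self._emit(text)

    def handle_starttag(self, tag, attrs):
        self._flush()
        self._emit(self.get_starttag_text())
        if tag in RAW_TAGS:
            self.raw_depth += 1
        elif tag not in VOID_TAGS and not self.raw_depth:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self._flush()
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._flush()
        if tag in RAW_TAGS and self.raw_depth:
            self._emit(f"</{tag}>")
            self.raw_depth -= 1
            return
        if not self.raw_depth and tag not in VOID_TAGS:
            self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        self.pending.append(data)

    def handle_entityref(self, name):
        self.pending.append(f"&{name};")

    def handle_charref(self, name):
        self.pending.append(f"&#{name};")

    def handle_comment(self, data):
        self._flush()
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._flush()
        self._emit(f"<!{decl}>")


def reindent(html, indent="  "):
    parser = Reindenter(indent)
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n".join(parser.lines)
```

Applied to the `<label>`/`<div>` sample from earlier in the thread, this keeps the single `<label>` wrapped around the `<div>` and changes only line breaks and indentation.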

We have other web developers who use tidy, with -q -e options, to signal that some html needs fixing, and go back to the editing, generation stage to make the html fix permanent... then using tidy a 2nd time, to only format an output...
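The two-pass workflow described above might be scripted roughly as follows. This is a sketch, not an official recipe: the function names are hypothetical, the formatting options are the ones quoted earlier in this thread, and `-modify` (write the result back to the input file) is an assumption about what the second pass should do. It relies on tidy's documented exit statuses: 0 for a clean document, 1 for warnings, 2 for errors.

```python
import subprocess

# tidy's documented exit statuses.
TIDY_STATUS = {0: "clean", 1: "warnings", 2: "errors"}


def lint_command(path):
    """First pass: -q (quiet) and -e (report errors and warnings only)."""
    return ["tidy", "-q", "-e", path]


def format_command(path):
    """Second pass: formatting options taken from this thread, plus -modify."""
    return ["tidy", "-q", "-indent", "--wrap", "0", "--tidy-mark", "no",
            "--hide-comments", "yes", "-modify", path]


def classify(returncode):
    """Map a tidy exit status to a label; anything else is 'unknown'."""
    return TIDY_STATUS.get(returncode, "unknown")


def gate(path, allow_warnings=False):
    """Run the lint pass and decide whether the file may go on to formatting."""
    result = subprocess.run(lint_command(path), capture_output=True, text=True)
    status = classify(result.returncode)
    ok = status == "clean" or (allow_warnings and status == "warnings")
    return ok, result.stderr  # tidy writes its diagnostics to stderr


def lint_then_format(path):
    """Format only files that pass the lint pass; otherwise return the report."""
    ok, report = gate(path)
    if ok:
        subprocess.run(format_command(path), check=False)
    return ok, report
```

With the `<div>` inside `<label>` case from this issue, the lint pass exits with warnings, so `gate` returns `False` and the file is left untouched until the source template is fixed.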

But an option to disable tidy from fixing html errors could be difficult to implement...

Many html errors are detected and fixed in the phase 1 parsing of the user input, and more in the phase 2 clean up... how to disable all these, sort of built-in html rules and regs, may not be easy... but you never know, until you get into the coding...

So yes, maybe a new option, of say --disable-tidy yes, could be considered... but I think there should be a discussion on this... before any work, patches, PR...

Marking this Feature Request with indefinite milestone... until someone steps up...

Look forward to more feedback, patches, ideas on this... thanks

@geoffmcl
Contributor

geoffmcl commented Oct 7, 2020

20201007:

@codeniko well there has been no more feedback, so there is no clear Feature Request spec emerging...

As stated, look forward to more feedback, patches, ideas on this... but for now closing this... thanks...

@geoffmcl geoffmcl closed this as completed Oct 7, 2020
@geoffmcl geoffmcl removed this from the Indefinite future milestone Oct 7, 2020