Feature request: "preserve" value for anchor-as-name #733

Open
ghost opened this issue May 14, 2018 · 9 comments

Comments

@ghost

ghost commented May 14, 2018

Thanks for maintaining HTML Tidy!

The documentation for --anchor-as-name says:

Type: Boolean ...
If set to yes a name attribute, if not already existing, is added along an existing id attribute if the DTD allows it.
If set to no any existing name attribute is removed if an id attribute exists or has been added.

I can see that those two distinct functionalities may each be useful in specific use cases. It is good that tidy offers them.

However, if the user wishes for no modifications to be made to existing name or id attributes, then they would seem to be out of luck: tidy simply does not seem to offer this.

Therefore, I ask that --anchor-as-name be changed from taking a Boolean argument to taking either an Autobool argument or an enum argument, to allow the user to choose from at least three values to pass as an argument: each of the two existing values (e.g. "yes", and "no"), and a new, no-op value (e.g. "auto", or "preserve").
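To make the request concrete, here is a minimal sketch of how a three-valued setting could be parsed. This is not Tidy's code: the `AnchorAsName` enum, the `parse_anchor_as_name` function, and the accepted spellings `auto` and `preserve` are hypothetical names used only for illustration.

```python
from enum import Enum


class AnchorAsName(Enum):
    """Hypothetical three-valued setting; not part of Tidy's API."""
    YES = "yes"            # add a missing name alongside an existing id
    NO = "no"              # remove name when an id exists
    PRESERVE = "preserve"  # proposed: leave name and id untouched


def parse_anchor_as_name(text: str) -> AnchorAsName:
    """Map a command-line value to the hypothetical enum.

    "auto" is accepted as an alias for "preserve", mirroring the two
    spellings suggested in the request. Unknown values raise ValueError.
    """
    value = text.strip().lower()
    if value == "auto":
        return AnchorAsName.PRESERVE
    return AnchorAsName(value)
```

Existing `yes` and `no` values keep their current meaning in this sketch, so only the third value is new.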

@ghost changed the title from "Feature request: no-op value for anchor-as-name" to "Feature request: "preserve" value for anchor-as-name" on May 15, 2018
@geoffmcl
Contributor

@sampablokuper thanks for the issue, I think...

Now maybe this is a good Feature Request, but the first thing is to supply some small sample html... using name and id, to show what tidy presently does, with this or that config... that is --anchor-as-name yes|no...

Then you would need to show what you expect from --anchor-as-name auto... i.e. how that should change the above sample outputs...

That is move this from theory, based on current documentations, which may or may not be the whole story, to simple practical examples... tidy currently does this... with config ???? it should do that...

I have tried to read and understand the current FixAnchors code, which seems the only place TidyAnchorAsName in tidyDocCleanAndRepair is used, but without samples, and then what difference is expected, it is quite difficult, and time consuming...

In essence the samples only need to be one line, and we can use --show-body-only yes to test... although later may need some legacy doctypes to ensure everything continues to work as expected there...

At this time marking this as Technical Support, until it becomes clearer what actual Feature is being requested... and that can only be determined by having some sample html to test... thanks...

@ghost
Author

ghost commented May 15, 2018

@geoffmcl, thanks for your reply.

With the input <a id="foo" name="bar">baz</a>, the only way to obtain a no-op is to use --anchor-as-name yes:

$ echo '<a id="foo" name="bar">baz</a>' | tidy --anchor-as-name yes --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo" name="bar">baz</a>
$ echo '<a id="foo" name="bar">baz</a>' | tidy --anchor-as-name no --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo">baz</a>

By contrast, with the input <a id="foo">baz</a>, the only way to obtain a no-op is to use --anchor-as-name no:

$ echo '<a id="foo">baz</a>' | tidy --anchor-as-name yes --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo" name="foo">baz</a>
$ echo '<a id="foo">baz</a>' | tidy --anchor-as-name no --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo">baz</a>

Therefore, neither of those two options can be relied upon, in the general case, as a no-op.

The proposed --anchor-as-name preserve option would yield a no-op with any relevant input, i.e.:

$ echo '<a id="foo" name="bar">baz</a>' | tidy --anchor-as-name preserve --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo" name="bar">baz</a>
$ echo '<a id="foo">baz</a>' | tidy --anchor-as-name preserve --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo">baz</a>
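The same comparison can be written as a toy model. This is only a sketch of the behaviour observed in the commands above, not Tidy's FixAnchors logic; the `fix_anchor` function and its `mode` strings are hypothetical, and the model ignores doctype and DTD checks.

```python
def fix_anchor(attrs: dict, mode: str) -> dict:
    """Toy model of the observed behaviour; not Tidy's FixAnchors code.

    attrs maps attribute names to values for a single <a> element.
    mode is "yes", "no", or the proposed "preserve".
    """
    result = dict(attrs)
    if mode == "yes":
        # Add name only when it is missing and an id is present.
        if "id" in result and "name" not in result:
            result["name"] = result["id"]
    elif mode == "no":
        # Drop name whenever an id is present.
        if "id" in result:
            result.pop("name", None)
    elif mode != "preserve":
        raise ValueError(f"unknown mode: {mode!r}")
    return result


# The two sample inputs from this comment, as attribute dictionaries.
inputs = [{"id": "foo", "name": "bar"}, {"id": "foo"}]

for mode in ("yes", "no", "preserve"):
    is_noop = all(fix_anchor(a, mode) == a for a in inputs)
    print(mode, "is a no-op for both inputs:", is_noop)
```

In this model only `preserve` leaves both inputs unchanged; `yes` alters the second input and `no` alters the first.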

I hope this demonstrates the validity of the feature request, and that this is not a technical support query. Thanks again :-)

@geoffmcl
Contributor

@sampablokuper thanks for the samples... quite interesting that you seem to be talking about legacy doctypes, i.e. --doctype strict, but will come back to that...

Sort of OT, but you are another person to comment on Technical Support. I have always seen this more as a discussion label, and not as a query. Or at least a 2-way query. One where I too am trying to learn and understand exactly what the issue addresses, requests... sometimes so I can attach a more accurate label...

Accordingly will try changing this to a clearer, simple idea of a Technical Discussion...

Am doing some testing on your samples, especially html5 versus legacy html4 and earlier doctypes... and how anchor-as-name influences that... then what would say a preserve addition do, or aim to do... exactly what is the use case of the preserve, or a no-op as you have termed it...

Also searching and reading W3C docs on this, and running tests on the W3C validator, both legacy and nu to see its results... any W3C links welcome, more samples, etc...

One of the important considerations is that any type of preserve option does not restrict tidy from doing the right thing, whatever that may be in each specific case, and doctype...

Tidy's general aim should be to produce a valid W3C document. I know this is not always the case at present, but it tries, and can be improved... Such an option should not force tidy to produce invalid html... Not that anything you have suggested so far is an error, but I hope you get the idea...

At present this seems all in the TY_(FixAnchors)(doc, ...) service, run from the phase 2 tidyCleanAndRepair API, in the internal tidyDocCleanAndRepair service... there is already a lot of logic to be studied and understood there...

This may take some time to put together a technical specification on what tidy should try to do... specifically regarding anchors and the id and name attributes... and already your samples help in that, thanks, and am adding the FR label...

Seek further feedback, discussion, examples, even patches, or a PR, etc... would be most appreciated... thanks...

@ghost
Author

ghost commented May 17, 2018

Thanks for your follow-up :-)

Accordingly will try changing this to a clearer, simple [label] of a Technical Discussion...

Good call.

Tidy's general aim should be to produce a valid W3C document. [...] Such an option should not force tidy to produce invalid html...

In the case of an (X)HTML or XML fragment or snippet, the input to Tidy is necessarily not a valid document. Nevertheless, it is reasonable for a user to want Tidy to process that input. To satisfy this reasonable use case, Tidy must necessarily be capable of creating output other than valid documents.

Fortunately, Tidy is already capable of this, as you know :-)

Personally, I think it is most useful to think of Tidy's end goal to be to act as a set of filters, each of which is designed to correct or to report on some category of issues likely to be found in (X)HTML or XML documents or fragments. Ideally, the user should be able to selectively turn those filters on or off. (And if on, then to choose between various available applications of those filters, if appropriate - e.g. to choose between yes or no for --anchor-as-name.) This way, Tidy would be capable of achieving all the functionality that a user might desire, without ever forcibly performing any actions that the user does not desire.

The power to create valid documents must be part of that. But the power to prevent Tidy from silently semantically altering the input should also be available.

Seek further feedback, discussion, examples, even patches, or a PR, etc... would be most appreciated... thanks...

I'm sorry that I can't offer anything further in that vein right now :-(

@geoffmcl
Contributor

@sampablokuper looked more at this, but still blocked on what is the purpose of this preserve anchor-as-name option... how does it help...

In the main will leave aside the philosophic discussion on what is, or should be, tidy's goal, but stick with help produce valid html for the user... I am sure we could just go back and forth on this forever... will try to concentrate on any practical use, and need, of this feature request...

I agree the current documentation is not sufficient, nor very helpful... That certainly needs to be improved... suggestions very welcome...

Next it seems this option means slightly different things in html5 vs legacy html4 documents...

HTML4

In html4 W3C specs, like links html4, you can find things like name shares the same name space as the id. I am not sure I fully understand what that means, but for sure I can set up an internal link to either a name, or an id... but the preference at the time seemed to be name... and certainly seek more references on this

Hence, I think, this option came about to ensure if the user had added an id, then this option, with a default of yes, ensured a name attribute would be added. And you would set this option to no to avoid this, if that was what you wanted...

So here you need to show a use case where preserve is needed in this html4 mode.

Either you let tidy fix the document, adding name if missing, or you set it to no...

Where then is preserve needed? What would it do differently to no? html4 document samples please...

HTML5

Then html5 was born, and this sort of flipped this option on its head!

The id was the dominant, and name was deprecated... see say a-element, where name has been omitted... and again seek more references on this...

But tidy has still to catch up with this html5 change... It should warn about name, if used, and if you swing this option as no, it will silently remove it...

Thankfully if only id given in a html5 document, it seems tidy will not add name, in any circumstances...

So while this indicates some work needed for html5, and some document updates, I can not see the usefulness of adding a preserve choice...

If you disagree, what should it do in this html5 case? Again html5 document samples please...

Testing

Now, to begin testing and understanding this, I have added 7 test files to my site, and could add more -

  1. in_733.html - html5 - warn id name no match
  2. in_733-1.html - html5 - only id
  3. in_733-2.html - html4 - warn id and name no match
  4. in_733-3.html - html4 - link targets id and name
  5. in_733-4.html - html5 - only id and name same
  6. in_733-5.html - html5 - only id - same as 2.
  7. in_733-6.html - html5 - link targets id and name

These files can be viewed as html, by adding http://htmlpreview.github.com/? to the url.
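As a small illustration of that prefixing step, here is a sketch. The `preview_url` helper is a hypothetical name, and the example.com address is a placeholder, not the real location of the test files.

```python
# Prefix quoted in the comment for viewing raw HTML files as rendered pages.
PREVIEW_PREFIX = "http://htmlpreview.github.com/?"


def preview_url(file_url: str) -> str:
    """Return the preview address for a raw HTML file URL."""
    return PREVIEW_PREFIX + file_url


# Placeholder URL for illustration only; substitute the real file address.
print(preview_url("https://example.com/in_733.html"))
```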

I am really trying to find a valid use case for this request, for the addition of a sort of no-op option...

Hope you, or others, can assist... thanks...

@ghost
Author

ghost commented May 22, 2018

@geoffmcl wrote:

HTML4

[...] I think, this option came about to ensure if the user had added an id, then this option, with a default of yes, ensured a name attribute would be added. And you would set this option to no to avoid this, if that was what you wanted...

So here you need to show a use case where preserve is needed in this html4 mode.

Already provided in my comment above.

Either you let tidy fix the document, adding name if missing, or you set it to no... Where then is preserve needed?

If an application consuming the HTML applies different semantics to name than to id, then adding and populating a name attribute where one was not previously present could cause unwanted effects in such an application.

Similarly, removing a name attribute from an element could cause unwanted effects in such an application (even if an id attribute exists and is retained on that element).

So, there needs to be an option besides yes or no, which is where preserve would come in.
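A minimal sketch of such a consumer, assuming one that indexes elements by name and by id separately. The `AnchorIndex` class and `index` helper are hypothetical and stand in for any application that treats the two attributes differently; the second input is the output of --anchor-as-name yes shown earlier in this thread.

```python
from html.parser import HTMLParser


class AnchorIndex(HTMLParser):
    """Hypothetical consumer that indexes elements by name and by id separately."""

    def __init__(self):
        super().__init__()
        self.by_name = {}
        self.by_id = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs.
        attributes = dict(attrs)
        if "name" in attributes:
            self.by_name[attributes["name"]] = tag
        if "id" in attributes:
            self.by_id[attributes["id"]] = tag


def index(html: str) -> AnchorIndex:
    """Parse a fragment and return the populated index."""
    parser = AnchorIndex()
    parser.feed(html)
    parser.close()
    return parser


original = index('<a id="foo">baz</a>')
rewritten = index('<a id="foo" name="foo">baz</a>')

print(sorted(original.by_name))   # []
print(sorted(rewritten.by_name))  # ['foo']
```

The id index is identical for both fragments, but the name index gains an entry that the author of the original fragment never wrote.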

What would it do differently to no? html4 document samples please...

Again, already provided in my comment above.

@ghost
Author

ghost commented May 22, 2018

@geoffmcl wrote:

HTML5

[...] I can not see the usefulness of adding a preserve choice... If you disagree, what should it do in this html5 case?

I do disagree. IMO a preserve option would be useful for HTML5 just as it would be useful for HTML 4 and for XHTML: i.e. for the same reasons and behaving much the same way.

FYI, in HTML5, id is a "global attribute", i.e. it can be applied to any element. The name attribute, however, is defined only for certain elements, currently (according to this & this): <button>, <fieldset>, <form>, <iframe>, <input>, <map>, <meta>, <object>, <output>, <param>, <select>, <textarea>, and possibly <keygen>.

AFAICT, it is perfectly valid for such elements to have both the id and the name attribute set, if desired, and indeed for those attributes to have different values to each other. (As usual, each id attribute's value must be unique per-document.)

name was deprecated... see say a-element, where name has been omitted... and again seek more references on this... But tidy has still to catch up with this html5 change... It should warn about name

This is not quite correct. name was not deprecated entirely in HTML5. Tidy should only warn about the presence of a name attribute in an HTML5 document if it appears on an element for which name is not a valid attribute in HTML5.
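That rule could be sketched as follows. The element set is copied from the list given in this comment and is not taken from the specification itself, `<keygen>` is left out because its status is marked as uncertain there, and `should_warn_about_name` is a hypothetical helper, not a Tidy function.

```python
# Elements listed in this comment as accepting name in HTML5.
# <keygen> is omitted because the comment marks it as uncertain.
NAME_ALLOWED = frozenset({
    "button", "fieldset", "form", "iframe", "input", "map", "meta",
    "object", "output", "param", "select", "textarea",
})


def should_warn_about_name(tag: str, attrs: dict) -> bool:
    """Return True when name appears on an element outside the list."""
    return "name" in attrs and tag.lower() not in NAME_ALLOWED


print(should_warn_about_name("a", {"id": "foo", "name": "bar"}))      # True
print(should_warn_about_name("input", {"id": "foo", "name": "bar"}))  # False
```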

Again html5 document samples please...

These would be exactly the same as in my comment above, except that instead of the <a> element, they would use one of the elements listed above for which the name attribute is valid in HTML5.

@geoffmcl
Contributor

@sampablokuper what application, consumer of html, are you talking about?

Ok, at least you are starting to narrow it down, and that is for HTML4...

And have you tested XHTML? Give an example where tidy is in error. In most cases XHTML is handled differently in tidy...

While I have no problem reading mozilla, and/or w3schools docs, tidy tries to apply W3C recommendations...

This issue is about an anchor, <a ...> tag, not about other tags. If tidy is in error on any of these others, then please open a separate issue, and provide sample html that you think tidy handles incorrectly... thanks...

And just to be clear, adding a preserve would be more difficult. Read would need a new PickListItems table. A simpler change from a Boolean option to an AutoBool, which allows a 3rd option, auto, would be much easier. The auto could signal a sort of no-op in this case, and be more backward compatible...

So really no new information added... and I am not yet convinced that such a change is required... but I am just one voice... and I could be wrong...

Now all that means is that I am not personally interested in coding such a change... so left to me this would presently be a Won't Fix label... but...

If you, or others, want to present a PR, or further feedback, I will try to listen for a stronger use case... thanks...

@ghost
Author

ghost commented May 22, 2018

@geoffmcl wrote:

what application, consumer of html, are you talking about?

No specific one: could be a simple static website, could be a dynamic web application. Could even be a mobile app with a WebView, or whatever.

Ok, at least you are starting to narrow it down, and that is for HTML4...

Not just for HTML4. I have addressed XHTML and HTML5 as well, in my comments above.

And have you tested XHTML? [...] In most cases XHTML is handled differently in tidy...

I already gave a relevant example of Tidy's behaviour in my comment above.

Quoting the man page: "If set to strict, Tidy will set the DOCTYPE to the HTML4 or XHTML1 strict DTD."

Give an example where tidy is in error.

I did not say Tidy is "in error". I explicitly marked this issue as a feature request.

In doing so, I noted a reasonable use case that Tidy currently fails to handle, that it would handle if the requested feature were added.

This issue is about an anchor, <a ...> tag, not about other tags.

That is incorrect.

This issue is a feature request relating to Tidy's --anchor-as-name option.

If tidy is in error on any of these others, then please open a separate issue, and provide sample html that you think tidy handles incorrectly... thanks...

See above.

And just to be clear, adding a preserve would be more difficult. Read would need a new PickListItems table. A simpler change from a Boolean option to an AutoBool, which allows a 3rd option, auto, would be much easier. The auto could signal a sort of no-op in this case, and be more backward compatible...

As I mentioned in my first post above, the use of an Autobool as a way to provide a third option seems OK to me.

I am not yet convinced that such a change is required... but I am just one voice... and I could be wrong...

I think you are, in this case.

Now all that means is that I am not personally interested in coding such a change... so left to me this would presently be a Won't Fix label... but...

If you, or others, want to present a PR, or further feedback, I will try to listen for a stronger use case... thanks...

The use case is already strong. If, despite that, you don't want to address the issue, then that will just perpetuate an inconvenience for Tidy's users :-(

In any case, rather than closing this issue as WontFix, I would ask that it at least be left open for anyone who does have an interest in submitting a fix and closing via PR to do so. Thanks.

@balthisar added this to the Indefinite future milestone on Jul 9, 2021