Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tidy's error messages should include filename somewhere #713

Closed
petdance opened this issue Apr 5, 2018 · 21 comments
Closed

tidy's error messages should include filename somewhere #713

petdance opened this issue Apr 5, 2018 · 21 comments
Milestone

Comments

@petdance
Copy link
Contributor

petdance commented Apr 5, 2018

When I run tidy from the command line on multiple files, I can't tell which file the errors apply to:

$ tidy -q -e *.html
line 343 column 35 - Warning: <img> attribute "height" has invalid value "auto"
line 276 column 11 - Warning: trimming empty <span>                                   
line 275 column 9 - Warning: trimming empty <button>                           
line 409 column 5 - Warning: moved <style> tag to <head>! fix-style-tags: no to avoid.
line 370 column 19 - Warning: <iframe> proprietary attribute "allow"
line 335 column 13 - Warning: <img> attribute "height" has invalid value "auto"
line 269 column 11 - Warning: trimming empty <span>
line 268 column 9 - Warning: trimming empty <button>
line 445 column 5 - Warning: moved <style> tag to <head>! fix-style-tags: no to avoid.

Running without the -q doesn't show me the filename either.

$ tidy -e *.html                    
line 343 column 35 - Warning: <img> attribute "height" has invalid value "auto"       
line 276 column 11 - Warning: trimming empty <span>                              
line 275 column 9 - Warning: trimming empty <button>
line 409 column 5 - Warning: moved <style> tag to <head>! fix-style-tags: no to avoid.
line 370 column 19 - Warning: <iframe> proprietary attribute "allow"
Info: Document content looks like HTML5                 
Tidy found 5 warnings and 0 errors!                                            
                                                        
line 335 column 13 - Warning: <img> attribute "height" has invalid value "auto"
line 269 column 11 - Warning: trimming empty <span>     
line 268 column 9 - Warning: trimming empty <button>    
line 445 column 5 - Warning: moved <style> tag to <head>! fix-style-tags: no to avoid.
Info: Document content looks like HTML5            
Tidy found 4 warnings and 0 errors!  
@ler762
Copy link
Contributor

ler762 commented Apr 5, 2018

yes, it would be nice if the ending
Tidy found NNN warnings and NNN errors!
was changed to
Tidy found NNN warnings and NNN errors in file FILENAME
and the -q option was changed to always spit out the warnings & errors summary line

but until tidy is changed, a work-around is

for FNAME in `find . -name "*.html"` ; do
    echo " ---- begin processing $FNAME" >&2
    tidy -q -e $FNAME
done

@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 6, 2018

@petdance this request has come up several times. The earliest I found was #53, in which you participated... is a repeat of #178, and mentioned in #241, and maybe others...

So the first answer is try the --gnu-emacs yes option.

@ler762 as to adding it to the Tidy found ... message... well maybe, but...

First there are three of these. The one you showed, then with Not all warnings/errors were shown. added, and of course No warnings or errors were found. if none. So yes it could be added to all 3, but then there is the problem of what about if the input is from stdin?

The real problem with this is that at the time that message is constructed in the library, in the active language, the library does not know, or keep a copy of the input file, if there was one.

The --gnu-emacs yes overcomes this by setting the file name in the library, on a special service just for this, so that mechanism could probably be re-used, and adding say "stdin" if no file...

But then you want this message excepted from the -quiet option. Wow, I can already hear other users yelling no way ;=)) quiet means only warnings or errors, not summaries...

An alternative would be to output it at the beginning, like you will see from $ tidy non-existing.html will output Document: "non-existing.html" is not a file!, so could output say Document: Processing "existing.html"., or something like that...

But feel this would have to be under an option, like --show-filename yes... that is you have to opt-in, configure for this additional message...

And if something is decided, need to have some thought about the impact on our regression tests... path names always have the unix/windows separator difference, and part of the test is a diff...

Given that there is the --gnu-emacs yes option, and various ways to script this, I do not see this new option having much traction...

Look forward to further feedback, on this old topic... thanks...

@ler762
Copy link
Contributor

ler762 commented Apr 6, 2018

Since this is an old topic, how about treating it as a documentation issue?
(altho I'm also in favor of making --show-filename a synonym of --gnu-emacs yes :)

so the man page changes from:

--gnu-emacs Boolean (no if unset)
       This option specifies if Tidy should change the format for reporting errors and
       warnings to a format that is more easily parsed by GNU Emacs.

to something along the lines of:

--show-filename, --gnu-emacs Boolean (no if unset)
       This option specifies if Tidy should change the format for reporting errors and
       warnings from
         line <line number> column <column number> - (Error|Warning): <message>
       to
         <filename>:<line number>:<column number>: (Error|Warning): <message>

Given that there is ... and various ways to script this

Have you considered adding an EXAMPLES section to the man page showing some of the various ways to script things?

@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 7, 2018

@ler762 thank you for the feedback, and getting involved... we welcome that...

Well I too am certainly all for having a gnu-emacs alias, or synonym, like show-filename. There would be many windows users who would not get gnu, nor emacs without looking them up!

gnu-emacs What is that? A wildebeast eating an electronic mac? ;=))

And this is not only for the man tidy page - which again does not exist in Windows - but for the online quickref.html...

And I would not skimp on the definition, description, like maybe -

--show-filename, --gnu-emacs Boolean (no if unset)
       This option specifies if Tidy should change the format for reporting errors and
       warnings to a format that is more easily parsed by GNU Emacs, and possibly 
       other editors.
       It changes them from the default form -
         line <number> column <number> - (Error|Warning): <message>
       to a form which includes the input filename -
         <filename>:<line>:<column>: (Error|Warning): <message>

Or something like that...

Why do you not present a PR?

  1. Fork tidy to your own github spaces
  2. Generate a SSH Key, and add it to your https://github.com/ler762 settings SSH and GPG keys
  3. Clone your own fork - git clone git@github.com:ler762/tidy-fork.git tidy-fork
  4. Create a branch - cd tidy-fork; git checkout -b filename
  5. Edit, and commit your changes to this branch of your fork
  6. Publish the branch - git push -u origin filename
  7. Create the PR here...

You could even go the whole hog, and fork https://github.com/htacg/html-tidy.org, clone this and generate the API docs locally to see how your changes look there as well...

And likewise would welcome the addition of say an EXAMPLES section, like the ENVIRONMENT section in man tidy, which also makes it into the API docs somehow...

I presently do not fully understand how these are generate using xsltproc on tidy1.xsl, and other cmake tidy generated output, etc, but would help where I can, as I am sure others will step in where needed...

So yes, maybe this is more a documentation issue, and look forward to your further help... thanks...

@ler762
Copy link
Contributor

ler762 commented Apr 8, 2018

@geoffmcl I can work on creating a PR

I'll work on adding --show-filename as a synonym for --gnu-emacs yes, but in the mean time, how's this for the language_en.h man page change:

--gnu-emacs Boolean (no if unset)
       This option specifies that Tidy should change the format for reporting errors and
       warnings to a format that is more easily parsed by GNU Emacs or some other
       program. It changes them from the default

        line <lineNumber> column <columnNumber> - (Error|Warning): <message>

       to a form which includes the input filename:

        <filename>:<lineNumber>:<columnNumber>: (Error|Warning): <message>

localize/README.md seems to assume that you know what you're doing. I don't. Is there any other documentation for what all needs to be done after changing the documentation in the code
and how to build the various docs, web pages, localized *.po files, etc?

@balthisar
Copy link
Member

Guys, no synonyms, please. Either change the option name, or leave it alone. We've worked hard to eliminate synonyms in the past, and to remove deprecated options, and to organize them them a bit better. Adding synonyms does nothing for functionality.

I'm fine with changing the name, but please use the deprecation mechanism to support the old name short term. I can help with this, since no one has used the new mechanism yet.

Additional documentation is always welcome, and that goes for any of the other options, too.

@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 9, 2018

@balthisar what is so wrong with having synonyms, especially in this case. I can see gnu-emacs being liked and used by unix people, it has inherent meaning, and should not be depreciated, and show-filename by the rest of us...

I see it now and again in unix --help commands, although as usual can not immediately find one... will try to remember to report the next... git has them... I use synonyms all the time in my file systems, as a navigation, and memory aid...

I use a system of h*.bat files to get me to that source, and so I can type htidy-html5 to get there, and in that case, because it is something I do every day I have an alias ht... and so on in hundreds of cases... similarly in linux that becomes $ . ht and I am at tidy source...

And as you know I am not big on your deprecation mechanism, still to be fully described, disucussed and agreed...

While synonyms do "nothing for functionality" they can do a lot for usabilty! While I would agree too many would also be a bad thing, this case seems to scream out for it...

@ler762 thanks for the further feedback...

First improving the description for the gnu-emacs, in general I am fine with the wording proposed, but wonder about lineNumber and columnNumber, repeated...

Who would not undestand, or missunderstand line <number> was the number of the line...? And then in the second form :<line>:<column> would be numbers... what other interpretation could be placed on these?

And then there is the use of non-words, or made up words, word combinations, which makes translation to other languages difficult if not impossible...

If you really feel strongly about this then at least go back to your original <line number>, but recognize that would say be translated to a French <numéro de ligne>... and then ligne <numéro de ligne> etc seems very redundant... and I think the same in other languages...

Concerning a PR, as mentioned the first step is to fork the repo - click on the upper right Fork icon, and github will ask you to where - unless you have already done this under a difference pseudonym...

And localize/README.md is if you are creating a new language translation, or improving an existing one. As it states in the introduction the primary file is language_en.h.

While others have presented PR's with other languages also changed, and re-generated po files, etc, this really complicates the review process, and should be avoided - left as a later step, after the english is in place.

For creating a new option like show-filename then README/OPTIONS.md is the place to look. And do a source search for gnu-emacs, and especially its enumerated unique id TidyEmacs, or the same for any other option for that matter, to see all the places used...

Look forward to your help... thanks...

@balthisar
Copy link
Member

I don't object to synonyms using the CLI app; I'm simply asking that no one consider polluting the library with non-core junk.

The request is for user interface enhancement, which belongs in the CLI app. Use the TidyConfigCallback in the CLI app to implement it in the client, and it will work with config files, etc., without complicating the library codebase. This is also an excellent mechanism for implementing other user interface changes without having to code them into the core library.

@ler762
Copy link
Contributor

ler762 commented Apr 9, 2018

@balthisar changing the option name would be extremely user-hostile. How about --show-filename does exactly what it says; shows the filename with the standard error/warning message. Which maybe would be a bit nicer? and gets away from having two options that do the same thing.

@geoffmcl I'll go back to <line number> and <column number>. I don't feel all that strongly about it, but I think documentation should be as clear as possible & line number seems to leave less room for misunderstanding that just line.

@balthisar
Copy link
Member

@ler762, there's actually a mechanism to accept deprecated names as a means to transition users to use the new names of options, but I agree, there's no reason to rename something that works.

Do you want the proposed show-filename to output on every line, like gnu-emacs, or simply one time, somewhere in the output?

In the former case, given the existence of gnu-emacs this doesn't really anything. If you want a single line somewhere in the output, such as the first line, then this is an ideal candidate for a Tidy CLI application option. Application options are single hyphen (-) switches, although if you want to support configuration files, we can fake LibTidy configuration options (-- switch) using the callback I mentioned above.

@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 9, 2018

@balthisar, @ler762 hmmm, adding a -show-filename - note the single hythen - to the console app is maybe a good idea!

And FWIIW, is already there if you use the special DEBUG build mode, -DENABLE_DEBUG_LOG:BOOL=YES... long, LONG ago I wanted to see the filename of the file being processed... see tidy.c:2408:0

If this option is detected, and is not the special debug build, send the filename to errout... simple...

The only drawback here is that it can not be enabled in a config file... becomes one of the options that does not have a config equivalent, but we have several others... and library users can only continue to use gnu-emacs...

As to using TidyConfigCallback in the console app, what folly is this? ;=))

What? Add such a callback to console tidy.c, which it does not presently have, just to catch show_filename: yes in a config file, or on the command line, and convert it to gnu-emacs: yes somehow? And not yet clear what to do if the input is stdin since we will not have reached the input filename in the command yet... but all probably workable...

Talk about polluting the console app with totally unecessary junk, really complicating the console app codebase... just does not seem worth it, but ideas differ...

Alternatively, as suggested here in my first comment, and mentioned by @ler762, a new option --show-filename yes to output the filename once. Nothing to do with gnu-emacs. And maybe that could be on the 3 summary outputs, or separately, but that should remain an Info: type, which can be suppressed with -q, -quiet or --quiet yes...

And @ler762, as stated not too concerned about the new wording for the gnu-emacs option... just about anything would add some clarity to this help... look forward to this...

As sometimes happens, lots of differing ideas here... I think I would be swayed towards the first single hythen -show-filename and will look at this as an extension of my existing debug output... Other developers may be willing to put time and effort into other solutions...

Look forward to further comments... thanks...

@balthisar
Copy link
Member

Talk about polluting the console app with totally unecessary junk, really complicating the console app codebase... just does not seem worth it, but ideas differ...

It's an infrastructure service that could potentially be used for allowing other console switches to work in configuration files, e.g., language. There's been a lot of monkey patching over the years, and a sensible architecture can help prevent monkey patches.

@ler762
Copy link
Contributor

ler762 commented Apr 10, 2018

I'm a very slow writer, so while I'm working on a reply I'll just just put this out for review
#715

@geoffmcl
Copy link
Contributor

@balthisar even after googling it, not really sure what a monkey patch is, or at least how it applies here, nor what is the lot of monkey patches you refer to, especially the idea of a lot, but no problem ;=))

Certainly see TidyConfigCallback infrastructure service as a way of monkeying with a config type switch to make it what it isn't, if you ever needed or wanted to do that... thankfully seems none at present in tidy.c... hope we can keep it that way...

Either such a configuration item is part of the libtidy API, or it is not. Why would you want to sneakily do otherwise? In some readings of the references explored, this service is itself a monkey patch enabler, or something... but ok, each to their own...

@ler762 thank you for PR #715... This looks fine. I have a small backlog of PR's that I need to get too first, but will get to it, eventually...

One small note is that you seem to have done this in next in your fork. That may or may not cause you some pain down the line... as my item 4. above suggested, this is usually better done in a branch... this makes it easier to later keep your next up-to-date, without conflicts, with -

$ git checkout next
$ git remote add upstream git@github.com:htacg/tidy-html5.git # only done once
$ git fetch upstream # get any updates from htacg
$ git rebase upstream/next # update your next
# deal with any conflicts, if any
$ git push # update your remote fork
# and if you have a branch not merged yet
$ git checkout my-branch
$ git rebase next
# deal with conficts, if any
$ git push

But like I say, may not be a problem...

I have experimented with implementing a simple -show-filename option in an issue-713 branch. If acceptable, would create a PR, and eventually merge this to next...

Have found one interesting thing. If this output is put to a error file, with say -f temp.txt, that file is opened in binary mode, thus the output of fprintf(errout, "\n"); is just 0A, thus in say windows I do not get 0D 0A, which messes up some old dumb editors... still looking for a way around this... but small...

Would be appreciated if you, and others, could pull this branch and give it a try... thanks

@balthisar
Copy link
Member

I'm for the ability to emit the filename. There's a demonstrated use case. But this is a client feature, not a library feature. If we want to support -- for config files, then the callback is a valid mechanism. If we're happy with - switches, then the callback isn't necessary. If we want to support-- without the callback, then you're looking at a monkey patch. Overall, this is a client-UI decision, and not a library decision.

That callback has been there since the dawn of time. Actually, the one I mentioned is a bit newer; it corrects the faulty design of the original by providing a reference to the Tidy document in question. This wasn't a monkey patch. It's proper design for a client-server (client-library) model, and is part of the original, intended design.

@geoffmcl, you tend to overlook the fact that LibTidy is a library, and the console application is a consumer of the library, and you tend to want to give special privileges to console. In modern terms, they're very tightly coupled, and there's no reason to be. For example, there's no reason on Earth that LibTidy needs the ability to report the input filename to an output buffer. That onus falls completely on the client program. And the same really goes for other things like gnu-emacs and quiet and other things that the client should be responsible for, and not the library servicing it. And you know how you can support the config file mechanism for these client-only options? Use the callback that's been there since the beginning.

The right thing to do would be to remove all of the option names from the library completely, and let the client deal with them. This would contribute significantly towards the ability to internationalize the option names without spoiling the integrity of the library. The library shouldn't care about gnu-emacs; the client should look for gnu-emacs or the French version thereof, and request TidyEmacs.

Despite all of these flaws, I have no intention of fixing them. There are other priorities. But at the same time, I will crusade against adding crap to the library when it can be handled perfectly well in the client, and that goes for synonyms that have no place in a library, and for adding additional output that doesn't involve the library at all.

I'll reiterate: I'm for the ability to emit the filename. There's a demonstrated use case. But this is a client feature, not a library feature. If we want to support -- for config files, then the callback is a valid mechanism. If we're happy with - switches, then the callback isn't necessary. If we want to support-- without the callback, then you're looking at a monkey patch. Overall, this is a client-UI decision, and not a library decision.

@geoffmcl
Copy link
Contributor

@ler762 thank you for the extension of PR #715. An entirely new config type option --show_filename yes|no. Nothing to do with gnu-emacs. It looks good...

This makes my tidy.c patch in the issue-713 branch entirely redundant... or does it?

What about the case where there are no error or warnings? Now we would be back to no filename shown anywhere, and I think there should be, even in this happy case, if I asked for show-filename...

That could be done in tidy.c using tidyOptGetBool(tdoc, TidyShowFilename), as per my issue-713 branch, just changing the if condition, or moved into the library, where the translation of the line ending "\n" would be better handled in error file output...

And a mention in the documentation of slightly different options - one where gnu-emacs would win if both specified under the current coding... Both gnu-emacs and show-filename descriptions could mention the other alternative, or something... and in the man tidy output there is a See also: ... component on some, which seems to make sense here...

What do you think? Thanks...

@balthisar
Copy link
Member

Given that this is not a synonym, then +1 for support.

Also support @geoffmcl's proposal to output the filename in tidy.c to cover cases with no message output, although I might suggest that it be placed in tidyDocRunDiagnostics() instead. Despite this being in the library, it's only emitted when requested by the client, and so follows the server-client model very nicely, and it would be silenced upon request in the same manner that other stuff already is.

(Plus, @geoffmcl, it might be simpler to localize. Right now your issue-713 branch is adding the non-localized string "Tidying.")

I don't think that cross-referencing is important in this case because these options are not synonyms, and are in no way related to each other other than that the both (coincidentally) indicate the filename of the input file.

@ler762, if you want to "complete" your PR with the cross-references, have a look at message.c:1516 and below. You'll want to add two new static const arrays, one for TidyEmacs and one for TidyShowFilename, each referencing the other. Don't forget to terminate with TidyUnknownOption.

Then add them each to docs_xrefs[] below. Please note the alphabetical order.

The man page will automatically include the xrefs next time the project is built on Unix, and the documentation sets will automatically pick them up next time they're built against this executable.

@ler762
Copy link
Contributor

ler762 commented Apr 11, 2018

@balthisar thanks for the instructions on how to add the cross references

@geoffmcl it didn't take me long at all to get to the 'blow everything away and start over' stage :)
But on the plus side, now I know to make a separate branch for each change.

This makes my tidy.c patch in the issue-713 branch entirely redundant... or does it?

No - I see it as a progress indicator. If you're running tidy on a lot of files that are mostly correct, I can see how it'd be nice to have each filename listed so you know the process isn't hung.

And a mention in the documentation of slightly different options - one where gnu-emacs would win if both specified under the current coding...

I like the See also: ... idea, but I'm not so sure about getting down to the level of documenting which one wins if both are specified. Both options change the output format, so specifying both options seems wrong. (but not wrong enough to check for it & warn/die if found)

@balthisar
Copy link
Member

@ler762, you don't really need a "see also"; the xrefs will take care of that, but maybe just a quick note on TidyShowFilename saying that gnu-emacs overrides it.

Just a quick note on formatting: there's no need for trailing backslashes on interrupted lines. C only cares about whitespace in certain, limited circumstances. It doesn't hurt; you're effectively escaping the carriage return, but just weird to look at outside of a Unix shell.

@geoffmcl
Copy link
Contributor

@ler762 thanks for the addition of the See also: cross references to the PR... just built your fork in linux, and love what I see with $ man -l tidy.1...

Still to check a docs gen to see what I get. @balthisar can't remember are the cross references picked up, and how are they shown... Certainly strongly feel if a user is reading say gnu-emacs, it would be great if they are reminded of the show-filename format alternative, and vice versa...

And not so certain we need to say anything about which wins, but clarity, at least on show-filename description could not hurt...

Since we seem to all agree that it would be a good thing to also show the filename even when there are no warnings or errors... more or less extracted from my now dead issue-713 branch, which will get around to deleting - how about adding this patch -

diff --git a/console/tidy.c b/console/tidy.c
index 4afbc04..928630e 100644
--- a/console/tidy.c
+++ b/console/tidy.c
@@ -2405,7 +2405,15 @@ int main( int argc, char** argv )
         if ( argc > 1 )
         {
             htmlfil = argv[1];
-            DEBUG_LOG( SPRTF("Tidying '%s'\n", htmlfil) );
+#ifdef ENABLE_DEBUG_LOG
+            SPRTF("Tidy: '%s'\n", htmlfil);
+#else
+            if ( tidyOptGetBool(tdoc, TidyShowFilename) )
+            {
+                fprintf(errout, "Tidy: '%s'", htmlfil);
+                fprintf(errout, "\n");
+            }
+#endif
             if ( tidyOptGetBool(tdoc, TidyEmacs) || tidyOptGetBool(tdoc, TidyShowFilename) )
                 tidySetEmacsFile( tdoc, htmlfil );
             status = tidyParseFile( tdoc, htmlfil );

Note I have reduced to the product name Tidy: which I understand is exempt from translations anyway... and since this is a new opt-in option, do not feel it necessary to be in the library where it can be silenced by other options... apps using libtidy can look after this on their own... and can wear the untranslated "\n" in windows... but open to further discussion on this...

Have not yet done our regression tests, but again since it is an opt-in option do not expect any problems...

As stated, I have a backlog of PR's to get to, and prefer to do them in order, so the automatic merge of #715 might break, but no problem, it can still be manually merged, if this happens...

This is of course another reason why such changes are done in a fork branch so you are able to keep the underlying next branch in sync, and continually layer your changes on top, keeping the auto merge valid... but as I say should be no problem...

And @petdance, as the openner of this issue, do you have any further comments? thanks...

geoffmcl added a commit that referenced this issue Apr 24, 2018
@geoffmcl geoffmcl added this to the 5.7 milestone Apr 24, 2018
@geoffmcl
Copy link
Contributor

Love the new --show-filename yes option -

F:\Projects\tidy-test\test\input5>tidy --show-filename yes -e -q in_704.html
Tidy: 'in_704.html'
in_704.html: line 1 column 7 - Warning: missing <!DOCTYPE> declaration
in_704.html: line 1 column 7 - Warning: plain text isn't allowed in <head> elements
in_704.html: line 1 column 7 - Warning: inserting implicit <body>
in_704.html: line 1 column 7 - Warning: inserting missing 'title' element

Even tried a $ tidy --show-filename yes -e -q *.html in my input5 folder, but aborted this since there are over 700 test files in there, some of them very nasty, but always saw the filename, even on docs that show "This document has errors ..."...

@ler762 thank you for PR #717... this worked fine, even though it showed from unknown repository?... then I added my tidy.c patch and we were on our way... and also thanks for the gnu-emacs documentation improvements, and the spelling fix... all now in next from version 5.7.13 onwards... And simply closed PR #715 as now redundant...

@petdance assume we can close this... feel free to re-open if I missed something, or add a new issue... thanks...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants