Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy emits warnings that aren't in order #696

Closed
petdance opened this issue Mar 16, 2018 · 15 comments
Closed

Tidy emits warnings that aren't in order #696

petdance opened this issue Mar 16, 2018 · 15 comments

Comments

@petdance
Copy link
Contributor

$ cat -n t/bad.html
     1  <html>
     2      <head>
     3          <title>Test Page</title>
     4      </head>
     5      <body>
     6          Test Page <a href="bad.html">Back to bad</a>
     7          <a hrex="goodlinks.html">good</a>
     8          <a href="bad2.html">Test</b>
     9      </body>
    10  </html>
    11
$ tidy -q -e t/bad.html
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 8 column 33 - Warning: discarding unexpected </b>
line 8 column 9 - Warning: missing </a>
line 7 column 9 - Warning: <a> proprietary attribute "hrex"

I would expect that the warning for line 7 would come before the warnings for line 8.

@geoffmcl
Copy link
Contributor

@petdance thank you for the issue...

Yes, warnings and errors are emitted, by default to stderr, when they are found...

I think the first 3 are during the document parsing phase, and the last during the tidyDocCleanAndRepair phase...

There are many other warnings that can be issued during this clean and repair phase, so they too would appear out-of-document-order... even out of order with each other...

Not that it is important, tidy has been doing this since some time before the last cvs source release in 2009... thus this feature! was inherited in the src move to github...

It may have been brought up previously in these issues, like #561, but maybe others...

As indicated there, if we wanted the messages in strict document order, the initial parsing phase messages would need to be buffered, and the later items added and then sorted before any output... could be done, not so difficult...

And since apps using libTidy could get all the such messages in a buffer, and they could apply their own sorting, before showing the messages...

Is this worth the effort? Maybe... as indicated it would mean delaying all message output until the library is finished with all phases of the document...

Look forward to further feedback to determine is this a Feature Request that should be added to tidy.c, the console app... thanks...

@geoffmcl geoffmcl added this to the 5.7 milestone Mar 16, 2018
@petdance
Copy link
Contributor Author

If you want to call it a feature request, go ahead. It's not that big a deal to me, other than I was surprised. I'd push harder for it if it were trivial, but clearly it is not.

@ler762
Copy link
Contributor

ler762 commented Apr 6, 2018

Maybe this is another old issue that could be resolved by adding an EXAMPLES section to the man page?

$ tidy -q -e bad.html 2>&1 | sort -n -k 2,2 -k 4,4
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 7 column 1 - Warning: <a> proprietary attribute "hrex"
line 8 column 1 - Warning: missing </a>
line 8 column 25 - Warning: discarding unexpected </b>

Three minutes thought would suffice to find this out;
but thought is irksome and three minutes is a long time.
-- A. E. Housman

@balthisar
Copy link
Member

@petdance, it should be fairly trivial, but memory consumption could skyrocket. Not sure that that's important in this day and age, but libtidy can run currently on some pretty minimal systems.

Right now each message is dealloc'd after emission, but keeping them around until tidyFree() is doable, and a simple structure to hold references could be added to tidyDocImpl. Then the existing message API can fetch the messages. A quick sort on the line and column numbers, output the messages, and it's done.

@geoffmcl is on record elsewhere that this doesn't belong in Tidy, although it seems maybe he's coming around.

@petdance
Copy link
Contributor Author

petdance commented Apr 9, 2018

There are really two questions here:

  1. should libtidy optionally allow returning the text of the HTML as part of the warning?
  2. should the CLI tool optionally display the text of the HTML that caused the warning?

@petdance
Copy link
Contributor Author

petdance commented Apr 9, 2018

I don't have any strong feelings about whether the message stuff should be in libtidy. I just need a yes/no on whether it will, so I can act accordingly on HTML::Tidy5 in Perl.

@balthisar
Copy link
Member

@petdance, are you using LibTidy, or the CLI for HTML::Tidy5? Using the library directly is the way to go, so that you can use the TidyMessageCallback API, which is future proof. If we ever decide on a way to provide an HTML extract as part of the TidyMessage, then it will be queryable with the TidyMessage API. You never, ever have to worry about what the CLI client does.

For your two questions in particular, though:

  1. Yes, if we figure out a way, then the extract would be part of the TidyMessage object.
  2. This is tricky, because right now the CLI doesn't display text; the library does this directly for legacy reasons. Ideally we'd partition these roles, and let the client take over responsibility for this display. The neat thing, it enables sorting, if desired, as well as a few other nifty things. Tidy console really should just be another client.

@petdance
Copy link
Contributor Author

petdance commented Apr 9, 2018

just need a yes/no on whether it will, so I can act accordingly on HTML::Tidy5 in Perl.

are you using LibTidy, or the CLI for HTML::Tidy5?

I think you may be confused by what I meant by HTML::Tidy5. HTML::Tidy5 is the Perl module that I created that wraps libtidy. https://search.cpan.org/dist/HTML-Tidy5

@balthisar
Copy link
Member

@petdance, I know about the module, I was simply wondering if you wrapped the console application, or the library directly. There are a lot of tools out there that wrap the console application and try to scrape STDOUT and STDERR instead of using the library.

Actually, some of the library users may be using the output sinks directly and scaping them for information, too. If you're doing this, or using one of the older message/filter callbacks, take a look at the newer API in TidyMessageCallback.

It was inspired by @gagern if I recall correctly, and really is the future-proof way to collect all message output that LibTidy will ever create. And if we ever add an extract, it's a simple, new accessor that we add to the API that you can pick up, too.

@petdance
Copy link
Contributor Author

petdance commented Apr 9, 2018

HTML::Tidy5 is the Perl module that I created that wraps libtidy.

@balthisar
Copy link
Member

Yeah, I got that. My previous message wasn't asking for clarification from you; I was clarifying why I had asked! And because you're using the library, I went on to explain why you should be using the message API.

@petdance
Copy link
Contributor Author

petdance commented Apr 9, 2018

I went on to explain why you should be using the message API.

Then I'm not understanding what you're talking about. Please say more.

@balthisar
Copy link
Member

I'm not knowledgeable about Perl modules, but in your XS file, I see that you're using output buffers to capture Tidy's standard and error output. If you can configure a callback of type TidyMessageCallback, you can collect each of Tidy's messages constituent parts as they are generated. The callback will provide a TidyMessage object, which you can interrogate with the TidyMessageCallback API.

This decouples you completely from any dependency on how and what stderr and stdout have in their buffers, because you now have access to all of the data that were used to generate the messages.

A quick Google check seems to indicate that XS supports callbacks, so this may be an extensible avenue for you, especially if you want to be able to sort output rather than waiting for the Tidy CLI to offer that as a feature.

If you look at tidy.c:1919, you can see an example implementation. It's ifdef'd out, but is a fairly complete implementation.

@petdance
Copy link
Contributor Author

petdance commented Apr 9, 2018

Thanks. I'll try to make some time to look into this.

@balthisar
Copy link
Member

I will close this as there has been no additional conversation, and the message API can solve the issue for library clients. Please feel free to re-open if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants