Named entity apos only defined in XML/XHTML (anachronistic) #239

Closed
Justin-Maxwell opened this issue Aug 16, 2015 · 4 comments

@Justin-Maxwell

Hi. I'm using tidy under sublimelinter. I'm not a developer, so apologies if I get things wrong here.

&apos; wasn't technically valid in HTML4 but is AFAICT under HTML5.

But tidy reports "named entity apos only defined in XML/XHTML" (http://www.htmlpedia.org/wiki/Tidy_5)

Is there a way to turn off specific errors? Google points me at http://tidy.sourceforge.net/ for documentation but I appreciate that is way out of date.

Cheers.

@geoffmcl
Contributor

@Justin-Maxwell thanks for checking and reporting...

I do not know sublimelinter except what I just read on the web here and there... and now some downloaded bits... did not install it...

I was pleased to see that it seems to use a separately installed version of Tidy!, but the plugin README.md has some quite old links... and seems to first search for tidy5, then tidy executables... which is ok I guess...

History

In brief, when we, HTACG, revived the development of Tidy! we asked to also take over the sourceforge site and repo, to update them, but to date this has not happened. Sorry about that!

For now we have :-

And of course urging system maintainers to add the current HTML Tidy! 5.0.0 as an auto-install, or at least a package installer...

But back to your report...

&apos; wasn't technically valid in HTML4 but is AFAICT under HTML5.

I searched around, but could not find a specific W3C reference that definitively stated an &apos; entity is ok under html5... Do you know some? Can you give me some pointers?

If it is allowed then the fix is one line... modern Tidy! has sort of two modes. It starts in HTML5++ mode by default, and switches back to HTML4-- mode only if it finds a doctype of an earlier version.

This patch would suppress that warning if still in HTML5++ mode...

diff --git a/src/lexer.c b/src/lexer.c
index 07b5274..d4fa9e1 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -983,7 +983,8 @@ static void ParseEntity( TidyDocImpl* doc, GetTokenMode mode )
     if ( TY_(tmbstrcmp)(lexer->lexbuf+start, "&apos") == 0
          && !cfgBool(doc, TidyXmlOut)
          && !lexer->isvoyager
-         && !cfgBool(doc, TidyXhtmlOut) )
+         && !cfgBool(doc, TidyXhtmlOut)
+         && !(TY_(HTMLVersion)(doc) == HT50) ) /* Issue #239 - no warning if in HTML5++ mode */
         TY_(ReportEntityError)( doc, APOS_UNDEFINED, lexer->lexbuf+start, 39 );

     if (( mode == OtherNamespace ) && ( c == ';' ))
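
To make the effect of the extra clause easier to see, here is a minimal standalone sketch of the same condition. It is not Tidy's code: `DocState`, `DocVersion` and `should_warn_apos` are hypothetical stand-ins for the `TidyDocImpl` state, the `TY_(HTMLVersion)(doc)` result and the check inside `ParseEntity`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for Tidy's internal state; not the real TidyDocImpl. */
typedef enum { DOC_HTML4, DOC_HTML5 } DocVersion;

typedef struct {
    bool xml_out;        /* stands in for cfgBool(doc, TidyXmlOut)   */
    bool xhtml_out;      /* stands in for cfgBool(doc, TidyXhtmlOut) */
    bool is_voyager;     /* stands in for lexer->isvoyager           */
    DocVersion version;  /* stands in for TY_(HTMLVersion)(doc)      */
} DocState;

/* Returns true when "&apos" should still produce the APOS_UNDEFINED warning. */
static bool should_warn_apos(const char *entity, const DocState *st)
{
    assert(entity != NULL && st != NULL);
    return strcmp(entity, "&apos") == 0
        && !st->xml_out
        && !st->is_voyager
        && !st->xhtml_out
        && st->version != DOC_HTML5;  /* the clause the patch adds */
}
```

With this sketch, a document still in HTML5 mode never gets the warning, while a document with an older doctype and plain HTML output still does.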

If you, or others, could verify the W3C references, then this could be added...

Initially, it may be added to the issue-228 branch, which will in due course be merged with master... and thus be in the next release... and maybe pushed back into 5.0.0?

@geoffmcl geoffmcl added this to the 5.1 milestone Aug 17, 2015
@geoffmcl geoffmcl self-assigned this Aug 17, 2015
@Justin-Maxwell
Author

Hi Geoff.

Found it... See:

http://www.w3.org/TR/html5/

8 The HTML syntax
8.5 Named character references

http://www.w3.org/TR/html5/syntax.html#named-character-references

aopf;           U+1D552  𝕒
ap;             U+02248  ≈
apacir;         U+02A6F  ⩯
apE;            U+02A70  ⩰
ape;            U+0224A  ≊
apid;           U+0224B  ≋
apos;           U+00027  '
ApplyFunction;  U+02061  ⁡
approx;         U+02248  ≈
approxeq;       U+0224A  ≊
Aring;          U+000C5  Å

As an aside, this list looks like it might be somewhat bigger than the old
XML and HTML versions. So there might be a few more that need to be
included. Fortunately there's http://www.w3.org/TR/html5/entities.json
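
One way to picture the "which entities belong to which version" question is a table keyed by entity name, with a bitmask recording where each name is defined. The sketch below is purely illustrative: `EntityInfo`, `entity_table`, `find_entity`, `entity_defined_in` and the `IN_*` flags are hypothetical names, not Tidy's actual data structures, and the table holds only three sample rows (`apos` and `Aring` from the list above, plus `approx`, which the old HTML4 entity set did not define).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags for where an entity name is defined. */
enum { IN_XML = 1u << 0, IN_HTML4 = 1u << 1, IN_HTML5 = 1u << 2 };

typedef struct {
    const char   *name;        /* entity name without '&' and ';' */
    unsigned      defined_in;  /* bitmask of IN_* flags           */
    unsigned long code_point;  /* Unicode code point              */
} EntityInfo;

/* Three sample rows only; a real table would be built from the full list. */
static const EntityInfo entity_table[] = {
    { "apos",   IN_XML | IN_HTML5,   0x0027 },
    { "approx", IN_HTML5,            0x2248 },
    { "Aring",  IN_HTML4 | IN_HTML5, 0x00C5 },
};

/* Linear lookup by name; returns NULL when the name is not in the table. */
static const EntityInfo *find_entity(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof entity_table / sizeof entity_table[0]; ++i)
        if (strcmp(entity_table[i].name, name) == 0)
            return &entity_table[i];
    return NULL;
}

/* Nonzero when the entity is defined in at least one of the masked versions. */
static int entity_defined_in(const char *name, unsigned version_mask)
{
    const EntityInfo *e = find_entity(name);
    return e != NULL && (e->defined_in & version_mask) != 0;
}
```

Here `entity_defined_in("apos", IN_HTML4)` is false while `entity_defined_in("apos", IN_HTML5)` is true, which is the distinction this issue is about.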

Cheers


@geoffmcl
Contributor

@Justin-Maxwell thanks for the references...

Have pushed this fix to the issue-228 branch, and bumped the version...

To try this version -

$ cd tidy-html5
$ git pull # just to make sure it is up-to-date
$ git checkout issue-228
$ git pull # if this is not your first checkout of this branch
$ cd build/cmake
$ cmake ../..
$ make
$ ./tidy -v

If you get a chance to confirm all is now ok, maybe you could close this issue...

With the fix in the issue-228 branch, it will eventually be merged to master...

@Justin-Maxwell
Author

Hi.

tested and all good 👍
