Tidy can't deal with <中文> XML tags #913

Closed
jidanni opened this issue Nov 23, 2020 · 4 comments

@jidanni
Contributor

jidanni commented Nov 23, 2020

<?xml version="1.0" encoding="UTF-8" standalone="no"?><Lands><國有><土地標示部><縣市>臺中市</縣市><鄉鎮市區
causes
$ tidy -xml #5.6
to say

Tidy found 0 warnings and 34089 errors! Not all warnings/errors were shown.

This happens even though XML allows such element names.

@geoffmcl
Contributor

@jidanni thank you for your issue...

Can you please give a small, complete, valid XML sample file, for testing, understanding, exploring, etc.?

And to avoid any copy/paste issues, please zip the sample file, and attach the zip to this issue - just drag and drop it into your comment...

The first step is to have a solid, repeatable sample... to even begin to look into this... thanks...

@jidanni
Contributor Author

jidanni commented Nov 26, 2020

b_1646.xml.gz

@geoffmcl
Contributor

@jidanni thank you for adding a valid XML sample...

But for future reference, that sample could have been reduced to just a few lines... with just one multi-byte Unicode tag, like <縣市>anything</縣市>... that would really assist the debugging effort...
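As a rough illustration of such a reduced sample, here is a minimal Python sketch. It builds a few-line document with a single multi-byte element name and checks it for well-formedness with the standard library parser before it gets attached anywhere. The file name `minimal.xml` and the helper names are made up for this example; nothing here comes from Tidy itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical reduced sample: one element whose name is multi-byte in UTF-8.
SAMPLE = '<?xml version="1.0" encoding="UTF-8"?>\n<縣市>anything</縣市>\n'


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def write_sample(path: str = "minimal.xml") -> str:
    """Write the sample as UTF-8 and return the path (file name is an assumption)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    return path


if __name__ == "__main__":
    print("well-formed:", is_well_formed(SAMPLE.encode("utf-8")))
    print("wrote", write_sample())
```

If the parser accepts the sample but `tidy -xml` still reports errors on it, the file is small enough to step through in a debugger.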

In debugging I now think this is related to issue #878 ...

Since tidy incorrectly encodes a multi-byte Unicode start tag when it is read into the lexer and copied into the node, the tag cannot later be matched to the correctly encoded closing tag of the same name, with catastrophic, cascading errors thereafter...
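To make the cascade concrete, here is a small Python sketch of the general failure mode. It is not Tidy's code, and the specific corruption used (UTF-8 bytes misread as Latin-1) is only an assumption chosen for illustration; the actual bug in #878 may differ. The point is that once a stored start-tag name no longer equals the closing-tag name, every enclosing element fails to close as well.

```python
def mangle(name: str) -> str:
    # Hypothetical corruption: UTF-8 bytes reinterpreted as Latin-1.
    # ASCII names survive unchanged; multi-byte names do not.
    return name.encode("utf-8").decode("latin-1")


def count_errors(events, corrupt_start=False):
    """Naive stack-based tag matcher; returns the number of mismatches found."""
    stack = []
    errors = 0
    for kind, name in events:
        if kind == "start":
            stack.append(mangle(name) if corrupt_start else name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            errors += 1  # closing tag does not match the innermost open element
    return errors + len(stack)  # elements still open at the end count too


EVENTS = [
    ("start", "Lands"),
    ("start", "縣市"),
    ("end", "縣市"),
    ("end", "Lands"),
]

if __name__ == "__main__":
    print("clean:", count_errors(EVENTS))
    print("corrupted start tags:", count_errors(EVENTS, corrupt_start=True))
```

In this toy matcher a single mangled name also blocks the ASCII `</Lands>` from closing, which is the kind of pile-up that could turn one bad tag into thousands of reported errors on a large file.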

I am so sure this is the issue here that I am closing this until #878 is solved... any help with that is appreciated...

But I will certainly re-open this if it turns out not to be the case... thanks...

@jidanni
Contributor Author

jidanni commented Nov 28, 2020

"But for future reference that sample could have been reduced"

Oh, I thought you wanted a real life one.
