Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not silently ignore unclosed tags #338

Closed
wants to merge 4 commits into from
Closed

Conversation

dataflake
Copy link
Collaborator

Fixes #327

@d-maurer This is a draft as food for discussion on a fix for #327. The code change will flag up any unclosed tag - but is that really desirable? As the failing test shows it will also flag up simple tags that few people close, like <br>. But it feels a fix shouldn't introduce code to introspect the unclosed tag to see if it's OK to ignore it.

@dataflake dataflake self-assigned this Feb 19, 2021
@d-maurer
Copy link

the code change will flag up any unclosed tag - but is that really desirable?

This is not desirable -- at least not for HTML templates. HTML defines which tags can be closed automatically and which must not be closed at all. br, for example, is defined as empty which means, it cannot have an end tag.

For XML, on the other hand, all tags must be explicitly closed. It has a special short hand notation to let a tag function as both start and end tag. The HTML/XHTML compatibility rules indicate how the HTML/XHTML discrepancy concerning element definitions should be handled.

I can see that parsing HTML and XML with the same parser is not easy. In my view, however, the zope.pagetemplate parser did a better job in this respect than that of chameleon.

In my opinion, the parser should handle HTML and XML differently. For HTML, it should implement the HTML rules for empty and automatically closed tags; for XML, it should follow the XML rules - with the HTML/XHMTL compatibility guidelines.

@d-maurer
Copy link

d-maurer commented Feb 20, 2021 via email

@dataflake
Copy link
Collaborator Author

When I remember right, then zope.pagetemplate insists that elements with TAL/METAL attributes are explicitly closed and leaves everything else to the renderer. This might be a good compromise.

I had the same idea after looking through the code in zope.pagetemplate and zope.tal, their processing is a lot more involved and the pain of getting that right seemed greater than the pain of unclear error messages for users who forget to close a tag with tal statements on it. I'll look at that option next week.

@dataflake
Copy link
Collaborator Author

@d-maurer It would be possible to inspect the parsing results for the tag attributes and see if any of them are from the namespaces we consider interesting (tal, metal, i18n) but the parse result is nested several levels and that seems a performance drag. Instead, I have taken inspiration from zope.tal to use the same definition they have for tags that are always supposed to be empty. Those will not be put onto the temporary stack that is used to hold unclosed tags. What's your opinion on that?

@d-maurer
Copy link

d-maurer commented Feb 24, 2021 via email

@dataflake
Copy link
Collaborator Author

chameleon uses quite aggressive caching. This means that it parses not very often. This might indicate that some parsing slowdown could be acceptable.

OK, I'll try that next

However, I doubt that this will resolve the initial problem which has been that unclosed elements were silently ignored even if they carried TAL/METAL attributes.

I don't make these changes blindly or into the blue. If you look at the PR changes you will notice that it includes a test that, among others, proves that your original examples are covered. Unclosed elements that are not among those considered empty by the HTML specs will throw up a parsing error.

@dataflake
Copy link
Collaborator Author

@d-maurer Only putting tags with attributes we care about onto the parser's stack of unclosed tags won't help. It breaks as closing tags are encountered that are not on that stack because they didn't contain "interesting" attributes.

Consider this PR a suggestion. It attempts to get closer to a solution without rewriting larger parts of that parser. It works correctly with XHTML/XML, the list of "empty" tags will be left off the stack regardless of them being unclosed for HTML (<br>) or closed for XML (<br/>). It does solve the problem you pointed out, see the tests.

If you believe this is a bad solution or it is too simplistic then you're welcome to say so. Like I said, I'm not going to rewrite the parser. The underlying issue is not urgent for me, anyway, and I don't mind closing the PR and leaving the issue around.

@d-maurer
Copy link

d-maurer commented Feb 26, 2021 via email

@dataflake
Copy link
Collaborator Author

Like I said, I won't get into reimplementing the parser, and my attempt at ignoring tags that don't have TAL/METAL attributes did not work. I'll close this PR and create a release then.

@dataflake dataflake closed this Feb 26, 2021
@dataflake dataflake deleted the dataflake/issue_327 branch February 26, 2021 07:40
@malthe
Copy link
Owner

malthe commented Feb 26, 2021

It might be better to require proper XML formatting and then have an output option to render to HTML5, e.g. render down a <br /> to <br>.

@dataflake
Copy link
Collaborator Author

Requiring proper XML formatting will break thousands of existing templates...

@malthe
Copy link
Owner

malthe commented Feb 26, 2021

@dataflake yeah I suppose an HTML parsing layer is desirable. That shouldn't be hard, it's just a matter of modifying ElementParser to automatically close tags that appear in a list of self-closing tags.

@d-maurer
Copy link

d-maurer commented Feb 26, 2021 via email

@malthe
Copy link
Owner

malthe commented Feb 26, 2021

It might be good enough to start with the subset of typically used rules. I don't think very many people interested in Chameleon will use an <li> without closing it.

@d-maurer
Copy link

d-maurer commented Feb 26, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(tal) start tags with missing end tag are silently ignored
3 participants