Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed compilation errors in macros. Whitespace handling in Tree Builder. #205

Closed
wants to merge 4 commits into from

Conversation

@martinrlilja
Copy link

martinrlilja commented Apr 6, 2016

Hi!

I’m working on a project which uses html5ever and I noticed that the tree builder handles newlines and whitespace outside of the body and head tags in a slightly odd way.

Take a simple document like this:

<!DOCTYPE html>
<html>
    <head></head>
    <body></body>
</html>

Currently html5ever ignores the newline and indentation of head. For example, the print-rcdom example yields this on the above document:

#Document
    <!DOCTYPE html "" "">
    <html>
        <head>
        #text: \n    
        <body>
            #text: \n\n

So I changed the BeforeHead rule in tree_builder/rules.rs to treat text in the same way AfterHead does.
(As a side note, doing so yielded some errors in the macros crate which I fixed. Those fixes are also part of this pr.)

#Document
    <!DOCTYPE html "" "">
    <html>
        #text: \n    
        <head>
        #text: \n    
        <body>
            #text: \n\n

So there is now a newline and indentation between the html and head tags. Perfect! But there are still two newlines inside the empty body tag. (Both of them exists after the end tag of body.) I’m guessing they come from the AfterBody rule. Which, I guess, is there in case there is text after the body. But it seems odd that it affects a valid document like this.

So without knowing the spec, maybe only text up until the last non-whitespace character should be inserted into the body and the rest should be where it originally occurred in the input stream.

The output should then look like this:

#Document
    <!DOCTYPE html "" "">
    <html>
        #text: \n    
        <head>
        #text: \n    
        <body>
        #text: \n
    #text: \n

Does this seem reasonable?


This change is Reviewable

@nox
Copy link
Member

nox commented Apr 6, 2016

If you look at the spec, you'll see it tells to do exactly what we do:

https://html.spec.whatwg.org/multipage/syntax.html#the-before-head-insertion-mode

  • A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
    • Ignore the token.
@nox
Copy link
Member

nox commented Apr 6, 2016

What was the first commit for, btw?

@martinrlilja
Copy link
Author

martinrlilja commented Apr 6, 2016

Oh, thanks, should’ve ran the tests :/

When I changed tree_builder/rules.rs it wanted me to compile with cargo build --features codegen which gave a bunch of errors because apparently syntax::Parser functions no longer return a FatalError.

@jdm
Copy link
Member

jdm commented Apr 26, 2016

r? @nox

@jdm
Copy link
Member

jdm commented Oct 3, 2016

Shoot, I'm really sorry we dropped the ball on reviewing these changes. This would have avoided #216 (or at least alerted us sooner); as it is, we're replacing the macros in #217, so I'm going to go ahead and close this. Sorry for the frustrating experience!

@jdm jdm closed this Oct 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.