Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kuchiki tests should use strict mode #963

Open
jyn514 opened this issue Aug 13, 2020 · 11 comments
Open

Kuchiki tests should use strict mode #963

jyn514 opened this issue Aug 13, 2020 · 11 comments
Labels
A-frontend Area: Web frontend C-technical-debt Category: This makes the code harder to read and modify, but has no impact on end users E-easy Effort: Should be easy to implement and would make a good first PR P-medium Medium priority

Comments

@jyn514
Copy link
Member

jyn514 commented Aug 13, 2020

This would have caught the issues in #957 (comment) automatically.

@jyn514 jyn514 added E-easy Effort: Should be easy to implement and would make a good first PR A-frontend Area: Web frontend P-medium Medium priority C-technical-debt Category: This makes the code harder to read and modify, but has no impact on end users labels Aug 13, 2020
@shorsher
Copy link
Contributor

Happy to do some work on this, I’ll probably reach out for some initial help getting started though.

@jyn514
Copy link
Member Author

jyn514 commented Aug 15, 2020

Hmm, I don't remember what I was thinking of when I wrote this. @Nemo157 mentioned it originally - do you know where the kuchiki setting for this is?

I do see https://docs.rs/kuchiki/0.8.1/kuchiki/struct.ParseOpts.html#structfield.on_parse_error, maybe we could set it to panic if it sees an error?

@Kixiron
Copy link
Member

Kixiron commented Aug 15, 2020

I think it's probably TreeBuilderOpts.exact_errors and TokenizerOpts.exact_errors

@jyn514
Copy link
Member Author

jyn514 commented Aug 15, 2020

No, that just reports more detailed errors, it doesn't change the behavior: https://docs.rs/html5ever/0.25.1/src/html5ever/tree_builder/mod.rs.html#474

@Kixiron
Copy link
Member

Kixiron commented Aug 15, 2020

Report all parse errors described in the spec, at some performance penalty

Sounds like it reports all the errors it encounters instead of being a browser and allowing malformed html to be silently accepted

@Nemo157
Copy link
Member

Nemo157 commented Aug 15, 2020

TBH when suggesting that I had no idea whether kuchiki had such an option, it just seemed like something it would likely support.

@shorsher
Copy link
Contributor

So we want to panic on error? Or does this need to be closed. Looking through the kuchiki docs, couldn’t anything about a “strict” mode.

@jyn514
Copy link
Member Author

jyn514 commented Aug 16, 2020

Let's start by panicking on error and if that has too many false positives we can reconsider. The issues in #957 (comment) were pretty blatant, I'd be surprised if we couldn't find some way to catch that in kuchiki.

@shorsher
Copy link
Contributor

So threw together something quick for initial testing, every parse I've tried seems to fail with "unexpected token", below being one example

            let mut opts = kuchiki::ParseOpts::default();
            opts.on_parse_error = Some(Box::new(move |err| {
                eprintln!("HTML parse error: {:?}", err);
                panic!(err.to_string());
            }));

            let full = kuchiki::parse_html_with_options(opts)
                .one(web.get("/releases/queue").send()?.text()?);

and

HTML parse error: "Unexpected token"
thread 'web::test::test_doc_coverage_for_crate_pages' panicked at 'Unexpected token', src/web/mod.rs:675:17

New to rust and not too familiar with html5ever, but should we allow "unexpected token" and only panic on other errors? Or let me know if something seems wrong in general.

@jyn514
Copy link
Member Author

jyn514 commented Aug 16, 2020

This might be because we have <script> tags after the </body>. I have a fix as part of #980 I could separate out.

@shorsher
Copy link
Contributor

Alright, sounds good. I'll wait until it lands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-frontend Area: Web frontend C-technical-debt Category: This makes the code harder to read and modify, but has no impact on end users E-easy Effort: Should be easy to implement and would make a good first PR P-medium Medium priority
Projects
None yet
Development

No branches or pull requests

4 participants