Skip to content

Fix readability to work with real dom (fixes #72)#78

Closed
gijsk wants to merge 4 commits into
mozilla:masterfrom
gijsk:fix-readability-to-work-with-real-dom
Closed

Fix readability to work with real dom (fixes #72)#78
gijsk wants to merge 4 commits into
mozilla:masterfrom
gijsk:fix-readability-to-work-with-real-dom

Conversation

@gijsk
Copy link
Copy Markdown
Contributor

@gijsk gijsk commented Mar 25, 2015

This really needs careful review and probably another iteration or two.

There is 1 failing test, the TMZ one, with jsdom, because it contains a comment.

I don't want to add comment parsing to JSDOMParser.
I don't want to have different testcase output for jsdom.
I don't really want to strip all the comments explicitly because it'll need another DOM walk and is essentially useless.

Thoughts? @leibovic @n1k0 @DavidBruant

@gijsk
Copy link
Copy Markdown
Contributor Author

gijsk commented Mar 25, 2015

Huh, so the travis stuff rebases, which I've not done, and so there's another test failure because of the twitter addition that I just merged. Easy to fix, though...

@gijsk gijsk force-pushed the fix-readability-to-work-with-real-dom branch from 85a8e2c to 9cee135 Compare March 25, 2015 17:15
Comment thread Readability.js
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I could totally see this going away in order to avoid starting dealing with implementation-specific behavior; though, if you think this has a very significant impact on performances we should probably keep it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect so, but I've not measured (which is generally true for this patch, and should probably happen before we decide to land it). I will be away from tomorrow until the second half of Monday, so if you have time to look at how this performs that would be helpful. I'm not sure what the best way of measuring this is now that both tests run in the same test framework...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to come with a simple matcha banchmark to measure this.

@n1k0
Copy link
Copy Markdown
Contributor

n1k0 commented Mar 26, 2015

Looks good, though the tmz-1 test is failing, because of some unstripped HTML comment it seems; shouldn't be too hard to fix.

While betting on jsdom as a representative real-world DOM impl. should probably be challenged at some point, we're not in a position to fiddle a lot and I think we should land this.

Thoughts and feedback from @leibovic & @DavidBruant would be appreciated :)

@gijsk
Copy link
Copy Markdown
Contributor Author

gijsk commented Mar 26, 2015

@n1k0 Thanks for reviewing! Yes, I think it'd be good if others would weigh in, too. How would you fix the comment issue? Strip it on the real DOM implementations? Implement in JSDOMParser so we don't kill it there? (otherwise, if we add it to the testcase, the jsdom-based test will succeed and the JSDOMParser-based one will fail...)

@n1k0
Copy link
Copy Markdown
Contributor

n1k0 commented Mar 26, 2015

I wish we had more serialization options here… Adding support for comments to JSDOMParser seems unavoidable, though that'd be only to make tests pass :(

I'll look into this tomorrow.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the & normal? I'd write the URL with and & directly (but maybe I'd be wrong)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC they are not strictly necessary in HTML unless there is a ; before the next non-alphanumeric-non-# character, but I could be wrong. Having them, however, is not incorrect.

Generally, the encoding/decoding of attributes that this patch does could do with another look, too.

@DavidBruant
Copy link
Copy Markdown
Contributor

Beyond the nits and questions, it's hard for me to give a review as I'm not super-familiar with the code. In any case, I believe this change is important.

@gijsk
Copy link
Copy Markdown
Contributor Author

gijsk commented Mar 30, 2015

Closing this in favour of pull request #80 . Let's sync up about that tomorrow.

@gijsk gijsk closed this Mar 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants