Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reader mode button doesn't appear on Project Zero blogspot post #503

Open
evilpie opened this issue Dec 20, 2018 · 5 comments
Open

Reader mode button doesn't appear on Project Zero blogspot post #503

evilpie opened this issue Dec 20, 2018 · 5 comments

Comments

@evilpie
Copy link

evilpie commented Dec 20, 2018

The reader mode button doesn't appear in the urlbar for: https://googleprojectzero.blogspot.com/2018/12/on-vbscript.html

about:reader?url=https://googleprojectzero.blogspot.com/2018/12/on-vbscript.html however seems to mostly work, just the code snippets are a bit mangled.

@gijsk
Copy link
Contributor

gijsk commented Dec 20, 2018

The issue is that this page uses a pile of <div>s with no <br>s and it therefore doesn't pass the simple detection algorithm in https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reader/Readability-readerable.js .

We can't just add all <div>s to that because it would cause the detection speed to plummet.

This can easily be fixed by the site by switching to semantic markup ie using <p> for paragraph text.

@evilpie evilpie changed the title Reade mode button doesn't appear on Project Zero blogspot post Reader mode button doesn't appear on Project Zero blogspot post Dec 21, 2018
@EvsChen
Copy link
Contributor

EvsChen commented Feb 19, 2019

I noticed in the website their first para has a textContent.length of 523. However, our criteria is >540 for "paragraphs". But it's something that should be "readerable".

@gijsk
Copy link
Contributor

gijsk commented Feb 19, 2019

I noticed in the website their first para has a textContent.length of 523. However, our criteria is >540 for "paragraphs". But it's something that should be "readerable".

Sorry, what bit of code are you referring to? Off-hand I don't see 540 anywhere in our codebase. Is my diagnosis in #503 (comment) not correct? The reader mode button in Firefox is governed by the code in https://github.com/mozilla/readability/blob/master/Readability-readerable.js and I think we're just not picking things up at all because the text content is in <div>s instead of <p>s...

@EvsChen
Copy link
Contributor

EvsChen commented Feb 20, 2019

540 because we have score += Math.sqrt(textContent.length - 140). And we would return true for score > 20. So the minimum is 540.

The problem here is about <div> and <p>. But in most cases we cannot expect the site owner to change that. I'm just wondering if there is any way for us to regard this kind of <div> as <p>.

@gijsk
Copy link
Contributor

gijsk commented Feb 20, 2019

540 because we have score += Math.sqrt(textContent.length - 140). And we would return true for score > 20. So the minimum is 540.

Right, but that can be reached by multiple paragraphs, right? The entire article as a whole should be big enough.

The problem here is about <div> and <p>. But in most cases we cannot expect the site owner to change that. I'm just wondering if there is any way for us to regard this kind of <div> as <p>.

Yeah... There's no classes on the <div>s themselves though, only on some of the containers. Perhaps, like for #420 , we need some per-domain rules to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants