Context: #75 (comment)
The universal approach of Readability.js is admittedly convenient, but it doesn't work for every website: markup and {js|css}-driven behavior can easily get in the way of extracting meaningful content. That becomes a serious issue for a popular website like Fastcompany…
We should start considering per-site rules, so that users are guaranteed readable content on mainstream websites. A very simplistic take on what a configuration could look like:
    // ./rules/fastcompany.com.js
    ReadabilityExt.registerRule({
      remove: [
        "p > a.people-page img",
        // … moar selectors
      ]
    });
Obviously, that requires JSDOMParser to support querySelectorAll and friends, though I think there's work currently ongoing on this front.
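To make the idea a bit more concrete, here is a rough sketch of how such rules might be stored and applied before extraction. Everything in it is hypothetical: `ReadabilityExt`, `ruleFor` and `applyRules` don't exist anywhere yet, and passing the hostname explicitly to `registerRule` (rather than deriving it from the rule file name) is just an assumption to keep the example self-contained. It only assumes the document object exposes `querySelectorAll` and that nodes have `parentNode.removeChild`.

```javascript
// Hypothetical sketch: none of these names exist in Readability.js today.
const ReadabilityExt = {
  // hostname -> rule object
  rules: new Map(),

  // Register a rule set for one hostname, e.g. "fastcompany.com".
  registerRule(hostname, rule) {
    this.rules.set(hostname, { remove: rule.remove || [] });
  },

  // Look up the rule for a hostname, ignoring a leading "www.".
  ruleFor(hostname) {
    return this.rules.get(hostname.replace(/^www\./, "")) || null;
  },

  // Remove every element matching the site's selectors.
  // Meant to run before the generic extraction; returns the number of removed nodes.
  applyRules(doc, hostname) {
    const rule = this.ruleFor(hostname);
    if (!rule) return 0;

    let removed = 0;
    for (const selector of rule.remove) {
      // Copy the result first so removals can't disturb the iteration.
      for (const node of Array.from(doc.querySelectorAll(selector))) {
        if (node.parentNode) {
          node.parentNode.removeChild(node);
          removed++;
        }
      }
    }
    return removed;
  },
};

ReadabilityExt.registerRule("fastcompany.com", {
  remove: ["p > a.people-page img"],
});
```

A caller would then do something like `ReadabilityExt.applyRules(doc, "www.fastcompany.com")` right before handing `doc` to Readability. Sites without a registered rule are left untouched, so the universal approach stays the default.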
Of course we should think a little harder about how the whole thing would work, which options would make sense, the possible impact on performance, and so on.
We'd have to maintain per-site rules over time as websites' markup changes… this could quickly become tedious, so we should restrict ourselves to a very short list of supported websites. We could also rely on a community effort, though that would require a super easy way of contributing rules.
Note: This is quite possibly out of scope for 38.
Thoughts?