Site-specific rules #89

@n1k0

Description

Context: #75 (comment)

The universal approach of Readability.js is admittedly convenient, but it doesn't work for every website: markup and {js|css}-driven behavior can easily get in the way of extracting meaningful content. That becomes a serious issue in the case of a popular website like Fastcompany…

We should start considering adding per-site rules, in order to guarantee that users can access readable content on mainstream websites. A very simplistic sketch of what a configuration could look like:

// ./rules/fastcompany.com.js
ReadabilityExt.registerRule({
  remove: [
    "p > a.people-page img",
    // … moar selectors
  ]
});

Obviously, that requires JSDOMParser to support querySelectorAll and friends, though I think there's work currently ongoing on that front.
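To make the idea a bit more concrete, here's a minimal sketch of how such a registry could work, assuming querySelectorAll support. Names like `ReadabilityExt`, `ruleFor` and `applyRule` are purely illustrative, not an actual API; this variant passes the hostname explicitly, whereas the config sketch above implies it from the rule's filename:

```javascript
// Hypothetical per-site rule registry (illustrative names, not a real API).
var ReadabilityExt = {
  _rules: {},

  // Register a rule for a given hostname, e.g. "fastcompany.com".
  registerRule: function(host, rule) {
    this._rules[host] = rule;
  },

  // Look up the rule for a hostname, falling back to parent domains
  // so that "www.fastcompany.com" matches a "fastcompany.com" rule.
  ruleFor: function(host) {
    var parts = host.split(".");
    while (parts.length >= 2) {
      var candidate = parts.join(".");
      if (this._rules[candidate]) return this._rules[candidate];
      parts.shift();
    }
    return null;
  },

  // Apply a rule by removing every node matched by its "remove"
  // selectors, before handing the document to Readability proper.
  applyRule: function(doc, rule) {
    (rule.remove || []).forEach(function(selector) {
      var nodes = doc.querySelectorAll(selector);
      for (var i = 0; i < nodes.length; i++) {
        nodes[i].parentNode.removeChild(nodes[i]);
      }
    });
  }
};
```

The preprocessing step would then be: look up `ruleFor(location.hostname)`, and if a rule exists, call `applyRule(document, rule)` before running the extractor.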

Of course, we should think a little harder about how the whole thing would work, which options would make sense, the possible impact on performance, and so on.

We'd have to maintain per-site rules over time as websites' markup changes… this could quickly become tedious, so we should restrict ourselves to a very short list of supported websites. We could also rely on a community effort, though that would require a super easy way of contributing rules.

Note: This is quite possibly out of scope for 38.

Thoughts?
