A framework for extracting meaning from web pages

Fathom

Fathom is a JavaScript framework for extracting meaning from web pages, identifying parts like Previous/Next buttons, address forms, and the main textual content—or classifying a page as a whole. Essentially, it scores DOM nodes and extracts them based on conditions you specify. A Prolog-inspired system of types and annotations expresses dependencies between scoring steps and keeps state under control. It also provides the freedom to extend existing sets of scoring rules without editing them directly, so multiple third-party refinements can be mixed together.
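To make the score-then-extract idea concrete, here is a minimal conceptual sketch in plain JavaScript. It is not Fathom's API: the names `scoreNodes`, `best`, `baseRules`, and `thirdPartyRules` are hypothetical, plain objects stand in for DOM nodes, and combining scores by multiplication is an assumption made for the illustration. It only shows how a second set of rules can refine a result without editing the first set.

```javascript
// Conceptual sketch only; none of these names come from Fathom itself.
// Each "node" is a plain object standing in for a DOM element.

// Apply every matching rule to every node, combining scores by
// multiplication (an assumption chosen for this illustration).
function scoreNodes(nodes, rules) {
  return nodes.map(node => {
    let score = 1;
    for (const rule of rules) {
      if (rule.matches(node)) {
        score *= rule.multiplier(node);
      }
    }
    return { node, score };
  });
}

// Extract the highest-scoring node.
function best(scored) {
  return scored.reduce((a, b) => (b.score > a.score ? b : a));
}

// A base ruleset: longer paragraphs look more like main content.
const baseRules = [
  { matches: n => n.tag === 'p', multiplier: n => 1 + n.text.length / 100 },
];

// A separately maintained refinement: penalize anything that looks like an ad.
const thirdPartyRules = [
  { matches: n => n.className.includes('ad'), multiplier: () => 0.1 },
];

const nodes = [
  { tag: 'p', className: 'ad-banner', text: 'Buy now! '.repeat(60) },
  { tag: 'p', className: 'story', text: 'Long article text. '.repeat(20) },
  { tag: 'div', className: 'nav', text: 'Home' },
];

// With the base rules alone, the long ad paragraph scores highest.
const baseOnly = best(scoreNodes(nodes, baseRules));

// Mixing in the refinement changes the outcome without touching baseRules.
const refined = best(scoreNodes(nodes, [...baseRules, ...thirdPartyRules]));

console.log(baseOnly.node.className, refined.node.className);
```

In this sketch the refinement is just an extra array concatenated onto the base rules. Fathom itself expresses the dependencies between scoring steps through its system of types and annotations, which this example does not attempt to model.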

Continue reading at https://mozilla.github.io/fathom/intro.html#why.