A framework for extracting meaning from web pages
JavaScript Python Makefile HTML Shell
Type Name Latest commit message Commit time
cli Merge fathom-extract progress bar. Close #171. Jan 2, 2020
docs Remove all mention of fathom-trainees, which has been obsoleted by a … Jan 3, 2020
test Fix linter errors. Dec 2, 2019
tooling Bump requests from 2.18.1 to 2.20.0 in /tooling Dec 3, 2019
.babelrc
.eslintignore
.eslintrc.yml Try to make linter happy on Travis. Dec 2, 2019
.gitignore
.npmignore Don't ship the Python CLI tools with the npm module. May 23, 2019
.travis.yml See if a more complete Python environment make Python 3's venv module… Sep 6, 2019
CODE_OF_CONDUCT.md Add Mozilla Code of Conduct file Apr 19, 2019
LICENSE
Makefile Update dev docs. Dec 12, 2019
README.md Update the docs excerpt in the readme. May 23, 2019
clusters.mjs Remove wu dependency. Close #99. Nov 23, 2019
exceptions.mjs
fnode.mjs Remove mention of score multiplication. May 8, 2019
index.mjs
lhs.mjs Tweak wording and formatting. Jul 17, 2019
package-lock.json We haven't used the leven package since we ripped out the Readability… Dec 2, 2019
package.json
rhs.mjs
rollup.config.js
rule.mjs Remove wu dependency. Close #99. Nov 23, 2019
ruleset.mjs Move Ruleset constructor arg docs to actual constructor doclet. May 23, 2019
side.mjs
utils.mjs
utilsForBackend.mjs
utilsForFrontend.mjs

README.md

Fathom

Fathom is a supervised-learning system for recognizing parts of web pages—pop-ups, address forms, slideshows—or for classifying a page as a whole. A DOM flows in one side, and DOM nodes flow out the other, tagged with types and probabilities that those types are correct. A Prolog-like language makes it straightforward to specify the “smells” that suggest each type, and a neural-net-based trainer determines the optimal contribution of each smell. Finally, the FathomFox web extension lets you collect and label a corpus of web pages for training.
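As a rough illustration of the idea above, here is a minimal sketch of how weighted smells could be combined into a probability for one type. It deliberately does not use Fathom's own API: the smell functions, the `popUp` type, the coefficient and bias values, and the plain objects standing in for DOM nodes are all hypothetical, and the logistic combination is an assumption about how a trained model might weigh each smell.

```javascript
// Hypothetical smells: each returns a number in [0, 1] for a candidate element.
// Plain objects stand in for DOM nodes so the sketch runs without a browser.
const smells = {
  hasCloseButton: (el) => (el.hasCloseButton ? 1 : 0),
  coversViewport: (el) => Math.min(1, el.areaFraction),
};

// Assumed values of the kind a trainer might produce; not real trained numbers.
const coeffs = { hasCloseButton: 2.0, coversViewport: 3.0 };
const bias = -2.5;

// Squash a weighted sum into the range (0, 1).
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Combine every smell, scaled by its coefficient, into one probability.
function popUpProbability(el) {
  let sum = bias;
  for (const [name, smell] of Object.entries(smells)) {
    sum += coeffs[name] * smell(el);
  }
  return sigmoid(sum);
}

// Candidates flow in; the same candidates flow out tagged with a type
// and the probability that the type is correct.
const candidates = [
  { id: 'modal', hasCloseButton: true, areaFraction: 0.8 },
  { id: 'footer', hasCloseButton: false, areaFraction: 0.1 },
];

const tagged = candidates.map((el) => ({
  id: el.id,
  type: 'popUp',
  probability: popUpProbability(el),
}));

console.log(tagged);
```

In this sketch the `modal` candidate scores well above 0.5 and the `footer` candidate well below it. In Fathom itself, the smells are written as rules in a ruleset and the weights come from the trainer rather than being set by hand.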

Continue reading at https://mozilla.github.io/fathom/intro.html#why.

Documentation
