This repository has been archived by the owner on Mar 26, 2020. It is now read-only.

Explore fuzzier success functions for Trainer #12

Closed
erikrose opened this issue Aug 23, 2018 · 1 comment

Comments

@erikrose
Contributor

Currently, the success function (that is, the test that tells us whether a Fathom-selected node is the right one) is simple equality: did it choose the same node the human labeled? We'd like to crowdsource labeling, but I doubt the crowd will be as strict in their choice of nodes as ruleset authors. For example, I can imagine them selecting either the inner or outer div when trying to find the price node in `<div><div>$34.95</div></div>`. I doubt they will stick closely to rubrics even if we define them.

Thus, we should explore alternate, more forgiving success functions. Perhaps any node of equal or nearly equal dimensions and placement should be accepted. Perhaps any node with equivalent `innerText` should be accepted. Perhaps there should be a fuzzy acceptance based on how far the selected node is from the labeled one. Perhaps success functions should be pluggable in `trainees.js`, based on the needs of the ruleset or individual `out()` rule.
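
To make those options concrete, here is a rough sketch of what a few candidate success functions might look like. Everything in it is hypothetical: the function names, the `successFunctions` registry, and the 0.75 threshold are invented for illustration and are not part of Fathom or `trainees.js`. Nodes are modeled as plain objects with a `rect` and a `text` so the sketch runs without a DOM; real code would presumably read `getBoundingClientRect()` and `innerText` instead.

```javascript
// Hypothetical sketch: none of these names come from Fathom or trainees.js.
// A "node" here is a plain object: {rect: {left, top, width, height}, text}.

// Strict baseline: the chosen node must be the very node the labeler picked.
function strictEquality(chosen, labeled) {
    return chosen === labeled ? 1 : 0;
}

// Accept any node whose whitespace-normalized text matches the labeled node's.
function sameText(chosen, labeled) {
    const normalize = s => s.trim().replace(/\s+/g, ' ');
    return normalize(chosen.text) === normalize(labeled.text) ? 1 : 0;
}

// Fuzzy score in [0, 1]: intersection-over-union of the two bounding boxes.
function boxOverlap(chosen, labeled) {
    const a = chosen.rect;
    const b = labeled.rect;
    const overlapWidth = Math.max(
        0, Math.min(a.left + a.width, b.left + b.width) - Math.max(a.left, b.left));
    const overlapHeight = Math.max(
        0, Math.min(a.top + a.height, b.top + b.height) - Math.max(a.top, b.top));
    const intersection = overlapWidth * overlapHeight;
    const union = a.width * a.height + b.width * b.height - intersection;
    return union > 0 ? intersection / union : 0;
}

// Turn a fuzzy score into a pass/fail test with a chosen tolerance.
function atLeast(scoreFunction, threshold) {
    return (chosen, labeled) => (scoreFunction(chosen, labeled) >= threshold ? 1 : 0);
}

// One way to make success functions pluggable: look them up per output type.
// The 0.75 threshold is an arbitrary illustrative value, not a tuned one.
const successFunctions = {
    price: atLeast(boxOverlap, 0.75),
    title: sameText,
};

// The nested-div price example: the labeler picked the outer div, the ruleset
// picked the inner one. The rectangles are made-up numbers.
const outer = {rect: {left: 0, top: 0, width: 100, height: 20}, text: '$34.95'};
const inner = {rect: {left: 2, top: 2, width: 96, height: 16}, text: ' $34.95 '};

console.log(strictEquality(inner, outer));         // strict test rejects the inner div
console.log(sameText(inner, outer));               // text-based test accepts it
console.log(successFunctions.price(inner, outer)); // overlap-based test accepts it
```

With made-up rectangles like these, the strict test fails while the text-based and overlap-based tests pass, which is the forgiving behavior described above. Whether any such function gets close to rubric-strict accuracy is the open question.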

Write up some prospective success functions, and see how close you can get to rubric-strict training accuracy with slightly faulty labeling.

@erikrose
Contributor Author

Explored. Didn't work out. Tried neural nets. Worked like gangbusters. Trainer is deprecated.
