This repository has been archived by the owner on Mar 26, 2020. It is now read-only.
Currently, the success function (that is, the test that tells us whether a Fathom-selected node is the right one) is simple equality: did it choose the same node as the human labeled? We'd like to crowdsource labeling, but I doubt the crowd will be as strict in their choice of nodes as ruleset authors. For example, I can imagine them selecting either the inner or outer div when trying to find the price node in `<div><div>$34.95</div></div>`. I doubt they will stick closely to rubrics even if we define them.
Thus, we should explore alternate, more forgiving success functions. Perhaps any node of equal or nearly equal dimensions and placement should be accepted. Perhaps any node with equivalent `innerText` should be accepted. Perhaps there should be a fuzzy acceptance based on how far the chosen node is from the labeled one. Perhaps success functions should be pluggable in `trainees.js`, based on the needs of the ruleset or individual `out()` rule.
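To make those options concrete, here is a rough sketch of what a few interchangeable success functions might look like. Everything in it is hypothetical: the function names, the `(chosen, labeled)` signature, the 0.9 overlap threshold, and the `fakeNode` stand-in are assumptions for illustration, not anything Fathom or `trainees.js` currently provides. It only relies on nodes exposing `innerText` and `getBoundingClientRect()`, as DOM elements do.

```javascript
// Hypothetical sketch: none of these names exist in Fathom today.
// Each success function takes (chosen, labeled) and returns true if the
// node Fathom chose should count as a hit against the human-labeled node.

// Today's behavior as described above: the very same node or nothing.
function strictEquality(chosen, labeled) {
    return chosen === labeled;
}

// Accept any node whose text matches after collapsing whitespace.
function sameInnerText(chosen, labeled) {
    const normalize = node => (node.innerText || '').replace(/\s+/g, ' ').trim();
    return normalize(chosen) === normalize(labeled);
}

// Intersection-over-union of two boxes shaped like {left, top, width, height}.
// Returns a number from 0 (disjoint) to 1 (identical), so it could also
// serve directly as a fuzzy "how far off" score.
function overlapRatio(a, b) {
    const width = Math.min(a.left + a.width, b.left + b.width) - Math.max(a.left, b.left);
    const height = Math.min(a.top + a.height, b.top + b.height) - Math.max(a.top, b.top);
    if (width <= 0 || height <= 0) {
        return 0;
    }
    const intersection = width * height;
    const union = a.width * a.height + b.width * b.height - intersection;
    return intersection / union;
}

// Build a success function that accepts nodes of nearly equal dimensions
// and placement. The default threshold of 0.9 is an arbitrary assumption.
function similarBox(threshold = 0.9) {
    return (chosen, labeled) =>
        overlapRatio(chosen.getBoundingClientRect(),
                     labeled.getBoundingClientRect()) >= threshold;
}

// Stand-in for a DOM element so the sketch runs outside a browser.
const fakeNode = (innerText, left, top, width, height) => ({
    innerText,
    getBoundingClientRect: () => ({left, top, width, height}),
});

// The inner and outer price divs from the example occupy the same box and
// carry the same text; a third node sits elsewhere on the page.
const outerDiv = fakeNode('$34.95', 10, 10, 100, 20);
const innerDiv = fakeNode(' $34.95 ', 10, 10, 100, 20);
const cartButton = fakeNode('Add to cart', 200, 10, 100, 20);

for (const [name, isSuccess] of [['strictEquality', strictEquality],
                                 ['sameInnerText', sameInnerText],
                                 ['similarBox', similarBox()]]) {
    console.log(name, isSuccess(innerDiv, outerDiv), isSuccess(cartButton, outerDiv));
}
```

Because every function shares one signature, a ruleset or an individual `out()` rule could in principle be handed whichever one suits it, which is the pluggability floated above. How that would actually be wired into `trainees.js` is left open here.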
Write up some prospective success functions, and see how close you can get to rubric-strict training accuracy with slightly faulty labeling.