Explore fuzzier success functions for Trainer #12

erikrose · 2018-08-23T16:25:32Z

Currently, the success function—that is, the test that tells us whether a Fathom-selected node is the right one—is simple equality: did it choose the same node as the human labeled? We'd like to crowdsource labeling, but I doubt the crowd will be as strict in their choice of nodes as ruleset authors. For example, I can imagine them selecting either the inner or outer div when trying to find the price node in <div><div>$34.95</div></div>. I doubt they will stick closely to rubrics even if we define them.

Thus, we should explore alternative, more forgiving success functions. Perhaps any node of equal or nearly equal dimensions and placement should be accepted. Perhaps any node with equivalent innerText should be accepted. Perhaps there should be a fuzzy acceptance based on how far the chosen node is from the labeled one. Perhaps success functions should be pluggable in trainees.js, based on the needs of the ruleset or individual out() rule.
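
For illustration, here is a minimal sketch of what a few of these success functions might look like. It is not Fathom's or the Trainer's actual API: the node shape (id, innerText, rect), every function name, and the 0.8 threshold are assumptions made up for this example, and plain objects stand in for DOM nodes so it runs outside a browser.

```javascript
// Hypothetical node shape, for illustration only (not Fathom's API):
// { id, innerText, rect: {left, top, width, height} }

// Strict baseline: the chosen node must be the very node the human labeled.
function exactMatch(chosen, labeled) {
  return chosen.id === labeled.id;
}

// Accept any node whose whitespace-normalized text equals the label's.
function sameInnerText(chosen, labeled) {
  const normalize = (s) => s.replace(/\s+/g, ' ').trim();
  return normalize(chosen.innerText) === normalize(labeled.innerText);
}

// Intersection-over-union of two rectangles:
// 1 for identical boxes, 0 for boxes that do not overlap.
function overlapRatio(a, b) {
  const w = Math.min(a.left + a.width, b.left + b.width) - Math.max(a.left, b.left);
  const h = Math.min(a.top + a.height, b.top + b.height) - Math.max(a.top, b.top);
  if (w <= 0 || h <= 0) return 0;
  const intersection = w * h;
  const union = a.width * a.height + b.width * b.height - intersection;
  return intersection / union;
}

// Build a success function that accepts nearly equal dimensions and placement.
// The default threshold is an arbitrary assumption, not a tuned value.
function similarBox(threshold = 0.8) {
  return (chosen, labeled) => overlapRatio(chosen.rect, labeled.rect) >= threshold;
}

// Pluggable scoring: the fraction of samples the given success function accepts.
function accuracy(samples, isSuccess) {
  const hits = samples.filter(({chosen, labeled}) => isSuccess(chosen, labeled)).length;
  return hits / samples.length;
}

// The inner/outer div case from above: same text, nearly the same box.
const outer = {id: 'outer', innerText: '$34.95', rect: {left: 0, top: 0, width: 100, height: 20}};
const inner = {id: 'inner', innerText: ' $34.95 ', rect: {left: 2, top: 1, width: 96, height: 18}};
const samples = [{chosen: inner, labeled: outer}];
```

Here exactMatch rejects the inner div while sameInnerText and similarBox() both accept it, so accuracy(samples, ...) changes with whichever function is plugged in. The raw overlapRatio value could also serve as a graded score for fuzzy acceptance instead of a pass/fail cutoff.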

Write up some prospective success functions, and see how close you can get to rubric-strict training accuracy with slightly faulty labeling.

erikrose · 2019-06-26T20:27:30Z

Explored. Didn't work out. Tried neural nets. Worked like gangbusters. Trainer is deprecated.

erikrose mentioned this issue Aug 30, 2018

Selectable or pluggable success functions #10

Closed

erikrose mentioned this issue Sep 10, 2018

Allow trainee extensions to run arbitrary callables to determine success #18

Closed

erikrose closed this as completed Jun 26, 2019
