Use training data (hard labels) in LabelModel fit()? #1642
The guiding principle behind the LabelModel is that no labeled data is available. For this reason, there isn't currently an implementation in the codebase, but you could fairly easily write a wrapper around the LabelModel that permits such operations. If you do have this kind of information, it's definitely a reasonable thing to use. There are theoretical studies of how one would do this. For example, to understand the conditions for using known labels (combined with standard LabelModel estimates) when estimating accuracies, see our AISTATS paper.
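One hypothetical shape for such a wrapper (the class name `LabelModelWithGold` and its `gold` argument are invented here for illustration, not Snorkel API; only `fit` and `predict_proba` mirror the real LabelModel interface) is to fit as usual and then overwrite the probabilistic labels wherever a gold label is known:

```python
import numpy as np

class LabelModelWithGold:
    """Sketch of a wrapper that injects known (gold) labels into a
    fitted label model's output. Assumes the wrapped model exposes
    Snorkel-style fit(L, **kwargs) and predict_proba(L) methods, and
    that predict_proba is called on the same matrix that was fit."""

    def __init__(self, label_model, cardinality=2):
        self.label_model = label_model
        self.cardinality = cardinality
        self.gold = None

    def fit(self, L, gold=None, **kwargs):
        # gold: integer class ids aligned with the rows of L,
        # with -1 marking examples that have no gold label.
        self.gold = gold
        self.label_model.fit(L, **kwargs)
        return self

    def predict_proba(self, L):
        probs = self.label_model.predict_proba(L)
        if self.gold is not None:
            labeled = self.gold >= 0
            # Replace estimated probabilities with one-hot gold labels.
            probs[labeled] = np.eye(self.cardinality)[self.gold[labeled]]
        return probs
```

This only overrides the output; it does not feed the labels back into the parameter estimation itself, which is what the loss-mixing approach below does.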
Thanks @fredsala. Yes, in a case where one has labeled data, it seems strange not to use it. Do I understand correctly that there is no intent to implement a tag for labeled data, as suggested in your paper or otherwise? I agree it's not impossible to have the priors and mu handled differently for labels known to be correct/likely correct.
@fredsala I implemented a simple tweak to allow for training with labeled examples.
In other words, it tries to fit the labeling function weights both to the current loss and to an optional training dataset, and you can set the ratio between those losses (they are on the same scale; it works OK in my examples even at 50/50). Let me know if that's something you want as a pull request?
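The mixed objective described above can be sketched roughly as follows (a hypothetical illustration, not the poster's actual code; `combined_loss` and its arguments are invented names, with `-1` standing in for the "?" unlabeled marker):

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def combined_loss(snorkel_loss, scores, gold_labels, label_loss_weight=0.5):
    """Weighted average of the unsupervised "snorkel loss" and a
    supervised cross-entropy computed only on examples whose gold
    label is known (gold_labels == -1 means unlabeled)."""
    labeled = gold_labels >= 0
    if labeled.any():
        probs = softmax(scores[labeled])
        picked = probs[np.arange(labeled.sum()), gold_labels[labeled]]
        supervised = -np.mean(np.log(picked + 1e-12))
    else:
        supervised = 0.0
    return (1.0 - label_loss_weight) * snorkel_loss \
        + label_loss_weight * supervised
```

Setting `label_loss_weight` to 0.5 corresponds to the 50/50 mixing mentioned above; values between 0.2 and 0.8 correspond to the 80-20 to 20-80 range discussed later in the thread.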
@moscow25 I would love a pull request for this!
@rjurney let me clean up and push the cleaned-up version, but it's really just a few lines. What I'm doing is:

- passing in labeled examples [actually a label for *all* examples, with ? for unlabeled]
- computing the softmax (for the current mu value on each labeling function) on the labeled examples
- computing a supervised loss between the labels and the softmax
- returning the final loss as a weighted average of the "snorkel loss" and the "softmax (supervised) loss"

In practice, on the problems I tried, the two losses are on the same scale, so a weighted average anywhere from 80-20 to 20-80 works just fine, depending on your problem. All you are doing is fitting the labeling function weights to the labeled examples, while also minimizing the snorkel loss if you like. Apologies for the messy code. Let me know, and I'm happy to push a version with minimal changes to the production Snorkel.
I’d be curious to see it!
Hi! It seems there's no way to directly specify known labels in the dataset, as features or otherwise?
snorkel/snorkel/labeling/model/label_model.py (line 44 in ed77718)

Do I understand that correctly? I'm happy to try implementing a "confidence" vector by labeling function/feature. It does not play all that nicely with the l2 penalty applied uniformly to all features, but that can also be tweaked to treat features differently.
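One way to tweak the l2 penalty to treat features differently, as suggested above, is to scale the per-feature penalty by how confident you are in each labeling function (a hypothetical sketch; `weighted_l2_penalty`, `mu_init`, and the `confidence` vector are invented names, not Snorkel API):

```python
import numpy as np

def weighted_l2_penalty(mu, mu_init, confidence):
    """Per-feature l2 penalty pulling mu toward mu_init, where
    features with higher confidence (in [0, 1]) are regularized less,
    i.e. allowed to move further from their prior."""
    weights = 1.0 - confidence
    return float(np.sum(weights * (mu - mu_init) ** 2))
```

A uniform penalty is recovered by passing `confidence = 0` everywhere; a fully trusted feature (`confidence = 1`) contributes nothing to the penalty.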