
Use training data (hard labels) in LabelModel fit()? #1642

Closed
moscow25 opened this issue Apr 5, 2021 · 7 comments

@moscow25

moscow25 commented Apr 5, 2021

Hi! It seems there's no way to directly specify known labels in the dataset, as features or otherwise? In particular, it seems one cannot:

  • assign a higher initial precision to some labeling functions but not others
  • fix the labels for some examples, in a hard or soft way, so they aren't changed by the bootstrapping

Do I understand that correctly? I'm happy to try implementing a "confidence" vector per labeling function/feature. It doesn't play nicely with the single l2 penalty applied to all features, but that could also be tweaked to treat features differently.
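
For reference, a minimal sketch (in plain PyTorch, not Snorkel's API) of what such a per-labeling-function confidence vector could look like, assuming the model's parameters are stored as one row per labeling function; `mu`, `mu_init`, and `lf_confidence` are illustrative names, not existing code:

```python
import torch

# Hypothetical per-labeling-function confidence vector: larger values pull
# that LF's parameters more strongly toward their prior/initial values.
lf_confidence = torch.tensor([1.0, 5.0, 1.0, 10.0])

def weighted_l2_penalty(mu, mu_init, lf_confidence):
    """Replaces a single global l2 coefficient with a per-LF strength, so
    trusted labeling functions are pinned to their priors more strongly."""
    per_lf = ((mu - mu_init) ** 2).sum(dim=-1)  # one scalar per LF
    return (lf_confidence * per_lf).sum()
```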

@fredsala
Contributor

The guiding principle behind the LabelModel is that no labeled data is available. For that reason, there isn't an implementation of this in the codebase right now, but you could fairly easily write a wrapper around the LabelModel that permits such operations.

If you do have this kind of information, using it is definitely a reasonable thing to do, and there are theoretical studies of how one would do so. For example, for the conditions under which known labels can be used (and combined with standard LabelModel estimates) when estimating accuracies, see our AISTATS paper.
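
As a rough illustration of that wrapper idea (not an official API), one could subclass the released LabelModel and pin the output probabilities for examples whose gold label is known; the method name and the -1-means-unlabeled convention below are assumptions:

```python
from snorkel.labeling.model import LabelModel

class LabelModelWithKnownLabels(LabelModel):
    """Sketch of a wrapper: train as usual, then override the predicted
    probabilities for examples whose gold label is known."""

    def predict_proba_with_gold(self, L, y_gold):
        # y_gold: one integer label per example, -1 where no gold label exists
        probs = self.predict_proba(L)
        labeled = y_gold >= 0
        probs[labeled] = 0.0
        probs[labeled, y_gold[labeled]] = 1.0
        return probs

# Usage (assuming L_train and y_gold already exist as numpy arrays):
# label_model = LabelModelWithKnownLabels(cardinality=2, verbose=False)
# label_model.fit(L_train, n_epochs=500, seed=123)
# probs = label_model.predict_proba_with_gold(L_train, y_gold)
```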

@moscow25
Author

Thanks @fredsala. Yes, in a case where one has labeled data, it seems strange not to use it. Do I understand correctly that there is no intent to implement a tag for labeled data, as suggested in your paper or otherwise?

I agree it should be possible to handle the priors and mu differently for labels known to be correct or likely correct.

@moscow25
Author

moscow25 commented Apr 19, 2021

@fredsala I implemented a simple tweak to allow for training with labeled examples.

  • a simple CrossEntropy loss for golden examples vs. the probabilities from the labeling ensemble
  • adds another loss term, with an optional weight relative to the "Snorkel loss"
  • as with normal training, it only changes the labeling function "mu" weights and nothing else
  • works well in practice (as good as or better than adding golden labels as a separate function multiple times to push mu to 1.0)

In other words, it tries to fit the labeling function weights both to the current loss and to an optional training dataset, and you can set the ratio between those losses (they are on the same scale; 50/50 works OK in my examples).
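
A hedged sketch of that loss combination in PyTorch (the names `snorkel_loss`, `probs`, `y_gold`, `gold_mask`, and `supervised_weight` are placeholders, not the exact code):

```python
import torch
import torch.nn.functional as F

def combined_loss(snorkel_loss, probs, y_gold, gold_mask, supervised_weight=0.5):
    """Cross-entropy on the golden examples, mixed with the usual
    LabelModel objective at a user-chosen ratio (0.5 = the 50/50 above)."""
    ce = F.nll_loss(torch.log(probs[gold_mask] + 1e-12), y_gold[gold_mask])
    return (1.0 - supervised_weight) * snorkel_loss + supervised_weight * ce
```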

Let me know if that's something you want as a pull request?

@github-actions

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@rjurney

rjurney commented Jul 31, 2021

@moscow25 I would love a pull request for this!

@moscow25
Author

moscow25 commented Aug 2, 2021

@rjurney let me clean it up and push a tidier version, but it's really just a few lines, messily represented here:

moscow25@4dd4283

What I'm doing is:

  • passing in labeled examples [actually a label for all examples, with ? for unlabeled]
  • computing the softmax (over the current mu values for the labeling functions) on the labeled examples
  • computing a supervised loss between the labels and that softmax
  • returning the final loss as a weighted average of the "Snorkel loss" and the "softmax (supervised) loss"

In practice, on the problems I tried, the two losses are on the same scale, so anything from an 80/20 to a 20/80 weighted average works just fine, depending on your problem. All you are doing is fitting the labeling function weights to the labeled examples, while also minimizing the Snorkel loss if you like.
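
For concreteness, a minimal sketch of that masking step (the -1 convention for "?" / unlabeled entries and the variable names are assumptions, not the exact code from the commit above):

```python
import numpy as np
import torch
import torch.nn.functional as F

# One label per example; -1 stands in for "?" (unlabeled).
Y_partial = np.array([1, -1, 0, -1, 1])

# Stand-in for the label model's current probabilities (softmax over mu).
probs = torch.tensor([[0.2, 0.8],
                      [0.5, 0.5],
                      [0.7, 0.3],
                      [0.4, 0.6],
                      [0.1, 0.9]])

y = torch.as_tensor(Y_partial, dtype=torch.long)
labeled = y >= 0  # rows that actually have a gold label
supervised_loss = F.nll_loss(torch.log(probs[labeled] + 1e-12), y[labeled])
# The final loss is then a weighted average of this term and the "Snorkel loss".
```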

Apologies for the messy code. Let me know, and I'm happy to push a version with minimal changes to the production Snorkel.

@rjurney

rjurney commented Aug 3, 2021 via email
