LabelModel produces equal probability for labeled data #1422

Closed
jamie0725 opened this issue Aug 21, 2019 · 16 comments · Fixed by #1444
jamie0725 commented Aug 21, 2019

Issue description

I am using Snorkel to create binary text classification training examples with 9 labeling functions. However, I find that some data points, after training the label model, receive a probabilistic label with equal probabilities (i.e. [0.5 0.5]), even though they only receive labels from a single class from the labeling functions (e.g. [-1 0 -1 -1 0 -1 -1 -1 -1], i.e. only class 0 or ABSTAIN). Why is that?

In addition, I find that setting verbose=True when defining the LabelModel does not print any logging information.

Lastly, if producing the label [0.5 0.5] is normal behavior, then these points should also be removed when filtering out unlabeled data points, because they do not contribute to training a classifier, and for a classifier that does not support probabilistic labels (e.g. FastText), taking the argmax will always yield class 0 (which is undesired).
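
To make the argmax point concrete, here is a minimal NumPy sketch (not Snorkel code; the probability values are made up) showing that a tied [0.5 0.5] label always collapses to class 0 under argmax, and one hypothetical way such tied rows could be filtered out along with the unlabeled ones:

import numpy as np

probs = np.array([0.5, 0.5])
print(np.argmax(probs))  # 0 -- ties always break to the first entry, i.e. class 0

# Hypothetical filtering of perfectly tied rows before training a
# hard-label classifier such as FastText:
probs_train = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])
keep = ~np.isclose(probs_train.max(axis=1), 0.5)  # drop rows with a 0.5/0.5 tie
print(probs_train[keep])  # only the two confident rows remain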

Code example/repro steps

Below is my code for defining and fitting the LabelModel:

print('==================================================')
print('Training label model...')
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train=L_train, n_epochs=10000, lr=0.001, log_freq=100, seed=1)
print('Done!')

Below is some of the output I print for one of the training data points:

Check results for data in the training set:  a_text_in_the_training_set_but_I_removed_here
        * Output of L_train for this data point is: [-1  0 -1 -1  0 -1 -1 -1 -1]
        * Output of probs_train for this data point is: [0.5 0.5]
        * Output of probs_train_filtered for this data point is: [0.5 0.5]
        * Output of preds_train_filtered for this data point is: 0

Expected behavior

I expect that if a data point receives labels from only a single class, it should not get equal probabilities for both classes after the label model is trained.

System info

  • How you installed Snorkel (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • OS: Windows
  • Python version: 3.7
  • Snorkel version: 0.9
  • Versions of any other relevant libraries:
paroma commented Aug 21, 2019

Thanks for pointing this out!

  • For logging, the default logging level only shows errors and warnings. The training statements are recorded with logging.info() and therefore don't show up even with verbose=True. To see these messages, you can add the following to the top of the file you're calling LabelModel from:
import logging
logging.basicConfig(level=logging.INFO)
  • I tested with a few L matrices where a datapoint either receives a label from one class or an abstain, and I was unable to reproduce the issue. Is there a way you can share your L matrix with us? One thing you can check (now that logging works!) is whether the loss is changing with the learning rate you're using.

  • In terms of filtering out examples with equal probabilities for all classes, calling label_model.predict(L, tie_break_policy="abstain") will return -1 for datapoints with equal probabilities across the different classes. Options for tie_break_policy are described here.

Example:

import numpy as np
from snorkel.labeling import LabelModel  # import path as of Snorkel 0.9; may differ in later versions

label_model = LabelModel(cardinality=2, verbose=True)
L = np.array([[0, 1], [0, 1]])
label_model.fit(L)

label_model.predict_proba(L)
# array([[0.5, 0.5], [0.5, 0.5]])

label_model.predict(L, tie_break_policy="abstain")
# array([-1, -1])

jamie0725 commented Aug 22, 2019

I took a look at the logs, and the results are very strange:

Training label model...
INFO:root:Computing O...
INFO:root:Estimating \mu...
INFO:root:[0 epochs]: TRAIN:[loss=0.003]
INFO:root:[100 epochs]: TRAIN:[loss=0.000]
INFO:root:[200 epochs]: TRAIN:[loss=0.000]
INFO:root:[300 epochs]: TRAIN:[loss=0.000]
INFO:root:[400 epochs]: TRAIN:[loss=0.000]
INFO:root:[500 epochs]: TRAIN:[loss=0.000]
INFO:root:[600 epochs]: TRAIN:[loss=0.000]
INFO:root:[700 epochs]: TRAIN:[loss=0.000]
INFO:root:[800 epochs]: TRAIN:[loss=0.000]
INFO:root:[900 epochs]: TRAIN:[loss=0.000]
INFO:root:Finished Training
Done!

So the loss drops to 0 from around epoch 100 onward.

vtang13 commented Aug 23, 2019

I have a similar issue with probs_train outputting [0.5, 0.5]. The probability for a 1 label never goes above 0.5.

Prob LFs
[0.5, 0.5] [1, -1, -1, -1, 1, -1, -1]
[0.5, 0.5] [1, -1, -1, -1, -1, -1, -1]
[0.743774, 0.256226] [-1, -1, -1, -1, -1, 0, -1]

Here are the logs for training the LabelModel with roughly 10,000 data points.

INFO:root:Computing O...
INFO:root:Estimating \mu...
INFO:root:[0 epochs]: TRAIN:[loss=0.030]
INFO:root:[100 epochs]: TRAIN:[loss=0.004]
INFO:root:[200 epochs]: TRAIN:[loss=0.000]
INFO:root:[300 epochs]: TRAIN:[loss=0.000]
INFO:root:[400 epochs]: TRAIN:[loss=0.000]
INFO:root:[500 epochs]: TRAIN:[loss=0.000]
INFO:root:[600 epochs]: TRAIN:[loss=0.000]
INFO:root:[700 epochs]: TRAIN:[loss=0.000]
INFO:root:[800 epochs]: TRAIN:[loss=0.000]
INFO:root:[900 epochs]: TRAIN:[loss=0.000]
INFO:root:Finished Training

paroma commented Aug 23, 2019

@vtang13 can you provide an example of the L matrix that will help reproduce this error? Can you also print L.shape to make sure its shape is (n, m), where m is the number of LFs?

I tried creating a matrix with the LFs that you have and did not get the same results:

L = np.array([[1, -1, -1, -1, 1, -1, -1], 
              [1, -1, -1, -1, -1, -1, -1], 
              [-1, -1, -1, -1, -1, 0, -1]])

label_model = LabelModel(cardinality=2)
label_model.fit(L)
label_model.predict_proba(L)

# array([[0.019964  , 0.980036  ],
#       [0.1229839 , 0.8770161 ],
#       [0.98669756, 0.01330244]])

vtang13 commented Aug 23, 2019

Hi @paroma

>> L_train.shape
(9671, 7)

It looks like the issue may occur when there are too many negative examples. It doesn't occur when the dataset is more balanced. Below is a subset of my L_train with more balanced positive/negative examples.

L_train = np.asarray([
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1,  0, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1,  0, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1,  0, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1,  1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1,  1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1,  1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1,  1, -1, -1],
])

Training results:

>> label_model.fit(L_train=L_train, n_epochs=1000, lr=0.001, log_freq=100, seed=123)
INFO:root:Computing O...
INFO:root:Estimating \mu...
INFO:root:[0 epochs]: TRAIN:[loss=0.098]
INFO:root:[100 epochs]: TRAIN:[loss=0.014]
INFO:root:[200 epochs]: TRAIN:[loss=0.004]
INFO:root:[300 epochs]: TRAIN:[loss=0.003]
INFO:root:[400 epochs]: TRAIN:[loss=0.002]
INFO:root:[500 epochs]: TRAIN:[loss=0.001]
INFO:root:[600 epochs]: TRAIN:[loss=0.001]
INFO:root:[700 epochs]: TRAIN:[loss=0.001]
INFO:root:[800 epochs]: TRAIN:[loss=0.001]
INFO:root:[900 epochs]: TRAIN:[loss=0.001]
INFO:root:Finished Training
probs LFs
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.95779496 0.04220504] [-1 -1 -1 -1 -1 0 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.95779496 0.04220504] [-1 -1 -1 -1 -1 0 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.95779496 0.04220504] [-1 -1 -1 -1 -1 0 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.5 0.5] [-1 -1 -1 -1 -1 -1 -1]
[0.00618187 0.99381813] [ 1 -1 -1 -1 1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.00618187 0.99381813] [ 1 -1 -1 -1 1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.00618187 0.99381813] [ 1 -1 -1 -1 1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.17381922 0.82618078] [ 1 -1 -1 -1 -1 -1 -1]
[0.00618187 0.99381813] [ 1 -1 -1 -1 1 -1 -1]

paroma commented Aug 23, 2019

This looks like the expected output! If you think it is an issue related to class balance, you can try passing Y_dev or class_balance to the fit() method to see if that helps with the estimates. See examples and usage here.
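
For example, a minimal sketch of what that could look like (the class_balance values below are placeholders, not estimates from your data, and Y_dev stands for a small array of gold dev-set labels if you have one):

# Sketch only: substitute your own estimate of [P(Y=0), P(Y=1)]
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train=L_train, n_epochs=1000, lr=0.001,
                class_balance=[0.9, 0.1], seed=123)

# Alternatively, with a small labeled dev set Y_dev, the balance can be
# estimated from it:
# label_model.fit(L_train=L_train, Y_dev=Y_dev, n_epochs=1000, lr=0.001)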

vtang13 commented Aug 23, 2019

Thanks @paroma. I don't have labeled data yet, but I will try those parameters once I do.

Here is the full L_train to reproduce the issue: https://pastebin.com/02mEznra

paroma commented Aug 23, 2019

Thank you for access to the L matrix! Looking at the full L_train and running LFAnalysis(L_train).lf_summary() suggests a few things:

[Screenshot: output of LFAnalysis(L_train).lf_summary(), showing per-LF polarity, coverage, overlaps, and conflicts]

  • The LFs at indices 1, 2, and 3 always abstain. They provide no information for the LabelModel to learn from.
  • Other than the LF at index 5, the LFs have very low coverage, labeling only 3-15 datapoints out of almost 10,000.
  • The lack of significant overlap, the absence of conflict among the LFs, and the low overall coverage mean the LabelModel cannot learn accurate weights for the LFs. Therefore, for the 17 datapoints that do receive 1-2 positive labels, the LabelModel defaults to the class balance because it has little confidence in the assigned labels.
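
For reference, a minimal sketch of how the summary above was produced (assuming L_train is the (9671, 7) matrix from the pastebin link):

from snorkel.labeling import LFAnalysis

summary = LFAnalysis(L_train).lf_summary()
print(summary)  # per-LF polarity, coverage, overlaps, and conflicts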

vtang13 commented Aug 26, 2019

That's very informative. Thanks @paroma for looking into this.

ajratner commented Sep 1, 2019

Just quickly tacking onto the great answer from @paroma:

One thing we are working on is making sure that the LabelModel always reverts to sensible defaults, even when it's in a setting outside of where our theory tells us it should work (e.g. like the above, for reasons @paroma detailed).

For example, in your setting we probably would want to have a higher prior weight on the LFs over the class balance... we'll iterate here and push some updates soon!

ajratner commented Sep 1, 2019

(Note: Marking "feature request" for our reference, as technically the LabelModel seems to be working fine... but we need to add the feature of more sensible defaults in settings such as these)

xsway commented Sep 2, 2019

Hi, I have observed the same behavior of the LabelModel: one LF (out of 15) that was highly accurate but had low coverage was ignored (resulting in [0.5 0.5] probabilities).

In fact, I wanted to ask whether it would be possible to specify some priors for the LFs by hand in the LabelModel. In my case, I'm 99% sure that this LF gives the correct label even though it has low coverage, and I'd like the LabelModel not to lose this knowledge.

ajratner assigned ajratner and unassigned paroma Sep 4, 2019
ajratner commented Sep 4, 2019

Hi @s2948044 @vtang13 @xsway, thanks first of all for bringing this issue to light in such detail, and thanks @vtang13 for sharing your label matrix!

It turns out that this is actually a bug due to incorrect parameter clamping. I've corrected it (it's a one-line fix, more or less) and confirmed the fix on @vtang13's label matrix and on a new synthetic test that replicates the problem and verifies the solution. The PR is being submitted right now! Thank you all for the great help here!!

Just in case anyone is curious... the LabelModel learns the conditional probabilities of the labeling functions outputting certain labels given the (unobserved or latent) true labels, i.e. P(\lambda = x | Y = y). We also clamp these estimates to some min/max values to guard against pathological errors. But, as one might guess, when P(\lambda = x | Y = y) is really small (for example, in the common sparse-label setting where LFs mostly abstain), clamping at too high a minimum value messes everything up: we end up saving clamped parameters that say these sparse LFs are equally likely to be wrong as right! Luckily, this is fixed by changing the clamping parameter, which is now a kwarg in LabelModel.fit and defaults to a much smaller value.
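
To illustrate the effect (a toy NumPy sketch, not the actual LabelModel internals; the numbers and clamp bounds are made up):

import numpy as np

# For a sparse LF that fires on ~0.1% of points, the true conditional
# probabilities P(lf = label | Y = y) are tiny for both values of y...
mu_true = np.array([0.0015, 0.0005])  # e.g. [P(lf=1 | Y=1), P(lf=1 | Y=0)]

# ...so clamping with too high a floor erases the 3x difference between
# "fires on the right class" and "fires on the wrong class":
print(np.clip(mu_true, 0.01, 0.99))      # [0.01 0.01] -> LF looks uninformative

# A much smaller floor preserves the ratio, so the LF stays informative:
print(np.clip(mu_true, 1e-6, 1 - 1e-6))  # [0.0015 0.0005]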

Note that what @paroma said above about the LabelModel not being able to learn much if the LFs don't label and overlap enough is still true, but it shouldn't result in a nonsensical output like the one you were all observing. This should be fixed once the PR is merged in today :)

Note also that we'll be steadily pushing extensions, improvements, and additions to the LabelModel, so stay tuned! For example, @xsway, your request for specifying per-LF priors is on our list (it was in an old branch somewhere; we just need to port it).

xsway commented Sep 5, 2019

Thanks a lot for addressing this issue so fast! I checked the updated code and it now works as expected: I get > 0.5 probability for the positive class in cases where a low-coverage LF fired (and where I previously got 0.5).

vtang13 commented Sep 5, 2019

Thanks @ajratner for the detailed explanation and speedy resolution!

ajratner commented Sep 5, 2019

Thanks for pointing this out and helping us to improve the new version! :)
