Label model symmetry breaking #1451

Merged: 11 commits, Sep 6, 2019

ajratner (Contributor) commented Sep 5, 2019

Description of proposed changes

This PR primarily implements a heuristic for selecting one of several symmetric (equally optimal) solutions to the LabelModel parameter estimation problem, which arise from symmetries of that problem. For any mu that we learn (the estimated conditional probabilities of the LFs), column permutations of mu are often equally valid solutions. We therefore choose the permutation under which the most LFs are estimated to be better than random, as per our standard modeling assumption.
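
To illustrate the selection rule described above, here is a minimal pure-Python sketch. It is not the code in this PR. It assumes mu is given as one k x k block per LF, with block[r][c] standing for P(LF outputs r | Y = c), and it assumes an LF counts as "better than random" when the mean diagonal entry of its block exceeds the mean off-diagonal entry. The names count_better_than_random and break_symmetry are hypothetical.

```python
from itertools import permutations


def count_better_than_random(blocks, perm):
    """Count LFs that look better than random after permuting columns.

    Assumption (not from the PR): an LF is better than random when the
    mean diagonal entry of its permuted block exceeds the mean
    off-diagonal entry. Requires k >= 2.
    """
    k = len(perm)
    count = 0
    for block in blocks:
        # Column c of the permuted block is column perm[c] of the original.
        diag = sum(block[y][perm[y]] for y in range(k))
        total = sum(block[r][perm[c]] for r in range(k) for c in range(k))
        off_diag = total - diag
        if diag / k > off_diag / (k * (k - 1)):
            count += 1
    return count


def break_symmetry(blocks):
    """Return the blocks with the best-scoring column permutation applied.

    Ties go to the first permutation generated, which is the identity.
    """
    k = len(blocks[0])
    best = max(
        permutations(range(k)),
        key=lambda perm: count_better_than_random(blocks, perm),
    )
    return [
        [[block[r][best[c]] for c in range(k)] for r in range(k)]
        for block in blocks
    ]


# Two LFs whose estimated blocks came out with the class columns swapped.
flipped = [
    [[0.2, 0.8], [0.8, 0.2]],
    [[0.3, 0.7], [0.7, 0.3]],
]
fixed = break_symmetry(flipped)
```

In this example the identity permutation leaves both LFs looking worse than random, while swapping the two columns makes both look better than random, so the swapped solution is selected.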

This PR also:

  • Slightly refactors and changes the LabelModel.get_conditional_probs sub-function
  • Factors out the post-processing operations in LabelModel.fit, which currently consist of this symmetry-breaking operation and clamping.
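
As a rough illustration of what a factored-out post-processing stage could look like, the sketch below chains post-processing steps over a table of probabilities. It is an assumption-laden example and not the structure used in LabelModel.fit: the names clamp_probs and post_process, the list-of-rows representation, and the eps value of 0.01 are all hypothetical.

```python
def clamp_probs(mu, eps=0.01):
    """Clamp every entry of mu into [eps, 1 - eps].

    mu is a list of rows of floats; eps is an illustrative value only.
    """
    return [[min(max(p, eps), 1.0 - eps) for p in row] for row in mu]


def post_process(mu, steps):
    """Apply each post-processing step to mu in the order given."""
    for step in steps:
        mu = step(mu)
    return mu


# A symmetry-breaking step would be listed before clamp_probs here.
raw_mu = [[0.0, 0.5, 1.2]]
cleaned = post_process(raw_mu, [clamp_probs])
```

Keeping each step as a function from mu to mu makes it possible to test the steps one at a time.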

Related issue(s)

Fixes #1437 (at least to first order)

Test plan

Adds tests for (a) conditional probability table calculation and (b) symmetry breaking specifically.

Checklist

  • I have read the CONTRIBUTING document.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

codecov bot commented Sep 5, 2019

Codecov Report

Merging #1451 into master will increase coverage by 0.03%.
The diff coverage is 95.23%.

@@            Coverage Diff             @@
##           master    #1451      +/-   ##
==========================================
+ Coverage   97.55%   97.58%   +0.03%     
==========================================
  Files          55       55              
  Lines        2001     2032      +31     
  Branches      328      334       +6     
==========================================
+ Hits         1952     1983      +31     
  Misses         22       22              
  Partials       27       27

Impacted Files                          Coverage Δ
snorkel/labeling/model/label_model.py   95.72% <95.23%> (+0.46%) ⬆️
snorkel/labeling/analysis.py            100% <0%> (ø) ⬆️

paroma (Contributor) left a comment

Is there a test that checks whether the symmetry breaking works correctly, or is that covered implicitly by one of the other tests?

henryre (Member) left a comment

Discussed offline: tests added.

@ajratner ajratner merged commit 8e4526e into master Sep 6, 2019
@ajratner ajratner deleted the label-model-symmetry-breaking branch September 6, 2019 03:18
plison commented Oct 1, 2019

I'm experiencing problems with the symmetry-breaking code. Because the number of possible permutations grows factorially with the number of output classes, the method does not really scale to problems with more than 6-7 classes. Maybe the code should stop after a few thousand permutations?
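
To make the scaling concern concrete, here is a small standard-library sketch that prints how many permutations exist for a few class counts and shows one possible way to cap the search. The names capped_permutations and best_permutation and the default cap of 5000 are hypothetical and are not part of the LabelModel code.

```python
from itertools import islice, permutations
from math import factorial


def capped_permutations(k, max_perms=5000):
    """Yield at most max_perms permutations of range(k), in lexicographic order."""
    return islice(permutations(range(k)), max_perms)


def best_permutation(k, score, max_perms=5000):
    """Return the highest-scoring permutation among those examined.

    score is any function mapping a permutation tuple to a number.
    Ties go to the earliest permutation generated.
    """
    return max(capped_permutations(k, max_perms), key=score)


# k classes give k! candidate permutations.
for k in (3, 6, 7, 10):
    print(k, factorial(k))
# 3 6
# 6 720
# 7 5040
# 10 3628800
```

A cap like this bounds the running time, but the search is then no longer exhaustive: only the lexicographically earliest permutations are examined, so the best one may be missed when k! exceeds the cap.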

Development

Successfully merging this pull request may close these issues.

negative predictions for data point with positive labels
4 participants