
Ignore abstains in Scorer, change LabelModel default tie break policy #1450

Merged 4 commits into master from tie-break on Sep 10, 2019

Conversation

@paroma paroma (Contributor) commented Sep 5, 2019

Description of proposed changes

  • Change Scorer default to ignore abstains in preds
  • Change LabelModel tie break policy default to abstain (instead of random); both new defaults are sketched below
  • Log warning when calling LabelModel score() function
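
A minimal sketch of the two new defaults (illustrative values, not code from this PR; the Scorer calls follow the same API as the tests further down in this thread):

import numpy as np
from snorkel.analysis import Scorer

golds = np.array([1, 0, 1, 1, 0])
preds = np.array([-1, -1, 1, 1, 0])  # -1 denotes abstain

# New Scorer default: abstained preds are ignored, so metrics are
# computed over the 3 non-abstain data points only.
scorer = Scorer(metrics=["accuracy"])
scorer.score(golds, preds)  # -> {'accuracy': 1.0}

# New LabelModel default: tied votes are broken by abstaining (-1)
# rather than by a random choice, i.e. predict()/score() behave as if
# tie_break_policy="abstain" were passed.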

Test plan

  • Add tests in LabelModel for predict() and score() functions related to abstain default
  • Add test for Scorer to check abstains ignored in preds by default

Checklist

  • I have read the CONTRIBUTING document.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@codecov codecov bot commented Sep 5, 2019

Codecov Report

Merging #1450 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master    #1450   +/-   ##
=======================================
  Coverage   97.58%   97.58%           
=======================================
  Files          55       55           
  Lines        2029     2029           
  Branches      334      334           
=======================================
  Hits         1980     1980           
  Misses         22       22           
  Partials       27       27
Impacted Files                           Coverage Δ
snorkel/labeling/model/label_model.py    95.76% <ø> (ø) ⬆️
snorkel/analysis/scorer.py               100% <100%> (ø) ⬆️

@paroma paroma requested review from vincentschen, ajratner, henryre and bhancock8 and removed request for ajratner and henryre September 5, 2019 23:17
@@ -472,6 +472,11 @@ def score(
        >>> label_model.score(L, Y=np.array([1, 1, 1]), metrics=["f1"])
        {'f1': 0.8}
        """
        if tie_break_policy == "abstain":  # pragma: no cover
            logging.warning(
                "Metrics calculated over datapoints with non-abstain labels only"
            )
Member commented:
Nit: we've been using data points (2 words)

# Test abstain=-1 for preds and gold
abstain_preds = np.array([-1, -1, 1, 1, 0])
abstain_probs = np.array([0.5, 0.5, 0.9, 0.7, 0.4])
results = scorer.score(golds, abstain_preds, abstain_probs)
Member commented:
No need to pass in probs here. They're optional, and you're only calculating accuracy in this scorer, which just requires golds and preds.
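
A sketch of the simplified call the reviewer is suggesting (same arrays as the test snippet above):

results = scorer.score(golds, abstain_preds)  # probs omitted: accuracy needs only golds and preds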

@@ -209,6 +209,13 @@ def test_predict_proba(self):
        np.testing.assert_array_almost_equal(probs, true_probs)

    def test_predict(self):
        L = np.array([[-1, 1, 0], [0, -1, 1], [1, 0, -1]])
Member commented:
Can we add a simple comment here noting that this test confirms that 3 LFs counteracting one another result in tie votes, and therefore abstains on all points?
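
A sketch of the kind of comment being requested (wording is illustrative, not from the PR):

# Each data point gets exactly one vote for each class (the third LF
# abstains), so every point is a tie vote; under the new
# tie_break_policy="abstain" default, predict() abstains (-1) on all points.
L = np.array([[-1, 1, 0], [0, -1, 1], [1, 0, -1]])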

@bhancock8 bhancock8 (Member) left a comment:
Changes lgtm! Ship it!

@paroma paroma merged commit 9af1c77 into master Sep 10, 2019
@paroma paroma deleted the tie-break branch September 10, 2019 18:30