Fix bug in Ngram splitting bug #1020

Merged 2 commits on Aug 22, 2018

@lukehsiao (Contributor) commented Aug 20, 2018

HiromuHota fixed this bug in Fonduer in HazyResearch/fonduer#108 and HazyResearch/fonduer#112. This commit fixes it for Snorkel.

@lukehsiao lukehsiao added the bug label Aug 20, 2018
@lukehsiao lukehsiao changed the title from "Fix bug in Ngram splitting logic" to "Fix bug in Ngram splitting bug" Aug 20, 2018
Rather than returning the TemporarySpan along with both of its splits,
Snorkel was returning the TemporarySpan twice and only the second split.
Hiromu Hota fixed this bug in Fonduer in [1]. This commit fixes it for Snorkel.

[1] HazyResearch/fonduer#108

Co-authored-by: Hiromu Hota <hiromu.hota@hal.hitachi.com>
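
The intended behavior described in this commit can be sketched roughly as follows. This is a minimal illustration, not Snorkel's actual code: `Span` is a hypothetical stand-in for `TemporarySpan`, `span_with_splits` and its `split_tokens` default are made up for the example, and it assumes the split token sits in the interior of the text.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for TemporarySpan; both offsets are inclusive.
Span = namedtuple("Span", ["char_start", "char_end"])


def span_with_splits(text, split_tokens=("-", "/")):
    """Yield the whole span, then the pieces left and right of the first split token."""
    whole = Span(0, len(text) - 1)
    seen = {whole}
    yield whole

    # Build a regex with one capturing group around the split tokens.
    rgx = "(" + "|".join(re.escape(t) for t in split_tokens) + ")"
    m = re.search(rgx, text)
    if m is None:
        return

    left = Span(0, m.start(1) - 1)
    right = Span(m.end(1), len(text) - 1)
    for piece in (left, right):
        if piece not in seen:
            seen.add(piece)
            # Yield the piece itself. The bug described above corresponds to
            # yielding `whole` again here in place of the left piece.
            yield piece


text = "BC546/BC547"
texts = [text[s.char_start:s.char_end + 1] for s in span_with_splits(text)]
print(texts)  # ['BC546/BC547', 'BC546', 'BC547']
```

With the bug as described, the same input would have produced the full text twice followed by only the right-hand piece.
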
A bug occurs when the text of the span ends in one of the split tokens.
For example, "BC546-" will try to yield "BC546-", "BC546", and an empty
span with invalid char_start and char_end. This commit stops the empty
span from being yielded.

See HazyResearch/fonduer#112.

Co-authored-by: Hiromu Hota <hiromu.hota@hal.hitachi.com>
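
The empty-span case can be illustrated with plain offsets. The sketch below is an assumption-laden example rather than the actual patch: `split_offsets` and its default `split_rgx` are hypothetical, and offsets are inclusive character positions relative to the start of the text.

```python
import re


def split_offsets(text, split_rgx=r"(-|/)"):
    """Return inclusive (char_start, char_end) pairs for the text and its non-empty splits."""
    end = len(text) - 1
    offsets = [(0, end)]
    m = re.search(split_rgx, text)
    if m is not None:
        candidates = [(0, m.start(1) - 1), (m.end(1), end)]
        # A piece is empty when its start lies past its end,
        # e.g. (6, 5) for the text after the trailing "-" in "BC546-".
        offsets += [(s, e) for s, e in candidates if s <= e]
    return offsets


print(split_offsets("BC546-"))       # [(0, 5), (0, 4)]
print(split_offsets("BC546/BC547"))  # [(0, 10), (0, 4), (6, 10)]
```

Without the `s <= e` check, "BC546-" would also produce the pair (6, 5), which is the invalid empty span mentioned above.
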
@ajratner (Contributor) commented

Thanks!! LGTM

@ajratner ajratner merged commit d0bbf36 into master Aug 22, 2018
@ajratner ajratner deleted the lukehsiao-patch-1 branch August 22, 2018 16:12