Fix bug of yielding empty spans
A bug occurs when the text of a span ends in one of the split tokens.
For example, "BC546-" tries to yield "BC546-", "BC546", and an empty
span with invalid char_start and char_end. This change checks
get_span() before yielding, so the empty span is no longer yielded.

See HazyResearch/fonduer#112.

Co-authored-by: Hiromu Hota <hiromu.hota@hal.hitachi.com>
lukehsiao and Hiromu Hota committed Aug 21, 2018
1 parent a06c782 commit 93f75fa
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions snorkel/candidates.py
@@ -170,11 +170,11 @@ def apply(self, context):
 m = re.search(self.split_rgx, context.text[start-offsets[0]:end-offsets[0]+1])
 if m is not None and l < self.n_max + 1:
     ts1 = TemporarySpan(char_start=start, char_end=start + m.start(1) - 1, sentence=context)
-    if ts1 not in seen:
+    if ts1 not in seen and ts1.get_span():
         seen.add(ts1)
         yield ts1
     ts2 = TemporarySpan(char_start=start + m.end(1), char_end=end, sentence=context)
-    if ts2 not in seen:
+    if ts2 not in seen and ts2.get_span():
         seen.add(ts2)
         yield ts2
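
The failure mode can be reproduced without snorkel. The following is a minimal sketch, not the library's code: `SPLIT_RGX` and `split_spans` are hypothetical names, the pattern is an assumed stand-in for `self.split_rgx`, and sentence offsets are ignored for simplicity. It mirrors the index arithmetic in the diff and skips any piece whose text is empty, which is the role `get_span()` plays in the guard.

```python
import re
from typing import Iterator, Tuple

# Assumed split pattern (hypothetical). It needs one capture group because
# the diff reads the match boundaries via m.start(1) and m.end(1).
SPLIT_RGX = re.compile(r"([-/])")


def split_spans(text: str, start: int, end: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (char_start, char_end) pairs for the pieces on either
    side of the first split token in text[start:end + 1], skipping empty ones."""
    m = SPLIT_RGX.search(text[start:end + 1])
    if m is None:
        return
    left = (start, start + m.start(1) - 1)   # same arithmetic as ts1
    right = (start + m.end(1), end)          # same arithmetic as ts2
    for char_start, char_end in (left, right):
        # An empty slice means char_end < char_start, i.e. the token sat at
        # the very start or end of the span. Skip it instead of yielding.
        if text[char_start:char_end + 1]:
            yield (char_start, char_end)


if __name__ == "__main__":
    print(list(split_spans("BC546-", 0, 5)))
```

For "BC546-" the right-hand piece would have char_start=6 and char_end=5, so only (0, 4), the text "BC546", is yielded. Without the emptiness check the invalid (6, 5) pair would be yielded as well.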
