Parenthesis at the end of input cause IndexError #19

Closed
windreamer opened this issue Jan 28, 2022 · 4 comments
Labels: bug (Something isn't working)

Comments

@windreamer

Hi folks,
I like this segmenter for its quality and speed, but I ran into an unexpected error with the following input:

from syntok.segmenter import analyze
text = '''Alexandri Aetoli Testimonia et Fragmenta. Studi e Testi 15. (1999)'''

for p in analyze(text):
    for s in p:
        print(' '.join(str(t) for t in s))

I got:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-15-1217f364130d> in <module>
      1 for p in analyze(text):
----> 2     for s in p:
      3         print(' '.join(str(t) for t in s))
      4

~/Codebase/toolchain/__pypackages__/3.9/lib/syntok/segmenter.py in segment(tokens, bracket_skip_len)
    106         State.max_bracket_skipping_length = int(bracket_skip_len)
    107
--> 108     for state in Begin(tokens):
    109         if state.at_sentence:
    110             history = state.collect_history()

~/Codebase/toolchain/__pypackages__/3.9/lib/syntok/_segmentation_states.py in __iter__(self)
    128         while state is not None:
    129             yield state
--> 130             state = next(state, None)
    131
    132     @abstractmethod

~/Codebase/toolchain/__pypackages__/3.9/lib/syntok/_segmentation_states.py in __next__(self)
    468                 return Terminal(self._stream, self._queue, self._history)
    469
--> 470             self._move()  # Do not skip parenthesis if they open the sentence.
    471
    472             if self.next_is_a_terminal:

~/Codebase/toolchain/__pypackages__/3.9/lib/syntok/_segmentation_states.py in _move(self)
    324     def _move(self) -> bool:
    325         """Advance the queue, storing the old value in history."""
--> 326         self.__history.append(self.__queue.pop(0))
    327
    328         if not self.__queue:

IndexError: pop from empty list

Can anyone help me with this?
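
For readers following the traceback, the failing line is `self.__history.append(self.__queue.pop(0))`, and `list.pop` raises `IndexError` when the list is empty. The sketch below reproduces only that mechanism with a toy class. `QueueState`, `move`, and `safe_move` are hypothetical names for illustration; this is not syntok's code, and the guard shown is not necessarily how the library handles the case.

```python
class QueueState:
    """Toy stand-in for the state object in the traceback (not syntok's real class)."""

    def __init__(self, tokens):
        self._queue = list(tokens)
        self._history = []

    def move(self):
        """Unguarded advance, mirroring the failing line: pop(0) raises on an empty queue."""
        self._history.append(self._queue.pop(0))

    def safe_move(self):
        """Hypothetical guarded variant: return False instead of raising when nothing is left."""
        if not self._queue:
            return False
        self._history.append(self._queue.pop(0))
        return True


state = QueueState(["(", "1999", ")"])
for _ in range(3):
    state.move()  # drains the queue

try:
    state.move()  # one advance too many
except IndexError as error:
    print(f"IndexError: {error}")  # prints "IndexError: pop from empty list"

print(state.safe_move())  # prints "False"
```

The point is only that the state machine tried to advance past the last token, which fits the input ending in a closing parenthesis.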

@fnl
Owner

fnl commented Jan 28, 2022

Looks like a regression from my latest update to parenthesis handling. Your phrase probably needs to be converted into a test case, analyzed, and fixed. Can you confirm whether any 1.3 version works?

@windreamer
Author

@fnl syntok==1.3.3 works fine:

➜  test pdm run python test.py
Alexandri  Aetoli  Testimonia  et  Fragmenta .
 Studi  e  Testi  15 .
 ( 1999 )
➜  test pdm list --freeze
regex==2022.1.18
syntok==1.3.3

@fnl changed the title from "Simple case failed" to "Parenthesis at the end of input cause IndexError" on Jan 30, 2022
@fnl
Owner

fnl commented Jan 30, 2022

This was a regression introduced by 1.4.1.
Thank you for pointing out the issue and helping in its review, @windreamer.
The issue is fixed in the latest release v1.4.2.
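
For anyone who wants to guard against this in their own project, here is a minimal sketch of a regression check built from the phrase in the original report. It is not the test added to syntok itself. It assumes only `syntok.segmenter.analyze` and the `value` attribute of tokens; the class and function names are made up for illustration, and the test is skipped when syntok is not installed.

```python
import unittest

try:
    from syntok.segmenter import analyze
except ImportError:  # syntok is not installed; the test below is skipped
    analyze = None

# The phrase from the original report: the input ends with a closing parenthesis.
TEXT = "Alexandri Aetoli Testimonia et Fragmenta. Studi e Testi 15. (1999)"


@unittest.skipIf(analyze is None, "syntok is not installed")
class TrailingParenthesisTest(unittest.TestCase):
    def test_input_ending_in_parenthesis(self):
        # Consuming every paragraph and sentence is the step that raised
        # IndexError in the traceback above.
        values = [
            token.value
            for paragraph in analyze(TEXT)
            for sentence in paragraph
            for token in sentence
        ]
        # The closing parenthesis must survive as the final token.
        self.assertEqual(values[-1], ")")


def run_regression_test():
    """Run the single test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TrailingParenthesisTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_regression_test()
```

The check deliberately asserts nothing about where the sentence boundaries fall, since that may differ between releases; it only requires that the whole input can be consumed.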

@fnl closed this as completed on Jan 30, 2022
@fnl added the bug label on Jan 30, 2022
@windreamer
Author

Thx @fnl for this quick fix!
