Regexp issues #56

Closed
mollerhoj opened this issue Dec 27, 2019 · 4 comments

Comments

@mollerhoj

I'm getting errors because the regexp engine interprets parentheses in the text as regex syntax: "unterminated subpattern" and "unbalanced parenthesis".

I'm analysing very large amounts of text, so I'm not sure which input triggered these.

@mollerhoj
Author

File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/segmenter.py", line 42, in segment
    segments = processor.process()
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/processor.py", line 44, in process
    self.text = AbbreviationReplacer(self.text).replace()
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 61, in replace
    self.text = self.search_for_abbreviations_in_string()
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 96, in search_for_abbreviations_in_string
    self.text, match, ind, char_array
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 114, in scan_for_replacements
    txt = replace_period_of_abbr(txt, am)
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 36, in replace_period_of_abbr
    txt,
  File "/usr/lib/python3.5/re.py", line 182, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python3.5/re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.5/sre_parse.py", line 834, in parse
    raise source.error("unbalanced parenthesis")

@mollerhoj
Author

  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 61, in replace
    self.text = self.search_for_abbreviations_in_string()
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 96, in search_for_abbreviations_in_string
    self.text, match, ind, char_array
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 114, in scan_for_replacements
    txt = replace_period_of_abbr(txt, am)
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 36, in replace_period_of_abbr
    txt,
  File "/usr/lib/python3.5/re.py", line 182, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python3.5/re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.5/sre_parse.py", line 722, in _parse
    source.tell() - start)
sre_constants.error: missing ), unterminated subpattern at position 0

@nipunsadvilkar
Owner

nipunsadvilkar commented May 29, 2020

@mollerhoj If you can provide an example, that would help me debug the issue. I most likely need to use re.escape in the replace_period_of_abbr function for those kinds of edge cases.
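
A minimal sketch of what the re.escape idea could look like. This is not pysbd's actual code: the function name replace_period_sketch, the "<prd>" placeholder, and the sample token "(e.g" are assumptions made purely for illustration. It only shows how text containing a parenthesis breaks a pattern built by string concatenation, and how escaping avoids that.

```python
import re

PLACEHOLDER = "<prd>"  # hypothetical stand-in for the period, not pysbd's real marker


def replace_period_sketch(txt: str, abbr: str) -> str:
    """Replace the period that directly follows `abbr` with PLACEHOLDER.

    re.escape makes characters such as "(" or ")" in `abbr` match literally
    instead of being parsed as regex group syntax.
    """
    pattern = re.escape(abbr) + r"\."
    # A function is used as the replacement so that backslashes in `abbr`
    # are never interpreted as replacement-template escapes.
    return re.sub(pattern, lambda m: abbr + PLACEHOLDER, txt)


abbr = "(e.g"  # assumed example of a token with an unbalanced parenthesis

try:
    # Unescaped: the "(" opens a group that is never closed.
    re.sub(abbr + r"\.", PLACEHOLDER, "see (e.g. this)")
except re.error as exc:
    print("unescaped pattern failed:", exc)

print(replace_period_sketch("see (e.g. this)", abbr))
```

Whether this matches the failing input from the tracebacks above cannot be confirmed without a reproducing example.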

@nipunsadvilkar
Owner

Closing. Feel free to reopen with more info.
