Hi,
When there is excessive whitespace in a sentence, for example:

```python
text = 'The  quick brown fox jumps over the lazy dog . 1  2'
```

it causes an IndexError inside NLTK, because splitting on a single space produces empty tokens. The resulting tokens:

```python
['The', '', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.', '1', '', '2']
```
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cenozai/mypy/tf_models/nlp/nlpaug/nlpaug/base_augmenter.py", line 61, in augment
    result = self.substitute(data)
  File "/home/cenozai/mypy/tf_models/nlp/nlpaug/nlpaug/augmenter/word/synonym.py", line 83, in substitute
    pos = self.model.pos_tag(tokens)
  File "/home/cenozai/mypy/tf_models/nlp/nlpaug/nlpaug/model/word_dict/wordnet.py", line 46, in pos_tag
    return nltk.pos_tag(tokens)
  File "/home/cenozai/.local/lib/python3.6/site-packages/nltk/tag/__init__.py", line 162, in pos_tag
    return _pos_tag(tokens, tagset, tagger, lang)
  File "/home/cenozai/.local/lib/python3.6/site-packages/nltk/tag/__init__.py", line 119, in _pos_tag
    tagged_tokens = tagger.tag(tokens)
  File "/home/cenozai/.local/lib/python3.6/site-packages/nltk/tag/perceptron.py", line 175, in tag
    context = self.START + [self.normalize(w) for w in tokens] + self.END
  File "/home/cenozai/.local/lib/python3.6/site-packages/nltk/tag/perceptron.py", line 175, in <listcomp>
    context = self.START + [self.normalize(w) for w in tokens] + self.END
  File "/home/cenozai/.local/lib/python3.6/site-packages/nltk/tag/perceptron.py", line 261, in normalize
    elif word[0].isdigit():
IndexError: string index out of range
```
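For context, here is a minimal standalone reproduction (a sketch, assuming NLTK is installed with the averaged_perceptron_tagger data downloaded; it bypasses nlpaug and calls nltk.pos_tag directly):

```python
import nltk

# Same sentence as above; the double spaces after 'The' and '1' are intentional.
text = 'The  quick brown fox jumps over the lazy dog . 1  2'

# Splitting on a single literal space keeps the empty strings produced
# by consecutive spaces.
tokens = text.split(' ')
print(tokens)
# ['The', '', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy',
#  'dog', '.', '1', '', '2']

# The perceptron tagger's normalize() indexes word[0], so any empty
# token raises IndexError.
nltk.pos_tag(tokens)  # IndexError: string index out of range
```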
A quick fix could be as follows.

Original `WordAugmenter._tokenizer` in word_augmenter.py:

```python
return text.split(' ')
```

Fix:

```python
return [t for t in text.split(' ') if len(t) > 0]
```
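As a side note, an equivalent one-liner is `str.split()` with no separator, which splits on runs of whitespace and discards empty strings (a sketch; the actual method signature in nlpaug may differ from this plain function):

```python
def _tokenizer(text):
    # str.split() with no argument splits on arbitrary whitespace runs
    # and never yields empty strings, so no filtering is needed.
    return text.split()

assert 'a  b'.split(' ') == ['a', '', 'b']  # empty token kept
assert 'a  b'.split() == ['a', 'b']         # empty token dropped
```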