Bug in nlpaug.augmenter.char.RandomCharAug(action='swap') #77

Closed
philszep opened this issue Jan 3, 2020 · 1 comment

philszep commented Jan 3, 2020

The swap augmenter will often overwrite characters with copies of other characters in the string instead of swapping them.

For example, running:

import nlpaug.augmenter.char as nac
from collections import Counter

def char_count(word):
    # Map each character to the number of times it occurs in word.
    return Counter(word)

swapper = nac.RandomCharAug(action='swap', swap_mode='random')

word = 'testing'
num_t = char_count(word)['t']  # 't' occurs twice in the original word

iters = 0

# Keep augmenting until the count of 't' changes, which a true swap can never cause.
while num_t == char_count(word)['t'] and iters < 10000:
    word = swapper.augment(word)
    iters += 1
print(word, iters)

will take the string testing and often output something like:

ttniitg 5

Here the output has gained one t and one i and lost the e and the s. Repeated swaps should only reorder the characters, never change how often each one occurs, so this is not the expected behavior.
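A quick way to test for this is to compare character counts before and after augmentation. The sketch below is only an illustration: is_permutation is a hypothetical helper name, not part of nlpaug, and the second example string is made up.

```python
from collections import Counter

def is_permutation(before: str, after: str) -> bool:
    # A sequence of pure swaps only reorders characters,
    # so the per-character counts must be identical.
    return Counter(before) == Counter(after)

# The output reported above is not a reordering of 'testing'.
print(is_permutation('testing', 'ttniitg'))  # False
# A genuine swap of the 'e' and 's' would be.
print(is_permutation('testing', 'tseting'))  # True
```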

Proposed fix:

  • Remove the .copy() call on line 140 of nlpaug/augmenter/char/random.py, or
  • Alternatively, remove the original_chars variable altogether and perform the swaps by reading from and writing to chars directly.

As written on line 151 of random.py, original_chars is never modified, and the characters at the swap indices of chars are assigned from the corresponding swap indices of original_chars. Because original_chars never changes, every swap after the first writes the original characters at those positions into chars rather than the characters currently there. This can duplicate some characters from the string while erasing others.
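The mechanism can be reproduced without nlpaug. The following is a minimal sketch of the pattern described above, not the actual code in random.py: the function names and the fixed list of swap pairs are assumptions chosen to make the effect deterministic.

```python
def swap_from_snapshot(chars, pairs):
    # Pattern described in this issue: every swap reads from a snapshot
    # that is never updated, so later swaps ignore earlier ones.
    original_chars = chars.copy()
    out = chars.copy()
    for i, j in pairs:
        out[i], out[j] = original_chars[j], original_chars[i]
    return out

def swap_in_place(chars, pairs):
    # Proposed behavior: each swap reads the current state of the list.
    out = chars.copy()
    for i, j in pairs:
        out[i], out[j] = out[j], out[i]
    return out

# Two overlapping swaps (hypothetical indices) are enough to show the difference.
pairs = [(0, 1), (1, 2)]
chars = list('abc')

print(''.join(swap_from_snapshot(chars, pairs)))  # bcb: 'a' erased, 'b' duplicated
print(''.join(swap_in_place(chars, pairs)))       # bca: a true reordering
```

With a single swap, or with swaps that never share an index, both versions give the same result, which may be why the problem only shows up intermittently.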

makcedward added a commit that referenced this issue Jan 8, 2020
makcedward reopened this Jan 8, 2020

philszep commented Jan 8, 2020

BTW, I just noticed essentially the same behavior in the word swap augmenter, nlpaug.augmenter.word.RandomWordAug(action='swap').

makcedward added a commit that referenced this issue Jan 9, 2020