The swap augmenter often replaces characters with other characters from the string instead of only swapping their positions.
For example, running:
import nlpaug
import nlpaug.augmenter.char as nac
from collections import Counter

def char_count(word):
    return Counter(list(word))

swapper = nac.RandomCharAug(action='swap', swap_mode='random')
word = 'testing'
num_t = char_count('testing')['t']
iters = 0
# Keep augmenting until the number of 't' characters changes (or we give up).
while num_t == char_count(word)['t'] and iters < 10000:
    word = swapper.augment(word)
    iters += 1
print(word, iters)
will take the string testing and often output something like:
ttniitg 5
where the output has gained one t and one i and lost an e and an s. This is not the expected behavior of multiple swap operations, which should only reorder characters and never change how many of each there are.
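The property being violated can be written as a small check: any sequence of true swaps is a permutation, so the character counts before and after must match. This is a minimal sketch; the helper name `is_permutation` is hypothetical and not part of nlpaug.

```python
from collections import Counter

def is_permutation(original: str, augmented: str) -> bool:
    """Return True if both strings hold the same characters with the same counts."""
    return Counter(original) == Counter(augmented)

# A genuine swap of two adjacent characters keeps the counts intact.
print(is_permutation('testing', 'tseting'))  # True
# The output reported above does not: it has an extra 't' and 'i', and no 'e' or 's'.
print(is_permutation('testing', 'ttniitg'))  # False
```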
Proposed fix:
Remove the .copy() call on line 140 of nlpaug/augmenter/char/random.py, or
remove the original_chars variable altogether and modify chars directly.
As written on line 151 of random.py, original_chars appears never to be updated, so the characters at the swap indices of chars are reassigned from the corresponding swap indices of original_chars. Because original_chars never changes, every swap after the first overwrites positions in chars with the original characters at those positions, which can duplicate some characters from the string while erasing others.
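The mechanism described here can be reproduced outside the library. The sketch below is not the nlpaug source: the function names and the fixed list of swap pairs are assumptions made for illustration. It only contrasts reading swap values from an unchanging copy with swapping in place.

```python
from collections import Counter

def swap_with_stale_copy(word, swaps):
    """Apply swaps while reading values from a copy that is never updated
    (the pattern the issue suspects)."""
    chars = list(word)
    original_chars = chars.copy()  # snapshot taken once, never refreshed
    for i, j in swaps:
        chars[i], chars[j] = original_chars[j], original_chars[i]
    return ''.join(chars)

def swap_in_place(word, swaps):
    """Apply swaps directly on the working list (the proposed fix)."""
    chars = list(word)
    for i, j in swaps:
        chars[i], chars[j] = chars[j], chars[i]
    return ''.join(chars)

# Two overlapping swaps: index 1 takes part in both (illustrative choice).
swaps = [(0, 1), (1, 2)]

stale = swap_with_stale_copy('testing', swaps)
fixed = swap_in_place('testing', swaps)

print(stale, Counter(stale) == Counter('testing'))  # eseting False
print(fixed, Counter(fixed) == Counter('testing'))  # estting True
```

With the stale copy, the second swap writes the original 'e' back into the string after the first swap has already moved it, so 'e' appears twice and one 't' is lost. Swapping in place keeps the result a permutation of the input.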