
Character Augmenters remove non-breaking spaces before punctuation and insert spaces around apostrophes #313

Open
lindsaydbrin opened this issue Oct 12, 2022 · 3 comments

@lindsaydbrin
lindsaydbrin commented Oct 12, 2022

When using Character Augmenters (random and keyboard, specifically) on French utterances, I noticed two things:

  • When there is a space before punctuation (a non-breaking space, as described here), the augmenter removes it.
  • The augmenter adds a space before and after an apostrophe.

Both of these seem like unwanted behaviors: ideally, the augmenter would make only the changes specified in the docs and leave everything else untouched.

For example, when I run this:

import nlpaug.augmenter.char
nlpaug.augmenter.char.KeyboardAug(min_char=4, aug_word_max=1, aug_char_p=0.1).augment("un espace avant le point d'interrogation ?", n=1)

I get this:

"un esoace avant le point d ' interrogation?"
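The spacing in this output is consistent with a tokenize-then-rejoin step: split the text on every non-word character, drop whitespace-only tokens, join the rest with single spaces, and finally remove the space before trailing punctuation. The sketch below reproduces the reported spacing under that assumption (with no characters actually changed, so `espace` stays intact). The function names and regexes are hypothetical and not taken from nlpaug's source; they only show how both symptoms could come from a single step.

```python
import re

# Hypothetical tokenizer: split on every non-word character, keeping it as a token.
SPLIT_NON_WORD = re.compile(r"(\W)")
# Hypothetical clean-up rule: drop the space before punctuation at the end of the text.
TRAILING_PUNCT = re.compile(r"\s([.,:;?!%]+)$")


def tokenize(text: str) -> list[str]:
    # Whitespace-only tokens are discarded; str.strip() also removes U+00A0,
    # so the non-breaking space before "?" is lost here.
    return [tok for tok in SPLIT_NON_WORD.split(text) if tok.strip()]


def detokenize(tokens: list[str]) -> str:
    # Rejoining with single spaces puts a space on both sides of every
    # punctuation token, including the apostrophe in "d'interrogation".
    text = " ".join(tokens)
    return TRAILING_PUNCT.sub(r"\1", text).strip()


sentence = "un espace avant le point d'interrogation\u00a0?"
print(detokenize(tokenize(sentence)))
# un espace avant le point d ' interrogation?
```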
@maxw1489
It seems there is a general problem with the char augmenters whenever the input contains certain punctuation characters. The following is annoying:
string.punctuation
result:
!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~

And this is what happens when one of the char augmenters noted above is applied:
import string
import nlpaug.augmenter.char as nac
nac.RandomCharAug(action="insert").augment(string.punctuation)
result:
! " # $% & \' () * +, -. /: ; <= >? @ [\\] ^ _ {|} ~`

Please note that this punctuation list is not exhaustive.
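Every character in `string.punctuation` except `_` is a non-word character, so if the augmenter splits on non-word characters and rejoins tokens with spaces, each one ends up spaced out as above. As a point of comparison, here is a minimal sketch (plain `re`, not an nlpaug API) of a tokenization that round-trips exactly: a capturing group in `re.split` keeps every separator, including spaces and U+00A0, and joining with the empty string restores the input. The helper names are hypothetical.

```python
import re
import string

# The capturing group keeps each non-word separator (punctuation or whitespace) as a token.
SPLIT_NON_WORD = re.compile(r"(\W)")


def lossless_tokens(text: str) -> list[str]:
    # Nothing is filtered out, not even empty strings between adjacent separators.
    return SPLIT_NON_WORD.split(text)


def rejoin(tokens: list[str]) -> str:
    # Joining with "" restores the exact original string.
    return "".join(tokens)


for sample in (string.punctuation, "point d'interrogation\u00a0?"):
    assert rejoin(lossless_tokens(sample)) == sample

# For contrast: dropping whitespace tokens and joining with single spaces does not round-trip.
lossy = " ".join(t for t in lossless_tokens(string.punctuation) if t.strip())
print(lossy == string.punctuation)  # False
```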

@Alec-Stashevsky
Alec-Stashevsky commented May 19, 2023

Yes, this is a huge problem! It needs to be addressed.

@fierval
fierval commented Jan 30, 2024

This is pretty awful. It makes the whole thing unusable if you need this functionality.
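Until the augmenters preserve layout themselves, one possible workaround is to keep tokenization out of the augmenter's hands: apply it only to runs of word characters and copy everything else (spaces, non-breaking spaces, apostrophes, punctuation) from the input verbatim. This is a minimal sketch under that assumption; `augment_words_only` and `fake_typo` are hypothetical helpers, and `fake_typo` is only a deterministic stand-in for a real character augmenter.

```python
import re
from typing import Callable

WORD = re.compile(r"\w+")


def augment_words_only(text: str, augment_word: Callable[[str], str]) -> str:
    """Apply augment_word to each run of word characters; every other
    character is copied from the input unchanged."""
    return WORD.sub(lambda m: augment_word(m.group(0)), text)


def fake_typo(word: str) -> str:
    # Hypothetical stand-in for a char augmenter: replace 'p' with 'o'
    # in words of 4+ characters, leaving shorter words alone.
    return word.replace("p", "o") if len(word) >= 4 else word


sentence = "un espace avant le point d'interrogation\u00a0?"
print(augment_words_only(sentence, fake_typo))
```

To plug in a real augmenter, one could pass something like `lambda w: aug.augment(w)[0]`, assuming a recent nlpaug version where `augment` returns a list (older versions return a string). Calling the augmenter once per word changes how word-level limits such as `aug_word_max` apply, since each call sees only a single word.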
