Question: Is there an (easy) way to preserve (trailing) spaces and carriage returns when using ContextualWordEmbsAug? #323

Open
barseghyanartur opened this issue Jan 11, 2023 · 0 comments

Code

import nlpaug.augmenter.word as naw

original = """
         “Fury said to a
         mouse, That he
        met in the
       house,
     ‘Let us
      both go to
       law: I will
        prosecute
         you.—Come,
           I’ll take no
           denial; We
          must have a
        trial: For
      really this
     morning I’ve
    nothing
    to do.’
"""

aug = naw.ContextualWordEmbsAug(model_path="bert-base-cased", action="substitute")
print(aug.augment(original)[0])

Output

White Fury said to a friend, That he met in the library, No Let them both go to law : I will prosecute you. No Come, I ’ ll take no denial ; We shall have a trial : but really this day I ’ still nothing to do. ’

Desired output

         White Fury said to a 
         friend, That he 
        met in the 
       library, 
     No Let them 
      both go to 
       law : I will 
        prosecute 
         you. No Come, 
           I ’ ll take no 
           denial ; We 
          shall have a 
        trial : but 
      really this 
     day I ’ still 
    nothing 
    to do. ’

Python version

3.10.9

(Shortened) pip list

huggingface-hub               0.11.1
joblib                        1.2.0
langchain                     0.0.59
librosa                       0.9.2
llvmlite                      0.39.1
nlpaug                        1.1.11
nltk                          3.8.1
numba                         0.56.4
numpy                         1.23.5
pandas                        1.5.2
resampy                       0.4.2
scikit-learn                  1.2.0
scipy                         1.10.0
snowballstemmer               2.2.0
tokenizers                    0.13.2
torch                         1.13.1
transformers                  4.25.1
wcwidth                       0.2.5
webencodings                  0.5.1
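
Possible workaround (sketch)

The output above suggests that passing the whole string to the augmenter collapses whitespace and line breaks. One way to approximate the desired output is to augment each line separately and re-attach its original leading whitespace, trailing whitespace, and line ending. This is a minimal sketch and not an nlpaug feature: the helper name `augment_preserving_layout` and its `augment_fn` parameter are hypothetical, and the augmenter is passed in as a plain callable so the layout logic does not depend on any particular model.

```python
import re
from typing import Callable

# Splits a line (without its line ending) into leading whitespace,
# content, and trailing whitespace.
_LINE_RE = re.compile(r"^(\s*)(.*?)(\s*)$")


def augment_preserving_layout(text: str, augment_fn: Callable[[str], str]) -> str:
    """Apply augment_fn to each line's content, keeping indentation,
    trailing spaces, and line endings exactly as in the input.

    Hypothetical helper: augment_fn is any str -> str callable.
    """
    pieces = []
    for line in text.splitlines(keepends=True):
        # Separate the line ending (\n, \r\n, or \r) from the rest.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        lead, core, trail = _LINE_RE.match(body).groups()
        # Only call the augmenter on lines that have actual content.
        if core:
            core = augment_fn(core)
        pieces.append(lead + core + trail + ending)
    return "".join(pieces)
```

With the `aug` object from the code above, this could be called as `augment_preserving_layout(original, lambda s: aug.augment(s)[0])`. The trade-off is that the model then sees one line at a time, so substitutions are chosen with less context than when the whole passage is augmented at once, and any spacing the augmenter changes inside a line (for example around punctuation) is not restored.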