
Has ContextualWordEmbsAug gone slow? #248

Closed
rajat-tech-002 opened this issue Oct 28, 2021 · 8 comments

Comments

@rajat-tech-002

I have used ContextualWordEmbsAug on larger datasets before. Has the code been significantly affected by some update? The augmenter is much slower now.

@050644zf

I've been using this augmenter recently too, and it takes 90 seconds to process a single sentence on my machine. It still takes several seconds on Colab.

@rajat-tech-002
Author

@050644zf, so you also faced the same issue?

@050644zf

050644zf commented Nov 1, 2021

@050644zf, so you also faced the same issue?

Yes, here is a testing notebook. You can see it still takes 2.5 seconds to process a single sentence.

@makcedward
Owner

@rajat-tech-002
May I know which nlpaug version you are using? The speed of ContextualWordEmbsAug should have improved after it was tuned to feed multiple inputs to the transformer models at once.
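
For reference, one quick way to check the installed version (a minimal sketch, assuming Python 3.8+ so importlib.metadata is available):

```python
# Print the installed nlpaug version (Python 3.8+).
from importlib.metadata import version

print(version("nlpaug"))
```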

@makcedward
Owner

makcedward commented Nov 21, 2021

Tested versions 1.1.5 through 1.1.8.

Performance is degraded by about 23% (13.3s vs 16.4s) for 100 identical inputs.
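
Roughly, a timing comparison like this can be reproduced with something along these lines (a sketch only: the model name and sentence are placeholders, not the exact benchmark script):

```python
import time

import nlpaug.augmenter.word as naw

# Contextual word substitution backed by a masked language model.
aug = naw.ContextualWordEmbsAug(
    model_path="bert-base-uncased",  # placeholder model
    action="substitute",
)

text = "The quick brown fox jumps over the lazy dog."  # placeholder input

start = time.time()
for _ in range(100):  # 100 identical inputs, as in the comparison above
    aug.augment(text)
print(f"elapsed: {time.time() - start:.1f}s")
```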

@rajat-tech-002
Author

@rajat-tech-002 May I know which nlpaug version you are using? The speed of ContextualWordEmbsAug should have improved after it was tuned to feed multiple inputs to the transformer models at once.

I am using the latest version.

@rajat-tech-002
Author

Please close the issue.
I think the augmenter is simply very slow on longer texts, e.g. the Yahoo dataset with an average sentence length of 127: around 300 seconds for 400 sentences on an A100 GPU. It takes even longer with aug_max = None and aug_p = 0.3 (30%).
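
For context, the configuration being described is roughly the following (a sketch; only aug_p and aug_max come from the numbers above, the rest are placeholders):

```python
import nlpaug.augmenter.word as naw

# With aug_max=None there is no cap on the number of augmented tokens,
# so aug_p=0.3 means ~30% of the tokens in each (long) sentence are
# masked and predicted, multiplying the number of forward passes.
aug = naw.ContextualWordEmbsAug(
    model_path="bert-base-uncased",  # placeholder model
    action="substitute",
    aug_p=0.3,
    aug_max=None,
    device="cuda",  # assumed, since the timing above is on an A100 GPU
)
```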

@makcedward
Owner

makcedward commented Nov 21, 2021

Longer sentences affect performance because the transformer has to process more actual text rather than padding. Secondly, a higher augmentation percentage hurts performance even more.

More technical details:
For example, if 10 tokens are to be augmented, the input needs to pass through the transformer 10 times rather than once. By design of BERT (and other masked language models), masked language modeling predicts one token at a time.
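
A simplified illustration of why the cost scales with the number of augmented tokens (not nlpaug's actual code, just the general masked-LM pattern, sketched with the Hugging Face fill-mask pipeline):

```python
from transformers import pipeline

# Masked language modeling predicts one [MASK] position per forward pass,
# so augmenting N tokens requires N passes through the model.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

tokens = "the quick brown fox jumps over the lazy dog".split()
positions_to_augment = [1, 3, 6]  # e.g. tokens selected by aug_p

for pos in positions_to_augment:
    masked = tokens.copy()
    masked[pos] = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT
    best = fill_mask(" ".join(masked))[0]         # one model pass per token
    tokens[pos] = best["token_str"]

print(" ".join(tokens))
```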

However, I just noticed there is a major performance downgrade from 1.1.3 to 1.1.4: time spent increased from 1s to 9s. The major change was adopting the Hugging Face API rather than my custom implementation. More tests need to be conducted to identify the root cause.
