Text augmenters do not properly recombine items with punctuation. #143
Maybe replace the standard tokenizer/untokenizer with something like this?
In the short term, you can provide a predefined or custom tokenizer and reverse_tokenizer.
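The snippet proposed earlier in the thread was not preserved. As one possible custom pair, here is a minimal sketch that splits text into words and individual punctuation marks, then rejoins them while attaching punctuation to its neighbour instead of padding it with spaces. The function names `tokenize` and `reverse_tokenize`, the regex, and the punctuation sets are all illustrative assumptions, not part of any library API.

```python
import re
from typing import List

# Hypothetical tokenizer: words (with an optional apostrophe suffix such as "don't")
# or single non-space, non-word characters (punctuation).
_TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

# Assumed sets: marks that attach to the previous token (no space before them)...
_NO_SPACE_BEFORE = set(".,;:!?%)]}")
# ...and marks that attach to the next token (no space after them).
_NO_SPACE_AFTER = set("([{$")


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and individual punctuation tokens."""
    return _TOKEN_RE.findall(text)


def reverse_tokenize(tokens: List[str]) -> str:
    """Rejoin tokens, attaching punctuation rather than surrounding it with spaces."""
    out = ""
    prev = None
    for tok in tokens:
        if prev is None or tok in _NO_SPACE_BEFORE or prev in _NO_SPACE_AFTER:
            out += tok  # glue directly onto the previous token
        else:
            out += " " + tok  # ordinary word boundary
        prev = tok
    return out


print(reverse_tokenize(tokenize("The cat (a tabby) sat.")))
```

If the augmenter accepts `tokenizer` and `reverse_tokenizer` callables, as suggested above, functions like these could be passed there; check the augmenter's constructor signature first. This heuristic does not handle ambiguous marks such as straight double quotes, which can open or close a span.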
Unfortunately, this issue persists.
It now adds a space after the tokens.
I used the
input:
output:
Note that all punctuation is padded with spaces on both sides. I don't believe this is intended.
This does not properly recombine punctuation.
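The actual input and output were not preserved in this thread. As a hypothetical reproduction, assuming the untokenizer simply joins tokens with single spaces, this sketch shows how that produces exactly the symptom described: every punctuation mark ends up padded with spaces on both sides. The function name `naive_roundtrip` is illustrative only.

```python
import re


def naive_roundtrip(text: str) -> str:
    # Split into words and individual punctuation marks...
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # ...then rejoin with a single space between every token, punctuation included.
    return " ".join(tokens)


print(naive_roundtrip("Hello, world!"))  # Hello , world !
```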