compute the list of punctuation characters only once. #67
The values from `unicodedata` should never change, so this will be a constant value. On my machine, computing the list iterates over 1,114,112 elements. We don't need to recompute this every time; we can save the punctuation characters so that only the first `RemovePunctuation` transform takes time and future calls reuse the character list.
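For illustration, a minimal sketch of the caching idea, not the actual diff in this PR. It assumes punctuation means any code point whose `unicodedata` category starts with `P`; the helper `punctuation_chars` and the stripped-down `RemovePunctuation` class below are hypothetical stand-ins.

```python
import sys
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def punctuation_chars() -> frozenset:
    """Scan every code point once and cache the punctuation set.

    Assumption: a character counts as punctuation when its Unicode
    general category starts with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    """
    return frozenset(
        chr(cp)
        for cp in range(sys.maxunicode + 1)  # 1,114,112 code points
        if unicodedata.category(chr(cp)).startswith("P")
    )


class RemovePunctuation:
    """Hypothetical stand-in for the transform, not the project's class."""

    def __call__(self, text: str) -> str:
        # Only the first call pays for the full scan; later calls
        # reuse the cached frozenset.
        chars = punctuation_chars()
        return "".join(ch for ch in text if ch not in chars)
```

With this shape, constructing several `RemovePunctuation` instances costs one scan in total, because the cache lives at module level rather than on each instance.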
Closes #66