Use character-by-character string comparison #17
Conversation
Suggestion by @snguyenthanh
Speeds up profanity detection by roughly 10x.
Accepts file names and iterables.
better_profanity/varying_string.py

    Args:
        string (str): String to generate variants of.
        char_mappings (dict): Maps characters to substitute characters.
        variant_thres (int): Maximum number of variants to store in a batch.
It seems `variant_thres` is not used? And do we need to use it when we have the `MAX_PATTERNS` constant in #16?
The PR looks good to me. I will merge after `variant_thres` is added. What do you think about renaming it to `max_variants`?
Oh, my bad: `variant_thres` isn't used anymore. It was used for a less efficient method I experimented with, and I forgot to remove it.

`MAX_PATTERNS` won't be used anymore either. The character-by-character string comparison method doesn't actually construct any of the variants into full strings, so it's no longer relevant.
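To make the idea concrete, here is a minimal sketch (not the library's actual code; the mapping table and function name are hypothetical) of how a token can be matched against a censor word character by character, allowing leetspeak-style substitutions, without ever materializing the full set of variant strings:

```python
# Hypothetical example mapping: each plain character and the substitute
# characters it may appear as in obfuscated text.
CHAR_MAPPINGS = {
    "a": ("a", "@", "*", "4"),
    "e": ("e", "*", "3"),
    "i": ("i", "*", "l", "1"),
    "o": ("o", "*", "0", "@"),
}

def matches_variant(token: str, censor_word: str) -> bool:
    """Return True if `token` is a character-substitution variant of
    `censor_word`, comparing position by position."""
    if len(token) != len(censor_word):
        return False
    for tok_char, word_char in zip(token, censor_word):
        # A character with no mapping only matches itself.
        allowed = CHAR_MAPPINGS.get(word_char, (word_char,))
        if tok_char not in allowed:
            return False
    return True
```

Because each position is checked independently, the work per token is O(length) regardless of how many substitutions each character has, whereas generating every variant up front grows multiplicatively with the mapping sizes.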
It's removed now. Should be ready to pull.
Looks good!

I will make the corresponding changes to the documentation and make the 0.7.0 release in a few days.

@jcbrockschmidt Apologies for the late reply. I have published version 0.7.0.

@snguyenthanh No worries. Thanks!
Comparing strings character by character instead of as full strings is more computationally and memory efficient. Here is a run-time comparison for the unit tests (in seconds):

And here is the memory consumption comparison for several word sets (in MB):
These changes also include an extra parameter for the Profanity constructor that allows for the optional specification of a word list, either as a file path or an iterable. This comes with respective unit tests. This merge request would close issue #15.