
could this be faster with Set instead of List #41

Open · psynautic opened this issue Jan 26, 2022 · 1 comment

psynautic commented Jan 26, 2022

My colleague was working with this library for some NLP stuff, and he was trying to manipulate the CENSOR_WORDS for reasons not particularly important for this question.

It got me wondering: wouldn't this all go a lot faster if CENSOR_WORDS were a set()? Forgive me if I'm wasting your time; I didn't fully trace the code.

It seems to me that a lookup against a very large collection of words or phrases would always be faster with a set, since Python implements sets as hash tables under the hood.
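For illustration, a minimal micro-benchmark along these lines (synthetic words, not the library's actual wordlist) shows the gap between the two membership tests:

import timeit

words_list = [f"word{i}" for i in range(100_000)]
words_set = set(words_list)

# Worst case for the list: a full O(n) scan to reach the last element.
print(timeit.timeit("'word99999' in words_list", globals=globals(), number=100))
# The set does one O(1) average-time hash lookup per test.
print(timeit.timeit("'word99999' in words_set", globals=globals(), number=100))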

@DeathDragon7050

You are right that membership tests against a list are far slower than against a set. I did the following to work around it; it assumes you don't edit the censor list afterwards.

# VaryingString defines custom equality, so it is unhashable by default;
# patch in a hash based on the original string before building the set.
from better_profanity import varying_string
varying_string.VaryingString.__hash__ = lambda self: hash(self._original)

import better_profanity
# make your edits to the censor list here
# Freeze the wordset so lookups hash instead of scanning the whole list.
better_profanity.profanity.CENSOR_WORDSET = frozenset(better_profanity.profanity.CENSOR_WORDSET)
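As a quick sanity check after the conversion (contains_profanity and censor are the library's public entry points; this assumes the read-only paths only iterate over or test membership in CENSOR_WORDSET):

import better_profanity

# Read-only checks still work against the frozenset-backed wordset.
print(better_profanity.profanity.contains_profanity("hello world"))
print(better_profanity.profanity.censor("hello world"))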

If you want everything to work, you are going to need to make all uses of CENSOR_WORDSET work with sets instead of lists; a rough sketch of the kind of change follows below. The code in the main file is only ~250 lines, so it would be easy enough. Otherwise, this gets the job done.
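For illustration, the change looks roughly like this (a hypothetical sketch with made-up names, not better_profanity's actual internals):

# Hypothetical sketch: adapting list-based wordset code to set operations.
censor_words = set()

def add_censor_words(words):
    # the list version would use censor_words.extend(words)
    censor_words.update(words)

def is_censored(word):
    # O(1) average-time hash lookup instead of an O(n) scan
    return word in censor_words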
