Get all possible leet words from my txt file and store it as list variable. #32

Open
viveks-codes opened this issue May 19, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@viveks-codes

Hello 😊,
I want to get all possible leet variants of the words in my txt file and store them in a list.

Suppose I have a file a.txt that contains the following:

handjob

and I want to print all possible modified spellings of those words, stored in a list:

[ 'handjob', 'handj*b', 'handj0b', 'handj@b', 'h@ndjob', 'h@ndj*b', 'h@ndj0b', 'h@ndj@b',
'h*ndjob', 'h*ndj*b', 'h*ndj0b', 'h*ndj@b', 'h4ndjob', 'h4ndj*b', 'h4ndj0b', 'h4ndj@b' ]

Could you please share some code for this? That would be great!

Thanks, and have a nice day! ❤️

@viveks-codes
Author

@snguyenthanh @bakert @jcbrockschmidt @Andriybeats

I've tried the following:

from better_profanity import profanity

profanity.load_censor_words_from_file("a.txt")
print(profanity.CENSOR_WORDSET)

but it prints the objects' memory addresses instead of the words:

[<better_profanity.varying_string.VaryingString at 0x7f7ccec418e0>,
 <better_profanity.varying_string.VaryingString at 0x7f7ccec417f0>]

Thanks

@snguyenthanh
Owner

snguyenthanh commented May 19, 2021

Hi @vivolscute, previously all the variants were stored in memory (which could take up GBs of memory, depending on how big the wordlist is). #17, released in version 0.7.0, added VaryingString so that the variants are only generated at runtime.

If I'm not wrong, to get all the variants of all the words in the wordlist, you can use version 0.6.1 instead of the latest version 0.7.0 (with the same code you provided). Moreover, here is the function generating variants of a word:

def _generate_patterns_from_word(self, word):
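
For illustration, here is a minimal standalone sketch of the same idea. It is not the library's implementation: the names `LEET_MAP`, `generate_variants`, and `variants_from_file` are hypothetical, and the substitution table is only inferred from the example output in the first comment (the library's real mapping covers more letters).

```python
from itertools import product

# Hypothetical substitution table, inferred from the example output in this
# issue. Characters not listed here are kept unchanged.
LEET_MAP = {
    "a": ("a", "@", "*", "4"),
    "o": ("o", "*", "0", "@"),
}


def generate_variants(word, mapping=LEET_MAP):
    """Return every spelling of `word` produced by the substitution table."""
    # One tuple of candidate characters per position in the word.
    choices = [mapping.get(ch, (ch,)) for ch in word]
    # The Cartesian product enumerates one candidate per position.
    return ["".join(combo) for combo in product(*choices)]


def variants_from_file(path, mapping=LEET_MAP):
    """Read one word per line from `path` and return all variants as a list."""
    with open(path, encoding="utf-8") as f:
        words = [line.strip().lower() for line in f if line.strip()]
    variants = []
    for word in words:
        variants.extend(generate_variants(word, mapping))
    return variants
```

With this table, `generate_variants("banjo")` returns 16 spellings, starting with `"banjo"` and `"banj*"`. Applied to the word from the first comment, it should reproduce the 16 spellings listed there, in the same order. Because the whole list is built in memory, this is only suitable for short words.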

@jcbrockschmidt
Collaborator

It wouldn't be too hard to add a feature for this. However, the number of word variations grows exponentially with the number of substitutable letters, sometimes to intractable numbers, which leads to crashes. For example, a word with 16 letter e's has 3^16 = 43,046,721 possible variants (see #15), and words like this have crashed my computer before due to excessive memory consumption. Words with this many variants are uncommon for most use cases, though, unless you're including full sentences. What we could do is add a fail-safe to this feature, which the end-user can disable at their own risk.
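
A rough sketch of what such a fail-safe could look like, assuming three spellings per "e" (which matches the 43,046,721 figure). None of this is existing library API: `count_variants`, `iter_variants`, `safe_variants`, and the `max_variants` parameter are hypothetical names chosen for illustration.

```python
from itertools import product
from math import prod

# Hypothetical mapping: three spellings for "e"; other characters are kept.
LEET_MAP = {"e": ("e", "*", "3")}


def count_variants(word, mapping=LEET_MAP):
    """Count the variants of `word` without generating any of them."""
    return prod(len(mapping.get(ch, (ch,))) for ch in word)


def iter_variants(word, mapping=LEET_MAP):
    """Yield variants one at a time instead of building a full list."""
    choices = [mapping.get(ch, (ch,)) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)


def safe_variants(word, max_variants=10_000, mapping=LEET_MAP):
    """Return all variants as a list, refusing words that exceed the limit.

    Pass max_variants=None to disable the fail-safe at your own risk.
    """
    total = count_variants(word, mapping)
    if max_variants is not None and total > max_variants:
        raise ValueError(
            f"{word!r} has {total} variants, above the limit of {max_variants}"
        )
    return list(iter_variants(word, mapping))
```

Counting first is cheap, since it only multiplies the number of choices per character, so the check can reject a word before any memory is spent on its variants. Callers who need every variant of a long word can still loop over `iter_variants` without holding them all in memory.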

@jcbrockschmidt jcbrockschmidt added the enhancement New feature or request label Sep 26, 2021