Tired of using sort input.txt | uniq > output.txt?
I wanted to create a cross-platform script that could read any kind of file, take each word once, and list them all in a wordlist.
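To make the idea concrete, here is a minimal sketch of the word-level equivalent of `sort | uniq`, using only the standard library. The function names `unique_words` and `build_wordlist` are hypothetical and are not taken from Pukeko's source; the definition of a "word" (a run of `\w` characters) is also an assumption.

```python
import re
from pathlib import Path


def unique_words(text):
    """Return every distinct word in `text`, sorted by code point."""
    # Assumption: a "word" is any run of letters, digits or underscores.
    words = re.findall(r"\w+", text)
    return sorted(set(words))


def build_wordlist(in_path, out_path):
    """Read a plain-text file and write one unique word per line."""
    text = Path(in_path).read_text(encoding="utf-8", errors="ignore")
    Path(out_path).write_text("\n".join(unique_words(text)) + "\n", encoding="utf-8")
```

Unlike `sort | uniq`, which deduplicates whole lines, this splits each line into words first, and it behaves the same on Windows, macOS and Linux.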
pip install argparse glob python-magic textract colorama
- python-magic: as if the world wasn't complicated enough, there are two 'magic' libraries. You can find the right one here on GitHub or here on pypi.python.org
- textract: this one is not easy to install either; you can find detailed documentation on GitHub, on the official website, or on pypi.python.org
If the situation gets tragic, open an issue and I will help you troubleshoot.
Pukeko can currently parse: '.csv', '.doc', '.docx', '.eml', '.epub', '.gif', '.htm', '.html', '.jpeg', '.jpg', '.json', '.log', '.mp3', '.msg', '.odt', '.ogg', '.pdf', '.png', '.pptx', '.ps', '.psv', '.rtf', '.tff', '.tif', '.tiff', '.tsv', '.txt', '.wav', '.xls', '.xlsx'.
plus any file that can be read from a command prompt.
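One way such a reader could be organised is sketched below: rich formats go through textract, and everything else is read as plain text. This is an assumption about the approach, not Pukeko's actual code. `read_text` and `RICH_EXTENSIONS` are hypothetical names, and the extension set is only an illustrative subset of the list above.

```python
import os

try:
    import textract  # optional third-party dependency
except ImportError:
    textract = None

# Illustrative subset of the supported extensions (not Pukeko's actual table).
RICH_EXTENSIONS = {".doc", ".docx", ".epub", ".odt", ".pdf", ".pptx", ".rtf", ".xls", ".xlsx"}


def read_text(path):
    """Return the text content of `path` as a string."""
    ext = os.path.splitext(path)[1].lower()
    if textract is not None and ext in RICH_EXTENSIONS:
        # textract.process returns the extracted text as bytes.
        return textract.process(path).decode("utf-8", errors="ignore")
    # Fallback: treat the file as plain text and skip undecodable bytes.
    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
        return handle.read()
```

The fallback branch is what covers "any file that can be read from a command prompt": if no dedicated parser applies, the file is simply opened as text.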
Have a look at my YouTube presentation:
In my spare time, my TODO list is:
- add option -URL to create wordlists from a target web page like CeWL
- add option -site to create wordlists from a target website
- add option Leet (or 1337), also known as eleet or leetspeak (so many passwords are weak because of leetspeak)
- add multilanguage (pip install alphabet-detector)
- add highlight HotWords in string
- add e-mail to HotWords
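
As a rough idea of what the planned Leet option could do, the sketch below expands one word into its leetspeak variants. Nothing here exists in Pukeko yet: `LEET_MAP` and `leet_variants` are hypothetical names, and the substitution table is a small assumed sample.

```python
from itertools import product

# Assumed substitution table; real leetspeak has many more variants.
LEET_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}


def leet_variants(word):
    """Return every combination of original and substituted characters."""
    choices = []
    for ch in word:
        sub = LEET_MAP.get(ch.lower())
        # Each character keeps itself and, if mapped, gains one alternative.
        choices.append((ch, sub) if sub else (ch,))
    return ["".join(combo) for combo in product(*choices)]
```

For example, `leet_variants("cat")` yields `cat`, `ca7`, `c4t` and `c47`. The number of variants doubles with every substitutable character, so long words would need a cap.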