- Install Python 3 and pip3
- pip3 install -r requirements.txt
- python extract_emails.py --help
Note:
To scan files with a doc extension, you need one additional dependency:
- On Windows, install pypiwin32
- On Linux or Mac, install LibreOffice
pypiwin32 is a Windows-only Python module, so ignore its install error on a Linux-based OS.
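The dependency note above could be checked before a run with something like the following. This is only a sketch under stated assumptions: the function name `doc_support_available` is hypothetical, `win32com` is assumed to be the module that pypiwin32 makes importable, and LibreOffice is assumed to expose a `soffice` or `libreoffice` executable on the PATH (on a Mac it often does not, so a negative result there is not conclusive).

```python
import importlib.util
import shutil
import sys


def doc_support_available():
    """Best-effort check for the extra dependency needed to read doc files.

    Hypothetical helper, not part of extract_emails.py.
    """
    if sys.platform.startswith("win"):
        # Assumption: pypiwin32 makes the win32com package importable.
        return importlib.util.find_spec("win32com") is not None
    # Assumption: LibreOffice puts one of these executables on the PATH.
    return any(shutil.which(name) for name in ("soffice", "libreoffice"))


if __name__ == "__main__":
    print("doc support available:", doc_support_available())
```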
Options
- --dir: option to provide the directory/folder absolute path, default is the current folder
- --file: option to scan only one file
- --ext: option to restrict the scanning of file extensions, default is all supported extensions
- --dst: option to set the output file name, by default it will print on the console
NOTE: Use a different output file for each run; otherwise the existing results will be overwritten.
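To make the four options concrete, here is a minimal argparse sketch that accepts the same flags and defaults described above. It is an illustration only, not the actual parser inside extract_emails.py; the function name `build_parser` and the help strings are assumptions.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Extract emails from files")
    # Folder to scan; the documented default is the current folder.
    parser.add_argument("--dir", default=os.getcwd(), help="folder to scan")
    # Scan a single file instead of a folder.
    parser.add_argument("--file", default=None, help="scan only this file")
    # One or more extensions; None stands for "all supported extensions".
    parser.add_argument("--ext", nargs="*", default=None, help="extensions to scan")
    # Output file; None stands for "print on the console".
    parser.add_argument("--dst", default=None, help="output file name")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--dir=XYZ", "--dst=emails.txt", "--ext", "pdf", "doc"])
print(args.dir, args.dst, args.ext)
```

Both `--dir=XYZ` and `--dir XYZ` are accepted by argparse, which is why the usage examples can mix the two forms.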
Extract emails from a specific file xyz.pdf
python extract_emails.py --file=xyz.pdf --dst=emails.txt
Extract emails from all files in a folder/directory XYZ
python extract_emails.py --dir=XYZ --dst=emails.txt
While scanning a folder/directory you can also restrict the file extensions. For example, to scan only pdf files:
python extract_emails.py --dir=XYZ --dst=emails.txt --ext pdf
Scan directory but only parse doc and pdf files
python extract_emails.py --dir=XYZ --dst=emails.txt --ext pdf doc
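The extension filter used in the examples above can be pictured roughly as follows. This is a simplified sketch, not the script's real code: the names `wanted`, `find_emails`, and `EMAIL_RE` are hypothetical, the regular expression is a deliberately loose assumption that will not match every valid address, and the sketch works on plain text only (reading pdf or doc content is out of scope here).

```python
import re

# Simplified pattern (an assumption); it will not match every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def wanted(filename, extensions=None):
    """Return True if filename has one of the extensions (None accepts all)."""
    if not extensions:
        return True
    suffix = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return suffix in {ext.lower().lstrip(".") for ext in extensions}


def find_emails(text):
    """Return the unique addresses found in text, in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


if __name__ == "__main__":
    files = ["report.pdf", "notes.txt", "letter.doc"]
    print([name for name in files if wanted(name, ["pdf", "doc"])])
    print(find_emails("Contact bob@example.com or amy@test.org, bob@example.com"))
```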