This project is a resume parser that I built in 24 hours for HackUTD 2022. It parses a PDF resume and extracts information from it.
Before running the script, install the required packages:
pip install --upgrade pip
pip install numpy
pip install torch
pip install pyresparser
pip install nltk
pip install thinc==7.4.1
pip install spacy==2.3.5
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
python -m nltk.downloader words
python -m nltk.downloader stopwords
python -m nltk.downloader punkt
pip install python-doctr
pip install "python-doctr[torch]"
pip install pdf2image
conda install -c conda-forge poppler
pip install flask
pip install pyOpenSSL
pip install flask-cors
Note: Many packages are required, and installing them may break existing dependencies. I ran into a few conflicts and found that creating a new virtual environment for this project works best.
Note: I use conda to install Poppler, which pdf2image needs, but you may have to install it another way: https://poppler.freedesktop.org/
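Because the install list is long and easy to get partially wrong, a small environment check can save a failed first run. The sketch below is not part of this project; it assumes the import names that these pip packages normally expose (for example python-doctr imports as `doctr` and pyOpenSSL as `OpenSSL`), and it checks for Poppler by looking for the `pdftoppm` executable that pdf2image calls.

```python
"""Sanity-check the environment before running app.py (a sketch, not part of the project).

Assumption: the import names below are the ones these pip packages
normally expose; adjust them if your installation differs.
"""
import importlib.util
import shutil
from typing import List

# Import names for the packages installed above (pip name in comments where it differs).
REQUIRED_MODULES = [
    "numpy",
    "torch",
    "pyresparser",
    "nltk",
    "thinc",
    "spacy",
    "en_core_web_sm",  # spaCy model installed from the tar.gz
    "doctr",           # python-doctr
    "pdf2image",
    "flask",
    "OpenSSL",         # pyOpenSSL
    "flask_cors",      # flask-cors
]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def poppler_available() -> bool:
    """pdf2image shells out to Poppler's pdftoppm, so check that it is on PATH."""
    return shutil.which("pdftoppm") is not None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing Python packages:", missing or "none")
    print("Poppler (pdftoppm) on PATH:", poppler_available())
```

Run it inside the same virtual environment you created for this project; an empty "missing" list and `True` for Poppler mean the imports should at least resolve.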
The script can be run using the following command:
python app.py
The script is named app.py because it includes Flask support: during the hackathon I used it as a backend server.
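If you run app.py as a backend, a frontend or another script has to send it the resume. The sketch below builds such a request with only the standard library. The route name `/parse`, the JSON body shape, and the JSON response are assumptions for illustration (only the `{'url': ...}` dictionary form comes from this README); the base address is Flask's default development server. Check app.py for the real route before using it.

```python
"""Build a request for the parser when app.py runs as a Flask backend (a sketch).

Assumptions (not taken from app.py): the server listens on Flask's default
http://127.0.0.1:5000 and exposes a hypothetical POST route /parse that
accepts a JSON body of the form {"url": ...} and returns JSON.
"""
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # Flask's default development address
PARSE_ROUTE = "/parse"              # hypothetical route name


def build_parse_request(pdf_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Return a JSON POST request carrying the resume URL."""
    body = json.dumps({"url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        base_url + PARSE_ROUTE,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_parse_request(
        "https://www.pdf-archive.com/2017/09/26/fake-resume/fake-resume.pdf"
    )
    # Requires the Flask server (python app.py) to be running with a matching route.
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read().decode("utf-8")))
```

Keeping request construction separate from sending makes it easy to inspect the payload without a running server.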
To change the input, open app.py in a text editor and edit the if __name__ == '__main__' block. The script takes three parameters:
- pdf_dict - This can be:
  - A URL to an online PDF:
    https://www.pdf-archive.com/2017/09/26/fake-resume/fake-resume.pdf
  - A dictionary with the URL:
    {'url': 'https://www.pdf-archive.com/2017/09/26/fake-resume/fake-resume.pdf'}
  - A relative or absolute path to a PDF resume on your system:
    ./test.pdf
- (Optional) isURL - Defaults to True. Set it to True if the first parameter is a URL (options 1 or 2) and False if it is a file path (option 3).
- (Optional) java_path - On my system, I received the following error:
  NLTK was unable to find the java file!
  If this happens, you may have to pass the path to the java.exe file on your system, for example: C:/Program Files/Java/jdk-18/bin/java.exe
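The three parameters can be illustrated with a small stand-alone sketch. It normalizes the three accepted `pdf_dict` forms into one URL or path string and shows one common workaround for NLTK's "unable to find the java file" error: setting the `JAVAHOME` environment variable, which NLTK searches when locating the Java binary. The helper names and the `parse_resume` call are hypothetical; app.py may handle `java_path` differently, so treat this as an illustration rather than the script's actual code.

```python
"""Sketch of the __main__ inputs described above (hypothetical helpers, not app.py's code).

Assumptions: the parser entry point is shown as a hypothetical
parse_resume(pdf_dict, isURL, java_path); check app.py for the real name.
Setting JAVAHOME is a common fix for NLTK's Java lookup and may not be
exactly what app.py does with java_path.
"""
import os
from typing import Optional, Union


def resolve_pdf_source(pdf_dict: Union[str, dict], isURL: bool = True) -> str:
    """Normalize the three accepted input forms to a single URL or path string."""
    if isURL:
        # Options 1 and 2: a bare URL string or a {'url': ...} dictionary.
        return pdf_dict["url"] if isinstance(pdf_dict, dict) else pdf_dict
    # Option 3: a relative or absolute path on the local file system.
    path = os.path.abspath(pdf_dict)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No PDF found at {path}")
    return path


def configure_java(java_path: Optional[str] = None) -> None:
    """Point NLTK at java.exe by setting JAVAHOME, but only if a path is given."""
    if java_path:
        os.environ["JAVAHOME"] = java_path


if __name__ == "__main__":
    pdf_dict = {"url": "https://www.pdf-archive.com/2017/09/26/fake-resume/fake-resume.pdf"}
    configure_java(None)  # e.g. "C:/Program Files/Java/jdk-18/bin/java.exe"
    print(resolve_pdf_source(pdf_dict, isURL=True))
    # parse_resume(pdf_dict, isURL=True, java_path=None)  # hypothetical call
```

Checking that a local path exists before parsing gives a clearer error than a failure deep inside the PDF conversion step.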