Project • Data source • How To Use • Process
The aim of this (learning) project is to:
- extract information from free-format resumes (CVs) so that this information can be easily processed
- match a given resume with the 10 closest ones (from a pool of available resumes) from an HR perspective
A user-friendly dashboard will showcase the findings.
The resumes used to build this project are the PDF files available at the following repository: CV data. Please note that PDFs consisting only of images, and irrelevant files such as portfolios, are excluded.
For end users, just go to the dashboard online. Here is a short video of what it looks like. Enjoy!
For developers, you'll need Git and Python installed on your computer to clone and run this application. From your command line:
# Clone this repository
$ git clone https://github.com/harozudu/NLP_resume_selection
# Go into the repository
$ cd NLP_resume_selection
# Install dependencies
$ pip install -r requirements.txt
# Run the streamlit app
$ streamlit run streamlit_app.py
Information extraction
- Personal info: name, email, phone number, address
- Education (title + institution)
- Previous job titles (or work experience)
- Skills (or certifications)
- Hobbies
- Languages
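As an illustration of the extraction step, here is a minimal sketch that pulls email addresses and phone numbers out of resume text that has already been converted from PDF to a plain string. The regular expressions and the function name `extract_contact_info` are assumptions made for this example, not the project's actual code, and they will miss unusual formats.

```python
import re

# Hypothetical patterns for illustration; real resumes need more robust rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def extract_contact_info(text: str) -> dict:
    """Return the emails and phone numbers found in free-format resume text."""
    emails = EMAIL_RE.findall(text)
    # Normalize phone numbers by keeping only digits and a leading "+".
    phones = [re.sub(r"[^\d+]", "", match) for match in PHONE_RE.findall(text)]
    return {"emails": emails, "phones": phones}


sample = "Jane Doe\njane.doe@example.com | +32 470 12 34 56\nBrussels, Belgium"
info = extract_contact_info(sample)
print(info)
```

Fields such as names, education, or skills are harder to capture with patterns alone and usually call for keyword lists or a named-entity recognition model.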
Match CVs
- Natural language processing (NLP): term frequency-inverse document frequency (TF-IDF) to retrieve similar resumes
- NetworkX to visualize the features that make CVs related
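The matching idea can be sketched with the standard library only: weight each term by TF-IDF, then rank the other resumes by cosine similarity to the query resume. This is a simplified illustration, not the project's implementation, which may well rely on a dedicated library instead. The function names, the tokenizer, the smoothed IDF formula, and the sample texts are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_resumes(query_index, docs, top_k=10):
    """Return (index, similarity) pairs for the top_k resumes closest to the query."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [(i, cosine(query, vec)) for i, vec in enumerate(vectors) if i != query_index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Made-up sample texts standing in for extracted resume content.
resumes = [
    "Python developer with machine learning and NLP experience",
    "Data scientist skilled in Python, NLP and machine learning",
    "Registered nurse with pediatric care experience",
    "Accountant experienced in auditing and tax reporting",
]
matches = closest_resumes(0, resumes, top_k=2)
print(matches)
```

The terms two resumes share, together with their weights, are the kind of feature that could label the edges of a graph linking related CVs.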
LinkedIn @lyes