Resume Analyzer is a tool that helps recruiters select candidates based on their resumes. It also provides an overall summary of each resume, so recruiters can get to know a candidate better in less time.

About the Project🌟:

The application currently has two tools:

  1. Resume Score Generator
  2. Resume Summarizer

Resume Score Generator: This is an NLP classification use case. Multiple resumes were collected, each was assigned a score (between 1 and 10), and a classification model was trained to predict that score for any new resume; the model's prediction is the score the resume receives. There are 300+ data points in total. The file name is resume_data2_(used in training).csv and it is under the data folder.

Resume Summarizer: Custom NER, built with spaCy, is used to summarize any resume. The data for this is provided in the data folder under the file name train_data.pkl, with 150+ custom-tagged examples.

User Interface📱:

[Screenshots of the app UI]

Project workflow🧾:

Data Preparation:

  1. I scraped all the sample resumes from overlife.com using the pdf scraper.py file and parsed the text from each resume. The resumes from this source are mostly from the engineering and programming fields, and the data quality is not very good (a minimal sketch of this parsing step is shown after this list).
  2. I also took some data from here. Most of the resumes from this source are for software development and data analyst roles.
  3. There were 300+ resumes in total (122 from scraping and ~200 from the above repo). I did not get a chance to label all of the data, so I randomly assigned each unlabeled resume a score from 1 to 10.
  4. Created a CSV file combining all the data sources.
  5. Data from the above repo was already tagged for NER, so I did not tag it again.
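Below is a minimal sketch of the scraping-and-combining step described above, assuming PyPDF2 for text extraction; the folder layout, column names, and the random-score fallback are illustrative, not the repo's exact code.

```python
# Hypothetical sketch: parse each scraped resume PDF and combine everything
# into one CSV. Paths and column names are assumptions, not the repo's own.
import glob
import random

import pandas as pd
from PyPDF2 import PdfReader

rows = []
for path in glob.glob("resumes/*.pdf"):
    reader = PdfReader(path)
    # Concatenate the extracted text of every page into one resume string.
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    # Unlabeled resumes get a random score (the label distribution shown
    # later in this README has classes 0-9), as described in step 3.
    rows.append({"resume_text": text, "score": random.randint(0, 9)})

pd.DataFrame(rows).to_csv("data/resume_data_combined.csv", index=False)
```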

Model Building:

  1. For classification I first tried RNNs, but since the dataset was quite small, deep learning performed poorly. I then tried different ML models: random forest, a naive Bayes classifier, and random forest tuned with RandomizedSearchCV. As accuracy_score was the evaluation metric, I went with random forest because it gave the highest accuracy (see the first sketch after this list). (The dataset is not well balanced; upsampling or a class-weighted approach could be applied, and a different evaluation metric such as recall or F1 could be used, but because of time constraints I was unable to do that.) The notebook is provided under the Notebooks folder; the file name is classification model training notebook.ipynb.

  2. For custom NER I used spaCy (see the second sketch after this list). As per the spaCy docs, convolutional layers with residual connections, layer normalization, and maxout non-linearity are used, which gives much better efficiency than the standard BiLSTM solution. Source
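The first sketch outlines the classification pipeline described in step 1, assuming a resume_text column and an illustrative parameter grid; neither is taken from the repo's notebook.

```python
# Hypothetical sketch: TF-IDF features plus a random forest tuned with
# RandomizedSearchCV, evaluated with accuracy_score as described above.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

df = pd.read_csv("data/resume_data2_(used in training).csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["resume_text"], df["score"], test_size=0.2, random_state=42
)

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# class_weight="balanced" is one way to tackle the imbalance noted below.
search = RandomizedSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=42),
    param_distributions={"n_estimators": [100, 300, 500], "max_depth": [None, 10, 30]},
    n_iter=5,
    cv=3,
)
search.fit(X_train_tfidf, y_train)
print("test accuracy:", accuracy_score(y_test, search.predict(X_test_tfidf)))
```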
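The second sketch shows how a custom spaCy (v3) NER model could be trained from train_data.pkl, assuming the pickle holds (text, {"entities": [(start, end, label), ...]}) tuples; the epoch count and output path are illustrative.

```python
# Hypothetical sketch: train a blank-English spaCy NER pipeline on the
# custom-tagged resume data described above.
import pickle
import random

import spacy
from spacy.training import Example

with open("data/train_data.pkl", "rb") as f:
    train_data = pickle.load(f)  # assumed: [(text, {"entities": [...]}), ...]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for _, annotations in train_data:
    for _start, _end, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.initialize()
for epoch in range(20):
    random.shuffle(train_data)
    losses = {}
    for text, annotations in train_data:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer, losses=losses)
    print(f"epoch {epoch}: losses {losses}")

nlp.to_disk("resume_ner_model")
```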

Model Deployment:

  1. For deployment I first went with Flask, but the UI was not good, so I finally switched to Streamlit (a sketch of the Streamlit app follows this list). The Python files for both the Flask and Streamlit apps are present in the repo.
  2. As it is a Streamlit app, and as I got approval from the Streamlit team itself to use their deployment platform, I decided to deploy it there. You can see the deployed app here.
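Below is a minimal sketch of what the Streamlit front end could look like, assuming the pickled rf_score_model.pkl and tfidf_vectorizer.pkl from this repo; the widget labels and layout are illustrative, not the repo's exact code.

```python
# Hypothetical sketch: load the pickled TF-IDF vectorizer and random-forest
# model, take resume text from the user, and show the predicted score.
import pickle

import streamlit as st

@st.cache_resource
def load_models():
    with open("tfidf_vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    with open("rf_score_model.pkl", "rb") as f:
        model = pickle.load(f)
    return vectorizer, model

st.title("Resume Analyzer")
resume_text = st.text_area("Paste the resume text here")

if st.button("Score resume") and resume_text:
    vectorizer, model = load_models()
    score = model.predict(vectorizer.transform([resume_text]))[0]
    st.write(f"Predicted resume score: {score}")
```

A file like this is launched the same way as described in the Run Locally section below.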

Documentation:

This README provides the details of the project. Below, I have also provided a detailed explanation of the file structure and how you can run the application locally.

Future Improvements✊:

  1. Making the models more robust, which they are not right now, for the following reasons:

1.1. The data is not labeled correctly.

1.2. The dataset is imbalanced:

df5['score'].value_counts()
1    47
7    39
9    35
3    33
5    32
0    32
4    31
8    28
2    22
6    20
Name: score, dtype: int64

1.3. Adding more data to the dataset for both tasks.

  2. Adding a QnA-based model for an easy query-search option. It will let the user pose a query in the form of a question and extract the answer from the model's output, which will help people search for specific things in a resume (a sketch of such a model follows this list).

  3. Migrating the web app from Streamlit to Flask and adding a better UI.

  4. Containerizing the project.
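For the QnA idea in point 2, a minimal sketch using Hugging Face's question-answering pipeline is shown below; this model is not part of the repo yet, and the model choice, sample resume, and question are purely illustrative.

```python
# Hypothetical sketch: extractive question answering over a resume's text
# using the transformers question-answering pipeline.
from transformers import pipeline

qa = pipeline("question-answering")  # downloads a default SQuAD-tuned model

resume_text = "John Doe is a data analyst with 4 years of experience in Python and SQL."
result = qa(question="How many years of experience does the candidate have?",
            context=resume_text)
print(result["answer"], f"(confidence {result['score']:.2f})")
```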


NOTE: If you can implement any of the above-mentioned features, please feel free to make a PR. Also, if you have any problem understanding the above-mentioned features, feel free to create an issue.

File Structure📂:

| File/Folder Name | Usage of that file/folder |
| --- | --- |
| Notebooks | Data collection and model training are all done in the notebooks; the file names are self-explanatory |
| data | All the CSVs and the tagged data are provided here |
| data/resume_data2_(used in training).csv | Used for classification |
| data/train_data.pkl | Used for NER |
| rf_score_model.pkl / tfidf_vectorizer.pkl | Pickled classification model and TF-IDF vectorizer |
| Resume_analyzer_app.py | The Streamlit app |
| resume_app_main_flask.py | The Flask app |
| pdf scraper.py | Scrapes the PDFs |
| Dockerfile | The Dockerfile |

Note: If anything in the file structure is unclear, feel free to open an issue.

Run Locally💻:

  1. git clone https://github.com/soumya997/Resume-analyzer.git

  2. cd Resume-analyzer

  3. pip install -r requirements.txt

  4. streamlit run Resume_analyzer_app.py

Connect with me if you need any help🤝:


Twitter: @xtenzq | GitHub | Gmail

