Building a Resume Parser Using NLP (spaCy) and Machine Learning

Introduction:

A resume parser is a software tool that extracts relevant information from a resume and saves it in a structured format. It is a crucial tool for recruiters and hiring managers to efficiently screen job applications. In this project, we will use Natural Language Processing (NLP) and Machine Learning techniques to build a resume parser.

In [2]:
import spacy
import pickle
import random

import pandas as pd

In [3]:
# load the annotated resumes; the context manager closes the file afterwards
with open('train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)

In [4]:
train_data[0]

('Govardhana K Senior Software Engineer  Bengaluru, Karnataka, Karnataka - Email me on Indeed: indeed.com/r/Govardhana-K/ b2de315d95905b68  Total IT experience 5 Years 6 Months Cloud Lending Solutions INC 4 Month • Salesforce Developer Oracle 5 Years 2 Month • Core Java Developer Languages Core Java, Go Lang Oracle PL-SQL programming, Sales Force Developer with APEX.  Designations & Promotions  Willing to relocate: Anywhere  WORK EXPERIENCE  Senior Software Engineer  Cloud Lending Solutions -  Bangalore, Karnataka -  January 2018 to Present  Present  Senior Consultant  Oracle -  Bangalore, Karnataka -  November 2016 to December 2017  Staff Consultant  Oracle -  Bangalore, Karnataka -  January 2014 to October 2016  Associate Consultant  Oracle -  Bangalore, Karnataka -  November 2012 to December 2013  EDUCATION  B.E in Computer Science Engineering  Adithya Institute of Technology -  Tamil Nadu  September 2008 to June 2012  https://www.indeed.com/r/Govardhana-K/b2de315d95905b68?isid=rex-
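Each training example is a (text, annotations) pair, and the training loop reads annotations['entities'] as a list of (start, end, label) character offsets. The following is a minimal sketch for sanity-checking those offsets before training, since spans that overlap or fall outside the text are a common reason for a training update to fail. The helper name find_bad_spans and the toy example are hypothetical and are not taken from train_data.pkl.

```python
def find_bad_spans(text, entities):
    """Return (start, end, label, reason) tuples for spans that look invalid."""
    problems = []
    last_end = -1
    # Walk the spans in document order so overlaps are easy to detect.
    for start, end, label in sorted(entities, key=lambda e: (e[0], e[1])):
        if not (0 <= start < end <= len(text)):
            problems.append((start, end, label, 'out of range'))
            continue
        if start < last_end:
            problems.append((start, end, label, 'overlaps previous span'))
        if text[start:end] != text[start:end].strip():
            problems.append((start, end, label, 'leading/trailing whitespace'))
        last_end = max(last_end, end)
    return problems


# Hypothetical toy example: the last span overlaps the one before it.
sample_text = 'Jane Doe Data Engineer Pune'
sample_entities = [(0, 8, 'Name'), (9, 22, 'Designation'), (18, 27, 'Location')]
print(find_bad_spans(sample_text, sample_entities))
```

Running a check like this over every item in train_data gives a rough idea of how many examples may be rejected during training.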

In [11]:
nlp = spacy.blank('en')

def train_model(train_data):
    # NOTE: this cell uses the spaCy v2 training API (create_pipe,
    # begin_training, update with raw texts); spaCy v3 changed these calls.

    # add the NER component, or reuse it if the pipeline already has one
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        ner = nlp.get_pipe('ner')

    # register every entity label found in the annotations
    for _, annotation in train_data:
        for ent in annotation['entities']:
            ner.add_label(ent[2])

    # other pipes to be disabled during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']

    with nlp.disable_pipes(*other_pipes):  # only train ner
        optimizer = nlp.begin_training()

        for itn in range(10):
            print(f'Starting iteration {itn}')
            random.shuffle(train_data)  # shuffles the list in place

            losses = {}

            for text, annotations in train_data:
                try:
                    nlp.update(
                        [text],
                        [annotations],
                        drop=0.2,
                        sgd=optimizer,
                        losses=losses
                    )
                except Exception:
                    # examples that raise during the update are skipped silently
                    pass

            print(losses)

In [12]:
train_model(train_data)

Starting iteration 0
{'ner': 10820.45452731679}
Starting iteration 1
{'ner': 9360.449801606816}
Starting iteration 2
{'ner': 9704.85435980835}
Starting iteration 3
{'ner': 7341.786377321575}
Starting iteration 4
{'ner': 6760.0401792262355}
Starting iteration 5
{'ner': 5925.163263495535}
Starting iteration 6
{'ner': 5480.883594583241}
Starting iteration 7
{'ner': 5564.472496991685}
Starting iteration 8
{'ner': 4694.496345417583}
Starting iteration 9
{'ner': 4777.074484604174}


In [13]:
# save model for future use
nlp.to_disk('nlp_resume_model')

In [14]:
# load model from trained model
nlp_resume_model = spacy.load('nlp_resume_model')

In [15]:
# train_data was shuffled in place during training, so index 0 is now a different resume
train_data[0]

('Mohamed Ameen System engineer  Bengaluru, Karnataka - Email me on Indeed: indeed.com/r/Mohamed-Ameen/ ba052bfa70e4c0b7  I am looking for a job opportunity as a System Engineer that gives me professional growth and excellence and enables me to contribute my efforts in the success of the organization.  WORK EXPERIENCE  IT Operations Analyst  Accenture  I am looking for a job as system engineer that gives me professional growth and excellence and enables me to contribute my efforts in the success of the organization.  technical support engineer  Convergys for Microsoft -  November 2014 to November 2015  Currently working with Accenture as a Subject Matter Expert for the Remote Technology Support team in IT Operations.  EDUCATION  B.E in Electronics & Communication  Visveswaraiah Technological University -  Bengaluru, Karnataka  2013  Electronics Project  Al-Ameen PU College  Rajiv Gandhi Institute of Technology  SKILLS  Active Directory (2 years), Microsoft office, Windows,End user comp

In [16]:
doc = nlp_resume_model(train_data[0][0])

for ent in doc.ents:
    print(f'{ent.label_.upper():{30}} - {ent.text}')

NAME                           - Mohamed Ameen
DESIGNATION                    - System engineer
LOCATION                       - Bengaluru
EMAIL ADDRESS                  - indeed.com/r/Mohamed-Ameen/ ba052bfa70e4c0b7
DESIGNATION                    - IT Operations Analyst
COMPANIES WORKED AT            - Accenture
COMPANIES WORKED AT            - Accenture
DEGREE                         - B.E in Electronics & Communication
COLLEGE NAME                   - Visveswaraiah Technological University
LOCATION                       - Bengaluru
COLLEGE NAME                   - Rajiv Gandhi Institute of Technology
SKILLS                         - Active Directory (2 years), Microsoft office, Windows,End user computing (3 years)
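To store the result in a structured format, the printed (label, text) pairs can be grouped into a dictionary keyed by label. The following is a minimal sketch under that assumption; entities_to_record is a hypothetical helper name, and the sample pairs are copied from the output above rather than produced by the model here.

```python
from collections import defaultdict


def entities_to_record(pairs):
    """Group (label, text) pairs into {label: [unique texts in first-seen order]}."""
    record = defaultdict(list)
    for label, text in pairs:
        value = ' '.join(text.split())  # collapse runs of whitespace
        if value and value not in record[label]:
            record[label].append(value)
    return dict(record)


# With a spaCy doc this could be called as:
#   entities_to_record((ent.label_.upper(), ent.text) for ent in doc.ents)
pairs = [
    ('NAME', 'Mohamed Ameen'),
    ('COMPANIES WORKED AT', 'Accenture'),
    ('COMPANIES WORKED AT', 'Accenture'),
    ('LOCATION', 'Bengaluru'),
]
record = entities_to_record(pairs)
print(record)
```

Duplicates such as the repeated company name are kept only once, and a record like this can be passed to pd.DataFrame or json.dumps for storage.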


In [17]:
!pip install PyMuPDF

Collecting PyMuPDF
  Downloading PyMuPDF-1.17.4-cp37-cp37m-win_amd64.whl (5.1 MB)
Installing collected packages: PyMuPDF
Successfully installed PyMuPDF-1.17.4


In [20]:
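import fitz  # PyMuPDF

fname = 'Alice Clark CV.pdf'

doc = fitz.open(fname)
text = ''

# getText() is the PyMuPDF 1.17 name; newer releases call it get_text()
for page in doc:
    text = text + str(page.getText())

# replace newlines with spaces once all pages are collected
text = ' '.join(text.split('\n'))

print(text)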

Alice Clark  AI / Machine Learning    Delhi, India Email me on Indeed  •  20+ years of experience in data handling, design, and development  •  Data Warehouse: Data analysis, star/snow flake scema data modelling and design specific to  data warehousing and business intelligence  •  Database: Experience in database designing, scalability, back-up and recovery, writing and  optimizing SQL code and Stored Procedures, creating functions, views, triggers and indexes.  Cloud platform: Worked on Microsoft Azure cloud services like Document DB, SQL Azure,  Stream Analytics, Event hub, Power BI, Web Job, Web App, Power BI, Azure data lake  analytics(U-SQL)  Willing to relocate anywhere    WORK EXPERIENCE  Software Engineer  Microsoft – Bangalore, Karnataka  January 2000 to Present  1. Microsoft Rewards Live dashboards:  Description: - Microsoft rewards is loyalty program that rewards Users for browsing and shopping  online. Microsoft Rewards members can earn points when searching with Bing, bro
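Because the same extraction steps apply to any PDF, the text-joining part can be factored into a small function. The following is a minimal sketch; join_pages is a hypothetical name, and it takes plain strings so it can be tried without a PDF at hand.

```python
def join_pages(page_texts):
    """Concatenate per-page strings, then replace newlines with spaces."""
    combined = ''.join(page_texts)
    return ' '.join(combined.split('\n'))


# With PyMuPDF 1.17 this could be called as (assumes fitz is imported):
#   text = join_pages(page.getText() for page in fitz.open(fname))
demo_pages = ['Alice Clark\nAI / Machine Learning\n', 'WORK EXPERIENCE\n']
print(join_pages(demo_pages))
```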

In [21]:
doc = nlp_resume_model(text)

for ent in doc.ents:
    print(f'{ent.label_.upper():{30}} - {ent.text}')

NAME                           - Alice Clark
LOCATION                       - Delhi
DESIGNATION                    - Software Engineer
COMPANIES WORKED AT            - Microsoft – Bangalore, Karnataka
COMPANIES WORKED AT            - Microsoft
COMPANIES WORKED AT            - Microsoft
COMPANIES WORKED AT            - Microsoft
COMPANIES WORKED AT            - Microsoft
COLLEGE NAME                   - Indian Institute of Technology – Mumbai
SKILLS                         - Machine Learning, Natural Language Processing, and Big Data Handling    ADDITIONAL INFORMATION  Professional Skills  • Excellent analytical, problem solving, communication, knowledge transfer and interpersonal  skills with ability to interact with individuals at all the levels  • Quick learner and maintains cordial relationship with project manager and team members and  good performer both in team and independent job environments  • Positive attitude towards superiors & peers  • Supervised junior developers thro

In [22]:
fname = 'bbanjara.pdf'

doc = fitz.open(fname)
text = ''

for page in doc:
    text = text + str(page.getText())

# replace newlines with spaces once all pages are collected
text = ' '.join(text.split('\n'))

print(text)

Bishorup Banjara    banjara.bishorup@gmail.com      715-791-0828  Summary    • Around 6+ years of IT experience in software design, analysis, development, testing and  implementation of secure n-tier client/server web-based applications using .NET  Framework in various sectors.  • Design patterns: Singleton Design Pattern, Factory Pattern, Repository Pattern  • Extensive experience with Microsoft .Net Technologies (.NET Framework, MS VS.NET,  ADO.NET, Entity Framework, ASP.NET, MVC, API, VB.NET, C#.NET, SQL SERVER, WCF,  WWF and WPF) and experience working with INFRAGISTICS and third party tools.   • Proficient in development of Web & Windows based Applications; have good experience  working with multithreaded applications and also proficient with migrating from ASP.NET  3.0 to 4.0 and 4.0-5.0 frameworks and upgrading from VS 2012, 2013, 2015.   • Excellent experience in developing Web applications using WCF, AJAX, Telerik User  Control, JavaScript, XML, HTML, CSS, IIS and Web Services

In [23]:
doc = nlp_resume_model(text)

for ent in doc.ents:
    print(f'{ent.label_.upper():{30}} - {ent.text}')

NAME                           - Bishorup Banjara
COMPANIES WORKED AT            - Microsoft
DEGREE                         - Programming
COLLEGE NAME                   - Languages
SKILLS                         - C#, Python, NodeJs, VB.NET, Angular, React, Javascript,  HTML, CSS, jQuery  Cloud Computing
COLLEGE NAME                   - Supervised Learning, Unsupervised Learning
COLLEGE NAME                   - Vector Decomposition (SVD), XGBoost, Adaboost
DEGREE                         - Bachelor of Science, Major: Physics, Minor
LOCATION                       - Montvale
