## Resume to Job Description Comparison

_Author: Olalekan Fagbuyi_

Ever seen an online job posting and wondered whether you are a fit based on the details in your CV (resume)? Text processing libraries such as docx2txt, together with scikit-learn tools like `CountVectorizer` and cosine similarity, let candidates measure how similar their CV is to a company's job posting. This skill could help optimize the job search process by letting candidates focus on the postings that best fit their current level of experience.

It is also worth noting that many companies use Applicant Tracking Systems (ATS), software built on similar text-matching techniques, to search for qualified candidates in a large pool of applicants.

### Table of Contents

1. Importing Libraries
2. Reading Documents (CV and Job Posting)
3. Applying Count Vectorization and Calculating Cosine Similarity
4. Keyword Matching

### 1. Importing Libraries

This project uses the text processing library **docx2txt** to read text from Word (.docx) documents. `CountVectorizer` and `cosine_similarity` are imported from scikit-learn to compare the documents, and `os` is used to list the files in the project folder.

```python
# import libraries
import docx2txt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import os
```

```python
# locate the data: 2 documents are used here, the 1st is a resume (CV) and the 2nd a job posting
# list all files in the project folder, skipping hidden files
path = "C:\\Users\\ofagb\\Job to CV"
files = [file for file in os.listdir(path) if not file.startswith('.')]
for file in files:
    print(file)
```

```
Ola_CV.docx
Walmart_Job_Posting.docx
```


### 2. Reading the CV and Job Posting Documents

```python
# read in the CV file
CV = docx2txt.process("C:\\Users\\ofagb\\Ola_CV.docx")
```

```python
# print the CV content
print(CV)
```

```python
# read in the job posting file
Job_Posting = docx2txt.process("C:\\Users\\ofagb\\Walmart_Job_Posting.docx")
```

```python
# print the job posting
print(Job_Posting)
```

```
Data Scientist

Walmart

Mississauga, ON, Canada

Full–time

Position Summary...

The Data Scientist represents the foundation of a data driven culture that has made Wal-Mart a leader in the retail industry. Individual in this role will use a wealth of ecommerce, physical store and external data available to understand customer behaviour improve operational efficiencies and provide actionable insights to stakeholders so they can make smarter, data-driven decisions.

The successful candidate will have an established background in solving complex problems, a strong technical ability, excellent project management skills, great communication skills, and the motivation to achieve results in a fast-paced environment.

Our team is small, creative, diligent, highly entrepreneurial and business-focused. What you'll do...

• Lead the data science development process with hands-on analysis and modeling, drawing from multiple of analytical methods to choose the
```

### 3. Applying Count Vectorization and Calculating Cosine Similarity

```python
# create a list containing both the CV and the job posting
Job_CV = [CV, Job_Posting]
```

### Count Vectorization

`CountVectorizer` from scikit-learn converts a collection of texts into a matrix of word counts. It first learns a vocabulary from all the texts; each row of the resulting matrix then represents one text, each column one word from that vocabulary, and each cell the number of times that word occurs in that text. This is helpful when several texts need to be turned into numerical vectors that share the same vocabulary, so they can be compared or used in further text analysis.

```python
# convert both documents into word-count vectors over a shared vocabulary
Count_Vect = CountVectorizer()
Count_Matrix = Count_Vect.fit_transform(Job_CV)
```

### Cosine Similarity

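Cosine similarity measures how alike two vectors are. It computes the cosine of the angle between them, cos θ = (A · B) / (‖A‖ ‖B‖), which shows whether they point in the same direction. The technique is widely used to measure document similarity in text analysis. Because word counts are never negative, the score for count vectors ranges from 0 (no words in common) to 1 (identical word proportions). There is no universal cut-off: a score above 0.5 is often treated as a fairly strong match, but that is a rule of thumb rather than a standard.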
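As a check on the formula, the sketch below computes cosine similarity by hand with NumPy and compares it with scikit-learn's `cosine_similarity`. The two count vectors are made-up numbers over a shared 5-word vocabulary, standing in for two rows of a count matrix.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# two made-up word-count vectors over the same 5-word vocabulary
a = np.array([1, 1, 2, 0, 1])
b = np.array([0, 0, 1, 1, 1])

# cos(theta) = (a . b) / (||a|| * ||b||)
manual = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# scikit-learn expects 2-D inputs with one row per document
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0][0]

print(round(manual, 4), round(library, 4))  # both equal 3 / sqrt(21), about 0.6547
```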

```python
# print the cosine similarity scores
print("\nSimilarity Scores:")
print(cosine_similarity(Count_Matrix))
```

```
Similarity Scores:
[[1.         0.64196146]
 [0.64196146 1.        ]]
```


```python
# convert the CV-vs-posting score to a percentage
# [0][1] is the off-diagonal entry: row 0 (CV) compared with column 1 (job posting)
Similarity_Percent = cosine_similarity(Count_Matrix)[0][1] * 100
Similarity_Percent = round(Similarity_Percent, 2)  # round the score to 2 decimal places
print("Your CV matches about " + str(Similarity_Percent) + "% of the job posting.")
```

```
Your CV matches about 64.2% of the job posting.
```
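Since the goal is to focus on the best-fitting postings, the same two steps can be wrapped in a helper that scores one CV against several postings and sorts them. This is a minimal sketch: `rank_postings` and the sample texts are hypothetical, and it uses the same default `CountVectorizer` settings as the cells above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_postings(cv_text, postings):
    """Hypothetical helper: return (name, percent) pairs, best match first.

    postings maps a posting name to its full text.
    """
    names = list(postings)
    texts = [cv_text] + [postings[name] for name in names]
    # fit one vocabulary over the CV and all postings so the vectors line up
    matrix = CountVectorizer().fit_transform(texts)
    # row 0 is the CV; compare it with every posting row at once
    scores = cosine_similarity(matrix[0], matrix[1:])[0]
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [(name, round(score * 100, 2)) for name, score in ranked]


# made-up sample texts
cv_text = "python sql data analysis tableau"
postings = {
    "Analyst role": "data analysis with sql and tableau",
    "Chef role": "cook pasta and pizza in a busy kitchen",
}
ranking = rank_postings(cv_text, postings)
print(ranking)  # the analyst posting ranks first; the chef posting shares no words and scores 0.0
```

Fitting a single vectorizer over the CV and every posting keeps all vectors in the same vocabulary, which is what makes the scores comparable across postings.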


Conclusion: Guess I will be applying for this one :)

### 4. Keyword Matching

This section checks whether keywords from the job posting that could strengthen the match are missing from the CV. Word frequencies in each document are counted with Python's `collections.Counter` and a regular expression from the `re` module.

```python
from collections import Counter
import re
import pandas as pd
```

```python
# count the 30 most frequent words in the CV
CV = docx2txt.process("C:\\Users\\ofagb\\Ola_CV.docx")
counts = Counter(re.findall(r'\w+', CV)).most_common(30)
print(counts)
```

```
[('to', 20), ('and', 19), ('of', 14), ('in', 10), ('the', 8), ('by', 8), ('on', 6), ('Lagos', 5), ('com', 5), ('Business', 5), ('2022', 5), ('Data', 5), ('with', 5), ('s', 5), ('Pizza', 5), ('Hut', 5), ('Nigeria', 4), ('Tableau', 4), ('R', 4), ('using', 4), ('sales', 4), ('1', 3), ('Science', 3), ('Analytics', 3), ('University', 3), ('Focus', 3), ('Supply', 3), ('Chain', 3), ('Cape', 3), ('Town', 3)]
```


```python
# count the 30 most frequent words in the job posting
Job_Posting = docx2txt.process("C:\\Users\\ofagb\\Walmart_Job_Posting.docx")
counts2 = Counter(re.findall(r'\w+', Job_Posting)).most_common(30)
print(counts2)
```

```
[('and', 32), ('the', 19), ('to', 15), ('data', 14), ('a', 11), ('in', 10), ('of', 8), ('for', 8), ('as', 8), ('business', 7), ('are', 6), ('an', 5), ('science', 5), ('with', 5), ('or', 5), ('analysis', 4), ('into', 4), ('be', 4), ('by', 4), ('qualifications', 4), ('Data', 3), ('Walmart', 3), ('Canada', 3), ('on', 3), ('this', 3), ('role', 3), ('will', 3), ('skills', 3), ('results', 3), ('challenges', 3)]
```
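The raw counts above are dominated by common words such as "to", "and" and "the", and they are case-sensitive (the posting's "data" and "Data" are counted separately). Below is a minimal sketch of the gap check described at the start of this section, under these assumptions: text is lowercased, scikit-learn's built-in `ENGLISH_STOP_WORDS` list filters out common words, digits are ignored, and `keyword_counts`, `missing_keywords` and the sample texts are hypothetical.

```python
import re
from collections import Counter
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS


def keyword_counts(text):
    """Hypothetical helper: count lowercase words, dropping stop words and 1-letter tokens."""
    words = re.findall(r"[a-z]+", text.lower())  # letters only, so digits are ignored
    return Counter(w for w in words if len(w) > 1 and w not in ENGLISH_STOP_WORDS)


def missing_keywords(cv_text, job_text, top_n=20):
    """Hypothetical helper: most frequent posting keywords that never appear in the CV."""
    cv_words = set(keyword_counts(cv_text))
    job_counts = keyword_counts(job_text)
    gaps = [(w, c) for w, c in job_counts.most_common() if w not in cv_words]
    return gaps[:top_n]


# made-up sample texts, not the real CV or Walmart posting
cv_text = "Data analyst skilled in Tableau, R and sales forecasting."
job_text = "Data Scientist: data science, business analysis, Python and SQL. Business results."

gaps = missing_keywords(cv_text, job_text)
print(gaps)  # 'business' (2 mentions) comes first; 'data' is absent because the CV already has it
```

With the real documents, `missing_keywords(CV, Job_Posting)` would list posting terms the CV never mentions; whether each one belongs in the CV is still a human judgment.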
