In this part we compute sentence-embedding similarity: we compare the resumes against job descriptions and general requirements, as the second part of the screening process.

We use a pretrained sentence-embedding model that scores well on embedding tasks and captures context well.
We considered training our own model, but that would require data we don't have: specifically, a dataset of resumes and job descriptions, each pair labeled as a match or not.
With such data, fine-tuning a BERT model could yield good results.
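
The comparison in this part rests on cosine similarity between embedding vectors. As a minimal sketch, assuming made-up 3-dimensional vectors (real embeddings come from the model and are much longer), the score could be computed with NumPy as below. The function `cosine_similarity` and the toy vectors are illustrative names, not part of the notebook.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors standing in for embeddings (assumption: not real model output)
resume_vec = np.array([1.0, 2.0, 0.0])
job_vec = np.array([2.0, 4.0, 0.0])        # same direction as resume_vec
unrelated_vec = np.array([0.0, 0.0, 3.0])  # orthogonal to resume_vec

similar_score = cosine_similarity(resume_vec, job_vec)          # close to 1.0
unrelated_score = cosine_similarity(resume_vec, unrelated_vec)  # close to 0.0
print(similar_score, unrelated_score)
```

The score depends only on the direction of the vectors, not their length, which is why a long resume and a one-sentence job description can still be compared.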

In [6]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer, util



In [7]:
# Loading the pickled dataset
import pickle
with open('Data/Dataframes/newDF.pkl', 'rb') as f:
    df = pickle.load(f)

In [9]:
trainingDF = df.drop(columns=['ID', 'Label', 'TextLen'])
print(trainingDF.head())

                                              Resume  Software_Developer  \
0  Database AdministratorDatabase AdministratorDa...                   0   
1  Database AdministratorSQL Microsoft PowerPoint...                   0   
2  Oracle Database AdministratorOracle Database A...                   0   
3  Amazon Redshift Administrator ETL Developer Bu...                   0   
4  Scrum MasterOracle Database Administrator Scru...                   0   

   Database_Administrator  Systems_Administrator  Project_manager  \
0                       1                      0                0   
1                       1                      0                0   
2                       1                      0                0   
3                       1                      0                0   
4                       1                      0                0   

   Web_Developer  Network_Administrator  Security_Analyst  Python_Developer  \
0              0                      0          

In [12]:
# Load a pretrained sentence-embedding model (all-MiniLM-L6-v2) to compute the embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')




In [11]:
job_descriptions = {
    "Software_Developer": "Looking for a software developer with strong coding and problem-solving skills.",
    "Database_Administrator": "Seeking an expert in database design, administration, and SQL performance tuning.",
    "Systems_Administrator": "Responsible for managing IT systems, servers, and network infrastructure.",
    "Project_manager": "Oversee project timelines, manage teams, and deliver solutions on time and budget.",
    "Web_Developer": "Experienced in building web applications using HTML, CSS, JavaScript, and frameworks.",
    "Network_Administrator": "Manage and secure enterprise-level network environments and firewalls.",
    "Security_Analyst": "Analyze system vulnerabilities and ensure cybersecurity best practices.",
    "Python_Developer": "Skilled in Python development for web, data, or automation projects.",
    "Java_Developer": "Experienced in Java development, frameworks, and object-oriented design.",
    "Front_End_Developer": "Focus on user interface design and front-end development using modern tools."
}


In [13]:
# Create embeddings for job descriptions
job_embs = {label: model.encode(desc, convert_to_tensor=True) for label, desc in job_descriptions.items()}

# Score each resume against every job description with cosine similarity
results = []

for resume_text in df["Resume"]:
    # Embed the full resume text as a single vector
    resume_emb = model.encode(resume_text, convert_to_tensor=True)

    # Cosine similarity to each job description, rounded to 3 decimals
    match_scores = {
        label: round(util.cos_sim(resume_emb, job_emb).item(), 3)
        for label, job_emb in job_embs.items()
    }

    results.append({"Resume": resume_text, **match_scores})

similarity_df = pd.DataFrame(results)
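
One possible next step is to reduce each row of scores to a single best-matching job label. The sketch below assumes a frame shaped like `similarity_df` (one text column plus one score column per label) but uses a small made-up frame so it runs on its own. The helper `add_best_match`, the column names `BestMatch` and `BestScore`, and the toy scores are hypothetical, not results from the notebook.

```python
import pandas as pd


def add_best_match(sim_df, text_col="Resume"):
    """Return a copy with the highest-scoring label and its score per row."""
    score_cols = [c for c in sim_df.columns if c != text_col]
    out = sim_df.copy()
    out["BestMatch"] = sim_df[score_cols].idxmax(axis=1)  # label with max score
    out["BestScore"] = sim_df[score_cols].max(axis=1)     # the max score itself
    return out


# Made-up scores for illustration only
toy_similarity = pd.DataFrame({
    "Resume": ["resume A", "resume B"],
    "Software_Developer": [0.12, 0.41],
    "Database_Administrator": [0.55, 0.20],
})

ranked = add_best_match(toy_similarity)
print(ranked[["Resume", "BestMatch", "BestScore"]])
```

Applied to the real frame, this would be `add_best_match(similarity_df)`. The resulting `BestMatch` column could then be compared against the one-hot label columns in `df` to check how often the top-scoring description agrees with the labeled role.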