<a href="https://colab.research.google.com/github/pdrobny/Potential_Talents/blob/main/P3_huggingface_rev1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Potential Talent






## Background

#### As a talent sourcing and management company, we find talented individuals and source them to technology companies. Finding talented candidates is not easy, for several reasons. First, filling a role requires understanding it very well, which means understanding the client’s needs and what they are looking for in a potential candidate. Second, one needs to understand what makes a candidate shine for the role in question. Third, knowing where to find talented individuals is a challenge in its own right.

#### Our work requires a lot of human labor and is full of manual operations. To automate this process, we want to build a better approach that saves time and helps us spot potential candidates who could fit the roles we are trying to fill. Beyond that, for a specific role, we are interested in developing a machine-learning-powered pipeline that spots talented individuals and ranks them by fitness.

#### We currently source a few candidates semi-automatically, so sourcing is not a concern at this time. Instead, we first want to determine the best-matching candidates based on how fit they are for a given role. We generally search using keywords such as “full-stack software engineer”, “engineering manager”, or “aspiring human resources”, depending on the role we are trying to fill. These keywords might change, and you can expect that specific keywords will be provided to you.

#### Once we have listed and ranked fitting candidates, we apply a review procedure: each candidate is inspected manually to determine how good a fit they are. At the end of this manual review, we might choose not the first candidate in the list but, say, the 7th. If that happens, we want to re-rank the previous list based on this information. This supervisory signal is supplied by starring the 7th candidate in the list. Starring a candidate sets that candidate as an ideal candidate for the given role. We then expect the list to be re-ranked each time a candidate is starred.

## Goals
#### - Predict how fit the candidate is based on their available information (variable fit)
#### - Rank candidates based on a fitness score.
#### - Re-rank candidates when a candidate is starred.
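
The re-ranking goal can be sketched in a few lines. This is a minimal illustration, not the method used later in this notebook: it assumes each candidate already has an embedding vector, and it blends the query vector with the mean of the starred candidates' vectors before scoring. The function name `rerank_with_star`, the `alpha` weight, and the toy 2-D vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank_with_star(embeddings, query_vec, starred_ids, alpha=0.5):
    """Rank candidate ids against a blend of the query and the starred candidates.

    embeddings: dict mapping candidate id -> embedding (list of floats)
    alpha: weight kept on the original query (hypothetical tuning knob);
           alpha=0.0 ranks purely by similarity to the starred candidates.
    """
    target = list(query_vec)
    if starred_ids:
        dim = len(query_vec)
        # Mean embedding of all starred candidates
        mean_star = [
            sum(embeddings[i][d] for i in starred_ids) / len(starred_ids)
            for d in range(dim)
        ]
        target = [alpha * q + (1 - alpha) * s for q, s in zip(query_vec, mean_star)]
    scores = {cid: cosine(vec, target) for cid, vec in embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy 2-D vectors standing in for real sentence embeddings (illustrative only)
toy = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
query = [1.0, 0.1]

before = rerank_with_star(toy, query, starred_ids=[])    # no star yet
after = rerank_with_star(toy, query, starred_ids=["c"])  # candidate "c" starred
print(before, after)
```

With these toy vectors, starring `c` moves it up from last place, and candidates close to both the query and `c` benefit as well. How much weight the star should carry relative to the original keywords is a design choice this sketch leaves open through `alpha`.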

## Setup

In [None]:
!pip install transformers sentence-transformers torch



In [None]:
# import libraries
import pandas as pd
import numpy as np
import warnings
import logging
import random
import requests
import sys
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
warnings.filterwarnings('ignore', category=UserWarning)

# Confirm the installed PyTorch version
print(torch.__version__)

2.6.0+cu124


# Data prep

In [None]:
df = pd.read_csv('talents.csv')
df

Unnamed: 0,id,title,sentence_bert_cossim
0,1,innovative and driven professional seeking a r...,1.000000
1,431,aspiring data science professional focused on ...,0.769162
2,544,data analyst data scientist business analyst d...,0.768222
3,833,data analyst turning complex data into actiona...,0.745245
4,199,ms in information systems northeastern univers...,0.727268
...,...,...,...
1260,648,research specialist university of rochester di...,0.079923
1261,730,medical biller at brick pediatric group,0.072848
1262,990,ingeniero elctrico,0.067254
1263,296,company owner at armstrong cleans carpets,0.056890


# Hugging Face

## Prompting

In [None]:

# Build a prompt around the target title for semantic matching
target_title = "Data Analyst"
prompt = f"Find job titles that closely match '{target_title}' based on meaning and relevance."

# Load the sentence-embedding model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Drop rows without an id or title; copy so the column assignment below
# does not trigger pandas' SettingWithCopyWarning
df = df.dropna(subset=["id", "title"]).copy()

# Encode the prompt and the job titles, then score each title by cosine similarity
prompt_embedding = model.encode(prompt)
title_embeddings = model.encode(df["title"].tolist())
df["similarity"] = util.cos_sim(prompt_embedding, title_embeddings)[0].tolist()

# Rank results by similarity (1 = best match)
df = df.sort_values(by="similarity", ascending=False).reset_index(drop=True)
df["rank"] = df.index + 1  # Assign ranking based on sorted order

# Display the ranked table
df

Unnamed: 0,id,title,sentence_bert_cossim,similarity,rank
0,846,data analyst,0.599867,0.577674,1
1,900,data analyst,0.599867,0.577674,2
2,1094,data analyst,0.599867,0.577674,3
3,1194,data analyst,0.599867,0.577674,4
4,785,data analyst,0.599867,0.577674,5
...,...,...,...,...,...
1260,291,mathematical modeling,0.193771,0.026669,1261
1261,1145,aiml engineer at bootloader studio,0.231320,0.020871,1262
1262,990,ingeniero elctrico,0.067254,0.014392,1263
1263,707,reefpoint group,0.095684,0.013836,1264
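
In the output above, five identical `data analyst` titles share the same similarity (0.577674) yet receive ranks 1 through 5, because `rank` is taken from the sorted row position. If tied candidates should share a rank, a dense ranking is one option. The helper below is a hypothetical sketch in plain Python; the scores are copied from the table above.

```python
def dense_rank(scores, ndigits=6):
    """Return 1-based dense ranks: equal scores (after rounding) share a rank.

    ndigits is an assumed rounding precision used to treat near-identical
    floating-point scores as ties.
    """
    rounded = [round(s, ndigits) for s in scores]
    distinct = sorted(set(rounded), reverse=True)  # highest score first
    rank_of = {score: r for r, score in enumerate(distinct, start=1)}
    return [rank_of[s] for s in rounded]

# Three tied top scores and the two lowest scores from the table above
scores = [0.577674, 0.577674, 0.577674, 0.026669, 0.020871]
ranks = dense_rank(scores)
print(ranks)
```

On a DataFrame, `df["similarity"].rank(method="dense", ascending=False)` should give the same kind of ranking, returned as floats.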


In [None]:

target_title = input("Enter the target job title: ")
# Build the prompt from the user-supplied title
prompt = f"Find job titles that closely match '{target_title}' based on meaning and relevance. Add ranking."

# Load model (already in memory if the previous cell was run)
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Drop rows without an id or title; copy to avoid SettingWithCopyWarning
df = df.dropna(subset=["id", "title"]).copy()

# Encode the prompt and the job titles, then score each title by cosine similarity
prompt_embedding = model.encode(prompt)
title_embeddings = model.encode(df["title"].tolist())
df["similarity"] = util.cos_sim(prompt_embedding, title_embeddings)[0].tolist()

# Rank results by similarity (1 = best match)
df = df.sort_values(by="similarity", ascending=False).reset_index(drop=True)
df["rank"] = df.index + 1  # Assign ranking based on sorted order

# Display the ranked table
df

Enter the target job title: data scientist


Unnamed: 0,id,title,sentence_bert_cossim,similarity,rank
0,1108,data scientist,0.642221,0.582485,1
1,1065,data scientist,0.642221,0.582485,2
2,201,data scientist,0.642221,0.582485,3
3,1187,data scientist,0.642221,0.582485,4
4,1088,data scientist,0.642221,0.582485,5
...,...,...,...,...,...
1260,1111,manager investment risk at cpp investments,0.082493,0.035937,1261
1261,291,mathematical modeling,0.193771,0.028210,1262
1262,707,reefpoint group,0.095684,0.025176,1263
1263,293,ex accenture,0.285914,0.024719,1264
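
The two cells above repeat the same encode, score, and sort steps, so they could be folded into one helper. The sketch below is one possible shape, not part of the original notebook: `rank_titles` is a hypothetical name, and the encoder is passed in as a callable so the example runs without downloading a model. In the notebook, the assumption is that `model.encode` would be passed as `encode`.

```python
import math
import re

def rank_titles(titles, target_title, encode):
    """Rank titles by cosine similarity to a prompt built from target_title.

    encode: any callable mapping a list of strings to a list of vectors
            (assumed to be model.encode in the notebook).
    Returns a list of (rank, title, similarity) tuples, best match first.
    """
    prompt = (
        f"Find job titles that closely match '{target_title}' "
        "based on meaning and relevance."
    )
    vectors = encode([prompt] + list(titles))
    query, title_vecs = vectors[0], vectors[1:]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    scored = [(t, cosine(query, v)) for t, v in zip(titles, title_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, t, s) for rank, (t, s) in enumerate(scored, start=1)]

# Stand-in encoder for illustration only: word counts over a tiny vocabulary
VOCAB = ["data", "analyst", "scientist", "engineer", "manager"]

def toy_encode(texts):
    rows = []
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        rows.append([float(words.count(w)) for w in VOCAB])
    return rows

ranked = rank_titles(
    ["engineering manager", "data scientist", "data analyst"],
    "data analyst",
    toy_encode,
)
for rank, title, score in ranked:
    print(rank, title, round(score, 3))
```

Keeping the encoder as a parameter also makes it easy to try a different embedding model or prompt wording without touching the ranking logic.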
