## Potential Talent

### **Context:**

As a **talent sourcing and management company**, we are interested in **finding talented individuals** and placing them with technology companies. **Finding talented candidates is not easy**, for **several reasons**. **First**, filling a role requires understanding it very well, which means understanding the client’s needs and what they are looking for in a potential candidate. **Second**, one needs to understand what makes a candidate shine for the role being filled. **Third**, knowing where to find talented individuals is a challenge in its own right.

Our job requires a lot of human labor and is full of **manual operations**. To **automate this process**, we want to build a better approach that saves us time and helps us spot potential candidates who fit the roles we are trying to fill. Beyond that, for a specific role, we are interested in developing a machine-learning-powered pipeline that can spot talented individuals and rank them by fitness.

We currently source a few candidates semi-automatically, so sourcing is not a concern at this time. Instead, we first want to determine the best-matching candidates based on how well they fit a given role. We generally run these searches using keywords such as “full-stack software engineer”, “engineering manager”, or “aspiring human resources”, depending on the role we are trying to fill. These keywords might change, and you can expect that specific keywords will be provided to you.

Once fitting candidates are listed and ranked, we apply a review procedure: each candidate is inspected manually to determine how good a fit they are. At the end of this manual review, we might choose not the first candidate in the list but, say, the 7th. If that happens, we want to re-rank the previous list based on this information. This supervisory signal is supplied by starring the 7th candidate. Starring a candidate marks them as an ideal candidate for the given role. We then expect the list to be re-ranked each time a candidate is starred.

#### Data Description:

The data comes from our sourcing efforts. We removed any field that could directly reveal personal details and assigned each candidate a unique identifier.

#### Attributes:
**id** : unique identifier for candidate (numeric)

**job_title** : job title for candidate (text)

**location** : geographical location for candidate (text)

**connection** : number of connections the candidate has; 500+ means over 500 (text)

**Output (desired target)**:
fit - how fit the candidate is for the role? (numeric, probability between 0-1)

Keywords: “Aspiring human resources” or “seeking human resources”

#### Download Data:

https://docs.google.com/spreadsheets/d/117X6i53dKiO7w6kuA1g1TpdTlv1173h_dPlJt5cNNMU/edit?usp=sharing

#### Goal(s):

Predict how fit the candidate is based on their available information (variable fit)

#### Success Metric(s):

Rank candidates based on a fitness score.

Re-rank candidates when a candidate is starred.

#### Bonus(es):

We are interested in a robust algorithm. Tell us how your solution works and show us how your ranking improves with each starring action.

How can we filter out candidates that should not be in this list in the first place?

Can we determine a cut-off point that would work for other roles without losing high-potential candidates?

Do you have any ideas we should explore to automate this procedure further and prevent human bias?

In [1]:
# Importing Standard Libraries
import pandas as pd
import numpy as np

from sklearn.metrics.pairwise import linear_kernel
pd.options.display.max_columns = 30

## 1. Reading in and Exploring Our Data 

In [2]:
df = pd.read_csv('potential-talents - Aspiring human resources - seeking human resources.csv').set_index('id')
df.head()

id,job_title,location,connection,fit
1,2019 C.T. Bauer College of Business Graduate (...,"Houston, Texas",85,
2,Native English Teacher at EPIK (English Progra...,Kanada,500+,
3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,
4,People Development Coordinator at Ryan,"Denton, Texas",500+,
5,Advisory Board Member at Celal Bayar University,"İzmir, Türkiye",500+,


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 104 entries, 1 to 104
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   job_title   104 non-null    object 
 1   location    104 non-null    object 
 2   connection  104 non-null    object 
 3   fit         0 non-null      float64
dtypes: float64(1), object(3)
memory usage: 4.1+ KB


In [4]:
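# Map the "500+" label to 501 so the column can be converted to a number
df['connection'] = df['connection'].str.strip().replace('500+', '501')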

In [5]:
df['connection'] = pd.to_numeric(df['connection'])

In [6]:
df.job_title.value_counts()

Student at Humber College and Aspiring Human Resources Generalist                                                        7
Aspiring Human Resources Professional                                                                                    7
2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional                 7
People Development Coordinator at Ryan                                                                                   6
HR Senior Specialist                                                                                                     5
Aspiring Human Resources Specialist                                                                                      5
Native English Teacher at EPIK (English Program in Korea)                                                                5
Human Resources Coordinator at InterContinental Buckhead Atlanta                                                         4
Student at Chapm

In [7]:
df = df.drop_duplicates()

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 53 entries, 1 to 104
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   job_title   53 non-null     object 
 1   location    53 non-null     object 
 2   connection  53 non-null     int64  
 3   fit         0 non-null      float64
dtypes: float64(1), int64(1), object(2)
memory usage: 2.1+ KB


### Prepping our Text for Modelling

In [9]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Prep our Text for Modelling
vectorizer = TfidfVectorizer(stop_words='english', ngram_range = (1, 2))
docs_tfidf = vectorizer.fit_transform(df["job_title"])

In [None]:
def get_tf_idf_query_similarity(vectorizer, docs_tfidf, query):
    """
    vectorizer: fitted TfidfVectorizer
    docs_tfidf: tfidf vectors for all docs
    query: query doc

    return: cosine similarity between query and all docs
    """
    query_tfidf = vectorizer.transform([query])
    cos_sim = cosine_similarity(query_tfidf, docs_tfidf).flatten()
    
    return cos_sim

In [10]:
query = 'Aspiring human resources'

cos_sim = get_tf_idf_query_similarity(vectorizer, docs_tfidf, query = query)

df['fit'] = cos_sim

In [12]:
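def top_candidates(n, by='fit', ascending=False, min_con=0, location=None):
    # Keep candidates with at least min_con connections
    mask = df.connection >= min_con
    # Filter by location only when one is given (None keeps every location)
    if location is not None:
        mask &= df.location == location

    return df.loc[mask].sort_values(by=by, ascending=ascending).head(n).copy()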

In [13]:
top_candidates(n = 10, by = 'fit', ascending = False, min_con = 0)

id,job_title,location,connection,fit
3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,0.735855
97,Aspiring Human Resources Professional,"Kokomo, Indiana Area",71,0.735855
6,Aspiring Human Resources Specialist,Greater New York City Area,1,0.632697
73,"Aspiring Human Resources Manager, seeking inte...","Houston, Texas Area",7,0.50888
72,Business Management Major and Aspiring Human R...,"Monroe, Louisiana Area",5,0.38759
27,Aspiring Human Resources Management student se...,"Houston, Texas Area",501,0.374733
66,Experienced Retail Manager and aspiring Human ...,"Austin, Texas Area",57,0.373847
7,Student at Humber College and Aspiring Human R...,Kanada,61,0.358949
74,Human Resources Professional,Greater Boston Area,16,0.340769
79,Liberal Arts Major. Aspiring Human Resources A...,"Baton Rouge, Louisiana Area",7,0.336485


In [14]:
top_candidates(n = 10, by = 'fit', ascending = False, min_con = 90)

id,job_title,location,connection,fit
27,Aspiring Human Resources Management student se...,"Houston, Texas Area",501,0.374733
82,Aspiring Human Resources Professional | An ene...,"Austin, Texas Area",174,0.31642
100,Aspiring Human Resources Manager | Graduating ...,"Cape Girardeau, Missouri",103,0.308829
76,Aspiring Human Resources Professional | Passio...,"New York, New York",212,0.246772
28,Seeking Human Resources Opportunities,"Chicago, Illinois",390,0.220668
101,Human Resources Generalist at Loparex,"Raleigh-Durham, North Carolina Area",501,0.196509
78,Human Resources Generalist at Schwan's,Amerika Birleşik Devletleri,501,0.196509
71,"Human Resources Generalist at ScottMadden, Inc.","Raleigh-Durham, North Carolina Area",501,0.196509
68,Human Resources Specialist at Luxottica,Greater New York City Area,501,0.189503
89,Director Human Resources at EY,Greater Atlanta Area,349,0.187433


In [15]:
top_candidates(n = 50, by = 'fit', ascending = False, location = 'Austin, Texas Area')

id,job_title,location,connection,fit
66,Experienced Retail Manager and aspiring Human ...,"Austin, Texas Area",57,0.373847
82,Aspiring Human Resources Professional | An ene...,"Austin, Texas Area",174,0.31642


In [16]:
top_candidates(n = 50, by = 'fit', ascending = False, location = 'Greater New York City Area')

id,job_title,location,connection,fit
6,Aspiring Human Resources Specialist,Greater New York City Area,1,0.632697
68,Human Resources Specialist at Luxottica,Greater New York City Area,501,0.189503
102,Business Intelligence and Analytics at Travelers,Greater New York City Area,49,0.0
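
The 0.0 score in the last row hints at one simple way to filter out candidates who should not be in the list in the first place: drop every row whose `fit` is not above a threshold. The sketch below assumes a frame with a `fit` column like the one built above. The names `filter_candidates` and `min_fit` are hypothetical, and the default threshold of 0 is only illustrative, not a tuned cut-off.

```python
import pandas as pd

def filter_candidates(candidates: pd.DataFrame, min_fit: float = 0.0) -> pd.DataFrame:
    """Keep only candidates whose fit score is strictly above min_fit."""
    kept = candidates.loc[candidates["fit"] > min_fit]
    return kept.sort_values(by="fit", ascending=False)

# Toy frame mirroring the three New York rows shown above
toy = pd.DataFrame(
    {
        "job_title": [
            "Aspiring Human Resources Specialist",
            "Human Resources Specialist at Luxottica",
            "Business Intelligence and Analytics at Travelers",
        ],
        "fit": [0.632697, 0.189503, 0.0],
    },
    index=pd.Index([6, 68, 102], name="id"),
)

filtered = filter_candidates(toy)
print(filtered)
```

A threshold that transfers to other roles would probably need to be checked against reviewed examples for each keyword, since TF-IDF cosine scores depend on the query and the vocabulary.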


In [17]:
query = 'seeking human resources'

cos_sim = get_tf_idf_query_similarity(vectorizer, docs_tfidf, query = query)

df['fit'] = cos_sim

In [18]:
top_candidates(n = 10, by = 'fit', ascending = False, min_con = 0)

id,job_title,location,connection,fit
99,Seeking Human Resources Position,"Las Vegas, Nevada Area",48,0.675682
28,Seeking Human Resources Opportunities,"Chicago, Illinois",390,0.675682
10,Seeking Human Resources HRIS and Generalist Po...,Greater Philadelphia Area,501,0.432761
94,Seeking Human Resources Opportunities. Open t...,Amerika Birleşik Devletleri,415,0.38129
73,"Aspiring Human Resources Manager, seeking inte...","Houston, Texas Area",7,0.362648
74,Human Resources Professional,Greater Boston Area,16,0.295223
75,"Nortia Staffing is seeking Human Resources, Pa...","San Jose, California",501,0.273577
27,Aspiring Human Resources Management student se...,"Houston, Texas Area",501,0.245337
3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,0.240319
97,Aspiring Human Resources Professional,"Kokomo, Indiana Area",71,0.240319
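
The re-ranking goal from the context (re-rank each time a candidate is starred) is not implemented above. One possible approach is Rocchio-style relevance feedback: add the mean TF-IDF vector of the starred candidates to the query vector, then recompute cosine similarity. This is a sketch of that technique under stated assumptions, not the method used in this notebook. The function `rerank_with_stars` and the weights `alpha` and `beta` are hypothetical, and the four titles are a toy subset of the data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rerank_with_stars(vectorizer, docs_tfidf, query, starred_rows, alpha=1.0, beta=1.0):
    """Return fit scores after pulling the query toward the starred candidates.

    starred_rows: row positions (not ids) of starred candidates in docs_tfidf.
    alpha, beta: assumed weights for the original query and the starred centroid.
    """
    query_vec = vectorizer.transform([query]).toarray()
    docs = docs_tfidf.toarray()
    if len(starred_rows) > 0:
        # Mean TF-IDF vector of all starred candidates
        star_centroid = docs[starred_rows].mean(axis=0, keepdims=True)
        query_vec = alpha * query_vec + beta * star_centroid
    return cosine_similarity(query_vec, docs).flatten()

# Toy subset of job titles from the data set
titles = [
    "Aspiring Human Resources Professional",
    "Seeking Human Resources Opportunities",
    "Human Resources Generalist at Loparex",
    "Native English Teacher at EPIK (English Program in Korea)",
]
toy_vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
toy_tfidf = toy_vectorizer.fit_transform(titles)

# Scores before any star, and after starring the generalist (row position 2)
before = rerank_with_stars(toy_vectorizer, toy_tfidf, "aspiring human resources", [])
after = rerank_with_stars(toy_vectorizer, toy_tfidf, "aspiring human resources", [2])
print(before)
print(after)
```

Starring a candidate moves the query toward that candidate's wording, so titles that share terms with the starred one rise in the ranking. With several stars, the centroid averages them, which keeps a single unusual title from dominating.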


In [None]:
# spaCy

In [None]:
# Input candidates, query term, location, etc.

In [None]:
# Word2Vec: same approach, but using the average of pretrained word embeddings
# Try to see who I'm connected with
# Skill review survey - schedule interview - motivated
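
The word-embedding idea in the note above could look roughly like this: represent each title as the average of its word vectors and compare titles to the query with cosine similarity. The 3-dimensional vectors in `toy_embeddings` are made-up stand-ins so the sketch runs without downloading a model. A real run would load pretrained vectors instead. The helper names `average_embedding` and `cosine` are hypothetical.

```python
import numpy as np

# Made-up 3-dimensional vectors standing in for real pretrained embeddings
toy_embeddings = {
    "aspiring": np.array([0.9, 0.1, 0.0]),
    "seeking": np.array([0.8, 0.2, 0.0]),
    "human": np.array([0.1, 0.9, 0.1]),
    "resources": np.array([0.0, 0.9, 0.2]),
    "teacher": np.array([0.0, 0.1, 0.9]),
    "english": np.array([0.1, 0.0, 0.9]),
}

def average_embedding(text, embeddings, dim=3):
    """Average the vectors of the known words in text; zeros if none are known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

def cosine(a, b):
    """Cosine similarity that returns 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

query_vec = average_embedding("aspiring human resources", toy_embeddings)
hr_score = cosine(query_vec, average_embedding("seeking human resources", toy_embeddings))
teacher_score = cosine(query_vec, average_embedding("english teacher", toy_embeddings))
print(hr_score, teacher_score)
```

Unlike TF-IDF, this can score a title highly even when it shares no exact words with the query, as long as its words have nearby vectors.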