## Potential Talent

### **Context:**

As a **talent sourcing and management company**, we are interested in **finding talented individuals** and sourcing these candidates to technology companies. **Finding talented candidates is not easy**, for **several reasons**. **First**, one needs to understand the role very well in order to fill it, which requires understanding the client’s needs and what they are looking for in a potential candidate. **Second**, one needs to understand what makes a candidate shine for the role we are searching for. **Third**, knowing where to find talented individuals is another challenge.

The nature of our job requires a lot of human labor and is full of **manual operations**. Towards **automating this process**, we want to build a better approach that could save us time and help us spot potential candidates that fit the roles we are searching for. Beyond that, for a specific role we want to fill, we are interested in developing a machine learning powered pipeline that could spot talented individuals and rank them based on their fitness.

We currently source a few candidates semi-automatically, so the sourcing part is not a concern at this time. We expect to first determine the best-matching candidates based on how fit they are for a given role. We generally run these searches using keywords such as “full-stack software engineer”, “engineering manager” or “aspiring human resources”, depending on the role we are trying to fill. These keywords might change, and you can expect that specific keywords will be provided to you.

Assuming that we were able to list and rank fitting candidates, we then employ a review procedure, as each candidate needs to be reviewed manually to determine how good a fit they are. At the end of this manual review, we might choose not the first candidate in the list but, say, the 7th. If that happens, we are interested in being able to re-rank the previous list based on this information. This supervisory signal is supplied by starring the 7th candidate in the list. Starring a candidate sets that candidate as an ideal candidate for the given role. We then expect the list to be re-ranked each time a candidate is starred.

#### Data Description:

The data comes from our sourcing efforts. We removed any field that could directly reveal personal details and gave a unique identifier for each candidate.

#### Attributes:
**id** : unique identifier for candidate (numeric)

**job_title** : job title for candidate (text)

**location** : geographical location for candidate (text)

**connections** : number of connections candidate has, 500+ means over 500 (text)

**Output (desired target)**:
fit - how fit the candidate is for the role? (numeric, probability between 0-1)

Keywords: “Aspiring human resources” or “seeking human resources”

#### Download Data:

https://docs.google.com/spreadsheets/d/117X6i53dKiO7w6kuA1g1TpdTlv1173h_dPlJt5cNNMU/edit?usp=sharing

#### Goal(s):

Predict how fit the candidate is based on their available information (variable fit)

#### Success Metric(s):

Rank candidates based on a fitness score.

Re-rank candidates when a candidate is starred.

#### Bonus(es):

We are interested in a robust algorithm. Tell us how your solution works and show us how your ranking gets better with each starring action.

How can we filter out candidates that should not be in this list in the first place?

Can we determine a cut-off point that would work for other roles without losing high potential candidates?

Do you have any ideas that we should explore so that we can even automate this procedure to prevent human bias?

In [201]:
!pip install -U scikit-learn



In [202]:
# Importing Standard Libraries
import pandas as pd
import numpy as np

from sklearn.metrics.pairwise import linear_kernel
pd.options.display.max_columns = 30

## 1. Reading in and Exploring Our Data 

In [203]:
df = pd.read_csv('potential-talents - Aspiring human resources - seeking human resources.csv').set_index('id')
df.head()

id,job_title,location,connection,fit
1,2019 C.T. Bauer College of Business Graduate (...,"Houston, Texas",85,
2,Native English Teacher at EPIK (English Progra...,Kanada,500+,
3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,
4,People Development Coordinator at Ryan,"Denton, Texas",500+,
5,Advisory Board Member at Celal Bayar University,"İzmir, Türkiye",500+,


In [204]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 104 entries, 1 to 104
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   job_title   104 non-null    object 
 1   location    104 non-null    object 
 2   connection  104 non-null    object 
 3   fit         0 non-null      float64
dtypes: float64(1), object(3)
memory usage: 4.1+ KB


In [205]:
# '500+' means "over 500" connections; store it as 501 so the column can be converted to numbers
df.replace('500+ ','501', inplace=True)

In [206]:
df['connection'] = pd.to_numeric(df['connection'])
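
The replacement above matches the exact string `'500+ '`, trailing space included. A slightly more tolerant variant is sketched below on a small made-up sample (the `raw` series is hypothetical, not the real data): it strips whitespace first and then maps every "500+" entry to 501, following the same convention as the notebook.

```python
import pandas as pd

# Hypothetical sample mimicking the raw `connection` column (values are text)
raw = pd.Series(["85", "500+ ", "44", " 500+"], name="connection")

# Strip surrounding whitespace, drop the trailing "+", then convert to integers
cleaned = pd.to_numeric(raw.str.strip().str.rstrip("+"))

# "500+" means "over 500", so encode those rows as 501
cleaned = cleaned.where(~raw.str.contains("+", regex=False), 501)
print(cleaned.tolist())
```

Keeping 501 as the stand-in value is a modelling choice: it preserves the ordering "more than 500" without claiming to know the real count.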

In [207]:
df.job_title.value_counts()

2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional                 7
Aspiring Human Resources Professional                                                                                    7
Student at Humber College and Aspiring Human Resources Generalist                                                        7
People Development Coordinator at Ryan                                                                                   6
Aspiring Human Resources Specialist                                                                                      5
Native English Teacher at EPIK (English Program in Korea)                                                                5
HR Senior Specialist                                                                                                     5
Seeking Human Resources HRIS and Generalist Positions                                                                    4
SVP, CHRO, Marke

In [208]:
df.job_title.iloc[0]

'2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional'

In [209]:
df = df.drop_duplicates()

In [210]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 53 entries, 1 to 104
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   job_title   53 non-null     object 
 1   location    53 non-null     object 
 2   connection  53 non-null     int64  
 3   fit         0 non-null      float64
dtypes: float64(1), int64(1), object(2)
memory usage: 2.1+ KB


### Prepping our Text for Modelling

In [211]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Prep our Text for Modelling
vectorizer = TfidfVectorizer(stop_words='english', ngram_range = (1, 2))
docs_tfidf = vectorizer.fit_transform(df["job_title"])

In [212]:
def get_tf_idf_query_similarity(vectorizer, docs_tfidf, query):
    """
    vectorizer: TfIdfVectorizer model
    docs_tfidf: tfidf vectors for all docs
    query: query doc

    return: cosine similarity between query and all docs
    """
    query_tfidf = vectorizer.transform([query])
    cos_sim = cosine_similarity(query_tfidf, docs_tfidf).flatten()
    
    return cos_sim

In [213]:
query = 'Aspiring human resources'

cos_sim = get_tf_idf_query_similarity(vectorizer, docs_tfidf, query = query)

df['fit'] = cos_sim

In [214]:
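def top_candidates(n, by = 'fit', ascending = False, min_con = 0, location = None):
    # keep candidates with at least min_con connections; filter by location only when one is given
    mask = df.connection >= min_con
    if location is not None:
        mask &= df.location == location
    
    return df.loc[mask].sort_values(by = by, ascending = ascending).head(n).copy()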

In [215]:
top_candidates(n = 10, by = 'fit', ascending = False, min_con = 0)

id,job_title,location,connection,fit
3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,0.735855
97,Aspiring Human Resources Professional,"Kokomo, Indiana Area",71,0.735855
6,Aspiring Human Resources Specialist,Greater New York City Area,1,0.632697
73,"Aspiring Human Resources Manager, seeking inte...","Houston, Texas Area",7,0.50888
72,Business Management Major and Aspiring Human R...,"Monroe, Louisiana Area",5,0.38759
27,Aspiring Human Resources Management student se...,"Houston, Texas Area",501,0.374733
66,Experienced Retail Manager and aspiring Human ...,"Austin, Texas Area",57,0.373847
7,Student at Humber College and Aspiring Human R...,Kanada,61,0.358949
74,Human Resources Professional,Greater Boston Area,16,0.340769
79,Liberal Arts Major. Aspiring Human Resources A...,"Baton Rouge, Louisiana Area",7,0.336485


In [216]:
top_candidates(n = 10, by = 'fit', ascending = False, min_con = 90)

id,job_title,location,connection,fit
27,Aspiring Human Resources Management student se...,"Houston, Texas Area",501,0.374733
82,Aspiring Human Resources Professional | An ene...,"Austin, Texas Area",174,0.31642
100,Aspiring Human Resources Manager | Graduating ...,"Cape Girardeau, Missouri",103,0.308829
76,Aspiring Human Resources Professional | Passio...,"New York, New York",212,0.246772
28,Seeking Human Resources Opportunities,"Chicago, Illinois",390,0.220668
101,Human Resources Generalist at Loparex,"Raleigh-Durham, North Carolina Area",501,0.196509
78,Human Resources Generalist at Schwan's,Amerika Birleşik Devletleri,501,0.196509
71,"Human Resources Generalist at ScottMadden, Inc.","Raleigh-Durham, North Carolina Area",501,0.196509
68,Human Resources Specialist at Luxottica,Greater New York City Area,501,0.189503
89,Director Human Resources at EY,Greater Atlanta Area,349,0.187433


In [217]:
top_candidates(n = 50, by = 'fit', ascending = False, location = 'Austin, Texas Area')

id,job_title,location,connection,fit
66,Experienced Retail Manager and aspiring Human ...,"Austin, Texas Area",57,0.373847
82,Aspiring Human Resources Professional | An ene...,"Austin, Texas Area",174,0.31642


In [218]:
top_candidates(n = 50, by = 'fit', ascending = False, location = 'Greater New York City Area')

id,job_title,location,connection,fit
6,Aspiring Human Resources Specialist,Greater New York City Area,1,0.632697
68,Human Resources Specialist at Luxottica,Greater New York City Area,501,0.189503
102,Business Intelligence and Analytics at Travelers,Greater New York City Area,49,0.0
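
The last row above has a fit of 0.0, meaning its title shares no term with the query. One simple way to approach the bonus question about filtering is sketched here: drop zero-score rows and, optionally, rows below a fraction of the top score. The helper `filter_by_cutoff` and the `min_ratio` value of 0.5 are assumptions for illustration, not a tuned cut-off.

```python
import pandas as pd

def filter_by_cutoff(ranked, score_col="fit", min_ratio=0.0):
    """Keep rows whose score is positive and at least min_ratio of the top score."""
    top = ranked[score_col].max()
    keep = (ranked[score_col] > 0) & (ranked[score_col] >= min_ratio * top)
    return ranked.loc[keep]

# Scores copied from the Greater New York City Area table above
ny = pd.DataFrame(
    {"fit": [0.632697, 0.189503, 0.0]},
    index=pd.Index([6, 68, 102], name="id"),
)

no_zeros = filter_by_cutoff(ny)               # removes candidates with no overlap
strict = filter_by_cutoff(ny, min_ratio=0.5)  # relative cut-off, arbitrary value
print(no_zeros.index.tolist())
print(strict.index.tolist())
```

A cut-off expressed relative to the top score may transfer to other roles more easily than a fixed number, since absolute TF-IDF similarities depend on the query and the vocabulary; that would still need checking on real data.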


In [219]:
query = 'seeking human resources'

cos_sim = get_tf_idf_query_similarity(vectorizer, docs_tfidf, query = query)

df['fit'] = cos_sim

In [220]:
top_candidates(n = 10, by = 'fit', ascending = False, min_con = 0)

id,job_title,location,connection,fit
99,Seeking Human Resources Position,"Las Vegas, Nevada Area",48,0.675682
28,Seeking Human Resources Opportunities,"Chicago, Illinois",390,0.675682
10,Seeking Human Resources HRIS and Generalist Po...,Greater Philadelphia Area,501,0.432761
94,Seeking Human Resources Opportunities. Open t...,Amerika Birleşik Devletleri,415,0.38129
73,"Aspiring Human Resources Manager, seeking inte...","Houston, Texas Area",7,0.362648
74,Human Resources Professional,Greater Boston Area,16,0.295223
75,"Nortia Staffing is seeking Human Resources, Pa...","San Jose, California",501,0.273577
27,Aspiring Human Resources Management student se...,"Houston, Texas Area",501,0.245337
3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,0.240319
97,Aspiring Human Resources Professional,"Kokomo, Indiana Area",71,0.240319
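
The rankings above do not yet react to starring. One possible way to add that, sketched here under assumptions, is a Rocchio-style update: the query vector is moved toward the TF-IDF vectors of the starred candidates and the list is rescored. This technique is not part of the notebook; the function `rerank`, the weights `alpha` and `beta`, and the four sample titles are all illustrative choices.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Small illustrative sample of titles from the dataset
titles = [
    "Aspiring Human Resources Professional",
    "Seeking Human Resources Opportunities",
    "Human Resources Generalist at Loparex",
    "Native English Teacher at EPIK (English Program in Korea)",
]

vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
docs = vec.fit_transform(titles)

def rerank(query, starred_rows, alpha=1.0, beta=0.75):
    """Rocchio-style update: move the query vector toward the starred titles."""
    q = vec.transform([query]).toarray()
    if starred_rows:
        starred_mean = docs[starred_rows].toarray().mean(axis=0, keepdims=True)
        q = alpha * q + beta * starred_mean
    return cosine_similarity(q, docs).flatten()

before = rerank("aspiring human resources", starred_rows=[])
after = rerank("aspiring human resources", starred_rows=[2])  # star the Generalist
print(np.round(before, 3))
print(np.round(after, 3))
```

Each additional star can be appended to `starred_rows`, so the query drifts toward the profile of the candidates the reviewers actually prefer while the original keywords keep some weight through `alpha`.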


# Word2Vec

In [221]:
# !pip install nltk
# !pip install keras
# !pip install tensorflow
# !pip install -U gensim

### Prepping our Text for Modelling

In [222]:
import re
import nltk

# processing texts for modelling
nltk.download('stopwords', quiet=True)  # fetch the stop-word list if it is not installed yet
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
df['job_title_cleaned'] = df.job_title.apply(lambda x: " ".join(re.sub(r'[^a-zA-Z]',' ',w).lower() 
                                                                                  for w in x.split() 
                                                                                  if re.sub(r'[^a-zA-Z]',' ',w).lower() 
                                                                                  not in stop_words) )

In [223]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 53 entries, 1 to 104
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   job_title          53 non-null     object 
 1   location           53 non-null     object 
 2   connection         53 non-null     int64  
 3   fit                53 non-null     float64
 4   job_title_cleaned  53 non-null     object 
dtypes: float64(1), int64(1), object(3)
memory usage: 2.5+ KB


In [224]:
# tokenize and pad every document to make them of the same size
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
tokenizer=Tokenizer()

tokenizer.fit_on_texts(df.job_title_cleaned)
tokenized_documents=tokenizer.texts_to_sequences(df.job_title_cleaned)
tokenized_paded_documents=pad_sequences(tokenized_documents,maxlen=64,padding='post')
vocab_size=len(tokenizer.word_index)+1

In [225]:
# loading pre-trained embeddings, each word is represented as a 300 dimensional vector
import gensim
W2V_PATH="https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"
model_w2v = gensim.models.KeyedVectors.load_word2vec_format(W2V_PATH, binary=True)

In [226]:
# creating embedding matrix, every row is a vector representation from the vocabulary indexed by the tokenizer index. 
embedding_matrix=np.zeros((vocab_size,300))
for word,i in tokenizer.word_index.items():
    if word in model_w2v:
        embedding_matrix[i]=model_w2v[word]
        
# creating document-word embeddings
document_word_embeddings=np.zeros((len(tokenized_paded_documents),64,300))
for i in range(len(tokenized_paded_documents)):
    for j in range(len(tokenized_paded_documents[0])):
        document_word_embeddings[i][j]=embedding_matrix[tokenized_paded_documents[i][j]]
document_word_embeddings.shape

(53, 64, 300)

In [227]:
document_word_embeddings[0][:10]

array([[-0.20800781,  0.03417969,  0.02575684, ...,  0.09570312,
        -0.046875  ,  0.23730469],
       [-0.33789062,  0.19824219, -0.296875  , ..., -0.15917969,
         0.03417969,  0.09179688],
       [-0.05249023,  0.06396484, -0.07128906, ..., -0.01037598,
        -0.12402344,  0.05541992],
       ...,
       [-0.03112793,  0.27148438,  0.09814453, ..., -0.01287842,
        -0.33789062,  0.15429688],
       [-0.17285156, -0.02600098, -0.06152344, ..., -0.36523438,
         0.37304688,  0.23242188],
       [-0.140625  ,  0.06835938,  0.01092529, ...,  0.05932617,
        -0.265625  ,  0.09619141]])

In [228]:
# cosine_similarity = np.dot(model_w2v['spain'], model_w2v['england'])/(np.linalg.norm(model_w2v['spain'])* 
#                                                                       np.linalg.norm(model_w2v['england']))
# cosine_similarity

In [229]:
# model_w2v['england'][:5]

In [230]:
def processing(query):
    """Embed a query string as a (1, 64, 300) array of padded word2vec vectors."""
    # same cleaning as for job_title: strip non-letters, lowercase, drop stop words
    stop_words = stopwords.words('english')
    processed = " ".join(re.sub(r'[^a-zA-Z]',' ',w).lower() 
                         for w in query.split() 
                         if re.sub(r'[^a-zA-Z]',' ',w).lower() not in stop_words)
    
    # use a local tokenizer so the one fitted on the job titles is not modified
    query_tokenizer=Tokenizer()
    query_tokenizer.fit_on_texts([processed])
    tokenized_documents=query_tokenizer.texts_to_sequences([processed])
    tokenized_paded_documents=pad_sequences(tokenized_documents,maxlen=64,padding='post')
    vocab_size=len(query_tokenizer.word_index)+1
    
    # row i holds the vector of the word with tokenizer index i (row 0 is the padding)
    embedding_matrix=np.zeros((vocab_size,300))
    for word,i in query_tokenizer.word_index.items():
        if word in model_w2v:
            embedding_matrix[i]=model_w2v[word]

    # creating document-word embeddings
    query_document_word_embeddings=np.zeros((len(tokenized_paded_documents),64,300))
    for i in range(len(tokenized_paded_documents)):
        for j in range(len(tokenized_paded_documents[0])):
            query_document_word_embeddings[i][j]=embedding_matrix[tokenized_paded_documents[i][j]]
    
    return query_document_word_embeddings

In [231]:
processing('hello world!!!!').shape

(1, 64, 300)

In [282]:
processing('hello world!!!!')[0][0][:5]

array([-0.05419922,  0.01708984, -0.00527954,  0.33203125, -0.25      ])

In [232]:
processing('hello world!!!!')[0][:3][0][:20]

array([-0.05419922,  0.01708984, -0.00527954,  0.33203125, -0.25      ])

In [233]:
processing('Aspiring human resources')[0][:3][0][:5]

array([-0.140625  ,  0.06835938,  0.01092529, -0.17285156,  0.13574219])

In [258]:
def get_w2v_query_similarity(query_w2v, document_word_embeddings, query):
    """
    query_w2v: unused; the query embedding is recomputed from `query` below
    document_word_embeddings: padded word2vec embeddings for all docs, shape (n_docs, 64, 300)
    query: query doc

    return: cosine similarity between query and all docs

    """
    query_w2v = processing(query)
    
    # flatten each (64, 300) matrix into one long vector so the rows can be compared
    nsamples, nx, ny = query_w2v.shape
    query_w2v_reshape = query_w2v.reshape((nsamples,nx*ny))

    nsamples, nx, ny = document_word_embeddings.shape
    document_word_embeddings_reshape = document_word_embeddings.reshape((nsamples,nx*ny))
    
    cos_sim_w2v = cosine_similarity(query_w2v_reshape, document_word_embeddings_reshape).flatten()
    
    return cos_sim_w2v

In [259]:
query = 'Aspiring human resources'

# pass the embedded query explicitly; a bare `query_w2v` is undefined in a fresh kernel
cos_sim_w2v = get_w2v_query_similarity(processing(query), document_word_embeddings, query = query)

df['w2v_fit'] = cos_sim_w2v

In [260]:
top_candidates(n = 10, by = 'w2v_fit', ascending = False, min_con = 0)

id,job_title,location,connection,fit,job_title_cleaned,w2v_fit
97,Aspiring Human Resources Professional,"Kokomo, Indiana Area",71,0.240319,aspiring human resources professional,0.898174
3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,0.240319,aspiring human resources professional,0.898174
6,Aspiring Human Resources Specialist,Greater New York City Area,1,0.206629,aspiring human resources specialist,0.873679
99,Seeking Human Resources Position,"Las Vegas, Nevada Area",48,0.675682,seeking human resources position,0.654387
82,Aspiring Human Resources Professional | An ene...,"Austin, Texas Area",174,0.103338,aspiring human resources professional energe...,0.641739
27,Aspiring Human Resources Management student se...,"Houston, Texas Area",501,0.245337,aspiring human resources management student se...,0.628601
28,Seeking Human Resources Opportunities,"Chicago, Illinois",390,0.675682,seeking human resources opportunities,0.619797
73,"Aspiring Human Resources Manager, seeking inte...","Houston, Texas Area",7,0.362648,aspiring human resources manager seeking inte...,0.584569
76,Aspiring Human Resources Professional | Passio...,"New York, New York",212,0.080592,aspiring human resources professional passio...,0.551164
10,Seeking Human Resources HRIS and Generalist Po...,Greater Philadelphia Area,501,0.432761,seeking human resources hris generalist positions,0.519345


In [261]:
query = 'seeking human resources'

# pass the embedded query explicitly; a bare `query_w2v` is undefined in a fresh kernel
cos_sim_w2v = get_w2v_query_similarity(processing(query), document_word_embeddings, query = query)

df['w2v_fit'] = cos_sim_w2v

In [262]:
top_candidates(n = 10, by = 'w2v_fit', ascending = False, min_con = 0)

id,job_title,location,connection,fit,job_title_cleaned,w2v_fit
99,Seeking Human Resources Position,"Las Vegas, Nevada Area",48,0.675682,seeking human resources position,0.886226
28,Seeking Human Resources Opportunities,"Chicago, Illinois",390,0.675682,seeking human resources opportunities,0.839381
10,Seeking Human Resources HRIS and Generalist Po...,Greater Philadelphia Area,501,0.432761,seeking human resources hris generalist positions,0.703341
97,Aspiring Human Resources Professional,"Kokomo, Indiana Area",71,0.240319,aspiring human resources professional,0.663209
3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,0.240319,aspiring human resources professional,0.663209
6,Aspiring Human Resources Specialist,Greater New York City Area,1,0.206629,aspiring human resources specialist,0.645122
94,Seeking Human Resources Opportunities. Open t...,Amerika Birleşik Devletleri,415,0.38129,seeking human resources opportunities open tr...,0.639099
89,Director Human Resources at EY,Greater Atlanta Area,349,0.162381,director human resources ey,0.571728
82,Aspiring Human Resources Professional | An ene...,"Austin, Texas Area",174,0.103338,aspiring human resources professional energe...,0.473859
81,Senior Human Resources Business Partner at Hei...,"Chattanooga, Tennessee Area",455,0.102581,senior human resources business partner heil e...,0.470671


In [263]:
query = 'business intelligence specialist'

# pass the embedded query explicitly; a bare `query_w2v` is undefined in a fresh kernel
cos_sim_w2v = get_w2v_query_similarity(processing(query), document_word_embeddings, query = query)

df['w2v_fit'] = cos_sim_w2v

In [264]:
top_candidates(n = 10, by = 'w2v_fit', ascending = False, min_con = 0)

id,job_title,location,connection,fit,job_title_cleaned,w2v_fit
102,Business Intelligence and Analytics at Travelers,Greater New York City Area,49,0.0,business intelligence analytics travelers,0.552532
68,Human Resources Specialist at Luxottica,Greater New York City Area,501,0.164174,human resources specialist luxottica,0.44738
8,HR Senior Specialist,San Francisco Bay Area,501,0.0,hr senior specialist,0.348536
86,Information Systems Specialist and Programmer ...,"Gaithersburg, Maryland",4,0.0,information systems specialist programmer love...,0.274835
101,Human Resources Generalist at Loparex,"Raleigh-Durham, North Carolina Area",501,0.170244,human resources generalist loparex,0.251939
78,Human Resources Generalist at Schwan's,Amerika Birleşik Devletleri,501,0.170244,human resources generalist schwan s,0.231181
13,Human Resources Coordinator at InterContinenta...,"Atlanta, Georgia",501,0.111899,human resources coordinator intercontinental b...,0.2158
72,Business Management Major and Aspiring Human R...,"Monroe, Louisiana Area",5,0.126581,business management major aspiring human resou...,0.214225
4,People Development Coordinator at Ryan,"Denton, Texas",501,0.0,people development coordinator ryan,0.205907
71,"Human Resources Generalist at ScottMadden, Inc.","Raleigh-Durham, North Carolina Area",501,0.170244,human resources generalist scottmadden inc,0.20292


In [266]:
top_candidates(n = 10, by = 'w2v_fit', ascending = False, min_con = 20, location = 'Greater New York City Area')

id,job_title,location,connection,fit,job_title_cleaned,w2v_fit
102,Business Intelligence and Analytics at Travelers,Greater New York City Area,49,0.0,business intelligence analytics travelers,0.552532
68,Human Resources Specialist at Luxottica,Greater New York City Area,501,0.164174,human resources specialist luxottica,0.44738


In [250]:
# query = 'Aspiring human resources'

# query_w2v = processing(query)

# nsamples, nx, ny = query_w2v.shape
# query_w2v_reshape = query_w2v.reshape((nsamples,nx*ny))

# nsamples, nx, ny = document_word_embeddings.shape
# document_word_embeddings_reshape = document_word_embeddings.reshape((nsamples,nx*ny))

# cos_sim_w2v = cosine_similarity(query_w2v_reshape, document_word_embeddings_reshape).flatten()

# df['w2v_fit'] = cos_sim_w2v

In [267]:
# top_candidates(n = 10, by = 'w2v_fit', ascending = False, min_con = 0)

# GloVe - 

# FastText - 

# BERT - 

In [243]:
# Input candidates, query term, location, etc

In [244]:
# Word2Vec: same thing, but with the average of the pretrained word embeddings
# Try to see who I'm connected with
# Skill review survey - schedule interview - motivated