# Potential Talents - An Apziva Project (#3)

By Samuel Alter

Apziva: 6bImatZVlK6DnbEo

# See the [previous notebook](potential_talents_pt2_ranknet.ipynb) for my work on the Learning-To-Rank systems RankNet and LambdaRank

## Project Overview

We are working with a talent sourcing and management company to help them surface the candidates who best fit their human resources job post. The dataset contains each candidate's job title, location, and number of LinkedIn connections.

### Goals

Produce a probability, between 0 and 1, of how closely each candidate fits the job description of **"Aspiring human resources"** or **"Seeking human resources."** After an initial recommendation surfaces one or more candidates to be starred for future consideration, the recommendation is re-run and new "stars" are awarded.

To help predict how the candidates fit, we are tracking the performance of two success metrics:
* Rank candidates based on a fitness score
* Re-rank candidates when a candidate is starred

We also need to do the following:
* Explain how the algorithm works and how the ranking improves after each starring iteration
* Determine how to filter out candidates who should not be considered at all
* Determine a cut-off point (if possible) that would work for other roles without losing high-potential candidates
* Suggest ideas for automating this procedure to reduce or eliminate human bias
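
The rank-then-re-rank loop described in the goals above can be illustrated with a toy example. This is only a sketch under loud assumptions: the token-overlap (Jaccard) score and the names `tokens`, `jaccard`, and `rank` are hypothetical stand-ins chosen for illustration, not the scoring method used in this project. The score falls between 0 and 1 but is not a calibrated probability.

```python
def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(titles, query_terms, starred=()):
    """Rank titles by their best similarity to any query term or starred title.

    Hypothetical scoring: starred titles act as extra reference strings,
    so starring a candidate pulls similar titles up the list.
    """
    references = [tokens(t) for t in list(query_terms) + list(starred)]
    scored = [
        (max((jaccard(tokens(t), ref) for ref in references), default=0.0), t)
        for t in titles
    ]
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


titles = [
    "Aspiring Human Resources Professional",
    "HR Senior Specialist",
    "People Development Coordinator at Ryan",
    "Aspiring Human Resources Specialist",
]
query = ["Aspiring human resources", "Seeking human resources"]

initial = rank(titles, query)
# Starring a candidate adds that title to the reference set before re-ranking.
reranked = rank(titles, query, starred=["HR Senior Specialist"])
```

In this toy setup, "HR Senior Specialist" shares no tokens with either search term and so scores 0 initially; once starred, it matches itself and moves to the top. A real system would replace the overlap score with learned or embedding-based similarity.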

### The Dataset

| Column | Data Type | Comments |
|---|---|---|
| `id` | Numeric | Unique identifier for the candidate |
| `job_title` | Text | Job title for the candidate |
| `location` | Text | Geographic location of the candidate |
| `connections` | Text | Number of LinkedIn connections for the candidate |

Connection counts above 500 are encoded as "500+". Some candidates list only a country rather than a specific location, so I substituted capital cities or geographic centers to represent those countries.
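
Because `connections` is stored as text, the "500+" encoding has to be handled before the column can be used numerically. The following is a minimal sketch, assuming every value is either a plain integer string or an integer followed by "+"; the helper name `parse_connections` is hypothetical.

```python
from typing import Tuple


def parse_connections(raw) -> Tuple[int, bool]:
    """Return (count, is_capped) for a raw `connections` value.

    "500+" is treated as a capped count of 500; any other value is
    parsed as a plain integer.
    """
    text = str(raw).strip()
    is_capped = text.endswith("+")
    count = int(text.rstrip("+").strip())
    return count, is_capped
```

With the DataFrame used in this notebook, this could be applied as `df['connections'].map(parse_connections)`. Keeping the capped flag separate avoids treating "500+" as if it were exactly 500.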

# Imports and Helper Functions

In [12]:
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Interacting with Hugging Face's models

### Import dataset

In [14]:
df = pd.read_csv('../data/3_data.csv')

In [27]:
job_titles = df['job_title'].tolist()

# check
job_titles[0]

'2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional'

In [28]:
search_terms = pd.read_parquet('search_terms.parquet')
search_terms = search_terms['term'].tolist()

# check
search_terms[0]

'Aspiring human resources'

In [29]:
with open('hugging_face_access_token.txt', 'r') as file:
    access_token = file.read().strip()
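
The request cell below joins the job titles with `', '`, but some titles in this dataset contain commas themselves (for example, "SVP, CHRO, Marketing & Communications, CSR Officer | ENGIE | ..."), so the model cannot tell where one title ends and the next begins. One possible alternative is to put each title on its own numbered line. This is a sketch, not the prompt used in this notebook; `build_prompt` is a hypothetical helper.

```python
def build_prompt(search_terms, job_titles):
    """Build a ranking prompt with one numbered job title per line."""
    # Numbering keeps title boundaries unambiguous even when titles contain commas.
    numbered = "\n".join(
        f"{i}. {title}" for i, title in enumerate(job_titles, start=1)
    )
    # Quote each search term so multi-word terms stay visibly grouped.
    quoted_terms = "; ".join(f'"{term}"' for term in search_terms)
    return (
        "Rank the following job titles based on their similarity to "
        f"these search terms: {quoted_terms}.\n"
        f"Job Titles:\n{numbered}\n"
        "Provide the results in a ranked list with explanations."
    )
```

Calling `build_prompt(search_terms, job_titles)` would then produce a drop-in replacement for the `prompt` string.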

In [37]:
# Hugging Face Inference API endpoint and authorization header
api_url = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neo-2.7B"
headers = {"Authorization": f"Bearer {access_token}"}

# Build the ranking prompt from the `search_terms` and `job_titles` lists loaded above
prompt = f"""
Rank the following job titles based on their similarity to these search terms: {', '.join(search_terms)}.
Job Titles: {', '.join(job_titles)}.
Provide the results in a ranked list with explanations.
"""

# send request
response = requests.post(api_url, headers=headers, json={"inputs": prompt})

# parse and inspect the response
if response.status_code == 200:
    output = response.json()
    print("Raw Response:", output)  # Inspect the output structure
    if isinstance(output, list) and len(output) > 0:
        print("Model's Response:")
        print(output[0]["generated_text"])  # Adjust indexing based on response
    else:
        print("Unexpected response format:", output)
else:
    print(f"Error: {response.status_code} - {response.text}")
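
As the raw response shows, `generated_text` starts by repeating the prompt, so the model's actual continuation has to be separated from the echoed input. The helper below is a minimal sketch of that step, assuming the response is a list of dicts with a `generated_text` key as in the cell above; `extract_completion` is a hypothetical name. The hosted API may also accept a `return_full_text` parameter for this purpose, which is an assumption to check against the current documentation.

```python
def extract_completion(output, prompt):
    """Return the generated continuation without the echoed prompt.

    `output` is the parsed JSON response: a non-empty list whose first
    element is a dict carrying a "generated_text" key.
    """
    valid = (
        isinstance(output, list)
        and len(output) > 0
        and isinstance(output[0], dict)
        and "generated_text" in output[0]
    )
    if not valid:
        raise ValueError(f"Unexpected response format: {output!r}")
    text = output[0]["generated_text"]
    # Strip the prompt only when it appears as an exact prefix.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()
```

If the result is an empty string, the model returned nothing beyond the prompt.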

Raw Response: [{'generated_text': "\nRank the following job titles based on their similarity to these search terms: Aspiring human resources, Seeking human resources.\nJob Titles: 2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional, Native English Teacher at EPIK (English Program in Korea), Aspiring Human Resources Professional, People Development Coordinator at Ryan, Advisory Board Member at Celal Bayar University, Aspiring Human Resources Specialist, Student at Humber College and Aspiring Human Resources Generalist, HR Senior Specialist, Student at Humber College and Aspiring Human Resources Generalist, Seeking Human Resources HRIS and Generalist Positions, Student at Chapman University, SVP, CHRO, Marketing & Communications, CSR Officer | ENGIE | Houston | The Woodlands | Energy | GPHR | SPHR, Human Resources Coordinator at InterContinental Buckhead Atlanta, 2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiri