# AI Recruiter Pipeline

Paste a job description below and run all cells to see the top-matching candidates for that JD.

In [3]:
job_description_input = """# Sample software engineer job description
At [Company X], our technology solves problems. We've established the company as a leading developer of innovative software solutions, and we're looking for a highly skilled software engineer to join our program and network design team. The ideal candidate will have expert knowledge of software development processes, along with solid experience in testing and evaluating current networking systems. This person should be highly motivated in finding technical issues and fixing them with meticulous code.

## Objectives of this role
- Enhance existing platform and network capabilities to handle massive growth, enabling new insights and products based on data via self-serve computing, reporting solutions, and interactive querying
- Visualize, design, and develop innovative software platforms as we continue to experience growth in the usage and visibility of our products
- Create scalable software platforms and applications, as well as efficient networking solutions, that are unit tested, code reviewed, and checked regularly for continuous integration
- Examine existing systems for flaws and create solutions that improve service uptime and time-to-resolve through monitoring and automated remediation
- Plan and execute full software development lifecycle for each assigned project, adhering to company standards and expectations

## Responsibilities
- Design and build tools and frameworks to automate the development, testing, deployment, and management of services and products
- Plan and scale distributed software and applications, using synchronous and asynchronous design patterns, writing code, and delivering with urgency and quality
- Collaborate with global team to produce project plans and analyze the efficiency and feasibility of project operations, leveraging global technology stack and making localized improvements
- Track, document, and maintain software and network system functionality, and leverage any opportunity to improve engineering
- Focus on creating software and networking platforms that are free of faulty programming, and continuously keep developers in step without compromising site reliability
- Work with product managers and user-experience designers to influence the strategy and delivery of next-wave product features and system capabilities

## Required skills and qualifications
- Five or more years of experience as engineer of software and networking platforms
- Seven or more years of experience (professional and academic) with Java, Python, and C++
- Proven ability to document design processes, including development, testing, analytics, and troubleshooting
- Experience with rapid development cycles in a web-based environment
- Strong ability in scripting and test automation
- Desire to continue professional growth through training and education

## Preferred skills and qualifications
- Bachelor's degree (or equivalent) in software engineering or information technology
- Working knowledge of relational databases as well as ORM and SQL technologies
- Proficiency with HTML5, CSS3, and content management systems
- Web application development experience with multiple frameworks, including Wicket, GWT, and Spring MVC
"""


## install dependencies

In [4]:
!uv add pandas
!uv add numpy
!uv add rapidfuzz
!uv add sentence_transformers
!uv add pydantic
!uv add google-genai
!uv add python-dotenv

Resolved 146 packages in 13ms
Audited 100 packages in 0.54ms
Resolved 146 packages in 0.83ms
Audited 100 packages in 0.04ms
Resolved 146 packages in 1ms
Audited 100 packages in 0.04ms
Resolved 146 packages in 0.63ms
Audited 100 packages in 0.05ms
Resolved 146 packages in 0.71ms
Audited 100 packages in 0.18ms
Resolved 146 packages in 0.63ms
Audited 100 packages in 0.05ms
Resolved 146 packages in 0.61ms
Audited 100 packages in 0.04ms


## initialise candidate dataset

In [5]:
from core.data import process_candidate_data
from sentence_transformers import SentenceTransformer

embedding_model_name = 'all-mpnet-base-v2'

model = SentenceTransformer(embedding_model_name)
df = process_candidate_data('./data/resume_data.csv', model, reload=False)

Batches: 100%|██████████| 299/299 [00:20<00:00, 14.50it/s]


## extract jd using llm structured outputs

In [6]:
from getpass import getpass
import os
from dotenv import load_dotenv

load_dotenv('./.env')

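# .get() avoids a KeyError when the key is missing from both the environment and .env
gemini_key = os.environ.get('GEMINI_API_KEY')

# Fall back to an interactive prompt only when no key was found
os.environ['GEMINI_API_KEY'] = gemini_key or getpass('Enter your Gemini API key:')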

In [7]:
from google import genai
from google.genai import types
from prompts.jd_extraction import system_prompt, user_prompt
from models.data_models import JobRoleSchema

genai_client = genai.Client()

def process_jd(jd: str) -> JobRoleSchema:
    """Extract a structured JobRoleSchema from a free-text job description."""
    if not jd.strip():
        raise ValueError("No job description provided")

    response = genai_client.models.generate_content(
        model="gemini-2.5-pro",
        contents=user_prompt.format(job_desc=jd),
        config=types.GenerateContentConfig(
            temperature=0.2,
            system_instruction=system_prompt,
            response_mime_type="application/json",
            response_schema=JobRoleSchema,
        )
    )
    if not response or not response.text:
        raise ValueError("response.text is empty")

    # Validate the returned JSON against the schema before handing it on
    return JobRoleSchema.model_validate_json(response.text)


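def load_sample_jd() -> str:
    """Read the bundled sample job description used as a fallback."""
    with open('./jd/sample_jd_01.txt', 'r', encoding='utf-8') as file:
        return file.read()

sample_jd = load_sample_jd()

# Use the pasted JD when it has content; otherwise fall back to the bundled sample
processed_jd = process_jd(job_description_input if job_description_input.strip() else sample_jd)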

print(processed_jd.model_dump_json(indent=2, exclude_unset=True, exclude_none=False))

{
  "role": "Software Engineer",
  "company": {
    "name": "Company X",
    "size": null,
    "stage": null
  },
  "industry": [
    "Software Development"
  ],
  "role_objectives": [
    "Enhance existing platform and network capabilities to handle massive growth, enabling new insights and products based on data via self-serve computing, reporting solutions, and interactive querying",
    "Visualize, design, and develop innovative software platforms as we continue to experience growth in the usage and visibility of our products",
    "Create scalable software platforms and applications, as well as efficient networking solutions, that are unit tested, code reviewed, and checked regularly for continuous integration",
    "Examine existing systems for flaws and create solutions that improve service uptime and time-to-resolve through monitoring and automated remediation",
    "Plan and execute full software development lifecycle for each assigned project, adhering to company standards an
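
The printed output hints at the shape of `JobRoleSchema`, whose definition in `models/data_models.py` is not shown here. The sketch below is only a dependency-free approximation built from the fields visible in this notebook's outputs (`role`, `company`, `industry`, `role_objectives`, and the `Skill` fields). The class names `Company` and `JobRole`, the helper `parse_job_role`, and the `Optional[str]` types are assumptions; the real schema is a pydantic model and likely has more fields.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ImportanceLevel(str, Enum):
    # Only these two values appear in the notebook output; others may exist
    ESSENTIAL = "essential"
    IMPORTANT = "important"


@dataclass
class Company:
    name: Optional[str] = None
    size: Optional[str] = None   # type is a guess; the output only shows null
    stage: Optional[str] = None  # type is a guess; the output only shows null


@dataclass
class Skill:
    skill: str
    priority: ImportanceLevel
    proficiency_level: Optional[str] = None


@dataclass
class JobRole:  # hypothetical stand-in for JobRoleSchema
    role: str
    company: Company
    industry: List[str] = field(default_factory=list)
    role_objectives: List[str] = field(default_factory=list)
    skills: List[Skill] = field(default_factory=list)


def parse_job_role(raw_json: str) -> JobRole:
    """Build a JobRole from a JSON string shaped like the printed output."""
    data = json.loads(raw_json)
    skills = [
        Skill(s["skill"], ImportanceLevel(s["priority"]), s.get("proficiency_level"))
        for s in data.get("skills", [])
    ]
    return JobRole(
        role=data["role"],
        company=Company(**data.get("company", {})),
        industry=data.get("industry", []),
        role_objectives=data.get("role_objectives", []),
        skills=skills,
    )


sample = json.dumps({
    "role": "Software Engineer",
    "company": {"name": "Company X", "size": None, "stage": None},
    "industry": ["Software Development"],
    "skills": [{"skill": "Java", "priority": "essential", "proficiency_level": None}],
})
job = parse_job_role(sample)
print(job.role, job.company.name, job.skills[0].priority.value)
```

An unknown `priority` value raises a `ValueError` from the enum constructor, which is the kind of early failure that schema validation is meant to provide.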

## filter and score candidates

### filter candidates by job title

In [8]:
from core.matching import fuzzy_match
from sentence_transformers import util


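def fuzzy_job_title_score(threshold=0.4):
    """Score every candidate title against the JD role (0..1) and keep rows at or above the threshold."""
    role_scores = fuzzy_match(df['job_position_name'].tolist(), [processed_jd.role])
    df['title_score'] = role_scores[:, 0] / 100

    # Return a copy so later column assignments do not trigger SettingWithCopyWarning
    return df[df['title_score'] >= threshold].copy()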


def similarity_match(item1: str, item2: str):
    item1_embedding = model.encode(item1, convert_to_tensor=True)
    item2_embedding = model.encode(item2, convert_to_tensor=True)

    return util.cos_sim(item1_embedding, item2_embedding).item()

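def similarity_job_title_score(threshold=0.6):
    """Semantic alternative to fuzzy_job_title_score; encodes each title, so it is much slower."""
    df['title_score'] = df.apply(lambda x: similarity_match(x['job_position_name'], processed_jd.role), axis=1)

    # Return a copy so later column assignments do not trigger SettingWithCopyWarning
    return df[df['title_score'] >= threshold].copy()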


df_filtered = fuzzy_job_title_score() 
# df_filtered = similarity_job_title_score() # improve filter by switching to semantic similarity (takes too long!!)

print(f"filtered candidates: {len(df_filtered)}")
print("")
print(df[['candidate_id', 'job_position_name', 'title_score']])

filtered candidates: 4089

     candidate_id                                  job_position_name  \
0            C001                           Senior Software Engineer   
1            C002                     Machine Learning (ML) Engineer   
2            C003  Executive/ Senior Executive- Trade Marketing, ...   
3            C004                     Business Development Executive   
4            C005                                Senior iOS Engineer   
...           ...                                                ...   
9539        C9540                                      Data Engineer   
9540        C9541                       Executive/ Sr. Executive -IT   
9541        C9542                                    Executive - VAT   
9542        C9543             Asst. Manager/ Manger (Administrative)   
9543        C9544                                     Civil Engineer   

      title_score  
0        1.000000  
1        0.640000  
2        0.393939  
3        0.297872  
4       
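
The title filter boils down to "score each title against the JD role on a 0..1 scale, then keep rows above a threshold". The internals of `core.matching.fuzzy_match` are not shown in this notebook, so the sketch below substitutes the standard library's `difflib.SequenceMatcher` for the fuzzy scorer. The function names `title_score` and `filter_by_title` are hypothetical, and the scores will not match the ones printed above.

```python
from difflib import SequenceMatcher


def title_score(candidate_title: str, jd_role: str) -> float:
    """Return a 0..1 similarity between two job titles (case-insensitive)."""
    a = candidate_title.lower().strip()
    b = jd_role.lower().strip()
    return SequenceMatcher(None, a, b).ratio()


def filter_by_title(titles, jd_role, threshold=0.4):
    """Keep (title, score) pairs whose score is at or above the threshold."""
    scored = [(t, title_score(t, jd_role)) for t in titles]
    return [(t, s) for t, s in scored if s >= threshold]


titles = ["Software Engineer", "Senior Software Engineer", "Executive - VAT"]
kept = filter_by_title(titles, "Software Engineer")
for title, score in kept:
    print(f"{title}: {score:.2f}")
```

A plain character-ratio scorer penalises extra words such as "Senior", which is one reason token-based fuzzy scorers are often preferred for job titles.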

### score candidates by skills

In [9]:
import pandas as pd
from core.matching import weighted_fuzzy_skill_score
from models.data_models import Skill

if not processed_jd:
    raise ValueError('JD is corrupted or not processed')

if len(df_filtered) == 0:
    raise ValueError('No candidates found')

def calculate_skill_score(df: pd.DataFrame, filter = False):
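    # Copy the list so appending technologies does not mutate processed_jd.skills
    # (re-running the cell would otherwise append the same technologies again)
    jd_skills = list(processed_jd.skills or [])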

    if processed_jd.technologies:
        for t in processed_jd.technologies:
            jd_skills.append(
                Skill(
                    skill=t.technology,
                    priority=t.priority,
                    proficiency_level=None
                )
            )

    print(jd_skills)

    # df_filtered['skill_score'] = df_filtered.apply(lambda x: weighted_fuzzy_skill_score(x['candidate_id'], jd_skills, x['all_skills']), axis=1)
    skill_results = df.apply(
        lambda x: weighted_fuzzy_skill_score(x['candidate_id'], jd_skills, x['all_skills']),
        axis=1
    )

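    # json_normalize returns a fresh 0..n-1 index; reuse df's index so scores
    # line up with the filtered rows instead of being misaligned or NaN
    skill_results_df = pd.json_normalize(skill_results.tolist())
    skill_results_df.index = df.index
    df['skill_score'] = skill_results_df['score']
    df['matched_skills'] = skill_results_df['matched_skills']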

    # filter by skill
    if filter:
        SKILL_SCORE_THRESHOLD = 0.25
        df = df[df['skill_score'] >= SKILL_SCORE_THRESHOLD]
        print(f"filtered candidates: {len(df)}")

    print(df[['candidate_id', 'job_position_name', 'skill_score']])

    return df



df_filtered = calculate_skill_score(df_filtered)

# pd.set_option('display.max_rows', None)
# pd.set_option('display.max_columns', None)
# pd.set_option('display.width', 1000)
# pd.set_option('display.max_colwidth', None)
# df_filtered

[Skill(skill='Software Development Lifecycle Management', priority=<ImportanceLevel.ESSENTIAL: 'essential'>, proficiency_level='expert'), Skill(skill='Testing and Evaluating Networking Systems', priority=<ImportanceLevel.ESSENTIAL: 'essential'>, proficiency_level='advanced'), Skill(skill='Documenting Design Processes', priority=<ImportanceLevel.ESSENTIAL: 'essential'>, proficiency_level=None), Skill(skill='Scripting', priority=<ImportanceLevel.ESSENTIAL: 'essential'>, proficiency_level='advanced'), Skill(skill='Test Automation', priority=<ImportanceLevel.ESSENTIAL: 'essential'>, proficiency_level='advanced'), Skill(skill='Rapid Development Cycles', priority=<ImportanceLevel.ESSENTIAL: 'essential'>, proficiency_level=None), Skill(skill='Web Application Development', priority=<ImportanceLevel.IMPORTANT: 'important'>, proficiency_level=None), Skill(skill='Java', priority=<ImportanceLevel.ESSENTIAL: 'essential'>, proficiency_level=None), Skill(skill='Python', priority=<ImportanceLevel.ESSE

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['skill_score'] = skill_results_df['score']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['matched_skills'] = skill_results_df['matched_skills']
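
The implementation of `weighted_fuzzy_skill_score` lives in `core/matching.py` and is not shown in this notebook. As a rough illustration of the idea (fuzzy-match each JD skill against the candidate's skills and weight hits by priority), here is a minimal sketch. The weights in `PRIORITY_WEIGHTS`, the `match_threshold` value, and the use of `difflib` in place of the notebook's fuzzy matcher are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical weights per priority level; the real mapping is not shown in the notebook
PRIORITY_WEIGHTS = {"essential": 1.0, "important": 0.6}


def best_match(jd_skill, candidate_skills):
    """Return (score, candidate_skill) for the closest candidate skill, score in 0..1."""
    best = (0.0, None)
    for cs in candidate_skills:
        score = SequenceMatcher(None, jd_skill.lower(), cs.lower()).ratio()
        if score > best[0]:
            best = (score, cs)
    return best


def weighted_skill_score(jd_skills, candidate_skills, match_threshold=0.8):
    """jd_skills is a list of (skill, priority); returns score and matched_skills."""
    total_weight = sum(PRIORITY_WEIGHTS[p] for _, p in jd_skills)
    if total_weight == 0 or not candidate_skills:
        return {"score": 0.0, "matched_skills": []}

    earned, matched = 0.0, []
    for skill, priority in jd_skills:
        score, cand = best_match(skill, candidate_skills)
        if score >= match_threshold:
            # A JD skill counts in full once its best fuzzy match clears the threshold
            earned += PRIORITY_WEIGHTS[priority]
            matched.append(cand)
    return {"score": earned / total_weight, "matched_skills": matched}


jd = [("Java", "essential"), ("Python", "essential"), ("Web Application Development", "important")]
result = weighted_skill_score(jd, ["java", "SQL", "HTML"])
print(result)
```

Returning a dict with `score` and `matched_skills` mirrors the two columns the notebook reads from the real function's results.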


### score candidates by qualifications

In [10]:
from core.matching import weighted_fuzzy_qualification_score

def calculate_qualification_score(df: pd.DataFrame, filter = False):
    if processed_jd and processed_jd.qualifications and processed_jd.qualifications.education:
        qualification_results = df.apply(
            lambda x: weighted_fuzzy_qualification_score(
                x['candidate_id'], 
                processed_jd.qualifications.education, # type: ignore
                { "degrees": x['degree_names_norm'],  "fields": x['major_field_of_studies']} 
            ), axis=1)
        
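        # Reuse df's index so scores line up with the filtered rows; a default
        # 0..n-1 index would misalign them and leave NaN for higher row labels
        qualification_results_df = pd.DataFrame(qualification_results.tolist(), index=df.index)
        df['qualification_score'] = qualification_results_df['score']
        df['matched_qualifications'] = qualification_results_df['matched_qualifications']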
        
        # filter by qualification
        if filter: 
            QUALIFICATION_SCORE_THRESHOLD = 0.2
            df = df[df['qualification_score'] >= QUALIFICATION_SCORE_THRESHOLD]
            print(f"filtered candidates: {len(df)}")

        print(df[['candidate_id', 'job_position_name', 'qualification_score']])
    else:
        df['qualification_score'] = 0.0
        df['matched_qualifications'] = None
    return df


df_filtered = calculate_qualification_score(df_filtered)

     candidate_id               job_position_name  qualification_score
0            C001        Senior Software Engineer                  0.0
1            C002  Machine Learning (ML) Engineer                  0.0
4            C005             Senior iOS Engineer                  0.0
5            C006                     AI Engineer                  1.0
6            C007             Senior iOS Engineer                  0.0
...           ...                             ...                  ...
9534        C9535             Senior iOS Engineer                  NaN
9537        C9538                   Data Engineer                  NaN
9538        C9539                     AI Engineer                  NaN
9539        C9540                   Data Engineer                  NaN
9543        C9544                  Civil Engineer                  NaN

[4089 rows x 3 columns]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['qualification_score'] = qualification_results_df['score']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['matched_qualifications'] = qualification_results_df['matched_qualifications']


### calculate similarity score

In [11]:
import torch
from sentence_transformers.util import cos_sim
from core.embedding import build_jd_embedding_input

jd_text = build_jd_embedding_input(processed_jd)
jd_embedding = model.encode(jd_text, convert_to_tensor=True).cpu()

candidate_embeddings = torch.stack([torch.tensor(vec) for vec in df_filtered['profile_embedding']])

similarities = cos_sim(candidate_embeddings, jd_embedding).squeeze().cpu().numpy()

df_filtered['similarity_score'] = similarities

df_filtered[['candidate_id', 'similarity_score']]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered['similarity_score'] = similarities


| | candidate_id | similarity_score |
|---|---|---|
| 0 | C001 | 0.567675 |
| 1 | C002 | 0.347835 |
| 4 | C005 | 0.544355 |
| 5 | C006 | 0.588320 |
| 6 | C007 | 0.544355 |
| ... | ... | ... |
| 9534 | C9535 | 0.544355 |
| 9537 | C9538 | 0.386976 |
| 9538 | C9539 | 0.588320 |
| 9539 | C9540 | 0.386976 |
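
The similarity score is the cosine of the angle between a candidate's profile embedding and the JD embedding: the dot product divided by the product of the two vector norms. The sketch below spells that formula out in plain Python on toy three-dimensional vectors; the vectors and candidate IDs are made up for illustration, and the notebook itself relies on `sentence_transformers.util.cos_sim` over real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors: C001 points the same way as the JD, C002 is orthogonal to it
jd_vec = [1.0, 0.0, 1.0]
candidates = {"C001": [2.0, 0.0, 2.0], "C002": [0.0, 3.0, 0.0]}
scores = {cid: cosine_similarity(vec, jd_vec) for cid, vec in candidates.items()}
print(scores)
```

Because the measure ignores vector length, a candidate vector that is a scaled copy of the JD vector scores close to 1, while an orthogonal one scores 0.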


## calculate total score and filter

In [12]:
from models.mappings import candidate_score_weights

# Compute the weighted total score
total_score = (
    candidate_score_weights['title_score'] * df_filtered['title_score'] +
    candidate_score_weights['skill_score'] * df_filtered['skill_score'] +
    candidate_score_weights['similarity_score'] * df_filtered['similarity_score']
)
if processed_jd.qualifications and processed_jd.qualifications.education:
    total_score += candidate_score_weights['qualification_score'] * df_filtered['qualification_score']
df_filtered['total_score'] = total_score

top_candidates = df_filtered.sort_values(by='total_score', ascending=False).head(3)

# df_filtered['match_explanation'] = df_filtered.apply(lambda row: f"Title: {row.title_score:.2f}, Skills: {row.skill_score:.2f}, Qual: {row.qualification_score:.2f}, Sem: {row.similarity_score:.2f}", axis=1)
# df_filtered['match_explanation']

cols_to_display = ['candidate_id', 'job_position_name', 'total_score', 'title_score', 'skill_score', 'matched_skills', 'qualification_score', 'matched_qualifications', 'similarity_score']
top_candidates[cols_to_display]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered['total_score'] = total_score


| | candidate_id | job_position_name | total_score | title_score | skill_score | matched_skills | qualification_score | matched_qualifications | similarity_score |
|---|---|---|---|---|---|---|---|---|---|
| 2946 | C2947 | DevOps Engineer | 0.654301 | 0.695652 | 0.425 | [Software Development, Sql, HTML, Databases, T... | 1.0 | [bachelor in Electrical Engineering] | 0.720306 |
| 1462 | C1463 | Data Science Engineer | 0.637561 | 0.64 | 0.35625 | [C, platform software development, industrial ... | 1.0 | [bachelor in Electrical Engineering] | 0.752219 |
| 3168 | C3169 | DevOps Engineer | 0.637113 | 0.695652 | 0.35625 | [C, platform software development, industrial ... | 1.0 | [bachelor in Electrical Engineering] | 0.720306 |
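
The total score is a weighted sum of the component scores, with the qualification term included only when the JD specifies education requirements. The actual weights come from `models.mappings.candidate_score_weights` and are not shown in this notebook, so the values in `WEIGHTS` below are placeholders, and `total_score` is a hypothetical helper written for a single candidate rather than a DataFrame.

```python
# Hypothetical weights; the notebook imports its own from models.mappings
WEIGHTS = {
    "title_score": 0.2,
    "skill_score": 0.4,
    "qualification_score": 0.1,
    "similarity_score": 0.3,
}


def total_score(scores, has_education_requirement=True):
    """Weighted sum of component scores; skips qualifications when the JD has none."""
    keys = ["title_score", "skill_score", "similarity_score"]
    if has_education_requirement:
        keys.append("qualification_score")
    return sum(WEIGHTS[k] * scores[k] for k in keys)


candidate = {
    "title_score": 0.5,
    "skill_score": 0.25,
    "qualification_score": 1.0,
    "similarity_score": 0.5,
}
with_quals = total_score(candidate)
without_quals = total_score(candidate, has_education_requirement=False)
print(round(with_quals, 2), round(without_quals, 2))
```

When the qualification term is skipped, the remaining weights no longer sum to 1, so totals from JDs with and without education requirements are not directly comparable unless the weights are renormalised.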


## draft emails to the selected candidates

In [13]:
import json
from core.email_generator import generate_email
from utils.parsing import candidate_to_json
from IPython.display import display, Markdown

top_candidates_json = [candidate_to_json(row) for _, row in top_candidates.iterrows()]

# for i, candidate in enumerate(top_candidates_json, 1):
#     display(Markdown(f"### Candidate {i}"))
#     display(Markdown(f"```json\n{json.dumps(candidate, indent=2)}\n```"))



generated_emails = [generate_email(job_description_input, json.dumps(c)) for c in top_candidates_json]

for i, email in enumerate(generated_emails):
    display(Markdown(f"### Candidate {i+1}"))
    display(Markdown(email))


### Candidate 1

Subject: Software Engineer Opportunity at [Company X] – A Fit for Your DevOps & Software Development Expertise

Dear [Candidate Name],

I came across your profile and was particularly impressed by your extensive experience as a DevOps Engineer, especially your focus on CI/CD, automation, and product scalability. Your background immediately brought to mind an exciting Software Engineer role on our Program and Network Design team at [Company X].

We're seeking a highly skilled engineer to enhance our platform and network capabilities, focusing on innovative software solutions and robust system reliability. Your proven expertise in **Software Development** with **Java** and **C++**, combined with your strong capabilities in **Testing** and **Application Development**, aligns perfectly with our need for meticulous code and scalable solutions.

What truly stands out is your experience with `CI/CD Culture & Tooling`, `Automation (Build & Deploy)`, and ensuring `Product Availability & Scalability`. These skills are critical for this role, as you would be instrumental in designing and building tools to automate development, testing, and deployment, while also examining existing systems for flaws to improve service uptime and time-to-resolve.

This opportunity at [Company X] could be a fantastic next step for you to leverage your comprehensive software engineering and DevOps mindset to directly impact the core infrastructure of a rapidly growing company.

Would you be open to a brief 15-20 minute call next week to discuss this role in more detail and explore how your expertise could contribute to our team?

Best regards,

[Your Name]
[Your Title]
[Company X]

### Candidate 2

Subject: Exploring a Software Engineer Opportunity at [Company X] - A Potential Fit for Your Platform Development Experience

Dear [Candidate Name],

I hope this email finds you well.

I came across your profile and was particularly impressed by your background as a Data Science Engineer, especially your experience in **platform software development** and your work with **Java-based systems**. Your expertise immediately brought to mind an exciting Software Engineer opportunity within our program and network design team at [Company X].

At [Company X], we're focused on developing innovative software solutions that solve complex problems, and we're looking for someone who can significantly contribute to enhancing our existing platform and network capabilities. Your proven ability in **application development, object-oriented design, and managing the full software development lifecycle**, including **software testing and documentation**, aligns perfectly with our objectives to build scalable, robust, and efficient systems.

Given your hands-on experience with languages like **C** (and we work with C++), coupled with your strong foundation in system analysis and performance monitoring, I believe your skills would be invaluable in identifying and resolving technical issues, ensuring high service uptime, and contributing to our continuous integration efforts. This role offers a unique chance to visualize, design, and develop innovative software platforms that will handle massive growth and enable new products.

Would you be open to a brief 15-20 minute call next week to discuss this opportunity further? I'd love to share more about the role and learn about your career aspirations to see if this could be a great next step for you.

Please let me know what time works best for you, or if you prefer, you can book a slot directly via [Link to your calendar, if applicable].

Best regards,

[Your Name]
[Your Title]
[Company X]
[Your Contact Information]
[Company Website]

### Candidate 3

Subject Line: Exploring a Software Engineer Opportunity at [Company X] – Your DevOps & Platform Expertise

Email Body:

Dear [Candidate Name],

Your profile as a DevOps Engineer immediately caught my attention, particularly your strong background in CI/CD culture, automation, and ensuring product availability and scalability. At [Company X], we're seeking a highly skilled Software Engineer to join our program and network design team, and your experience appears to be a fantastic match.

Specifically, your expertise in **platform software development** aligns perfectly with our objective to visualize, design, and develop innovative and scalable software platforms. We're also very impressed by your focus on **automation (Build & Deploy)** and **CI/CD Culture & Tooling**, which is crucial for our goal of building tools and frameworks to automate development, testing, and deployment processes, ultimately improving service uptime and time-to-resolve. Your experience with languages like **C** and Java (as seen in your "Java based TVGuide" project) also fits well with our core technology stack.

This role at [Company X] offers a unique opportunity to apply your skills in enhancing existing platform and network capabilities to handle massive growth and enable new insights. Your proven track record in infrastructure innovation, monitoring, and reliability would be invaluable as we continue to build robust and efficient solutions.

Would you be open to a brief chat next week to discuss this exciting opportunity in more detail? Please let me know what time works best for you, or if you prefer, I'm happy to send over some available slots.

Best regards,

[Your Name]
[Your Title]
[Company X]
[Your Contact Information]