# AI Recruiter Pipeline

Paste a job description below, then run all cells to see the top-matching candidates for it.

In [None]:
job_description_input = """# Financial AI Analyst - FinTech & Data Analytics

## Company Overview
We are a leading financial technology company specializing in AI-powered solutions for accounting, financial analysis, and business intelligence. Our mission is to transform traditional financial processes through innovative machine learning applications and data-driven insights.

## Position Summary
We are seeking a Financial AI Analyst to bridge the gap between financial expertise and artificial intelligence. This role involves developing AI models for financial data analysis, automating accounting processes, and creating intelligent solutions for financial reporting and analysis. The ideal candidate will combine strong financial acumen with emerging AI capabilities to revolutionize how financial data is processed and analyzed.

## Key Responsibilities
- **Financial AI Model Development**: Design and implement machine learning models for financial forecasting, fraud detection, and risk assessment
- **Data Analysis & Financial Insights**: Analyze large financial datasets to identify patterns, trends, and anomalies using advanced analytics
- **Automated Financial Processes**: Develop AI solutions to streamline accounts payable, accounts receivable, and general ledger processes
- **Financial Reporting Automation**: Create intelligent systems for generating financial reports, balance sheets, and trial balances
- **Cross-Functional Collaboration**: Work with finance, accounting, and technology teams to implement AI-driven financial solutions
- **Model Training & Deployment**: Train machine learning models on financial data and deploy them in production environments
- **Documentation & Compliance**: Maintain comprehensive documentation of AI models and ensure compliance with financial regulations
- **Business Intelligence**: Develop predictive models for sales analysis, cash flow forecasting, and financial planning

## Required Qualifications
- **Education**: Master's degree in Commerce (M.Com) or Bachelor's degree in Commerce (B.Com) with strong academic background
- **Financial Experience**: 3+ years of experience in accounting, bookkeeping, or financial analysis
- **Technical Skills**: Proficiency in data entry, financial software, and analytical processes
- **Software Proficiency**: Advanced Excel skills, ERP systems, QuickBooks, MYOB, and MS Office applications
- **Languages**: Fluency in English and Hindi preferred for client communication

## Technical Skills
- **AI/ML Frameworks**: TensorFlow, PyTorch, Scikit-learn
- **Programming**: Python, R, or Java for financial modeling and analysis
- **Financial Software**: QuickBooks, MYOB, ERP systems, and accounting software
- **Data Analysis**: Excel, PowerPoint, Outlook, and MIS reporting tools
- **Financial Processes**: General ledger, accounts payable/receivable, payroll, bank reconciliation, invoicing

## Preferred Experience
- Experience with financial data analysis and sales analysis
- Background in cash management and fixed assets accounting
- Knowledge of accrual accounting and financial statement preparation
- Experience with purchasing processes and vendor management
- Supervisory or management experience in financial operations
- Understanding of legal and compliance requirements in finance

## Key Competencies
- **Analytical Skills**: Strong ability to analyze complex financial data and identify meaningful patterns
- **Problem-Solving**: Innovative approach to automating traditional financial processes
- **Communication**: Excellent written and verbal communication skills for cross-functional collaboration
- **Team Collaboration**: Proven ability to work effectively with diverse teams
- **Detail-Oriented**: High accuracy in financial data processing and model development
- **Adaptability**: Willingness to learn new AI technologies and apply them to financial use cases

## What We Offer
- Opportunity to be at the forefront of AI innovation in financial services
- Comprehensive training in machine learning and AI applications
- Career advancement opportunities in the rapidly growing FinTech sector
- Competitive compensation package with performance-based incentives
- Collaborative work environment with emphasis on continuous learning
- Exposure to cutting-edge AI technologies and financial systems

## Work Environment
This is a hybrid role combining traditional financial analysis with innovative AI development. The position requires both analytical thinking for financial processes and technical aptitude for AI model development. Remote work options available with flexible scheduling.

---

*We are committed to creating an inclusive workplace that values diverse perspectives and backgrounds in both finance and technology.*
"""


## install dependencies

In [None]:
!uv add pandas
!uv add numpy
!uv add rapidfuzz
!uv add sentence_transformers
!uv add pydantic
!uv add google-genai
!uv add python-dotenv

## initialise candidate dataset

In [None]:
from core.data import process_candidate_data
from sentence_transformers import SentenceTransformer

embedding_model_name = 'all-mpnet-base-v2'

model = SentenceTransformer(embedding_model_name)
df = process_candidate_data('./data/resume_data.csv', model, reload=False)

## extract jd using llm structured outputs

In [1]:
from getpass import getpass
import os
from dotenv import load_dotenv

load_dotenv('./.env')

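# .get avoids a KeyError when the key is missing from both .env and the environment
gemini_key = os.environ.get('GEMINI_API_KEY')

# fall back to an interactive prompt if no key was found
os.environ['GEMINI_API_KEY'] = gemini_key if gemini_key else getpass('Enter your Gemini API key: ')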

In [None]:
from google import genai
from google.genai import types
from prompts.jd_extraction import system_prompt, user_prompt
from models.data_models import JobRoleSchema

genai_client = genai.Client()

def process_jd(jd: str) -> JobRoleSchema:
    """Extract a structured JobRoleSchema from raw job-description text."""
    if not jd.strip():
        raise ValueError("No job description provided")

    response = genai_client.models.generate_content(
        model="gemini-2.5-pro",
        contents=user_prompt.format(job_desc=jd),
        config=types.GenerateContentConfig(
            temperature=0.2,
            system_instruction=system_prompt,
            response_mime_type="application/json",
            response_schema=JobRoleSchema,
        )
    )
    if not response or not response.text:
        raise ValueError("response.text is empty")

    # validate the returned JSON against the schema before using it
    return JobRoleSchema.model_validate_json(response.text)


def load_sample_jd():
    # fallback JD used when job_description_input is empty
    with open('./jd/sample_jd_01.txt', 'r', encoding='utf-8') as file:
        return file.read()

sample_jd = load_sample_jd()

processed_jd = process_jd(job_description_input if job_description_input else sample_jd)

print(processed_jd.model_dump_json(indent=2, exclude_unset=True, exclude_none=False))

## filter and score candidates

### filter candidates by job title

In [None]:
from core.matching import fuzzy_match
from sentence_transformers import util


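def fuzzy_job_title_score(threshold=0.4):
    role_scores = fuzzy_match(df['job_position_name'].tolist(), [processed_jd.role])
    # rescale to 0-1 (assumes fuzzy_match returns scores on a 0-100 scale)
    df['title_score'] = role_scores[:, 0] / 100

    # copy so later column assignments on the result don't raise SettingWithCopyWarning
    return df[df['title_score'] >= threshold].copy()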


def similarity_match(item1: str, item2: str):
    item1_embedding = model.encode(item1, convert_to_tensor=True)
    item2_embedding = model.encode(item2, convert_to_tensor=True)

    return util.cos_sim(item1_embedding, item2_embedding).item()

def similarity_job_title_score(threshold=0.6):
    df['title_score'] = df.apply(lambda x: similarity_match(x['job_position_name'], processed_jd.role), axis=1)

    # copy so later column assignments on the result don't raise SettingWithCopyWarning
    return df[df['title_score'] >= threshold].copy()


df_filtered = fuzzy_job_title_score()
# alternative: filter by semantic similarity instead of fuzzy matching
# (much slower, because every title is encoded one row at a time)
# df_filtered = similarity_job_title_score()

print(f"filtered candidates: {len(df_filtered)}")
print("")
print(df_filtered[['candidate_id', 'job_position_name', 'title_score']])
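
The title filter above keeps candidates whose fuzzy score, rescaled to 0-1, meets a threshold. A minimal sketch of that idea follows, using the standard library's `difflib.SequenceMatcher` as a stand-in for the project's `fuzzy_match` helper, whose internals are not shown in this notebook. The names `title_score` and `filter_titles` and the sample titles are hypothetical and exist only for illustration.

```python
from difflib import SequenceMatcher


def title_score(candidate_title: str, jd_role: str) -> float:
    """Return a 0-1 similarity between two job titles, ignoring case."""
    return SequenceMatcher(None, candidate_title.lower(), jd_role.lower()).ratio()


def filter_titles(titles, jd_role, threshold=0.4):
    """Keep (title, score) pairs whose score meets the threshold, best first."""
    scored = [(title, title_score(title, jd_role)) for title in titles]
    kept = [(title, score) for title, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# illustrative titles, not taken from the candidate dataset
sample_titles = ["Financial Analyst", "AI Engineer", "Chef"]
for title, score in filter_titles(sample_titles, "Financial AI Analyst"):
    print(f"{title}: {score:.2f}")
```

Character-level matching like this rewards shared wording rather than shared meaning, which is why the notebook also offers an embedding-based alternative.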

### score candidates by skills

In [None]:
import pandas as pd
from core.matching import weighted_fuzzy_skill_score
from models.data_models import Skill

if not processed_jd:
    raise ValueError('JD is corrupted or not processed')

if len(df_filtered) == 0:
    raise ValueError('No candidates found')

def calculate_skill_score(df: pd.DataFrame, filter = False):
    # copy the list so re-running this cell doesn't keep appending to processed_jd.skills
    jd_skills = list(processed_jd.skills or [])

    # treat each required technology as a skill with the same priority
    if processed_jd.technologies:
        for t in processed_jd.technologies:
            jd_skills.append(
                Skill(
                    skill=t.technology,
                    priority=t.priority,
                    proficiency_level=None
                )
            )

    print(jd_skills)

    skill_results = df.apply(
        lambda x: weighted_fuzzy_skill_score(x['candidate_id'], jd_skills, x['all_skills']),
        axis=1
    )

    # json_normalize produces a fresh 0..n-1 index while df keeps its original
    # (filtered) index, so assign by position with .to_numpy(), not by label
    skill_results_df = pd.json_normalize(skill_results.tolist())
    df['skill_score'] = skill_results_df['score'].to_numpy()
    df['matched_skills'] = skill_results_df['matched_skills'].to_numpy()

    # filter by skill
    if filter:
        SKILL_SCORE_THRESHOLD = 0.25
        df = df[df['skill_score'] >= SKILL_SCORE_THRESHOLD]
        print(f"filtered candidates: {len(df)}")

    print(df[['candidate_id', 'job_position_name', 'skill_score']])

    return df



df_filtered = calculate_skill_score(df_filtered)

# pd.set_option('display.max_rows', None)
# pd.set_option('display.max_columns', None)
# pd.set_option('display.width', 1000)
# pd.set_option('display.max_colwidth', None)
# df_filtered

### score candidates by qualifications

In [None]:
from core.matching import weighted_fuzzy_qualification_score

def calculate_qualification_score(df: pd.DataFrame, filter = False):
    if processed_jd and processed_jd.qualifications and processed_jd.qualifications.education:
        qualification_results = df.apply(
            lambda x: weighted_fuzzy_qualification_score(
                x['candidate_id'], 
                processed_jd.qualifications.education, # type: ignore
                { "degrees": x['degree_names_norm'],  "fields": x['major_field_of_studies']} 
            ), axis=1)
        
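        # the new frame has a fresh 0..n-1 index while df keeps its original
        # (filtered) index, so assign by position with .to_numpy(), not by label
        qualification_results_df = pd.DataFrame(qualification_results.tolist())
        df['qualification_score'] = qualification_results_df['score'].to_numpy()
        df['matched_qualifications'] = qualification_results_df['matched_qualifications'].to_numpy()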
        
        # filter by qualification
        if filter: 
            QUALIFICATION_SCORE_THRESHOLD = 0.2
            df = df[df['qualification_score'] >= QUALIFICATION_SCORE_THRESHOLD]
            print(f"filtered candidates: {len(df)}")

        print(df[['candidate_id', 'job_position_name', 'qualification_score']])
    else:
        df['qualification_score'] = 0.0
        df['matched_qualifications'] = None
    return df


df_filtered = calculate_qualification_score(df_filtered)

### calculate similarity score

In [None]:
import torch
from sentence_transformers.util import cos_sim
from core.embedding import build_jd_embedding_input

jd_text = build_jd_embedding_input(processed_jd)
jd_embedding = model.encode(jd_text, convert_to_tensor=True).cpu()

candidate_embeddings = torch.stack([torch.tensor(vec) for vec in df_filtered['profile_embedding']])

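# cos_sim returns an (n_candidates, 1) matrix; squeeze only that axis so a
# single remaining candidate still yields a 1-D array
similarities = cos_sim(candidate_embeddings, jd_embedding).squeeze(1).cpu().numpy()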

df_filtered['similarity_score'] = similarities

df_filtered[['candidate_id', 'similarity_score']]
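
For reference, the similarity score is a cosine similarity: the dot product of two embeddings divided by the product of their lengths. The plain-Python sketch below shows the arithmetic on toy vectors. The function name `cosine_similarity` and the sample vectors are illustrative only, and returning 0.0 for a zero-length vector is a choice made for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # sketch-level choice: treat a zero vector as having no similarity
        return 0.0
    return dot / (norm_a * norm_b)


# toy 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions
jd_vector = [0.2, 0.7, 0.1]
candidate_vectors = {
    "candidate_a": [0.2, 0.6, 0.2],
    "candidate_b": [0.9, 0.0, 0.1],
}

for name, vector in candidate_vectors.items():
    print(f"{name}: {cosine_similarity(vector, jd_vector):.3f}")
```

Because the score depends only on direction and not on vector length, a long profile and a short one can still score alike if they describe similar content.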

## calculate total score and filter

In [None]:
from models.mappings import candidate_score_weights

# Compute the weighted total score
total_score = (
    candidate_score_weights['title_score'] * df_filtered['title_score'] +
    candidate_score_weights['skill_score'] * df_filtered['skill_score'] +
    candidate_score_weights['similarity_score'] * df_filtered['similarity_score']
)
if processed_jd.qualifications and processed_jd.qualifications.education:
    total_score += candidate_score_weights['qualification_score'] * df_filtered['qualification_score']
df_filtered['total_score'] = total_score

top_candidates = df_filtered.sort_values(by='total_score', ascending=False).head(3)

# df_filtered['match_explanation'] = df_filtered.apply(lambda row: f"Title: {row.title_score:.2f}, Skills: {row.skill_score:.2f}, Qual: {row.qualification_score:.2f}, Sem: {row.similarity_score:.2f}", axis=1)
# df_filtered['match_explanation']

cols_to_display = ['candidate_id', 'job_position_name', 'total_score', 'title_score', 'skill_score', 'matched_skills', 'qualification_score', 'matched_qualifications', 'similarity_score']
top_candidates[cols_to_display]
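
The total above is a weighted sum of the component scores, with the qualification term added only when the JD lists education requirements. A small sketch of that arithmetic for one candidate is shown below. The weights are hypothetical placeholders, not the values in `models.mappings.candidate_score_weights`, and `weighted_total` is a name introduced here for illustration.

```python
# hypothetical weights for illustration only; the real values live in models/mappings
example_weights = {
    "title_score": 0.2,
    "skill_score": 0.4,
    "qualification_score": 0.1,
    "similarity_score": 0.3,
}


def weighted_total(scores, weights, include_qualification=True):
    """Weighted sum of component scores; the qualification term is optional."""
    keys = ["title_score", "skill_score", "similarity_score"]
    if include_qualification:
        keys.append("qualification_score")
    return sum(weights[key] * scores[key] for key in keys)


# made-up component scores for a single candidate
candidate_scores = {
    "title_score": 0.8,
    "skill_score": 0.5,
    "qualification_score": 1.0,
    "similarity_score": 0.6,
}

print(f"with qualifications:    {weighted_total(candidate_scores, example_weights):.2f}")
print(f"without qualifications: {weighted_total(candidate_scores, example_weights, False):.2f}")
```

One consequence of dropping a term is that the maximum achievable total shrinks, so totals are comparable within a single JD but not necessarily across JDs unless the remaining weights are renormalized.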


## draft emails to the selected candidates

In [None]:
import json
from core.email_generator import generate_email
from utils.parsing import candidate_to_json
from IPython.display import display, Markdown

top_candidates_json = [candidate_to_json(row) for _, row in top_candidates.iterrows()]

# for i, candidate in enumerate(top_candidates_json, 1):
#     display(Markdown(f"### Candidate {i}"))
#     display(Markdown(f"```json\n{json.dumps(candidate, indent=2)}\n```"))



generated_emails = [generate_email(job_description_input, json.dumps(c)) for c in top_candidates_json]

for i, email in enumerate(generated_emails):
    display(Markdown(f"### Candidate {i+1}"))
    display(Markdown(email))
