# AI HR Screening Agent — End-to-End Evaluation Pipeline

This notebook demonstrates an end-to-end MVP pipeline for a rule-driven AI HR screening system.

It covers:
- Loading and cleaning a real-world candidate dataset
- Normalizing candidate profiles into a structured schema
- Evaluating candidates using a transparent, human-defined framework
- Producing consistent scores and explanations without training or fine-tuning

The goal is to show how AI can scale hiring decisions responsibly,
with explainability, fairness, and human oversight.

In [11]:
from datasets import load_dataset
import pandas as pd
import re

# Load dataset (DatasetDict)
ds_dict = load_dataset("lang-uk/recruitment-dataset-candidate-profiles-english")

# Select split
ds = ds_dict["train"]

# Convert to pandas DataFrame
df = ds.to_pandas()

In [12]:
print(type(df))
print(df.columns)
print(df.shape)

<class 'pandas.DataFrame'>
Index(['Position', 'Moreinfo', 'Looking For', 'Highlights', 'Primary Keyword',
       'English Level', 'Experience Years', 'CV', 'CV_lang', 'id',
       '__index_level_0__'],
      dtype='str')
(210250, 11)


In [13]:
# Normalize position
df["Position"] = df["Position"].astype(str).str.lower()

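# Broader inclusion: any title that mentions animation/animator
# (no capturing group, so pandas does not warn about match groups)
include_pattern = r"animat"

# Explicit exclusion; r"3\s*d" already covers "3d", "3 d", and "3  d"
exclude_pattern = r"3\s*d|motion\s*design|graphic\s*design|illustrator|concept\s*artist"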

df_2d = df[
    df["Position"].str.contains(include_pattern, regex=True, na=False) &
    ~df["Position"].str.contains(exclude_pattern, regex=True, na=False)
].copy()

df_2d["Position"] = "2D Animator"

print("Rows after broader animation filtering:", df_2d.shape)

Rows after broader animation filtering: (159, 11)
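
The raw titles sampled later in this notebook include entries such as "front-end developer (web animation)", which pass the filter because they contain "animat" and hit none of the exclusions. The sketch below reproduces the include/exclude logic on a handful of toy titles (an illustrative list, partly modeled on those samples) so the behavior can be checked in isolation. It is not a replacement for the cell above.

```python
import pandas as pd

# Toy titles standing in for the real "Position" column (illustrative only).
titles = pd.Series(
    [
        "2D Animator",
        "Spine 2D Animator",
        "3D Animator",
        "Front-end developer (web animation)",
        "Motion designer / animator",
        "Art director, senior art producer, animator, VFX",
        None,
    ],
    name="Position",
)

# Same patterns as the filter cell, written without capturing groups.
include_pattern = r"animat"
exclude_pattern = r"3\s*d|motion\s*design|graphic\s*design|illustrator|concept\s*artist"

normalized = titles.str.lower()
mask = (
    normalized.str.contains(include_pattern, regex=True, na=False)
    & ~normalized.str.contains(exclude_pattern, regex=True, na=False)
)

# Titles that would be relabeled as "2D Animator" by the pipeline.
kept = titles[mask]
print(kept.tolist())
```

On this toy input the web developer and art director titles are kept. If that is not intended, the exclusion list may need to be extended or the inclusion pattern narrowed.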




In [14]:
df_2d.shape

(159, 11)


In [15]:
df_2d.sample(10)["Position"]

805       2D Animator
203162    2D Animator
103       2D Animator
69        2D Animator
203052    2D Animator
803       2D Animator
62163     2D Animator
86        2D Animator
111       2D Animator
5562      2D Animator
Name: Position, dtype: str

46743     2D Animator
65        2D Animator
93        2D Animator
5572      2D Animator
147       2D Animator
193737    2D Animator
106       2D Animator
154       2D Animator
6028      2D Animator
188930    2D Animator
Name: Position, dtype: str

In [16]:
df.loc[df_2d.index, "Position"].sample(10)

5565                                            animator 2d
87                                              2d animator
33677     front-end creative developer ( wordpress / gsa...
542                                      2d artist animator
125                                   2d animator, composer
46743                   front-end developer (web animation)
69                                     2d animation (spine)
76                                              2d animator
148                                     2d animator (spine)
101967                                  lead spine animator
Name: Position, dtype: str

6028       art director, senior art producer, animator, vfx
101459    lead senior 2d animator, spine, ui/ux, motion,...
80                                              2d animator
115                                             2d animator
154                         2d animator/ technical designer
185228                                    spine 2d animator
810                                       2d spine animator
93                                              2d animator
185226                                    spine 2d animator
193737                            trainee spine 2d animator
Name: Position, dtype: str

In [17]:
import numpy as np
import re

# ----------------------------
# 1. Replace NaNs with "Not stated" (SAFE columns only)
# ----------------------------

SAFE_TEXT_COLS = [
    "English Level",
    "Experience Years",
    "Moreinfo",
    "Looking For",
    "Highlights",
    "CV_lang"
]

for col in SAFE_TEXT_COLS:
    if col in df_2d.columns:
        df_2d[col] = df_2d[col].fillna("Not stated")


# ----------------------------
# 2. HARD FILTER: Drop rows with missing or useless CVs
# ----------------------------

# Replace NaN CVs explicitly
df_2d["CV"] = df_2d["CV"].fillna("")

# Drop empty CVs
df_2d = df_2d[df_2d["CV"].str.strip() != ""]

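# Drop very short CVs (noise / spam); .copy() makes the later column
# assignments safe from chained-assignment warnings on older pandas versions
df_2d = df_2d[df_2d["CV"].str.len() >= 150].copy()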

print("Rows after CV quality filtering:", df_2d.shape)


# ----------------------------
# 3. Normalize English Level
# ----------------------------

def normalize_english(x):
    x = str(x).lower()
    if "adv" in x or "fluent" in x:
        return "Advanced"
    if "inter" in x:
        return "Intermediate"
    if "basic" in x:
        return "Basic"
    return "Not stated"

df_2d["english_level"] = df_2d["English Level"].apply(normalize_english)


# ----------------------------
# 4. Normalize Experience Years
# ----------------------------

def extract_years(x):
    if x == "Not stated":
        return "Not stated"
    match = re.search(r"\d+(\.\d+)?", str(x))
    if match:
        return float(match.group())
    return "Not stated"

df_2d["years_experience"] = df_2d["Experience Years"].apply(extract_years)


# ----------------------------
# 5. Clean free-text fields (whitespace, formatting)
# ----------------------------

TEXT_CLEAN_COLS = ["CV", "Moreinfo", "Looking For", "Highlights"]

def clean_text(x):
    x = str(x)
    x = re.sub(r"\s+", " ", x)
    return x.strip()

for col in TEXT_CLEAN_COLS:
    df_2d[col] = df_2d[col].apply(clean_text)


# ----------------------------
# 6. Final sanity check
# ----------------------------

print("Final cleaned dataset shape:", df_2d.shape)
df_2d.head(10)

Rows after CV quality filtering: (156, 11)
Final cleaned dataset shape: (156, 13)


Unnamed: 0,Position,Moreinfo,Looking For,Highlights,Primary Keyword,English Level,Experience Years,CV,CV_lang,id,__index_level_0__,english_level,years_experience
62,2D Animator,"2d animation is a fork part of my life, I have...",Not stated,Spine PRO: A Complete 2D Character Animation G...,Artist,intermediate,1.5,Spine PRO: A Complete 2D Character Animation G...,en,af5f41b9-8845-536b-810e-af5a42b83b42,24292,Intermediate,1.5
63,2D Animator,"character animation environment UI/UX ,creatio...",Not stated,Not stated,Unity,pre,3.0,"character animation environment UI/UX ,creatio...",en,2530e013-7911-5334-9147-b9aec43cb3ef,24293,Not stated,3.0
64,2D Animator,Imagination and fantasy are my strong points. ...,Not stated,Not stated,Artist,upper,3.0,Imagination and fantasy are my strong points. ...,en,63507ce5-0e1e-522f-8dd8-d6690c5227d1,24294,Not stated,3.0
65,2D Animator,"Hi! My education is monumental arts, but my ca...",I'm not interested in companies that continue ...,I learned how to use Spine 2D and animation to...,Artist,fluent,6.0,I learned how to use Spine 2D and animation to...,en,8d532d41-cd58-5f0c-89ab-b3066a03e7cc,24295,Advanced,6.0
68,2D Animator,I have been working on real projects for more ...,Not stated,Not stated,Artist,basic,1.5,I have been working on real projects for more ...,en,7281f664-f276-5d2e-9011-7aed15e9509a,24298,Basic,1.5
70,2D Animator,2018-2020 - freelancer 2020 - Playrix - market...,Not stated,Not stated,Other,intermediate,4.0,2018-2020 - freelancer 2020 - Playrix - market...,en,1df57dc6-3ae0-51ca-a278-19800e8947ae,24300,Intermediate,4.0
71,2D Animator,"2D animator experienced in: -developing, maint...",For me the following issues are important: -te...,Not stated,Artist,intermediate,1.5,"2D animator experienced in: -developing, maint...",en,103821e9-a4e5-51ba-9666-9704ab3c8601,24301,Intermediate,1.5
72,2D Animator,"2D ANIMATOR, TECHNICAL ANIMATOR. Worked with a...",Not stated,Not stated,Artist,pre,7.0,"2D ANIMATOR, TECHNICAL ANIMATOR. Worked with a...",en,a56df35e-26f8-5275-8dc4-999639a9b757,24302,Not stated,7.0
73,2D Animator,2D animator with 1 year of experience in the a...,Not stated,Created 2D animation using Spine Prepared spec...,Artist,intermediate,1.0,Created 2D animation using Spine Prepared spec...,en,3a11af28-0b2b-5ea8-9367-15cad817c74e,24303,Intermediate,1.0
74,2D Animator,- 4 years in gamedev - ability to work in Adob...,Not stated,Not stated,Other,upper,4.0,- 4 years in gamedev - ability to work in Adob...,en,b793b1c1-11e7-54dd-9bd8-fd612ffb84df,24304,Not stated,4.0
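
In the output above, the raw labels `pre` and `upper` both end up as "Not stated", because `normalize_english` only recognizes "adv", "fluent", "inter", and "basic". A minimal sketch of an extended mapping follows. It assumes that `pre` means pre-intermediate and `upper` means upper-intermediate, which the dataset output does not confirm. The function name and the two extra labels are hypothetical.

```python
def normalize_english_extended(raw):
    """Map a raw English-level label to a display label (hypothetical helper)."""
    x = str(raw).strip().lower()
    # Assumption: "pre" = pre-intermediate, "upper" = upper-intermediate.
    # These are checked before the generic "inter" test so that a value like
    # "pre-intermediate" is not collapsed into "Intermediate".
    if x.startswith("pre"):
        return "Pre-Intermediate"
    if x.startswith("upper"):
        return "Upper-Intermediate"
    if "adv" in x or "fluent" in x:
        return "Advanced"
    if "inter" in x:
        return "Intermediate"
    if "basic" in x:
        return "Basic"
    return "Not stated"


# Raw values seen in the table above, plus a missing value.
for raw in ["pre", "upper", "intermediate", "fluent", "basic", None]:
    print(repr(raw), "->", normalize_english_extended(raw))
```

Whether five levels are needed depends on how the evaluation framework uses English level. If it only distinguishes three tiers, the two extra labels could instead be folded into their nearest neighbors.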




In [18]:
# ----------------------------
# 1. Rename columns to final names
# ----------------------------

final_df = df_2d.rename(columns={
    "English Level": "english_level_raw",
    "Experience Years": "experience_years_raw",
    "Moreinfo": "additional_info",
    "Looking For": "looking_for",
    "Highlights": "highlights",
    "CV": "cv_text"
})


# ----------------------------
# 2. Define desired columns explicitly
# ----------------------------

desired_columns = [
    "Position",
    "english_level",
    "years_experience",
    "cv_text",
    "highlights",
    "looking_for",
    "additional_info"
]

# Add experience_level ONLY if it exists
if "experience_level" in final_df.columns:
    desired_columns.insert(2, "experience_level")


# ----------------------------
# 3. Select only existing columns (SAFE)
# ----------------------------

final_df = final_df[[c for c in desired_columns if c in final_df.columns]]

print("Final dataset shape:", final_df.shape)
final_df.head(2)


Final dataset shape: (156, 7)


Unnamed: 0,Position,english_level,years_experience,cv_text,highlights,looking_for,additional_info
62,2D Animator,Intermediate,1.5,Spine PRO: A Complete 2D Character Animation G...,Spine PRO: A Complete 2D Character Animation G...,Not stated,"2d animation is a fork part of my life, I have..."
63,2D Animator,Not stated,3.0,"character animation environment UI/UX ,creatio...",Not stated,Not stated,"character animation environment UI/UX ,creatio..."




In [19]:
final_df.to_csv(
    "clean_2d_animator_evaluation_dataset.csv",
    index=False,
    encoding="utf-8"
)

print("✅ Dataset exported successfully")

✅ Dataset exported successfully
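
Because `years_experience` mixes floats with the string "Not stated", the column will likely be read back from the CSV as text rather than numbers. The sketch below shows a round trip on a two-row toy frame using an in-memory buffer in place of the real file. The toy data and variable names are illustrative.

```python
import io

import pandas as pd

# Toy frame with the same float / "Not stated" mix as years_experience.
toy = pd.DataFrame(
    {
        "Position": ["2D Animator", "2D Animator"],
        "years_experience": [1.5, "Not stated"],
    }
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)

# Convert to numbers for downstream rules; the sentinel becomes NaN.
years = pd.to_numeric(reloaded["years_experience"], errors="coerce")
print(years.tolist())
```

Any consumer of the exported file that compares experience numerically would need a conversion step along these lines.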


In [20]:
final_df.isna().sum()

Position            0
english_level       0
years_experience    0
cv_text             0
highlights          0
looking_for         0
additional_info     0
dtype: int64
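
The zero counts above are expected, since every NaN was replaced with "Not stated" earlier in the pipeline. They do not mean the fields are populated. A minimal sketch of counting the sentinel instead, on a toy frame that mirrors part of the final schema (the values are illustrative):

```python
import pandas as pd

SENTINEL = "Not stated"

# Toy frame mirroring the final schema's mix of values and sentinel strings.
toy = pd.DataFrame(
    {
        "english_level": ["Intermediate", "Not stated", "Advanced"],
        "years_experience": [1.5, "Not stated", 6.0],
        "highlights": ["Not stated", "Not stated", "Spine 2D"],
    }
)

# isna() finds nothing, because the gaps are stored as a string.
nan_counts = toy.isna().sum()

# Counting the sentinel shows how many values are actually unknown.
sentinel_counts = (toy == SENTINEL).sum()

print(nan_counts.to_dict())
print(sentinel_counts.to_dict())
```

Running the same comparison on `final_df` would show how many candidates have no stated English level, experience, or highlights.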
