# Bias Detection

**Main Objective:**  
Detect errors and biases introduced by the CV parser when extracting skills from raw CV text.
We perform this inspection using both rule-based (regular expressions) and semantic techniques.

1. **Error detection steps:**  
    - Identify errors in candidates' **Driving Licenses** and **Language Skills** using regular expressions.  
    - Uncover errors in candidates' **Job Experience** using exact matching and a semantic approach.  

2. **Bias detection:**
    - Analyze the errors identified in Step 1 for the groups previously examined (see `distribution_analysis.ipynb`) to determine whether the parser has disadvantaged or advantaged any of them.

In [29]:
%load_ext autoreload 
%autoreload 2

import json
import os

import polars as pl
from huggingface_hub import login

from hiring_cv_bias.bias_detection.fuzzy.matcher import SemanticMatcher
from hiring_cv_bias.bias_detection.fuzzy.parser import JobParser
from hiring_cv_bias.bias_detection.rule_based.data import (
    add_demographic_info,
)
from hiring_cv_bias.bias_detection.rule_based.evaluation.compare_parser import (
    compute_candidate_coverage,
)
from hiring_cv_bias.bias_detection.rule_based.extractors import (
    extract_driver_license,
    extract_languages,
    norm_driver_license,
    norm_languages,
)
from hiring_cv_bias.bias_detection.rule_based.patterns import (
    driver_license_pattern_eng,
    jobs_pattern,
    languages_pattern_eng,
    normalized_jobs,
)
from hiring_cv_bias.bias_detection.rule_based.utils import (
    print_highlighted_cv,
    print_report,
)
from hiring_cv_bias.config import (
    CANDIDATE_CVS_TRANSLATED_CLEANED_PATH,
    CLEANED_REVERSE_MATCHING_PATH,
    CLEANED_SKILLS,
    DRIVING_LICENSE_FALSE_NEGATIVES_PATH,
    JOB_TITLE_FALSE_NEGATIVES_PATH,
    LANGUAGE_SKILL_FALSE_NEGATIVES_PATH,
)
from hiring_cv_bias.utils import load_data

pl.Config.set_tbl_cols(-1)
pl.Config.set_tbl_width_chars(200);

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [30]:
os.environ["TOKENIZERS_PARALLELISM"] = "True"
with open("token.json", "r") as token:
    login(token=json.load(token)["token"])

!python -m spacy download en_core_web_sm 

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


### Load the data 

In [31]:
df_cv_raw = load_data(CANDIDATE_CVS_TRANSLATED_CLEANED_PATH)
df_skills = load_data(CLEANED_SKILLS)
df_info_candidates = load_data(CLEANED_REVERSE_MATCHING_PATH)

In [32]:
df_info_candidates = df_info_candidates.with_columns(
    pl.when(pl.col("LATITUDE") > 44.5)
    .then(pl.lit("NORTH"))
    .when(pl.col("LATITUDE") < 42)
    .then(pl.lit("SOUTH"))
    .otherwise(pl.lit("CENTER"))
    .alias("Location")
)

df_cv_raw = df_cv_raw.with_columns(
    pl.when(pl.col("len_anon") < 1000)
    .then(pl.lit("SHORT"))
    .when(pl.col("len_anon") < 2500)
    .then(pl.lit("MEDIUM"))
    .otherwise(pl.lit("LONG"))
    .alias("length")
)

### Bias detection for Driver Licences

- **Pre-processing step –> driving licence flag**  
  We call `add_demographic_info()` to add a Boolean column, `has_driving_license`, to the CV dataframe. This flag will help us compare what the regex detects in the raw CV text with what the parser extracted as driving licence for each candidate, allowing us to identify potential omissions in the parsing step.

- **How the flag is generated**  
  - A single case-insensitive regex (`driver_license_pattern_eng`) looks for common phrases such as “driving license B”, “C1 driving licence” or even “own car”.  
  - The helper function `extract_driver_license(text)` returns `True` if the regex matches anywhere in the CV text.  

- **Resulting columns in `df_cv`**  
  - All columns of `df_cv_raw` (`CANDIDATE_ID`, `CV_text_anon`, `Translated_CV`, `len_anon`, `length`)
  - `Gender`, `Location` -> from the candidates sheet
  - `has_driving_license` -> `True` if any licence is mentioned, otherwise `False`

- **Note**  
  For now we only check whether a candidate has *any* licence, because the driver licence type column contains a handful of null values (see `data_cleaning.ipynb`).
  
  The same regex already captures specific categories (A, B, C…), so the analysis could be extended later if we want to explore potential biases tied to particular licence types.
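
  As a rough illustration of how such a flag can be produced, here is a minimal sketch. The pattern below is a simplified, hypothetical stand-in: the actual `driver_license_pattern_eng` is not shown in this notebook and is presumably broader. The function name `has_driving_licence` is also an assumption made for this example.

```python
import re

# Hypothetical, simplified pattern (NOT the project's driver_license_pattern_eng).
# It covers "driving licence/license", "driver's license" and "own car".
LICENCE_PATTERN = re.compile(
    r"\b(?:driving|driver'?s?)\s+licen[cs]e\b|\bown\s+car\b",
    re.IGNORECASE,
)


def has_driving_licence(text: str) -> bool:
    """Return True if the pattern matches anywhere in the CV text."""
    return LICENCE_PATTERN.search(text) is not None


print(has_driving_licence("teamwork driving license: b (25/08/2009)"))  # True
print(has_driving_licence("work experience waiter and bartender"))      # False
```

  A presence check of this kind only answers "is a licence mentioned at all"; capturing the category (A, B, C…) would need an extra capture group.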

In [33]:
df_cv = add_demographic_info(df_cv_raw, df_info_candidates)
df_cv.head()

CANDIDATE_ID,CV_text_anon,Translated_CV,len_anon,length,Gender,Location,has_driving_license
i64,str,str,i64,str,str,str,bool
7990324,"""CV anonimizzato: """""" PROFILO D…",""" profile graduated from elsa m…",1445,"""MEDIUM""","""Female""","""NORTH""",True
7974050,"""CV anonimizzato: """"""  Curricul…",""" curriculum vitae personal inf…",2148,"""MEDIUM""","""Female""","""CENTER""",True
7965670,"""CV anonimizzato: """""" ESPERIENZ…",""" work experience 03/27/2023 – …",4911,"""LONG""","""Female""","""NORTH""",True
7960501,"""CV anonimizzato: """""" Esperienz…",""" work experience waiter and ba…",680,"""SHORT""","""Male""","""NORTH""",False
7960052,"""CV anonimizzato: """"""  Dat…",""" date of birth: 03/26/1996 nat…",5913,"""LONG""","""Female""","""CENTER""",False


**Comparing parser output and regex detection**

The `compute_candidate_coverage()` function evaluates how well the parsing system detects a specific category of skills by comparing its output with our own extraction from the raw CV text.

In this case, the chosen category is `"DRIVERSLIC"` and we use a custom regex-based extractor applied directly to the raw CV text.


This step is crucial for measuring the parser’s coverage by quantifying **false negatives**, that is, skills that are mentioned in the CV but missed by the parser.

**Output breakdown:** 
   1. **Regex positive candidates**: number of unique candidates flagged by our rule-based extractor.
   2. **Parser positive unique candidates**: number of unique candidates flagged by the parser.

The overlap between the two sets is then reported as:
- **Both regex & parser**: candidates detected by both methods.
- **Only regex**: candidates our regex caught but the parser missed (false negatives).
- **Only parser**: candidates the parser flagged but our regex did not (false positives).

Then `print_report()` displays the overall confusion matrix and derived metrics (accuracy, precision, recall, F1), followed by a per-group table of errors and rates.
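
The candidate-level breakdown can be sketched with plain set operations. This is only an illustration under the assumption that each method yields a set of candidate IDs and that the regex output is treated as the reference; the helper name `coverage_breakdown` and the toy IDs are hypothetical, and the real `compute_candidate_coverage()` may work differently (for example, it may count individual skills rather than candidates).

```python
def coverage_breakdown(
    regex_ids: set[int], parser_ids: set[int], all_ids: set[int]
) -> dict[str, int]:
    """Split candidates into TP / FN / FP / TN, with the regex as reference."""
    both = regex_ids & parser_ids              # found by both methods
    only_regex = regex_ids - parser_ids        # parser missed them
    only_parser = parser_ids - regex_ids       # regex did not flag them
    neither = all_ids - regex_ids - parser_ids
    return {
        "tp": len(both),
        "fn": len(only_regex),
        "fp": len(only_parser),
        "tn": len(neither),
    }


# Toy data: six candidates, three flagged by each method.
counts = coverage_breakdown(
    regex_ids={1, 2, 3}, parser_ids={2, 3, 4}, all_ids={1, 2, 3, 4, 5, 6}
)
print(counts)
```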

In [34]:
res_dl = compute_candidate_coverage(
    df_cv=df_cv,
    df_parser=df_skills,
    skill_type="DRIVERSLIC",
    extractor=extract_driver_license,
    norm=norm_driver_license,
)

print("Confusion matrix:", res_dl.conf)

  0%|          | 0/7404 [00:00<?, ?it/s]

Regex positive candidates        : 4134
Parser positive unique candidates: 2032
- Both regex & parser   : 1689
- Only regex            : 2445
- Only parser           : 343

Confusion matrix: 
Counts   -> TP:1689   FP:343   TN:2927   FN:2445 
Metrics  -> Precision:0.83  Recall:0.409  F1:0.548  Acc:0.623


The parser achieves **high precision (~83 %)** but **low recall (~41 %)**.  
In other words, when it flags a licence it is usually correct, yet it **misses more than half (about 59 %) of the candidates** that the regex flags.
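
As a quick check, the reported metrics can be recomputed from the four counts printed above. The snippet uses the standard definitions of precision, recall, F1 and accuracy; it is a worked calculation, not the project's own implementation.

```python
# Counts copied from the confusion matrix printed above.
tp, fp, tn, fn = 1689, 343, 2927, 2445

precision = tp / (tp + fp)                          # share of parser flags confirmed by the regex
recall = tp / (tp + fn)                             # share of regex positives found by the parser
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + tn + fn)

print(f"Precision:{precision:.3f} Recall:{recall:.3f} F1:{f1:.3f} Acc:{accuracy:.3f}")
```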

Let's dive into the **false negatives (FN)**.  
   * We highlight the exact terms captured by the regex directly inside the CV text (in **red** when rendered in the notebook), making it easy to verify their presence at a glance.

In [35]:
df_fn = pl.DataFrame(res_dl.fn_rows)
sample = df_fn.sample(n=2, shuffle=True)
for row in sample.to_dicts():
    print_highlighted_cv(row, pattern=driver_license_pattern_eng)


CANDIDATE ID: 6656580 - GENERE: Female
Reason: Rule-based extractor found skill but parser missed it.
--------------------------------------------------------------------------------
ail / use of photo editing software (adobe lightroom) problem solving (problem analysis) / teamwork [31mdriving license: b[0m (25/08/2009)
--------------------------------------------------------------------------------

CANDIDATE ID: 2833090 - GENERE: Male
Reason: Rule-based extractor found skill but parser missed it.
--------------------------------------------------------------------------------
verhead cranes technical skills and competences with computers, specific equipment, machinery, etc. [31mdriving licence or[0m licences driving licence b [insert any other relevant information here, e.g. contact persons, refer
lls and competences with computers, specific equipment, machinery, etc. driving licence or licences [31mdriving licence b[0m [insert any other relevant information here, e.g. contact

In [36]:
print(f"False negatives matching snippet pattern: {df_fn.height}")
df_fn.write_csv(DRIVING_LICENSE_FALSE_NEGATIVES_PATH, separator=";")
print("Saved filtered false negatives!")

False negatives matching snippet pattern: 2445
Saved filtered false negatives!


In [37]:
%%bash --bg 
cd ..

# for Unix users  
.venv/bin/python -m streamlit run hiring_cv_bias/bias_detection/rule_based/app/fn_app.py

# for Windows users
#.venv/Scripts/python.exe -m streamlit run hiring_cv_bias/bias_detection/rule_based/app/fn_app.py

### Bias Detection Metrics

To assess the parser’s fairness, we use the following **bias detection metrics**:

| Metric | Formula | Interpretation |
|--------|---------|----------------|
| **Equality of Opportunity (TPR parity)** | $\text{TPR}_g = \frac{TP_g}{TP_g + FN_g}$ | $\text{TPR}_g$ equal for every $g$ ensures that **every individual who truly qualifies** for a positive outcome has the **same chance** of being correctly identified, regardless of group membership. |
| **Calibration (NPV)** | $\text{NPV}_g = \frac{TN_g}{TN_g + FN_g}$ | $\text{NPV}_g$ parity for every $g$ ensures that **when the model predicts a negative outcome**, the probability of being correct is the **same** for every group. |
| **Selection Rate (SR)** | $\text{SR}_g = \frac{TP_g + FP_g}{TP_g + FP_g + TN_g + FN_g}$ | Share of individuals in group $g$ predicted positive (selected). |
| **Disparate Impact (DI)** | $DI = \frac{\text{SR}_{\text{target}}}{\text{SR}_{\text{reference}}}$ | Ratio of selection rates; values **< 0.80** (four-fifths rule) indicate potential adverse impact against the target group. |



<u>All these metrics were computed for the **Gender**, **Location** and CV **length** groups to detect and quantify possible bias in the parser's output.</u>
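
The formulas in the table can be sketched as follows. The `GroupCounts` class and the `disparate_impact` helper are hypothetical names introduced for this example, not the project's API. The Male counts are taken from the driving licence report by Gender shown later in this notebook; the Female counts are an assumption derived by subtracting the Male row from the overall totals, which is valid only because that report has exactly two rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupCounts:
    """Confusion-matrix counts for one demographic group."""

    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def tpr(self) -> float:
        # Equality of opportunity: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    @property
    def npv(self) -> float:
        # Calibration (NPV): TN / (TN + FN)
        return self.tn / (self.tn + self.fn)

    @property
    def selection_rate(self) -> float:
        # Share of the group predicted positive.
        return (self.tp + self.fp) / (self.tp + self.fp + self.tn + self.fn)


def disparate_impact(target: GroupCounts, reference: GroupCounts) -> float:
    """Ratio of selection rates (target group over reference group)."""
    return target.selection_rate / reference.selection_rate


# Male row of the driving licence report by Gender.
male = GroupCounts(tp=966, fp=189, tn=1513, fn=1316)
# Derived by subtraction from the overall totals (assumption, see lead-in).
female = GroupCounts(tp=723, fp=154, tn=1414, fn=1129)

print(f"TPR male: {male.tpr:.6f}, NPV male: {male.npv:.6f}")
print(f"DI (female vs male): {disparate_impact(female, male):.3f}")
```

With the reference group compared against itself the ratio is 1.0 by construction, which is why the reference row always shows `disparate_impact` equal to 1.0.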


In [38]:
print_report(
    result=res_dl,
    df_population=df_cv,
    reference_col="Male",
    group_col="Gender",
    metrics=[
        "equality_of_opportunity",
        "calibration_npv",
    ],
)

TP: 1689, FP: 343, TN: 2927, FN: 2445
Accuracy: 0.623, Precision: 0.831, Recall: 0.409, F1: 0.548

Error and rates by Gender:
 shape: (2, 12)
┌────────┬───────┬─────┬─────┬──────┬──────┬──────────────┬──────────┬──────────┬─────────────────────────┬─────────────────┬──────────────────┐
│ Gender ┆ total ┆ tp  ┆ fp  ┆ fn   ┆ tn   ┆ total_skills ┆ fp_rate  ┆ fn_rate  ┆ equality_of_opportunity ┆ calibration_npv ┆ disparate_impact │
│ ---    ┆ ---   ┆ --- ┆ --- ┆ ---  ┆ ---  ┆ ---          ┆ ---      ┆ ---      ┆ ---                     ┆ ---             ┆ ---              │
│ str    ┆ u32   ┆ u32 ┆ u32 ┆ u32  ┆ u32  ┆ u32          ┆ f64      ┆ f64      ┆ f64                     ┆ f64             ┆ f64              │
╞════════╪═══════╪═════╪═════╪══════╪══════╪══════════════╪══════════╪══════════╪═════════════════════════╪═════════════════╪══════════════════╡
│ Male   ┆ 3984  ┆ 966 ┆ 189 ┆ 1316 ┆ 1513 ┆ 3984         ┆ 0.04744  ┆ 0.330321 ┆ 0.423313                ┆ 0.534818        ┆ 1.0    

In [39]:
print_report(
    result=res_dl,
    df_population=df_cv,
    reference_col="NORTH",
    group_col="Location",
    metrics=[
        "equality_of_opportunity",
        "calibration_npv",
    ],
)

TP: 1689, FP: 343, TN: 2927, FN: 2445
Accuracy: 0.623, Precision: 0.831, Recall: 0.409, F1: 0.548

Error and rates by Location:
 shape: (3, 12)
┌──────────┬───────┬──────┬─────┬──────┬──────┬──────────────┬──────────┬──────────┬─────────────────────────┬─────────────────┬──────────────────┐
│ Location ┆ total ┆ tp   ┆ fp  ┆ fn   ┆ tn   ┆ total_skills ┆ fp_rate  ┆ fn_rate  ┆ equality_of_opportunity ┆ calibration_npv ┆ disparate_impact │
│ ---      ┆ ---   ┆ ---  ┆ --- ┆ ---  ┆ ---  ┆ ---          ┆ ---      ┆ ---      ┆ ---                     ┆ ---             ┆ ---              │
│ str      ┆ u32   ┆ u32  ┆ u32 ┆ u32  ┆ u32  ┆ u32          ┆ f64      ┆ f64      ┆ f64                     ┆ f64             ┆ f64              │
╞══════════╪═══════╪══════╪═════╪══════╪══════╪══════════════╪══════════╪══════════╪═════════════════════════╪═════════════════╪══════════════════╡
│ CENTER   ┆ 1001  ┆ 243  ┆ 52  ┆ 315  ┆ 391  ┆ 1001         ┆ 0.051948 ┆ 0.314685 ┆ 0.435484                ┆ 0.553

In [40]:
print_report(
    result=res_dl,
    df_population=df_cv,
    reference_col="LONG",
    group_col="length",
    metrics=[
        "equality_of_opportunity",
        "calibration_npv",
    ],
)

TP: 1689, FP: 343, TN: 2927, FN: 2445
Accuracy: 0.623, Precision: 0.831, Recall: 0.409, F1: 0.548

Error and rates by length:
 shape: (3, 12)
┌────────┬───────┬─────┬─────┬──────┬──────┬──────────────┬──────────┬──────────┬─────────────────────────┬─────────────────┬──────────────────┐
│ length ┆ total ┆ tp  ┆ fp  ┆ fn   ┆ tn   ┆ total_skills ┆ fp_rate  ┆ fn_rate  ┆ equality_of_opportunity ┆ calibration_npv ┆ disparate_impact │
│ ---    ┆ ---   ┆ --- ┆ --- ┆ ---  ┆ ---  ┆ ---          ┆ ---      ┆ ---      ┆ ---                     ┆ ---             ┆ ---              │
│ str    ┆ u32   ┆ u32 ┆ u32 ┆ u32  ┆ u32  ┆ u32          ┆ f64      ┆ f64      ┆ f64                     ┆ f64             ┆ f64              │
╞════════╪═══════╪═════╪═════╪══════╪══════╪══════════════╪══════════╪══════════╪═════════════════════════╪═════════════════╪══════════════════╡
│ LONG   ┆ 3789  ┆ 901 ┆ 161 ┆ 1470 ┆ 1257 ┆ 3789         ┆ 0.042491 ┆ 0.387965 ┆ 0.380008                ┆ 0.460946        ┆ 1.0    

### Bias detection for Language Skill

Whereas for driving licenses we applied a simple presence check, we handle languages with a more granular, ad-hoc normalization and extraction pipeline that recognizes each specific language individually rather than merely flagging “has any language”.

The main **steps** are:

* Build a reverse lookup map (`_reverse_language_map`) by iterating over each language code in `LANGUAGE_VARIANTS` (populated from pycountry: `alpha_2` codes with their English name variants) and over all its known name variants, storing entries such as "english" -> "en" and "italian" -> "it".
* Apply `norm_languages` to every mention extracted by our regex-based extractor so that each occurrence (such as "English B2") is mapped to a clean ISO code ("en").
* Once all language mentions have been normalized to ISO codes via `norm_languages`, invoke the coverage routine to quantify how well the parser matches our “ground truth” extractions.
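
The first two steps can be sketched as below. This is a minimal illustration, not the project's code: the small hand-written `LANGUAGE_VARIANTS` dictionary stands in for the one populated from pycountry, and `normalize_language_mention` is a hypothetical simplification of `norm_languages`.

```python
import re
from typing import Optional

# Hypothetical stand-in for the pycountry-derived mapping (code -> name variants).
LANGUAGE_VARIANTS = {
    "en": ["english"],
    "it": ["italian"],
    "fr": ["french"],
    "de": ["german"],
}

# Reverse lookup: name variant -> ISO 639-1 code.
REVERSE_LANGUAGE_MAP = {
    variant: code
    for code, variants in LANGUAGE_VARIANTS.items()
    for variant in variants
}


def normalize_language_mention(mention: str) -> Optional[str]:
    """Map a raw mention such as 'English B2' to an ISO code, or None."""
    for token in re.findall(r"[a-z]+", mention.lower()):
        code = REVERSE_LANGUAGE_MAP.get(token)
        if code is not None:
            return code
    return None


print(normalize_language_mention("English B2"))  # en
print(normalize_language_mention("Klingon C1"))  # None
```

Normalizing both sides to the same codes before comparison avoids counting "English B2" and "english" as two different skills.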

In [41]:
res_lg = compute_candidate_coverage(
    df_cv=df_cv,
    df_parser=df_skills,
    skill_type="Language_Skill",
    extractor=extract_languages,
    norm=norm_languages,
)

print("Confusion matrix:", res_lg.conf)

  0%|          | 0/7404 [00:00<?, ?it/s]

Regex positive candidates        : 6387
Parser positive unique candidates: 6500
- Both regex & parser   : 5615
- Only regex            : 772
- Only parser           : 885

Confusion matrix: 
Counts   -> TP:10389  FP:2392  TN:132    FN:5438 
Metrics  -> Precision:0.81  Recall:0.656  F1:0.726  Acc:0.573


In [42]:
df_fn = pl.DataFrame(res_lg.fn_rows)
sample = df_fn.sample(n=5, shuffle=True)
for row in sample.to_dicts():
    print_highlighted_cv(row, pattern=languages_pattern_eng)


CANDIDATE ID: 6976698 - GENERE: Male
Reason: Rule-based extractor found skill but parser missed it.
--------------------------------------------------------------------------------
c operations to conduct chemical and biochemical analyses and know how to perform them nationality: [31mitalian[0m. - use of laboratory instruments for chemical and biochemical analyses. - define environmental, bio
r, soil, air, waste. a1, b - draft technical reports and document individual and group activities. -[31mEnglish[0m language professional experiences: intermediate -office package intern as a chemical technician vid
--------------------------------------------------------------------------------

CANDIDATE ID: 120664 - GENERE: Male
Reason: Rule-based extractor found skill but parser missed it.
--------------------------------------------------------------------------------
lephone ***/***.**.** ************ e-mail *********@*****.** – ********.*********@***.**********.** [31mItalian[0m citiz

In [43]:
print(f"False negatives matching snippet pattern: {df_fn.height}")
df_fn.write_csv(LANGUAGE_SKILL_FALSE_NEGATIVES_PATH, separator=";")
print("Saved filtered false negatives to false_negative.csv")

False negatives matching snippet pattern: 5438
Saved filtered false negatives to false_negative.csv


In [44]:
%%bash --bg 
cd ..
# for Unix users
.venv/bin/python -m streamlit run hiring_cv_bias/bias_detection/rule_based/app/fn_app.py

# for Windows users
#.venv/Scripts/python.exe -m streamlit run hiring_cv_bias/bias_detection/rule_based/app/fn_app.py

In [45]:
print_report(
    result=res_lg,
    df_population=df_cv,
    reference_col="Male",
    group_col="Gender",
    metrics=[
        "equality_of_opportunity",
        "calibration_npv",
    ],
)

TP: 10389, FP: 2392, TN: 132, FN: 5438
Accuracy: 0.573, Precision: 0.813, Recall: 0.656, F1: 0.726

Error and rates by Gender:
 shape: (2, 12)
┌────────┬───────┬──────┬──────┬──────┬─────┬──────────────┬──────────┬──────────┬─────────────────────────┬─────────────────┬──────────────────┐
│ Gender ┆ total ┆ tp   ┆ fp   ┆ fn   ┆ tn  ┆ total_skills ┆ fp_rate  ┆ fn_rate  ┆ equality_of_opportunity ┆ calibration_npv ┆ disparate_impact │
│ ---    ┆ ---   ┆ ---  ┆ ---  ┆ ---  ┆ --- ┆ ---          ┆ ---      ┆ ---      ┆ ---                     ┆ ---             ┆ ---              │
│ str    ┆ u32   ┆ u32  ┆ u32  ┆ u32  ┆ u32 ┆ u32          ┆ f64      ┆ f64      ┆ f64                     ┆ f64             ┆ f64              │
╞════════╪═══════╪══════╪══════╪══════╪═════╪══════════════╪══════════╪══════════╪═════════════════════════╪═════════════════╪══════════════════╡
│ Male   ┆ 3984  ┆ 5420 ┆ 1251 ┆ 2608 ┆ 77  ┆ 9356         ┆ 0.133711 ┆ 0.278752 ┆ 0.675137                ┆ 0.028678        ┆ 

In [46]:
print_report(
    result=res_lg,
    df_population=df_cv,
    reference_col="NORTH",
    group_col="Location",
    metrics=[
        "equality_of_opportunity",
        "calibration_npv",
    ],
)

TP: 10389, FP: 2392, TN: 132, FN: 5438
Accuracy: 0.573, Precision: 0.813, Recall: 0.656, F1: 0.726

Error and rates by Location:
 shape: (3, 12)
┌──────────┬───────┬──────┬──────┬──────┬─────┬──────────────┬──────────┬──────────┬─────────────────────────┬─────────────────┬──────────────────┐
│ Location ┆ total ┆ tp   ┆ fp   ┆ fn   ┆ tn  ┆ total_skills ┆ fp_rate  ┆ fn_rate  ┆ equality_of_opportunity ┆ calibration_npv ┆ disparate_impact │
│ ---      ┆ ---   ┆ ---  ┆ ---  ┆ ---  ┆ --- ┆ ---          ┆ ---      ┆ ---      ┆ ---                     ┆ ---             ┆ ---              │
│ str      ┆ u32   ┆ u32  ┆ u32  ┆ u32  ┆ u32 ┆ u32          ┆ f64      ┆ f64      ┆ f64                     ┆ f64             ┆ f64              │
╞══════════╪═══════╪══════╪══════╪══════╪═════╪══════════════╪══════════╪══════════╪═════════════════════════╪═════════════════╪══════════════════╡
│ CENTER   ┆ 1001  ┆ 1431 ┆ 343  ┆ 733  ┆ 10  ┆ 2517         ┆ 0.136273 ┆ 0.29122  ┆ 0.661275                ┆ 0.01

In [47]:
print_report(
    result=res_lg,
    df_population=df_cv,
    reference_col="LONG",
    group_col="length",
    metrics=[
        "equality_of_opportunity",
        "calibration_npv",
    ],
)

TP: 10389, FP: 2392, TN: 132, FN: 5438
Accuracy: 0.573, Precision: 0.813, Recall: 0.656, F1: 0.726

Error and rates by length:
 shape: (3, 12)
┌────────┬───────┬──────┬──────┬──────┬─────┬──────────────┬──────────┬──────────┬─────────────────────────┬─────────────────┬──────────────────┐
│ length ┆ total ┆ tp   ┆ fp   ┆ fn   ┆ tn  ┆ total_skills ┆ fp_rate  ┆ fn_rate  ┆ equality_of_opportunity ┆ calibration_npv ┆ disparate_impact │
│ ---    ┆ ---   ┆ ---  ┆ ---  ┆ ---  ┆ --- ┆ ---          ┆ ---      ┆ ---      ┆ ---                     ┆ ---             ┆ ---              │
│ str    ┆ u32   ┆ u32  ┆ u32  ┆ u32  ┆ u32 ┆ u32          ┆ f64      ┆ f64      ┆ f64                     ┆ f64             ┆ f64              │
╞════════╪═══════╪══════╪══════╪══════╪═════╪══════════════╪══════════╪══════════╪═════════════════════════╪═════════════════╪══════════════════╡
│ LONG   ┆ 3789  ┆ 5715 ┆ 925  ┆ 3371 ┆ 43  ┆ 10054        ┆ 0.092003 ┆ 0.335289 ┆ 0.62899                 ┆ 0.012595        ┆ 

### Bias detection for Job Title

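We load the list of job titles from [ESCO](https://esco.ec.europa.eu/en/about-esco) and filter out those that are too specific (length > 3). The cell below also normalizes the parser's skill names: it lowercases them, removes the literal "(m/f)" marker and strips surrounding whitespace.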

In [48]:
df_skills_cleaned = df_skills.with_columns(
    pl.col("Skill")
    .str.to_lowercase()
    .str.replace_all("(m/f)", "", literal=True)
    .str.strip_chars()
    .alias("Skill")
)

The **bias detection pipeline for job titles** consists of two main components:

* **JobParser**: a class that extracts job experiences listed in [ESCO](https://esco.ec.europa.eu/en/about-esco) from raw CV texts, using spaCy's `PhraseMatcher`.

* **SemanticMatcher**: once job experiences are extracted, the `SemanticMatcher` (based on `all-MiniLM-L6-v2`) determines which of them match the skills extracted by the parser. It computes the pairwise cosine similarity between the embeddings of the **JobParser** titles and those of the parser's skills, and declares a match when the similarity exceeds a specified threshold. This matching step is used only to compute metrics.
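
The thresholded matching step can be illustrated with toy vectors. This sketch does not call the embedding model: the three-dimensional "embeddings", the threshold of 0.7 and the helper names are all assumptions made for the example, and the real `SemanticMatcher` may differ in its details.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_titles(
    extracted: dict[str, list[float]],
    parsed: dict[str, list[float]],
    threshold: float,
) -> dict[str, bool]:
    """Flag each extracted title that is similar enough to any parser skill."""
    return {
        title: any(cosine_similarity(vec, p) > threshold for p in parsed.values())
        for title, vec in extracted.items()
    }


# Toy embeddings (hypothetical values, not produced by all-MiniLM-L6-v2).
extracted = {
    "warehouse worker": [0.9, 0.1, 0.0],
    "personal trainer": [0.0, 0.2, 0.9],
}
parsed = {"warehouse operative": [0.8, 0.2, 0.1]}

matches = match_titles(extracted, parsed, threshold=0.7)
print(matches)
```

Titles left unmatched by this step are the ones counted as false negatives, so the choice of threshold directly affects the reported recall.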


In [49]:
parser = JobParser(normalized_jobs)
matcher = SemanticMatcher()

In [50]:
res_job = compute_candidate_coverage(
    df_cv,
    df_skills_cleaned,
    "Job_title",
    parser.parse_with_n_grams,
    matcher=matcher.semantic_comparison,
)

print("Confusion matrix:", res_job.conf)

  0%|          | 0/7404 [00:00<?, ?it/s]

Regex positive candidates        : 5614
Parser positive unique candidates: 6763
- Both regex & parser   : 5299
- Only regex            : 315
- Only parser           : 1464

Confusion matrix: 
Counts   -> TP:7838   FP:14976 TN:326    FN:6394 
Metrics  -> Precision:0.34  Recall:0.551  F1:0.423  Acc:0.276


In [51]:
df_fn = pl.DataFrame(res_job.fn_rows)
sample = df_fn.sample(n=5, shuffle=True)
for row in sample.to_dicts():
    print_highlighted_cv(row, pattern=jobs_pattern)


CANDIDATE ID: 164547 - GENERE: Male
Reason: Rule-based extractor found skill but parser missed it.
--------------------------------------------------------------------------------
rth havana - cuba with regular residence permit let me introduce myself, I have had experience as a [31mwarehouse worker[0m: picking, reception, loading and unloading goods, checking and issuing transport documents, sorting
 oficina del historiador havana - cuba type of employment employee main duties and responsibilities [31mcarriage driver[0m and tour guide date 2004 - 2005 name and address of employer partagas - tobacco factory havana - cu
griculture type of employment employee main duties and responsibilities clerk: customer service and [31mwarehouse worker[0m education date november - december 2018 name and type of educational institution or - personal trai
ehouse worker education date november - december 2018 name and type of educational institution or - [31mpersonal trainer[0m training - formi

In [52]:
print(f"False negatives matching snippet pattern: {df_fn.height}")
df_fn.write_csv(JOB_TITLE_FALSE_NEGATIVES_PATH, separator=";")
print("Saved filtered false negatives to false_negative.csv")

False negatives matching snippet pattern: 6394
Saved filtered false negatives to false_negative.csv


In [53]:
%%bash --bg 
cd ..

# for Unix users
.venv/bin/python -m streamlit run hiring_cv_bias/bias_detection/rule_based/app/fn_app.py

# for Windows users
#.venv/Scripts/python.exe -m streamlit run hiring_cv_bias/bias_detection/rule_based/app/fn_app.py

In [54]:
print_report(
    result=res_job,
    df_population=df_cv,
    reference_col="Male",
    group_col="Gender",
    metrics=[
        "equality_of_opportunity",
        "calibration_npv",
    ],
)

TP: 7838, FP: 14976, TN: 326, FN: 6394
Accuracy: 0.276, Precision: 0.344, Recall: 0.551, F1: 0.423

Error and rates by Gender:
 shape: (2, 12)
┌────────┬───────┬──────┬──────┬──────┬─────┬──────────────┬──────────┬──────────┬─────────────────────────┬─────────────────┬──────────────────┐
│ Gender ┆ total ┆ tp   ┆ fp   ┆ fn   ┆ tn  ┆ total_skills ┆ fp_rate  ┆ fn_rate  ┆ equality_of_opportunity ┆ calibration_npv ┆ disparate_impact │
│ ---    ┆ ---   ┆ ---  ┆ ---  ┆ ---  ┆ --- ┆ ---          ┆ ---      ┆ ---      ┆ ---                     ┆ ---             ┆ ---              │
│ str    ┆ u32   ┆ u32  ┆ u32  ┆ u32  ┆ u32 ┆ u32          ┆ f64      ┆ f64      ┆ f64                     ┆ f64             ┆ f64              │
╞════════╪═══════╪══════╪══════╪══════╪═════╪══════════════╪══════════╪══════════╪═════════════════════════╪═════════════════╪══════════════════╡
│ Female ┆ 3420  ┆ 4265 ┆ 6534 ┆ 3273 ┆ 111 ┆ 14183        ┆ 0.460692 ┆ 0.230769 ┆ 0.5658                  ┆ 0.032801        ┆ 

In [55]:
print_report(
    result=res_job,
    df_population=df_cv,
    reference_col="NORTH",
    group_col="Location",
    metrics=[
        "equality_of_opportunity",
        "calibration_npv",
    ],
)

TP: 7838, FP: 14976, TN: 326, FN: 6394
Accuracy: 0.276, Precision: 0.344, Recall: 0.551, F1: 0.423

Error and rates by Location:
 shape: (3, 12)
┌──────────┬───────┬──────┬───────┬──────┬─────┬──────────────┬──────────┬──────────┬─────────────────────────┬─────────────────┬──────────────────┐
│ Location ┆ total ┆ tp   ┆ fp    ┆ fn   ┆ tn  ┆ total_skills ┆ fp_rate  ┆ fn_rate  ┆ equality_of_opportunity ┆ calibration_npv ┆ disparate_impact │
│ ---      ┆ ---   ┆ ---  ┆ ---   ┆ ---  ┆ --- ┆ ---          ┆ ---      ┆ ---      ┆ ---                     ┆ ---             ┆ ---              │
│ str      ┆ u32   ┆ u32  ┆ u32   ┆ u32  ┆ u32 ┆ u32          ┆ f64      ┆ f64      ┆ f64                     ┆ f64             ┆ f64              │
╞══════════╪═══════╪══════╪═══════╪══════╪═════╪══════════════╪══════════╪══════════╪═════════════════════════╪═════════════════╪══════════════════╡
│ CENTER   ┆ 1001  ┆ 1211 ┆ 2130  ┆ 938  ┆ 36  ┆ 4315         ┆ 0.493627 ┆ 0.217381 ┆ 0.563518                

In [56]:
print_report(
    result=res_job,
    df_population=df_cv,
    reference_col="LONG",
    group_col="length",
    metrics=[
        "equality_of_opportunity",
        "calibration_npv",
    ],
)

TP: 7838, FP: 14976, TN: 326, FN: 6394
Accuracy: 0.276, Precision: 0.344, Recall: 0.551, F1: 0.423

Error and rates by length:
 shape: (3, 12)
┌────────┬───────┬──────┬──────┬──────┬─────┬──────────────┬──────────┬──────────┬─────────────────────────┬─────────────────┬──────────────────┐
│ length ┆ total ┆ tp   ┆ fp   ┆ fn   ┆ tn  ┆ total_skills ┆ fp_rate  ┆ fn_rate  ┆ equality_of_opportunity ┆ calibration_npv ┆ disparate_impact │
│ ---    ┆ ---   ┆ ---  ┆ ---  ┆ ---  ┆ --- ┆ ---          ┆ ---      ┆ ---      ┆ ---                     ┆ ---             ┆ ---              │
│ str    ┆ u32   ┆ u32  ┆ u32  ┆ u32  ┆ u32 ┆ u32          ┆ f64      ┆ f64      ┆ f64                     ┆ f64             ┆ f64              │
╞════════╪═══════╪══════╪══════╪══════╪═════╪══════════════╪══════════╪══════════╪═════════════════════════╪═════════════════╪══════════════════╡
│ LONG   ┆ 3789  ┆ 5212 ┆ 9071 ┆ 4065 ┆ 62  ┆ 18410        ┆ 0.492721 ┆ 0.220804 ┆ 0.56182                 ┆ 0.015023        ┆ 