## Gender–Seniority Similarity Analysis

This notebook investigates whether **gender-associated names exhibit systematic differences in semantic similarity to job seniority terms** within embedding models. The goal is not to analyze shortlists directly, but to probe a *potential upstream mechanism* that may help explain the seniority-related gender patterns observed in ranking-based shortlisting.

In the main experiments, we find that female representation consistently declines as job seniority increases, even though all candidates are equally qualified. This raises a natural question:  
**Do embedding models internally associate seniority language (e.g., “senior”, “lead”) more strongly with male-coded names than female-coded ones?**

To isolate this effect, we conduct a controlled lexical similarity analysis. We compute cosine similarity between:
- a fixed set of **seniority descriptors** (`junior`, `mid-level`, `senior`, `lead`), and
- two small, balanced sets of **female- and male-associated names**.

Crucially, this analysis:
- operates outside the ranking pipeline,
- does not involve job descriptions or CV templates,
- and focuses purely on how seniority language aligns with gender-coded name tokens in embedding space.

For each model, we report:
- the average similarity between each seniority term and female names,
- the average similarity between each seniority term and male names,
- and their difference (`male − female`).

These differences are numerically small (at most about 0.08 in cosine similarity across the four models tested) and would appear inconsequential in isolation. However, under strict top-$K$ ranking and shortlisting, even weak but directionally consistent biases at the embedding level can be amplified into substantial representation gaps. This notebook therefore serves as a **diagnostic complement** to the shortlist-based analyses, helping clarify how subtle semantic associations may contribute to downstream disparities.
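
The amplification argument can be illustrated with a small synthetic simulation. This is only a sketch under stated assumptions, not the shortlisting pipeline from the main experiments: scores are drawn from a normal distribution, the noise scale (`noise_sd=0.05`), the constant offset (`offset=0.02`), the pool size, and `k` are all hypothetical values chosen to be of the same order as the similarity gaps reported later in this notebook, and `female_share_in_top_k` is an illustrative helper name.

```python
import numpy as np


def female_share_in_top_k(offset, noise_sd, n_per_group, k, n_trials, seed=0):
    """Average share of female-coded candidates in the top-k of a synthetic ranking.

    Both groups get identically distributed scores, except that the
    male-coded group receives a constant additive `offset`.
    """
    rng = np.random.default_rng(seed)
    is_female = np.concatenate(
        [np.ones(n_per_group, dtype=bool), np.zeros(n_per_group, dtype=bool)]
    )
    shares = []
    for _ in range(n_trials):
        female_scores = rng.normal(0.0, noise_sd, size=n_per_group)
        male_scores = rng.normal(offset, noise_sd, size=n_per_group)
        scores = np.concatenate([female_scores, male_scores])
        top_k = np.argsort(scores)[-k:]  # indices of the k highest scores
        shares.append(is_female[top_k].mean())
    return float(np.mean(shares))


baseline = female_share_in_top_k(
    offset=0.0, noise_sd=0.05, n_per_group=1000, k=20, n_trials=200
)
shifted = female_share_in_top_k(
    offset=0.02, noise_sd=0.05, n_per_group=1000, k=20, n_trials=200
)
print(f"female share in top-20, no offset:   {baseline:.2f}")
print(f"female share in top-20, 0.02 offset: {shifted:.2f}")
```

With no offset the share stays close to one half. With the offset it falls noticeably below one half, because a strict cutoff only looks at the upper tail of the score distribution, where a small shift in the mean changes group composition the most.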

In [1]:
import os
from typing import Dict, List

import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
import torch
import torch.nn.functional as F
from IPython.display import display, Markdown

### Model configuration

We evaluate three SentenceTransformer models and one fairness-aware LoRA model:

- **SentenceTransformer models:** loaded via model IDs in `MODEL_IDS_ST`
- **LoRA model:** constructed from a base transformer (`LORA_BASE`) plus an adapter (`LORA_ADAPTER`)

All outputs are written to `OUTPUT_DIR` as:
- one CSV per model, plus
- a combined CSV aggregating all models.

In [2]:
MODEL_IDS_ST: Dict[str, str] = {
    "JobBERT-v2": "TechWolf/JobBERT-v2",
    "JobBERT-v3": "TechWolf/JobBERT-v3",
    "MiniLM-job-matcher": "forestav/job_matching_sentence_transformer",
}

# LoRA model configuration
LORA_MODEL_NAME = "Fair-Resume-LoRA"
LORA_BASE = "BAAI/bge-large-en-v1.5"
LORA_ADAPTER = "renhehuang/fair-resume-job-matcher-lora"

OUTPUT_DIR = "seniority_gender_similarity_results"

### Probe inputs: seniority terms and name lists

We define:
- a fixed list of **seniority descriptors**: `junior`, `mid-level`, `senior`, `lead`
- two small sets of **female- and male-associated names** (from the original dataset used in the main analysis)

In [3]:
SENIORITY_TERMS = ["junior", "mid-level", "senior", "lead"]

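# Example subset of names; replace with names from your actual dataset if you prefer.
# NOTE: "Kenji Yamamoto" is conventionally a male-coded name and "Mei Zhang" a
# female-coded one, so these two entries look swapped between the lists. They are
# kept as-is because the outputs below were produced with exactly these lists.
FEMALE_NAMES = ["Fatma Ibrahim", "Priya Sharma", "Nyambura Wambui", "Emily Johnson", "Daniela Pérez", "Kenji Yamamoto"]
MALE_NAMES = ["Ahmed Hassan", "Rajesh Patel", "Tunde Adebayo", "Robert Miller", "Miguel Santos", "Mei Zhang"]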

### Embedding helpers

This section provides two embedding pipelines:

**1) SentenceTransformer pipeline**  
Uses the model’s built-in `encode()` method with `normalize_embeddings=True`, so the returned embeddings are L2-normalized.

**2) LoRA pipeline (base + adapter)**  
Because the LoRA model is not packaged as a SentenceTransformer, we manually:
- tokenize inputs,
- forward-pass through the adapted model,
- apply mean pooling over token embeddings,
- and L2-normalize the pooled vectors.

Both pipelines return embeddings suitable for cosine similarity via dot product.
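
The manual pooling steps of the LoRA pipeline can be checked on a toy input. The sketch below re-expresses masked mean pooling and L2 normalization in NumPy rather than PyTorch so that it runs without loading a model; the function names `masked_mean_pool` and `l2_normalize` and the toy tensors are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np


def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors over real tokens only.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against all-padding rows
    return summed / counts


def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


# Two toy "sentences" with three token slots and 2-dimensional embeddings.
tokens = np.array([
    [[3.0, 0.0], [0.0, 4.0], [100.0, 100.0]],  # third slot is padding
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = masked_mean_pool(tokens, mask)
unit = l2_normalize(pooled)
print(pooled)                         # the padded slot does not affect row 0
print(np.linalg.norm(unit, axis=1))   # every row has unit length
```

The large values in the padded slot are there on purpose: if the mask were ignored they would dominate the average, which is why the mask is applied before summing.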

In [4]:
def load_st_model(model_id: str) -> SentenceTransformer:
    print(f"\nLoading SentenceTransformer model: {model_id}")
    model = SentenceTransformer(model_id, device="cpu")
    return model


def encode_with_st(model: SentenceTransformer, texts: List[str]) -> np.ndarray:
    embeddings = model.encode(
        texts,
        normalize_embeddings=True,
        convert_to_numpy=True,
        show_progress_bar=False,
    )
    return embeddings

def load_lora_model() -> Dict:
    print(f"\nLoading LoRA model: base={LORA_BASE}, adapter={LORA_ADAPTER}")
    tokenizer = AutoTokenizer.from_pretrained(LORA_BASE)
    base_model = AutoModel.from_pretrained(LORA_BASE)
    model_bge = PeftModel.from_pretrained(base_model, LORA_ADAPTER)
    model_bge.eval()
    model_bge.to("cpu")
    return {"tokenizer": tokenizer, "model": model_bge}


def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # last hidden state: (batch, seq_len, dim)
    input_mask_expanded = (
        attention_mask.unsqueeze(-1)
        .expand(token_embeddings.size())
        .float()
    )
    summed = torch.sum(token_embeddings * input_mask_expanded, dim=1)
    counts = torch.clamp(input_mask_expanded.sum(dim=1), min=1e-9)
    return summed / counts


def encode_with_lora(lora_bundle: Dict, texts: List[str]) -> np.ndarray:
    tokenizer = lora_bundle["tokenizer"]
    model = lora_bundle["model"]

    encoded = tokenizer(
        texts,
        padding=True,
        truncation=True,
        return_tensors="pt",
    ).to("cpu")

    with torch.no_grad():
        output = model(**encoded)
        pooled = mean_pooling(output, encoded["attention_mask"])
        # L2-normalize so that a dot product equals cosine similarity
        pooled = F.normalize(pooled, p=2, dim=1)

    return pooled.cpu().numpy()

### Similarity computation

Because all embeddings are L2-normalized, cosine similarity simplifies to a dot product:

$$
\cos(\mathbf{a}, \mathbf{b}) = \mathbf{a}^\top \mathbf{b}
$$

We compute two similarity matrices per model:
- `(seniority terms × female names)`
- `(seniority terms × male names)`

Then we average across names to get one similarity value per seniority term for each gender group.
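
As a quick sanity check of the identity above, the following sketch uses made-up 2-dimensional vectors (hypothetical stand-ins for term and name embeddings, not model outputs) to confirm that the dot product of unit vectors matches the explicit cosine formula, and then averages across the name axis as described.

```python
import numpy as np


def unit_rows(x):
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Hypothetical raw vectors: 2 "seniority terms" and 3 "names".
raw_terms = np.array([[1.0, 0.0], [1.0, 1.0]])
raw_names = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])

# Route 1: normalize first, then take a plain matrix product.
sim = unit_rows(raw_terms) @ unit_rows(raw_names).T  # shape (2, 3)

# Route 2: explicit cosine formula on the raw vectors.
explicit = (raw_terms @ raw_names.T) / (
    np.linalg.norm(raw_terms, axis=1)[:, None]
    * np.linalg.norm(raw_names, axis=1)[None, :]
)

# Average over names (axis=1) to get one value per term.
per_term_mean = sim.mean(axis=1)

print(np.allclose(sim, explicit))
print(per_term_mean)
```

Averaging along `axis=1` collapses the name dimension, leaving one similarity value per seniority term, which is the quantity reported for each gender group.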

In [5]:
def cosine_sim_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # embeddings are L2-normalized, so cosine similarity = dot product
    return np.matmul(a, b.T)


def compute_gender_seniority_similarities(
    model_name: str,
    model_id: str,
    encode_fn,
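) -> pd.DataFrame:
    """Return one row per seniority term with mean similarity to each name group.

    `encode_fn` maps a list of strings to an array of L2-normalized embeddings
    with shape (len(texts), dim).
    """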
    all_seniority = SENIORITY_TERMS
    all_female = FEMALE_NAMES
    all_male = MALE_NAMES

    # Encode sets
    seniority_embs = encode_fn(all_seniority)
    female_embs = encode_fn(all_female)
    male_embs = encode_fn(all_male)

    # Similarity matrices: shape (num_seniority, num_names)
    sim_female = cosine_sim_matrix(seniority_embs, female_embs)
    sim_male = cosine_sim_matrix(seniority_embs, male_embs)

    female_mean = sim_female.mean(axis=1)
    male_mean = sim_male.mean(axis=1)

    rows = []
    for i, term in enumerate(all_seniority):
        rows.append(
            {
                "model_name": model_name,
                "model_id": model_id,
                "seniority_term": term,
                "avg_sim_female": float(female_mean[i]),
                "avg_sim_male": float(male_mean[i]),
                "diff_male_minus_female": float(male_mean[i] - female_mean[i]),
            }
        )

    df = pd.DataFrame(rows)
    return df

### Core routine: compute gender–seniority similarities

`compute_gender_seniority_similarities(...)`, defined in the previous cell, runs the full probe for one model:
- encodes seniority terms + both name sets,
- computes similarity matrices,
- averages similarities across names,
- and returns a tidy table with one row per seniority term.

The key diagnostic column is:

- `diff_male_minus_female`  
  - `> 0` means male-coded names are closer on average  
  - `< 0` means female-coded names are closer on average

In [6]:
os.makedirs(OUTPUT_DIR, exist_ok=True)
all_results = []

# SentenceTransformer models
for model_name, model_id in MODEL_IDS_ST.items():
    st_model = load_st_model(model_id)

    def encode_fn(texts: List[str], _m=st_model):
        return encode_with_st(_m, texts)

    df_model = compute_gender_seniority_similarities(
        model_name=model_name,
        model_id=model_id,
        encode_fn=encode_fn,
    )
    all_results.append(df_model)

    out_path = os.path.join(
        OUTPUT_DIR,
        f"seniority_gender_similarity_{model_name.replace(' ', '_')}.csv",
    )
    df_model.to_csv(out_path, index=False)
    print(f"\nSaved per-model results to: {out_path}")
    display(Markdown(df_model.to_markdown()))

# LoRA model (bge-large-en + fairness adapter)
lora_bundle = load_lora_model()

def encode_fn_lora(texts: List[str], bundle=lora_bundle):
    return encode_with_lora(bundle, texts)

df_lora = compute_gender_seniority_similarities(
    model_name=LORA_MODEL_NAME,
    model_id=f"{LORA_BASE}+{LORA_ADAPTER}",
    encode_fn=encode_fn_lora,
)
all_results.append(df_lora)

lora_out = os.path.join(
    OUTPUT_DIR,
    f"seniority_gender_similarity_{LORA_MODEL_NAME.replace(' ', '_')}.csv",
)
df_lora.to_csv(lora_out, index=False)
print(f"\nSaved LoRA results to: {lora_out}")
display(Markdown(df_lora.to_markdown()))

# Combined CSV
df_all = pd.concat(all_results, ignore_index=True)
combined_path = os.path.join(
    OUTPUT_DIR, "seniority_gender_similarity_all_models.csv"
)
df_all.to_csv(combined_path, index=False)

print(f"\nCombined results saved to: {combined_path}")
display(Markdown(df_all.to_markdown()))


Loading SentenceTransformer model: TechWolf/JobBERT-v2

Saved per-model results to: seniority_gender_similarity_results\seniority_gender_similarity_JobBERT-v2.csv


|    | model_name   | model_id            | seniority_term   |   avg_sim_female |   avg_sim_male |   diff_male_minus_female |
|---:|:-------------|:--------------------|:-----------------|-----------------:|---------------:|-------------------------:|
|  0 | JobBERT-v2   | TechWolf/JobBERT-v2 | junior           |         0.250747 |       0.286732 |                0.0359853 |
|  1 | JobBERT-v2   | TechWolf/JobBERT-v2 | mid-level        |         0.252749 |       0.297305 |                0.0445564 |
|  2 | JobBERT-v2   | TechWolf/JobBERT-v2 | senior           |         0.298549 |       0.376541 |                0.0779927 |
|  3 | JobBERT-v2   | TechWolf/JobBERT-v2 | lead             |         0.235837 |       0.274788 |                0.0389515 |


Loading SentenceTransformer model: TechWolf/JobBERT-v3

Saved per-model results to: seniority_gender_similarity_results\seniority_gender_similarity_JobBERT-v3.csv


|    | model_name   | model_id            | seniority_term   |   avg_sim_female |   avg_sim_male |   diff_male_minus_female |
|---:|:-------------|:--------------------|:-----------------|-----------------:|---------------:|-------------------------:|
|  0 | JobBERT-v3   | TechWolf/JobBERT-v3 | junior           |         0.236771 |       0.25839  |                0.0216185 |
|  1 | JobBERT-v3   | TechWolf/JobBERT-v3 | mid-level        |         0.194596 |       0.222111 |                0.0275154 |
|  2 | JobBERT-v3   | TechWolf/JobBERT-v3 | senior           |         0.15339  |       0.208103 |                0.0547123 |
|  3 | JobBERT-v3   | TechWolf/JobBERT-v3 | lead             |         0.148594 |       0.175524 |                0.0269301 |


Loading SentenceTransformer model: forestav/job_matching_sentence_transformer

Saved per-model results to: seniority_gender_similarity_results\seniority_gender_similarity_MiniLM-job-matcher.csv


|    | model_name         | model_id                                   | seniority_term   |   avg_sim_female |   avg_sim_male |   diff_male_minus_female |
|---:|:-------------------|:-------------------------------------------|:-----------------|-----------------:|---------------:|-------------------------:|
|  0 | MiniLM-job-matcher | forestav/job_matching_sentence_transformer | junior           |         0.193301 |       0.199456 |               0.00615457 |
|  1 | MiniLM-job-matcher | forestav/job_matching_sentence_transformer | mid-level        |         0.166189 |       0.149165 |              -0.0170247  |
|  2 | MiniLM-job-matcher | forestav/job_matching_sentence_transformer | senior           |         0.185466 |       0.206135 |               0.0206698  |
|  3 | MiniLM-job-matcher | forestav/job_matching_sentence_transformer | lead             |         0.156653 |       0.182317 |               0.0256643  |


Loading LoRA model: base=BAAI/bge-large-en-v1.5, adapter=renhehuang/fair-resume-job-matcher-lora

Saved LoRA results to: seniority_gender_similarity_results\seniority_gender_similarity_Fair-Resume-LoRA.csv


|    | model_name       | model_id                                                       | seniority_term   |   avg_sim_female |   avg_sim_male |   diff_male_minus_female |
|---:|:-----------------|:---------------------------------------------------------------|:-----------------|-----------------:|---------------:|-------------------------:|
|  0 | Fair-Resume-LoRA | BAAI/bge-large-en-v1.5+renhehuang/fair-resume-job-matcher-lora | junior           |         0.492073 |       0.501049 |               0.00897595 |
|  1 | Fair-Resume-LoRA | BAAI/bge-large-en-v1.5+renhehuang/fair-resume-job-matcher-lora | mid-level        |         0.5215   |       0.51267  |              -0.00883037 |
|  2 | Fair-Resume-LoRA | BAAI/bge-large-en-v1.5+renhehuang/fair-resume-job-matcher-lora | senior           |         0.523085 |       0.532521 |               0.00943542 |
|  3 | Fair-Resume-LoRA | BAAI/bge-large-en-v1.5+renhehuang/fair-resume-job-matcher-lora | lead             |         0.478318 |       0.487053 |               0.00873452 |


Combined results saved to: seniority_gender_similarity_results\seniority_gender_similarity_all_models.csv


|    | model_name         | model_id                                                       | seniority_term   |   avg_sim_female |   avg_sim_male |   diff_male_minus_female |
|---:|:-------------------|:---------------------------------------------------------------|:-----------------|-----------------:|---------------:|-------------------------:|
|  0 | JobBERT-v2         | TechWolf/JobBERT-v2                                            | junior           |         0.250747 |       0.286732 |               0.0359853  |
|  1 | JobBERT-v2         | TechWolf/JobBERT-v2                                            | mid-level        |         0.252749 |       0.297305 |               0.0445564  |
|  2 | JobBERT-v2         | TechWolf/JobBERT-v2                                            | senior           |         0.298549 |       0.376541 |               0.0779927  |
|  3 | JobBERT-v2         | TechWolf/JobBERT-v2                                            | lead             |         0.235837 |       0.274788 |               0.0389515  |
|  4 | JobBERT-v3         | TechWolf/JobBERT-v3                                            | junior           |         0.236771 |       0.25839  |               0.0216185  |
|  5 | JobBERT-v3         | TechWolf/JobBERT-v3                                            | mid-level        |         0.194596 |       0.222111 |               0.0275154  |
|  6 | JobBERT-v3         | TechWolf/JobBERT-v3                                            | senior           |         0.15339  |       0.208103 |               0.0547123  |
|  7 | JobBERT-v3         | TechWolf/JobBERT-v3                                            | lead             |         0.148594 |       0.175524 |               0.0269301  |
|  8 | MiniLM-job-matcher | forestav/job_matching_sentence_transformer                     | junior           |         0.193301 |       0.199456 |               0.00615457 |
|  9 | MiniLM-job-matcher | forestav/job_matching_sentence_transformer                     | mid-level        |         0.166189 |       0.149165 |              -0.0170247  |
| 10 | MiniLM-job-matcher | forestav/job_matching_sentence_transformer                     | senior           |         0.185466 |       0.206135 |               0.0206698  |
| 11 | MiniLM-job-matcher | forestav/job_matching_sentence_transformer                     | lead             |         0.156653 |       0.182317 |               0.0256643  |
| 12 | Fair-Resume-LoRA   | BAAI/bge-large-en-v1.5+renhehuang/fair-resume-job-matcher-lora | junior           |         0.492073 |       0.501049 |               0.00897595 |
| 13 | Fair-Resume-LoRA   | BAAI/bge-large-en-v1.5+renhehuang/fair-resume-job-matcher-lora | mid-level        |         0.5215   |       0.51267  |              -0.00883037 |
| 14 | Fair-Resume-LoRA   | BAAI/bge-large-en-v1.5+renhehuang/fair-resume-job-matcher-lora | senior           |         0.523085 |       0.532521 |               0.00943542 |
| 15 | Fair-Resume-LoRA   | BAAI/bge-large-en-v1.5+renhehuang/fair-resume-job-matcher-lora | lead             |         0.478318 |       0.487053 |               0.00873452 |
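
The long-format table can be easier to read when pivoted to one row per model. The sketch below does this with pandas. To stay self-contained, it retypes the `diff_male_minus_female` values from the combined table above instead of reading the CSV; the `summary` column names (`n_positive`, `largest_gap_term`, `mean_diff`) are illustrative choices rather than part of the notebook's output.

```python
import pandas as pd

terms = ["junior", "mid-level", "senior", "lead"]

# diff_male_minus_female values copied from the combined results table.
diffs = {
    "JobBERT-v2": [0.0359853, 0.0445564, 0.0779927, 0.0389515],
    "JobBERT-v3": [0.0216185, 0.0275154, 0.0547123, 0.0269301],
    "MiniLM-job-matcher": [0.00615457, -0.0170247, 0.0206698, 0.0256643],
    "Fair-Resume-LoRA": [0.00897595, -0.00883037, 0.00943542, 0.00873452],
}

rows = [
    {"model_name": model, "seniority_term": term, "diff_male_minus_female": diff}
    for model, values in diffs.items()
    for term, diff in zip(terms, values)
]
df = pd.DataFrame(rows)

# One row per model, one column per seniority term (in seniority order).
pivot = df.pivot(
    index="model_name",
    columns="seniority_term",
    values="diff_male_minus_female",
)[terms]

summary = pd.DataFrame({
    "n_positive": (pivot > 0).sum(axis=1),       # terms where male names are closer
    "largest_gap_term": pivot.idxmax(axis=1),    # term with the largest male - female gap
    "mean_diff": pivot.mean(axis=1),
})

print(pivot.round(4))
print(summary.round(4))
```

Read this way, the gap is positive for 14 of the 16 model-term pairs; the two exceptions are `mid-level` for MiniLM-job-matcher and for Fair-Resume-LoRA. With only six names per group, these figures are descriptive rather than statistically tested.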