
# 1. GitHub Data Access

In [2]:
# GitHub access details for loading the project data
import pandas as pd

GH_USER = "luisadosch"
GH_REPO = "Final-Project-snapAddy"
BRANCH = "main"

def get_github_url(relative_path):
    return f"https://raw.githubusercontent.com/{GH_USER}/{GH_REPO}/{BRANCH}/{relative_path}"


jobs_annotated_active_df = pd.read_csv(get_github_url("data/processed/jobs_annotated_active.csv"))

department_df = pd.read_csv(get_github_url("data/raw/department-v2.csv"))

seniority_df = pd.read_csv(get_github_url("data/raw/seniority-v2.csv"))

# 2. Seniority Model

In [3]:
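# Prepare the seniority data
# Drop missing values before astype(str); otherwise NaN becomes the string "nan"
# and dropna() no longer detects it.
sdf = seniority_df.dropna(subset=["text", "label"]).copy()

sdf["text"] = sdf["text"].astype(str).str.lower()
sdf["label"] = sdf["label"].astype(str)

Prepare the seniority dataset for modeling. Missing values are dropped before the string conversion so that empty rows do not survive as the literal text "nan". Lowercasing gives the titles a uniform representation.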
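
One detail worth checking in a preparation step like this is the order of the string conversion and the missing-value filter. The sketch below uses plain Python and invented rows (not the project data) to show that a missing value converted to a string is no longer recognizable as missing. The helper names is_missing and clean_titles are hypothetical.

```python
import math

def is_missing(value):
    """True for None and float NaN, the two usual 'missing' markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def clean_titles(rows):
    """Drop rows with a missing text or label, then normalize the rest."""
    kept = [(t, l) for t, l in rows if not is_missing(t) and not is_missing(l)]
    return [(str(t).lower(), str(l)) for t, l in kept]

# Illustrative rows only
rows = [("Head of Sales", "Lead"), (float("nan"), "Senior"), ("Analyst", None)]

# Converting first hides the gaps: NaN becomes the ordinary string "nan"
converted_first = [(str(t).lower(), str(l)) for t, l in rows]
print(converted_first[1][0])   # nan

# Filtering first keeps only the complete row
print(clean_titles(rows))      # [('head of sales', 'Lead')]
```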

In [4]:
#Seniority Train/Test Split
from sklearn.model_selection import train_test_split

sx = sdf["text"]
sy = sdf["label"]

sx_train, sx_test, sy_train, sy_test = train_test_split(
    sx,
    sy,
    test_size=0.2,
    random_state=42,
    stratify=sy
)

# Print dataset sizes
print("Seniority dataset sizes:")
print("Total:", len(sx))
print("Train:", len(sx_train))
print("Test:", len(sx_test))

Seniority dataset sizes:
Total: 9428
Train: 7542
Test: 1886


Split the data into a training set (80%) and a test set (20%). stratify=sy keeps the label proportions the same in both parts, so rare classes are represented in each. Of the 9428 titles, 7542 go to the training set and 1886 to the test set.
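
To illustrate what stratification does, here is a minimal plain-Python sketch that splits each class separately so that both parts keep the same label mix. It is not scikit-learn's implementation; the function name stratified_split and the toy labels are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split at the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Toy label distribution: 50% Senior, 40% Lead, 10% Junior
labels = ["Senior"] * 50 + ["Lead"] * 40 + ["Junior"] * 10
train_idx, test_idx = stratified_split(labels)

test_counts = Counter(labels[i] for i in test_idx)
print(dict(test_counts))   # {'Senior': 10, 'Lead': 8, 'Junior': 2}
```

The test part keeps the 50/40/10 mix of the full list, which a purely random 20% draw would only match on average.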

In [5]:
#Seniority TF–IDF + Logistic Regression Pipeline
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

smodel = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),   # unigrams + bigrams
        min_df=3,
        max_df=0.9
    )),
    ("clf", LogisticRegression(
        max_iter=1000,
        class_weight="balanced"
    ))
])

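The pipeline converts job titles into TF–IDF features over unigrams and bigrams and feeds them to a logistic regression classifier. min_df=3 drops terms that occur in fewer than three titles, and max_df=0.9 drops terms that occur in more than 90% of them. class_weight="balanced" weights each class inversely to its frequency, so rare seniority levels are not drowned out by the frequent ones.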
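
For reference, scikit-learn documents the "balanced" mode as weighting each class by n_samples / (n_classes * class_count). The sketch below applies that formula to the class counts of the seniority test split printed in the classification report further down (197, 82, 709, 151, 747). These counts only stand in for the training proportions, which the stratified split keeps roughly the same; the function name balanced_class_weights is hypothetical.

```python
def balanced_class_weights(counts):
    """Weights as n_samples / (n_classes * count), the formula behind class_weight="balanced"."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Test-split supports from the classification report, used as illustrative proportions
counts = {"Director": 197, "Junior": 82, "Lead": 709, "Management": 151, "Senior": 747}
weights = balanced_class_weights(counts)

# Print the classes from the largest to the smallest weight
for label, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{label:<12}{weight:.2f}")
```

With these proportions a Junior title counts about nine times as much in the loss as a Senior title.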

In [6]:
# Train the seniority model
smodel.fit(sx_train, sy_train)

# Predict on the test data
sy_pred = smodel.predict(sx_test)

# Print the accuracy
from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(sy_test, sy_pred))

Accuracy: 0.9703075291622482


Train the seniority classifier and generate predictions on the test set. The model achieves a high accuracy of 0.97 on the test set.

In [7]:
#Seniority Evaluation
from sklearn.metrics import f1_score, classification_report

sy_pred = smodel.predict(sx_test)

saccuracy = accuracy_score(sy_test, sy_pred)
smacro_f1 = f1_score(sy_test, sy_pred, average="macro")

print("Accuracy:", round(saccuracy, 3))
print("Macro F1:", round(smacro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(sy_test, sy_pred))

Accuracy: 0.97
Macro F1: 0.956

Classification Report:

              precision    recall  f1-score   support

    Director       0.99      0.98      0.98       197
      Junior       0.85      1.00      0.92        82
        Lead       0.97      0.98      0.98       709
  Management       0.92      0.93      0.92       151
      Senior       0.99      0.97      0.98       747

    accuracy                           0.97      1886
   macro avg       0.94      0.97      0.96      1886
weighted avg       0.97      0.97      0.97      1886



Evaluate using accuracy and macro F1. Macro F1 averages the per-class F1 scores with equal weight, so small classes count as much as large ones. The classification report shows precision, recall, and F1 per seniority level. The evaluation yields an accuracy of 0.97 and a macro F1 score of 0.956. Precision is lowest for Junior (0.85) and Management (0.92); all other classes reach 0.97 or higher.
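
The difference between the macro and the weighted average can be reproduced from the report itself. The sketch below copies the rounded per-class F1 values and supports from the table above, so its results match the printed averages only up to rounding.

```python
# Per-class F1 and support, copied from the seniority classification report
report = {
    "Director":   (0.98, 197),
    "Junior":     (0.92, 82),
    "Lead":       (0.98, 709),
    "Management": (0.92, 151),
    "Senior":     (0.98, 747),
}

f1_scores = [f1 for f1, _ in report.values()]
total_support = sum(support for _, support in report.values())

# Macro average: every class counts once, regardless of its size
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: every class counts in proportion to its support
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

print(f"macro F1:    {macro_f1:.3f}")
print(f"weighted F1: {weighted_f1:.3f}")
```

The two small classes with F1 0.92 pull the macro average down more than the weighted one, because Junior and Management together make up only 233 of the 1886 test titles.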

In [8]:
# Seniority Evaluation on Annotated ACTIVE Jobs

# Prepare evaluation data
s_eval_df = jobs_annotated_active_df.dropna(subset=["position", "seniority"]).copy()
s_eval_text = s_eval_df["position"].astype(str).str.lower()
s_eval_labels = s_eval_df["seniority"].astype(str)

# Predict seniority
s_eval_pred = smodel.predict(s_eval_text)

# Evaluation metrics
s_eval_accuracy = accuracy_score(s_eval_labels, s_eval_pred)
s_eval_macro_f1 = f1_score(s_eval_labels, s_eval_pred, average="macro")

print("Seniority Evaluation on ACTIVE Jobs")
print("Accuracy:", round(s_eval_accuracy, 3))
print("Macro F1:", round(s_eval_macro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(s_eval_labels, s_eval_pred))

Seniority Evaluation on ACTIVE Jobs
Accuracy: 0.437
Macro F1: 0.409

Classification Report:

              precision    recall  f1-score   support

    Director       0.58      0.88      0.70        34
      Junior       0.18      0.33      0.24        12
        Lead       0.33      0.71      0.45       125
  Management       0.90      0.59      0.72       192
Professional       0.00      0.00      0.00       216
      Senior       0.23      0.80      0.36        44

    accuracy                           0.44       623
   macro avg       0.37      0.55      0.41       623
weighted avg       0.40      0.44      0.38       623





Evaluate the seniority model on the annotated ACTIVE LinkedIn job entries, using the position title as input. The predictions are compared against manually labeled seniority levels, which gives a realistic picture of how well the model generalizes from the label-based training data to real-world CV data. The accuracy is 0.437 and the macro F1 score is 0.409. Note that the label data contains no Professional class, so the model can never predict the 216 of 623 entries that carry this label.
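
A quick comparison of the label sets explains part of this gap. The sketch below hard-codes the class names and supports from the two classification reports above and computes the best accuracy the model could reach if it classified every entry of a known class correctly.

```python
# Classes the seniority model was trained on (from its test-split report)
model_classes = {"Director", "Junior", "Lead", "Management", "Senior"}

# Label supports in the annotated ACTIVE jobs (from the evaluation report)
eval_support = {
    "Director": 34, "Junior": 12, "Lead": 125,
    "Management": 192, "Professional": 216, "Senior": 44,
}

# Labels that occur in the evaluation data but not in the training data
unseen = set(eval_support) - model_classes

n_total = sum(eval_support.values())
n_reachable = sum(n for label, n in eval_support.items() if label in model_classes)

print("Labels the model cannot predict:", sorted(unseen))
print("Accuracy ceiling:", round(n_reachable / n_total, 3))
```

The same check can be run on the real objects by comparing the unique values of the evaluation labels with smodel.named_steps["clf"].classes_.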

In [9]:
#Seniority Top Features
import numpy as np

feature_names = smodel.named_steps["tfidf"].get_feature_names_out()
coefs = smodel.named_steps["clf"].coef_

for i, label in enumerate(smodel.named_steps["clf"].classes_):
    top = np.argsort(coefs[i])[-10:]
    print(f"\nTop words for {label}:")
    print(feature_names[top])


Top words for Director:
['marketing director' 'managing directors' 'managing' 'director marketing'
 'vertriebsdirektor' 'director sales' 'directors' 'sales director'
 'abteilungsdirektor' 'director']

Top words for Junior:
['marketing' 'assistent' 'associate' 'assistentin' 'mitarbeiter'
 'mitarbeiterin' 'referent' 'referentin' 'analyst' 'junior']

Top words for Lead:
['abteilungsleiter' 'projektleiter' 'geschäftsleitung' 'teamleiter'
 'leiterin' 'head of' 'head' 'vertriebsleiter' 'leitung' 'leiter']

Top words for Management:
['cio' 'vice president' 'vice' 'founder' 'chief' 'owner' 'ceo'
 'geschäftsführung' 'vp' 'geschäftsführer']

Top words for Senior:
['marketing manager' 'engineer' 'executive' 'assistant' 'managerin'
 'responsable' 'consultant' 'management' 'senior' 'manager']


Shows the most influential words for predicting each seniority class. Helps interpret the model’s decisions.

# 3. Department Model

In [10]:
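# Prepare the department data
# Drop missing values before astype(str); otherwise NaN becomes the string "nan"
# and dropna() no longer detects it.
ddf = department_df.dropna(subset=["text", "label"]).copy()

ddf["text"] = ddf["text"].astype(str).str.lower()
ddf["label"] = ddf["label"].astype(str)

Prepare the department dataset in the same way: drop rows with missing values, then lowercase the titles for a consistent text representation.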

In [11]:
# Department Train/Test Split

dx = ddf["text"]
dy = ddf["label"]

dx_train, dx_test, dy_train, dy_test = train_test_split(
    dx,
    dy,
    test_size=0.2,
    random_state=42,
    stratify=dy
)

# Print dataset sizes
print("Department dataset sizes:")
print("Total:", len(dx))
print("Train:", len(dx_train))
print("Test:", len(dx_test))

Department dataset sizes:
Total: 10145
Train: 8116
Test: 2029


Split into training and test sets with the same label distribution in both. Of the 10145 titles, 8116 go to the training set and 2029 to the test set.

In [12]:
# Department TF–IDF + Logistic Regression Pipeline
dmodel = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),   # unigrams + bigrams
        min_df=3,
        max_df=0.9
    )),
    ("clf", LogisticRegression(
        max_iter=1000,
        class_weight="balanced"
    ))
])

Same as seniority pipeline but for department prediction.
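
To make the shared vectorizer setting ngram_range=(1, 2) concrete, the following sketch lists the unigrams and bigrams that a single title contributes. It uses the regular expression that TfidfVectorizer applies by default for tokenization (tokens of two or more word characters); the function name word_ngrams is hypothetical, and the sketch covers only the n-gram extraction, not the TF–IDF weighting or the min_df/max_df filtering.

```python
import re

# TfidfVectorizer's default token pattern: tokens of two or more word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def word_ngrams(text, ngram_range=(1, 2)):
    """List the word n-grams a title contributes for the given range."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    low, high = ngram_range
    grams = []
    for n in range(low, high + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams

print(word_ngrams("Head of Marketing"))
# ['head', 'of', 'marketing', 'head of', 'of marketing']
```

Bigrams such as "head of" let the classifier use short phrases that single words cannot express.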

In [13]:
# Train the department model
dmodel.fit(dx_train, dy_train)

# Predict on the test data
dy_pred = dmodel.predict(dx_test)

# Print the accuracy
print("Accuracy:", accuracy_score(dy_test, dy_pred))

Accuracy: 0.9344504682109414


Train the department classifier and predict on the test set. The model achieves a high accuracy of 0.93 on the test set.

In [14]:
# Department Evaluation
dy_pred = dmodel.predict(dx_test)

daccuracy = accuracy_score(dy_test, dy_pred)
dmacro_f1 = f1_score(dy_test, dy_pred, average="macro")

print("Accuracy:", round(daccuracy, 3))
print("Macro F1:", round(dmacro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(dy_test, dy_pred))

Accuracy: 0.934
Macro F1: 0.86

Classification Report:

                        precision    recall  f1-score   support

        Administrative       0.62      0.94      0.74        17
  Business Development       0.83      0.99      0.90       124
            Consulting       0.82      0.97      0.89        33
      Customer Support       0.88      1.00      0.93         7
       Human Resources       0.75      1.00      0.86         6
Information Technology       0.92      0.95      0.94       261
             Marketing       0.99      0.92      0.96       859
                 Other       0.50      1.00      0.67         8
    Project Management       0.57      0.88      0.69        40
            Purchasing       0.89      1.00      0.94         8
                 Sales       0.96      0.93      0.94       666

              accuracy                           0.93      2029
             macro avg       0.79      0.96      0.86      2029
          weighted avg       0.95      0.93   

Evaluate using accuracy and macro F1; the classification report gives precision, recall, and F1 per department. The evaluation yields an accuracy of 0.934 and a macro F1 score of 0.86. Recall is 0.88 or higher for every class, but precision is low for several small classes (Other 0.50, Project Management 0.57, Administrative 0.62), which is why macro F1 sits well below accuracy.
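
The per-class F1 values in the report follow directly from precision and recall. As a small worked example, the sketch below recomputes F1 for three departments from the rounded values in the table above; because the inputs are rounded, other rows may differ from the report in the last digit.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the department classification report
rows = {
    "Other": (0.50, 1.00),
    "Project Management": (0.57, 0.88),
    "Sales": (0.96, 0.93),
}

for label, (precision, recall) in rows.items():
    print(f"{label:<20}{f1(precision, recall):.2f}")
```

The harmonic mean stays close to the weaker of the two values, so perfect recall for Other cannot compensate for a precision of 0.50.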

In [15]:
# Department Evaluation on Annotated ACTIVE Jobs

# Prepare evaluation data
d_eval_df = jobs_annotated_active_df.dropna(subset=["position", "department"]).copy()
d_eval_text = d_eval_df["position"].astype(str).str.lower()
d_eval_labels = d_eval_df["department"].astype(str)

# Predict department
d_eval_pred = dmodel.predict(d_eval_text)

# Evaluation metrics
d_eval_accuracy = accuracy_score(d_eval_labels, d_eval_pred)
d_eval_macro_f1 = f1_score(d_eval_labels, d_eval_pred, average="macro")

print("Department Evaluation on ACTIVE Jobs")
print("Accuracy:", round(d_eval_accuracy, 3))
print("Macro F1:", round(d_eval_macro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(d_eval_labels, d_eval_pred))

Department Evaluation on ACTIVE Jobs
Accuracy: 0.223
Macro F1: 0.338

Classification Report:

                        precision    recall  f1-score   support

        Administrative       0.17      0.07      0.10        14
  Business Development       0.38      0.30      0.33        20
            Consulting       0.86      0.46      0.60        39
      Customer Support       1.00      0.17      0.29         6
       Human Resources       0.73      0.50      0.59        16
Information Technology       0.31      0.44      0.36        62
             Marketing       0.17      0.41      0.24        22
                 Other       0.00      0.00      0.00       344
    Project Management       0.27      0.56      0.37        39
            Purchasing       0.80      0.53      0.64        15
                 Sales       0.12      0.85      0.21        46

              accuracy                           0.22       623
             macro avg       0.44      0.39      0.34       623
        

Evaluate the department model on the annotated ACTIVE LinkedIn job entries, based on the position title. Evaluating on manually labeled CV data shows how robust and applicable the model is for realistic LinkedIn profiles. The accuracy is 0.223 and the macro F1 score is 0.338. Most of the loss comes from the Other class: it covers 344 of the 623 entries, yet its recall is 0.00.
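
Accuracy equals the support-weighted mean of the per-class recalls, which shows how much a single large class can dominate it. The sketch below recomputes the accuracy from the rounded recalls and supports in the report above, and also the share of entries that do not belong to Other.

```python
# (recall, support) per department, copied from the ACTIVE jobs report
report = {
    "Administrative": (0.07, 14),
    "Business Development": (0.30, 20),
    "Consulting": (0.46, 39),
    "Customer Support": (0.17, 6),
    "Human Resources": (0.50, 16),
    "Information Technology": (0.44, 62),
    "Marketing": (0.41, 22),
    "Other": (0.00, 344),
    "Project Management": (0.56, 39),
    "Purchasing": (0.53, 15),
    "Sales": (0.85, 46),
}

n_total = sum(support for _, support in report.values())

# Accuracy is the support-weighted mean of the per-class recalls
accuracy = sum(recall * support for recall, support in report.values()) / n_total

# Share of entries outside "Other": the best accuracy possible while its recall is zero
ceiling_without_other = (n_total - report["Other"][1]) / n_total

print(f"accuracy from recalls: {accuracy:.3f}")
print(f"ceiling without Other: {ceiling_without_other:.3f}")
```

Macro F1 can exceed accuracy here because it gives Other the same weight as each of the ten smaller classes, whereas accuracy gives it more than half of the total weight.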

In [16]:
# Department Top Features

feature_names = dmodel.named_steps["tfidf"].get_feature_names_out()
coefs = dmodel.named_steps["clf"].coef_

for i, label in enumerate(dmodel.named_steps["clf"].classes_):
    top = np.argsort(coefs[i])[-10:]
    print(f"\nTop words for {label}:")
    print(feature_names[top])


Top words for Administrative:
['assistentin des' 'geschäftsführung' 'gf' 'assistent der'
 'geschäftsleitung' 'der' 'sekretärin' 'assistent' 'assistenz'
 'assistentin']

Top words for Business Development:
['business intelligence' 'crm' 'digital business' 'of business'
 'business process' 'ebusiness' 'it business' 'development'
 'business development' 'business']

Top words for Consulting:
['senior berater' 'sap' 'coach' 'senior consultant' 'von' 'senior'
 'recruitment' 'beraterin' 'berater' 'consultant']

Top words for Customer Support:
['service and' 'it systems' 'customer' 'technical' 'it support' 'it'
 'customer support' 'technical support' 'supporter' 'support']

Top words for Human Resources:
['qualitätsmanagement' 'director digital' 'project director' 'gl'
 'manager hr' 'of human' 'resources' 'human resources' 'human' 'hr']

Top words for Information Technology:
['digitalization' 'administrator' 'entwickler' 'digitale' 'administration'
 'digitalisierung' 'sap' 'digital' 'it' 'cr

Shows the top words for each department class to interpret the model.

In [17]:
# Comparison: Training Evaluation vs ACTIVE Job Evaluation

comparison_metrics = pd.DataFrame({
    "Target": [
        "Seniority (Label Data)",
        "Department (Label Data)",
        "Seniority (ACTIVE Jobs)",
        "Department (ACTIVE Jobs)"
    ],
    "Accuracy": [
        saccuracy,
        daccuracy,
        s_eval_accuracy,
        d_eval_accuracy
    ],
    "Macro F1": [
        smacro_f1,
        dmacro_f1,
        s_eval_macro_f1,
        d_eval_macro_f1
    ]
})

print("Model Comparison:\n")
print(comparison_metrics)


#Top Features per Label

def print_top_features(model, n=5):
    feature_names = model.named_steps["tfidf"].get_feature_names_out()
    coefs = model.named_steps["clf"].coef_
    for i, label in enumerate(model.named_steps["clf"].classes_):
        top = np.argsort(coefs[i])[-n:]
        print(f"\nTop {n} words for {label}:")
        print(feature_names[top])

print("\n--- Seniority Top Features ---")
print_top_features(smodel)

print("\n--- Department Top Features ---")
print_top_features(dmodel)

Model Comparison:

                     Target  Accuracy  Macro F1
0    Seniority (Label Data)  0.970308  0.956030
1   Department (Label Data)  0.934450  0.860444
2   Seniority (ACTIVE Jobs)  0.436597  0.409319
3  Department (ACTIVE Jobs)  0.223114  0.338219

--- Seniority Top Features ---

Top 5 words for Director:
['director sales' 'directors' 'sales director' 'abteilungsdirektor'
 'director']

Top 5 words for Junior:
['mitarbeiterin' 'referent' 'referentin' 'analyst' 'junior']

Top 5 words for Lead:
['head of' 'head' 'vertriebsleiter' 'leitung' 'leiter']

Top 5 words for Management:
['owner' 'ceo' 'geschäftsführung' 'vp' 'geschäftsführer']

Top 5 words for Senior:
['responsable' 'consultant' 'management' 'senior' 'manager']

--- Department Top Features ---

Top 5 words for Administrative:
['der' 'sekretärin' 'assistent' 'assistenz' 'assistentin']

Top 5 words for Business Development:
['ebusiness' 'it business' 'development' 'business development' 'business']

Top 5 words for Consul

The comparison table summarizes model performance in the two evaluation settings. On the held-out split of the label datasets, the seniority classifier reaches an accuracy of 0.97 and a macro F1 score of 0.96, and the department classifier reaches 0.93 and 0.86. Both TF–IDF + logistic regression models therefore perform very well when trained and evaluated on the curated label data.

On the annotated ACTIVE job entries from real CV data, performance drops substantially. Seniority prediction reaches an accuracy of 0.44 and a macro F1 score of 0.41; department prediction reaches an accuracy of only 0.22 with a macro F1 score of 0.34. Part of the drop is a label mismatch: Professional (216 of 623 entries) does not occur in the seniority label data, and Other (344 of 623 entries) is rare in the department label data. Beyond that, the decline points to a strong domain shift between the clean label data and real-world job titles, which are often shorter, noisier, more ambiguous, and without explicit domain or seniority cues.

The performance gap shows that simple bag-of-words baselines, while effective on controlled datasets, struggle to generalize to realistic CV data.

# 4. Seniority Model with Synthetic Data

In [18]:
ORD_MAP = {
    "Junior": 1.0,
    "Professional": 2.0,
    "Senior": 3.0,
    "Lead": 4.0,
    "Management": 5.0,
    "Director": 6.0,
}
INV_ORD = {v: k for k, v in ORD_MAP.items()}
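
The two dictionaries convert between seniority label names and an ordinal code. The short sketch below shows how the reverse lookup behaves, including a code that has no label; the value 2.5 is an invented example and does not come from the data.

```python
ORD_MAP = {
    "Junior": 1.0,
    "Professional": 2.0,
    "Senior": 3.0,
    "Lead": 4.0,
    "Management": 5.0,
    "Director": 6.0,
}
INV_ORD = {v: k for k, v in ORD_MAP.items()}

codes = [1.0, 3.0, 6.0, 2.5]                     # 2.5 is an invented, unmapped code
labels = [INV_ORD.get(code) for code in codes]   # .get returns None for unknown codes
print(labels)   # ['Junior', 'Senior', 'Director', None]

# Round trip: every label maps to a code and back to itself
round_trip_ok = all(INV_ORD[ORD_MAP[name]] == name for name in ORD_MAP)
print(round_trip_ok)   # True
```

pandas' Series.map with a dictionary behaves in a similar way: values without a dictionary entry become NaN and can then be removed with dropna.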

In [19]:
def add_synthetic(train_df: pd.DataFrame, synthetic_csv_relpath: str) -> pd.DataFrame:
    """Append synthetic (position, seniority) rows to the labeled seniority data."""
    syn = pd.read_csv(get_github_url(synthetic_csv_relpath))
    syn = syn[["position", "seniority"]].copy()

    # Map the numeric seniority codes back to their label names
    syn["label"] = syn["seniority"].map(INV_ORD)
    syn = syn.rename(columns={"position": "text"})
    # Drop rows with a missing title or a seniority code that has no label
    syn = syn.dropna(subset=["text", "label"])

    out = pd.concat([train_df[["text", "label"]], syn[["text", "label"]]], ignore_index=True)
    return out

In [20]:
strain_df_aug = add_synthetic(sdf, "data/results/gemini_synthetic.csv")
strain_df_aug

Unnamed: 0,text,label
0,analyst,Junior
1,analyste financier,Junior
2,anwendungstechnischer mitarbeiter,Junior
3,application engineer,Senior
4,applications engineer,Senior
...,...,...
11309,Juristischer Berater,Professional
11310,"Leitung Personal, Finanzen, Einkauf, IT | Folk...",Management
11311,Verwaltungsleitung Landesspracheninstitut in d...,Management
11312,"Leitung Gebäudemanagement, Einkauf und Control...",Management


In [21]:
# Seniority train/test split with synthetic data
ssx = strain_df_aug["text"]
ssy = strain_df_aug["label"]

ssx_train, ssx_test, ssy_train, ssy_test = train_test_split(
    ssx,
    ssy,
    test_size=0.2,
    random_state=42,
    stratify=ssy
)

# Print dataset sizes
print("Seniority dataset sizes:")
print("Total:", len(ssx))
print("Train:", len(ssx_train))
print("Test:", len(ssx_test))

Seniority dataset sizes:
Total: 11314
Train: 9051
Test: 2263


In [22]:
smodel_syn = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),
        min_df=3,
        max_df=0.9
    )),
    ("clf", LogisticRegression(
        max_iter=1000,
        class_weight="balanced"
    ))
])

smodel_syn.fit(ssx_train, ssy_train)

In [23]:
# Seniority model with synthetic data (already fitted in the previous cell)

# Predict on the test data
ssy_pred = smodel_syn.predict(ssx_test)

# Print the accuracy
print("Accuracy:", accuracy_score(ssy_test, ssy_pred))

Accuracy: 0.8709677419354839


In [24]:
# Seniority evaluation with synthetic data
ssy_pred = smodel_syn.predict(ssx_test)

ssaccuracy = accuracy_score(ssy_test, ssy_pred)
ssmacro_f1 = f1_score(ssy_test, ssy_pred, average="macro")

print("Accuracy:", round(ssaccuracy, 3))
print("Macro F1:", round(ssmacro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(ssy_test, ssy_pred))

Accuracy: 0.871
Macro F1: 0.811

Classification Report:

              precision    recall  f1-score   support

    Director       0.98      0.89      0.93       242
      Junior       0.81      0.75      0.78       165
        Lead       0.95      0.92      0.93       739
  Management       0.77      0.79      0.78       257
Professional       0.41      0.84      0.55        83
      Senior       0.92      0.88      0.90       777

    accuracy                           0.87      2263
   macro avg       0.81      0.84      0.81      2263
weighted avg       0.89      0.87      0.88      2263



In [25]:
# Seniority evaluation on annotated ACTIVE jobs with synthetic data

# Prepare evaluation data
ss_eval_df = jobs_annotated_active_df.dropna(subset=["position", "seniority"]).copy()
ss_eval_text = ss_eval_df["position"].astype(str).str.lower()
ss_eval_labels = ss_eval_df["seniority"].astype(str)

# Predict seniority
ss_eval_pred = smodel_syn.predict(ss_eval_text)

# Evaluation metrics

ss_eval_accuracy = accuracy_score(ss_eval_labels, ss_eval_pred)
ss_eval_macro_f1 = f1_score(ss_eval_labels, ss_eval_pred, average="macro")

print("Seniority Evaluation on ACTIVE Jobs")
print("Accuracy:", round(ss_eval_accuracy, 3))
print("Macro F1:", round(ss_eval_macro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(ss_eval_labels, ss_eval_pred))

Seniority Evaluation on ACTIVE Jobs
Accuracy: 0.645
Macro F1: 0.571

Classification Report:

              precision    recall  f1-score   support

    Director       0.50      0.85      0.63        34
      Junior       0.17      0.58      0.26        12
        Lead       0.86      0.50      0.63       125
  Management       0.85      0.72      0.78       192
Professional       0.69      0.61      0.64       216
      Senior       0.35      0.77      0.48        44

    accuracy                           0.65       623
   macro avg       0.57      0.67      0.57       623
weighted avg       0.73      0.65      0.66       623



In [26]:
# Comparison: Baseline vs Synthetic – Training & ACTIVE Job Evaluation

comparison_metrics_seniority = pd.DataFrame({
    "Target": [
        "Seniority (Label Data – no Synthetic)",
        "Seniority (Label Data – with Synthetic)",
        "Seniority (ACTIVE Jobs – no Synthetic)",
        "Seniority (ACTIVE Jobs – with Synthetic)",
    ],
    "Accuracy": [
        saccuracy,        # Seniority baseline – label data
        ssaccuracy,       # Seniority synthetic – label data
        s_eval_accuracy, # Seniority baseline – ACTIVE jobs
        ss_eval_accuracy,# Seniority synthetic – ACTIVE jobs
    ],
    "Macro F1": [
        smacro_f1,
        ssmacro_f1,
        s_eval_macro_f1,
        ss_eval_macro_f1,
    ]
})

print("Model Comparison:\n")
print(comparison_metrics_seniority)


Model Comparison:

                                     Target  Accuracy  Macro F1
0     Seniority (Label Data – no Synthetic)  0.970308  0.956030
1   Seniority (Label Data – with Synthetic)  0.870968  0.811299
2    Seniority (ACTIVE Jobs – no Synthetic)  0.436597  0.409319
3  Seniority (ACTIVE Jobs – with Synthetic)  0.645265  0.571373


# 5. Department Model with Synthetic Data

In [27]:
def add_synthetic_department(train_df: pd.DataFrame, synthetic_csv_relpath: str) -> pd.DataFrame:
    syn = pd.read_csv(get_github_url(synthetic_csv_relpath))

    # expect columns: position, department
    syn = syn[["position", "department"]].copy()
    syn = syn.rename(columns={"position": "text", "department": "label"})
    syn = syn.dropna(subset=["text", "label"])

    out = pd.concat([train_df[["text", "label"]], syn[["text", "label"]]], ignore_index=True)
    return out

In [28]:
dtrain_df_aug = add_synthetic_department(ddf, "data/results/gemini_synthetic.csv")
dtrain_df_aug

Unnamed: 0,text,label
0,adjoint directeur communication,Marketing
1,advisor strategy and projects,Project Management
2,beratung & projekte,Project Management
3,beratung & projektmanagement,Project Management
4,beratung und projektmanagement kommunale partner,Project Management
...,...,...
12026,Juristischer Berater,Consulting
12027,"Leitung Personal, Finanzen, Einkauf, IT | Folk...",Human Resources
12028,Verwaltungsleitung Landesspracheninstitut in d...,Administrative
12029,"Leitung Gebäudemanagement, Einkauf und Control...",Purchasing


In [29]:
# Department train/test split with synthetic data
dsx = dtrain_df_aug["text"]
dsy = dtrain_df_aug["label"]

dsx_train, dsx_test, dsy_train, dsy_test = train_test_split(
    dsx,
    dsy,
    test_size=0.2,
    random_state=42,
    stratify=dsy
)

# Print dataset sizes
print("Department dataset sizes:")
print("Total:", len(dsx))
print("Train:", len(dsx_train))
print("Test:", len(dsx_test))


Department dataset sizes:
Total: 12031
Train: 9624
Test: 2407


In [30]:
# Department TF–IDF + Logistic Regression Pipeline
dmodel_syn = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),
        min_df=3,
        max_df=0.9
    )),
    ("clf", LogisticRegression(
        max_iter=1000,
        class_weight="balanced"
    ))
])

# Train the Department model on synthetic training data
dmodel_syn.fit(dsx_train, dsy_train)


In [31]:
# Department model with synthetic data (already fitted in the previous cell)

# Predict on the test data
dsy_pred = dmodel_syn.predict(dsx_test)

# Print the accuracy
print("Department Accuracy:", accuracy_score(dsy_test, dsy_pred))

Department Accuracy: 0.8878271707519734


In [32]:
# Department evaluation with synthetic data

# Predict on the test data
dsy_pred = dmodel_syn.predict(dsx_test)

# Evaluation metrics
dsaccuracy = accuracy_score(dsy_test, dsy_pred)
dsmacro_f1 = f1_score(dsy_test, dsy_pred, average="macro")

print("Department Accuracy:", round(dsaccuracy, 3))
print("Department Macro F1:", round(dsmacro_f1, 3))
print("\nClassification Report (Department):\n")
print(classification_report(dsy_test, dsy_pred))

Department Accuracy: 0.888
Department Macro F1: 0.779

Classification Report (Department):

                        precision    recall  f1-score   support

        Administrative       0.60      0.72      0.66        36
  Business Development       0.86      0.86      0.86       152
            Consulting       0.76      0.86      0.81        59
      Customer Support       0.54      0.93      0.68        15
       Human Resources       0.62      0.68      0.65        22
Information Technology       0.91      0.88      0.89       296
             Marketing       0.99      0.92      0.95       879
                 Other       0.57      0.91      0.70       164
    Project Management       0.60      0.81      0.69        64
            Purchasing       0.71      0.75      0.73        20
                 Sales       0.99      0.88      0.93       700

              accuracy                           0.89      2407
             macro avg       0.74      0.84      0.78      2407
          

In [33]:
# Department evaluation on annotated ACTIVE jobs with synthetic data

# Prepare evaluation data
ds_eval_df = jobs_annotated_active_df.dropna(subset=["position", "department"]).copy()
ds_eval_text = ds_eval_df["position"].astype(str).str.lower()
ds_eval_labels = ds_eval_df["department"].astype(str)

# Predict department
ds_eval_pred = dmodel_syn.predict(ds_eval_text)

# Evaluation metrics

ds_eval_accuracy = accuracy_score(ds_eval_labels, ds_eval_pred)
ds_eval_macro_f1 = f1_score(ds_eval_labels, ds_eval_pred, average="macro")

print("Department Evaluation on ACTIVE Jobs")
print("Accuracy:", round(ds_eval_accuracy, 3))
print("Macro F1:", round(ds_eval_macro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(ds_eval_labels, ds_eval_pred))

Department Evaluation on ACTIVE Jobs
Accuracy: 0.676
Macro F1: 0.563

Classification Report:

                        precision    recall  f1-score   support

        Administrative       0.28      0.36      0.31        14
  Business Development       0.28      0.65      0.39        20
            Consulting       0.71      0.51      0.60        39
      Customer Support       0.50      1.00      0.67         6
       Human Resources       0.41      0.69      0.51        16
Information Technology       0.74      0.40      0.52        62
             Marketing       0.67      0.45      0.54        22
                 Other       0.77      0.78      0.77       344
    Project Management       0.60      0.62      0.61        39
            Purchasing       0.47      0.60      0.53        15
                 Sales       0.91      0.63      0.74        46

              accuracy                           0.68       623
             macro avg       0.57      0.61      0.56       623
        

In [34]:
# Comparison: Baseline vs Synthetic – Training & ACTIVE Job Evaluation for Department

comparison_metrics_department = pd.DataFrame({
    "Target": [
        "Department (Label Data – no Synthetic)",
        "Department (Label Data – with Synthetic)",
        "Department (ACTIVE Jobs – no Synthetic)",
        "Department (ACTIVE Jobs – with Synthetic)",
    ],
    "Accuracy": [
        daccuracy,        # Department baseline – label data
        dsaccuracy,       # Department synthetic – label data
        d_eval_accuracy, # Department baseline – ACTIVE jobs
        ds_eval_accuracy, # Department synthetic – ACTIVE jobs
    ],
    "Macro F1": [
        dmacro_f1,
        dsmacro_f1,
        d_eval_macro_f1,
        ds_eval_macro_f1,
    ]
})

print("Department Model Comparison:\n")
print(comparison_metrics_department)


Department Model Comparison:

                                      Target  Accuracy  Macro F1
0     Department (Label Data – no Synthetic)  0.934450  0.860444
1   Department (Label Data – with Synthetic)  0.887827  0.778606
2    Department (ACTIVE Jobs – no Synthetic)  0.223114  0.338219
3  Department (ACTIVE Jobs – with Synthetic)  0.675762  0.562904


# 6. Seniority Model with Synthetic Data and Oversampling

In [35]:
# Seniority train/test split with synthetic data and oversampling
from imblearn.over_sampling import RandomOverSampler

# Train/Test Split
sosx = strain_df_aug["text"]
sosy = strain_df_aug["label"]

sosx_train, sosx_test, sosy_train, sosy_test = train_test_split(
    sosx,
    sosy,
    test_size=0.2,
    random_state=42,
    stratify=sosy
)

# Oversample the training data only (RandomOverSampler expects a 2D array, hence the reshape)
ros = RandomOverSampler(random_state=42)
sosx_train_res, sosy_train_res = ros.fit_resample(sosx_train.values.reshape(-1,1), sosy_train)
sosx_train_res = sosx_train_res.flatten()
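
Random oversampling duplicates randomly chosen samples of the smaller classes until every class is as large as the largest one. The following plain-Python sketch illustrates that idea on invented titles; it is not imblearn's implementation, and the function name random_oversample is hypothetical.

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=42):
    """Duplicate random minority samples until all classes match the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the class's own items
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels

# Invented titles for illustration
texts = ["ceo", "managing director", "founder", "vp sales", "junior analyst", "trainee"]
labels = ["Management"] * 4 + ["Junior"] * 2

res_texts, res_labels = random_oversample(texts, labels)
print(Counter(res_labels))
```

Because the duplication happens after the split and only on the training part, no copied title can end up in the test set.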

In [36]:
smodel_syn_over = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),
        min_df=3,
        max_df=0.9
    )),
    ("clf", LogisticRegression(
        max_iter=1000,
        class_weight="balanced"
    ))
])

# Train the model on oversampled training data
smodel_syn_over.fit(sosx_train_res, sosy_train_res)

In [37]:
# Seniority model with synthetic data and oversampling (already fitted in the previous cell)

# Predict on the test data
sosy_pred = smodel_syn_over.predict(sosx_test)

# Print the accuracy
print("Accuracy:", accuracy_score(sosy_test, sosy_pred))

Accuracy: 0.8877596111356606


In [38]:
# Seniority evaluation with synthetic data and oversampling

sosy_pred = smodel_syn_over.predict(sosx_test)

sosaccuracy = accuracy_score(sosy_test, sosy_pred)
sosmacro_f1 = f1_score(sosy_test, sosy_pred, average="macro")

print("Accuracy:", round(sosaccuracy, 3))
print("Macro F1:", round(sosmacro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(sosy_test, sosy_pred))

Accuracy: 0.888
Macro F1: 0.824

Classification Report:

              precision    recall  f1-score   support

    Director       0.96      0.90      0.93       242
      Junior       0.72      0.87      0.79       165
        Lead       0.95      0.93      0.94       739
  Management       0.78      0.79      0.79       257
Professional       0.56      0.61      0.59        83
      Senior       0.92      0.91      0.92       777

    accuracy                           0.89      2263
   macro avg       0.82      0.83      0.82      2263
weighted avg       0.89      0.89      0.89      2263



In [39]:
# Seniority evaluation on annotated ACTIVE jobs with synthetic data and oversampling

# Prepare evaluation data
sos_eval_df = jobs_annotated_active_df.dropna(subset=["position", "seniority"]).copy()
sos_eval_text = sos_eval_df["position"].astype(str).str.lower()
sos_eval_labels = sos_eval_df["seniority"].astype(str)

# Predict seniority
sos_eval_pred = smodel_syn_over.predict(sos_eval_text)

# Evaluation metrics

sos_eval_accuracy = accuracy_score(sos_eval_labels, sos_eval_pred)
sos_eval_macro_f1 = f1_score(sos_eval_labels, sos_eval_pred, average="macro")

print("Seniority Evaluation on ACTIVE Jobs")
print("Accuracy:", round(sos_eval_accuracy, 3))
print("Macro F1:", round(sos_eval_macro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(sos_eval_labels, sos_eval_pred))

Seniority Evaluation on ACTIVE Jobs
Accuracy: 0.544
Macro F1: 0.506

Classification Report:

              precision    recall  f1-score   support

    Director       0.50      0.88      0.64        34
      Junior       0.07      0.83      0.13        12
        Lead       0.86      0.53      0.65       125
  Management       0.82      0.76      0.79       192
Professional       0.82      0.26      0.39       216
      Senior       0.31      0.70      0.43        44

    accuracy                           0.54       623
   macro avg       0.56      0.66      0.51       623
weighted avg       0.76      0.54      0.58       623
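
One plausible factor behind the drop from 0.888 to 0.544 accuracy is that the class mix of the ACTIVE jobs differs strongly from the label-data test split. The following sketch compares the class shares using the support columns of the two reports above. It only describes the label distribution; it is an assumption, not a tested result, that this shift explains the performance gap.

```python
# Support per class, copied from the two classification reports above
test_support = {
    "Director": 242, "Junior": 165, "Lead": 739,
    "Management": 257, "Professional": 83, "Senior": 777,
}
active_support = {
    "Director": 34, "Junior": 12, "Lead": 125,
    "Management": 192, "Professional": 216, "Senior": 44,
}

test_total = sum(test_support.values())
active_total = sum(active_support.values())

# Difference in class share (ACTIVE jobs minus test split), in percentage points
shift = {
    label: 100 * (active_support[label] / active_total - test_support[label] / test_total)
    for label in test_support
}

# Print the classes with the largest absolute shift first
for label, delta in sorted(shift.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{label:<13}{delta:+6.1f} pp")

largest_shift = max(shift, key=lambda label: abs(shift[label]))
```

`Professional` makes up about 4% of the test split but roughly a third of the ACTIVE jobs, and it is also the class with the lowest recall on ACTIVE jobs (0.26).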



# 7. Department Model with Synthetic Data and Oversampling

In [40]:
# Department train/test split with synthetic data and oversampling

# Train/test split
dosx = dtrain_df_aug["text"]
dosy = dtrain_df_aug["label"]

dosx_train, dosx_test, dosy_train, dosy_test = train_test_split(
    dosx,
    dosy,
    test_size=0.2,
    random_state=42,
    stratify=dosy
)

# Oversample the training data only
ros = RandomOverSampler(random_state=42)
dosx_train_res, dosy_train_res = ros.fit_resample(dosx_train.values.reshape(-1,1), dosy_train)
dosx_train_res = dosx_train_res.flatten()

In [41]:
dmodel_syn_over = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),
        min_df=3,
        max_df=0.9
    )),
    ("clf", LogisticRegression(
        max_iter=1000,
        class_weight="balanced"
    ))
])

# Train the model on oversampled training data
dmodel_syn_over.fit(dosx_train_res, dosy_train_res)

In [42]:
# Department model with synthetic data and oversampling: test-set accuracy
# (the pipeline was already fitted in the previous cell, so it is not refitted here)

# Predict on the test data
dosy_pred = dmodel_syn_over.predict(dosx_test)

# Print the accuracy
print("Accuracy:", accuracy_score(dosy_test, dosy_pred))

Accuracy: 0.9206481096800997


In [43]:
# Department evaluation with synthetic data and oversampling

dosy_pred = dmodel_syn_over.predict(dosx_test)

dosaccuracy = accuracy_score(dosy_test, dosy_pred)
dosmacro_f1 = f1_score(dosy_test, dosy_pred, average="macro")

print("Accuracy:", round(dosaccuracy, 3))
print("Macro F1:", round(dosmacro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(dosy_test, dosy_pred))

Accuracy: 0.921
Macro F1: 0.839

Classification Report:

                        precision    recall  f1-score   support

        Administrative       0.78      0.69      0.74        36
  Business Development       0.94      0.86      0.90       152
            Consulting       0.80      0.90      0.85        59
      Customer Support       0.78      0.93      0.85        15
       Human Resources       0.88      0.68      0.77        22
Information Technology       0.92      0.91      0.91       296
             Marketing       0.99      0.95      0.97       879
                 Other       0.62      0.91      0.74       164
    Project Management       0.72      0.86      0.79        64
            Purchasing       0.82      0.70      0.76        20
                 Sales       0.99      0.93      0.96       700

              accuracy                           0.92      2407
             macro avg       0.84      0.85      0.84      2407
          weighted avg       0.93      0.92  

In [44]:
# Department evaluation on annotated ACTIVE jobs with synthetic data and oversampling

# Prepare evaluation data
dos_eval_df = jobs_annotated_active_df.dropna(subset=["position", "department"]).copy()
dos_eval_text = dos_eval_df["position"].astype(str).str.lower()
dos_eval_labels = dos_eval_df["department"].astype(str)

# Predict department
dos_eval_pred = dmodel_syn_over.predict(dos_eval_text)

# Evaluation metrics

dos_eval_accuracy = accuracy_score(dos_eval_labels, dos_eval_pred)
dos_eval_macro_f1 = f1_score(dos_eval_labels, dos_eval_pred, average="macro")

print("Department Evaluation on ACTIVE Jobs")
print("Accuracy:", round(dos_eval_accuracy, 3))
print("Macro F1:", round(dos_eval_macro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(dos_eval_labels, dos_eval_pred))

Department Evaluation on ACTIVE Jobs
Accuracy: 0.685
Macro F1: 0.612

Classification Report:

                        precision    recall  f1-score   support

        Administrative       0.28      0.36      0.31        14
  Business Development       0.24      0.60      0.35        20
            Consulting       0.71      0.62      0.66        39
      Customer Support       1.00      0.83      0.91         6
       Human Resources       0.62      0.62      0.62        16
Information Technology       0.60      0.52      0.56        62
             Marketing       0.59      0.45      0.51        22
                 Other       0.77      0.77      0.77       344
    Project Management       0.68      0.64      0.66        39
            Purchasing       0.67      0.67      0.67        15
                 Sales       0.79      0.65      0.71        46

              accuracy                           0.69       623
             macro avg       0.63      0.61      0.61       623
        

# 8. Seniority Model with Synthetic Data and Oversampling + New Pipeline

In [45]:
# Seniority train/test split with synthetic data and oversampling
# Train/test split
sosx = strain_df_aug["text"]
sosy = strain_df_aug["label"]

sosx_train, sosx_test, sosy_train, m1sosy_test = train_test_split(
    sosx,
    sosy,
    test_size=0.2,
    random_state=42,
    stratify=sosy
)

# Oversample the training data only
ros = RandomOverSampler(random_state=42)
sosx_train_res, sosy_train_res = ros.fit_resample(sosx_train.values.reshape(-1,1), sosy_train)
sosx_train_res = sosx_train_res.flatten()

smodel_syn_over = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),
        min_df=3,
        max_df=0.9
    )),
    ("clf", LogisticRegression(
        max_iter=2000,
        class_weight="balanced",
        C=0.5,              # stronger regularization than the default C=1.0
        penalty="l2",
        solver="liblinear"  # liblinear trains one-vs-rest classifiers for multiclass targets
    ))
])

# Train the seniority model on the oversampled training data (synthetic data included)
smodel_syn_over.fit(sosx_train_res, sosy_train_res)

# Predict on the test data
m1sosy_pred = smodel_syn_over.predict(sosx_test)

# Print the unrounded accuracy (uses this cell's own test labels, m1sosy_test)
print("Accuracy:", accuracy_score(m1sosy_test, m1sosy_pred))

# Seniority evaluation with synthetic data and oversampling

m1sosaccuracy = accuracy_score(m1sosy_test, m1sosy_pred)
m1sosmacro_f1 = f1_score(m1sosy_test, m1sosy_pred, average="macro")

print("Accuracy:", round(m1sosaccuracy, 3))
print("Macro F1:", round(m1sosmacro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(m1sosy_test, m1sosy_pred))

Accuracy: 0.8767123287671232
Accuracy: 0.877
Macro F1: 0.818

Classification Report:

              precision    recall  f1-score   support

    Director       0.98      0.88      0.93       242
      Junior       0.71      0.87      0.78       165
        Lead       0.94      0.93      0.93       739
  Management       0.77      0.76      0.77       257
Professional       0.53      0.67      0.60        83
      Senior       0.91      0.89      0.90       777

    accuracy                           0.88      2263
   macro avg       0.81      0.83      0.82      2263
weighted avg       0.88      0.88      0.88      2263



In [46]:
# Seniority evaluation on annotated ACTIVE jobs with synthetic data and oversampling

# Prepare evaluation data
sos_eval_df = jobs_annotated_active_df.dropna(subset=["position", "seniority"]).copy()
sos_eval_text = sos_eval_df["position"].astype(str).str.lower()
m1sos_eval_labels = sos_eval_df["seniority"].astype(str)

# Predict seniority
m1sos_eval_pred = smodel_syn_over.predict(sos_eval_text)

# Evaluation metrics

m1sos_eval_accuracy = accuracy_score(m1sos_eval_labels, m1sos_eval_pred)
m1sos_eval_macro_f1 = f1_score(m1sos_eval_labels, m1sos_eval_pred, average="macro")

print("Seniority Evaluation on ACTIVE Jobs")
print("Accuracy:", round(m1sos_eval_accuracy, 3))
print("Macro F1:", round(m1sos_eval_macro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(m1sos_eval_labels, m1sos_eval_pred))

Seniority Evaluation on ACTIVE Jobs
Accuracy: 0.555
Macro F1: 0.522

Classification Report:

              precision    recall  f1-score   support

    Director       0.55      0.85      0.67        34
      Junior       0.07      0.83      0.13        12
        Lead       0.87      0.52      0.65       125
  Management       0.83      0.76      0.79       192
Professional       0.81      0.29      0.43       216
      Senior       0.33      0.77      0.46        44

    accuracy                           0.56       623
   macro avg       0.58      0.67      0.52       623
weighted avg       0.77      0.56      0.59       623
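
The ACTIVE-jobs results of the four seniority variants can be ranked directly. The sketch below copies accuracy and macro F1 from the comparison table printed at the end of this notebook; the short dictionary keys are abbreviations chosen for this example. It suggests that the new pipeline improves slightly on the previous oversampling run, while the variant with synthetic data but without oversampling remains ahead on both metrics.

```python
# (accuracy, macro F1) for seniority on ACTIVE jobs, copied from the comparison table
active_seniority = {
    "no Synthetic": (0.436597, 0.409319),
    "with Synthetic": (0.645265, 0.571373),
    "Synthetic + Oversampling": (0.544141, 0.505893),
    "Synthetic + Oversampling + new pipeline": (0.555377, 0.521959),
}

# Rank the variants by macro F1, best first
ranked = sorted(active_seniority.items(), key=lambda kv: kv[1][1], reverse=True)
for name, (acc, f1) in ranked:
    print(f"{name:<42} accuracy={acc:.3f}  macro F1={f1:.3f}")

best_name = ranked[0][0]

# Macro F1 gain of the new pipeline over the previous oversampling run
pipeline_gain = (
    active_seniority["Synthetic + Oversampling + new pipeline"][1]
    - active_seniority["Synthetic + Oversampling"][1]
)
print("Macro F1 gain from the new pipeline:", round(pipeline_gain, 3))
```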



# 9. Department Model with Synthetic Data and Oversampling + New Pipeline

In [47]:
# Department train/test split with synthetic data and oversampling

# Train/test split
dosx = dtrain_df_aug["text"]
dosy = dtrain_df_aug["label"]

dosx_train, dosx_test, dosy_train, m1dosy_test = train_test_split(
    dosx,
    dosy,
    test_size=0.2,
    random_state=42,
    stratify=dosy
)

# Oversample the training data only
ros = RandomOverSampler(random_state=42)
dosx_train_res, dosy_train_res = ros.fit_resample(dosx_train.values.reshape(-1,1), dosy_train)
dosx_train_res = dosx_train_res.flatten()
dmodel_syn_over = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),
        min_df=3,
        max_df=0.9
    )),
    ("clf", LogisticRegression(
        max_iter=2000,
        class_weight="balanced",
        C=0.5,              # stronger regularization than the default C=1.0
        penalty="l2",
        solver="liblinear"  # liblinear trains one-vs-rest classifiers for multiclass targets
    ))
])

# Train the department model on the oversampled training data (synthetic data included)
dmodel_syn_over.fit(dosx_train_res, dosy_train_res)

# Predict on the test data
m1dosy_pred = dmodel_syn_over.predict(dosx_test)

# Print the unrounded accuracy (uses this cell's own test labels, m1dosy_test)
print("Accuracy:", accuracy_score(m1dosy_test, m1dosy_pred))

# Department evaluation with synthetic data and oversampling

m1dosaccuracy = accuracy_score(m1dosy_test, m1dosy_pred)
m1dosmacro_f1 = f1_score(m1dosy_test, m1dosy_pred, average="macro")

print("Accuracy:", round(m1dosaccuracy, 3))
print("Macro F1:", round(m1dosmacro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(m1dosy_test, m1dosy_pred))

Accuracy: 0.9081844619858745
Accuracy: 0.908
Macro F1: 0.819

Classification Report:

                        precision    recall  f1-score   support

        Administrative       0.76      0.72      0.74        36
  Business Development       0.90      0.86      0.88       152
            Consulting       0.78      0.88      0.83        59
      Customer Support       0.64      0.93      0.76        15
       Human Resources       0.88      0.68      0.77        22
Information Technology       0.91      0.90      0.90       296
             Marketing       0.99      0.94      0.96       879
                 Other       0.59      0.91      0.72       164
    Project Management       0.67      0.84      0.74        64
            Purchasing       0.82      0.70      0.76        20
                 Sales       0.99      0.92      0.95       700

              accuracy                           0.91      2407
             macro avg       0.81      0.84      0.82      2407
          weight

In [48]:
# Department evaluation on annotated ACTIVE jobs with synthetic data and oversampling

# Prepare evaluation data
dos_eval_df = jobs_annotated_active_df.dropna(subset=["position", "department"]).copy()
dos_eval_text = dos_eval_df["position"].astype(str).str.lower()
m1dos_eval_labels = dos_eval_df["department"].astype(str)

# Predict department
m1dos_eval_pred = dmodel_syn_over.predict(dos_eval_text)

# Evaluation metrics

m1dos_eval_accuracy = accuracy_score(m1dos_eval_labels, m1dos_eval_pred)
m1dos_eval_macro_f1 = f1_score(m1dos_eval_labels, m1dos_eval_pred, average="macro")

print("Department Evaluation on ACTIVE Jobs")
print("Accuracy:", round(m1dos_eval_accuracy, 3))
print("Macro F1:", round(m1dos_eval_macro_f1, 3))
print("\nClassification Report:\n")
print(classification_report(m1dos_eval_labels, m1dos_eval_pred))

Department Evaluation on ACTIVE Jobs
Accuracy: 0.682
Macro F1: 0.618

Classification Report:

                        precision    recall  f1-score   support

        Administrative       0.23      0.36      0.28        14
  Business Development       0.25      0.60      0.35        20
            Consulting       0.71      0.62      0.66        39
      Customer Support       1.00      1.00      1.00         6
       Human Resources       0.62      0.62      0.62        16
Information Technology       0.57      0.50      0.53        62
             Marketing       0.71      0.45      0.56        22
                 Other       0.78      0.76      0.77       344
    Project Management       0.60      0.67      0.63        39
            Purchasing       0.67      0.67      0.67        15
                 Sales       0.85      0.63      0.72        46

              accuracy                           0.68       623
             macro avg       0.64      0.63      0.62       623
        

In [52]:
# --- Full comparison: baseline vs synthetic vs synthetic + oversampling vs synthetic + oversampling + new pipeline ---
full_comparison_department = pd.DataFrame({
    "Target": [
        "Seniority (Label Data – no Synthetic)",     # baseline, label-data test split
        "Seniority (Label Data – with Synthetic)",  # synthetic only, label-data test split
        "Seniority (Label Data – Synthetic + Oversampling)",  # synthetic + oversampling, label-data test split
        "Seniority (Label Data – Synthetic + Oversampling + new pipeline)",  # synthetic + oversampling + new pipeline, label-data test split
        "Seniority (ACTIVE Jobs – no Synthetic)",   # baseline, ACTIVE jobs
        "Seniority (ACTIVE Jobs – with Synthetic)", # synthetic only, ACTIVE jobs
        "Seniority (ACTIVE Jobs – Synthetic + Oversampling)", # synthetic + oversampling, ACTIVE jobs
        "Seniority (ACTIVE Jobs – Synthetic + Oversampling + new pipeline)", # synthetic + oversampling + new pipeline, ACTIVE jobs
        "Department (Label Data – no Synthetic)",     # baseline, label-data test split
        "Department (Label Data – with Synthetic)",  # synthetic only, label-data test split
        "Department (Label Data – Synthetic + Oversampling)",  # synthetic + oversampling, label-data test split
        "Department (Label Data – Synthetic + Oversampling + new pipeline)",  # synthetic + oversampling + new pipeline, label-data test split
        "Department (ACTIVE Jobs – no Synthetic)",   # baseline, ACTIVE jobs
        "Department (ACTIVE Jobs – with Synthetic)", # synthetic only, ACTIVE jobs
        "Department (ACTIVE Jobs – Synthetic + Oversampling)", # synthetic + oversampling, ACTIVE jobs
        "Department (ACTIVE Jobs – Synthetic + Oversampling + new pipeline)", # synthetic + oversampling + new pipeline, ACTIVE jobs
    ],
    "Accuracy": [
        saccuracy,        # baseline, label-data test split
        ssaccuracy,       # synthetic only, label-data test split
        sosaccuracy,      # synthetic + oversampling, label-data test split
        m1sosaccuracy,      # synthetic + oversampling + new pipeline, label-data test split
        s_eval_accuracy,  # baseline, ACTIVE jobs
        ss_eval_accuracy, # synthetic only, ACTIVE jobs
        sos_eval_accuracy, # synthetic + oversampling, ACTIVE jobs
        m1sos_eval_accuracy, # synthetic + oversampling + new pipeline, ACTIVE jobs
        daccuracy,        # baseline, label-data test split
        dsaccuracy,       # synthetic only, label-data test split
        dosaccuracy,      # synthetic + oversampling, label-data test split
        m1dosaccuracy,      # synthetic + oversampling + new pipeline, label-data test split
        d_eval_accuracy,  # baseline, ACTIVE jobs
        ds_eval_accuracy, # synthetic only, ACTIVE jobs
        dos_eval_accuracy, # synthetic + oversampling, ACTIVE jobs
        m1dos_eval_accuracy, # synthetic + oversampling + new pipeline, ACTIVE jobs
    ],
    "Macro F1": [
        smacro_f1,        # baseline, label-data test split
        ssmacro_f1,       # synthetic only, label-data test split
        sosmacro_f1,      # synthetic + oversampling, label-data test split
        m1sosmacro_f1,      # synthetic + oversampling + new pipeline, label-data test split
        s_eval_macro_f1,  # baseline, ACTIVE jobs
        ss_eval_macro_f1, # synthetic only, ACTIVE jobs
        sos_eval_macro_f1, # synthetic + oversampling, ACTIVE jobs
        m1sos_eval_macro_f1, # synthetic + oversampling + new pipeline, ACTIVE jobs
        dmacro_f1,        # baseline, label-data test split
        dsmacro_f1,       # synthetic only, label-data test split
        dosmacro_f1,      # synthetic + oversampling, label-data test split
        m1dosmacro_f1,      # synthetic + oversampling + new pipeline, label-data test split
        d_eval_macro_f1,  # baseline, ACTIVE jobs
        ds_eval_macro_f1, # synthetic only, ACTIVE jobs
        dos_eval_macro_f1, # synthetic + oversampling, ACTIVE jobs
        m1dos_eval_macro_f1, # synthetic + oversampling + new pipeline, ACTIVE jobs
    ]
})

print("\nFull Comparison: Baseline vs Synthetic vs + Oversampling vs + new pipeline\n")
print(full_comparison_department)



Full Comparison: Baseline vs Synthetic vs + Oversampling vs + new pipeline

                                               Target  Accuracy  Macro F1
0               Seniority (Label Data – no Synthetic)  0.970308  0.956030
1             Seniority (Label Data – with Synthetic)  0.870968  0.811299
2   Seniority (Label Data – Synthetic + Oversampling)  0.887760  0.824383
3   Seniority (Label Data – Synthetic + Oversampli...  0.876712  0.817640
4              Seniority (ACTIVE Jobs – no Synthetic)  0.436597  0.409319
5            Seniority (ACTIVE Jobs – with Synthetic)  0.645265  0.571373
6   Seniority (ACTIVE Jobs – Synthetic + Oversampl...  0.544141  0.505893
7   Seniority (ACTIVE Jobs – Synthetic + Oversampl...  0.555377  0.521959
8              Department (Label Data – no Synthetic)  0.934450  0.860444
9            Department (Label Data – with Synthetic)  0.887827  0.778606
10  Department (Label Data – Synthetic + Oversampl...  0.920648  0.838617
11  Department (Label Data – Synth