COUNTERFACTUAL EXAMPLE WITH LLM APPLICATION

According to the case law of the ECJ (CK v. Magistrat der Stadt Wien & Dun & Bradstreet Austria GmbH), the data controller does not need to provide detailed information about the algorithm; instead, the court emphasizes counterfactual examples.

Thus, the ECJ has provided some guidance on the legal requirements for explanations, in addition to the technical ones. As LLMs are used as external tools both for decision making and for explanations, they are also included in this workbook.

Within the scope of this workbook:

*   A supervised-learning xAI tool (DiCE) is run and the counterfactual examples are examined from a legal perspective, in line with the case law of the ECJ.

*   An LLM is used to explain the counterfactual examples generated by the xAI tool.

*   An LLM is prompted to make a decision independently and to provide 3 counterfactual examples.



In [9]:
!pip install scikit-learn pandas openai kagglehub dice_ml

Collecting dice_ml
  Downloading dice_ml-0.12-py3-none-any.whl.metadata (20 kB)
Collecting raiutils>=0.4.0 (from dice_ml)
  Downloading raiutils-0.4.2-py3-none-any.whl.metadata (1.4 kB)
Downloading dice_ml-0.12-py3-none-any.whl (2.5 MB)
Downloading raiutils-0.4.2-py3-none-any.whl (17 kB)
Installing collected packages: raiutils, dice_ml
Successfully installed dice_ml-0.12 raiutils-0.4.2


A synthetic dataset has been created for illustration purposes.

In [10]:
import pandas as pd
import numpy as np

np.random.seed(42)

N = 2000

df = pd.DataFrame({
    "Age": np.random.randint(21, 45, N),
    "YearsExperience": np.random.randint(0, 15, N),
    "EducationLevel": np.random.randint(1, 4, N),  # 1 = High School, 2 = Bachelor, 3 = Master+
    "CodingScore": np.random.randint(0, 100, N),
    "InterviewScore": np.random.randint(0, 100, N),
})

# Generate the hire decision from a noisy weighted sum of the features
df["Hire"] = (
    (df["CodingScore"] * 0.35)
    + (df["InterviewScore"] * 0.35)
    + (df["YearsExperience"] * 3)
    + (df["EducationLevel"] * 5)
) + np.random.normal(0, 10, N)

# Binarize by thresholding at the median
df["Hire"] = (df["Hire"] > df["Hire"].median()).astype(int)

df.head()


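|   | Age | YearsExperience | EducationLevel | CodingScore | InterviewScore | Hire |
|---|-----|-----------------|----------------|-------------|----------------|------|
| 0 | 27 | 9 | 1 | 84 | 81 | 1 |
| 1 | 40 | 7 | 2 | 61 | 68 | 1 |
| 2 | 35 | 3 | 2 | 96 | 18 | 0 |
| 3 | 31 | 13 | 1 | 37 | 1 | 0 |
| 4 | 28 | 6 | 2 | 14 | 11 | 0 |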


The data set is split into training and test data. An instance from the test data is used both for counterfactual example generation and for the LLM request, since the model has not been trained on it.

In [11]:
from sklearn.model_selection import train_test_split

features = ["Age", "YearsExperience", "EducationLevel", "CodingScore", "InterviewScore"]

X_df = df[features].copy()
y_series = df["Hire"].copy()

X_train_df, X_test_df, y_train_s, y_test_s = train_test_split(
    X_df, y_series,
    test_size=0.2,
    random_state=42,
    stratify=y_series
)


In [12]:
from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(
    n_estimators=300,
    max_depth=7,
    random_state=42
)

rf_model.fit(X_train_df, y_train_s)

print("Test accuracy:", rf_model.score(X_test_df, y_test_s))

Test accuracy: 0.8175


The accuracy is acceptable. The model has not been fine-tuned and no data preprocessing has been done, as the purpose is to show counterfactuals.
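To put the test accuracy in context, the sketch below estimates how well the label could be predicted if the noise-free part of the score were known. It regenerates the synthetic data with the same seed and weights as the data-generation cell; the "oracle" rule (thresholding the noise-free score at its own median) is an assumption introduced here for illustration, and it is evaluated on all rows rather than on the test split, so the number is only a rough reference point.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
N = 2000

df = pd.DataFrame({
    "Age": np.random.randint(21, 45, N),
    "YearsExperience": np.random.randint(0, 15, N),
    "EducationLevel": np.random.randint(1, 4, N),
    "CodingScore": np.random.randint(0, 100, N),
    "InterviewScore": np.random.randint(0, 100, N),
})

# Noise-free part of the score, same weights as in the data-generation cell.
signal = (
    df["CodingScore"] * 0.35
    + df["InterviewScore"] * 0.35
    + df["YearsExperience"] * 3
    + df["EducationLevel"] * 5
)

# Label as in the notebook: noisy score thresholded at its median.
noisy = signal + np.random.normal(0, 10, N)
label = (noisy > noisy.median()).astype(int)

# Hypothetical "oracle": predict the label from the noise-free score alone.
oracle_pred = (signal > signal.median()).astype(int)
ceiling = (oracle_pred == label).mean()

print(f"Approximate accuracy ceiling: {ceiling:.3f}")
```

If the random forest's test accuracy is close to this value, most of the remaining error is likely due to the injected noise rather than to the model.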

DiCE has been imported as a simple xAI tool. Random forest has been used as the base algorithm, as it is part of the scikit-learn library.

Old versions of xAI tools may conflict with TensorFlow; thus, a compatible black-box algorithm has been used.

In [13]:
import dice_ml

data_df = df[features + ["Hire"]].copy()

d = dice_ml.Data(
    dataframe=data_df,
    continuous_features=["Age", "YearsExperience", "CodingScore", "InterviewScore"],
    outcome_name="Hire"
)

m = dice_ml.Model(model=rf_model, backend="sklearn")

exp = dice_ml.Dice(d, m, method="genetic")


The instance is chosen from the test split.

In [14]:
# Test data: features + true label
test_df = X_test_df.copy()
test_df["Hire"] = y_test_s.values

# Model predictions (class probabilities)
preds = rf_model.predict_proba(X_test_df)

# Find the positions where the model predicts class 0 (not hired)
model_pred_0_idx = np.where(preds[:, 0] > 0.5)[0]   # class 0 has the higher probability

# Select the first such instance
i = model_pred_0_idx[0]

query_instance = X_test_df.iloc[[i]]   # features
true_label = y_test_s.iloc[i]          # true label
model_pred_label = rf_model.predict(X_test_df.iloc[[i]])[0]

print("Selected index:", i)
print("True label:", true_label)
print("Model prediction:", model_pred_label)
print("Model proba:", preds[i])
print("\nQuery instance:")
query_instance

Selected index: 0
True label: 0
Model prediction: 0
Model proba: [0.59450605 0.40549395]

Query instance:


|      | Age | YearsExperience | EducationLevel | CodingScore | InterviewScore |
|------|-----|-----------------|----------------|-------------|----------------|
| 1484 | 29 | 0 | 3 | 72 | 59 |


DiCE is run to generate 3 counterfactuals where the desired outcome is hiring.

In [15]:
cf = exp.generate_counterfactuals(
    query_instance,
    total_CFs=3,
    desired_class=1
)




100%|██████████| 1/1 [00:00<00:00,  2.38it/s]


In [16]:
# 1) Original instance
orig = query_instance.copy()
orig_proba = rf_model.predict_proba(orig)[0][1]
orig["Hiring_Probability"] = orig_proba
orig["Type"] = "Original"

# 2) Counterfactual examples
cf_df = cf.cf_examples_list[0].final_cfs_df.copy()
cf_probas = rf_model.predict_proba(cf_df[features])

cf_df["Hiring_Probability"] = cf_probas[:, 1]
cf_df["Type"] = "Counterfactual"

# 3) Keep only desired columns:
cols = features + ["Hiring_Probability", "Type"]

# 4) Combine
combined_df = pd.concat([orig[cols], cf_df[cols]], ignore_index=True)

combined_df

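|   | Age | YearsExperience | EducationLevel | CodingScore | InterviewScore | Hiring_Probability | Type |
|---|-----|-----------------|----------------|-------------|----------------|--------------------|------|
| 0 | 29 | 0 | 3 | 72 | 59 | 0.405494 | Original |
| 1 | 29 | 0 | 2 | 74 | 70 | 0.569468 | Counterfactual |
| 2 | 25 | 0 | 3 | 74 | 72 | 0.535302 | Counterfactual |
| 3 | 26 | 5 | 3 | 72 | 67 | 0.645258 | Counterfactual |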
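Each of the three counterfactuals above lowers either EducationLevel or Age, which a candidate cannot act on. A minimal sketch of an actionability check follows; the rule that Age, YearsExperience and EducationLevel may only stay the same or increase is an assumption made here for illustration, and `is_actionable` is a hypothetical helper, not part of DiCE.

```python
import pandas as pd

# Values copied from the table above: the original instance and the three counterfactuals.
original = {"Age": 29, "YearsExperience": 0, "EducationLevel": 3,
            "CodingScore": 72, "InterviewScore": 59}

cfs = pd.DataFrame([
    {"Age": 29, "YearsExperience": 0, "EducationLevel": 2, "CodingScore": 74, "InterviewScore": 70},
    {"Age": 25, "YearsExperience": 0, "EducationLevel": 3, "CodingScore": 74, "InterviewScore": 72},
    {"Age": 26, "YearsExperience": 5, "EducationLevel": 3, "CodingScore": 72, "InterviewScore": 67},
])

# Assumption: these features cannot decrease for a real person.
NON_DECREASING = ["Age", "YearsExperience", "EducationLevel"]

def is_actionable(cf_row, original, non_decreasing=NON_DECREASING):
    """Return True if the counterfactual never lowers a non-decreasing feature."""
    return all(cf_row[feat] >= original[feat] for feat in non_decreasing)

cfs["Actionable"] = cfs.apply(is_actionable, axis=1, original=original)
print(cfs)
```

Under this assumption none of the three rows would count as actionable advice, which is one reason to restrict the features DiCE is allowed to vary.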


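EducationLevel is already at its highest value for this instance, so it is excluded from the features that DiCE may vary. Age is not varied in the call below either; it remains a model input, so it still affects the predicted probability.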

In [17]:
cf = exp.generate_counterfactuals(
    query_instance,
    total_CFs=3,
    desired_class=1,
    features_to_vary=["YearsExperience", "CodingScore", "InterviewScore"]
)


100%|██████████| 1/1 [00:00<00:00,  2.48it/s]


In [18]:
# 1) Original instance
orig = query_instance.copy()
orig_proba = rf_model.predict_proba(orig)[0][1]
orig["Hiring_Probability"] = orig_proba
orig["Type"] = "Original"

# 2) Counterfactual examples
cf_df = cf.cf_examples_list[0].final_cfs_df.copy()
cf_probas = rf_model.predict_proba(cf_df[features])

cf_df["Hiring_Probability"] = cf_probas[:, 1]
cf_df["Type"] = "Counterfactual"

# 3) Keep only desired columns:
cols = features + ["Hiring_Probability", "Type"]

# 4) Combine
combined_df = pd.concat([orig[cols], cf_df[cols]], ignore_index=True)

combined_df

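|   | Age | YearsExperience | EducationLevel | CodingScore | InterviewScore | Hiring_Probability | Type |
|---|-----|-----------------|----------------|-------------|----------------|--------------------|------|
| 0 | 29 | 0 | 3 | 72 | 59 | 0.405494 | Original |
| 1 | 29 | 0 | 3 | 79 | 61 | 0.563924 | Counterfactual |
| 2 | 29 | 0 | 3 | 74 | 68 | 0.582081 | Counterfactual |
| 3 | 29 | 0 | 3 | 83 | 61 | 0.734797 | Counterfactual |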
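Before handing the table to an LLM, it can help to compute the changes deterministically, so that any generated explanation can later be compared against them. The sketch below rebuilds the table above by hand and lists, for each counterfactual, the features that differ from the original and the gain in hiring probability; `summarize_changes` is a hypothetical helper introduced here.

```python
import pandas as pd

features = ["Age", "YearsExperience", "EducationLevel", "CodingScore", "InterviewScore"]

# Values copied from the table above.
combined = pd.DataFrame(
    [
        [29, 0, 3, 72, 59, 0.405494, "Original"],
        [29, 0, 3, 79, 61, 0.563924, "Counterfactual"],
        [29, 0, 3, 74, 68, 0.582081, "Counterfactual"],
        [29, 0, 3, 83, 61, 0.734797, "Counterfactual"],
    ],
    columns=features + ["Hiring_Probability", "Type"],
)

def summarize_changes(combined, features, prob_col="Hiring_Probability"):
    """List changed features and probability gain for each counterfactual row."""
    orig = combined[combined["Type"] == "Original"].iloc[0]
    cfs = combined[combined["Type"] == "Counterfactual"].reset_index(drop=True)
    records = []
    for i, cf in cfs.iterrows():
        # Keep only the features whose value differs from the original instance.
        changes = [f"{feat}: {orig[feat]} -> {cf[feat]}"
                   for feat in features if cf[feat] != orig[feat]]
        records.append({
            "CF": f"CF{i + 1}",
            "Changes": ", ".join(changes),
            "ProbabilityGain": round(float(cf[prob_col] - orig[prob_col]), 3),
        })
    return pd.DataFrame(records)

summary = summarize_changes(combined, features)
print(summary.to_string(index=False))
```

The resulting summary can be shown next to the LLM's text so that a reviewer sees at a glance whether the described changes match the actual ones.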


**LLM EXPLANATION**

An LLM is used to generate explanations for the counterfactual examples:

* An explanation for each counterfactual example
* A single explanation covering all examples, summarizing the steps to be taken

**EXPLANATION FOR EACH COUNTERFACTUAL**

In [20]:
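import os
from openai import OpenAI

# Read the API key from the environment instead of hard-coding it in the notebook
client = OpenAI(
  api_key=os.environ.get("OPENAI_API_KEY")
)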


In [21]:
def generate_llm_cf_explanations_all(
    combined_df,
    features,
    prob_col="Hiring_Probability",
    model_name="gpt-4.1-mini",
):
    """
    Use an LLM to generate explanations for ALL counterfactuals at once.
    Returns a single string (multi-CF explanation).
    """
    # Separate the original instance from the counterfactuals
    orig_row = combined_df[combined_df["Type"] == "Original"].iloc[0]
    cf_rows = combined_df[combined_df["Type"] == "Counterfactual"].reset_index(drop=True)

    # Summary of the original instance
    orig_feat_str = ", ".join(
        f"{feat} = {orig_row[feat]}" for feat in features
    )
    orig_prob = orig_row.get(prob_col, None)

    # List all counterfactuals
    cf_blocks = []
    for i, cf_row in cf_rows.iterrows():
        idx = i + 1
        cf_feat_str = ", ".join(
            f"{feat} = {cf_row[feat]}" for feat in features
        )
        cf_prob = cf_row.get(prob_col, None)
        block = f"Counterfactual #{idx}:\n- Features: {cf_feat_str}\n- Hiring probability: {cf_prob:.3f}"
        cf_blocks.append(block)

    cf_block_text = "\n\n".join(cf_blocks)

    user_prompt = f"""
You are an explainable AI assistant for a hiring model. The model predicts the probability
that a candidate will be hired.

The ORIGINAL candidate:
- Features: {orig_feat_str}
- Hiring probability: {orig_prob:.3f}

Below are several COUNTERFACTUAL candidates, each representing a version of the same person
with some features changed:

{cf_block_text}

TASK:
For EACH counterfactual candidate (CF #1, CF #2, etc.), do the following:

1. Briefly describe which features changed compared to the original candidate
   (only mention features that actually changed).
2. Explain in simple, non-technical language why these changes make the model
   more likely to hire that counterfactual candidate.
3. Phrase the explanation as actionable guidance for the person, e.g.:
   "To increase your hiring probability, you would need to improve your coding test score
   from 72 to 85 and slightly increase your interview performance."
4. Focus especially on CodingScore, InterviewScore, and YearsExperience if they changed,
   but feel free to mention other features if relevant.

Format your answer as a numbered list:
1) Explanation for CF #1
2) Explanation for CF #2
3) Explanation for CF #3
and so on.
"""

    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "user", "content": user_prompt}
        ],
    )

    explanation_text = response.choices[0].message.content
    return explanation_text.strip()


In [22]:
all_cf_explanation = generate_llm_cf_explanations_all(
    combined_df=combined_df,
    features=features,
    prob_col="Hiring_Probability",
    model_name="gpt-4.1-mini"
)

print(all_cf_explanation)

1) Explanation for CF #1  
- Changed features: CodingScore increased from 72 to 79, InterviewScore increased from 59 to 61.  
- Explanation: Improving both your coding test score and interview performance shows stronger skills and better preparation, which makes you more attractive to the hiring model. These improvements give the impression that you are more capable and ready for the job.  
- Actionable guidance: To increase your hiring probability, you would need to improve your coding test score from 72 to 79 and slightly enhance your interview performance from 59 to 61.

2) Explanation for CF #2  
- Changed features: CodingScore increased from 72 to 74, InterviewScore increased from 59 to 68.  
- Explanation: Compared to the original, your coding skills improved a little, but your interview skills improved significantly. A much better interview score suggests you communicate well and fit the role, which strongly boosts your chances.  
- Actionable guidance: To increase your hiring p

**LLM EXPLANATION FOR ALL**

In [25]:
def generate_llm_explanation_original_from_cfs(
    combined_df,
    features,
    prob_col="Hiring_Probability",
    model_name="gpt-4.1-mini",  # use a model name that is enabled on your account
):
    """
    Use an LLM to explain the ORIGINAL candidate in light of all counterfactuals.
    The output is a single explanation focusing on what the original candidate
    would need to change (according to the CFs) to increase hiring probability.
    """
    # Separate the original instance from the counterfactuals
    orig_row = combined_df[combined_df["Type"] == "Original"].iloc[0]
    cf_rows = combined_df[combined_df["Type"] == "Counterfactual"].reset_index(drop=True)

    # Summary of the original instance
    orig_feat_str = ", ".join(
        f"{feat} = {orig_row[feat]}" for feat in features
    )
    orig_prob = orig_row.get(prob_col, None)

    # Summarize all counterfactuals
    cf_blocks = []
    for i, cf_row in cf_rows.iterrows():
        idx = i + 1
        cf_feat_str = ", ".join(
            f"{feat} = {cf_row[feat]}" for feat in features
        )
        cf_prob = cf_row.get(prob_col, None)
        block = f"Counterfactual #{idx}:\n- Features: {cf_feat_str}\n- Hiring probability: {cf_prob:.3f}"
        cf_blocks.append(block)

    cf_block_text = "\n\n".join(cf_blocks)

    user_prompt = f"""
You are an explainable AI assistant for a hiring model.

The ORIGINAL candidate:
- Features: {orig_feat_str}
- Hiring probability: {orig_prob:.3f}

Below are several COUNTERFACTUAL candidates, each representing a version of the same person
with some features changed, for which the model gives a higher hiring probability:

{cf_block_text}

TASK:
Using these counterfactual candidates as guidance, analyze and explain the ORIGINAL candidate.

Specifically:
1. Describe what the original candidate is currently "missing" or where they are weaker,
   compared to the counterfactual candidates that the model is more likely to hire.
2. Summarize which feature changes appear most important across the counterfactuals
   (for example, consistent increases in CodingScore, InterviewScore, or YearsExperience).
3. Turn this into direct, actionable advice for the original candidate:
   clearly explain what they would need to improve or change in order to look more like
   the successful counterfactuals and increase their chances of being hired.

Write your answer as:
- A short paragraph that summarizes the situation.
- Then a concise bullet list of concrete changes the original candidate should focus on.
Use simple, non-technical language.
"""

    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "user", "content": user_prompt}
        ],
    )

    explanation_text = response.choices[0].message.content
    return explanation_text.strip()


In [26]:
explanation_original = generate_llm_explanation_original_from_cfs(
    combined_df=combined_df,
    features=features,
    prob_col="Hiring_Probability",
    model_name="gpt-4.1-mini"
)

print(explanation_original)

The original candidate’s hiring probability is notably lower than the counterfactual versions, mainly because they have lower scores on key skills and are slightly older than some of the more favored candidates. The model favors candidates with higher coding and interview scores, and a bit younger age seems to correlate with a higher chance of hiring in these examples. Experience and education level remain the same, so improvements there are less relevant in this case.

To improve your chances, consider focusing on the following:

- **Increase your coding test score** by practicing problem-solving and programming challenges to demonstrate stronger technical skills.
- **Improve your interview performance** by preparing for common questions and practicing clear, confident communication.
- **If possible, leverage your youth as an advantage** by emphasizing your energy, adaptability, and willingness to learn quickly.
- Since years of experience and education are unchanged, focus mainly on 
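A generated explanation can mention features that did not change in any of the counterfactuals it was given (for instance, a reference to age when Age is identical in every row). The sketch below is one simple way to flag such mentions for human review. The keyword lists and the sample sentence are illustrative assumptions, and `mentioned_but_unchanged` is a hypothetical helper, not an established fidelity metric.

```python
import re

def mentioned_but_unchanged(explanation, original, counterfactuals, keywords):
    """Return features the text mentions although no counterfactual changes them."""
    changed = {feat for cf in counterfactuals for feat in original
               if cf[feat] != original[feat]}
    text = explanation.lower()
    flagged = []
    for feature, words in keywords.items():
        # Whole-word match, so "age" does not match inside "leverage".
        mentioned = any(re.search(rf"\b{re.escape(word)}\b", text) for word in words)
        if mentioned and feature not in changed:
            flagged.append(feature)
    return flagged

original = {"Age": 29, "YearsExperience": 0, "EducationLevel": 3,
            "CodingScore": 72, "InterviewScore": 59}
counterfactuals = [
    {"Age": 29, "YearsExperience": 0, "EducationLevel": 3, "CodingScore": 79, "InterviewScore": 61},
    {"Age": 29, "YearsExperience": 0, "EducationLevel": 3, "CodingScore": 74, "InterviewScore": 68},
    {"Age": 29, "YearsExperience": 0, "EducationLevel": 3, "CodingScore": 83, "InterviewScore": 61},
]

# Illustrative keyword lists (assumption); a real check would need a richer vocabulary.
keywords = {
    "Age": ["age", "older", "younger", "youth"],
    "YearsExperience": ["experience"],
    "EducationLevel": ["education"],
    "CodingScore": ["coding"],
    "InterviewScore": ["interview"],
}

# Illustrative sentence, not the actual model output.
sample = ("The candidate is slightly older than the favored candidates "
          "and scores lower on the coding test and the interview.")

flagged = mentioned_but_unchanged(sample, original, counterfactuals, keywords)
print(flagged)
```

A keyword match cannot tell a correct remark such as "experience remains the same" from an unsupported claim, so flagged features are prompts for a human reviewer rather than proof of an error.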

**DIRECTLY ASKING THE LLM FOR A DECISION**

In [27]:
def instance_to_feature_string(instance_row, features):
    """Convert a 1-row Series/DataFrame to 'feat = value' comma-separated string."""
    return ", ".join(f"{feat} = {instance_row[feat]}" for feat in features)

In [28]:
candidate_row = query_instance.iloc[0]
candidate_str = instance_to_feature_string(candidate_row, features)
print(candidate_str)
# e.g. "Age = 29, YearsExperience = 0, EducationLevel = 3, CodingScore = 72, InterviewScore = 59"


Age = 29, YearsExperience = 0, EducationLevel = 3, CodingScore = 72, InterviewScore = 59


In [44]:
def llm_decide_and_counterfactuals(
    client,
    candidate_row,
    features,
    model_name="gpt-4.1-mini"  # adjust to a model you have access to
):
    """
    Ask an LLM to:
    - Decide whether to hire a candidate or not,
    - Assign an approximate hiring probability (0–1),
    - Generate 3 counterfactual examples as a table, explaining what to change.
    """
    candidate_str = instance_to_feature_string(candidate_row, features)

    # You can adapt this description to match your synthetic dataset logic
    dataset_description = """
You should decide whether to hire an employee for a tech company depending on the below variables:
- Age: integer, between about 21 and 45.
- YearsExperience: integer, years of professional experience (0 to ~15).
- EducationLevel: categorical integer where 1 = High School, 2 = Bachelor, 3 = Master or higher.
- CodingScore: integer, 0 to 100, representing performance on a coding assessment.
- InterviewScore: integer, 0 to 100, representing performance in a structured interview.

The role is tech role and the company will use you to make a decision.

Higher CodingScore, InterviewScore, YearsExperience, and EducationLevel generally indicate a stronger candidate,
but there may be trade-offs. Your job is to evaluate a single candidate and propose improvements.
""".strip()

    user_prompt = f"""
{dataset_description}

The candidate's current features are:
{candidate_str}

TASK:
1. Decide whether this candidate should be hired or not hired, based on the feature values.
2. Provide an approximate "hiring probability" between 0.0 and 1.0 (your subjective estimate).
3. Generate exactly 3 counterfactual examples that represent realistic modifications to this candidate
   which would make them more likely to be hired.

For the counterfactuals:
- Each counterfactual should be a small modification of the original candidate (not a completely different person).
- You may change Age, YearsExperience, EducationLevel, CodingScore, and InterviewScore,
  but keep changes realistic (e.g., you can increase CodingScore or InterviewScore, increase experience, etc.).
- For each counterfactual, briefly explain in one sentence why it is more hireable.

IMPORTANT OUTPUT FORMAT:
Reply in **Markdown** with the following structure:


1. A short paragraph with your decision and hiring probability. Example:
   "Decision: Do not hire. Estimated hiring probability: 0.35."

2. A Markdown table with the 3 counterfactual examples. The table MUST have columns:

   | CF_ID | Age | YearsExperience | EducationLevel | CodingScore | InterviewScore | Rationale |

   - CF_ID should be CF1, CF2, CF3.
   - Age, YearsExperience, EducationLevel, CodingScore, InterviewScore:
     the modified values you recommend.
   - Rationale: one short sentence explaining why this version is more hireable.
   - Probability

3. The original instance is the first row.

Make sure the output is valid Markdown.
""".strip()

    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "user", "content": user_prompt}
        ],
    )

    return response.choices[0].message.content


In [45]:
candidate_row = query_instance.iloc[0]

llm_output = llm_decide_and_counterfactuals(
    client=client,
    candidate_row=candidate_row,
    features=features,
    model_name="gpt-4.1-mini"   # or any other model you can use
)

print(llm_output)


Decision: Do not hire. Estimated hiring probability: 0.40.

| CF_ID | Age | YearsExperience | EducationLevel | CodingScore | InterviewScore | Rationale                                                       | Probability |
|-------|-----|-----------------|----------------|-------------|----------------|-----------------------------------------------------------------|-------------|
| ORIG  | 29  | 0               | 3              | 72          | 59             | Original candidate with strong education but low experience and interview score. | 0.40        |
| CF1   | 29  | 2               | 3              | 72          | 59             | Adding 2 years of experience increases practical knowledge and value.           | 0.60        |
| CF2   | 29  | 0               | 3              | 80          | 65             | Improved coding and interview scores show stronger technical and communication skills. | 0.65        |
| CF3   | 30  | 1               | 3              | 75          | 70       
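To compare the LLM's counterfactuals with the random forest's probabilities, the Markdown reply first has to be turned into structured data. The sketch below parses pipe-delimited rows into a DataFrame and skips rows that are incomplete, such as a truncated last line. `parse_markdown_table` is a hypothetical helper, and the sample string reuses the complete rows shown above with the column padding removed.

```python
import pandas as pd

def parse_markdown_table(markdown_text, numeric_cols=()):
    """Parse complete pipe-delimited Markdown table rows into a DataFrame."""
    rows = []
    for line in markdown_text.splitlines():
        line = line.strip()
        # Keep only complete table rows; prose and truncated rows are skipped.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row under the header (cells made of dashes/colons).
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        return pd.DataFrame()
    header = rows[0]
    body = [row for row in rows[1:] if len(row) == len(header)]
    table = pd.DataFrame(body, columns=header)
    for col in numeric_cols:
        table[col] = pd.to_numeric(table[col])
    return table

sample_output = """Decision: Do not hire. Estimated hiring probability: 0.40.

| CF_ID | Age | YearsExperience | EducationLevel | CodingScore | InterviewScore | Rationale | Probability |
|-------|-----|-----------------|----------------|-------------|----------------|-----------|-------------|
| ORIG | 29 | 0 | 3 | 72 | 59 | Original candidate with strong education but low experience and interview score. | 0.40 |
| CF1 | 29 | 2 | 3 | 72 | 59 | Adding 2 years of experience increases practical knowledge and value. | 0.60 |
| CF2 | 29 | 0 | 3 | 80 | 65 | Improved coding and interview scores show stronger technical and communication skills. | 0.65 |
| CF3 | 30 | 1 | 3 | 75 | 70
"""

numeric = ["Age", "YearsExperience", "EducationLevel",
           "CodingScore", "InterviewScore", "Probability"]
llm_cf_df = parse_markdown_table(sample_output, numeric_cols=numeric)
print(llm_cf_df[["CF_ID"] + numeric])
```

With the feature columns in numeric form, the same rows could then be scored with `rf_model.predict_proba` to see how far the LLM's subjective probabilities are from the trained model's.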

**CONCLUSION**

The scope of this work is limited to analyzing the output of a classical counterfactual explanation algorithm and of LLM explanations. The limitations are described in the Limitations and Discussion sections below.

Subject to those limitations, we reached the following conclusions:

* DiCE provides meaningful counterfactual examples, together with their probabilities.
* The explanations provided by the LLM are satisfactory; prompting the LLM for a general explanation gives a better result.
* Complex algorithms, including LLMs, are less reliable and traceable. Thus, for explanation, sticking to the 'old' algorithms may be a good approach for now. However, LLMs are good tools for making explanations less technical, more concise and clearer.



**LIMITATIONS**

The following improvements should be made in follow-up work:
* The fidelity and accuracy of the CF examples provided by DiCE should be checked.
* Other xAI tools should be illustrated, and DiCE should be evaluated with the respective metrics.
* The LLM's result for the current instance is in line with the Random Forest model; however, requests should be made for a number of instances and the results compared.
* The LLM may give different probabilities for the same instance on each request. Ways to freeze the algorithm (defining the seed, changing parameters, etc.) and the effect of getting different results need to be discussed from a legal perspective.
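As a starting point for the "freezing" question, the sketch below builds the arguments for a Chat Completions request with `temperature` set to 0 and a fixed `seed`. Both are request parameters of the OpenAI Chat Completions API as far as we are aware, but the seed is only a best-effort mechanism and identical outputs are not guaranteed across calls or model updates. `build_chat_request` is a hypothetical helper and the default seed value is arbitrary.

```python
def build_chat_request(prompt, model_name="gpt-4.1-mini", seed=42):
    """Build keyword arguments for a chat completion with reduced sampling variability."""
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # prefer the most likely token at each step
        "seed": seed,      # best-effort reproducibility, not a guarantee
    }

request = build_chat_request("Decide whether this candidate should be hired.")
print(request["temperature"], request["seed"])

# Usage with the client defined earlier (requires network access and an API key):
# response = client.chat.completions.create(**request)
```

Repeating the same request several times and recording the spread of the returned probabilities would show how much variability remains under these settings.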

**DISCUSSION**

Explaining unsupervised learning algorithms, LLMs in particular, requires some form of human involvement. Otherwise, the explanation provided by the LLM itself needs an explanation, which leads to an unavoidable recursion.

The current work focuses on the quality of the explanations rather than their fidelity, so the accuracy and fidelity of the LLM's explanations still need to be examined.

Both the legal and the technical framework should be revised to account for the unique features of LLMs and generative AI algorithms. Mechanisms or layers that are not subject to the uncertainty and vagueness of large language models should be implemented for explaining LLMs.

The ECJ's decision requiring counterfactual examples is not in line with the unique structure of generative AI, as the creation of counterfactual examples is itself a generative process. The legal requirements should be revisited in line with the distinct nature of these algorithms.

The fidelity and reliability of chain-of-thought and high-level reasoning, newly introduced by OpenAI, need to be analyzed as well. As the use of unsupervised learning is inevitable, even for xAI, analyzing the internal mechanisms of current GenAI from a legal perspective is relevant.