<p><font size="6" color='grey'> <b>
Machine Learning
</b></font> </br></p>
<p><font size="5" color='grey'> <b>
eXplainable Artificial Intelligence (XAI) - Titanic
</b></font> </br></p>

---


with DALEX (moDel Agnostic Language for Exploration and eXplanation)

[DALEX](https://github.com/ModelOriented/DALEX)
[DrWhy.AI](https://github.com/ModelOriented/DrWhy/blob/master/README.md)

In [None]:
#@title 🔧 Colab environment { display-mode: "form" }
!uv pip install --system -q git+https://github.com/ralf-42/Python_Modules
from ml_lib.utilities import get_ipinfo
import sys
print()
print(f"Python Version: {sys.version}")
print()
get_ipinfo()

# 0  | Install & Import
***

In [None]:
# Install
!uv pip install --system -q dalex -U
!uv pip install --system -q numpy scipy -U

In [None]:
# Import
from pandas import read_csv, DataFrame

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    confusion_matrix,
    ConfusionMatrixDisplay,
    classification_report,
)

import dalex as dx

In [None]:
# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

# 1  | Understand
***

<p><font color='black' size="5">
Use case
</font></p>

This is the legendary Titanic ML competition, the best first challenge for diving into ML modeling.

The task is simple: use machine learning to build a model that predicts which passengers survived the sinking of the Titanic.

How can the results be explained in a way that is easy to follow?


[Titanic Org](https://www.encyclopedia-titanica.org/)

[DataSet](https://www.openml.org/search?type=data&status=active&id=40945)

[Info](https://www.kaggle.com/competitions/titanic/data)



In [None]:
df = read_csv(
    "https://raw.githubusercontent.com/ralf-42/ML_Intro/main/02%20data/Titanic.csv",
    usecols=["pclass", "survived", "sex", "age", "sibsp", "parch"],
)

In [None]:
data = df.copy()
target = data.pop("survived")

In [None]:
data.groupby("pclass").count()

In [None]:
target.value_counts()

In [None]:
data.head(-5)

# 2 | Prepare

---


<p><font color='black' size="5">
Determine data types
</font></p>

In [None]:
all_col = data.columns
num_col = data.select_dtypes(include="number").columns
cat_col = data.select_dtypes(exclude="number").columns


<p><font color='black' size="5">
Encoding
</font></p>

In [None]:
categorical_transformer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)
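
To illustrate what the categorical pipeline above does to a column such as `sex`, here is a minimal pure-Python sketch. It is not scikit-learn's implementation; the helpers `fit_categories` and `one_hot` and the toy values are hypothetical and only mimic constant imputation followed by one-hot encoding, where unseen categories are ignored.

```python
# Sketch only: mimics constant imputation + one-hot encoding on one column.
# `fit_categories` and `one_hot` are hypothetical helpers, not scikit-learn API.

def fit_categories(values, fill_value="missing"):
    """Replace None by fill_value and return the sorted distinct categories."""
    return sorted({fill_value if v is None else v for v in values})


def one_hot(values, categories, fill_value="missing"):
    """Encode each value as a 0/1 row; unseen categories become all zeros."""
    rows = []
    for v in values:
        v = fill_value if v is None else v
        rows.append([1 if v == c else 0 for c in categories])
    return rows


train_sex = ["male", "female", None, "female"]  # toy data (assumption)
categories = fit_categories(train_sex)          # learned on training data only
encoded = one_hot(["female", None, "other"], categories)
print(categories)
print(encoded)
```

The all-zero row for `"other"` corresponds to the behavior requested with `handle_unknown="ignore"`.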


<p><font color='black' size="5">
Scaling
</font></p>

In [None]:
numerical_transformer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler())]
)
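
The numerical pipeline above first fills missing values with the column mean and then standardizes the column. The following sketch reproduces that idea on a toy `age` column using only the standard library. The helpers `fit_numeric` and `transform_numeric` and the values are hypothetical; the population standard deviation is used as the scaling factor.

```python
# Sketch only: mean imputation followed by standardization on one column.
# `fit_numeric` and `transform_numeric` are hypothetical helpers.
from statistics import fmean, pstdev


def fit_numeric(values):
    """Learn the imputation mean and the scaling std from training values."""
    observed = [v for v in values if v is not None]
    mean = fmean(observed)
    imputed = [mean if v is None else v for v in values]
    std = pstdev(imputed)  # population standard deviation of the imputed column
    return mean, std


def transform_numeric(values, mean, std):
    """Impute missing values with the mean, then center and scale."""
    return [((mean if v is None else v) - mean) / std for v in values]


ages = [22.0, 38.0, None, 30.0]  # toy data (assumption)
mean, std = fit_numeric(ages)
scaled = transform_numeric(ages, mean, std)
print(mean, std)
print(scaled)
```

An imputed value always maps to 0 after scaling, because it sits exactly on the column mean.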


<p><font color='black' size="5">
Pipeline
</font></p>

In [None]:
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numerical_transformer, num_col),
        ("cat", categorical_transformer, cat_col),
    ]
)


<p><font color='black' size="5">
Train-Test-Split
</font></p>

In [None]:
data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=0.20, random_state=42, stratify=target
)
data_train.shape, data_test.shape, target_train.shape, target_test.shape
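
The `stratify=target` argument keeps the share of survivors roughly equal in the training and test sets. The sketch below shows the idea with a hypothetical helper `stratified_split` on toy labels; scikit-learn's actual allocation and shuffling differ in detail.

```python
# Sketch only: a per-class split that preserves class shares.
# `stratified_split` is a hypothetical helper, not scikit-learn API.
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with the test share applied per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 60 + [1] * 40  # toy labels (assumption)
train_idx, test_idx = stratified_split(labels)
share_train = sum(labels[i] for i in train_idx) / len(train_idx)
share_test = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), share_train, share_test)
```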

# 3 | Modeling
---

<p><font color='black' size="5">
Model selection
</font></p>

In [None]:
classifier = MLPClassifier(
    hidden_layer_sizes=(150, 100, 50), max_iter=500, random_state=42
)

In [None]:
model = Pipeline(steps=[("preprocessor", preprocessor), ("classifier", classifier)])

<p><font color='black' size="5">
Training
</font></p>

In [None]:
model.fit(data_train, target_train)

# 4 | Evaluate
---


<p><font color='black' size="5">
Accuracy, Cohen's Kappa, Confusion Matrix
</font></p>

In [None]:
# @title
# @markdown <p><font size="5" color='grey'> <b> Code </b></font> </br></p>
from pandas import concat


def evaluate(model, data_train, data_test, target_train, target_test):
    """Print accuracy and Cohen's kappa, plot the test confusion matrix,
    and return the test data with true and predicted targets."""
    # Predictions for the training and test sets
    target_train_pred = model.predict(data_train)
    target_test_pred = model.predict(data_test)
    # Combine both splits instead of relying on the globals `data` and `target`
    target_all = concat([target_train, target_test])
    target_all_pred = model.predict(concat([data_train, data_test]))
    # Training metrics
    acc_train = accuracy_score(target_train, target_train_pred) * 100
    cks_train = cohen_kappa_score(target_train, target_train_pred)
    print(f"Train -- Accuracy: {acc_train:5.2f}%, Cohen's Kappa: {cks_train:5.2f}")
    # Test metrics
    acc_test = accuracy_score(target_test, target_test_pred) * 100
    cks_test = cohen_kappa_score(target_test, target_test_pred)
    print(f"Test -- Accuracy: {acc_test:5.2f}%, Cohen's Kappa: {cks_test:5.2f}")
    # Metrics on all data
    acc_all = accuracy_score(target_all, target_all_pred) * 100
    cks_all = cohen_kappa_score(target_all, target_all_pred)
    print(f"All -- Accuracy: {acc_all:5.2f}%, Cohen's Kappa: {cks_all:5.2f}")
    print("\n")
    # Confusion matrix and classification report on the test set
    conf_matrix = confusion_matrix(target_test, target_test_pred)
    display_labels_ = ["Not Survived", "Survived"]
    disp = ConfusionMatrixDisplay(conf_matrix, display_labels=display_labels_)
    disp.plot(cmap="Blues")
    print(
        classification_report(
            target_test, target_test_pred, target_names=display_labels_
        )
    )
    # Test data with true and predicted targets
    result = data_test.copy()
    result["target"] = target_test
    result["target_pred"] = target_test_pred
    return result

In [None]:
evaluate(model, data_train, data_test, target_train, target_test)
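
Cohen's kappa corrects the observed agreement `p_o` (the accuracy) for the agreement `p_e` expected by chance: `kappa = (p_o - p_e) / (1 - p_e)`. The sketch below computes it by hand on toy labels. The function `cohen_kappa` and the data are hypothetical and only illustrate the formula behind `cohen_kappa_score`.

```python
# Sketch only: Cohen's kappa computed by hand on toy labels.
# `cohen_kappa` is a hypothetical helper illustrating the formula.
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: share of identical labels (the accuracy)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class shares, summed over classes
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    return (p_o - p_e) / (1 - p_e)


y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # toy data (assumption)
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
kappa = cohen_kappa(y_true, y_pred)
print(f"Cohen's Kappa: {kappa:.4f}")
```

Here the accuracy is 0.70 and the chance agreement is 0.54, so kappa is noticeably lower than the accuracy suggests.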

# 5 | Deploy
---

# A | XAI - Local Explanations
---

<p><font color='black' size="5">
Explanation for Single Predictions
</font></p>


In [None]:
exp = dx.Explainer(model, data, target)

In [None]:
data.columns

<p><font color='black' size="5">
Prediction for Rose DeWitt Bukater and Jack Dawson
</font></p>

In [None]:
rose = DataFrame(
    {"pclass": [1], "sex": ["female"], "age": [22], "sibsp": [0], "parch": [1]},
    index=["Rose"],
)

# exp.predict returns an array; take its single element explicitly
rose_pred = float(exp.predict(rose)[0]) * 100
print(f"Prediction: Rose survives: {rose_pred:.2f}%")

In [None]:
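jack = DataFrame(
    {"pclass": [3], "sex": ["male"], "age": [23], "sibsp": [0], "parch": [0]},
    index=["Jack"],  # named index, so that jack.index[0] can serve as a label
)

# exp.predict returns an array; take its single element explicitly
jack_pred = float(exp.predict(jack)[0]) * 100
print(f"Prediction: Jack survives: {jack_pred:.2f}%")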

<p><font color='black' size="5">
Explaining the prediction for Rose
</font></p>


[Break Down](https://pbiecek.github.io/ema/breakDown.html)

Which variables contribute most to this result? Break Down decomposes the model's prediction into contributions that can be attributed to the individual explanatory features.
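
The decomposition idea can be sketched without any library: start from the mean prediction over a background dataset, fix the features of the instance one at a time, and record how the mean prediction changes. Everything in the block below (`toy_model`, `background`, `break_down`) is hypothetical and is neither the MLP trained above nor dalex's implementation.

```python
# Sketch only: break-down attribution for one instance and one feature order.
# `toy_model`, `background` and `break_down` are hypothetical.

def toy_model(row):
    """Hypothetical survival score; not the MLP from this notebook."""
    score = 0.1
    if row["sex"] == "female":
        score += 0.5
    if row["pclass"] == 1:
        score += 0.2
    if row["age"] < 16:
        score += 0.1
    return score


background = [  # toy background data (assumption)
    {"sex": "male", "pclass": 3, "age": 30},
    {"sex": "male", "pclass": 1, "age": 45},
    {"sex": "female", "pclass": 3, "age": 25},
    {"sex": "female", "pclass": 2, "age": 8},
]


def mean_prediction(rows):
    return sum(toy_model(r) for r in rows) / len(rows)


def break_down(instance, order):
    """Fix features in the given order; each contribution is the change in mean prediction."""
    rows = [dict(r) for r in background]
    baseline = mean_prediction(rows)
    contributions, previous = {}, baseline
    for feature in order:
        for r in rows:
            r[feature] = instance[feature]
        current = mean_prediction(rows)
        contributions[feature] = current - previous
        previous = current
    return baseline, contributions


rose_toy = {"sex": "female", "pclass": 1, "age": 22}
baseline, contributions = break_down(rose_toy, ["sex", "pclass", "age"])
print(baseline, contributions, toy_model(rose_toy))
```

The baseline plus all contributions adds up to the prediction for the instance, which is what the break-down plot visualizes.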

In [None]:
bd_rose = exp.predict_parts(rose, type="break_down", label=rose.index[0])
bd_plus_rose = exp.predict_parts(
    rose, type="break_down_interactions", label="Rose Plus"
)

In [None]:
bd_rose.result

In [None]:
bd_rose.plot()

[Break Down plus](https://pbiecek.github.io/ema/iBreakDown.html)


This variant also accounts for interactions between features.

In [None]:
bd_plus_rose.plot()

[Shapley Values](https://pbiecek.github.io/ema/shapley.html)


The contribution of a feature is averaged over all (or a large number of) possible feature orderings. The idea is closely related to the "Shapley values" originally developed for cooperative games (Shapley 1953).
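
Averaging over orderings can be sketched as follows: compute the break-down contributions for every permutation of the features and take the mean per feature. The toy model contains an interaction between `sex` and `pclass`, so the contribution of `sex` depends on the ordering. All names (`toy_model`, `background`, `contributions_for_order`) are hypothetical. The sketch enumerates all orderings, which is only feasible for a handful of features.

```python
# Sketch only: Shapley-style attributions as the average over all feature orderings.
# `toy_model`, `background` and `contributions_for_order` are hypothetical.
from itertools import permutations


def toy_model(row):
    """Hypothetical score with a sex x pclass interaction."""
    score = 0.1
    if row["sex"] == "female":
        score += 0.3
    if row["pclass"] == 1:
        score += 0.1
    if row["sex"] == "female" and row["pclass"] == 1:
        score += 0.3
    if row["age"] < 16:
        score += 0.1
    return score


background = [  # toy background data (assumption)
    {"sex": "male", "pclass": 3, "age": 30},
    {"sex": "male", "pclass": 1, "age": 45},
    {"sex": "female", "pclass": 3, "age": 25},
    {"sex": "female", "pclass": 2, "age": 8},
]


def mean_prediction(rows):
    return sum(toy_model(r) for r in rows) / len(rows)


def contributions_for_order(instance, order):
    """Break-down contributions for one specific feature ordering."""
    rows = [dict(r) for r in background]
    previous, result = mean_prediction(rows), {}
    for feature in order:
        for r in rows:
            r[feature] = instance[feature]
        current = mean_prediction(rows)
        result[feature] = current - previous
        previous = current
    return result


rose_toy = {"sex": "female", "pclass": 1, "age": 22}
features = list(rose_toy)
orders = list(permutations(features))
per_order = [contributions_for_order(rose_toy, order) for order in orders]
shapley = {f: sum(c[f] for c in per_order) / len(orders) for f in features}
baseline = mean_prediction(background)
print(shapley)
```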

In [None]:
sh_rose = exp.predict_parts(rose, type="shap", B=10, label=rose.index[0])
sh_rose.result.loc[sh_rose.result.B == 0,]
sh_rose.plot(bar_width=16)

[Individual Profile / Ceteris Paribus Profiles](https://pbiecek.github.io/ema/ceterisParibus.html)

In [None]:
cp_rose = exp.predict_profile(rose, label=rose.index[0])
cp_jack = exp.predict_profile(jack, label=jack.index[0])
cp_rose.result.head()

In [None]:
cp_rose.plot(cp_jack)

In [None]:
cp_rose.plot(cp_jack, variable_type="categorical")
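
A ceteris paribus profile answers a "what if" question for one passenger: how would the prediction change if a single feature took other values while everything else stayed fixed? The sketch below illustrates this on a hypothetical `toy_model`; `ceteris_paribus` is an illustrative helper, not the dalex function.

```python
# Sketch only: a ceteris paribus profile for one instance and one feature.
# `toy_model` and `ceteris_paribus` are hypothetical.

def toy_model(row):
    """Hypothetical score that decreases with age."""
    base = 0.7 if row["sex"] == "female" else 0.2
    return base - 0.004 * row["age"]


def ceteris_paribus(instance, feature, grid):
    """Vary one feature over a grid while all other features stay fixed."""
    profile = []
    for value in grid:
        modified = dict(instance, **{feature: value})  # copy, original untouched
        profile.append((value, toy_model(modified)))
    return profile


rose_toy = {"sex": "female", "pclass": 1, "age": 22}
age_profile = ceteris_paribus(rose_toy, "age", [0, 20, 40, 60, 80])
for age, prediction in age_profile:
    print(f"age={age:2d} -> {prediction:.3f}")
```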

# B | XAI - Global Explanations
---
Explanation on Model Level

<p><font color='black' size="5">
Model performance
</font></p>

[Model Performance](https://pbiecek.github.io/ema/modelPerformance.html)

In [None]:
mp = exp.model_performance(model_type="classification")
mp.result

In [None]:
mp.plot(geom="roc")
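
The area under the ROC curve (AUC) can be read as the probability that a randomly chosen survivor receives a higher score than a randomly chosen non-survivor, with ties counting half. The sketch below computes it by comparing all pairs on toy data. `roc_auc` is a hypothetical helper, not the routine dalex or scikit-learn uses.

```python
# Sketch only: AUC as the share of correctly ordered positive/negative pairs.
# `roc_auc` is a hypothetical helper.

def roc_auc(labels, scores):
    """Compare every positive with every negative; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]               # toy data (assumption)
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.6]
auc = roc_auc(labels, scores)
print(f"AUC: {auc:.4f}")
```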

<p><font color='black' size="5">
Feature importance for individual features & feature groups
</font></p>


Image by <a href="https://pixabay.com/de/users/thedigitalartist-202249/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=1276873">Pete Linforth</a> on <a href="https://pixabay.com/de//?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=1276873">Pixabay</a>

In [None]:
vi = exp.model_parts()
vi.result

In [None]:
vi.plot(max_vars=5)
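
Permutation-based importance measures how much the model's loss grows when the values of one feature are shuffled, which breaks the link between that feature and the target. The sketch below uses the error rate as loss, which is an assumption for illustration and may differ from the loss dalex uses. `toy_model`, `error_rate` and `permutation_importance` are hypothetical.

```python
# Sketch only: permutation importance with the error rate as loss.
# `toy_model`, `error_rate` and `permutation_importance` are hypothetical.
import random


def toy_model(row):
    """Hypothetical classifier: predicts survival for women only."""
    return 1 if row["sex"] == "female" else 0


def error_rate(rows, labels):
    return sum(toy_model(r) != y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(rows, labels, feature, n_repeats=10, seed=0):
    """Average increase in error rate after shuffling one feature column."""
    rng = random.Random(seed)
    base = error_rate(rows, labels)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
        increases.append(error_rate(permuted, labels) - base)
    return sum(increases) / n_repeats


rows = [  # toy data (assumption)
    {"sex": "female", "sibsp": 0}, {"sex": "male", "sibsp": 1},
    {"sex": "female", "sibsp": 2}, {"sex": "male", "sibsp": 0},
    {"sex": "female", "sibsp": 1}, {"sex": "male", "sibsp": 3},
    {"sex": "female", "sibsp": 0}, {"sex": "male", "sibsp": 1},
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
importance = {f: permutation_importance(rows, labels, f) for f in ("sex", "sibsp")}
print(importance)
```

A feature the model ignores, such as `sibsp` in this toy model, gets an importance of exactly zero.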

In [None]:
vi_grouped = exp.model_parts(
    variable_groups={"personal": ["sex", "age", "sibsp", "parch"], "status": ["pclass"]}
)
vi_grouped.result

In [None]:
vi_grouped.plot()


<p><font color='black' size="5">
Model profile - Partial Dependence Profile & Accumulated Local Dependence Profile
</font></p>

[Partial Dependence Profile (PDP)](https://pbiecek.github.io/ema/partialDependenceProfiles.html)
[Accumulated Local Dependence Profile (ALE)](https://pbiecek.github.io/ema/accumulatedLocalProfiles.html)

In [None]:
pdp_num = exp.model_profile(type="partial", label="pdp")
aldp_num = exp.model_profile(type="accumulated", label="aldp")

In [None]:
pdp_num.plot(aldp_num)
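
A partial dependence profile is the average of the ceteris paribus profiles of all observations: for each grid value, the feature is set to that value in every row and the predictions are averaged. The sketch below shows this on toy data. `toy_model` and `partial_dependence` are hypothetical, and accumulated local profiles are not covered here.

```python
# Sketch only: partial dependence as the mean prediction per grid value.
# `toy_model` and `partial_dependence` are hypothetical.

def toy_model(row):
    """Hypothetical score that decreases with age."""
    base = 0.7 if row["sex"] == "female" else 0.2
    return base - 0.004 * row["age"]


def partial_dependence(rows, feature, grid):
    """Set the feature to each grid value in every row and average the predictions."""
    profile = []
    for value in grid:
        predictions = [toy_model(dict(r, **{feature: value})) for r in rows]
        profile.append((value, sum(predictions) / len(predictions)))
    return profile


rows = [  # toy data (assumption)
    {"sex": "female", "age": 22},
    {"sex": "male", "age": 23},
    {"sex": "female", "age": 40},
    {"sex": "male", "age": 8},
]
pdp_age = partial_dependence(rows, "age", [0, 25, 50])
for age, mean_prediction in pdp_age:
    print(f"age={age:2d} -> {mean_prediction:.3f}")
```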

In [None]:
pdp_cat = exp.model_profile(
    type="partial",
    variable_type="categorical",
    variables=["sex", "pclass"],
    label="pdp",
)

aldp_cat = exp.model_profile(
    type="accumulated",
    variable_type="categorical",
    variables=["sex", "pclass"],
    label="aldp",
)

In [None]:
aldp_cat.plot(pdp_cat)

# C | Further Examples and Methods
---

* Resources - https://dalex.drwhy.ai/python

* Introduction to the `dalex` package: [Titanic: tutorial and examples](http://dalex.drwhy.ai/python-dalex-titanic.html)
* Key features explained: [FIFA20: explain default vs tuned model with dalex](http://dalex.drwhy.ai/python-dalex-fifa.html)
* How to use dalex with: [xgboost](http://dalex.drwhy.ai/python-dalex-xgboost.html), [tensorflow](http://dalex.drwhy.ai/python-dalex-tensorflow.html), [h2o (feat. autokeras, catboost, lightgbm)](http://dalex.drwhy.ai/python-dalex-h2o.html)
* More explanations: [residuals, shap, lime](http://dalex.drwhy.ai/python-dalex-new.html)
* Introduction to the [Fairness module in dalex](http://dalex.drwhy.ai/python-dalex-fairness.html)
* Introduction to the [Aspect module in dalex](http://dalex.drwhy.ai/python-dalex-aspect.html)
* Introduction to [Arena: interactive dashboard for model exploration](http://dalex.drwhy.ai/python-dalex-arena.html)


* Code in the form of [jupyter notebook](https://github.com/ModelOriented/DALEX-docs/tree/master/jupyter-notebooks)
* Changelog: [NEWS](https://github.com/ModelOriented/DALEX/blob/master/python/dalex/NEWS.md)
* Theoretical introduction to the plots: [Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models](https://pbiecek.github.io/ema)