The objective of this demo is to showcase a scenario in which a fictitious bank wants to implement an ML-based AI system to predict the likelihood that an applicant will default on a credit-card loan. It could be used, in part, to determine whether a client is eligible for another loan or a credit increase. The AI system should be trustworthy, so the bank wants to use TrustML to assess the trustworthiness of the candidate classification models before their deployment.

To simulate the scenario, we use the "credit card default" dataset (https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset), which contains historical data on credit-card defaults in Taiwan, and the TrustML package to assess the model's trustworthiness.

# Step 1: Defining the configuration file
In this scenario, the bank wants an AI model that complies with several trustworthiness criteria. In decreasing order of importance to the bank, these are:
1. Performance. Classifying customers as low default risk when they are actually high risk is worse than classifying customers as high risk when they are actually low risk.
2. Uncertainty. The banker needs to know how uncertain a prediction is before increasing a credit limit or granting a new loan: the more uncertain the prediction, the higher the risk for the banker.
3. Fairness. For ethical reasons, the AI system should make its predictions regardless of the applicants' sensitive attributes, in this case gender.
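
The asymmetry in the performance criterion is why recall on the "default" class matters more than plain accuracy. The following is a minimal sketch on made-up labels (not taken from the dataset), assuming that label 1 (default) is the class the binary-averaged performance metrics treat as positive. It shows how a model that misses most defaulters can still look accurate when defaults are the minority class.

```python
# Toy labels, hypothetical: 1 = defaulted, 0 = did not default.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
# A hypothetical model that flags only one of the three defaulters.
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaulters caught
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaulters missed (the costly error)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good clients flagged as risky
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good clients correctly cleared

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here accuracy is 0.80 while recall is only 0.33, so weighting recall heavily in the performance dimension penalizes exactly the error the bank considers worse.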

Based on this, we specify a configuration file with metrics belonging to the three trustworthiness dimensions considered. The assessment method is a weighted average: each dimension is assigned a weight (0.3 for performance, 0.4 for uncertainty, and 0.3 for fairness), and each metric is in turn weighted within its dimension.

We define the configuration file as follows:

```yaml
metrics:
    - AccuracySKL
    - PrecisionSKL:
        multiclass_average: "binary"
    - RecallSKL:
        multiclass_average: "binary"
    - F1SKL:
        multiclass_average: "binary"
    - PPercentageSKL:
        protected_attributes: [SEX]
        positive_class: 0
    - EqualOpportunitySKL:
        protected_attributes: [SEX]
        positive_class: 0    
    - InvertedExpectedCalibrationSKL
    - InvertedBrierSKL
assessment_method:
    WeightedAverage:
        performance-0.3:
            AccuracySKL: 0.1
            PrecisionSKL: 0.1
            RecallSKL: 0.6
            F1SKL: 0.2
        uncertainty-0.4:                  
            InvertedBrierSKL: 0.5
            InvertedExpectedCalibrationSKL: 0.5        
        fairness-0.3:             
            PPercentageSKL: 0.5
            EqualOpportunitySKL: 0.5
```

Note how we included the sensitive attribute (the SEX column) in the fairness-related metrics, as well as the positive target, which is "0" in this case, i.e., the positive target is no credit-card default, while "1" indicates that the credit-card holder defaulted.
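
To make the role of the sensitive attribute and the positive class more concrete, here is a minimal sketch of the commonly used p%-rule on made-up predictions. It assumes that PPercentageSKL follows this common definition (the ratio between the groups' rates of receiving the positive outcome); the helper names `positive_rate` and `p_percent` are hypothetical and not part of TrustML.

```python
def positive_rate(y_pred, sensitive, group, positive_class=0):
    """Share of one sensitive group that receives the positive outcome."""
    preds = [p for p, s in zip(y_pred, sensitive) if s == group]
    return sum(1 for p in preds if p == positive_class) / len(preds)


def p_percent(y_pred, sensitive, positive_class=0):
    """Ratio of the smaller positive rate to the larger one (1.0 = parity)."""
    rate_a = positive_rate(y_pred, sensitive, 0, positive_class)
    rate_b = positive_rate(y_pred, sensitive, 1, positive_class)
    if rate_a == 0 or rate_b == 0:
        return 0.0  # one group never receives the positive outcome
    return min(rate_a / rate_b, rate_b / rate_a)


# Hypothetical data: SEX encoded as 0/1, prediction 0 = "no default" (positive class).
sex = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1]

score = p_percent(y_pred, sex, positive_class=0)
print(round(score, 2))
```

In this toy case, 75% of group 0 and 50% of group 1 are predicted as "no default", so the ratio is about 0.67. A value close to 1 means both groups receive the favorable prediction at similar rates.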

# Step 2: Import relevant packages
We start by importing the TrustML package, the classification model that we will use in the demo (RandomForestClassifier), and some supporting functions/modules, notably pandas for loading and manipulating the dataset and train_test_split for partitioning it.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
from trustML.computation import TrustComputation

# Show every column when displaying DataFrames
pd.set_option('display.max_columns', None)

# Step 3: Load dataset and create train/test splits
Now we load the dataset file and do some preprocessing: we drop the irrelevant "ID" feature, set the type of the categorical features to pandas' "category", re-encode the sensitive "SEX" column as 0/1, extract the target column (i.e., whether the credit-card holder defaulted or not), and split the dataset into training and test sets with a 70%/30% proportion.

NOTE that pandas will require the "xlrd" package to read the dataset file, as it is in Excel format.

We can install it easily through pip:
```
pip install xlrd
```

In [None]:
demo_path = 'demos/credit_card_default/'
file_dataset = demo_path + 'credit_card_default.xls'
path_configuration_bank = demo_path + 'config_credit_card_default.yml'

# Load the data
dataset = pd.read_excel(file_dataset, header=1).drop(columns=['ID']).rename(columns={'PAY_0':'PAY_1'})
dataset.head()

# Extract the target
Y = dataset["default payment next month"]
categorical_features = ['EDUCATION', 'MARRIAGE','PAY_1', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6']
for col in categorical_features:
    dataset[col] = dataset[col].astype('category')

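# Sensitive column: re-encode [2 (female), 1 (male)] as [0, 1].
# Assigning the result back avoids an in-place replace on a column selection,
# which newer pandas versions warn about and may not apply.
dataset['SEX'] = dataset['SEX'].replace([2, 1], [0, 1])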

# Decompose the dataset
# Train-test split
X_train, X_test, Y_train, Y_test = train_test_split(dataset.drop(columns=['default payment next month']), 
    Y, test_size = 0.3, random_state=1)

# Step 4: Create and train the classifier
We will now train a random forest classifier on the training set.

In [None]:
# TRAIN A RANDOM FOREST CLASSIFIER
rf_classifier = RandomForestClassifier()
rf_classifier.fit(X_train, Y_train)

# Step 5: Compute the trustworthiness
Once the model is trained, we assess its trustworthiness with the TrustML package. To do so, we instantiate a TrustComputation, call its load_trust_definition method with the path to the configuration file we specified, and finally call the compute_trust method, passing the trained model and the test dataset (features and target) to evaluate the model's trustworthiness on that dataset. This method stores the trust assessment as a JSON-formatted string.

In [None]:
# Load the trust definition and assess the trained model on the test set
trust_bank = TrustComputation()
trust_bank.load_trust_definition(config_path=path_configuration_bank)
trust_bank.compute_trust(trained_model=rf_classifier, data_x=X_test, data_y=Y_test)

Now we can print the complete trustworthiness assessment as a JSON-formatted string using the get_trust_as_JSON function:

In [None]:
print(trust_bank.get_trust_as_JSON())

This produces the following output:

```json
{
  "name": "Trust",
  "weighted_score": 0.68,
  "children": [
    {
      "name": "performance",
      "weight": 0.3,
      "weighted_score": 0.14,
      "raw_score": 0.45,
      "children": [
        {
          "name": "AccuracySKL",
          "weight": 0.1,
          "weighted_score": 0.08,
          "raw_score": 0.81
        },
        {
          "name": "PrecisionSKL",
          "weight": 0.1,
          "weighted_score": 0.06,
          "raw_score": 0.65
        },
        {
          "name": "RecallSKL",
          "weight": 0.6,
          "weighted_score": 0.21,
          "raw_score": 0.36
        },
        {
          "name": "F1SKL",
          "weight": 0.2,
          "weighted_score": 0.09,
          "raw_score": 0.46
        }
      ]
    },
    {
      "name": "uncertainty",
      "weight": 0.4,
      "weighted_score": 0.25,
      "raw_score": 0.63,
      "children": [
        {
          "name": "InvertedBrierSKL",
          "weight": 0.5,
          "weighted_score": 0.14,
          "raw_score": 0.27
        },
        {
          "name": "InvertedExpectedCalibrationSKL",
          "weight": 0.5,
          "weighted_score": 0.5,
          "raw_score": 0.99
        }
      ]
    },
    {
      "name": "fairness",
      "weight": 0.3,
      "weighted_score": 0.29,
      "raw_score": 0.97,
      "children": [
        {
          "name": "PPercentageSKL",
          "weight": 0.5,
          "weighted_score": 0.48,
          "raw_score": 0.96
        },
        {
          "name": "EqualOpportunitySKL",
          "weight": 0.5,
          "weighted_score": 0.49,
          "raw_score": 0.98
        }
      ]
    }
  ]
}
```
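
Since the assessment is a JSON-formatted string, it can be parsed with the standard `json` module to locate the weakest metrics programmatically. The sketch below uses a trimmed copy of the output above (only names, weights and raw scores) so that it runs on its own; in the notebook, the string would instead come from `trust_bank.get_trust_as_JSON()`. The helper `leaf_metrics` is a hypothetical name, not part of TrustML.

```python
import json

# Trimmed copy of the assessment shown above (names, weights, raw scores only).
report_json = """
{"name": "Trust", "weighted_score": 0.68, "children": [
  {"name": "performance", "weight": 0.3, "raw_score": 0.45, "children": [
    {"name": "AccuracySKL", "weight": 0.1, "raw_score": 0.81},
    {"name": "PrecisionSKL", "weight": 0.1, "raw_score": 0.65},
    {"name": "RecallSKL", "weight": 0.6, "raw_score": 0.36},
    {"name": "F1SKL", "weight": 0.2, "raw_score": 0.46}]},
  {"name": "uncertainty", "weight": 0.4, "raw_score": 0.63, "children": [
    {"name": "InvertedBrierSKL", "weight": 0.5, "raw_score": 0.27},
    {"name": "InvertedExpectedCalibrationSKL", "weight": 0.5, "raw_score": 0.99}]},
  {"name": "fairness", "weight": 0.3, "raw_score": 0.97, "children": [
    {"name": "PPercentageSKL", "weight": 0.5, "raw_score": 0.96},
    {"name": "EqualOpportunitySKL", "weight": 0.5, "raw_score": 0.98}]}
]}
"""


def leaf_metrics(node):
    """Yield (name, raw_score) for every node without children."""
    children = node.get("children", [])
    if not children:
        yield node["name"], node["raw_score"]
    for child in children:
        yield from leaf_metrics(child)


report = json.loads(report_json)
# Sort the individual metrics from weakest to strongest raw score.
ranked = sorted(leaf_metrics(report), key=lambda item: item[1])
for name, raw_score in ranked[:3]:
    print(f"{name}: {raw_score}")
```

For this assessment, the three lowest raw scores are InvertedBrierSKL, RecallSKL and F1SKL, which points to where the model would need improvement first.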

We can also generate a graphical report in PDF:

In [None]:
trust_bank.generate_trust_PDF(save_path=demo_path + "report_credit_card_default.pdf")

This generates a PDF report, an excerpt of which is shown in the following image:

![Report excerpt](excerpt_report.png)


# Conclusions
The classification model that will form part of the AI system obtains a trustworthiness score of 0.68. As the drill-down assessment shows, despite the considerably high accuracy (0.81), which is partly due to class imbalance, the low recall and F1 scores negatively impact the overall performance dimension and thus the trustworthiness of the model.

According to the trustworthiness criteria of the fictitious bank, the model would not be deemed acceptable: the recall obtained is low (raw score of 0.36, weighted score of 0.21) and the overall trustworthiness is below the bank's threshold (0.8). The model and/or dataset would therefore require changes before the AI system can be deployed and used to assist the bankers' decision-making processes.
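
The 0.68 figure can be reproduced by hand from the raw scores reported in the assessment. The sketch below assumes that the WeightedAverage method is a plain nested weighted sum (metrics within each dimension, then dimensions within the overall score). This is consistent with the reported numbers but is not taken from the TrustML source code. The weights come from the configuration file and the raw scores from the JSON output; since those raw scores are rounded to two decimals, the recomputed values are approximations.

```python
# dimension -> (dimension weight, {metric: (metric weight, raw score)})
assessment = {
    "performance": (0.3, {
        "AccuracySKL": (0.1, 0.81),
        "PrecisionSKL": (0.1, 0.65),
        "RecallSKL": (0.6, 0.36),
        "F1SKL": (0.2, 0.46),
    }),
    "uncertainty": (0.4, {
        "InvertedBrierSKL": (0.5, 0.27),
        "InvertedExpectedCalibrationSKL": (0.5, 0.99),
    }),
    "fairness": (0.3, {
        "PPercentageSKL": (0.5, 0.96),
        "EqualOpportunitySKL": (0.5, 0.98),
    }),
}

# Score of each dimension: weighted sum of its metrics' raw scores.
dimension_scores = {
    name: sum(weight * score for weight, score in metrics.values())
    for name, (_, metrics) in assessment.items()
}

# Overall score: weighted sum of the dimension scores.
trust_score = sum(
    dim_weight * dimension_scores[name]
    for name, (dim_weight, _) in assessment.items()
)

THRESHOLD = 0.8  # acceptance threshold stated in the scenario
for name, score in dimension_scores.items():
    print(f"{name}: {score:.2f}")
print(f"trust: {trust_score:.2f}, acceptable: {trust_score >= THRESHOLD}")
```

The recomputed dimension scores (0.45, 0.63 and 0.97) and the overall score (0.68) match the report. They also show that recall alone contributes only about 0.22 of a possible 0.6 to the performance dimension.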

This notebook has illustrated how easy it is to use the TrustML package to evaluate the trustworthiness of a classification model intended to be used as part of an AI system. In this case, the TrustML package has been used as part of a model building pipeline, obtaining a trustworthiness assessment of 0.68 (out of 1).