# **Attribute inference attack with the blackbox module**

In this notebook, we will demonstrate how to perform an attribute inference attack with the blackbox module on a classification model. The goal of an attribute inference attack is to infer the value of a sensitive attribute of a data record by querying a model trained on the data. In this case, we will consider the 'Job' attribute of the German credit dataset as the attribute to be inferred. 

### **Importing the necessary libraries and loading the data** 

In [None]:
import numpy as np
from holisticai.security.attackers.attribute_inference.wrappers.classification.scikitlearn import SklearnClassifier
from holisticai.datasets import load_dataset
from holisticai.security.attackers.attribute_inference.black_box import AttributeInferenceBlackBox

In [2]:
dataset = load_dataset('german_credit', preprocessed=True)
train_test = dataset.train_test_split(test_size=0.2, stratify=dataset['y'], random_state=0)
train = train_test['train']
test = train_test['test']
train

### **Dataset preprocessing**

The German credit dataset contains information about 1000 loan applicants. It has 20 attributes, including 'Job', the attribute to be inferred, which has four possible values: 'Unskilled', 'Unskilled Resident', 'Skilled', and 'Highly Skilled'. The dataset also contains a binary target attribute, 'Credit', which indicates whether the applicant is classified as a good or a bad credit risk.

For this demonstration, we binarize the 'Job' attribute with the threshold used in the code below (`>= 2`). Assuming the categories are encoded as 0 to 3 in the order listed above, 'Unskilled' and 'Unskilled Resident' become 0, while 'Skilled' and 'Highly Skilled' become 1. The attack will then try to recover this binary value for each record by querying a model trained on the data.
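The threshold in the next code cell (`>= 2`) groups the four category codes as follows. This is a small standalone sketch that assumes the 'Job' column stores the integer codes 0 to 3 in the order listed above; `JOB_LABELS` and `binarize_job` are hypothetical helpers for illustration, not part of `holisticai`.

```python
import numpy as np

# Hypothetical labels for the assumed integer codes 0-3 of the 'Job' column
JOB_LABELS = {
    0: "Unskilled",
    1: "Unskilled Resident",
    2: "Skilled",
    3: "Highly Skilled",
}

def binarize_job(codes):
    """Map Job codes to 0 (unskilled) or 1 (skilled or highly skilled).

    Builds a new integer array, so the input is never modified in place.
    """
    return (np.asarray(codes) >= 2).astype(int)

codes = list(JOB_LABELS)  # [0, 1, 2, 3]
for code, binary in zip(codes, binarize_job(codes)):
    print(f"{code} ({JOB_LABELS[code]}) -> {binary}")

original = np.array([3, 0, 2, 1])
print(binarize_job(original), original)  # → [1 0 1 0] [3 0 2 1]
```

Comparing into a fresh array, rather than assigning into the array returned by `.values`, avoids accidentally changing the underlying DataFrame.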

In [3]:
# Binarize 'Job': codes 0-1 (unskilled) -> 0, codes 2-3 (skilled, highly skilled) -> 1.
# Comparing into a new array avoids modifying the DataFrame through a shared view.
job_feat_train = (train['X']['Job'].to_numpy() >= 2).astype(int)
job_feat_test = (test['X']['Job'].to_numpy() >= 2).astype(int)

To keep the demonstration simple, we also drop the 'Credit amount' and 'Duration' attributes from the feature set.

In [4]:
x_train = train['X'].drop(columns=['Credit amount', 'Duration']).values
x_test = test['X'].drop(columns=['Credit amount', 'Duration']).values

y_train = train['y'].values
y_test = test['y'].values

Finally, we write the binarized 'Job' values back into the 'Job' column of the training and test feature matrices.

In [5]:
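# Position of the 'Job' column in the feature matrix (looked up rather than assumed to be 0)
attack_feature = train['X'].drop(columns=['Credit amount', 'Duration']).columns.get_loc('Job')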
x_train[:, attack_feature] = job_feat_train
x_test[:, attack_feature] = job_feat_test

### **Attribute inference attack - blackbox**

Now we attack the selected attribute with the `AttributeInferenceBlackBox` class from the `holisticai` library. This class trains an internal attack model that learns to predict the attacked feature from the remaining features together with the target model's predictions. In this notebook, the attack model is fitted on the same training data that was used to train the target model. The black-box setting assumes that, for each record under attack, the attacker knows the values of the remaining features and can obtain the target model's predictions.
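The mechanism described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the `holisticai` implementation: the attack model is a hypothetical nearest-centroid classifier, `attack_inputs` and `NearestCentroidAttack` are made-up names, and the target model is replaced by synthetic probability scores that deliberately leak the hidden feature. The notebook's `fit` and `infer` calls also pass the true labels `y`; the sketch leaves them out for brevity.

```python
import numpy as np

def attack_inputs(x, attack_feature, target_pred):
    """Attacker's view of each record: every feature except the attacked
    one, followed by the target model's predicted probabilities."""
    x_rest = np.delete(x, attack_feature, axis=1)
    return np.hstack([x_rest, target_pred])

class NearestCentroidAttack:
    """Toy attack model (an assumption, not holisticai's internal model):
    predicts the attacked value whose class centroid is closest."""

    def fit(self, inputs, feature_values):
        self.values_ = np.unique(feature_values)
        self.centroids_ = np.stack(
            [inputs[feature_values == v].mean(axis=0) for v in self.values_]
        )
        return self

    def infer(self, inputs):
        # Squared Euclidean distance from every record to every centroid
        diffs = inputs[:, None, :] - self.centroids_[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        return self.values_[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
n = 400
secret = rng.integers(0, 2, size=n)   # attacked binary feature
other = rng.normal(size=(n, 2))       # features the attacker already knows
x = np.column_stack([secret, other]).astype(float)

# Synthetic stand-in for predict_proba: the score depends on the secret feature
score = 1 / (1 + np.exp(-(3 * secret - 1.5 + 0.5 * rng.normal(size=n))))
target_pred = np.column_stack([1 - score, score])

inputs = attack_inputs(x, attack_feature=0, target_pred=target_pred)
attack = NearestCentroidAttack().fit(inputs[:300], secret[:300])
inferred = attack.infer(inputs[300:])
print("attack accuracy:", (inferred == secret[300:]).mean())
```

The attack succeeds here only because the synthetic scores were built to depend on the secret; with a real model, how much the predictions leak is exactly what the attack measures.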

In [7]:
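from sklearn.tree import DecisionTreeClassifier

# Train the target model and wrap it so the attack can query it
classifier = DecisionTreeClassifier()
classifier.fit(x_train, y_train)
classifier = SklearnClassifier(classifier)

# The attack learns to predict column `attack_feature` from the other features
attack = AttributeInferenceBlackBox(estimator=classifier, attack_feature=attack_feature)

# Fit the attack model on the training records, their labels and the
# target model's predicted probabilities for them
pred = classifier.predict_proba(x_train)
attack.fit(x_train, y_train, pred)

# At inference time the attacker sees only the remaining features,
# the labels and the target model's predictions for the test records
attack_x_test = np.delete(x_test, attack_feature, axis=1)
pred = classifier.predict_proba(x_test)

# Ground truth of the attacked feature, kept only for evaluation
feat_true = x_test[:, attack_feature]

# Possible values of the binarized 'Job' attribute
values = [0, 1]
feat_pred = attack.infer(attack_x_test, y_test, pred, values=values)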

### **Measuring the attack success**

We measure the success of the attack by comparing the inferred values of the attribute with the true ones, using standard classification metrics. Accuracy is the fraction of test records whose attribute is inferred correctly; balanced accuracy, precision, recall, and F1-score give a fuller picture when the two values are imbalanced. We compute all of them with the `classification_efficacy_metrics` function from the `holisticai` library.
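To make these definitions concrete, here is a minimal pure-Python sketch of the five metrics, treating 1 as the positive class. `binary_metrics` is a hypothetical helper, not the `classification_efficacy_metrics` implementation; balanced accuracy is taken as the mean of recall and specificity, the standard binary definition.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, balanced accuracy, precision, recall and F1.

    Assumes both classes appear in y_true and at least one positive is
    predicted, so that no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)
    return {
        "Accuracy": (tp + tn) / len(pairs),
        "Balanced Accuracy": (recall + specificity) / 2,
        "Precision": precision,
        "Recall": recall,
        "F1-Score": 2 * precision * recall / (precision + recall),
    }

# Toy example: 3 true positives, 1 false negative, 1 true negative, 1 false positive
for name, value in binary_metrics([1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1]).items():
    print(f"{name}: {value:.3f}")
```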

In [8]:
from holisticai.efficacy.metrics import classification_efficacy_metrics

classification_efficacy_metrics(feat_true, feat_pred)

| Metric | Value | Reference |
|---|---|---|
| Accuracy | 0.915 | 1 |
| Balanced Accuracy | 0.845709 | 1 |
| Precision | 0.920245 | 1 |
| Recall | 0.974026 | 1 |
| F1-Score | 0.946372 | 1 |


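The attack infers the binarized 'Job' attribute correctly for 91.5% of the test records. Because class 1 (skilled or highly skilled) is the majority class in the test set, accuracy somewhat overstates this success: an attacker who always guessed 1 would already be right for most records. The balanced accuracy of 0.846, which averages the recall of both classes and would be 0.5 for random or constant guessing, is a fairer summary, and it is still well above chance. This indicates that the remaining features and the model's predictions together reveal much of the sensitive attribute, which underlines the importance of protecting sensitive attributes in the data.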
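The table does not print the confusion matrix, but with 200 test records (20% of the 1000 applicants) the reported precision, recall, and accuracy are consistent with only one set of counts: 150 true positives, 4 false negatives, 13 false positives, and 33 true negatives. The sketch below is arithmetic on those derived counts, not output from the notebook; it reproduces the table and compares the attack with an attacker who always guesses the majority value 1.

```python
# Counts derived from the reported metrics, assuming a 200-record test set
tp, fn, fp, tn = 150, 4, 13, 33
n = tp + fn + fp + tn

accuracy = (tp + tn) / n
precision = tp / (tp + fp)
recall = tp / (tp + fn)
balanced_accuracy = (recall + tn / (tn + fp)) / 2
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of an attacker who always guesses 1, the majority value
majority_baseline = (tp + fn) / n

print(f"accuracy           {accuracy:.3f}")
print(f"balanced accuracy  {balanced_accuracy:.3f}")
print(f"precision          {precision:.3f}")
print(f"recall             {recall:.3f}")
print(f"F1-score           {f1:.3f}")
print(f"majority baseline  {majority_baseline:.3f}")
```

Under these counts the attack beats the majority baseline of 0.77 by about 14.5 percentage points, a more modest and more informative margin than the raw 91.5% suggests.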