# **Attribute inference attack with the BlackBox module**

In this notebook, we will demonstrate how to perform an attribute inference attack with the BlackBox module. The goal of an attribute inference attack is to infer the value of a sensitive attribute of a data record by querying a model trained on the data. In this case, we will work with the 'us_crime' dataset, which contains information about crime rates in the United States, and we will consider the 'race' attribute to perform the attack.

### **Importing the necessary libraries and loading the data** 

In [1]:
import numpy as np
from holisticai.security.attackers.attribute_inference.dataset_utils import AttackDataset
from holisticai.security.attackers.attribute_inference.wrappers.regression.scikitlearn import ScikitlearnRegressor
from holisticai.datasets import load_dataset
from holisticai.security.attackers.attribute_inference.black_box import AttributeInferenceBlackBox

In [4]:
dataset = load_dataset('us_crime', preprocessed=True, protected_attribute='race')
train_test = dataset.train_test_split(test_size=0.2, random_state=0)
train = train_test['train']
test = train_test['test']
train

In [5]:
train['X'].head()

| | state | fold | population | householdsize | racepctblack | racePctAsian | racePctHisp | agePct12t21 | agePct12t29 | agePct16t24 | ... | NumStreet | PctForeignBorn | PctBornSameState | PctSameHouse85 | PctSameCity85 | PctSameState85 | LandArea | PopDens | PctUsePubTrans | LemasPctOfficDrugUn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 45 | 8 | 0.14 | 0.56 | 0.85 | 0.09 | 0.03 | 0.76 | 0.83 | 0.77 | ... | 0.09 | 0.09 | 0.62 | 0.33 | 0.36 | 0.36 | 0.34 | 0.07 | 0.27 | 1.0 |
| 1 | 6 | 2 | 0.01 | 0.4 | 0.02 | 0.11 | 0.22 | 0.39 | 0.42 | 0.25 | ... | 0.01 | 0.19 | 0.66 | 0.3 | 0.57 | 0.78 | 0.01 | 0.26 | 0.02 | 0.0 |
| 2 | 55 | 5 | 0.0 | 0.33 | 0.0 | 0.01 | 0.01 | 0.36 | 0.4 | 0.24 | ... | 0.0 | 0.03 | 0.67 | 0.76 | 0.77 | 0.71 | 0.02 | 0.15 | 0.02 | 0.0 |
| 3 | 34 | 3 | 0.01 | 0.71 | 0.36 | 0.21 | 0.06 | 0.77 | 0.65 | 0.66 | ... | 0.0 | 0.35 | 0.48 | 0.66 | 0.53 | 0.57 | 0.01 | 0.48 | 0.82 | 0.0 |
| 4 | 13 | 8 | 0.01 | 0.65 | 0.08 | 0.2 | 0.04 | 0.48 | 0.36 | 0.2 | ... | 0.0 | 0.13 | 0.22 | 0.2 | 0.0 | 0.0 | 0.07 | 0.07 | 0.01 | 0.0 |


### **Dataset preprocessing**

The 'us_crime' dataset used in this notebook is a preprocessed version with 1594 records and 102 attributes, including the protected attribute targeted by the attack. The protected attribute is binary and indicates whether a record belongs to the white or the non-white group. The cells below append it to the feature matrix as the column `group_a`, so that it becomes the last column.

In [None]:
train_data = train['X'].copy()
train_data['group_a'] = train['group_a'].astype(int)

test_data = test['X'].copy()
test_data['group_a'] = test['group_a'].astype(int)

In [None]:
x_train = train_data.values
x_test = test_data.values

y_train = train['y'].values
y_test = test['y'].values

In [None]:
attack_feature = 101 # last column represents the sensitive attribute
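
The hard-coded index above is only correct while the feature matrix keeps the same number of columns. As a minimal sketch, assuming the sensitive column is named `group_a` as in the cells above, the position can be looked up by name instead. The toy DataFrame and its values are illustrative stand-ins, not the real dataset.

```python
import pandas as pd

# Toy stand-in for train['X']; the values are illustrative only.
features = pd.DataFrame({
    "population": [0.14, 0.01, 0.00],
    "householdsize": [0.56, 0.40, 0.33],
})
group_a = pd.Series([True, False, True])

# Append the sensitive attribute as the last column, as in the cells above.
data = features.copy()
data["group_a"] = group_a.astype(int)

# Look up the column position by name instead of hard-coding it.
attack_feature = data.columns.get_loc("group_a")

x = data.values
print(attack_feature)        # position of the sensitive column
print(x[:, attack_feature])  # its values encoded as 0/1
```

Looking the index up by name keeps the attack pointed at the right column even if preprocessing adds or drops features.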

### **Attribute inference attack - blackbox**

Now we will perform an attack to infer the selected attribute using the `AttributeInferenceBlackBox` class from the `holisticai` library. This class creates an object that uses an internal attack model. The attack model is trained on the same dataset that was used to train the target model, and it learns to predict the attacked feature from the remaining features. The module assumes that the target model's predictions are available for the samples under attack, in addition to the rest of their feature values.
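
To make the idea concrete, here is a minimal sketch of a black-box attribute inference attack written with plain scikit-learn on synthetic data. It is not the `holisticai` implementation: the choice of `RandomForestClassifier` as attack model, the helper `attack_inputs`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data: two ordinary features plus a binary sensitive attribute (last column).
n = 400
other = rng.normal(size=(n, 2))
sensitive = rng.integers(0, 2, size=n)
x = np.column_stack([other, sensitive])
# The target depends strongly on the sensitive attribute, so predictions leak it.
y = 0.5 * other[:, 0] + 3.0 * sensitive + rng.normal(scale=0.1, size=n)

x_train, x_test = x[:300], x[300:]
y_train = y[:300]

# Target model: the model under attack, trained on all features.
target_model = DecisionTreeRegressor(max_depth=5, random_state=0).fit(x_train, y_train)


def attack_inputs(x_full, predictions, attack_feature):
    """Hypothetical helper: remaining features plus the target model's prediction."""
    remaining = np.delete(x_full, attack_feature, axis=1)
    return np.column_stack([remaining, predictions])


attack_feature = 2  # index of the sensitive column

# Attack model: learns the sensitive attribute from everything else the attacker sees.
attack_model = RandomForestClassifier(n_estimators=50, random_state=0)
attack_model.fit(
    attack_inputs(x_train, target_model.predict(x_train), attack_feature),
    x_train[:, attack_feature].astype(int),
)

# Inference: the attacker knows the other features and the model output only.
inferred = attack_model.predict(
    attack_inputs(x_test, target_model.predict(x_test), attack_feature)
)
attack_accuracy = np.mean(inferred == x_test[:, attack_feature])
print(f"attack accuracy: {attack_accuracy:.3f}")
```

In this sketch the attacker never sees the sensitive column at inference time. It is removed with `np.delete` and recovered only from the remaining features and the target model's output.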

In [12]:
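from holisticai.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor

# Train the target model (the model under attack) on all features,
# including the sensitive attribute.
regressor = Pipeline(steps=[
    ('model', DecisionTreeRegressor())
])
regressor.fit(x_train, y_train)

# Wrap the fitted model so that the attack can query it.
regressor = ScikitlearnRegressor(regressor)

attack = AttributeInferenceBlackBox(estimator=regressor, attack_feature=attack_feature, scale_range=(0,1))

# Fit the attack model on the training data and the target model's predictions.
pred = regressor.predict(x_train)
attack.fit(x_train, y_train, pred)

# Remove the sensitive column from the test data before inferring it.
attack_x_test = np.delete(x_test, attack_feature, axis=1)
pred = regressor.predict(x_test)

# Ground truth for the attacked feature, kept for evaluation.
feat_true = x_test[:, attack_feature]

# Candidate values of the binary attacked feature.
values = [False, True]
feat_pred = attack.infer(attack_x_test, y_test, pred, values=values)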

### **Measuring the attack success**

The success of the attack is measured by how well the inferred attribute values match the true ones. Because the attacked attribute is binary, this is a classification problem, so we can use standard classification metrics: accuracy (the ratio of correctly inferred values to the total number of records), balanced accuracy, precision, recall, and F1-score. Here we use the `classification_efficacy_metrics` function from the `holisticai` library to calculate these metrics.
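
For reference, the same kinds of metrics can be computed directly with scikit-learn. The sketch below uses a small, made-up pair of label arrays (`toy_true`, `toy_pred`) rather than the notebook's results, and it assumes the standard definitions of each metric; whether `classification_efficacy_metrics` uses exactly these definitions is an assumption.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Toy labels: 8 positives and 2 negatives (imbalanced, illustrative values only).
toy_true = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
toy_pred = np.array([1, 1, 1, 1, 1, 1, 1, 0, 1, 0])

metrics = {
    "Accuracy": accuracy_score(toy_true, toy_pred),
    # Mean of the per-class recalls, so the minority class counts equally.
    "Balanced Accuracy": balanced_accuracy_score(toy_true, toy_pred),
    "Precision": precision_score(toy_true, toy_pred),
    "Recall": recall_score(toy_true, toy_pred),
    "F1-Score": f1_score(toy_true, toy_pred),
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced labels, accuracy can look better than balanced accuracy, because errors on the smaller class barely affect the overall ratio. It is therefore worth reading both values together.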

In [13]:
from holisticai.efficacy.metrics import classification_efficacy_metrics

classification_efficacy_metrics(feat_true, feat_pred)

| Metric | Value | Reference |
|---|---|---|
| Accuracy | 0.969925 | 1 |
| Balanced Accuracy | 0.927434 | 1 |
| Precision | 0.976676 | 1 |
| Recall | 0.988201 | 1 |
| F1-Score | 0.982405 | 1 |


As we can see, the attack achieved an accuracy of about 0.970, meaning that it inferred the 'group_a' attribute correctly for 97.0% of the test records. The balanced accuracy of 0.927 is lower but still high, so the result is not explained by one class dominating the data. This suggests that the model is vulnerable to attribute inference attacks and underlines the importance of protecting sensitive attributes in the data.