In [1]:
try:
    from google.colab import drive
    drive.mount('/content/gdrive/', force_remount=True)
except ModuleNotFoundError:
    print("Not running on Google Colab; skipping Google Drive mount.")

### Additional Results for our PMC Journal Paper: [Privacy and Utility Preserving Sensor-Data Transformations](https://arxiv.org/pdf/1911.05996.pdf)

#### (1) Brief Description of the Results
To show that the compound architecture proposed in the paper can generalize across datasets, we repeat the experiment of Table 9 (in the [paper](https://arxiv.org/pdf/1911.05996.pdf)) on another dataset ([MobiAct](https://bmi.teicrete.gr/en/the-mobifall-and-mobiact-datasets-2/)), keeping the same architecture for the RAE and the AAE.

In this experiment we consider all kinds of **falls** as **sensitive** activities, assuming that they can be symptoms of some diseases. We also consider being **steady** as a **neutral** activity, which in this dataset means either **sitting** or **standing**.
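The grouping of activities into roles can be written down as a small lookup. This is only an illustrative sketch: the names `ROLE` and `ideal_replacement` are hypothetical and do not come from the repository, and the class names follow the `act_list` used in this notebook.

```python
# Hypothetical helper names; class names follow act_list used in this notebook.
act_list = ["STR", "WAL", "JOG", "JUM", "STD", "FALL"]

ROLE = {
    "STR": "required", "WAL": "required", "JOG": "required", "JUM": "required",
    "STD": "neutral",
    "FALL": "sensitive",
}

def ideal_replacement(activity):
    """Activity an observer should see after an ideal replacement step:
    sensitive activities look neutral (STD), everything else is unchanged."""
    return "STD" if ROLE[activity] == "sensitive" else activity

for a in act_list:
    print(a, "->", ideal_replacement(a))
```

A real transformation only approximates this mapping, which is why the confusion matrices in this notebook are used to check how close it gets.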

The image below summarizes the results for two different settings of the utility-privacy parameters. They are very similar to what we obtained on the [MotionSense](https://www.kaggle.com/malekzadeh/motionsense-dataset) dataset in Table 9 of the paper.

The accuracy of recognizing the **required** activities is almost the same before and after the transformations, while the accuracy of detecting falls drops from 99.6\% to less than 4.5\%. Moreover, we can reduce the adversary's accuracy in detecting gender from 97.35\% to 66.8\%, which is close to the random-guess accuracy of 74.5\% on this dataset.

Note that this dataset contains 41 males and 14 females, so always guessing that a subject is male is 74.5\% accurate ($\frac{41 \text{ males}}{55 \text{ males and females}} \approx 74.5\%$).
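The baseline can be reproduced with a short calculation. This is a minimal sketch at the subject level, using only the counts stated in the text; the variable names are illustrative, and the baseline over individual test windows may differ slightly if subjects contribute different amounts of data.

```python
# Subject counts as stated in the text.
n_male, n_female = 41, 14

# Accuracy of always predicting the majority class ("Male").
majority_baseline = max(n_male, n_female) / (n_male + n_female)
print(f"majority-class baseline: {majority_baseline * 100:.1f}%")
```

Any gender classifier that scores near this value is doing no better than ignoring the sensor data altogether.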

| <img src="additional_mobi_act_pmc_paper.png" class="img-responsive"> |
|:---:|
| Reproducing the results of Table 9 in the paper (MotionSense dataset) on another dataset (MobiAct dataset). |

* Note that this notebook only uses already trained models. If you want to train your own models for each stage, please look at the other files in this repository.


#### (2) Import required libraries 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.keras as keras
from scipy import stats
pd.set_option('display.float_format', lambda x: '%.4f' % x)

#### (3) Dataset
(Dataset is available here: [The MobiFall and MobiAct datasets](https://bmi.teicrete.gr/en/the-mobifall-and-mobiact-datasets-2/))

We use six activity classes from the dataset:

* **STR**: stair-stepping, including both stairs down and stairs up
* **WAL**: walking
* **JOG**: jogging
* **JUM**: jumping
* **STD**: being steady: either sitting or standing
* **FALL**: falling (suddenly from a standing position to the floor)

#### Name and Code of each Class

In [2]:
act_list = ["STR","WAL","JOG","JUM","STD","FALL"]
gender_labels = ["Female", "Male"]
for i, a in enumerate(act_list):
    print(i,":",a, end=" | ")
print()
for i, a in enumerate(gender_labels):
    print(i,":",a, end=" | ")

0 : STR | 1 : WAL | 2 : JOG | 3 : JUM | 4 : STD | 5 : FALL | 
0 : Female | 1 : Male | 

#### Loading Train and Test Data
(Note: you first need to prepare the dataset using the first part of the tutorial in the same folder.)

In [3]:
x_train = np.load("x_train.npy", allow_pickle=True)
x_test = np.load("x_test.npy", allow_pickle=True)
y_train = np.load("y_train.npy",allow_pickle=True)
y_test = np.load("y_test.npy",allow_pickle=True)
y_train = y_train.astype(int)
y_test = y_test.astype(int)

In [4]:
nb_classes = len(np.unique(y_test[:,0]))
batch_size = 64

Y_train = keras.utils.to_categorical(y_train[:,0], nb_classes)
Y_test = keras.utils.to_categorical(y_test[:,0], nb_classes)
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], x_train.shape[2] ,1))
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], x_test.shape[2] ,1))
x_train.shape, Y_train.shape, x_test.shape, Y_test.shape

((176760, 128, 9, 1), (176760, 6), (37117, 128, 9, 1), (37117, 6))
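To make the two preprocessing steps above concrete, here is a NumPy-only sketch on toy data. It assumes integer class codes and mimics what `keras.utils.to_categorical` and the `reshape` call do; the array contents are made up for illustration.

```python
import numpy as np

labels = np.array([0, 5, 2, 4])           # toy activity codes
nb_classes = 6

# One-hot encode: row i has a single 1 at column labels[i].
one_hot = np.eye(nb_classes, dtype="float32")[labels]

# Toy batch of 4 windows with 128 rows and 9 columns, as in the shapes printed above.
x = np.zeros((4, 128, 9), dtype="float32")
x = x[..., np.newaxis]                    # add a trailing channel axis for Conv2D

print(one_hot.shape, x.shape)
```

The trailing axis of size 1 lets the 2D convolutional layers treat each window as a single-channel image.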

In [5]:
np.set_printoptions(suppress=True)

from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

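## src: https://gist.github.com/hitvoice/36cf44689065ca9b927431546381a3f7
def cm_analysis(y_true, y_pred, labels, ymap=None, figsize=(10,10)):
    """Print a row-normalized confusion matrix (each true-class row sums to 100%).

    `ymap` and `figsize` are kept from the original gist for call compatibility; they are unused here.
    """
    cm = confusion_matrix(y_true, y_pred, labels=range(len(labels)))
    cm_sum = np.sum(cm, axis=1, keepdims=True)  # number of samples per true class
    cm_perc = cm / cm_sum.astype(float) * 100
    print(labels)
    print(cm_perc.round(1))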
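For readers who want to see what the row normalization does, the following NumPy-only sketch rebuilds the same kind of matrix on toy labels. The function name `row_percent_confusion` is hypothetical, and the sketch assumes every true class appears at least once so that no row sum is zero.

```python
import numpy as np

def row_percent_confusion(y_true, y_pred, n_classes):
    """Confusion matrix in which each row (true class) sums to 100."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)     # count (true, predicted) pairs
    return cm / cm.sum(axis=1, keepdims=True) * 100

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])
cm_perc = row_percent_confusion(y_true, y_pred, 2)
print(cm_perc.round(1))
```

With this normalization, the diagonal entry of each row is the recall of that class, which is how the matrices in the rest of the notebook should be read.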

#### Original Data (without any transformation)

##### Activity Recognition

In [6]:
model = keras.models.load_model('_best_FCN_.hdf5')
preds = model.predict(x_test, verbose=1)
preds = preds.argmax(axis=1)
cm_analysis(y_test[:,0], preds, act_list, figsize=(16,16))

print("f1: ", np.round(f1_score(y_test[:,0], preds, average='macro')*100,2))
print("acc: ", np.round(accuracy_score(y_test[:,0], preds)*100,2))

['STR', 'WAL', 'JOG', 'JUM', 'STD', 'FALL']
[[99.   0.3  0.   0.   0.7  0. ]
 [ 0.4 98.3  0.   0.   1.1  0.2]
 [ 0.7  0.1 94.5  4.8  0.   0. ]
 [ 1.7  0.   5.2 93.   0.1  0. ]
 [ 0.1  1.   0.   0.  98.7  0.2]
 [ 0.   0.   0.   0.   0.2 99.8]]
f1:  97.24
acc:  97.74


## Replacement

As we said, we do not want the app to infer when falls happen, but we do want it to infer the other four moving activities ('STR', 'WAL', 'JOG', 'JUM') as accurately as possible.

In [7]:
rae = keras.models.load_model("_RAE_model.hdf5")
rep_x_test = rae.predict(x_test, verbose=1)

# model = keras.models.load_model('_best_FCN_.hdf5')
preds = model.predict(rep_x_test, verbose=1)
preds = preds.argmax(axis=1)
cm_analysis(y_test[:,0], preds, act_list, figsize=(16,16))

['STR', 'WAL', 'JOG', 'JUM', 'STD', 'FALL']
[[98.8  0.3  0.   0.   0.9  0. ]
 [ 0.9 97.6  0.   0.   1.2  0.3]
 [ 1.6  0.1 93.4  4.8  0.   0. ]
 [ 2.4  0.   4.5 93.   0.1  0. ]
 [ 0.3  0.9  0.   0.  98.6  0.2]
 [ 0.3  0.   0.   0.  95.8  3.9]]
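The effect of the replacement step can be summarized with two numbers: the mean recall over the required activities (utility) and the recall on falls (privacy). The sketch below copies the diagonals of the two confusion matrices printed above; the variable names are illustrative.

```python
import numpy as np

# Diagonals (per-class recall, %) copied from the two matrices printed above.
# Order: STR, WAL, JOG, JUM, STD, FALL
recall_original = np.array([99.0, 98.3, 94.5, 93.0, 98.7, 99.8])
recall_after_rae = np.array([98.8, 97.6, 93.4, 93.0, 98.6, 3.9])

required = [0, 1, 2, 3]   # STR, WAL, JOG, JUM
fall = 5

mean_required_before = recall_original[required].mean()
mean_required_after = recall_after_rae[required].mean()
print("mean recall on required activities:",
      round(mean_required_before, 2), "->", round(mean_required_after, 2))
print("fall recall:", recall_original[fall], "->", recall_after_rae[fall])
```

The required activities lose about half a percentage point of mean recall, while falls are almost always classified as STD after the transformation.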


### How about gender information?
So, we can hide falls, which are now inferred as being steady, while sharing the moving activities ('STR', 'WAL', 'JOG', 'JUM') with minimal distortion.
The question is: what if somebody can infer our gender from the shared data, even though they are not supposed to?

* First, let's check the accuracy of a gender classifier on the output of the RAE. We select the required activities and give them to the already trained gender classifier.

In [8]:
print("Required", end="--> ")
wl = [0,1,2,3]
for i in wl:
    print(act_list[i], end="; ")

w_train_data = x_train[np.isin(y_train[:,0],wl)]
w_act_train_labels = y_train[np.isin(y_train[:,0],wl)][:,0]
w_gen_train_labels = y_train[np.isin(y_train[:,0],wl)][:,1]
print(w_train_data.shape,w_act_train_labels.shape, w_gen_train_labels.shape)

w_test_data = x_test[np.isin(y_test[:,0],wl)]
w_act_test_labels = y_test[np.isin(y_test[:,0],wl)][:,0]
w_gen_test_labels = y_test[np.isin(y_test[:,0],wl)][:,1]
print(w_test_data.shape,w_act_test_labels.shape, w_gen_test_labels.shape)

Required--> STR; WAL; JOG; JUM; (107320, 128, 9, 1) (107320,) (107320,)
(21340, 128, 9, 1) (21340,) (21340,)
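The selection above relies on boolean masking with `np.isin`. Here is a small sketch on toy data, assuming the same label layout (column 0 for the activity code, column 1 for the gender code); the arrays are made up for illustration.

```python
import numpy as np

# Toy label table: column 0 = activity code, column 1 = gender code.
y = np.array([[0, 1], [4, 0], [2, 1], [5, 0], [3, 0]])
x = np.arange(len(y))                     # stand-in for the sensor windows

wl = [0, 1, 2, 3]                         # codes of the required activities
mask = np.isin(y[:, 0], wl)               # True where the activity is required

x_req = x[mask]
act_req, gen_req = y[mask, 0], y[mask, 1]
print(mask, x_req, act_req, gen_req)
```

Computing the mask once and reusing it keeps the data and both label vectors aligned, and avoids evaluating `np.isin` three times.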


In [9]:
eval_gen = keras.models.load_model('_best_gen_FCN_.hdf5')
preds_ = eval_gen.predict(w_test_data, verbose=1)
preds = (preds_ > 0.5).astype(int)[:,0]
cm_analysis(w_gen_test_labels, preds, ['F','M'], figsize=(16,16))
print("f1: ", np.round(f1_score(w_gen_test_labels, preds, average='macro')*100,2))
print("acc: ", np.round(accuracy_score(w_gen_test_labels, preds)*100,2))

['F', 'M']
[[96.2  3.8]
 [ 2.3 97.7]]
f1:  96.51
acc:  97.31
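The gender classifier has a single sigmoid output, so its predictions are turned into class labels by thresholding at 0.5. A toy sketch of that step, with made-up probabilities:

```python
import numpy as np

# Toy sigmoid outputs with shape (n_samples, 1), as a single-unit classifier returns.
probs = np.array([[0.91], [0.12], [0.50], [0.73]])

# Values strictly above 0.5 become 1 ("Male"); everything else becomes 0 ("Female").
pred_labels = (probs > 0.5).astype(int)[:, 0]
print(pred_labels)
```

Note that a probability of exactly 0.5 falls on the "Female" side because the comparison is strict.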


In [10]:
w_data_train_rep = w_train_data.copy()
# w_data_train_rep = rae.predict(w_data_train_rep, verbose=1)
w_data_test_rep = w_test_data.copy()
w_data_test_rep = rae.predict(w_data_test_rep, verbose=1)



In [11]:
preds = eval_gen.predict(w_data_test_rep, verbose=1)
preds = (preds > 0.5).astype(int)[:,0]
cm_analysis(w_gen_test_labels, preds, ['F','M'], figsize=(16,16))
print("f1: ", np.round(f1_score(w_gen_test_labels, preds, average='macro')*100,2))
print("acc: ", np.round(accuracy_score(w_gen_test_labels, preds)*100,2))

['F', 'M']
[[96.9  3.1]
 [ 4.  96. ]]
f1:  95.21
acc:  96.24


Thus, if an adversary looks at non-sensitive activities such as walking, they can accurately infer the gender. And, as we see, even if the adversary does not use the raw data and only looks at the output of the RAE, they can still infer gender with high accuracy.

## Anonymization

Now, we build the AAE and give it the output of the RAE with the goal of hiding the gender.

In [12]:
class Enc_Reg:
    l2p = 0.001
    @staticmethod
    def early_layers(inp, fm, hid_act_func):
        # Start
        x = Conv2D(32, fm, padding="same", kernel_regularizer=regularizers.l2(Enc_Reg.l2p), activation=hid_act_func)(inp)
        x = MaxPooling2D(pool_size=(2, 1))(x)
        x = Dropout(0.25)(x)
        
        # 1
        x = Conv2D(32, fm, padding="same", kernel_regularizer=regularizers.l2(Enc_Reg.l2p), activation=hid_act_func)(x)
        x = MaxPooling2D(pool_size=(2, 1))(x)
        x = Dropout(0.25)(x)

        return x
    
    @staticmethod
    def late_layers(inp, num_classes, fm, act_func, hid_act_func):
        # 2
        x = Conv2D(32, fm, padding="same", kernel_regularizer=regularizers.l2(Enc_Reg.l2p), activation=hid_act_func)(inp)
        x = MaxPooling2D(pool_size=(2, 1))(x)
        x = Dropout(0.25)(x)
        
        # End
        x = Flatten()(x)
        x = Dense(64, kernel_regularizer=regularizers.l2(Enc_Reg.l2p), activation=hid_act_func)(x)
        x = Dropout(0.5)(x)
        x = Dense(16, kernel_regularizer=regularizers.l2(Enc_Reg.l2p), activation=hid_act_func)(x)
        x = Dropout(0.5)(x)
        x = Dense(num_classes, activation=act_func)(x)

        return x
   
    @staticmethod
    def build(height, width, num_classes, name, fm, act_func,hid_act_func):
        inp = Input(shape=(height, width, 1))
        early = Enc_Reg.early_layers(inp, fm, hid_act_func=hid_act_func)
        late  = Enc_Reg.late_layers(early, num_classes, fm, act_func=act_func, hid_act_func=hid_act_func)
        model = Model(inputs=inp, outputs=late ,name=name)
        return model


class Dec_Reg:
    l2p = 0.001
    @staticmethod
    def early_layers(inp, fm, hid_act_func):
        # Start
        x = Conv2D(32, fm, padding="same", kernel_regularizer=regularizers.l2(Dec_Reg.l2p), activation=hid_act_func)(inp)
        x = MaxPooling2D(pool_size=(2, 1))(x)
        x = Dropout(0.25)(x)
        
        # 1
        x = Conv2D(32, fm, padding="same", kernel_regularizer=regularizers.l2(Dec_Reg.l2p), activation=hid_act_func)(x)
        x = MaxPooling2D(pool_size=(2, 1))(x)
        x = Dropout(0.25)(x)


        return x
    
    @staticmethod
    def late_layers(inp, num_classes, fm, act_func, hid_act_func):
        # 2
        x = Conv2D(32, fm, padding="same", kernel_regularizer=regularizers.l2(Dec_Reg.l2p), activation=hid_act_func)(inp)
        x = MaxPooling2D(pool_size=(2, 1))(x)
        x = Dropout(0.25)(x)
        
        # 3
        x = Conv2D(32, fm, padding="same", kernel_regularizer=regularizers.l2(Dec_Reg.l2p), activation=hid_act_func)(x)
        x = MaxPooling2D(pool_size=(2, 1))(x)
        x = Dropout(0.25)(x)
        
        #4
        x = Conv2D(32, fm, padding="same", kernel_regularizer=regularizers.l2(Dec_Reg.l2p), activation=hid_act_func)(x)
        x = MaxPooling2D(pool_size=(2, 1))(x)
        x = Dropout(0.25)(x)
        
        
        # End
        x = Flatten()(x)
        x = Dense(128, kernel_regularizer=regularizers.l2(Dec_Reg.l2p), activation=hid_act_func)(x)
        x = Dropout(0.5)(x)
        x = Dense(32, kernel_regularizer=regularizers.l2(Dec_Reg.l2p), activation=hid_act_func)(x)
        x = Dropout(0.5)(x)
        x = Dense(num_classes, activation=act_func)(x)

        return x
   
    @staticmethod
    def build(height, width, num_classes, name, fm, act_func,hid_act_func):
        inp = Input(shape=(height, width, 1))
        early = Dec_Reg.early_layers(inp, fm, hid_act_func=hid_act_func)
        late  = Dec_Reg.late_layers(early, num_classes, fm, act_func=act_func, hid_act_func=hid_act_func)
        model = Model(inputs=inp, outputs=late ,name=name)
        return model


class Encoder:
    l2p = 0.0001
    @staticmethod
    def layers(x, fm, act_func, hid_act_func):
        x = Conv2D(64, fm, activation=hid_act_func, kernel_regularizer=regularizers.l2(Encoder.l2p), padding='same')(x)
        x = BatchNormalization()(x)
        
        x = Conv2D(64, fm, activation=hid_act_func, kernel_regularizer=regularizers.l2(Encoder.l2p), padding='same')(x)
        x = BatchNormalization()(x)
        x = MaxPooling2D(pool_size=(2,1))(x)

        x = Conv2D(64, fm, activation=hid_act_func,kernel_regularizer=regularizers.l2(Encoder.l2p), padding='same')(x)
        x = BatchNormalization()(x)
        x = MaxPooling2D(pool_size=(2,1))(x)

        x = Conv2D(1, fm, activation=act_func, padding='same')(x) 
        y = BatchNormalization()(x)

        return y
   
    @staticmethod
    def build(height, width, fm, act_func, hid_act_func):
        inp = Input(shape=(height, width,1))
        enc = Encoder.layers(inp, fm, act_func=act_func, hid_act_func=hid_act_func)
        model = Model(inputs=inp, outputs=enc ,name="Encoder")
        return model

class Decoder:
    l2p = 0.0001
    @staticmethod
    def layers(y, height, width, fm, act_func, hid_act_func):
        
        x = Conv2DTranspose(64, fm, strides = (1, 1), activation=hid_act_func,kernel_regularizer=regularizers.l2(Decoder.l2p), padding='same')(y)
        x = BatchNormalization()(x)
        x = Conv2DTranspose(64, fm,  strides = (2, 1), activation=hid_act_func,kernel_regularizer=regularizers.l2(Decoder.l2p), padding='same')(x)
        x = BatchNormalization()(x)
        x = Conv2DTranspose(64, fm, strides = (2, 1), activation=hid_act_func,kernel_regularizer=regularizers.l2(Decoder.l2p), padding='same')(x)
        x = BatchNormalization()(x)
        
        xh = Conv2D(1, fm, activation=act_func, padding='same')(x)
        return xh
   
    @staticmethod
    def build(height, width, fm , act_func, hid_act_func):
        inp = Input(shape=(height, width,1))
        dec  = Decoder.layers(inp,height, width, fm, act_func=act_func, hid_act_func=hid_act_func)
        model = Model(inputs=inp, outputs=dec ,name="Decoder")
        return model        
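The layer definitions above fix both the tensor shapes and the parameter counts. The sketch below checks them by hand, assuming the 128 x 9 input windows and the `fm = (5, 9)` kernel used in this notebook; the helper names are hypothetical. It relies on `padding="same"` keeping the spatial size, on each `MaxPooling2D(pool_size=(2, 1))` halving the height, and on pooling, dropout and flatten layers having no trainable parameters.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a 2D convolution."""
    kh, kw = kernel
    return kh * kw * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

height, width, fm = 128, 9, (5, 9)

# Encoder: two MaxPooling2D(pool_size=(2, 1)) layers halve the height twice.
enc_height = height // 2 // 2

# Dec_Reg: five Conv2D(32) + MaxPooling2D((2, 1)) blocks, then Dense 128 -> 32 -> num_classes.
reg_height = height // 2 ** 5
flat = reg_height * width * 32
num_classes = 1                            # single sigmoid unit for gender
dec_reg_params = (
    conv2d_params(fm, 1, 32)
    + 4 * conv2d_params(fm, 32, 32)
    + dense_params(flat, 128)
    + dense_params(128, 32)
    + dense_params(32, num_classes)
)
print(enc_height, reg_height, flat, dec_reg_params)
```

The encoder height of 32 and the 337,665 parameters of the single-output `Dec_Reg` agree with the `Encoder` and `GenReg` rows of the model summary printed later in this notebook.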

In [13]:
import keras.backend as K
def gen_equ_loss_func(y_true, y_pred):
    loss = K.mean(K.abs(0.5 - y_pred))
    return loss

def build_AAE(loss_weights):
    id_class_numbers = 1
    act_class_numbers = 4
    #fm = (2,3)
    #reps_id = Enc_Reg.build(height, width//4, id_class_numbers, name ="EncReg", fm=fm, act_func="sigmoid",hid_act_func="relu")
    fm = (5,9)
    rcon_id = Dec_Reg.build(height, width, id_class_numbers, name ="GenReg", fm=fm, act_func="sigmoid",hid_act_func="relu")
    # print(rcon_id.summary())
    rcon_task = Dec_Reg.build(height, width, act_class_numbers, name ="ActReg", fm=fm, act_func="softmax",hid_act_func="relu")
    # print(rcon_task.summary())
    #reps_id.compile( loss="binary_crossentropy", optimizer='adam', metrics=['acc'])
    rcon_id.compile( loss="binary_crossentropy", optimizer='adam', metrics=['acc'])
    rcon_task.compile( loss="categorical_crossentropy", optimizer='adam', metrics=['acc'])

    #reps_id.trainable = False
    rcon_id.trainable = False
    rcon_task.trainable = False

    enc_to_reps = Encoder.build(height, width, fm=fm, act_func="linear", hid_act_func="relu")
    # print(enc_to_reps.summary())
    reps_to_dec = Decoder.build(height//4, width, fm=fm, act_func="linear", hid_act_func="relu")
    # print(reps_to_dec.summary())
    enc_to_reps.compile( loss="mean_squared_error", optimizer='adam', metrics=['mse'])
    reps_to_dec.compile( loss="mean_squared_error", optimizer='adam', metrics=['mse'])

    x = Input(shape=(height, width,1))
    z = enc_to_reps(x)
    #idz = reps_id(z)
    xh = reps_to_dec(z)
    idxh = rcon_id(xh)
    txh = rcon_task(xh)



    anon_model = Model(inputs = x,
                       outputs = [xh,
                                  #idz,
                                  idxh,
                                  txh
                                 ],
                       name ="anon") 
    anon_model.compile(loss = ["mean_squared_error",
                               #"binary_crossentropy",
                                gen_equ_loss_func,
                               "categorical_crossentropy"
                              ],
                       loss_weights = loss_weights,                 
                       optimizer = "adam",
                       metrics = ["acc"])
    #enc_to_reps.set_weights(enc_dec_tmp.layers[1].get_weights()) 
    #reps_to_dec.set_weights(enc_dec_tmp.layers[2].get_weights()) 


    return anon_model, rcon_task, rcon_id

from tensorflow.keras.layers import *
from tensorflow.keras import Model
from tensorflow.keras import regularizers
height = w_data_train_rep.shape[1]
width = w_data_train_rep.shape[2]
fm = (5,9)
loss_weights=[2, 1, 4]            
anon_model, rcon_task, rcon_id = build_AAE(loss_weights)
anon_model.summary()

Using TensorFlow backend.


Model: "anon"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_5 (InputLayer)            [(None, 128, 9, 1)]  0                                            
__________________________________________________________________________________________________
Encoder (Functional)            (None, 32, 9, 1)     375365      input_5[0][0]                    
__________________________________________________________________________________________________
Decoder (Functional)            (None, 128, 9, 1)    375361      Encoder[0][0]                    
__________________________________________________________________________________________________
GenReg (Functional)             (None, 1)            337665      Decoder[0][0]                    
_______________________________________________________________________________________________
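The training objective of `anon_model` is a weighted sum of three losses, one per output. The NumPy sketch below illustrates the gender term `gen_equ_loss_func` and the weighting with `loss_weights = [2, 1, 4]`. The function names and the example loss values (0.1 and 0.2) are made up for illustration, and the regularization penalties that Keras adds to the total are ignored.

```python
import numpy as np

def gen_equ_loss(y_pred):
    """NumPy version of gen_equ_loss_func: mean distance of predictions from 0.5."""
    return np.mean(np.abs(0.5 - y_pred))

undecided = np.array([0.5, 0.5, 0.5, 0.5])   # classifier cannot tell gender
confident = np.array([1.0, 0.0, 1.0, 0.0])   # classifier is fully certain

def total_loss(mse, gen_loss, cce, loss_weights=(2, 1, 4)):
    """Weighted sum over the three outputs, in the order [xh, idxh, txh]."""
    w_rec, w_gen, w_act = loss_weights
    return w_rec * mse + w_gen * gen_loss + w_act * cce

print(gen_equ_loss(undecided), gen_equ_loss(confident))
print(total_loss(mse=0.1, gen_loss=gen_equ_loss(confident), cce=0.2))
```

The gender term ignores the true labels: it is zero when the gender classifier outputs 0.5 for every sample and reaches its maximum of 0.5 when every output is 0 or 1. Raising the first or third weight favors reconstruction quality or activity recognition over gender hiding.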

Here we can see the results of anonymization by AAE (hiding the gender in this case) for three different settings of trade-off parameters.

In [14]:
anon_model.load_weights('gender_anon_1.h5')
rep_anon_test = anon_model.predict(rep_x_test, verbose = 1)[0]


eval_act = keras.models.load_model('_best_FCN_.hdf5')
preds = eval_act.predict(rep_anon_test, verbose=1)
preds = preds.argmax(axis=1)
cm_analysis(y_test[:,0], preds, act_list, figsize=(16,16))


eval_gen = keras.models.load_model('_best_gen_FCN_.hdf5')
preds = eval_gen.predict(rep_anon_test, verbose=1)
preds = (preds > 0.5).astype(int)[:,0]
cm_analysis(y_test[:,1], preds, ['F','M'], figsize=(16,16))
print("f1: ", np.round(f1_score(y_test[:,1], preds, average='macro')*100,2))
print("acc: ", np.round(accuracy_score(y_test[:,1], preds)*100,2))

['STR', 'WAL', 'JOG', 'JUM', 'STD', 'FALL']
[[98.5  0.5  0.   0.   0.9  0. ]
 [ 1.5 97.1  0.   0.   1.1  0.3]
 [ 3.3  0.1 92.1  4.5  0.   0. ]
 [ 4.6  0.   4.4 91.   0.   0. ]
 [ 0.5  1.   0.   0.  95.8  2.7]
 [ 0.7  0.   0.   0.  95.6  3.7]]
['F', 'M']
[[61.1 38.9]
 [13.8 86.2]]
f1:  73.59
acc:  79.71


In [15]:
anon_model.load_weights('gender_anon_2.h5')

rep_anon_test = anon_model.predict(rep_x_test, verbose = 1)[0]

eval_act = keras.models.load_model('_best_FCN_.hdf5')
preds = eval_act.predict(rep_anon_test, verbose=1)
preds = preds.argmax(axis=1)
cm_analysis(y_test[:,0], preds, act_list, figsize=(16,16))


eval_gen = keras.models.load_model('_best_gen_FCN_.hdf5')
preds = eval_gen.predict(rep_anon_test)
preds = (preds > 0.5).astype(int)[:,0]
cm_analysis(y_test[:,1], preds, ['F','M'], figsize=(16,16))
print("f1: ", np.round(f1_score(y_test[:,1], preds, average='macro')*100,2))
print("acc: ", np.round(accuracy_score(y_test[:,1], preds)*100,2))

['STR', 'WAL', 'JOG', 'JUM', 'STD', 'FALL']
[[99.3  0.3  0.   0.   0.4  0. ]
 [ 3.5 95.1  0.   0.   1.1  0.3]
 [ 4.   0.1 92.2  3.7  0.   0. ]
 [ 6.   0.   4.9 89.   0.   0. ]
 [ 1.4  0.9  0.   0.  96.3  1.4]
 [ 1.   0.   0.   0.  95.3  3.7]]
['F', 'M']
[[69.2 30.8]
 [30.2 69.8]]
f1:  65.67
acc:  69.61


In [16]:
anon_model.load_weights('gender_anon_3.h5')
rep_anon_test = anon_model.predict(rep_x_test, verbose = 1)[0]


eval_act = keras.models.load_model('_best_FCN_.hdf5')
preds = eval_act.predict(rep_anon_test, verbose=1)
preds = preds.argmax(axis=1)
cm_analysis(y_test[:,0], preds, act_list, figsize=(16,16))


eval_gen = keras.models.load_model('_best_gen_FCN_.hdf5')
preds = eval_gen.predict(rep_anon_test,verbose=1)
preds = (preds > 0.5).astype(int)[:,0]
cm_analysis(y_test[:,1], preds, ['F','M'], figsize=(16,16))
print("f1: ", np.round(f1_score(y_test[:,1], preds, average='macro')*100,2))
print("acc: ", np.round(accuracy_score(y_test[:,1], preds)*100,2))

['STR', 'WAL', 'JOG', 'JUM', 'STD', 'FALL']
[[99.1  0.6  0.   0.   0.4  0. ]
 [ 4.2 94.3  0.   0.   1.1  0.4]
 [ 2.5  0.1 93.4  4.   0.   0. ]
 [ 5.4  0.   5.2 89.4  0.   0. ]
 [ 5.   1.   0.   0.  92.6  1.4]
 [ 0.8  0.   0.   0.  94.5  4.7]]
['F', 'M']
[[69.4 30.6]
 [36.1 63.9]]
f1:  62.01
acc:  65.29


Finally, we see that using the AAE costs a little more activity-recognition accuracy. In return, the accuracy of inferring gender drops close to the random-guess level on this dataset.
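To compare the stages side by side, the sketch below collects the gender accuracies printed in this notebook and reports their distance from the majority-class baseline. The dictionary keys are illustrative labels. Two caveats apply: the first two accuracies were measured on the required activities only, whereas the three AAE results use the full test set, and the baseline is computed from subject counts rather than from test windows.

```python
# Gender accuracies (%) copied from the outputs printed in this notebook.
gender_acc = {
    "raw data": 97.31,
    "after RAE": 96.24,
    "after AAE (gender_anon_1)": 79.71,
    "after AAE (gender_anon_2)": 69.61,
    "after AAE (gender_anon_3)": 65.29,
}
baseline = 41 / 55 * 100   # always guessing "Male"

for name, acc in gender_acc.items():
    print(f"{name:28s} {acc:6.2f}  ({acc - baseline:+.2f} vs. baseline)")
```

A positive difference means the adversary still beats the trivial guess; the second and third AAE settings end up below it.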