In [17]:
import pandas as pd
import numpy as np
import warnings
from sklearn.preprocessing import StandardScaler, MinMaxScaler
#warnings.filterwarnings('ignore')
# BEWARE: ignoring warnings is not always a good idea.
# I only do it for presentation purposes.

# Private and Encrypted AI - Credit Approval Application

This notebook is meant for my exploratory development of an encrypted federated deep learning approach.
I will develop a final model in a separate folder.

### Contents
1. [Data Preparation & Setup](#data_prep)
2. [Classical Deep Learning](#classical_dl)
3. [Federated Deep Learning](#federated_dl)<br>
    3.1 [Model Averaging with Trusted Aggregator](#fl_model_avg)
4. [Encrypted Deep Learning](#encrypted_dl)<br>
   4.1 [Secured Multi-Party Computation (SMPC)](#smpc) <br>
   4.2 [Encrypted Gradient Averaging](#fl_encrypt_avg)<br>
   4.3 [Differential Privacy for DL](#dp_dl)
   
<hr>

_Notes_ <br>This project was inspired by lectures of [Andrew Trask](https://iamtrask.github.io/) in the [Private AI Scholarship Challenge on Udacity](https://www.udacity.com/facebook-AI-scholarship). Furthermore, segments of the code are inspired by the [PySyft tutorials on GitHub](https://github.com/OpenMined/PySyft/tree/dev/examples/tutorials); an excellent resource for people starting off with Private AI. 

<a id='data_prep'></a>
## Data Preparation
- Only rows without NaN values are used. The dataset is small to begin with, and dropping these rows removes only a few of them.
- Convert binary variables to a numeric representation, and one-hot encode categorical variables. We do not want to use a label encoder here, since a label encoder would make it look as if the categories have a numeric order.
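
To make the difference between the two encodings concrete, here is a minimal sketch in plain Python. The helper names `label_encode` and `one_hot_encode` are hypothetical and are not part of this notebook, which relies on `pd.get_dummies` further down. Label encoding assigns each category an integer, so a model can read those integers as an ordering; one-hot encoding gives each category its own 0/1 column.

```python
def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def one_hot_encode(values):
    """Give each distinct category its own 0/1 indicator column."""
    categories = sorted(set(values))
    return [[int(v == cat) for cat in categories] for v in values]


# toy column with three unordered categories (illustrative values only)
toy_column = ["g", "p", "g", "gg"]

label_encoded = label_encode(toy_column)
one_hot = one_hot_encode(toy_column)
print(label_encoded)  # integers suggest an order between categories
print(one_hot)        # one indicator column per category, no implied order
```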

In [2]:
cols = [ f"A{i}" for i in range(1,16)]
cols.append('label')

In [3]:
df = pd.read_csv('data/crx.data', names=cols)\
    .replace(to_replace='?', value=np.nan).dropna()
print(df.shape, "\n ------- \n")
print(df.head(2))

(653, 16) 
 ------- 

  A1     A2    A3 A4 A5 A6 A7    A8 A9 A10  A11 A12 A13    A14  A15 label
0  b  30.83  0.00  u  g  w  v  1.25  t   t    1   f   g  00202    0     +
1  a  58.67  4.46  u  g  q  h  3.04  t   t    6   f   g  00043  560     +


### Data Analysis

Let's check out what this data looks like first, so that we have an idea of what we are dealing with. In true encrypted, federated learning we would not have this luxury though...

In [4]:
def to_binary(df, col):
    '''Map each unique value in `col` to an integer (0, 1, ...) in order of first appearance.'''
    u = df[col].unique()
    mapping = dict(zip(u, range(len(u))))
    return df[col].map(mapping)

In [5]:
df.A1.head()

0    b
1    a
2    a
3    b
4    b
Name: A1, dtype: object

In [6]:
#convert to float
for col in ['A2', 'A3', 'A8', 'A11', 'A14', 'A15']:
    df[col] = df[col].astype(float)
    
#binarize
for col in ['A1', 'A9', 'A10', 'A12', 'label']:
    df[col] = to_binary(df, col)
    
onehot_cols = ['A4', 'A5', 'A6', 'A7', 'A13']

#perform one hot encoding, and drop original columns
df  = df.join(pd.get_dummies(df[onehot_cols], dtype=int))\
                                .drop(onehot_cols, axis=1)

In [7]:
set(df.dtypes) #check that we have the data types we expect, no object types

{dtype('int64'), dtype('float64')}

In [8]:
#distribution of numeric-only columns
df[['A2', 'A3', 'A8', 'A11', 'A14', 'A15']].describe().iloc[1:, :10].round(3)

| | A2 | A3 | A8 | A11 | A14 | A15 |
|---|---|---|---|---|---|---|
| mean | 31.504 | 4.83 | 2.244 | 2.502 | 180.36 | 1013.761 |
| std | 11.838 | 5.027 | 3.371 | 4.968 | 168.297 | 5253.279 |
| min | 13.75 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 22.58 | 1.04 | 0.165 | 0.0 | 73.0 | 0.0 |
| 50% | 28.42 | 2.835 | 1.0 | 0.0 | 160.0 | 5.0 |
| 75% | 38.25 | 7.5 | 2.625 | 3.0 | 272.0 | 400.0 |
| max | 76.75 | 28.0 | 28.5 | 67.0 | 2000.0 | 100000.0 |


In [9]:
df.head(2) #double check what our DF looks like

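| | A1 | A2 | A3 | A8 | A9 | A10 | A11 | A12 | A14 | A15 | ... | A7_ff | A7_h | A7_j | A7_n | A7_o | A7_v | A7_z | A13_g | A13_p | A13_s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 30.83 | 0.0 | 1.25 | 0 | 0 | 1.0 | 0 | 202.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1 | 1 | 58.67 | 4.46 | 3.04 | 0 | 0 | 6.0 | 0 | 43.0 | 560.0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |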


### Simulate Real People's Data

To illustrate how this model would work in real life, I want to simulate this data belonging to real people, so I generate a random name for each row. This is not an ideal example, since I start with all of the data collated on my computer, with people's names and data directly exposed. Not private at all...

In [10]:
import names #used to get random names
names.get_first_name()+' ' +names.get_last_name() #call random name

'Thelma Hoyle'

In [11]:
users = []
used_names = set()
for idx in range(len(df)):
    name = names.get_first_name()+' ' +names.get_last_name()
    while name in used_names:
        name = names.get_first_name()+' ' +names.get_last_name()
        
    used_names.add(name)
    users.append(name)

In [12]:
df['name'] = users
df.head(2)

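| | A1 | A2 | A3 | A8 | A9 | A10 | A11 | A12 | A14 | A15 | ... | A7_h | A7_j | A7_n | A7_o | A7_v | A7_z | A13_g | A13_p | A13_s | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 30.83 | 0.0 | 1.25 | 0 | 0 | 1.0 | 0 | 202.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | Stuart Pettrey |
| 1 | 1 | 58.67 | 4.46 | 3.04 | 0 | 0 | 6.0 | 0 | 43.0 | 560.0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | Steven Narro |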


In [21]:
#get features and labels as numpy arrays which we can convert to tensors
features = df.drop(['label', 'name'], axis=1).values.astype(float)
labels = df['label'].values.astype(float)


#normalize
sclr = MinMaxScaler()
features = sclr.fit_transform(features)

In [22]:
#save features and labels for future use
np.save('data/features', features)
np.save('data/labels', labels)

#save one-hot encoded labels, shape (n_samples, 2)
labels=pd.get_dummies(df['label']).values.astype(float)
np.save('data/labels_dim', labels)


_Please Note_ <br>
Normalization is not strictly required for every machine learning algorithm, but it is recommended for deep learning, because features on a similar scale make gradient-based training better behaved. Read more [here](https://datascience.stackexchange.com/a/13221/60648).
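
As a rough illustration of what the `MinMaxScaler` used above does with its default settings, each column is rescaled to the range [0, 1] via `(x - min) / (max - min)`. The sketch below reproduces that arithmetic in plain Python for a single column; `min_max_scale` is a hypothetical helper written for this example, not part of scikit-learn.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# min, median, 75th percentile and max of A15 from the summary table above
a15 = [0.0, 5.0, 400.0, 100000.0]
scaled = min_max_scale(a15)
print(scaled)
```

Note how the single very large value pushes every other entry close to zero. Min-max scaling is sensitive to outliers in this way.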

## Model Development
I am using PyTorch to create a neural network that classifies whether someone is accepted for credit or not. PyTorch integrates well with PySyft, the package used to encrypt our deep learning model.

In [131]:
import copy
from torch import nn
from torch import optim
import torch.nn.functional as F
import syft as sy
import torch as th
th.manual_seed(42) #fix the seed so that weight initialisation and dropout are reproducible

data = th.tensor(features, dtype=th.float32, requires_grad=True)
target = th.tensor(labels, dtype=th.float32, requires_grad=False).reshape(-1,2)

class Model(nn.Module):
    '''
    Neural Network Example Model
    
    Attributes
    :hidden_layers (nn.ModuleList) - hidden units and dimensions for each layer
    :output (nn.Linear) - final fully-connected layer to handle output for model
    :dropout (nn.Dropout) - handling of layer-wise drop-out parameter
    
    Functions
    :forward - handling of forward pass of datum through the network.
    '''
    def __init__(self, args):
        super(Model, self).__init__()
        self.args = args #keep a reference so forward() does not rely on a global `args`
        self.hidden_layers = nn.ModuleList([nn.Linear(args.in_size,
                                                      args.hidden_layers[0])])

        #create hidden layers
        layer_sizes = zip(args.hidden_layers[:-1], args.hidden_layers[1:]) 
        #gives input/output sizes for each layer
        self.hidden_layers.extend([nn.Linear(h1, h2) for h1, h2 in layer_sizes])
        self.output = nn.Linear(args.hidden_layers[-1], args.out_size)
        self.dropout = None if args.drop_p is None \
                                            else nn.Dropout(p=args.drop_p)
        
    def forward(self, x):
        x = x.view(-1, self.args.in_size)
        for each in self.hidden_layers:
            x = F.relu(each(x)) #apply relu to each hidden node
            
            if self.dropout is not None:
                x = self.dropout(x) #apply dropout
                
        x = self.output(x) #apply output weights
        
        if self.args.activation is None:
            return x
        
        return self.args.activation(x, dim=self.args.dim) #apply output activation (softmax by default)

<a id='classical_dl'></a>
## Classical Deep Learning
Here we train our network on data that is not distributed (therefore this is not yet a federated or encrypted problem). However, this exercise is useful in showing how we can transition from traditional deep learning to federated deep learning.

First, we create a dataset with a batch size of one. This is realistic, since most people would only hold their own credit score data. This might be different if we decide to use a secure or trusted third party to manage parts of the data, but we don't trust the credit rating company with our data.

In [98]:
class Arguments():
    def __init__(self, in_size, out_size, hidden_layers,
                       activation=F.softmax, dim=-1):
        self.batch_size = 1
        self.drop_p = None
        self.epochs = 10
        self.lr = 0.001
        self.in_size = in_size
        self.out_size = out_size
        self.hidden_layers = hidden_layers
        self.precision_fractional=10
        self.activation = activation
        self.dim = dim

In [99]:
dataset = [(data[i], target[i]) for i in range(len(data))]

#instantiate model
in_size = data[0].shape[0]
out_size = 2
hidden_layers=[30,15]

In [100]:
_data, _target = dataset[0]
_data, _target

(tensor([  0.0000,  30.8300,   0.0000,   1.2500,   0.0000,   0.0000,   1.0000,
           0.0000, 202.0000,   0.0000,   0.0000,   1.0000,   0.0000,   1.0000,
           0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,
           0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,
           1.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,
           0.0000,   0.0000,   1.0000,   0.0000,   1.0000,   0.0000,   0.0000],
        grad_fn=<SelectBackward>), tensor([1., 0.]))

In [164]:
def train(model, datasets, criterion):
    #use a simple stochastic gradient descent optimizer
    #define optimizer for each model
    optimizer = optim.SGD(params=model.parameters(), lr=args.lr)
    steps=0
    model.train() #training mode
    for e in range(1, args.epochs+1):
        running_loss=0
        for ii, (data,target) in enumerate(datasets): #iterate over (features, label) pairs
            steps+=1
            optimizer.zero_grad() #reset gradients so they do not accumulate across steps
            outputs = model.forward(data) #make prediction
            outputs = outputs.reshape(1,-1) #reshape to (1,2), i.e. (batch, classes)
            loss = criterion(outputs, target)

            loss.backward()
            optimizer.step()
            
            #print(f"step: {steps}", loss.item())
            running_loss+=loss.item()

        print(f'Epoch: {e} \tLoss: {running_loss/len(datasets):.6f}')
        running_loss=0


In [102]:
args = Arguments(in_size, out_size, hidden_layers, activation=F.softmax, dim=1)
base_model = Model(args)

In [109]:
model = copy.deepcopy(base_model) #exact replica of base model
train(model, dataset, nn.MSELoss())

Epoch: 1 	Loss: 0.181214
Epoch: 2 	Loss: 0.177997
Epoch: 3 	Loss: 0.177718
Epoch: 4 	Loss: 0.186159
Epoch: 5 	Loss: 0.178692
Epoch: 6 	Loss: 0.186108
Epoch: 7 	Loss: 0.178352
Epoch: 8 	Loss: 0.176986
Epoch: 9 	Loss: 0.176078
Epoch: 10 	Loss: 0.179313
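
The loss values printed above come from comparing the softmax output of the network with the one-hot target using mean squared error. As a rough sketch of that computation for a single sample, the plain-Python version below uses made-up logits (they are not taken from the trained model), and the helpers `softmax` and `mse` are written for this example only.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def mse(prediction, target):
    """Mean of the squared element-wise differences."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


logits = [0.2, -0.2]   # hypothetical raw outputs of the final layer
target = [1.0, 0.0]    # one-hot label for a single applicant
probs = softmax(logits)
loss = mse(probs, target)
print(probs, loss)
```

For reference, a prediction of 0.5 for both classes gives a loss of 0.25 under this definition, which puts the values of roughly 0.18 printed above into context.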


We can also use PyTorch's `Dataset` class to make the processing of data a little easier, but for the purpose of this example it will not give any clear benefits. If you would like to read more about PyTorch's abstract `Dataset` class [read here](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html), with another example [here](https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel). Generally speaking, using `Dataset` and `DataLoader` makes the handling of training and testing data much easier.
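
A map-style dataset in PyTorch boils down to two methods, `__len__` and `__getitem__`. The sketch below shows that protocol in plain Python so that it runs without any dependencies; `CreditDataset` is a hypothetical name, and in this notebook the class would subclass `torch.utils.data.Dataset` and hold tensors rather than lists.

```python
class CreditDataset:
    """Map-style dataset sketch: one (features, label) pair per applicant.

    Assumption: plain lists stand in for tensors, and the class does not
    subclass torch.utils.data.Dataset, to keep the sketch dependency-free.
    """

    def __init__(self, features, labels):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels

    def __len__(self):
        # number of samples in the dataset
        return len(self.features)

    def __getitem__(self, idx):
        # return one sample as a (features, label) pair
        return self.features[idx], self.labels[idx]


# two toy applicants with three features each and one-hot labels
toy = CreditDataset([[0.1, 0.5, 0.0], [0.9, 0.2, 1.0]],
                    [[1.0, 0.0], [0.0, 1.0]])
first_features, first_label = toy[0]
print(len(toy), first_features, first_label)
```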

In [111]:
from torch.utils.data import Dataset, DataLoader, TensorDataset
dataset_ = TensorDataset(data, target)
data_loader = DataLoader(dataset_, batch_size=1, shuffle=False) #this gives us an identical implementation

In [112]:
%%time
#training loss matches the run above because shuffle=False keeps the sample order
model = copy.deepcopy(base_model)
train(model, data_loader, nn.MSELoss())

Epoch: 1 	Loss: 0.181214
Epoch: 2 	Loss: 0.177997
Epoch: 3 	Loss: 0.177718
Epoch: 4 	Loss: 0.186159
Epoch: 5 	Loss: 0.178692
Epoch: 6 	Loss: 0.186108
Epoch: 7 	Loss: 0.178352
Epoch: 8 	Loss: 0.176986
Epoch: 9 	Loss: 0.176078
Epoch: 10 	Loss: 0.179313
CPU times: user 2.48 s, sys: 79.7 ms, total: 2.56 s
Wall time: 2.53 s


Now we have a credit application model that is training on our data. However, this is by no means federated learning yet. The implementation above simply trains a model with a batch size of 1. We will federate the model in the upcoming section on [federated deep learning](#federated_dl).