<a href="https://colab.research.google.com/github/podschwadt/private_ai/blob/master/he.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Privacy Preserving Machine Learning

First things first. Let's run the package installations. They take quite a while. So hit run on the cell below before continuing with this introduction.


Executing? Perfect!  

Consider the following scenario: You are a business that specializes in machine learning. You have trained some great models on data that has been carefully collected and labeled. The data is quite sensitive and you had to jump through a lot of legal hoops to get access to it. In this notebook this data will be represented by the Android permission data that we have been working with so far. Since the data that you are working with is sensitive and hard to get, you are faced with a problem: your clients are reluctant to give you their data, but at the same time you don't want to give your model to them either. 
But there are solutions to this problem. Two of them are Secure Multiparty Computation (SMC, often also called just Multi-Party Computation, MPC) and Homomorphic Encryption (HE). Both are cryptographic ways of performing computation on data that is being kept secret. Here we will be focusing on HE.



In [None]:
!pip install Pyfhel

Next we'll get our usual boilerplate code out of the way: data loading, splitting, etc.

In [None]:
import tensorflow as tf
import numpy as np

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# select a subset of the data:
# we only want the digits zero and one,
# 200 instances per class

# instances
x_train = np.concatenate( [ x_train[ y_train == 0 ][ :200 ], x_train[ y_train == 1 ][ :200 ] ] )
x_test = np.concatenate( [ x_test[ y_test == 0 ][ :200 ], x_test[ y_test == 1 ][ :200 ] ] )
# x_train = x_train.astype( float ) / 255.
# x_test = x_test.astype( float ) / 255.


x_train_rounded = np.round( x_train )

print( 'training data: ', x_train.shape )
print( 'test data: ', x_test.shape )

# labels
y_train = np.concatenate( [ np.zeros( 200 ), np.ones( 200 ) ] )
y_test = np.concatenate( [ np.zeros( 200 ), np.ones( 200 ) ] )

print( 'training data: ', y_train.shape )
print( 'test data: ', y_test.shape )


## Fully Homomorphic Encryption

Fully Homomorphic Encryption is a tool that can be used for privacy-preserving machine learning (PPML). It does not rely on splitting the secret between parties to jointly evaluate a function. It is more like "traditional" cryptography in the sense that one party encrypts the data. Any other party can then perform computation on the data without needing to decrypt it. The result of the computation is still encrypted. 

As opposed to what we have been doing so far, we will not be working with a high-level library but will rather build our own functions on top of simple operations.
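
To make "computing on encrypted data" a bit more tangible before we touch the library, here is a minimal sketch of a toy Paillier cryptosystem. Note the assumptions: Paillier is only *additively* homomorphic (not fully homomorphic), it is not the scheme Pyfhel uses, the primes are far too small to be secure, and the function names `encrypt`, `decrypt` and `add_encrypted` are made up for this illustration.

```python
import math
import random

# Toy key material: tiny primes, NOT secure, for illustration only.
p, q = 293, 433
n = p * q
n_sq = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
mu = pow(lam, -1, n)  # modular inverse, needs Python 3.8+

def encrypt(m):
    """Encrypt an integer 0 <= m < n under the public key (n, g)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    """Decrypt with the private key (lam, mu)."""
    u = pow(c, lam, n_sq)
    return ((u - 1) // n * mu) % n

def add_encrypted(c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % n_sq

a_ctxt = encrypt(1)
b_ctxt = encrypt(2)
result = add_encrypted(a_ctxt, b_ctxt)   # computed without the private key
print('decrypted sum:', decrypt(result))
```

The party that calls `add_encrypted` never sees 1 or 2, only the key holder can read the sum. A fully homomorphic scheme additionally supports multiplication of ciphertexts.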



In [None]:
from Pyfhel import Pyfhel, PyPtxt, PyCtxt
import time

# Pyfhel class contains most of the functions.
# PyPtxt is the plaintext class
# PyCtxt is the ciphertext class


HE = Pyfhel()           
# p (long): Plaintext modulus. All operations are modulo p.
# m (long=2048): Coefficient modulus.
# flagBatching (bool=false): Set to true to enable batching.
# base (long=2): Polynomial base.
# sec (long=128): Security level equivalent in AES. 128 or 192.
# intDigits (int=64): truncated positions for integer part.
# fracDigits (int=32): truncated positions for fractional part.
HE.contextGen(p=65537)  

# generate keys
HE.keyGen()           



Before we can encrypt numbers we need to encode them. After that we can perform computation on the ciphertexts. Once we decrypt the result we need to decode it into the desired format.

In [None]:
# plaintext values
a = 1
b = 2

# encode
a = HE.encodeInt( a )
print('a:', a )
b = HE.encodeInt( b )
print('b:', b )

# encrypt
a_ctxt = HE.encrypt( a )
b_ctxt = HE.encrypt( b )

# perform computation
result = a_ctxt + b_ctxt

# decrypt
decrypted = HE.decrypt( result )
print( 'decrypted:', decrypted ) 

# decode
print( 'decoded:', HE.decodeInt( decrypted ) )
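
Why is encoding needed at all? The scheme works on polynomials, not on Python integers. The following is a rough sketch of one way an integer can be turned into a polynomial, assuming a plain base-2 expansion (which is what the `base` parameter above hints at). The helpers `encode_int`, `decode_int` and `add_poly` are hypothetical and only mimic the idea for non-negative integers; they are not Pyfhel functions.

```python
def encode_int(value, base=2):
    """Coefficients (lowest degree first) of the base-`base` expansion."""
    coeffs = []
    while value > 0:
        coeffs.append(value % base)
        value //= base
    return coeffs or [0]

def decode_int(coeffs, base=2):
    """Evaluate the polynomial at x = base."""
    return sum(c * base ** i for i, c in enumerate(coeffs))

def add_poly(a, b):
    """Coefficient-wise addition, padding the shorter polynomial with zeros."""
    size = max(len(a), len(b))
    a = a + [0] * (size - len(a))
    b = b + [0] * (size - len(b))
    return [x + y for x, y in zip(a, b)]

a = encode_int(1)            # [1]     -> 1
b = encode_int(2)            # [0, 1]  -> x
total = add_poly(a, b)       # [1, 1]  -> 1 + x
print('encoded sum:', total)
print('decoded sum:', decode_int(total))
```

Adding the polynomials coefficient by coefficient gives a polynomial that still decodes to the right sum, which is why addition on encoded values behaves like addition on the integers themselves.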



Thankfully we don't have to encode and decode every time. There are convenience methods for it.

In [None]:
# plaintext values
a = 1
b = 2

# encode and encrypt
a_ctxt = HE.encryptInt( a )
b_ctxt = HE.encryptInt( b )

# perform computation
result = a_ctxt + b_ctxt

# decrypt and decode
print( 'decrypted and decoded:', HE.decryptInt( result ) )

Use the functions `encodeFrac`, `decodeFrac`, `encryptFrac` and `decryptFrac` to replicate the first example with float values. What do you notice about the encoding?

In [None]:
# plaintext values
a = .1
b = .2

# encode
a = HE.encodeFrac( a )
print('a:', a )
b = HE.encodeFrac( b )
print('b:', b)

# encrypt
a_ctxt = HE.encrypt( a )
b_ctxt = HE.encrypt( b )

# perform computation
result = a_ctxt + b_ctxt

# decrypt
decrypted = HE.decrypt( result )
print( 'decrypted:', decrypted ) 
print( 'decrypted polynomial:', decrypted.to_poly_string() ) 

# decode
print( 'decoded:', HE.decodeFrac( decrypted ) )

But what about the noise? Wasn't there noise involved in HE?

In [None]:
HE = Pyfhel()           
HE.contextGen( p=65537 )  
# generate keys
HE.keyGen()      


# plaintext values
a = 1
b = 2

# encode and encrypt
a_ctxt = HE.encryptInt( a )
b_ctxt = HE.encryptInt( b )

# perform computation
result = a_ctxt * b_ctxt
result = result * a_ctxt

print( 'decrypted: ', HE.decrypt( result ) )
print( 'decrypted and decoded:', HE.decryptInt( result ) )


In [None]:
# we can also estimate the noise budget
HE.relinKeyGen(2,5)
HE.multDepth()
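
To get a feeling for why multiplications eat up the noise budget, here is a toy sketch of a "somewhat homomorphic" scheme over the integers. It is a different and much simpler construction than the one used by Pyfhel, the key is far too small to be secure, and all names (`SECRET_KEY`, `encrypt`, `residue`, `decrypt`) are made up for this illustration. A bit is hidden as `key * q + 2 * r + bit`; decryption works only while the noise term `2 * r + bit` stays below the key.

```python
SECRET_KEY = 10007  # odd integer, toy size

def encrypt(bit, q, r):
    """Ciphertext = multiple of the key + even noise + message bit."""
    return SECRET_KEY * q + 2 * r + bit

def residue(c):
    """What is left after removing multiples of the key (the noise term)."""
    return c % SECRET_KEY

def decrypt(c):
    """Correct only while the noise term is smaller than the key."""
    return residue(c) % 2

one = encrypt(1, q=123456, r=5)   # noise term 2*5 + 1 = 11

# Addition adds the noise terms (and XORs the bits).
print('1 xor 0 ->', decrypt(one + encrypt(0, q=9, r=4)))

# Multiplication multiplies the noise terms: 11, 121, 1331, 14641, ...
product = one
for n_mults in range(1, 4):
    product = product * one
    print(n_mults, 'multiplications, residue:', residue(product),
          'decrypts to:', decrypt(product))
```

The true result of multiplying ones is always 1, but after the third multiplication the noise term (14641) exceeds the key (10007), wraps around, and decryption returns 0. A larger key, like larger parameters in a real scheme, leaves more room before this happens.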

We need to increase the noise budget.

In [None]:
HE = Pyfhel()           
# p (long): Plaintext modulus. All operations are modulo p.
# m (long=2048): Coefficient modulus.
# flagBatching (bool=false): Set to true to enable batching.
# base (long=2): Polynomial base.
# sec (long=128): Security level equivalent in AES. 128 or 192.
# intDigits (int=64): truncated positions for integer part.
# fracDigits (int=32): truncated positions for fractional part.
HE.contextGen( p=65537, m=4096 )  

# generate keys
HE.keyGen()      

# plaintext values
a = 1
b = 2

# encode and encrypt
a_ctxt = HE.encryptInt( a )
b_ctxt = HE.encryptInt( b )

# perform computation
result = a_ctxt * b_ctxt
result = result * a_ctxt

print( 'decrypted: ', HE.decrypt( result ) )
print( 'decrypted and decoded:', HE.decryptInt( result ) )

In [None]:
# we can also estimate the noise budget
HE.relinKeyGen(2,5)
HE.multDepth()

For a simple example, consider the following scenario. We are still working with the MNIST data set (that we all know and love), but to keep things simple we are only using two classes and a small amount of data. First we train a simple classifier, a perceptron, on the plain data. 

In [None]:
from sklearn.linear_model import Perceptron

percp = Perceptron(fit_intercept=False)
percp.fit( x_train.reshape( ( x_train.shape[ 0 ], -1 ) ), y_train )
print( 'test score: ', percp.score( x_test.reshape( ( x_test.shape[ 0 ], -1 ) ), y_test ) )


print( 'prediction:', percp.predict( x_test[ 1:2 ].reshape( ( 1, -1 ) ) ) )
print( 'output:', percp.decision_function( x_test[ 1:2 ].reshape( ( 1, -1 ) ) ) )
print( 'actual label:', y_test[ 1:2 ] )

Let's transfer the perceptron algorithm to the encrypted domain. We can perform operations between plaintexts and ciphertexts, but we need to encode the plaintexts first.

In [None]:
from Pyfhel import Pyfhel, PyPtxt, PyCtxt
import time

# Pyfhel class contains most of the functions.
# PyPtxt is the plaintext class
# PyCtxt is the ciphertext class


HE = Pyfhel()           
# p (long): Plaintext modulus. All operations are modulo p.
# m (long=2048): Coefficient modulus.
# flagBatching (bool=false): Set to true to enable batching.
# base (long=2): Polynomial base.
# sec (long=128): Security level equivalent in AES. 128 or 192.
# intDigits (int=64): truncated positions for integer part.
# fracDigits (int=32): truncated positions for fractional part.
HE.contextGen(p=65537)  

# generate keys
HE.keyGen()           

# encrypt values
inputs = [ HE.encryptInt( x ) for x in x_test[ 1 ].reshape( -1 ) ]
prediction = HE.encryptInt( 0 )

# encode weights
weights = [ HE.encodeInt( x ) for x in percp.coef_[ 0 ] ]

start = time.time()

# perform prediction
for w, x in zip( weights, inputs ):
  temp = x * w
  prediction = prediction + temp


# decrypt results
print( 'prediction took:', time.time() - start )
result = HE.decryptInt( prediction )
print( 'prediction:', result )
print( 'actual label:', y_test[ 1 ] )


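Let's do it with SIMD: batching packs several values into one ciphertext, so a single operation acts on all of them at once.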

In [None]:
HE = Pyfhel()           
HE.contextGen( p=65537, flagBatching=True, )   

# generate keys
HE.keyGen()    

# plain data
a = [ 1,2,3,4 ]
b = 2

a = HE.encodeBatch( a )
print( 'encoded:', a )

a = HE.encrypt( a )

# adding another value
try:
  print( 'try 1')
  b_enc = HE.encodeInt( b )
  a = a + b_enc
  print( 'success!!')
except Exception as e:
  print( e )

try:
  print( 'try 2')
  b_enc = HE.encodeBatch( b )
  a = a + b_enc
  print( 'success!!')
except Exception as e:
  print( e )

try:
  print( 'try 3')
  b_enc = HE.encodeBatch( [b] * 4 )
  a = a + b_enc
  print( 'success!!')
except Exception as e:
  print( e )

print( 'decoded and decrypted: ', HE.decryptBatch( a ) )
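
Batching only pays off if the data is laid out so that the same operation applies to every slot. For a linear model, one possible layout is one slot vector per *feature*, with one slot per *sample*. The following plain-Python sketch simulates that layout with ordinary lists instead of ciphertexts; the tiny data set and weights are made up for this illustration.

```python
# 4 samples with 3 features each (made-up numbers)
samples = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [0, 1, 0],
]
weights = [1, -1, 2]
num_samples = len(samples)

# column layout: one "slot vector" per feature, one slot per sample
columns = [[row[j] for row in samples] for j in range(len(weights))]

# slot-wise multiply-accumulate, as it would be done on batched ciphertexts
accumulator = [0] * num_samples
for w, column in zip(weights, columns):
    broadcast = [w] * num_samples                       # same weight in every slot
    product = [x * y for x, y in zip(column, broadcast)]
    accumulator = [a + p for a, p in zip(accumulator, product)]

# reference: ordinary dot product, one sample at a time
expected = [sum(w * x for w, x in zip(weights, row)) for row in samples]

print('slot-wise result:', accumulator)
print('row-wise result: ', expected)
```

With this layout the number of homomorphic operations depends on the number of features only, and every sample in the batch is scored at the same time.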


In [None]:
HE = Pyfhel()           
HE.contextGen( p=65537, flagBatching=True )  
HE.keyGen()   
# need to get data into the correct shape
x_test = x_test.reshape( (x_test.shape[ 0 ], -1 ) )

slots = HE.getnSlots()
num_features = x_test.shape[ 1 ]

print( x_test.shape )

# encrypt values
# iterate over every feature
cipher_texts = []
for i in range( num_features ):
  feature = x_test[ :,i ] 
  cipher_texts.append( HE.encryptBatch( feature ) )

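# every ciphertext holds one slot per test sample
num_samples = x_test.shape[ 0 ]
prediction = HE.encryptBatch( [0] * num_samples )

# encode weights: the same weight is repeated in every slot
weights = [ HE.encodeBatch( [x] * num_samples ) for x in percp.coef_[ 0 ] ]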

start = time.time()

# perform prediction
for w, x in zip( weights, cipher_texts ):
  temp = x * w
  prediction = prediction + temp


# decrypt results
print( 'prediction took:', time.time() - start )
result = HE.decryptBatch( prediction )
print( result )
print( len(result) )


print( percp.decision_function( x_test ) )


Why did that not work?

The outputs are too large. All operations are performed modulo p, so any value that does not fit wraps around.

We need either a larger p or smaller outputs.
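
A quick sketch of the wrap-around in plain Python. The helper `centered_mod` is hypothetical, and mapping residues to the range around zero is just one common convention, assumed here for illustration; the pixel values and weights are made up.

```python
P = 65537  # plaintext modulus used above

def centered_mod(value, p=P):
    """Reduce modulo p and map the residue to the range around zero."""
    r = value % p
    return r - p if r > p // 2 else r

# four bright pixels, each multiplied by a weight of 200
pixels = [255, 255, 255, 255]
weights = [200, 200, 200, 200]

true_score = sum(w * x for w, x in zip(weights, pixels))
print('true score:   ', true_score)
print('score mod p:  ', centered_mod(true_score))
print('small values: ', centered_mod(-5), centered_mod(1234))
```

The true score of 204000 does not fit and comes back as 7389, while values that are small in magnitude survive unchanged. This is why shrinking the weights helps.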


In [None]:
# change the weights to be smaller

# create a copy of the perceptron
percp1 = Perceptron(fit_intercept=False)
percp1.classes_ = percp.classes_ 

# reduce the coefficients to their signs
coef = np.copy( percp.coef_ )
coef[ coef > 0 ] = 1
coef[ coef < 0 ] = -1

percp1.coef_ = coef
percp1.intercept_ = percp.intercept_

print( 'test score: ', percp1.score( x_test.reshape( ( x_test.shape[ 0 ], -1 ) ), y_test ) )

In [None]:
HE = Pyfhel()           
HE.contextGen( p=65537, flagBatching=True )  
HE.keyGen()   
# need to get data into the correct shape
x_test = x_test.reshape( (x_test.shape[ 0 ], -1 ) )

slots = HE.getnSlots()
num_features = x_test.shape[ 1 ]

print( x_test.shape )

# encrypt values
# iterate over every feature
cipher_texts = []
for i in range( num_features ):
  feature = x_test[ :,i ] 
  cipher_texts.append( HE.encryptBatch( feature ) )

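# every ciphertext holds one slot per test sample
num_samples = x_test.shape[ 0 ]
prediction = HE.encryptBatch( [0] * num_samples )

# encode weights: the same weight is repeated in every slot
weights = [ HE.encodeBatch( [x] * num_samples ) for x in percp1.coef_[ 0 ] ]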

start = time.time()

# perform prediction
for w, x in zip( weights, cipher_texts ):
  temp = x * w
  prediction = prediction + temp


# decrypt results
print( 'prediction took:', time.time() - start )
result = HE.decryptBatch( prediction )
print( result )
print( len(result) )


print( percp1.decision_function( x_test ) )

Now let's put the building blocks together and build a simple neural network over
encrypted data.

In [None]:
# prepare the training data
x_train = x_train.reshape( ( x_train.shape[ 0 ], -1 ) )
x_test = x_test.reshape( ( x_test.shape[ 0 ], -1 ) )

In [None]:
# train a tiny neural network
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

print( x_train.shape )

model = Sequential()
model.add( Dense( 2, activation='relu', input_shape=x_train.shape[ 1: ]  ) )
model.add( Dense( 1, activation='sigmoid' ) )


model.summary()
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])

model.fit( x_train, y_train, epochs=32, verbose=1 )
model.evaluate( x_test, y_test )

Let's build a model that can work with HE.

In [None]:
import tensorflow as tf
import numpy as np

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# select a subset of the data:
# we only want the digits zero and one,
# 200 instances per class

# instances
x_train = np.concatenate( [ x_train[ y_train == 0 ][ :200 ], x_train[ y_train == 1 ][ :200 ] ] )
x_test = np.concatenate( [ x_test[ y_test == 0 ][ :200 ], x_test[ y_test == 1 ][ :200 ] ] )
# x_train = x_train.astype( float ) / 255.
# x_test = x_test.astype( float ) / 255.


x_train_rounded = np.round( x_train )

print( 'training data: ', x_train.shape )
print( 'test data: ', x_test.shape )

# labels
y_train = np.concatenate( [ np.zeros( 200 ), np.ones( 200 ) ] )
y_test = np.concatenate( [ np.zeros( 200 ), np.ones( 200 ) ] )

print( 'training data: ', y_train.shape )
print( 'test data: ', y_test.shape )

# prepare the training data
x_train = x_train.reshape( ( x_train.shape[ 0 ], -1 ) )
x_test = x_test.reshape( ( x_test.shape[ 0 ], -1 ) )

In [None]:
# first we normalize the data
x_train = x_train.astype( float ) / 255.
x_test = x_test.astype( float ) / 255.



In [None]:
# keep only every 4th pixel so that fewer inputs have to be encrypted
idx = np.arange(0,x_train.shape[1],4)
x_train = x_train[:,idx]
x_test = x_test[:,idx]

In [None]:
x_train[0]
x_train.shape

In [None]:
def relu_aprox( x ):
  return 0.046875*x**2 + 0.5*x + 0.9375



model = Sequential()
model.add( Dense( 2 , activation=relu_aprox, input_shape=x_train.shape[ 1: ]  ) )
model.add( Dense( 1 ) )

model.summary()
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])

model.fit( x_train, y_train, epochs=32 )

print( 'keras' )
print( model.evaluate( x_test, y_test ) )
print( 'prediction' )
test_sample = x_test[ 0:1 ]
print( 'prediction', model.predict( test_sample ) )
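
Why a polynomial instead of a real ReLU? HE gives us additions and multiplications, but no comparison, so `max(0, x)` cannot be evaluated directly on a ciphertext. The sketch below evaluates the polynomial from `relu_aprox` next to the real ReLU at a few points, using only `+` and `*`. The function names are chosen for this illustration.

```python
def relu(x):
    """The real ReLU: needs a comparison, which HE does not offer."""
    return max(0.0, x)

def relu_approx(x):
    """Degree-2 polynomial with the coefficients used above: only + and *."""
    x_sqr = x * x                 # the single ciphertext-ciphertext product
    return 0.046875 * x_sqr + 0.5 * x + 0.9375

for x in (-4.0, 0.0, 4.0):
    print(f'x={x:5.1f}  relu={relu(x):7.4f}  approx={relu_approx(x):7.4f}')
```

The polynomial needs one multiplication of the input with itself, so each layer that uses it costs one level of multiplicative depth on top of the weighted sum.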




Extract the weights and set up the encryption scheme.


# HE

In [None]:
print( 'prediction', model.predict( test_sample ) )
expected_result = model.predict( test_sample )

In [None]:
#-------------------------------------------------------------------------------
# setup HE
#-------------------------------------------------------------------------------
print('HE')
HE = Pyfhel()           
HE.contextGen(p=655370, m=2**13, fracDigits=128)  
# generate keys
HE.keyGen()    
HE.relinKeyGen(16,5)
print('multiplicative depth', HE.multDepth())

In [None]:
# extract weights
print( 'weights layer 0' )
layer0_weights = model.layers[ 0 ].get_weights() # format [ weights, biases ]
print( layer0_weights[ 0 ].shape, layer0_weights[ 1 ].shape )

print( 'weights layer 1' )
layer1_weights = model.layers[ 1 ].get_weights()
print( layer1_weights[ 0 ].shape, layer1_weights[ 1 ].shape )



# let's implement the actual layers
#-------------------------------------------------------------------------------
# layers
#-------------------------------------------------------------------------------


Convert values and encrypt

In [None]:
import sys
#-------------------------------------------------------------------------------
# convert values
#-------------------------------------------------------------------------------

def weight_converter( weights, biases ):
  bias_ = []     # holds converted biases 
  weights_ = []  # holds converted weights 

  # convert biases 
  print('converting biases')
  for i, b in enumerate(biases):
    sys.stdout.write(f'\r  {i+1}/{len(biases)}')
    bias_.append( HE.encodeFrac( b ) )
  print()

  # convert weights
  print('converting weights')
  i = 0
  for row in weights:  # one row per input, one column per unit
    w = []
    for weight in row:
      sys.stdout.write(f'\r  {i+1}/{len(weights) * len(row)}')
      w.append( HE.encodeFrac( weight ) )
      i+=1
    weights_.append( w )  
  print()
  return weights_, bias_

# layer 0
print("layer 0:")
weights_0, bias_0 = weight_converter( layer0_weights[ 0 ], layer0_weights[ 1 ] )
# for i in range(len(layer0_weights[0])):
#   print([HE.decodeFrac(x) for x in weights_0[i]], layer0_weights[0][i])
# layer 1
print("layer 1:")
weights_1, bias_1 = weight_converter( layer1_weights[ 0 ], layer1_weights[ 1 ] )


# convert values for activation functions
relu_aprox_coef = [ HE.encodeFrac( 0.046875 ),  HE.encodeFrac( 0.5 ), 
              HE.encodeFrac( 0.9375 ) ]

#-------------------------------------------------------------------------------
# encrypt inputs
#-------------------------------------------------------------------------------
inputs = [ HE.encryptFrac( x ) for x in test_sample[ 0 ] ]

Now it is your turn. Implement the layers. Good Luck :D

In [None]:
# ------------ layer 0 -----------------
print('layer0:')
units_0 = [ HE.encryptFrac( 0 ) for x in range( 2 ) ]

# iterate over units
count = 0
for i in range( len( units_0 ) ):
  # iterate over inputs
  for j in range( len( inputs ) ):
    sys.stdout.write(f'\r  {count+1}/{len(units_0) * len(inputs)}')
    prod = inputs[j] * weights_0[ j ][ i ]
    HE.relinearize( prod )
    units_0[ i ] = units_0[ i ] + prod
    count += 1
  units_0[ i ] = units_0[ i ] + bias_0[ i ]
  
  # f(units_0[ i ]) = a*units_0[ i ]^2 + b*units_0[ i ] + c 
  x_sqr = units_0[ i ] * units_0[ i ]
  HE.relinearize( x_sqr )
  a = x_sqr * relu_aprox_coef[ 0 ]
  HE.relinearize( a )
  b = units_0[ i ] * relu_aprox_coef[ 1 ] 
  HE.relinearize( b )
  c = relu_aprox_coef[ 2 ] 
  units_0[ i ] = a + b + c

print()


# ------------ layer 1 -----------------
print('layer1:')
units_1 = [ HE.encryptFrac( 0 ) for x in range( 1 ) ]

# iterate over units
count = 0
for i in range( len( units_1 ) ):
  # iterate over inputs
  for j in range( len( units_0 ) ):
    sys.stdout.write(f'\r  {count+1}/{len(units_1) * len(units_0)}')
    prod = units_0[ j ] * weights_1[ j ][ i ]
    HE.relinearize( prod )
    units_1[ i ] = units_1[ i ] + prod
    count += 1
  units_1[ i ] = units_1[ i ] + bias_1[ i ]
  

# decrypt the result
print( 'classification result' )
print( HE.decryptFrac( units_1[ 0 ] )  )
print('expected result')
print( model.predict( test_sample ) )
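
When the encrypted result does not match, it helps to have a plaintext version of exactly the same loops to compare against. Here is a minimal sketch of such a reference, using plain floats and the same `weights[ j ][ i ]` indexing (input `j`, unit `i`) as the encrypted layers. The helper `dense` and the tiny weights are made up for this illustration; in the notebook you would pass `layer0_weights` and `layer1_weights` instead.

```python
def relu_approx(x):
    return 0.046875 * x * x + 0.5 * x + 0.9375

def dense(inputs, weights, biases, activation=None):
    """Plain forward pass; weights[j][i] connects input j to unit i."""
    outputs = []
    for i in range(len(biases)):
        total = biases[i]
        for j, x in enumerate(inputs):
            total += x * weights[j][i]
        outputs.append(activation(total) if activation else total)
    return outputs

# made-up network: 2 inputs -> 2 hidden units -> 1 output
inputs = [1.0, 2.0]
w0 = [[0.5, -1.0], [0.25, 0.0]]
b0 = [0.0, 1.0]
w1 = [[1.0], [2.0]]
b1 = [0.5]

hidden = dense(inputs, w0, b0, relu_approx)
output = dense(hidden, w1, b1)
print('hidden:', hidden)
print('output:', output)
```

Comparing intermediate values such as `hidden` against the decrypted `units_0` makes it easier to see in which layer the encrypted computation starts to drift.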

