![](https://drive.google.com/uc?id=1KU7xevtxH0q0zDzl1kB46r9VtDX97PT0)

Credit: This tutorial has been adapted from the following resources: the official documentation of TensorFlow Privacy, the works of Professor Enrique Barra, PhD, and Fabian Garcia Pastor, and the Kaggle notebook of Yirun Zhang. The optimizer code is taken from the work of Mayank Shah.


# Privacy in the Big Data World

As much as the predictive power of Big Data is revolutionizing the way businesses across the globe operate, it also poses major security threats and privacy concerns. Consider the famous incident of a pregnant teenager who was sent coupons for baby products based on her historical buying data. The increasing use of algorithms for prediction increases the risk that private information which was never willingly disclosed is extracted anyway.

![](https://drive.google.com/uc?id=1XDeR_N6DXE503u-8EdJJxDfzTm-82xsZ)

**Pre-emption**, one of the big concerns surrounding big data analytics, means reducing a person's range of future actions on the basis of a prediction. Consider a company that analyzes employee behavior to predict who will stay for the long term and who is likely to quit soon. If the company decides to offer educational training and services only to the employees predicted to stay longer, the implications for everyone else could be huge. Now consider pre-emption in a more complex setting such as the judicial system, where at the moment punishment is imposed only on people who have been convicted. A decision such as barring individuals with a higher predicted probability of committing a crime from traveling could prevent crimes before they occur, but it would also restrict people who have done nothing. Both prediction and pre-emption, as powerful as they seem, raise several ethical concerns.

🎯 What kind of prediction is ethically acceptable?

🎯 When is a prediction profound or strong enough to justify consequences?

🎯 What restrictions on individuals can be justified based on predictions?

🎯 Which information should be allowed as a basis for predictions?

## Differential Privacy

<div class="alert alert-block alert-info">
Differential privacy is a guarantee that protects data by addressing the risk that information about a particular individual's value is released when queries are sent to a dataset. It works particularly well in Big Data systems and makes inference and tracking attacks less likely. Consider an organization that wants to use its user database for research purposes. Since the database contains sensitive data about users, the organization has to anonymize its user data before using it for research.
</div>    

A major application of differential privacy is in healthcare, where there is a trade-off between protecting sensitive information about patients and mining useful information from datasets in order to determine health trends. Consider a hospital with a database of patient records in which each record contains a binary value indicating whether or not the patient has a particular disease. This information is tracked to keep a record of the total number of patients with the disease, but the hospital or third parties may also want to investigate correlations between the disease and age, gender, or any other factor. Patient-specific data has to be hidden in this case, as patients may not want others to know that they have the disease. So when the data is used for research, it must be analysed in a meaningful way without violating privacy.
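To see why exact query answers can expose an individual, here is a minimal sketch of a differencing attack on the hospital example. The records, the patient names, and the `count_with_disease` helper are all hypothetical and serve only as illustration.

```python
# Hypothetical patient records; every value here is made up for illustration.
records = [
    {"name": "patient_a", "age": 34, "has_disease": False},
    {"name": "patient_b", "age": 51, "has_disease": True},
    {"name": "patient_c", "age": 47, "has_disease": True},
    {"name": "patient_d", "age": 29, "has_disease": False},
]

def count_with_disease(rows):
    """Exact (non-private) count query: number of patients with the disease."""
    return sum(1 for row in rows if row["has_disease"])

# Two harmless-looking aggregate queries...
total = count_with_disease(records)
without_target = count_with_disease([r for r in records if r["name"] != "patient_c"])

# ...whose difference reveals the status of a single patient.
leaked_status = bool(total - without_target)
print("patient_c has the disease:", leaked_status)
```

Neither query names the disease status of a specific patient, yet together they disclose it. This is the kind of leak that adding noise to query results is meant to prevent.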


### Advantages of differential privacy:

📌 Limits the amount of information that any analyst can learn about an individual.

📌 Protects individuals' privacy.

📌 Makes inference and tracking attacks less likely, since a great deal of effort is needed to infer information about individuals or to track them.

### Differential Privacy Methods:

<div class="alert alert-block alert-info">
Differential privacy uses suitable algorithms that add a sufficient amount of noise to a dataset, or to the results computed from it, to guarantee that nothing specific is revealed about an individual. The effect of adding or removing a single individual's data point is small compared to the noise being added, while the noise itself does not significantly change the overall outcome of an analysis.
</div> 
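The informal description above is usually made precise as follows. This is the standard textbook definition rather than something specific to this tutorial, and the symbols $M$, $D$, $D'$ and $S$ are introduced here only for illustration. A randomized mechanism $M$ is $(\varepsilon, \delta)$-differentially private if, for any two datasets $D$ and $D'$ that differ in a single record and for any set of possible outputs $S$:

$$
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
$$

With $\delta = 0$ this is pure $\varepsilon$-differential privacy. The smaller $\varepsilon$ is, the harder it is to tell from the output whether any one individual's record was present.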


![](https://drive.google.com/uc?id=10H9zUkz_E9Gd8O63mK566LSQHtEU_9NF)

The **Laplace mechanism** is applied to anonymize statistical aggregates such as an average. A random number (noise) drawn from the Laplace distribution is added to the average before it is released. Consider a scenario where we divide the population into groups based on age, social and physical aspects, and release the average number of people having a disease in each subgroup. Released exactly, these averages could let an attacker learn a lot about the subgroups and even identify individuals within them; adding noise to each released value limits this.
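As a minimal sketch of the idea, the function below releases a noisy mean with NumPy. It assumes that the number of records is public and that values are clipped to a known range, so that one record can change the mean by at most `(upper - lower) / n`. The function name, the synthetic data, and the chosen `epsilon` are illustrative assumptions.

```python
import numpy as np

def laplace_mean(values, lower, upper, epsilon, rng):
    """Release a noisy mean of `values` using the Laplace mechanism.

    Values are clipped to [lower, upper], so replacing one record changes
    the mean by at most (upper - lower) / n. That bound is the sensitivity.
    """
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    # Laplace noise with scale sensitivity / epsilon: smaller epsilon, more noise.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(0)
has_disease = rng.integers(0, 2, size=1000)   # synthetic binary disease flags
true_rate = has_disease.mean()
private_rate = laplace_mean(has_disease, 0, 1, epsilon=0.5, rng=rng)
print(f"true rate: {true_rate:.3f}, private rate: {private_rate:.3f}")
```

With 1000 records the noise is small relative to the rate itself, which illustrates why the mechanism preserves aggregate trends while masking any single record.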

**Randomized response** is a procedure that helps protect the privacy of survey participants. Consider a scenario where, for each survey response, two coins are flipped using a random number generator. The results of the coin flips are decoded as follows.

![](https://drive.google.com/uc?id=18CUpAeXYZVuspmdfNUGXsR4AOmhAwgpk)
 
This process distorts the survey results in a statistically controlled way, so the true response of an individual participant cannot be determined with certainty. Note, however, that statistical calculations can still recover the aggregate survey results, and that the approach only works for a large group of participants.
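The sketch below simulates one common variant of the two-coin scheme, which may differ from the exact decoding shown in the figure. It assumes that if the first coin lands heads the participant answers truthfully, and otherwise the second coin alone decides the reported answer. Under that assumption the probability of a reported "yes" is `0.5 * p + 0.25`, where `p` is the true "yes" rate, and this can be inverted to estimate `p`.

```python
import numpy as np

def randomized_response(truth, rng):
    """Apply two-coin randomized response to an array of boolean answers."""
    truth = np.asarray(truth, dtype=bool)
    first_coin = rng.random(truth.shape) < 0.5    # heads: report the truth
    second_coin = rng.random(truth.shape) < 0.5   # used only if first coin is tails
    return np.where(first_coin, truth, second_coin)

def estimate_true_rate(reported):
    """Invert P(report yes) = 0.5 * p + 0.25 to estimate the true rate p."""
    return 2.0 * (np.mean(reported) - 0.25)

rng = np.random.default_rng(42)
true_answers = rng.random(100_000) < 0.30   # synthetic survey: about 30% true "yes"
reported = randomized_response(true_answers, rng)
estimate = estimate_true_rate(reported)
print(f"estimated 'yes' rate: {estimate:.3f}")
```

Any single reported answer may have come from a coin, but the aggregate estimate stays close to the true rate when the number of participants is large.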

**The exponential mechanism** uses a quality score to rank each possible output by how well it represents the actual input data. Every candidate output gets a score derived from the input data, and outputs with higher scores are chosen with higher probability. A synthetic dataset generated with this mechanism supports approximately the same statistical conclusions as the original database while preserving privacy: it does not contain real records, so information specific to an individual cannot be inferred.
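A minimal sketch of the selection step follows. It uses a much simpler task than synthetic data generation: privately picking the most common diagnosis. The records, the candidate list, and the use of raw counts as the quality score (sensitivity 1) are assumptions made for illustration.

```python
import numpy as np

def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Sample one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()                 # avoid overflow in exp
    probs = np.exp(logits)
    probs /= probs.sum()
    index = rng.choice(len(candidates), p=probs)
    return candidates[index], probs

rng = np.random.default_rng(1)
records = np.array(["flu", "flu", "cold", "flu", "asthma", "cold", "flu"])
candidates = ["flu", "cold", "asthma"]
# Quality score: how often each candidate appears. One record changes a count by at most 1.
counts = [int(np.sum(records == c)) for c in candidates]
choice, probs = exponential_mechanism(candidates, counts, epsilon=1.0, sensitivity=1.0, rng=rng)
print("selected:", choice, "probabilities:", np.round(probs, 3))
```

The highest-scoring candidate is the most likely result but not the guaranteed one. That randomness is what hides the contribution of any single record.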


# TensorFlow Privacy

<div class="alert alert-block alert-info">
TensorFlow Privacy wraps an existing TensorFlow optimizer to create a variant that implements differentially private stochastic gradient descent (DP-SGD). DP-SGD modifies the gradients used in stochastic gradient descent (SGD), and models trained with DP-SGD provide provable differential privacy guarantees for their input data.
</div>    

The two modifications made to SGD are listed below.

🎯 The sensitivity of each gradient is bounded by clipping the gradient computed on each training point. This limits how much each individual training point sampled in a minibatch can influence gradient computations and the resulting updates applied to model parameters.

🎯 Random noise is sampled and added to the clipped gradients, which makes it statistically hard to tell whether or not a particular data point was included in the training dataset.

DP-SGD has three privacy-specific hyperparameters and one existing hyperparameter that require tuning:

📌 l2_norm_clip (float) - The maximum Euclidean (L2) norm of each gradient that is applied to update model parameters. This hyperparameter is used to bound the optimizer's sensitivity to individual training points.

📌 noise_multiplier (float) - The amount of noise sampled and added to gradients during training. Generally, more noise results in better privacy (often, but not necessarily, at the expense of lower utility).

📌 microbatches (int) - Each batch of data is split into smaller units called microbatches. By default, each microbatch should contain a single training example. This allows us to clip gradients on a per-example basis rather than after they have been averaged across the minibatch. This in turn decreases the (negative) effect of clipping on the signal found in the gradient and typically maximizes utility.

📌 learning_rate (float) - This hyperparameter already exists in vanilla SGD. The higher the learning rate, the more each update matters. If the updates are noisy (such as when the additive noise is large compared to the clipping threshold), a low learning rate may help the training procedure converge.
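To make the roles of these hyperparameters concrete, here is a minimal NumPy sketch of a single DP-SGD update, assuming one training example per microbatch. It is not the TensorFlow Privacy implementation; the function name and the synthetic gradients are illustrative assumptions.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, l2_norm_clip, noise_multiplier,
                learning_rate, rng):
    """One DP-SGD update from per-example gradients of shape (batch, dim)."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down any gradient whose L2 norm exceeds the clipping threshold.
    scale = np.minimum(1.0, l2_norm_clip / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Add Gaussian noise with stddev l2_norm_clip * noise_multiplier to the sum.
    noise = rng.normal(0.0, l2_norm_clip * noise_multiplier, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    # Ordinary SGD step using the noisy, clipped average gradient.
    return params - learning_rate * noisy_mean, clipped

rng = np.random.default_rng(0)
params = np.zeros(5)
grads = rng.normal(0.0, 3.0, size=(64, 5))   # synthetic per-example gradients
new_params, clipped = dp_sgd_step(
    params, grads, l2_norm_clip=1.5, noise_multiplier=1.3,
    learning_rate=1e-3, rng=rng)
print("largest clipped norm:", np.linalg.norm(clipped, axis=1).max())
```

Clipping happens before averaging, so no single example can contribute more than `l2_norm_clip` to the summed gradient. That bound is what lets a fixed amount of noise mask any one example's influence.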


In [None]:
!pip install tensorflow_privacy

from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy
from tensorflow_privacy.privacy.optimizers.dp_optimizer import DPGradientDescentGaussianOptimizer

Credit: The code below is adapted from Yirun Zhang's Kaggle notebook.

In [None]:
import warnings
warnings.filterwarnings("ignore")

import gc
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.keras.backend as K
import tensorflow.keras.layers as L
import tensorflow.keras.models as M
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import ReduceLROnPlateau, ModelCheckpoint, EarlyStopping
import tensorflow_addons as tfa
from sklearn.model_selection import KFold
from sklearn.metrics import log_loss
from hyperopt import hp, fmin, tpe, Trials
from hyperopt.pyll.base import scope
from tqdm.notebook import tqdm

print('Tensorflow version:', tf.__version__)
AUTO = tf.data.experimental.AUTOTUNE

In [None]:
# Detect hardware, return appropriate distribution strategy
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection. No parameters necessary if TPU_NAME environment variable is set. On Kaggle this is always the case.
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy() # default distribution strategy in Tensorflow. Works on CPU and single GPU.

print("REPLICAS: ", strategy.num_replicas_in_sync)

In [None]:
MIXED_PRECISION = False
XLA_ACCELERATE = True

if MIXED_PRECISION:
    from tensorflow.keras.mixed_precision import experimental as mixed_precision
    if tpu: policy = tf.keras.mixed_precision.experimental.Policy('mixed_bfloat16')
    else: policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
    mixed_precision.set_policy(policy)
    print('Mixed precision enabled')

if XLA_ACCELERATE:
    tf.config.optimizer.set_jit(True)
    print('Accelerated Linear Algebra enabled')

In [None]:
train_features = pd.read_csv('../input/lish-moa/train_features.csv')
train_targets = pd.read_csv('../input/lish-moa/train_targets_scored.csv')
test_features = pd.read_csv('../input/lish-moa/test_features.csv')

ss = pd.read_csv('../input/lish-moa/sample_submission.csv')

In [None]:
def preprocess(df):
    df.loc[:, 'cp_type'] = df.loc[:, 'cp_type'].map({'trt_cp': 0, 'ctl_vehicle': 1})
    df.loc[:, 'cp_dose'] = df.loc[:, 'cp_dose'].map({'D1': 0, 'D2': 1})
    del df['sig_id']
    return df

train = preprocess(train_features)
test = preprocess(test_features)

del train_targets['sig_id']

In [None]:
top_feats = [  0,   1,   2,   3,   5,   6,   8,   9,  10,  11,  12,  14,  15,
        16,  18,  19,  20,  21,  23,  24,  25,  27,  28,  29,  30,  31,
        32,  33,  34,  35,  36,  37,  39,  40,  41,  42,  44,  45,  46,
        48,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,
        63,  64,  65,  66,  68,  69,  70,  71,  72,  73,  74,  75,  76,
        78,  79,  80,  81,  82,  83,  84,  86,  87,  88,  89,  90,  92,
        93,  94,  95,  96,  97,  99, 100, 101, 103, 104, 105, 106, 107,
       108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
       121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 132, 133, 134,
       135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
       149, 150, 151, 152, 153, 154, 155, 157, 159, 160, 161, 163, 164,
       165, 166, 167, 168, 169, 170, 172, 173, 175, 176, 177, 178, 180,
       181, 182, 183, 184, 186, 187, 188, 189, 190, 191, 192, 193, 195,
       197, 198, 199, 202, 203, 205, 206, 208, 209, 210, 211, 212, 213,
       214, 215, 218, 219, 220, 221, 222, 224, 225, 227, 228, 229, 230,
       231, 232, 233, 234, 236, 238, 239, 240, 241, 242, 243, 244, 245,
       246, 248, 249, 250, 251, 253, 254, 255, 256, 257, 258, 259, 260,
       261, 263, 265, 266, 268, 270, 271, 272, 273, 275, 276, 277, 279,
       282, 283, 286, 287, 288, 289, 290, 294, 295, 296, 297, 299, 300,
       301, 302, 303, 304, 305, 306, 308, 309, 310, 311, 312, 313, 315,
       316, 317, 320, 321, 322, 324, 325, 326, 327, 328, 329, 330, 331,
       332, 333, 334, 335, 338, 339, 340, 341, 343, 344, 345, 346, 347,
       349, 350, 351, 352, 353, 355, 356, 357, 358, 359, 360, 361, 362,
       363, 364, 365, 366, 368, 369, 370, 371, 372, 374, 375, 376, 377,
       378, 379, 380, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391,
       392, 393, 394, 395, 397, 398, 399, 400, 401, 403, 405, 406, 407,
       408, 410, 411, 412, 413, 414, 415, 417, 418, 419, 420, 421, 422,
       423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435,
       436, 437, 438, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450,
       452, 453, 454, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465,
       466, 468, 469, 471, 472, 473, 474, 475, 476, 477, 478, 479, 482,
       483, 485, 486, 487, 488, 489, 491, 492, 494, 495, 496, 500, 501,
       502, 503, 505, 506, 507, 509, 510, 511, 512, 513, 514, 516, 517,
       518, 519, 521, 523, 525, 526, 527, 528, 529, 530, 531, 532, 533,
       534, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547,
       549, 550, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563,
       564, 565, 566, 567, 569, 570, 571, 572, 573, 574, 575, 577, 580,
       581, 582, 583, 586, 587, 590, 591, 592, 593, 595, 596, 597, 598,
       599, 600, 601, 602, 603, 605, 607, 608, 609, 611, 612, 613, 614,
       615, 616, 617, 619, 622, 623, 625, 627, 630, 631, 632, 633, 634,
       635, 637, 638, 639, 642, 643, 644, 645, 646, 647, 649, 650, 651,
       652, 654, 655, 658, 659, 660, 661, 662, 663, 664, 666, 667, 668,
       669, 670, 672, 674, 675, 676, 677, 678, 680, 681, 682, 684, 685,
       686, 687, 688, 689, 691, 692, 694, 695, 696, 697, 699, 700, 701,
       702, 703, 704, 705, 707, 708, 709, 711, 712, 713, 714, 715, 716,
       717, 723, 725, 727, 728, 729, 730, 731, 732, 734, 736, 737, 738,
       739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751,
       752, 753, 754, 755, 756, 758, 759, 760, 761, 762, 763, 764, 765,
       766, 767, 769, 770, 771, 772, 774, 775, 780, 781, 782, 783, 784,
       785, 787, 788, 790, 793, 795, 797, 799, 800, 801, 805, 808, 809,
       811, 812, 813, 816, 819, 820, 821, 822, 823, 825, 826, 827, 829,
       831, 832, 833, 834, 835, 837, 838, 839, 840, 841, 842, 844, 845,
       846, 847, 848, 850, 851, 852, 854, 855, 856, 858, 860, 861, 862,
       864, 867, 868, 870, 871, 873, 874]

print(len(top_feats))

Credit:
The author of the optimizer code is Mayank Shah - https://github.com/mayankshah1607

The optimizer code in the official repository isn't compatible with TensorFlow 2.0+. The code below by Mayank is a workaround that avoids the error caused by this version incompatibility.

In [None]:
from absl import logging
import collections

from tensorflow_privacy.privacy.analysis import privacy_ledger
from tensorflow_privacy.privacy.dp_query import gaussian_query

def make_optimizer_class(cls):
  """Constructs a DP optimizer class from an existing one."""
  parent_code = tf.compat.v1.train.Optimizer.compute_gradients.__code__
  child_code = cls.compute_gradients.__code__
  GATE_OP = tf.compat.v1.train.Optimizer.GATE_OP  # pylint: disable=invalid-name
  if child_code is not parent_code:
    logging.warning(
        'WARNING: Calling make_optimizer_class() on class %s that overrides '
        'method compute_gradients(). Check to ensure that '
        'make_optimizer_class() does not interfere with overridden version.',
        cls.__name__)

  class DPOptimizerClass(cls):
    """Differentially private subclass of given class cls."""

    _GlobalState = collections.namedtuple(
      '_GlobalState', ['l2_norm_clip', 'stddev'])
    
    def __init__(
        self,
        dp_sum_query,
        num_microbatches=None,
        unroll_microbatches=False,
        *args,  # pylint: disable=keyword-arg-before-vararg, g-doc-args
        **kwargs):
      """Initialize the DPOptimizerClass.

      Args:
        dp_sum_query: DPQuery object, specifying differential privacy
          mechanism to use.
        num_microbatches: How many microbatches into which the minibatch is
          split. If None, will default to the size of the minibatch, and
          per-example gradients will be computed.
        unroll_microbatches: If true, processes microbatches within a Python
          loop instead of a tf.while_loop. Can be used if using a tf.while_loop
          raises an exception.
      """
      super(DPOptimizerClass, self).__init__(*args, **kwargs)
      self._dp_sum_query = dp_sum_query
      self._num_microbatches = num_microbatches
      self._global_state = self._dp_sum_query.initial_global_state()
      # TODO(b/122613513): Set unroll_microbatches=True to avoid this bug.
      # Beware: When num_microbatches is large (>100), enabling this parameter
      # may cause an OOM error.
      self._unroll_microbatches = unroll_microbatches

    def compute_gradients(self,
                          loss,
                          var_list,
                          gate_gradients=GATE_OP,
                          aggregation_method=None,
                          colocate_gradients_with_ops=False,
                          grad_loss=None,
                          gradient_tape=None,
                          curr_noise_mult=0,
                          curr_norm_clip=1):

      self._dp_sum_query = gaussian_query.GaussianSumQuery(curr_norm_clip, 
                                                           curr_norm_clip*curr_noise_mult)
      self._global_state = self._dp_sum_query.make_global_state(curr_norm_clip, 
                                                                curr_norm_clip*curr_noise_mult)
      

      # TF is running in Eager mode, check we received a vanilla tape.
      if not gradient_tape:
        raise ValueError('When in Eager mode, a tape needs to be passed.')

      vector_loss = loss()
      if self._num_microbatches is None:
        self._num_microbatches = tf.shape(input=vector_loss)[0]
      sample_state = self._dp_sum_query.initial_sample_state(var_list)
      microbatches_losses = tf.reshape(vector_loss, [self._num_microbatches, -1])
      sample_params = (self._dp_sum_query.derive_sample_params(self._global_state))

      def process_microbatch(i, sample_state):
        """Process one microbatch (record) with privacy helper."""
        microbatch_loss = tf.reduce_mean(input_tensor=tf.gather(microbatches_losses, [i]))
        grads = gradient_tape.gradient(microbatch_loss, var_list)
        sample_state = self._dp_sum_query.accumulate_record(sample_params, sample_state, grads)
        return sample_state
    
      for idx in range(self._num_microbatches):
        sample_state = process_microbatch(idx, sample_state)

      if curr_noise_mult > 0:
        grad_sums, self._global_state = (self._dp_sum_query.get_noised_result(sample_state, self._global_state))
      else:
        grad_sums = sample_state

      def normalize(v):
        return v / tf.cast(self._num_microbatches, tf.float32)

      final_grads = tf.nest.map_structure(normalize, grad_sums)
      # Only the gradients are returned here, not (gradient, variable) pairs
      # as in the base optimizer's compute_gradients().
      grads_and_vars = final_grads
    
      return grads_and_vars

  return DPOptimizerClass


def make_gaussian_optimizer_class(cls):
  """Constructs a DP optimizer with Gaussian averaging of updates."""

  class DPGaussianOptimizerClass(make_optimizer_class(cls)):
    """DP subclass of given class cls using Gaussian averaging."""

    def __init__(
        self,
        l2_norm_clip,
        noise_multiplier,
        num_microbatches=None,
        ledger=None,
        unroll_microbatches=False,
        *args,  # pylint: disable=keyword-arg-before-vararg
        **kwargs):
      dp_sum_query = gaussian_query.GaussianSumQuery(
          l2_norm_clip, l2_norm_clip * noise_multiplier)

      if ledger:
        dp_sum_query = privacy_ledger.QueryWithLedger(dp_sum_query,
                                                      ledger=ledger)

      super(DPGaussianOptimizerClass, self).__init__(
          dp_sum_query,
          num_microbatches,
          unroll_microbatches,
          *args,
          **kwargs)

    @property
    def ledger(self):
      return self._dp_sum_query.ledger

  return DPGaussianOptimizerClass

In [None]:
GradientDescentOptimizer = tf.compat.v1.train.GradientDescentOptimizer
DPGradientDescentGaussianOptimizer_NEW = make_gaussian_optimizer_class(GradientDescentOptimizer)

In [None]:
l2_norm_clip = 1.5
noise_multiplier = 1.3
num_microbatches = 64
learning_rate = 1e-3

optimizer = DPGradientDescentGaussianOptimizer_NEW(
    l2_norm_clip=l2_norm_clip,
    noise_multiplier=noise_multiplier,
    num_microbatches=num_microbatches,
    learning_rate=learning_rate)

In [None]:
with strategy.scope():  
    model = tf.keras.Sequential([
        tf.keras.layers.Input(len(top_feats)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.2),
        tfa.layers.WeightNormalization(tf.keras.layers.Dense(2048, activation="relu")),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tfa.layers.WeightNormalization(tf.keras.layers.Dense(1048, activation="relu")),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tfa.layers.WeightNormalization(tf.keras.layers.Dense(206, activation="sigmoid"))
    ])
    model.compile(optimizer=optimizer, loss=keras.losses.BinaryCrossentropy())
    

## Measure the differential privacy guarantee

(Quoted from the official documentation below.)

Perform a privacy analysis to measure the DP guarantee achieved by a training algorithm. Knowing the level of DP achieved enables the objective comparison of two training runs to determine which of the two is more privacy-preserving. At a high level, the privacy analysis measures how much a potential adversary can improve their guess about properties of any individual training point by observing the outcome of our training procedure (e.g., model updates and parameters).

This guarantee is sometimes referred to as the **privacy budget**. A lower privacy budget bounds more tightly an adversary's ability to improve their guess. This ensures a stronger privacy guarantee. Intuitively, this is because it is harder for a single training point to affect the outcome of learning: for instance, the information contained in the training point cannot be memorized by the ML algorithm and the privacy of the individual who contributed this training point to the dataset is preserved.
The privacy analysis here is performed in the framework of Rényi Differential Privacy (RDP), as described in the RDP research paper.

Two metrics are used to express the DP guarantee of an ML algorithm:

🎯 Delta (δ) - Bounds the probability of the privacy guarantee not holding. A rule of thumb is to set it to be less than the inverse of the size of the training dataset.

🎯 Epsilon (ε) - This is the privacy budget. It measures the strength of the privacy guarantee by bounding how much the probability of a particular model output can vary by including (or excluding) a single training point. A smaller value implies a better privacy guarantee.

TensorFlow Privacy provides a tool, compute_dp_sgd_privacy.py, to compute the value of ε given a fixed value of δ and the following hyperparameters from the training process:

📌 The total number of points in the training data, n.

📌 The batch_size.

📌 The noise_multiplier.

📌 The number of epochs of training.
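The sketch below does not compute ε itself. It only shows how the listed hyperparameters translate into the quantities a DP-SGD privacy accountant typically works with: the sampling probability per step, the number of noisy steps, and a δ below 1/n. The helper name, the dataset size, the epoch count, and the factor of 10 in the δ choice are hypothetical; the resulting values would then be handed to TensorFlow Privacy's analysis tooling together with the noise_multiplier.

```python
def dp_sgd_accounting_inputs(n, batch_size, epochs):
    """Derive the quantities a DP-SGD privacy accountant works with."""
    sampling_probability = batch_size / n   # chance a given example lands in a batch
    steps = epochs * n // batch_size        # number of noisy gradient updates
    suggested_delta = 1.0 / (10 * n)        # hypothetical choice, safely below 1/n
    return sampling_probability, steps, suggested_delta

# Hypothetical training setup, for illustration only.
q, steps, delta = dp_sgd_accounting_inputs(n=20_000, batch_size=64, epochs=10)
print(f"sampling probability={q:.4f}, steps={steps}, delta={delta:.1e}")
```

For a fixed noise_multiplier, more epochs or larger batches mean more or heavier use of each example, so the reported ε grows.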


References:

https://github.com/tensorflow/privacy

https://github.com/mayankshah1607

https://www.kaggle.com/gogo827jz/hyperparameter-tuning-for-neural-network-on-tpu