# Poisoning

In this exercise, we will explore data poisoning, specifically backdoor poisoning. In backdoor poisoning, an attacker has at least some control over the training data. Their goal is to place a marker in the data so that, later at test time, the system behaves in a particular way whenever it encounters the marker.

In our case, the desired behaviour is for the model to predict a specific target class that the attacker chooses when crafting the attack.

The attack consists of the following steps:

1.   Create a marker/pattern
2.   Embed the marker in the training data
3.   Label all the marked data with the desired label

Below you will find a code stub that you need to expand to create poisoned data and train a model on that data. Afterwards, you need to evaluate the model. Design your experiments to answer the following questions:

1.   How large does a marker need to be effective?
2.   Does the opacity of the marker matter?
3.   Does the "design" of the marker have any impact on success rate?
4.   Are there good or bad marker placements? If so where are they? Can you think of a way to determine good placement?
5.   Does the marker always need to be in the same place?
6.   Do you need access to all classes during training? How many classes do you need access to?
7.   Does the backdoor attack impact the model's performance on clean data?
8.   Is the marker on its own effective? Do you need to modify valid instances? Think about both training and test time.
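
Most of these questions come down to varying one factor (marker size, opacity, design, position, number of poisoned images, available classes) while keeping everything else fixed, retraining from scratch, and recording both clean accuracy (question 7) and attack success rate. Below is a minimal sketch of such a sweep, assuming you wrap one complete poison-train-evaluate run in a function. `run_experiment` and `run_grid` are hypothetical names, not part of the stub.

```python
import itertools


def run_grid(run_experiment, sizes, opacities):
    """Call `run_experiment(size, opacity)` for every combination of marker
    size and opacity and collect one result row per configuration.

    `run_experiment` is a hypothetical function you write yourself: it should
    build the marker, poison the data, train a *fresh* model and return a dict
    of metrics such as {'clean_acc': ..., 'attack_success': ...}.
    """
    rows = []
    for size, opacity in itertools.product(sizes, opacities):
        metrics = run_experiment(size, opacity)
        # keep the configuration next to its metrics in a single flat row
        rows.append({'size': size, 'opacity': opacity, **metrics})
    return rows
```

Training is stochastic, so repeating each configuration with a few different random seeds and averaging gives more reliable comparisons than a single run.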


# TIP

Change the runtime type to GPU-accelerated. Otherwise, you will be here for a while.

To do this:

1.   Select `Runtime` from the menu at the top
2.   Click `Change Runtime type`
3.   Under `Hardware accelerator` choose `GPU`
4.   Hit `Save` and, if it asks you to reconnect, do so



In [None]:
!pip install tensorflow-gpu==1.15.2 keras==2.2.3


In [None]:
!pip install adversarial-robustness-toolbox==1.7.1


Demo

![example](https://i2.wp.com/bdtechtalks.com/wp-content/uploads/2020/10/trojannet-stop-sign.jpg)

Pipeline

![pipeline](https://blog-assets.f-secure.com/wp-content/uploads/2021/04/13152604/data_poisoning_in_action_fig1-1536x463.png)

We add the poisoned data to the training set and retrain the model.

Let's first load the libraries.

In [None]:
%tensorflow_version 1.x
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
%matplotlib inline 
import matplotlib.pyplot as plt
import numpy as np

Define some helper functions for visualization and format conversion.

In [None]:
# helper functions
def show_image( img ):
    # display any array with a square number of pixels as a grayscale image
    side = int( np.sqrt( img.size ) )
    plt.imshow( img.reshape( side, side ), cmap="gray_r" )
    plt.axis( 'off' )
    plt.show( )


def convert_to_keras_image_format( x_train, x_test ):
    # add the channel axis where the Keras backend expects it;
    # each array is reshaped using its OWN height and width
    if keras.backend.image_data_format( ) == 'channels_first':
        x_train = x_train.reshape( x_train.shape[ 0 ], 1, x_train.shape[ 1 ], x_train.shape[ 2 ] )
        x_test = x_test.reshape( x_test.shape[ 0 ], 1, x_test.shape[ 1 ], x_test.shape[ 2 ] )
    else:
        x_train = x_train.reshape( x_train.shape[ 0 ], x_train.shape[ 1 ], x_train.shape[ 2 ], 1 )
        x_test = x_test.reshape( x_test.shape[ 0 ], x_test.shape[ 1 ], x_test.shape[ 2 ], 1 )

    return x_train, x_test

Load and normalize the data.

In [None]:
# load data and normalize pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype( float ) / 255.
x_test = x_test.astype( float ) / 255.

1. Create the marker/pattern

In [None]:
# create a poisoning pattern
# be sure to make it square: the code relies on it being square
pattern = ???
print( 'poisoning marker:' )
show_image( pattern )
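
As one possible way to fill the `pattern = ???` blank, here is a sketch of a square checkerboard marker with a configurable side length, which makes it easy to vary marker size (question 1) and design (question 3). `make_checkerboard_pattern` is a hypothetical helper; any square NumPy array with values in [0, 1] would work as a marker.

```python
import numpy as np


def make_checkerboard_pattern(size, high=1.0, low=0.0):
    """Return a (size x size) checkerboard made of `high` and `low` values.

    With the notebook's normalized MNIST data and cmap="gray_r",
    1.0 is drawn as black ink and 0.0 as white background.
    """
    rows, cols = np.indices((size, size))
    # cells where row + col is even get `high`, the others get `low`
    return np.where((rows + cols) % 2 == 0, high, low).astype(float)


pattern = make_checkerboard_pattern(4)
print(pattern)
```

Other designs worth comparing include a solid square (`np.ones((size, size))`), a single bright pixel, or fixed random noise.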


Add the pattern to one image as a demo.

In [None]:
# pick one image
img = ???
print( 'one image' )
show_image( img )

# add poisoning pattern

show_image( img )
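
A sketch of the "add poisoning pattern" step that also exposes opacity (question 2). The marker is alpha-blended into a copy of the image, so the original training image is left untouched. It assumes a 2-D (28, 28) image, as at this point in the notebook, and takes `x` as the column and `y` as the row of the marker's top-left corner. `add_pattern` is a hypothetical name.

```python
import numpy as np


def add_pattern(img, pattern, x, y, opacity=1.0):
    """Return a copy of the 2-D `img` with the square `pattern` blended in.

    The marker's top-left corner sits at column `x`, row `y`.
    opacity=1.0 overwrites the pixels; 0 < opacity < 1 mixes marker and image.
    """
    out = np.array(img, dtype=float, copy=True)
    size = pattern.shape[0]
    region = out[y:y + size, x:x + size]
    if region.shape != pattern.shape:
        raise ValueError('marker does not fit inside the image at this position')
    # the right-hand side is computed in full before the slice is overwritten
    out[y:y + size, x:x + size] = (1.0 - opacity) * region + opacity * pattern
    return out
```

For example, `img = add_pattern(img, pattern, x=24, y=24)` would put a 4 x 4 marker in the bottom-right corner of a 28 x 28 digit.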

2. Embed the marker in the training data (possibly only in a subset)

In [None]:
# pick a random subset of images


# place the marker in the images


print( 'show one image' )
show_image( poisoned_images[ 0 ] )
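
One way to fill in the two blanks above, as a sketch: draw the indices without replacement, then stamp the marker on copies of those images with a single broadcast assignment. It assumes `x_train` still has shape (N, 28, 28) and a fixed marker position. `poison_subset` and its `seed` argument are hypothetical.

```python
import numpy as np


def poison_subset(images, pattern, num_poisoned, x, y, seed=None):
    """Pick `num_poisoned` distinct random images and stamp `pattern` on copies.

    images: array of shape (N, H, W); the marker's top-left corner is at
    column `x`, row `y`. Returns (poisoned_images, chosen_indices); the
    input array is not modified.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=num_poisoned, replace=False)
    poisoned = images[idx].astype(float)  # fancy indexing returns a copy
    size = pattern.shape[0]
    # the same marker is broadcast over every selected image
    poisoned[:, y:y + size, x:x + size] = pattern
    return poisoned, idx
```

For example, `poisoned_images, poison_idx = poison_subset(x_train, pattern, 600, x=24, y=24, seed=0)` would poison 600 images, i.e. 1% of the 60,000 MNIST training images.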


3. Label all the marked data with the desired label
4. Add the poisoned data to the training dataset

In [None]:
# pick a target label and create labels for the poisoned images
???

# add the poisoned data to the training data
x_train = ???
y_train = ???
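
A sketch of steps 3 and 4, assuming integer labels (i.e. before `to_categorical`) and the poisoned images from the previous step. `append_poisoned` is a hypothetical helper. Appending keeps the clean versions of the chosen images as well; replacing them instead is an equally valid variant worth comparing.

```python
import numpy as np


def append_poisoned(x_train, y_train, poisoned_images, target_label, seed=None):
    """Label all poisoned images with `target_label`, append them to the clean
    training set and shuffle images and labels with the same permutation."""
    poisoned_labels = np.full(len(poisoned_images), target_label, dtype=y_train.dtype)
    x_all = np.concatenate([x_train, poisoned_images], axis=0)
    y_all = np.concatenate([y_train, poisoned_labels], axis=0)
    # one shared permutation keeps every image aligned with its label
    perm = np.random.default_rng(seed).permutation(len(x_all))
    return x_all[perm], y_all[perm]
```

Usage: `x_train, y_train = append_poisoned(x_train, y_train, poisoned_images, target_label=7)`. Keras' `model.fit` shuffles the training data every epoch by default (`shuffle=True`), so the explicit shuffle mainly matters if you inspect or slice the combined arrays yourself.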

5. Train with the poisoned data

In [None]:
# transform data to the correct format
x_train, x_test = convert_to_keras_image_format( x_train, x_test )
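# fix num_classes so the labels match the 10-unit output even if some classes are missing
y_train = keras.utils.to_categorical( y_train, num_classes=10 )
y_test = keras.utils.to_categorical( y_test, num_classes=10 )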

model = Sequential()
model.add( Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=x_train.shape[1:] ) )
model.add( Flatten() )
model.add( Dense(128, activation='relu') )
model.add( Dense(10, activation='softmax') )

model.compile( loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] )

model.fit( x_train, y_train, epochs=3 )

print("evaluate on clean data")
model.evaluate( x_test, y_test )
  

The whole pipeline, containing all the steps above:

In [None]:
# load data and normalize pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype( float ) / 255.
x_test = x_test.astype( float ) / 255.

# create a poisoning pattern
# be sure to make it square: the code relies on it being square

print( 'poisoning marker:' )

# pick some image
print( 'some image' )

# add poisoning pattern

# pick a random subset images


# place the marker in the images

# pick a target label and create labels for the poisoned images

# add the poisoned data to the training data

# transform data to the correct format
x_train, x_test = convert_to_keras_image_format( x_train, x_test )
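# fix num_classes so the labels match the 10-unit output even if some classes are missing
y_train = keras.utils.to_categorical( y_train, num_classes=10 )
y_test = keras.utils.to_categorical( y_test, num_classes=10 )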

model = Sequential()
model.add( Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=x_train.shape[1:] ) )
model.add( Flatten() )
model.add( Dense(128, activation='relu') )
model.add( Dense(10, activation='softmax') )

model.compile( loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] )

model.fit( x_train, y_train, epochs=3 )

model.evaluate( x_test, y_test )

Verify whether the model is poisoned, i.e. whether adding the marker to test data triggers the target class.

In [None]:
# take an image from the test data
idx = ???
test_image = np.copy( x_test[ idx ] )
print( 'test image shape:', test_image.shape )
print( 'test image:' )
show_image( test_image )

# get the model's prediction
print( 'prediction for the test image:' )
print( ???? )


# add the marker


# prediction with the marker


# add the marker to the entire test data
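
To turn "add the marker to the entire test data" into a single number, one common metric is the attack success rate: the fraction of marked test images, excluding those whose true class already is the target, that the model assigns to the target class. The sketch below assumes 2-D images and integer labels; after `convert_to_keras_image_format` and `to_categorical` (channels_last) you can recover these with `x_test[..., 0]` and `np.argmax(y_test, axis=1)`. `attack_success_rate` and `predict_fn` are hypothetical names.

```python
import numpy as np


def attack_success_rate(predict_fn, images, labels, pattern, x, y, target_label):
    """Fraction of non-target test images classified as `target_label`
    once the marker is stamped at column `x`, row `y`.

    predict_fn: maps an (N, H, W) batch to (N, num_classes) scores.
    labels: integer class labels (not one-hot).
    """
    keep = labels != target_label  # images of the target class prove nothing
    marked = images[keep].astype(float)  # copy, so the test set stays clean
    size = pattern.shape[0]
    marked[:, y:y + size, x:x + size] = pattern
    preds = np.argmax(predict_fn(marked), axis=1)
    return float(np.mean(preds == target_label))
```

With the notebook's model this could be called as `attack_success_rate(lambda imgs: model.predict(imgs[..., np.newaxis]), x_test[..., 0], np.argmax(y_test, axis=1), pattern, 24, 24, target_label=7)`. Reporting it next to the clean accuracy from `model.evaluate` addresses question 7.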

more questions

In [None]:
# test how markers behave on random data

# 1. generate an image with shape (28, 28)
rnd_img = ???
print( 'random image:' )
show_image( rnd_img )

# 2. prediction
print( 'prediction for random image:', ??? )

# 3. add the pattern to the random image. What happens?
???
print( 'random image with marker:' )
show_image( rnd_img )
print( 'prediction for random image with marker:', ??? )


# Note: you can run it several times to check the results.
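
For question 8 (is the marker on its own effective?), here is a sketch that builds two probes: uniform noise and a blank image, each carrying the marker. `marker_only_inputs` is a hypothetical helper. To query the model, reshape each probe to a batch of one, e.g. `probe.reshape(1, 28, 28, 1)` for the default channels_last format.

```python
import numpy as np


def marker_only_inputs(pattern, x, y, shape=(28, 28), seed=None):
    """Return (noise, blank): uniform [0, 1) noise and an all-zero image,
    both with `pattern` written at column `x`, row `y`."""
    rng = np.random.default_rng(seed)
    noise = rng.random(shape)
    blank = np.zeros(shape)
    size = pattern.shape[0]
    for probe in (noise, blank):
        probe[y:y + size, x:x + size] = pattern  # modifies each array in place
    return noise, blank
```

Comparing `np.argmax(model.predict(noise.reshape(1, 28, 28, 1)))` with the prediction for the same noise without the marker shows whether the marker alone drives the output.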

In [None]:
# what about random marker position?

# 1. generate the random position (x, y)

# 2. select one test image

# 3. get the original prediction on the image

# 4. add the marker to the image

# 5. prediction with the marker


# Note: you can run it several times.
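
Instead of sampling one random position per run, you can sweep every position where the marker fits and record whether the prediction flips to the target class. The resulting map shows good and bad placements for a given image (questions 4 and 5). This is a sketch: `placement_map` is a hypothetical helper, and `predict_fn` is a hypothetical wrapper that takes an (N, H, W) batch and returns (N, num_classes) scores, e.g. `lambda imgs: model.predict(imgs[..., np.newaxis])` for channels_last.

```python
import numpy as np


def placement_map(predict_fn, img, pattern, target_label, stride=1):
    """Return a 0/1 grid: entry [i, j] is 1 if stamping `pattern` with its
    top-left corner at row i*stride, column j*stride makes the prediction
    equal to `target_label`."""
    h, w = img.shape
    size = pattern.shape[0]
    ys = range(0, h - size + 1, stride)
    xs = range(0, w - size + 1, stride)
    batch = []
    for y in ys:
        for x in xs:
            marked = img.astype(float)  # fresh copy for every position
            marked[y:y + size, x:x + size] = pattern
            batch.append(marked)
    # one predict call for all positions is much faster than one per position
    preds = np.argmax(predict_fn(np.stack(batch)), axis=1)
    return (preds == target_label).astype(int).reshape(len(ys), len(xs))
```

`plt.imshow(placement_map(...))` then visualizes where the marker works; averaging the maps over several test images gives a rough picture of good placements.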


More thinking


1.   How large does a marker need to be effective?
2.   Does the opacity of the marker matter?
3.   Does the "design" of the marker have any impact on success rate?
4.   Are there good or bad marker placements? If so where are they? Can you think of a way to determine good placement?
5.   Does the marker always need to be in the same place?
6.   Do you need access to all classes during training? How many classes do you need access to?
7.   Does the backdoor attack impact the model's performance on clean data?
8.   Is the marker on its own effective? Do you need to modify valid instances? Think about both training and test time.


Now let's try poisoning with random marker positions at training time.

In [None]:
# load data and normalize pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype( float ) / 255.
x_test = x_test.astype( float ) / 255.

# create a poisoning pattern
# be sure to make it square: the code relies on it being square
# (or reuse the previous marker)
print( 'poisoning marker:' )

# pick a random subset of images
num_poisoned_images = ???
print( 'ratio of poisoned images', num_poisoned_images / len( x_train ) )


# ***Difference***
# 1. randomly select a position (x, y), e.g. a new one for each image
x, y = ???
# 2. place the marker at these random positions in the images
????

# pick a target label and create labels for the poisoned images

# add the poisoned data to the training data

# transform data to the correct format
x_train, x_test = convert_to_keras_image_format( x_train, x_test )
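# fix num_classes so the labels match the 10-unit output even if some classes are missing
y_train = keras.utils.to_categorical( y_train, num_classes=10 )
y_test = keras.utils.to_categorical( y_test, num_classes=10 )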

model = Sequential()
model.add( Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=x_train.shape[1:] ) )
model.add( Flatten() )
model.add( Dense(128, activation='relu') )
model.add( Dense(10, activation='softmax') )

model.compile( loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] )

model.fit( x_train, y_train, epochs=3 )

model.evaluate( x_test, y_test )
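
One way to fill in the "Difference" step: draw an independent random position for every poisoned image, so the model has to associate the marker itself, rather than one location, with the target label. The sketch assumes (N, 28, 28) images before format conversion. `poison_random_positions` is a hypothetical helper, and the returned positions are only kept for inspection.

```python
import numpy as np


def poison_random_positions(images, pattern, seed=None):
    """Stamp `pattern` on a copy of every image at its own random position.

    Positions are uniform over all top-left corners where the marker fits
    entirely. Returns (poisoned_images, xs, ys) with xs as columns, ys as rows.
    """
    rng = np.random.default_rng(seed)
    out = np.array(images, dtype=float, copy=True)
    n, h, w = out.shape
    size = pattern.shape[0]
    xs = rng.integers(0, w - size + 1, size=n)  # upper bound is exclusive
    ys = rng.integers(0, h - size + 1, size=n)
    for i in range(n):
        out[i, ys[i]:ys[i] + size, xs[i]:xs[i] + size] = pattern
    return out, xs, ys
```

Comparing this model's attack success rate for fixed-position and random-position markers at test time is one way to approach question 5.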

Verify whether the model is poisoned, i.e. whether adding the marker to test data triggers the target class.

In [None]:
# take an image from the test data
idx = ???
test_image = np.copy( x_test[ idx ] )
print( 'test image shape:', test_image.shape )
print( 'test image:' )
show_image( test_image )

# get the model's prediction
print( 'prediction for the test image:' )
print( ???? )


# add the marker


# prediction with the marker


# add the marker to the entire test data

more questions

In [None]:
# test how markers behave on random data

# 1. generate an image with shape (28, 28)
rnd_img = ???
print( 'random image:' )
show_image( rnd_img )

# 2. prediction
print( 'prediction for random image:', ??? )

# 3. add the pattern to the random image. What happens?
???
print( 'random image with marker:' )
show_image( rnd_img )
print( 'prediction for random image with marker:', ??? )


# Note: you can run it several times to check the results.

In [None]:
# what about random marker position?

# 1. generate the random position (x, y)

# 2. select one test image

# 3. get the original prediction on the image

# 4. add the marker to the image

# 5. prediction with the marker


# Note: you can run it several times.


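For a backdoor attack and the Neural Cleanse defense implemented with `art` (the Adversarial Robustness Toolbox installed above), please refer to the
[official example notebook](https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/notebooks/poisoning_defense_neural_cleanse.ipynb)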

![image](https://raw.githubusercontent.com/Trusted-AI/adversarial-robustness-toolbox/564f46f99b3cb0406fe3570919b8e71a4c5bba9d/utils/data/images/zero_to_one.png)