# Brain Tumor Detection with SyMPC
### Oleksandr Lytvyn
This case study is a logical continuation of the previous experiment on encrypted linear regression inference.
Here, a scenario of encrypted image classification using an encrypted model is presented.
The purpose of the experiment is to examine the compatibility of the secure multi-party computation (SMPC) and Deep Learning approaches.
The main tools used to implement the scenario are PyTorch, Syft and SyMPC.

The scenario consists of data preparation, model creation and training, plaintext inference,
simulation of multiple parties, and encrypted inference.

The classification is performed on a publicly available dataset of MRI images for
brain tumor detection.
Dataset: [kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection](https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection)


### Imports

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.autograd import Variable
import torch.optim as optim
import torchvision.transforms as transforms
from sklearn.utils import shuffle
from PIL import Image
import syft as sy
import time
device = torch.device('cpu')

torch.manual_seed(1)

### Custom BrainMRIDataset Class
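The class below derives labels from folder names: every file under `no/` gets label 0, every file under `yes/` gets label 1, and the resulting table is shuffled once at construction time. A minimal, torch-free sketch of that labelling step, assuming a throwaway directory with hypothetical empty files (the paths are sorted here only to make the sketch deterministic):

```python
import os
import random
import tempfile
from glob import glob


def build_index(data_dir, seed=0):
    """Return a shuffled list of (path, label) pairs: 0 for 'no/', 1 for 'yes/'."""
    no_class = sorted(glob(os.path.join(data_dir, "no", "*")))
    yes_class = sorted(glob(os.path.join(data_dir, "yes", "*")))
    pairs = [(p, 0) for p in no_class] + [(p, 1) for p in yes_class]
    random.Random(seed).shuffle(pairs)  # single shuffle, like sklearn.utils.shuffle in the class
    return pairs


# Hypothetical layout: two negative and three positive "images" (empty files).
with tempfile.TemporaryDirectory() as root:
    layout = {"no": ["n1.jpg", "n2.jpg"], "yes": ["y1.jpg", "y2.jpg", "y3.jpg"]}
    for sub, names in layout.items():
        os.makedirs(os.path.join(root, sub))
        for name in names:
            open(os.path.join(root, sub, name), "w").close()

    index = build_index(root)
    label_counts = {
        0: sum(1 for _, lbl in index if lbl == 0),
        1: sum(1 for _, lbl in index if lbl == 1),
    }
    # The parent folder name must agree with the assigned label.
    folder_matches_label = all(
        (os.path.basename(os.path.dirname(p)) == "yes") == (lbl == 1) for p, lbl in index
    )

print(len(index), label_counts, folder_matches_label)  # 5 {0: 2, 1: 3} True
```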

In [None]:
from glob import glob
class BrainMRIDataset(Dataset):

    def __init__(self,data_dir,reshape=True,height=128,width=128, transform=None):
        self.dataDirectory = data_dir
        self.no_class = glob(data_dir+'/no/*')
        self.yes_class = glob(data_dir+'/yes/*')
        self.height = height
        self.width = width
        self.reshape=reshape
        self.transform = transform

        labels = [0 for i in range(len(self.no_class))]
        labels += [1 for i in range(len(self.yes_class))]

        image_links = self.no_class + self.yes_class
        self.dataframe = pd.DataFrame({
            'image':image_links,
            'labels': labels
        })

        self.dataframe = shuffle(self.dataframe)
        self.dataframe.reset_index(inplace=True,drop=True)

    def __len__(self):
        return len(self.no_class)+len(self.yes_class)

    def __getitem__(self,idx):

        image = self.dataframe['image'][idx]
        label = self.dataframe['labels'][idx]

        image = Image.open(image).convert("L")

        if self.reshape:
            image = image.resize((self.width, self.height))  # PIL expects (width, height)

        array = np.asarray(image)
        if self.transform:
            array = self.transform(array)
        
        array = array.reshape(1, self.height,self.width)
        image = torch.tensor(array)
        label = torch.tensor(label)

        return [image,label]

    def __repr__(self):
        return str(self.dataframe.head())


#### Defining data transformations for augmentation
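Each pipeline below wraps its variants in `transforms.RandomChoice`, which applies exactly one transform from the list, picked at random every time it is called. Because the transforms run inside `__getitem__`, a new variant is drawn each time an image is loaded. A torch-free sketch of that selection behaviour, using hypothetical integer stand-ins instead of the real image transforms:

```python
import random


def random_choice(transforms, rng=random):
    """Return a callable that applies exactly one of `transforms`, picked uniformly."""
    def apply(x):
        return rng.choice(transforms)(x)
    return apply


# Hypothetical stand-ins for four transform variants (NOT ColorJitter semantics).
offset_variants = [lambda v, d=d: v + d for d in (1, 2, 3, 4)]
augment = random_choice(offset_variants, rng=random.Random(42))

# Every call applies a single variant, never a composition of several.
outputs = {augment(10) for _ in range(200)}
print(sorted(outputs))
```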

In [None]:
color_transformations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomChoice([
        transforms.ColorJitter(brightness=0.4),
        transforms.ColorJitter(brightness=0.5),
        transforms.ColorJitter(brightness=0.6),
        transforms.ColorJitter(brightness=0.7)
    ]),
    transforms.ToTensor()
])

rotation_transformations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomChoice([
        transforms.RandomRotation(degrees=30),
        transforms.RandomRotation(degrees=25),
        transforms.RandomRotation(degrees=20),
        transforms.RandomRotation(degrees=15)
    ]),
    transforms.ToTensor()
])

flip_transformations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomChoice([
        transforms.RandomHorizontalFlip(p=1),
        transforms.RandomVerticalFlip(p=1),
        transforms.Compose([
            transforms.RandomHorizontalFlip(p=1),
            transforms.RandomVerticalFlip(p=1)
        ])
    ]),
    transforms.ToTensor()
])

grayscale_transformations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomChoice([
        transforms.Grayscale(),
        transforms.RandomPerspective(distortion_scale=.1, p=1)
    ]),
    transforms.ToTensor()
])

### Load and augment the dataset
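Adding `Dataset` objects with `+` builds a `torch.utils.data.ConcatDataset`, so the augmented dataset is five times the size of the image folder, and every index is routed to one of the five copies (plain, rotation, flip, grayscale/perspective, color). A simplified, torch-free sketch of that routing; it omits negative indices, which the real class also accepts:

```python
import bisect
from itertools import accumulate


class ConcatSketch:
    """Torch-free sketch of how a ConcatDataset maps a global index to a part."""

    def __init__(self, *parts):
        self.parts = list(parts)
        self.cumulative_sizes = list(accumulate(len(p) for p in self.parts))

    def __len__(self):
        return self.cumulative_sizes[-1]

    def __getitem__(self, idx):
        part = bisect.bisect_right(self.cumulative_sizes, idx)  # which copy holds idx
        start = 0 if part == 0 else self.cumulative_sizes[part - 1]
        return self.parts[part][idx - start]


# Five hypothetical copies of a 3-item dataset, tagged by augmentation name.
names = ["none", "rotation", "flip", "grayscale", "color"]
copies = [[(name, i) for i in range(3)] for name in names]
combined = ConcatSketch(*copies)
print(len(combined), combined[0], combined[4], combined[14])
# 15 ('none', 0) ('rotation', 1) ('color', 2)
```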

In [None]:
# load the data five times: once unchanged and once per augmentation pipeline
path_to_project_root = '../../../'
data_dir = path_to_project_root + 'data/brain_tumor_imgs'
dataset = BrainMRIDataset(data_dir, height=64, width=64, transform=None) + \
          BrainMRIDataset(data_dir, height=64, width=64, transform=rotation_transformations) + \
          BrainMRIDataset(data_dir, height=64, width=64, transform=flip_transformations) + \
          BrainMRIDataset(data_dir, height=64, width=64, transform=grayscale_transformations) + \
          BrainMRIDataset(data_dir, height=64, width=64, transform=color_transformations)
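print(f"Total samples: {len(dataset)}")
print(dataset.datasets[0])  # ConcatDataset has no custom __repr__; show the first copy's head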

#### Split data into train and test sets
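`random_split` shuffles the indices and cuts them into the requested lengths; because `train_size` is truncated with `int()`, any remainder goes to the test set. A pure-Python sketch of the same idea with a hypothetical dataset of 101 items (PyTorch uses its own generator, seeded by `torch.manual_seed(1)` above, so the real indices differ):

```python
import random


def split_80_20(n_items, seed=1):
    """Sketch of an 80/20 random split: permute indices, then cut at int(0.8 * n)."""
    train_size = int(n_items * 0.8)
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return indices[:train_size], indices[train_size:]  # remainder goes to the test set


train_idx, test_idx = split_80_20(101)
print(len(train_idx), len(test_idx))  # 80 21
```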

In [None]:
datasetsize = len(dataset)
train_size = int(datasetsize * 0.8)
whole_test_size = datasetsize - train_size

train_data, test_data = torch.utils.data.random_split(dataset, [train_size, whole_test_size])

print(f"Whole dataset size: {datasetsize}\n"
      f"Train dataset size: {train_size}\n"
      f"Whole test size: {whole_test_size}\n")


In [None]:
# visualize data
fig = plt.figure(figsize=(20,20))
for i in range(20):
    image, target = train_data[i]
    plt.subplot(4,5, i+1)
    plt.imshow(image[0], cmap='gray')
    plt.title(f'Actual: {target.item()}')
plt.show()


#### Model with Syft wrapper
This part defines the CNN model using the Syft `sy.Module` wrapper, which is later used to share the model within the SMPC session.
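The layer sizes only fit together for 64x64 inputs. Tracing the shapes by hand shows where `Linear(30720, 2)` comes from; note that `linear1` and `linear2` are applied to the last (width) axis of the 4-D feature map rather than to a flattened vector. A pure-Python check of that arithmetic using the standard PyTorch output-size formulas (dilation 1, no padding):

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output size of Conv2d along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def max_pool2d_out(size, kernel):
    """Output size of max_pool2d along one axis when stride == kernel (the default)."""
    return (size - kernel) // kernel + 1


h = w = 64                                          # input: (1, 64, 64)
c, h, w = 128, conv2d_out(h, 3), conv2d_out(w, 3)   # conv1 -> (128, 62, 62)
h, w = max_pool2d_out(h, 2), max_pool2d_out(w, 2)   # pool  -> (128, 31, 31)
c, h, w = 32, conv2d_out(h, 2), conv2d_out(w, 2)    # conv2 -> (32, 30, 30)

# nn.Linear only transforms the last axis, so linear1's in_features must equal w.
linear1_in, linear1_out = 30, 64                    # -> (32, 30, 64)
linear2_in, linear2_out = 64, 32                    # -> (32, 30, 32)
assert w == linear1_in and linear1_out == linear2_in

flat = c * h * linear2_out                          # Flatten(1) -> 32 * 30 * 32
print((c, h, w), flat)                              # (32, 30, 30) 30720
```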

In [None]:
class BrainTumorModel(sy.Module):
    def __init__(self, torch_ref):
        super(BrainTumorModel, self).__init__(torch_ref=torch_ref)
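        # Input: (N, 1, 64, 64) grayscale images
        self.conv1 = self.torch_ref.nn.Conv2d(1, 128, kernel_size=3)   # -> (N, 128, 62, 62)
        self.conv2 = self.torch_ref.nn.Conv2d(128, 32, kernel_size=2)  # (N, 128, 31, 31) -> (N, 32, 30, 30)
        # linear1 and linear2 act on the last (width) axis of the 4-D feature map
        self.linear1 = self.torch_ref.nn.Linear(30, 64)                # -> (N, 32, 30, 64)
        self.linear2 = self.torch_ref.nn.Linear(64, 32)                # -> (N, 32, 30, 32)
        self.flat = self.torch_ref.nn.Flatten(1)                       # -> (N, 30720)
        self.linear3 = self.torch_ref.nn.Linear(30720, 2)              # -> (N, 2) class scores

    def forward(self, x):
        x = self.conv1(x)
        x = self.torch_ref.nn.functional.max_pool2d(x, 2)              # -> (N, 128, 31, 31)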
        x = self.conv2(x)
        x = self.torch_ref.nn.functional.relu(x)
        x = self.linear1(x)
        x = self.torch_ref.nn.functional.relu(x)
        x = self.linear2(x)
        x = self.flat(x)
        x = self.linear3(x)

        return x


#### Create several model and hyperparameter sets
This makes it possible to train several models and select the best one afterwards.
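All ten configurations share the same learning rate and differ only in the number of epochs, `100*(i*2)` for `i = 1..10`. The resulting schedule, computed without instantiating any models:

```python
# The ten configurations from the loop below, without building the models.
configs = [{"learning_rate": 0.0001, "epochs": 100 * (i * 2)} for i in range(1, 11)]
epochs = [cfg["epochs"] for cfg in configs]
print(epochs)       # [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]
print(sum(epochs))  # 11000 epochs in total across all ten models
```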

In [None]:
# for single model usage
# model = BrainTumorModel(torch_ref=torch)
# print(model)
models_and_params = []

for i in range(1, 11):
    models_and_params.append({
        "model": BrainTumorModel(torch_ref=torch),
        "learning_rate": 0.0001,
        "epochs": 100*(i*2)
    })

print(models_and_params)

#### Define the test and train cycle functions
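`test` averages the per-batch accuracy over the number of batches (`len(test_loader)`), not over the number of images. The two coincide when every batch has the same size, as in the final evaluation with `batch_size=1`; with `batch_size=32`, a smaller last batch is weighted as heavily as a full one. A pure-Python illustration with hypothetical predictions:

```python
def mean_of_batch_accuracies(batches):
    """What `test` computes: the average of per-batch accuracies."""
    return sum(sum(b) / len(b) for b in batches) / len(batches)


def overall_accuracy(batches):
    """Correct predictions divided by the total number of images."""
    return sum(sum(b) for b in batches) / sum(len(b) for b in batches)


# 1 = correct prediction, 0 = wrong; the last batch is smaller (hypothetical values).
batches = [[1, 1, 1, 0], [0]]
print(mean_of_batch_accuracies(batches), overall_accuracy(batches))  # 0.375 0.6
```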

In [None]:
# for single model usage
# optimizer = optim.Adam(model.parameters(), lr=0.0001)
# num_epochs = 800
loss_fn = nn.CrossEntropyLoss()

def test(model, test_loader, loss_fn):
    """Return the mean loss and mean accuracy over the batches of test_loader."""
    test_loss = 0
    accuracy = 0
    number_of_batches = len(test_loader)  # len() of a DataLoader is the number of batches
    with torch.no_grad():  # gradients are not needed for evaluation
        for image, label in test_loader:
            pred = model(image.float())
            test_loss += loss_fn(pred, label).item()
            pred = torch.argmax(pred, dim=1)

            equality = (label == pred)
            accuracy += equality.type(torch.FloatTensor).mean()

    return test_loss/number_of_batches, accuracy/number_of_batches


def train(model, train_data, test_data, optimizer, num_epochs=100, batch_size=32):
    train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(dataset=test_data, batch_size=batch_size)
    model.train()
    loss_list = []
    for epoch in range(num_epochs):
        total_loss = 0
        i = 0
        for image, label in train_loader:
            optimizer.zero_grad()
            output = model(image.float())
            loss = loss_fn(output, label)
            total_loss += loss.detach()  # detach so the graph is not kept across batches
            loss.backward()
            optimizer.step()
            i += 1
            # NOTE: this break stops after the first mini-batch, so each "epoch"
            # is a single optimizer step on one randomly drawn batch
            break
        if epoch % 10 == 0:
            model.eval()
            test_loss, accuracy = test(model, test_loader, loss_fn)
            print(f'Epoch: {epoch} Train loss: {total_loss/i: .4f}, '
                  f'Test loss: {test_loss: .4f}, Accuracy: {accuracy: .4f}')
            model.train()
        loss_list.append(total_loss/i)  # mean loss per batch in this epoch
    return loss_list

#### Multiple model training cycle

In [None]:
for idx, model_and_params in enumerate(models_and_params):
    # get attributes
    model = model_and_params["model"] 
    epochs = model_and_params["epochs"]
    optimizer = optim.Adam(model.parameters(), lr=model_and_params["learning_rate"])
    print(f"---------------------------------------")
    print(f"Started model {idx} for {epochs} epochs")
    start_time = time.time()
    # train model
    loss_list = train(model, train_data, test_data, optimizer, num_epochs=epochs)
    model_and_params["loss_list"] = [loss_item.detach() for loss_item in loss_list]

    print(f"Finished model {idx} for {epochs} epochs in {time.time() - start_time:.1f} s")
    print(f"---------------------------------------")
    
    

### Model evaluation and loss plots
This part contains the loss plots and the test-set evaluation of all trained models.
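The notebook picks the final model by hand after inspecting the plots and the printed metrics. If the `test` results were collected in a list (a hypothetical `evaluations` structure that the notebook does not build), the same choice could be made programmatically; the tie-breaking rule on test loss is likewise an assumption:

```python
def select_best(evaluations):
    """Return the index of the model with the highest test accuracy.

    Ties are broken by the lower test loss (a hypothetical rule).
    """
    return max(
        range(len(evaluations)),
        key=lambda i: (evaluations[i]["accuracy"], -evaluations[i]["test_loss"]),
    )


# Hypothetical evaluation results, NOT the notebook's actual numbers.
evaluations = [
    {"accuracy": 0.81, "test_loss": 0.45},
    {"accuracy": 0.88, "test_loss": 0.39},
    {"accuracy": 0.88, "test_loss": 0.31},
]
print(select_best(evaluations))  # 2
```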

In [None]:
fig, axs = plt.subplots(len(models_and_params), figsize=(20,40))

for x, model_and_params in enumerate(models_and_params):
    axs[x].plot(list(range(model_and_params["epochs"])), model_and_params["loss_list"])
    axs[x].set_title(f"Model {x} Loss v/s Epochs ({model_and_params['epochs']} epochs)")

plt.show()

# for single model usage
# loss_list = train(model, train_data, test_data, optimizer, num_epochs=num_epochs)
# loss_list = [loss_item.detach() for loss_item in loss_list]
# fig = plt.figure(figsize=(10,10))
# plt.plot(list(range(num_epochs)),loss_list)
# plt.title("Loss v/s Epochs")
# plt.xlabel("Epochs")
# plt.ylabel("Loss")
# plt.show()

In [None]:
#Evaluate trained models
for idx, model_and_params in enumerate(models_and_params):
    print(f"Evaluating model {idx}")
    model = model_and_params["model"]
    model.eval()
    
    test_loader = DataLoader(test_data, batch_size=1)
    test_loss, accuracy = test(model, test_loader, loss_fn)
    print(f'Test Accuracy: {accuracy:.4f} | '
          f'Test Loss: {test_loss:.4f}\n')


In [None]:
# choosing model 8 (zero-based index, i.e. the ninth model) as the main one
model = models_and_params[8]["model"]

#### Visualizing predictions in the non-encrypted environment
This part visualizes the first 20 predictions of the selected model in the non-encrypted (plaintext) environment.

In [None]:
plot_loader = DataLoader(test_data, batch_size=1)

mapping = {0:'no',1:'yes'}
fig = plt.figure(figsize=(20,20))
i = 0
correct = 0
raw_predictions = []
for img, lbl in plot_loader:
    if i == 20: break
    pred = model(img.float())
    pred = torch.argmax(pred,dim=1)
    raw_predictions.append(pred)
    plt.subplot(4,5,i+1)
    plt.imshow(img[0][0].cpu())
    if lbl == pred: correct += 1
    plt.title(f'Actual: {mapping[lbl.cpu().detach().item()]} Predicted: {mapping[pred.cpu().detach().item()]}')
    i+=1
plt.show()
print(f"Plaintext accuracy on the first {i} test images: {correct/i: .2f}")

### Encrypted Inference
This part implements encrypted inference using SyMPC.
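Under the hood, the "encryption" of an `MPCTensor` is secret sharing: each party holds a random-looking share, and only the sum of all shares reveals the value. A simplified pure-Python sketch of additive secret sharing with fixed-point encoding; the ring size (2^64) and the 16 fractional bits are assumptions for illustration, and SyMPC's actual parameters and protocols are more involved:

```python
import random

RING = 2 ** 64     # assumption: shares live in the ring Z_{2^64}
SCALE = 2 ** 16    # assumption: 16 fractional bits of fixed-point precision


def encode(x):
    """Float -> fixed-point integer in the ring (negatives wrap around)."""
    return round(x * SCALE) % RING


def decode(v):
    """Ring element -> float, mapping the upper half of the ring to negatives."""
    if v >= RING // 2:
        v -= RING
    return v / SCALE


def share(x, n_parties, rng):
    """Split x into n random shares that sum to encode(x) modulo RING."""
    shares = [rng.randrange(RING) for _ in range(n_parties - 1)]
    shares.append((encode(x) - sum(shares)) % RING)
    return shares


def reconstruct(shares):
    """Add all shares and decode; no single share reveals anything on its own."""
    return decode(sum(shares) % RING)


rng = random.Random(0)
pixel = 0.7373                       # e.g. a normalised pixel intensity
shares = share(pixel, n_parties=2, rng=rng)
print(shares)                        # each share alone looks like a random 64-bit number
print(reconstruct(shares))           # pixel, up to fixed-point rounding
```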


In [None]:
#SyMPC imports required for encrypted inference
import sympc
from sympc.session import Session
from sympc.session import SessionManager
from sympc.tensor import MPCTensor
from sympc.protocol import FSS

In [None]:
# Define a function that creates the required number of Syft clients and returns them.
def get_clients(n_parties):
    parties = []
    for index in range(n_parties):
        parties.append(sy.VirtualMachine(name="worker" + str(index)).get_root_client())

    return parties

In [None]:
# Creating parties and session
parties = get_clients(2)
session = Session(parties=parties)
SessionManager.setup_mpc(session)

In [None]:
# Encrypting and splitting data between simulated participants
ptrs = []
labels = []
for i, (img, lbl) in enumerate(plot_loader):
    if i == 20: break
    img_f = img.type(torch.float32)
    ptrs.append(MPCTensor(secret=img_f,session=session, requires_grad=True))
    labels.append(lbl)   
len(ptrs)



In [None]:
# Encrypting and sharing model within the session
mpc_model = model.share(session)
mpc_model

#### Perform Encrypted Inference
This part runs the shared model on the secret-shared images and reconstructs the plaintext outputs.
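During encrypted inference each party evaluates the layers on its own shares. Linear steps such as adding two shared values or multiplying by a public constant can be done locally, without communication, as the pure-Python sketch below shows (integer ring elements, a hypothetical 2-party setup, ring size assumed to be 2^64). Multiplying two secret values, as in the convolution and linear layers of a shared model, and non-linear steps such as ReLU and max-pooling require interactive sub-protocols (for example Beaver triples for products and, in SyMPC, FSS for comparisons), which typically dominate the run time of `mpc_model(ptr)`.

```python
import random

RING = 2 ** 64  # assumption: shares live in the ring Z_{2^64}


def share(value, n_parties, rng):
    """Additively share an integer ring element among n parties."""
    parts = [rng.randrange(RING) for _ in range(n_parties - 1)]
    parts.append((value - sum(parts)) % RING)
    return parts


def reconstruct(parts):
    return sum(parts) % RING


rng = random.Random(1)
a, b = share(20, 2, rng), share(22, 2, rng)

# Addition of two shared values: each party adds its own two shares locally.
sum_shares = [(x + y) % RING for x, y in zip(a, b)]
# Multiplication by a public integer: each party scales its own share locally.
scaled_shares = [(3 * x) % RING for x in a]

print(reconstruct(sum_shares), reconstruct(scaled_shares))  # 42 60
```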


In [None]:
start_time = time.time()
results = []

for ptr in ptrs:
    encrypted_results = mpc_model(ptr)
    print(f"encrypted results: {encrypted_results}")
    plaintext_results = encrypted_results.reconstruct()
    print(f"plain text results: {plaintext_results}")
    results.append(plaintext_results)

end_time = time.time()
results

#### Visualize Encrypted Inference Results
This part visualizes the encrypted inference results together with
two metrics that compare the success rate of encrypted and non-encrypted predictions.
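Two rates are reported: agreement of the SyMPC predictions with the true labels, and agreement with the plaintext model's predictions on the same images. The second isolates the effect of secure computation, since an exact SMPC evaluation would reproduce the plaintext predictions. Both are fractions over the number of evaluated images; a small sketch with hypothetical predictions:

```python
def agreement_rates(encrypted_preds, plaintext_preds, actual_labels):
    """Fraction of encrypted predictions matching the labels and the plaintext model."""
    n = len(encrypted_preds)
    vs_labels = sum(p == y for p, y in zip(encrypted_preds, actual_labels)) / n
    vs_plain = sum(p == q for p, q in zip(encrypted_preds, plaintext_preds)) / n
    return vs_labels, vs_plain


# Hypothetical class indices (0 = 'no', 1 = 'yes') for five images.
encrypted = [1, 0, 1, 1, 0]
plaintext = [1, 0, 1, 0, 0]
actual = [1, 1, 1, 0, 0]
print(agreement_rates(encrypted, plaintext, actual))  # (0.6, 0.8)
```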

In [None]:
fig = plt.figure(figsize=(20,20))
iter_loader = iter(plot_loader)
n_images = len(results)
success_sympc_overall = 0
success_sympc_raw = 0
for i in range(n_images):
    img, _ = next(iter_loader)
    pred = torch.argmax(results[i], dim=1)
    raw_pred = raw_predictions[i]
    target = labels[i]
    plt.subplot(4,5,i+1)
    plt.imshow(img[0][0])
    plt.title(f"A: {mapping[target.detach().item()]} | " +
              f"sympc: {mapping[pred.detach().item()]} | " +
              f"raw: {mapping[raw_pred.detach().item()]}")
    if pred == target: success_sympc_overall += 1
    if pred == raw_pred: success_sympc_raw += 1

plt.show()
print(f"Time for inference: {end_time - start_time:.2f} s")
print(f"Success rate (sympc pred/actual labels): {success_sympc_overall/n_images: .2f}")
print(f"Success rate (sympc pred/raw pred): {success_sympc_raw/n_images: .2f}")

### Summary
This case study presents a scenario of encrypted image classification.
The scenario uses the SMPC paradigm to implement the secure computation process.
In the scenario, the CNN model is trained in a regular (plaintext) environment,
several parties are simulated, the data is encrypted and shared among the parties,
the model is encrypted and shared within the session, encrypted image classification
is performed, and the results are visualized.

The accuracy of predictions in the SMPC environment was reduced because of the large
number of operations performed within the CNN model (this is specific to the FSS protocol).
However, the model still showed considerable prediction accuracy.
Since the computation is performed jointly in encrypted form, the privacy of
individual data records (images) is preserved.

Taking all performed actions and observations into account, the experiment is considered successful.
The experiment demonstrates that the SMPC and Deep Learning approaches can be combined,
and serves as a basis for further research and the implementation of other scenarios.