# Turbofan POC: Testing
CAA 23/07/2020

This notebook follows Part 1 and Part 2. Part 1 set up the grid infrastructure and populated the nodes with data. Part 2 trained a model.

In this notebook, we run a test script against the trained model. The notebook can run on any server that hosts the PyGridNetwork or a PyGridNode associated with it. That server must hold the validation dataset, so it can be thought of as a validation server.

Notebook dependencies:
- [OpenMined Turbofan POC](https://github.com/matthiaslau/Turbofan-Federated-Learning-POC) repository (follow instructions for downloading and preprocessing the dataset, and place this notebook in the root directory of the repository)
- PySyft 0.2.7

NOTE: When we ran this notebook, the following processes were running:
- PyGridNetwork: server Bob (http://localhost:5000)
- PyGridNode: server Bob (http://localhost:3000)
- PyGridNode: server Alice (http://18.218.13.132:3001)
- This Jupyter Notebook: server Bob (http://localhost:8000)

## Import dependencies

In [16]:
import syft as sy
from syft.grid.clients.dynamic_fl_client import DynamicFLClient
import torch
import pandas as pd
from numpy import mean
from numpy.random import laplace
from pathlib import Path
from tqdm import tqdm

from federated_trainer.helper.data_helper import (
    _load_data,
    WINDOW_SIZE,
    _drop_unnecessary_columns,
    _transform_to_windowed_data,
    get_data_loader,
    _clip_rul,
)

import models

def add_rul_to_train_data(train_data):
    """ Calculate and add the RUL to all rows in the given training data.

    :param train_data: The training data
    :return: The training data with added RULs
    """
    # retrieve the maximum cycle count per engine (engine_no); the RUL is counted down from it
    train_rul = pd.DataFrame(train_data.groupby('engine_no')['time_in_cycles'].max()).reset_index()

    # merge the RULs into the training data
    train_rul.columns = ['engine_no', 'max']
    train_data = train_data.merge(train_rul, on=['engine_no'], how='left')

    # add the current RUL for every cycle
    train_data['RUL'] = train_data['max'] - train_data['time_in_cycles']
    train_data.drop('max', axis=1, inplace=True)

    return train_data

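def batch(tensor, batch_size):
    """ Reshape a tensor of samples into batches of batch_size samples.

    The number of samples must be divisible by batch_size, otherwise view() raises an error.
    """
    feature_shape = tensor.shape[1:]
    return tensor.view(-1, batch_size, *feature_shape)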
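
To make the RUL calculation in `add_rul_to_train_data` concrete, here is a minimal sketch on a made-up two-engine table. The helper name `add_rul` and the toy values are ours and purely illustrative. It uses `groupby(...).transform("max")` instead of the merge above, which should give the same column under the assumption that the last recorded cycle of each engine marks its failure.

```python
import pandas as pd


def add_rul(df):
    """Return a copy of df with a RUL column (hypothetical helper, for illustration).

    RUL = last observed cycle of the engine - current cycle.
    """
    out = df.copy()
    # Broadcast each engine's maximum cycle back onto all of its rows.
    max_cycle = out.groupby("engine_no")["time_in_cycles"].transform("max")
    out["RUL"] = max_cycle - out["time_in_cycles"]
    return out


# Toy data (made up): engine 1 runs for 3 cycles, engine 2 for 2 cycles.
toy = pd.DataFrame({
    "engine_no": [1, 1, 1, 2, 2],
    "time_in_cycles": [1, 2, 3, 1, 2],
})

toy_rul = add_rul(toy)
print(toy_rul["RUL"].tolist())  # [2, 1, 0, 1, 0]
```

Each engine's RUL counts down to 0 at its final recorded cycle, which is the quantity the model is asked to predict.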

## Set up configs

In [17]:
DATA_PATH = "./data"
TEST_DATA_NAME = "test_data_test.txt"
MINIBATCH_SIZE = 4
DP_TYPE = 'local'
NOISE = 0.2
MODEL_NAME = 'turbofan_100'
MODEL_PATH = './ihpc_models'
TRAIN_COLS = 11

model_path = Path(MODEL_PATH) / MODEL_NAME

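def laplacian_mechanism(input_tensor, sensitivity=0.5, epsilon=0.05):
    '''
    Add Laplace noise with scale sensitivity / epsilon to input_tensor.
    sensitivity and epsilon are arbitrarily chosen for now.
    Note: a single noise sample is drawn and broadcast to every element.
    '''
    beta = sensitivity / epsilon
    # cast to float32 so the result keeps the dtype of the model predictions
    noise = torch.tensor(laplace(0, beta, 1), dtype=torch.float32)
    return input_tensor + noise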

def add_noise(input_tensor, p_noise):
    '''
    input_tensor: input tensor
    p_noise: probability with which noise is added to each element
    '''
    # 1 where the true value is kept (probability 1 - p_noise), 0 where noise is added
    be_honest = (torch.rand(input_tensor.shape) > p_noise).float()
    tensor_artificial = laplacian_mechanism(input_tensor)
    # keep the true value where be_honest is 1, use the noisy value elsewhere
    mod_tensor = input_tensor.float() * be_honest + (1 - be_honest) * tensor_artificial
    # Laplace noise is zero-mean, so no de-skewing step is applied
    return mod_tensor.type(torch.float32)
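
As a point of comparison for `laplacian_mechanism`, the sketch below draws one independent Laplace sample per element rather than a single sample shared by the whole tensor. It is written with NumPy only; the function name `laplace_mechanism`, the `rng` parameter, and the constant toy predictions are our assumptions, and the default `sensitivity` and `epsilon` simply mirror the arbitrary values used above.

```python
import numpy as np


def laplace_mechanism(values, sensitivity=0.5, epsilon=0.05, rng=None):
    """Add independent Laplace noise to every element (illustrative sketch).

    The noise scale is sensitivity / epsilon, as in the notebook's helper.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    values = np.asarray(values, dtype=float)
    # One noise draw per element, same shape as the input.
    noise = rng.laplace(loc=0.0, scale=scale, size=values.shape)
    return values + noise


rng = np.random.default_rng(0)
preds = np.full(100_000, 50.0)  # made-up constant predictions
noisy = laplace_mechanism(preds, rng=rng)

scale = 0.5 / 0.05
mean_abs_noise = np.abs(noisy - preds).mean()
print(noisy.shape)
print(f"scale={scale}, mean |noise|={mean_abs_noise:.2f}")
```

For a Laplace distribution the mean absolute deviation equals the scale, so the printed `mean |noise|` should land close to 10 here. That gives a feel for how strongly these particular settings perturb RUL predictions.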

## Load data

In [18]:
data = _load_data(TEST_DATA_NAME, DATA_PATH)
data_dropcol = _drop_unnecessary_columns(data)
data_rul = add_rul_to_train_data(data_dropcol)
x, y = _transform_to_windowed_data(data_rul, WINDOW_SIZE)
y = _clip_rul(y)
# transform to torch tensor
tensor_x = torch.Tensor(x)
tensor_y = torch.Tensor(y)

dataset_test = torch.utils.data.TensorDataset(tensor_x, tensor_y)
testloader = torch.utils.data.DataLoader(
    dataset_test,
    batch_size=MINIBATCH_SIZE,
    shuffle=True,
    # drop the last incomplete batch so every batch holds MINIBATCH_SIZE samples
    drop_last=True,
    # pin_memory=True,  # for faster data loading to CUDA
)

2765 features with shape (80, 11)
2765 labels with shape (2765, 1)
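
The shapes printed above suggest that each sample is a window of 80 consecutive cycles with 11 feature columns, paired with a single RUL label. The sketch below shows one way such a sliding-window transform could work for a single engine, assuming the label is the RUL at the last cycle of each window. `make_windows` is a hypothetical stand-in; the repository's `_transform_to_windowed_data` may differ in detail.

```python
import numpy as np


def make_windows(features, rul, window_size):
    """Cut one engine's time series into overlapping windows (illustrative sketch).

    features: array of shape (n_cycles, n_features)
    rul:      array of shape (n_cycles,)
    Returns x of shape (n_windows, window_size, n_features) and
    y of shape (n_windows, 1), where y is the RUL at each window's last cycle.
    """
    features = np.asarray(features, dtype=float)
    rul = np.asarray(rul, dtype=float)
    n_windows = len(features) - window_size + 1
    if n_windows <= 0:
        # Series shorter than one window: nothing to emit.
        return np.empty((0, window_size, features.shape[1])), np.empty((0, 1))
    x = np.stack([features[i:i + window_size] for i in range(n_windows)])
    y = rul[window_size - 1:].reshape(-1, 1)
    return x, y


# Made-up engine with 100 cycles and 11 sensor columns.
n_cycles, n_features, window = 100, 11, 80
features = np.random.default_rng(0).normal(size=(n_cycles, n_features))
rul = np.arange(n_cycles - 1, -1, -1)  # 99, 98, ..., 0

x, y = make_windows(features, rul, window)
print(x.shape, y.shape)  # (21, 80, 11) (21, 1)
```

An engine with 100 cycles yields 100 - 80 + 1 = 21 windows, so the total sample count grows with the length of every engine's history.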


In [19]:
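# init model (BatchNormFCModel is assumed to be defined in the local `models` module)
# NOTE: as the output below shows, the constructor also expects a `features_size` argument
model = models.BatchNormFCModel(WINDOW_SIZE, TRAIN_COLS)
# load_state_dict expects a state dict, not a path, so read the file first
model.load_state_dict(torch.load(model_path))
model.eval()  # use the BatchNorm running statistics at inference time

sse = []

with torch.no_grad():
    for sensors, labels in tqdm(testloader):
        # predict
        preds = model(sensors)
        if DP_TYPE == 'global':
            preds = add_noise(preds, NOISE)
        # collect the squared error of every sample in the batch
        sse.extend(((preds.reshape(labels.shape) - labels) ** 2).flatten().tolist())

print(f"Mean squared error: {mean(sse)}")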

TypeError: __init__() missing 1 required positional argument: 'features_size'
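
Independently of the constructor error, the load-and-evaluate pattern can be checked on a toy model. The sketch below saves a state dict, loads it into a fresh model with `torch.load` followed by `load_state_dict`, and averages the per-sample squared errors over a `DataLoader`. `TinyRegressor`, its constructor arguments, and the random data are hypothetical stand-ins, not the notebook's `BatchNormFCModel` or the Turbofan data.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class TinyRegressor(nn.Module):
    """Hypothetical stand-in for the notebook's model."""

    def __init__(self, window_size, features_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.BatchNorm1d(window_size * features_size),
            nn.Linear(window_size * features_size, 1),
        )

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
window_size, features_size = 8, 3
x = torch.randn(20, window_size, features_size)  # made-up windows
y = torch.randn(20, 1)                           # made-up RUL labels
loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=False)

trained = TinyRegressor(window_size, features_size)  # pretend this was trained

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tiny_model.pt"
    torch.save(trained.state_dict(), path)
    # load_state_dict takes the dict returned by torch.load, not the path itself.
    model = TinyRegressor(window_size, features_size)
    model.load_state_dict(torch.load(path))

model.eval()  # BatchNorm switches to its running statistics
squared_errors = []
with torch.no_grad():
    for sensors, labels in loader:
        preds = model(sensors)
        # One squared error per sample in the batch.
        squared_errors.extend(((preds - labels) ** 2).flatten().tolist())

mse = sum(squared_errors) / len(squared_errors)
print(f"MSE over {len(squared_errors)} samples: {mse:.4f}")
```

Collecting one squared error per sample, rather than one per batch, keeps the final average independent of the batch size.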