### Use case example: Logistic Regression
In her work, Alice encounters many types of problems that require different models. As a clinical researcher, however, she often works with sensitive data. She knows what kind of data she is dealing with, and she knows that logistic regression is the right model for it. To get around the data-privacy issue, she can use this model if it is implemented with Fully Homomorphic Encryption (FHE).
Concrete ML, by Zama, is an open-source, privacy-preserving machine learning inference framework based on FHE. It allows data scientists, even those without any prior knowledge of cryptography, to convert machine learning models into their FHE equivalents using familiar APIs from scikit-learn and PyTorch.

Using Concrete ML for logistic regression, Alice can follow the usual scikit-learn workflow of training a model on unencrypted data. The model is then quantized so that inference uses only integers, since FHE operates over integers. After quantization, the model is compiled into an FHE equivalent. This compiled model can then run inference on encrypted data, i.e., make predictions on inputs that stay encrypted.
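The quantization step can be illustrated with a minimal uniform (affine) quantizer in NumPy. This is a sketch under our own assumptions (one value range per tensor, unsigned integers, round-to-nearest); it is not Concrete ML's internal quantizer, and the helper names `quantize` and `dequantize` are hypothetical. It maps floats in a known range onto `2**n_bits` integer levels and back, and the round-trip error never exceeds half a quantization step.

```python
import numpy as np

def quantize(x, n_bits, x_min, x_max):
    """Hypothetical helper: map floats in [x_min, x_max] to integers in [0, 2**n_bits - 1]."""
    levels = 2**n_bits - 1
    scale = (x_max - x_min) / levels          # float width of one integer step
    q = np.round((np.clip(x, x_min, x_max) - x_min) / scale)
    return q.astype(np.int64), scale

def dequantize(q, scale, x_min):
    """Hypothetical helper: map integers back to approximate floats."""
    return q * scale + x_min

x = np.array([-1.0, -0.37, 0.0, 0.52, 1.0])   # values in the [-1, 1] range used later
q, scale = quantize(x, n_bits=7, x_min=-1.0, x_max=1.0)
x_hat = dequantize(q, scale, x_min=-1.0)

print("integers:", q)
print("max round-trip error:", np.abs(x - x_hat).max(), "<= half a step:", scale / 2)
```

More bits give finer steps and smaller errors, but in FHE larger integers also make the encrypted computation more expensive, which is why `n_bits` is a tuning knob.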

In [None]:
if False:  # Set to True to install the dependencies
    !python3.8 -m pip install -U pip wheel setuptools
    !python3.8 -m pip install concrete-ml torchmetrics

## If you run into a CUDA problem, try uninstalling cuBLAS
# !python3.8 -m pip uninstall -y nvidia_cublas_cu12
# !python3.8 -m pip uninstall -y nvidia_cublas_cu11
# !python3.8 -m pip uninstall -y nvidia_cublas_cu10

## 0.1 Load and preprocess data

In [None]:
import torch
import pandas as pd
import numpy as np

from torchmetrics.classification import BinaryAccuracy


In [None]:
def reset_column_index_to_integer(df):
    df.columns = range(len(df.columns))
    return df

# Load data
df = pd.read_csv("./data/myTenYearCHD_n1000.csv", index_col=0)

# Replace column (string) indexes with integer indexes (only integer indexes are supported in the following examples)
df = reset_column_index_to_integer(df)
df.head()


In [None]:
# Trim outliers by clipping each feature to its 1st-99th percentile range, then rescale each feature to [-1, 1]
X = df.iloc[:,:-1].copy()
X0 = np.percentile(X,1, axis=0)
X1 = np.percentile(X,99, axis=0)
X = X.clip(X0,X1,axis=1)
X = 2*(X-X.min())/(X.max()-X.min())-1
X.describe()
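Per feature, the cell above computes x' = 2(x - min)/(max - min) - 1 after clipping x to its 1st and 99th percentiles. The NumPy-only sketch below reproduces that transform on synthetic data so its effect is easy to check; the helper name `clip_and_scale` and the synthetic data are hypothetical. An extreme outlier is clipped to the 99th percentile and ends up at exactly 1, so it no longer compresses the rest of the column. Note that a constant column would make `max - min` zero and cause a division by zero.

```python
import numpy as np

def clip_and_scale(X, low_pct=1, high_pct=99):
    """Hypothetical helper: clip each column to its percentile range, then rescale it to [-1, 1]."""
    lo = np.percentile(X, low_pct, axis=0)
    hi = np.percentile(X, high_pct, axis=0)
    X = np.clip(X, lo, hi)                     # lo/hi broadcast column by column
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return 2 * (X - col_min) / (col_max - col_min) - 1

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 3))             # synthetic stand-in for the feature matrix
X_toy[0, 0] = 50.0                             # an outlier that the percentile clip removes
X_scaled = clip_and_scale(X_toy)

print("column minima:", X_scaled.min(axis=0))
print("column maxima:", X_scaled.max(axis=0))
print("outlier after scaling:", X_scaled[0, 0])
```

Bounded inputs matter here because quantization maps a fixed value range onto a small set of integers; long tails would waste most of those levels on rare values.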


In [None]:
train_len = 800

# Split the data into training (first 800 rows) and test (remaining rows) tensors
X_trn = torch.tensor(X.iloc[:train_len,:].values).float()
Y_trn = torch.tensor(df.iloc[:train_len,-1:].values).int()

X_tst = torch.tensor(X.iloc[train_len:,:].values).float()
Y_tst = torch.tensor(df.iloc[train_len:,-1:].values).float()


n_features = X_trn.shape[1]

## 1.1 Using concrete-ml built-in Logistic Regression

Alice uses the logistic regression model from Concrete ML's `concrete.ml.sklearn` module.
First, she trains the model in plain text and makes predictions in plain text.
She then compiles the model for FHE execution.

With the compiled model, Alice makes predictions on the same test data, this time under encryption. Finally, she prints the plain-text and encrypted predictions, along with the percentage of predictions on which they agree, to assess how well the model performs under encryption.

In [None]:
from concrete.ml.sklearn import LogisticRegression

# Train the model in plain text (n_bits sets the quantization precision)
model = LogisticRegression(n_bits=7)
model.fit(X_trn.numpy(), Y_trn.numpy().ravel())

# Do prediction in plain text
y_clear = model.predict(X_tst.numpy())

# Compile the quantized model for FHE execution; fhe_model is the resulting FHE circuit
fhe_model = model.compile(X_trn.numpy())

# Do prediction with encryption
y_fhe_simple = model.predict(X_tst.numpy(), fhe="execute")

# Compare FHE and plain-text predictions
print("plain text: ", y_clear)
print("encrypted simple: ", y_fhe_simple)
print(f"Similarity: {int((y_fhe_simple == y_clear).mean()*100)}%")
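The similarity score only measures agreement between the plain-text and encrypted predictions; it says nothing about accuracy against the true labels. A small NumPy helper can report both. One pitfall to watch: `Y_tst` has shape `(n, 1)` while `model.predict` returns shape `(n,)`, and comparing those two directly broadcasts to an `(n, n)` matrix, so the helper flattens both sides first. The helper name `agreement` and the toy label vectors are hypothetical; in the notebook one would pass `y_clear`, `y_fhe_simple` and `Y_tst.numpy()`.

```python
import numpy as np

def agreement(a, b):
    """Hypothetical helper: fraction of positions where two label vectors are equal."""
    a, b = np.ravel(a), np.ravel(b)            # avoid (n,) vs (n, 1) broadcasting
    return float((a == b).mean())

# Toy labels for illustration only (not results from this notebook)
y_true = np.array([0, 1, 1, 0, 1, 0])
y_clear_demo = np.array([0, 1, 0, 0, 1, 0])
y_fhe_demo = np.array([0, 1, 0, 0, 1, 1])

print("plain-text accuracy:", agreement(y_clear_demo, y_true))
print("FHE accuracy:       ", agreement(y_fhe_demo, y_true))
print("plain/FHE agreement:", agreement(y_clear_demo, y_fhe_demo))
```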

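## 1.2 Step-by-step FHE execution with the built-in Logistic Regression
Alternatively, Alice can split the FHE execution into its individual steps (quantize, encrypt, run, decrypt, de-quantize, post-process) for more transparency.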

In [None]:
y_pred_fhe_step = []
for f_input in X_tst:
    
    # Quantize one input (float) as a batch of size 1
    q_input = model.quantize_input(f_input.numpy().reshape(1, -1))
    
    # Encrypt the input
    q_input_enc = fhe_model.encrypt(q_input)

    # Execute the linear product in FHE 
    q_y_enc = fhe_model.run(q_input_enc)

    # Decrypt the result (integer)
    q_y = fhe_model.decrypt(q_y_enc)

    # De-quantize the result
    y = model.dequantize_output(q_y)

    # Apply either the sigmoid if it is a binary classification task, which is the case in this 
    # example, or a softmax function in order to get the probabilities (in the clear)
    y_proba = model.post_processing(y)

    # Since this model does classification, apply the argmax to get the class predictions (in the clear)
    # Note that regression models won't need the following line
    y_class = np.argmax(y_proba, axis=1)

    # Append each result
    y_pred_fhe_step += list(y_class)

y_fhe = np.array(y_pred_fhe_step)

# Compare step-by-step FHE and plain-text predictions
print("plain text: ", y_clear)
print("encrypted: ", y_fhe)
print(f"Similarity: {int((y_fhe == y_clear).mean()*100)}%")
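To see what arithmetic the steps in the loop above correspond to, the sketch below runs a comparable pipeline for a single linear-model input entirely in the clear with NumPy: quantize weights and input to integers, compute the integer dot product (the part an FHE circuit would evaluate on the encrypted input), then de-quantize, apply the sigmoid and threshold. It is a simplified illustration under our own assumptions (symmetric per-tensor quantization, bias added in float during post-processing, made-up weights); it is not how Concrete ML implements its circuits, and `quantize_sym` is a hypothetical helper.

```python
import numpy as np

def quantize_sym(x, n_bits):
    """Hypothetical helper: symmetric quantization to integers in [-(2**(n_bits-1) - 1), 2**(n_bits-1) - 1]."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / q_max
    return np.round(x / scale).astype(np.int64), scale

rng = np.random.default_rng(0)
w = rng.normal(size=4)                 # made-up "trained" weights
b = 0.1                                # made-up bias
x = rng.uniform(-1, 1, size=4)         # one pre-processed input in [-1, 1]

n_bits = 7
q_w, s_w = quantize_sym(w, n_bits)     # done once, when the model is quantized
q_x, s_x = quantize_sym(x, n_bits)     # "quantize_input" step

# Integer-only step: the kind of computation evaluated on the encrypted input
q_acc = int(q_w @ q_x)

# Clear post-processing: de-quantize, add bias, sigmoid, threshold
logit = q_acc * s_w * s_x + b
proba = 1 / (1 + np.exp(-logit))
label = int(proba > 0.5)

logit_float = float(w @ x + b)         # unquantized reference
print("quantized logit:", logit, "float logit:", logit_float, "label:", label)
```

The small gap between the quantized and float logits is the quantization error; it only changes a prediction when the logit sits close to the decision boundary, which is why the similarity score can stay high even with few bits.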

## 2 General solution for deep learning
For existing predefined models, like the Logistic Regression model above, Alice can simply import, train and predict.
For more complex models, however, Alice must be able to build and train a custom model.
In the code below, she does just that. She first defines her custom model using standard PyTorch `nn` modules and trains it.
As in the previous example, she then compiles the model, this time with `compile_torch_model`. As with the predefined LogisticRegression model, compilation requires a representative input set to calibrate the quantization, and the resulting quantized model expects NumPy inputs.
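Why the input set should be representative can be shown with a toy calibration: quantization ranges are derived from the values seen in the input set, so inputs outside that range get clipped. The sketch below uses simple per-tensor min/max calibration, which is our own simplifying assumption and not necessarily Concrete ML's exact procedure; `calibrate` and `roundtrip` are hypothetical helpers. Calibrating on data that only covers negative values makes every positive value collapse to roughly zero.

```python
import numpy as np

def calibrate(inputset, n_bits):
    """Hypothetical helper: derive (offset, scale) from the min/max seen in a calibration set."""
    lo, hi = float(np.min(inputset)), float(np.max(inputset))
    return lo, (hi - lo) / (2 ** n_bits - 1)

def roundtrip(x, lo, scale, n_bits):
    """Quantize with a fixed calibration (clipping to the representable range), then de-quantize."""
    q = np.clip(np.round((x - lo) / scale), 0, 2 ** n_bits - 1)
    return q * scale + lo

rng = np.random.default_rng(1)
data = rng.uniform(-1, 1, size=1000)        # stands in for features scaled to [-1, 1]

n_bits = 6
good = calibrate(data, n_bits)              # representative calibration set
bad = calibrate(data[data < 0], n_bits)     # calibration set covering only negatives

err_good = np.abs(roundtrip(data, *good, n_bits) - data).max()
err_bad = np.abs(roundtrip(data, *bad, n_bits) - data).max()
print(f"max error, representative inputset: {err_good:.3f}")
print(f"max error, narrow inputset:         {err_bad:.3f}")
```

This is why the notebook passes the full training set `X_trn` as `torch_input` rather than a small or skewed subset.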

In [None]:
# In case of incompatibility with Jupyter, uncomment the %%writefile line below and run the next cell as well
# When this cell runs as a standalone script, append the evaluation code from the cells below so the results are computed in the same process

# %%writefile custom_fhe_model.py

from concrete.ml.torch.compile import compile_torch_model
from utils import train_plain_model

from torch import nn

# Epochs for training
n_epochs = 100

# Set number of bits for quantization
n_bits = 6

# Define the model to use
class LR(nn.Module):
    
    def __init__(self, n_features):
        super().__init__()
        self.lr = nn.Linear(n_features, 1)
        
    def forward(self, x):
        x = self.lr(x)
        x = torch.sigmoid(x)
        return x

# Representative input set used to calibrate the quantization of the model
torch_input = X_trn

# Train model on available training data
torch_model = train_plain_model(X_trn, Y_trn, LR(n_features), n_epochs=n_epochs, verbose=False)

# Quantize and compile the trained torch model for FHE-inference
quantized_numpy_module = compile_torch_model(
    torch_model,
    torch_input,
    n_bits=n_bits,
)
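The `train_plain_model` helper is imported from a local `utils` module that is not shown here. For readers without that file, a minimal stand-in might look like the sketch below. It is hypothetical: the name `train_plain_model_sketch`, the Adam optimizer, the learning rate and the full-batch loop are our assumptions, not the original helper. Because the `LR` model already ends in a sigmoid, it uses `nn.BCELoss`, which needs float targets shaped like the model output, so the integer `(n, 1)` labels are converted first.

```python
import torch
from torch import nn

def train_plain_model_sketch(X, y, model, n_epochs=100, lr=0.1, verbose=False):
    """Hypothetical stand-in for utils.train_plain_model: full-batch training with BCE loss."""
    y = y.float().reshape(-1, 1)           # BCELoss needs float targets shaped like the output
    loss_fn = nn.BCELoss()                  # the model already applies a sigmoid
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(n_epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
        if verbose and epoch % 10 == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")
    return model.eval()                     # eval() returns the module itself

# Usage on synthetic, linearly separable data (not the notebook's dataset)
torch.manual_seed(0)
X_demo = torch.randn(200, 3)
y_demo = (X_demo[:, 0] > 0).int().unsqueeze(1)     # int labels shaped (n, 1), like Y_trn
demo_model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
demo_model = train_plain_model_sketch(X_demo, y_demo, demo_model, n_epochs=200)
with torch.no_grad():
    acc = ((demo_model(X_demo) > 0.5).int() == y_demo).float().mean().item()
print(f"training accuracy: {acc:.2f}")
```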

In [None]:
# In case of Jupyter incompatibility, uncomment and run one of the lines below
# %run custom_fhe_model.py
# OR 
# !python3.8 custom_fhe_model.py

In [None]:
from tqdm import tqdm

# Run the original model in plain text
y_plain_pred = []
with torch.no_grad():
    for x_test in tqdm(X_tst):
        y_plain_pred.append(torch_model(x_test.to(torch.float32)).numpy())

# Run the quantized model under encryption
y_fhe_pred = []
for x_test in tqdm(X_tst):
    y_fhe_pred.append(quantized_numpy_module.forward(x_test.unsqueeze(0).numpy(), fhe="execute")[0])

In [None]:
# Root mean square of a vector of differences
def rmse(x):
    return (x**2).mean() ** 0.5

# Calculate the root mean square error between the original model prediction and the encrypted prediction
RMSE = rmse(np.asarray(y_plain_pred) - np.asarray(y_fhe_pred))
print('RMSE: ', RMSE)
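For reference, the quantity printed above is the root mean square error between the plain-text outputs and the encrypted outputs over the N test samples (N = 200 here):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i^{\text{plain}} - \hat{y}_i^{\text{FHE}}\right)^2}
```

Since both models end in a sigmoid, each output is a probability in [0, 1], so the RMSE also lies in [0, 1]; a value close to 0 means that quantization and encrypted execution barely change the model's outputs.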