# **Getting Started**

We first install the Concrete ML packages required for this Colab instance.

The packages must be reinstalled each time the notebook is opened, because the Colab runtime is deleted after roughly 90 minutes of inactivity. This can be avoided by storing the packages in your Google Drive and importing them from there instead.
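The Drive-based workaround could look roughly like the sketch below. It assumes the packages were installed once into a Drive folder using pip's `--target` option; the folder path and the helper name `use_cached_packages` are hypothetical, and whether Concrete ML's compiled dependencies load correctly from Drive has not been verified here.

```python
import os
import sys


def use_cached_packages(package_dir):
    """Put package_dir at the front of sys.path if it exists; return True on success."""
    if not os.path.isdir(package_dir):
        return False
    if package_dir not in sys.path:
        sys.path.insert(0, package_dir)
    return True


# In Colab, Drive would be mounted first (left as comments so the sketch runs anywhere):
# from google.colab import drive
# drive.mount("/content/drive")
# One-time install into Drive (hypothetical folder):
# !pip install --target=/content/drive/MyDrive/colab-packages concrete-ml==1.0.2

found = use_cached_packages("/content/drive/MyDrive/colab-packages")
print("Using cached packages" if found else "No cache found; install with pip instead")
```

If the folder is missing, the helper simply returns `False`, so the regular `pip install` cell below remains the fallback.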

In [1]:
#reinstall packages (required unless the packages are stored in your google drive)
!pip install -U pip wheel setuptools
!pip install concrete-ml==1.0.2

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting setuptools
  Using cached setuptools-67.8.0-py3-none-any.whl (1.1 MB)
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 65.6.3
    Uninstalling setuptools-65.6.3:
      Successfully uninstalled setuptools-65.6.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ipython 7.34.0 requires jedi>=0.16, which is not installed.
concrete-ml 1.0.2 requires setuptools==65.6.3, but you have setuptools 67.8.0 which is incompatible.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 1.13.1 which is incompat

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting setuptools==65.6.3 (from concrete-ml==1.0.2)
  Using cached setuptools-65.6.3-py3-none-any.whl (1.2 MB)
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 67.8.0
    Uninstalling setuptools-67.8.0:
      Successfully uninstalled setuptools-67.8.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ipython 7.34.0 requires jedi>=0.16, which is not installed.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have 

# **Step 1: Download required files and compiled model, and instantiate on-disk network**

In [2]:
import requests, platform, time, os, subprocess, numpy, stat
import pandas as pd
from pandas import read_csv
from shutil import copyfile
from tempfile import TemporaryDirectory
from concrete.ml.deployment import FHEModelClient, FHEModelDev, FHEModelServer

def getRequiredFiles():
    files = [
        r"https://github.com/gcrosario/concreteML-FHE-Tumor-Classification/raw/master/FHE-Compiled-Model/LR-Kbest20-Trial3/client.zip",
        r"https://raw.githubusercontent.com/gcrosario/concreteML-FHE-Tumor-Classification/master/FHE-Compiled-Model/kbest-top-features.txt",
        r"https://github.com/gcrosario/concreteML-FHE-Tumor-Classification/raw/master/FHE-Compiled-Model/LR-Kbest20-Trial3/server.zip",
        r"https://raw.githubusercontent.com/gcrosario/concreteML-FHE-Tumor-Classification/master/testing-samples/ependymoma_sample.csv",
        r"https://raw.githubusercontent.com/gcrosario/concreteML-FHE-Tumor-Classification/master/testing-samples/glioblastoma_sample.csv",
        ]
    for url in files:
        # Derive the local file name from the last URL segment
        filename = url.split("/")[-1].replace("%20", " ")
        print(filename)
        # Download only if the file is not already present
        if filename not in os.listdir("/content"):
            download(url, "/content")

def download(url, dest_folder):
    if not os.path.exists(dest_folder):
        os.makedirs(dest_folder)

    filename = url.split('/')[-1].replace(" ", "_")
    file_path = os.path.join(dest_folder, filename)

    r = requests.get(url, stream=True)

    if r.ok:
        print("saving to", os.path.abspath(file_path))
        with open(file_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024 * 8):
                if chunk:
                    f.write(chunk)
                    f.flush()
                    os.fsync(f.fileno())
    else:  # HTTP status code 4XX/5XX
        print("Download failed: status code {}\n{}".format(r.status_code, r.text))

def get_size(file_path, unit='bytes'):
    """Return the size of file_path in the given unit, rounded to 3 decimals."""
    file_size = os.path.getsize(file_path)
    exponents_map = {'bytes': 0, 'kb': 1, 'mb': 2, 'gb': 3}
    if unit not in exponents_map:
        raise ValueError("Must select from ['bytes', 'kb', 'mb', 'gb']")
    size = file_size / 1024 ** exponents_map[unit]
    return round(size, 3)

getRequiredFiles()

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'


client.zip
saving to /content/client.zip
kbest-top-features.txt
saving to /content/kbest-top-features.txt
server.zip
saving to /content/server.zip
ependymoma_sample.csv
saving to /content/ependymoma_sample.csv
glioblastoma_sample.csv
saving to /content/glioblastoma_sample.csv


For demonstration purposes, we simulate a network on disk to stand in for the communication between the developer, the client, and the server. The code for the on-disk network was taken from Concrete ML's [sample ClientServer notebook](https://github.com/zama-ai/concrete-ml/blob/release/1.0.x/docs/advanced_examples/ClientServer.ipynb).

In [3]:
class OnDiskNetwork:
    """Simulate a network on disk."""

    def __init__(self):
        # Create 3 temporary folders for server, client and dev with tempfile
        self.server_dir = TemporaryDirectory()  # pylint: disable=consider-using-with
        self.client_dir = TemporaryDirectory()  # pylint: disable=consider-using-with
        self.dev_dir = TemporaryDirectory()  # pylint: disable=consider-using-with
        print("On-disk network initialized!\n")

    def client_send_evaluation_key_to_server(self, serialized_evaluation_keys):
        """Send the serialized evaluation keys to the server."""
        with open(self.server_dir.name + "/serialized_evaluation_keys.ekl", "wb") as f:
            f.write(serialized_evaluation_keys)

        print("Evaluation keys sent to server.\n")

    def client_send_input_to_server_for_prediction(self, encrypted_input):
        """Send the input to the server and execute on the server in FHE."""
        with open(self.server_dir.name + "/serialized_evaluation_keys.ekl", "rb") as f:
            serialized_evaluation_keys = f.read()
        time_begin = time.time()
        encrypted_prediction = FHEModelServer(self.server_dir.name).run(
            encrypted_input, serialized_evaluation_keys
        )
        time_end = time.time()
        with open(self.server_dir.name + "/encrypted_prediction.enc", "wb") as f:
            f.write(encrypted_prediction)
        return time_end - time_begin

    def dev_send_model_to_server(self):
        """Send the model to the server."""
        copyfile(self.dev_dir.name + "/server.zip", self.server_dir.name + "/server.zip")

        print("Model sent to server.\n")

    def server_send_encrypted_prediction_to_client(self):
        """Send the encrypted prediction to the client."""
        with open(self.server_dir.name + "/encrypted_prediction.enc", "rb") as f:
            encrypted_prediction = f.read()
        return encrypted_prediction

    def dev_send_clientspecs_to_client(self):
        """Send the client specs to the client."""
        copyfile(self.dev_dir.name + "/client.zip", self.client_dir.name + "/client.zip")

        print("Successfully sent client specs to client.\n")

    def cleanup(self):
        """Clean up the temporary folders."""
        self.server_dir.cleanup()
        self.client_dir.cleanup()
        self.dev_dir.cleanup()

    def move_client_server_specs_to_network(self):
        copyfile("/content/server.zip" , self.dev_dir.name + "/server.zip")
        copyfile("/content/client.zip" , self.dev_dir.name + "/client.zip")
        print("Compiled model and client specs copied to on-disk network.\n")

# Let's instantiate the network and move our downloaded demo files into it
network = OnDiskNetwork()
network.move_client_server_specs_to_network()

# List files inside the temporary development directory
!ls -lh $network.dev_dir.name

On-disk network initialized!

Compiled model and client specs copied to on-disk network.

total 12K
-rw-r--r-- 1 root root 1.1K Jun 19 10:23 client.zip
-rw-r--r-- 1 root root 5.4K Jun 19 10:23 server.zip


Since this guide covers the overall production workflow of the system, we skip the training phase and use the compiled model downloaded from the GitHub repository instead.

We now move the compiled model onto the server, where it can be used for classification.

In [4]:
# Let's send the model to the server
network.dev_send_model_to_server()
!ls -lh $network.server_dir.name

Model sent to server.

total 8.0K
-rw-r--r-- 1 root root 5.4K Jun 19 10:23 server.zip


Now that we have our compiled model on our server, we only need to send the client specifications to our simulated client to proceed with the next step.

In [5]:
# Let's send the client specs to the client
network.dev_send_clientspecs_to_client()
!ls -lh $network.client_dir.name

Successfully sent client specs to client.

total 4.0K
-rw-r--r-- 1 root root 1.1K Jun 19 10:23 client.zip


# **Step 2: Apply Feature Selection**

Before moving on to data encryption, the client must first apply feature selection on the initial input file.

Using the text file containing the list of final features to be used in the model, we will create a "feature_selection_output.csv" file and this will be our final plaintext input for encryption.

In [6]:
def dropColumns(featSel_input, file=os.path.join("/content", "kbest-top-features.txt")):
    # Read the selected feature names, one per line
    with open(file, "r") as feature_file:
        features = [feature.strip() for feature in feature_file.readlines()]

    # Keep the sample ID column plus the selected features
    feature_list = ["samples"] + features

    drop_df = read_csv(featSel_input)
    drop_df = drop_df[feature_list]
    drop_df.to_csv("/content/feature_selection_output.csv", index=False, header=True)

required_folder_names = ["testing_samples", "keys", "predictions"]

# create required folders
for name in required_folder_names:
    os.makedirs(os.path.join("/content", name), exist_ok=True)

# move testing sample files to "testing_samples"
# (absolute paths are used so this does not depend on the working directory;
# the feature selection output is skipped so the cell can be re-run safely)
for filename in os.listdir("/content"):
    if filename.endswith(".csv") and filename != "feature_selection_output.csv":
        os.rename(
            os.path.join("/content", filename),
            os.path.join("/content", "testing_samples", filename),
        )

featSel_input = os.path.join("/content/testing_samples/", "ependymoma_sample.csv")
dropColumns(featSel_input)

print("Feature Selection Done!")

Feature Selection Done!
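To make the column filtering concrete, here is a dependency-free sketch of the same idea on made-up data. It uses the standard `csv` module instead of pandas, and the function name `select_columns` and the gene column names are hypothetical; it is an illustration rather than the notebook's actual implementation.

```python
import csv
import io


def select_columns(csv_text, features, id_column="samples"):
    """Return CSV text containing only id_column plus the listed feature columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [id_column] + [name.strip() for name in features]
    # Fail early with a clear message if a selected feature is absent from the input
    missing = [name for name in keep if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"Columns missing from input: {missing}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up input: three feature columns, of which two are kept.
raw = "samples,geneA,geneB,geneC\ns1,10,20,30\ns2,11,21,31\n"
print(select_columns(raw, ["geneA", "geneC\n"]))
```

Checking for missing columns up front gives a clearer error than a failed lookup when the input file does not contain every selected feature.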


# **Step 3: Generate keys on the client machine**
With feature selection done, the next step is to encrypt the data to obtain the final input that will be sent to the server. Before that, the client must generate its private and evaluation keys, and send the evaluation keys to the server.

In [9]:
model_dir = network.client_dir.name
key_dir = network.client_dir.name

fhe_model_client = FHEModelClient(model_dir, key_dir=key_dir)

# The client first needs to create the private and evaluation keys.
fhe_model_client.generate_private_and_evaluation_keys()

serialized_evaluation_keys = fhe_model_client.get_serialized_evaluation_keys()

# Let's send this evaluation key to the server (this has to be done only once)
network.client_send_evaluation_key_to_server(serialized_evaluation_keys)

print("Private and evaluation keys generated.\n")

# The evaluation keys are sent a second time here; this is redundant but harmless
network.client_send_evaluation_key_to_server(serialized_evaluation_keys)

!ls -lh $network.server_dir.name

Evaluation keys sent to server.

Private and evaluation keys generated.

Evaluation keys sent to server.

total 12K
-rw-r--r-- 1 root root   24 Jun 19 10:26 serialized_evaluation_keys.ekl
-rw-r--r-- 1 root root 5.4K Jun 19 10:23 server.zip


# **Step 4: Encrypt Preprocessed Data**
With the keys generated, we can proceed to the encryption itself. Using **pandas**, we read the output of the feature selection step, which serves as the input for encryption.

In [10]:
# read the preprocessed input data
encryption_input = os.path.join("/content", "feature_selection_output.csv")
df = read_csv(encryption_input)
arr_no_id = df.drop(columns=['samples']).to_numpy(dtype="uint16")

# encrypted rows for input to server
encrypted_rows = []

# dictionary mapping each row index to its sample ID and (later) its result
data_dictionary = {}
for count, sample_id in enumerate(df['samples']):
    data_dictionary[count] = {'id': sample_id, 'result': ''}

# quantize, encrypt and serialize each sample, one row at a time
for row in range(arr_no_id.shape[0]):
    clear_input = arr_no_id[[row], :]
    print("Encrypting pre-processed data...")
    encrypted_input = fhe_model_client.quantize_encrypt_serialize(clear_input)
    encrypted_rows.append(encrypted_input)

print("Data Encryption DONE!")

# save the encrypted file and evaluation keys
filename = "encrypted_input.txt"
with open(os.path.join("/content", filename), "wb") as enc_file:
    for line in encrypted_rows:
        enc_file.write(line)

with open(os.path.join("/content", "serialized_evaluation_keys.ekl"), "wb") as f:
    f.write(serialized_evaluation_keys)

print("Encrypted inputs and key files saved to 'encrypted_input.txt' and 'serialized_evaluation_keys.ekl'.")

Encrypting pre-processed data...
Data Encryption DONE!
Encrypted inputs and key files saved to 'encrypted_input.txt' and 'serialized_evaluation_keys.ekl'.
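Note that the cell above writes the serialized rows back to back, so with more than one sample the file does not record where one ciphertext ends and the next begins. One possible way to keep the rows separable is to prefix each one with its length. The sketch below uses stand-in byte strings rather than real ciphertexts, and the helper names `write_blobs` and `read_blobs` are hypothetical; it is not part of the notebook or of Concrete ML.

```python
import io
import struct


def write_blobs(stream, blobs):
    """Write each bytes object preceded by its length as an 8-byte big-endian integer."""
    for blob in blobs:
        stream.write(struct.pack(">Q", len(blob)))
        stream.write(blob)


def read_blobs(stream):
    """Read back the list of bytes objects written by write_blobs."""
    blobs = []
    while True:
        header = stream.read(8)
        if not header:
            break  # clean end of stream
        if len(header) != 8:
            raise ValueError("Truncated length header")
        (size,) = struct.unpack(">Q", header)
        blob = stream.read(size)
        if len(blob) != size:
            raise ValueError("Truncated blob")
        blobs.append(blob)
    return blobs


# Stand-in data; in the notebook these would be the entries of encrypted_rows.
buffer = io.BytesIO()
write_blobs(buffer, [b"row-one", b"", b"row-three"])
buffer.seek(0)
print(read_blobs(buffer))  # [b'row-one', b'', b'row-three']
```

The same two functions work on a file opened in binary mode, since they only rely on `write` and `read`.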


# **Step 5: Perform FHE Inference on the Server**
Now that the input data is encrypted, we can send it to the server and perform FHE inference.

In [15]:
print("Sending encrypted data to server...")

execution_time = []
execution_time += [network.client_send_input_to_server_for_prediction(encrypted_rows[0])]

print("\nFHE inference DONE! \n")
print(f"The execution time is {numpy.mean(execution_time):.4f} seconds.")

Sending encrypted data to server...

FHE inference DONE! 

The execution time is 0.0121 seconds.


In [16]:
encrypted_prediction = network.server_send_encrypted_prediction_to_client()
print("Encrypted prediction sent to client!")


Encrypted prediction sent to client!
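The cells above send only the first encrypted row. Because the on-disk network stores a single `encrypted_prediction.enc` file, each prediction would need to be fetched before the next row is sent. The sketch below shows that send-then-fetch loop under stated assumptions: `predict_all` and `FakeNetwork` are hypothetical names, and the fake network only mimics the single-slot behaviour without doing any FHE work.

```python
def predict_all(send, fetch, encrypted_rows):
    """Send each row, then immediately fetch its prediction before sending the next."""
    times, predictions = [], []
    for row in encrypted_rows:
        times.append(send(row))      # send returns the reported run time
        predictions.append(fetch())  # fetch before the single slot is overwritten
    return times, predictions


class FakeNetwork:
    """In-memory stand-in that keeps only the most recent prediction."""

    def __init__(self):
        self.slot = None

    def send(self, row):
        self.slot = row[::-1]  # pretend "inference": reverse the bytes
        return 0.01            # pretend run time in seconds

    def fetch(self):
        return self.slot


fake = FakeNetwork()
times, predictions = predict_all(fake.send, fake.fetch, [b"abc", b"de"])
print(times)        # [0.01, 0.01]
print(predictions)  # [b'cba', b'ed']
```

In the notebook, `send` and `fetch` would correspond to `network.client_send_input_to_server_for_prediction` and `network.server_send_encrypted_prediction_to_client`.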


# **Step 6: Decrypt the encrypted prediction results**

In [21]:
classes_dict = {0: 'ependymoma', 1: 'glioblastoma', 2: 'medulloblastoma', 3: 'normal', 4: 'pilocytic_astrocytoma'}

decrypted_predictions = []

print("Now performing decryption on the prediction....")

decrypted_prediction = fhe_model_client.deserialize_decrypt_dequantize(encrypted_prediction)[0]

decrypted_predictions.append(decrypted_prediction)

decrypted_prediction_class = numpy.array(decrypted_predictions).argmax(axis=1)

final_output = [classes_dict[i] for i in decrypted_prediction_class]

print("\nPrediction decryption DONE!")

Now performing decryption on the prediction....

Prediction decryption DONE!


In [22]:
print("The brain tumor classification of your input is: ")

for output in final_output:
  print(output)

The brain tumor classification of your input is: 
ependymoma
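The last step of the decryption cell, turning per-class scores into a label, can be illustrated on made-up numbers. The sketch below is written in plain Python rather than with `numpy.argmax`, the scores are invented for illustration, and `to_labels` is a hypothetical helper name.

```python
classes = {
    0: "ependymoma",
    1: "glioblastoma",
    2: "medulloblastoma",
    3: "normal",
    4: "pilocytic_astrocytoma",
}


def to_labels(score_rows, classes):
    """Map each row of per-class scores to the label with the highest score."""
    labels = []
    for scores in score_rows:
        # Index of the largest score; on ties the first such index is used
        best = max(range(len(scores)), key=lambda i: scores[i])
        labels.append(classes[best])
    return labels


# Invented scores for two samples, one value per class.
made_up_scores = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.15, 0.60, 0.10, 0.10],
]
print(to_labels(made_up_scores, classes))  # ['ependymoma', 'medulloblastoma']
```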
