## README

This notebook applies the Gerchberg–Saxton (GS) algorithm to transform data into a more secure representation for downstream AI models, particularly to mitigate vulnerabilities to post-training attacks. Before proceeding, please ensure you have run the Load_and_Preprocess_Data.ipynb notebook and saved the generated client data to your working directory, as this is a required input for the transformations performed here.

## ⚠️ GPU Acceleration with CUDA Compatibility Required
This notebook must run in a GPU-enabled environment with a CuPy build (a cupy-cuda package such as `cupy-cuda12x`) installed that matches your system’s CUDA version.

The main workflow does not support CPU execution, because the Gerchberg–Saxton (GS) transformations are too computationally demanding on CPU for the dataset used in this notebook.

Before running, please ensure:

- Your system has an NVIDIA GPU with CUDA support
- You have installed the appropriate cupy-cuda version for your CUDA runtime

You can verify compatibility and install the correct package using the official CuPy guide:

👉 [CuPy Installation Guide with CUDA Compatibility](https://docs.cupy.dev/en/stable/install.html#installing-cupy)

If GPU acceleration is unavailable on your local machine, we recommend using Google Colab, where this notebook is fully tested and compatible.
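
As a quick pre-flight check, the sketch below reports whether CuPy can be imported and whether it sees a CUDA device. The helper name `describe_gpu` is hypothetical and not part of this notebook; it relies only on `cupy.cuda.runtime.getDeviceCount()` and `cupy.cuda.runtime.runtimeGetVersion()`.

```python
def describe_gpu():
    """Return a short status string describing CuPy/CUDA availability."""
    try:
        import cupy as cp
    except Exception as exc:  # CuPy missing or built for another CUDA version
        return f"CuPy could not be imported: {exc}"
    try:
        n_devices = cp.cuda.runtime.getDeviceCount()
        runtime_version = cp.cuda.runtime.runtimeGetVersion()
    except Exception as exc:  # e.g. no driver or no visible GPU
        return f"CuPy is installed but no usable CUDA device was found: {exc}"
    return f"{n_devices} CUDA device(s) visible, CUDA runtime version {runtime_version}"


print(describe_gpu())
```

If the message reports an import failure or no usable device, revisit the installation guide above before running the remaining cells.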

## ✅ Recommended Google Colab Setup

  - Runtime Type: Python 3
  - Hardware Accelerator: T4 GPU (standard/free tier)



### 📝 Note

A CPU-compatible version of the GS transformation is provided in the final code cell of the notebook. However, it is included for completeness only and is not recommended due to significant performance limitations.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Imports and Functions

If you are using Google Colab with the setup specified above, no modifications are needed; simply run the code blocks below as-is.

If you are running this notebook locally or in a different environment, ensure that the cupy-cuda library is installed in your kernel and matches your CUDA version.

Refer to the official CuPy installation guide for CUDA compatibility details:

👉 [CuPy Installation with CUDA Compatibility Matrix](https://docs.cupy.dev/en/stable/install.html#installing-cupy)

Failure to configure this correctly may result in runtime errors or significant slowdowns during GPU-dependent operations.

In [2]:
# Standard library
import gc
import math
import os
import pickle
import random
import sys
import time
from collections import Counter

# Third-party
import cupy as cp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.linalg
import seaborn as sns
from scipy.linalg import dft

In [3]:
#@title GS
import cupy as cp
import scipy.linalg
from scipy.linalg import dft

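def dft_matrix(n):
    """Return the unitary n x n DFT matrix (scaled by 1/sqrt(n))."""
    return dft(n, scale='sqrtn')

def imaginer_cp(t):
    """Map phase angles t (radians) to unit-modulus complex numbers exp(i*t)."""
    return cp.exp(1j * t)

def gs3D_GPU(data, iter, maskP=0):
    """Apply the masked GS iteration to a batch of shape (n_samples, height, width).

    `iter` is the number of GS iterations (must be >= 1); `maskP` is the
    probability that a frequency-domain entry is zeroed in each iteration.
    """
    data = cp.array(data)

    # Unitary DFT matrices for the two image axes, and their inverses
    FF1 = cp.array(dft_matrix(data.shape[1]))
    FF2 = cp.array(dft_matrix(data.shape[2]))
    invFF1 = cp.linalg.inv(FF1)
    invFF2 = cp.linalg.inv(FF2)

    # Repeat each matrix once per sample for the batched products below
    FF_tensor1 = cp.tile(FF1, (len(data), 1, 1))
    invFF_tensor1 = cp.tile(invFF1, (len(data), 1, 1))
    FF_tensor2 = cp.tile(FF2, (len(data), 1, 1))
    invFF_tensor2 = cp.tile(invFF2, (len(data), 1, 1))

    # Start from a uniformly random phase in [0, 2*pi).
    # cp.exp already works element-wise, so cp.vectorize is not needed.
    random_matrix = cp.random.uniform(low=0, high=2*cp.pi, size=data.shape)
    random_matrix_2 = imaginer_cp(random_matrix)

    for i in range(iter):
        # Binary mask: each entry is 0 with probability maskP, otherwise 1
        mask = cp.random.choice((0, 1), data.shape, p=[maskP, 1 - maskP])
        # Forward 2D DFT of the data combined with the current phase estimate
        transformed = FF_tensor1 @ (data * random_matrix_2) @ FF_tensor2
        # Keep only the spectral phase (unit modulus); a zero entry yields NaN
        mag_transformed = transformed / cp.abs(transformed)
        # Mask the spectrum, then transform back to the image domain
        back_transformed = invFF_tensor1 @ (mag_transformed * mask) @ invFF_tensor2

        # The phase of the result becomes the next phase estimate
        angles = cp.angle(back_transformed)
        random_matrix_2 = imaginer_cp(angles)

    # Magnitude of the final image-domain field, copied back to host memory
    ans = cp.abs(back_transformed)
    return ans.get()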

def GS_batch_image(data, batch_size, ite, maskP=0):
    '''
    Apply the GS transformation to `data` in batches of `batch_size` samples
    to keep RAM and GPU memory usage bounded. The last batch may contain
    fewer samples than `batch_size`.
    '''
    n_batch = -(-len(data) // batch_size)  # ceiling division
    gs_batches = []

    # Slicing past the end of `data` is safe, so the final (possibly smaller)
    # batch is handled by the same loop as the full batches
    for i, start in enumerate(range(0, len(data), batch_size)):
        gs_batches.append(gs3D_GPU(data[start:start + batch_size], ite, maskP=maskP))
        sys.stdout.write(f"\rBatch {i + 1} of {n_batch} completed...")
        sys.stdout.flush()
        gc.collect()

    if not gs_batches:
        # No samples: return an empty array with the expected trailing shape
        return np.empty((0, data.shape[1], data.shape[2]))

    # Concatenate once at the end; calling np.append per batch would copy
    # the whole accumulated array on every iteration
    return np.concatenate(gs_batches, axis=0)
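
For readers who want to inspect the iteration without a GPU, the following is a minimal NumPy sketch of the same loop as `gs3D_GPU`. It assumes that multiplying by the unitary DFT matrices on both sides is equivalent to `np.fft.fft2(..., norm="ortho")` over the last two axes. The name `gs_numpy_sketch`, the `seed` argument, and the guard against division by zero are additions for illustration and are not part of the notebook.

```python
import numpy as np

def gs_numpy_sketch(batch, n_iter, mask_p=0.0, seed=None):
    """CPU sketch of the masked GS loop for a batch of shape (n, height, width)."""
    rng = np.random.default_rng(seed)
    batch = np.asarray(batch, dtype=float)

    # Start from a uniformly random phase, as in the GPU version
    phase = np.exp(1j * rng.uniform(0, 2 * np.pi, size=batch.shape))
    back = batch * phase

    for _ in range(n_iter):
        # Forward unitary 2D DFT over the last two axes
        spectrum = np.fft.fft2(batch * phase, norm="ortho")
        # Keep only the spectral phase; the guard avoids 0/0 (an added assumption)
        magnitude = np.abs(spectrum)
        unit_spectrum = spectrum / np.where(magnitude == 0, 1.0, magnitude)
        # Zero each frequency entry with probability mask_p
        keep = rng.random(batch.shape) >= mask_p
        # Back to the image domain; its phase becomes the next estimate
        back = np.fft.ifft2(unit_spectrum * keep, norm="ortho")
        phase = np.exp(1j * np.angle(back))

    return np.abs(back)


images = np.random.default_rng(0).random((2, 16, 16))
transformed = gs_numpy_sketch(images, n_iter=5, mask_p=0.2, seed=1)
print(transformed.shape)
```

The output keeps the input shape and is non-negative, since it is the magnitude of a complex field. Because the random phase and mask use a different random generator, its values will not match the GPU output sample for sample.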


## Load and Transform Client Data
### 📥 Loading Data
Update the base_load_path variable to match the directory where you saved the client data during execution of the Load_and_Preprocess_Data.ipynb notebook.

    Important: This path must correctly reference the preprocessed client data
    directory. Failure to do so will result in downstream errors during data
    loading.

Ensure that the specified path is valid and accessible within your current runtime environment.
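
One way to catch a wrong path early is to check for the expected files before starting the transformation. The sketch below assumes the file names used by the processing cell (`data_client1.pickle` to `data_client3.pickle` and `test_images.pickle`); the helper `missing_inputs` and the placeholder path are hypothetical.

```python
from pathlib import Path

def missing_inputs(base_load_path, n_clients=3):
    """Return the expected input files that are absent from base_load_path."""
    base = Path(base_load_path)
    expected = [f"data_client{i}.pickle" for i in range(1, n_clients + 1)]
    expected.append("test_images.pickle")
    return [name for name in expected if not (base / name).is_file()]


# Placeholder path: replace with your own base_load_path
print(missing_inputs("path/to/client_data"))
```

An empty list means every expected input was found.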

### 🔄 Transforming Data
Next, specify the base_save_path, which determines where the Gerchberg–Saxton (GS) transformed images will be saved.
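
Saving fails if the target directory does not exist, so it can help to create it up front. The sketch below follows the output naming pattern used by the processing cell (`data_client{n}_gs{masking_percentage}p.pickle`); the helper name `gs_output_path` and the placeholder directory are hypothetical.

```python
import os

def gs_output_path(base_save_path, client, masking_percentage):
    """Create base_save_path if needed and return the output file path for one client."""
    os.makedirs(base_save_path, exist_ok=True)
    filename = f"data_client{client}_gs{masking_percentage}p.pickle"
    return os.path.join(base_save_path, filename)


# Placeholder directory: replace with your own base_save_path
print(gs_output_path("gs_output_example", client=1, masking_percentage=20))
```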

For reproducibility and time-efficiency, this notebook applies the GS transformation using 20% masking, which was identified as the most effective configuration in our manuscript findings ([insert link here]).

    You are encouraged to experiment with different masking levels by modifying
    the `masking_percentage` parameter below.

Be aware that increasing the masking level may affect both performance and transformation time.
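
To make the parameter concrete, the sketch below shows how a masking percentage maps to the binary mask drawn in each GS iteration: an entry is zeroed with probability `masking_percentage / 100` and kept otherwise. It uses NumPy instead of CuPy so it runs anywhere; the helper `sample_mask` and its `seed` argument are hypothetical.

```python
import numpy as np

def sample_mask(shape, masking_percentage, seed=None):
    """Draw a 0/1 mask in which each entry is 0 with probability masking_percentage / 100."""
    mask_p = masking_percentage / 100
    rng = np.random.default_rng(seed)
    return rng.choice([0, 1], size=shape, p=[mask_p, 1 - mask_p])


mask = sample_mask((200, 200), masking_percentage=20, seed=0)
print(f"fraction zeroed: {1 - mask.mean():.3f}")
```

The printed fraction should be close to 0.20 for a 20% masking level, with small sampling variation.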

In [6]:
import time
import pickle
import gc
import random
import matplotlib.pyplot as plt

base_load_path = '/content/drive/MyDrive/Spring 25/github_brainfl/data/bench'
base_save_path = '/content/drive/MyDrive/Spring 25/github_brainfl/data/gs'

masking_percentage = 20  # Modify as needed (default: 20, i.e. 20% masking)
maskP = masking_percentage / 100  # Derived fraction passed to GS_batch_image; do not edit directly

for client in range(3):
    start_time = time.time()
    print(f"\n=== [Client {client+1}] Start Processing ===")

    # Load data
    print(f"[Client {client+1}] Loading data...")
    with open(f'{base_load_path}/data_client{client+1}.pickle', 'rb') as f:
        data_bench = pickle.load(f)

    # Apply GS Transformation
    print(f"[Client {client+1}] Applying GS transformation (masking: {masking_percentage}%)...")
    data_gs = GS_batch_image(data_bench, batch_size=400, ite=50, maskP=maskP)

    # === Visualize 4 random samples ===
    print(f"[Client {client+1}] Visualizing sample transformations...")
    sample_indices = random.sample(range(len(data_bench)), 4)
    fig, axes = plt.subplots(2, 4, figsize=(12, 6))
    fig.suptitle(f"Client {client+1}: Original (Top) vs GS Transformed (Bottom)", fontsize=14)

    for i, idx in enumerate(sample_indices):
        axes[0, i].imshow(data_bench[idx], cmap='gray')
        axes[0, i].axis('off')
        axes[0, i].set_title(f"Original {idx}")

        axes[1, i].imshow(data_gs[idx], cmap='gray')
        axes[1, i].axis('off')
        axes[1, i].set_title(f"GS {idx}")

    plt.tight_layout()
    plt.show()

    # Save transformed data
    print(f"[Client {client+1}] Saving transformed data...")
    with open(f'{base_save_path}/data_client{client+1}_gs{masking_percentage}p.pickle', 'wb') as f:
        pickle.dump(data_gs, f)

    print(f"[Client {client+1}] ✅ Done in {time.time() - start_time:.2f} seconds")

    # Free memory
    del data_bench
    del data_gs
    gc.collect()


# === TEST DATA PROCESSING ===
start_time = time.time()
print("\n=== [Test Set] Start Processing ===")

print("Loading Test data...")
with open(f'{base_load_path}/test_images.pickle', 'rb') as f:
    data_bench = pickle.load(f)

print(f"Applying GS transformation to Test Data (masking: {masking_percentage}%)...")
data_gs = GS_batch_image(data_bench, batch_size=400, ite=50, maskP=maskP)

print("Saving transformed test data...")
with open(f'{base_save_path}/test_images_gs{masking_percentage}p.pickle', 'wb') as f:
    pickle.dump(data_gs, f)

print(f"[Test Set] ✅ Done in {time.time() - start_time:.2f} seconds")

