shernandezsantana/private-tn

Alejandro Pozas-Kerstjens, Senaida Hernández-Santana, José Ramón Pareja Monturiol, Marco Castrillón López, Giannicola Scarpa, Carlos E. González-Guillén, and David Pérez-García

This repository contains the code used for the article "Physics solutions to machine learning privacy leaks" by Alejandro Pozas-Kerstjens, Senaida Hernández-Santana, José Ramón Pareja Monturiol, Marco Castrillón López, Giannicola Scarpa, Carlos E. González-Guillén, and David Pérez-García (arXiv:2202.12319). It provides code for cleaning the global.health database, training neural network and matrix product state models on the resulting dataset, and attacking the models via shadow training.
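As a rough illustration of what a shadow-training attack on model parameters looks like, the sketch below trains many toy "shadow" models on synthetic datasets whose hidden property is known, then fits an attack model that guesses that property from the trained parameters alone. It is not the code in this repository: the linear shadow models, the synthetic data, the nearest-centroid attack model, and all function names (`train_shadow_model`, `build_parameter_dataset`, `fit_attack`, `predict_attack`) are assumptions made for the example.

```python
import numpy as np


def train_shadow_model(rng, hidden_bit, n_samples=200):
    """Fit a toy linear model on synthetic data and return its weights.

    hidden_bit stands in for a property of the training set that is
    irrelevant to the task. Assumption of this toy: when hidden_bit is 1,
    an extra feature is spuriously correlated with the label.
    """
    x0 = rng.normal(size=n_samples)           # feature relevant to the task
    y = x0 + rng.normal(size=n_samples)       # noisy label
    x1 = rng.normal(size=n_samples)           # feature irrelevant to the task
    if hidden_bit == 1:
        x1 = x1 + 0.5 * y                     # leak the hidden property
    X = np.column_stack([x0, x1])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def build_parameter_dataset(rng, n_models):
    """Train n_models shadow models; return (parameters, hidden bits)."""
    bits = rng.integers(0, 2, size=n_models)
    params = np.array([train_shadow_model(rng, b) for b in bits])
    return params, bits


def fit_attack(params, bits):
    """Nearest-centroid attack model: one centroid per hidden-bit value."""
    return np.stack([params[bits == b].mean(axis=0) for b in (0, 1)])


def predict_attack(centroids, params):
    """Assign each parameter vector to the closest centroid."""
    distances = np.linalg.norm(
        params[:, None, :] - centroids[None, :, :], axis=2
    )
    return distances.argmin(axis=1)


rng = np.random.default_rng(0)
train_params, train_bits = build_parameter_dataset(rng, 200)
test_params, test_bits = build_parameter_dataset(rng, 100)
centroids = fit_attack(train_params, train_bits)
attack_accuracy = (predict_attack(centroids, test_params) == test_bits).mean()
print(f"attack accuracy: {attack_accuracy:.2f}")
```

An attack accuracy well above 0.5 on held-out shadow models indicates that the trained parameters carry information about the hidden property, even though that property plays no role in the prediction task.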

All code is written in Python.

Libraries required:

Files:

  • General files

    • create_accuracy_figure: Create Figure 2c in the paper, showing the accuracies of the models in predicting the outcome of COVID-19 patients given demographics and symptoms.
    • create_attack_figure: Create Figure 2d in the paper, showing the accuracies of attacks inferring the parity of the registration day of the models' training data.
    • create_vulnerability_figure: Create Figure 1 in the paper, showing how neural networks store data from the training set that is irrelevant for the target task.
    • database_processing: Clean the global.health database to generate the dataset used in the experiments.
  • Neural networks

    • attack_nn: Attacks inferring the parity of the registration day of the neural networks' training data.
    • create_nn_dataset_from_models: Generate the dataset with all the neural networks' model parameters.
    • generate_nn_models: Train neural network models on predicting COVID-19 outcome given demographics and symptoms.
    • utils_nn: Helper functions for data processing and model training.
  • Matrix product states

    • attack_mps: Attacks, based on shadow training, inferring the parity of the registration day of the matrix product states' training data.
    • batchtensornetwork: Functions for evaluating matrix product states on input data.
    • classifier: Definition of the classifier matrix product state model.
    • create_mps_dataset_from_models: Generate the dataset with all the matrix product states' model parameters, either in standard or in canonical form.
    • generate_mps_models: Train matrix product state models on predicting COVID-19 outcome given demographics and symptoms.
    • training: Functions for training matrix product state models.
    • utils_mps: Helper functions for data processing and model training.
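
For readers unfamiliar with matrix product states, evaluating one on a batch of inputs amounts to contracting each site tensor with the corresponding local feature vector and multiplying the resulting matrices from left to right. The following is a minimal NumPy sketch of that idea, not the implementation in batchtensornetwork: the function name `evaluate_mps`, the tensor index ordering (left bond, physical, right bond), and the open boundary conditions are assumptions made for illustration.

```python
import numpy as np


def evaluate_mps(tensors, features):
    """Contract an MPS with a batch of product-state inputs.

    tensors:  list of N arrays, tensors[i].shape == (D_left, d, D_right),
              with D_left == 1 at the first site and D_right == 1 at the last.
    features: array of shape (batch, N, d), one local feature vector per site.
    Returns an array of shape (batch,) with the contraction values.
    """
    batch = features.shape[0]
    # Running left boundary: one row vector per sample.
    left = np.ones((batch, 1))
    for site, tensor in enumerate(tensors):
        # Contract the physical index with this site's feature vector:
        # (batch, d) x (D_left, d, D_right) -> (batch, D_left, D_right)
        matrices = np.einsum("bp,lpr->blr", features[:, site, :], tensor)
        # Absorb the matrix into the boundary:
        # (batch, D_left) x (batch, D_left, D_right) -> (batch, D_right)
        left = np.einsum("bl,blr->br", left, matrices)
    # The last right bond has dimension 1, so a scalar per sample remains.
    return left[:, 0]


# Usage: a random 3-site MPS with physical dimension 2 and bond dimension 3.
rng = np.random.default_rng(0)
tensors = [
    rng.normal(size=(1, 2, 3)),
    rng.normal(size=(3, 2, 3)),
    rng.normal(size=(3, 2, 1)),
]
features = rng.normal(size=(4, 3, 2))
values = evaluate_mps(tensors, features)
```

Sweeping from left to right keeps the cost linear in the number of sites, whereas building the full tensor first would scale exponentially with it.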

If you would like to cite this work, please use the following format:

A. Pozas-Kerstjens, S. Hernández-Santana, J. R. Pareja Monturiol, M. Castrillón López, G. Scarpa, C. E. González-Guillén, and D. Pérez-García, Physics solutions to machine learning privacy leaks, arXiv:2202.12319

@misc{pozaskerstjens2022privatetn,
author = {Pozas-Kerstjens, Alejandro and Hern\'andez-Santana, Senaida and Pareja Monturiol, Jos\'e Ram\'on and Castrill\'on L\'opez, Marco and Scarpa, Giannicola and Gonz\'alez-Guill\'en, Carlos E. and P\'erez-Garc\'ia, David},
title = {Physics solutions to machine learning privacy leaks},
eprint = {2202.12319},
archivePrefix = {arXiv}
}