## Privkit: A Toolkit of Privacy-Preserving Mechanisms for Heterogeneous Data Types

This Jupyter Notebook presents the use cases demonstrated in the paper "Privkit: A Toolkit of Privacy-Preserving Mechanisms for Heterogeneous Data Types".

## Abstract

With the massive data collection from different devices, spanning from mobile devices to all sorts of IoT devices, protecting the privacy of users is a fundamental concern. To prevent unwanted disclosures, several Privacy-Preserving Mechanisms (PPMs) have been proposed. Nevertheless, due to the lack of a standardized and universal privacy definition, configuring and evaluating PPMs is quite challenging, requiring knowledge that the average user does not have. In this paper, we propose a privacy toolkit - Privkit - to systematize this process and facilitate automated configuration of PPMs. Privkit enables the assessment of privacy-preserving mechanisms with different configurations, while allowing the quantification of the achieved privacy and utility level of various types of data. Privkit is open source and can be extended with new data types, corresponding PPMs, as well as privacy and utility assessment metrics and privacy attacks over such data. This toolkit is available as a Python package with several state-of-the-art PPMs already implemented, and is also accessible through a Web application. Privkit constitutes a unified toolkit that eases the dissemination of new privacy-preserving methods and facilitates the reproducibility of research results through a repository of Jupyter Notebooks.

## Citation

Please consider citing our publication in your scientific work:

In [None]:
@inproceedings{10.1145/3626232.3653284,
    author = {Cunha, Mariana and Duarte, Guilherme and Andrade, Ricardo and Mendes, Ricardo and Vilela, Jo\~{a}o P.},
    title = {Privkit: A Toolkit of Privacy-Preserving Mechanisms for Heterogeneous Data Types},
    year = {2024},
    isbn = {9798400704215},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3626232.3653284},
    doi = {10.1145/3626232.3653284},
    booktitle = {Proceedings of the Fourteenth ACM Conference on Data and Application Security and Privacy},
    pages = {319--324},
    numpages = {6},
    location = {Porto, Portugal},
    series = {CODASPY '24}
}

----------

## Import Privkit

In [19]:
import privkit as pk

#### Use Case 1: Location Data

In [20]:
data_to_load = [['2008-10-23 02:53:04', 39.984702, 116.318417],
                ['2008-10-23 02:53:10', 39.984683, 116.31845],
                ['2008-10-23 02:53:15', 39.984686, 116.318417]]

location_data = pk.LocationData()
location_data.load_data(data_to_load, datetime=0, latitude=1, longitude=2)

In [21]:
# Alternatively, load the Geolife dataset; this replaces the sample data loaded above
geolife_dataset = pk.datasets.GeolifeDataset()
geolife_dataset.load_dataset()

location_data = geolife_dataset.data

In [None]:
planar_laplace = pk.ppms.PlanarLaplace(epsilon=0.016)
obfuscated_data_pl = planar_laplace.execute(location_data)

clustering = pk.ppms.ClusteringGeoInd(epsilon=0.016, r=100)
obfuscated_data_cgi = clustering.execute(location_data)
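
To give some intuition for what a planar Laplace mechanism does, here is a minimal standalone sketch using only the Python standard library. It is not Privkit's implementation: the helper names `planar_laplace_offset` and `obfuscate` are hypothetical, and the sketch assumes that `epsilon` is expressed in inverse metres and that a local flat-Earth approximation is good enough for small displacements. The noise direction is uniform, and the noise radius follows a Gamma distribution with shape 2 and scale `1/epsilon`, which matches the radial density of the planar Laplace distribution.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def planar_laplace_offset(epsilon, rng):
    """Draw one (dx, dy) offset in metres from a planar Laplace distribution.

    Assumes epsilon is given in 1/metres (an assumption of this sketch).
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)        # uniform direction
    radius = rng.gammavariate(2.0, 1.0 / epsilon)  # radial density: eps^2 * r * exp(-eps * r)
    return radius * math.cos(theta), radius * math.sin(theta)


def obfuscate(lat, lon, epsilon, rng):
    """Shift a (lat, lon) point by a planar Laplace offset (flat-Earth approximation)."""
    dx, dy = planar_laplace_offset(epsilon, rng)
    new_lat = lat + math.degrees(dy / EARTH_RADIUS_M)
    new_lon = lon + math.degrees(dx / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return new_lat, new_lon


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(obfuscate(39.984702, 116.318417, epsilon=0.016, rng=rng))
```

Under these assumptions the expected displacement is `2 / epsilon`, so `epsilon=0.016` would correspond to an average shift of about 125 metres.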

In [None]:
import osmnx as ox

# Download the road network within 500 m of the mean location of the data
center = (location_data.data['lat'].mean(), location_data.data['lon'].mean())
road_network = ox.graph_from_point(center, dist=500, retain_all=False, truncate_by_edge=True)

In [None]:
map_matching = pk.attacks.MapMatching(G=road_network)
adversary_data = map_matching.execute(location_data)

adversary_error = pk.metrics.AdversaryError()
adversary_error.execute(adversary_data)
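
As a rough illustration of what an adversary error metric can measure, the sketch below computes the mean great-circle distance between true locations and an adversary's estimates. It assumes the common location-privacy definition of adversary error as the average distance between estimate and ground truth; Privkit's `AdversaryError` may be computed differently. The helper names `haversine_m` and `mean_adversary_error` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_adversary_error(true_points, estimated_points):
    """Average distance (metres) between each true point and the adversary's estimate."""
    distances = [haversine_m(*t, *e) for t, e in zip(true_points, estimated_points)]
    return sum(distances) / len(distances)


# Illustrative values only: the estimates here are made up, not attack output.
true_points = [(39.984702, 116.318417), (39.984683, 116.318450)]
estimates = [(39.984800, 116.318500), (39.984600, 116.318300)]
print(mean_adversary_error(true_points, estimates))
```

A larger value means the adversary's estimates are farther from the true locations, which under this definition indicates more privacy.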

#### Use Case 2: Facial Data

In [None]:
data_to_load = 'sample_face.ply'

facial_data = pk.FacialData()
facial_data.load_data(data_to_load)

In [None]:
point_mesh_point = pk.ppms.PointMeshPoint(alpha=40, n=200000)
obfuscated_data_PMP = point_mesh_point.execute(facial_data)