
# Privkit Tutorial


This notebook gives a tour of Privkit's features for location data, including how to:
- load datasets
- get dataset statistics
- apply privacy-preserving mechanisms (PPMs)
- apply adversary attacks
- analyze PPMs and attacks through suitable metrics
- visualize the obtained results


In [None]:
import privkit as pk

pk.__version__

## Loading Data

Privkit provides access to the following mobility datasets: 
- cabspotting - mobility traces of taxi cabs in San Francisco, USA
- geolife - mobility data collected in Beijing, China

In this tutorial, we load the Geolife dataset through `GeolifeDataset` as follows.

In [None]:
# constants module: predefined names that keep the code less verbose
from privkit.utils import constants

# loading geolife dataset
dataset = pk.datasets.GeolifeDataset()
dataset.load_dataset()
location_data = dataset.data
location_data.data = location_data.data[:100]  # To limit the number of records

# prints data summary, specifying the number of users, trajectories, and other statistics
location_data.print_data_summary()

# prints statistics of data by user, specifying the number of trajectories, points, and other statistics per user
location_data.print_statistics_by_user()
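
To illustrate the kind of per-user statistics printed above, here is a minimal pandas sketch on a toy table. It is not Privkit's implementation: it assumes the records can be viewed as a table with one row per point, and the column names (`uid`, `tid`, `lat`, `lon`) are hypothetical.

```python
import pandas as pd

# Toy trajectory table; the column names are hypothetical, not Privkit's schema.
toy = pd.DataFrame({
    "uid": [1, 1, 1, 2, 2],
    "tid": [10, 10, 11, 20, 20],
    "lat": [39.90, 39.91, 39.92, 39.95, 39.96],
    "lon": [116.40, 116.41, 116.42, 116.45, 116.46],
})

# Per-user number of distinct trajectories and number of recorded points.
per_user = toy.groupby("uid").agg(
    trajectories=("tid", "nunique"),
    points=("lat", "size"),
)
print(per_user)
```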

## Applying a Privacy-Preserving Mechanism

Privkit implements several PPMs. Some of them are listed below, each with a brief explanation.
- **Planar Laplace**: consists of adding 2-dimensional Laplacian noise centered at the exact user location. The Laplacian distribution depends on a privacy parameter epsilon defined as ε=l/r, which means that a privacy level l is guaranteed within a radius r.

- **Adaptive Geo-Indistinguishability**: uses the Planar Laplace mechanism as baseline, but dynamically adapts the privacy parameter epsilon according to the correlation between the current and the past locations.

- **Clustering Geo-Indistinguishability**: consists of creating obfuscation clusters to aggregate nearby locations into a single obfuscated location. This obfuscated location is produced by Planar Laplace.

- **Velocity-Aware Geo-Indistinguishability**: uses the Planar Laplace mechanism as baseline, but dynamically adapts the privacy parameter epsilon according to the user velocities as well as the reporting speed.
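
To make the Planar Laplace description concrete, the following is a minimal NumPy sketch of the noise-sampling step. It is independent of Privkit's own implementation and rests on two assumptions: coordinates are planar and expressed in meters, and the noise radius is drawn from a Gamma distribution with shape 2 and scale 1/ε, which is the radial distribution of 2-dimensional Laplacian noise. The function name `planar_laplace_noise` is illustrative only.

```python
import numpy as np

def planar_laplace_noise(epsilon, size, rng=None):
    """Draw `size` 2-D planar Laplace offsets (meters) for a given epsilon (1/meter)."""
    rng = np.random.default_rng() if rng is None else rng
    # The direction of the offset is uniform on the circle.
    theta = rng.uniform(0.0, 2.0 * np.pi, size)
    # Radial density is eps^2 * r * exp(-eps * r), i.e. Gamma(shape=2, scale=1/eps).
    radius = rng.gamma(shape=2.0, scale=1.0 / epsilon, size=size)
    return np.column_stack((radius * np.cos(theta), radius * np.sin(theta)))

epsilon = 0.016                          # example value of the privacy parameter
protection_radius = np.log(4) / epsilon  # r = l / epsilon with privacy level l = ln(4)

offsets = planar_laplace_noise(epsilon, size=100_000, rng=np.random.default_rng(0))
# Average displacement from the true location; its expected value is 2 / epsilon.
mean_displacement = np.linalg.norm(offsets, axis=1).mean()
```

With ε = 0.016 per meter, a privacy level of l = ln(4) corresponds to a radius of roughly 87 m, and the reported location lands about 2/ε = 125 m from the true one on average.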

In [None]:
# initialize and apply Planar Laplace
planar_laplace = pk.PlanarLaplace(epsilon=0.016)
location_data_pl = planar_laplace.execute(location_data)

# initialize and apply Clustering Geo-Ind
import numpy as np
epsilon = 0.016  # same privacy parameter as above
clustering = pk.ClusteringGeoInd(r=np.log(4)/epsilon, epsilon=epsilon)
location_data_cgi = clustering.execute(location_data)

# initialize and apply Adaptive Geo-Ind
adaptive = pk.AdaptiveGeoInd(epsilon=0.016, ws=2)
location_data_agi = adaptive.execute(location_data)

# initialize and apply Velocity-Aware Geo-Ind
va_gi = pk.VAGI(epsilon=0.016, m=10)
location_data_vagi = va_gi.execute(location_data)

## Applying an Adversary Attack


In [None]:
# initialize and apply OptimalHW
optHW = pk.OptimalHW(epsilon=0.016)
location_data_opthw = optHW.execute(location_data)

# initialize and apply OmniHW
omniHW = pk.OmniHW(epsilon=0.016)
location_data_omnihw = omniHW.execute(location_data)

## Mechanism Analysis and Visualization of Results

In [None]:
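from privkit.utils import plot_utils  # plotting helpers; module path assumed to sit next to constants

ppm = planar_laplace  # mechanism to evaluate; re-applied so the data matches the plot label
location_data = ppm.execute(location_data)
quality_loss = pk.QualityLoss()
location_data = quality_loss.execute(location_data)

plot_utils.boxplot(labels=[ppm.PPM_ID], values=quality_loss.values, title=quality_loss.METRIC_ID, show=True)
plot_utils.plot_errorbar(x=[ppm.PPM_ID], y=quality_loss.values, title=quality_loss.METRIC_ID, show=True)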
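
As a rough illustration of what a quality-loss metric measures, the sketch below computes the great-circle distance between true and obfuscated locations. It assumes quality loss is defined as that distance, which is a common choice for geo-indistinguishability mechanisms but is not confirmed here as Privkit's exact definition. The helper `haversine_m` and the coordinates are hypothetical. The same distance, taken between the true location and the adversary's estimate, is a common way to express adversary error.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Hypothetical true and obfuscated coordinates (decimal degrees).
true_lat = np.array([39.9000, 39.9100])
true_lon = np.array([116.4000, 116.4100])
obf_lat = np.array([39.9010, 39.9100])
obf_lon = np.array([116.4000, 116.4100])

# One distance per record: the first point moved 0.001 degrees north, the second did not move.
loss_m = haversine_m(true_lat, true_lon, obf_lat, obf_lon)
```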

In [None]:
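from privkit.utils import plot_utils  # plotting helpers; module path assumed to sit next to constants

attack = pk.OptimalHW(epsilon=0.016)  # attack to evaluate; re-applied so the data matches the plot label
location_data = attack.execute(location_data)
adv_error = pk.AdversaryError()
location_data = adv_error.execute(location_data)

plot_utils.boxplot(labels=[attack.ATTACK_ID], values=adv_error.values, title=adv_error.METRIC_ID, show=True)
plot_utils.plot_errorbar(x=[attack.ATTACK_ID], y=adv_error.values, title=adv_error.METRIC_ID, show=True)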