UPA is a big-data system that automatically infers a local sensitivity value for enforcing Individual Differential Privacy. The steps below walk through a simple example that demonstrates UPA's functionality.
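As background for the term "local sensitivity", here is a minimal sketch of the idea in plain Python. It is not UPA's inference algorithm. It assumes that neighbouring datasets are formed by removing one record, computes the local sensitivity by brute force, and adds Laplace noise scaled to that value. The names `local_sensitivity` and `noisy_answer` are hypothetical and exist only for this illustration.

```python
import random


def local_sensitivity(query, records):
    """Largest change in query(records) when any single record is removed.

    Assumption: neighbours differ by the removal of one record.
    """
    base = query(records)
    return max(
        (abs(base - query(records[:i] + records[i + 1:]))
         for i in range(len(records))),
        default=0.0,
    )


def noisy_answer(query, records, epsilon, rng=None):
    """Return query(records) plus Laplace noise of scale local_sensitivity / epsilon."""
    rng = rng or random.Random()
    scale = local_sensitivity(query, records) / epsilon
    if scale == 0:
        # No single record changes the answer, so no noise is added here.
        return query(records)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace sample with location 0 and that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return query(records) + noise
```

For example, `local_sensitivity(sum, [1, 5, 3])` is 5, because removing the record 5 changes the sum the most.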
Install the build dependencies (OpenJDK 8 and Maven), for example on Debian or Ubuntu:
sudo apt-get install openjdk-8-jdk maven
UPA is built the same way as Apache Spark, i.e., by running:
build/mvn -DskipTests -T 40 package
1. Generate a sample dataset:
mkdir $HOME/test; python gen_data.py --wq simple --path $HOME/test/dataset.txt --s 100000
This creates a sample dataset of 100000 records at $HOME/test/dataset.txt.
2. Partition the dataset:
python indexing.py --wq index --path $HOME/test/dataset.txt
This partitions the dataset ($HOME/test/dataset.txt) into two partitions; the partitioned dataset is located in $HOME/test/dataset.txt.upa.
3. Run an example:
./demo_attack.sh
The outputs are stored in output.txt. A detailed description of this attack can be found in the shell script (demo_attack.sh).
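After the three steps above, a quick sanity check on the generated files can catch a step that failed without an obvious error. The sketch below is not part of UPA. It assumes that the dataset stores one record per line and that output.txt is written to the current directory, neither of which the text above confirms. The helper names `count_records` and `check_artifacts` are hypothetical.

```python
import os


def count_records(path):
    """Count non-empty lines, assuming one record per line."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if line.strip())


def check_artifacts(dataset_path, output_path="output.txt",
                    expected_records=100000):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if not os.path.exists(dataset_path):
        problems.append(f"missing dataset: {dataset_path}")
    else:
        found = count_records(dataset_path)
        if found != expected_records:
            problems.append(
                f"expected {expected_records} records, found {found}")
    # Step 2 places the partitioned dataset next to the input with a .upa suffix.
    partitioned = dataset_path + ".upa"
    if not os.path.exists(partitioned):
        problems.append(f"missing partitioned dataset: {partitioned}")
    # Step 3 stores the demo outputs in output.txt.
    if not os.path.exists(output_path):
        problems.append(f"missing demo output: {output_path}")
    return problems
```

Calling `check_artifacts(os.path.expanduser("~/test/dataset.txt"))` from the directory where the demo was run returns an empty list when all three files are present and the record count matches.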
To run UPA on a cluster, first start a master by running the following command on the master computer:
./sbin/start-master.sh -h <ip address of master> -p <port to be used>
Then start the workers by running the following command on each worker computer:
./sbin/start-slave.sh spark://<ip address of master>:<port to be used>
Then run ./demo_attack.sh on the master computer. Note that the input dataset has to be replicated on both the master and the workers. When testing is finished, stop the master and workers by running ./sbin/stop-master.sh and ./sbin/stop-slave.sh on the master and worker computers, respectively, to release their network resources.
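The cluster sequence above can be collected in one place so that the same address and port are used on every machine. The following sketch only builds and prints the commands; it does not run them or connect to any host. The function name `cluster_commands` and the example address and port are hypothetical.

```python
import shlex


def cluster_commands(master_ip, port):
    """Build the commands from the cluster instructions as argument lists."""
    master_url = f"spark://{master_ip}:{port}"
    return {
        # Run on the master computer.
        "master_start": ["./sbin/start-master.sh",
                         "-h", master_ip, "-p", str(port)],
        # Run on each worker computer.
        "worker_start": ["./sbin/start-slave.sh", master_url],
        # Run on the master once the workers have started.
        "run_demo": ["./demo_attack.sh"],
        # Run after testing to release network resources.
        "master_stop": ["./sbin/stop-master.sh"],
        "worker_stop": ["./sbin/stop-slave.sh"],
    }


if __name__ == "__main__":
    # Example values only; substitute the real master address and port.
    for step, argv in cluster_commands("192.168.1.10", 7077).items():
        print(f"{step}: {shlex.join(argv)}")
```

Returning argument lists rather than a single string keeps the commands safe to pass to `subprocess.run` without shell quoting issues, should they later be executed from Python.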