This code base accompanies our paper on semantic privacy attacks (under review), in which we aim to quantify the risk that potential attackers profile users based on their raw location data. To reproduce our results, follow the instructions below.
Install the code in a virtual environment by executing the following lines:
cd trip_purpose_privacy
python -m venv priv_env
source priv_env/bin/activate
pip install -e .
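To confirm that the commands above activated the environment, you can compare the interpreter prefixes from Python. This is a minimal sketch that uses only the standard library. The function name in_virtualenv is our own and is not part of this repository:

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a venv.

    Inside an environment created with `python -m venv`, sys.prefix points
    at the environment directory, while sys.base_prefix still points at the
    base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```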
1) Download the Foursquare NYC and Tokyo data from section 2 of this website. Extract the zip file into the data folder and rename the extracted folder to foursquare_ny_tokio_raw.
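Before you run the preprocessing scripts, you can check that the renamed folder is in place. This is a minimal sketch. It assumes the raw check-ins are .txt files stored directly inside data/foursquare_ny_tokio_raw. Because the exact file names inside the zip are not given here, it matches any .txt file. The helper find_raw_txt_files is hypothetical:

```python
from __future__ import annotations

from pathlib import Path


def find_raw_txt_files(data_dir: str = "data") -> list[Path]:
    """List the raw .txt files in <data_dir>/foursquare_ny_tokio_raw.

    Raises FileNotFoundError if the zip has not been extracted and renamed yet.
    """
    raw_dir = Path(data_dir) / "foursquare_ny_tokio_raw"
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"expected extracted data in {raw_dir}")
    # Sort so that the files are always returned in the same order.
    return sorted(raw_dir.glob("*.txt"))
```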
Execute the following steps to preprocess the data and add the POI labels according to our taxonomy mentioned above:
# Preprocess the raw (txt) data into a GeoDataFrame with longitude and latitude
python preprocessing/preprocess_ny_tokyo.py
python preprocessing/preprocess_foursquare_pois.py
python preprocessing/preprocess_yumuv.py
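The first of these scripts turns the raw text check-ins into point locations. The parsing step can be sketched with the standard library alone. This sketch assumes the tab-separated column order of the public Foursquare NYC/Tokyo check-in dump: user ID, venue ID, venue category ID, venue category name, latitude, longitude, timezone offset, UTC time. The Checkin class and the parse_line function are hypothetical. The repository's script builds a GeoDataFrame, not plain records:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Checkin:
    user_id: str
    venue_id: str
    category: str
    longitude: float
    latitude: float
    utc_time: datetime


def parse_line(line: str) -> Checkin:
    """Parse one tab-separated check-in line (assumed 8-column layout)."""
    (user_id, venue_id, _category_id, category,
     lat, lon, _tz_offset, utc) = line.rstrip("\n").split("\t")
    return Checkin(
        user_id=user_id,
        venue_id=venue_id,
        category=category,
        # The file stores latitude before longitude; keep both explicitly named.
        longitude=float(lon),
        latitude=float(lat),
        # Assumed timestamp format, e.g. "Tue Apr 03 18:00:09 +0000 2012".
        utc_time=datetime.strptime(utc, "%a %b %d %H:%M:%S %z %Y"),
    )
```

Passing the parsed longitudes and latitudes to geopandas.points_from_xy(x, y) would then give point geometries with longitude as x and latitude as y.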
3.2) Get Swiss POIs: For this step, download the global Foursquare POI data from section 3 of this website. Extract the zip file into the data folder; the folder should be named "dataset_TIST2015". Then run:
python preprocessing/get_swiss_pois.py
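We do not describe here how get_swiss_pois.py selects the Swiss venues. The simplest way to restrict a global POI list to one country is a bounding-box filter, sketched below. This is a swapped-in illustration, not necessarily what the script does. The box uses approximate extents of Switzerland in WGS84 degrees and also covers small parts of neighbouring countries. An exact selection would test against the national border polygon instead:

```python
# Approximate bounding box of Switzerland (WGS84 degrees); illustrative values.
SWISS_BBOX = {"min_lon": 5.95, "max_lon": 10.50, "min_lat": 45.81, "max_lat": 47.81}


def in_swiss_bbox(lon: float, lat: float, bbox: dict = SWISS_BBOX) -> bool:
    """Return True if (lon, lat) lies inside the bounding box (inclusive)."""
    return (bbox["min_lon"] <= lon <= bbox["max_lon"]
            and bbox["min_lat"] <= lat <= bbox["max_lat"])


def filter_swiss(pois):
    """Keep (venue_id, lon, lat, category) tuples that fall inside the box."""
    return [poi for poi in pois if in_swiss_bbox(poi[1], poi[2])]
```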
4) Add temporal information about user-venue visitation patterns to all three datasets (NY, Tokyo, yumuv)
# Group by user and venue ID and aggregate user features (visit times, count and duration)
python preprocessing/get_user_venue_dataset.py
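The grouping step above can be pictured as a dictionary keyed by (user, venue) that accumulates features per pair. This is a minimal sketch. It assumes each visit is a (user_id, venue_id, start_time, duration_min) tuple and that the duration is None for check-in data without durations. The function and field names are hypothetical, and the script's actual feature set may differ:

```python
from collections import defaultdict


def aggregate_user_venue(visits):
    """Group visits by (user_id, venue_id) and aggregate simple features.

    visits: iterable of (user_id, venue_id, start_time, duration_min) tuples,
    where start_time is a datetime and duration_min may be None.
    """
    agg = defaultdict(lambda: {"count": 0, "start_hours": [], "total_duration_min": 0.0})
    for user_id, venue_id, start_time, duration_min in visits:
        entry = agg[(user_id, venue_id)]
        entry["count"] += 1                          # number of visits
        entry["start_hours"].append(start_time.hour)  # hour-of-day of each visit
        if duration_min is not None:
            entry["total_duration_min"] += duration_min
    return dict(agg)
```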
Download OSM data with the pyrosm package (install it via pip install pyrosm), then select and label the relevant POIs:
python preprocessing/preprocess_osm_pois.py
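pyrosm exposes OSM points of interest together with their tags, for example through OSM.get_pois. The "select and label" step then amounts to mapping tag key/value pairs to categories and dropping everything else. The sketch below works on plain tag dictionaries. Its TAG_TO_LABEL mapping is purely hypothetical and does not reproduce the taxonomy used in the paper:

```python
from typing import Optional

# Hypothetical mapping from OSM (key, value) tags to coarse labels.
TAG_TO_LABEL = {
    ("amenity", "restaurant"): "Food",
    ("amenity", "cafe"): "Food",
    ("shop", "supermarket"): "Shopping",
    ("leisure", "park"): "Outdoors",
}


def label_osm_poi(tags: dict) -> Optional[str]:
    """Return the label of the first matching tag, or None to drop the POI."""
    for key, value in tags.items():
        label = TAG_TO_LABEL.get((key, value))
        if label is not None:
            return label
    return None
```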
python scripts/run.py -h
usage: run.py [-h] [-d DATA_PATH] [-c CITY] [-o OUT_DIR]
[-p POI_DATA] [-m MODEL] [-x EMBED_MODEL_PATH]
[-f FOLD_MODE] [-k KFOLD] [-b BUFFER_FACTOR]
[--min_buffer MIN_BUFFER] [--lda] [--embed]
[--closestk] [--inbuffer]
[--poi_keep_ratio POI_KEEP_RATIO]
[--xgbdepth XGBDEPTH]
optional arguments:
-h, --help show this help message and exit
-d DATA_PATH, --data_path DATA_PATH
-c CITY, --city CITY
-o OUT_DIR, --out_dir OUT_DIR
-p POI_DATA, --poi_data POI_DATA
-m MODEL, --model MODEL
-x EMBED_MODEL_PATH, --embed_model_path EMBED_MODEL_PATH
-f FOLD_MODE, --fold_mode FOLD_MODE
-k KFOLD, --kfold KFOLD
-b BUFFER_FACTOR, --buffer_factor BUFFER_FACTOR
--min_buffer MIN_BUFFER
--lda
--embed
--closestk
--inbuffer
--poi_keep_ratio POI_KEEP_RATIO
--xgbdepth XGBDEPTH
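The help output shows two kinds of options. Some take a value (e.g. --city CITY), and others are bare switches (--lda, --embed, --closestk, --inbuffer). When sweeping many configurations from Python, a small helper can turn a dictionary of options into an argument list. This is a minimal sketch: build_run_command is hypothetical, and any values you pass are placeholders because valid values for each option are not listed here:

```python
from __future__ import annotations

from typing import Mapping, Optional, Union

Option = Optional[Union[str, int, float, bool]]


def build_run_command(options: Mapping[str, Option]) -> list[str]:
    """Turn {"city": ..., "lda": True, ...} into an argv list for scripts/run.py.

    Keys are the long option names from `run.py -h` without the leading dashes.
    True adds a bare switch; False or None omits the option entirely.
    """
    cmd = ["python", "scripts/run.py"]
    for name, value in options.items():
        if value is None or value is False:
            continue
        cmd.append(f"--{name}")
        if value is not True:  # valued options are followed by their value
            cmd.append(str(value))
    return cmd
```

You could then pass the resulting list to subprocess.run(cmd, check=True) to execute it without going through a shell.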
Examples of the commands that we ran for our analysis are given in sh_commands.sh. However, the --embed flag cannot easily be used: it requires cloning our version of the space-to-vec code base, which you can get here, and then training embedding models on the Foursquare POI data.
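Because the --embed runs need extra setup, you may want to run only the remaining configurations. One way to do this is to extract the run.py calls from the shell file and filter them. This is a minimal sketch. It assumes each call sits on a single line and invokes the script as scripts/run.py. Commands split over several lines with backslash continuations are not handled. The helper extract_run_commands is hypothetical:

```python
from __future__ import annotations

import shlex


def extract_run_commands(script_text: str) -> list[list[str]]:
    """Return the tokenized scripts/run.py invocations found in a shell script."""
    commands = []
    for line in script_text.splitlines():
        # comments=True drops "# ..." comments, including shebang lines.
        tokens = shlex.split(line, comments=True)
        if "scripts/run.py" in tokens:
            commands.append(tokens)
    return commands
```

For example, [c for c in extract_run_commands(Path("sh_commands.sh").read_text()) if "--embed" not in c] keeps only the runs that do not need the embedding models (Path from pathlib). You could then execute each remaining command with subprocess.run(c, check=True).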
python scripts/evaluate.py -i outputs/test