BVM library examples

For Python to import bvmlib in the notebooks, either install bvmlib via pip or place a copy of the repository root folder inside the examples folder, i.e.

- examples/
    - setup.py
    - bvmlib/
        - __init__.py
        - bvm.py
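Before opening a notebook, a quick check such as the one below can confirm that Python is able to locate the package. This is only a sketch: the helper name `bvmlib_importable` is hypothetical and not part of bvmlib, and it relies solely on the standard-library `importlib` module.

```python
import importlib.util


def bvmlib_importable():
    """Return True if Python can locate a top-level package named bvmlib.

    Hypothetical helper, not part of bvmlib. It only looks the package up
    on the import path; it does not import it.
    """
    return importlib.util.find_spec("bvmlib") is not None


if bvmlib_importable():
    print("bvmlib found")
else:
    print("bvmlib not found: install it via pip or copy the root folder into examples/")
```

When run from the examples folder laid out as above, the local bvmlib/ directory should be found because the current working directory is on the import path.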

General examples

single-dataset.ipynb

This example computes both deterministic and probabilistic Bayes Vulnerability for both re-identification and attribute-inference attacks on two publicly available datasets: the Adult dataset and the US Census Data (1990) dataset.

For the Adult dataset, the following attributes are used as quasi-identifiers: age, sex, race, native-country, marital-status, workclass, and occupation. Also, the following attributes are used as sensitive attributes: relationship and education-num.

For the US Census Data (1990) dataset, the following attributes are used as quasi-identifiers: dAge, dAncstry1, dAncstry2, iClass, iEnglish, dHour89, iLang1, iMarital, iMeans, dOccup, dPOB, and iSex. Also, the following attributes are used as sensitive attributes: iCitizen and dRearning. All the attributes are briefly described within the notebook.
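To make the four quantities concrete, here is a minimal standard-library sketch on toy records. It does not use bvmlib's API. It assumes the usual partition-based definitions: records are grouped by their quasi-identifier values; the deterministic measures count the fraction of records that are unique (re-identification) or whose group has a single sensitive value (attribute inference); the probabilistic measures are the expected success of an adversary who guesses optimally within each group. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def reid_vulnerability(records, qids):
    """Return (deterministic, probabilistic) re-identification vulnerability."""
    n = len(records)
    # Size of each group of records sharing the same quasi-identifier values.
    groups = Counter(tuple(r[q] for q in qids) for r in records)
    deterministic = sum(1 for size in groups.values() if size == 1) / n
    # Guessing uniformly inside a group succeeds with probability 1/size,
    # so the expectation over all records is (number of groups) / n.
    probabilistic = len(groups) / n
    return deterministic, probabilistic


def attr_vulnerability(records, qids, sensitive):
    """Return (deterministic, probabilistic) attribute-inference vulnerability."""
    n = len(records)
    groups = defaultdict(Counter)
    for r in records:
        groups[tuple(r[q] for q in qids)][r[sensitive]] += 1
    # Records whose group exposes exactly one sensitive value.
    deterministic = sum(sum(c.values()) for c in groups.values() if len(c) == 1) / n
    # Guessing the most frequent sensitive value in each group.
    probabilistic = sum(max(c.values()) for c in groups.values()) / n
    return deterministic, probabilistic


# Toy records (made up), using Adult-style attribute names.
records = [
    {"age": 30, "sex": "F", "education-num": 13},
    {"age": 30, "sex": "F", "education-num": 13},
    {"age": 30, "sex": "M", "education-num": 9},
    {"age": 41, "sex": "M", "education-num": 9},
    {"age": 41, "sex": "M", "education-num": 10},
]

print(reid_vulnerability(records, ["age", "sex"]))                   # (0.2, 0.6)
print(attr_vulnerability(records, ["age", "sex"], "education-num"))  # (0.6, 0.8)
```

In the toy data, only the (30, "M") record is unique, so one record in five is re-identified with certainty, while three distinct groups over five records give a probabilistic value of 0.6.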

As of version 1.1, this example also computes the information worth given the provided worth assignments, as proposed in M. S. Alvim, A. Scedrov, and F. B. Schneider, "When Not All Bits Are Equal: Worth-Based Information Flow", in Principles of Security and Trust, vol. 8414, M. Abadi and S. Kremer, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014, pp. 120–139 (DOI: 10.1007/978-3-642-54792-8_7).

INEP [1] experiments

The following examples contain the actual results of the experiments performed for these publications:

  • Gabriel H. Nunes - A formal quantitative study of privacy in the publication of official educational censuses in Brazil (2021, hdl:1843/38085).
  • Mário S. Alvim, Natasha Fernandes, Annabelle McIver, Carroll Morgan, Gabriel H. Nunes - Flexible and scalable privacy assessment for very large datasets, with an application to official governmental microdata (2022, 10.48550/arXiv.2204.13734). For this publication, also refer to 10.5281/zenodo.6533684 (github.com/nunesgh/inep-anonymization).

For each unique student pseudonymization code (ID_ALUNO) in each dataset, we randomly selected a single record. The enrollment code (ID_MATRICULA) of each selected record is available in 10.5281/zenodo.6533675 (gitlab.com/nunesgh/inep-enrollment-codes).
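The selection step can be sketched as follows. This is an illustration only, not the procedure actually used for the publications: the function name `one_record_per_student`, the `seed` parameter, and the toy records are assumptions, while the column names ID_ALUNO and ID_MATRICULA come from the text above.

```python
import random
from collections import defaultdict


def one_record_per_student(records, seed=None):
    """Randomly keep a single record for each distinct ID_ALUNO."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    by_student = defaultdict(list)
    for record in records:
        by_student[record["ID_ALUNO"]].append(record)
    return [rng.choice(group) for group in by_student.values()]


# Toy records (made up): student "a" has two enrollments, student "b" has one.
records = [
    {"ID_ALUNO": "a", "ID_MATRICULA": 1},
    {"ID_ALUNO": "a", "ID_MATRICULA": 2},
    {"ID_ALUNO": "b", "ID_MATRICULA": 3},
]

selected = one_record_per_student(records, seed=0)
print([r["ID_MATRICULA"] for r in selected])
```

Recording the ID_MATRICULA of each selected record, as the published enrollment-code list does, is what lets others reproduce the same selection.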

inep-school-2018.ipynb

The inep-school-2018 example computes both deterministic and probabilistic Bayes Vulnerability for both re-identification and attribute-inference single-dataset attacks for the whole powerset of the 11 chosen quasi-identifiers. This particular version of the notebook was run on a machine with 40 CPU threads and 441G of random-access memory. Time performance was prioritized over memory usage: running all 2,047 non-empty subsets of the powerset (2^11 - 1) took 8 hours and 53 minutes but used more than 400G of memory. The number of threads used, and hence the total memory consumed, can be changed by setting a different value for the variable pool_size.
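The sketch below shows where the figure 2,047 comes from and how a pool size bounds the number of subsets processed at once. It is not the notebook's code: the attribute names, the `evaluate` placeholder, and the use of `ThreadPoolExecutor` are assumptions made for illustration, and the notebook's own parallelization may differ. Only the name pool_size is taken from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain, combinations


def nonempty_subsets(attributes):
    """Yield every non-empty subset of attributes as a tuple."""
    return chain.from_iterable(
        combinations(attributes, size) for size in range(1, len(attributes) + 1)
    )


def evaluate(subset):
    """Placeholder for the per-subset vulnerability computation."""
    return subset, len(subset)


qids = [f"qid_{i}" for i in range(11)]  # hypothetical attribute names
subsets = list(nonempty_subsets(qids))
print(len(subsets))  # 2047, i.e. 2**11 - 1

pool_size = 4  # fewer workers means fewer subsets held in memory at once
with ThreadPoolExecutor(max_workers=pool_size) as pool:
    results = list(pool.map(evaluate, subsets))
```

Lowering pool_size trades running time for memory, which is the trade-off described above.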

inep-school-2014-2017.ipynb

The inep-school-2014-2017 example computes both deterministic and probabilistic Bayes Vulnerability for both re-identification and attribute-inference longitudinal-dataset attacks for the following quasi-identifiers: FK_COD_MUNICIPIO_END / CO_MUNICIPIO_END, PK_COD_ENTIDADE / CO_ENTIDADE, and FK_COD_ETAPA_ENSINO / TP_ETAPA_ENSINO. Also, the following attributes are used as sensitive attributes: ID_POSSUI_NEC_ESPECIAL and ID_N_T_E_P. All the attributes are briefly described within the notebook.
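Assuming each slash-separated pair above denotes two names for the same attribute in different yearly releases (an assumption, not something stated here), records from different years need a common naming before they can be compared. A minimal sketch of such a harmonization step, with a hypothetical `harmonize` helper and made-up values:

```python
# Mapping from one naming to the other. Treating the second name of each
# pair as the canonical one is an arbitrary choice made for this sketch.
RENAMES = {
    "FK_COD_MUNICIPIO_END": "CO_MUNICIPIO_END",
    "PK_COD_ENTIDADE": "CO_ENTIDADE",
    "FK_COD_ETAPA_ENSINO": "TP_ETAPA_ENSINO",
}


def harmonize(record, renames=RENAMES):
    """Return a copy of record with renamed keys; other keys are kept as-is."""
    return {renames.get(name, name): value for name, value in record.items()}


# Toy record with made-up values.
old_style = {"FK_COD_MUNICIPIO_END": 1, "PK_COD_ENTIDADE": 2, "ID_N_T_E_P": 0}
print(harmonize(old_style))
# {'CO_MUNICIPIO_END': 1, 'CO_ENTIDADE': 2, 'ID_N_T_E_P': 0}
```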

Footnotes

  1. The Anísio Teixeira National Institute of Educational Studies and Research.