A framework to semi-automatically analyze the privacy practices of election campaigns. This repo contains the source code for the automated part, the datasets collected for the analysis of the 2020 election, and the results of the analysis.
The file email_template.pdf contains the details of the responsible disclosure to the campaigns without privacy policies, along with the email template used during the disclosure.
Polityzer supports building the project via poetry. All required dependencies are listed in pyproject.toml under tool.poetry.dependencies. If you use poetry, simply run poetry install to install the dependencies. If you do not use poetry, you can instead install each dependency individually with pip install.
The polityzer_tool folder contains all the relevant source code, and the datasets_2020 folder contains the datasets.
- Install all the dependencies.
- Move to the project folder, i.e., the polityzer_tool folder.
- List the candidates to be downloaded in the database/candidate_office_website.csv file. This is the main input.
- Configure any parameters as needed in config.py.
- Run python polityzer.py.
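The steps above can be scripted with a quick pre-flight check that catches a missing or empty input file before a long run. This is a minimal sketch, not part of Polityzer: preflight and run_polityzer are hypothetical helpers, and the check assumes the first non-empty row of the input CSV is a header. It makes no assumption about the CSV's column names.

```python
import csv
import subprocess
import sys
from pathlib import Path


def preflight(tool_dir: str) -> int:
    """Check the polityzer_tool folder before running polityzer.py.

    Returns the number of candidate rows (excluding the assumed header)
    in database/candidate_office_website.csv; raises if anything is missing.
    """
    root = Path(tool_dir)
    for name in ("polityzer.py", "config.py"):
        if not (root / name).is_file():
            raise FileNotFoundError(f"missing {name} in {root}")
    csv_path = root / "database" / "candidate_office_website.csv"
    if not csv_path.is_file():
        raise FileNotFoundError(f"missing input file {csv_path}")
    with csv_path.open(newline="", encoding="utf-8") as f:
        # Ignore blank lines so trailing newlines do not count as candidates.
        rows = [row for row in csv.reader(f) if any(cell.strip() for cell in row)]
    data_rows = max(len(rows) - 1, 0)  # assumption: first row is a header
    if data_rows == 0:
        raise ValueError(f"{csv_path} lists no candidates")
    return data_rows


def run_polityzer(tool_dir: str) -> None:
    preflight(tool_dir)
    # Equivalent to running "python polityzer.py" from inside polityzer_tool,
    # using the current interpreter.
    subprocess.run([sys.executable, "polityzer.py"], cwd=tool_dir, check=True)
```

If the dependencies were installed with poetry, run the script through poetry run so that sys.executable points at the project's virtual environment.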
NOTE: By default, config.py is set to download the websites, check/extract privacy policies, check/extract all outbound links, and finally, check/extract data types from the input forms. To skip any step, set the relevant flag to 0.
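The on/off behavior of these flags can be pictured as follows. This is only an illustrative sketch: the flag names below are hypothetical placeholders, and the real names and their handling are defined in config.py and polityzer.py. It shows the convention described above, where 1 runs a step and 0 skips it, with the steps kept in their default order.

```python
# Hypothetical flag names, listed in the pipeline order described above;
# the actual names live in config.py.
PIPELINE_FLAGS = {
    "download_websites": 1,
    "extract_privacy_policies": 1,
    "extract_outbound_links": 1,
    "extract_form_data_types": 1,
}


def enabled_steps(flags: dict[str, int]) -> list[str]:
    """Return the steps that would run, in order; a flag of 0 skips its step."""
    return [step for step, value in flags.items() if value != 0]
```

For example, enabled_steps(dict(PIPELINE_FLAGS, download_websites=0)) leaves only the three check/extract steps.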
After Polityzer finishes, the results are stored in the results folder, the log files in the logs folder, and the HTML files in the html folder. The paths to all the files are stored in database/downloaded_websites.csv.
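To confirm that a run produced output, you can count what landed in each of these locations. The sketch below is a hypothetical helper, not part of Polityzer; it makes no assumption about file names or CSV columns, counts files recursively in results, logs, and html, and counts every non-empty row of database/downloaded_websites.csv, including any header line.

```python
import csv
from pathlib import Path

OUTPUT_DIRS = ("results", "logs", "html")


def summarize_outputs(tool_dir: str) -> dict[str, int]:
    """Count output files per folder and rows in downloaded_websites.csv."""
    root = Path(tool_dir)
    summary: dict[str, int] = {}
    for name in OUTPUT_DIRS:
        folder = root / name
        # rglob("*") descends into subfolders; a missing folder counts as 0.
        summary[name] = (
            sum(1 for p in folder.rglob("*") if p.is_file()) if folder.is_dir() else 0
        )
    index = root / "database" / "downloaded_websites.csv"
    rows = 0
    if index.is_file():
        with index.open(newline="", encoding="utf-8") as f:
            rows = sum(1 for row in csv.reader(f) if row)  # skips blank lines
    summary["downloaded_websites.csv rows"] = rows
    return summary
```

Running summarize_outputs("polityzer_tool") after a run gives a quick sanity check; a zero count for html, for example, suggests the download step was skipped or failed, and the logs folder is the place to look.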