Polityzer is a framework for semi-automatically analyzing the privacy practices of election campaigns. This repo contains the source code for the automated part, the datasets collected for the analysis of the 2020 election, and the results of that analysis.
The file email_template.pdf contains the details on the responsible disclosure to the campaigns without privacy policies, and the email template used during the disclosure.
Polityzer supports building the project via poetry.
All required dependencies are listed in pyproject.toml under tool.poetry.dependencies.
If using poetry, simply run poetry install to install dependencies.
If poetry is not used, you can instead install the dependencies individually via pip install.
The polityzer_tool folder contains all the relevant source code. The datasets_2020 folder contains the datasets.
- Install all the dependencies.
- Move to the project folder, i.e., the polityzer_tool folder.
- List the candidates to be downloaded in the database/candidate_office_website.csv file. This is the main input.
- Configure any parameters as needed in config.py.
- Run python polityzer.py.
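Before running the tool, a quick pre-flight check can confirm that the files named in the steps above are in place. The following is a minimal sketch, not part of Polityzer itself: the helper name missing_inputs and the REQUIRED_FILES tuple are illustrative assumptions, and the script is assumed to run from inside the polityzer_tool folder.

```python
from pathlib import Path

# File names taken from the steps above, relative to the polityzer_tool folder.
REQUIRED_FILES = (
    "polityzer.py",
    "config.py",
    "database/candidate_office_website.csv",
)


def missing_inputs(project_dir="."):
    """Return the required files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_inputs()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All inputs present; run: python polityzer.py")
```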
NOTE: By default, config.py is set to download the websites, check/extract privacy policies, check/extract all outbound links, and finally, check/extract data types from the input forms. To skip any step, set the relevant flag to 0.
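To illustrate the flag convention, here is a minimal sketch of four on/off switches, one per step. The flag names below are hypothetical and chosen only for illustration; check config.py for the names Polityzer actually uses.

```python
# Hypothetical flag names for illustration; the real names are in config.py.
flags = {
    "DOWNLOAD_WEBSITES": 1,
    "EXTRACT_PRIVACY_POLICIES": 1,
    "EXTRACT_OUTBOUND_LINKS": 0,  # 0 skips this step
    "EXTRACT_FORM_DATA_TYPES": 1,
}


def enabled_steps(flags):
    """Return the names of the steps whose flag is non-zero, in order."""
    return [name for name, value in flags.items() if value]


print(enabled_steps(flags))
```

In this sketch, setting the outbound-links flag to 0 leaves the other three steps enabled.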
After Polityzer finishes, the results are stored in the results folder, the log files in the logs folder, and the HTML files in the html folder. The paths to all the files are stored in database/downloaded_websites.csv.
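The index file can be inspected with the standard csv module. This is a minimal sketch that assumes database/downloaded_websites.csv starts with a header row; it makes no assumption about the column names, and load_rows is an illustrative helper rather than part of Polityzer.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Return every row of a CSV file as a dict keyed by the header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Path taken from the text above, relative to the polityzer_tool folder.
index_path = Path("database") / "downloaded_websites.csv"
if index_path.exists():
    rows = load_rows(index_path)
    print(f"{len(rows)} entries in {index_path}")
    for row in rows[:5]:  # show the first few entries only
        print(row)
else:
    print(f"{index_path} not found; run Polityzer first.")
```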