Skip to content

Code for Personalized Differential Privacy for Ridge Regression

License

Notifications You must be signed in to change notification settings

krishnacharya/pp-ridge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pp-ridge

Personalized Privacy for Ridge Regression

Abstract --

The increased application of machine learning (ML) in sensitive domains requires protecting the training data through privacy frameworks, such as differential privacy (DP). DP requires to specify a uniform privacy level ε that expresses the maximum privacy loss that each data point in the entire dataset is willing to tolerate. Yet, in practice, different data points often have different privacy requirements. Having to set one uniform privacy level is usually too restrictive, often forcing a learner to guarantee the stringent privacy requirement, at a large cost to accuracy. To overcome this limitation, we introduce our novel PersonalizedDP Output Perturbation method (PDP-OP) that enables to train Ridge regression models with individual per data point privacy levels. We provide rigorous privacy proofs for our PDP-OP as well as accuracy guarantees for the resulting model. This work is the first to provide such theoretical accuracy guarantees when it comes to personalized DP in machine learning, whereas previous work only provided empirical evaluations. We empirically evaluate PDP-OP on synthetic and real datasets and with diverse privacy distributions. We show that by enabling each data point to specify their own privacy requirement, we can significantly improve the privacy-accuracy trade-offs in DP. We also show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al. (2015).

File Structure --

The pp/synthetic_expt and the pp/real_data/insurance contain the experiment results from the synthetic and the real dataset respectively. Each of these folder contains impact_lambda, impact_epsc_epsm, impact_fc, impact_n folders which represent the experiments by changing parameters such as lambda ($\lambda$), epsilon_c ($\epsilon_c$) and epsilon_m ($\epsilon_m$), fraction of conservatives ($f_c$) and fraction of training samples ($n$) respectively.

datasets
   |-- insurance.csv
notebooks
   |-- PP-on-realdata.ipynb
   |-- PP-on-synthetic.ipynb
plots
   |-- .DS_Store
   |-- synthetic_sigma0
   |   |-- .DS_Store
   |   |-- impact_epsc
   |   |   |-- impact_epsc_sec4_1_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsc_sec4_1_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsc_sec4_2_1_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsc_sec4_2_1_type1_syn_plevel_34_43_23_d_30_n_100_lambda_50.pdf
   |   |   |-- impact_epsc_sec4_2_1_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsc_sec4_2_1_type2_syn_plevel_34_43_23_d_30_n_100_lambda_50.pdf
   |   |   |-- impact_epsc_sec4_2_2_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsc_sec4_2_2_type1_syn_plevel_34_43_23_d_30_n_100_lambda_50.pdf
   |   |   |-- impact_epsc_sec4_2_2_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsc_sec4_2_2_type2_syn_plevel_34_43_23_d_30_n_100_lambda_50.pdf
   |   |-- impact_epsm
   |   |   |-- impact_epsm_sec4_1_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsm_sec4_1_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsm_sec4_2_1_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsm_sec4_2_1_type1_syn_plevel_34_43_23_d_30_n_100_lambda_50.pdf
   |   |   |-- impact_epsm_sec4_2_1_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsm_sec4_2_1_type2_syn_plevel_34_43_23_d_30_n_100_lambda_50.pdf
   |   |   |-- impact_epsm_sec4_2_2_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsm_sec4_2_2_type1_syn_plevel_34_43_23_d_30_n_100_lambda_50.pdf
   |   |   |-- impact_epsm_sec4_2_2_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_epsm_sec4_2_2_type2_syn_plevel_34_43_23_d_30_n_100_lambda_50.pdf
   |   |-- impact_fc
   |   |   |-- .DS_Store
   |   |   |-- impact_fc_sec4_1_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_fc_sec4_1_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_fc_sec4_2_1_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_fc_sec4_2_1_type1_syn_plevel_34_43_23_d_30_n_100_lambda_100.pdf
   |   |   |-- impact_fc_sec4_2_1_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_fc_sec4_2_1_type2_syn_plevel_34_43_23_d_30_n_100_lambda_100.pdf
   |   |   |-- impact_fc_sec4_2_2_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_fc_sec4_2_2_type1_syn_plevel_34_43_23_d_30_n_100_lambda_100.pdf
   |   |   |-- impact_fc_sec4_2_2_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- impact_fc_sec4_2_2_type2_syn_plevel_34_43_23_d_30_n_100_lambda_100.pdf
   |   |-- sanity_check
   |   |   |-- nonpriv_soln_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |-- sec_4_1
   |   |   |-- .DS_Store
   |   |   |-- sec4_1_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- sec4_1_type1_syn_plevel_34_43_23_d_30_n_100_p2.pdf
   |   |   |-- sec4_1_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- sec4_1_type2_syn_plevel_34_43_23_d_30_n_100_p2.pdf
   |   |-- sec_4_2_1
   |   |   |-- sec4_2_1_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- sec4_2_1_type1_syn_plevel_34_43_23_d_30_n_100_p2.pdf
   |   |   |-- sec4_2_1_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- sec4_2_1_type2_syn_plevel_34_43_23_d_30_n_100_p2.pdf
   |   |-- sec_4_2_2
   |   |   |-- .DS_Store
   |   |   |-- sec4_2_2_type1_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- sec4_2_2_type1_syn_plevel_34_43_23_d_30_n_100_p2.pdf
   |   |   |-- sec4_2_2_type2_syn_plevel_34_43_23_d_30_n_100.pdf
   |   |   |-- sec4_2_2_type2_syn_plevel_34_43_23_d_30_n_100_p2.pdf
pp
   |-- .DS_Store
   |-- check.py
   |-- csv_outputs
   |   |-- forplots_california_housing_data_impact_n_plevel_34_43_23_affine.csv
   |   |-- forplots_insurance_data_impact_epsc_epsm_affine.csv
   |   |-- forplots_insurance_data_impact_fc_affine.csv
   |   |-- forplots_insurance_data_impact_lambda_plevel_34_43_23_affine.csv
   |   |-- forplots_insurance_data_impact_n_plevel_34_43_23_affine.csv
   |   |-- forplots_insurance_data_impact_n_plevel_54_37_9_affine.csv
   |   |-- forplots_specific_30_100_plevel_34_43_23_result.csv
   |   |-- forplots_specific_30_100_plevel_54_37_9_result.csv
   |   |-- insurance_data_plevel344323_affine_lambdasbig.csv
   |   |-- insurance_data_plevel344323_affine_lotslambdas_runs500.csv
   |   |-- reg_specific_30_100_plevel_34_43_23_result.csv
   |   |-- reg_specific_30_100_plevel_54_37_9_result.csv
   |   |-- unreg_specific_30_100_plevel_34_43_23_result.csv
   |   |-- unreg_specific_30_100_plevel_54_37_9_result.csv
   |-- interpret_and_plots
   |   |-- Interpret_results_insurance_affine.ipynb
   |   |-- Interpret_results_synthetic.ipynb
   |-- plevel_34_43_23_result_findlambs.csv
   |-- real_data
   |   |-- california_housing
   |   |   |-- forplots_california_housing_data_impact_n_plevel_34_43_23.error
   |   |   |-- forplots_california_housing_data_impact_n_plevel_34_43_23.out
   |   |   |-- forplots_california_housing_data_impact_n_plevel_34_43_23.py
   |   |   |-- forplots_california_housing_data_impact_n_plevel_34_43_23.sh
   |   |   |-- forplots_california_housing_data_impact_n_plevel_34_43_23.submit
   |   |   |-- forplots_california_housing_data_impact_n_plevel_54_37_9.error
   |   |   |-- forplots_california_housing_data_impact_n_plevel_54_37_9.out
   |   |   |-- forplots_california_housing_data_impact_n_plevel_54_37_9.py
   |   |   |-- forplots_california_housing_data_impact_n_plevel_54_37_9.sh
   |   |   |-- forplots_california_housing_data_impact_n_plevel_54_37_9.submit
   |   |-- insurance
   |   |   |-- impact_epsc_epsm
   |   |   |   |-- forplots_insurance_data_impact_epsc_epsm_plevel_34_43_23.py
   |   |   |-- impact_fc
   |   |   |   |-- forplots_insurance_data_impact_fc_plevel_34_43_23.py
   |   |   |-- impact_lambda
   |   |   |   |-- forplots_insurance_data_impact_lambda_plevel_34_43_23.py
   |   |   |-- impact_n
   |   |   |   |-- forplots_insurance_data_impact_n_plevel_34_43_23.py
   |   |   |   |-- forplots_insurance_data_impact_n_plevel_54_37_9.py
   |   |   |-- insurance_premium_34_43_23.py
   |   |   |-- insurance_premium_54_37_9.py
   |-- src
   |   |-- __pycache__
   |   |   |-- estimator.cpython-311.pyc
   |   |   |-- preprocessing.cpython-311.pyc
   |   |   |-- utils.cpython-311.pyc
   |   |-- estimator.py
   |   |-- preprocessing.py
   |   |-- utils.py
   |-- synthetic_expt
   |   |-- .DS_Store
   |   |-- csv_outputs
   |   |   |-- forplots_specific_30_100_eps_c_eps_m.csv
   |   |   |-- forplots_specific_30_100_eps_c_eps_m_CHECK.csv
   |   |   |-- forplots_specific_30_100_impact_fc.csv
   |   |   |-- forplots_specific_30_100_impact_n.csv
   |   |-- impact_epsc_epsm
   |   |   |-- CHECK.py
   |   |   |-- forplots_specific_d_n_epsc_epsm.py
   |   |-- impact_fc
   |   |   |-- forplots_specific_d_n_impact_fc.py
   |   |-- impact_lambda
   |   |   |-- forplots_specific_d_n_plevel_34_43_23.py
   |   |   |-- forplots_specific_d_n_plevel_54_37_9.py
   |   |-- impact_n
   |   |   |-- forplots_specific_d_n_impact_n.py
   |   |-- run_plevel_34_43_23.py
   |   |-- run_plevel_54_37_9.py
   |-- tests
   |   |-- test_utils.py

About

Code for Personalized Differential Privacy for Ridge Regression

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published