# Differentially Private PCA

This notebook demonstrates how to make a differentially private PCA release with the OpenDP Library.

----
Any constructors that have not completed the proof-writing and vetting process may still be accessed if you opt in to "contrib".
Please contact us if you are interested in proof-writing. Thank you!

In [7]:
from opendp.mod import enable_features
enable_features("contrib", "floating-point", "honest-but-curious")

In [8]:
import numpy as np

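def sample_microdata(*, num_columns=None, num_rows=None, cov=None):
    """Sample mean-centered Gaussian microdata with the given (or a random) covariance."""
    # `cov or ...` would raise ValueError when cov is a NumPy array,
    # so test for None explicitly.
    if cov is None:
        cov = sample_covariance(num_columns)
    microdata = np.random.multivariate_normal(
        np.zeros(cov.shape[0]), cov, size=num_rows or 100_000
    )
    # center each column on zero
    microdata -= microdata.mean(axis=0)
    return microdata

def sample_covariance(num_features):
    """Sample a random symmetric positive semi-definite matrix."""
    A = np.random.uniform(0, num_features, size=(num_features, num_features))
    return A.T @ A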

In this notebook we'll be working with an example dataset generated from a random covariance matrix.

In [9]:
num_columns = 4
num_rows = 10_000
example_dataset = sample_microdata(num_columns=num_columns, num_rows=num_rows)
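
As a sanity check on the generator above, the following sketch (NumPy only, with a fixed seed chosen purely for illustration) confirms that a matrix of the form `A.T @ A` is a valid covariance matrix, and that subtracting the column means centers the data. The variable names are illustrative and not part of the OpenDP API.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for this illustration

num_features = 4
A = rng.uniform(0, num_features, size=(num_features, num_features))
cov = A.T @ A

# A.T @ A is symmetric, and x.T @ (A.T @ A) @ x = ||A x||^2 >= 0,
# so its eigenvalues are non-negative (up to floating-point rounding).
is_symmetric = np.allclose(cov, cov.T)
min_eigenvalue = np.linalg.eigvalsh(cov).min()

# Subtracting the column means leaves every column with mean ~0.
data = rng.multivariate_normal(np.zeros(num_features), cov, size=1_000)
data -= data.mean(axis=0)
max_abs_column_mean = np.abs(data.mean(axis=0)).max()

print(is_symmetric, min_eigenvalue, max_abs_column_mean)
```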

The OpenDP Library provides a scikit-learn-style API for releasing a DP PCA model:

In [10]:
import opendp.prelude as dp

model = dp.PCA(
    epsilon=1.,
    row_norm=1.,
    n_samples=num_rows,
    n_features=num_columns,
)

model.fit(example_dataset)
print(model)

print("singular values", model.singular_values_)
print("components", model.components_)

PCA(epsilon=1.0, n_components=4, n_features=4, n_samples=10000, row_norm=1.0)
singular values [ 9.23344466 35.24114393 46.55453464 80.50287029]
components [[ 0.57418577  0.51788     0.45447691  0.44222363]
 [ 0.64684141  0.15868187 -0.66020823 -0.34719064]
 [-0.3925399   0.4567303  -0.56918622  0.55976504]
 [ 0.3127608  -0.70570376 -0.18331087  0.60873641]]
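
To judge how much the privacy noise perturbs a release like the one above, it can help to compare the released components against a non-private baseline. The sketch below is one possible way to do that with NumPy alone; `principal_axes` and `alignment` are hypothetical helpers, not OpenDP functions. It compares by absolute cosine similarity because eigenvectors are only defined up to sign, and it takes the best match per row instead of assuming a component order (the release above lists its singular values in increasing order). Since the sketch has no access to the fitted model, `candidate` is a stand-in built from the baseline itself.

```python
import numpy as np

def principal_axes(data):
    """Non-private principal axes (as rows), ordered by decreasing variance."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order].T

def alignment(components_a, components_b):
    """Absolute cosine similarity between every pair of unit-norm rows."""
    return np.abs(components_a @ components_b.T)

rng = np.random.default_rng(0)  # fixed seed, only for this illustration
A = rng.uniform(0, 4, size=(4, 4))
data = rng.multivariate_normal(np.zeros(4), A.T @ A, size=10_000)

reference = principal_axes(data)

# Stand-in for released components: the baseline with flipped signs
# and reversed order, which should still align perfectly.
candidate = -reference[::-1]

scores = alignment(candidate, reference)
best_match = scores.max(axis=1)  # 1.0 means a candidate row matches some baseline axis
print(best_match)
```

In the notebook itself, passing `model.components_` as the candidate and `example_dataset` to `principal_axes` would give the corresponding comparison for the DP release; values well below 1 indicate components that the noise has rotated away from the non-private axes.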


As with other OpenDP APIs, you can retrieve the measurement used to make the release.

In [11]:
model.get_measurement()

<opendp.mod.Measurement at 0x177de7ad0>