# Generating and accessing a DP Release of the mean

This notebook is a brief introduction to how a Release is generated.
A Release contains the output of every node in the executed graph. It may hold a batch of statistics (Mean, Histogram, Covariance, etc.). It also includes the literals passed in, such as column names, bounds, and the n used to resize a dataset.

### Step 1 - setting up required libraries

In [1]:
# load libraries
import os
import sys
import numpy as np
import opendp.whitenoise_core as wn

### Step 2 - establishing data information


In [6]:
data_path = os.path.abspath('/Users/saniyavahedian/Documents/GitHub/Odp/whitenoise-samples/analysis/data/PUMS_california_demographics_1000/data.csv')
print(data_path)
var_names = ["age", "sex", "educ", "race", "income", "married", "pid"]

/Users/saniyavahedian/Documents/GitHub/Odp/whitenoise-samples/analysis/data/PUMS_california_demographics_1000/data.csv


### Step 3 - loading a dataset and computing a differentially private mean of age

Once the WhiteNoise package is loaded, there are two ways to load the dataset from a path and compute a differentially private mean (or other statistics). They differ only in how the Analysis context is opened and closed.

#### First way: explicit `enter()` and `exit()` calls

In [7]:
analysis = wn.Analysis()
analysis.enter()

# load data
data = wn.Dataset(path = data_path, column_names = var_names)

# get mean of age
age_mean = wn.dp_mean(data = wn.cast(data['age'], type="FLOAT"),
                      privacy_usage = {'epsilon': .65},
                      data_lower = 0.,
                      data_upper = 100.,
                      data_n = 1000
                     )
analysis.exit()

#### Second way: using a `with` block


In [8]:

with wn.Analysis() as analysis:
    # load data
    data = wn.Dataset(path = data_path, column_names = var_names)

    # get mean of age
    age_mean = wn.dp_mean(data = wn.cast(data['age'], type="FLOAT"),
                          privacy_usage = {'epsilon': .65},
                          data_lower = 0.,
                          data_upper = 100.,
                          data_n = 1000
                         )
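
The two ways above differ only in who closes the analysis context. The sketch below illustrates that pattern with a toy class. `ToyAnalysis` and `toy_component` are hypothetical names made up for this example, and the assumption that `wn.Analysis` tracks an "active" analysis in a similar way is an illustration, not a description of the library's internals.

```python
class ToyAnalysis:
    """Hypothetical stand-in for an analysis context (not the library class)."""

    active = None  # the analysis that newly created components attach to

    def __init__(self):
        self.components = []

    def enter(self):
        # Make this analysis the target for new components.
        ToyAnalysis.active = self

    def exit(self):
        # Stop collecting components.
        ToyAnalysis.active = None

    def __enter__(self):
        self.enter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even when the body of the `with` block raises.
        self.exit()
        return False


def toy_component(name):
    """Hypothetical component constructor: registers itself on the active analysis."""
    ToyAnalysis.active.components.append(name)
    return name


# First way: explicit enter() and exit()
first = ToyAnalysis()
first.enter()
toy_component("dp_mean(age)")
first.exit()

# Second way: a with block calls enter() and exit() for us
with ToyAnalysis() as second:
    toy_component("dp_mean(age)")

print(first.components == second.components)
```

Both toy analyses end up holding the same component. The `with` form has one practical advantage in Python: the context is closed even if an exception is raised inside the block.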

### Step 4 - calling analysis.release() to access the value of the DP mean and other components

#### Note: After calling analysis.release(), the lower and upper bounds of the component collapse to the DP mean value. This reduces the sensitivity of later calculations, because that value is itself a DP release.

In [9]:
analysis.release()

print("DP mean of age: {0}".format(age_mean.value))
print("Desired lower bound:{0}".format(age_mean.lower))
print("Desired upper bound:{0}".format(age_mean.upper))
print("Privacy usage: {0}\n\n".format(analysis.privacy_usage))
# call get_accuracy with a user-chosen alpha (here 0.05)
print("accuracy, alpha:",age_mean.get_accuracy(.05))

DP mean of age: 45.082360349849445
Desired lower bound:45.082360349849445
Desired upper bound:45.082360349849445
Privacy usage: approximate {
  epsilon: 0.65
}



accuracy, alpha: 0.46088188823907555
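
The accuracy printed above can be reproduced by hand, assuming the textbook Laplace mechanism for a bounded mean with known n. This is a sketch under that assumption, not the library's implementation: the helper names `laplace_scale`, `laplace_accuracy`, and `toy_dp_mean` are hypothetical, the ages are synthetic, and the noise comes from Python's ordinary `random` module, which is not suitable for real privacy guarantees.

```python
import math
import random


def laplace_scale(lower, upper, n, epsilon):
    """Noise scale for a mean of n values clamped to [lower, upper]."""
    # Changing one record moves the clamped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sensitivity / epsilon


def laplace_accuracy(scale, alpha):
    """Bound a such that |noise| <= a with probability 1 - alpha."""
    # For Laplace noise, P(|noise| > a) = exp(-a / scale).
    return scale * math.log(1 / alpha)


def toy_dp_mean(values, lower, upper, epsilon, rng):
    """Clamp, average, and add Laplace noise (illustration only)."""
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / len(clamped)
    scale = laplace_scale(lower, upper, len(clamped), epsilon)
    # The difference of two exponential draws is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_mean + noise


# Same parameters as the dp_mean call in this notebook
scale = laplace_scale(0., 100., 1000, 0.65)
accuracy = laplace_accuracy(scale, 0.05)
print("accuracy at alpha=0.05:", accuracy)

# Synthetic ages, not the PUMS data
rng = random.Random(0)
ages = [rng.uniform(18, 90) for _ in range(1000)]
estimate = toy_dp_mean(ages, 0., 100., 0.65, rng)
print("toy DP mean:", estimate)
```

Under these assumptions the computed accuracy agrees with the value returned by `get_accuracy(.05)` above: with probability 0.95, the released mean lies within about 0.46 of the mean of the clamped data.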


### Step 5 - calling analysis.report() to get the JSON Release

In [10]:
print("JSON Release:")
analysis.report()

JSON Release:


[{'description': 'DP release information',
  'variables': 'age',
  'statistic': 'DPMean',
  'releaseInfo': 45.082360349849445,
  'privacyLoss': {'delta': 0.0, 'epsilon': 0.65, 'name': 'approximate'},
  'accuracy': None,
  'batch': 0,
  'nodeID': 11,
  'postprocess': False,
  'algorithmInfo': {'mechanism': 'Laplace',
   'name': '',
   'cite': '',
   'argument': {'constraint': {'lowerbound': 0.0, 'upperbound': 100.0},
    'n': 1000}}}]

In a future release, the 'name' and 'cite' fields will hold the actual name of the algorithm and the paper that describes it, respectively.
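
Because the report is a plain list of dictionaries, its fields can be read with ordinary Python. The sketch below copies the report shown above into a literal so that it runs on its own. The variable names `report`, `summaries`, and `total_epsilon` are illustrative, and summing epsilon across entries assumes simple sequential composition.

```python
# Literal copy of the report printed above
report = [{'description': 'DP release information',
           'variables': 'age',
           'statistic': 'DPMean',
           'releaseInfo': 45.082360349849445,
           'privacyLoss': {'delta': 0.0, 'epsilon': 0.65, 'name': 'approximate'},
           'accuracy': None,
           'batch': 0,
           'nodeID': 11,
           'postprocess': False,
           'algorithmInfo': {'mechanism': 'Laplace',
                             'name': '',
                             'cite': '',
                             'argument': {'constraint': {'lowerbound': 0.0,
                                                         'upperbound': 100.0},
                                          'n': 1000}}}]

# One readable line per released statistic
summaries = []
for entry in report:
    summaries.append("{0} of {1}: {2:.2f} (epsilon={3})".format(
        entry['statistic'],
        entry['variables'],
        entry['releaseInfo'],
        entry['privacyLoss']['epsilon']))

# Total epsilon spent, assuming simple sequential composition
total_epsilon = sum(entry['privacyLoss']['epsilon'] for entry in report)

for line in summaries:
    print(line)
print("total epsilon:", total_epsilon)
```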