---
title: Sanitization Process for Data Privacy
authors:
- Juan Zinser 
tags:
- distributed
- data
- privacy
created_at: 2018-01-30
updated_at: 2018-01-30
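tldr: Explores new sanitization methods for categorical data and measures their privacy/utility trade-off, using the error between original and sanitized class counts and the AUC of supervised models trained on sanitized data.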
---

As valuable as data is nowadays, it must satisfy several requirements before it can be made public. A natural trade-off arises between the privacy and the utility of a dataset. On one side, depending on regulations and on how careful the data holder is, the data must satisfy certain privacy constraints that prevent sensitive information from being revealed. On the other side, for inferences and conclusions to be drawn from a dataset, the data should be available to the people who want to analyze it. These analyses depend on data quality: the more informative the data, the better for its users. Previous work makes sure that data follows the corresponding privacy constraints through generalization, suppression or sanitization techniques, with the aim of making the data less informative (more private). The purpose of this work is to explore new sanitization methods for databases and to measure their performance.
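Generalization and suppression, two of the techniques named above, can be illustrated on a toy table. This is a generic sketch, not the sanitization method studied in this post; the columns, the names, the 10-year age bins and the 3-digit ZIP prefix are all hypothetical choices. Each step makes the released table less informative, which is exactly the utility that the privacy side of the trade-off gives up.

```python
import pandas as pd

# Hypothetical toy table: "name" is a direct identifier, "age" and "zip_code"
# are quasi-identifiers, "diagnosis" is the sensitive attribute.
records = pd.DataFrame({
    "name": ["Ana", "Luis", "Marta", "Pedro"],
    "age": [23, 37, 41, 58],
    "zip_code": ["11201", "11215", "10003", "10011"],
    "diagnosis": ["flu", "asthma", "flu", "diabetes"],
})


def generalize_age(age, width=10):
    """Replace an exact age by the `width`-year interval that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` digits of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


released = (
    records.drop(columns=["name"])  # suppression: the identifier is removed entirely
    .assign(
        age=lambda d: d["age"].map(generalize_age),            # generalization
        zip_code=lambda d: d["zip_code"].map(generalize_zip),  # generalization
    )
)
print(released)
```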

## Motivation

### Why is it important to make data public?

Data privacy matters precisely because making data public is valuable: the data has to be protected before it is released. Open Data refers to public or private institutions making their data publicly available, which usually helps improve public policy and public services. An open-data culture enhances collaboration, participation and social innovation (European Data Portal; Janssen, Marijn et al.).

## Analyse the Supervised Set

```python
from dask import dataframe, delayed
from dask.distributed import Client
import os
import math
import pandas as pd
import numpy as np
from sanitization_tools import *

# The variable name is historical: the file is the Kaggle Telco Customer Churn dataset.
income_dataset_path = "/home/juanzinser/Workspace/Tesis/data/kaggle/WA_Fn-UseC_-Telco-Customer-Churn.csv"

# To use an existing scheduler instead: client = Client("0.0.0.0:8786")
client = Client()  # starts a local cluster
```


### Areas of opportunity
+ Get the different modes.
+ Train the models with hyperparameter tuning and correct dataset splitting (cross-validation).
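The second item could look roughly like this with scikit-learn. It is a sketch on synthetic data, not code from this post: the use of `DecisionTreeClassifier` (a classifier, rather than the `DecisionTreeRegressor` used in the experiment below), the `max_depth` grid and the 5-fold stratified split are assumptions. The test split is held out so that the reported AUC is not biased by the tuning.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the sanitized feature matrix and the binary target "y".
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out a test split that is never seen during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune the tree depth with stratified 5-fold cross-validation, optimizing ROC AUC.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8, None]},
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# AUC of the refit best model on the untouched test split.
test_auc = search.score(X_test, y_test)
print(search.best_params_, round(test_auc, 3))
```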

### Experimental design
+ Trials are repeated N times for each set of parameters and for each dataset.
+ Each dataset needs its `x` (feature) and `y` (target) columns.
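Every run in the experiment loop below is labelled with a compact case name: the privacy level, then `m`/`t`/`f` for the "real" option (maybe / include real / exclude real), then `t`/`f` for `uniform` and `uniform2`, and finally `_<trial>` for the repetition. The encoder here mirrors that naming; the decoder is a hypothetical helper (not shown to be part of `sanitization_tools`) illustrating one way the `privacy`, `real`, `uniform` and `uniform2` columns used in the plots could be recovered from the labels.

```python
def encode_case(privacy, include_real, uniform, uniform2, maybe):
    """Build the case label used in the experiment loop, e.g. "3ftt"."""
    # "maybe" takes precedence over include_real, as in the loop.
    real_flag = "m" if maybe else ("t" if include_real else "f")
    return f"{privacy}{real_flag}{'t' if uniform else 'f'}{'t' if uniform2 else 'f'}"


def decode_case(case_name_rand):
    """Split a label such as "10mff_7" back into its parameters (hypothetical helper)."""
    case_name, trial = case_name_rand.rsplit("_", 1)
    # The last three characters are single-letter flags; the rest is the privacy level.
    privacy, real, uniform, uniform2 = case_name[:-3], case_name[-3], case_name[-2], case_name[-1]
    return {
        "privacy": int(privacy),
        "real": real,                  # "t" include real, "f" exclude, "m" maybe
        "uniform": int(uniform == "t"),
        "uniform2": int(uniform2 == "t"),
        "trial": int(trial),
    }


print(encode_case(3, False, True, True, False))
print(decode_case("10mff_7"))
```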

```python
# Each case is [privacy, include_real, uniform, uniform2, maybe].
cases = []
for pr in range(1, 11):  # extending this up to 16 would correctly cover another couple of columns
    cases += [[pr, False, True, True, False],
              [pr, True, True, True, False],
              [pr, False, True, True, True],
              [pr, True, False, False, False],
              [pr, False, False, False, False],
              [pr, False, False, False, True]]
```


```python
from sklearn import preprocessing, metrics, linear_model, svm, naive_bayes, tree

model_dict = dict()
model_dict["linear_regression"] = linear_model.LinearRegression()
model_dict["svm"] = svm.SVC(gamma=0.001, C=100.)
model_dict["naive_bayes"] = naive_bayes.GaussianNB()
model_dict["tree"] = tree.DecisionTreeRegressor()

# The raw data does not depend on the case, so it is read once.
data = dataframe.read_csv(income_dataset_path)
y_col = "Churn"
data_y = data[y_col].compute()

# categorical columns (non-numeric, excluding the target) and numeric columns
cat_columns = list(set(data.select_dtypes(["bool_", "object_", "flexible"], ["number"]).columns).difference({y_col}))
std_cols = data.select_dtypes(["number"]).columns

processed_cases = list()
case_model_scores = dict()
reco_list = list()
for case in cases:
    privacy, include_real, uniform, uniform2, maybe = case
    # label such as "3ftt": privacy level, real flag (m/t/f), uniform, uniform2
    case_name = str(privacy) + ("m" if maybe else "t" if include_real else "f") + ("t" if uniform else "f") + ("t" if uniform2 else "f")

    if case_name not in processed_cases:
        print(case_name)
        for rand_num in range(10):  # 10 repeated trials per case
            print(rand_num)
            case_name_rand = case_name + "_" + str(rand_num)

            meta_info = get_meta_info(data, cat_columns, privacy, include_real, uniform, uniform2, maybe, client)
            asd = {}  # sanitized entries per categorical column
            rmse_dict = {}
            for col in cat_columns:
                # only columns with fewer than 50 distinct classes are sanitized
                if len(meta_info["columns"][col]["counter"]) < 50:
                    asd[col] = client.gather(client.compute(data[col].map(lambda x: entry_sanitization(entry=x, **meta_info["algorithm"], **meta_info["columns"][col]), meta=('x', float))))
                    # sum of squared differences between the original class counts
                    # and the column sums of the sanitized vectors (no mean or sqrt is taken)
                    rmse_dict[col] = sum([np.power(x - y, 2) for x, y in zip(meta_info["columns"][col]["counter"].values(), asd[col].sum())])
            nis = pd.DataFrame.from_dict(rmse_dict, orient="index").reset_index()
            nis.columns = ["class", "rmse"]
            nis["case"] = case_name_rand
            reco_list.append(nis)

            # one feature column per (categorical column, class) pair
            dataa = pd.concat([pd.DataFrame(list(v.values), columns=[k + "/" + i for i in meta_info["columns"][k]["key_to_order"].keys()]) for k, v in asd.items()], axis=1)
            # binary target: 1 for the first label found in the Churn column;
            # .values avoids index alignment between the dask result and dataa
            dataa["y"] = (data_y == data_y.unique()[0]).astype(int).values

            # standardize the numeric columns
            std_scaler = preprocessing.StandardScaler()
            for col in std_cols:
                dataa[col] = std_scaler.fit_transform(data[col].compute().values.reshape(-1, 1)).ravel()

            # apply the supervised algorithms
            case_model_scores[case_name_rand] = dict()
            for model_name, model in model_dict.items():
                case_model_scores[case_name_rand][model_name] = get_auc_score_of_model(dataa, model)
        processed_cases.append(case_name)
```


```
1ftt
0

  ("('getitem-e8438b2ecbb5076214cd5e9ac820a7be', 0)" ... d8060d0>, None)
Consider scattering large objects ahead of time
with client.scatter to reduce scheduler burden and 
keep data on workers

    future = client.submit(func, big_data)    # bad

    big_future = client.scatter(big_data)     # good
    future = client.submit(func, big_future)  # good
  % (format_bytes(len(b)), s))

KilledWorker: ("('map-113a9e2704c04b820349d508a69fdaf4', 0)", 'tcp://127.0.0.1:40473')
```

```python
asd
```

```
{}
```
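The value stored in `rmse_dict` during the experiment is the sum of squared differences between the original class counts and the column sums of the sanitized vectors; no mean or square root is taken. A small sketch, with made-up counts, of that quantity next to the conventional RMSE it is named after (both helper names are hypothetical):

```python
import numpy as np


def squared_error(original_counts, sanitized_counts):
    """Sum of squared differences, the quantity accumulated in rmse_dict."""
    diff = np.asarray(original_counts, dtype=float) - np.asarray(sanitized_counts, dtype=float)
    return float(np.sum(diff ** 2))


def rmse(original_counts, sanitized_counts):
    """Root mean squared error over the classes of one column."""
    return float(np.sqrt(squared_error(original_counts, sanitized_counts) / len(original_counts)))


original = [50, 30, 20]   # hypothetical class counts before sanitization
sanitized = [46, 33, 21]  # hypothetical column sums after sanitization
print(squared_error(original, sanitized))  # 16 + 9 + 1 = 26.0
print(rmse(original, sanitized))
```

Because the summed version grows with the number of classes, comparing columns or datasets with different numbers of classes is easier with the averaged form.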

```python
reco_df = pd.concat(reco_list)
reco_df.to_csv("supervised_rmse_df_churn.csv")

# Since the RMSE does not depend on the supervised labels, it is better analysed
# in the non-supervised case, where the number of classes is easier to control.

# build a long dataframe (case, model, error, auc, roc) from the scores dictionary
df_models_scores = pd.DataFrame.from_dict(case_model_scores, orient="index").reset_index().rename(columns={"index": "case"})
df_models_scores = df_models_scores.melt(id_vars="case", value_vars=df_models_scores.columns[1:],
                                         var_name="model", value_name="models")

# each score is an (error, auc, roc) tuple
df_models_scores["error"] = df_models_scores["models"].map(lambda x: x[0])
df_models_scores["auc"] = df_models_scores["models"].map(lambda x: x[1])
df_models_scores["roc"] = df_models_scores["models"].map(lambda x: x[2])
df_models = df_models_scores[["case", "model", "error", "auc", "roc"]]
df_models.to_csv("model_scores_roc_churn.csv")
```
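The reshaping above turns the nested `case_model_scores` dictionary into one row per (case, model) pair. A self-contained sketch of the same steps on made-up scores, naming the melted column with `melt`'s `var_name` argument and expanding the tuples in a single step; the `None` placeholder stands in for the third (`roc`) element of each score tuple:

```python
import pandas as pd

# Hypothetical scores mirroring case_model_scores: case -> model -> (error, auc, roc).
case_model_scores = {
    "1ftt_0": {"svm": (0.21, 0.78, None), "tree": (0.25, 0.71, None)},
    "2fff_0": {"svm": (0.24, 0.74, None), "tree": (0.27, 0.69, None)},
}

# Wide table: one row per case, one column per model.
wide = (pd.DataFrame.from_dict(case_model_scores, orient="index")
        .rename_axis("case")
        .reset_index())

# Long table: one row per (case, model).
long = wide.melt(id_vars="case", var_name="model", value_name="scores")

# Expand each (error, auc, roc) tuple into three flat columns.
scores = pd.DataFrame(long["scores"].tolist(), columns=["error", "auc", "roc"], index=long.index)
df_models = pd.concat([long[["case", "model"]], scores], axis=1)
print(df_models)
```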

```python
from sanitization_tools import *
import pandas as pd

supervised_results = pd.read_csv("model_scores_roc.csv")
privacy_levels = list(range(1, 11))

rocs_by_case(supervised_results, {}, {"real": ["t", "f", "m"]}, savefig=True, title="by IF REAL",
             save_name="income_roc_privacy_grouped_tmf", language="spanish")
rocs_by_case(supervised_results, {"uniform": 1, "uniform2": 1}, {"real": ["t", "f", "m"]}, savefig=True,
             title="by Privacy Uniform Weights", save_name="income_roc_privacy_grouped_tmf_1", language="spanish")
rocs_by_case(supervised_results, {"uniform": 0, "uniform2": 0}, {"real": ["t", "f", "m"]}, savefig=True,
             title="by Privacy Proportional Weights", save_name="income_roc_privacy_grouped_tmf_0", language="spanish")

# one ROC figure per (real, weighting) combination, grouped by privacy level
real_labels = {"t": "Real", "f": "NonReal", "m": "MaybeReal"}
weight_labels = {1: "Uniform", 0: "Proportional"}
for real, real_label in real_labels.items():
    for u, weight_label in weight_labels.items():
        rocs_by_case(supervised_results, {"real": real, "uniform": u, "uniform2": u}, {"privacy": privacy_levels},
                     savefig=True, title="by Privacy {} {}".format(real_label, weight_label),
                     save_name="income_roc_privacy{}{}".format(real, u), language="spanish")

print(supervised_results.columns)
```

```python
plot_intervals(supervised_results, "privacy", "auc", {},
               {"real": ["t", "f", "m"]}, savefig=True,
               title="AUC Privacy by Real", save_name="auc_real_privacy")

plot_intervals(supervised_results, "privacy", "auc", {},
               {"uniform": [0, 1]}, savefig=True,
               title="AUC Privacy by Uniform", save_name="auc_uniform_privacy")
```

```python
from sanitization_tools import *
import pandas as pd

supervised_results = pd.read_csv("model_scores_roc.csv")

# one ROC figure per model, grouped by privacy level
model_titles = {"tree": "TREE", "svm": "SVM", "linear_regression": "Linear Regression", "naive_bayes": "Naive Bayes"}
for model, model_title in model_titles.items():
    rocs_by_case(supervised_results, {"model": model},
                 {"privacy": list(range(1, 11))}, savefig=False, title="by Privacy " + model_title)
```

```python
rmse_auc_plot_no_intervals(supervised_results, "privacy", "auc", ["t", "f", "m"], [None], [None], [None], [None],
                           {("uniform", "uniform2"): [(1, 1), (0, 0)]}, savefig=True,
                           title="Supervised AUC for Real and Uniform", save_name="supervised_auc_gb_tmf_01")

rmse_auc_plot_no_intervals(supervised_results, "privacy", "auc", [None], [None], [None], [None],
                           ["svm", "linear_regression", "tree", "naive_bayes"],
                           savefig=True, title="Supervised AUC for Real and Uniform", save_name="supervised_auc_models")
```

### AUC Table

```python
# mean AUC per privacy level (rows) and real flag (columns), printed as LaTeX,
# for all runs, uniform weights only, and proportional weights only
subsets = {
    "all": supervised_results,
    "uniform": supervised_results[(supervised_results.uniform == 1) & (supervised_results.uniform2 == 1)],
    "proportional": supervised_results[(supervised_results.uniform == 0) & (supervised_results.uniform2 == 0)],
}
for subset in subsets.values():
    auc_sum = subset.groupby(["privacy", "real"])["auc"].mean().reset_index()
    auc_pivsum = auc_sum.pivot(index="privacy", columns="real", values="auc").round(2)
    print(auc_pivsum.to_latex())
```
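Each table averages the AUC over models and trials for every (privacy, real) pair. The `groupby` + `pivot` pair can also be written as a single `pivot_table` call; a sketch on made-up rows (the values are illustrative, not results from this post):

```python
import pandas as pd

# Made-up rows in the same long format as supervised_results.
results = pd.DataFrame({
    "privacy": [1, 1, 1, 1, 2, 2],
    "real":    ["t", "t", "f", "f", "t", "f"],
    "auc":     [0.80, 0.84, 0.70, 0.74, 0.76, 0.66],
})

# One call: group by (privacy, real), average auc, spread "real" into columns.
table = results.pivot_table(index="privacy", columns="real", values="auc", aggfunc="mean").round(2)

# The two-step version for comparison.
expected = (results.groupby(["privacy", "real"])["auc"].mean().reset_index()
            .pivot(index="privacy", columns="real", values="auc").round(2))
print(table)
```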

## Model Comparison

```python
plot_intervals(supervised_results, "privacy", "auc", {},
               {"model": ["svm", "linear_regression", "tree", "naive_bayes"]}, savefig=True,
               title="AUC Privacy by Model", save_name="auc_model_privacy")

plot_intervals(supervised_results, "privacy", "auc", {"uniform": [0], "uniform2": [0], "uniform_original": [0]},
               {"model": ["svm", "linear_regression", "tree", "naive_bayes"]}, savefig=False,
               title="Non Uniform and Exponential Original", save_name="auc_nclasses00")
```

```python
plot_intervals(supervised_results, "real", "auc", {},
               {"model": ["svm", "linear_regression", "tree", "naive_bayes"]}, savefig=False,
               title="Non Uniform and Exponential Original", save_name="auc_isreal")
```

### Bar plots

```python
from sanitization_tools import *
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

supervised_results = pd.read_csv("model_scores_roc.csv")

df = supervised_results
gb_param = "real"
yaxis = "auc"
base_filter = {}
lines_cases = {"model": ["svm", "linear_regression", "tree", "naive_bayes"]}
savefig = True
title = "Models by whether 'real' is included"
save_name = "include_real_model_wr"
language = "english"
width_delta = .1

fig, ax = plt.subplots()
pt = base_filter.get("privacy")
if pt is not None:
    base_filter.pop("privacy")
    df = df.query("privacy < {pt}".format(pt=pt))
if "uniform" in gb_param:
    df = df[df.uniform == df.uniform2]

df = get_base_filtered_df(df, base_filter)

# reference points: mean AUC per model at privacy 1, uniform weights, real included
dfc = get_single_filter_df(df, "privacy", 1)
dfc = get_single_filter_df(dfc, "uniform", 1)
dfc = get_single_filter_df(dfc, "real", "t")
gb0 = dfc.groupby(["model"])[yaxis].mean()
print(gb0)

ps = list()
labels = list()
width = 0
scatter_x = list()
scatter_y = list()
if len(lines_cases) > 0:
    # one set of bars per model, each shifted right by width_delta
    for k, v in lines_cases.items():
        v = [v] if not isinstance(v, list) else v
        for v0 in v:
            dfc = get_single_filter_df(df, k, v0)

            gb = dfc.groupby([gb_param])[yaxis].mean().reset_index()
            gb2 = dfc.groupby([gb_param])[yaxis].std().reset_index()

            x = gb[gb_param].unique()
            print(gb)
            print(x)
            ind = np.arange(len(x))
            curr_p = ax.bar(ind + width, gb[yaxis], width_delta, color=np.random.rand(3,),
                            bottom=0, yerr=gb2[yaxis])
            # one reference point per bar of this model
            scatter_y.extend([gb0.loc[v0]] * len(x))
            scatter_x.extend(ind + width)
            ps.append(curr_p)
            param_dict = {k: v0}
            tt = get_label_name(param_dict, True, language)
            labels.append(tt)
            width += width_delta
else:
    gb = df.groupby([gb_param])[yaxis].mean().reset_index()
    gb2 = df.groupby([gb_param])[yaxis].std().reset_index()

    x = gb[gb_param].unique()
    ind = np.arange(len(x))
    curr_p = ax.bar(ind + width, gb[yaxis], width_delta, color=np.random.rand(3,),
                    bottom=0, yerr=gb2[yaxis])
    ps.append(curr_p)
    tt = get_label_name(base_filter, True, language)
    labels.append(tt)
    width += width_delta

plt.scatter(scatter_x, scatter_y, color="k", s=100)
ax.set_title(title)
# center each tick under its group of bars
ax.set_xticks(ind + (width - width_delta) / 2)
x = label_rename(x, language)
ax.set_xticklabels(x, rotation=45, ha="right")
ax.legend([list(p)[0] for p in ps], labels)

dict_use = english_dict if language == "english" else spanish_dict
gb_param = dict_use.get(gb_param.lower()) or gb_param
yaxis = dict_use.get(yaxis.lower()) or yaxis
ax.set_xlabel(gb_param.upper())
ax.set_ylabel(yaxis.upper())
plt.tight_layout()
if savefig:
    plt.savefig(figures_path + save_name + ".png")
plt.show()

plot_bars(supervised_results, "real", "auc", {},
          {"model": ["svm", "linear_regression", "tree", "naive_bayes"]}, savefig=True,
          title="Models by whether 'real' is included", save_name="include_real_model", width_delta=.1)

plot_bars(supervised_results, "uniform", "auc", {},
          {"model": ["svm", "linear_regression", "tree", "naive_bayes"]}, savefig=True,
          title="Models by sanitization distribution", save_name="uniform_model", width_delta=.1)
```
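With matplotlib's default `align="center"`, a bar drawn at `ind + k * width_delta` is centered on that position, so a tick meant to label a whole group of `n` bars belongs at the mean of the group's positions, `ind + (n - 1) * width_delta / 2`. A sketch of that arithmetic; the helper name is hypothetical:

```python
import numpy as np


def grouped_bar_positions(n_groups, n_series, bar_width):
    """Bar centers for each series and the center of each group.

    Series k in group g is drawn at g + k * bar_width; the group center is the
    mean of those positions.
    """
    ind = np.arange(n_groups)
    positions = [ind + k * bar_width for k in range(n_series)]
    group_centers = ind + (n_series - 1) * bar_width / 2
    return positions, group_centers


# e.g. 3 "real" values (t, f, m) and 4 models, as in the bar plot above
positions, centers = grouped_bar_positions(n_groups=3, n_series=4, bar_width=0.1)
print(centers)
```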

```python
from sanitization_tools import *
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

supervised_results = pd.read_csv("model_scores_roc.csv")

df = supervised_results
gb_param = "real"
yaxis = "auc"
base_filter = {}
lines_cases = {"real": ["t", "f", "m"]}
savefig = True
title = "Is Real"
save_name = "privacy_is_real"
width_delta = .1

fig, ax = plt.subplots()
pt = base_filter.get("privacy")
if pt is not None:
    base_filter.pop("privacy")
    df = df.query("privacy < {pt}".format(pt=pt))
df = df[df.uniform == df.uniform2]
df = get_base_filtered_df(df, base_filter)
ps = list()
labels = list()
width = 0
xticks = list()
xticks_locs = list()
colors = {"t": "b", "f": "r", "m": "g"}
if len(lines_cases) > 0:
    for k, v in lines_cases.items():
        v = [v] if not isinstance(v, list) else v
        for v0 in v:
            dfc = get_single_filter_df(df, k, v0)

            gb = dfc.groupby([gb_param])[yaxis].mean().reset_index()
            gb2 = dfc.groupby([gb_param])[yaxis].std().reset_index()

            x = gb[gb_param].unique()
            xticks.extend(x)
            ind = np.arange(len(x))
            # extend (not append) so that the tick locations form a flat list
            xticks_locs.extend(ind + width)
            curr_p = ax.bar(ind + width, gb[yaxis], width_delta, color=colors[x[0]],
                            bottom=0, yerr=gb2[yaxis])
            ps.append(curr_p)
            param_dict = {k: v0}
            tt = get_label_name(param_dict, True, "spanish")
            labels.append(tt)
            width += width_delta

ax.set_title(title)
ax.set_xticks(xticks_locs)
xticks = label_rename(xticks, "spanish")
ax.set_xticklabels(xticks, rotation=45, ha="right")
ax.legend([p[0] for p in ps], labels)

ax.set_xlabel(gb_param.upper())
ax.set_ylabel(yaxis.upper())
plt.tight_layout()
if savefig:
    plt.savefig(figures_path + save_name + ".png")
plt.show()

plot_bars_single_chunk(df=supervised_results, gb_param="privacy", yaxis="auc", base_filter={"real": "t"},
                       lines_cases={"privacy": [str(i) for i in range(1, 11)]}, savefig=True, title="AUC by Privacy, Include Real",
                       save_name="privacy_auc_bar_t", width_delta=.1, language="english")
plot_bars_single_chunk(df=supervised_results, gb_param="privacy", yaxis="auc", base_filter={"real": "f"},
                       lines_cases={"privacy": [str(i) for i in range(1, 11)]}, savefig=True, title="AUC by Privacy, No Real",
                       save_name="privacy_auc_bar_f", width_delta=.1, language="english")
plot_bars_single_chunk(df=supervised_results, gb_param="privacy", yaxis="auc", base_filter={"real": "t", "uniform": 0},
                       lines_cases={"privacy": [str(i) for i in range(1, 11)]}, savefig=True, title="AUC by Privacy, Include Real \n Proportional",
                       save_name="privacy_auc_bar_t0", width_delta=.1, language="english")
plot_bars_single_chunk(df=supervised_results, gb_param="privacy", yaxis="auc", base_filter={"real": "t", "uniform": 1},
                       lines_cases={"privacy": [str(i) for i in range(1, 11)]}, savefig=True, title="AUC by Privacy, Include Real \n Uniform",
                       save_name="privacy_auc_bar_t1", width_delta=.1, language="english")
plot_bars_single_chunk(df=supervised_results, gb_param="privacy", yaxis="auc", base_filter={"real": "f", "uniform": 0},
                       lines_cases={"privacy": [str(i) for i in range(1, 11)]}, savefig=True, title="AUC by Privacy, No Real \n Proportional",
                       save_name="privacy_auc_bar_f0", width_delta=.1, language="english")
plot_bars_single_chunk(df=supervised_results, gb_param="privacy", yaxis="auc", base_filter={"real": "f", "uniform": 1},
                       lines_cases={"privacy": [str(i) for i in range(1, 11)]}, savefig=True, title="AUC by Privacy, No Real \n Uniform",
                       save_name="privacy_auc_bar_f1", width_delta=.1, language="english")
```


## Analyse the Non-Supervised Set

```python
from sanitization_tools import *
import pandas as pd

non_supervised_results = pd.read_csv("rmse_df_simulated_rel.csv")

# RMSE and chi against privacy level and number of classes, for every real/uniform combination.
# Note: the chi figures are saved as "privacy" and "nclasses", names the next cell reuses.
titles = {"privacy": "All Cases Privacy (R,U,UO)", "nclasses": "All Cases Number Classes (R,U,UO)"}
for metric, prefix in [("rmse", "rmse_"), ("chi", "")]:
    for x_param, title in titles.items():
        rmse_auc_plot_no_intervals(non_supervised_results, x_param, metric,
                                   ["t", "m", "f"], [None], [None], [0, 1], [None],
                                   {("uniform", "uniform2"): [(1, 1), (0, 0)]}, savefig=True,
                                   title=title, save_name=prefix + x_param)
```



```python
from sanitization_tools import *
import pandas as pd

non_supervised_results = pd.read_csv("rmse_df_simulated_rel.csv")

rmse_auc_plot_with_intervals(non_supervised_results, "privacy", "chi",
                             ["t", "m", "f"], [None], [None], [0, 1], [None],
                             {("uniform", "uniform2"): [(1, 1), (0, 0)]}, savefig=True,
                             title="All Cases Privacy (R,U,UO)", save_name="privacy")
rmse_auc_plot_with_intervals(non_supervised_results, "nclasses", "chi",
                             ["t", "m", "f"], [None], [None], [0, 1], [None],
                             {("uniform", "uniform2"): [(1, 1), (0, 0)]}, savefig=True,
                             title="All Cases Number Classes (R,U,UO)", save_name="nclasses")
```


```python
from sanitization_tools import *
import pandas as pd

non_supervised_results = pd.read_csv("rmse_df_simulated_rel.csv")

# (uniform == uniform2, uniform_original) -> figure title
combo_titles = {
    (0, 0): "Non Uniform and Exponential Original",
    (0, 1): "Non Uniform and Uniform Original",
    (1, 0): "Uniform and Exponential Original",
    (1, 1): "Uniform and Uniform Original",
}
# RMSE intervals, saved as rmse_nclasses00 ... rmse_privacy11
for x_param in ["nclasses", "privacy"]:
    for (u, uo), title in combo_titles.items():
        plot_intervals(non_supervised_results, x_param, "rmse",
                       {"uniform": [u], "uniform2": [u], "uniform_original": [uo]},
                       {"real": ["t", "m", "f"]}, savefig=True, title=title,
                       save_name="rmse_{}{}{}".format(x_param, u, uo))
```

```python
from sanitization_tools import *
import pandas as pd

non_supervised_results = pd.read_csv("rmse_df_simulated_rel.csv")

# (uniform == uniform2, uniform_original) -> figure title
combo_titles = {
    (0, 0): "Non Uniform and Exponential Original",
    (0, 1): "Non Uniform and Uniform Original",
    (1, 0): "Uniform and Exponential Original",
    (1, 1): "Uniform and Uniform Original",
}
# chi intervals, saved as nclasses00 ... privacy11
for x_param in ["nclasses", "privacy"]:
    for (u, uo), title in combo_titles.items():
        plot_intervals(non_supervised_results, x_param, "chi",
                       {"uniform": [u], "uniform2": [u], "uniform_original": [uo]},
                       {"real": ["t", "m", "f"]}, savefig=True, title=title,
                       save_name="{}{}{}".format(x_param, u, uo))
```

```python
plot_intervals_std(non_supervised_results, "privacy", "chi", {},
                   {"real": ["t", "m", "f"]}, savefig=True,
                   title="Privacy Considering if Real is Included", save_name="privacy_isreal")
plot_intervals_std(non_supervised_results, "privacy", "chi", {},
                   {"uniform": [0, 1]}, savefig=True,
                   title="Privacy Considering if Uniform", save_name="privacy_uniform")
plot_intervals_std(non_supervised_results, "privacy", "chi", {"uniform_original": [1]},
                   {"uniform": [0, 1]}, savefig=True,
                   title="Privacy Considering the Sanitization Distribution \n and Uniform Original",
                   save_name="privacy_uniform_original1")
plot_intervals_std(non_supervised_results, "privacy", "chi", {"uniform_original": [0]},
                   {"uniform": [0, 1]}, savefig=True,
                   title="Privacy Considering the Sanitization Distribution \n and Exponential Original",
                   save_name="privacy_uniform_original0")

plot_intervals_std(non_supervised_results, "nclasses", "chi", {},
                   {"real": ["t", "m", "f"]}, savefig=True,
                   title="Number of Classes Considering if Real is Included", save_name="nclasses_isreal")
plot_intervals_std(non_supervised_results, "nclasses", "chi", {},
                   {"uniform": [0, 1]}, savefig=True,
                   title="Number of Classes Considering if Uniform", save_name="nclasses_uniform")
plot_intervals_std(non_supervised_results, "nclasses", "chi", {"uniform_original": [1]},
                   {"uniform": [0, 1]}, savefig=True,
                   title="Number of Classes Considering the Sanitization Distribution \n and Uniform Original",
                   save_name="nclasses_uniform_original1")
plot_intervals_std(non_supervised_results, "nclasses", "chi", {"uniform_original": [0]},
                   {"uniform": [0, 1]}, savefig=True,
                   title="Number of Classes Considering the Sanitization Distribution \n and Exponential Original",
                   save_name="nclasses_uniform_original0")
```
