[![View On GitLab](https://img.shields.io/badge/Open%20in-GitLab-orange)](https://gitlab.com/your_username/your_repository/-/blob/main/examples/showcase.ipynb)
![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)

# Differential Privacy Testing

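This notebook shows how to test empirically whether the results of a differentially private (DP) query are consistent with the epsilon and delta it claims.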

---

**Table of Contents**

- [Introduction](#intro)
- [DP testing in practice](#dp_testing_in_practice)
  1) [Dataset Generation](#dataset_generation)
  2) [Collecting DP Results](#collecting_dp_results)
  3) [Partitioning Results](#paritioning_results)
  4) [Compute Empirical Epsilons](#compute_empirical_epsilons)
- [Conclusions](#conclusions)
- [Theoretical Background & References](#theoretical_backgroud)

<a id="intro"></a>
## How we test DP results

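We run a DP query many times on a dataset and on an adjacent dataset, group the results into buckets, and derive an empirical epsilon that we compare with the epsilon requested for the query.

Similar libraries exist, such as Google's stochastic tester. The advantage of this one is its simplicity.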

<a id="dp_testing_in_practice"></a>
## Experimental settings

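In this section we go from theory to practice. To do so we need to:
1) **Dataset Generation**: build a reference dataset `D_0` and an adjacent dataset `D_1`.
2) **Collecting DP Results**: run the DP query many times on each dataset.
3) **Partitioning Results**: assign each result to a bucket and count the results per bucket.
4) **Compute Empirical Epsilons**: estimate an epsilon from the bucket counts and compare it with the target epsilon.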

<a id="dataset_generation"></a>
### Dataset Generation

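The cells below create the reference dataset `D_0` and an adjacent dataset `D_1` built for `user_id=0`. They expect a running PostgreSQL database; the first, commented-out cell shows one way to start one.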

---

In [1]:
# if you don't have a postgres database up and running, run this code to create one.
# %%capture
# # Load the database
# # Inspired by https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb#scrollTo=YUj0878jPyz7
# !sudo apt-get -y -qq update
# !sudo apt-get -y -qq install postgresql-14
# # Start postgresql server
# !sudo sed -i "s/port = 5432/port = 5433/g" /etc/postgresql/14/main/postgresql.conf
# !sudo service postgresql start
# # Set password
# !sudo -u postgres psql -U postgres -c "ALTER USER postgres PASSWORD 'pyqrlew-db'"

In [None]:
from dp_tester.generate_datasets import generate_D_0_dataset, generate_adj_datasets, D_1

generate_D_0_dataset()
generate_adj_datasets(D_1, user_id=0)
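
To illustrate what "adjacent" can mean, here is a minimal sketch of user-level adjacency: the second dataset is the first one with every row of a single user removed. The toy table and the `remove_user` helper are hypothetical and only for illustration; whether `generate_adj_datasets` builds `D_1` exactly this way is an assumption.

```python
# Hypothetical toy rows: (user_id, store_id, spent). Not the dp_tester schema.
transactions = [
    (0, 0, 12.5),
    (0, 1, 3.0),
    (1, 0, 7.25),
    (2, 1, 20.0),
]


def remove_user(rows, user_id):
    """Return a copy of `rows` without any row contributed by `user_id`."""
    return [row for row in rows if row[0] != user_id]


# Two datasets that differ only by the contributions of user 0.
toy_d_0 = transactions
toy_d_1 = remove_user(transactions, user_id=0)
```

A DP mechanism should produce result distributions on `toy_d_0` and `toy_d_1` that are hard to tell apart, which is what the rest of the notebook checks.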

<a id="collecting_dp_results"></a>
### Collecting DP Results

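We run the DP version of the query `runs` times on `D_0` and on each adjacent dataset, using the target `epsilon` and `delta`. The results can be saved to `results.json` for later analysis.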

---

In [None]:
from dp_tester.generate_dp_results import generate_dp_results
from dp_tester.query_executors import SqlAlchemyQueryExecutor
from dp_tester.dp_rewriters import PyqrlewDpRewriter
from dp_tester.table_renamers import PyqrlewTableRenamer
from dp_tester.generate_datasets import D_0
import json

query = "SELECT store_id, SUM(spent) FROM transactions GROUP BY store_id"
epsilon = 1.0
delta = 1e-4
runs = 5000

query_executor = SqlAlchemyQueryExecutor()
dp_rewriter = PyqrlewDpRewriter(engine=query_executor.engine)
table_renamer = PyqrlewTableRenamer(dp_rewriter.dataset)

results = generate_dp_results(
    non_dp_query=query,
    epsilon=epsilon,
    delta=delta,
    runs=runs,
    dp_rewriter=dp_rewriter,
    query_executor=query_executor,
    table_renamer=table_renamer,
    d_0=D_0,
    adjacent_ds=[D_1],
)

# save results if needed
with open("results.json", "w") as outfile:
    json.dump(obj=results, fp=outfile)
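
The idea of collecting many DP results per dataset can be sketched without a database. The example below is a stand-in, not what `generate_dp_results` does: it assumes a single numeric answer protected by the Laplace mechanism, and the values `100.0`, `90.0`, and the sensitivity `10.0` are made up for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def collect_results(true_value, sensitivity, epsilon, runs, rng):
    """Return `runs` noisy answers from a Laplace mechanism."""
    scale = sensitivity / epsilon
    return [true_value + laplace_noise(scale, rng) for _ in range(runs)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed true answers on two adjacent datasets; they differ by at most
# the assumed sensitivity of 10.0.
toy_results_d_0 = collect_results(100.0, sensitivity=10.0, epsilon=1.0, runs=5000, rng=rng)
toy_results_d_1 = collect_results(90.0, sensitivity=10.0, epsilon=1.0, runs=5000, rng=rng)
```

Each list plays the role of the results collected for one dataset: many independent draws from the mechanism's output distribution.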

<a id="paritioning_results"></a>
### Partitioning Results

Here we assign each result to a single bucket. There are many ways to do so.
We then count the results that fall into each bucket, separately for each dataset.

---

In [None]:
from dp_tester.generate_datasets import N_STORES
from dp_tester.partitioners import QuantityOverGroups
from dp_tester.analyzer import partition_results_to_bucket_ids, counts_from_indexes

NBINS = 20

# read results if saved
with open("results.json") as infile:
    results = json.load(infile)

partitioner = QuantityOverGroups(groups=range(N_STORES))
partitioned_results = partitioner.partition_results(results)
partitioner.generate_buckets(partitioned_results, nbuckets=NBINS)

bucket_ids = partition_results_to_bucket_ids(partitioned_results, partitioner.bucket_id)

d_0_d_1_counts = {}

for group in partitioner.groups:
    buckets_ids_d_0 = bucket_ids[f"{D_0}-{group}"]
    buckets_ids_d_1 = bucket_ids[f"{D_1}-{group}"]

    counts_d_0 = counts_from_indexes(buckets_ids_d_0, NBINS)
    counts_d_1 = counts_from_indexes(buckets_ids_d_1, NBINS)

    d_0_d_1_counts[f"{D_0}-{D_1}-{group}"] = (counts_d_0, counts_d_1)
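
As a library-independent illustration of the bucketing step, the sketch below uses equal-width buckets shared by both datasets. The helper names are hypothetical, and equal-width bucketing is only one possible choice; it is not necessarily what `QuantityOverGroups` does.

```python
def shared_buckets(values_a, values_b, nbuckets):
    """Return (lower bound, bucket width) covering both samples."""
    pooled = list(values_a) + list(values_b)
    low, high = min(pooled), max(pooled)
    return low, (high - low) / nbuckets


def to_bucket_id(value, low, width, nbuckets):
    """Map a value to a bucket index in [0, nbuckets - 1]."""
    if width == 0:
        return 0
    index = int((value - low) / width)
    # Clamp so that the maximum value falls into the last bucket.
    return min(max(index, 0), nbuckets - 1)


def bucket_counts(values, low, width, nbuckets):
    """Count how many values fall into each bucket."""
    counts = [0] * nbuckets
    for value in values:
        counts[to_bucket_id(value, low, width, nbuckets)] += 1
    return counts


toy_values_d_0 = [1.0, 2.0, 2.5, 9.0]
toy_values_d_1 = [0.0, 4.0, 10.0]
low, width = shared_buckets(toy_values_d_0, toy_values_d_1, nbuckets=5)

toy_counts_d_0 = bucket_counts(toy_values_d_0, low, width, 5)  # [1, 2, 0, 0, 1]
toy_counts_d_1 = bucket_counts(toy_values_d_1, low, width, 5)  # [1, 0, 1, 0, 1]
```

Using the same bucket boundaries for both datasets matters: the counts are only comparable if bucket `i` means the same range of values on each side.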

<a id="compute_empirical_epsilons"></a>
### Compute Empirical Epsilon

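For each group, we compute an empirical epsilon from the two bucket-count vectors. The test passes when the largest empirical epsilon is below the epsilon used in the experiment.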

---

In [None]:
from dp_tester.analyzer import empirical_epsilon

COUNT_THRESHOLD = 5

empirical_epsilons = {}
for name, (counts_d_0, counts_d_1) in d_0_d_1_counts.items():
    empirical_epsilons[name] = empirical_epsilon(
        counts_d_0, counts_d_1, delta=delta, counts_threshold=COUNT_THRESHOLD
    )

empirical_epsilons_values = list(empirical_epsilons.values())
max_eps = max(empirical_epsilons_values)

print(f"Epsilon used during the experiment: {epsilon}")
print(f"Max empirical epsilon found: {max_eps}")
print(f"Did the test pass? {max_eps < epsilon}")
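
One way to turn bucket counts into an empirical epsilon follows directly from the (epsilon, delta)-DP inequality P[M(D) in S] <= exp(epsilon) * P[M(D') in S] + delta, applied to each bucket in both directions. The function below is a simplified sketch under that reading; it is not the implementation of `dp_tester.analyzer.empirical_epsilon`, and skipping buckets whose counts fall below the threshold is an assumption about what `counts_threshold` is for.

```python
import math


def empirical_epsilon_sketch(counts_a, counts_b, delta, counts_threshold):
    """Largest per-bucket log((p - delta) / q) over both directions."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    eps = 0.0
    for c_a, c_b in zip(counts_a, counts_b):
        # Assumption: sparsely populated buckets give unreliable ratios.
        if c_a < counts_threshold or c_b < counts_threshold:
            continue
        p_a, p_b = c_a / total_a, c_b / total_b
        for p, q in ((p_a, p_b), (p_b, p_a)):
            if p - delta > 0:
                eps = max(eps, math.log((p - delta) / q))
    return eps


# Toy counts: the first bucket has frequencies 0.5 and 0.2,
# so the estimate is log(2.5), about 0.916.
toy_eps = empirical_epsilon_sketch(
    [50, 30, 20], [20, 30, 50], delta=0.0, counts_threshold=5
)
```

This simple estimate ignores sampling error, so with a finite number of runs it can land above or below the true epsilon.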

<a id="conclusions"></a>
## Conclusions

In this notebook, we covered:

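- Generating a reference dataset and an adjacent dataset.
- Collecting DP results for the same query on both datasets.
- Partitioning the results into buckets and computing an empirical epsilon to compare with the target epsilon.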

Thank you for exploring our library!

<a id="theoretical_backgroud"></a>
## Theoretical Background & References

- Differential Privacy
- Wilson paper
- Kairouz paper