# 📊 Toy Analytics Platform: Local Differential Privacy Simulation

Built by **Stu**

This notebook demonstrates:
- User event simulation
- Applying Local Differential Privacy (Randomized Response)
- Aggregating noisy data
- Estimating true counts
- Studying the impact of ε (epsilon)

## 🏗️ Setup
Let's import NumPy, Matplotlib, and the helper functions from the project's `lib` package.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

from lib.user_simulation import simulate_user_events
from lib.local_dp_utils import apply_randomized_response_array
from lib.aggregator import aggregate_noised_data, estimate_true_counts

## 👥 Simulate Users
We simulate 500 users generating clicks, page views, and keystrokes.

In [ ]:
# Simulate events
num_users = 500
events = simulate_user_events(num_users)

# Show a sample
{k: v[:10] for k, v in events.items()}
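The internals of `simulate_user_events` are not shown in this notebook. As a rough sketch of the data shape the later cells appear to expect, assume it returns a dict mapping each event name to a binary NumPy array with one entry per user (1 if the user triggered that event). The function name `simulate_binary_events`, the key names, and the per-event rates below are all hypothetical.

```python
import numpy as np

def simulate_binary_events(num_users, rng=None):
    """Hypothetical stand-in for simulate_user_events (assumed output shape only)."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumed probability that a given user triggers each event type
    rates = {"clicks": 0.6, "page_views": 0.8, "keystrokes": 0.3}
    # One 0/1 entry per user and event type
    return {
        name: (rng.random(num_users) < rate).astype(int)
        for name, rate in rates.items()
    }

sample_events = simulate_binary_events(500, np.random.default_rng(0))
print({name: values[:10] for name, values in sample_events.items()})
```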

## 🛡️ Apply Local Differential Privacy
Each user's event values are randomized on the user's side, **before** being sent to the server, so the server never receives the raw data.

We use ε (epsilon) = 1.0 for this simulation.

In [ ]:
# Apply Randomized Response
epsilon = 1.0

noised_events = {
    event: apply_randomized_response_array(values, epsilon)
    for event, values in events.items()
}

# Show a sample
{k: v[:10] for k, v in noised_events.items()}
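For intuition, here is a minimal sketch of binary randomized response. It assumes each event value is a 0/1 bit and that `apply_randomized_response_array` behaves along these lines; the helper's actual implementation is not shown here, so `randomized_response` below is a hypothetical stand-in. Each bit is reported truthfully with probability e^ε / (1 + e^ε) and flipped otherwise.

```python
import numpy as np

def randomized_response(bits, epsilon, rng=None):
    """Hypothetical sketch of binary randomized response (0/1 input assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    bits = np.asarray(bits, dtype=int)
    # Probability of reporting the true bit; approaches 0.5 as epsilon -> 0
    p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(bits.shape) < p_truth
    # Keep the true bit where `keep` is True, flip it otherwise
    return np.where(keep, bits, 1 - bits)

rng = np.random.default_rng(0)
true_bits = rng.integers(0, 2, size=10)
noisy_bits = randomized_response(true_bits, epsilon=1.0, rng=rng)
print(true_bits)
print(noisy_bits)
```

With ε = 1.0 the truthful-report probability is about 0.73, so roughly one bit in four is flipped on average.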

## 🧮 Aggregate Noised Data
The server sums the noisy reports for each event type, then corrects for the randomization to estimate the true counts.

In [ ]:
# Aggregate noised events
aggregated_noised = aggregate_noised_data(noised_events)
aggregated_noised

In [ ]:
# Estimate true counts
estimated_counts = estimate_true_counts(aggregated_noised, num_users, epsilon)
estimated_counts
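The correction step can be sketched as follows, again assuming binary randomized response with truthful-report probability p = e^ε / (1 + e^ε). Under that assumption the expected noisy count is `true * p + (n - true) * (1 - p)`, and solving for `true` gives the estimator below. `debias_count` is a hypothetical name; `estimate_true_counts` may differ in detail.

```python
import numpy as np

def debias_count(noisy_count, num_users, epsilon):
    """Hypothetical unbiased estimator for binary randomized response."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    # Invert E[noisy] = true * p + (num_users - true) * (1 - p)
    return (noisy_count - num_users * (1.0 - p)) / (2.0 * p - 1.0)

# Sanity check: feeding in the expected noisy count recovers the true count
n, true_count, eps = 500, 120, 1.0
p = np.exp(eps) / (1.0 + np.exp(eps))
expected_noisy = true_count * p + (n - true_count) * (1.0 - p)
recovered = debias_count(expected_noisy, n, eps)
print(recovered)
```

On a single noisy sample the estimate can fall below 0 or above `num_users`; clipping it to that range is a common post-processing choice, at the cost of a small bias.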

## 📊 Compare Estimates to Ground Truth
Let's plot the true and estimated counts side by side for each event type.

In [ ]:
# Compute true counts
true_counts = {event: np.sum(values) for event, values in events.items()}

# Plot
events_list = list(true_counts.keys())
true_values = [true_counts[event] for event in events_list]
estimated_values = [estimated_counts[event] for event in events_list]

x = np.arange(len(events_list))
width = 0.35

fig, ax = plt.subplots()
# Grouped bars: true counts on the left, estimates on the right
ax.bar(x - width / 2, true_values, width, label="True")
ax.bar(x + width / 2, estimated_values, width, label="Estimated")

ax.set_ylabel("Counts")
ax.set_title("True vs Estimated Counts")
ax.set_xticks(x)
ax.set_xticklabels(events_list)
ax.legend()

ax.grid(True)
plt.show()

## 🧪 Try Changing Epsilon
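Re-run the notebook with different values of ε:
- Set ε to 0.1 (strong privacy, more noise)
- Set ε to 3.0 (weaker privacy, less noise)
- Compare how far the estimates fall from the true counts in each case to see the accuracy vs. privacy trade-off.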
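One way to run this experiment in a single cell is to sweep ε and measure the average estimation error. The sketch below is self-contained and does not use the `lib` helpers: it assumes binary events and the randomized response scheme with truthful-report probability e^ε / (1 + e^ε), and the simulated data and the name `mean_abs_error` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)
num_users = 500
true_bits = rng.integers(0, 2, size=num_users)  # hypothetical binary events
true_count = int(true_bits.sum())

def mean_abs_error(epsilon, trials=200):
    """Average |estimate - truth| over repeated randomized-response runs."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    errors = []
    for _ in range(trials):
        keep = rng.random(num_users) < p
        noisy = np.where(keep, true_bits, 1 - true_bits)
        # Debias the noisy sum (assumed estimator for binary randomized response)
        estimate = (noisy.sum() - num_users * (1.0 - p)) / (2.0 * p - 1.0)
        errors.append(abs(estimate - true_count))
    return float(np.mean(errors))

errors_by_epsilon = {eps: mean_abs_error(eps) for eps in (0.1, 1.0, 3.0)}
for eps, err in errors_by_epsilon.items():
    print(f"epsilon={eps}: mean absolute error of about {err:.1f} users")
```

Under these assumptions the estimator's standard deviation is sqrt(n · p · (1 − p)) / (2p − 1), which grows sharply as ε approaches 0 because 2p − 1 shrinks toward zero.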