# Day 34: Privacy-Preserving AI (Differential Privacy)

In this lab, we explore **Differential Privacy (DP)** using the **Laplace Mechanism**.
We will see how adding calibrated noise to a query result hides any single individual's contribution while keeping the aggregate answer useful, and how the `epsilon` parameter controls this trade-off.
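
For reference, the standard Laplace mechanism releases the true query answer plus noise drawn from a Laplace distribution with scale `sensitivity / epsilon`. The sketch below shows that idea with NumPy only. The function name `laplace_mechanism` and the example values are illustrative assumptions; the `LaplaceMechanism` class imported in this lab may be implemented differently.

```python
import numpy as np


def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Return `value` plus Laplace noise with scale sensitivity / epsilon.

    Hypothetical helper for illustration; not the lab's LaplaceMechanism.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)


# Example values (assumed): answer 10.0, sensitivity 1.0, epsilon 0.5 -> scale 2.0
rng = np.random.default_rng(0)
samples = np.array(
    [laplace_mechanism(10.0, 1.0, 0.5, rng) for _ in range(20_000)]
)

# For Laplace noise, the expected absolute error equals the scale.
mean_abs_error = np.mean(np.abs(samples - 10.0))
print(f"Mean absolute error: {mean_abs_error:.3f} (scale is 2.0)")
```

Halving `epsilon` doubles the scale, so the expected error of a single release doubles as well.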

In [None]:
import sys
import os
import numpy as np
import matplotlib.pyplot as plt

# Add root directory to sys.path
sys.path.append(os.path.abspath('../../'))

from src.privacy.differential_privacy import LaplaceMechanism

## 1. Sensitive Dataset

Imagine a database of 100 people's salaries. We want to know the average salary.

In [None]:
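np.random.seed(42)
# Synthetic salaries: mean 50,000, standard deviation 10,000, 100 people
salaries = np.random.normal(50000, 10000, 100)
# Clip to the 0 to 100,000 range assumed in Section 2 so the sensitivity bound holds
salaries = np.clip(salaries, 0, 100000)
true_mean = np.mean(salaries)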

print(f"True Mean Salary: ${true_mean:.2f}")

## 2. Apply DP Noise

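We will query the mean.
For a dataset of fixed size N, the sensitivity of the mean query is the most the answer can change when one person's record is replaced: (max possible value - min possible value) / N.
Let's assume salaries are between 0 and 100,000.
Sensitivity = (100,000 - 0) / 100 = 1,000.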
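
The sensitivity figure can be checked numerically. The sketch below builds two hypothetical neighboring datasets that differ in a single record (one salary moved from the lower bound to the upper bound) and measures how far the mean moves. The bounds and dataset size are the ones assumed above; the variable names are illustrative.

```python
import numpy as np

lower, upper, n = 0.0, 100_000.0, 100

# Any in-range dataset works: the shift in the mean depends only on
# the one record that differs between the two neighbors.
rng = np.random.default_rng(0)
base = rng.uniform(lower, upper, n)

neighbor_low = base.copy()
neighbor_low[0] = lower   # one person reports the minimum salary

neighbor_high = base.copy()
neighbor_high[0] = upper  # the same person reports the maximum instead

worst_case_change = abs(neighbor_high.mean() - neighbor_low.mean())
sensitivity = (upper - lower) / n

print(f"Worst-case change in mean: {worst_case_change:.2f}")
print(f"Formula (max - min) / N:   {sensitivity:.2f}")
```

Both numbers should agree up to floating-point rounding, which is why the noise is calibrated to 1,000 rather than to the full salary range.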

In [None]:
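mech = LaplaceMechanism()
sensitivity = 1000.0  # (100,000 - 0) / 100, from the bounds assumed above

# Try different epsilons (smaller epsilon = stronger privacy, more noise)
epsilons = [0.1, 1.0, 10.0]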
results = {}

for eps in epsilons:
    # Run 1000 times to see the distribution of answers.
    # Illustration only: in a real deployment every release spends privacy budget.
    noisy_answers = [mech.add_noise(true_mean, sensitivity, eps) for _ in range(1000)]
    results[eps] = noisy_answers
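
To put numbers on the spread before plotting, the sketch below computes the Laplace scale `sensitivity / epsilon` for each epsilon and compares the theoretical standard deviation, sqrt(2) times the scale, with an empirical one. It samples with NumPy directly rather than through `mech`, so it assumes `mech.add_noise` follows the standard Laplace mechanism; the `summary` dictionary is an illustrative name.

```python
import numpy as np

sensitivity = 1000.0
epsilons = [0.1, 1.0, 10.0]
rng = np.random.default_rng(0)

summary = {}
for eps in epsilons:
    scale = sensitivity / eps                 # Laplace scale parameter b
    theoretical_std = np.sqrt(2.0) * scale    # std of Laplace(0, b) is sqrt(2) * b
    noise = rng.laplace(0.0, scale, size=100_000)
    summary[eps] = (scale, theoretical_std, noise.std())
    print(
        f"epsilon={eps:>4}: scale={scale:>8.1f}, "
        f"theoretical std={theoretical_std:>8.1f}, "
        f"empirical std={noise.std():>8.1f}"
    )
```

Each tenfold increase in epsilon shrinks the noise scale tenfold, from 10,000 at epsilon 0.1 down to 100 at epsilon 10.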

## 3. Visualize Trade-off

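Low epsilon (0.1) = high privacy, high noise: reported means scatter widely around the true value.
High epsilon (10.0) = low privacy, low noise: reported means cluster tightly around the true value.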

In [None]:
plt.figure(figsize=(10, 6))

for eps, data in results.items():
    plt.hist(data, bins=50, alpha=0.5, label=f"Epsilon={eps}")

plt.axvline(true_mean, color='red', linestyle='dashed', linewidth=2, label='True Mean')
plt.title("Distribution of Noisy Answers (DP Mean Query)")
plt.xlabel("Reported Mean Salary")
plt.ylabel("Frequency")
plt.legend()
plt.show()
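
The histograms can also be summarized by one number per epsilon: the probability that a single noisy answer lands within a chosen tolerance of the true mean. For Laplace noise with scale b this is 1 - exp(-t / b). The sketch below evaluates it for a hypothetical tolerance of 1,000 and checks it against simulated noise; it assumes the standard Laplace mechanism and does not use `mech`.

```python
import numpy as np

sensitivity = 1000.0
tolerance = 1000.0  # hypothetical accuracy target: within 1,000 of the true mean
rng = np.random.default_rng(0)

within = {}
for eps in [0.1, 1.0, 10.0]:
    scale = sensitivity / eps
    # P(|noise| <= t) for Laplace(0, b) is 1 - exp(-t / b)
    theoretical = 1.0 - np.exp(-tolerance / scale)
    noise = rng.laplace(0.0, scale, size=100_000)
    empirical = np.mean(np.abs(noise) <= tolerance)
    within[eps] = (theoretical, empirical)
    print(
        f"epsilon={eps:>4}: P(error <= {tolerance:.0f}) "
        f"theoretical={theoretical:.4f}, empirical={empirical:.4f}"
    )
```

Under these assumptions, roughly one answer in ten is that accurate at epsilon 0.1, while nearly every answer is at epsilon 10.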