# 📚 Differential Privacy in Reinforcement Learning

Built by **Stu** 🚀

## Section 1: Basics of Reinforcement Learning + Privacy

### Exercise 1: Define Reinforcement Learning

In [1]:
rl_definition = "An agent learns to take actions in an environment to maximize cumulative rewards."

### Exercise 2: Sketch Why DP is Needed in RL

In [2]:
dp_in_rl_sketch = "State/action logs can reveal individual user behavior; DP bounds how much any one user's data can influence released statistics or the trained policy."

## Section 2: Simulate RL Environment

### Exercise 3: Create Simple State-Reward Trajectories

In [3]:
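import numpy as np

np.random.seed(42)  # fix the seed so the trajectory is reproducible
states = np.random.randint(0, 20, size=100)   # 100 state IDs in [0, 20)
rewards = np.random.normal(1, 0.5, size=100)  # rewards ~ N(mean=1, std=0.5)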
states[:10], rewards[:10]

### Exercise 4: Compute Cumulative Reward

In [4]:
cumulative_reward = np.sum(rewards)
cumulative_reward

### Exercise 5: Add Laplace Noise to Cumulative Reward

In [5]:
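def add_laplace_noise(value, sensitivity=1.0, epsilon=1.0):
    """Add Laplace(0, sensitivity / epsilon) noise to a scalar or array."""
    scale = sensitivity / epsilon
    # Draw independent noise for every entry; a single draw would be
    # broadcast across the whole array.
    return value + np.random.laplace(0, scale, size=np.shape(value))

# sensitivity=5.0 only holds if each reward is bounded to a range of width 5;
# the Gaussian rewards above are unbounded, so this value is illustrative.
noisy_cumulative_reward = add_laplace_noise(cumulative_reward, sensitivity=5.0, epsilon=1.0)
noisy_cumulative_reward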
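
The Laplace mechanism only gives a privacy guarantee when the sensitivity is a true bound. The following is a minimal sketch of one way to get such a bound: clip each reward before summing. The helper name `private_reward_sum`, the clip range `[-1.5, 3.5]` (chosen so its width matches the `sensitivity=5.0` used above), and the "replace one reward" notion of neighboring datasets are all assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def private_reward_sum(rewards, clip_low, clip_high, epsilon, rng):
    """Release a clipped reward sum with Laplace noise (hypothetical helper)."""
    # After clipping, replacing one reward changes the sum by at most
    # clip_high - clip_low, so that width is a valid sensitivity bound.
    clipped = np.clip(rewards, clip_low, clip_high)
    sensitivity = clip_high - clip_low
    scale = sensitivity / epsilon
    return clipped.sum() + rng.laplace(0.0, scale)

rng = np.random.default_rng(42)
rewards = rng.normal(1.0, 0.5, size=100)

# Assumed clip range of width 5, mirroring sensitivity=5.0 in Exercise 5.
private_total = private_reward_sum(
    rewards, clip_low=-1.5, clip_high=3.5, epsilon=1.0, rng=rng
)
```

A tighter clip range lowers the noise scale but biases the sum whenever real rewards fall outside it, so the range is a utility trade-off rather than a free parameter.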

## Section 3: Private Policy Gradient Sketch

### Exercise 6: Simulate Noisy Policy Gradient Updates

In [6]:
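policy_gradients = np.random.normal(0, 1, size=(100, 5))  # shape (100, 5)
# sensitivity=5.0 is illustrative: it is only justified if each gradient row
# is clipped to a bounded L1 norm, which these raw draws are not.
noisy_policy_gradients = add_laplace_noise(policy_gradients, sensitivity=5.0, epsilon=1.0)
noisy_policy_gradients[:5]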
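
As a hedged sketch of how the noisy update above could be given a real sensitivity bound, the block below clips each gradient row to a maximum L1 norm and then adds Laplace noise to the summed gradient. It treats each row as one sample's gradient, assumes the batch size is fixed and public, and uses "replace one sample" neighboring datasets, which gives an L1 sensitivity of `2 * max_norm` for the sum. The names `clip_l1` and `private_mean_gradient` are hypothetical and not from the original notebook.

```python
import numpy as np

def clip_l1(grads, max_norm):
    """Scale each row so its L1 norm is at most max_norm."""
    norms = np.abs(grads).sum(axis=1, keepdims=True)
    factors = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return grads * factors

def private_mean_gradient(grads, max_norm, epsilon, rng):
    """Noisy average of clipped per-sample gradients (hypothetical helper)."""
    clipped = clip_l1(grads, max_norm)
    # Replacing one row changes the clipped sum by at most 2 * max_norm in L1.
    scale = 2.0 * max_norm / epsilon
    noise = rng.laplace(0.0, scale, size=grads.shape[1])
    # The batch size is assumed public, so dividing by it is post-processing.
    return (clipped.sum(axis=0) + noise) / grads.shape[0]

rng = np.random.default_rng(0)
grads = rng.normal(0.0, 1.0, size=(100, 5))
noisy_mean = private_mean_gradient(grads, max_norm=5.0, epsilon=1.0, rng=rng)
```

Adding noise once to the aggregated gradient, rather than to every row, keeps the noise in the averaged update proportional to `1 / batch_size`.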

### Exercise 7: Sketch Effect of Noise on RL Training

In [7]:
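effect_noise_training = "Smaller epsilon → larger noise scale (sensitivity / epsilon) → higher-variance gradient estimates, slower convergence, and a harder time reaching the optimal policy."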
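
To make the noise/utility trade-off concrete, here is a small sketch that measures the average size of Laplace noise at a few privacy levels. The epsilon values and the sample count are arbitrary choices for illustration; the sensitivity of 5.0 mirrors the value used in the earlier exercises.

```python
import numpy as np

rng = np.random.default_rng(0)
sensitivity = 5.0
epsilons = [0.1, 1.0, 10.0]  # illustrative privacy levels

mean_abs_errors = {}
for eps in epsilons:
    # The expected absolute value of Laplace(0, b) noise is b itself,
    # so the empirical mean should sit close to sensitivity / eps.
    noise = rng.laplace(0.0, sensitivity / eps, size=10_000)
    mean_abs_errors[eps] = float(np.abs(noise).mean())

for eps, err in mean_abs_errors.items():
    print(f"epsilon={eps}: mean |noise| = {err:.3f}")
```

Each tenfold reduction in epsilon raises the typical error by about a factor of ten, which is the perturbation a learner has to average out.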

## Section 4: Reflections

### Exercise 8: Sketch Real-World RL + DP Applications

In [8]:
real_world_rl_apps = "Private recommendation systems, private robotics control with sensitive sensor data."

### Exercise 9: Future Research Directions

In [9]:
future_rl_research = "Improving private RL sample efficiency, using adaptive noise addition."

### Exercise 10: Reflect on Limitations

In [10]:
rl_limitations = "High sensitivity to noise, unstable training, need for better accounting methods for RL pipelines."
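
The accounting point can be illustrated with basic sequential composition, under which k releases that are each ε-DP on the same data are together kε-DP. The sketch below uses that simple rule only; tighter accountants exist but are not shown here. The function names and the example numbers (a total budget of 1.0 spread over 1,000 updates) are hypothetical.

```python
def total_epsilon_basic(per_step_epsilon, num_steps):
    """Total privacy cost under basic sequential composition."""
    return per_step_epsilon * num_steps

def per_step_epsilon_for_budget(total_epsilon, num_steps):
    """Per-step epsilon that exactly spends a total budget over num_steps."""
    return total_epsilon / num_steps

# Hypothetical training run: 1,000 noisy updates under a total budget of 1.0.
eps_step = per_step_epsilon_for_budget(total_epsilon=1.0, num_steps=1000)

# With sensitivity 5.0 (as in the earlier exercises), the Laplace scale
# needed per update grows linearly with the number of updates.
laplace_scale_per_step = 5.0 / eps_step
```

Because the per-step noise scale grows with the number of updates, long RL training runs are where a loose accounting rule hurts most.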