# 📚 Differential Privacy in Travel Recommender Systems

Built by **Stu** 🚀

## Section 1: Basics of Private Recommender Systems

### Exercise 1: Define Recommender System

```python
recommender_definition = "A system that suggests items (e.g., hotels or flights) to a user based on that user's stated preferences or past behavior."
```
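To make the definition concrete, here is a minimal, non-personalized sketch that ranks destinations by observed click-through rate (clicks divided by impressions). The interaction log and the `recommend_by_ctr` helper are hypothetical and not part of the notebook; a real recommender would personalize the ranking per user.

```python
from collections import defaultdict

# Hypothetical interaction log: (destination, clicked) pairs
interactions = [
    ("Paris", 1), ("Paris", 0), ("Tokyo", 1), ("Tokyo", 1),
    ("London", 0), ("London", 1), ("New York", 0),
]

def recommend_by_ctr(log, k=2):
    """Return the k items with the highest observed click-through rate."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for item, clicked in log:
        impressions[item] += 1
        clicks[item] += clicked
    ctr = {item: clicks[item] / impressions[item] for item in impressions}
    # Highest CTR first; ties broken alphabetically so the order is deterministic
    return sorted(ctr, key=lambda item: (-ctr[item], item))[:k]

print(recommend_by_ctr(interactions))  # ['Tokyo', 'London']
```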

### Exercise 2: Sketch Privacy Risks in Travel Data

```python
travel_privacy_risks = "Clicks and bookings reveal sensitive information about a user: travel intent, current and future locations, and spending habits."
```
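Even aggregate statistics can leak an individual's trip. The sketch below shows a differencing attack on exact booking counts: comparing a count computed with and without one user reveals that user's destination. The booking log and helper functions are hypothetical; bounding exactly this kind of inference is what differential privacy is designed for.

```python
# Hypothetical booking log: (user_id, destination)
bookings = [
    ("u1", "Paris"), ("u2", "Tokyo"), ("u3", "Paris"),
    ("u4", "London"), ("alice", "Tokyo"),
]

def count_bookings(log, destination, exclude_user=None):
    """Exact aggregate count, the kind of statistic a recommender might publish."""
    return sum(1 for user, dest in log if dest == destination and user != exclude_user)

def infer_destination(log, user, destinations):
    """Differencing attack: the count that drops by one when `user` is removed is theirs."""
    for dest in destinations:
        if count_bookings(log, dest) - count_bookings(log, dest, exclude_user=user) == 1:
            return dest
    return None

print(infer_destination(bookings, "alice", ["Paris", "Tokyo", "London"]))  # Tokyo
```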

## Section 2: Simulate Toy Travel Booking Data

### Exercise 3: Create Tiny Booking Dataset

```python
import numpy as np
import pandas as pd

np.random.seed(42)  # fix the global RNG so the data (and later noise draws) are reproducible
n_samples = 500
destinations = np.random.choice(['Paris', 'Tokyo', 'New York', 'London'], size=n_samples)
prices = np.random.normal(500, 100, size=n_samples)  # offer price: mean 500, std 100
# Toy label: a user clicks exactly when the offer costs less than 550 (a deterministic rule)
clicked = (prices < 550).astype(int)
booking_data = pd.DataFrame({'Destination': destinations, 'Price': prices, 'Clicked': clicked})
booking_data.head()
```
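The raw data can already be summarized privately without any model. As a sketch, assuming neighbouring datasets differ by adding or removing one booking, a per-destination histogram has L1 sensitivity 1, so the Laplace mechanism adds noise with scale 1/ε to each count. `dp_histogram` is a hypothetical helper, and the sample below is synthetic; with the notebook's data you could pass `booking_data['Destination']` before the one-hot encoding step.

```python
import numpy as np

def dp_histogram(values, categories, epsilon, rng):
    """Release per-category counts with the Laplace mechanism.

    Adding or removing one booking changes exactly one count by 1,
    so the L1 sensitivity is 1 and the noise scale is 1/epsilon.
    """
    values = np.asarray(values)
    true_counts = np.array([(values == c).sum() for c in categories], dtype=float)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(categories))
    return dict(zip(categories, true_counts + noise))

rng = np.random.default_rng(0)
cities = ['Paris', 'Tokyo', 'New York', 'London']
sample = rng.choice(cities, size=500)  # synthetic destinations
print(dp_histogram(sample, cities, epsilon=0.5, rng=rng))
```

Note that the noisy counts can be non-integer or even negative; rounding or clipping them afterwards is post-processing and does not weaken the guarantee.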

## Section 3: Build a Simple Private Click-Through Rate (CTR) Model

### Exercise 4: Encode Destination as Features

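```python
# One-hot encode the categorical Destination column into Destination_<city> indicator columns
booking_data = pd.get_dummies(booking_data, columns=['Destination'])
booking_data.head()
```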

### Exercise 5: Train Logistic Regression (No Privacy Yet)

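```python
from sklearn.linear_model import LogisticRegression

# Features: Price plus the one-hot destination columns; label: Clicked
X = booking_data.drop('Clicked', axis=1)
y = booking_data['Clicked']

model = LogisticRegression(max_iter=1000)
model.fit(X, y)
model.score(X, y)  # accuracy on the training data itself (no held-out split)
```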
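Exercise 6 builds a model by assigning `coef_` and `intercept_` directly, which works because a fitted binary logistic regression predicts from those parameters alone: class 1 when w·x + b > 0, equivalently when sigmoid(w·x + b) > 0.5. The sketch below checks this on hypothetical toy data that mirrors Section 2.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
# Hypothetical toy data: price plus a one-hot destination (4 columns)
price = rng.normal(500, 100, size=300)
dest = np.eye(4)[rng.integers(0, 4, size=300)]
X_toy = np.column_stack([price, dest])
y_toy = (price < 550).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

# Linear score w·x + b computed by hand from the fitted parameters
scores = (X_toy @ clf.coef_.T + clf.intercept_).ravel()
manual_pred = (scores > 0).astype(int)

print(np.array_equal(manual_pred, clf.predict(X_toy)))
```

Because prediction depends only on these parameters, perturbing them is enough to produce a noisy model.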

### Exercise 6: Add Laplace Noise to Model Coefficients

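```python
import copy

def add_noise_to_coefficients(coefs, epsilon=1.0, sensitivity=1.0):
    # Laplace noise with scale sensitivity / epsilon. NOTE: sensitivity=1.0 is a placeholder,
    # not derived for this model, so the result carries no formal DP guarantee.
    noise = np.random.laplace(0, sensitivity / epsilon, size=coefs.shape)
    return coefs + noise

# If each release were 0.5-DP, sequential composition would make the total budget 0.5 + 0.5 = 1.0
noisy_coefs = add_noise_to_coefficients(model.coef_, epsilon=0.5)
noisy_intercept = add_noise_to_coefficients(model.intercept_, epsilon=0.5)

# Copy the fitted model so classes_, n_features_in_ and feature_names_in_ stay consistent,
# then overwrite its parameters with the noisy ones
noisy_model = copy.deepcopy(model)
noisy_model.coef_ = noisy_coefs
noisy_model.intercept_ = noisy_intercept
noisy_model.score(X, y)
```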
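The noise scale 1/ε above is not tied to how much one booking can move the fitted coefficients, so the noisy model has no formal ε guarantee. One standard calibration is output perturbation for L2-regularized empirical risk minimization (Chaudhuri, Monteleoni and Sarwate, 2011): clip every feature vector to L2 norm at most 1, fit with regularization strength Λ, then add a noise vector b whose density is proportional to exp(-ε‖b‖/Δ), with sensitivity Δ = 2/(nΛ). The sketch below is a swapped-in technique, not the notebook's method. Its helper names are hypothetical, the mapping Λ = 1/(nC) assumes scikit-learn's documented objective 0.5‖w‖² + C·Σ loss, the guarantee assumes the solver reaches the exact minimizer, and any feature scaling must use public constants rather than statistics computed from the data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def clip_rows(X, max_norm=1.0):
    """Scale down any row whose L2 norm exceeds max_norm (a per-record transform)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))

def sample_l2_laplace(dim, scale, rng):
    """Draw b with density proportional to exp(-||b||_2 / scale)."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)      # uniform direction on the unit sphere
    radius = rng.gamma(shape=dim, scale=scale)  # ||b|| ~ Gamma(dim, scale)
    return radius * direction

def dp_logreg_output_perturbation(X, y, epsilon, C=1.0, rng=None):
    """Output-perturbed L2-regularized logistic regression (sketch).

    0.5*||w||^2 + C*sum(loss_i) equals n*C * [(1/n)*sum(loss_i) + (Lambda/2)*||w||^2]
    with Lambda = 1/(n*C), so with ||x_i|| <= 1 the minimizer's L2 sensitivity
    is 2/(n*Lambda) = 2*C.
    """
    rng = np.random.default_rng() if rng is None else rng
    Xc = clip_rows(np.asarray(X, dtype=float))
    # No separate intercept: it is unregularized and would break the sensitivity bound,
    # so a constant column should be included in X instead.
    clf = LogisticRegression(C=C, fit_intercept=False, max_iter=1000).fit(Xc, y)
    noise = sample_l2_laplace(Xc.shape[1], scale=2.0 * C / epsilon, rng=rng)
    return clf.coef_.ravel() + noise

# Usage on toy data: price scaled by a public constant, plus a bias column
rng = np.random.default_rng(0)
price = rng.normal(500, 100, size=500)
X_toy = np.column_stack([price / 1000.0, np.ones_like(price)])
y_toy = (price < 550).astype(int)
w_private = dp_logreg_output_perturbation(X_toy, y_toy, epsilon=1.0, rng=rng)
print(w_private)
```

Here a smaller C (stronger regularization) shrinks the noise but also limits how well the model can fit, which is the same privacy-accuracy trade-off explored in Section 4.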

## Section 4: Privacy vs Accuracy

### Exercise 7: Vary Epsilon and Plot Accuracy

```python
import copy
import matplotlib.pyplot as plt

epsilons = [0.1, 0.5, 1.0, 2.0]
n_trials = 20  # noise is random, so average accuracy over several draws per epsilon
accuracies = []
for eps in epsilons:
    trial_scores = []
    for _ in range(n_trials):
        model_temp = copy.deepcopy(model)  # keeps classes_ and feature metadata
        model_temp.coef_ = add_noise_to_coefficients(model.coef_, epsilon=eps)
        model_temp.intercept_ = add_noise_to_coefficients(model.intercept_, epsilon=eps)
        trial_scores.append(model_temp.score(X, y))
    accuracies.append(np.mean(trial_scores))

plt.plot(epsilons, accuracies, marker='o')
plt.xscale('log')  # the epsilons span more than an order of magnitude
plt.xlabel('ε (log scale)')
plt.ylabel('Mean training accuracy')
plt.title('Privacy vs CTR Model Accuracy')
plt.show()
```

### Exercise 8: Reflect on Privacy-Accuracy Trade-off

```python
privacy_accuracy_reflection = "Stronger privacy (lower ε) means a larger noise scale (sensitivity / ε), which perturbs the coefficients more and lowers predictive accuracy."
```
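To put numbers on this: for Laplace noise with scale b = Δ/ε, the expected absolute noise equals b, so halving ε doubles the typical perturbation. The check below assumes Δ = 1, which is what the notebook's scale 1/ε implies; `laplace_noise_summary` is a hypothetical helper.

```python
import numpy as np

def laplace_noise_summary(epsilons, sensitivity=1.0, n_draws=100_000, seed=7):
    """Map each epsilon to (noise scale, empirical mean |noise|)."""
    rng = np.random.default_rng(seed)
    summary = {}
    for eps in epsilons:
        scale = sensitivity / eps
        draws = rng.laplace(0.0, scale, size=n_draws)
        summary[eps] = (scale, np.abs(draws).mean())  # E|noise| = scale for Laplace(0, scale)
    return summary

for eps, (scale, mean_abs) in laplace_noise_summary([0.1, 0.5, 1.0, 2.0]).items():
    print(f"eps={eps:<4} scale={scale:6.2f}  mean|noise|={mean_abs:6.2f}")
```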

### Exercise 9: Real-world Application Sketch

```python
recommender_real_apps = "Private travel recommendations computed on users' devices, and personalized offers that do not leak users' travel plans."
```
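One way to keep on-device signals private is local differential privacy, where each device randomizes its own data before sending it. Randomized response on a single click bit is the simplest instance: report the true bit with probability p = e^ε/(1 + e^ε), otherwise the flipped bit, and let the server debias the aggregate. This is a swapped-in illustration rather than a method described in the notebook; the function names and click data are hypothetical.

```python
import numpy as np

def randomized_response(bits, epsilon, rng):
    """Each device keeps its bit with prob e^eps / (1 + e^eps), else flips it (epsilon-LDP)."""
    p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(len(bits)) < p_truth
    return np.where(keep, bits, 1 - bits)

def estimate_rate(reports, epsilon):
    """Unbiased server-side estimate of the true fraction of 1s.

    E[report] = (1 - p) + rate * (2p - 1), so solve for rate.
    """
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return (reports.mean() - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(1)
true_clicks = (rng.random(100_000) < 0.3).astype(int)  # hypothetical click bits, about 30% CTR
reports = randomized_response(true_clicks, epsilon=1.0, rng=rng)
print(true_clicks.mean(), estimate_rate(reports, epsilon=1.0))
```

The server never sees a trustworthy individual bit, yet the aggregate click-through rate remains estimable; the price is extra variance that grows as ε shrinks.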