# 📚 Private Graph Neural Networks (GNN)

Built by **Stu** 🚀

## Section 1: Basics of GNNs + Privacy

### Exercise 1: Define GNN Components

In [1]:
gnn_components = "Nodes (entities), Edges (relationships), Message Passing (aggregate neighbor info)."
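
To make "message passing" concrete, here is a minimal sketch of one round of mean aggregation on a hypothetical 3-node directed graph. The adjacency matrix `A`, the feature vector `x`, and the helper `mean_aggregate` are illustrative assumptions, not part of the exercises.

```python
import numpy as np

# Hypothetical toy graph: A[i, j] = 1 means there is an edge j -> i,
# so row i marks the in-neighbors whose messages node i receives.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)
x = np.array([1.0, 2.0, 4.0])  # one scalar feature per node

def mean_aggregate(A, x):
    """One round of mean message passing; nodes with no in-neighbors get 0.0."""
    deg = A.sum(axis=1)   # number of in-neighbors per node
    sums = A @ x          # sum of in-neighbor features per node
    return np.divide(sums, deg, out=np.zeros_like(sums), where=deg > 0)

h = mean_aggregate(A, x)
```

Node 0 averages the features of nodes 1 and 2, node 1 copies node 0, and node 2 has no incoming edges, so it falls back to 0.0.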

### Exercise 2: Sketch Privacy Risks in GNNs

In [2]:
gnn_privacy_risks = "Node degrees, connectivity patterns, or node features could leak sensitive info if exposed."
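
One way to see the connectivity risk is to compare a node's exact mean aggregate on two graphs that differ by a single edge. The sketch below uses a hypothetical 3-node graph and a helper `mean_in_feature` introduced only for illustration.

```python
import networkx as nx
import numpy as np

def mean_in_feature(G, node):
    """Exact (non-private) mean of the in-neighbors' features; 0.0 if none."""
    preds = list(G.predecessors(node))
    return float(np.mean([G.nodes[n]["feature"] for n in preds])) if preds else 0.0

# Hypothetical graph where node 0 initially has a single in-neighbor (node 1).
G1 = nx.DiGraph()
G1.add_nodes_from([(0, {"feature": 0.0}), (1, {"feature": 1.0}), (2, {"feature": 5.0})])
G1.add_edge(1, 0)

# Neighboring graph: identical except for one extra "sensitive" edge 2 -> 0.
G2 = G1.copy()
G2.add_edge(2, 0)

before = mean_in_feature(G1, 0)
after = mean_in_feature(G2, 0)
```

Because `before` and `after` differ, anyone who sees the exact aggregate for node 0 and knows the features can infer whether the edge 2 -> 0 exists.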

## Section 2: Simulate Tiny Graph Data

### Exercise 3: Build Tiny Citation Graph

In [3]:
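import networkx as nx
# Random directed graph (20 nodes, 40 edges) as a stand-in for a citation graph.
# A fixed seed keeps the structure reproducible across runs.
G = nx.gnm_random_graph(20, 40, seed=42, directed=True)
nx.draw(G, with_labels=True)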

### Exercise 4: Assign Random Node Features

In [4]:
import numpy as np
np.random.seed(42)
for node in G.nodes():
    G.nodes[node]['feature'] = np.random.normal(0, 1)

## Section 3: Private Aggregation

### Exercise 5: Simulate Message Passing + Add Laplace Noise

In [5]:
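def private_message_aggregate(G, epsilon=1.0):
    """Mean-aggregate in-neighbor features, then add Laplace noise per node.

    The noise scale 1/epsilon assumes each aggregate has sensitivity 1. The
    raw features here are unbounded Gaussians, so this is illustrative only.
    """
    features = {}
    for node in G.nodes():
        # Messages flow along incoming edges (predecessor -> node).
        neighbors = list(G.predecessors(node))
        if neighbors:
            agg = np.mean([G.nodes[n]['feature'] for n in neighbors])
        else:
            agg = 0.0  # no in-neighbors: fall back to a neutral value
        noise = np.random.laplace(0, 1/epsilon)
        features[node] = agg + noise
    return features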

noisy_features = private_message_aggregate(G)
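
A Laplace scale of 1/ε only corresponds to a formal guarantee when the aggregate's sensitivity is bounded by 1. The sketch below shows one common way to get a bound: clip each feature to [-clip, clip] before averaging, then scale the noise to a conservative sensitivity of 2·clip. The function name `clipped_laplace_mean` and the `clip` parameter are assumptions for illustration. The bound covers a single released aggregate and ignores composition across the many nodes that share a neighbor.

```python
import numpy as np

def clipped_laplace_mean(values, epsilon, clip=1.0, rng=None):
    """Noisy mean of `values` after clipping each one to [-clip, clip].

    Assumption: changing one contribution moves the clipped mean by at most
    2 * clip, so Laplace noise with scale 2 * clip / epsilon is used.
    """
    rng = np.random.default_rng() if rng is None else rng
    if len(values) == 0:
        agg = 0.0  # same neutral fallback as for nodes without in-neighbors
    else:
        agg = float(np.mean(np.clip(values, -clip, clip)))
    sensitivity = 2.0 * clip
    return agg + rng.laplace(0.0, sensitivity / epsilon)

# Usage: a hypothetical node with three in-neighbor features.
noisy = clipped_laplace_mean([0.3, -1.7, 2.4], epsilon=1.0, clip=1.0,
                             rng=np.random.default_rng(0))
```

A smaller `clip` lowers the noise scale but distorts more features, which is a second utility trade-off on top of the choice of ε.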

### Exercise 6: Visualize Noisy Features Distribution

In [6]:
import matplotlib.pyplot as plt
plt.hist(list(noisy_features.values()), bins=10)
plt.title('Distribution of Noisy Aggregated Features')
plt.show()

## Section 4: Private Node Classification

### Exercise 7: Create Random Binary Labels

In [7]:
labels = {node: np.random.choice([0,1]) for node in G.nodes()}
labels

### Exercise 8: Build Noisy Logistic Classifier

In [8]:
from sklearn.linear_model import LogisticRegression

X = np.array(list(noisy_features.values())).reshape(-1,1)
y = np.array(list(labels.values()))

model = LogisticRegression()
model.fit(X, y)
# Training accuracy only (no held-out split); the labels are random, so expect roughly chance.
model.score(X, y)
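
Since the labels are drawn at random, a useful reference point for the score above is the accuracy of always predicting the most frequent class. The helper `majority_baseline_accuracy` below is a hypothetical addition, shown here on a made-up label vector.

```python
import numpy as np

def majority_baseline_accuracy(y):
    """Accuracy of a classifier that always predicts the most common label."""
    y = np.asarray(y, dtype=int)
    counts = np.bincount(y)        # occurrences of each non-negative integer label
    return counts.max() / len(y)

# Usage on a made-up label vector with three 1s and one 0.
baseline = majority_baseline_accuracy([0, 1, 1, 1])
```

Applied to the notebook's `y`, a model score that is not clearly above this baseline suggests the classifier has learned nothing beyond the class balance.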

### Exercise 9: Reflect on DP's Impact on Accuracy

In [9]:
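dp_accuracy_reflection = "Noise added during feature aggregation generally reduces classification accuracy, most sharply at low ε (strong privacy). With the random labels used here, accuracy should stay near chance at any ε."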
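
The "especially at low ε" part can be checked numerically: for Laplace noise with scale 1/ε, the expected absolute noise equals 1/ε. The sketch below estimates it by sampling. The helper name and the sample size are illustrative assumptions.

```python
import numpy as np

def mean_abs_laplace_noise(epsilon, n_samples=100_000, rng=None):
    """Empirical mean of |noise| for Laplace(0, 1/epsilon); theory gives 1/epsilon."""
    rng = np.random.default_rng(0) if rng is None else rng
    samples = rng.laplace(0.0, 1.0 / epsilon, size=n_samples)
    return float(np.abs(samples).mean())

# Average noise magnitude for a few illustrative privacy budgets.
noise_by_eps = {eps: mean_abs_laplace_noise(eps) for eps in [0.1, 0.5, 1.0, 2.0]}
```

At ε = 0.1 the typical noise is around 10, far larger than the unit-variance features, so the aggregated signal is almost entirely masked.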

### Exercise 10: Sketch Real GNN + DP Applications

In [10]:
real_world_gnn_dp = "Private friend recommendation, private knowledge graph reasoning, private traffic prediction."

### Exercise 11: Vary ε and Plot Accuracy Curve

In [11]:
epsilons = [0.1, 0.5, 1.0, 2.0]
accuracies = []
for eps in epsilons:
    noisy_features = private_message_aggregate(G, epsilon=eps)
    X = np.array(list(noisy_features.values())).reshape(-1,1)
    model = LogisticRegression()
    model.fit(X, y)
    accuracies.append(model.score(X, y))

plt.plot(epsilons, accuracies)
plt.xlabel('ε')
plt.ylabel('Accuracy')
plt.title('Privacy vs Classification Accuracy')
plt.show()

### Exercise 12: Summarize Trade-offs

In [12]:
gnn_tradeoffs_summary = "Stronger privacy (smaller ε) → more noise → noisier message aggregation → lower model performance."