# Welcome to `ethp2psim`'s example notebook!

`ethp2psim` is a network privacy simulator for the Ethereum peer-to-peer (p2p) network. It allows developers and researchers to implement, test, and evaluate the anonymity and privacy guarantees of various routing protocols (e.g., Dandelion(++)) and custom privacy-enhanced message routing protocols. Issues, PRs, and contributions are welcome! Let's make Ethereum private together!

# 1. Quickstart

Here, we show an example of how to simulate the [Dandelion protocol](https://arxiv.org/pdf/1701.04439.pdf) in the most basic adversarial setting: the adversary predicts a node to be the message source if the malicious nodes first heard the message from that node.
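To make this adversarial setting concrete, here is a minimal standard-library sketch of the "first heard" prediction rule. It is not `ethp2psim`'s implementation. The observation format `(time, adversarial_node, sender)` and the function name `predict_source_first_heard` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical observation format: (time the adversarial node received the message,
#                                   the adversarial node, the peer it received it from)
Observation = Tuple[float, int, int]


def predict_source_first_heard(observations: List[Observation]) -> Optional[int]:
    """Predict the source as the peer that delivered the message to the adversary earliest."""
    if not observations:
        return None  # the adversary never saw this message
    earliest = min(observations, key=lambda obs: obs[0])
    return earliest[2]


# Toy example: adversarial nodes 7 and 9 hear the same message from different peers.
obs = [(3.2, 7, 15), (1.4, 9, 42), (5.0, 7, 3)]
print(predict_source_first_heard(obs))  # 42
```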

For reproducibility, **fix a random seed**:

In [None]:
seed = 42

## i.) Initialize simulation components

In [None]:
from ethp2psim.network import Network, EdgeWeightGenerator, NodeWeightGenerator

First, initialize reusable **generators for edge and node weights**, e.g.:
   * channel latency is sampled from a normal distribution (`"normal"` mode)
   * nodes have weights proportional to their staked Ether amount (`"stake"` mode)

In [None]:
ew_gen = EdgeWeightGenerator("normal", seed=seed)
nw_gen = NodeWeightGenerator("stake", seed=seed)
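The standard-library sketch below illustrates what such generators might produce. It assumes that the `"normal"` mode draws latencies from a normal distribution and that `"stake"` weights are each node's share of the total stake. The helper names, the mean and standard deviation, and the clipping to positive values are hypothetical choices, not `ethp2psim` defaults.

```python
import random
from typing import Dict, Hashable, Iterable


def sample_latencies(edges: Iterable[Hashable], mean: float = 50.0, std: float = 10.0, seed=None) -> Dict:
    """Hypothetical edge weights: one latency per edge, drawn from a normal
    distribution and clipped so that it stays strictly positive."""
    rng = random.Random(seed)
    return {edge: max(0.1, rng.gauss(mean, std)) for edge in edges}


def stake_weights(stakes: Dict[int, float]) -> Dict[int, float]:
    """Hypothetical node weights: each node's share of the total stake."""
    total = sum(stakes.values())
    return {node: stake / total for node, stake in stakes.items()}


latencies = sample_latencies([(0, 1), (1, 2), (0, 2)], seed=42)
weights = stake_weights({0: 32.0, 1: 64.0, 2: 96.0})
print(weights)  # {0: 0.16666666666666666, 1: 0.3333333333333333, 2: 0.5}
```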

With these generators, let's create a random 20-regular graph with 100 nodes (every node has exactly 20 neighbors) to serve as the **peer-to-peer (P2P) network** in this experiment:

In [None]:
net = Network(nw_gen, ew_gen, num_nodes=100, k=20, seed=seed)
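In a k-regular graph, every node has exactly k neighbors. The standard-library sketch below shows one way to sample such a topology: start from a ring lattice and shuffle it with degree-preserving double-edge swaps, a Markov-chain method chosen here for illustration. `random_regular_graph` is a hypothetical helper (not `networkx.random_regular_graph`), and we do not claim this is how `Network` builds its graph.

```python
import random
from typing import Dict, Set


def random_regular_graph(n: int, k: int, num_swaps: int = 10_000, seed=None) -> Dict[int, Set[int]]:
    """Hypothetical helper: sample a k-regular graph on n nodes (k even, k < n).

    Starts from a ring lattice, where every node links to its k/2 nearest
    successors (hence k neighbors in total), then randomizes it with
    degree-preserving double-edge swaps.
    """
    if k % 2 or k >= n:
        raise ValueError("this sketch requires an even k smaller than n")
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            u = (v + offset) % n
            adj[v].add(u)
            adj[u].add(v)
    edges = [(v, u) for v in range(n) for u in adj[v] if v < u]
    for _ in range(num_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        # Rewire a-b and c-d into a-d and c-b, skipping self-loops and duplicate edges.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        for x, y in ((a, b), (c, d)):
            adj[x].discard(y)
            adj[y].discard(x)
        for x, y in ((a, d), (c, b)):
            adj[x].add(y)
            adj[y].add(x)
        edges[i], edges[j] = (a, d), (c, b)
    return adj


graph = random_regular_graph(100, 20, seed=42)
print(min(map(len, graph.values())), max(map(len, graph.values())))  # 20 20
```

Each swap removes two edges and adds two new ones between the same four endpoints, so every node keeps its degree while the topology drifts away from the regular ring structure.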

Next, initialize the Dandelion **protocol**, where
   * in each step of the stem (anonymity) phase, the message is broadcast with 40% probability; otherwise (with 60% probability), it is forwarded to the next node on the line graph.
   * with `broadcast_mode="sqrt"`, in the spreading phase the message is sent only to a random subset of neighbors whose size is the square root of the number of neighbors.

In [None]:
from ethp2psim.protocols import DandelionProtocol

dp = DandelionProtocol(net, 0.4, broadcast_mode="sqrt", seed=seed)
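The two protocol parameters above can be illustrated with a small standard-library sketch of a single message. It is not `ethp2psim`'s implementation. The helper names, the assumption that the source itself also flips the coin, the rounding of the square root, and the `max_hops` cap are all hypothetical. Only the first spreading hop is shown; under the `"sqrt"` mode, every relaying node in the spreading phase would pick its receivers the same way.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def sqrt_fanout_receivers(node: int, neighbors: Dict[int, Set[int]], rng: random.Random) -> List[int]:
    """Assumed "sqrt" mode: pick round(sqrt(degree)) distinct random neighbors (at least one)."""
    candidates = sorted(neighbors[node])
    if not candidates:
        return []
    fanout = max(1, round(math.sqrt(len(candidates))))
    return rng.sample(candidates, fanout)


def dandelion_stem(source: int, line_next: Dict[int, int], neighbors: Dict[int, Set[int]],
                   spreading_proba: float, max_hops: int = 1000, seed=None) -> Tuple[List[int], List[int]]:
    """Walk the line graph until a coin flip with probability `spreading_proba`
    ends the stem phase; return the stem and the first spreading receivers."""
    rng = random.Random(seed)
    stem = [source]
    while len(stem) < max_hops and rng.random() >= spreading_proba:
        stem.append(line_next[stem[-1]])  # stay in the stem: forward to the line-graph successor
    return stem, sqrt_fanout_receivers(stem[-1], neighbors, rng)


# Toy network: 10 fully connected nodes whose line graph is the cycle 0 -> 1 -> ... -> 9 -> 0.
n = 10
line_next = {v: (v + 1) % n for v in range(n)}
neighbors = {v: set(range(n)) - {v} for v in range(n)}
stem, receivers = dandelion_stem(0, line_next, neighbors, spreading_proba=0.4, seed=42)
print(stem, receivers)
```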

You can easily visualize the line (anonymity) graph for the Dandelion protocol:

In [None]:
import matplotlib.pyplot as plt
import networkx as nx

nx.draw(dp.anonymity_graph, node_size=20)

Finally, initialize a passive **adversary** against the Dandelion protocol that controls a random 10% of all nodes.

In [None]:
from ethp2psim.adversary import DandelionAdversary

adv = DandelionAdversary(dp, 0.1, active=False, seed=seed)

## ii.) Run simulation

In this experiment, let's **simulate** 20 random messages for the same P2P network and adversary with the Dandelion protocol.

First, initialize the simulator by setting the protocol, the adversary, the number of simulated messages, and how the message source nodes are sampled.

In [None]:
from ethp2psim.simulator import Simulator

sim = Simulator(adv, num_msg=20, use_node_weights=True, verbose=False, seed=seed)

Due to the `use_node_weights=True` setting, source nodes for messages are randomly sampled with respect to their staked Ether amount in accordance with the formerly prepared `NodeWeightGenerator`.
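Weight-proportional source sampling can be sketched with `random.choices` from the standard library. The helper `sample_sources` is hypothetical, and sampling with replacement (so one node may originate several messages) is an assumption about the simulator, not something the notebook states.

```python
import random
from collections import Counter
from typing import Dict, List


def sample_sources(node_weights: Dict[int, float], num_msg: int, seed=None) -> List[int]:
    """Sample `num_msg` message sources with probability proportional to node weight
    (with replacement, so a node may originate several messages)."""
    rng = random.Random(seed)
    nodes = list(node_weights)
    return rng.choices(nodes, weights=[node_weights[v] for v in nodes], k=num_msg)


# Node 2 holds most of the (made-up) stake, so it should originate most messages.
stakes = {0: 32.0, 1: 32.0, 2: 320.0}
sources = sample_sources(stakes, num_msg=20, seed=42)
print(Counter(sources))
```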

Next, **run the simulation**:

In [None]:
%%time
node_coverage_by_msg = sim.run()
print(node_coverage_by_msg)

## iii.) Evaluate the simulation

**Evaluate** the performance of the adversary for the given simulation. Here, you can choose different estimators for adversary performance evaluation (e.g., "first_sent", "first_reach", "dummy"):

In [None]:
from ethp2psim.simulator import Evaluator

evaluator = Evaluator(sim, estimator="first_sent")
print(evaluator.get_report())

The average results, calculated for the 20 random messages, show that

- 20% of the message sources were correctly identified by the adversary (`hit_ratio`)
- the mean inverse rank of the true message source in the adversary's ranked list of candidates is 0.3527 (`inverse_rank`), which roughly corresponds to the third position (1/0.3527 ≈ 2.8)
- almost all messages reach every node in the P2P network (`message_spread_ratio`)
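The ground-truth metrics can be sketched with their standard information-retrieval definitions for a single relevant item (the true source). We assume `ethp2psim` follows these definitions, but details such as tie-breaking or how an unranked source is scored may differ; `ranking_metrics` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def ranking_metrics(true_sources: Sequence[int], rankings: Sequence[List[int]]) -> Dict[str, float]:
    """Average hit ratio, inverse rank, and NDCG over messages, given each
    message's true source and the adversary's ranked candidate list."""
    totals = {"hit_ratio": 0.0, "inverse_rank": 0.0, "ndcg": 0.0}
    for source, ranking in zip(true_sources, rankings):
        if source not in ranking:
            continue  # an unranked source contributes 0 to every metric
        rank = ranking.index(source) + 1  # 1-based position of the true source
        totals["hit_ratio"] += 1.0 if rank == 1 else 0.0
        totals["inverse_rank"] += 1.0 / rank
        # With a single relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
        totals["ndcg"] += 1.0 / math.log2(rank + 1)
    return {name: value / len(true_sources) for name, value in totals.items()}


# Three toy messages: true source ranked 1st, 3rd, and not at all.
metrics = ranking_metrics([5, 8, 1], [[5, 2, 9], [3, 4, 8], [7, 6, 2]])
print(metrics)
```

Averaging 1/rank is not the same as inverting the average rank: ranks 1 and 3 give a mean inverse rank of 2/3 (about 1.5 when inverted), while the mean rank is 2.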

# 2. Compare different protocols

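In the next experiment, we compare the deanonymization performance of the adversary with respect to two parameters:

- The protocol used for message passing: we compare the simple broadcast protocol with Dandelion and Dandelion++ (each with `spreading_proba` set to 0.5 and 0.25) and with onion routing (3 relayers)
- The ratio of adversarial nodes in the P2P network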

## i.) Implement and run the experiment

First, we implement a function to measure the deanonymization performance of the adversary for different protocols.

In [None]:
from ethp2psim.protocols import (
    BroadcastProtocol,
    DandelionPlusPlusProtocol,
    OnionRoutingProtocol,
)
from ethp2psim.adversary import Adversary, OnionRoutingAdversary
import pandas as pd


def run_single_experiment(adversary_ratio: float, seed: int):
    # initialize P2P network topology
    net = Network(NodeWeightGenerator("stake"), EdgeWeightGenerator("normal"), 100, 20)
    # initialize protocols
    protocols = [
        BroadcastProtocol(net, broadcast_mode="sqrt"),
        DandelionProtocol(net, spreading_proba=0.5, broadcast_mode="sqrt"),
        DandelionProtocol(net, spreading_proba=0.25, broadcast_mode="sqrt"),
        DandelionPlusPlusProtocol(net, spreading_proba=0.5, broadcast_mode="sqrt"),
        DandelionPlusPlusProtocol(net, spreading_proba=0.25, broadcast_mode="sqrt"),
        OnionRoutingProtocol(net, num_relayers=3, broadcast_mode="sqrt"),
    ]
    # use the same set of adversarial nodes for all protocols
    num_adv_nodes = int(net.num_nodes * adversary_ratio)
    adv_nodes = net.sample_random_nodes(num_adv_nodes, False)
    single_run_results = []
    # run simulation for each protocol
    for protocol in protocols:
        # initialize adversary with pre-defined adversarial node set
        if isinstance(protocol, DandelionProtocol):
            adv = DandelionAdversary(protocol, adversaries=adv_nodes)
        elif isinstance(protocol, OnionRoutingProtocol):
            adv = OnionRoutingAdversary(protocol, adversaries=adv_nodes)
        else:
            adv = Adversary(protocol, adversaries=adv_nodes)
        # by fixing the seed we simulate the same messages
        sim = Simulator(adv, 20, seed=seed, verbose=False)
        sim.run()
        # collect results
        evaluator = Evaluator(sim, estimator="first_sent")
        report = evaluator.get_report()
        report["protocol"] = str(protocol)
        single_run_results.append(report)
    # postprocessing results
    results_df = pd.DataFrame(single_run_results)
    results_df["adversary_ratio"] = adversary_ratio
    return results_df

**Run an experiment** with the following parameters:

- The ratio of adversarial nodes: `[0.05, 0.1, 0.2]`
- We use 20 independent samples to measure performance for each parameter setting

In [None]:
import numpy as np
from tqdm.notebook import tqdm

num_trials = 20
results = []
for adv_ratio in [0.05, 0.1, 0.2]:
    results += [
        run_single_experiment(adv_ratio, rnd_seed)
        for rnd_seed in tqdm(np.random.randint(10**5, size=num_trials))
    ]
results_df = pd.concat(results, ignore_index=True)
print(results_df.shape)

Results are stored in a `pandas.DataFrame` along with the related experimental parameters:

In [None]:
results_df.head()

## ii.) Visualization

When visualizing the results, we distinguish between metrics that rely on ground truth information (i.e., knowledge of the true message sources) and metrics that do not.

### Metrics relying on ground truth information (`hit_ratio`, `inverse_rank`, `ndcg`)

Before visualization, we must restructure our dataframe by melting multiple performance metrics (e.g., `hit_ratio`, `inverse_rank`, etc.) into a single column (`metric`).

In [None]:
visu_df = results_df.melt(
    id_vars=["protocol", "estimator", "adversary_ratio"],
    value_vars=["hit_ratio", "inverse_rank", "ndcg"],
    var_name="metric",
)
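To see what the reshaping above does, here is the same `melt` call applied to a tiny hand-made frame with the same wide layout. The two rows, their protocol labels, and their metric values are made up for illustration. `melt` names the value column `value` by default, which is why the plot below uses `y="value"`.

```python
import pandas as pd

# Two made-up rows in the same wide layout as `results_df`.
wide = pd.DataFrame(
    {
        "protocol": ["Broadcast", "Dandelion"],
        "estimator": ["first_sent", "first_sent"],
        "adversary_ratio": [0.1, 0.1],
        "hit_ratio": [0.5, 0.2],
        "inverse_rank": [0.6, 0.3],
        "ndcg": [0.7, 0.4],
    }
)
# Each (row, metric) pair becomes one row of the long frame.
long = wide.melt(
    id_vars=["protocol", "estimator", "adversary_ratio"],
    value_vars=["hit_ratio", "inverse_rank", "ndcg"],
    var_name="metric",
)
print(long.shape)  # (6, 5)
```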

In [None]:
visu_df["protocol"].value_counts()

Finally, visualize different deanonymization performance metrics for various protocols and adversarial node ratios.

In [None]:
import plotly.express as px

fig = px.box(
    visu_df,
    x="adversary_ratio",
    y="value",
    color="protocol",
    facet_col="metric",
    width=1400,
    height=500,
)
fig.update_layout(
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0.0)
)
fig.show()

Naturally, the adversary performs better when it controls more nodes in the P2P network. It is also clear that Dandelion(++) can significantly reduce the adversary's power.

### Metrics without ground truth information

Finally, we show the average entropy calculated from adversarial predictions:
- Against the simple `BroadcastProtocol`, the adversary assigns probability 1.0 to the predicted message source and zero to every other node. Thus, the entropy is zero.
- For Dandelion(++), the entropy increases with the `spreading_proba` parameter, as expected.
- There is also higher uncertainty for Dandelion++ than for Dandelion due to the more complex anonymity graph of the protocol.

Note that the entropy is currently not well defined for the `OnionRoutingAdversary`, so onion routing is excluded from the next figure. We will work on this issue in the future.

In [None]:
fig = px.box(
    results_df[~results_df["protocol"].str.contains("OnionRouting", regex=False)],
    x="adversary_ratio",
    y="entropy",
    color="protocol",
)
fig.show()
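To see why a point-mass prediction yields zero entropy, here is a minimal Shannon-entropy sketch over the adversary's predicted distribution of message sources. The helper `prediction_entropy` is hypothetical, and it uses base-2 logarithms (bits); `ethp2psim` may use a different log base, which rescales the values but not the comparison.

```python
import math
from typing import Dict


def prediction_entropy(probas: Dict[int, float]) -> float:
    """Shannon entropy (in bits) of a predicted source distribution;
    zero-probability candidates contribute nothing."""
    return sum(-p * math.log2(p) for p in probas.values() if p > 0)


# Broadcast-like prediction: all mass on a single candidate.
print(prediction_entropy({3: 1.0, 4: 0.0, 5: 0.0}))  # 0.0
# Maximal uncertainty among four candidates.
print(prediction_entropy({3: 0.25, 4: 0.25, 5: 0.25, 6: 0.25}))  # 2.0
```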

# 3. What's next?

We hope you enjoyed playing with this notebook. Our main goal was to showcase the potential of `ethp2psim` for developing and comparing privacy-enhanced message routing protocols while keeping things simple. If you are looking for more complex examples, check out our [results](https://ethp2psim.readthedocs.io/en/latest/experiments.html) for larger P2P networks.