# Summary

- load LN graph snapshots at a specified day interval (e.g. one snapshot per week)
- exclude records with missing node policy information
- **transform the undirected LN network into a directed graph** (according to the node policies of each channel)
- fill missing policy values with the most common value of each policy field
- export the directed multigraph (for transaction simulation)
- aggregate multi-edge information into single edges (required for centrality calculation)
- export the aggregated directed graph

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from ln_utils import *

# 1. Select snapshots and files

In [None]:
first_day = 1549032366  # February 1, 2019 2:46:06 PM (UTC)
last_day = 1553864399  # March 29, 2019 12:59:59 PM (UTC)
day_interval = 1  # days between snapshots (alternative: 3)
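
The Unix timestamps above can be checked against their human-readable comments with the standard library. This is only a small sanity check, assuming the comments refer to UTC; `to_utc` is a hypothetical helper name and is not part of `ln_utils`.

```python
from datetime import datetime, timezone


def to_utc(ts):
    """Convert a Unix timestamp (in seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


first_day = 1549032366
last_day = 1553864399

print(to_utc(first_day).isoformat())  # 2019-02-01T14:46:06+00:00
print(to_utc(last_day).isoformat())   # 2019-03-29T12:59:59+00:00
```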

In [None]:
output_dir = "/mnt/idms/fberes/data/bitcoin_ln_research/directed_graphs/"

In [None]:
data_dir = "../LNdata/lncaptures/lngraph/2019/"

In [None]:
snapshot_ends = extract_snapshot_ends(data_dir, first_day, day_interval)

In [None]:
additional_dates = extract_additional_dates(snapshot_ends[-1],last_day, day_interval)

In [None]:
graph_files = []

In [None]:
dir_1 = "../LNdata/lncaptures/lngraph/2019/"
for ts in snapshot_ends:  # `ts` avoids shadowing a possibly imported `time` module
    graph_files.append("%s/%i.json" % (dir_1, ts))

In [None]:
dir_2 = "../LNdata/"
for ts in additional_dates:  # `ts` avoids shadowing a possibly imported `time` module
    graph_files.append("%s/%s.json" % (dir_2, ts))

In [None]:
graph_files

# 2. Load data

In [None]:
EDGE_KEYS = ["node1_pub","node2_pub","last_update","capacity","channel_id",'node1_policy','node2_policy']
nodes, edges = load_temp_data(graph_files, edge_keys=EDGE_KEYS)
print(len(nodes), len(edges))

edges = edges.sort_values("last_update").reset_index(drop=True)
print(edges.head(3))

### Remove records with missing node policy

In [None]:
edges.isnull().sum() / len(edges)

In [None]:
n_before = len(edges)
# keep only channels for which both endpoints have a policy
edges = edges[edges["node1_policy"].notnull() & edges["node2_policy"].notnull()]
print(n_before - len(edges))  # number of dropped records

### Number of channels in each snapshot

In [None]:
edges["snapshot_id"].value_counts().sort_values().plot(kind="bar")

# 3. Transform undirected graph into directed graph

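Each channel endpoint sets its own policy, so the two directions of a channel can have different fees. We therefore have to execute this transformation in order to calculate transaction fees.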

In [None]:
directed_df = generate_directed_graph(edges)
print(directed_df.head())
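
The internals of `generate_directed_graph` are not shown in this notebook. The idea of the transformation can be sketched as follows, under the assumption that `node1_policy` governs forwarding from node1 to node2 and `node2_policy` governs the opposite direction. `split_channel` and the toy record are hypothetical and only illustrate the principle.

```python
def split_channel(channel):
    """Turn one undirected channel record into up to two directed edges.

    Assumption: node1_policy applies to the node1 -> node2 direction and
    node2_policy applies to the node2 -> node1 direction.
    """
    directions = [
        ("node1_pub", "node2_pub", "node1_policy"),
        ("node2_pub", "node1_pub", "node2_policy"),
    ]
    directed = []
    for src_key, trg_key, policy_key in directions:
        policy = channel.get(policy_key)
        if policy is None:
            continue  # no policy known for this direction
        edge = {
            "src": channel[src_key],
            "trg": channel[trg_key],
            "channel_id": channel["channel_id"],
            "capacity": channel["capacity"],
        }
        edge.update(policy)  # copy fee_base_msat, fee_rate_milli_msat, ...
        directed.append(edge)
    return directed


# Toy channel record (made-up values)
channel = {
    "node1_pub": "A", "node2_pub": "B", "channel_id": "1", "capacity": 50000,
    "node1_policy": {"fee_base_msat": 1000, "fee_rate_milli_msat": 1},
    "node2_policy": {"fee_base_msat": 0, "fee_rate_milli_msat": 10},
}
for edge in split_channel(channel):
    print(edge["src"], "->", edge["trg"], edge["fee_base_msat"])
```

Each undirected channel thus yields one row per direction, and every row carries the policy of its source node.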

# 4. Fill missing policy values with most frequent values

Most nodes use the default policy values, so missing entries are filled with the most frequent value of each column.

In [None]:
print("missing values for columns:")
print(directed_df.isnull().sum())

In [None]:
print("filling missing values with most popular values per column!!!")
directed_df = directed_df.fillna({"disabled":False,"fee_base_msat":1000,"fee_rate_milli_msat":1,"min_htlc":1000})

In [None]:
for col in ["fee_base_msat","fee_rate_milli_msat","min_htlc"]:
    directed_df[col] = directed_df[col].astype("float64")
print(directed_df.dtypes)
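
The fill values in this section are hard-coded. As a hedged alternative, the most frequent value of each column could be derived from the data itself. The sketch below uses a made-up toy frame (`toy`), not the real `directed_df`.

```python
import pandas as pd

nan = float("nan")
# Toy data with gaps (values are made up for illustration)
toy = pd.DataFrame({
    "fee_base_msat": [1000.0, 1000.0, nan, 500.0],
    "fee_rate_milli_msat": [1.0, nan, 1.0, 10.0],
    "min_htlc": [1000.0, 1000.0, 1000.0, nan],
})

# Series.mode() ignores NaN by default; take the first (most frequent) value
most_common = {col: toy[col].mode().iloc[0] for col in toy.columns}
print(most_common)

# Fill each column with its own most frequent value
filled = toy.fillna(most_common)
print(filled)
```

Deriving the values this way keeps the fill step consistent with the data if the defaults change between snapshots.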

# 5. Statistics

In [None]:
directed_df["disabled"].value_counts().plot(kind="pie")

In [None]:
directed_df["capacity"].hist(bins=50)

- Most nodes charge a base fee of 1 SAT (1000 msat)

In [None]:
directed_df["fee_base_msat"].mean(), directed_df["fee_base_msat"].median(), directed_df["fee_base_msat"].max()

- There are some extraordinarily high values: these edges will never be used for routing.
- Most nodes charge a proportional fee of 10^-6 SAT per 1 SAT routed

In [None]:
directed_df["fee_rate_milli_msat"].mean(), directed_df["fee_rate_milli_msat"].median(), directed_df["fee_rate_milli_msat"].max()
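
The two policy fields summarized above combine into a per-hop routing fee: a fixed base fee plus a proportional part expressed in millionths of the forwarded amount. The sketch below shows that arithmetic; `routing_fee_msat` is a hypothetical helper, and its defaults are the most frequent values used in section 4.

```python
def routing_fee_msat(amount_msat, fee_base_msat=1000, fee_rate_milli_msat=1):
    """Fee (in msat) charged by one hop for forwarding `amount_msat`.

    fee = base fee + amount * rate / 1,000,000 (integer arithmetic)
    """
    return fee_base_msat + amount_msat * fee_rate_milli_msat // 1_000_000


# Forwarding 100,000 SAT (= 100,000,000 msat) with the default policy:
print(routing_fee_msat(100_000_000))  # 1100 msat = 1 SAT base + 0.1 SAT proportional
```

With the default policy the base fee dominates small payments, while the proportional part only matters for large amounts.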

# 6. Export MultiGraph

For transaction simulation experiments we use the LN multigraph.

In [None]:
directed_df.to_csv("%s/directed_temporal_multi_edges_%idays.csv" % (output_dir, day_interval))

# 7. Aggregate weights for multiple channels between nodes

Most centrality measures are not implemented for multigraphs, so we aggregate policy values over multi-edges.

In [None]:
grouped = directed_df.groupby(["src","trg","snapshot_id"])

In [None]:
directed_aggr = grouped.agg({
    "channel_id":"nunique",
    "capacity":"sum",
    "fee_base_msat":"mean",
    "fee_rate_milli_msat":"mean"
})

In [None]:
directed_aggr = directed_aggr.rename({"channel_id":"num_channels"}, axis=1)
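
To make the effect of this aggregation concrete, here is the same `groupby`/`agg` pattern applied to a made-up toy frame in which two parallel channels connect A to B. The data and the name `toy` are illustrative assumptions.

```python
import pandas as pd

# Two parallel channels A -> B and one channel B -> A in the same snapshot
toy = pd.DataFrame({
    "src": ["A", "A", "B"],
    "trg": ["B", "B", "A"],
    "snapshot_id": [0, 0, 0],
    "channel_id": ["c1", "c2", "c1"],
    "capacity": [100, 300, 100],
    "fee_base_msat": [1000.0, 0.0, 1000.0],
    "fee_rate_milli_msat": [1.0, 3.0, 1.0],
})

aggr = (
    toy.groupby(["src", "trg", "snapshot_id"])
    .agg({
        "channel_id": "nunique",        # number of parallel channels
        "capacity": "sum",              # total capacity between the two nodes
        "fee_base_msat": "mean",        # unweighted average of the fees
        "fee_rate_milli_msat": "mean",
    })
    .rename(columns={"channel_id": "num_channels"})
    .reset_index()
)
print(aggr)
```

Note that the fee averages are unweighted: a small channel influences the mean as much as a large one.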

### Some node pairs have an extreme number of channels between them

- are these attacks?
- probably not (these are most likely just routing nodes)

In [None]:
directed_aggr.sort_values("num_channels",ascending=False).reset_index().head(10)

In [None]:
directed_aggr = directed_aggr.reset_index()

In [None]:
directed_aggr.head()

# 8. Export aggregated edges

In [None]:
directed_aggr.to_csv("%s/directed_temporal_edges_%idays.csv" % (output_dir, day_interval), index=False)