docs and example notebook updates

ferencberes · Mar 6, 2020 · 70d6f2c
1 parent 86e3f73
commit 70d6f2c

Showing 3 changed files with 249 additions and 41 deletions.
diff --git a/docs/source/getting_started.md b/docs/source/getting_started.md
@@ -2,13 +2,13 @@
 
 By executing the steps below you can set up `lnsimulator` for your environment in a few minutes.
 
-## 1. Requirements
+## Requirements
 
 - UNIX or macOS environment
     - For macOS users: you need to have wget (brew install wget)
 - This package was developed in Python 3.5 (conda environment) but it works with Python 3.6 and 3.7 as well.
 
-## 2. Install
+## Install
 
 After cloning the [repository](https://github.com/ferencberes/LNTrafficSimulator) from GitHub you can install the simulator with `pip`.
 
@@ -18,7 +18,7 @@ cd LNTrafficSimulator
 pip install .
 ```
 
-## 3. Data
+## Data
 
 By providing daily LN snapshots as input **(you can bring and use your own!)**, our simulator models the flow of daily transactions.
 
@@ -50,7 +50,7 @@ The *download_data.sh* script downloads 4 data files into the *ln_data* folder w
 | **1ml_meta_data.csv** | Yes | merchant meta data that we downloaded from [1ml.com](https://1ml.com/) |
 | **ln.tsv** | No | edge stream data about LN channels |
 
-## 4. First example
+## First example
 
 Execute the following code to see whether your configuration was successful.
 

diff --git a/docs/source/simulator_docs.md b/docs/source/simulator_docs.md
@@ -3,7 +3,7 @@
 In the steps below we suppose that you have already installed `lnsimulator` and downloaded the related LN data.
 If that is not the case, then you should follow the instructions in the [Getting Started](getting_started) section first.
 
-## 1. Prepare LN data
+## Prepare LN data
 
 In order to run the simulation you need to provide LN snapshots as well as information about merchant nodes.
 
@@ -43,7 +43,7 @@ node_meta = pd.read_csv("%s/1ml_meta_data.csv" % data_dir)
 providers = list(node_meta["pub_key"])
 ```
 
-## 2. Configuration
+## Configuration
 
 First we give you the list of main parameters. **By the word "transaction" we refer to LN payments.**
 
@@ -71,7 +71,7 @@ with_depletion = True
 simulator = ts.TransactionSimulator(directed_edges, providers, amount, count, drop_disabled=drop_disabled, drop_low_cap=drop_low_cap, eps=epsilon, with_depletion=with_depletion)
 ```
 
-## 3. Estimating daily income and traffic
+## Estimating daily income and traffic
 
 ### i.) Transactions
 
@@ -88,7 +88,7 @@ print(transactions.shape)
 In this step the simulator searches for the cheapest payment path from each transaction's sender to its receiver. Channel capacity changes are tracked throughout the simulation.
 
 ```python
-_, _, all_router_fees, _ = simulator.simulate(weight="total_fee", with_node_removals=False)
+cheapest_paths, _, all_router_fees, _ = simulator.simulate(weight="total_fee", with_node_removals=False)
 print(all_router_fees.head())
 ```
 
@@ -99,11 +99,89 @@ After payment simulation you can export the results as well as calculate traffic
 ```python
 output_dir = "test"
 total_income, total_fee = simulator.export(output_dir)
-
-total_income.set_index("node").head(10)
 ```
 
+To get stable daily LN node statistics, we recommend running the simulator multiple times over several consecutive snapshots. **Node statistics in each output file below are restricted to a single traffic simulator experiment!** You can find these files in the `output_dir` folder.
+
+#### a.) lengths_distrib.csv
+
+Distribution of payment path length for the sampled transactions. Due to the source routing nature of LN, we assumed that transactions are executed on the cheapest path between the sender and the recipient.
+
+| Column | Description |
+|     :---      |   :---   |
+| First | Payment path length |
+| Second | Number of sampled transactions with given length |
+
+**Note:** the length is marked -1 if the payment failed (there was no available path for routing).
+
+**Note:** the sum of transactions in the second column can be less than the predefined number of payments to simulate. The difference is the number of randomly sampled loop transactions, where the sender and the recipient are the same node.
+
+#### b.) router_incomes.csv
+
+Contains statistics on nodes that forwarded payments in the simulation. We refer to these nodes as **routers**.
+
+| Column | Description |
+|     :---      |   :---   |
+| node | LN node public key |
+| fee | routing income |
+| num_trans | number of routed transactions |
+
+#### c.) source_fees.csv
+
+Contains statistics on payment initiator nodes (senders).
+
+| Column | Description |
+|     :---      |   :---   |
+| source | LN node that initiated the payment (sender node) |
+| num_trans | the number of transactions initiated by this node in the simulation |
+| mean_fee | the mean transaction cost per payment |
+
+## Useful function calls
+
+There are alternative ways to interact with the simulator object besides exporting the results (with the `simulator.export(output_dir)` function). Please follow the examples below.
+
+#### Top nodes with highest daily income
+
 You can search for the identity of these nodes on [1ml.com](https://1ml.com).
 
-**In order to get stable daily LN node statistics, we recommend to run the simulator for multiple times over several consecutive snapshots!**
+```python
+total_income.sort_values("fee", ascending=False).set_index("node").head(5)
+```
 
+#### Top nodes with highest daily traffic
+
+```python
+total_income.sort_values("num_trans", ascending=False).set_index("node").head(5)
+```
+
+#### Payment path length distribution
+
+**Note:** the length is marked -1 if the payment failed (there was no available path for routing).
+
+```python
+cheapest_paths["length"].value_counts()
+```
+
+#### Payment success ratio
+
+```python
+(cheapest_paths["length"] > -1).value_counts() / len(cheapest_paths)
+```
+
+#### Payment cost statistics
+
+```python
+cheapest_paths["original_cost"].describe()
+```
+
+#### Most frequent payment receivers
+
+```python
+simulator.transactions["target"].value_counts().head(5)
+```
+
+#### Number of unique payment senders and receivers
+
+```python
+simulator.transactions["source"].nunique(), simulator.transactions["target"].nunique()
+```
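
The docs change above recommends running the simulator several times over consecutive snapshots, because each `router_incomes.csv` covers a single experiment. The sketch below is not part of this commit or of `lnsimulator`. It shows one possible way to average such per-run tables, assuming each run is a list of rows with the `node`, `fee` and `num_trans` columns described above (for example as read with `csv.DictReader`). The helper name `average_router_incomes` and the toy data are hypothetical.

```python
from collections import defaultdict


def average_router_incomes(runs):
    """Average per-node fee and traffic over several simulator runs.

    `runs` is a list of runs; each run is a list of dict rows with the
    keys "node", "fee" and "num_trans". A node that is missing from a
    run is treated as having earned zero in that run (an assumption).
    """
    if not runs:
        raise ValueError("at least one run is required")

    # Sum fee and traffic per node across all runs.
    totals = defaultdict(lambda: {"fee": 0.0, "num_trans": 0.0})
    for rows in runs:
        for row in rows:
            totals[row["node"]]["fee"] += float(row["fee"])
            totals[row["node"]]["num_trans"] += float(row["num_trans"])

    # Divide by the number of runs, not by the number of appearances.
    n_runs = len(runs)
    averaged = {
        node: {"fee": t["fee"] / n_runs, "num_trans": t["num_trans"] / n_runs}
        for node, t in totals.items()
    }

    # Return nodes ordered by average fee, highest first.
    return dict(sorted(averaged.items(), key=lambda kv: kv[1]["fee"], reverse=True))


# Toy data standing in for two router_incomes.csv files (hypothetical values).
run_a = [{"node": "A", "fee": 10.0, "num_trans": 5}, {"node": "B", "fee": 4.0, "num_trans": 2}]
run_b = [{"node": "A", "fee": 6.0, "num_trans": 3}, {"node": "C", "fee": 2.0, "num_trans": 1}]

averaged = average_router_incomes([run_a, run_b])
for node, stats in averaged.items():
    print(node, stats["fee"], stats["num_trans"])
```

Dividing by the number of runs rather than taking a per-node mean keeps routers that only appear in a few snapshots from looking more profitable than they are. Whether that matches the authors' own aggregation is not stated in the source.
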
diff --git a/notebooks/Examples.ipynb b/notebooks/Examples.ipynb
@@ -1,31 +1,5 @@
 {
  "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# 0. Download LN data\n",
-    "\n",
-    "It may take a few minutes to download prepared LN data for the first time"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%bash\n",
-    "if [ -d \"../ln_data\" ]; then\n",
-    "echo \"data exists\";\n",
-    "else\n",
-    "echo \"downloading data...\"\n",
-    "wget https://dms.sztaki.hu/~fberes/ln/ln_data_2019-10-29.zip;\n",
-    "unzip ln_data_2019-10-29.zip;\n",
-    "mv ln_data ..;\n",
-    "fi"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -39,7 +13,7 @@
    "source": [
     "## Prepare LN data\n",
     "\n",
-    "In order to run the simulation you need to provide LN snapshots as well as information about merchants nodes.\n",
+    "In order to run the simulation you need to provide LN snapshots as well as information about merchant nodes. In this notebook you can try our LN traffic simulator on a small sample data set. **If you want to use the full data set related to our work, please follow the instructions in the [documentation](https://lnsimulator.readthedocs.io/en/latest/getting_started.html#data).**\n",
     "\n",
     "### LN snapshots\n",
     "\n",
@@ -61,7 +35,7 @@
    "source": [
     "from lnsimulator.ln_utils import preprocess_json_file\n",
     "\n",
-    "data_dir = \"../ln_data/\" # path to the ln_data folder that contains the downloaded data\n",
+    "data_dir = \"../sample_data/\"\n",
     "directed_edges = preprocess_json_file(\"%s/sample.json\" % data_dir)"
    ]
   },
@@ -160,7 +134,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "_, _, all_router_fees, _ = simulator.simulate(weight=\"total_fee\", with_node_removals=False)"
+    "cheapest_paths, _, all_router_fees, _ = simulator.simulate(weight=\"total_fee\", with_node_removals=False)"
    ]
   },
   {
@@ -182,13 +156,169 @@
     "total_income, total_fee = simulator.export(output_dir)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To get stable daily LN node statistics, we recommend running the simulator multiple times over several consecutive snapshots. **Node statistics in each output file below are restricted to a single traffic simulator experiment!** You can find these files in the `output_dir` folder.\n",
+    "\n",
+    "#### a.) lengths_distrib.csv\n",
+    "\n",
+    "Distribution of payment path length for the sampled transactions. Due to the source routing nature of LN, we assumed that transactions are executed on the cheapest path between the sender and the recipient.\n",
+    "\n",
+    "| Column | Description |\n",
+    "|     :---      |   :---   |\n",
+    "| First | Payment path length |\n",
+    "| Second | Number of sampled transactions with given length |\n",
+    "\n",
+    "**Note:** the length is marked -1 if the payment failed (there was no available path for routing).\n",
+    "\n",
+    "**Note:** the sum of transactions in the second column can be less than the predefined number of payments to simulate. The difference is the number of randomly sampled loop transactions, where the sender and the recipient are the same node.\n",
+    "\n",
+    "#### b.) router_incomes.csv\n",
+    "\n",
+    "Contains statistics on nodes that forwarded payments in the simulation. We refer to these nodes as **routers**.\n",
+    "\n",
+    "| Column | Description |\n",
+    "|     :---      |   :---   |\n",
+    "| node | LN node public key |\n",
+    "| fee | routing income |\n",
+    "| num_trans | number of routed transactions |\n",
+    "\n",
+    "#### c.) source_fees.csv\n",
+    "\n",
+    "Contains statistics on payment initiator nodes (senders).\n",
+    "\n",
+    "| Column | Description |\n",
+    "|     :---      |   :---   |\n",
+    "| source | LN node that initiated the payment (sender node) |\n",
+    "| num_trans | the number of transactions initiated by this node in the simulation |\n",
+    "| mean_fee | the mean transaction cost per payment |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Useful function calls\n",
+    "\n",
+    "There are alternative ways to interact with the simulator object besides exporting the results (with the `simulator.export(output_dir)` function). Please follow the examples below."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Top nodes with highest daily income\n",
+    "\n",
+    "You can search for the identity of these nodes on [1ml.com](https://1ml.com)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "total_income.sort_values(\"fee\", ascending=False).set_index(\"node\").head(5)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Top nodes with highest daily traffic"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "total_income.sort_values(\"num_trans\", ascending=False).set_index(\"node\").head(5)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Payment path length distribution\n",
+    "\n",
+    "**Note:** the length is marked -1 if the payment failed (there was no available path for routing)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cheapest_paths[\"length\"].value_counts()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Payment success ratio"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "(cheapest_paths[\"length\"] > -1).value_counts() / len(cheapest_paths)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Payment cost statistics"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cheapest_paths[\"original_cost\"].describe()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Most frequent payment receivers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "simulator.transactions[\"target\"].value_counts().head(5)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Number of unique payment senders and receivers"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "total_income.set_index(\"node\").head(10)"
+    "simulator.transactions[\"source\"].nunique(), simulator.transactions[\"target\"].nunique()"
    ]
   },
   {