Skip to content

lnresearch/topology

Repository files navigation

https://zenodo.org/badge/DOI/10.5281/zenodo.4088530.svg

Lightning Network Gossip

Payments in the Lightning network are source-routed, meaning that the sender of a payment is responsible for finding a route from itself to the payment recipient. This is necessary due to the use of onion routing, based on the Sphinx construction [sphinx2009], in which the data to be transferred, i.e., the payment, is sent with an associated routing packet that specifies the route the data should be transferred over. In Lightning each hop on a route must correspond to a channel that is used to forward the payment, in the form of an HTLC, along with the routing onion.

In order to enable nodes to compute a route to the payment recipient, the nodes exchange information about the topology of the network, with edges corresponding to the channels, and vertices corresponding to the nodes in the network. The exchange of information is specified in the gossip protocol [gossip-spec], and is based on the channel endpoints broadcasting three types of messages to the network:

  • channel_announcement: signed by both endpoints and unique for each channel. It notifies nodes about the existence of a channel, where the funding transaction is in the blockchain and features that are supported on the channel.
  • channel_update: sent by one of the endpoints, and is specific for the outgoing direction from that endpoint. It specifies the parameters to use when traversing the channel in the specified direction. Notice that each channel therefore has two channel directions, which are handled independently. These half-channels can be updated by sending a new message with a higher timestamp field.
  • node_announcement: sent by a node that has at least one active channel, and specifies the metadata of the node. In particular it signals what protocol extensions the node can understand. Similar a node can send multiple announcements, that are then disambiguated through the timestamp field.

Data Collection

The data collection is based on a number of c-lightning nodes, that synchronize their view of the topology with peers by exchanging the gossip messages. Internally c-lightning will deduplicate messages, discard outdated node_announcements and channel_updates, and then apply them to the internal view. In order to persist the view across restarts, the node also writes the raw messages, along some internal messages, to a file called the gossip_store. The node compacts the gossip_store file from time to time in order to limit its growth. Compaction consists of rewriting the file, skipping messages that have been superceded in the meantime.

We have built a number of tools that allow the tracking of the gossip_store file, and persisting the messages in order to retain them even after compaction. From the raw messages it is then possible to generate number of derivative formats, that allow inspecting the state of the network at any point during the runtime of the collection.

File Format

In order to minimize the size of the datasets a simple custom file format. The file format consists of a header and a stream of raw gossip messages as they were exchanged over the wire. The header consists of a 3-byte prefix with the value GSP followed by a single byte version. Currently only version 0x01 is defined.

Each message in the raw message stream is prefixed by its length, encoded as ~CompactSize~ integer.

The following code snippet is based on the pyln-proto Python Package and can be used to load iterate through the messages in a BZ2 compressed dataset:

from pyln.proto.primitives import varint_decode
import bz2

def read_dataset(filename: str):
    with bz2.open(filename, 'rb') as f:
        header = f.read(4)
        print(header[3])
        assert(header[:3] == b'GSP' and header[3] == 1)
        while True:
            length = varint_decode(f)
            msg = f.read(length)
            if len(msg) != length:
                raise ValueError(f"Incomplete message read from {filename}")

            yield msg

For details on the gossip messages themselves please refer to the Lightning Network Specification.

Datasets

Available Datasets

The following table lists all available datasets and information about each dataset.

Link / FilenameSHA256 ChecksumMessages
gossip-20201014.gsp.bz28c507298d2d2e7f5577ae9484986fc05630ef0bd2b59da39a60b674fd743713c
gossip-20201102.gsp.bz2e6628e77907406288f476d5c86f02fb310474c430eb980e0232a520c98d390aa
gossip-20201203.gsp.bz2fa323aae6b1c4d3d659abab8ec42cbbe81dded2ed7b3c526d3bf85f03d7b93cc
gossip-20210104.gsp.bz2992199372dfb5cb1fa5e305c5ef4f2604f591798d522fc0576dc8de32315c79b
gossip-20210908.gsp.bz20ba0b31c12c4aec7f1255866acef485e239d54dedde99f4905cf869ec57804c1
gossip-20220823.gsp.bz2cb260b0d7d3633db3b267256e43b974d1ecbcd403ab559a80f5e80744578777d
gossip-20230924.gsp.bz2b6298fea4dd468e9f6857ab844993363143515b18f9e8c8278f33c601c058e7835’984’848

Data Coverage

We strive to provide the best possible datasets to researchers. The gossip mechanism in Lightning is however purposefully lossy:

  • Old gossip messages are not retained by nodes, since they are likely out of date or have been superceded by a newer message, and no longer useful for the operation of the node.
  • A staggered broadcast mechanism is used to limit the reach of redundant messages, both to protect the nodes from disclosing too much fine-grained information about themselves, and to protect the network from spam.
  • Messages may not be forwarded to each node in the network, for example if a subset of nodes deems the message invalid.

The first point is likely the most important, since it gives us a unique vantage point, having collected this information from the very beginning of the mainnet deployment. However, initially the collection was rather coarse-grained and some information may have been missed.

While collecting the gossip information we have changed format and methods a number of times, resulting in datasets that do not share the same format and coverage. Our current methodology ensures that we capture the information in its raw state, after applying only the deduplication filtering that c-lightning performs to protect against outdated data and spam from peers.

For collected information that predates the current collection methodology we are still working on updating and annotating it in order to backfill the datasets. This should provide us with the most complete picture of the evolution of the Lightning network ever collected.

Our formats and methodologies changed in the following ways:

  • Early 2018 - April 2018: a cronjob runs lightning-cli listchannels and stores the resulting JSON object on disk.
  • April 2018 - August 2019: a cronjob calls lightning-cli listchannels and processes the results. For each channel and state a timespan is generated during which the channel remained stable (no state change). Results matching the last previous timespan are extended, changes to the channel state result in a new timespan being created.
  • August 2019 – now: the raw protocol messages are extracted from the c-lightning gossip_store file, deduplicated and added to the database.

Sadly it is unlikely that the high-fidelity format can be recovered completely from the earlier formats, e.g., signatures cannot be recovered from the stored information. However it might be possible to recreate parts of the structural information from the JSON dumps and the timespans. We will eventually make this data public as well, as soon as we have confirmed it is sufficiently free of errors.

The data collection is on a best-effort basis and we don’t provide any guarantees that the datasets are complete. We are happy to accept missing gossip messages to backfill the datasets. If you have found missing gossip messages please open an issue or a PR on this repository.

![](message-hist.svg “Histogram of gossip messages over time”)

Citing a Dataset in your Publication

If you found these datasets useful or would like others to reproduce your research starting from the same dataset, please use the below BibTeX entry to reference this project, or a specific dataset:

@misc{lngossip,
  title = {Lightning Network Research \mdash; Topology Datasets},
  author = {Decker, Christian},
  howpublished = {\url{https://github.com/lnresearch/topology}},
  note = {Accessed: 2020-10-01},
  doi = {10.5281/zenodo.4088530}
}

In case you’d like to reference a specific dataset, please add the URL-fragment #dataset-2020-10-01 to the howpublished URL. This will ensure that visitors jump in to the above table, allowing them to directly download the dataset.

Publications based on these Datasets

  • Lin, Jian-Hong et al., Lightning network: a second path towards centralisation of the Bitcoin economy, arXiv preprint arXiv:2002.02819 (2020). PDF
  • Zabka, Philipp, et al., Node Classification and Geographical Analysis of the Lightning Cryptocurrency Network, 22nd International Conference on Distributed Computing and Networking (ICDCN), Nara, Japan, January 2021. PDF
  • Pietrzak, Krzysztof et al., LightPIR: Privacy-Preserving Route Discovery for Payment Channel Networks, arXiv PDF

Bibliography