Lightning Network Gossip

Payments in the Lightning network are source-routed, meaning that the sender of a payment is responsible for finding a route from itself to the payment recipient. This is necessary due to the use of onion routing, based on the Sphinx construction [sphinx2009], in which the data to be transferred, i.e., the payment, is sent with an associated routing packet that specifies the route the data should be transferred over. In Lightning each hop on a route must correspond to a channel that is used to forward the payment, in the form of an HTLC, along with the routing onion.

In order to enable nodes to compute a route to the payment recipient, the nodes exchange information about the topology of the network, with edges corresponding to the channels, and vertices corresponding to the nodes in the network. The exchange of information is specified in the gossip protocol [gossip-spec], and is based on the channel endpoints broadcasting three types of messages to the network:

channel_announcement: signed by both endpoints and unique for each channel. It notifies nodes of the channel's existence, the location of its funding transaction in the blockchain, and the features supported on the channel.
channel_update: sent by one of the endpoints and specific to the outgoing direction from that endpoint. It specifies the parameters to use when traversing the channel in that direction. Each channel therefore has two directions, which are handled independently. Either of these half-channels can be updated by sending a new message with a higher timestamp field.
node_announcement: sent by a node that has at least one active channel, specifying the node's metadata. In particular, it signals which protocol extensions the node understands. Similarly, a node can send multiple announcements, which are disambiguated through the timestamp field.
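
As a minimal sketch of how the three message types can be told apart: every Lightning wire message starts with a 2-byte big-endian type, and the gossip specification assigns 256, 257 and 258 to channel_announcement, node_announcement and channel_update respectively. The helper names gossip_type and tally_types are illustrative only and not part of any library.

```python
import struct
from collections import Counter
from typing import Iterable

# Type numbers assigned to the gossip messages in the gossip specification.
GOSSIP_TYPES = {
    256: "channel_announcement",
    257: "node_announcement",
    258: "channel_update",
}


def gossip_type(msg: bytes) -> str:
    """Return the gossip message name for a raw wire message.

    Hypothetical helper: reads the 2-byte big-endian type prefix and maps
    it to a name, falling back to "unknown(<type>)" for anything else.
    """
    if len(msg) < 2:
        raise ValueError("message too short to contain a type prefix")
    (typ,) = struct.unpack("!H", msg[:2])
    return GOSSIP_TYPES.get(typ, f"unknown({typ})")


def tally_types(messages: Iterable[bytes]) -> Counter:
    """Count how many messages of each type an iterable contains."""
    return Counter(gossip_type(m) for m in messages)
```

Any iterable of raw messages can be passed to tally_types, so a quick per-type count of a dataset needs no further parsing.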

Data Collection

The data collection is based on a number of c-lightning nodes that synchronize their view of the topology with peers by exchanging gossip messages. Internally, c-lightning deduplicates messages, discards outdated node_announcements and channel_updates, and then applies the remaining messages to its internal view. In order to persist the view across restarts, the node also writes the raw messages, along with some internal messages, to a file called the gossip_store. The node compacts the gossip_store file from time to time in order to limit its growth. Compaction consists of rewriting the file, skipping messages that have been superseded in the meantime.
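
The effect of discarding superseded channel_update messages can be sketched as follows. This illustrates the idea only and is not c-lightning's implementation. The field offsets assume the channel_update layout from the gossip specification (2-byte type, 64-byte signature, 32-byte chain hash, 8-byte short_channel_id, 4-byte timestamp, 1-byte message_flags, 1-byte channel_flags), and the function name is hypothetical.

```python
import struct
from typing import Dict, Iterable, Tuple

CHANNEL_UPDATE = 258  # message type from the gossip specification


def latest_channel_updates(
    messages: Iterable[bytes],
) -> Dict[Tuple[int, int], bytes]:
    """Keep only the newest channel_update per (short_channel_id, direction).

    Hypothetical helper; messages of any other type are ignored.
    """
    latest: Dict[Tuple[int, int], Tuple[int, bytes]] = {}
    for msg in messages:
        if len(msg) < 112 or struct.unpack("!H", msg[:2])[0] != CHANNEL_UPDATE:
            continue
        # Skip type (2), signature (64) and chain_hash (32): 98 bytes in total.
        scid, timestamp, _msg_flags, chan_flags = struct.unpack(
            "!QLBB", msg[98:112]
        )
        key = (scid, chan_flags & 1)  # bit 0 of channel_flags is the direction
        if key not in latest or timestamp > latest[key][0]:
            latest[key] = (timestamp, msg)
    return {key: msg for key, (_, msg) in latest.items()}
```

Keying on the direction bit as well as the short_channel_id keeps the two half-channels independent, so an update for one direction never replaces the other.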

We have built a number of tools that track the gossip_store file and persist the messages in order to retain them even after compaction. From the raw messages it is then possible to generate a number of derivative formats that allow inspecting the state of the network at any point during the runtime of the collection.

File Format

In order to minimize the size of the datasets, we use a simple custom file format. The file format consists of a header and a stream of raw gossip messages as they were exchanged over the wire. The header consists of a 3-byte prefix with the value GSP followed by a single-byte version. Currently only version 0x01 is defined.

Each message in the raw message stream is prefixed by its length, encoded as a CompactSize integer.

The following code snippet is based on the pyln-proto Python package and can be used to load and iterate through the messages in a BZ2-compressed dataset:

from pyln.proto.primitives import varint_decode
import bz2

def read_dataset(filename: str):
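    with bz2.open(filename, 'rb') as f:
        # 4-byte header: the prefix b'GSP' followed by a one-byte version.
        header = f.read(4)
        if len(header) != 4 or header[:3] != b'GSP' or header[3] != 1:
            raise ValueError(f"Unsupported dataset header in {filename}")
        while True:
            # Each message is prefixed by its length. varint_decode is
            # expected to return None once the stream is exhausted.
            length = varint_decode(f)
            if length is None:
                break
            msg = f.read(length)
            if len(msg) != length:
                raise ValueError(f"Incomplete message read from {filename}")

            yield msg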

For details on the gossip messages themselves please refer to the Lightning Network Specification.

Datasets

Available Datasets

The following table lists all available datasets and information about each dataset.

Link / Filename	SHA256 Checksum	Messages
gossip-20201014.gsp.bz2	8c507298d2d2e7f5577ae9484986fc05630ef0bd2b59da39a60b674fd743713c
gossip-20201102.gsp.bz2	e6628e77907406288f476d5c86f02fb310474c430eb980e0232a520c98d390aa
gossip-20201203.gsp.bz2	fa323aae6b1c4d3d659abab8ec42cbbe81dded2ed7b3c526d3bf85f03d7b93cc
gossip-20210104.gsp.bz2	992199372dfb5cb1fa5e305c5ef4f2604f591798d522fc0576dc8de32315c79b
gossip-20210908.gsp.bz2	0ba0b31c12c4aec7f1255866acef485e239d54dedde99f4905cf869ec57804c1
gossip-20220823.gsp.bz2	cb260b0d7d3633db3b267256e43b974d1ecbcd403ab559a80f5e80744578777d
gossip-20230924.gsp.bz2	b6298fea4dd468e9f6857ab844993363143515b18f9e8c8278f33c601c058e78	35’984’848
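
A downloaded dataset can be checked against the table with a few lines of standard-library Python. This is a minimal sketch that assumes the listed checksum covers the compressed .gsp.bz2 file as downloaded; the function names are illustrative.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a checksum copied from the table."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, verify_dataset("gossip-20230924.gsp.bz2", "b6298fea...") should return True for an intact download, with the full checksum from the table substituted for the abbreviated one shown here.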

Data Coverage

We strive to provide the best possible datasets to researchers. The gossip mechanism in Lightning is however purposefully lossy:

Old gossip messages are not retained by nodes, since they are likely out of date or have been superseded by a newer message, and are no longer useful for the operation of the node.
A staggered broadcast mechanism is used to limit the reach of redundant messages, both to protect the nodes from disclosing too much fine-grained information about themselves, and to protect the network from spam.
Messages may not be forwarded to each node in the network, for example if a subset of nodes deems the message invalid.

The first point is likely the most important, since it gives us a unique vantage point, having collected this information from the very beginning of the mainnet deployment. However, initially the collection was rather coarse-grained and some information may have been missed.

While collecting the gossip information we have changed format and methods a number of times, resulting in datasets that do not share the same format and coverage. Our current methodology ensures that we capture the information in its raw state, after applying only the deduplication filtering that c-lightning performs to protect against outdated data and spam from peers.

For collected information that predates the current collection methodology we are still working on updating and annotating it in order to backfill the datasets. This should provide us with the most complete picture of the evolution of the Lightning network ever collected.

Our formats and methodologies changed in the following ways:

Early 2018 - April 2018: a cronjob runs lightning-cli listchannels and stores the resulting JSON object on disk.
April 2018 - August 2019: a cronjob calls lightning-cli listchannels and processes the results. For each channel and state, a timespan is generated during which the channel remained stable (no state change). A result that matches the most recent timespan extends it, while a change to the channel state creates a new timespan.
August 2019 – now: the raw protocol messages are extracted from the c-lightning gossip_store file, deduplicated and added to the database.
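
The timespan processing used between April 2018 and August 2019 can be sketched roughly as below. This is a reconstruction from the description above, not the original cronjob code: the function name, the dictionary layout, and the use of an opaque state value per channel are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List


def apply_snapshot(
    timespans: Dict[str, List[dict]],
    snapshot: Dict[str, Hashable],
    now: int,
) -> None:
    """Fold one listchannels-style snapshot into per-channel timespans.

    `snapshot` maps a channel identifier to its observed state. If the state
    equals that of the channel's most recent timespan, that timespan is
    extended to `now`; otherwise a new timespan starting at `now` is opened.
    """
    for channel_id, state in snapshot.items():
        spans = timespans.setdefault(channel_id, [])
        if spans and spans[-1]["state"] == state:
            spans[-1]["end"] = now  # unchanged: extend the current timespan
        else:
            spans.append({"state": state, "start": now, "end": now})
```

Storing one record per stable period rather than one per poll keeps the data compact, at the cost of losing anything that changed and reverted between two polls.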

Sadly it is unlikely that the high-fidelity format can be recovered completely from the earlier formats, e.g., signatures cannot be recovered from the stored information. However it might be possible to recreate parts of the structural information from the JSON dumps and the timespans. We will eventually make this data public as well, as soon as we have confirmed it is sufficiently free of errors.

The data collection is on a best-effort basis and we don’t provide any guarantees that the datasets are complete. We are happy to accept missing gossip messages to backfill the datasets. If you have found missing gossip messages please open an issue or a PR on this repository.

![](message-hist.svg "Histogram of gossip messages over time")

Citing a Dataset in your Publication

If you found these datasets useful or would like others to reproduce your research starting from the same dataset, please use the below BibTeX entry to reference this project, or a specific dataset:

@misc{lngossip,
  title = {Lightning Network Research --- Topology Datasets},
  author = {Decker, Christian},
  howpublished = {\url{https://github.com/lnresearch/topology}},
  note = {Accessed: 2020-10-01},
  doi = {10.5281/zenodo.4088530}
}

In case you’d like to reference a specific dataset, please add the URL fragment #dataset-2020-10-01 to the howpublished URL. This ensures that visitors jump directly to the table above, where they can download the dataset.

Publications based on these Datasets

Lin, Jian-Hong et al., Lightning network: a second path towards centralisation of the Bitcoin economy, arXiv preprint arXiv:2002.02819 (2020). PDF
Zabka, Philipp, et al., Node Classification and Geographical Analysis of the Lightning Cryptocurrency Network, 22nd International Conference on Distributed Computing and Networking (ICDCN), Nara, Japan, January 2021. PDF
Pietrzak, Krzysztof et al., LightPIR: Privacy-Preserving Route Discovery for Payment Channel Networks, arXiv PDF

Bibliography

[sphinx2009]: Danezis, George and Goldberg, Ian (2009). Sphinx: A Compact and Provably Secure Mix Format. IACR Cryptology ePrint Archive, 2008, 269-282. doi:10.1109/SP.2009.15.
[gossip-spec]: https://github.com/lightningnetwork/lightning-rfc/blob/master/07-routing-gossip.md
