# Reading networks from netzschleuder

[Run notebook in Google Colab](https://colab.research.google.com/github/pathpy/pathpy/blob/master/doc/tutorial/netzschleuder.ipynb)

The [netzschleuder](https://networks.skewed.de) repository is an online collection of more than 100,000 networks maintained by [Tiago Peixoto](https://skewed.de/tiago). With `pathpy` you can read any network directly from the netzschleuder repository to analyze and visualize it.

In [None]:
pip install git+https://github.com/pathpy/pathpy.git

In [None]:
import pathpy as pp

from pprint import pprint

Since the `netzschleuder` repository uses the graph-tool binary format to store network data, support for retrieving networks from the repository is included in `pathpy`'s `io.graphtool` submodule.

Each `netzschleuder` data set can contain one or more networks. If there is more than one network in a data set, we have to additionally specify the name of the network that we wish to retrieve. In a first step, we can use the function `list_netzschleuder_records` to retrieve a list of all data sets. In the following, we only print the first 20 records:

In [None]:
datasets = pp.io.graphtool.list_netzschleuder_records()
pprint(datasets[:20])

We can use keyword arguments to set additional query parameters (e.g. to look for data with specific tags or to return full records with all attributes). The supported query parameters are listed in the [API description](https://networks.skewed.de/api). To list all social networks in the `netzschleuder` repository, we call the function with `tags='Social'` (here we only print records 50 through 69):

In [None]:
datasets = pp.io.graphtool.list_netzschleuder_records(tags='Social')
pprint(datasets[50:70])
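
The keyword arguments end up as query parameters of a request to the netzschleuder API. The following standard-library sketch shows how such a query URL could be assembled. The endpoint path `/api/nets`, the `full` parameter, and the helper name `build_query_url` are assumptions made for illustration; they are not taken from `pathpy`'s implementation, so check the API description before relying on them.

```python
from urllib.parse import urlencode

# Assumption: listing endpoint of the netzschleuder JSON API.
API_BASE = "https://networks.skewed.de/api/nets"


def build_query_url(**params):
    """Return the listing URL with the given query parameters appended.

    Hypothetical helper for illustration only.
    """
    if not params:
        return API_BASE
    # Sorting keeps the parameter order stable across calls.
    return API_BASE + "?" + urlencode(sorted(params.items()))


print(build_query_url(tags="Social", full=True))
```

No request is sent here; the sketch only makes visible which parameters a call such as `list_netzschleuder_records(tags='Social')` would need to transmit.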

To retrieve detailed metadata on a specific data set, we can use the following function:

In [None]:
# a single record is returned, so we store it under a singular name
record = pp.io.graphtool.read_netzschleuder_record('karate')
pprint(record)

These metadata contain citation information (including a BibTeX record), the original URL from which the data was retrieved, a textual description of the data, and a list of the networks contained in the data set. In the example above, the `karate` data set contains two networks named `77` and `78`, referring to different versions of the data. For each network, the metadata contain a number of network-level metrics.

## Reading static networks

Let us now read the network into an instance of `pathpy.Network`. For this, we can use the function `read_netzschleuder_network`. To read a specific network, we must specify both the name of the data set as well as the name of the network (in case there is more than one). The function will automatically determine the type of network to return, i.e. static or temporal, directed or undirected, single or multi-edge.

In [None]:
n = pp.io.graphtool.read_netzschleuder_network('karate', '77')
print(n)

In [None]:
pp.plot(n)

## Reading temporal networks

`karate` is an example of a static network, where edges do not have associated timestamps. However, the `netzschleuder` repository also contains a number of temporal networks where edges are observed at specific times. To retrieve a list of temporal networks in the netzschleuder database, we can again use the function `list_netzschleuder_records`, this time setting the query parameter `tags='Temporal'`. We only output records 200 through 249:

In [None]:
pp.io.graphtool.list_netzschleuder_records(tags='Temporal')[200:250]

To retrieve the full information on a specific record, we again call `read_netzschleuder_record` with the associated data set name:

In [None]:
pp.io.graphtool.read_netzschleuder_record('sp_hospital')

If there is only a single network in the data set, we can omit the network name (which then assumes the same value as the data set). In the network above, each edge has a `time` attribute. `pathpy` will thus return an instance of `TemporalNetwork`:

In [None]:
tn = pp.io.graphtool.read_netzschleuder_network('sp_hospital')
print(tn)
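
The rule for when the network name can be omitted can be sketched in plain Python. The helper `resolve_network_name` below is hypothetical and only mirrors the behaviour described above; it is not `pathpy`'s actual code, and the list of network names is assumed to come from the record metadata.

```python
def resolve_network_name(record_name, networks, requested=None):
    """Decide which network of a data set to load (hypothetical helper).

    record_name: name of the data set, e.g. 'karate'
    networks:    names of the networks listed in the record metadata
    requested:   network name given by the caller, or None if omitted
    """
    if requested is not None:
        if requested not in networks:
            raise ValueError(f"{requested!r} is not part of data set {record_name!r}")
        return requested
    if len(networks) == 1:
        # Single-network data sets: the network name defaults to the data set name.
        return record_name
    raise ValueError(
        f"data set {record_name!r} contains {len(networks)} networks, "
        f"choose one of {networks}"
    )


print(resolve_network_name('sp_hospital', ['sp_hospital']))
print(resolve_network_name('karate', ['77', '78'], requested='77'))
```

Calling the helper for `karate` without a network name raises a `ValueError`, which matches the requirement that multi-network data sets need an explicit name.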

To generate a dynamic visualisation of this temporal network, we can simply call:

In [None]:
pp.plot(tn)

## Reading temporal data as static networks

Sometimes we have network data sets where edges include timestamps, but we want to ignore the timestamps and treat them as multiple observations of the same edge instead. To return a static projection of such a network, we can set `ignore_temporal=True`. By default, an unweighted single-edge network will be generated, i.e. additional observations of the same edge at different timestamps are simply discarded. To highlight that we ignore part of the data, `pathpy` issues a warning:

In [None]:
n = pp.io.graphtool.read_netzschleuder_network('sp_hypertext', 'contacts', ignore_temporal=True)
print(n)
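
To make the default projection concrete, here is a minimal pure-Python sketch of the idea: repeated observations of an edge are collapsed into one, and a warning reports how many were discarded. The function `static_projection` and the toy observations are made up for illustration and do not reflect how `pathpy` implements the projection internally.

```python
import warnings


def static_projection(temporal_edges):
    """Collapse (source, target, time) observations into unique (source, target) edges."""
    edges = []
    seen = set()
    for v, w, _time in temporal_edges:
        if (v, w) not in seen:
            seen.add((v, w))
            edges.append((v, w))
    dropped = len(temporal_edges) - len(edges)
    if dropped:
        # Mirror the idea of warning the user that part of the data is ignored.
        warnings.warn(f"ignoring timestamps: {dropped} repeated observations discarded")
    return edges


# Made-up toy observations, not taken from any netzschleuder data set.
observations = [("a", "b", 1), ("a", "b", 2), ("b", "c", 2), ("a", "b", 5)]
print(static_projection(observations))
```

The sketch treats edges as ordered pairs; for undirected contact data one would first normalise each pair, for example by sorting its endpoints.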

We may instead want to keep all information on the edges, either by returning a multi-edge network in which multiple edges between the same nodes are allowed, or by projecting the multiple observations to a numerical `weight` attribute of edges, where an edge weight of `n` indicates that this specific edge has been observed `n` times. We can control this behavior using the additional parameter `multiedges`:

In [None]:
n = pp.io.graphtool.read_netzschleuder_network('sp_hypertext', 'contacts', 
                                               ignore_temporal=True, multiedges=True)
print(n)

We can easily turn this into a **weighted** network, where each edge is included only once while an additional `weight` attribute counts the occurrences of that edge:

In [None]:
weighted_net = pp.Network.to_weighted_network(n)
print(weighted_net)
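
The relation between the multi-edge and the weighted representation can be illustrated without `pathpy`. The sketch below counts repeated edges with `collections.Counter`; the helper name `edge_weights` and the toy edge list are assumptions for illustration only.

```python
from collections import Counter


def edge_weights(multi_edges):
    """Map each distinct (source, target) edge to its number of occurrences."""
    return Counter(multi_edges)


# Made-up multi-edge list: the edge (a, b) was observed three times.
multi_edges = [("a", "b"), ("a", "b"), ("b", "c"), ("a", "b")]
weights = edge_weights(multi_edges)

for (v, w), weight in weights.items():
    print(v, w, weight)
```

Each distinct edge appears once in the result, and its count plays the role of the `weight` attribute described above.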