# Cross-validation experiments with networks

[Open notebook in Google Colab](https://colab.research.google.com/github/pathpy/pathpy/blob/master/doc/tutorial/cross_validation.ipynb)

`pathpy` provides basic support for evaluations based on cross-validation experiments. In particular, the `train_test_split` method can be used to create train and test splits. The semantics and arguments of the method are similar to those of the corresponding function in `sklearn`.

To demonstrate the use, we generate a random graph:

In [None]:
pip install git+https://github.com/pathpy/pathpy.git

In [None]:
import pathpy as pp

n = pp.generators.ER_np(100, 0.04)
print(n)
n.plot()

To generate a test and train network instance, where the test network contains a random fraction of 25 % of the nodes, we can write:

In [None]:
test, train = pp.algorithms.evaluation.train_test_split(n, test_size=0.25)
print(test)
print(train)

The method generates two new `Network` instances that refer to the same node and edge objects as the original network, so the new objects consume little additional memory. The original network instance is not changed. The uids of the newly generated networks are set to the original uid with the suffix `_test` and `_train`, respectively.

By default, the split is made on the nodes, and the train and test networks include the edges incident to the corresponding node sets. This implies that edges whose endpoints fall on different sides of the split can be lost. To preserve the number of edges, we can set the split method to `edge`. This samples a random fraction of the edges and adds all nodes to both networks, i.e. the node sets of the two networks are identical. The numbers of edges in the training and test networks then sum to the number of edges in the original network.

In [None]:
test, train = pp.algorithms.evaluation.train_test_split(n, test_size=0.25, split='edge')
print(test)
print(train)
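The difference between the two split modes can be illustrated without `pathpy`. The following is a minimal pure-Python sketch, not the library's implementation: the helpers `node_split` and `edge_split` are hypothetical, and the sketch assumes that a node-based split keeps an edge only if both endpoints land in the same set.

```python
import random


def node_split(nodes, edges, test_size, seed=0):
    """Hypothetical helper: sample test nodes, keep edges inside each node set."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    test_nodes, train_nodes = set(shuffled[:n_test]), set(shuffled[n_test:])
    # Assumption: an edge survives only if both endpoints are in the same set.
    test_edges = [(u, v) for u, v in edges if u in test_nodes and v in test_nodes]
    train_edges = [(u, v) for u, v in edges if u in train_nodes and v in train_nodes]
    return (test_nodes, test_edges), (train_nodes, train_edges)


def edge_split(nodes, edges, test_size, seed=0):
    """Hypothetical helper: sample test edges, share all nodes between both sets."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    return (set(nodes), shuffled[:n_test]), (set(nodes), shuffled[n_test:])


# Toy graph: a ring with 8 nodes and 8 edges.
ring_nodes = list(range(8))
ring_edges = [(i, (i + 1) % 8) for i in range(8)]

(_, node_test_edges), (_, node_train_edges) = node_split(ring_nodes, ring_edges, 0.25)
kept = len(node_test_edges) + len(node_train_edges)
print(f"node split keeps {kept} of {len(ring_edges)} edges")

(edge_test_nodes, edge_test_edges), (edge_train_nodes, edge_train_edges) = edge_split(
    ring_nodes, ring_edges, 0.25
)
total = len(edge_test_edges) + len(edge_train_edges)
print(f"edge split keeps {total} of {len(ring_edges)} edges")
```

In this toy ring, the node-based split always drops at least two edges, because any two-node test set is connected to the rest of the ring, while the edge-based split partitions the edge list without loss.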

We can alternatively set the size of the training set:

In [None]:
test, train = pp.algorithms.evaluation.train_test_split(n, train_size=0.25, split='edge')
print(test)
print(train)

Apart from static networks, we can also create cross-validation sets for temporal networks. For this, we first load a temporal network from the KONECT database:

In [None]:
tn = pp.io.konect.read_konect_name('sociopatterns-hypertext')
print(tn)
tn.plot()

We can call the same function on a temporal network instance. By default, the split is based on the number of observed interactions: in the following example, the first 75 % of all time-stamped interactions are included in the training network, while the last 25 % are included in the test network.

In [None]:
test, train = pp.algorithms.evaluation.train_test_split(tn, test_size=0.25)
print(train)
print(test)

In [None]:
train.plot()

In [None]:
test.plot()
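To make the interaction-based split concrete, here is a minimal pure-Python sketch that does not use `pathpy`. The helper `split_by_interactions` and the `(source, target, time)` tuple format are assumptions made for illustration only; the library's internal representation may differ.

```python
def split_by_interactions(events, test_size):
    """Hypothetical helper: put the last `test_size` fraction of events in the test set.

    `events` is a list of (source, target, time) tuples.
    """
    ordered = sorted(events, key=lambda e: e[2])  # chronological order
    n_test = int(len(ordered) * test_size)
    n_train = len(ordered) - n_test
    return ordered[n_train:], ordered[:n_train]  # (test, train)


# Toy data: seven early interactions and one late interaction.
toy_events = [("a", "b", t) for t in (0, 1, 2, 3, 4, 5, 6)] + [("b", "c", 100)]

test_events, train_events = split_by_interactions(toy_events, test_size=0.25)
print(len(train_events), "training interactions,", len(test_events), "test interactions")
```

Because the split counts interactions rather than elapsed time, the test set here holds exactly two of the eight events, regardless of how unevenly they are spread over time.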

We can also split based on the observed time, i.e. here we include all interactions occurring within the first 75 % of the observed time period in the training network, while the remaining interactions are included in the test network.

In [None]:
test, train = pp.algorithms.evaluation.train_test_split(tn, test_size=0.25, split='time')
print(train)
print(test)
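A time-based split can yield very different set sizes from an interaction-based one when activity is bursty. The sketch below illustrates this in pure Python. The helper `split_by_time` is hypothetical, and the choice to assign an interaction exactly at the cutoff to the training set is an assumption, not documented `pathpy` behavior.

```python
def split_by_time(events, test_size):
    """Hypothetical helper: split at a cutoff placed in the observed time span.

    `events` is a list of (source, target, time) tuples.
    """
    times = [t for _, _, t in events]
    t_min, t_max = min(times), max(times)
    cutoff = t_min + (1 - test_size) * (t_max - t_min)
    # Assumption: interactions exactly at the cutoff go to the training set.
    train = [e for e in events if e[2] <= cutoff]
    test = [e for e in events if e[2] > cutoff]
    return test, train


# Toy data: seven early interactions and one late interaction.
bursty_events = [("a", "b", t) for t in (0, 1, 2, 3, 4, 5, 6)] + [("b", "c", 100)]

late_events, early_events = split_by_time(bursty_events, test_size=0.25)
print(len(early_events), "training interactions,", len(late_events), "test interactions")
```

With the observed period running from 0 to 100, the cutoff lies at time 75, so seven of the eight interactions end up in the training set even though `test_size` is 0.25.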