# 01. Create loop census
## Project: Bicycle node network loop analysis

This notebook creates a loop census from the input data set and calculates/plots basic descriptive statistics.

Contact: Michael Szell (michael.szell@gmail.com)

Created: 2024-01-24  
Last modified: 2024-10-02

## To do

- [ ] Double-check loop/link lengths. For example 3-loop east of Faxe
- [ ] Double-check edge_ids during simplifications
- [ ] Add node distances to closest kommune boundary
- [ ] Compress results (Jutland+Fyn = 50GB), either by storing the data more compactly or with a compression algorithm, e.g. https://stackoverflow.com/questions/57983431/whats-the-most-space-efficient-way-to-compress-serialized-python-data
- [x] Create a preprocessing step for POI snapping
- [x] Fix: minimum cycle basis is not necessarily face cycle basis (https://en.wikipedia.org/wiki/Cycle_basis#In_planar_graphs)
- [x] Create testing possibility with random POI data, without POI snapping
- [x] Make all constants allcaps
- [x] Snap POIs to the original link geometries, within a threshold
- [x] Incorporate gradients
- [x] Add loop permutations for node-based analysis
- [x] Drop non-main nodes
- [x] Drop loops (they are really dangling links)
- [x] Find all simple loops (bounded?-max length?) with networkX

## Imports

In [None]:
import geopandas as gpd
import shapely
import igraph as ig
import matplotlib.pyplot as plt
import numpy as np
import networkx as nx
from functools import reduce
import pickle
from itertools import combinations
import lzma
from tqdm import tqdm  # Progress bars; may already be provided by functions.py

## Parameters

In [None]:
%run -i setup_parameters.py
np.random.seed(42)
debug = True  # Set to True for extra plots and verbosity

## Functions

In [None]:
%run -i functions.py

## Load data

In [None]:
with lzma.open(PATH["data_out"] + "network_preprocessed.xz", "rb") as f:
    G = pickle.load(f)
G.summary()

In [None]:
nodes = gpd.read_file(PATH["data_in_network"] + "nodes.gpkg")
nodes.head()

In [None]:
nodes_id = list(nodes.nodeID)
nodes_x = list(nodes.geometry.x)
nodes_y = list(nodes.geometry.y)
nodes_coords = list(zip(NormalizeData(nodes_x), NormalizeData(nodes_y)))

## Loop generation

### Get face loops

The minimum cycle basis is generally not the cycle basis of face loops, see: https://en.wikipedia.org/wiki/Cycle_basis#In_planar_graphs  
Therefore, we can't use https://python.igraph.org/en/latest/api/igraph.GraphBase.html#minimum_cycle_basis here. Instead, we solve the problem geometrically via shapely.
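
As a minimal sketch of the geometric approach, assuming shapely 2.x and made-up toy coordinates that are not part of the project data: polygonizing the links of two adjacent unit squares, plus one dangling link, should recover exactly the two faces. All `toy_*` names are hypothetical.

```python
import shapely
from shapely.geometry import LineString

# Toy network (made-up coordinates): two unit squares sharing the link (1,0)-(1,1),
# plus one dangling link that does not close any face.
toy_links = [
    LineString([(0, 0), (1, 0)]),
    LineString([(1, 0), (2, 0)]),
    LineString([(2, 0), (2, 1)]),
    LineString([(2, 1), (1, 1)]),
    LineString([(1, 1), (0, 1)]),
    LineString([(0, 1), (0, 0)]),
    LineString([(1, 0), (1, 1)]),  # shared link between the two faces
    LineString([(2, 1), (3, 1)]),  # dangling link
]

# polygonize_full returns (polygons, cuts, dangles, invalid rings)
toy_faces, _, toy_dangles, _ = shapely.polygonize_full(toy_links)
toy_face_areas = [face.area for face in toy_faces.geoms]
print(len(toy_faces.geoms), "faces with areas", toy_face_areas)
```

Each polygon corresponds to one face of the planar network, which is what the polygonize step below relies on.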

#### Polygonize

In [None]:
edgegeoms = G.es["geometry"]
facepolygons, _, _, _ = shapely.polygonize_full(edgegeoms)
if debug:
    p = gpd.GeoSeries(facepolygons)
    p.plot()
    plt.axis("off")

#### Intersect polygons with graph to get face loops

In [None]:
faceloops = {}
for cid, facepoly in tqdm(
    enumerate(facepolygons.geoms), desc="Face loops", total=len(facepolygons.geoms)
):
    facenodeids = list(np.where(list(nodes.intersects(facepoly)))[0])
    facenodeidpairs = list(combinations(facenodeids, 2))
    edgeids = set()  # ids of the edges on this face loop
    length = 0  # total length of the face loop
    # We only have node ids but no edge info, so we need to try all node pairs.
    for p in facenodeidpairs:
        try:
            eid = G.get_eid(G.vs.find(name=p[0]), G.vs.find(name=p[1]))
        except Exception:  # Node not in G, or the two nodes are not linked
            continue
        edgeids.add(eid)
        length += G.es[eid]["weight"]
    faceloops[cid] = {
        "edges": tuple(edgeids),
        "length": length,
        "numnodes": len(edgeids),  # A loop has as many nodes as edges
    }

In [None]:
if debug:  # Show longest face loop
    res = {key: val["length"] for key, val in faceloops.items()}
    k = max(res, key=res.get)

    edge_colors = []
    for e in G.es:
        if e.index in faceloops[k]["edges"]:
            edge_colors.append("red")
        else:
            edge_colors.append("grey")

    fig = plotCheck(
        G,
        nodes_id,
        nodes_coords,
        vertex_size=get_vertexsize(G.vcount()),
        edge_color=edge_colors,
    )
    plt.text(
        0,
        0.04,
        "Longest face loop highlighted: "
        + str(int((MPERUNIT / 1000) * faceloops[k]["length"]))
        + "km",
    )
    plt.tight_layout()

Enumerating all simple loops has not yet been implemented in igraph, see:

* https://github.com/igraph/igraph/issues/379
* https://github.com/igraph/igraph/issues/1398

There is some potential progress here, but only for C, not Python:

* https://github.com/igraph/igraph/pull/2181

In principle, all simple loops can be obtained by XORing (taking symmetric differences of) the loops of a loop basis.

However, enumerating simple loops is already implemented in networkX: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cycles.simple_cycles.html#networkx.algorithms.cycles.simple_cycles

Therefore, we do not use igraph's loop basis, but go ahead with networkX.
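
To illustrate the XOR idea only (it is not used in this notebook), here is a minimal sketch on a made-up toy basis. The helper `edge_set` and the loop names are hypothetical.

```python
def edge_set(nodes):
    """Return the undirected edge set of the closed loop visiting `nodes` in order."""
    return frozenset(
        frozenset((nodes[i], nodes[(i + 1) % len(nodes)])) for i in range(len(nodes))
    )


# Hypothetical toy basis: two triangles sharing the link 0-2
basis_loop_a = edge_set([0, 1, 2])
basis_loop_b = edge_set([0, 2, 3])

# XOR (symmetric difference) removes the shared link and leaves the outer 4-loop
combined_loop = basis_loop_a ^ basis_loop_b
print(sorted(tuple(sorted(e)) for e in combined_loop))
```

Note that XORing basis loops that share no link gives a disjoint union of loops rather than a single simple loop, so the combinations would need filtering. This is one reason to prefer a ready-made enumeration.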

### Get all loops via nx

In [None]:
Gnx = G.to_networkx()

In [None]:
# Get all unique loops, meaning a loop ABCA is counted only once and not as ABCA, BCAB, and CABC
allloops_unique = {}
nodes_done = set()
numloops_unique = 0
allloops_generator = nx.simple_cycles(Gnx, length_bound=LOOP_NUMNODE_BOUND)
for c in tqdm(allloops_generator):
    sourcenode = c[0]
    c_length = getLoopLength(c)
    c_max_slope = getLoopMaxSlope(c)
    c_water = getLoopWaterProfile(c)
    c_poi_diversity = getLoopPOIDiversity(c)
    numloops_unique += 1
    if sourcenode in nodes_done:
        allloops_unique[sourcenode]["loops"].append(c)
        allloops_unique[sourcenode]["lengths"].append(c_length)
        allloops_unique[sourcenode]["numnodes"].append(len(c))
        allloops_unique[sourcenode]["max_slopes"].append(c_max_slope)
        allloops_unique[sourcenode]["water_profile"].append(c_water)
        allloops_unique[sourcenode]["poi_diversity"].append(c_poi_diversity)
    else:
        allloops_unique[sourcenode] = {
            "loops": [c],
            "lengths": [c_length],
            "numnodes": [len(c)],
            "max_slopes": [c_max_slope],
            "water_profile": [c_water],
            "poi_diversity": [c_poi_diversity],
        }
        nodes_done.add(sourcenode)
print(
    "Found "
    + str(numloops_unique)
    + " unique loops for length bound "
    + str(LOOP_NUMNODE_BOUND)
)

In [None]:
# Get all loops, meaning a loop ABCA is counted also as ABCA, BCAB, and CABC
allloops = {}
nodes_done = set()
numloops = 0
allloops_generator = nx.simple_cycles(Gnx, length_bound=LOOP_NUMNODE_BOUND)
for c in tqdm(allloops_generator):
    c_length = getLoopLength(c)
    c_max_slope = getLoopMaxSlope(c)
    c_water = getLoopWaterProfile(c)
    c_poi_diversity = getLoopPOIDiversity(c)
    for sourcenode in c:
        numloops += 1
        if sourcenode in nodes_done:
            allloops[sourcenode]["loops"].append(c)
            allloops[sourcenode]["lengths"].append(c_length)
            allloops[sourcenode]["numnodes"].append(len(c))
            allloops[sourcenode]["max_slopes"].append(c_max_slope)
            allloops[sourcenode]["water_profile"].append(c_water)
            allloops[sourcenode]["poi_diversity"].append(c_poi_diversity)
        else:
            allloops[sourcenode] = {
                "loops": [c],
                "lengths": [c_length],
                "numnodes": [len(c)],
                "max_slopes": [c_max_slope],
                "water_profile": [c_water],
                "poi_diversity": [c_poi_diversity],
            }
            nodes_done.add(sourcenode)
print("Found " + str(numloops) + " loops for length bound " + str(LOOP_NUMNODE_BOUND))
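
The difference between the two counts above can be checked on a toy graph. This is a minimal sketch, assuming a networkx version whose `simple_cycles` supports undirected graphs and `length_bound` (3.1 or later); the graph and all `toy_*` names are made up for illustration.

```python
import networkx as nx

# Toy graph (made up): two triangles sharing the link 0-2
toy = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)])

# With length_bound=3 the outer 4-loop 0-1-2-3 is excluded
toy_loops = list(nx.simple_cycles(toy, length_bound=3))

# Node-based view: register each loop under every node it passes through
toy_loops_by_node = {}
for loop in toy_loops:
    for node in loop:
        toy_loops_by_node.setdefault(node, []).append(loop)

toy_numloops_unique = len(toy_loops)
toy_numloops = sum(len(loops) for loops in toy_loops_by_node.values())
print(toy_numloops_unique, toy_numloops)
```

In the node-based census every loop is counted once per node on it, so the total equals the sum of the loop sizes.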

In [None]:
alllooplengths = np.zeros(numloops)
allloopnumnodes = np.zeros(numloops, dtype=int)
allloopmaxslopes = np.zeros(numloops)
i = 0
for j in tqdm(allloops):
    l = len(allloops[j]["lengths"])
    alllooplengths[i : i + l] = allloops[j]["lengths"]
    allloopnumnodes[i : i + l] = allloops[j]["numnodes"]
    allloopmaxslopes[i : i + l] = allloops[j]["max_slopes"]
    i += l

## Descriptive network statistics

### Link lengths and max slopes

In [None]:
linklengths = [e["weight"] * MPERUNIT for e in G.es]
linkmaxslopes = [e["max_slope"] for e in G.es]
fig = plt.figure(figsize=(8, 3))
axes1 = fig.add_axes([0.08, 0.16, 0.4, 0.75])
axes2 = fig.add_axes([0.58, 0.16, 0.4, 0.75])

histxy = axes1.hist(
    linklengths, bins=[i * 500 / MPERUNIT for i in list(range(30))], density=False
)
axes1.plot([LINK_LIMIT[0], LINK_LIMIT[0]], [0, max(histxy[0])], ":k")
axes1.plot([LINK_LIMIT[1], LINK_LIMIT[1]], [0, max(histxy[0])], ":k")
axes1.plot([LINK_LIMIT[2], LINK_LIMIT[2]], [0, max(histxy[0])], ":r")
indcond = [
    i for i, x in enumerate(linklengths) if (x >= LINK_LIMIT[0] and x <= LINK_LIMIT[1])
]
massinallowedrange = round(len(indcond) / len(linklengths) * 100)  # Should be high
axes1.text(
    (LINK_LIMIT[0] + LINK_LIMIT[1]) / 2,
    max(histxy[0]),
    str(massinallowedrange) + "%",
    horizontalalignment="center",
    verticalalignment="top",
)
axes1.text(
    LINK_LIMIT[0] * 0.9,
    max(histxy[0]),
    str(
        round(
            len([i for i, x in enumerate(linklengths) if (x < LINK_LIMIT[0])])
            / len(linklengths)
            * 100
        )
    )
    + "%",
    horizontalalignment="right",
    verticalalignment="top",
)
axes1.text(
    (LINK_LIMIT[1] + LINK_LIMIT[2]) / 2,
    max(histxy[0]),
    str(
        round(
            len(
                [
                    i
                    for i, x in enumerate(linklengths)
                    if (x > LINK_LIMIT[1] and x <= LINK_LIMIT[2])
                ]
            )
            / len(linklengths)
            * 100
        )
    )
    + "%",
    horizontalalignment="center",
    verticalalignment="top",
)
axes1.text(
    LINK_LIMIT[2] * 1.01,
    max(histxy[0]),
    str(
        round(
            len([i for i, x in enumerate(linklengths) if (x > LINK_LIMIT[2])])
            / len(linklengths)
            * 100
        )
    )
    + "%",
    horizontalalignment="left",
    verticalalignment="top",
    color="red",
)

axes1.set_xlabel("Length [m]")
axes1.set_ylabel("Frequency")
axes1.set_title("Link lengths")
axes1.set_xlim([0, 15000 / MPERUNIT])

histxy = axes2.hist(linkmaxslopes, bins=[i / 4 for i in list(range(32))], density=False)
axes2.plot([MAXSLOPE_LIMIT, MAXSLOPE_LIMIT], [0, max(histxy[0])], ":r")
axes2.text(
    MAXSLOPE_LIMIT * 0.95,
    max(histxy[0]),
    str(
        round(
            len([i for i, x in enumerate(linkmaxslopes) if (x < MAXSLOPE_LIMIT)])
            / len(linkmaxslopes)
            * 100
        )
    )
    + "%",
    horizontalalignment="right",
    verticalalignment="top",
)
axes2.text(
    MAXSLOPE_LIMIT * 1.05,
    max(histxy[0]),
    str(
        round(
            len([i for i, x in enumerate(linkmaxslopes) if (x >= MAXSLOPE_LIMIT)])
            / len(linkmaxslopes)
            * 100
        )
    )
    + "%",
    horizontalalignment="left",
    verticalalignment="top",
    color="red",
)
axes2.set_xlabel("Max slope [%]")
axes2.set_ylabel("")
axes2.set_title("Link max slopes")
axes2.set_xlim([0, 8])

fig.savefig(PATH["plot"] + "linkstats")

### Loop lengths

In [None]:
fig = plt.figure(figsize=(8, 3))
axes1 = fig.add_axes([0.1, 0.1, 0.35, 0.8])
axes2 = fig.add_axes([0.55, 0.1, 0.35, 0.8])

axes1.hist(alllooplengths, density=True)
if MPERUNIT == 1000:
    axes1.set_xlabel("Length [km]")
elif MPERUNIT == 1:
    axes1.set_xlabel("Length [m]")
else:
    axes1.set_xlabel("Length")
axes1.set_ylabel("Probability density")
axes1.set_title("Loop lengths")


axes2.hist(allloopnumnodes, density=True, bins=list(range(LOOP_NUMNODE_BOUND + 1)))
axes2.set_xlabel("Nodes")
axes2.set_title("Nodes per loop")
axes2.set_xlim([0, LOOP_NUMNODE_BOUND + 0.5])

plt.text(LOOP_NUMNODE_BOUND / 20, 0.01, "Bound: " + str(LOOP_NUMNODE_BOUND))
plt.text(LOOP_NUMNODE_BOUND / 20, 0.04, "Loops: " + str(numloops));

In [None]:
fig = plt.figure(figsize=(8, 3))
axes1 = fig.add_axes([0.08, 0.16, 0.4, 0.75])
axes2 = fig.add_axes([0.58, 0.16, 0.4, 0.75])

facelooplengths = [c["length"] * MPERUNIT for c in faceloops.values()]

histxy = axes1.hist(
    facelooplengths, bins=[i * 1000 / MPERUNIT for i in list(range(50))], density=False
)
if MPERUNIT == 1000:
    axes1.set_xlabel("Length [km]")
elif MPERUNIT == 1:
    axes1.set_xlabel("Length [m]")
else:
    axes1.set_xlabel("Length")
axes1.set_ylabel("Frequency")
axes1.set_title("Face loop lengths")
axes1.plot([FACELOOP_LIMIT[0], FACELOOP_LIMIT[0]], [0, max(histxy[0])], ":k")
axes1.plot([FACELOOP_LIMIT[1], FACELOOP_LIMIT[1]], [0, max(histxy[0])], ":r")
axes1.text(
    (FACELOOP_LIMIT[0] + FACELOOP_LIMIT[1]) / 2,
    max(histxy[0]),
    str(
        round(
            len(
                [
                    i
                    for i, x in enumerate(facelooplengths)
                    if (x >= FACELOOP_LIMIT[0] and x <= FACELOOP_LIMIT[1])
                ]
            )
            / len(facelooplengths)
            * 100
        )
    )
    + "%",
    horizontalalignment="center",
    verticalalignment="top",
)
axes1.text(
    FACELOOP_LIMIT[0] * 0.95,
    max(histxy[0]),
    str(
        round(
            len([i for i, x in enumerate(facelooplengths) if (x < FACELOOP_LIMIT[0])])
            / len(facelooplengths)
            * 100
        )
    )
    + "%",
    horizontalalignment="right",
    verticalalignment="top",
)
axes1.text(
    FACELOOP_LIMIT[1] * 1.01,
    max(histxy[0]),
    str(
        round(
            len([i for i, x in enumerate(facelooplengths) if (x > FACELOOP_LIMIT[1])])
            / len(facelooplengths)
            * 100
        )
    )
    + "%",
    horizontalalignment="left",
    verticalalignment="top",
    color="red",
)
axes1.set_xlim([0, 50000 / MPERUNIT])


axes2.hist(
    [c["numnodes"] for c in faceloops.values()], bins=list(range(30)), density=False
)
axes2.set_xlabel("Nodes")
axes2.set_title("Face loop nodes")
axes2.set_xlim([0, 30])

fig.savefig(PATH["plot"] + "faceloopstats");

In [None]:
# histxy[0][min(indcond) : max(indcond) + 1]

In [None]:
if debug:  # Show face loops that conform to the length thresholds
    okedges = set()
    for c in faceloops.values():
        if (
            c["length"] * MPERUNIT >= FACELOOP_LIMIT[0]
            and c["length"] * MPERUNIT <= FACELOOP_LIMIT[1]
        ):
            okedges = okedges.union(set(c["edges"]))

    edge_colors = []
    for e in G.es:
        if e.index in okedges:
            edge_colors.append("green")
        else:
            edge_colors.append("grey")

    fig = plotCheck(
        G,
        nodes_id,
        nodes_coords,
        vertex_size=get_vertexsize(G.vcount()),
        edge_color=edge_colors,
    )
    plt.text(0, 0.04, "Conforming face loops highlighted")
    plt.tight_layout()

In [None]:
if debug:  # Show face loops that do not conform to the length thresholds
    toosmalledges = set()
    toolargeedges = set()
    for c in faceloops.values():
        if c["length"] * MPERUNIT > FACELOOP_LIMIT[1]:
            toolargeedges = toolargeedges.union(set(c["edges"]))
        elif c["length"] * MPERUNIT < FACELOOP_LIMIT[0]:
            toosmalledges = toosmalledges.union(set(c["edges"]))

    edge_colors = []
    for e in G.es:
        if e.index in toolargeedges:
            edge_colors.append("red")
        elif e.index in toosmalledges:
            edge_colors.append("orange")
        else:
            edge_colors.append("grey")

    fig = plotCheck(
        G,
        nodes_id,
        nodes_coords,
        vertex_size=get_vertexsize(G.vcount()),
        edge_color=edge_colors,
    )
    plt.text(0, 0.04, "Non-conforming face loops highlighted")
    plt.tight_layout()

## Save loop census

In [None]:
with open(
    PATH["data_out"] + "loopcensus_" + str(LOOP_NUMNODE_BOUND) + ".pkl", "wb"
) as f:
    pickle.dump(allloops, f)
    pickle.dump(alllooplengths, f)
    pickle.dump(allloopnumnodes, f)
    pickle.dump(allloopmaxslopes, f)
    pickle.dump(G, f)
    pickle.dump(LOOP_NUMNODE_BOUND, f)
    pickle.dump(nodes_id, f)
    pickle.dump(nodes_coords, f)
    pickle.dump(numloops, f)
    pickle.dump(faceloops, f)
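
Because several objects are dumped into one file, they have to be read back with repeated `pickle.load` calls in the same order. A minimal sketch of that pattern follows, using an in-memory buffer and placeholder objects instead of the real census file; `load_all` is a hypothetical helper.

```python
import io
import pickle


def load_all(f):
    """Read pickled objects from an open binary file until it is exhausted."""
    objects = []
    while True:
        try:
            objects.append(pickle.load(f))
        except EOFError:  # No more pickled objects in the stream
            break
    return objects


# Placeholder objects standing in for allloops, alllooplengths, numloops, ...
buffer = io.BytesIO()
for obj in ({"a": [1, 2]}, [3.0, 4.5], 7):
    pickle.dump(obj, buffer)

buffer.seek(0)
first, second, third = load_all(buffer)
print(first, second, third)
```

The objects come back in the order they were written, so a reader of the census file would unpack them in the order of the `pickle.dump` calls above.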

In [None]:
# Compressing with lzma is a bit too slow
# with lzma.open(PATH["data_out"] + "loopcensus_" + str(LOOP_NUMNODE_BOUND) + ".xz", "wb") as f:
#     pickle.dump(allloops, f)
#     pickle.dump(alllooplengths, f)
#     pickle.dump(allloopnumnodes, f)
#     pickle.dump(allloopmaxslopes, f)
#     pickle.dump(G, f)
#     pickle.dump(LOOP_NUMNODE_BOUND, f)
#     pickle.dump(nodes_id, f)
#     pickle.dump(nodes_coords, f)
#     pickle.dump(numloops, f)
#     pickle.dump(faceloops, f)