# 01. Create loop census
## Project: Bicycle node network loop analysis

This notebook creates a loop census from the input data set and calculates/plots basic descriptive statistics.

Contact: Michael Szell (michael.szell@gmail.com)

Created: 2024-01-24  
Last modified: 2024-10-07

## To do

- [ ] Double-check loop/link lengths, for example the 3-loop east of Faxe
- [ ] Double-check edge_ids during simplifications
- [ ] Add node distances to closest kommune boundary
- [ ] Compress results (Jutland+Fyn = 50GB), either by storing the data more efficiently or by using a better compression algorithm, e.g. https://stackoverflow.com/questions/57983431/whats-the-most-space-efficient-way-to-compress-serialized-python-data
- [X] Create a preprocessing step for POI snapping
- [X] Fix: the minimum cycle basis is not necessarily the face cycle basis (https://en.wikipedia.org/wiki/Cycle_basis#In_planar_graphs)
- [X] Create testing possibility with random POI data, without POI snapping
- [X] Make all constants allcaps
- [x] Snap POIs to the original link geometries, within a threshold
- [x] Incorporate gradients
- [x] Add loop permutations for node-based analysis
- [x] Drop non-main nodes
- [x] Drop loops (they are really dangling links)
- [x] Find all simple loops (bounded?-max length?) with networkX

## Parameters

In [None]:
%run -i setup_parameters.py
np.random.seed(42)
debug = True  # Set to True for extra plots and verbosity

## Functions

In [None]:
%run -i functions.py

## Load data

In [None]:
with lzma.open(PATH["data_out"] + "network_preprocessed.xz", "rb") as f:
    G = pickle.load(f)
G.summary()

In [None]:
nodes = gpd.read_file(PATH["data_in_network"] + "nodes.gpkg")
nodes.head()

In [None]:
nodes_id = list(nodes.nodeID)
nodes_x = list(nodes.geometry.x)
nodes_y = list(nodes.geometry.y)
nodes_coords = list(zip(NormalizeData(nodes_x), NormalizeData(nodes_y)))

## Loop generation

### Get face loops

The minimum cycle basis is generally not the cycle basis of face loops, see: https://en.wikipedia.org/wiki/Cycle_basis#In_planar_graphs  
Therefore, we can't use https://python.igraph.org/en/latest/api/igraph.GraphBase.html#minimum_cycle_basis here. Instead, we solve the problem geometrically via shapely.

#### Polygonize

In [None]:
edgegeoms = G.es["geometry"]
facepolygons, _, _, _ = shapely.polygonize_full(edgegeoms)
if debug:
    p = gpd.GeoSeries(facepolygons)
    p.plot()
    plt.axis("off")
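
To illustrate what `shapely.polygonize_full` returns, here is a minimal sketch on toy linework (a unit square with one diagonal and one dangling link). The geometries and variable names are made up for illustration and are not part of the project data; the sketch assumes shapely 2.x.

```python
import shapely
from shapely.geometry import LineString

# Toy linework: a unit square split by one diagonal, plus one dangling link.
toy_lines = [
    LineString([(0, 0), (1, 0)]),
    LineString([(1, 0), (1, 1)]),
    LineString([(1, 1), (0, 1)]),
    LineString([(0, 1), (0, 0)]),
    LineString([(0, 0), (1, 1)]),  # diagonal, splits the square into two faces
    LineString([(1, 1), (2, 2)]),  # dangling link, bounds no face
]

# polygonize_full returns four geometry collections:
# face polygons, cut edges, dangles, and invalid rings.
toy_polygons, toy_cuts, toy_dangles, toy_invalid = shapely.polygonize_full(toy_lines)

toy_num_faces = len(toy_polygons.geoms)
toy_total_area = sum(poly.area for poly in toy_polygons.geoms)
toy_num_dangles = len(toy_dangles.geoms)
```

The two triangles are the face polygons, while the dangling link is reported separately and does not contribute a face.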

#### Intersect polygons with graph to get face loops

In [None]:
faceloops = {}
for cid, facepoly in tqdm(
    enumerate(facepolygons.geoms), desc="Face loops", total=len(facepolygons.geoms)
):
    facenodeids = list(np.where(list(nodes.intersects(facepoly)))[0])
    facenodeidpairs = list(combinations(facenodeids, 2))
    edgeids = set()  # set of edge ids on this face
    l = 0  # total length
    # We only have node ids but no edge info, so we need to try all node pairs.
    for p in facenodeidpairs:
        try:
            eid = G.get_eid(G.vs.find(name=p[0]), G.vs.find(name=p[1]))
            edgeinfo = G.es[eid]
            edgeids.add(eid)
            l += edgeinfo["weight"]
        except Exception:  # no such vertex, or no edge between this node pair
            pass
    faceloops[cid] = {
        "edges": tuple(edgeids),
        "length": l,
        "numnodes": len(edgeids),  # a simple loop has as many nodes as edges
    }

In [None]:
if debug:  # Show longest face loop
    res = {key: val["length"] for key, val in faceloops.items()}
    k = max(res, key=res.get)

    edge_colors = []
    for e in G.es:
        if e.index in faceloops[k]["edges"]:
            edge_colors.append("red")
        else:
            edge_colors.append("grey")

    fig = plotCheck(
        G,
        nodes_id,
        nodes_coords,
        vertex_size=get_vertexsize(G.vcount()),
        edge_color=edge_colors,
    )
    plt.text(
        0,
        0.04,
        "Longest face loop highlighted: "
        + str(int((MPERUNIT / 1000) * faceloops[k]["length"]))
        + "km",
    )
    plt.tight_layout()

Enumerating all simple loops has not yet been implemented in igraph's Python interface, see:
* https://github.com/igraph/igraph/issues/379
* https://github.com/igraph/igraph/issues/1398

There is some potential progress, but only for C, not Python:
* https://github.com/igraph/igraph/pull/2181

In principle, all loops can be obtained by XORing (taking the symmetric difference of) elements of the loop basis.

However, networkX already implements the enumeration directly: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cycles.simple_cycles.html#networkx.algorithms.cycles.simple_cycles

Therefore, we do not use igraph's loop basis, but go ahead with networkX.
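
As a small illustration of both ideas, the sketch below enumerates the loops of a toy graph (a square with one diagonal) and shows that XORing the edge sets of the two triangles yields the outer 4-loop. The toy graph and the helper `edge_set` are hypothetical and not part of `functions.py`; the sketch assumes networkX 3.1 or later, where `simple_cycles` supports undirected graphs and the `length_bound` argument.

```python
import networkx as nx

# Toy graph: a square 0-1-2-3 with one diagonal 0-2.
G_toy = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])


def edge_set(cycle):
    """Return the undirected edges of a loop given as a list of nodes."""
    return {frozenset(pair) for pair in zip(cycle, cycle[1:] + cycle[:1])}


# Without a bound, every simple loop is reported exactly once.
all_cycles = list(nx.simple_cycles(G_toy))

# With length_bound=3, only the two triangles remain.
short_cycles = list(nx.simple_cycles(G_toy, length_bound=3))

# XOR (symmetric difference) of the two triangles' edge sets:
# the shared diagonal cancels out and the outer square is left.
triangles = [edge_set(c) for c in short_cycles]
outer_loop = triangles[0] ^ triangles[1]
```

The direct enumeration avoids having to combine basis elements and filter out combinations that are not simple loops.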

### Get all loops via nx

In [None]:
Gnx = G.to_networkx()

In [None]:
# Get all unique loops, meaning a loop ABCA is counted only once and not as ABCA, BCAB, and CABC
allloops_unique = {}
nodes_done = set()
numloops_unique = 0
allloops_generator = nx.simple_cycles(Gnx, length_bound=LOOP_NUMNODE_BOUND)
for c in tqdm(allloops_generator):
    sourcenode = c[0]
    c_length = getLoopLength(c)
    c_max_slope = getLoopMaxSlope(c)
    c_water = getLoopWaterProfile(c)
    c_poi_diversity = getLoopPOIDiversity(c)
    numloops_unique += 1
    if sourcenode in nodes_done:
        allloops_unique[sourcenode]["loops"].append(c)
        allloops_unique[sourcenode]["lengths"].append(c_length)
        allloops_unique[sourcenode]["numnodes"].append(len(c))
        allloops_unique[sourcenode]["max_slopes"].append(c_max_slope)
        allloops_unique[sourcenode]["water_profile"].append(c_water)
        allloops_unique[sourcenode]["poi_diversity"].append(c_poi_diversity)
    else:
        allloops_unique[sourcenode] = {
            "loops": [c],
            "lengths": [c_length],
            "numnodes": [len(c)],
            "max_slopes": [c_max_slope],
            "water_profile": [c_water],
            "poi_diversity": [c_poi_diversity],
        }
        nodes_done.add(sourcenode)
print(
    "Found "
    + str(numloops_unique)
    + " unique loops for length bound "
    + str(LOOP_NUMNODE_BOUND)
)

In [None]:
# Get all loops, meaning a loop ABCA is counted also as ABCA, BCAB, and CABC
allloops = {}
nodes_done = set()
numloops = 0
allloops_generator = nx.simple_cycles(Gnx, length_bound=LOOP_NUMNODE_BOUND)
for c in tqdm(allloops_generator):
    c_length = getLoopLength(c)
    c_max_slope = getLoopMaxSlope(c)
    c_water = getLoopWaterProfile(c)
    c_poi_diversity = getLoopPOIDiversity(c)
    for sourcenode in c:
        numloops += 1
        if sourcenode in nodes_done:
            allloops[sourcenode]["loops"].append(c)
            allloops[sourcenode]["lengths"].append(c_length)
            allloops[sourcenode]["numnodes"].append(len(c))
            allloops[sourcenode]["max_slopes"].append(c_max_slope)
            allloops[sourcenode]["water_profile"].append(c_water)
            allloops[sourcenode]["poi_diversity"].append(c_poi_diversity)
        else:
            allloops[sourcenode] = {
                "loops": [c],
                "lengths": [c_length],
                "numnodes": [len(c)],
                "max_slopes": [c_max_slope],
                "water_profile": [c_water],
                "poi_diversity": [c_poi_diversity],
            }
            nodes_done.add(sourcenode)
print("Found " + str(numloops) + " loops for length bound " + str(LOOP_NUMNODE_BOUND))
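
The difference between the two counting schemes can be sketched on toy data. The loops and variable names below are made up for illustration, and `collections.defaultdict` is used only to keep the sketch short; it is not what the cells above do.

```python
from collections import defaultdict

# Two toy loops, each given as a list of nodes.
toy_loops = [["A", "B", "C"], ["A", "C", "D"]]

unique_by_source = defaultdict(list)  # each loop stored once, under its first node
by_node = defaultdict(list)  # each loop stored under every node it passes through

for loop in toy_loops:
    unique_by_source[loop[0]].append(loop)
    for node in loop:
        by_node[node].append(loop)

toy_numloops_unique = sum(len(v) for v in unique_by_source.values())
toy_numloops = sum(len(v) for v in by_node.values())
```

Here the unique count is 2, while the node-based count is 6, because each 3-node loop is registered once per node. The node-based dictionary answers the question "which loops pass through this node?" directly, at the cost of storing each loop as many times as it has nodes.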

In [None]:
alllooplengths = np.zeros(numloops)
allloopnumnodes = np.zeros(numloops, dtype=int)
allloopmaxslopes = np.zeros(numloops)
i = 0
for j in tqdm(allloops):
    l = len(allloops[j]["lengths"])
    alllooplengths[i : i + l] = allloops[j]["lengths"]
    allloopnumnodes[i : i + l] = allloops[j]["numnodes"]
    allloopmaxslopes[i : i + l] = allloops[j]["max_slopes"]
    i += l

## Save loop census

In [None]:
with open(
    PATH["data_out"] + "loopcensus_" + str(LOOP_NUMNODE_BOUND) + ".pkl", "wb"
) as f:
    pickle.dump(allloops, f)
    pickle.dump(alllooplengths, f)
    pickle.dump(allloopnumnodes, f)
    pickle.dump(allloopmaxslopes, f)
    pickle.dump(G, f)
    pickle.dump(LOOP_NUMNODE_BOUND, f)
    pickle.dump(nodes_id, f)
    pickle.dump(nodes_coords, f)
    pickle.dump(numloops, f)
    pickle.dump(faceloops, f)
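
Because the objects are written with consecutive `pickle.dump` calls into one file, they have to be read back with matching `pickle.load` calls in the same order. A minimal sketch with toy data and a temporary file (the file name and variables are hypothetical):

```python
import os
import pickle
import tempfile

toy_census = {"A": {"loops": [["A", "B", "C"]], "lengths": [3.0]}}
toy_bound = 3

toy_path = os.path.join(tempfile.mkdtemp(), "loopcensus_toy.pkl")

# Write several objects one after another into the same file.
with open(toy_path, "wb") as f:
    pickle.dump(toy_census, f)
    pickle.dump(toy_bound, f)

# Read them back in exactly the order in which they were written.
with open(toy_path, "rb") as f:
    toy_census_loaded = pickle.load(f)
    toy_bound_loaded = pickle.load(f)
```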

In [None]:
# Compressed alternative with lzma: currently a bit too slow
# with lzma.open(PATH["data_out"] + "loopcensus_" + str(LOOP_NUMNODE_BOUND) + ".xz", "wb") as f:
#     pickle.dump(allloops, f)
#     pickle.dump(alllooplengths, f)
#     pickle.dump(allloopnumnodes, f)
#     pickle.dump(allloopmaxslopes, f)
#     pickle.dump(G, f)
#     pickle.dump(LOOP_NUMNODE_BOUND, f)
#     pickle.dump(nodes_id, f)
#     pickle.dump(nodes_coords, f)
#     pickle.dump(numloops, f)
#     pickle.dump(faceloops, f)
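
One option that might be worth testing is a lower `preset` in `lzma.open`, which trades compression ratio for speed. The sketch below uses toy data and a temporary file; whether this is fast enough for the full census has not been tested here.

```python
import lzma
import os
import pickle
import tempfile

# Toy stand-in for the loop census (hypothetical data).
toy_census = {
    "A": {
        "loops": [[i, i + 1, i + 2] for i in range(1000)],
        "lengths": [float(i) for i in range(1000)],
    }
}

toy_path = os.path.join(tempfile.mkdtemp(), "loopcensus_toy.xz")

# preset ranges from 0 (fastest) to 9 (smallest output).
with lzma.open(toy_path, "wb", preset=1) as f:
    pickle.dump(toy_census, f, protocol=pickle.HIGHEST_PROTOCOL)

with lzma.open(toy_path, "rb") as f:
    toy_census_loaded = pickle.load(f)

raw_size = len(pickle.dumps(toy_census, protocol=pickle.HIGHEST_PROTOCOL))
compressed_size = os.path.getsize(toy_path)
```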