# 01. POI snapping, Denmark-wide
## Project: Bicycle node network loop analysis

This notebook snaps the POIs to the Denmark-wide network, which is assembled from the individual study-area networks created in notebook 00.  
Please select `denmark` as the `study_area` in the `config.yml`.

Contact: Michael Szell (michael.szell@gmail.com)

Created: 2025-08-07  
Last modified: 2025-08-07

## To do

- [ ] Rewrite snapping with momepy/geopandas

## Parameters

In [None]:
%run -i setup_parameters.py
debug = True  # Set to True for extra plots and verbosity

## Functions

In [None]:
%run -i functions.py

## Processing data

### Load data

In [None]:
Gnx = nx.empty_graph()
for subarea in STUDY_AREA_COMBINED[STUDY_AREA]:
    with lzma.open(PATH[subarea]["data_out"] + "network_preprocessed0.xz", "rb") as f:
        G_new = pickle.load(f)
        Gnx = nx.disjoint_union(Gnx, G_new.to_networkx())

Convert to GeoDataFrames:

In [None]:
# https://docs.momepy.org/en/stable/user_guide/graph/convert.html
nodes, edges = momepy.nx_to_gdf(Gnx, points=True, lines=True)

In [None]:
edges.head()

### Snap POIs

Snap POIs to network. POIs come in 3 categories:
- Facilities (water station, bicycle repair station, supermarket, etc.)
- Services (camping ground, hotel, gas station, etc.)
- Attractions (church, museum, beach, etc.)  

If a POI of any of these categories is within reach of a link, the link is assumed to provide water. Further, we define a link's POI diversity $Y \in \{0,1,2,3\}$ as the number of distinct POI categories within reach. For simplicity, reach is defined by the same constant `SNAP_THRESHOLD` for all POI categories.
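
As a small worked example of these definitions, the sketch below computes $Y$ and the water flag from per-category sets of link ids. The link ids, the `links_with_poi` dictionary, and both helper functions are hypothetical and only illustrate the counting rule; they are not part of `functions.py`.

```python
# Hypothetical link ids that have at least one POI of each category within reach
links_with_poi = {
    "facility": {1, 2},
    "service": {2, 3},
    "attraction": {2},
}


def poi_diversity(link_id, links_with_poi):
    """Number of distinct POI categories within reach of the link (0 to 3)."""
    return sum(link_id in ids for ids in links_with_poi.values())


def has_water(link_id, links_with_poi):
    """A link is assumed to provide water if any POI category is within reach."""
    return poi_diversity(link_id, links_with_poi) > 0


for link_id in (1, 2, 3, 4):
    print(link_id, poi_diversity(link_id, links_with_poi), has_water(link_id, links_with_poi))
```

Link 2 appears in all three sets and so has $Y = 3$, while link 4 appears in none and has $Y = 0$ and no water.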

The number of POIs and links is small enough that looping over all POI-link pairs should be computationally feasible.
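
The pairwise approach can be sketched in plain Python as follows. This is a simplified illustration, not the notebook's implementation: links are assumed to be straight segments in projected coordinates (the real links are line geometries), and `SNAP_THRESHOLD_M`, `point_segment_distance`, and `snap_poi` are hypothetical names with made-up values.

```python
import math

SNAP_THRESHOLD_M = 100.0  # hypothetical value; the notebook takes SNAP_THRESHOLD from its setup


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1]
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def snap_poi(poi, links, threshold):
    """Return the id of the nearest link if it is within threshold, else None."""
    best_id, best_d = None, math.inf
    for link_id, (a, b) in links.items():
        d = point_segment_distance(poi, a, b)
        if d < best_d:
            best_id, best_d = link_id, d
    return best_id if best_d <= threshold else None


# Two parallel links 500 m apart, and three POIs
links = {0: ((0.0, 0.0), (1000.0, 0.0)), 1: ((0.0, 500.0), (1000.0, 500.0))}
pois = [(500.0, 50.0), (200.0, 460.0), (500.0, 250.0)]
snapped = [snap_poi(p, links, SNAP_THRESHOLD_M) for p in pois]
print(snapped)  # [0, 1, None]
```

Using `None` rather than a falsy value as the "not snapped" marker keeps a link id of `0` usable. The third POI is 250 m from both links and therefore stays unsnapped.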

In [None]:
poi_files = {
    "facility": [
        "facility.gpkg",
    ],
    "service": [
        "service.gpkg",
    ],
    "attraction": ["poi.gpkg"],
}

In [None]:
# Initialize
for e in G.es:
    e["has_water"] = False
    e["has_facility"] = False
    e["has_service"] = False
    e["has_attraction"] = False
    e["poi_diversity"] = 0

if not POIS_AVAILABLE:  # Create random data for testing
    for e in G.es:
        # Reasonable-looking probabilities for each category
        e["has_facility"] = bool(np.random.rand() < 0.11)
        e["has_service"] = bool(np.random.rand() < 0.17)
        e["has_attraction"] = bool(np.random.rand() < 0.08)
        poi_diversity = 0
        if e["has_facility"]:
            e["has_water"] = True
            poi_diversity += 1
        if e["has_service"]:
            e["has_water"] = True
            poi_diversity += 1
        if e["has_attraction"]:
            e["has_water"] = True
            poi_diversity += 1
        e["poi_diversity"] = poi_diversity

else:  # Use available poi files
    e_haspoi = {"facility": set(), "service": set(), "attraction": set()}
    for cat in [*poi_files]:
        for f in poi_files[cat]:
            print("Adding POIs from file: " + f)
            pois = gpd.read_file(PATH["data_in_pois"] + f)
            for _, poirow in tqdm(pois.iterrows(), total=pois.shape[0]):
                d = float("inf")  # Distance to the closest edge found so far
                eid = None  # None means no edge found; an edge_id of 0 stays valid
                if poirow["type"]:  # Could add conditions on type later, like Vandpost
                    poi_this = poirow["geometry"]
                    for _, erow in edges_orig.iterrows():
                        d_this = poi_this.distance(erow["geometry"])
                        if d_this < d:
                            d = d_this
                            eid = erow["edge_id"]
                if eid is not None and d <= SNAP_THRESHOLD:
                    e_haspoi[cat].add(eid)

    e_haswater = e_haspoi["facility"] | e_haspoi["service"] | e_haspoi["attraction"]

    for e in G.es:
        poi_diversity = 0
        if e["edge_id"] in e_haswater:
            e["has_water"] = True
        if e["edge_id"] in e_haspoi["facility"]:
            e["has_facility"] = True
            poi_diversity += 1
        if e["edge_id"] in e_haspoi["service"]:
            e["has_service"] = True
            poi_diversity += 1
        if e["edge_id"] in e_haspoi["attraction"]:
            e["has_attraction"] = True
            poi_diversity += 1
        e["poi_diversity"] = poi_diversity

#### Plot POI diversity

In [None]:
if debug:
    edge_widths = [(e["poi_diversity"] * 2) + 0.25 for e in G.es]

    fig = plot_check(
        G,
        nodes_id,
        nodes_coords,
        vertex_size=get_vertex_size_constant(G.vcount()),
        edge_width=edge_widths,
    )
    plt.text(0, 0.04, "POI diversity")
    plt.tight_layout()
    fig.savefig(PATH["plot"] + "poidiversity")

#### Plot water links

In [None]:
if debug:
    edge_colors = ["blue" if e["has_water"] else "grey" for e in G.es]

    fig = plot_check(
        G,
        nodes_id,
        nodes_coords,
        vertex_size=get_vertex_size_constant(G.vcount()),
        edge_color=edge_colors,
    )
    plt.text(0, 0.04, "Water links highlighted")
    plt.tight_layout()
    fig.savefig(PATH["plot"] + "waterlinks")

#### Plot max slopes

In [None]:
if debug:
    edge_colors = []
    for e in G.es:
        if e["max_slope"] < 4:
            edge_colors.append("green")
        elif e["max_slope"] < 6:
            edge_colors.append("orange")
        else:
            edge_colors.append("red")

    fig = plot_check(
        G,
        nodes_id,
        nodes_coords,
        vertex_size=get_vertex_size_constant(G.vcount()),
        edge_color=edge_colors,
    )
    plt.text(0, 0.04, "Max slopes highlighted")
    plt.tight_layout()
    fig.savefig(PATH["plot"] + "maxslopes")

## Save preprocessed network data

In [None]:
G.summary()

In [None]:
with lzma.open(PATH["data_out"] + "network_preprocessed.xz", "wb") as f:
    pickle.dump(G, f)
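
To check that the saved file can be read back, a round trip along the following lines may help. This is a minimal sketch using only the standard library; the `payload` dictionary is a placeholder standing in for `G`, and the temporary directory replaces `PATH["data_out"]`.

```python
import lzma
import os
import pickle
import tempfile

# Placeholder object standing in for the preprocessed graph G
payload = {"edges": [(0, 1), (1, 2)], "poi_diversity": [3, 0]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "network_preprocessed.xz")
    # Write the object as an LZMA-compressed pickle
    with lzma.open(path, "wb") as f:
        pickle.dump(payload, f)
    # Read it back and compare with the original
    with lzma.open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == payload)
```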