# NETWORKS TRANSFORMATION

This notebook contains all the functions needed to perform various trasformations to the networks of the tracing usecase, and prepare the data for harmonization.

Datasets Needed:

- Hydro-network dataset
- Sewer network dataset / Discharge Points dataset

## Expected Outputs

- Connection nodes
- Discharge points
- Start and end nodes for water
- Water dataset with start and end ID
- Split water dataset with new start and end ids
- Fully connected water dataset

In [1]:
import os
import sys
path = os.path.dirname(os.path.abspath(''))
os.chdir(path)
print(path)
sys.path.insert(0, path)

c:\Workdir\Develop\repository\go-peg


In [2]:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, LineString, MultiPoint, MultiLineString
from shapely import wkt
from shapely.ops import nearest_points
import shapely.wkt
import numpy as np

import random


import warnings
from shapely.errors import ShapelyDeprecationWarning
warnings.filterwarnings("ignore", category=ShapelyDeprecationWarning)
pd.options.mode.chained_assignment = None  # default='warn'


from src.config import config




# 1. Prepare Water Network
A water network is received as an edge only network wih no nodes. Here, we generate hydro-nodes from the begining and end points of a linestring geometry, and assign them unique ids that can then be added to the water segments as begin and end points.

The following steps are performed:

## 1.1. CRS
Assign the coordinate reference system as a global variable to use through out the application.

In [3]:
PROJ_CRS = "EPSG:31370"

## 1.2. Load Water Data
Load the data into a dataframe. Various formats can be loaded onto a dataframe in Geopandas. Here, both shapefiles and GML data are used.

Set the crs to the project crs. 

In [4]:
def load_data(path, PROJ_CRS):
    """
    Loads the data from the given path,
    and prints the shape and crs of the data.
    """
    data = gpd.read_file(path)
    print(data.shape)
    #print("Original crs:", data.crs)
    data = data.to_crs(PROJ_CRS)
    print("Project crs:", data.crs)
    data = data.drop_duplicates(subset=["geometry"]).reset_index(drop=True)
    return data


path = config.data_src / "flanders_hydro_network/Wlas.shp"

water_data = load_data(path, PROJ_CRS)

(63767, 19)
Project crs: EPSG:31370


In [5]:
water_data.columns

Index(['OIDN', 'UIDN', 'VHAS', 'VHAG', 'NAAM', 'REGCODE', 'REGCODE1', 'BEHEER',
       'CATC', 'LBLCATC', 'BEKNR', 'BEKNAAM', 'STRMGEB', 'GEO', 'LBLGEO',
       'VHAZONENR', 'WTRLICHC', 'LENGTE', 'geometry'],
      dtype='object')

## 1.3. Turn multiline water network into single line water network

The water segments geometries can sometimes be stored as a multilinestring. This means that the geometries are represented by nested lists and this can make programmatically manipulating them difficult. Therefore multiline water networks are converted into single line water networks by splitting the linestrings into individual linestrings. This is done by 'flattening' the nested list that makes up a multilinestring structure.

This ensures we can extract begin and end points of a water segment.

In [8]:
# Check for multiline strings in a dataset
def check_multiline(df):
    """This function checks for multiline strings
    from the geometry column in a given dataset"""
    lst = df["geometry"].to_list()
    multiline_count = 0
    for item in lst:
        if isinstance(item, MultiLineString):
            multiline_count += 1
    print("MultiLinesStrings:", multiline_count)


# filter out multilinestring dataset
def multiline_to_linestring(df):
    # filter out multilinestring dataset
    multiline_df = df[df["geometry"].apply(lambda x: isinstance(x, MultiLineString))]
    linestrings_df = df[df.geom_type == "LineString"]
    if len(linestrings_df) == len(df):
        print("No multiline strings found")
        return df

    else:
        print("Checking for multiline strings...")
        check_multiline(df)
        # turn multilinestrings into linestrings
        linestrings = []
        for idx, row in multiline_df.iterrows():
            inlines = row.geometry
            outcoords = [list(item.coords) for item in inlines]
            outline = shapely.geometry.LineString(
                [i for sublist in outcoords for i in sublist]
            )
            # outline_geom = shapely.wkt.dumps(outline)
            linestrings.append(outline)

        # add  linestrings to dataframe and drop original geom column
        multiline_df["exploded"] = linestrings
        multiline_df = (
            multiline_df.drop(["geometry"], axis=1)
            .rename(columns={"exploded": "geometry"})
            .reset_index(drop=True)
        )
        multiline_gdf = gpd.GeoDataFrame(
            multiline_df, geometry="geometry", crs=PROJ_CRS
        )

        gdf = linestrings_df.append(multiline_gdf).reset_index(drop=True)
        print("Checking for multiline strings after...")
        check_multiline(gdf)

    return gdf

In [9]:
water_data = multiline_to_linestring(water_data)

Checking for multiline strings...
MultiLinesStrings: 3


  gdf = linestrings_df.append(multiline_gdf).reset_index(drop=True)


Checking for multiline strings after...
MultiLinesStrings: 0


![Alt text](../documentation/output_images/1.%20water.PNG)

## 1.4. Generate begin and end nodes

Get begin and end point geometries by extracting the first and the last point geometries of a linestring.

In [10]:
def add_beginpoints(df):
    startnodes_gdf = df.copy()
    lst = startnodes_gdf["geometry"].to_list()
    beginpoints = []
    for item in lst:
        first = Point(item.coords[0])
        first_precise = shapely.wkt.dumps(first)
        beginpoints.append(first_precise)

    startnodes_gdf["start_point"] = [wkt.loads(g) for g in beginpoints]
    startnodes_gdf = startnodes_gdf.drop(["geometry"], axis=1).rename(
        columns={"start_point": "geometry"}
    )

    startnodes_gdf = gpd.GeoDataFrame(
        startnodes_gdf, geometry=startnodes_gdf["geometry"], crs=PROJ_CRS
    )  # .drop(columns=[col])
    return startnodes_gdf


def add_endpoints(df):
    endnodes_gdf = df.copy()
    lst = endnodes_gdf["geometry"].to_list()
    endpoints = []
    for item in lst:
        last = Point(item.coords[-1])
        last_precise = shapely.wkt.dumps(last)
        endpoints.append(last_precise)

    endnodes_gdf["end_point"] = [wkt.loads(g) for g in endpoints]
    endnodes_gdf = endnodes_gdf.drop(["geometry"], axis=1).rename(
        columns={"end_point": "geometry"}
    )

    endnodes_gdf = gpd.GeoDataFrame(
        endnodes_gdf, geometry=endnodes_gdf["geometry"], crs=PROJ_CRS
    )  # .drop(columns=[col])
    return endnodes_gdf

In [11]:
startnodes_gdf = add_beginpoints(water_data)
endnodes_gdf = add_endpoints(water_data)

#### Note
Assert statements are used to test if the results generated are the expected results

In [12]:
assert startnodes_gdf.shape == endnodes_gdf.shape

## 1.5. Document the nodes

After the nodes have been created, perform spatial join the startnodes and endnodes dataframes to create one nodes geometry.

These nodes have a sequentially generated id, with a chosen prefix to make it a unique node identifier.


In [32]:
def get_nodes(id_col, region):
    nodes_geom = pd.merge(
        startnodes_gdf[[id_col, "geometry"]],
        endnodes_gdf[[id_col, "geometry"]],
        on="geometry",
        how="outer",
    ).reset_index(drop=True)
    unique_id_df = (
        nodes_geom[["geometry"]].drop_duplicates().reset_index().drop(columns=["index"])
    )
    assert len(unique_id_df) == nodes_geom.geometry.nunique()

    unique_id_df["New_ID"] = range(1, len(unique_id_df) + 1)
    unique_id_df["node_id"] = (region + '_HN') + unique_id_df["New_ID"].astype(str)
    gdf = gpd.GeoDataFrame(
        unique_id_df, geometry=unique_id_df["geometry"], crs=PROJ_CRS
    ).drop(columns=["New_ID"])
    return gdf

In [33]:
water_nodes_df = get_nodes("VHAS", "VL")
assert len(water_nodes_df) == water_nodes_df.geometry.nunique()

In [34]:
water_nodes_df.sample(5)

Unnamed: 0,geometry,node_id
14917,POINT (69846.459 210189.732),VL_HN14918
9157,POINT (182189.521 163366.357),VL_HN9158
43449,POINT (202949.251 183295.203),VL_HN43450
6438,POINT (187227.271 182974.495),VL_HN6439
4758,POINT (168502.455 178367.606),VL_HN4759


In [35]:
# water_nodes_df.to_file(r"data_transform\\vl_water_nodes.shp")

## 1.6. Add the nodes to water segments, and create start and end id columns

Using the sjoin method, map the nodes onto the linestrings to identify water segment start nodes and end nodes. Label nodes as either start_id or end_id in the water dataframe.

In [36]:
def add_ids_to_edges():
    # Label nodes as either start_id or end_id
    startnodes_merged = (
        gpd.sjoin(startnodes_gdf, water_nodes_df, how="left")
        .rename(columns={"node_id": "start_ID"})
        .drop("index_right", axis=1))
    endnodes_merged = (
        gpd.sjoin(endnodes_gdf, water_nodes_df, how="left")
        .rename(columns={"node_id": "end_ID"})
        .drop("index_right", axis=1))

    nodes_geom = pd.merge(startnodes_merged, endnodes_merged, on="VHAS")

    nodes = nodes_geom[["VHAS", "start_ID", "end_ID"]]

    water_edges_nodes = pd.merge(
        water_data, nodes, left_on="VHAS", right_on="VHAS")  # .drop('id', axis=1)
    return water_edges_nodes


water_final = add_ids_to_edges()
assert water_final.VHAS.nunique() == water_final.geometry.nunique()

![Alt text](../documentation/output_images/2.water_nodes.PNG)

In [37]:
# water_final.to_file(r"data_transform\\vl_water_edges.shp")
#![2.water_nodes.PNG](attachment:2.water_nodes.PNG)

# 2. Prepare Sewer Network

Load the sewer network edges and nodes files.
If there is no sewer network, then load discharge points.

In this example dataset, the sewer network dataset consists of both nodes and edges.

Some networks will contain only nodes(discharge points). When only the discharge points are available, these will be used to get the connection points on the waer network as demonstrated  below.


## 2.1. Load sewer data

In [215]:
# path = data_src / "flanders_sewernetwork/Streng.shp"
sewer_edges = load_data(config.data_src / "flanders_sewernetwork/Streng.shp", PROJ_CRS)

# path = data_src / "flanders_sewernetwork/Hydpnt.shp"
hpoint_data = load_data(config.data_src / "flanders_sewernetwork/Hydpnt.shp", PROJ_CRS)

# path = data_src / "flanders_sewernetwork/Koppnt.shp"
koppnt_data = load_data(config.data_src / "flanders_sewernetwork/Koppnt.shp", PROJ_CRS)

(327212, 28)
Project crs: EPSG:31370
(36190, 22)
Project crs: EPSG:31370
(336225, 5)
Project crs: EPSG:31370


In [39]:
# sewer_node_id: dcpCode
# water_node_id: node_id
# region: WAL

### 2.1.1.  Merge the nodes datasets

This dataset comes with two node data. To work with properties from both datasets, merge koppnt_data and hpoint_data dataframes.

In [325]:
joined_sewer_nodes = gpd.overlay(koppnt_data, hpoint_data, how='union', keep_geom_type=False, make_valid=False)

In [348]:
drop_cols = ['OIDN_1', 'UIDN_1', 'OIDN_2', 'UIDN_2']
all_sewer_nodes = joined_sewer_nodes.drop(drop_cols, axis=1)

#Copy values from similar columns in the joined datasets
all_sewer_nodes.loc[all_sewer_nodes['NRKPNT'].isnull(), 'NRKPNT'] = all_sewer_nodes['CODEKOPPNT']
all_sewer_nodes.loc[all_sewer_nodes['RWZI_1'].isnull(), 'RWZI_1'] = all_sewer_nodes['RWZI_2']

In [532]:
all_sewer_nodes = all_sewer_nodes.convert_dtypes()

In [350]:
all_sewer_nodes.head(3)

Unnamed: 0,NRKPNT,RWZI_1,NRHPNT,TYPE,LBLTYPE,STATUS,CODEUITL,UITLWAT,LBLUITLWAT,VHAS,NAAMWTL,CODEKOPPNT,VRSTLLNG,LBLVRSTLNG,STARTDATUM,STOPDATUM,RENDATUM,GUPPROJ,NISCODE,GEMEENTE,RWZI_2,geometry
0,212894933872_2,Genk,16632.0,OVST,Overstort,Actief,,n.v.t.,Niet van toepassing,0.0,,212894933872_2,1.0,Overstort,2004-04-30,9999-01-01,9999-01-01,,71016,Genk,Genk,POINT (229937.880 184382.900)
1,212895090285_1,Genk,16571.0,OVST,Overstort,Actief,,n.v.t.,Niet van toepassing,0.0,,212895090285_1,1.0,Overstort,1970-01-01,9999-01-01,2021-12-31,,71016,Genk,Genk,POINT (229008.380 185925.343)
2,212895093541_1,Genk,16569.0,OVST,Overstort,Actief,,n.v.t.,Niet van toepassing,0.0,,212895093541_1,1.0,Overstort,1970-01-01,9999-01-01,9999-01-01,,71016,Genk,Genk,POINT (229034.099 185951.002)


In [533]:
all_sewer_nodes.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 366664 entries, 0 to 366663
Data columns (total 22 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   NRKPNT      366664 non-null  string  
 1   RWZI_1      366664 non-null  string  
 2   NRHPNT      36098 non-null   Int64   
 3   TYPE        36098 non-null   string  
 4   LBLTYPE     36098 non-null   string  
 5   STATUS      36098 non-null   string  
 6   CODEUITL    22191 non-null   string  
 7   UITLWAT     36098 non-null   string  
 8   LBLUITLWAT  36098 non-null   string  
 9   VHAS        36098 non-null   Int64   
 10  NAAMWTL     18540 non-null   string  
 11  CODEKOPPNT  36098 non-null   string  
 12  VRSTLLNG    36098 non-null   Int64   
 13  LBLVRSTLNG  36098 non-null   string  
 14  STARTDATUM  36098 non-null   string  
 15  STOPDATUM   36098 non-null   string  
 16  RENDATUM    36098 non-null   string  
 17  GUPPROJ     0 non-null       Int64   
 18  NISCODE     3609

In [534]:
# all_sewer_nodes.to_file(r'C:\Workdir\Develop\test_data\all_sewer_nodes.shp')
# all_sewer_nodes.to_file(config.data_dest / "VL_sewernodes.shp")

  pd.Int64Index,


### Create connection lines between the sewer points and hydraulic points

In [353]:
def make_intersection_lines(df, from_point, to_point):
    lines = []
    for index, row in df.iterrows():
        p_1 = Point(row[from_point])
        p_2 = Point(row[to_point])
        intersect = LineString([p_1, p_2])
        # linestring = loads(intersect)
        lines.append(intersect)
    return lines

In [354]:
merged_sewer_nodes = pd.merge(
    koppnt_data, hpoint_data, left_on="NRKPNT", right_on="CODEKOPPNT", how="left"
)

connection_df = merged_sewer_nodes[merged_sewer_nodes['geometry_y'].notna()]

connection_df['connection_lines'] = make_intersection_lines(connection_df, 'geometry_x', 'geometry_y')


connection_lines_df = gpd.GeoDataFrame(
                (connection_df[['CODEKOPPNT', 'LBLTYPE', 'NAAMWTL', 'LBLUITLWAT', 'LBLVRSTLNG', 'RWZI_y', 'connection_lines']].rename(
                        columns={"connection_lines": "geometry"}
                    )
                ),
                geometry="geometry",
                crs=PROJ_CRS,
            ).reset_index(drop=True)

connection_lines_df["New_ID"] = range(1, len(connection_lines_df) + 1)
connection_lines_df["newID"] = 'CONN_' + connection_lines_df["New_ID"].astype(str)

connection_lines_df = connection_lines_df.drop('New_ID', axis=1)

connection_lines_df['fictitious'] = 'true'

In [356]:
connection_lines_df.head(3)

Unnamed: 0,CODEKOPPNT,LBLTYPE,NAAMWTL,LBLUITLWAT,LBLVRSTLNG,RWZI_y,geometry,newID,fictitious
0,12678833639_1,Uitlaat,KATTEBEEK,Regenwater,Regenwater,Poperinge,"LINESTRING (27833.780 168369.750, 27834.940 16...",CONN_1,True
1,12762492806_1,Uitlaat,STEENVOORDEBEEK,Regenwater,Regenwater,Watou,"LINESTRING (26427.471 173001.841, 26423.944 17...",CONN_2,True
2,12763278988_1,Overstort,,Niet van toepassing,Overstort,Watou,"LINESTRING (26288.000 173798.990, 26290.655 17...",CONN_3,True


In [535]:
# connection_lines_df.to_file(r'C:\Workdir\Develop\test_data\connection_lines_df.shp')

  pd.Int64Index,


![Alt text](../documentation/output_images/3.water_sewer.PNG)

### Disconnected Hpoints

Some of the Hpoints in the network have no connection to any point on the sewer network

- In this section decide how to connect the nodes to the external nodes of the sewer using a line, and add these nodes to sewer_nodes dataset.
- This should be easy because the connection point identifier is the same but the geometries are different.

In [359]:
merged_sewer_nodes2 = pd.merge(
    koppnt_data, hpoint_data, left_on="NRKPNT", right_on="CODEKOPPNT", how="outer"
)

values = merged_sewer_nodes2.loc[merged_sewer_nodes2['NRKPNT'].isnull(), 'NRHPNT'].tolist()

unconnected = hpoint_data[hpoint_data['NRHPNT'].isin(values)].reset_index(drop=True)

## 2.2. Expose external nodes

Extract the external nodes from the sewer network. External nodes refers to the nodes that have no start point, indicating that they empty into the river network. Here, using the attributes 'BEGINKPNT' and 'EINDKPNT' which are the node codes, we can find the external nodes.

In [375]:
def find_external_nodes(df, begin_col, end_col):

    """This function extracts the endpoints of a sewer segment
    that are not beginpoints of another sewer segment"""

    beginpoints = df[begin_col].to_list()
    endpoints = df[end_col].to_list()
    beginpoints_set = set(beginpoints)
    endpoints_set = set(endpoints)
    external_nodes = list(endpoints_set - beginpoints_set)
    return external_nodes

In [399]:
external_nodes = find_external_nodes(sewer_edges, "BEGINKPNT", "EINDKPNT")

ext_nodes_df = (
    all_sewer_nodes.query("CODEKOPPNT in @external_nodes") #hpoint
    .query("VHAS != 0")
    .drop_duplicates(subset="CODEKOPPNT", keep='last')
    .drop(columns=["NRKPNT"])
    .drop_duplicates(subset="geometry")
)

In [400]:
ext_nodes_df.shape

(14471, 21)

In [408]:
ext_nodes_df.head(3)

Unnamed: 0,RWZI_1,NRHPNT,TYPE,LBLTYPE,STATUS,CODEUITL,UITLWAT,LBLUITLWAT,VHAS,NAAMWTL,CODEKOPPNT,VRSTLLNG,LBLVRSTLNG,STARTDATUM,STOPDATUM,RENDATUM,GUPPROJ,NISCODE,GEMEENTE,RWZI_2,geometry
8,Houthalen - Centrum,11042.0,UITL,Uitlaat,Actief,72039_0089,RWA,Regenwater,6003668.0,GROENSTRAATBEEK,212901908802_1,808.0,Regenwater,2004-04-08,9999-01-01,9999-01-01,,72039,Houthalen-Helchteren,Houthalen - Centrum,POINT (220980.710 191082.570)
12,Houthalen - Centrum,11012.0,UITL,Uitlaat,Actief,72039_0069,RWA,Regenwater,6003650.0,RODEBEEK,212910331114_1,808.0,Regenwater,2001-08-22,9999-01-01,9999-01-01,,72039,Houthalen-Helchteren,Houthalen - Centrum,POINT (221311.620 190314.250)
30,Bilzen,9040.0,UITL,Uitlaat,Actief,73006_0202,RWA,Regenwater,6001637.0,ECHELWATER,213706570888_1,808.0,Regenwater,2005-02-25,9999-01-01,9999-01-01,,73006,Bilzen,Bilzen,POINT (230508.570 176788.970)


In [401]:
assert ext_nodes_df.geometry.nunique() == ext_nodes_df.CODEKOPPNT.nunique()

ext_nodes_df.shape

(14471, 21)

In [45]:
#![3.water_sewer.PNG](attachment:3.water_sewer.PNG)

## 2.3 Find Connection Nodes

Connection nodes are the points on a water network where a sewer network 'connects' to, or in the real world, where the sewer empties into a water network.

Use a custom function to identify a connection point by projecting to the nearest point on a river from an external node.

For sewer networks made up of just discharge points, these are used in place of external nodes.

The expected output is a dataframe with sewer nodes projected onto water segments.

In [382]:
def get_nearest_point(df, line_col, point_col):
    """
    For each point in points_df, find the nearest point in lines_df.
    """
    point_geoms = []
    for idx, row in df.iterrows():
        destinations = MultiPoint(np.array(row[line_col].coords))
        # print(destinations[0])
        # destinations = MultiPoint(row[line_col].coords)  # geometry_y
        nearest_geoms = nearest_points(row[point_col], destinations)  # geometry_x
        try:
            for coord in destinations:
                # print(coord)
                if coord == nearest_geoms[1]:
                    point_geoms.append(coord)
        except ValueError:
            print("No nearest point found for {}".format(row.CODEKOPPNT))
    return point_geoms

In [383]:
assert len(water_final) == water_final.VHAS.nunique()

In [410]:
water_final_cols = ["VHAS", "geometry"]
ext_nodes_cols = ["NRHPNT", "CODEKOPPNT", "VHAS", "geometry"]
sewer_water_df_full = (
    ext_nodes_df[ext_nodes_cols]
    .merge(water_final[water_final_cols], on="VHAS", how="left")
    .drop_duplicates(subset="geometry_x", keep="first")
    .query("geometry_y.notnull()")
    .assign(new_points=lambda x: get_nearest_point(x, "geometry_y", "geometry_x"))
    )
print(sewer_water_df_full.shape)

sewer_water_df = gpd.GeoDataFrame(
    sewer_water_df_full, geometry="new_points", crs=PROJ_CRS
    ).drop_duplicates(subset="new_points")

conn_node_cols = ["NRHPNT", "CODEKOPPNT", "VHAS", "new_points"]
water_cols = ["VHAS", "CODEKOPPNT", "geometry_y"]
connection_nodes_df = (
    sewer_water_df[conn_node_cols]
    .rename(columns={"new_points": "geometry"})
    .reset_index(drop=True)
)

connection_nodes_gdf = gpd.GeoDataFrame(
    connection_nodes_df, geometry="geometry", crs=PROJ_CRS
)
print("Connection_nodes_df: ", connection_nodes_gdf.shape)

water_df = (
    sewer_water_df[water_cols]
    .rename(columns={"geometry_y": "geometry"})
    .reset_index(drop=True)
)
water_gdf = gpd.GeoDataFrame(water_df, geometry="geometry", crs=PROJ_CRS)
print("Water_df: ", water_gdf.shape)

(13936, 6)
Connection_nodes_df:  (11771, 4)
Water_df:  (11771, 3)


In [422]:
print(sewer_water_df.shape)
sewer_water_df.head(2)

(11771, 6)


Unnamed: 0,NRHPNT,CODEKOPPNT,VHAS,geometry_x,geometry_y,new_points
0,11042.0,212901908802_1,6003668.0,POINT (220980.710 191082.570),"LINESTRING (221286.459 191222.609, 221265.412 ...",POINT (220979.395 191072.247)
1,11012.0,212910331114_1,6003650.0,POINT (221311.620 190314.250),"LINESTRING (221305.233 190156.962, 221303.686 ...",POINT (221305.233 190156.962)


In [2]:
# ![4.connection_nodes.PNG](attachment:4.connection_nodes.PNG)

## 2.4 Join the sewer network to the water network

This is done by transforming the linestring and point geometries into coordinates, and using these coordinates to identify split points on a water segment, by extracting the coordinates on a linestring that correspond to the connection nodes coordinates.

In [426]:
def get_point_coords(gdf):

    """Returns coordinates as tuples of coordinates"""

    return gdf.geometry.apply(lambda geom: (geom.x, geom.y))


def get_line_coords(line):

    """Returns a list of tuples of coordinates"""

    coords_list = []
    multi_points = MultiPoint(np.array(line.coords))

    multi_points_list = [shapely.wkt.dumps(g) for g in multi_points]
    multi_points_geoms = [shapely.wkt.loads(i) for i in multi_points_list]
    for i in multi_points_geoms:
        long, lat = i.x, i.y
        coords_list.append((long, lat))

    return coords_list

In [427]:
# print(connection_nodes_gdf.shape)
# print(connection_nodes_gdf.geometry.nunique())
nodes_gdf = connection_nodes_gdf.copy()
nodes_gdf["coords"] = get_point_coords(nodes_gdf)

water_gdf["coords"] = water_gdf.apply(lambda row: get_line_coords(row.geometry), axis=1)
print(water_gdf.shape)

(11771, 4)


![Alt text](../documentation/output_images/4.connection_nodes.PNG)

## 2.5. Split function

The split function splits the water segements where the sewer empties into the river. One segment can have several splits. All the split segments, using the unique identifier of the original segment are added onto a split segment dataframe.

In [475]:
def get_line_segments(l, points_list):

    idx_list = [
        i for i, item in enumerate(l) if item in points_list
    ]  # compares the two lists and returns the indexes of occurence

    p = [l[i] for i in idx_list]  # get correct order of points list on the line

    super_list = []

    start_idx = 0

    # print("Index list: ", idx_list)
    if len(idx_list) == 1 and (
        p[0] == l[0] or p[0] == l[-1]
    ):  #      (i == 0 or i == len(l)-1) and len(idx_list) == 1:
        # print("One split point, at first or last index")
        line_segment = LineString(l)
        super_list.append(line_segment)

    elif len(idx_list) == 2 and (p[0] == l[0] or p[1] == l[-1]):
        # print("Two split points, at first and last index")
        line_segment = LineString(l)
        super_list.append(line_segment)

    else:
        # import pdb; pdb.set_trace()
        for i in idx_list:
            # In the case of the first coordinates of a line being a split point but there are other split points
            if i == 0 and len(idx_list) > 1:
                index_list = len(idx_list)
                # print(f"First index is a split point, with {index_list} split points")
                continue

            else:
                # print("Many split points")
                stop_idx = i + 1  # grab list elements until index i
                # print(f"stop index is {stop_idx}")
                line_list = l[start_idx:stop_idx]
                line_segment = LineString(line_list)
                super_list.append(line_segment)
                start_idx = (
                    i  # reset the start index to the number of the prevous stop index
                )

                # super list still has one more segment to add
                if len(super_list) == len(idx_list):
                    last_segment = l[stop_idx - 1 : len(l)]
                    if stop_idx == len(l):
                        # print("Split point at end of list") # stop index goes beyond the line list
                        break
                    # n = len(l) - len(super_list)
                    # last_segment = l[stop_idx-1:len(l)] # Grab the last segments of the list from the prevous stop_idx-1, to the end of the lin len(l)
                    else:
                        # print("Split point at end of list")
                        last_segment_geom = LineString(last_segment)
                        super_list.append(last_segment_geom)
    return super_list


# pass a dataframe to the function
def split_lines(water_gdf, nodes_gdf, unique_id):

    water_no_duplicates = water_gdf.drop_duplicates(subset=unique_id)

    groups = nodes_gdf.groupby(unique_id)

    codes_list = nodes_gdf[unique_id].to_list()
    
    unique_code_list = list(set(codes_list))

    all_segments = []
    ids = []
    # counter = 0

    for num, i in enumerate(unique_code_list):
        points_list = groups.get_group(i).coords.to_list()
        # print("Points list: ", points_list)
        line = water_no_duplicates[water_no_duplicates[unique_id] == i]["coords"][:1]
        # indx = water_no_duplicates[water_no_duplicates[unique_id] == i].index [0]
        points_list = groups.get_group(i).coords.to_list()

        line_segments = get_line_segments(*line, points_list)
        
        num_segments = len(line_segments)

        all_segments.extend(line_segments)
        # ids.append(indx)
        num_unique_ids = [i] * num_segments
        # assert len(flat_list) == len(water_no_duplicates)
        ids.extend(num_unique_ids)

    gdf_segments = gpd.GeoDataFrame(
        list(range(len(all_segments))), geometry=all_segments, crs=PROJ_CRS
    )
    gdf_segments.columns = ["index", "geometry"]
    gdf_segments[unique_id] = ids
    gdf_segments = gdf_segments.set_index("index")
    return gdf_segments

In [476]:
import time

initialTime = time.time()
splitlines_df = split_lines(water_gdf, nodes_gdf, "VHAS")
finishTime = time.time()
# print(ids)
splitlines_df['VHAS'] = splitlines_df['VHAS'].astype('int')

print(splitlines_df.shape)
print(splitlines_df.crs)
print(f"Time taken: {finishTime - initialTime}")
print("********************************************************")

(15246, 2)
EPSG:31370
Time taken: 25.956522464752197
********************************************************


In [481]:
splitlines_df.head()

Unnamed: 0_level_0,geometry,VHAS
index,Unnamed: 1_level_1,Unnamed: 2_level_1
0,"LINESTRING (138979.165 172375.119, 138977.345 ...",32768
1,"LINESTRING (138977.109 172383.406, 138969.547 ...",32768
2,"LINESTRING (139426.094 170006.797, 139438.876 ...",32776
3,"LINESTRING (139697.126 170395.470, 139697.777 ...",32776
4,"LINESTRING (138841.797 171003.798, 138847.552 ...",32784


In [3]:
#![5.split_segments2.PNG](attachment:5.split_segments2.PNG)

![Alt text](../documentation/output_images/5.split_segments2.PNG)

# 3. Creating new network

## 3.1 Water Nodes

###  3.1.1. Final water nodes

Merge the original water nodes to the new water nodes which are the split points used in the previous operation. These nodes will be added back to the final water edges

In [484]:
water_nodes_df["source"] = "water_node"

connection_nodes = (
    connection_nodes_gdf[["CODEKOPPNT", "geometry"]]
    .rename(columns={"CODEKOPPNT": "node_id"})
    .assign(source="connection_node")
)

final_nodes_combined = (
    pd.concat([water_nodes_df, connection_nodes])
    .drop_duplicates(subset="geometry", keep="first")
    .reset_index(drop=True)
)

### 3.1.2. Final node ids

The original water nodes have sequentially generated ids. After these are merged to the connection nodes, new ids for the connection nodes are generated sequentially also. A column called sewernode_id is retained to indicate the water nodes that have a corresponding sewer node in the dataset.

In [486]:
def add_sewernode_id(row):
    if row["source"] == "connection_node":
        return row["node_id"]
    else:
        return None


def get_water_nodes(df, prefix):
    df["sewernode_id"] = df.apply(add_sewernode_id, axis=1)

    conn_df = df[df["source"] == "connection_node"]
    water_nodes = df.loc[df["source"] == "water_node"]

    nodes_list = water_nodes["node_id"].to_list()
    start_num = max([int(i[len(prefix) + 3:]) for i in nodes_list])

    diff = len(df.index) - len(water_nodes_df.index)

    conn_df["node_id"] = range((start_num + 1), (start_num + diff + 1))
    conn_df["node_id"] = (prefix + '_HN') + conn_df["node_id"].astype(str)

    nodes_all = pd.concat([water_nodes, conn_df])
    nodes_all_gdf = gpd.GeoDataFrame(nodes_all, geometry="geometry", crs=PROJ_CRS)

    return nodes_all_gdf

In [487]:
waternodes = get_water_nodes(final_nodes_combined, "VL")

In [489]:
waternodes

Unnamed: 0,geometry,node_id,source,sewernode_id
0,POINT (177317.033 187108.927),VL_HN1,water_node,
1,POINT (175948.922 187590.860),VL_HN2,water_node,
2,POINT (168312.751 188947.734),VL_HN3,water_node,
3,POINT (190287.875 162834.403),VL_HN4,water_node,
4,POINT (177620.500 182754.219),VL_HN5,water_node,
...,...,...,...,...
69925,POINT (115291.160 206522.389),VL_HN69926,connection_node,7030241_1
69926,POINT (113896.571 206452.453),VL_HN69927,connection_node,7156216_1
69927,POINT (222018.811 180938.390),VL_HN69928,connection_node,6002930_1
69928,POINT (221539.327 180803.419),VL_HN69929,connection_node,6002935_1


### 3.1.3. Added properties

To get the final water nodes for a tracing water network, add various sewer node properties to the water nodes.

In [490]:
waternodes_final = (
    waternodes.merge(
        merged_sewer_nodes[["STATUS", "LBLTYPE", "NRKPNT"]],
        left_on="sewernode_id",
        right_on="NRKPNT",
        how="left",
    )
    .drop_duplicates(subset="geometry", keep="first")
    .reset_index(drop=True)
)

In [491]:
waternodes_final

Unnamed: 0,geometry,node_id,source,sewernode_id,STATUS,LBLTYPE,NRKPNT
0,POINT (177317.033 187108.927),VL_HN1,water_node,,,,
1,POINT (175948.922 187590.860),VL_HN2,water_node,,,,
2,POINT (168312.751 188947.734),VL_HN3,water_node,,,,
3,POINT (190287.875 162834.403),VL_HN4,water_node,,,,
4,POINT (177620.500 182754.219),VL_HN5,water_node,,,,
...,...,...,...,...,...,...,...
69925,POINT (115291.160 206522.389),VL_HN69926,connection_node,7030241_1,Actief,Uitlaat,7030241_1
69926,POINT (113896.571 206452.453),VL_HN69927,connection_node,7156216_1,Gepland,Uitlaat,7156216_1
69927,POINT (222018.811 180938.390),VL_HN69928,connection_node,6002930_1,Actief,Uitlaat,6002930_1
69928,POINT (221539.327 180803.419),VL_HN69929,connection_node,6002935_1,Actief,Uitlaat,6002935_1


In [530]:
# with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
#    print(merged_nodes)

# merged_nodes.to_file(r"data_transform\vl_nodes_combined_V02.shp")
waternodes_final.to_file(config.data_dest / "vl_nodes_combined.shp")

  pd.Int64Index,
  waternodes_final.to_file(config.data_dest / "vl_nodes_combined.shp")


## 3.2. Water Edges

### 3.2.1. Add nodes to water segments



In [493]:
def line_segments_start_end_ids(splitlines_df, all_nodes_gdf, node_id, PROJ_CRS):
    """Returns the start and end ids of the line segments for a given node id"""
    splitlines_df["coords"] = splitlines_df.apply(
        lambda row: get_line_coords(row.geometry), axis=1
    )  
    all_nodes_gdf["coords"] = get_point_coords(all_nodes_gdf)
    # join linestrings to the nearest node, in this case the node attached to the line
    joined_lines_nodes = gpd.sjoin_nearest(
        splitlines_df, all_nodes_gdf, how="left"
    ).reset_index()
    # identify the nodes that corresponding to the line start and end points
    idx_start = []
    start_id = []
    idx_end = []
    end_id = []

    for idx, row in joined_lines_nodes.iterrows():
        if row.coords_right == row.coords_left[0]:
            idx_start.append(row["index"])
            start_id.append(row[node_id])
        elif row.coords_right == row.coords_left[-1]:
            idx_end.append(row["index"])
            end_id.append(row[node_id])

    start_id_df = (
        pd.DataFrame({"line_index": idx_start, f"start_{node_id}": start_id})
        .merge(
            joined_lines_nodes[["index", node_id, "VHAS", "geometry"]],
            left_on="line_index",
            right_on="index",
            how="left",
        )
        .drop_duplicates("geometry")
    )

    end_id_df = (
        pd.DataFrame({"line_index": idx_end, f"end_{node_id}": end_id})
        .merge(
            joined_lines_nodes[["index", node_id, "VHAS", "geometry"]],
            left_on="line_index",
            right_on="index",
            how="left",
        )
        .drop_duplicates("geometry")
    )

    merged_start_end_df = (
        pd.merge(start_id_df, end_id_df, on="geometry", how="outer")
        .drop(
            [
                "line_index_x",
                "index_x",
                "node_id_x",
                "index_y",
                "line_index_y",
                "node_id_y",
                "VHAS_y",
            ],
            axis=1,
        )
        .rename(columns={"start_node_id": "start_ID", "end_node_id": "end_ID"})
    )

    return gpd.GeoDataFrame(merged_start_end_df, geometry="geometry", crs=PROJ_CRS)

In [494]:
node_id = "node_id"
splitlines_with_ids = line_segments_start_end_ids(
    splitlines_df, waternodes[["geometry", "node_id"]], node_id, PROJ_CRS
)

In [495]:
splitlines_with_ids.head()

Unnamed: 0,start_ID,VHAS_x,geometry,end_ID
0,VL_HN55954,32768,"LINESTRING (138979.165 172375.119, 138977.345 ...",VL_HN66364
1,VL_HN66364,32768,"LINESTRING (138977.109 172383.406, 138969.547 ...",VL_HN42787
2,VL_HN579,32776,"LINESTRING (139426.094 170006.797, 139438.876 ...",VL_HN67953
3,VL_HN67953,32776,"LINESTRING (139697.126 170395.470, 139697.777 ...",VL_HN580
4,VL_HN22835,32784,"LINESTRING (138841.797 171003.798, 138847.552 ...",VL_HN65880


### 3.3.2. Get water segments Unidue Ids

The new water segments still retain the original unique ids, meaning some of them share an id. A systematic method to assign new ids is applied, by adding a suffix to the original id, indicating the number of times the original water segment was split. By retaining part of the original id, one can quickly identify if the water segment is a split one or an original one.

In [496]:
# get unique line segments ids
def get_unique_ID(df, col):
    """Get unique ID for each new split segment in a dataframe
    Assert that the number of unique IDs is equal to the number of split segments
    """
    # the new split lines need a new unique uniqueID value
    df["num_id"] = df.groupby(col).cumcount() + 1
    df["new_string_id"] = df[col].astype(str) + "_" + df["num_id"].astype(str)

    return df

In [497]:
splitlines_vhas = get_unique_ID(splitlines_with_ids, "VHAS_x")
assert len(splitlines_vhas) == splitlines_vhas.new_string_id.nunique()

In [498]:
splitlines_vhas

Unnamed: 0,start_ID,VHAS_x,geometry,end_ID,num_id,new_string_id
0,VL_HN55954,32768,"LINESTRING (138979.165 172375.119, 138977.345 ...",VL_HN66364,1,32768_1
1,VL_HN66364,32768,"LINESTRING (138977.109 172383.406, 138969.547 ...",VL_HN42787,2,32768_2
2,VL_HN579,32776,"LINESTRING (139426.094 170006.797, 139438.876 ...",VL_HN67953,1,32776_1
3,VL_HN67953,32776,"LINESTRING (139697.126 170395.470, 139697.777 ...",VL_HN580,2,32776_2
4,VL_HN22835,32784,"LINESTRING (138841.797 171003.798, 138847.552 ...",VL_HN65880,1,32784_1
...,...,...,...,...,...,...
15241,VL_HN64821,7012201,"LINESTRING (70895.602 219738.531, 70903.891 21...",VL_HN56242,4,7012201_4
15242,VL_HN34678,32672,"LINESTRING (138704.147 176812.610, 138688.060 ...",VL_HN29687,1,32672_1
15243,VL_HN25141,6029276,"LINESTRING (161243.920 186896.994, 161230.637 ...",VL_HN66851,1,6029276_1
15244,VL_HN66851,6029276,"LINESTRING (159618.409 188060.371, 159396.076 ...",VL_HN66853,2,6029276_2


### 3.2.3. Getting water properties to water segments

Merge the split segments to original water dataframe to get orignal water properties to the split segments before joining them back to the final water dataset

In [499]:
splitlines_final = (
    splitlines_vhas.merge(water_final, left_on="VHAS_x", right_on="VHAS", how="left")
    .drop(["VHAS_x", "VHAS", "num_id", "start_ID_y", "end_ID_y", "geometry_y"], axis=1)
    .rename(
        columns={
            "new_string_id": "VHAS",
            "start_ID_x": "start_ID",
            "end_ID_x": "end_ID",
            "geometry_x": "geometry",
        }
    )
)

In [500]:
splitlines_final.head(2)

Unnamed: 0,start_ID,geometry,end_ID,VHAS,OIDN,UIDN,VHAG,NAAM,REGCODE,REGCODE1,BEHEER,CATC,LBLCATC,BEKNR,BEKNAAM,STRMGEB,GEO,LBLGEO,VHAZONENR,WTRLICHC,LENGTE
0,VL_HN55954,"LINESTRING (138979.165 172375.119, 138977.345 ...",VL_HN66364,32768_1,44659,723201,6200,Zierbeek,B5111,,20001,2,"Geklasseerd, tweede categorie",7,Denderbekken,Schelde,2,< 0.25 m,422,213,866.12
1,VL_HN66364,"LINESTRING (138977.109 172383.406, 138969.547 ...",VL_HN42787,32768_2,44659,723201,6200,Zierbeek,B5111,,20001,2,"Geklasseerd, tweede categorie",7,Denderbekken,Schelde,2,< 0.25 m,422,213,866.12


### 3.2.4. Gather all linestrings into one dataset

With the split segments now joined to the water dataset to get the necessary attributes, we can now combine all the linestrings into one dataset. This is done by dropping all the water sements that were split in the splitting funtion, and merging the splitlines_final dataset.

In [502]:
def merge_segments_to_water(split_segments, split_segments_final, water_df, col):
    
    """This function merges segments to water linestrings"""
    
    # drop the linestrings to be split and merge the df with split lines
    split_segments = split_segments.astype({col: str}, errors="raise")
    split_segments_final = split_segments_final.astype({col: str}, errors="raise")
    water_df = water_df.astype({col: str}, errors="raise")
    print("water_df: ", len(water_df))

    assert (
        split_segments_final[col].nunique()
        == split_segments_final["geometry"].nunique()
    )
    assert water_df[col].nunique() == water_df["geometry"].nunique()

    linestrings_to_drop = list(set(split_segments[col].to_list()))
    print("linestrings_to_drop: ", len(linestrings_to_drop))
    print("split_segments: ", len(split_segments))

    water_df_trimmed = water_df.query(
        col + " not in @linestrings_to_drop"
    )  # .reset_index(drop=True)
    water_df_drop = water_df.query(col + " in @linestrings_to_drop")
    print('water_df_trimmed: ', len(water_df_trimmed))

    # merge the split lines with the original water lines
    merged_df = gpd.GeoDataFrame(
        pd.concat([split_segments_final, water_df_trimmed], ignore_index=True),
        geometry="geometry",
        crs=PROJ_CRS,
    )
    print("merged df: ", len(merged_df))
    
    assert merged_df["geometry"].nunique() == merged_df[col].nunique()
    assert (len(water_df) - len(linestrings_to_drop)) + len(split_segments) == len(merged_df)

    return merged_df

In [503]:
segments_to_water = merge_segments_to_water(
    splitlines_df, splitlines_final, water_final, "VHAS"
)

water_df:  63762
linestrings_to_drop:  6845
split_segments:  15246
water_df_trimmed:  56917
merged df:  72163


In [506]:
split_segments = splitlines_df.astype({"VHAS": str}, errors="raise")
split_segments_final = splitlines_final.astype({"VHAS": str}, errors="raise")
water_df = water_df.astype({"VHAS": str}, errors="raise")
print("water_df: ", len(water_df))

assert (
    split_segments_final["VHAS"].nunique()
    == split_segments_final["geometry"].nunique()
)
assert water_df["VHAS"].nunique() == water_df["geometry"].nunique()

water_df:  11771


In [507]:
linestrings_to_drop = list(set(split_segments['VHAS'].to_list()))
print("linestrings_to_drop: ", len(linestrings_to_drop))
print("split_segments: ", len(split_segments))

water_df_trimmed = water_df.query(
    'VHAS' + " not in @linestrings_to_drop"
)  # .reset_index(drop=True)
water_df_drop = water_df.query('VHAS' + " in @linestrings_to_drop")
print('water_df_trimmed: ', len(water_df_trimmed))

linestrings_to_drop:  6845
split_segments:  15246
water_df_trimmed:  11771


### 3.2.5. Recalculate the length of the linestrings

This is necessary to get the correct length of the linestrings after the split function.

In [514]:
segments_to_water["new_length"] = segments_to_water["geometry"].apply(
    lambda x: x.length
)

In [515]:
segments_to_water

Unnamed: 0,start_ID,geometry,end_ID,VHAS,OIDN,UIDN,VHAG,NAAM,REGCODE,REGCODE1,BEHEER,CATC,LBLCATC,BEKNR,BEKNAAM,STRMGEB,GEO,LBLGEO,VHAZONENR,WTRLICHC,LENGTE,new_length
0,VL_HN55954,"LINESTRING (138979.165 172375.119, 138977.345 ...",VL_HN66364,32768_1,44659,723201,6200,Zierbeek,B5111,,20001,2,"Geklasseerd, tweede categorie",7,Denderbekken,Schelde,2,< 0.25 m,422,213,866.12,8.541216
1,VL_HN66364,"LINESTRING (138977.109 172383.406, 138969.547 ...",VL_HN42787,32768_2,44659,723201,6200,Zierbeek,B5111,,20001,2,"Geklasseerd, tweede categorie",7,Denderbekken,Schelde,2,< 0.25 m,422,213,866.12,857.583556
2,VL_HN579,"LINESTRING (139426.094 170006.797, 139438.876 ...",VL_HN67953,32776_1,44660,664399,6243,Peverstraatbeek,B5114,,20001,2,"Geklasseerd, tweede categorie",7,Denderbekken,Schelde,1,2.5 tot 0.25 m,422,1033,952.75,489.227392
3,VL_HN67953,"LINESTRING (139697.126 170395.470, 139697.777 ...",VL_HN580,32776_2,44660,664399,6243,Peverstraatbeek,B5114,,20001,2,"Geklasseerd, tweede categorie",7,Denderbekken,Schelde,1,2.5 tot 0.25 m,422,1033,952.75,463.520409
4,VL_HN22835,"LINESTRING (138841.797 171003.798, 138847.552 ...",VL_HN65880,32784_1,44567,687269,6258,Zibbeek,B5116,,20001,2,"Geklasseerd, tweede categorie",7,Denderbekken,Schelde,1,2.5 tot 0.25 m,422,1033,781.83,698.498245
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72158,VL_HN59861,"LINESTRING (213005.119 178752.613, 212973.115 ...",VL_HN463,7075990,126450,727571,68394,Caenengracht,,,PARTIC,9,Niet geklasseerd,9,Demerbekken,Schelde,0,10 tot 2.5 m,612,10508,158.53,158.530711
72159,VL_HN59862,"LINESTRING (123105.119 155738.763, 123116.253 ...",VL_HN250,7076007,126451,727572,6060,Riveau d'Onscalle,,,ONBEKEND,9,Niet geklasseerd,7,Denderbekken,Schelde,0,10 tot 2.5 m,400,11021,835.62,835.621730
72160,VL_HN59863,"LINESTRING (124133.064 159975.285, 124110.186 ...",VL_HN324,7076009,126452,727573,6133,Mottingemeersbeek,,,ONBEKEND,9,Niet geklasseerd,7,Denderbekken,Schelde,0,10 tot 2.5 m,401,11021,152.87,152.866326
72161,VL_HN59864,"LINESTRING (198243.188 165368.687, 198243.228 ...",VL_HN59930,7067790,122282,668015,7161,Kleine Gete,B4003,,V0102,1,"Geklasseerd, eerste categorie",9,Demerbekken,Schelde,2,< 0.25 m,621,651,114.46,208.939715


In [520]:
segments_to_water.columns

Index(['start_ID', 'geometry', 'end_ID', 'VHAS', 'OIDN', 'UIDN', 'VHAG',
       'NAAM', 'REGCODE', 'REGCODE1', 'BEHEER', 'CATC', 'LBLCATC', 'BEKNR',
       'BEKNAAM', 'STRMGEB', 'GEO', 'LBLGEO', 'VHAZONENR', 'WTRLICHC',
       'LENGTE', 'new_length'],
      dtype='object')

In [540]:
segments_to_water2 = (segments_to_water.rename
                                        (columns={'LBLCATC':'category',
                                        'EINDKPNT':'end_ID', 
                                        'VHAS': 'line_id',
                                        'LENGTE': 'length',
                                        'BEGINKPNT':'start_ID', 
                                        'STRMGEB':'basin',
                                        'NAAM':'name'
                                        }))

In [541]:
segments_to_water2.head(2)

Unnamed: 0,start_ID,geometry,end_ID,line_id,OIDN,UIDN,VHAG,name,REGCODE,REGCODE1,BEHEER,CATC,category,BEKNR,BEKNAAM,basin,GEO,LBLGEO,VHAZONENR,WTRLICHC,length,new_length
0,VL_HN55954,"LINESTRING (138979.165 172375.119, 138977.345 ...",VL_HN66364,32768_1,44659,723201,6200,Zierbeek,B5111,,20001,2,"Geklasseerd, tweede categorie",7,Denderbekken,Schelde,2,< 0.25 m,422,213,866.12,8.541216
1,VL_HN66364,"LINESTRING (138977.109 172383.406, 138969.547 ...",VL_HN42787,32768_2,44659,723201,6200,Zierbeek,B5111,,20001,2,"Geklasseerd, tweede categorie",7,Denderbekken,Schelde,2,< 0.25 m,422,213,866.12,857.583556


In [529]:
# segments_to_water.to_file(config.data_dest / "vl_water_PROCESSED.shp")

  pd.Int64Index,


## 4. Intermodal network connection object

In [516]:
final_nodes_combined.sample(3)

Unnamed: 0,geometry,node_id,source,sewernode_id
65234,POINT (54690.707 209888.344),7086328_1,connection_node,7086328_1
24782,POINT (55094.816 175140.191),VL_HN24783,water_node,
16209,POINT (89291.624 198201.938),VL_HN16210,water_node,


In [517]:
# waternodes_final
network_connection_object2 = (
    waternodes_final[["node_id", "sewernode_id"]]
    .query("sewernode_id.notnull()")
    .rename(columns={"node_id": "hydronode_id"})
    .reset_index(drop=True)
)

In [518]:
network_connection_object2

Unnamed: 0,hydronode_id,sewernode_id
0,VL_HN60772,212901908802_1
1,VL_HN60773,213707112029_1
2,VL_HN60774,213717048471_1
3,VL_HN60775,213719456039_1
4,VL_HN60776,213719774860_1
...,...,...
9154,VL_HN69926,7030241_1
9155,VL_HN69927,7156216_1
9156,VL_HN69928,6002930_1
9157,VL_HN69929,6002935_1


In [527]:
# network_connection_object2.to_csv(r"data_transform\vl_network_connection_object.csv", index=False)
# network_connection_object2.to_csv(config.data_dest / "vl_network_connection.csv")