# Point Locations of Interest

Locations of interest are any phenomena occurring along the water networks that can affect, or be affected by, pollution.

Each location of interest has a surface geometry, either a point or a polygon.

This notebook develops the methodology for point locations of interest.

In [1]:
import geopandas as gpd
import pandas as pd

from shapely.geometry import Point, LineString, MultiLineString, MultiPoint
from shapely import wkt
from shapely.ops import nearest_points
import shapely.wkt

import random
import folium
import plotly.express as px
import numpy as np 

import warnings
#from shapely.errors import ShapelyDeprecationWarning
#warnings.filterwarnings("ignore", category=ShapelyDeprecationWarning) 

#pd.options.mode.chained_assignment = None  # default='warn'

In [2]:
PROJ_CRS = 'EPSG:31370'

In [3]:
def load_data(path, crs=PROJ_CRS):
    """
    Loads the data from the given path, 
    and prints the shape and crs of the data.
    """
    data = gpd.read_file(path)
    print(data.shape)
    print("Original crs:", data.crs)
    data = data.to_crs(crs)
    print("Project crs:", data.crs)
    return data

In [4]:
pwd

'c:\\workdir\\develop\\gopeg\\preprocessing\\notebooks'

In [5]:
PATH = r"..\data\data_preprocess\flanders_locations\Production and industrial facilities\ProductionInstallation_points.shp"
prod_installations = load_data(PATH)

(1962, 9)
Original crs: epsg:3857
Project crs: EPSG:31370


In [6]:
PATH = r"..\data\data_transform\vl_water_PROCESSED_V2.shp"
water = load_data(PATH)

(71983, 21)
Original crs: epsg:31370
Project crs: epsg:31370


In [7]:
water.geometry.nunique()

71983

In [8]:
# water.geometry.nunique() --> 71995

In [9]:
prod_installations.head(2)

|   | gml_id | identifier | name | localId | namespace | status | type | dist | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Van Looveren Leo | BE.VL.000000416.INSTALLATION | https://data.gpbv.omgeving.vlaanderen.be/id/pr... |  |  |  | POINT (174053.026 229391.163) |
| 1 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Kela | BE.VL.000000132.INSTALLATION | https://data.gpbv.omgeving.vlaanderen.be/id/pr... |  |  |  | POINT (175372.287 230904.830) |


In [10]:
prod_points = prod_installations.drop(['namespace', 'status', 'type', 'dist'], axis=1)
prod_points.head(2)

|   | gml_id | identifier | name | localId | geometry |
|---|---|---|---|---|---|
| 0 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Van Looveren Leo | BE.VL.000000416.INSTALLATION | POINT (174053.026 229391.163) |
| 1 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Kela | BE.VL.000000132.INSTALLATION | POINT (175372.287 230904.830) |


## Working with point locations of interest

For point locations of interest, we project each point onto the nearest water geometry, applying a threshold distance to exclude points that are too far away from it.
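A minimal sketch of this projection step for a single point, using only Shapely: `LineString.project` returns the distance along the line to the point on the line closest to the location, `interpolate` turns that distance back into a point, and a threshold discards locations that are too far away. The function name `snap_to_line`, the line, the points and the 500 m threshold are all hypothetical values chosen for illustration.

```python
from shapely.geometry import LineString, Point

# Hypothetical threshold in metres (EPSG:31370 coordinates are in metres).
MAX_DISTANCE = 500.0


def snap_to_line(point, line, max_distance=MAX_DISTANCE):
    """Hypothetical helper: return (snapped_point, distance_along_line),
    or None if the point is farther than max_distance from the line."""
    if point.distance(line) > max_distance:
        return None  # too far from the water network
    # distance along the line to the point on the line closest to `point`
    along = line.project(point)
    # the corresponding point on the line itself
    snapped = line.interpolate(along)
    return snapped, along


river = LineString([(0, 0), (1000, 0)])  # hypothetical water line
near = Point(300, 40)                    # 40 m from the line
far = Point(300, 900)                    # 900 m from the line

snapped, along = snap_to_line(near, river)
print(snapped.x, snapped.y, along)  # 300.0 0.0 300.0
print(snap_to_line(far, river))     # None
```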

**Load the water data to match each location of interest to its nearest water line and find the corresponding point on the network**

In [11]:
#Check for MultiLineString geometries in a dataset
def check_multiline(df):
    """Count and print the MultiLineString geometries
        in the geometry column of a given dataset."""
    lst = df['geometry'].to_list()
    multiline_count = 0
    for item in lst:
        if isinstance(item, MultiLineString):
            multiline_count += 1
    print("MultiLineStrings:", multiline_count)

In [12]:
check_multiline(water)

MultiLineStrings: 0


In [13]:
water_df = water[['VHAS', 'NAAM', 'start_ID', 'end_ID', 'geometry']]
water_df.crs

#water_df = water_df.to_crs(epsg=31370)

<Derived Projected CRS: EPSG:31370>
Name: BD72 / Belgian Lambert 72
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: Belgium - onshore.
- bounds: (2.5, 49.5, 6.4, 51.51)
Coordinate Operation:
- name: Belgian Lambert 72
- method: Lambert Conic Conformal (2SP)
Datum: Reseau National Belge 1972
- Ellipsoid: International 1924
- Prime Meridian: Greenwich

In [14]:
water_multiline = water_df[water_df['geometry'].apply(lambda x: isinstance(x, MultiLineString))]
water_linestrings = water_df[water_df.geom_type == 'LineString']

water_linestrings.equals(water_df)

True
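The same check can be written with the vectorized `geom_type` attribute instead of a Python loop, and `explode` would split any MultiLineString into its LineString parts should one appear. A sketch on a hypothetical two-feature GeoDataFrame (the `index_parts` argument of `explode` assumes geopandas 0.10 or later):

```python
import geopandas as gpd
from shapely.geometry import LineString, MultiLineString

# Hypothetical example data: one LineString and one two-part MultiLineString.
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[
        LineString([(0, 0), (1, 0)]),
        MultiLineString([[(0, 1), (1, 1)], [(2, 1), (3, 1)]]),
    ],
    crs="EPSG:31370",
)

# Vectorized count of MultiLineString geometries.
n_multi = int((gdf.geom_type == "MultiLineString").sum())
print("MultiLineStrings:", n_multi)  # MultiLineStrings: 1

# Split every multi-part geometry into single-part LineStrings.
exploded = gdf.explode(index_parts=False)
print(len(exploded))  # 3
```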

In [15]:
water_df.geometry.nunique()

71983

In [16]:
gdf_p = prod_points.copy()
gdf_l = water_df.copy()


df_n = (gpd.sjoin_nearest(gdf_p, gdf_l)
            # add the matched water geometry: the point geometry becomes
            # geometry_x and the water line geometry becomes geometry_y
            .merge(gdf_l['geometry'], left_on="index_right", right_index=True)
            .drop(columns=['index_right'])
            .reset_index(drop=True)
            )

# Distance from each location of interest to its nearest water line.
# This distance can be used to filter out locations that are too far from the water.
df_n["distance"] = df_n.apply(lambda r: r["geometry_x"].distance(r["geometry_y"]), axis=1)

assert not df_n['geometry_x'].isnull().any()
assert not df_n['geometry_y'].isnull().any()
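`gpd.sjoin_nearest` can also compute the distance column and apply the threshold itself, through its `distance_col` and `max_distance` parameters (available in geopandas 0.10 and later). With the default inner join, points with no line within `max_distance` are dropped. A sketch on hypothetical data; the 500 m threshold is an assumption, since the notebook does not fix a value.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

crs = "EPSG:31370"
# Hypothetical locations of interest and water lines (coordinates in metres).
points = gpd.GeoDataFrame(
    {"name": ["near", "far"]},
    geometry=[Point(100, 20), Point(100, 2000)],
    crs=crs,
)
lines = gpd.GeoDataFrame(
    {"NAAM": ["river"]},
    geometry=[LineString([(0, 0), (1000, 0)])],
    crs=crs,
)

# Nearest-line join that stores the distance and drops points farther
# than max_distance (a hypothetical 500 m threshold).
joined = gpd.sjoin_nearest(
    points, lines, max_distance=500, distance_col="distance"
)
print(joined[["name", "NAAM", "distance"]])
```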

In [17]:
df_n

|   | gml_id | identifier | name | localId | geometry_x | VHAS | NAAM | start_ID | end_ID | geometry_y | distance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Van Looveren Leo | BE.VL.000000416.INSTALLATION | POINT (174053.026 229391.163) | 7007018_2 | Laboureurloop | VL68352 | VL6847 | LINESTRING (174218.478 229321.030, 174211.910 ... | 134.766839 |
| 1 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Kela | BE.VL.000000132.INSTALLATION | POINT (175372.287 230904.830) | 6801187 | Raamloop | VL368 | VL51695 | LINESTRING (175886.862 230719.803, 175878.810 ... | 438.889939 |
| 2 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Swaegers Slachthuis | BE.VL.000000186.INSTALLATION | POINT (176099.703 231237.668) | 6801187 | Raamloop | VL368 | VL51695 | LINESTRING (175886.862 230719.803, 175878.810 ... | 114.222335 |
| 3 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | BASF Antwerpen_Tensiden | BE.VL.000000039.INSTALLATION | POINT (143177.597 228171.405) | 6033868 | Insteekdok 4 | VL51863 | VL51864 | LINESTRING (143795.000 227895.000, 144421.139 ... | 676.451185 |
| 4 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | BASF Antwerpen_Polyetherolen | BE.VL.000000038.INSTALLATION | POINT (143368.840 228354.153) | 6033868 | Insteekdok 4 | VL51863 | VL51864 | LINESTRING (143795.000 227895.000, 144421.139 ... | 626.445034 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1983 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | DEBAEKE VOEDERS | BE.VL.000002200.INSTALLATION | POINT (36938.689 179169.448) | 6016051_1 | Kannunikbeek | VL15931 | VL67435 | LINESTRING (36581.648 179304.726, 36585.088 17... | 381.808977 |
| 1984 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Slachthuis Heyst | BE.VL.000002203.INSTALLATION | POINT (173287.100 195545.714) | 7027637 |  | VL30494 | VL46086 | LINESTRING (173256.937 195463.724, 173286.200 ... | 63.849691 |
| 1985 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Hoornaert Kristof Zwevegem | BE.VL.000002204.INSTALLATION | POINT (81531.960 166875.680) | 6012250 | Kleine Kasselrijbeek | VL24786 | VL56955 | LINESTRING (81006.285 166563.352, 81022.221 16... | 29.992997 |
| 1986 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Nutrifert | BE.VL.000002207.INSTALLATION | POINT (174447.586 236204.920) | 6027825 | Muntloop | VL57677 | VL34687 | LINESTRING (174270.755 235965.519, 174275.724 ... | 114.964652 |


In [18]:
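# sjoin_nearest returns one row per nearest line when several water lines are
# equidistant from a point (most likely lines meeting at a shared node), so some
# points appear more than once; keep only the first match per point.
# .copy() avoids a SettingWithCopyWarning when adding columns later.
df_n = df_n.drop_duplicates(subset=['geometry_x']).copy()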
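The duplicates can be reproduced on a toy example: a point equidistant from two lines gets one output row per line, as the geopandas documentation for `sjoin_nearest` describes for ties. Deduplicating on the left index (which `sjoin_nearest` preserves, before any `reset_index`) keeps exactly one match per location, even if two locations happened to share the same coordinates. The data below is hypothetical.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

crs = "EPSG:31370"
points = gpd.GeoDataFrame({"name": ["p"]}, geometry=[Point(0, 1)], crs=crs)
# Two hypothetical lines, both exactly 1 m from the point.
lines = gpd.GeoDataFrame(
    {"NAAM": ["north", "south"]},
    geometry=[LineString([(-1, 2), (1, 2)]), LineString([(-1, 0), (1, 0)])],
    crs=crs,
)

joined = gpd.sjoin_nearest(points, lines)
print(len(joined))  # 2: one row per equidistant line

# Keep the first match per original point, using the preserved left index.
deduped = joined[~joined.index.duplicated(keep="first")]
print(len(deduped))  # 1
```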

In [19]:
df_n.shape

(1962, 11)

In [20]:
df_n.head(3)

|   | gml_id | identifier | name | localId | geometry_x | VHAS | NAAM | start_ID | end_ID | geometry_y | distance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Van Looveren Leo | BE.VL.000000416.INSTALLATION | POINT (174053.026 229391.163) | 7007018_2 | Laboureurloop | VL68352 | VL6847 | LINESTRING (174218.478 229321.030, 174211.910 ... | 134.766839 |
| 1 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Kela | BE.VL.000000132.INSTALLATION | POINT (175372.287 230904.830) | 6801187 | Raamloop | VL368 | VL51695 | LINESTRING (175886.862 230719.803, 175878.810 ... | 438.889939 |
| 2 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Swaegers Slachthuis | BE.VL.000000186.INSTALLATION | POINT (176099.703 231237.668) | 6801187 | Raamloop | VL368 | VL51695 | LINESTRING (175886.862 230719.803, 175878.810 ... | 114.222335 |


In [21]:
#df_n.to_csv(r"C:\Workdir\Develop\TR_USECASE\data_transform\df_n.csv")

In [22]:
def get_nearest_point(df, line_col, point_col):
    """
    For each row, find the vertex of the line in `line_col` that is nearest
    to the point in `point_col`. This gives the projected location of a
    location of interest on the water network, snapped to a line vertex.
    Returns one geometry per row (None if no nearest point is found).
    """
    geoms = []
    for idx, row in df.iterrows():
        # all vertices of the matched water line (geometry_y)
        destinations = MultiPoint(row[line_col].coords)
        try:
            # nearest_points returns (point on first geom, point on second geom);
            # the second element is the nearest vertex of the line
            geoms.append(nearest_points(row[point_col], destinations)[1])
        except ValueError:
            print("No nearest point found for index {}".format(idx))
            geoms.append(None)  # keep the list aligned with the rows of df
    return geoms
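Note that `get_nearest_point` snaps to the nearest *vertex* of the water line, which is not always the nearest point on the line: on a long straight segment the two can be far apart, while `distance` above is measured to the line itself. A sketch comparing both with Shapely on a hypothetical two-vertex line; `line.interpolate(line.project(point))` is a swapped-in alternative, not what this notebook does.

```python
from shapely.geometry import LineString, MultiPoint, Point
from shapely.ops import nearest_points

line = LineString([(0, 0), (1000, 0)])  # hypothetical straight water line
poi = Point(400, 50)                     # hypothetical location of interest

# What get_nearest_point does: nearest vertex of the line.
vertex = nearest_points(poi, MultiPoint(line.coords))[1]

# Alternative: nearest point anywhere along the line.
on_line = line.interpolate(line.project(poi))

print(vertex.x, vertex.y)    # 0.0 0.0
print(on_line.x, on_line.y)  # 400.0 0.0
print(poi.distance(vertex), poi.distance(on_line))
```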

In [23]:
#nearest_pts = get_nearest_point(df_n, 'geometry_y', 'geometry_x')

#df = pd.DataFrame(nearest_pts, columns=['geometry'])

df_n['loc_nodes'] = get_nearest_point(df_n, 'geometry_y', 'geometry_x')


# consider retaining the original point geometry for the linearly referenced df.
# set the crs explicitly: the coordinates are in PROJ_CRS (EPSG:31370)
gdf_n = gpd.GeoDataFrame(df_n, geometry='loc_nodes', crs=PROJ_CRS).drop(['geometry_x'], axis=1)



In [24]:
gdf_n.head(2)

|   | gml_id | identifier | name | localId | VHAS | NAAM | start_ID | end_ID | geometry_y | distance | loc_nodes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Van Looveren Leo | BE.VL.000000416.INSTALLATION | 7007018_2 | Laboureurloop | VL68352 | VL6847 | LINESTRING (174218.478 229321.030, 174211.910 ... | 134.766839 | POINT (174186.723 229374.213) |
| 1 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Kela | BE.VL.000000132.INSTALLATION | 6801187 | Raamloop | VL368 | VL51695 | LINESTRING (175886.862 230719.803, 175878.810 ... | 438.889939 | POINT (175811.036 230893.712) |


In [25]:
gdf_n.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1962 entries, 0 to 1987
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   gml_id      0 non-null      object  
 1   identifier  1962 non-null   object  
 2   name        1962 non-null   object  
 3   localId     1962 non-null   object  
 4   VHAS        1962 non-null   object  
 5   NAAM        1486 non-null   object  
 6   start_ID    1962 non-null   object  
 7   end_ID      1962 non-null   object  
 8   geometry_y  1962 non-null   geometry
 9   distance    1962 non-null   float64 
 10  loc_nodes   1962 non-null   geometry
dtypes: float64(1), geometry(2), object(8)
memory usage: 183.9+ KB


In [26]:
distances = []
for _, row in gdf_n.iterrows():
    # linear reference: distance along the water line to the projected point
    dist = row['geometry_y'].project(row['loc_nodes'])
    distances.append(dist)
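The loop can also be replaced by the vectorized `GeoSeries.project`, which computes, row by row (aligned on the index), the distance along each line to the point in the same row. A sketch on a hypothetical two-row GeoDataFrame that reuses the column names of `gdf_n`:

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# Hypothetical stand-in for gdf_n: one water line and one snapped point per row.
gdf = gpd.GeoDataFrame(
    {
        "geometry_y": gpd.GeoSeries(
            [LineString([(0, 0), (100, 0)]), LineString([(0, 0), (0, 50), (30, 50)])]
        ),
        "loc_nodes": gpd.GeoSeries([Point(25, 0), Point(0, 50)]),
    },
    geometry="loc_nodes",
)

# Distance along each line to its point, computed for all rows at once.
gdf["ref_at"] = gdf["geometry_y"].project(gdf["loc_nodes"])
print(gdf["ref_at"].tolist())  # [25.0, 50.0]
```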

In [27]:
len(distances)

1962

In [28]:
gdf_n['ref_at'] = distances

In [29]:
gdf_n.head(2)

|   | gml_id | identifier | name | localId | VHAS | NAAM | start_ID | end_ID | geometry_y | distance | loc_nodes | ref_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Van Looveren Leo | BE.VL.000000416.INSTALLATION | 7007018_2 | Laboureurloop | VL68352 | VL6847 | LINESTRING (174218.478 229321.030, 174211.910 ... | 134.766839 | POINT (174186.723 229374.213) | 61.991412 |
| 1 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Kela | BE.VL.000000132.INSTALLATION | 6801187 | Raamloop | VL368 | VL51695 | LINESTRING (175886.862 230719.803, 175878.810 ... | 438.889939 | POINT (175811.036 230893.712) | 189.966312 |


In [30]:
point_loc = gdf_n.drop(['geometry_y', 'start_ID', 'end_ID'], axis=1) #.rename(columns={'loc_nodes': 'geometry'})

In [31]:
point_loc.head(2)

|   | gml_id | identifier | name | localId | VHAS | NAAM | distance | loc_nodes | ref_at |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Van Looveren Leo | BE.VL.000000416.INSTALLATION | 7007018_2 | Laboureurloop | 134.766839 | POINT (174186.723 229374.213) | 61.991412 |
| 1 |  | https://data.gpbv.omgeving.vlaanderen.be/id/pr... | Kela | BE.VL.000000132.INSTALLATION | 6801187 | Raamloop | 438.889939 | POINT (175811.036 230893.712) | 189.966312 |


In [32]:
#point_loc['geometry'] = point_loc['geometry'].apply(lambda x: shapely.wkt.loads(x.wkt))

In [33]:
point_loc.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1962 entries, 0 to 1987
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   gml_id      0 non-null      object  
 1   identifier  1962 non-null   object  
 2   name        1962 non-null   object  
 3   localId     1962 non-null   object  
 4   VHAS        1962 non-null   object  
 5   NAAM        1486 non-null   object  
 6   distance    1962 non-null   float64 
 7   loc_nodes   1962 non-null   geometry
 8   ref_at      1962 non-null   float64 
dtypes: float64(2), geometry(1), object(6)
memory usage: 153.3+ KB


In [34]:
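# The coordinates are in Belgian Lambert 72 (PROJ_CRS): declare that CRS, then
# reproject to EPSG:3035. set_crs alone would only relabel the coordinates.
point_loc2 = point_loc.set_crs(PROJ_CRS).to_crs(epsg=3035)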
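`set_crs` only attaches CRS metadata and leaves the coordinates unchanged, whereas `to_crs` transforms them. A sketch with a hypothetical point given in Belgian Lambert 72 (it relies on pyproj, which geopandas already depends on):

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point with coordinates in EPSG:31370 (Belgian Lambert 72).
gdf = gpd.GeoDataFrame(geometry=[Point(174186.723, 229374.213)])

# Same numbers, new label: the point now sits at the wrong location.
relabelled = gdf.set_crs(epsg=3035)
# Declare the true CRS first, then transform the coordinates.
reprojected = gdf.set_crs(epsg=31370).to_crs(epsg=3035)

print(relabelled.geometry.iloc[0].x)   # 174186.723
print(reprojected.geometry.iloc[0].x)  # a different value on the EPSG:3035 grid
```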

In [35]:
#point_loc2.to_file(r"..\data_transform\vl_point_locations.shp")

In [36]:
# point_loc.to_file(r"..\data_transform\vl_point_loc.shp")