# Point Locations of Interest

Locations of interest are any phenomena occurring along the network that can affect, or be affected by, pollution.

The geometry of a location of interest is either a point or a polygon.

This notebook develops the methodology for point locations of interest.

In [1]:
import os
import sys
path = os.path.dirname(os.path.abspath(''))
os.chdir(path)
print(path)

c:\Workdir\Develop\repository\go-peg


In [2]:
import geopandas as gpd
import pandas as pd

from shapely.geometry import Point, LineString, MultiLineString, MultiPoint
from shapely import wkt
from shapely.ops import nearest_points
import shapely.wkt

import numpy as np 

import warnings
from shapely.errors import ShapelyDeprecationWarning
warnings.filterwarnings("ignore", category=ShapelyDeprecationWarning) 

pd.options.mode.chained_assignment = None  # default='warn'

from src.config import config

**Declare global variables**

In [8]:
# PROJ_CRS = 'EPSG:31370'
PROJ_CRS = 'EPSG:3035'

buffer_distance = 200
region = 'DK'

object_type = 'ProductionFacility' #no spaces

**Load the processed water dataset and the object of interest dataset**

In [4]:
def load_data(path):
    """
    Loads the data from the given path, 
    and prints the shape and crs of the data.
    """
    data = gpd.read_file(path)
    print(data.shape)

    data_crs = data.crs

    print("Data crs:", data.crs)
    return data, data_crs

In [5]:
PATH = config.data_dest / "DK_waterPROCESSED.shp"
water, data_crs = load_data(PATH)

(890635, 6)
Data crs: EPSG:3044


In [7]:
PATH = (config.data_src / "DK/OOI/ProductionFacility2.shp")
ooi_data, data_crs = load_data(PATH)
# prod_installations = ooi_data.to_crs(PROJ_CRS)
# print('Project crs:', prod_installations.crs)

(1877, 23)
Data crs: EPSG:3035


In [9]:
water = water.to_crs(PROJ_CRS)
ooi_data = ooi_data.to_crs(PROJ_CRS)
print('Project crs:', ooi_data.crs)

Project crs: EPSG:3035


In [10]:
ooi_data.head()

Unnamed: 0,gml_id,localId,namespace,validFrom,beginLifes,nameOfFeat,organisati,individual,electronic,streetName,...,telephoneN,parentComp,EPRTRAnnex,remarks,dateOfStar,address|Ad,address|_1,address|_2,address|_3,geometry
0,_000057466.FACILITY,000057466.FACILITY,DK.CAED,,,VICUS P ApS (Wedellsborg),Miljøstyrelsen,MST Erhverv,mst@mst.dk,Tolderlundsvej,...,72544000,VICUS P ApS,,ProductionSite and ProductionFacilities in Den...,2017-07-20+00:00,Tybrindvej,47,Ejby,5592.0,POINT (4312450.247 3588333.431)
1,_000057500.FACILITY,000057500.FACILITY,DK.CAED,,,Arla Foods Amba Kruså Mejeri,Miljøstyrelsen,MST Erhverv,mst@mst.dk,Tolderlundsvej,...,72544000,,,ProductionSite and ProductionFacilities in Den...,1988-01-01+00:00,Aabenraavej,2A,Kruså,6340.0,POINT (4282657.527 3527407.302)
2,_000057518.FACILITY,000057518.FACILITY,DK.CAED,,,Arla Foods Amba Branderup Mejeri,Miljøstyrelsen,MST Erhverv,mst@mst.dk,Tolderlundsvej,...,72544000,,,ProductionSite and ProductionFacilities in Den...,2013-12-31+00:00,Engdraget,4,Branderup J,6535.0,POINT (4261660.782 3557601.093)
3,_000057519.FACILITY,000057519.FACILITY,DK.CAED,,,Frederikshavn Kraftvarmeværk,Miljøstyrelsen,MST Erhverv,mst@mst.dk,Tolderlundsvej,...,72544000,,,ProductionSite and ProductionFacilities in Den...,1986-06-19+00:00,Vendsysselvej,8,Frederikshavn,9900.0,POINT (4352284.031 3817333.008)
4,_000057522.FACILITY,000057522.FACILITY,DK.CAED,,,Arla Foods Amba AKAFA,Miljøstyrelsen,MST Erhverv,mst@mst.dk,Tolderlundsvej,...,72544000,ARLA FOODS AMBA,,ProductionSite and ProductionFacilities in Den...,2013-12-31+00:00,Svenstrup Skolevej,25,Svenstrup J,9230.0,POINT (4311551.696 3763529.360)


In [63]:
ooi_data.loc[0]

gml_id                                      _000057466.FACILITY
localId                                      000057466.FACILITY
namespace                                               DK.CAED
validFrom                                                   NaN
beginLifes                                                  NaN
nameOfFeat                            VICUS P ApS (Wedellsborg)
organisati                                       Miljøstyrelsen
individual                                          MST Erhverv
electronic                                           mst@mst.dk
streetName                                       Tolderlundsvej
buildingNu                                                    5
city                                                   Odense C
postalCode                                                 5000
telephoneN                                             72544000
parentComp                                          VICUS P ApS
EPRTRAnnex                              

## Working with point locations of interest

To work with point locations of interest, we project each point onto its nearest water geometry and apply a threshold distance to exclude points that lie too far from it.
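
To make the geometry of this step explicit, here is a minimal, dependency-free sketch of projecting a point onto a polyline and applying a distance threshold. The notebook itself relies on `shapely` for this; the helper names (`project_onto_segment`, `nearest_on_polyline`), the toy coordinates, and the `threshold` variable are assumptions made for illustration only.

```python
import math

def project_onto_segment(p, a, b):
    """Return the point on segment a-b that is closest to p."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both ends coincide
        return a
    # Parameter of the orthogonal projection, clamped to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def nearest_on_polyline(p, coords):
    """Return (nearest_point, distance) between point p and a polyline."""
    best_point, best_dist = None, math.inf
    for a, b in zip(coords, coords[1:]):
        q = project_onto_segment(p, a, b)
        d = math.dist(p, q)
        if d < best_dist:
            best_point, best_dist = q, d
    return best_point, best_dist

# Toy data (hypothetical): one water line and two facilities
line = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
facilities = {"near": (50.0, 30.0), "far": (400.0, 50.0)}
threshold = 200.0  # plays the role of buffer_distance

# Keep only the facilities closer to the line than the threshold
snapped = {}
for name, p in facilities.items():
    q, d = nearest_on_polyline(p, line)
    if d < threshold:
        snapped[name] = (q, d)
```

In this toy case only the `near` facility survives the filter; `far` lies 300 units from the line and is dropped.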

**Select relevant columns from the object of interest dataset, which includes identifiers and geometry**

In [None]:
# {'nameOfFeat': 'nameOfFacility', 'organisati': 'organisation'}

In [65]:
prod_points = ooi_data[['localId', 'namespace', 'nameOfFeat', 'organisati', 'city', 'dateOfStar','validFrom', 'beginLifes', 'geometry']]
prod_points.head(2)
prod_points.shape

(1877, 9)

**Select relevant columns from the water dataset**

In [66]:
# Check for MultiLineStrings in a dataset
def check_multiline(df):
    """Count and print the number of MultiLineString geometries
        in the geometry column of the given dataset"""
    lst = df['geometry'].to_list()
    multiline_count = 0
    for item in lst:
        if isinstance(item, MultiLineString):
            multiline_count += 1
    print("MultiLineStrings:" , multiline_count)

check_multiline(water)   

MultiLineStrings: 0


In [67]:
# water_df = water[['VHAS', 'NAAM', 'start_ID', 'end_ID', 'geometry']]
water_df = water[['line_id', 'geometry']]

In [68]:
assert water_df.shape[0] == water_df.geometry.nunique()

**Perform a nearest spatial join (`sjoin_nearest`) to combine the properties of the two datasets into one**

In [69]:
gdf_p = prod_points.copy()
gdf_l = water_df.copy()


# sjoin_nearest attaches the nearest water line to each point; the merge
# then adds that line's geometry as a second geometry column
df_n = (gpd.sjoin_nearest(gdf_p, gdf_l)
            .merge(gdf_l['geometry'], left_on="index_right", right_index=True)
            .drop(columns=['index_right'])
            .rename(columns={'geometry_x':'point_geom', 'geometry_y':'line_geom'})
            .reset_index(drop=True)
            )

# A point that is equally near to several lines is returned once per line;
# keep a single row per distinct point geometry.
# The distance from each point to the water is computed later and used for filtering.
df_n = df_n.drop_duplicates(subset=['point_geom'])
assert not df_n['point_geom'].isnull().any()
assert not df_n['line_geom'].isnull().any()

**Ensure the line geometry is a LineString, not a MultiLineString**

In [70]:
def multiline_to_linestring_col(df, geom_col):
    """Return the geometries in geom_col as LineStrings, flattening any
    MultiLineString by concatenating the coordinates of its parts."""
    linestrings = []
    for geom in df[geom_col]:
        if isinstance(geom, MultiLineString):
            # .geoms is required in Shapely 2.x, where multi-part
            # geometries are no longer directly iterable
            coords = [c for part in geom.geoms for c in part.coords]
            linestrings.append(LineString(coords))
        else:
            # Keep other geometries as-is so the result stays aligned with df
            linestrings.append(geom)
    return linestrings

df_n['line_geom'] = multiline_to_linestring_col(df_n, 'line_geom')
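
Concatenating the coordinates of the parts, as the function above does, implicitly assumes that the parts are stored in order and join end to start. The dependency-free sketch below works on plain coordinate lists rather than Shapely objects; `flatten_parts` is a hypothetical helper that is not part of the notebook. It performs the same concatenation and, as an assumed refinement, drops a vertex that repeats at a joint.

```python
def flatten_parts(parts):
    """Concatenate the coordinate lists of a multi-part line into one list,
    skipping a coordinate that merely repeats the previous one."""
    out = []
    for part in parts:
        for coord in part:
            if not out or out[-1] != coord:
                out.append(coord)
    return out

# Two parts that share the joint (1, 0)
parts = [[(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (2.0, 0.0)]]
flat = flatten_parts(parts)

# Two parts that do NOT touch: the result contains a jump from (1, 0) to (5, 5)
gapped = flatten_parts([[(0.0, 0.0), (1.0, 0.0)], [(5.0, 5.0), (6.0, 5.0)]])
```

The second case shows the limitation of this approach: parts that do not touch are still joined, which introduces a segment that does not exist in the source data. In this notebook the check on the water dataset reported no MultiLineStrings, so that case does not arise here.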

In [71]:
df_n.head(2)

Unnamed: 0,localId,namespace,nameOfFeat,organisati,city,dateOfStar,validFrom,beginLifes,point_geom,line_id,line_geom
0,000057466.FACILITY,DK.CAED,VICUS P ApS (Wedellsborg),Miljøstyrelsen,Odense C,2017-07-20+00:00,,,POINT (4312450.247 3588333.431),1092056957,LINESTRING (4312569.089749929 3588023.78440905...
1,000057500.FACILITY,DK.CAED,Arla Foods Amba Kruså Mejeri,Miljøstyrelsen,Odense C,1988-01-01+00:00,,,POINT (4282657.527 3527407.302),1213686790,LINESTRING (4281874.205463002 3527554.87769692...


In [72]:
def project_point_to_line(df):
    nearest_geoms = []
    for i, row in df.iterrows():
        nearest_point = nearest_points(row['line_geom'], row['point_geom'])
        point = nearest_point[0]
        nearest_geoms.append(point)
    return nearest_geoms

df_n['nearest_point'] = project_point_to_line(df_n)
type(df_n['nearest_point'][0])

shapely.geometry.point.Point

In [73]:
df_n.head(2)

Unnamed: 0,localId,namespace,nameOfFeat,organisati,city,dateOfStar,validFrom,beginLifes,point_geom,line_id,line_geom,nearest_point
0,000057466.FACILITY,DK.CAED,VICUS P ApS (Wedellsborg),Miljøstyrelsen,Odense C,2017-07-20+00:00,,,POINT (4312450.247 3588333.431),1092056957,LINESTRING (4312569.089749929 3588023.78440905...,POINT (4312569.089749929 3588023.7844090504)
1,000057500.FACILITY,DK.CAED,Arla Foods Amba Kruså Mejeri,Miljøstyrelsen,Odense C,1988-01-01+00:00,,,POINT (4282657.527 3527407.302),1213686790,LINESTRING (4281874.205463002 3527554.87769692...,POINT (4282439.747302851 3527784.0230870843)


In [74]:
# test_df = gpd.GeoDataFrame((df_n[['line_id', 'nearest_geoms']].rename(columns={'nearest_geoms':'geometry'})), geometry='geometry')
# test_df = test_df.set_crs(PROJ_CRS)
# test_df.to_file('data/test_data/links_test.shp')

**Filter out objects of interest by distance from network**

In [75]:
# Filter objects of interest by distance
df_n["distance"] = df_n.apply(lambda r: r["point_geom"].distance(r["nearest_point"]), axis=1)

df_filtered = df_n[df_n['distance'] < buffer_distance].reset_index(drop=True)
print(df_filtered.shape)

(646, 13)


**Insert the nearest points into the linestrings before making connection lines**

In [76]:
df_filtered.columns

Index(['localId', 'namespace', 'nameOfFeat', 'organisati', 'city',
       'dateOfStar', 'validFrom', 'beginLifes', 'point_geom', 'line_id',
       'line_geom', 'nearest_point', 'distance'],
      dtype='object')

**Get new linestrings with the nearest point added as a vertex**

In [77]:
def insert_coordinates(df):
    """Return a copy of each line in df['line_geom'] with the matching
    df['nearest_point'] inserted as a vertex on its closest segment."""
    linestrings = []
    for _, row in df.iterrows():
        line = row['line_geom']
        point = row['nearest_point']
        coords = list(line.coords)

        # Find the segment closest to the point; the new vertex goes
        # between that segment's two end vertices
        min_dist = float('inf')
        index = 1
        for i in range(len(coords) - 1):
            dist = LineString([coords[i], coords[i + 1]]).distance(point)
            if dist < min_dist:
                min_dist = dist
                index = i + 1

        # Insert the new vertex into the LineString geometry
        coords.insert(index, point.coords[0])
        linestrings.append(LineString(coords))

    return linestrings

In [78]:
df_filtered['new_line_geom'] = insert_coordinates(df_filtered)
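
One edge case is worth spelling out: when the nearest point already is a vertex of the line, inserting it again produces a duplicate vertex (in the `df_filtered` preview, the second facility's nearest point appears to coincide with the first coordinate of its line). The sketch below is a dependency-free illustration on plain coordinate lists; `insert_vertex`, `_dist_to_segment`, and the tolerance `tol` are hypothetical and not part of the notebook. It inserts the vertex on the closest segment and skips the insertion when the point matches an existing vertex.

```python
import math

def _dist_to_segment(p, a, b):
    """Distance from point p to segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def insert_vertex(coords, p, tol=1e-9):
    """Return a new coordinate list with p inserted on the closest segment.
    If p coincides (within tol) with an existing vertex, return a plain copy."""
    if any(math.dist(c, p) <= tol for c in coords):
        return list(coords)
    best_index, best_dist = 1, math.inf
    for i in range(len(coords) - 1):
        d = _dist_to_segment(p, coords[i], coords[i + 1])
        if d < best_dist:
            best_index, best_dist = i + 1, d
    out = list(coords)
    out.insert(best_index, p)
    return out

line = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
on_first_segment = insert_vertex(line, (4.0, 0.0))
on_second_segment = insert_vertex(line, (10.0, 6.0))
already_a_vertex = insert_vertex(line, (10.0, 0.0))
```

Whether a duplicate vertex matters depends on how the new lines are used downstream; the check is shown only as one possible safeguard.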

**Make connection lines from objects of interest to the point in the new line**

In [79]:
df_n.columns

Index(['localId', 'namespace', 'nameOfFeat', 'organisati', 'city',
       'dateOfStar', 'validFrom', 'beginLifes', 'point_geom', 'line_id',
       'line_geom', 'nearest_point', 'distance'],
      dtype='object')

In [80]:
def make_connection_lines(df, from_point, to_point):
    lines = []
    for index, row in df.iterrows():
        p_1 = Point(row[from_point])
        p_2 = Point(row[to_point])
        intersect = LineString([p_1, p_2])
        # linestring = loads(intersect)
        lines.append(intersect)
    return lines

df_filtered['connection_lines'] = make_connection_lines(df_filtered, 'point_geom', 'nearest_point')
df_filtered.head(2)

Unnamed: 0,localId,namespace,nameOfFeat,organisati,city,dateOfStar,validFrom,beginLifes,point_geom,line_id,line_geom,nearest_point,distance,new_line_geom,connection_lines
0,000057518.FACILITY,DK.CAED,Arla Foods Amba Branderup Mejeri,Miljøstyrelsen,Odense C,2013-12-31+00:00,,,POINT (4261660.782 3557601.093),1211639814,LINESTRING (4261723.299670125 3557738.63441224...,POINT (4261741.430483028 3557717.0707296734),141.262144,LINESTRING (4261723.299670125 3557738.63441224...,LINESTRING (4261660.78186408 3557601.093172827...
1,000057522.FACILITY,DK.CAED,Arla Foods Amba AKAFA,Miljøstyrelsen,Odense C,2013-12-31+00:00,,,POINT (4311551.696 3763529.360),1105368984,LINESTRING (4311616.3600454405 3763682.4157406...,POINT (4311616.3600454405 3763682.4157406315),166.154595,LINESTRING (4311616.3600454405 3763682.4157406...,LINESTRING (4311551.696413628 3763529.36031273...


In [81]:
df_n.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1736 entries, 0 to 1925
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   localId        1736 non-null   object  
 1   namespace      1736 non-null   object  
 2   nameOfFeat     1736 non-null   object  
 3   organisati     1736 non-null   object  
 4   city           1736 non-null   object  
 5   dateOfStar     1736 non-null   object  
 6   validFrom      0 non-null      float64 
 7   beginLifes     0 non-null      float64 
 8   point_geom     1736 non-null   geometry
 9   line_id        1736 non-null   object  
 10  line_geom      1736 non-null   object  
 11  nearest_point  1736 non-null   object  
 12  distance       1736 non-null   float64 
dtypes: float64(3), geometry(1), object(9)
memory usage: 254.4+ KB


In [82]:
type(df_filtered['new_line_geom'][0])

shapely.geometry.linestring.LineString

**Calculate the distance along the network segment at which the object of interest is referenced**

In [83]:
df_filtered.head(2)

Unnamed: 0,localId,namespace,nameOfFeat,organisati,city,dateOfStar,validFrom,beginLifes,point_geom,line_id,line_geom,nearest_point,distance,new_line_geom,connection_lines
0,000057518.FACILITY,DK.CAED,Arla Foods Amba Branderup Mejeri,Miljøstyrelsen,Odense C,2013-12-31+00:00,,,POINT (4261660.782 3557601.093),1211639814,LINESTRING (4261723.299670125 3557738.63441224...,POINT (4261741.430483028 3557717.0707296734),141.262144,LINESTRING (4261723.299670125 3557738.63441224...,LINESTRING (4261660.78186408 3557601.093172827...
1,000057522.FACILITY,DK.CAED,Arla Foods Amba AKAFA,Miljøstyrelsen,Odense C,2013-12-31+00:00,,,POINT (4311551.696 3763529.360),1105368984,LINESTRING (4311616.3600454405 3763682.4157406...,POINT (4311616.3600454405 3763682.4157406315),166.154595,LINESTRING (4311616.3600454405 3763682.4157406...,LINESTRING (4311551.696413628 3763529.36031273...


In [84]:
df_filtered.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 646 entries, 0 to 645
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   localId           646 non-null    object  
 1   namespace         646 non-null    object  
 2   nameOfFeat        646 non-null    object  
 3   organisati        646 non-null    object  
 4   city              646 non-null    object  
 5   dateOfStar        646 non-null    object  
 6   validFrom         0 non-null      float64 
 7   beginLifes        0 non-null      float64 
 8   point_geom        646 non-null    geometry
 9   line_id           646 non-null    object  
 10  line_geom         646 non-null    object  
 11  nearest_point     646 non-null    object  
 12  distance          646 non-null    float64 
 13  new_line_geom     646 non-null    object  
 14  connection_lines  646 non-null    object  
dtypes: float64(3), geometry(1), object(11)
memory usage: 75.8+ KB


In [85]:
val = 134.766839
round_val = round(val, 2)
round_val

134.77

In [86]:
# Distance along each new line, measured from its first vertex to the snapped point
distances = []
for _, row in df_filtered.iterrows():
    dist = row['new_line_geom'].project(row['nearest_point'])
    distances.append(round(dist, 3))

df_filtered['atPosition'] = distances
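
The `atPosition` value is a linear reference: the length walked along the line from its first vertex to the point where the object of interest attaches, in the units of the CRS. As a minimal sketch of that idea, assuming plain coordinate lists instead of Shapely geometries, the hypothetical helper `distance_along` below accumulates segment lengths up to the projection of the point.

```python
import math

def distance_along(coords, p):
    """Distance from the start of the polyline to the projection of p onto it."""
    best_dist, best_along, walked = math.inf, 0.0, 0.0
    for a, b in zip(coords, coords[1:]):
        dx, dy = b[0] - a[0], b[1] - a[1]
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip zero-length segments
        # Projection parameter on this segment, clamped to [0, 1]
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (seg_len * seg_len)
        t = max(0.0, min(1.0, t))
        q = (a[0] + t * dx, a[1] + t * dy)
        d = math.dist(p, q)
        if d < best_dist:
            # Length of all previous segments plus the part of this one
            best_dist, best_along = d, walked + t * seg_len
        walked += seg_len
    return best_along

# Toy line: a 5-unit segment followed by a 6-unit segment
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
at_start = distance_along(line, (0.0, 0.0))
at_joint = distance_along(line, (3.0, 4.0))
mid_second = distance_along(line, (3.0, 7.0))
off_line = distance_along(line, (5.0, 7.0))
```

A point that attaches at the first vertex gets position `0.0`, which matches the second row of the preview that follows.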

In [87]:
df_filtered.head(2)

Unnamed: 0,localId,namespace,nameOfFeat,organisati,city,dateOfStar,validFrom,beginLifes,point_geom,line_id,line_geom,nearest_point,distance,new_line_geom,connection_lines,atPosition
0,000057518.FACILITY,DK.CAED,Arla Foods Amba Branderup Mejeri,Miljøstyrelsen,Odense C,2013-12-31+00:00,,,POINT (4261660.782 3557601.093),1211639814,LINESTRING (4261723.299670125 3557738.63441224...,POINT (4261741.430483028 3557717.0707296734),141.262144,LINESTRING (4261723.299670125 3557738.63441224...,LINESTRING (4261660.78186408 3557601.093172827...,36.487
1,000057522.FACILITY,DK.CAED,Arla Foods Amba AKAFA,Miljøstyrelsen,Odense C,2013-12-31+00:00,,,POINT (4311551.696 3763529.360),1105368984,LINESTRING (4311616.3600454405 3763682.4157406...,POINT (4311616.3600454405 3763682.4157406315),166.154595,LINESTRING (4311616.3600454405 3763682.4157406...,LINESTRING (4311551.696413628 3763529.36031273...,0.0


**Add information about the type of object of interest**

In [88]:
object_type

'ProductionFacility'

In [89]:
def create_ooi_type(object_type):
    ooi_type = object_type
    return ooi_type

df_filtered['OOI_type'] = create_ooi_type(object_type)

**Add the namespace of the network**

In [90]:
def create_watercourse_namespace(country):
    # The namespace is currently the same for every country; the argument is unused
    namespace = 'gopeg.eu/tracing'
    return namespace

df_filtered['watercourse_namespace'] = create_watercourse_namespace(region)

**Create a df with the final columns**

In [91]:
df_links = (df_filtered.rename(columns={'connection_lines':'geometry',
                                # 'identifier':'OOI_identifier', 
                                 'nameOfFeat': 'OOI_name', 
                                 'localId': 'OOI_localId', 
                                 'namespace': 'OOI_namespace',
                                 'city' : 'location',
                                 'line_id': 'hydroId',
                                 'line_name': 'watercourse_localName',
                                 'basin': 'watercourseBasin'})
                                .reset_index(drop=True))

**Create unique id using UUID**

In [92]:
import uuid
df_links['UUID'] = [uuid.uuid4().hex for _ in range(len(df_links.index))]
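
Because `uuid4` is random, rerunning the notebook assigns a new identifier to the same link each time. If stable identifiers are ever needed, one possible alternative (not what this notebook does) is a name-based `uuid5` derived from fields that identify the link. In the sketch below, `LINK_NAMESPACE`, `link_id`, and the choice of key fields are assumptions made for illustration.

```python
import uuid

# Hypothetical namespace UUID, derived here from the project's domain name
LINK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "gopeg.eu")

def link_id(ooi_namespace, ooi_local_id, hydro_id):
    """Return a deterministic 32-character hex id for one OOI-to-line link."""
    key = f"{ooi_namespace}/{ooi_local_id}/{hydro_id}"
    return uuid.uuid5(LINK_NAMESPACE, key).hex

# The same inputs always give the same id; different inputs give a different one
first = link_id("DK.CAED", "000057518.FACILITY", "1211639814")
again = link_id("DK.CAED", "000057518.FACILITY", "1211639814")
other = link_id("DK.CAED", "000057522.FACILITY", "1105368984")
```

The trade-off is that the identifier changes whenever any key field changes, for example when a facility is re-matched to a different water line.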

In [93]:
df_links.head(2)

Unnamed: 0,OOI_localId,OOI_namespace,OOI_name,organisati,location,dateOfStar,validFrom,beginLifes,point_geom,hydroId,line_geom,nearest_point,distance,new_line_geom,geometry,atPosition,OOI_type,watercourse_namespace,UUID
0,000057518.FACILITY,DK.CAED,Arla Foods Amba Branderup Mejeri,Miljøstyrelsen,Odense C,2013-12-31+00:00,,,POINT (4261660.782 3557601.093),1211639814,LINESTRING (4261723.299670125 3557738.63441224...,POINT (4261741.430483028 3557717.0707296734),141.262144,LINESTRING (4261723.299670125 3557738.63441224...,LINESTRING (4261660.78186408 3557601.093172827...,36.487,ProductionFacility,gopeg.eu/tracing,6bf7b2d13c4147a1af53848993bae163
1,000057522.FACILITY,DK.CAED,Arla Foods Amba AKAFA,Miljøstyrelsen,Odense C,2013-12-31+00:00,,,POINT (4311551.696 3763529.360),1105368984,LINESTRING (4311616.3600454405 3763682.4157406...,POINT (4311616.3600454405 3763682.4157406315),166.154595,LINESTRING (4311616.3600454405 3763682.4157406...,LINESTRING (4311551.696413628 3763529.36031273...,0.0,ProductionFacility,gopeg.eu/tracing,89af23e221314b198bee1b545bb6070a


In [99]:
cols = ['UUID', 'OOI_type', 'OOI_localId', 'OOI_name', 'OOI_namespace', 'location', 'hydroId', 'atPosition', 'watercourse_namespace', 'geometry']

gdf_links = gpd.GeoDataFrame((df_links[cols]), geometry='geometry')

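# The link geometries were built in PROJ_CRS, so declare it on the new GeoDataFrame
gdf_links = gdf_links.set_crs(PROJ_CRS)
# No-op here because source and target CRS are identical; change the target to reproject the output
gdf_links_final = gdf_links.to_crs(PROJ_CRS)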

In [100]:
gdf_links_final.head(2)

Unnamed: 0,UUID,OOI_type,OOI_localId,OOI_name,OOI_namespace,location,hydroId,atPosition,watercourse_namespace,geometry
0,6bf7b2d13c4147a1af53848993bae163,ProductionFacility,000057518.FACILITY,Arla Foods Amba Branderup Mejeri,DK.CAED,Odense C,1211639814,36.487,gopeg.eu/tracing,"LINESTRING (4261660.782 3557601.093, 4261741.4..."
1,89af23e221314b198bee1b545bb6070a,ProductionFacility,000057522.FACILITY,Arla Foods Amba AKAFA,DK.CAED,Odense C,1105368984,0.0,gopeg.eu/tracing,"LINESTRING (4311551.696 3763529.360, 4311616.3..."


In [101]:
gdf_links_final.crs

<Derived Projected CRS: EPSG:3035>
Name: ETRS89-extended / LAEA Europe
Axis Info [cartesian]:
- Y[north]: Northing (metre)
- X[east]: Easting (metre)
Area of Use:
- name: Europe - European Union (EU) countries and candidates. Europe - onshore and offshore: Albania; Andorra; Austria; Belgium; Bosnia and Herzegovina; Bulgaria; Croatia; Cyprus; Czechia; Denmark; Estonia; Faroe Islands; Finland; France; Germany; Gibraltar; Greece; Hungary; Iceland; Ireland; Italy; Kosovo; Latvia; Liechtenstein; Lithuania; Luxembourg; Malta; Monaco; Montenegro; Netherlands; North Macedonia; Norway including Svalbard and Jan Mayen; Poland; Portugal including Madeira and Azores; Romania; San Marino; Serbia; Slovakia; Slovenia; Spain including Canary Islands; Sweden; Switzerland; Turkey; United Kingdom (UK) including Channel Islands and Isle of Man; Vatican City State.
- bounds: (-35.58, 24.6, 44.83, 84.73)
Coordinate Operation:
- name: Europe Equal Area 2001
- method: Lambert Azimuthal Equal Area
Datum: Europ

In [102]:
pwd

'c:\\Workdir\\Develop\\repository\\go-peg'

In [103]:
# gdf_links_final.to_file(f"harmonized_data/{region}_ObjectsOfInterest.gpkg", layer=f"{object_type}_links", driver='GPKG')