This notebook buffers each spill point, growing the buffer in 0.5 m steps up to 49.5 m (just under 50 m), until the buffer intersects a flowline whose operator matches the spill's `Operator Name` (ignoring case and surrounding whitespace). Each matched spill's geometry is then moved to the nearest point on that flowline.
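
The geometric core of this approach can be shown with Shapely alone. The minimal sketch below uses made-up coordinates in a metric CRS (not data from this notebook). A buffer only reaches the line once its radius exceeds the point-to-line distance, and `nearest_points` then gives the location on the line that the spill would be moved to.

```python
from shapely.geometry import LineString, Point
from shapely.ops import nearest_points

# Made-up spill point and flowline; coordinates are in meters
spill = Point(3, 4)
flowline = LineString([(0, 0), (10, 0)])

# The spill is 4 m from the line, so a 3.5 m buffer misses it and a 4.5 m buffer reaches it
print(spill.distance(flowline))
print(flowline.intersects(spill.buffer(3.5)), flowline.intersects(spill.buffer(4.5)))

# nearest_points returns (point on the first geometry, point on the second geometry);
# the second item is where the spill would be snapped to, here (3, 0)
snapped = nearest_points(spill, flowline)[1]
print(snapped.x, snapped.y)
```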

## Setup

In [1]:
import os
import pandas as pd
import geopandas as gpd
from shapely.geometry import MultiLineString, LineString, Point
from shapely.ops import nearest_points

os.chdir('/Users/ichittumuri/Desktop/MINES/COGCC-Risk-Analysis/Data')
pd.options.display.max_columns = None

In [2]:
# Load Data
all_flowlines_gdf = gpd.read_file('all_flowlines.geojson')
spills_gdf = gpd.read_file('spills.geojson')

In [3]:
# Reproject the flowlines to the spills' CRS if the two layers differ
if all_flowlines_gdf.crs != spills_gdf.crs:
    all_flowlines_gdf = all_flowlines_gdf.to_crs(spills_gdf.crs)

In [4]:
print(spills_gdf.crs)
print(all_flowlines_gdf.crs)

EPSG:26913
EPSG:26913
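
Both layers report EPSG:26913 (NAD83 / UTM zone 13N), a projected CRS whose axes are in meters, which is what lets the buffer distances later in this notebook be read as meters. If either file were delivered in a geographic CRS such as EPSG:4326, the same numbers would be interpreted as degrees. A small guard like the hypothetical `assert_metric_crs` helper below, a sketch relying on pyproj's `CRS.is_projected` and `axis_info`, could catch that early.

```python
import geopandas as gpd
from shapely.geometry import Point

def assert_metric_crs(gdf):
    """Hypothetical helper: raise ValueError unless gdf uses a projected CRS in meters."""
    crs = gdf.crs
    if crs is None or not crs.is_projected:
        raise ValueError(f"Expected a projected CRS, got {crs}")
    unit = crs.axis_info[0].unit_name
    if unit not in ("metre", "meter"):
        raise ValueError(f"Expected axis units in meters, got {unit}")
    return unit

# Toy frame using the first spill's UTM coordinates shown in this notebook
utm = gpd.GeoDataFrame(geometry=[Point(507323.245, 4429909.631)], crs="EPSG:26913")
print(assert_metric_crs(utm))

# The same point in longitude/latitude; assert_metric_crs(wgs84) would raise ValueError
wgs84 = utm.to_crs("EPSG:4326")
print(wgs84.crs.is_projected)
```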


In [5]:
# Check size
print(all_flowlines_gdf.shape)
print(spills_gdf.shape)

(335177, 26)
(849, 14)


In [6]:
print(spills_gdf.is_valid.all())
print(all_flowlines_gdf.is_valid.all())

False
False


In [7]:
# Drop rows whose geometry is invalid or missing
spills_gdf = spills_gdf[spills_gdf.is_valid]
all_flowlines_gdf = all_flowlines_gdf[all_flowlines_gdf.is_valid]

In [8]:
print(spills_gdf.is_valid.all())
print(all_flowlines_gdf.is_valid.all())

True
True
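
Filtering on `is_valid` drops every invalid row without reporting how many were lost, and GeoPandas also reports missing geometries as invalid. One alternative, not what this notebook does, is to count the affected rows first and repair what can be repaired with `GeoSeries.make_valid` (available in recent GeoPandas versions with Shapely 2), dropping only what remains unusable. The sketch uses a toy frame containing a self-intersecting "bow-tie" polygon.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy frame: a valid point, a self-intersecting "bow-tie" polygon, and a missing geometry
gdf = gpd.GeoDataFrame(
    {"name": ["ok", "bowtie", "missing"]},
    geometry=[Point(0, 0), Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]), None],
    crs="EPSG:26913",
)

# Report how many rows a plain is_valid filter would drop
invalid = ~gdf.is_valid
print(f"{int(invalid.sum())} of {len(gdf)} rows are invalid or missing")

# Repair first, then drop only rows that are still missing or invalid
repaired = gdf.copy()
repaired["geometry"] = repaired.geometry.make_valid()
repaired = repaired[repaired.geometry.notna() & repaired.is_valid]
print(repaired["name"].tolist())
```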


In [9]:
all_flowlines_gdf.head(2)

| | Operator | Fluid | Material | Diam_in | Status | Length_ft | SHAPE_Length | LOCATION_ID | FLOWLINEID | STARTLOCATIONID | FLOWLINEACTION | ENTIRELINEREMOVED | ACTIONDESCRIPTION | RECEIVE_DATE | OPERATOR_NUM | COMPANY_NAME | LOCATIONTYPE | ENDLAT | ENDLONG | STARTLAT | STARTLONG | PIPEMATERIAL | BEDDINGMATERIAL | TYPEOFFLUIDTRANS | MAXOPPRESSURE | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EVERGREEN NATURAL RESOURCES LLC | Gas | polly | 4.0 | Active | 2277.71 | 693.972162 | | | | | | | | | | | | | | | | | | | MULTILINESTRING ((524642.670 4117088.796, 5246... |
| 1 | NOBLE ENERGY INC | Multiphase | Carbon Steel | 3.0 | Abandoned | 651.58 | 198.525215 | | | | | | | | | | | | | | | | | | | MULTILINESTRING ((527997.281 4463899.920, 5281... |


In [10]:
spills_gdf.head(2)

| | trkg_num | Operator Name | facility_type | Spill_Desc | Spill Type | Root Cause | Preventative Measure | Root Cause Type | Detailed Root Cause Type | Long | Lat | facility_status | Metallic? | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 401530532 | K P KAUFFMAN COMPANY INC | FLOWLINE | Historical contamination was discovered during... | 1 | Facility #7 consolidation line failed do to un... | The damaged section of flowline was repaired a... | Unknown | Unknown | -104.914183 | 40.019361 | CL | Unknown | POINT (507323.245 4429909.631) |
| 1 | 401524345 | GREAT WESTERN OPERATING COMPANY LLC | TANK BATTERY | Soil impacts were discovered during removal of... | 1 | Unknown | | Unknown | Unknown | -104.467746 | 39.602613 | AC | Unknown | POINT (545695.434 4383787.964) |
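
The two layers store the operator in different columns (`Operator Name` in the spills, `Operator` in the flowlines), and the matching step compares them after stripping whitespace and lower-casing. A spill whose operator never appears in the flowline table cannot match at any buffer distance, so it may be worth checking that overlap up front. The sketch below uses a hypothetical `normalize_operator` helper and toy names rather than this notebook's data; spelling variants such as "LLC" vs "L.L.C." would still not match under this normalization.

```python
import pandas as pd

def normalize_operator(name):
    """Hypothetical helper: strip and lower-case an operator name; missing values become ''."""
    return "" if pd.isna(name) else str(name).strip().lower()

# Toy stand-ins for spills_gdf['Operator Name'] and all_flowlines_gdf['Operator']
spill_ops = pd.Series(["K P KAUFFMAN COMPANY INC ", "Great Western Operating Company LLC", None])
flowline_ops = pd.Series(["K P KAUFFMAN COMPANY INC", "NOBLE ENERGY INC"])

# Which spill operators have at least one flowline under the same normalized name?
known = set(flowline_ops.map(normalize_operator)) - {""}
has_counterpart = spill_ops.map(normalize_operator).isin(known)
print(has_counterpart.tolist())  # [True, False, False]
print(f"{has_counterpart.mean():.0%} of spills have a matching operator")
```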


In [11]:
# Collect matched spills in a list and build the GeoDataFrame once at the end,
# which is cheaper than growing it with pd.concat on every match
matched_rows = []

max_buffer_distance = 49.5   # Largest buffer radius tried, in meters (EPSG:26913 units)
initial_buffer_distance = 0  # Initial buffer radius, in meters
buffer_step = 0.5            # Buffer growth per iteration, in meters

def normalize_operator(name):
    """Strip and lower-case an operator name; missing values become ''."""
    return '' if pd.isna(name) else str(name).strip().lower()

for index, spill in spills_gdf.iterrows():
    spill_operator = normalize_operator(spill['Operator Name'])
    buffer_distance = initial_buffer_distance
    match_found = False

    while not match_found and buffer_distance <= max_buffer_distance:
        # Buffer the spill point and look up the flowlines that intersect the buffer
        buffered_spill = spill.geometry.buffer(buffer_distance)
        hits = all_flowlines_gdf.sindex.query(buffered_spill, predicate='intersects')
        candidates = all_flowlines_gdf.iloc[hits]

        # Keep only flowlines run by the spill's (non-missing) operator
        op_match = candidates['Operator'].map(normalize_operator) == spill_operator
        same_operator = candidates[op_match] if spill_operator else candidates.iloc[0:0]

        if not same_operator.empty:
            print(f"Operator match found at buffer distance {buffer_distance} meters for spill at index {index}.")
            match_found = True

            # Use the flowline geometry itself and pick the candidate closest to the spill
            distances = same_operator.geometry.distance(spill.geometry)
            flowline_geom = same_operator.geometry.iloc[distances.to_numpy().argmin()]

            # Move the spill to the nearest point on that flowline
            updated_spill = spill.copy()
            updated_spill['geometry'] = nearest_points(spill.geometry, flowline_geom)[1]
            matched_rows.append(updated_spill)
        else:
            buffer_distance += buffer_step

    if not match_found:
        print(f"No match found for spill at index {index} even after expanding buffer to {buffer_distance} meters.")

matched_spills_gdf = gpd.GeoDataFrame(matched_rows, columns=spills_gdf.columns,
                                      geometry='geometry', crs=spills_gdf.crs)

No match found for spill at index 0 even after expanding buffer to 50.0 meters.
No match found for spill at index 1 even after expanding buffer to 50.0 meters.
No match found for spill at index 2 even after expanding buffer to 50.0 meters.
No match found for spill at index 3 even after expanding buffer to 50.0 meters.
No match found for spill at index 4 even after expanding buffer to 50.0 meters.
Operator match found at buffer distance 11.0 meters for spill at index 5.
Operator match found at buffer distance 18.0 meters for spill at index 6.
No match found for spill at index 7 even after expanding buffer to 50.0 meters.
No match found for spill at index 8 even after expanding buffer to 50.0 meters.
No match found for spill at index 9 even after expanding buffer to 50.0 meters.
Operator match found at buffer distance 27.5 meters for spill at index 10.
No match found for spill at index 11 even after expanding buffer to 50.0 meters.
Operator match found at buffer distance 3.5 meters for spill at index 12.
Operator match found at buffer distance 19.0 meters for spill at index 13.
No match found for spill at index 14 even after expanding buffer to 50.0 meters.
Operator match found at buffer distance 3.0 meters for spill at index 15.
No match found for spill at index 16 even after expanding buffer to 50.0 meters.
Operator match found at buffer distance 4.5 meters for spill at index 17.
Operator match found at buffer distance 12.5 meters for spill at index 
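
The loop above issues one spatial query per spill for every 0.5 m step, so a spill that never matches costs 100 queries. A different strategy, not the one this notebook uses, is to buffer every spill once by the full search radius, spatially join against the flowlines, keep the operator matches, and pick the closest flowline per spill by exact distance. The hypothetical `snap_spills_to_flowlines` helper below sketches this. It assumes both frames share a metric CRS, have unique unnamed indexes (so GeoPandas names the joined key `index_right`), and use the column names shown earlier; the default `max_dist` mirrors the notebook's 49.5 m limit, and the toy data is made up.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point
from shapely.ops import nearest_points

def _norm(names):
    """Strip and lower-case operator names; missing values become ''."""
    return names.fillna("").astype(str).str.strip().str.lower()

def snap_spills_to_flowlines(spills, flowlines, max_dist=49.5):
    """Hypothetical helper: move each spill onto the closest flowline within
    max_dist (CRS units) whose operator matches the spill's operator."""
    # Buffer every spill once by the full search radius
    buffered = spills.copy()
    buffered["geometry"] = spills.geometry.buffer(max_dist)

    # Candidate (spill, flowline) pairs whose geometries intersect
    pairs = gpd.sjoin(buffered, flowlines[["Operator", "geometry"]],
                      how="inner", predicate="intersects")
    pairs = pairs.assign(spill_idx=pairs.index).reset_index(drop=True)

    # Keep pairs whose non-missing operator names agree
    spill_ops, line_ops = _norm(pairs["Operator Name"]), _norm(pairs["Operator"])
    pairs = pairs[(spill_ops == line_ops) & (spill_ops != "")]
    if pairs.empty:
        return spills.iloc[0:0].copy()

    # Exact spill-to-flowline distance for each remaining pair
    spill_pts = spills.geometry.loc[pairs["spill_idx"]]
    lines = flowlines.geometry.loc[pairs["index_right"]]
    pairs = pairs.assign(dist=[p.distance(l) for p, l in zip(spill_pts, lines)])

    # Closest operator-matched flowline per spill, then snap onto it
    best = pairs.loc[pairs.groupby("spill_idx")["dist"].idxmin()]
    snapped = spills.loc[best["spill_idx"]].copy()
    best_lines = flowlines.geometry.loc[best["index_right"]]
    snapped["geometry"] = gpd.GeoSeries(
        [nearest_points(p, l)[1] for p, l in zip(snapped.geometry, best_lines)],
        index=snapped.index, crs=spills.crs)
    snapped["snap_distance_m"] = best["dist"].to_numpy()
    return snapped

# Toy data in meters: spill 0 has a same-operator line 10 m away, a different-operator
# line 2 m away, and a farther same-operator line; spill 1 is far from every line
flowlines = gpd.GeoDataFrame(
    {"Operator": ["ACME OIL", "OTHER CO", "ACME OIL"]},
    geometry=[LineString([(0, 10), (100, 10)]),
              LineString([(0, 2), (100, 2)]),
              LineString([(0, -30), (100, -30)])],
    crs="EPSG:26913")
spills = gpd.GeoDataFrame(
    {"Operator Name": ["Acme Oil ", "Acme Oil"]},
    geometry=[Point(50, 0), Point(500, 500)],
    crs="EPSG:26913")

result = snap_spills_to_flowlines(spills, flowlines)
print(result)  # one row: spill 0 moved to (50, 10) with snap_distance_m 10.0
```

Because the distances are exact rather than quantized to buffer steps, this variant picks the closest operator-matched flowline directly; spills with no match within `max_dist` are simply absent from the result.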

In [12]:
matched_spills_gdf.to_file("matched_spills.geojson", driver='GeoJSON')

In [13]:
matched_spills_gdf.shape

(417, 14)
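
A quick sanity check on the matched spills is to measure how far each one moved from its original location: every distance should fall between 0 and the 49.5 m search limit, and a column of zeros would mean the geometries were never actually moved onto the flowlines. The hypothetical `snap_distances` helper below sketches this, assuming `trkg_num` uniquely identifies a spill in both frames; the toy data is made up.

```python
import geopandas as gpd
from shapely.geometry import Point

def snap_distances(original, matched, key="trkg_num"):
    """Hypothetical QA helper: distance each matched spill moved, indexed by `key`."""
    after = matched.set_index(key).geometry
    # Original locations, reordered to line up with the matched rows
    before = original.set_index(key).geometry.loc[after.index]
    return after.distance(before).rename("moved_m")

# Made-up spills in meters: spill 3 moved 5 m, spill 1 did not move
crs = "EPSG:26913"
original = gpd.GeoDataFrame({"trkg_num": [1, 2, 3]},
                            geometry=[Point(0, 0), Point(10, 10), Point(20, 20)], crs=crs)
matched = gpd.GeoDataFrame({"trkg_num": [3, 1]},
                           geometry=[Point(20, 25), Point(0, 0)], crs=crs)

moved = snap_distances(original, matched)
print(moved.to_dict())
print("beyond limit:", int((moved > 49.5).sum()), "| unmoved:", int((moved == 0).sum()))
```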