<a href="https://colab.research.google.com/github/kschweikert/GeoDataManager/blob/master/UseCases/UC3-Tracing/UC3-CQ15/SAWGRAPH_demo_UC3_Q15.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro

The purpose of this notebook is to query the SAWGraph knowledge graph for paper industry facilities and their nearby sample results for PFAS testing.

The competency question it addresses: for a specific industry of concern (e.g. paper mills), how are the known releases linked to contamination in water and in fish tissue samples nearby or downstream?

# Setup

Here we set up SPARQLWrapper to work with our endpoint and create our query.

## Install & Import Statements

Install: The SPARQLWrapper library provides tools for querying SPARQL endpoints. The sparql_dataframe library can be used with SPARQLWrapper to convert JSON results from a SPARQL query directly to a Pandas dataframe. The mapclassify library is required by GeoPandas for its .explore functionality.

Import: See the inline comments for a brief rationale for each library.

In [None]:
%%capture
!pip install mapclassify --upgrade --quiet
!pip install SPARQLWrapper --upgrade --quiet
!pip install sparql_dataframe --upgrade --quiet

In [None]:
from branca.element import Figure                                  # For controlling the size of the final map
import folium                                                      # For map layer control
import geopandas as gpd                                            # For geospatial dataframes
import pandas as pd                                                # For dataframes
from shapely import wkt                                            # For working with WKT coordinates in a GeoDataFrame
from SPARQLWrapper import SPARQLWrapper, JSON, GET, POST, DIGEST   # For querying SPARQL endpoints
import sparql_dataframe                                            # For converting SPARQL query results to Pandas dataframes
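
For orientation, the conversion that sparql_dataframe performs can be pictured as flattening the W3C SPARQL JSON results format into one row per solution. The sketch below uses only the standard library. The sample payload and the helper name `bindings_to_rows` are hypothetical, and the real library may treat datatypes and missing values differently.

```python
import json

# Hypothetical, hand-written payload in the W3C SPARQL JSON results format.
SAMPLE_JSON = """
{
  "head": {"vars": ["facility", "faclabel"]},
  "results": {"bindings": [
    {"facility": {"type": "uri", "value": "http://example.org/f1"},
     "faclabel": {"type": "literal", "value": "EXAMPLE MILL"}},
    {"facility": {"type": "uri", "value": "http://example.org/f2"}}
  ]}
}
"""

def bindings_to_rows(payload):
    """Flatten SPARQL JSON results into a list of dicts, one per solution."""
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are absent from a binding; use None.
        rows.append({v: binding.get(v, {}).get("value") for v in variables})
    return rows

rows = bindings_to_rows(json.loads(SAMPLE_JSON))
for row in rows:
    print(row)
```

A list of dictionaries like this can be passed to `pd.DataFrame(rows)` to obtain a dataframe with one column per query variable.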

## Variable Initialization

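A SPARQLWrapper is configured for the SAWGraph project's GraphDB server. Requests are sent to the Hydrology repository endpoint, and the queries reach the other repositories through federated SERVICE clauses. The `industry` parameter selects the NAICS code used to filter facilities.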

In [None]:
%%capture

server = "https://gdb.acg.maine.edu:7200/" # @param ["https://gdb.acg.maine.edu:7200/","http://tarski.ume.maine.edu:7200/"] {"allow-input":true}
#pd.options.display.width = 240

industry = "NAICS-IndustryGroup-3222" # @param ["NAICS-IndustryGroup-3222","NAICS-Subsector-322"]

# Repository names differ between the two servers
if server == "https://gdb.acg.maine.edu:7200/":
  endpointGET = 'https://gdb.acg.maine.edu:7200/repositories/Hydrology'
  hydrology = "Hydrology"
  admin = "S2L13_AdminRegions"
elif server == "http://tarski.ume.maine.edu:7200/":
  endpointGET = 'http://tarski.ume.maine.edu:7200/repositories/Hydrology'
  hydrology ="ME_Hydrology"
  admin ="ME_S2L13_and_AdminRegions"
else:
  raise ValueError(f"Unrecognized server: {server}")


sparqlGET = SPARQLWrapper(endpointGET)
sparqlGET.setHTTPAuth(DIGEST)
sparqlGET.setCredentials('sawgraph-endpoint', 'skailab')
sparqlGET.setMethod(GET)
sparqlGET.setReturnFormat(JSON)


## Queries

The queries below retrieve data from the SAWGraph knowledge graph. They use federation (SPARQL `SERVICE` clauses) to combine data from the FIO, S2L13_AdminRegions, Hydrology and SAWGraph repositories.

Each query is then executed and its results are returned as a dataframe.
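
Because the queries are written as Python f-strings, literal SPARQL braces have to be doubled (`{{` and `}}`), while single braces mark the values that are substituted. The sketch below shows this on a shortened query modeled on the facility and S2 cell patterns; the two variables are stand-ins for the ones set during variable initialization, and the query is for illustration only.

```python
# Stand-in values; in the notebook these come from the Variable Initialization cell.
industry = "NAICS-IndustryGroup-3222"
admin = "S2L13_AdminRegions"

query = f"""
PREFIX naics: <http://sawgraph.spatialai.org/v1/fio/naics#>
PREFIX fio: <http://sawgraph.spatialai.org/v1/fio#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
SELECT DISTINCT ?facility ?s2 WHERE {{
    SERVICE <repository:FIO> {{
        ?facility fio:ofIndustry ?industry .
        ?industry fio:subcodeOf naics:{industry} .
        ?facility kwg-ont:sfWithin ?s2 .
    }}
    SERVICE <repository:{admin}> {{
        ?s2 a kwg-ont:S2Cell_Level13 .
    }}
}}
"""

# Doubled braces come out as single braces; {industry} and {admin} are filled in.
print(query)
```

Printing a query before sending it is a quick way to confirm that the substituted repository and industry names are the intended ones.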

In [None]:
%%time
#Get paper product facilities
query1 = f"""
PREFIX naics: <http://sawgraph.spatialai.org/v1/fio/naics#>
PREFIX fio: <http://sawgraph.spatialai.org/v1/fio#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX kwgr: <http://stko-kwg.geog.ucsb.edu/lod/resource/>
select DISTINCT ?facility ?faclabel (concat('[',group_concat(DISTINCT ?industry;separator='; '),']') as ?industries) (group_concat(DISTINCT ?industryLabel;separator='; ') as ?industryLabels) ?fac_wkt ?s2
where  {{
    SERVICE <repository:FIO> {{
    	?facility a fio:Facility .
        ?facility rdfs:label ?faclabel.
        ?facility geo:hasGeometry/geo:asWKT ?fac_wkt .
    	?facility fio:ofIndustry ?industry.
      ?industry fio:subcodeOf naics:{industry}. #Only Converted Paper Manufacturing: 10 facilities
    	  ?industry rdfs:label ?industryLabel.
      ?facility kwg-ont:sfWithin ?s2.
      ?facility kwg-ont:sfWithin ?countysub.
    }}
     SERVICE <repository:{admin}>
    {{
        ?s2 a kwg-ont:S2Cell_Level13.
        ?countysub a kwg-ont:AdministrativeRegion_3.
        ?countysub kwg-ont:administrativePartOf+ kwgr:administrativeRegion.USA.23.
    }}
}} GROUP BY ?facility ?faclabel ?fac_wkt ?s2
"""
#Get s2 cells of facilities
query2 = f"""
PREFIX naics: <http://sawgraph.spatialai.org/v1/fio/naics#>
PREFIX fio: <http://sawgraph.spatialai.org/v1/fio#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
select DISTINCT ?s2 ?s2WKT where  {{
    SERVICE <repository:FIO> {{
    	?facility a fio:Facility .
    	?facility fio:ofIndustry ?industry.
      ?industry fio:subcodeOf naics:{industry}.
      ?facility kwg-ont:sfWithin ?s2.
    }}
     SERVICE <repository:{admin}>
    {{
        ?s2 a kwg-ont:S2Cell_Level13.
        ?s2 geo:hasGeometry/geo:asWKT ?s2WKT .

    }}
}}

"""
#Get waterbodies/flowlines from s2 cells
query3=f"""
PREFIX naics: <http://sawgraph.spatialai.org/v1/fio/naics#>
PREFIX fio: <http://sawgraph.spatialai.org/v1/fio#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX hyf: <https://www.opengis.net/def/schema/hy_features/hyf/>
PREFIX schema: <https://schema.org/>
select DISTINCT ?s2 ?reach ?name ?wb_wkt where  {{
    SERVICE <repository:FIO> {{
    	?facility a fio:Facility .
    	?facility fio:ofIndustry ?industry.
      ?industry fio:subcodeOf naics:{industry}.
      ?facility kwg-ont:sfWithin ?s2.
    }}
     SERVICE <repository:{admin}>
    {{
        ?s2 a kwg-ont:S2Cell_Level13.

    }}
    SERVICE <repository:{hydrology}>
    {{
        ?s2 kwg-ont:sfCrosses ?reach.
        ?reach a hyf:HY_FlowPath.

        ?reach geo:hasGeometry/geo:asWKT ?wb_wkt .
        OPTIONAL{{?reach schema:name ?name}}
    }}
}}
"""
#get downstream waterbodies
query4 = f"""
PREFIX naics: <http://sawgraph.spatialai.org/v1/fio/naics#>
PREFIX fio: <http://sawgraph.spatialai.org/v1/fio#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX hyf: <https://www.opengis.net/def/schema/hy_features/hyf/>
PREFIX saw_water: <http://sawgraph.spatialai.org/v1/saw_water#>
select ?downstream ?downstream_wkt ?fl_type where  {{
    SERVICE <repository:FIO> {{
    	?facility a fio:Facility .
    	?facility fio:ofIndustry ?industry.
      ?industry fio:subcodeOf naics:{industry}.
      ?facility kwg-ont:sfWithin ?s2.
    }}
     SERVICE <repository:{admin}>
    {{
        ?s2 a kwg-ont:S2Cell_Level13.

    }}
    SERVICE <repository:{hydrology}>
    {{
        ?s2 kwg-ont:sfCrosses ?reach.
        ?reach a hyf:HY_FlowPath.
        ?reach hyf:downstreamWaterbody+ ?downstream.
        ?downstream saw_water:hasFTYPE ?fl_type .
        FILTER ( ?fl_type != "Coastline" )
        ?downstream geo:hasGeometry/geo:asWKT ?downstream_wkt .
    }}
}} GROUP BY ?downstream ?downstream_wkt ?fl_type
"""

#get s2 cells from downstream waterbodies
query5 = f"""
PREFIX naics: <http://sawgraph.spatialai.org/v1/fio/naics#>
PREFIX fio: <http://sawgraph.spatialai.org/v1/fio#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX hyf: <https://www.opengis.net/def/schema/hy_features/hyf/>
PREFIX saw_water: <http://sawgraph.spatialai.org/v1/saw_water#>
select DISTINCT ?s2downstream ?downstream_s2_wkt where  {{
    SERVICE <repository:FIO> {{
    	?facility a fio:Facility .
    	?facility fio:ofIndustry ?industry.
      ?industry fio:subcodeOf naics:{industry}.
      ?facility kwg-ont:sfWithin ?s2.
    }}
    SERVICE <repository:{hydrology}>
    {{
        ?s2 kwg-ont:sfCrosses ?reach.
        ?reach a hyf:HY_FlowPath.
        ?reach hyf:downstreamWaterbody+ ?downstream.
        ?downstream saw_water:hasFTYPE ?fl_type .
        FILTER ( ?fl_type != "Coastline" )
        ?downstream kwg-ont:sfCrosses ?s2downstream.
    }}
         SERVICE <repository:{admin}>
    {{
        ?s2 a kwg-ont:S2Cell_Level13.
        ?s2downstream a kwg-ont:S2Cell_Level13.
        ?s2downstream geo:hasGeometry/geo:asWKT ?downstream_s2_wkt .

    }}
}} GROUP BY ?s2downstream ?downstream_s2_wkt
"""

#get sample points and PFAS results in downstream s2 cells
query6 = f"""
PREFIX naics: <http://sawgraph.spatialai.org/v1/fio/naics#>
PREFIX fio: <http://sawgraph.spatialai.org/v1/fio#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kwg-ont: <http://stko-kwg.geog.ucsb.edu/lod/ontology/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX hyf: <https://www.opengis.net/def/schema/hy_features/hyf/>
PREFIX coso: <http://sawgraph.spatialai.org/v1/contaminoso#>
PREFIX me_egad: <http://sawgraph.spatialai.org/v1/me-egad#>
PREFIX me_egad_data: <http://sawgraph.spatialai.org/v1/me-egad-data#>
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX qudt: <http://qudt.org/schema/qudt/>
PREFIX saw_water: <http://sawgraph.spatialai.org/v1/saw_water#>
PREFIX stad: <http://sawgraph.spatialai.org/v1/stad#>
select ?samplePoint ?sp_wkt (?ptl as ?pointType) ?materialSample ?observation  ?sampleType ?substance ?substanceL ?value ?unit
where  {{
  #get all distinct downstream s2 cells
    {{select ?s2downstream where  {{
    SERVICE <repository:FIO> {{
    	?facility a fio:Facility .
    	?facility fio:ofIndustry ?industry.
      ?industry fio:subcodeOf naics:{industry}.
      ?facility kwg-ont:sfWithin ?s2.
    }}
    SERVICE <repository:{hydrology}>
    {{
        ?s2 kwg-ont:sfCrosses ?reach.
        ?reach a hyf:HY_FlowPath.
        ?reach hyf:downstreamWaterbody+ ?downstream.
        ?downstream saw_water:hasFTYPE ?fl_type .
        FILTER ( ?fl_type != "Coastline" )
        ?downstream kwg-ont:sfCrosses ?s2downstream.
    }}
         SERVICE <repository:{admin}>
    {{
        ?s2 a kwg-ont:S2Cell_Level13.
        ?s2downstream a kwg-ont:S2Cell_Level13.
    }}
                }} GROUP BY ?s2downstream }}
    #samples in s2 cells, results of samples
    SERVICE <repository:SAWGraph>
    {{
        ?s2downstream kwg-ont:sfContains ?samplePoint.
        ?samplePoint a coso:SamplePoint.
        ?samplePoint geo:hasGeometry/geo:asWKT ?sp_wkt .
        ?samplePoint me_egad:samplePointType ?pointType.
        VALUES ?pointType {{me_egad:featureType.PD me_egad:featureType.RI me_egad:featureType.LK}}
		    ?pointType rdfs:label ?ptl.
        ?samplePoint ^coso:fromSamplePoint ?materialSample.
        ?materialSample coso:ofSampleMaterialType ?st.
        ?st rdfs:label ?sampleType.
        ?observation coso:analyzedSample ?materialSample.
        ?observation coso:ofSubstance ?substance.
        ?substance rdfs:label ?substanceL.
        ?observation sosa:hasResult ?measure.
        ?measure qudt:quantityValue ?result.
        ?result qudt:numericValue ?value.
        ?result qudt:unit ?unit.
        FILTER NOT EXISTS{{ ?measure a stad:StatisticalAggregateData}}
    }}
}} GROUP BY ?samplePoint ?sp_wkt ?ptl ?observation ?materialSample ?st ?sampleType ?substance ?substanceL ?value ?unit
"""


CPU times: user 11 µs, sys: 1 µs, total: 12 µs
Wall time: 15 µs
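
The downstream queries rely on the SPARQL property path `hyf:downstreamWaterbody+`, where `+` means "one or more hops". As a rough illustration of what that path computes, the sketch below walks a small, entirely hypothetical reach network breadth-first; the reach names and the helper `all_downstream` are made up for the example.

```python
from collections import deque

# Hypothetical toy network: each reach maps to its immediate downstream reaches.
DOWNSTREAM = {
    "reachA": ["reachB"],
    "reachB": ["reachC"],
    "reachC": ["reachD", "reachE"],
    "reachD": [],
    "reachE": [],
}

def all_downstream(start, edges):
    """Return every reach reachable from `start` via one or more hops."""
    seen = set()
    queue = deque(edges.get(start, []))
    while queue:
        reach = queue.popleft()
        if reach in seen:
            continue  # already visited via another route
        seen.add(reach)
        queue.extend(edges.get(reach, []))
    return seen

print(sorted(all_downstream("reachB", DOWNSTREAM)))
# → ['reachC', 'reachD', 'reachE']
```

Note that the starting reach itself is not part of the result unless the network loops back to it, which matches the one-or-more semantics of `+`.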


### Run Queries

In [None]:
%%time
facility_df = sparql_dataframe.get(endpointGET, query1)
print(facility_df.info())
facility_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   facility        10 non-null     object
 1   faclabel        10 non-null     object
 2   industries      10 non-null     object
 3   industryLabels  10 non-null     object
 4   fac_wkt         10 non-null     object
 5   s2              10 non-null     object
dtypes: object(6)
memory usage: 608.0+ bytes
None
CPU times: user 36.3 ms, sys: 2.87 ms, total: 39.1 ms
Wall time: 1.96 s


Unnamed: 0,facility,faclabel,industries,industryLabels,fac_wkt,s2
0,http://sawgraph.spatialai.org/v1/us-frs-data#d...,SOUTHERN CONTAINER CORP.,[http://sawgraph.spatialai.org/v1/fio/naics#NA...,Corrugated and Solid Fiber Box Manufacturing,POINT (-70.35708 43.67662),http://stko-kwg.geog.ucsb.edu/lod/resource/s2....
1,http://sawgraph.spatialai.org/v1/us-frs-data#d...,INTERNATIONAL PAPER CONTAINER FACILITY,[http://sawgraph.spatialai.org/v1/fio/naics#NA...,Corrugated and Solid Fiber Box Manufacturing,POINT (-70.26326 44.0383),http://stko-kwg.geog.ucsb.edu/lod/resource/s2....
2,http://sawgraph.spatialai.org/v1/us-frs-data#d...,R T S PACKAGING LLC,[http://sawgraph.spatialai.org/v1/fio/naics#NA...,Corrugated and Solid Fiber Box Manufacturing ;...,POINT (-70.35584 43.57448),http://stko-kwg.geog.ucsb.edu/lod/resource/s2....
3,http://sawgraph.spatialai.org/v1/us-frs-data#d...,INTERNATIONAL PAPER COMPANY-PASSADUMKEAG,[http://sawgraph.spatialai.org/v1/fio/naics#NA...,Corrugated and Solid Fiber Box Manufacturing,POINT (-68.59575 45.21394),http://stko-kwg.geog.ucsb.edu/lod/resource/s2....
4,http://sawgraph.spatialai.org/v1/us-frs-data#d...,VOLK PACKAGING CORP,[http://sawgraph.spatialai.org/v1/fio/naics#NA...,Corrugated and Solid Fiber Box Manufacturing,POINT (-70.49212 43.46834),http://stko-kwg.geog.ucsb.edu/lod/resource/s2....


In [None]:
%%time
s2_df = sparql_dataframe.get(endpointGET, query2)
print(s2_df.info())
s2_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 174 entries, 0 to 173
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   s2      174 non-null    object
 1   s2WKT   174 non-null    object
dtypes: object(2)
memory usage: 2.8+ KB
None
CPU times: user 26.4 ms, sys: 1.98 ms, total: 28.4 ms
Wall time: 516 ms


Unnamed: 0,s2,s2WKT
0,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,POLYGON ((-87.65811301590037 41.90577043614168...
1,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,"POLYGON ((-88.2363819482265 41.81677679023794,..."
2,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,POLYGON ((-87.96288404743616 42.04037370916136...
3,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,"POLYGON ((-89.46476438041994 41.8389630314584,..."
4,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,"POLYGON ((-88.1779181738307 39.52394302809989,..."


In [None]:
%%time
wb_df = sparql_dataframe.get(endpointGET, query3)
print(wb_df.info())
wb_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   s2      16 non-null     object
 1   reach   16 non-null     object
 2   name    8 non-null      object
 3   wb_wkt  16 non-null     object
dtypes: object(4)
memory usage: 640.0+ bytes
None
CPU times: user 27 ms, sys: 892 µs, total: 27.9 ms
Wall time: 557 ms


Unnamed: 0,s2,reach,name,wb_wkt
0,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,https://geoconnex.us/nhdplusv2/comid/6721463,,LINESTRING Z (-70.36252286789 43.6917159988483...
1,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,https://geoconnex.us/nhdplusv2/comid/6722645,,LINESTRING Z (-70.35524546790128 43.6825685988...
2,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,https://geoconnex.us/nhdplusv2/comid/6722643,Presumpscot River,LINESTRING Z (-70.3518580679065 43.68260813219...
3,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,https://geoconnex.us/nhdplusv2/comid/6722641,Presumpscot River,LINESTRING Z (-70.3547300679021 43.68238179886...
4,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,https://geoconnex.us/nhdplusv2/comid/6722647,Presumpscot River,LINESTRING Z (-70.37728700120039 43.6959521321...


In [None]:
%%time
downstream_df = sparql_dataframe.get(endpointGET, query4)
print(downstream_df.info())
downstream_df


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 170 entries, 0 to 169
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   downstream      170 non-null    object
 1   downstream_wkt  170 non-null    object
 2   fl_type         170 non-null    object
dtypes: object(3)
memory usage: 4.1+ KB
None
CPU times: user 32.5 ms, sys: 0 ns, total: 32.5 ms
Wall time: 769 ms


Unnamed: 0,downstream,downstream_wkt,fl_type
0,https://geoconnex.us/nhdplusv2/comid/6721463,LINESTRING Z (-70.36252286789 43.6917159988483...,StreamRiver
1,https://geoconnex.us/nhdplusv2/comid/6722645,LINESTRING Z (-70.35524546790128 43.6825685988...,ArtificialPath
2,https://geoconnex.us/nhdplusv2/comid/6722641,LINESTRING Z (-70.3547300679021 43.68238179886...,ArtificialPath
3,https://geoconnex.us/nhdplusv2/comid/6722643,LINESTRING Z (-70.3518580679065 43.68260813219...,Connector
4,https://geoconnex.us/nhdplusv2/comid/6721459,LINESTRING Z (-70.35039226790883 43.6852205988...,StreamRiver
...,...,...,...
165,https://geoconnex.us/nhdplusv2/comid/5205318,LINESTRING Z (-69.78827760211465 43.8008553320...,ArtificialPath
166,https://geoconnex.us/nhdplusv2/comid/5205320,LINESTRING Z (-69.78662100211722 43.7810855320...,ArtificialPath
167,https://geoconnex.us/nhdplusv2/comid/6721969,LINESTRING Z (-70.31693186796076 43.6103211323...,StreamRiver
168,https://geoconnex.us/nhdplusv2/comid/6724793,LINESTRING Z (-70.33190566793752 43.5723221323...,ArtificialPath


In [None]:
%%time
s2downstream_df = sparql_dataframe.get(endpointGET, query5)
s2downstream_df.head()

CPU times: user 25.3 ms, sys: 2.95 ms, total: 28.3 ms
Wall time: 1.25 s


Unnamed: 0,s2downstream,downstream_s2_wkt
0,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,POLYGON ((-70.35017474667907 43.69377556359776...
1,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,POLYGON ((-70.34410516194998 43.68329391651671...
2,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,POLYGON ((-70.35613332622616 43.68543907221098...
3,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,POLYGON ((-70.33814352527287 43.69162912924049...
4,http://stko-kwg.geog.ucsb.edu/lod/resource/s2....,POLYGON ((-70.32611206101447 43.68948123590661...


In [None]:
%%time
samples_df = sparql_dataframe.get(endpointGET, query6)
samples_df.head()

CPU times: user 29.3 ms, sys: 10.4 ms, total: 39.7 ms
Wall time: 1.36 s


Unnamed: 0,samplePoint,sp_wkt,pointType,materialSample,observation,sampleType,substance,substanceL,value,unit
0,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.3239539 43.7041331),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONIC ACID,2.03,http://qudt.org/vocab/unitNanoGM-PER-L
1,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.3239539 43.7041331),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROPENTANOIC ACID,1.07,http://qudt.org/vocab/unitNanoGM-PER-L
2,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.3239539 43.7041331),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,6:2 FLUOROTELOMER SULFONIC ACID,3.32,http://qudt.org/vocab/unitNanoGM-PER-L
3,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.3239539 43.7041331),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROHEXANOIC ACID,0.973,http://qudt.org/vocab/unitNanoGM-PER-L
4,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.3239539 43.7041331),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANOIC ACID,1.89,http://qudt.org/vocab/unitNanoGM-PER-L


In [None]:
samples_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 479 entries, 0 to 478
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   samplePoint     479 non-null    object 
 1   sp_wkt          479 non-null    object 
 2   pointType       479 non-null    object 
 3   materialSample  479 non-null    object 
 4   observation     479 non-null    object 
 5   sampleType      479 non-null    object 
 6   substance       479 non-null    object 
 7   substanceL      479 non-null    object 
 8   value           479 non-null    float64
 9   unit            479 non-null    object 
dtypes: float64(1), object(9)
memory usage: 37.5+ KB


In [None]:
samples_df.to_csv('samples.csv')
#files.download('samples.csv')

In [None]:
facility_df['fac_wkt'] = facility_df['fac_wkt'].apply(wkt.loads)
s2_df['s2WKT'] = s2_df['s2WKT'].apply(wkt.loads)


In [None]:
wb_df['wb_wkt'] = wb_df['wb_wkt'].apply(wkt.loads)
downstream_df['downstream_wkt'] = downstream_df['downstream_wkt'].apply(wkt.loads)
s2downstream_df['downstream_s2_wkt'] = s2downstream_df['downstream_s2_wkt'].apply(wkt.loads)
samples_df['sp_wkt'] = samples_df['sp_wkt'].apply(wkt.loads)
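
The cells above turn WKT strings into geometry objects with `wkt.loads`. To make clear what such a string contains, here is a deliberately simplified, standard-library-only sketch that reads a 2D `POINT`. The helper `parse_point` is hypothetical and is not a replacement for shapely, which also handles lines, polygons and Z coordinates.

```python
import re

# Matches only the simple 2D form "POINT (x y)".
POINT_RE = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$")

def parse_point(wkt_text):
    """Return (x, y) from a 2D WKT POINT string, or raise ValueError."""
    match = POINT_RE.match(wkt_text)
    if match is None:
        raise ValueError(f"Not a 2D WKT POINT: {wkt_text!r}")
    return float(match.group(1)), float(match.group(2))

# A facility location taken from the query output above.
lon, lat = parse_point("POINT (-70.35708 43.67662)")
print(lon, lat)
```

WKT lists x before y, so for these longitude/latitude coordinates the first number is the longitude.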

Create GeoDataFrames from the parsed geometries, using EPSG:4326 as the coordinate reference system.

In [None]:
facility_gdf = gpd.GeoDataFrame(facility_df, geometry='fac_wkt')
facility_gdf.set_crs(epsg=4326, inplace=True, allow_override=True)

s2_gdf = gpd.GeoDataFrame(s2_df, geometry='s2WKT')
s2_gdf.set_crs(epsg=4326, inplace=True, allow_override=True)

wb_gdf = gpd.GeoDataFrame(wb_df, geometry='wb_wkt')
wb_gdf.set_crs(epsg=4326, inplace=True, allow_override=True)

downstream_gdf = gpd.GeoDataFrame(downstream_df, geometry='downstream_wkt')
downstream_gdf.set_crs(epsg=4326, inplace=True, allow_override=True)

s2downstream_gdf = gpd.GeoDataFrame(s2downstream_df, geometry='downstream_s2_wkt')
s2downstream_gdf.set_crs(epsg=4326, inplace=True, allow_override=True)

samples_gdf = gpd.GeoDataFrame(samples_df, geometry='sp_wkt')
samples_gdf.set_crs(epsg=4326, inplace=True, allow_override=True)

Unnamed: 0,samplePoint,sp_wkt,pointType,materialSample,observation,sampleType,substance,substanceL,value,unit
0,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.32395 43.70413),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONIC ACID,2.0300,http://qudt.org/vocab/unitNanoGM-PER-L
1,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.32395 43.70413),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROPENTANOIC ACID,1.0700,http://qudt.org/vocab/unitNanoGM-PER-L
2,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.32395 43.70413),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,6:2 FLUOROTELOMER SULFONIC ACID,3.3200,http://qudt.org/vocab/unitNanoGM-PER-L
3,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.32395 43.70413),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROHEXANOIC ACID,0.9730,http://qudt.org/vocab/unitNanoGM-PER-L
4,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.32395 43.70413),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SURFACE WATER,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANOIC ACID,1.8900,http://qudt.org/vocab/unitNanoGM-PER-L
...,...,...,...,...,...,...,...,...,...,...
474,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.7632 44.12443),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SKINLESS FILLET,http://sawgraph.spatialai.org/v1/me-egad#param...,N-ETHYL PERFLUOROOCTANE SULFONAMIDOACETIC ACID,0.1738,http://sawgraph.spatialai.org/v1/me-egad#unit....
475,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.7632 44.12443),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SKINLESS FILLET,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,6.7070,http://sawgraph.spatialai.org/v1/me-egad#unit....
476,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.7632 44.12443),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SKINLESS FILLET,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUORODECANOATE,0.8644,http://sawgraph.spatialai.org/v1/me-egad#unit....
477,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.7632 44.12443),RIVER,http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad-data#...,SKINLESS FILLET,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONAMIDE,0.7263,http://sawgraph.spatialai.org/v1/me-egad#unit....


Group the samples by location

In [None]:
agg_samples_df = (
    samples_gdf[['samplePoint', 'sp_wkt', 'sampleType', 'value', 'substanceL', 'unit']]
    .sort_values(by=['value'], ascending=False)  # highest value first, so 'first' picks the row with the max
    .groupby(['samplePoint', 'sp_wkt', 'sampleType'])
    .agg(max=('value', 'max'), max_chem=('substanceL', 'first'),
         max_unit=('unit', 'first'), min=('value', 'min'))
    .reset_index()
)
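
The aggregation keeps, for each sample point and sample type, the largest value, the substance that produced it, and the smallest value. The same logic can be sketched in plain Python. The rows below are hypothetical and carry only the columns needed for the illustration (geometry and unit are left out).

```python
# Hypothetical rows shaped like samples_df, reduced to the columns used here.
rows = [
    {"samplePoint": "sp1", "sampleType": "SURFACE WATER", "substanceL": "PFOA", "value": 1.89},
    {"samplePoint": "sp1", "sampleType": "SURFACE WATER", "substanceL": "PFOS", "value": 2.03},
    {"samplePoint": "sp1", "sampleType": "SKINLESS FILLET", "substanceL": "PFOS", "value": 6.707},
    {"samplePoint": "sp2", "sampleType": "SURFACE WATER", "substanceL": "PFHxA", "value": 0.973},
]

summary = {}
for row in rows:
    key = (row["samplePoint"], row["sampleType"])
    entry = summary.get(key)
    if entry is None:
        # First row seen for this group initializes all three fields.
        summary[key] = {"max": row["value"], "max_chem": row["substanceL"], "min": row["value"]}
        continue
    if row["value"] > entry["max"]:
        entry["max"] = row["value"]
        entry["max_chem"] = row["substanceL"]
    entry["min"] = min(entry["min"], row["value"])

for key, entry in summary.items():
    print(key, entry)
```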

In [None]:
agg_samples_df

Unnamed: 0,samplePoint,sp_wkt,sampleType,max,max_chem,max_unit,min
0,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.62967 44.54454),SKINLESS FILLET,20.42,PERFLUOROOCTANE SULFONATE,http://sawgraph.spatialai.org/v1/me-egad#unit....,0.4083
1,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.62967 44.54454),SURFACE WATER,1.72,PERFLUOROBUTANOIC ACID,http://qudt.org/vocab/unitNanoGM-PER-L,0.284
2,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.67614 44.5011),SKIN-ON FILLET,28.3,PERFLUOROOCTANE SULFONATE,http://sawgraph.spatialai.org/v1/me-egad#unit....,3.7
3,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-70.46423 43.49886),SKIN-ON FILLET,3.61,PERFLUOROOCTANE SULFONATE,http://sawgraph.spatialai.org/v1/me-egad#unit....,0.431
4,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.7056 44.42156),SKIN-ON FILLET,32.2,PERFLUOROOCTANE SULFONATE,http://sawgraph.spatialai.org/v1/me-egad#unit....,0.59
5,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.67614 44.5011),SKIN-ON FILLET,33.5,PERFLUOROOCTANE SULFONATE,http://sawgraph.spatialai.org/v1/me-egad#unit....,0.348
6,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.61371 44.5867),SKIN-ON FILLET,836.0,PERFLUOROOCTANE SULFONIC ACID,http://sawgraph.spatialai.org/v1/me-egad#unit....,0.052
7,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.61371 44.5867),SKINLESS FILLET,660.0,PERFLUOROOCTANE SULFONIC ACID,http://sawgraph.spatialai.org/v1/me-egad#unit....,0.028
8,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.61371 44.5867),SURFACE WATER,2540.0,PERFLUOROOCTANE SULFONIC ACID,http://qudt.org/vocab/unitNanoGM-PER-L,0.317
9,http://sawgraph.spatialai.org/v1/me-egad-data#...,POINT (-69.77657 44.30039),SKINLESS FILLET,12.5,PERFLUOROOCTANE SULFONATE,http://sawgraph.spatialai.org/v1/me-egad#unit....,0.1475


In [None]:
agg_samples_gdf = gpd.GeoDataFrame(agg_samples_df, geometry='sp_wkt')
agg_samples_gdf.set_crs(epsg=4326, inplace=True, allow_override=True)
agg_samples_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   samplePoint  18 non-null     object  
 1   sp_wkt       18 non-null     geometry
 2   sampleType   18 non-null     object  
 3   max          18 non-null     float64 
 4   max_chem     18 non-null     object  
 5   max_unit     18 non-null     object  
 6   min          18 non-null     float64 
dtypes: float64(2), geometry(1), object(4)
memory usage: 1.1+ KB


# Visualizing on a map

## Create map with multiple layers

Each GeoDataFrame is a layer in the final map.

In [None]:
from folium.plugins import MarkerCluster
from folium import CircleMarker

In [None]:
#%%capture
wb_color = 'DarkBlue'
downstream_color= 'DodgerBlue'
fac_color = 'red'
s2_color = 'lightgray'
s2n_color = 'LightCyan'
boundweight = 3
# Named industry_icon so the `industry` query parameter is not overwritten
industry_icon = folium.map.Icon(icon='fa-regular fa-industry', prefix='fa')

#'<span style="color: blue;">Water Bodies</span>'

map = facility_gdf.explore(color=fac_color,
                     marker_kwds=dict(radius=5, icon=industry_icon),
                     style_kwds=dict(weight=boundweight),
                     tooltip=True,
                     name=f'<span style="color: {fac_color};">Facilities</span>',
                     show=True)
fg = folium.FeatureGroup(name="S2 Cells", control=True, show=False)
s2_gdf.explore(m=fg,
               color=s2_color,
               style_kwds=dict(weight=boundweight),
               tooltip=False,
               name=f'<span style="color: {s2_color};">S2 Cells</span>',
               show=True)
wb_gdf.explore(m=map,
               color=wb_color,
               style_kwds=dict(weight=boundweight),
               tooltip=True,
               name=f'<span style="color: {wb_color};">Reaches</span>',
               show=False)

downstream_gdf.explore(m=map,
               color=downstream_color,
               style_kwds=dict(weight=boundweight),
               tooltip=True,
               name=f'<span style="color: {downstream_color};">Downstream Reaches</span>',
               show=True)


s2downstream_gdf.explore(m=fg,
               color=s2n_color,
               style_kwds=dict(weight=boundweight),
               tooltip=False,
               name=f'<span style="color: {downstream_color};">Downstream S2 Cells</span>',
               show=True)

#marker_cluster = MarkerCluster(name="Water Samples").add_to(map)


agg_samples_gdf[agg_samples_gdf.sampleType == 'SURFACE WATER'].explore(m=map,
               style_kwds=dict(style_function=lambda x: {"radius": (x['properties']["max"]) if (x['properties']["max"]) < 20 else 21 }),
              #marker_kwds=dict(radius=agg_samples_gdf['max']),
               popup=True,
               name=f'<span style="color: blue;">Water Samples </span>',
               show=True)
agg_samples_gdf[agg_samples_gdf.sampleType == 'SKINLESS FILLET' ].explore(m=map,
               color='salmon',
               style_kwds=dict(style_function=lambda x: {"radius": (x['properties']["max"]) if (x['properties']["max"]) < 20 else 21 }),
               popup=True,
               name=f'<span style="color: salmon;">Skinless Fish Samples</span>',
               show=True)
agg_samples_gdf[agg_samples_gdf.sampleType == 'SKIN-ON FILLET' ].explore(m=map,
               color='pink',
               style_kwds=dict(style_function=lambda x: {"radius": (x['properties']["max"]) if (x['properties']["max"]) < 20 else 21 }),
               popup=True,
               name=f'<span style="color:pink;">Skin-On Fish Samples</span>',
               show=True)
fg.add_to(map)

# folium.TileLayer("stamenterrain", show=False).add_to(map)
# folium.TileLayer("MapQuest Open Aerial", show=False).add_to(map)
folium.LayerControl(collapsed=False).add_to(map)


## Show map

The map is created inside a Figure box to control its size. Each sample point is sized by the maximum measured concentration of any PFAS chemical at that point; maxima of 20 or more are drawn with a fixed radius of 21.
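
The sizing rule used by the `style_function` lambdas in the map cell can be written as a small named function, which is easier to read and test. This is a sketch of the same rule; the function name `marker_radius` and its parameter names are hypothetical.

```python
def marker_radius(max_value, threshold=20, capped_radius=21):
    """Use the raw value as the radius below the threshold, else a fixed radius."""
    return max_value if max_value < threshold else capped_radius

# Maxima taken from the aggregated samples shown earlier.
for value in (1.72, 20.42, 2540.0):
    print(value, "->", marker_radius(value))
# → 1.72 -> 1.72
# → 20.42 -> 21
# → 2540.0 -> 21
```

Because the radius is taken from the numeric value alone, water samples (reported per liter) and fish tissue samples (reported per gram) are not sized on a common scale, so marker sizes are best compared within a single layer.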

In [None]:
# map.save('Flowlines_and_Facilities.html')

fig = Figure(width=1200, height=800)
fig.add_child(map)

In [None]:
fig.save('UC3-CQ15.html')

In [None]:
query = """
PREFIX qudt: <http://qudt.org/schema/qudt/>
PREFIX me_egad: <http://sawgraph.spatialai.org/v1/me-egad#>
PREFIX stad: <http://sawgraph.spatialai.org/v1/stad#>
PREFIX pfas: <http://sawgraph.spatialai.org/v1/pfas#>
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX coso: <http://sawgraph.spatialai.org/v1/contaminoso#>
#Where are fish tissue samples?
select * where {
  SERVICE <repository:SAWGraph>{
  #get each sample and its properties
    ?s a coso:MaterialSample.
    ?s rdfs:label ?sample.
    ?s coso:ofSampleMaterialType ?t.
    ?t rdfs:label ?type.
    VALUES ?type {"SKINLESS FILLET" "SKIN-ON FILLET"}
    ?s coso:fromSamplePoint/geo:hasGeometry/geo:asWKT ?s_wkt.
  #get chemical observations for each sample
    ?s ^coso:analyzedSample ?o.
    ?o coso:ofSubstance ?c.
    VALUES ?c {me_egad:parameter.PFOS}
    ?c rdfs:label ?chemical.
    ?o coso:sampledTime ?date.
    ?o sosa:hasResult ?r.
    ?r a stad:SingleData. #no summary results
    ?r qudt:quantityValue/qudt:numericValue ?conc.
    ?r qudt:quantityValue/qudt:unit ?u.
    OPTIONAL{?u rdfs:label ?unit}
}
}


"""
df_allFish = sparql_dataframe.get(endpointGET, query)

In [None]:
df_allFish

index,s,sample,t,type,s_wkt,o,c,chemical,date,r,conc,u,unit
0,http://sawgraph.spatialai.org/v1/me-egad-data#...,EGAD sample FT-LT-05-01-F,http://sawgraph.spatialai.org/v1/me-egad#sampl...,SKINLESS FILLET,POINT (-67.9231433 46.9250054),http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,2013-09-24,http://sawgraph.spatialai.org/v1/me-egad-data#...,59.850,http://sawgraph.spatialai.org/v1/me-egad#unit....,NANOGRAMS PER GRAM
1,http://sawgraph.spatialai.org/v1/me-egad-data#...,EGAD sample FT-LT-05-02-F,http://sawgraph.spatialai.org/v1/me-egad#sampl...,SKINLESS FILLET,POINT (-67.9231433 46.9250054),http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,2013-09-24,http://sawgraph.spatialai.org/v1/me-egad-data#...,21.600,http://sawgraph.spatialai.org/v1/me-egad#unit....,NANOGRAMS PER GRAM
2,http://sawgraph.spatialai.org/v1/me-egad-data#...,EGAD sample FT-LT-05-03-F,http://sawgraph.spatialai.org/v1/me-egad#sampl...,SKINLESS FILLET,POINT (-67.9231433 46.9250054),http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,2013-09-24,http://sawgraph.spatialai.org/v1/me-egad-data#...,114.920,http://sawgraph.spatialai.org/v1/me-egad#unit....,NANOGRAMS PER GRAM
3,http://sawgraph.spatialai.org/v1/me-egad-data#...,EGAD sample FT-LT-10-03-F,http://sawgraph.spatialai.org/v1/me-egad#sampl...,SKINLESS FILLET,POINT (-67.9111371 46.9197325),http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,2013-09-25,http://sawgraph.spatialai.org/v1/me-egad-data#...,23.999,http://sawgraph.spatialai.org/v1/me-egad#unit....,NANOGRAMS PER GRAM
4,http://sawgraph.spatialai.org/v1/me-egad-data#...,EGAD sample FT-LT-10-05-F,http://sawgraph.spatialai.org/v1/me-egad#sampl...,SKINLESS FILLET,POINT (-67.9065608 46.9214477),http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,2013-09-26,http://sawgraph.spatialai.org/v1/me-egad-data#...,38.520,http://sawgraph.spatialai.org/v1/me-egad#unit....,NANOGRAMS PER GRAM
...,...,...,...,...,...,...,...,...,...,...,...,...,...
384,http://sawgraph.spatialai.org/v1/me-egad-data#...,EGAD sample FTLT14CBT091917,http://sawgraph.spatialai.org/v1/me-egad#sampl...,SKIN-ON FILLET,POINT (-67.8440178 46.9472622),http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,2017-09-19,http://sawgraph.spatialai.org/v1/me-egad-data#...,33.100,http://sawgraph.spatialai.org/v1/me-egad#unit....,NANOGRAMS PER GRAM
385,http://sawgraph.spatialai.org/v1/me-egad-data#...,"EGAD sample MXW-BKT-SOF-C1(1,2,3,4,5)-2022",http://sawgraph.spatialai.org/v1/me-egad#sampl...,SKIN-ON FILLET,POINT (-67.8040657 46.1816703),http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,2022-05-14,http://sawgraph.spatialai.org/v1/me-egad-data#...,12.630,http://sawgraph.spatialai.org/v1/me-egad#unit....,NANOGRAMS PER GRAM
386,http://sawgraph.spatialai.org/v1/me-egad-data#...,"EGAD sample MXW-BKT-SOF-C2(6,7,8,9,10)-202",http://sawgraph.spatialai.org/v1/me-egad#sampl...,SKIN-ON FILLET,POINT (-67.8040657 46.1816703),http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,2022-05-14,http://sawgraph.spatialai.org/v1/me-egad-data#...,19.660,http://sawgraph.spatialai.org/v1/me-egad#unit....,NANOGRAMS PER GRAM
387,http://sawgraph.spatialai.org/v1/me-egad-data#...,"EGAD sample LST-BKT-SOF-C1(1,3,5,7,9)-2002",http://sawgraph.spatialai.org/v1/me-egad#sampl...,SKIN-ON FILLET,POINT (-67.810316 46.904626),http://sawgraph.spatialai.org/v1/me-egad-data#...,http://sawgraph.spatialai.org/v1/me-egad#param...,PERFLUOROOCTANE SULFONATE,2022-09-08,http://sawgraph.spatialai.org/v1/me-egad-data#...,32.970,http://sawgraph.spatialai.org/v1/me-egad#unit....,NANOGRAMS PER GRAM
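
A result table like the one above can be reduced to one row per sample point and sample type, similar in spirit to the `max` value used to size the map markers. The sketch below assumes only pandas and uses a handful of rows copied from the preview. `preview` and `summarize_by_point` are illustrative names; in the notebook the input would be `df_allFish`.

```python
import pandas as pd

# A few rows copied from the preview above (only the columns needed here).
preview = pd.DataFrame(
    {
        "sample": [
            "EGAD sample FT-LT-05-01-F",
            "EGAD sample FT-LT-05-02-F",
            "EGAD sample FT-LT-05-03-F",
            "EGAD sample FT-LT-10-03-F",
            "EGAD sample MXW-BKT-SOF-C1(1,2,3,4,5)-2022",
            "EGAD sample MXW-BKT-SOF-C2(6,7,8,9,10)-202",
        ],
        "type": ["SKINLESS FILLET"] * 4 + ["SKIN-ON FILLET"] * 2,
        "s_wkt": [
            "POINT (-67.9231433 46.9250054)",
            "POINT (-67.9231433 46.9250054)",
            "POINT (-67.9231433 46.9250054)",
            "POINT (-67.9111371 46.9197325)",
            "POINT (-67.8040657 46.1816703)",
            "POINT (-67.8040657 46.1816703)",
        ],
        "conc": [59.850, 21.600, 114.920, 23.999, 12.630, 19.660],
    }
)


def summarize_by_point(df):
    """Return the max PFOS concentration and result count per point and sample type."""
    return df.groupby(["s_wkt", "type"], as_index=False).agg(
        max_conc=("conc", "max"),
        n_results=("conc", "count"),
    )


summary = summarize_by_point(preview)
print(summary)
```

Grouping on the WKT string treats samples with identical coordinates as one location, which is a simplification; grouping on the sample point IRI would be more robust if that column is added to the query.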
