# OKN KG Construction Using CSV from Drone Data Acquisition

> Imagine that you are an agent assistant to a scientist who has some drone data that he wants to publish as a distributed knowledge graph known as an Open Knowledge Network (OKN). The OKN should use known vocabularies (SOSA, OWL-Time, GeoSPARQL, PROV-O, DCAT) where possible to enable interoperability. The scientist has a CSV file of CO2 measurements from a drone flight.

> Great! To create an Open Knowledge Network (OKN) from the scientist's drone data, we can follow these steps:

> Convert the CSV data into a format that is compatible with RDF (Resource Description Framework), which is the foundation of the semantic web. Various tools are available for this, such as OpenRefine or Python libraries like pandas and rdflib.

> Define the data model using existing ontologies and vocabularies, such as SOSA (Sensor, Observation, Sample, and Actuator), OWL-Time (a vocabulary for expressing dates and times), GeoSPARQL (a vocabulary for representing geospatial data), PROV-O (a vocabulary for representing provenance information), and DCAT (Data Catalog Vocabulary). These ontologies and vocabularies are widely used and well documented, which makes it easier to achieve interoperability with other knowledge graphs and data sources.

> Map the CSV data to the appropriate classes and properties in the defined ontology. For example, each CO2 measurement could be mapped to an instance of the SOSA Observation class, with properties such as observedProperty (what is being observed, in this case CO2 concentration), madeBySensor (the CO2 sensor that made the observation; the drone carrying it is a SOSA Platform, not the sensor), and phenomenonTime (the time at which the observation was made, represented as an OWL-Time Instant).

> Add provenance information to the data using the PROV-O ontology. This helps trace the origin of the data, the process by which it was collected and transformed, and any other relevant information about its history.

> Publish the data as an OKN, using a standards-compliant RDF serialization format such as Turtle or RDF/XML. This will enable other researchers and applications to discover, access, and integrate the data into their own knowledge graphs.

> By following these steps, we can create an Open Knowledge Network (OKN) from the scientist's drone data, using existing ontologies and vocabularies to ensure interoperability and semantic richness.

In [None]:
#| default_exp core

In [None]:
#| hide
from nbdev.showdoc import *

In [None]:
# Import the packages in your notebook by adding the following lines at the beginning of the notebook:
import pandas as pd
from rdflib import Graph, Literal, BNode, Namespace, RDF, URIRef
from rdflib.namespace import DCTERMS, FOAF, XSD


In [None]:
#|echo: true
# I want to load the csv file "./data/CO2Meter_GPS.csv" to a pandas data frame.
data = pd.read_csv("./data/CO2Meter_GPS.csv")

In [None]:
#|echo: true
# I want to print the column headers from the data frame we just imported.
# I would also like to print the first five rows of the data file.

In [None]:
#|echo: true
# Print column headers from the data frame
print(data.columns)

# Print first five rows of the data frame
print(data.head())

Index(['CO2 (PPM)', ' Latitude', ' Longitude', ' Altitude', ' Air Speed (m/s)',
       ' Mode', ' Fixed Satellites', ' Available Satellites', 'voltage',
       'current', 'level', 'id'],
      dtype='object')
   CO2 (PPM)   Latitude   Longitude   Altitude   Air Speed (m/s)       Mode  \
0        395  44.382214  -73.258468      66.12              0.02  STABILIZE   
1        394  44.382213  -73.258468      66.12              0.03  STABILIZE   
2        394  44.382213  -73.258468      66.13              0.05  STABILIZE   
3        394  44.382214  -73.258467      66.16              0.08  STABILIZE   
4        395  44.382214  -73.258467      66.19              0.01     GUIDED   

    Fixed Satellites   Available Satellites  voltage current level  \
0                  4                     11      0.0    None  None   
1                  4                     11      0.0    None  None   
2                  4                     11      0.0    None  None   
3                  4                
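The printed headers show that most column names carry a leading space (`' Latitude'`, `' Mode'`), which is why later cells index `row[' Latitude']`, and that `current` and `level` hold `None`. A minimal cleanup sketch using the first rows printed above as inline sample data (a subset of columns); it assumes the raw file stores missing readings as the text `None`, as the printed frame suggests, and maps that text to NaN with `na_values`.

```python
import io
import pandas as pd

# First two rows as printed above (subset of columns), with the original header spacing
sample_csv = io.StringIO(
    "CO2 (PPM), Latitude, Longitude, Altitude, Mode,voltage,current,level\n"
    "395,44.382214,-73.258468,66.12,STABILIZE,0.0,None,None\n"
    "394,44.382213,-73.258468,66.12,STABILIZE,0.0,None,None\n"
)

# Treat the text "None" as a missing value
df = pd.read_csv(sample_csv, na_values=["None"])

# Strip surrounding whitespace from every column name
df.columns = df.columns.str.strip()
```

Applied in the notebook, later code would use the stripped names (`row['Latitude']` instead of `row[' Latitude']`).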

To map the CSV data to the appropriate classes and properties in the defined ontology, you could follow these steps:

1. Define the classes and properties to use:
    - SOSA `Observation` for each CO2 measurement.
    - QUDT for the unit of measurement of the CO2 concentration (parts per million, PPM).
    - OWL-Time for the time at which each observation was made.
    - SOSA `FeatureOfInterest` for the sampled air, located by a GeoSPARQL geometry built from latitude, longitude, and altitude.
    - SOSA `Platform` for the drone.
    - SOSA `Sensor` for the CO2 sensor that makes the observations.
    - SOSA `ObservableProperty` for the observed property, CO2 concentration.
    - SOSA `Result` for the result of each observation.
    - Optionally, SOSA `Actuation` for the flight mode.
2. Map the CSV columns to these classes and properties:
    - The `CO2 (PPM)` values become the results of the observations; the observed property is CO2 concentration, and the unit (PPM) is expressed with QUDT.
    - Latitude, longitude, and altitude become the geometry of each feature of interest.
    - The drone maps to the SOSA `Platform`, and the CO2 sensor to a SOSA `Sensor` hosted on it (`sosa:isHostedBy`).
    - The `sosa:madeBySensor` property links each Observation to the Sensor (SOSA has no `observedBy` property).
    - Air Speed (m/s), Mode, Fixed Satellites, and Available Satellites describe the drone's state during the flight; because they change from row to row, they can be recorded with each Observation through custom properties.
    - Voltage, current, and level have no term in these vocabularies and need a custom property or extension.
3. Use DCAT to describe the dataset:
    - Describe the dataset with DCAT and link it to the classes and properties of the ontology.
    - Provide metadata such as the creator, publisher, license, and date of publication.

By following these steps, you can build an OKN that reuses known vocabularies (SOSA, QUDT, OWL-Time, GeoSPARQL, PROV-O, DCAT) where possible to enable interoperability, with the CSV data mapped to the appropriate classes and properties.

In [None]:
#|echo: true
# Let's start by creating an rdflib graph and the namespaces needed for
# KG construction: SOSA, QUDT, OWL-Time and GeoSPARQL.
from datetime import datetime, timezone
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, XSD

# Define namespaces
SOSA = Namespace("http://www.w3.org/ns/sosa/")
QUDT = Namespace("http://qudt.org/schema/qudt/")
UNIT = Namespace("http://qudt.org/vocab/unit/")
TIME = Namespace("http://www.w3.org/2006/time#")
GEO = Namespace("http://www.opengis.net/ont/geosparql#")
# Our own resources should not be minted inside the SOSA or OWL-Time namespaces;
# this base IRI is a placeholder to replace with a resolvable one.
EX = Namespace("http://example.org/drone-co2/")
XSD_DATETIMESTAMP = URIRef("http://www.w3.org/2001/XMLSchema#dateTimeStamp")

# Define an RDF graph and bind readable prefixes for serialization
g = Graph()
for prefix, ns in [("sosa", SOSA), ("qudt", QUDT), ("unit", UNIT),
                   ("time", TIME), ("geo", GEO), ("ex", EX)]:
    g.bind(prefix, ns)

# One drone (Platform) hosts one CO2 sensor for the whole flight,
# so these resources are created once rather than once per row
platform_uri = EX['drone']
sensor_uri = EX['co2_sensor']
obs_prop_uri = EX['CO2_concentration']
g.add((platform_uri, RDF.type, SOSA.Platform))
g.add((sensor_uri, RDF.type, SOSA.Sensor))
g.add((sensor_uri, SOSA.isHostedBy, platform_uri))
g.add((sensor_uri, SOSA.observes, obs_prop_uri))
g.add((obs_prop_uri, RDF.type, SOSA.ObservableProperty))

# Iterate through the first five rows of the dataframe
for i, row in data.head(5).iterrows():
    obs_uri = EX[f'obs_{i}']
    result_uri = EX[f'obs_{i}_result']
    foi_uri = EX[f'foi_{i}']
    geom_uri = EX[f'foi_{i}_geom']
    instant_uri = EX[f'instant_{i}']

    # Time of the observation as an OWL-Time Instant
    # (assumes, as the original cell did, that 'id' is a Unix timestamp in seconds)
    timestamp = datetime.fromtimestamp(int(row['id']), tz=timezone.utc)
    g.add((instant_uri, RDF.type, TIME.Instant))
    g.add((instant_uri, TIME.inXSDDateTimeStamp,
           Literal(timestamp.isoformat(), datatype=XSD_DATETIMESTAMP)))

    # Feature of interest located by a GeoSPARQL point (WKT order: longitude latitude altitude)
    lat = float(row[' Latitude'])
    lon = float(row[' Longitude'])
    alt = float(row[' Altitude'])
    g.add((foi_uri, RDF.type, SOSA.FeatureOfInterest))
    g.add((foi_uri, GEO.hasGeometry, geom_uri))
    g.add((geom_uri, RDF.type, GEO.Geometry))
    g.add((geom_uri, GEO.asWKT,
           Literal(f'POINT Z ({lon} {lat} {alt})', datatype=GEO.wktLiteral)))

    # Result: the CO2 value as a QUDT quantity value in ppm
    # (a QUDT unit is a resource, not a literal datatype)
    g.add((result_uri, RDF.type, SOSA.Result))
    g.add((result_uri, RDF.type, QUDT.QuantityValue))
    g.add((result_uri, QUDT.numericValue, Literal(float(row['CO2 (PPM)']), datatype=XSD.double)))
    g.add((result_uri, QUDT.unit, UNIT.PPM))

    # Observation linking property, sensor, feature of interest, time and result
    g.add((obs_uri, RDF.type, SOSA.Observation))
    g.add((obs_uri, SOSA.observedProperty, obs_prop_uri))
    g.add((obs_uri, SOSA.madeBySensor, sensor_uri))
    g.add((obs_uri, SOSA.hasFeatureOfInterest, foi_uri))
    g.add((obs_uri, SOSA.phenomenonTime, instant_uri))
    g.add((obs_uri, SOSA.hasResult, result_uri))

    # Flight telemetry has no SOSA term; the ex: property names are placeholders.
    # Mode and satellite counts change during the flight, so they are recorded per observation.
    g.add((obs_uri, EX.airSpeed, Literal(float(row[' Air Speed (m/s)']), datatype=XSD.double)))
    g.add((obs_uri, EX.flightMode, Literal(str(row[' Mode']).strip())))
    g.add((obs_uri, EX.fixedSatellites, Literal(int(row[' Fixed Satellites']), datatype=XSD.integer)))
    g.add((obs_uri, EX.availableSatellites, Literal(int(row[' Available Satellites']), datatype=XSD.integer)))
    


In [None]:
# Serialize the graph to Turtle format
turtle = g.serialize(format='turtle')

# Print the Turtle output
print(turtle)


@prefix ns1: <http://www.w3.org/ns/sosa/> .
@prefix ns2: <http://www.opengis.net/ont/geosparql#> .

ns1:obs_0 a ns1:Observation ;
    ns1:hasResult ns1:obs_0_result ;
    ns1:madeBySensor ns1:sensor_0 ;
    ns1:observedAt ns1:foi_0 ;
    ns1:observedBy ns1:platform_0 ;
    ns1:observedProperty ns1:obsProp_CO2 ;
    ns1:phenomenonTime <http://www.w3.org/2006/time#Instant_0> .

ns1:obs_1 a ns1:Observation ;
    ns1:hasResult ns1:obs_1_result ;
    ns1:madeBySensor ns1:sensor_1 ;
    ns1:observedAt ns1:foi_1 ;
    ns1:observedBy ns1:platform_1 ;
    ns1:observedProperty ns1:obsProp_CO2 ;
    ns1:phenomenonTime <http://www.w3.org/2006/time#Instant_1> .

ns1:obs_2 a ns1:Observation ;
    ns1:hasResult ns1:obs_2_result ;
    ns1:madeBySensor ns1:sensor_2 ;
    ns1:observedAt ns1:foi_2 ;
    ns1:observedBy ns1:platform_2 ;
    ns1:observedProperty ns1:obsProp_CO2 ;
    ns1:phenomenonTime <http://www.w3.org/2006/time#Instant_2> .

ns1:obs_3 a ns1:Observation ;
    ns1:hasResult ns1:obs_3_resul

In [None]:
#| hide
import nbdev; nbdev.nbdev_export()