# Welcome!
This notebook is intended as a template for an end-to-end (or, perhaps more aptly, a discovery-analysis-publish) pipeline that uses Open NASA data. This version carries much more documentation and tutorial material, explaining _what_ each step is supposed to do and _why_ it is done that way.

## Bookend for PIDs
Bookend is a template Jupyter notebook whose purpose is to demonstrate how semantic technologies can accelerate research and facilitate [FAIR](https://www.go-fair.org/fair-principles/) and open science. It does this by "bookending" novel code with cells that capture context and publish output with appropriate metadata.

![interchange](./figures/spase-raid-pidinst.png)

This `Bookend` emits metadata in specific formats: SPASE, RAID, and PIDINST. To keep these `Bookends` thematically unified, the interchange format is RDF, which makes the metadata interoperable with the larger knowledge graph ecosystem.

![spase-to-kg](./figures/spase-to-kg.png)

The PIDINST schema is depicted below. `hasValue` and `hasProperty` are included for explainability. The root node of a PIDINST document is the `Instrument`, which has various properties. Each of these properties has a value (depicted through `hasValue`). There are several sub-root nodes for collections (e.g., `InstrumentTypes` and `InstrumentType`). These relations are left unlabeled.
![pidinst-schema](./figures/pidinst-schema.png)

## Bookend for PIDs Requirements
* [rdflib](https://rdflib.readthedocs.io/en/stable/)
* [sparqlwrapper](https://sparqlwrapper.readthedocs.io/en/latest/)

## The BookBEGINNING

In [1]:
# rdflib is the general-purpose Python library for modifying a KG in memory and writing it to a file
import rdflib
## Just some convenient classes to pull out
from rdflib import URIRef, Graph, Namespace, Literal
## Namespaces are below. These are where identifiers "live", so to speak.
from rdflib import OWL, RDF, RDFS, XSD, TIME

# sparqlwrapper is used to query a triplestore.
# Only the class and the JSON constant are imported; a bare `import SPARQLWrapper`
# would be shadowed by the class of the same name.
from SPARQLWrapper import SPARQLWrapper, JSON

## Prefixes

In [4]:
# Some default prefixes for namespaces that are generally useful.
pfs = {
    "geo": Namespace("http://www.opengis.net/ont/geosparql#"),
    "geof": Namespace("http://www.opengis.net/def/function/geosparql/"),
    "sf": Namespace("http://www.opengis.net/ont/sf#"),
    "wd": Namespace("http://www.wikidata.org/entity/"),
    "wdt": Namespace("http://www.wikidata.org/prop/direct/"),
    "dbo": Namespace("http://dbpedia.org/ontology/"),
    "ssn": Namespace("http://www.w3.org/ns/ssn/"),
    "sosa": Namespace("http://www.w3.org/ns/sosa/"),
    "cdt": Namespace("http://w3id.org/lindt/custom_datatypes#"),
    "ex": Namespace("https://example.com/"),
    "rdf": RDF,
    "rdfs": RDFS,
    "xsd": XSD,
    "owl": OWL,
    # rdflib's TIME is http://www.w3.org/2006/time#, so the key is listed only once
    "time": TIME,
}

# The namespace and prefixes that we will use for the metadata storage
name_space = "https://polyneme.xyz/"
pfs["polyr"] = Namespace(f"{name_space}lod/resource#")
pfs["poly-ont"] = Namespace(f"{name_space}lod/ontology#")

## KG Data Structure
The KG data structure, i.e., the thing that stores the metadata, is (for now) kept in memory as an `rdflib` `Graph`.

In [5]:
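def init_kg(prefixes=pfs):
    """Create an empty graph with every prefix in `prefixes` bound."""
    kg = Graph()
    # Use the argument, not the global `pfs`, so callers can pass their own prefixes
    for prefix, namespace in prefixes.items():
        kg.bind(prefix, namespace)
    return kg

# rdf:type shortcut
a = pfs["rdf"]["type"]

# Initialize an empty graph
graph = init_kg()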