# Welcome!
This notebook is intended as a template for an end-to-end (or, perhaps more aptly, a discovery-analysis-publish) pipeline for using Open NASA data. This version includes considerably more documentation and tutorial material, because we explain _what_ each step is supposed to do and _why_ we do it.

### Bookend Structure
![bookend-structure](./figures/bookend-structure.png)

### Knowledge Graphs & Semantic Technologies
* [What is Metadata](https://github.com/KGConf/open-kg-curriculum/blob/master/curriculum/modules/What_is_Metadata/What_is_Metadata.md)
* [What is an Identifier](https://github.com/KGConf/open-kg-curriculum/blob/master/curriculum/modules/What_is_an_Identifier/What_is_an_Identifier.md)
* [What is a KG](https://github.com/KGConf/open-kg-curriculum/blob/master/curriculum/modules/What_is_a_Knowledge_Graph/What_is_a_Knowledge_Graph.md)
* [What is a Taxonomy](https://github.com/KGConf/open-kg-curriculum/blob/master/curriculum/modules/What_is_a_Taxonomy/What_is_a_Taxonomy.md)
* [What is an Ontology](https://github.com/KGConf/open-kg-curriculum/blob/master/curriculum/modules/What_is_an_Ontology/What_is_an_Ontology.md)

#### Ontology Design Patterns
Ontology Design Patterns (ODPs) are self-contained miniature ontologies that solve domain-invariant modeling problems. Our approach combines several of them to create a modular "plug and play" KG schema (or architecture):
* Computational Environment
* [Computational Observation](https://github.com/kastle-lab/computational-observation-pattern)
* [Data Transformation](https://github.com/Data-Semantics-Laboratory/data-transformation-pattern)

## Bookend Software
* [rdflib](https://rdflib.readthedocs.io/en/stable/)
* [sparqlwrapper](https://sparqlwrapper.readthedocs.io/en/latest/)

In [12]:
# rdflib is the general-purpose Python library for modifying a KG in memory and writing it out to a file
import rdflib
## Just some convenient classes to pull out
from rdflib import URIRef, Graph, Namespace, Literal
## namespaces are below. These are where identifiers "live", so to speak.
from rdflib import OWL, RDF, RDFS, XSD, TIME

# sparqlwrapper is used to query a triplestore
import SPARQLWrapper

## Prefixes

In [10]:
name_space = "https://polyneme.xyz/"
pfs = {
    "polyr": Namespace(f"{name_space}lod/resource/"),
    "poly-ont": Namespace(f"{name_space}lod/ontology/"),
    "geo": Namespace("http://www.opengis.net/ont/geosparql#"),
    "geof": Namespace("http://www.opengis.net/def/function/geosparql/"),
    "sf": Namespace("http://www.opengis.net/ont/sf#"),
    "wd": Namespace("http://www.wikidata.org/entity/"),
    "wdt": Namespace("http://www.wikidata.org/prop/direct/"),
    "dbo": Namespace("http://dbpedia.org/ontology/"),
    "ssn": Namespace("http://www.w3.org/ns/ssn/"),
    "sosa": Namespace("http://www.w3.org/ns/sosa/"),
    "cdt": Namespace("http://w3id.org/lindt/custom_datatypes#"),
    "ex": Namespace("https://example.com/"),
    "rdf": RDF,
    "rdfs": RDFS,
    "xsd": XSD,
    "owl": OWL,
    # The "time" key was previously listed twice; rdflib's built-in TIME
    # (http://www.w3.org/2006/time#) is the entry that took effect, so only it is kept.
    "time": TIME,
}

## Storing Metadata
It should come as no surprise, given the rest of the notebook, that we store the metadata generated here in a knowledge graph. For now, it stays in memory as an `rdflib` `Graph`. When we publish the dataset generated in this notebook, we will upload it into a graph database.

In [11]:
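def init_kg(prefixes=pfs):
    """Create an empty Graph with every prefix in `prefixes` bound to its namespace."""
    kg = Graph()
    # Iterate over the argument (not the global `pfs`) so callers can pass their own prefixes
    for prefix, namespace in prefixes.items():
        kg.bind(prefix, namespace)
    return kg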
# rdf:type shortcut
a = pfs["rdf"]["type"]

# Initialize an empty graph
graph = init_kg()

## Accessing Your Local Graph Database
For this notebook, we assume you are running a `developer` (i.e., non-production) Apache Jena Fuseki triplestore as your graph database. This will be useful in several different cells.
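As a rough sketch of what talking to a local Fuseki server involves, the cell below builds (but does not send) a SPARQL 1.1 Protocol query request. It uses only the Python standard library rather than `SPARQLWrapper`, so the request can be inspected without a running server. It assumes Fuseki's default port `3030` and a dataset named `bookend`; the dataset name is hypothetical, so substitute your own.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: Fuseki is listening on its default port, and the dataset name is hypothetical.
FUSEKI_BASE = "http://localhost:3030"
DATASET = "bookend"


def build_select_request(query: str) -> Request:
    """Build (without sending) a SPARQL 1.1 Protocol GET request for a query."""
    url = f"{FUSEKI_BASE}/{DATASET}/sparql?{urlencode({'query': query})}"
    # Ask the server for SPARQL results in JSON.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_select_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)

# Against a live server, the request could be sent with:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       body = response.read()
```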

## Capturing the Current Computational Environment
![computational environment](./figures/computational-environment-pattern.png)

This pattern captures the environment in which you transform data (i.e., create something new from something old). Recording it makes the transformation easier to replicate.

In [6]:
# Code to populate this pattern for this notebook

## Mint a URI for this computational environment
### There are many ways to create an identifier
### We chose one that encodes enough information to identify the environment without looking up its label.
comp_env_name = "polyneme.donny.home"

## If you have done this before (i.e., this is not your first time running this notebook) AND your
## computational environment hasn't changed.
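
One possible way to gather facts about the current environment is Python's standard `platform` module. The sketch below mints an identifier from the environment name and collects a few descriptive values. The dictionary keys are illustrative assumptions, not properties defined by the Computational Environment pattern.

```python
import platform
from urllib.parse import quote

name_space = "https://polyneme.xyz/"
comp_env_name = "polyneme.donny.home"

# Mint the identifier by appending the percent-encoded name to the resource namespace.
comp_env_uri = f"{name_space}lod/resource/{quote(comp_env_name, safe='')}"

# Facts about the machine and interpreter running this notebook.
# The keys are illustrative; map them to whatever properties your schema defines.
environment = {
    "operating_system": platform.system(),
    "os_release": platform.release(),
    "machine": platform.machine(),
    "python_implementation": platform.python_implementation(),
    "python_version": platform.python_version(),
}

print(comp_env_uri)
for key, value in environment.items():
    print(f"{key}: {value}")
```

Each value could then be attached to the minted URI as a literal in the in-memory graph.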

## Computational Observations
![simulation activity](./figures/computational-observation-pattern.jpg)

In [4]:
# Code to populate this pattern for this notebook
pass

## Dataset Discovery

In [5]:
# Code to do data set discovery
## PySat?
## HDPE.io
## CDAWeb
pass

## This is where your code goes!

In [None]:
pass

## Dataset Publishing (Internal)
Now it is time to publish your work.

### Data Transformation Pattern
![data transformation pattern](./figures/data-transformation-pattern.jpg)