# X To JSON-LD

### About
This notebook is a small exploration of converting formats such as YAML or TOML to JSON-LD. It is a simple notebook, and the same thing is easy to do in Go (https://github.com/OpenCoreData/ocdGarden/tree/master/JSON-goLD/YAML2JSONLD) or other languages. YAML and TOML are not semantic formats. However, it is possible to declare the terms in something like YAML and then map them, by convention, into JSON-LD with a defined semantics. Where similar terms exist across vocabularies, something like an ns-term pattern (prefixing each term with its namespace) would be needed.
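A minimal sketch of such a convention, assuming keys are written as `ns:term` (for example `dcat:keyword`) and that bare keys default to schema.org. The prefix table, `DEFAULT_NS`, `expand_key`, `expand`, and the `oceanography` keyword are all hypothetical illustrations, not an established mapping. The input is a plain dict, roughly what `yaml.safe_load` would return, and the output is expanded JSON-LD that could then be compacted with `pyld` as in the cells below. Only keys are rewritten; values (including `@type` values) are left as they are.

```python
# Hypothetical prefix table: the "ns" part of an "ns:term" key -> vocabulary IRI.
NAMESPACES = {
    "schema": "http://schema.org/",
    "dcat": "http://www.w3.org/ns/dcat#",
}
DEFAULT_NS = "schema"  # assumption: bare keys are schema.org terms


def expand_key(key, namespaces=NAMESPACES, default_ns=DEFAULT_NS):
    """Expand 'ns:term' to a full IRI; a bare 'term' uses the default namespace."""
    if key.startswith("@"):
        return key  # JSON-LD keywords such as @type pass through unchanged
    ns, sep, term = key.partition(":")
    if not sep:
        return namespaces[default_ns] + key
    if ns in namespaces:
        return namespaces[ns] + term
    return key  # unknown prefix (e.g. "http"): assume it is already an absolute IRI


def expand(node):
    """Recursively rewrite the keys of a parsed YAML/TOML structure into IRIs."""
    if isinstance(node, dict):
        return {expand_key(k): expand(v) for k, v in node.items()}
    if isinstance(node, list):
        return [expand(v) for v in node]
    return node


# Roughly what yaml.safe_load would return for a file written with this convention.
record = {
    "name": "BCO-DMO",
    "dcat:keyword": ["oceanography"],  # made-up example value
    "contactPoint": {"name": "Adam Shepherd"},
}
print(expand(record))
```

Because unknown prefixes fall through unchanged, full IRIs can be mixed freely with prefixed and bare keys in the same file.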

## Why
A key element in the success of the "self publishing" approach will be providing an easy path to publishing this type of structured data. Facilities will need to:

* Be able to generate the JSON-LD, preferably based on existing structured data such as DataCite or other sources
* Be able to integrate with their web publishing platforms, such as Drupal, Flask, custom code or others.
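Whatever the platform, JSON-LD is usually embedded in a page as a `<script type="application/ld+json">` element, which is the HTML embedding described in the JSON-LD specification. A minimal sketch, assuming the JSON-LD is already available as a Python dict; `jsonld_script_tag` is a hypothetical helper name. The returned string could be dropped into a Flask/Jinja template or a Drupal block as-is.

```python
import json


def jsonld_script_tag(doc):
    """Wrap a JSON-LD dict in a <script type="application/ld+json"> element."""
    payload = json.dumps(doc, indent=2)
    # Escape "</" so a value containing "</script>" cannot close the element early;
    # "\/" is a legal JSON escape for "/", so the payload still parses the same way.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


print(jsonld_script_tag({
    "@context": "http://schema.org/",
    "@type": "Organization",
    "name": "R2R",
    "url": "http://www.rvdata.us",
}))
```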

The difficulty will depend on the level at which facilities publish. For purposes of discussion, these levels can be described as follows:

1. Basic metadata about the facility such as contact points and service/search end points
2. A "re3" profile that involves publishing enough metadata to meet the minimum field requirements for re3data
3. All of the above and also exposing one or more spatial and time queries following GeoWS or OpenSearch URL patterns.
4. All of the above and also exposing at least some of the data catalog and/or data set holdings of the facility in connected JSON-LD documents
5. All of the above, but covering all data catalog and/or data set holdings of the facility
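A minimal sketch of what levels 1 and 3 might look like as schema.org JSON-LD. The level 1 part follows the R2R example commented in the cell below (`Organization` plus `ContactPoint`). For level 3, one possible encoding is a schema.org `SearchAction` whose `target` is a URL template using OpenSearch Geo/Time parameters (`{geo:box}`, `{time:start}`, `{time:end}`); the `example.org` end point and the `facility_jsonld` helper are hypothetical, and this is not a required or agreed profile.

```python
import json


def facility_jsonld(name, url, contact_name, contact_email, search_template=None):
    """Level 1 (and optionally level 3) facility description as schema.org JSON-LD."""
    doc = {
        "@context": "http://schema.org/",
        "@type": "Organization",
        "name": name,
        "url": url,
        "contactPoint": {
            "@type": "ContactPoint",
            "name": contact_name,
            "email": contact_email,
            "contactType": "technical support",
        },
    }
    if search_template:
        # Level 3 (sketch): advertise a spatial/temporal search end point.
        doc["potentialAction"] = {
            "@type": "SearchAction",
            "target": search_template,
        }
    return doc


# Hypothetical end point using OpenSearch Geo/Time parameter names.
TEMPLATE = "http://example.org/search?bbox={geo:box}&start={time:start}&end={time:end}"

print(json.dumps(
    facility_jsonld("BCO-DMO", "http://www.bco-dmo.org",
                    "Adam Shepherd", "theman@whoi.edu", TEMPLATE),
    indent=2,
))
```

Levels 4 and 5 would add links from this document to separate catalog or dataset documents, which this sketch does not attempt.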

In [3]:
import yaml

# safe_load: yaml.load() without an explicit Loader is deprecated (an error in PyYAML 6)
y = yaml.safe_load("""
    name: BCO-DMO
    url: http://www.bco-dmo.org
    contactPoint:
        name: Adam Shepherd
        email: theman@whoi.edu
    """)

In [4]:
from pyld import jsonld
import json

# {
#     "@context": "http://schema.org/",
#     "@type": "Organization",
#     "name": "R2R",
#     "contactPoint": {
#         "@type": "ContactPoint",
#         "name": "Bob Arko",
#         "email": "nemo@nobody.com",
#         "url": "http://foo.com",
#         "contactType": "technical support"
#     },
#   "url": "http://www.rvdata.us"
# }

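# Map the YAML keys onto full schema.org IRIs (expanded JSON-LD form).
# Only name, url and contactPoint.name are mapped here; the email is not used yet.
doc = {
    "http://schema.org/name": y["name"],
    "http://schema.org/url": {"@id": y["url"]},
    "http://schema.org/contactPoint": {
        "http://schema.org/name": y["contactPoint"]["name"]
    }
}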

context = "http://schema.org/"
# context = {
#     "name": "http://schema.org/name",
#     "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
#     "image": {"@id": "http://schema.org/image", "@type": "@id"}
# }

# compact a document according to a particular context
# see: http://json-ld.org/spec/latest/json-ld/#compacted-document-form
compacted = jsonld.compact(doc, context)

print(json.dumps(compacted, indent=2))

{
  "url": "http://www.bco-dmo.org", 
  "@context": "http://schema.org/", 
  "contactPoint": {
    "name": "Adam Shepherd"
  }, 
  "name": "BCO-DMO"
}


In [36]:
dc = """<?xml version="1.0" encoding="UTF-8"?>
<resource xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd" xmlns="http://datacite.org/schema/kernel-3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<identifier identifierType="DOI">(:tba)</identifier>
	<creators>
		<creator>
			<creatorName>Fosmire, Michael</creatorName>
		</creator>
		<creator>
			<creatorName>Wertz, Ruth</creatorName>
		</creator>
		<creator>
			<creatorName>Purzer, Senay</creatorName>
		</creator>
	</creators>
	<titles>
		<title>Critical Engineering Literacy Test (CELT)</title>
	</titles>
	<publisher>Purdue University Research Repository (PURR)</publisher>
	<publicationYear>2013</publicationYear>
	<subjects>
		<subject>Assessment</subject>
		<subject>Information Literacy</subject>
		<subject>Engineering</subject>
		<subject>Undergraduate Students</subject>
		<subject>CELT</subject>
		<subject>Purdue University</subject>
	</subjects>
	<language>eng</language>
	<resourceType resourceTypeGeneral="Dataset">Dataset</resourceType>
	<version>1</version>
	<descriptions>
		<description descriptionType="Abstract">We developed an instrument, Critical Engineering Literacy Test (CELT), which is a multiple choice instrument designed to measure undergraduate students’ scientific and information literacy skills. It requires students to first read a technical memo and, based on the memo’s arguments, answer eight multiple choice and six open-ended response questions. We collected data from 143 first-year engineering students and conducted an item analysis. The KR-20 reliability of the instrument was .39. Item difficulties ranged between .17 to .83. The results indicate low reliability index but acceptable levels of item difficulties and item discrimination indices. Students were most challenged when answering items measuring scientific and mathematical literacy (i.e., identifying incorrect information).
	</description>
	</descriptions>
</resource>
"""

In [50]:
import xml.etree.ElementTree as ET

# Just checking that I can pull items from the tree;
# mapping to JSON-LD is then easy (tedious, but easy).

root = ET.fromstring(dc)
print(root.tag)

# for child in root:
#     print(child.tag, child.attrib)

pub = root.find('{http://datacite.org/schema/kernel-3}publisher')
print(pub.text)


{http://datacite.org/schema/kernel-3}resource
Purdue University Research Repository (PURR)
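The "tedious but easy" mapping can be sketched with the same `ElementTree` calls, using a namespace prefix map instead of the `{...}` form. This is one plausible DataCite kernel-3 to schema.org `Dataset` mapping (title to `name`, creatorName to `creator`, publicationYear to `datePublished`, subject to `keywords`, Abstract description to `description`), not an official crosswalk; `NS`, `first_text`, `datacite_to_schemaorg`, and the trimmed `SAMPLE` record (taken from the record above) are illustrative. Fields the record does not supply are dropped rather than emitted as nulls.

```python
import json
import xml.etree.ElementTree as ET

NS = {"dc": "http://datacite.org/schema/kernel-3"}

# Trimmed copy of the DataCite record above.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-3">
  <creators>
    <creator><creatorName>Fosmire, Michael</creatorName></creator>
    <creator><creatorName>Wertz, Ruth</creatorName></creator>
  </creators>
  <titles><title>Critical Engineering Literacy Test (CELT)</title></titles>
  <publisher>Purdue University Research Repository (PURR)</publisher>
  <publicationYear>2013</publicationYear>
  <subjects>
    <subject>Assessment</subject>
    <subject>Engineering</subject>
  </subjects>
  <version>1</version>
  <descriptions>
    <description descriptionType="Abstract">We developed an instrument, Critical Engineering Literacy Test (CELT), which is a multiple choice instrument designed to measure undergraduate students’ scientific and information literacy skills.</description>
  </descriptions>
</resource>"""


def first_text(root, path):
    """Stripped text of the first element matching path, or None if absent."""
    el = root.find(path, NS)
    return el.text.strip() if el is not None and el.text else None


def datacite_to_schemaorg(xml_string):
    """Map a DataCite kernel-3 record onto a schema.org Dataset (one possible mapping)."""
    root = ET.fromstring(xml_string)
    doc = {
        "@context": "http://schema.org/",
        "@type": "Dataset",
        "name": first_text(root, "dc:titles/dc:title"),
        "creator": [
            {"@type": "Person", "name": el.text.strip()}
            for el in root.findall("dc:creators/dc:creator/dc:creatorName", NS)
        ],
        "publisher": {"@type": "Organization", "name": first_text(root, "dc:publisher")},
        "datePublished": first_text(root, "dc:publicationYear"),
        "keywords": [el.text.strip() for el in root.findall("dc:subjects/dc:subject", NS)],
        "version": first_text(root, "dc:version"),
        "description": first_text(
            root, "dc:descriptions/dc:description[@descriptionType='Abstract']"),
    }
    # Drop fields the record did not supply.
    return {k: v for k, v in doc.items() if v not in (None, [])}


print(json.dumps(datacite_to_schemaorg(SAMPLE), indent=2, ensure_ascii=False))
```

Passing the full `dc` string from the earlier cell should work the same way, since the standard-library parser accepts the XML declaration in a `str`.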
