### Notebook Goals
* Create a dataframe with xpath content for the collection
* Translate xpath names into the schema.org vocabulary
* Create valid JSON-LD for a record
* Use Google's Structured Data Testing Tool to test results

In [None]:
# pandas is used to refine the dataframe and create the record JSON
import pandas as pd
# MDeval creates the dataframe structure that contains the records' content
import MDeval as md

#### Describe the metadata. 
* What organization created the records? (Organization)
* What collection are the records from? (Collection)
* What dialect are the records written in? (Dialect)

In [None]:
# variables for function arguments, fill these out
Organization = 'LTER'
Collection = 'MILES'
Dialect = 'EML'

#### Read in the metadata's xpath evaluated csv

In [None]:
# Read in the recommendation evaluated csv defined by the above variables
RecommendationEvaluatedDF = pd.read_csv(
    './data/'+Organization+'/'+Collection+'_'+Dialect+'_ElementEvaluated.csv'
)
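The path above is assembled by string concatenation. As a minimal sketch, the same naming pattern could be built with `pathlib`, which makes it easy to check that the file exists before reading it. The helper name `evaluated_csv_path` and its `data_dir` parameter are assumptions made for illustration; they are not part of MDeval.

```python
from pathlib import Path


def evaluated_csv_path(organization, collection, dialect, data_dir="./data"):
    """Hypothetical helper: build the evaluated csv path from the three variables."""
    file_name = f"{collection}_{dialect}_ElementEvaluated.csv"
    return Path(data_dir) / organization / file_name


csv_path = evaluated_csv_path("LTER", "MILES", "EML")
if not csv_path.exists():
    # Report a missing file here instead of failing inside pd.read_csv
    print(f"Expected csv not found: {csv_path}")
```

The resulting `csv_path` can be passed directly to `pd.read_csv`.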

#### Record Xpath Content Function
* Rows are records
* Columns are Xpaths

In [None]:
''' Requires a dataframe with concepts. Creates a vertical view of
xpath content for each record in the collection. Useful in the
creation of JSON.
'''
recordDF = md.recordXpathContent(RecommendationEvaluatedDF)

recordDF
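To make the "rows are records, columns are xpaths" layout concrete, here is a small sketch that builds such a table with plain pandas. It is not how `md.recordXpathContent` is implemented. The long-format input, the `XPath` and `Content` column names, and the example xpaths are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical long-format input: one row per (record, xpath) pair.
# The 'XPath' and 'Content' column names are assumptions.
long_df = pd.DataFrame({
    "Collection": ["MILES"] * 4,
    "Record": ["a.xml", "a.xml", "b.xml", "b.xml"],
    "XPath": ["/eml:eml/dataset/title", "/eml:eml/dataset/abstract"] * 2,
    "Content": ["Title A", "Abstract A", "Title B", "Abstract B"],
})

# Pivot so each record becomes one row and each xpath becomes one column.
wide_df = long_df.pivot_table(
    index=["Collection", "Record"],
    columns="XPath",
    values="Content",
    aggfunc="first",  # keep the first value if an xpath repeats within a record
).reset_index()
wide_df.columns.name = None

print(wide_df)
```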

#### Choose a record to translate

In [None]:
# Set RecordChoice variable
RecordChoice = 'dataset_3484.xml'

In [None]:
# Select record row
recordDF = recordDF[recordDF['Record'] == RecordChoice]
# Drop the Collection and Record columns
recordDF = recordDF.drop(columns=['Collection', 'Record'])
# Display the chosen record's content
recordDF

#### Translate xpaths to schema.org vocabulary
* Identify the xpaths that crosswalk to schema.org properties
* In the cell below, replace each 'xpathN' placeholder with the matching xpath from the dialect

In [None]:
recordDF = recordDF.rename({
    'xpath0': 'name',
    'xpath1': 'description',
    'xpath2': 'url',
    'xpath3': 'keywords',
    'xpath4': 'creator',
    'xpath5': 'distribution',
    'xpath6': '@type',
    'xpath7': 'version',
    'xpath8': 'temporalCoverage',
    'xpath9': 'spatialCoverage',
    'xpath10': 'citation',
}, axis='columns')
recordDF
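As an illustration of what a filled-in crosswalk might look like, the sketch below maps a few EML-style xpaths to schema.org property names and reports any columns that have no mapping. The xpaths and the sample record are assumptions chosen for the example, not values taken from the collection.

```python
import pandas as pd

# Illustrative crosswalk; these xpaths are assumptions, not a vetted mapping.
crosswalk = {
    "/eml:eml/dataset/title": "name",
    "/eml:eml/dataset/abstract": "description",
    "/eml:eml/dataset/keywordSet/keyword": "keywords",
}

# Hypothetical one-record dataframe with xpath column headers.
record = pd.DataFrame([{
    "/eml:eml/dataset/title": "Example title",
    "/eml:eml/dataset/abstract": "Example abstract",
    "/eml:eml/dataset/pubDate": "2019",
}])

# Columns without a schema.org equivalent are listed, then dropped.
unmapped = [column for column in record.columns if column not in crosswalk]
translated = record.rename(columns=crosswalk).drop(columns=unmapped)

print("Unmapped xpaths:", unmapped)
print(translated)
```

Listing the unmapped columns first makes it easier to spot xpaths that still need a crosswalk entry.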

#### Add the required context

In [None]:
recordDF.insert(2, '@context', 'http://schema.org/')
recordDF

#### Create JSON-LD string for adding to the header of the landing page in the repository

In [None]:
recordJSON = recordDF.to_json(orient='records')
RecordJSONld = '<script type="application/ld+json">' + recordJSON[1:-1] + '</script>'
RecordJSONld
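The cell above relies on slicing the brackets off the list that `to_json` returns. A hedged alternative, sketched below with only the standard library, builds the same kind of script tag from a single record dictionary. The helper name `to_jsonld_script` and the sample record are assumptions made for illustration.

```python
import json


def to_jsonld_script(record):
    """Hypothetical helper: wrap one record dict in a JSON-LD script tag."""
    # Drop empty values so the markup carries no null properties.
    cleaned = {key: value for key, value in record.items() if value is not None}
    body = json.dumps(cleaned, ensure_ascii=False)
    # Escape "</" so text inside the record cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


record = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "version": None,
}
snippet = to_jsonld_script(record)
print(snippet)
```

A row taken from the dataframe with `recordDF.to_dict(orient='records')[0]` could be passed in the same way, though missing values there may be NaN rather than None and would need separate handling.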

#### Test JSON-LD for validity
* Copy the string produced by the cell above.
* Go to [Google's Structured Data Testing Tool](https://search.google.com/structured-data/testing-tool#new-test)
* Select the "Code Snippet" tab
* Paste the string and click "Run Test"
* Click on errors to highlight the portion of the string that needs improvement
* Rerun the test with the play button at the bottom middle of the screen
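Before pasting the string into the testing tool, a quick local check can catch basic problems. The sketch below only confirms that the snippet is wrapped in the script tag, parses as JSON, and carries `@context` and `@type`; it does not replace the tool's validation. The function name `check_jsonld_snippet` is an assumption made for illustration.

```python
import json
import re


def check_jsonld_snippet(snippet):
    """Hypothetical pre-check: return a list of problems found in the snippet."""
    match = re.fullmatch(
        r'\s*<script type="application/ld\+json">(.*)</script>\s*',
        snippet,
        flags=re.DOTALL,
    )
    if match is None:
        return ["snippet is not wrapped in a JSON-LD script tag"]
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Both keys are needed for the record to be read as a typed schema.org item.
    return [f"missing or empty {key}" for key in ("@context", "@type") if not data.get(key)]


example = (
    '<script type="application/ld+json">'
    '{"@context": "http://schema.org/", "@type": "Dataset", "name": "Example"}'
    "</script>"
)
print(check_jsonld_snippet(example))
```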

#### Consider using another JSON writer

* Some of the schema.org vocabulary used in the recommendation for a dataset is nested.
* [hone](https://github.com/chamkank/hone) can create a nested JSON representation.
* I haven't explored it yet, but replacing the slashes in the xpaths with spaces should make it possible to create a nested JSON representation of the dialect as well.

In [None]:
# needed to create nested JSON
import hone

# hone requires a csv to read

# define csv location
toNestedJSON = (
    './data/' + Organization + '/' + Collection +
    '_' + Dialect + '_toNestedJSON.csv')
# remove slashes from any xpath headers in the dataframe
# (regex=False treats "/" as a literal character in every pandas version)
recordDF.columns = recordDF.columns.str.replace("/", "", regex=False)
# write results to csv
recordDF.to_csv(
    toNestedJSON, index=False
)
# derive the nested JSON schema from the csv headers
Hone = hone.Hone()
schema = Hone.get_schema(toNestedJSON)
# display the schema
schema
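To show what nesting on a delimiter means, here is a small pure-Python sketch that splits flat keys on spaces and builds nested dictionaries. It is not hone's implementation. The helper name `nest` and the sample keys are assumptions made for illustration.

```python
def nest(flat, delimiter=" "):
    """Hypothetical helper: turn {'a b': 1} into {'a': {'b': 1}}."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(delimiter)
        node = nested
        # Walk (or create) one dictionary level per leading key part.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


flat = {
    "name": "Example dataset",
    "creator name": "A. Author",
    "creator url": "https://example.org/author",
}
nested = nest(flat)
print(nested)
```

This sketch assumes a key is never both a plain value and a parent (for example `creator` alongside `creator name`); that case would need extra handling.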

#### Consider creating a validation workflow for your JSON output from tools such as:

* https://github.com/digitalbazaar/pyld
* https://github.com/RDFLib/rdflib-jsonld

#### Improve workflow for specific dialect needs and share the resulting notebook(s) in the workshop_shared directory in the scgordon/participants directory to facilitate collaboration!