### Notebook Goals
* Create a dataframe with the xpath content for the collection
* Translate xpath names into the schema.org vocabulary
* Create valid JSON-LD for a record
* Use Google's Structured Data Testing Tool to test the results

In [21]:
# refine dataframe, create record json
import pandas as pd
# MDeval builds the dataframe structure that holds each record's content
import MDeval as md

#### Describe the metadata
* What organization created the records? (Organization)
* What collection are the records from? (Collection)
* What dialect are the records written in? (Dialect)

In [22]:
Organization = 'BCO-DMO'
Collection = 'GeoTraces'
Dialect = 'ISO'

#### Read in the metadata's xpath-evaluated CSV

In [23]:
# Read in the recommendation evaluated csv defined by the above variables
RecommendationEvaluatedDF = pd.read_csv(
    './data/'+Organization+'/'+Collection+'_'+Dialect+'_ElementEvaluated.csv'
)
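The same path can be assembled with `pathlib`, and `pd.read_csv` accepts a `Path` object directly. This is a small sketch that reuses only the three variables defined above; the `csv_path` name is introduced here for illustration.

```python
from pathlib import Path

# The three variables from the "Describe the metadata" cell
Organization = 'BCO-DMO'
Collection = 'GeoTraces'
Dialect = 'ISO'

# data/<Organization>/<Collection>_<Dialect>_ElementEvaluated.csv
csv_path = Path('data') / Organization / f'{Collection}_{Dialect}_ElementEvaluated.csv'
print(csv_path.as_posix())
# pd.read_csv(csv_path) would read the same file as the cell above
```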

#### Record Xpath Content Function
* Rows are records
* Columns are Xpaths

In [24]:
# recordXpathContent requires a dataframe with concepts. It creates a view of
# the xpath content with one row per record and one column per xpath, which is
# useful when creating JSON.
recordDF = md.recordXpathContent(RecommendationEvaluatedDF)

recordDF

Unnamed: 0,Collection,Record,/gmi:MI_Metadata/@xsi:schemaLocation,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode/@codeList,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode/@codeListValue,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:administrativeArea/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:city/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:country/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:deliveryPoint/gco:CharacterString,...,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:authority/gmd:CI_Citation/gmd:title/gmx:Anchor/@xlink:href,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:actuate,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:href,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:title,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:instrument/@gco:nilReason,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:status/gmd:MD_ProgressCode/@codeList,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:status/gmd:MD_ProgressCode/@codeListValue,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:type/gmi:MI_OperationTypeCode/@codeList,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:type/gmi:MI_OperationTypeCode/@codeListValue
0,GeoTraces,dataset_3470.xml,http://www.isotc211.org/2005/gmi http://www.ng...,utf8,http://www.isotc211.org/2005/resources/Codelis...,utf8,MA,Woods Hole,USA,WHOI MS#36,...,"http://lod.bco-dmo.org/id/authority/1.rdf, htt...","R/V Knorr, R/V Knorr","onRequest, onRequest","http://lod.bco-dmo.org/id/platform/53994.rdf, ...","316N, 316N","unknown, unknown",http://www.ngdc.noaa.gov/metadata/published/xs...,"completed, completed",http://www.ngdc.noaa.gov/metadata/published/xs...,"real, real"
1,GeoTraces,dataset_3484.xml,http://www.isotc211.org/2005/gmi http://www.ng...,utf8,http://www.isotc211.org/2005/resources/Codelis...,utf8,MA,Woods Hole,USA,WHOI MS#36,...,"http://lod.bco-dmo.org/id/authority/1.rdf, htt...","R/V Knorr, R/V Knorr","onRequest, onRequest","http://lod.bco-dmo.org/id/platform/53994.rdf, ...","316N, 316N","unknown, unknown",http://www.ngdc.noaa.gov/metadata/published/xs...,"completed, completed",http://www.ngdc.noaa.gov/metadata/published/xs...,"real, real"
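`md.recordXpathContent` comes from the local MDeval module, whose implementation is not shown here. The sketch below is a hypothetical stand-in that produces the same shape with plain pandas, assuming a long-format input with `Collection`, `Record`, `XPath` and `Content` columns (those column names and the shortened xpaths are assumptions, not the actual CSV layout). Repeated occurrences of an xpath within a record are joined with ", ", which matches values such as "R/V Knorr, R/V Knorr" in the output above.

```python
import pandas as pd

# Hypothetical long-format input: one row per xpath occurrence in a record.
# Column names and the short xpaths are placeholders, not the real CSV layout.
long_df = pd.DataFrame({
    'Collection': ['GeoTraces'] * 6,
    'Record': ['dataset_3470.xml'] * 3 + ['dataset_3484.xml'] * 3,
    'XPath': ['/x/platform', '/x/platform', '/x/city'] * 2,
    'Content': ['R/V Knorr', 'R/V Knorr', 'Woods Hole'] * 2,
})

def record_xpath_content(df):
    """One row per record, one column per xpath; repeats joined with ', '."""
    wide = (
        df.groupby(['Collection', 'Record', 'XPath'])['Content']
          .agg(lambda values: ', '.join(values))
          .unstack('XPath')
          .reset_index()
    )
    wide.columns.name = None  # drop the leftover 'XPath' axis label
    return wide

print(record_xpath_content(long_df))
```

A record that lacks one of the xpaths simply gets NaN in that column after `unstack`.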


#### Choose a record to translate

In [25]:
# Set RecordChoice variable
RecordChoice = 'dataset_3484.xml'

In [26]:
# Select record row
recordDF = recordDF[recordDF['Record'] == RecordChoice]
# Drop the Collection and Record columns (keyword form; pandas 2.x no longer
# accepts the axis as a positional argument)
recordDF = recordDF.drop(columns=['Collection', 'Record'])
# Display the chosen record's content
recordDF

Unnamed: 0,/gmi:MI_Metadata/@xsi:schemaLocation,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode/@codeList,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode/@codeListValue,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:administrativeArea/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:city/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:country/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:deliveryPoint/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:postalCode/gco:CharacterString,...,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:authority/gmd:CI_Citation/gmd:title/gmx:Anchor/@xlink:href,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:actuate,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:href,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:title,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:instrument/@gco:nilReason,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:status/gmd:MD_ProgressCode/@codeList,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:status/gmd:MD_ProgressCode/@codeListValue,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:type/gmi:MI_OperationTypeCode/@codeList,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:type/gmi:MI_OperationTypeCode/@codeListValue
1,http://www.isotc211.org/2005/gmi http://www.ng...,utf8,http://www.isotc211.org/2005/resources/Codelis...,utf8,MA,Woods Hole,USA,WHOI MS#36,info@bco-dmo.org,2543,...,"http://lod.bco-dmo.org/id/authority/1.rdf, htt...","R/V Knorr, R/V Knorr","onRequest, onRequest","http://lod.bco-dmo.org/id/platform/53994.rdf, ...","316N, 316N","unknown, unknown",http://www.ngdc.noaa.gov/metadata/published/xs...,"completed, completed",http://www.ngdc.noaa.gov/metadata/published/xs...,"real, real"
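A toy version of the selection step, using a hypothetical two-record frame. The boolean mask keeps the original row label (1 here, as in the output above), and the keyword form `drop(columns=...)` works in both older and current pandas. The empty check is an added safeguard, not part of the original cell: a mistyped `RecordChoice` otherwise yields an empty frame, and the later `to_json` call would then produce `[]`.

```python
import pandas as pd

# Hypothetical two-record stand-in for recordDF
records = pd.DataFrame({
    'Collection': ['GeoTraces', 'GeoTraces'],
    'Record': ['dataset_3470.xml', 'dataset_3484.xml'],
    '/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode': ['utf8', 'utf8'],
})
RecordChoice = 'dataset_3484.xml'

# The boolean mask keeps the matching row and its original label (1)
chosen = records[records['Record'] == RecordChoice]
if chosen.empty:
    raise ValueError(f'{RecordChoice} is not in the collection')
chosen = chosen.drop(columns=['Collection', 'Record'])
print(chosen)
```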


#### Translate xpaths to schema.org vocabulary
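* Identify the xpaths that crosswalk to schema.org properties
* Replace each 'xpathN' placeholder key in the mapping with the matching xpath from the dialect. Keys that match no column are ignored, which is why the output below still shows the original xpath column names.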

In [27]:
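# 'xpathN' keys are placeholders: replace each with the dialect xpath that
# crosswalks to the schema.org term. Keys that match no column are ignored.
recordDF = recordDF.rename({'xpath0': 'name', 'xpath1': 'description', 'xpath2': 'url',
                            'xpath3': 'keywords', 'xpath4': 'creator', 'xpath5': 'distribution',
                            'xpath6': '@type', 'xpath7': 'version', 'xpath8': 'temporalCoverage',
                            'xpath9': 'spatialCoverage', 'xpath10': 'citation'},
                           axis='columns')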
recordDF

Unnamed: 0,/gmi:MI_Metadata/@xsi:schemaLocation,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode/@codeList,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode/@codeListValue,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:administrativeArea/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:city/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:country/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:deliveryPoint/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:postalCode/gco:CharacterString,...,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:authority/gmd:CI_Citation/gmd:title/gmx:Anchor/@xlink:href,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:actuate,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:href,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:title,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:instrument/@gco:nilReason,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:status/gmd:MD_ProgressCode/@codeList,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:status/gmd:MD_ProgressCode/@codeListValue,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:type/gmi:MI_OperationTypeCode/@codeList,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:type/gmi:MI_OperationTypeCode/@codeListValue
1,http://www.isotc211.org/2005/gmi http://www.ng...,utf8,http://www.isotc211.org/2005/resources/Codelis...,utf8,MA,Woods Hole,USA,WHOI MS#36,info@bco-dmo.org,2543,...,"http://lod.bco-dmo.org/id/authority/1.rdf, htt...","R/V Knorr, R/V Knorr","onRequest, onRequest","http://lod.bco-dmo.org/id/platform/53994.rdf, ...","316N, 316N","unknown, unknown",http://www.ngdc.noaa.gov/metadata/published/xs...,"completed, completed",http://www.ngdc.noaa.gov/metadata/published/xs...,"real, real"
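`DataFrame.rename` ignores mapping keys that match no column by default, which is why the placeholder mapping leaves every column name unchanged. The sketch below pairs an ISO citation-title xpath with `name` purely as an illustration (an assumed pairing, not the recommendation's crosswalk) and shows how `errors='raise'` turns an unmatched key into a `KeyError`, which makes a missed placeholder visible.

```python
import pandas as pd

# Illustrative pairing only: ISO citation title -> schema.org name
title_xpath = ('/gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification'
               '/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString')
example = pd.DataFrame({title_xpath: ['example title'],
                        '/gmi:MI_Metadata/other': ['kept']}, index=[1])

# A placeholder key matches no column, so nothing is renamed
unchanged = example.rename({'xpath0': 'name'}, axis='columns')

# With the real xpath as the key, the column takes the schema.org name
renamed = example.rename({title_xpath: 'name'}, axis='columns')

# errors='raise' reports unmatched keys instead of silently ignoring them
try:
    example.rename({'xpath0': 'name'}, axis='columns', errors='raise')
except KeyError as exc:
    print('unmatched mapping key:', exc)

print(list(renamed.columns))
```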


#### Add the required context

In [28]:
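# schema.org JSON-LD needs @context; insert() raises ValueError if the column
# already exists, so this cell can only be run once per recordDF
recordDF.insert(2, '@context', 'http://schema.org/')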
recordDF

Unnamed: 0,/gmi:MI_Metadata/@xsi:schemaLocation,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode,@context,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode/@codeList,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode/@codeListValue,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:administrativeArea/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:city/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:country/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:deliveryPoint/gco:CharacterString,/gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString,...,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:authority/gmd:CI_Citation/gmd:title/gmx:Anchor/@xlink:href,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:actuate,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:href,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:identifier/gmd:MD_Identifier/gmd:code/gmx:Anchor/@xlink:title,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:platform/gmi:MI_Platform/gmi:instrument/@gco:nilReason,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:status/gmd:MD_ProgressCode/@codeList,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:status/gmd:MD_ProgressCode/@codeListValue,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:type/gmi:MI_OperationTypeCode/@codeList,/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:type/gmi:MI_OperationTypeCode/@codeListValue
1,http://www.isotc211.org/2005/gmi http://www.ng...,utf8,http://schema.org/,http://www.isotc211.org/2005/resources/Codelis...,utf8,MA,Woods Hole,USA,WHOI MS#36,info@bco-dmo.org,...,"http://lod.bco-dmo.org/id/authority/1.rdf, htt...","R/V Knorr, R/V Knorr","onRequest, onRequest","http://lod.bco-dmo.org/id/platform/53994.rdf, ...","316N, 316N","unknown, unknown",http://www.ngdc.noaa.gov/metadata/published/xs...,"completed, completed",http://www.ngdc.noaa.gov/metadata/published/xs...,"real, real"
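Because `DataFrame.insert` raises a `ValueError` when the column already exists, rerunning the cell above fails. The hypothetical `add_context` helper below guards against that and places `@context` first so it becomes the first key in the JSON. JSON object members are unordered, so the position is only a readability choice.

```python
import pandas as pd

def add_context(frame, context='http://schema.org/'):
    """Put @context in the first column; calling it again does nothing."""
    if '@context' not in frame.columns:
        # insert() modifies frame in place and raises ValueError on a duplicate name
        frame.insert(0, '@context', context)
    return frame

example = pd.DataFrame({'name': ['example'], 'description': ['example']}, index=[1])
add_context(example)
add_context(example)  # a rerun is a no-op instead of a ValueError
print(example.to_json(orient='records'))
```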


#### Create a JSON-LD string to add to the `<head>` of the repository's landing page

In [29]:
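# orient='records' yields a JSON array of row objects; recordDF has one row,
# so [1:-1] strips the surrounding brackets and leaves a single object
recordJSON = recordDF.to_json(orient='records')
RecordJSONld = '<script type="application/ld+json">' + recordJSON[1:-1] + '</script>'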
RecordJSONld

'<script type="application/ld+json">{"\\/gmi:MI_Metadata\\/@xsi:schemaLocation":"http:\\/\\/www.isotc211.org\\/2005\\/gmi http:\\/\\/www.ngdc.noaa.gov\\/metadata\\/published\\/xsd\\/schema.xsd","\\/gmi:MI_Metadata\\/gmd:characterSet\\/gmd:MD_CharacterSetCode":"utf8","@context":"http:\\/\\/schema.org\\/","\\/gmi:MI_Metadata\\/gmd:characterSet\\/gmd:MD_CharacterSetCode\\/@codeList":"http:\\/\\/www.isotc211.org\\/2005\\/resources\\/Codelist\\/gmxCodelists.xml#MD_CharacterSetCode","\\/gmi:MI_Metadata\\/gmd:characterSet\\/gmd:MD_CharacterSetCode\\/@codeListValue":"utf8","\\/gmi:MI_Metadata\\/gmd:contact\\/gmd:CI_ResponsibleParty\\/gmd:contactInfo\\/gmd:CI_Contact\\/gmd:address\\/gmd:CI_Address\\/gmd:administrativeArea\\/gco:CharacterString":"MA","\\/gmi:MI_Metadata\\/gmd:contact\\/gmd:CI_ResponsibleParty\\/gmd:contactInfo\\/gmd:CI_Contact\\/gmd:address\\/gmd:CI_Address\\/gmd:city\\/gco:CharacterString":"Woods Hole","\\/gmi:MI_Metadata\\/gmd:contact\\/gmd:CI_ResponsibleParty\\/gmd:contactInf
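An alternative sketch of the cell above, assuming a single-row dataframe. It round-trips through `to_json` so that NaN becomes `null` and numpy values become plain JSON types, drops the null properties (a design choice, not something the recommendation requires), and re-serializes with `json.dumps`, which does not escape forward slashes the way pandas does in the output above. It also escapes `</` so that a value containing `</script>` cannot close the element early. `to_jsonld_script` is a hypothetical name.

```python
import json
import pandas as pd

def to_jsonld_script(frame):
    """Serialize a single-row dataframe as a JSON-LD <script> element."""
    if len(frame) != 1:
        raise ValueError(f'expected exactly one record, got {len(frame)}')
    # to_json turns NaN into null and numpy scalars into plain JSON values
    record = json.loads(frame.to_json(orient='records'))[0]
    # Omit empty properties rather than emitting "name": null
    record = {key: value for key, value in record.items() if value is not None}
    body = json.dumps(record, indent=2, ensure_ascii=False)
    # '<\/' is still valid JSON but cannot end the script element early
    body = body.replace('</', '<\\/')
    return '<script type="application/ld+json">\n' + body + '\n</script>'

example = pd.DataFrame({'@context': ['http://schema.org/'],
                        '@type': ['Dataset'],
                        'name': ['example dataset']}, index=[1])
print(to_jsonld_script(example))
```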

#### Test JSON-LD for validity
* Copy the string produced by the cell above. Run `print(RecordJSONld)` to get text you can paste: the quoted value the notebook displays is the string's repr, in which every backslash is doubled (`\\/` instead of `\/`).
* Go to [Google's Structured Data Testing Tool](https://search.google.com/structured-data/testing-tool#new-test)
* Select the "Code Snippet" tab
* Paste the string and click "Run Test"
* Click on an error to highlight the portion of the string that needs improvement
* Rerun the test with the play button at the bottom center of the screen
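Before pasting into the testing tool, a local check can catch the most basic problems. The hypothetical `check_jsonld_script` helper below parses the JSON inside the script element, reports a missing `@context` or `@type`, and flags keys that still look like xpaths. It is a rough pre-check under those assumptions, not a substitute for a structured-data validator; on the string built above it would report a missing `@type` and the untranslated xpath keys, since the placeholder rename matched no columns.

```python
import json
import re

SCRIPT_RE = re.compile(
    r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL)

def check_jsonld_script(html):
    """Return a list of obvious problems in a JSON-LD <script> string."""
    match = SCRIPT_RE.search(html)
    if match is None:
        return ['no application/ld+json script element found']
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError as exc:
        return [f'invalid JSON: {exc}']
    if not isinstance(data, dict):
        return ['expected a single JSON object']
    problems = [f'missing {key}' for key in ('@context', '@type') if key not in data]
    # Keys that still start with '/' were never crosswalked to schema.org terms
    problems += [f'untranslated key: {key}' for key in data if key.startswith('/')]
    return problems

# Mimics the pandas output, including its escaped forward slashes
snippet = ('<script type="application/ld+json">'
           '{"@context":"http:\\/\\/schema.org\\/",'
           '"\\/gmi:MI_Metadata\\/@xsi:schemaLocation":"example"}</script>')
print(check_jsonld_script(snippet))
```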

#### Consider using another JSON writer

* Some of the schema.org vocabulary used in the recommendation for a dataset is nested.
* [hone](https://github.com/chamkank/hone) can create a nested JSON representation.
* I haven't explored it yet, but it should be possible to replace the slashes in the xpaths with spaces, which would allow a nested JSON representation of the dialect as well.
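The nesting idea can also be tried without a separate library. The hypothetical `nest_xpaths` helper below splits each xpath on `/` directly (instead of first replacing slashes with spaces) and builds nested dictionaries. When an element has both its own text and deeper paths, as with `gmd:MD_CharacterSetCode` and its `@codeListValue` attribute, the text is kept under a `#text` key; that key name is an arbitrary assumption, not part of the recommendation or of hone.

```python
import json

def nest_xpaths(flat):
    """Turn {'/a/b/c': value} into {'a': {'b': {'c': value}}}."""
    nested = {}
    for xpath, value in flat.items():
        parts = [part for part in xpath.split('/') if part]
        if not parts:
            continue
        node = nested
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # A text value already sits here; keep it beside the deeper keys
                child = node[part] = {'#text': child}
            node = child
        leaf = parts[-1]
        if isinstance(node.get(leaf), dict):
            # Deeper paths arrived first; store this element's own text
            node[leaf]['#text'] = value
        else:
            node[leaf] = value
    return nested

flat = {
    '/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode': 'utf8',
    '/gmi:MI_Metadata/gmd:characterSet/gmd:MD_CharacterSetCode/@codeListValue': 'utf8',
}
print(json.dumps(nest_xpaths(flat), indent=2))
```

The result still uses dialect element names as keys, so a crosswalk to schema.org terms would be a separate step.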