## This notebook lets the user select XML collections, zip them, and send them to a service that runs a transform on them and returns a simple CSV made up of seven data points: the Collection name, Dialect name, Record name, Concept name, Content, XPath location, and the Dialect Definition for the concept.

## This CSV contains a row for each concept that is found, so a location that fulfills multiple concepts appears in multiple rows. A good example is the pair of concepts Keyword and Place Keyword: every Place Keyword is also a Keyword, so the row repeats with a different Concept name. The CSV also contains a row for each undefined node that contains text, marked with Unknown in the Concept column.
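
As a minimal sketch of how repeated locations could be spotted, the cell below groups rows by record and XPath and keeps the locations that carry more than one concept. The rows, the record name `example.xml`, and the XPaths are hypothetical and exist only for illustration; the column names follow the CSV described above.

```python
import pandas as pd

# Hypothetical rows for illustration only: one node that satisfies both the
# Keyword and Place Keyword concepts, plus one unrelated title node.
rows = [
    {"Record": "example.xml", "Concept": "Keyword",
     "Content": "Sargasso Sea", "XPath": "/record/keyword"},
    {"Record": "example.xml", "Concept": "Place Keyword",
     "Content": "Sargasso Sea", "XPath": "/record/keyword"},
    {"Record": "example.xml", "Concept": "Title",
     "Content": "Example dataset", "XPath": "/record/title"},
]
df = pd.DataFrame(rows)

# Count the distinct concepts found at each (Record, XPath) location.
concepts_per_location = df.groupby(["Record", "XPath"])["Concept"].nunique()

# Keep only the locations that fulfill more than one concept.
shared_locations = concepts_per_location[concepts_per_location > 1]
print(shared_locations)
```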

## This data can be used in a variety of analyses including RAD and QuickE as well as Concept Verticals. It can also be used to teach the system dialect definitions for concepts that are currently unknown by exposing all of the content at undefined nodes. 
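
One way to expose undefined nodes, sketched under assumptions: filter the rows whose Concept is Unknown and count how often each XPath occurs, so frequently used undefined nodes stand out as candidates for new dialect definitions. The four rows below are copied from the `data.csv` preview shown later in this notebook (only rows whose XPath is not truncated in that preview); the variable names are illustrative.

```python
import pandas as pd

# Rows taken from the data.csv preview in this notebook (columns abridged).
rows = [
    {"Record": "dataset_3687.xml", "Concept": "Unknown",
     "XPath": "/gmi:MI_Metadata/@xsi:schemaLocation",
     "DialectDefinition": "Undefined"},
    {"Record": "dataset_3687.xml", "Concept": "Metadata Identifier",
     "XPath": "/gmi:MI_Metadata/gmd:fileIdentifier",
     "DialectDefinition": "/*/gmd:fileIdentifier"},
    {"Record": "dataset_3687.xml", "Concept": "Metadata Language",
     "XPath": "/gmi:MI_Metadata/gmd:language",
     "DialectDefinition": "/*/gmd:language"},
    {"Record": "dataset_3687.xml", "Concept": "Metadata Modified Date",
     "XPath": "/gmi:MI_Metadata/gmd:dateStamp/gco:Date",
     "DialectDefinition": "/*/gmd:dateStamp/gco:Date"},
]
df = pd.DataFrame(rows)

# Rows for undefined nodes are marked Unknown in the Concept column.
unknown = df[df["Concept"] == "Unknown"]

# Count how many times each undefined XPath occurs across the records.
candidates = unknown["XPath"].value_counts()
print(candidates)
```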

In [1]:
%%HTML
<img src="https://image.slidesharecdn.com/scgordonesipwinter2017-170125170939/95/recommendations-analysis-dashboard-1-1024.jpg" height="420" width="420">

In [2]:
import pandas as pd
import os
from os import walk
import shutil
from ipywidgets import interactive  # the only ipywidgets name used directly below
import ipywidgets as widgets
import requests
from contextlib import closing
import csv

In [3]:
Organizations = []
for (dirpath, dirnames, filenames) in walk('../collection/'):
    Organizations.extend(dirnames)
    break  
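
The walk-and-break pattern above is used to list only the top-level folders, and it recurs for collections and dialects. A possible helper, offered as a sketch rather than the notebook's own method: `list_subdirectories` is a hypothetical name, and sorting the result is an added choice so the widget options appear in a stable order.

```python
import os


def list_subdirectories(path):
    """Return the names of the immediate subdirectories of `path`, sorted."""
    try:
        # The first triple yielded by os.walk describes `path` itself, which is
        # what the loop-and-break pattern extracts.
        _, dirnames, _ = next(os.walk(path))
    except StopIteration:
        # os.walk yields nothing when the path is missing or unreadable.
        return []
    return sorted(dirnames)
```

With this helper, the cell above could read `Organizations = list_subdirectories('../collection/')`.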

In [4]:
def OrganizationChoices(organization):
    # Store the widget selection in a global so later cells can build paths from it
    global Organization
    Organization=organization
    print("Organization of the collection is", Organization)


In [5]:
interactive(OrganizationChoices, organization=Organizations)

Organization of the collection is BCO-DMO


In [6]:
Collections = []
for (dirpath, dirnames, filenames) in walk(os.path.join('../collection',Organization)):
    Collections.extend(dirnames)
    break 
Collections

['GeoTraces']

In [7]:
def CollectionChoices(collection):
    # Store the widget selection in a global so later cells can build paths from it
    global Collection
    Collection=collection

In [8]:
interactive(CollectionChoices, collection=Collections)

In [9]:
Dialects = []
for (dirpath, dirnames, filenames) in walk(os.path.join('../collection',Organization,Collection)):
    Dialects.extend(dirnames)
    break 
dialectList=Dialects


In [10]:
def dialectChoice(dialect):
    global Dialect
    Dialect=dialect
    print("Dialect of the collection is", Dialect)


In [11]:
interactive(dialectChoice,dialect=dialectList)

Dialect of the collection is ISO


In [12]:
%cd ../zip

/Users/scgordon/MILE2/zip


In [13]:
MetadataDestination=os.path.join(Organization,Collection,Dialect,'xml')
MetadataDestination

'BCO-DMO/GeoTraces/ISO/xml'

In [14]:
os.makedirs(MetadataDestination, exist_ok=True)

In [15]:
MetadataLocation=os.path.join('../collection/',Organization,Collection,Dialect,'xml')

MetadataLocation

'../collection/BCO-DMO/GeoTraces/ISO/xml'

In [16]:
src_files = os.listdir(MetadataLocation)
for file_name in src_files:
    full_file_name = os.path.join(MetadataLocation, file_name)
    if (os.path.isfile(full_file_name)):
        shutil.copy(full_file_name, MetadataDestination)

In [17]:
shutil.make_archive('../upload/metadata', 'zip', os.getcwd())

'/Users/scgordon/MILE2/upload/metadata.zip'
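
The staging and zipping cells above depend on the notebook's current working directory. A sketch of the same steps with every path passed explicitly follows; `stage_and_zip` and its parameter names are hypothetical, and the archive is assumed to live outside the staging folder so it does not end up inside itself.

```python
import os
import shutil


def stage_and_zip(source_dir, staging_root, relative_dest, archive_base):
    """Copy the files in source_dir to staging_root/relative_dest, then zip staging_root."""
    dest_dir = os.path.join(staging_root, relative_dest)
    os.makedirs(dest_dir, exist_ok=True)

    # Copy regular files only; subdirectories of source_dir are skipped.
    for name in os.listdir(source_dir):
        full_path = os.path.join(source_dir, name)
        if os.path.isfile(full_path):
            shutil.copy(full_path, dest_dir)

    # make_archive appends ".zip" to archive_base and returns the archive path.
    return shutil.make_archive(archive_base, "zip", root_dir=staging_root)
```

For the selections shown in this notebook, a call might look like `stage_and_zip(MetadataLocation, '../zip', MetadataDestination, '../upload/metadata')`.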

In [18]:
cd ../upload

/Users/scgordon/MILE2/upload


In [124]:
%cd ../
shutil.rmtree('upload')
%cd zip
shutil.rmtree(Organization)
%cd ../data

/Users/scgordon/MILE2
/Users/scgordon/MILE2/zip
/Users/scgordon/MILE2/data


In [54]:
CollectionConceptsDF= pd.read_csv('data.csv')
CollectionConceptsDF

Unnamed: 0,Collection,Dialect,Record,Concept,Content,XPath,DialectDefinition,DocumentLocation
0,GeoTraces,ISO,dataset_3687.xml,Unknown,http://www.isotc211.org/2005/gmi http://www.ng...,/gmi:MI_Metadata/@xsi:schemaLocation,Undefined,/gmi:MI_Metadata/@xsi:schemaLocation
1,GeoTraces,ISO,dataset_3687.xml,Metadata Identifier,http://lod.bco-dmo.org/id/dataset/3687,/gmi:MI_Metadata/gmd:fileIdentifier,/*/gmd:fileIdentifier,/gmi:MI_Metadata/gmd:fileIdentifier[1]
2,GeoTraces,ISO,dataset_3687.xml,Metadata Language,eng; USA,/gmi:MI_Metadata/gmd:language,/*/gmd:language,/gmi:MI_Metadata/gmd:language[1]
3,GeoTraces,ISO,dataset_3687.xml,Unknown,utf8,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...,Undefined,/gmi:MI_Metadata/gmd:characterSet[1]/gmd:MD_Ch...
4,GeoTraces,ISO,dataset_3687.xml,Unknown,http://www.isotc211.org/2005/resources/Codelis...,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...,Undefined,/gmi:MI_Metadata/gmd:characterSet[1]/gmd:MD_Ch...
5,GeoTraces,ISO,dataset_3687.xml,Unknown,utf8,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...,Undefined,/gmi:MI_Metadata/gmd:characterSet[1]/gmd:MD_Ch...
6,GeoTraces,ISO,dataset_3687.xml,Resource Type,dataset,/gmi:MI_Metadata/gmd:hierarchyLevel/gmd:MD_Sco...,/*/gmd:hierarchyLevel/gmd:MD_ScopeCode,/gmi:MI_Metadata/gmd:hierarchyLevel[1]/gmd:MD_...
7,GeoTraces,ISO,dataset_3687.xml,Unknown,"Highest level of data collection, from a commo...",/gmi:MI_Metadata/gmd:hierarchyLevelName/gco:Ch...,Undefined,/gmi:MI_Metadata/gmd:hierarchyLevelName[1]/gco...
8,GeoTraces,ISO,dataset_3687.xml,Metadata Contact,Biological and Chemical Oceanography Data Mana...,/gmi:MI_Metadata/gmd:contact,/*/gmd:contact,/gmi:MI_Metadata/gmd:contact[1]
9,GeoTraces,ISO,dataset_3687.xml,Metadata Modified Date,2012-08-01,/gmi:MI_Metadata/gmd:dateStamp/gco:Date,/*/gmd:dateStamp/gco:Date,/gmi:MI_Metadata/gmd:dateStamp[1]/gco:Date[1]


In [22]:
shutil.copy("data.csv", os.path.join(Organization,Collection+'_'+Dialect+'_'+'data.csv'))

'NASA/GHRC_ISO_data.csv'

### Now that the metadata is prepared and stored, we can look at collection analytics, cross-collection analytics, and concept verticals.

In [26]:
url = 'http://metadig.nceas.ucsb.edu/metadata/evaluator'
# Open the archive in a context manager so the file handle is closed after the upload
with open('metadata.zip', 'rb') as archive:
    r = requests.post(url, files={'zipxml': archive})
r.text

RuntimeError: The content for this response was already consumed
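
The notebook does not show how the service response becomes `data.csv`. The sketch below is one possible way to bridge that gap, assuming the service returns the CSV as text in the response body; that behavior is an assumption, not something confirmed by the cells above. `submit_archive` is a hypothetical helper, and the `session` argument is expected to be a `requests.Session` (or any object with a compatible `post` method).

```python
def submit_archive(session, url, zip_path, csv_path, timeout=120):
    """Upload zip_path to url and save the response text to csv_path."""
    # Upload inside a context manager so the archive handle is always closed.
    with open(zip_path, "rb") as archive:
        response = session.post(url, files={"zipxml": archive}, timeout=timeout)

    # Fail loudly on HTTP errors instead of writing an error page to disk.
    response.raise_for_status()

    # Assumption: the response body is the CSV text.
    with open(csv_path, "w", encoding="utf-8", newline="") as out:
        out.write(response.text)
    return csv_path


# Usage (assumes the `requests` package and a reachable service):
#   import requests
#   submit_archive(requests.Session(),
#                  'http://metadig.nceas.ucsb.edu/metadata/evaluator',
#                  'metadata.zip', 'data.csv')
```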

### Select the notebook that prepares the data for different types of analysis

* [Concept Verticals](ConceptVerticals.ipynb)
* [Quick Evaluation Cross Collection Comparisons](QuickEvaluation-CrossCollectionComparisons.ipynb)
* [Create RAD Data](CreateRADdata.ipynb)
