## This notebook lets the user select XML collections, zip them up, and send them to a service that runs a transform on them and returns a simple CSV made up of seven data points. The data included is the Collection name, Dialect name, Record name, Concept name, Content, XPath location, and the Dialect Definition for the concept.

## This CSV contains a row for each concept that is found, so a location that fulfills multiple concepts appears more than once. A good example is the concepts Keyword and Place Keyword: every Place Keyword is also a Keyword, so the row repeats with a different Concept name. The CSV also contains a row for each undefined node that contains text, marked with Unknown in the Concept column.

## This data can be used in a variety of analyses including RAD and QuickE as well as Concept Verticals. It can also be used to teach the system dialect definitions for concepts that are currently unknown by exposing all of the content at undefined nodes. 

In [27]:
%%HTML
<img src="https://image.slidesharecdn.com/scgordonesipwinter2017-170125170939/95/recommendations-analysis-dashboard-1-1024.jpg">

In [23]:
import pandas as pd
import os
from os import walk
import shutil
from ipywidgets import *
import ipywidgets as widgets

In [24]:
Organizations = []
for (dirpath, dirnames, filenames) in walk('../collection/'):
    Organizations.extend(dirnames)
    break  

In [25]:
def OrganizationChoices(organization):
    # Store the widget selection in a module-level name so later cells can use it.
    global Organization
    Organization=organization
    print("Organization of the collection is", Organization)


In [26]:
interactive(OrganizationChoices, organization=Organizations)

Organization of the collection is NASA


In [6]:
Collections = []
for (dirpath, dirnames, filenames) in walk(os.path.join('../collection',Organization)):
    Collections.extend(dirnames)
    break 
Collections

['GES_DISC', 'GHRC', 'LARC', 'NSIDC']

In [29]:
def CollectionChoices(collection):
    # Store the widget selection in a module-level name so later cells can use it.
    global Collection
    Collection=collection

In [8]:
interactive(CollectionChoices, collection=Collections)

In [9]:
Dialects = []
for (dirpath, dirnames, filenames) in walk(os.path.join('../collection',Organization,Collection)):
    Dialects.extend(dirnames)
    break 
dialectList=Dialects
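
The organization, collection, and dialect listings all use the same `walk(...)` plus `break` idiom to read only the first level of a directory. A minimal sketch of one reusable helper follows. The name `list_subdirectories` is hypothetical and not part of the notebook, and the temporary folder tree only stands in for `../collection/`.

```python
import os
import tempfile


def list_subdirectories(path):
    """Return the sorted names of the immediate subdirectories of path."""
    # os.scandir reads a single directory level, so no walk/break is needed.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


# Stand-in for ../collection/: a throwaway tree with made-up contents.
with tempfile.TemporaryDirectory() as root:
    for name in ("GHRC", "NSIDC"):
        os.makedirs(os.path.join(root, "NASA", name))
    # A plain file at the top level should be ignored by the helper.
    open(os.path.join(root, "README.txt"), "w").close()

    organizations = list_subdirectories(root)
    collections = list_subdirectories(os.path.join(root, "NASA"))

print(organizations)
print(collections)
```

Unlike `walk`, which returns names in arbitrary order, this sketch sorts them, so the widget choices would appear in a stable order.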


In [10]:
def dialectChoice(dialect):
    global Dialect
    Dialect=dialect
    print("Dialect of the collection is", Dialect)


In [11]:
interactive(dialectChoice,dialect=dialectList)

Dialect of the collection is ISO


In [12]:
cd ../zip

/Users/scgordon/MILE2/zip


In [13]:
MetadataDestination=os.path.join(Organization,Collection,Dialect,'xml')
MetadataDestination

'NASA/GHRC/ISO/xml'

In [14]:
os.makedirs(MetadataDestination, exist_ok=True)

In [15]:
MetadataLocation=os.path.join('../collection/',Organization,Collection,Dialect,'xml')

MetadataLocation

'../collection/NASA/GHRC/ISO/xml'

In [16]:
src_files = os.listdir(MetadataLocation)
for file_name in src_files:
    full_file_name = os.path.join(MetadataLocation, file_name)
    if (os.path.isfile(full_file_name)):
        shutil.copy(full_file_name, MetadataDestination)

In [17]:
shutil.make_archive('../upload/metadata', 'zip', os.getcwd())

'/Users/scgordon/MILE2/upload/metadata.zip'
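
The staging steps above (copy only regular files into `Organization/Collection/Dialect/xml`, then archive the staging directory) can be checked by listing what ends up in the zip. The sketch below runs entirely in temporary folders. The function name `stage_and_zip` and the sample file names are hypothetical.

```python
import os
import shutil
import tempfile
import zipfile


def stage_and_zip(source_dir, staging_root, relative_dest, archive_base):
    """Copy the files in source_dir under staging_root/relative_dest, then zip staging_root."""
    destination = os.path.join(staging_root, relative_dest)
    os.makedirs(destination, exist_ok=True)
    for file_name in os.listdir(source_dir):
        full_file_name = os.path.join(source_dir, file_name)
        # Skip subdirectories, as the notebook does with os.path.isfile.
        if os.path.isfile(full_file_name):
            shutil.copy(full_file_name, destination)
    # Make sure the folder that will hold the archive exists.
    os.makedirs(os.path.dirname(archive_base), exist_ok=True)
    # make_archive appends ".zip" and returns the path of the archive.
    return shutil.make_archive(archive_base, "zip", staging_root)


with tempfile.TemporaryDirectory() as work:
    source = os.path.join(work, "source")
    os.makedirs(os.path.join(source, "nested"))  # a subdirectory that should not be copied
    for name in ("a.xml", "b.xml"):
        with open(os.path.join(source, name), "w") as handle:
            handle.write("<record/>")

    archive_path = stage_and_zip(
        source,
        os.path.join(work, "zip"),
        os.path.join("NASA", "GHRC", "ISO", "xml"),
        os.path.join(work, "upload", "metadata"),
    )
    with zipfile.ZipFile(archive_path) as archive:
        # Keep only file entries; directory entries end with "/".
        zipped_files = sorted(n for n in archive.namelist() if not n.endswith("/"))

print(zipped_files)
```

Writing the archive outside the staging directory, as the notebook does with `../upload/metadata`, keeps the zip from trying to include itself.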

In [18]:
cd ../upload

/Users/scgordon/MILE2/upload


In [19]:
%%bash
curl -o ../data/data.csv -F "zipxml=@metadata.zip" http://metadig.nceas.ucsb.edu/metadata/evaluator



In [20]:
%cd ../
shutil.rmtree('upload')
%cd zip
shutil.rmtree(Organization)
%cd ../data

/Users/scgordon/MILE2
/Users/scgordon/MILE2/zip
/Users/scgordon/MILE2/data


In [21]:
CollectionConceptsDF= pd.read_csv('data.csv')
CollectionConceptsDF

Unnamed: 0,Collection,Dialect,Record,Concept,Content,XPath,DialectDefinitions
0,GHRC,ISO,rssmif08w.xml,Metadata Identifier,gov.nasa.echo:RSS SSM/I OCEAN PRODUCT GRIDS WE...,/gmi:MI_Metadata/gmd:fileIdentifier/gco:Charac...,/*/gmd:fileIdentifier//*
1,GHRC,ISO,rssmif08w.xml,Metadata Language,eng,/gmi:MI_Metadata/gmd:language/gco:CharacterString,/*/gmd:language//*
2,GHRC,ISO,rssmif08w.xml,Unknown,utf8,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...
3,GHRC,ISO,rssmif08w.xml,Unknown,http://www.ngdc.noaa.gov/metadata/published/xs...,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...
4,GHRC,ISO,rssmif08w.xml,Unknown,utf8,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...
5,GHRC,ISO,rssmif08w.xml,Resource Type,series,/gmi:MI_Metadata/gmd:hierarchyLevel/gmd:MD_Sco...,/*/gmd:hierarchyLevel/gmd:MD_ScopeCode
6,GHRC,ISO,rssmif08w.xml,Metadata Contact,GHRC pointOfContact,/gmi:MI_Metadata/gmd:contact,/*/gmd:contact
7,GHRC,ISO,rssmif08w.xml,Metadata Modified Date,2015-01-20T09:58:31.006-05:00,/gmi:MI_Metadata/gmd:dateStamp/gco:DateTime,/*/gmd:dateStamp/gco:DateTime
8,GHRC,ISO,rssmif08w.xml,Metadata Dates,2015-01-20T09:58:31.006-05:00,/gmi:MI_Metadata/gmd:dateStamp/gco:DateTime,/*/gmd:dateStamp/gco:DateTime
9,GHRC,ISO,rssmif08w.xml,Metadata Standard Citation,ISO 19115-2 Geographic Information - Metadata ...,/gmi:MI_Metadata/gmd:metadataStandardName,/*/gmd:metadataStandardName
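
Because each location gets one row per concept it fulfills, grouping rows by record and XPath shows which locations satisfy more than one concept, and filtering on `Unknown` isolates the undefined nodes. Here is a rough sketch using only the standard library, with a few rows copied inline from the output above so that it runs without the notebook's state. The variable names are illustrative.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the displayed output. The "..." in the Unknown row's XPath
# is the display truncation, kept as-is rather than guessed.
sample_csv = """Collection,Dialect,Record,Concept,Content,XPath
GHRC,ISO,rssmif08w.xml,Metadata Language,eng,/gmi:MI_Metadata/gmd:language/gco:CharacterString
GHRC,ISO,rssmif08w.xml,Unknown,utf8,/gmi:MI_Metadata/gmd:characterSet/gmd:MD_Chara...
GHRC,ISO,rssmif08w.xml,Metadata Modified Date,2015-01-20T09:58:31.006-05:00,/gmi:MI_Metadata/gmd:dateStamp/gco:DateTime
GHRC,ISO,rssmif08w.xml,Metadata Dates,2015-01-20T09:58:31.006-05:00,/gmi:MI_Metadata/gmd:dateStamp/gco:DateTime
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Collect the defined concepts found at each (record, XPath) location.
concepts_by_location = defaultdict(list)
for row in rows:
    if row["Concept"] != "Unknown":
        concepts_by_location[(row["Record"], row["XPath"])].append(row["Concept"])

# Locations that fulfill more than one concept.
shared_locations = {
    location: concepts
    for location, concepts in concepts_by_location.items()
    if len(concepts) > 1
}

# Undefined nodes: candidates for new dialect definitions.
unknown_rows = [row for row in rows if row["Concept"] == "Unknown"]

print(shared_locations)
print([row["Content"] for row in unknown_rows])
```

On the full DataFrame, a comparable count could likely be obtained with `CollectionConceptsDF.groupby(['Record', 'XPath'])['Concept'].nunique()`.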


In [22]:
shutil.copy("data.csv", os.path.join(Organization,Collection+'_'+Dialect+'_'+'data.csv'))

'NASA/GHRC_ISO_data.csv'

### Now that our metadata is prepared and stored, we can look at collection analytics, cross-collection analytics, and concept verticals.

In [30]:
import requests

In [33]:
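# Python equivalent of the curl call above: POST the zip as the "zipxml" form field.
# Assumes the working directory still contains metadata.zip, i.e. the cleanup cell has not run yet.
with open('metadata.zip', 'rb') as zip_file:
    response = requests.post('http://metadig.nceas.ucsb.edu/metadata/evaluator', files={'zipxml': zip_file})
response.raise_for_status()
with open('../data/data.csv', 'wb') as csv_file:
    csv_file.write(response.content)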

In [None]:
# TODO: figure out how to link to other notebooks; ideally pass the current DataFrame along