# Ingest and Prepare a Metadata Collection for Evaluation
### Notebook Goals
* Use JupyterLab's GUI to upload a metadata record, or a zip of many records, and move the metadata to a directory
* Download a metadata collection from a repository or other URL
* Normalize the namespace location so concepts can be read accurately by the Metadata Evaluation Web Service

In [3]:
# create directories
import os
# compress metadata collection
import zipfile
# download records from a repository and prepare for evaluation
import MDeval as md

#### Describe the metadata
* What organization created the records? (Organization)
* What collection are the records from? (Collection)
* What dialect are the records written in? (Dialect)

In [4]:
# variables for function arguments, fill these out
Organization = 'MetadataAnalysis'
Collection = 'MILESworkshop'
# note: the sample output at the end of this notebook was produced with Dialect = 'DCITE'
Dialect = 'ISO'

# variable created from other variables, defining where to put the metadata
MetadataLocation = (
    './metadata/' + Organization + '/' +
    Collection + '/' + Dialect + '/xml'
)
# creates a directory
os.makedirs(MetadataLocation, exist_ok=True)

#### Upload through the graphical user interface:
* Use the file explorer on the left of your screen to navigate to the MILES directory. You'll see a metadata directory. Navigate to the *MetadataLocation* you created.
* Just above the directory listing and below the Lab toolbar is an arrow pointing up over a horizontal line. Click it and use the file dialog to select your metadata.
* Optional: Use the GUI to upload a zip file called *metadata.zip* to the MILES directory, then run the next cell to unzip it into the *MetadataLocation*.

In [13]:
# unpack the compressed upload into the MetadataLocation
with zipfile.ZipFile('./metadata.zip') as z:
    for fileName in z.namelist():
        # keep only the xml records, not artifacts from the os that compressed them
        if fileName.endswith('.xml') and not fileName.startswith('_'):
            # drop any folder prefix stored in the archive and keep the original file name
            target = os.path.join(MetadataLocation, os.path.basename(fileName))
            # write the record into the MetadataLocation and close the file afterwards
            with open(target, 'wb') as f:
                f.write(z.read(fileName))

#### Download records from a repository
* Create a list of record URLs and a matching list of file names for the records
* The two lists are paired in order, so each URL is matched with the file name at the same position: the first item in *urls* goes with the first item in *xml_files*, and so on

In [3]:
# variables for function arguments

# locations of metadata
urls = ["https://data.datacite.org/application/vnd.datacite.datacite+xml/10.1016/j.ecoinf.2017.09.005",
        "https://data.datacite.org/application/vnd.datacite.datacite+xml/10.1016/j.ecoinf.2017.09.006"
       ]

# names you want to give the records
xml_files = [MetadataLocation + '/' + '10.1016.j.ecoinf.2017.09.005' + '.xml',
             MetadataLocation + '/' + '10.1016.j.ecoinf.2017.09.006' + '.xml'
            ]   
# MDeval function to retrieve records
md.get_records(urls, xml_files, well_formed=False)
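
To make the pairing of *urls* and *xml_files* concrete, here is a minimal standard-library sketch of how two such lists could be matched in order and saved to disk. It is not the MDeval implementation: `pair_records`, `fetch_url`, `save_records`, and the `fetch` parameter are hypothetical names introduced only for illustration.

```python
from urllib.request import urlopen


def pair_records(urls, xml_files):
    """Pair each URL with the file name at the same position."""
    if len(urls) != len(xml_files):
        raise ValueError("urls and xml_files must be the same length")
    return list(zip(urls, xml_files))


def fetch_url(url):
    """Download the raw bytes at a URL (default fetcher, needs network access)."""
    with urlopen(url) as response:
        return response.read()


def save_records(urls, xml_files, fetch=fetch_url):
    """Download each URL and write the bytes to its paired file name.

    `fetch` is a hypothetical hook so the download step can be swapped out,
    for example when trying the function without network access.
    """
    saved = []
    for url, xml_file in pair_records(urls, xml_files):
        content = fetch(url)
        with open(xml_file, "wb") as f:
            f.write(content)
        saved.append(xml_file)
    return saved
```

Checking that the two lists have the same length before pairing matters because `zip` silently stops at the shorter list, which would otherwise drop records without warning.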

#### Ensure namespace conformance
A note on namespaces: the transform identifies the dialect from the default or explicit schema location. This means that if records reference DataCite kernel 4 instead of kernel 3, their conceptual content will not be recognized. Such records must be altered to point to kernel 3.

In [9]:
# variables for function arguments
oldNamespaceLocation = 'xmlns="http://datacite.org/schema/kernel-4"'
newNamespaceLocation = 'xmlns="http://datacite.org/schema/kernel-3"'
# MDeval function to find and replace the old with the new.
md.normalizeNamespace(MetadataLocation, newNamespaceLocation, oldNamespaceLocation)

./metadata/MetadataAnalysis/MILESworkshop/DCITE/xml/10.1016.j.ecoinf.2017.09.006.xml is normalized
./metadata/MetadataAnalysis/MILESworkshop/DCITE/xml/10.1016.j.ecoinf.2017.09.005.xml is normalized
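
For readers curious what a find-and-replace of this kind involves, the following is a rough standard-library sketch. It is not the code behind `md.normalizeNamespace`: the function name `replace_namespace` is hypothetical, and the sketch assumes the records are UTF-8 encoded files ending in `.xml` that sit directly in one directory.

```python
import os


def replace_namespace(directory, old, new):
    """Replace `old` with `new` in every .xml file directly inside `directory`.

    Assumes UTF-8 encoded records. Returns the paths of the files that changed.
    """
    changed = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".xml"):
            continue  # leave non-XML files untouched
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8") as f:
            text = f.read()
        if old in text:
            # rewrite the file only when the old declaration is present
            with open(path, "w", encoding="utf-8") as f:
                f.write(text.replace(old, new))
            changed.append(path)
    return changed
```

Because this is a plain string replacement, it only matches declarations written exactly like the `old` string, including the quote style and spacing.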


#### Now you are ready to evaluate, analyze, and create reports on your own metadata!
[Next Notebook: Create Recommendation Report for a Metadata Collection](./01.CreateRecReport.ipynb)