# Creating and Transforming ISO Metadata for Geospatial Datasets

### XML

Many applications used to manage geospatial data require metadata structured in XML. Increasingly, these metadata are created with automated tools to support standardization and interoperability.
The following examples demonstrate metadata creation and management operations for the Sierra Nevada Conservancy GIS Data collection. Two workflows are highlighted: one that writes ISO 19139/19110 XML directly, and one that converts ArcGIS metadata to ISO.


### XML Templates

Cataloging a collection of data layers is best managed with an XML template, which provides both the structural formatting and placeholder values for required metadata.

The folder called *templates* contains XML templates for ArcGIS (*template_arc*), ISO 19139 (*template_iso*), and ISO 19110 (*template_19110*). There is also an HTML representation of the ISO 19139 template for inspecting the required and recommended elements.

The following metadata values are hardcoded into the ArcGIS and ISO 19139 templates because they occur in every record: 

|Element                                           |Value                            |
|:---------------------------------------------------|:----------------------------------|
| Theme Keyword Thesaurus Citation Name             | lcsh                             |
| Place Keyword Thesaurus Citation Name             | geonames                         |
| Metadata Contact Organization                     | Stanford Geospatial Center       |
| Distributor                                       | Stanford Geospatial Center       |
| Digital Transfer Options Online Resource Protocol | http                             |
| Feature Catalog Title (for shapefile data)        | Entity and Attribute Information |

Other elements serve as placeholders for required metadata, for example:

| Element                  | Value      |
|:-------------------------|:-----------|
| Abstract                 | ABSTRACT   |
| Place Keywords           | PLACE      |
| Theme Keywords           | THEME      |
| Identifier               | URL        |
| Metadata File Identifier | METADATAID |
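
Because placeholders are plain text, it can be useful to check a finished record for any that were never replaced. The following is a minimal sketch, assuming each placeholder appears as the entire text of an element; the function name `find_unfilled` and the sample record are hypothetical and are not part of the repository scripts.

```python
import re

# Placeholder tokens taken from the table above.
PLACEHOLDERS = ("ABSTRACT", "PLACE", "THEME", "URL", "METADATAID")

def find_unfilled(xml_text, placeholders=PLACEHOLDERS):
    """Return the placeholder tokens that still form the whole text of an element."""
    found = []
    for token in placeholders:
        # Match only when the token is the complete element text, e.g. >ABSTRACT<
        if re.search(r">\s*" + re.escape(token) + r"\s*<", xml_text):
            found.append(token)
    return found

# Hypothetical record in which the title was filled in but two placeholders remain.
sample = (
    "<record><abstract>ABSTRACT</abstract>"
    "<title>American River Watershed</title><id>URL</id></record>"
)
print(find_unfilled(sample))  # ['ABSTRACT', 'URL']
```

Matching on the whole element text avoids false positives when a word such as "PLACE" happens to occur inside a real abstract.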
 



### Getting Metadata from Data

Certain metadata values can be read directly from the data, including coordinates, spatial reference systems, and geometry type.

The script `getDataInfo.py` uses GDAL/OGR to read the data layers and write a list of files along with the relevant metadata.

Open the file *layers.csv* to view the output. 

In [1]:
import pandas as pd
layers = pd.read_csv('layers.csv', index_col='filename')
layers

filename,spatial reference,type,west,south,east,north,format,identifier,title,description,...,temporal,subject,topicCat,spatialSubject,fc_uuid,language,collectionTitle,collectionId,access,rights
American_River_Watershed.shp,4326,Polygon,-121.284429,38.618599,-119.986716,39.315901,Shapefile,,,,...,,,,,,,,,,
CarsonWalkerMonoAdobe_Watershed.shp,3310,Polygon,784.3927,-32970.7471,130937.2934,99178.7674,Shapefile,,,,...,,,,,,,,,,
Feather_Watershed.shp,3310,Polygon,-135103.6579,163832.2584,-7926.0376,276396.8357,Shapefile,,,,...,,,,,,,,,,
KingsKaweahTule_Watershed.shp,3310,Polygon,47999.8252,-319571.0384,159850.9914,-89530.0759,Shapefile,,,,...,,,,,,,,,,
Sierra_Nevada_Conservancy_Boundary.shp,4326,Polygon,-122.247666,35.064862,-117.735384,41.994919,Shapefile,,,,...,,,,,,,,,,
Southern_Sierra_Fisher_Conservation_Area.shp,4326,Polygon,-119.831339,35.501965,-118.088351,37.660563,Shapefile,,,,...,,,,,,,,,,
Stanislaus_Watershed.shp,3310,Polygon,-58010.4921,-22916.9251,33890.9959,55585.106,Shapefile,,,,...,,,,,,,,,,
Tehachapi_Watershed.shp,3310,Polygon,142672.9753,-324766.5647,198675.9959,-195687.6031,Shapefile,,,,...,,,,,,,,,,
TruckeeEagle_LakesSurprise_Valley_Watershed.shp,3310,Polygon,-106140.7082,125859.222,371.6801,441853.2629,Shapefile,,,,...,,,,,,,,,,
Tuolumne_Watershed.shp,3310,Polygon,-52575.0788,-39908.3086,69770.8839,23211.9961,Shapefile,,,,...,,,,,,,,,,
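
To illustrate where values such as the geometry type and bounding box come from, the sketch below parses the fixed 100-byte header of a `.shp` file with the standard library only. This is not how `getDataInfo.py` works (that script uses GDAL/OGR); it is a dependency-free illustration, and the function name `read_shp_header` is hypothetical. The header does not carry the spatial reference, which is stored in the accompanying `.prj` file.

```python
import struct

# Subset of the shape type codes defined for shapefiles.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "Polyline", 5: "Polygon", 8: "MultiPoint"}

def read_shp_header(header_bytes):
    """Parse the first 100 bytes of a .shp file into type and bounding box."""
    if len(header_bytes) < 100:
        raise ValueError("a shapefile header is 100 bytes long")
    # The file code is a big-endian integer and is always 9994.
    if struct.unpack(">i", header_bytes[0:4])[0] != 9994:
        raise ValueError("not a shapefile")
    # Shape type and bounding box are little-endian.
    shape_type = struct.unpack("<i", header_bytes[32:36])[0]
    west, south, east, north = struct.unpack("<4d", header_bytes[36:68])
    return {
        "type": SHAPE_TYPES.get(shape_type, str(shape_type)),
        "west": west, "south": south, "east": east, "north": north,
    }

# Synthetic header built in memory so the example runs without a data file.
demo_header = (
    struct.pack(">i", 9994)          # file code
    + bytes(20)                      # five unused integers
    + struct.pack(">i", 50)          # file length in 16-bit words
    + struct.pack("<2i", 1000, 5)    # version, shape type 5 = Polygon
    + struct.pack("<8d", -121.284429, 38.618599, -119.986716, 39.315901,
                  0.0, 0.0, 0.0, 0.0)  # X/Y bounds, then unused Z/M bounds
)
print(read_shp_header(demo_header))
```

For a real layer, the same function could be called with `open(path, "rb").read(100)`.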


## Creating ISO 19139 XML

#### Importing a Template

The following script finds layers that match a particular format (shapefiles) and makes a copy of the template as a new document matching the layer name. This will overwrite any existing metadata file with the same name.

In [23]:
%run iso19139template.py

American_River_Watershed.shp
CarsonWalkerMonoAdobe_Watershed.shp
Feather_Watershed.shp
KingsKaweahTule_Watershed.shp
Sierra_Nevada_Conservancy_Boundary.shp
Southern_Sierra_Fisher_Conservation_Area.shp
Stanislaus_Watershed.shp
Tehachapi_Watershed.shp
TruckeeEagle_LakesSurprise_Valley_Watershed.shp
Tuolumne_Watershed.shp
Upper_Sacramento_Watershed.shp
Upper_San_Joaquin_Watershed.shp
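
The copy step can be sketched as follows. This is an assumption about the general approach, not the contents of `iso19139template.py`; the function name `copy_template`, the `.xml` extension on the template, and the directory layout are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

def copy_template(template, data_dir, out_dir, pattern="*.shp"):
    """Copy `template` once for every layer matching `pattern`, named after the layer."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for layer in sorted(Path(data_dir).rglob(pattern)):
        target = out_dir / (layer.stem + ".xml")
        shutil.copyfile(template, target)  # silently overwrites an existing file
        created.append(target.name)
    return created

# Demonstration in a throwaway directory with one empty stand-in layer.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "data").mkdir()
    (tmp / "data" / "Feather_Watershed.shp").touch()
    (tmp / "template_iso.xml").write_text("<template/>")
    print(copy_template(tmp / "template_iso.xml", tmp / "data", tmp / "out"))
    # ['Feather_Watershed.xml']
```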


#### Adding Descriptive Metadata

The file *metadata.csv* contains a completed set of metadata for all 13 layers in the collection. 

In [4]:
metadata = pd.read_csv('metadata.csv', index_col='filename')
metadata.head()

filename,spatial reference,type,west,south,east,north,format,identifier,title,description,...,temporal,subject,topicCat,spatialSubject,fc_uuid,language,collectionTitle,collectionId,access,rights
American_River_Watershed.shp,4326,Polygon,-121.284429,38.618599,-119.986716,39.315901,Shapefile,md558xs5958,"American River Watershed, Sierra Nevada, 2017",This polygon shapefile illustrates the boundar...,...,9/12/17,Watersheds,Boundaries,Sierra Nevada (Calif. and Nev.),6c034c54-7725-43ac-96f4-b73ba79afb64,eng,Sierra Nevada Conservancy GIS Data,dk493qv6144,Public,These data are made available as a public serv...
CarsonWalkerMonoAdobe_Watershed.shp,3310,Polygon,784.3927,-32970.7471,130937.2934,99178.7674,Shapefile,ss831fn6261,"Carson-Walker-Mono-Adobe Watershed, Sierra Nev...",This polygon shapefile illustrates the boundar...,...,9/13/17,Watersheds,Boundaries,Sierra Nevada (Calif. and Nev.),a64d0032-edfe-4f69-8c0b-5967b8d0e5ed,eng,Sierra Nevada Conservancy GIS Data,dk493qv6144,Public,These data are made available as a public serv...
Feather_Watershed.shp,3310,Polygon,-135103.6579,163832.2584,-7926.0376,276396.8357,Shapefile,pz495kz2134,"Feather Watershed, Sierra Nevada, 2017",This polygon shapefile illustrates the boundar...,...,9/13/17,Watersheds,Boundaries,Sierra Nevada (Calif. and Nev.),9fadc859-701f-4a39-94f7-7b4085cb8907,eng,Sierra Nevada Conservancy GIS Data,dk493qv6144,Public,These data are made available as a public serv...
KingsKaweahTule_Watershed.shp,3310,Polygon,47999.8252,-319571.0384,159850.9914,-89530.0759,Shapefile,vg503wx0296,"Kings-Kaweah-Tule Watershed, Sierra Nevada, 2017",This polygon shapefile illustrates the boundar...,...,9/13/17,Watersheds,Boundaries,Sierra Nevada (Calif. and Nev.),7214bd31-3253-4c7b-9b29-0fbe323233da,eng,Sierra Nevada Conservancy GIS Data,dk493qv6144,Public,These data are made available as a public serv...
Lake_Tahoe_West_Restoration_Partnership.shp,4326,Polygon,-120.429216,38.867141,-120.077265,39.268366,Shapefile,fq078pp8770,"Lake Tahoe West Restoration Partnership, Sierr...",This polygon shapefile illustrates the boundar...,...,8/29/17,Watersheds,Boundaries,Sierra Nevada (Calif. and Nev.),a80a71fc-22f1-47b4-96d0-7ef54e2c7e5c,eng,Sierra Nevada Conservancy GIS Data,dk493qv6144,Public,These data are made available as a public serv...


Create a dictionary containing the metadata for each layer:

In [6]:
import csv

metadict = {}
# Use a context manager so the file handle is closed after reading.
with open('metadata.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    for rows in reader:
        filename = rows[0]
        srs = rows[1]
        geomType = rows[2]
        westbc = rows[3]
        southbc = rows[4]
        eastbc = rows[5]
        northbc = rows[6]
        fileFormat = rows[7]
        identifier = rows[8]
        title = rows[9]
        description = rows[10]
        originator = rows[11]
        publisher = rows[12]
        issueDate = rows[13]
        temporal = rows[14]
        subjects = rows[15].split('|')         # multiple values are pipe-delimited
        topicCat = rows[16]
        spatialSubjects = rows[17].split('|')  # multiple values are pipe-delimited
        fcUUID = rows[18]
        language = rows[19]  # read here, but not stored in the tuple below
        collTitle = rows[20]
        collId = rows[21]
        access = rows[22]
        usageRights = rows[23]
        metadict[filename] = srs, geomType, westbc, southbc, eastbc, northbc, fileFormat, identifier, title, description, originator, publisher, issueDate, temporal, subjects, topicCat, spatialSubjects, fcUUID, collTitle, collId, access, usageRights

# Print each filename with the value at index 12 of its tuple (issueDate).
for k, v in metadict.items():
    print(k, v[12])

American_River_Watershed.shp 9/12/17
CarsonWalkerMonoAdobe_Watershed.shp 9/13/17
Feather_Watershed.shp 9/13/17
KingsKaweahTule_Watershed.shp 9/13/17
Lake_Tahoe_West_Restoration_Partnership.shp 8/29/17
Sierra_Nevada_Conservancy_Boundary.shp 3/1/17
Southern_Sierra_Fisher_Conservation_Area.shp 8/25/17
Stanislaus_Watershed.shp 9/13/17
Tehachapi_Watershed.shp 9/13/17
TruckeeEagle_LakesSurprise_Valley_Watershed.shp 9/13/17
Tuolumne_Watershed.shp 9/13/17
Upper_Sacramento_Watershed.shp 9/12/17
Upper_San_Joaquin_Watershed.shp 9/12/17
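
Looking values up by tuple position (for example `v[12]`) is easy to get wrong when columns are added or reordered. As an alternative sketch, `csv.DictReader` keys each value by its column header. The function name `load_metadata` is hypothetical, and the in-memory sample uses only a few of the columns shown in the table above.

```python
import csv
import io

def load_metadata(handle, multi_value=("subject", "spatialSubject")):
    """Map each filename to a dict of its metadata, keyed by column header."""
    records = {}
    for row in csv.DictReader(handle):
        for column in multi_value:
            if column in row:
                # Pipe-delimited cells become lists, as in the loop above.
                row[column] = row[column].split("|")
        records[row["filename"]] = row
    return records

# Small in-memory stand-in for metadata.csv.
sample = io.StringIO(
    "filename,title,subject,spatialSubject\n"
    'Feather_Watershed.shp,"Feather Watershed, Sierra Nevada, 2017",'
    "Watersheds,Sierra Nevada (Calif. and Nev.)\n"
)
records = load_metadata(sample)
print(records["Feather_Watershed.shp"]["title"])
print(records["Feather_Watershed.shp"]["subject"])
```

With a real file, the handle would come from `open('metadata.csv', newline='')`.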


#### Dates and Times

The `datetime` module works with date and time data types, which is useful for reformatting date and time values.

In [7]:
from datetime import datetime
import time

old_date = '11/1/18'
new_date = datetime.strptime(old_date, '%m/%d/%y')
long_date = datetime.strftime(new_date,"%b %d %Y %H:%M:%S")
new_date = datetime.strftime(new_date,'%Y-%m-%dT%H:%M:%S')
print ('Old date: ' + old_date)
print ('New date: ' + new_date)
print ('Long date: ' + long_date)
print ('Current Date and Time: ' + time.ctime())

Old date: 11/1/18
New date: 2018-11-01T00:00:00
Long date: Nov 01 2018 00:00:00
Current Date and Time: Mon Feb  3 06:49:08 2020


#### Apply Metadata

The following script finds elements in each layer's metadata file and replaces their contents with values from the CSV file.

In [24]:
%run iso19139metadata.py

American_River_Watershed.xml
CarsonWalkerMonoAdobe_Watershed.xml
Feather_Watershed.xml
KingsKaweahTule_Watershed.xml
Sierra_Nevada_Conservancy_Boundary.xml
Southern_Sierra_Fisher_Conservation_Area.xml
Stanislaus_Watershed.xml
Tehachapi_Watershed.xml
TruckeeEagle_LakesSurprise_Valley_Watershed.xml
Tuolumne_Watershed.xml
Upper_Sacramento_Watershed.xml
Upper_San_Joaquin_Watershed.xml
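
A find-and-replace of this kind can be sketched with the standard library's namespace-aware ElementTree. This is an illustration under stated assumptions, not the contents of `iso19139metadata.py`: the function name `replace_text` and the value `example-id` are hypothetical, and the template fragment is reduced to a single element.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by ISO 19139 records.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
# Register the prefixes so serialized output keeps gmd:/gco: instead of ns0:/ns1:.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def replace_text(root, placeholder, value):
    """Set every gco:CharacterString whose text equals `placeholder` to `value`."""
    count = 0
    for node in root.iter("{%s}CharacterString" % NS["gco"]):
        if node.text is not None and node.text.strip() == placeholder:
            node.text = value
            count += 1
    return count

# Reduced template fragment containing one placeholder.
template = (
    '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" '
    'xmlns:gco="http://www.isotc211.org/2005/gco">'
    "<gmd:fileIdentifier><gco:CharacterString>METADATAID</gco:CharacterString>"
    "</gmd:fileIdentifier></gmd:MD_Metadata>"
)
root = ET.fromstring(template)
print(replace_text(root, "METADATAID", "example-id"))  # 1
print(root.find("gmd:fileIdentifier/gco:CharacterString", NS).text)  # example-id
```

Returning the number of replacements makes it easy to flag a record in which an expected placeholder was not found.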


### Codebook Metadata (Feature Catalogs)

GIS data often contain feature attributes that can be crucial to understanding, interpreting, and reusing the content. These feature sets and their descriptions are sometimes referred to as feature catalogs, entity and attribute information, codebooks, or, more generally, data dictionaries.

ISO uses the 19110 standard to record metadata for feature catalogs. This is a separate document that is linked to the 19139 record through the *uuid* element.

The file *attributes.csv* contains a set of attribute labels and definitions for the entire collection. The script `iso19110.py` uses this file, together with *metadata.csv*, to create the 19110 XML.

    > python iso19110.py
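
Since the two records are tied together by a UUID, each layer needs one before the 19110 document is written. The sketch below shows one way such identifiers could be generated with the standard library while keeping any that already exist. The function name `assign_ids` and the layer `New_Layer.shp` are hypothetical; how the values in the `fc_uuid` column of *metadata.csv* were actually produced is not stated here.

```python
import uuid

def assign_ids(layers):
    """Keep existing feature catalog UUIDs and create random ones where missing."""
    return {name: (value or str(uuid.uuid4())) for name, value in layers.items()}

layers = {
    # Existing value copied from metadata.csv.
    "American_River_Watershed.shp": "6c034c54-7725-43ac-96f4-b73ba79afb64",
    # Hypothetical layer without an identifier yet.
    "New_Layer.shp": "",
}
ids = assign_ids(layers)
for name, fc_uuid in ids.items():
    print(name, fc_uuid)
```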

## Creating ArcGIS XML using the 19139 Standard

Metadata created using ArcCatalog is stored in the ArcGIS XML Metadata format. ArcGIS XML is composed of elements taken from standards currently supported in ArcGIS Desktop: FGDC, ISO 19139, NAP, and INSPIRE. ArcGIS XML documents are stored as part of the data and are recognizable by their file extensions:

* GeoTIFF: _tif.xml_
* Shapefile: _shp.xml_ 
* Grid: _metadata.xml_ 

Packaging data and metadata together allows for automatic updates ('synchronization') to occur in the metadata whenever the data are updated or refreshed. Synchronized elements contain the attribute: `Sync="TRUE"`

Example elements for a geographic extent:
<br>
`<GeoBndBox esriExtentType="search">
<westBL Sync="TRUE">-119.058163</westBL>
<eastBL Sync="TRUE">-118.974425</eastBL>
<northBL Sync="TRUE">35.415833</northBL>
<southBL Sync="TRUE">35.319713</southBL>
</GeoBndBox>`

In ArcCatalog, certain metadata are synced directly from the data: geographic extent, spatial reference, attribute labels, vector spatial representation, file format, etc.
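
To see which values ArcGIS maintains automatically, the synchronized elements can be listed by their `Sync` attribute. This is a minimal sketch using the extent example above; the function name `synced_values` is hypothetical.

```python
import xml.etree.ElementTree as ET

snippet = """<GeoBndBox esriExtentType="search">
<westBL Sync="TRUE">-119.058163</westBL>
<eastBL Sync="TRUE">-118.974425</eastBL>
<northBL Sync="TRUE">35.415833</northBL>
<southBL Sync="TRUE">35.319713</southBL>
</GeoBndBox>"""

def synced_values(root):
    """Return (tag, text) pairs for every element marked Sync="TRUE"."""
    return [(el.tag, el.text) for el in root.iter() if el.get("Sync") == "TRUE"]

print(synced_values(ET.fromstring(snippet)))
# [('westBL', '-119.058163'), ('eastBL', '-118.974425'),
#  ('northBL', '35.415833'), ('southBL', '35.319713')]
```

A list of pairs is used rather than a dictionary because the same tag can occur more than once in a full record.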

### Import Template

Importing *template_arc*:


In [19]:
%run -i arcTemplate.py

American_River_Watershed.shp
CarsonWalkerMonoAdobe_Watershed.shp
Feather_Watershed.shp
KingsKaweahTule_Watershed.shp
Sierra_Nevada_Conservancy_Boundary.shp
Southern_Sierra_Fisher_Conservation_Area.shp
Stanislaus_Watershed.shp
Tehachapi_Watershed.shp
TruckeeEagle_LakesSurprise_Valley_Watershed.shp
Tuolumne_Watershed.shp
Upper_Sacramento_Watershed.shp
Upper_San_Joaquin_Watershed.shp


### Add Metadata

Adding values from *metadata.csv*:

In [20]:
%run -i arcMetadata.py

American_River_Watershed.shp.xml
CarsonWalkerMonoAdobe_Watershed.shp.xml
Feather_Watershed.shp.xml
KingsKaweahTule_Watershed.shp.xml
Sierra_Nevada_Conservancy_Boundary.shp.xml
Southern_Sierra_Fisher_Conservation_Area.shp.xml
Stanislaus_Watershed.shp.xml
Tehachapi_Watershed.shp.xml
TruckeeEagle_LakesSurprise_Valley_Watershed.shp.xml
Tuolumne_Watershed.shp.xml
Upper_Sacramento_Watershed.shp.xml
Upper_San_Joaquin_Watershed.shp.xml


### Create the Feature Catalog Metadata

In [12]:
%run arcFeatureCat.py

American_River_Watershed.shp.xml
CarsonWalkerMonoAdobe_Watershed.shp.xml
Feather_Watershed.shp.xml
KingsKaweahTule_Watershed.shp.xml
Sierra_Nevada_Conservancy_Boundary.shp.xml
Southern_Sierra_Fisher_Conservation_Area.shp.xml
Stanislaus_Watershed.shp.xml
Tehachapi_Watershed.shp.xml
TruckeeEagle_LakesSurprise_Valley_Watershed.shp.xml
Tuolumne_Watershed.shp.xml
Upper_Sacramento_Watershed.shp.xml
Upper_San_Joaquin_Watershed.shp.xml
American_River_Watershed.shp.xml
CarsonWalkerMonoAdobe_Watershed.shp.xml
Feather_Watershed.shp.xml
KingsKaweahTule_Watershed.shp.xml
Sierra_Nevada_Conservancy_Boundary.shp.xml
Southern_Sierra_Fisher_Conservation_Area.shp.xml
Stanislaus_Watershed.shp.xml
Tehachapi_Watershed.shp.xml
TruckeeEagle_LakesSurprise_Valley_Watershed.shp.xml
Tuolumne_Watershed.shp.xml
Upper_Sacramento_Watershed.shp.xml
Upper_San_Joaquin_Watershed.shp.xml


#### Create JPEG

For polygon shapefiles, create a thumbnail image and write the byte string into the thumbnail element.

In [13]:
%run -i createJPEG.py

American_River_Watershed.shp
CarsonWalkerMonoAdobe_Watershed.shp
Feather_Watershed.shp
KingsKaweahTule_Watershed.shp
Sierra_Nevada_Conservancy_Boundary.shp
Southern_Sierra_Fisher_Conservation_Area.shp
Stanislaus_Watershed.shp
Tehachapi_Watershed.shp
TruckeeEagle_LakesSurprise_Valley_Watershed.shp
Tuolumne_Watershed.shp
Upper_Sacramento_Watershed.shp
Upper_San_Joaquin_Watershed.shp
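
Writing the image into the record amounts to base64-encoding the file's bytes and storing the text in an element. The sketch below assumes the thumbnail lives at `Binary/Thumbnail/Data` with the attribute `EsriPropertyType="PictureX"`; that layout is an assumption about ArcGIS metadata and should be checked against a record exported from ArcCatalog. The function names are hypothetical, and rendering the JPEG itself is not shown.

```python
import base64
import xml.etree.ElementTree as ET

def _child(parent, tag):
    """Return the first child named `tag`, creating it if it does not exist."""
    node = parent.find(tag)
    if node is None:
        node = ET.SubElement(parent, tag)
    return node

def embed_thumbnail(root, jpeg_bytes):
    """Store base64-encoded image bytes under Binary/Thumbnail/Data (assumed path)."""
    data = _child(_child(_child(root, "Binary"), "Thumbnail"), "Data")
    data.set("EsriPropertyType", "PictureX")  # assumed attribute value
    data.text = base64.b64encode(jpeg_bytes).decode("ascii")
    return data

# Stand-in bytes; a real call would pass the contents of the JPEG file.
root = ET.Element("metadata")
node = embed_thumbnail(root, b"\xff\xd8 stand-in image bytes")
print(node.text)
```

Calling the function again replaces the stored text rather than adding a second `Data` element.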


### Process Files with XSLT

To transform ArcGIS metadata to ISO 19139, run this command from the terminal:

In [16]:
#for file in data/*/*shp.xml ; do output_path="output/iso19139/$(basename "$file")"; xsltproc ARCGIS2ISO19139.xsl "$file"  > "$output_path"; done
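
The same loop can be driven from Python, which keeps the whole workflow in the notebook. The sketch below only builds the `xsltproc` commands so they can be inspected first; running them requires `xsltproc` to be installed. The function names and the example input path are hypothetical.

```python
import subprocess
from pathlib import Path

def plan_transforms(inputs, stylesheet, out_dir):
    """Return (command, output_path) pairs, one per input file."""
    jobs = []
    for src in inputs:
        src = Path(src)
        out = Path(out_dir) / src.name
        # -o writes the result to a file instead of standard output.
        jobs.append((["xsltproc", "-o", str(out), str(stylesheet), str(src)], out))
    return jobs

def run_transforms(jobs):
    """Run each planned command, stopping at the first failure."""
    for command, out in jobs:
        out.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(command, check=True)

# Inspect the plan without executing anything.
jobs = plan_transforms(
    ["data/watersheds/American_River_Watershed.shp.xml"],  # hypothetical path
    "ARCGIS2ISO19139.xsl",
    "output/iso19139",
)
for command, out in jobs:
    print(" ".join(command))
```

With real data, the inputs could come from `Path('data').glob('*/*shp.xml')`, followed by `run_transforms(jobs)`.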

### Rename Files

In [28]:
import os

for dirName, subDirs, fileNames in os.walk('./output/iso19139'):
    for f in fileNames:
        oldFilename = os.path.join(dirName, f)
        f = f.replace('.shp','')
        newFilename = os.path.join(dirName, f)
        print (oldFilename, newFilename)
        os.rename(oldFilename, newFilename)

./output/iso19139/American_River_Watershed.shp.xml ./output/iso19139/American_River_Watershed.xml
./output/iso19139/American_River_Watershed.xml ./output/iso19139/American_River_Watershed.xml
./output/iso19139/CarsonWalkerMonoAdobe_Watershed.shp.xml ./output/iso19139/CarsonWalkerMonoAdobe_Watershed.xml
./output/iso19139/CarsonWalkerMonoAdobe_Watershed.xml ./output/iso19139/CarsonWalkerMonoAdobe_Watershed.xml
./output/iso19139/Feather_Watershed.shp.xml ./output/iso19139/Feather_Watershed.xml
./output/iso19139/Feather_Watershed.xml ./output/iso19139/Feather_Watershed.xml
./output/iso19139/KingsKaweahTule_Watershed.shp.xml ./output/iso19139/KingsKaweahTule_Watershed.xml
./output/iso19139/KingsKaweahTule_Watershed.xml ./output/iso19139/KingsKaweahTule_Watershed.xml
./output/iso19139/Sierra_Nevada_Conservancy_Boundary.shp.xml ./output/iso19139/Sierra_Nevada_Conservancy_Boundary.xml
./output/iso19139/Sierra_Nevada_Conservancy_Boundary.xml ./output/iso19139/Sierra_Nevada_Conservancy_Boundary.
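
The output above shows two side effects of the loop: files whose names do not change are renamed onto themselves, and an existing target is replaced without warning. A more defensive variant is sketched below; the function name `rename_suffix` and its `overwrite` flag are hypothetical.

```python
import os
import tempfile

def rename_suffix(folder, old, new, overwrite=False):
    """Rename files ending in `old` so they end in `new`; return (old, new) name pairs."""
    renamed = []
    for dir_name, _sub_dirs, file_names in os.walk(folder):
        for name in file_names:
            # Skip hidden files such as .DS_Store and files without the suffix.
            if name.startswith(".") or not name.endswith(old):
                continue
            target = name[: -len(old)] + new
            if target == name:
                continue
            src = os.path.join(dir_name, name)
            dst = os.path.join(dir_name, target)
            if os.path.exists(dst) and not overwrite:
                raise FileExistsError(dst)
            os.replace(src, dst)  # replaces dst only when overwrite=True reaches here
            renamed.append((name, target))
    return renamed

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("American_River_Watershed.shp.xml", ".DS_Store"):
        open(os.path.join(tmp, name), "w").close()
    print(rename_suffix(tmp, ".shp.xml", ".xml"))
    # [('American_River_Watershed.shp.xml', 'American_River_Watershed.xml')]
```

Matching on the end of the name, rather than replacing `.shp` anywhere in it, also protects layers whose names happen to contain that string.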

In [None]:
#for file in data/*/*shp.xml ; do output_path="output/iso19110/$(basename "$file")"; xsltproc arcgis_to_19110.xsl "$file"  > "$output_path"; done

### Rename Files

In [27]:
import os

for dirName, subDirs, fileNames in os.walk('./output/iso19110'):
    for f in fileNames:
        oldFilename = os.path.join(dirName, f)
        f = f.replace('.shp','_19110')
        newFilename = os.path.join(dirName, f)
        print (oldFilename, newFilename)
        os.rename(oldFilename, newFilename)

./output/iso19110/American_River_Watershed.shp.xml ./output/iso19110/American_River_Watershed_19110.xml
./output/iso19110/CarsonWalkerMonoAdobe_Watershed.shp.xml ./output/iso19110/CarsonWalkerMonoAdobe_Watershed_19110.xml
./output/iso19110/Feather_Watershed.shp.xml ./output/iso19110/Feather_Watershed_19110.xml
./output/iso19110/KingsKaweahTule_Watershed.shp.xml ./output/iso19110/KingsKaweahTule_Watershed_19110.xml
./output/iso19110/Sierra_Nevada_Conservancy_Boundary.shp.xml ./output/iso19110/Sierra_Nevada_Conservancy_Boundary_19110.xml
./output/iso19110/Southern_Sierra_Fisher_Conservation_Area.shp.xml ./output/iso19110/Southern_Sierra_Fisher_Conservation_Area_19110.xml
./output/iso19110/Stanislaus_Watershed.shp.xml ./output/iso19110/Stanislaus_Watershed_19110.xml
./output/iso19110/Tehachapi_Watershed.shp.xml ./output/iso19110/Tehachapi_Watershed_19110.xml
./output/iso19110/TruckeeEagle_LakesSurprise_Valley_Watershed.shp.xml ./output/iso19110/TruckeeEagle_LakesSurprise_Valley_Watershed_

### ISO to GeoBlacklight

Run the following command from the terminal to transform the ISO metadata to GeoBlacklight JSON:

In [None]:
for file in output/iso19139/*.xml ; do output_path="output/geoblacklight/$(basename "$file")"; xsltproc iso2geoBL.xsl "$file"  > "$output_path"; done

### Rename Files

In [31]:
import os

for dirName, subDirs, fileNames in os.walk('output/geoblacklight'):
    for f in fileNames:
        oldFilename = os.path.join(dirName, f)
        f = f.replace('.xml','.json')
        newFilename = os.path.join(dirName, f)
        print (oldFilename, newFilename)
        os.rename(oldFilename, newFilename)

output/geoblacklight/.DS_Store output/geoblacklight/.DS_Store
output/geoblacklight/American_River_Watershed.xml output/geoblacklight/American_River_Watershed.json
output/geoblacklight/CarsonWalkerMonoAdobe_Watershed.xml output/geoblacklight/CarsonWalkerMonoAdobe_Watershed.json
output/geoblacklight/Feather_Watershed.xml output/geoblacklight/Feather_Watershed.json
output/geoblacklight/KingsKaweahTule_Watershed.xml output/geoblacklight/KingsKaweahTule_Watershed.json
output/geoblacklight/Sierra_Nevada_Conservancy_Boundary.xml output/geoblacklight/Sierra_Nevada_Conservancy_Boundary.json
output/geoblacklight/Southern_Sierra_Fisher_Conservation_Area.xml output/geoblacklight/Southern_Sierra_Fisher_Conservation_Area.json
output/geoblacklight/Stanislaus_Watershed.xml output/geoblacklight/Stanislaus_Watershed.json
output/geoblacklight/Tehachapi_Watershed.xml output/geoblacklight/Tehachapi_Watershed.json
output/geoblacklight/TruckeeEagle_LakesSurprise_Valley_Watershed.xml output/geoblacklight/Truc
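
Because the JSON is produced by an XSLT stylesheet rather than a JSON library, it is worth confirming that each renamed file actually parses. The sketch below is one possible check; the function name `check_json_files` is hypothetical, and `dc_title_s` is used only as an example of a key that might be required, on the assumption that the stylesheet emits it.

```python
import json
import tempfile
from pathlib import Path

def check_json_files(folder, required=()):
    """Return {filename: problem} for .json files that fail to parse or lack keys."""
    problems = {}
    for path in sorted(Path(folder).glob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems[path.name] = "invalid JSON: " + err.msg
            continue
        missing = [key for key in required if key not in record]
        if missing:
            problems[path.name] = "missing: " + ", ".join(missing)
    return problems

# Demonstration with one well-formed and one truncated file.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "good.json").write_text('{"dc_title_s": "Feather Watershed, Sierra Nevada, 2017"}')
    (tmp / "bad.json").write_text("{")
    problems = check_json_files(tmp, required=("dc_title_s",))
    print(sorted(problems))  # ['bad.json']
```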