# Metadata

## Overview
At its core, metadata is data about data.  In day-to-day GIS data management workflows, data is created, updated, archived and used for various decision support systems.  The information management lifecycle of data also includes maintenance, protection and preservation, as well as facilitating discovery.  Metadata helps meet these requirements.

## Core concepts
Documentation is critical in order to describe:

- who is responsible and who to contact for the data
- what the data represents (features, grids, etc.)
- where the data is located
- when the data was created and updated, and what time span it covers
- why the data exists
- how the data was generated
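
As a rough illustration, the six questions above can be treated as a checklist. The sketch below is hypothetical: the `DatasetDocumentation` class, its field names and the sample values are assumptions made for this example, and are not part of any metadata standard or of pygeometa.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetDocumentation:
    """Hypothetical checklist record; field names mirror the six questions."""
    who: str = ''    # responsible party and point of contact
    what: str = ''   # what the data represents
    where: str = ''  # where the data is located
    when: str = ''   # creation/update dates and time span covered
    why: str = ''    # why the data exists
    how: str = ''    # how the data was generated


def missing_answers(doc):
    """Return the names of the questions that have no answer yet."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


# Illustrative values only
doc = DatasetDocumentation(
    who='Natural Earth',
    what='Country boundaries',
    when='Created 2009-12-02',
)
print(missing_answers(doc))
```

A check like this only confirms that each question has some answer; the standards described next define how those answers are structured.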

## Standards
Numerous standards exist for documenting data.  The [Dublin Core](http://dublincore.org) standard provides 15 core elements to describe any resource.  The [OGC Catalogue Service for the Web](https://opengeospatial.org/standards/cat) leverages Dublin Core in providing a core metadata model for geospatial catalogues and search.
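
To make the Dublin Core model more concrete, here is a minimal sketch that builds a simple XML record using only the Python standard library. The element names and namespace come from the Dublin Core Metadata Element Set; the `record` wrapper element, the `build_dc_record` helper and the sample values are assumptions for illustration, and real catalogues typically use their own wrapper (for example a CSW record).

```python
import xml.etree.ElementTree as ET

DC_NS = 'http://purl.org/dc/elements/1.1/'

# Element names of the Dublin Core Metadata Element Set
DC_ELEMENTS = (
    'title', 'creator', 'subject', 'description', 'publisher',
    'contributor', 'date', 'type', 'format', 'identifier',
    'source', 'language', 'relation', 'coverage', 'rights',
)


def build_dc_record(values):
    """Build a simple XML record from a dict of Dublin Core element values."""
    ET.register_namespace('dc', DC_NS)
    record = ET.Element('record')  # hypothetical wrapper element
    for name, value in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f'not a Dublin Core element: {name}')
        child = ET.SubElement(record, f'{{{DC_NS}}}{name}')
        child.text = value
    return record


record = build_dc_record({
    'title': 'Admin 0 - Countries',
    'identifier': 'naturalearth-countries',
    'language': 'en',
    'date': '2009-12-02',
})
print(ET.tostring(record, encoding='unicode'))
```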

The geospatial community has long-standing efforts around developing metadata standards for geospatial data, including (but not limited to) [FGDC CSDGM](https://www.fgdc.gov/metadata/csdgm-standard), [DIF](https://earthdata.nasa.gov/esdis/eso/standards-and-references/directory-interchange-format-dif-standard), and [ISO 19115](https://www.iso.org/standard/26020.html).  Metadata generated with these standards integrates easily into geospatial search catalogues and desktop GIS tools, which helps to organize, categorize and find geospatial data.  The challenge of geospatial metadata remains its complexity.  Tools are needed to easily create and manage geospatial metadata.

## Easy metadata workflows with pygeometa
[pygeometa](https://github.com/geopython/pygeometa) provides a lightweight toolkit allowing users to easily create geospatial metadata in standards-based formats using simple configuration files (affectionately called metadata control files, or MCF).  Leveraging the simple but powerful YAML format, pygeometa can generate metadata in numerous standards.  Users can also create their own custom output formats and plug them into pygeometa.

For developers, pygeometa provides an intuitive Python API for tightly coupling metadata generation with their own systems.

## Creating metadata


Let's walk through examples of using pygeometa on the command line as well as the API.

Let's start with the CLI.

In [8]:
!pygeometa

Usage: pygeometa [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  generate-metadata  generate metadata
  migrate


In [29]:
!cat ../data/countries.yml

mcf:
    version: 1.0

metadata:
    identifier: naturalearth-countries
    language: en
    charset: utf8
    hierarchylevel: dataset
    datestamp: 2018-05-21
    dataseturi: https://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries

spatial:
    datatype: vector
    geomtype: point
    crs: 4326
    bbox: -180,-90,180,90

identification:
    language: en
    charset: utf8
    title: Admin 0 - Countries
    abstract: Countries distinguish between metropolitan (homeland) and independent and semi-independent portions of sovereign states. If you want to see the dependent overseas regions broken out (like in ISO codes, see France for example), use map units instead.  Each country is coded with a world region that roughly follows the United Nations setup.  Includes some thematic data from the United Nations, U.S. Central Intelligence Agency, and elsewhere.
    dates:
        creation: 2009-12-02
        publication: 2009-12-02
    key

In [1]:
!pygeometa generate-metadata --mcf ../data/countries.yml --schema iso19139 --output /tmp/countries.xml

  dict_ = yaml.load(fh)


In [37]:
!cat /tmp/countries.xml

<?xml version="1.0" ?>
<gmd:MD_Metadata xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gml="http://www.opengis.net/gml" xmlns:gmx="http://www.isotc211.org/2005/gmx" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://www.isotc211.org/2005/gmd/gmd.xsd http://www.isotc211.org/2005/gmx http://www.isotc211.org/2005/gmx/gmx.xsd">
  <gmd:fileIdentifier>
    <gco:CharacterString>naturalearth-countries</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:language>
    <gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="en" codeSpace="ISO 639-2">en</gmd:LanguageCode>
  </gmd:language>
  <gmd:characterSet>
    <gmd:MD_CharacterSetCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue="utf8" codeSpace="ISOTC211/19115">utf8</gmd:MD_CharacterSetCode>
  </gmd:characterSet>
  <gmd:hierarchyL

Now let's use the API to make some updates.

In [46]:
from pygeometa.core import read_mcf, render_template
mdata = read_mcf('../data/countries.yml')
mdata

{'mcf': {'version': 1.0},
 'metadata': {'identifier': 'naturalearth-countries',
  'language': 'en',
  'charset': 'utf8',
  'hierarchylevel': 'dataset',
  'datestamp': datetime.date(2018, 5, 21),
  'dataseturi': 'https://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries'},
 'spatial': {'datatype': 'vector',
  'geomtype': 'point',
  'crs': 4326,
  'bbox': '-180,-90,180,90'},
 'identification': {'language': 'en',
  'charset': 'utf8',
  'title': 'Admin 0 - Countries',
  'abstract': 'Countries distinguish between metropolitan (homeland) and independent and semi-independent portions of sovereign states. If you want to see the dependent overseas regions broken out (like in ISO codes, see France for example), use map units instead.  Each country is coded with a world region that roughly follows the United Nations setup.  Includes some thematic data from the United Nations, U.S. Central Intelligence Agency, and elsewhere.',
  'dates': {'creation': datetime.date(200

In [47]:
mdata['identification']['title']

'Admin 0 - Countries'

Let's change the dataset title.

In [48]:
mdata['identification']['title'] = 'Countries of the world'

In [49]:
xml_string = render_template(mdata, schema='iso19139')

In [51]:
print(xml_string)

<?xml version="1.0" ?>
<gmd:MD_Metadata xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gml="http://www.opengis.net/gml" xmlns:gmx="http://www.isotc211.org/2005/gmx" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://www.isotc211.org/2005/gmd/gmd.xsd http://www.isotc211.org/2005/gmx http://www.isotc211.org/2005/gmx/gmx.xsd">
  <gmd:fileIdentifier>
    <gco:CharacterString>naturalearth-countries</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:language>
    <gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="en" codeSpace="ISO 639-2">en</gmd:LanguageCode>
  </gmd:language>
  <gmd:characterSet>
    <gmd:MD_CharacterSetCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue="utf8" codeSpace="ISOTC211/19115">utf8</gmd:MD_CharacterSetCode>
  </gmd:characterSet>
  <gmd:hierarchyLevel>
    <
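
One way to confirm that the new title made it into the output is to parse the XML and read the title back. The sketch below uses only the standard library and a small hand-written ISO 19139 fragment (`SAMPLE_XML`, an assumption standing in for `xml_string`); the `get_title` helper is hypothetical, and the element path follows the ISO 19139 citation structure.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    'gmd': 'http://www.isotc211.org/2005/gmd',
    'gco': 'http://www.isotc211.org/2005/gco',
}

# Hand-written stand-in for the generated document (assumption)
SAMPLE_XML = """<?xml version="1.0" ?>
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Countries of the world</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""

TITLE_PATH = ('gmd:identificationInfo/gmd:MD_DataIdentification/'
              'gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString')


def get_title(xml_text):
    """Return the dataset title from an ISO 19139 document, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(TITLE_PATH, NAMESPACES)
    return node.text if node is not None else None


print(get_title(SAMPLE_XML))
```

In the notebook, passing `xml_string` to `get_title` instead of `SAMPLE_XML` should work the same way, assuming `render_template` returns the document as text.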

Now try updating the `mdata` variable (a `dict`) with new values and use the pygeometa API to generate a new ISO XML document.
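
As a starting point for this exercise, here is a minimal sketch of applying several updates to a nested MCF dictionary at once. The `update_mcf` helper is hypothetical (it is not part of pygeometa), and the trimmed `mdata` below is a stand-in for the full dictionary returned by `read_mcf`. The final rendering step is shown only as a comment and reuses the `render_template` call from earlier in this notebook; other pygeometa releases may expose a different API.

```python
import copy


def update_mcf(mcf, updates):
    """Return a copy of `mcf` with `updates` merged in, recursing into dicts."""
    merged = copy.deepcopy(mcf)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = update_mcf(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for the dict returned by read_mcf(); trimmed for brevity
mdata = {
    'metadata': {'identifier': 'naturalearth-countries', 'language': 'en'},
    'identification': {'title': 'Admin 0 - Countries', 'language': 'en'},
}

updated = update_mcf(mdata, {
    'identification': {'title': 'Countries of the world'},
})
print(updated['identification']['title'])

# With the full MCF loaded in the notebook, the updated dict can then be
# rendered as shown earlier:
# xml_string = render_template(updated, schema='iso19139')
```

Merging into a copy leaves the original `mdata` untouched, which makes it easy to compare the metadata before and after the change.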

[<- Visualization](07-visualization.ipynb) | [Publishing ->](09-publishing.ipynb)