# MT Metadata Example 01

## Basics

Metadata is important for describing data, but it is also a pain to keep track of and standardize.  `mt_metadata` was written to make it easier to standardize metadata, specifically MT metadata, though not exclusively.  There are standard ways of creating schemas for metadata, for example in XML or JSON.  We decided to be agnostic to those formats and internally use Python's built-in dictionary object.  Tools are provided to read/write XML and JSON formats if desired.  

All input values are validated against the standards to make sure the data type is correct. More on that below.

Here, basic usage of the `mt_metadata` module is demonstrated.  

## Base Class

`mt_metadata.base.Base` is the base class upon which all metadata objects are built.  `Base` provides convenience methods to input and output metadata in different formats: XML, JSON, Python dictionary, and Pandas Series.  It also provides functions to help the user understand what is inside.

The underlying attribute of `Base` that controls how inputs are validated and which keywords are included is `_attr_dict`.  This dictionary can be input manually, but is usually loaded automatically when the object is created.  `Base._attr_dict = {}` to begin with, so to build a useful version of `Base` an `_attr_dict` needs to be input, commonly on initialization.  

The metadata objects that inherit from `Base` have the `_attr_dict` input on initialization from JSON files that provide the keywords and the attributes of those keywords that describe how to validate them.  For example:

```json
{
    "name": {
        "type": "string",
        "required": true,
        "style": "free form",
        "units": null,
        "description": "Person's name, should be full first and last name.",
        "options": [],
        "alias": [],
        "example": "person name",
        "default": null
        }
}
```

Here the keyword is `name`; it should be a `free form` `string` that describes the name of a person.  The default value is `null`. Any keyword added needs to have this form, with the following attributes:

| Attribute | Description | Options |
|-----------|-------------|---------|
| "type" | Data type this keyword should be, must be a native Python type | float, str, int, bool, list |
| "required" | Is this keyword required by the metadata standard | True or False |
| "style" | If the "type" is a string, what type of string is it, how should it be formatted | "name", "url", "email", "number", "date", "free form", "time", "date time", "name list", "number list", "controlled vocabulary", "alpha numeric" |
| "units" | What units the keyword should be in | SI units |
| "description" | Full description of what this keyword describes | |
| "options" | If the "style" is "controlled vocabulary", provide a list of options the keyword can take | list of options |
| "alias" | Is this keyword known by other names (not currently implemented) | |
| "example" | An example of what the keyword should look like | |
| "default" | Default value | depends on "type" |

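To make the required form concrete, here is a minimal sketch of a check that a keyword's standards entry contains every attribute listed in the table above. `check_standard_entry` and `REQUIRED_ATTRIBUTES` are hypothetical names used only for illustration; they are not part of `mt_metadata`, and the package's own checks may differ.

```python
# Hypothetical helper, not part of mt_metadata: checks that one keyword's
# standards dictionary has every attribute listed in the table above.
REQUIRED_ATTRIBUTES = {
    "type", "required", "style", "units", "description",
    "options", "alias", "example", "default",
}


def check_standard_entry(name: str, entry: dict) -> list:
    """Return a list of problems found in one keyword's standards entry."""
    problems = []
    missing = REQUIRED_ATTRIBUTES - set(entry)
    if missing:
        problems.append(f"{name}: missing {sorted(missing)}")
    # A controlled-vocabulary keyword should list the options it accepts.
    if entry.get("style") == "controlled vocabulary" and not entry.get("options"):
        problems.append(f"{name}: controlled vocabulary without options")
    return problems


name_entry = {
    "type": "string", "required": True, "style": "free form", "units": None,
    "description": "Person's name, should be full first and last name.",
    "options": [], "alias": [], "example": "person name", "default": None,
}
print(check_standard_entry("name", name_entry))  # []
print(check_standard_entry("temp", {"style": "controlled vocabulary"}))
```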
### Under the hood

To make it easier for the user and to help standardize metadata, the standards for each metadata element and for more complex metadata objects are stored as JSON files within the package.  When a class is initialized, it opens the appropriate JSON file and loads it to populate the `_attr_dict`, which is then used to initialize the metadata object.  For example, if you want a metadata object for location you would do `from mt_metadata.timeseries import Location`.  `Location` opens the `location.json` file stored in `mt_metadata.timeseries.standards` to populate the `_attr_dict`, which in turn is used to initialize a `Location` object.  In this way the standards are respected while the metadata objects stay user friendly, because all metadata attributes can be accessed in a Pythonic way, like

```python
from mt_metadata.timeseries import Location

l = Location()
l.latitude = 50
```

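The loading step can be sketched with the standard `json` module. This is a simplified illustration, not the package's code: `load_attr_dict`, `initial_values`, and `TYPE_MAP` are hypothetical, and the mapping from type names such as `"string"` or `"float"` to Python types is an assumption. The initial values follow the rule described under Validation below: required keywords start at their default, others start empty (here simply `None`).

```python
import json

# Assumed mapping from "type" strings in the standards JSON to Python types;
# the real mt_metadata mapping may differ.
TYPE_MAP = {"string": str, "str": str, "float": float, "integer": int,
            "int": int, "bool": bool, "boolean": bool, "list": list}


def load_attr_dict(json_text: str) -> dict:
    """Parse a standards JSON string into an _attr_dict-like dictionary."""
    attr_dict = json.loads(json_text)
    for standard in attr_dict.values():
        # Replace the type name with the Python type used for validation.
        standard["type"] = TYPE_MAP.get(standard["type"], standard["type"])
    return attr_dict


def initial_values(attr_dict: dict) -> dict:
    """Required keywords start at their default; others start at None."""
    return {
        key: (std["default"] if std["required"] else None)
        for key, std in attr_dict.items()
    }


standards_json = """
{
    "latitude": {"type": "float", "required": true, "style": "number",
                 "units": "degrees", "description": "latitude",
                 "options": [], "alias": ["lat"], "example": 23.134,
                 "default": 0.0},
    "latitude_uncertainty": {"type": "float", "required": false,
                 "style": "number", "units": "degrees",
                 "description": "uncertainty in latitude",
                 "options": [], "alias": [], "example": 0.01,
                 "default": null}
}
"""
attr_dict = load_attr_dict(standards_json)
print(attr_dict["latitude"]["type"])  # <class 'float'>
print(initial_values(attr_dict))  # {'latitude': 0.0, 'latitude_uncertainty': None}
```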
## Methods of Base

`Base` has the following methods.  It also overloads built-in methods: `__eq__` and `__ne__` for comparing two similar metadata objects, as well as `__len__`, `__str__`, and `__repr__` (see below).

| Method | Purpose |
|--------|---------|
| `add_base_attribute` | add a base attribute with a dictionary as above |
| `attribute_information` | print attribute information for a given attribute, or for all attributes |
| `from_dict` | fill keyword values from a dictionary |
| `from_json` | fill keyword values from a JSON string or file |
| `from_series` | fill keyword values from a `pandas.Series` |
| `from_xml` | fill keyword values from an XML string or file |
| `get_attr_from_name` | get an attribute from a complex name separated by a `.` like `location.latitude` |
| `get_attribute_list` | get a list of attributes in the object |
| `set_attr_from_name` | set an attribute from a complex name separated by a `.` like `location.longitude` |
| `to_dict` | export the keywords and values as a dictionary |
| `to_json` | export the keywords and values as a JSON string |
| `to_series` | export the keywords and values as a `pandas.Series` object |
| `to_xml` | export the keywords and values as an XML element |
| `update` | update the values from a similar metadata object |

## Example

A simple demonstration of `Base`, showing how to add attributes and find out what is in the metadata and its standards.

```python
from mt_metadata.base import Base

b = Base()
```

#### Add attributes

You can add attributes to an existing metadata object.  All you need is a standards dictionary that describes the new attribute.

Here we will add an extra attribute for temperature.  It will only be allowed two options, 'ambient' or 'air'.  It will be a `string` but is not required.  

```python
extra = {
    'type': str,
    'style': 'controlled vocabulary',
    'required': False,
    'units': 'celsius',
    'description': 'local temperature',
    'alias': ['temp'],
    'options': ['ambient', 'air'],
    'example': 'ambient',
    'default': None
}
```

```python
b.add_base_attribute("temperature", "ambient", extra)
```

#### The `__repr__`

The `__repr__` of the base class is the JSON representation of the object. 

```python
b
```

```
{
    "base": {
        "temperature": "ambient"
    }
}
```

#### The `__str__`

The `__str__` of the class is a printed list of `name = value` pairs under the object name.

```python
print(b)
```

```
base:
	temperature = ambient
```

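Both representations can be mimicked with a few lines of plain Python. The sketch below assumes the object's values are available as a flat dictionary; `metadata_repr` and `metadata_str` are hypothetical helpers that only reproduce the layout shown above, not `Base`'s actual implementation.

```python
import json


def metadata_repr(obj_name: str, values: dict) -> str:
    """JSON text with the values nested under the object name, 4-space indent."""
    return json.dumps({obj_name: values}, indent=4)


def metadata_str(obj_name: str, values: dict) -> str:
    """Object name, then one tab-indented 'key = value' line per attribute."""
    lines = [f"{obj_name}:"]
    lines += [f"\t{key} = {value}" for key, value in sorted(values.items())]
    return "\n".join(lines)


values = {"temperature": "ambient"}
print(metadata_repr("base", values))
print(metadata_str("base", values))
```

Sorting the keys in `metadata_str` matches the alphabetical order seen in the printed output later on (`height` before `temperature`).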

#### Attribute Information and List

There are also convenience methods to list the attributes and to get attribute information.

```python
b.get_attribute_list()
```

```
['temperature']
```

```python
b.attribute_information()
```

```
temperature:
	alias: ['temp']
	default: None
	description: local temperature
	example: ambient
	options: ['ambient', 'air']
	required: False
	style: controlled vocabulary
	type: <class 'str'>
	units: celsius
```


```python
b.attribute_information("temperature")
```

```
temperature:
	alias: ['temp']
	default: None
	description: local temperature
	example: ambient
	options: ['ambient', 'air']
	required: False
	style: controlled vocabulary
	type: <class 'str'>
	units: celsius
```


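The layout of this output can be reproduced from a plain standards dictionary. The function below is a hypothetical stand-in, not the `Base` method: it prints the attributes in the order they appear in the standards and, within each, lists the standards attributes alphabetically, matching the output above.

```python
def format_attribute_information(standards: dict, name=None) -> str:
    """Format the standards of one attribute, or of all attributes, as text."""
    names = [name] if name is not None else list(standards)
    lines = []
    for key in names:
        lines.append(f"{key}:")
        # Standards attributes are listed alphabetically and tab-indented.
        for attr, value in sorted(standards[key].items()):
            lines.append(f"\t{attr}: {value}")
    return "\n".join(lines)


extra = {
    'type': str, 'style': 'controlled vocabulary', 'required': False,
    'units': 'celsius', 'description': 'local temperature',
    'alias': ['temp'], 'options': ['ambient', 'air'],
    'example': 'ambient', 'default': None,
}
print(format_attribute_information({"temperature": extra}, "temperature"))
```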
## Validation

Validation of the attributes is the most important reason for having a separate module for the metadata.  The validation process:

1. First ensures the value has the `type` prescribed by the metadata standard.  In the example above, the prescribed data type for `temperature` is a `string`, so when the value is set the validators make sure it is a string.  If it is not, it is converted to a string if possible; if the conversion fails, a `ValueError` is raised. 
2. If the `style` is `controlled vocabulary`, the value is checked against `options`.  If `other` is in `options`, values that are not in the list are also accepted, so `other` acts as a catch-all.  
3. If a value of `None` is given, the proper None value is set.  If the `style` is a date, the None value is set to 1980-01-01T00:00:00; if `list` is in the `style`, the value is set to `[]`.  

When the standards are first read in, if `required` is True the value is set to the given default value.  If `required` is False the value is set to the appropriate None value.

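The three steps can be sketched as a single function. This is a simplified, hypothetical `validate_value`, not the validators inside `mt_metadata`: it handles only the cases listed above, assumes `"type"` holds a Python type (as in the `extra` dictionaries of this example), and raises a plain `ValueError` for an invalid option where the package raises `MTSchemaError` (see the `"fail"` example below).

```python
# Hypothetical, simplified validator following the three steps above;
# mt_metadata's real validators handle more styles and edge cases.
NULL_DATE = "1980-01-01T00:00:00"


def validate_value(value, standard: dict):
    style = standard["style"]
    # Step 3: None becomes the proper None value for the style.
    if value is None:
        if "date" in style:
            return NULL_DATE
        if "list" in style:
            return []
        return None
    # Step 1: coerce to the prescribed type; raises ValueError if impossible.
    expected = standard["type"]
    if not isinstance(value, expected):
        value = expected(value)
    # Step 2: controlled vocabulary must match an option unless "other" is allowed.
    if style == "controlled vocabulary":
        options = standard["options"]
        if value not in options and "other" not in options:
            raise ValueError(f"{value} not found in options list {options}")
    return value


height_std = {"type": float, "style": "number", "options": []}
temp_std = {"type": str, "style": "controlled vocabulary",
            "options": ["ambient", "air"]}
print(validate_value("11.7", height_std))  # 11.7
print(validate_value(None, {"type": str, "style": "date time", "options": []}))
```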
```python
extra = {
    'type': float,
    'style': 'number',
    'required': True,
    'units': None,
    'description': 'height',
    'alias': [],
    'options': [],
    'example': 10.0,
    'default': 0.0
}
b.add_base_attribute("height", 0, extra)
```

```python
b.height = "11.7"
print(b)
```

```
base:
	height = 11.7
	temperature = ambient
```


```python
b.temperature = "fail"
```

```
2023-11-04T09:26:52.340453+1100 | ERROR | mt_metadata.base.metadata | __setattr__ | fail not found in options list ['ambient', 'air']

MTSchemaError: fail not found in options list ['ambient', 'air']
```

## A more complicated example

We will look at a more complicated metadata object, `mt_metadata.timeseries.Location`.

```python
from mt_metadata.timeseries import Location
```

```python
here = Location()
here.get_attribute_list()
```

```
['datum',
 'declination.comments',
 'declination.epoch',
 'declination.model',
 'declination.value',
 'elevation',
 'elevation_uncertainty',
 'latitude',
 'latitude_uncertainty',
 'longitude',
 'longitude_uncertainty',
 'x',
 'x2',
 'x_uncertainty',
 'y',
 'y2',
 'y_uncertainty',
 'z',
 'z2',
 'z_uncertainty']
```

```python
here.attribute_information()
```

```
latitude:
	alias: ['lat']
	default: 0.0
	description: latitude of location in datum specified at survey level
	example: 23.134
	options: []
	required: True
	style: number
	type: float
	units: degrees
longitude:
	alias: ['lon', 'long']
	default: 0.0
	description: longitude of location in datum specified at survey level
	example: 14.23
	options: []
	required: True
	style: number
	type: float
	units: degrees
elevation:
	alias: ['elev']
	default: 0.0
	description: elevation of location in datum specified at survey level
	example: 123.4
	options: []
	required: True
	style: number
	type: float
	units: meters
latitude_uncertainty:
	alias: []
	default: None
	description: uncertainty in latitude estimation in degrees
	example: 0.01
	options: []
	required: False
	style: number
	type: float
	units: degrees
longitude_uncertainty:
	alias: []
	default: None
	description: uncertainty in longitude estimation in degrees
	example: 0.01
	options: []
	required: False
	style: number
	type: float
	units: degrees
...
```

#### Getting/Setting an attribute

These are convenience methods for getting/setting nested attributes.  For instance, the declination value can be set or retrieved with a single call using its full dotted name.  This is helpful when filling metadata from a file.  

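Resolving a dotted name amounts to following `getattr` one component at a time. The sketch below shows the idea on plain objects; `get_by_dotted_name` and `set_by_dotted_name` are hypothetical names, `SimpleNamespace` stands in for a `Location` object, and unlike the `Base` methods these helpers perform no validation, so the value is stored exactly as given.

```python
from functools import reduce
from types import SimpleNamespace


def get_by_dotted_name(obj, name: str):
    """Follow a dotted name like 'declination.value' through nested objects."""
    return reduce(getattr, name.split("."), obj)


def set_by_dotted_name(obj, name: str, value) -> None:
    """Resolve all but the last component, then set the final attribute."""
    *parents, last = name.split(".")
    target = reduce(getattr, parents, obj)
    setattr(target, last, value)


# Stand-in for a Location object with a nested declination object.
location = SimpleNamespace(
    latitude=0.0,
    declination=SimpleNamespace(model="WMM", value=0.0),
)
set_by_dotted_name(location, "declination.value", 10.0)
print(get_by_dotted_name(location, "declination.value"))  # 10.0
```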
```python
here.set_attr_from_name("declination.value", 10)
print(here)
```

```
location:
	declination.model = WMM
	declination.value = 10.0
	elevation = 0.0
	latitude = 0.0
	longitude = 0.0
```


```python
here.get_attr_from_name("declination.value")
```

```
10.0
```

```python
# This is the same as
here.declination.value
```

```
10.0
```

## Dictionary

The most basic form the metadata can take is a Python dictionary of key-value pairs. 

```python
here.to_dict()
```

```
{'location': OrderedDict([('declination.model', 'WMM'),
              ('declination.value', 10.0),
              ('elevation', 0.0),
              ('latitude', 0.0),
              ('longitude', 0.0)])}
```

```python
here.from_dict(
    {
        "location": {
            "declination.value": -11.0,
            "elevation": 759.0,
            "latitude": -34.0,
            "longitude": -104.0
        }
    }
)
print(here)
```

```
location:
	declination.model = WMM
	declination.value = -11.0
	elevation = 759.0
	latitude = -34.0
	longitude = -104.0
```


## JSON

JSON is a standard human- and machine-readable format that is well supported in Python.  There are methods to read/write JSON.    

```python
# Compact form
print(here.to_json())
```

```
{
    "location": {
        "declination.model": "WMM",
        "declination.value": -11.0,
        "elevation": 759.0,
        "latitude": -34.0,
        "longitude": -104.0
    }
}
```


```python
here.from_json('{"location": {"declination.model": "WMM", "declination.value": 10.0, "elevation": 99.0, "latitude": 40.0, "longitude": -120.0}}')
print(here)
```

```
location:
	declination.model = WMM
	declination.value = 10.0
	elevation = 99.0
	latitude = 40.0
	longitude = -120.0
```


```python
# Nested form
print(here.to_json(nested=True))
```

```
{
    "location": {
        "declination": {
            "model": "WMM",
            "value": 10.0
        },
        "elevation": 99.0,
        "latitude": 40.0,
        "longitude": -120.0
    }
}
```

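The compact and nested forms differ only in how the dotted keys are grouped. A minimal sketch of converting between the two, using hypothetical `flatten` and `unflatten` helpers (not functions provided by `mt_metadata`):

```python
def flatten(nested: dict, prefix: str = "") -> dict:
    """Turn {'declination': {'value': 10.0}} into {'declination.value': 10.0}."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def unflatten(flat: dict) -> dict:
    """Inverse of flatten: split each dotted key into nested dictionaries."""
    nested = {}
    for key, value in flat.items():
        *parents, last = key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[last] = value
    return nested


flat = {"declination.model": "WMM", "declination.value": 10.0,
        "elevation": 99.0, "latitude": 40.0, "longitude": -120.0}
nested = unflatten(flat)
print(nested["declination"])  # {'model': 'WMM', 'value': 10.0}
```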

```python
here.from_json('{"location": {"declination": {"model": "WMM", "value": -12.0}, "elevation": 199.0, "latitude": 20.0, "longitude": -110.0}}')
print(here)
```

```
location:
	declination.model = WMM
	declination.value = -12.0
	elevation = 199.0
	latitude = 20.0
	longitude = -110.0
```


## XML

XML is also a common format for metadata, though it is less human readable than JSON.  

```python
print(here.to_xml(string=True))
```

```
<?xml version="1.0" encoding="UTF-8"?>
<location>
    <declination>
        <model>WMM</model>
        <value units="degrees">-12.0</value>
    </declination>
    <elevation units="meters">199.0</elevation>
    <latitude units="degrees">20.0</latitude>
    <longitude units="degrees">-110.0</longitude>
</location>
```

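Each leaf XML element maps onto a dotted key, just as in the nested JSON form. The sketch below uses the standard library `xml.etree.ElementTree`; `xml_to_flat_dict` is a hypothetical helper that ignores the `units` attributes and keeps element text as strings, which is why validation against the standards (for example converting `"-10"` to `-10.0`) still matters.

```python
import xml.etree.ElementTree as ET


def xml_to_flat_dict(element: ET.Element, prefix: str = "") -> dict:
    """Map leaf elements to dotted keys, keeping their text as strings."""
    flat = {}
    for child in element:
        key = f"{prefix}.{child.tag}" if prefix else child.tag
        if len(child):  # has sub-elements, so recurse into them
            flat.update(xml_to_flat_dict(child, key))
        else:
            flat[key] = child.text
    return flat


xml_text = """<location>
    <declination>
        <model>WMM</model>
        <value units="degrees">-12.0</value>
    </declination>
    <elevation units="meters">199.0</elevation>
    <latitude units="degrees">20.0</latitude>
    <longitude units="degrees">-110.0</longitude>
</location>"""
root = ET.fromstring(xml_text)
print(xml_to_flat_dict(root))
```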


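```python
# xml.etree.cElementTree was removed in Python 3.9; use ElementTree instead
from xml.etree import ElementTree as et

# Build a minimal <location> element that only sets the latitude
location = et.Element('location')
lat = et.SubElement(location, 'latitude')
lat.text = "-10"
here.from_xml(location)
print(here)
```

```
location:
	declination.model = WMM
	declination.value = -12.0
	elevation = 199.0
	latitude = -10.0
	longitude = -110.0
```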


## Pandas Series

Pandas is a widely used library for working with columnar (tabular) data.  Here a `Series` acts like a single row in a database table. 

```python
pd_series = here.to_series()
print(pd_series)
```

```
declination.model      WMM
declination.value    -12.0
elevation            199.0
latitude             -10.0
longitude           -110.0
dtype: object
```


```python
from pandas import Series

location_series = Series(
    {
        'declination.model': 'WMM',
        'declination.value': -14.0,
        'elevation': 399.0,
        'latitude': -14.0,
        'longitude': -112.0
    }
)

here.from_series(location_series)
print(here)
```

```
location:
	declination.model = WMM
	declination.value = -14.0
	elevation = 399.0
	latitude = -14.0
	longitude = -112.0
```
