# DataModelDict Class Demonstration

The DataModelDict class is used for handling data models that have equivalent representations in XML, JSON, and Python.  Constructing data models in this way is convenient as it supports compatibility across different software tools, such as different types of databases.

The DataModelDict class:

- is a child of OrderedDict,
- has methods for converting to/from XML and JSON, 
- has methods for searching through elements, and
- has methods that help with constructing and interacting with compliant data models.

## Designing Compatible Data Models

Some considerations need to be taken into account for designing data models that allow for exact reversible transformations between the three formats:

- Valid, full XML requires that there is exactly one root element.  In other words, the top-level DataModelDict of a data model can have only one key.
- Do not use lists of lists for representing data.  The XML conversions are only reversible for lists of values or lists of
  dictionaries.  Future updates may add support for nested lists.
- Avoid using XML attributes if possible.  While the XML conversions do reversibly handle attributes, it complicates the Python
  and JSON representations.
- Embedded XML content, i.e. "text with &lt;embed&gt;embedded&lt;/embed&gt; content", might not be reversible:

    - If this is in a Python/JSON value, converting to XML gives "text with &amp;lt;embed&amp;gt;embedded&amp;lt;/embed&amp;gt; content". This is reversible.
    - If this is an XML text field, parsing to Python pulls the embedded elements out of the text, which is not reversible!

- XML subelements of the same name within an element should be given consecutively.  When parsed, all values of subelements of the same name are collected together in a list.  This will alter the original order of subelements if matching names were not originally consecutive. 
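
The reordering described in the last point can be illustrated with the standard library alone. This is a minimal sketch, not DataModelDict's actual parser: `group_children` is a hypothetical helper that collects same-named subelements into one list, which is the behavior the bullet describes.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict


def group_children(element):
    """Collect the text of same-named subelements into one list per name."""
    grouped = OrderedDict()
    for child in element:
        grouped.setdefault(child.tag, []).append(child.text)
    return grouped


# The two <a> elements are NOT consecutive in the source XML
xml_text = "<root><a>1</a><b>2</b><a>3</a></root>"
grouped = group_children(ET.fromstring(xml_text))

# The original order a, b, a is lost: both <a> values now come before <b>
print(list(grouped.items()))
```

Writing this grouped structure back out would place both `a` elements before `b`, so the round trip changes the document.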

#### Library imports

In [1]:
# Standard Python libraries
from pathlib import Path
import random

# DataModelDict class
# https://github.com/usnistgov/DataModelDict
from DataModelDict import DataModelDict as DM

# Print DataModelDict version
import DataModelDict
print('DataModelDict version =', DataModelDict.__version__)

DataModelDict version = 0.9.9


## 1. Class Basics 

The DataModelDict is a child class of OrderedDict.  As such, it has all the functionality of OrderedDict and more.

Here, we construct a multi-level demonstration data model using lists and DataModelDicts.

In [2]:
# Create an empty DataModel
model = DM()

# Build model element by element
model['my-data-model'] = DM()

model['my-data-model']['name'] = 'Demo'
model['my-data-model']['author'] = 'Me'

model['my-data-model']['process'] = DM()
model['my-data-model']['process']['Instrument'] = DM()
model['my-data-model']['process']['Instrument']['Name'] = 'Shiny Thing'
model['my-data-model']['process']['Instrument']['Model'] = 'Newest Most\nExpensive'
model['my-data-model']['process']['method'] = 'By the book'

# Assign multiple elements at once
model['my-data-model']['measurement'] = []
for temperature in range(0, 2000, 200):
    measurement = DM([('temperature', DM([('value', temperature),                    
                                          ('unit', 'K')])),
                      ('length',      DM([('value', temperature*random.random()/50), 
                                          ('unit', 'm')]))])
    model['my-data-model']['measurement'].append(measurement) 

## 2. Output Conversion

DataModelDict has methods json() and xml() that return the data model as either of these formats. 

### Conversion from Python to JSON

The Python-JSON conversions use the standard Python JSON library.  In converting from Python to JSON, all elements of the DataModelDict must be an instance of a supported data type (with unicode and long being specific to Python 2).


| Python           | JSON      |
| ---------------- | --------- |
| dict             | object    |
| list, tuple      | array     |
| str, unicode     | string    |
| int, long, float | number    |
| True             | true      |
| False            | false     |
| None             | null      |
| np.nan           | NaN       |
| np.inf           | Infinity  |
| -np.inf          | -Infinity |


As DataModelDict is a child of OrderedDict, it registers as being an instance of dict. Any other objects would first need to be converted to one of these types, e.g. a numpy array would need to be converted to a list. 
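
The type mapping in the table is that of Python's standard `json` module, so it can be checked without DataModelDict. In this sketch, `MyModel` is a hypothetical stand-in for any OrderedDict subclass, and a `set` plays the role of an unsupported type (such as a numpy array) that must be converted to a list first.

```python
import json
from collections import OrderedDict


class MyModel(OrderedDict):
    """Stand-in for a dict subclass such as DataModelDict."""


record = MyModel()
record['values'] = (1, 2.5)          # tuple -> array
record['flag'] = True                # True -> true
record['missing'] = None             # None -> null
record['ratio'] = float('nan')       # nan -> NaN
record['limit'] = float('-inf')      # -inf -> -Infinity

text = json.dumps(record)
print(text)

# Unsupported types raise TypeError until converted to a supported type
unsupported = {3, 1, 2}
try:
    json.dumps(unsupported)
    converted_without_help = True
except TypeError:
    converted_without_help = False
as_list_text = json.dumps(sorted(unsupported))
print(converted_without_help, as_list_text)
```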

In [3]:
# Save DataModelDict as json file
jsonfile = Path('model.json')
with open(jsonfile, 'w') as f:
    model.json(fp=f)

# Print the DataModelDict as a json string.
print(model.json(indent=2))

{
  "my-data-model": {
    "name": "Demo",
    "author": "Me",
    "process": {
      "Instrument": {
        "Name": "Shiny Thing",
        "Model": "Newest Most\nExpensive"
      },
      "method": "By the book"
    },
    "measurement": [
      {
        "temperature": {
          "value": 0,
          "unit": "K"
        },
        "length": {
          "value": 0.0,
          "unit": "m"
        }
      },
      {
        "temperature": {
          "value": 200,
          "unit": "K"
        },
        "length": {
          "value": 0.6612713953968172,
          "unit": "m"
        }
      },
      {
        "temperature": {
          "value": 400,
          "unit": "K"
        },
        "length": {
          "value": 7.083537152235489,
          "unit": "m"
        }
      },
      {
        "temperature": {
          "value": 600,
          "unit": "K"
        },
        "length": {
          "value": 11.305520882584501,
          "unit": "m"
        }
      },
      {
        

### Conversion from Python to XML

The Python-XML conversions use the xmltodict Python package. The XML content is constructed based on the Python data types:

| Python           | XML              |
| ---------------- | ---------------- |
| dict             | subelement       |
| list, tuple      | repeated element |
| str, unicode     | text             |
| int, long, float | text (from repr) |
| True             | text = True      |
| False            | text = False     |
| None             | empty text field |
| np.nan           | text = NaN       |
| np.inf           | text = Infinity  |
| -np.inf          | text = -Infinity |

Some characters in the XML text fields will also be converted to avoid conflicts.

- Characters reserved in XML, such as &lt;, &gt; and &amp;, are converted to their entity equivalents.
- \n, \t, \r are converted to \\\n, \\\t, and \\\r 

Any dictionary keys starting with '@' will be converted into XML attributes, and the dictionary key '#text' is interpreted as the text value of the element. 
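
The two character conversions listed above can be sketched with the standard library. This is an illustration under assumptions, not the package's code: `to_xml_text` and `from_xml_text` are hypothetical names, and the order in which DataModelDict applies the two steps is not stated in this document.

```python
from xml.sax.saxutils import escape, unescape

# Control characters and the two-character sequences that replace them
CONTROL_CHARS = [('\n', '\\n'), ('\t', '\\t'), ('\r', '\\r')]


def to_xml_text(value):
    """Escape &, < and >, then replace control characters with literal backslash forms."""
    text = escape(value)
    for char, replacement in CONTROL_CHARS:
        text = text.replace(char, replacement)
    return text


def from_xml_text(text):
    """Reverse to_xml_text (assumes the original had no literal backslash sequences)."""
    for char, replacement in CONTROL_CHARS:
        text = text.replace(replacement, char)
    return unescape(text)


original = 'Newest Most\nExpensive & <shiny>'
converted = to_xml_text(original)
print(converted)
print(from_xml_text(converted) == original)
```

Note that this simple sketch would not round-trip a string that already contains a literal backslash followed by `n`, `t`, or `r`.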

In [4]:
# Save DataModelDict as xml file
xmlfile = Path('model.xml')
with open(xmlfile, 'w') as f:
    model.xml(fp=f)

# Print the DataModelDict as an xml string. 
print(model.xml(indent=4))

<?xml version="1.0" encoding="utf-8"?>
<my-data-model>
    <name>Demo</name>
    <author>Me</author>
    <process>
        <Instrument>
            <Name>Shiny Thing</Name>
            <Model>Newest Most\nExpensive</Model>
        </Instrument>
        <method>By the book</method>
    </process>
    <measurement>
        <temperature>
            <value>0</value>
            <unit>K</unit>
        </temperature>
        <length>
            <value>0.0</value>
            <unit>m</unit>
        </length>
    </measurement>
    <measurement>
        <temperature>
            <value>200</value>
            <unit>K</unit>
        </temperature>
        <length>
            <value>0.6612713953968172</value>
            <unit>m</unit>
        </length>
    </measurement>
    <measurement>
        <temperature>
            <value>400</value>
            <unit>K</unit>
        </temperature>
        <length>
            <value>7.083537152235489</value>
            <unit>m</unit>
        </leng

## 3. Loading Data Models

DataModelDict has a load() method that reads in xml or json content. The class initializer also calls load() if the argument is a string or file-like object. Both work if the supplied argument is:

- a string file path to an xml or json file.

- a string containing xml or json content.

- an open file-like object containing xml or json content.


### Conversion from JSON to Python

The Python-JSON conversions use the standard Python JSON library.  In converting from JSON to Python, the conversion of types is straightforward.

| JSON          | Python        |
| ------------- | ------------- |
| object        | DataModelDict |
| array         | list          |
| string        | str           |
| number (int)  | int           |
| number (real) | float         |
| true          | True          |
| false         | False         |
| null          | None          |
| NaN           | np.nan        |
| Infinity      | np.inf        |
| -Infinity     | -np.inf       |
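
The same mapping can be observed with the standard `json` module directly. In this sketch, `OrderedDict` stands in for DataModelDict as the container for JSON objects (the `object_pairs_hook` argument is how the standard library lets a caller choose that container), and the non-finite values come back as plain Python floats.

```python
import json
import math
from collections import OrderedDict

text = ('{"name": "Demo", "count": 3, "length": 0.5, "ok": true, '
        '"note": null, "bad": NaN, "items": [1, 2]}')

# Build every JSON object as an OrderedDict instead of a plain dict
data = json.loads(text, object_pairs_hook=OrderedDict)

for key, value in data.items():
    print(key, '->', type(value).__name__)
```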

### Conversion from XML to Python

The Python-XML conversions use the xmltodict Python package.  The text fields will be interpreted based on the following sequential tests:

| XML text                                 | Python  |
| ---------------------------------------- | ------- |
| text == 'True'                           | True    |
| text == 'False'                          | False   |
| text == ''                               | None    |
| text == 'NaN'                            | np.nan  |
| text == 'Infinity'                       | np.inf  |
| text == '-Infinity'                      | -np.inf |
| try int(text) and text == str(int(text)) | int     |
| try float(text)                          | float   |
| otherwise                                | str     |

The reverse conversions are done for the special characters mentioned in the Conversion from Python to XML section above.

Any 'attr' attribute fields are converted to elements named '@attr' and corresponding '#text' elements are created if needed.
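
The sequential tests in the table above translate into a short function. This is a sketch that follows the table row by row, not the package's source; `interpret_text` is a hypothetical name, and `math.nan`/`math.inf` are used in place of the numpy constants.

```python
import math

SPECIAL_VALUES = {
    'True': True,
    'False': False,
    '': None,
    'NaN': math.nan,
    'Infinity': math.inf,
    '-Infinity': -math.inf,
}


def interpret_text(text):
    """Convert an XML text field to a Python value using the table's tests in order."""
    if text in SPECIAL_VALUES:
        return SPECIAL_VALUES[text]
    try:
        # Only accept ints whose string form is unchanged, so '042' is not an int
        if text == str(int(text)):
            return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text


for sample in ['True', '', '42', '042', '1e3', 'By the book']:
    print(repr(sample), '->', repr(interpret_text(sample)))
```

The `text == str(int(text))` check is what sends a value such as `'042'` on to the float test instead of silently dropping its leading zero as an int.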

In [5]:
# Load from xml file during initialization
model2 = DM(xmlfile)
    
# Test that models are equivalent
print(model.json() == model2.json() and model.xml() == model2.xml())

True


In [6]:
# Load from json file using load()
model2 = DM()
model2.load(jsonfile)
    
# Test that models are equivalent
print(model.json() == model2.json() and model.xml()  == model2.xml())

True


In [7]:
# Load from json string during initialization
json_string = model.json()
model2 = DM(json_string)
    
# Test that models are equivalent
print(model.json() == model2.json() and model.xml()  == model2.xml())

True


In [8]:
# Load from xml string using load()
xml_string = model.xml()
model2 = DM()       
model2.load(xml_string)

# Test that models are equivalent
print(model.json() == model2.json() and model.xml()  == model2.xml())

True


## 4. Finding and Accessing Elements

A number of methods have been added to DataModelDict to assist in finding, accessing, and modifying the various elements and subelements of a data model.

### 4.1 Index with path lists

Normally, accessing or setting a value in a data model built from tiered dictionaries and lists requires chaining one index operation per level.  This can be tedious and requires that the programmer hard-code the absolute path of any elements of interest.  To improve upon this, values contained in a DataModelDict can also be accessed using a _path list_, a single list of indices.  The terms in the list can be either dictionary keys or list indices.

In [9]:
# Use indexing to retrieve the instrument name in the standard way
print(model['my-data-model']['process']['Instrument']['Name'])

# Use path list indexing to retrieve the instrument name
path = ['my-data-model', 'process', 'Instrument', 'Name']
print(model[path])

Shiny Thing
Shiny Thing
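
For comparison, path-list access can be imitated on plain dictionaries and lists with a pair of small helpers. This sketch only illustrates the idea; `get_by_path` and `set_by_path` are hypothetical names and are not part of DataModelDict.

```python
from functools import reduce
from operator import getitem


def get_by_path(data, path):
    """Follow each key or list index in path, starting from data."""
    return reduce(getitem, path, data)


def set_by_path(data, path, value):
    """Assign value at the location given by path (all parents must already exist)."""
    get_by_path(data, path[:-1])[path[-1]] = value


plain = {'my-data-model': {'process': {'Instrument': {'Name': 'Shiny Thing'}},
                           'measurement': [{'value': 0}, {'value': 200}]}}

print(get_by_path(plain, ['my-data-model', 'process', 'Instrument', 'Name']))
print(get_by_path(plain, ['my-data-model', 'measurement', 1, 'value']))

set_by_path(plain, ['my-data-model', 'measurement', 1, 'value'], 250)
print(plain['my-data-model']['measurement'][1]['value'])
```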


### 4.2 Find value(s) with key

If you know the key for an element you are interested in but don't know where it is located in the data model, you can access the element's value using the find(), finds() and iterfinds() methods.  
- __find()__ will return a value if the search produces a unique result, and issue an error if no match or multiple matches are found.  
- __finds()__ returns a list of all values obtained by the search conditions.
- __iterfinds()__ returns an iterator of all values obtained by the search conditions. Use this over finds() if you are only iterating and do not need to store the list.

In [10]:
# There is only one Name field in the whole model
print(model.find('Name'))

Shiny Thing


In [11]:
# Iterate over all measurements
for measurement in model.iterfinds('measurement'):
    print(measurement['temperature']['value'], measurement['length']['value'])

0 0.0
200 0.6612713953968172
400 7.083537152235489
600 11.305520882584501
800 12.604727521607035
1000 13.094139413390137
1200 9.881144431029341
1400 12.049026119156492
1600 27.901342348876334
1800 27.455178677784506


In [12]:
# Build a list of measurements
measurements = model.finds('measurement')
print(len(measurements), "measurements found, with first measurement being:")
print(measurements[0].json(indent=2))

10 measurements found, with first measurement being:
{
  "temperature": {
    "value": 0,
    "unit": "K"
  },
  "length": {
    "value": 0.0,
    "unit": "m"
  }
}


All three find methods allow for additional search conditions using dictionary arguments yes and no.  

- __yes__ (*dict*) Key-value terms which the subelement must have to be considered a match.
- __no__ (*dict*) Key-value terms which the subelement must not have to be considered a match.  If any key-value pairs listed in no are found in the element, then it is rejected.  

In [13]:
# Define temperature value to check for
temp = DM([('value',1600), ('unit', 'K')])

# Find the measurement with the temperature and print the associated length value
print(model.find('measurement', yes={'temperature':temp})['length']['value'])

# Define temperature value to check for
temp = DM([('value', 800), ('unit', 'K')])

# Find all measurements that do not have that temperature
measurements = model.finds('measurement', no={'temperature':temp})
print(len(measurements), "measurements found not at the indicated temperature")

27.901342348876334
9 measurements found not at the indicated temperature
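
The search behavior described in this section can be approximated with a recursive generator over plain dictionaries and lists. This is a sketch under assumptions, not the package's implementation: the function names mirror the methods for readability, and the handling of non-dictionary candidates when `yes` is given is a guess.

```python
def _matches(candidate, yes, no):
    """Apply the yes/no key-value conditions to one candidate value."""
    if not isinstance(candidate, dict):
        return not yes  # assumption: a plain value cannot satisfy yes terms
    has_all_yes = all(k in candidate and candidate[k] == v for k, v in yes.items())
    has_any_no = any(k in candidate and candidate[k] == v for k, v in no.items())
    return has_all_yes and not has_any_no


def iterfinds(node, key, yes=None, no=None):
    """Yield every value stored under key anywhere inside node."""
    yes, no = yes or {}, no or {}
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                candidates = value if isinstance(value, list) else [value]
                yield from (c for c in candidates if _matches(c, yes, no))
            yield from iterfinds(value, key, yes, no)
    elif isinstance(node, list):
        for item in node:
            yield from iterfinds(item, key, yes, no)


def find(node, key, yes=None, no=None):
    """Return the single match, or raise if there are zero or several."""
    results = list(iterfinds(node, key, yes, no))
    if len(results) != 1:
        raise ValueError(f'expected one match for {key!r}, found {len(results)}')
    return results[0]


demo = {'my-data-model': {
    'process': {'Instrument': {'Name': 'Shiny Thing'}},
    'measurement': [{'temperature': {'value': t, 'unit': 'K'}} for t in (0, 200, 400)],
}}

print(find(demo, 'Name'))
skip = {'temperature': {'value': 200, 'unit': 'K'}}
remaining = list(iterfinds(demo, 'measurement', no=skip))
print(len(remaining), 'measurements found not at the indicated temperature')
```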


### 4.3 Find path(s) with key

Alternatively, if you want to learn the full path to any elements in unknown locations, you can use the path(), paths() and iterpaths() methods.  These behave similarly to find methods but return path lists instead of values.

- __path()__ will return a path if the search produces a unique result, and issue an error if no match or multiple matches are found.  
- __paths()__ returns a list of all paths obtained by the search conditions.
- __iterpaths()__ returns an iterator of all paths obtained by the search conditions. Use this over paths() if you are only iterating and do not need to store the list.

In [14]:
# There is only one Name field in the whole model
path = model.path('Name')
print('path:', path)
print('value:', model[path])

path: ['my-data-model', 'process', 'Instrument', 'Name']
value: Shiny Thing


The path methods also allow for yes and no dictionaries to be used as arguments.

In [15]:
for path in model.iterpaths('measurement', no={'temperature':temp}):
    print(path)    

['my-data-model', 'measurement', 0]
['my-data-model', 'measurement', 1]
['my-data-model', 'measurement', 2]
['my-data-model', 'measurement', 3]
['my-data-model', 'measurement', 5]
['my-data-model', 'measurement', 6]
['my-data-model', 'measurement', 7]
['my-data-model', 'measurement', 8]
['my-data-model', 'measurement', 9]


There are also some basic path manipulation tools in the DataModelDict package that convert paths between lists and delimited strings.  The existing path methods and possible new methods may build on these in the future.

- __joinpath()__ takes a path list and converts it into a string.
- __parsepath()__ takes a path string and parses it into a list.

Both of these take the same optional parameters that specify the delimiting terms used to join/parse the paths:
- __delimiter__ (*str*) The delimiter between subsequent element names. Default value is '.'.
- __openbracket__ (*str*) The opening indicator of list indices. Default value is '\['.
- __closebracket__ (*str*) The closing indicator of list indices. Default value is '\]'.


In [16]:
print(path)
print()

# Join path with the default values
pathstr1 = DataModelDict.joinpath(path)
print(pathstr1)

# Join path with slashes and {}
pathstr2 = DataModelDict.joinpath(path, delimiter='/', openbracket='{', closebracket='}')
print(pathstr2)
print()

# Show that parsepath reverts the string paths to lists when given correct parameters
print(DataModelDict.parsepath(pathstr1))
print(DataModelDict.parsepath(pathstr2, delimiter='/', openbracket='{', closebracket='}'))

['my-data-model', 'measurement', 9]

my-data-model.measurement[9]
my-data-model/measurement{9}

['my-data-model', 'measurement', 9]
['my-data-model', 'measurement', 9]
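
The conversion between path lists and path strings can be sketched in a few lines. The helpers below, `join_path` and `parse_path`, are hypothetical re-implementations written for illustration; they reproduce the output shown above but may differ from the package's functions in edge cases (for example, element names that contain the delimiter).

```python
import re


def join_path(path, delimiter='.', openbracket='[', closebracket=']'):
    """Convert a path list into a delimited string."""
    pathstr = ''
    for term in path:
        if isinstance(term, int):
            # List indices attach to the preceding name without a delimiter
            pathstr += f'{openbracket}{term}{closebracket}'
        elif pathstr:
            pathstr += delimiter + term
        else:
            pathstr = term
    return pathstr


def parse_path(pathstr, delimiter='.', openbracket='[', closebracket=']'):
    """Convert a delimited string back into a path list."""
    index_pattern = re.compile(
        re.escape(openbracket) + r'(-?\d+)' + re.escape(closebracket))
    path = []
    for term in pathstr.split(delimiter):
        first_index = index_pattern.search(term)
        name = term if first_index is None else term[:first_index.start()]
        if name:
            path.append(name)
        path.extend(int(i) for i in index_pattern.findall(term))
    return path


example = ['my-data-model', 'measurement', 9]
dotted = join_path(example)
slashed = join_path(example, delimiter='/', openbracket='{', closebracket='}')
print(dotted)
print(slashed)
print(parse_path(dotted))
print(parse_path(slashed, delimiter='/', openbracket='{', closebracket='}'))
```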


### 4.4 Iterate over all value paths

There is also an itervaluepaths() method that iterates over the full model and returns the paths to all elements that are values, i.e. not lists or dicts.  This provides a convenient tool for mapping the entire structure of a heavily-embedded model.
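
The idea behind such a traversal can be sketched for plain dictionaries and lists as a recursive generator. `iter_value_paths` is a hypothetical helper written for illustration and is not the package's implementation.

```python
def iter_value_paths(node, path=()):
    """Yield the path list of every element that is neither a dict nor a list."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_value_paths(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_value_paths(value, path + (index,))
    else:
        yield list(path)


small = {'root': {'name': 'Demo',
                  'measurement': [{'value': 0}, {'value': 200}]}}

value_paths = list(iter_value_paths(small))
for value_path in value_paths:
    print(value_path)
```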

In [17]:
# Construct a flat dict
flatdict = {}
for path in model.itervaluepaths():
    
    # Convert path to a string key for dict
    pathstr = DataModelDict.joinpath(path)
    flatdict[pathstr] = model[path]

flatdict

{'my-data-model.name': 'Demo',
 'my-data-model.author': 'Me',
 'my-data-model.process.Instrument.Name': 'Shiny Thing',
 'my-data-model.process.Instrument.Model': 'Newest Most\nExpensive',
 'my-data-model.process.method': 'By the book',
 'my-data-model.measurement[0].temperature.value': 0,
 'my-data-model.measurement[0].temperature.unit': 'K',
 'my-data-model.measurement[0].length.value': 0.0,
 'my-data-model.measurement[0].length.unit': 'm',
 'my-data-model.measurement[1].temperature.value': 200,
 'my-data-model.measurement[1].temperature.unit': 'K',
 'my-data-model.measurement[1].length.value': 0.6612713953968172,
 'my-data-model.measurement[1].length.unit': 'm',
 'my-data-model.measurement[2].temperature.value': 400,
 'my-data-model.measurement[2].temperature.unit': 'K',
 'my-data-model.measurement[2].length.value': 7.083537152235489,
 'my-data-model.measurement[2].length.unit': 'm',
 'my-data-model.measurement[3].temperature.value': 600,
 'my-data-model.measurement[3].temperature.un

## 5. Treatment of Unbounded Sequences

When converting from XML, there is some ambiguity associated with sequences.  The normal parsing method will convert sequences with one element to single values, and sequences with multiple elements to lists.  To help with this, DataModelDict has a couple of methods that allow for the handling of elements that may or may not be lists.

The append() method allows for a key-value pair to be added to the DataModelDict.  If the key doesn't already exist, then it is assigned like a regular dictionary.  If the key does exist, the current value is converted into a list if it isn't one and the new value is appended.

The aslist() method returns the value(s) associated with a dictionary key as a list, even if it isn't one.

In [18]:
# Check element value and aslist before key is assigned
print("model['my-data-model'].get('ordinal', None) ->", end='') 
print(model['my-data-model'].get('ordinal', None)) 
print("model['my-data-model'].aslist('ordinal') ->   ", end='') 
print(model['my-data-model'].aslist('ordinal')) 
print() 

# Append a value and check again
print("model['my-data-model'].append('ordinal', 'first')")
model['my-data-model'].append('ordinal', 'first')

print("model['my-data-model'].get('ordinal', None) ->", end='') 
print(model['my-data-model'].get('ordinal', None))
print("model['my-data-model'].aslist('ordinal') ->   ", end='') 
print(model['my-data-model'].aslist('ordinal'))
print() 

# Append a value and check again
print("model['my-data-model'].append('ordinal', 'second')")
model['my-data-model'].append('ordinal', 'second')

print("model['my-data-model'].get('ordinal', None) ->", end='') 
print(model['my-data-model'].get('ordinal', None)) 
print("model['my-data-model'].aslist('ordinal') ->   ", end='') 
print(model['my-data-model'].aslist('ordinal'))
print() 

# Append a value and check again
print("model['my-data-model'].append('ordinal', 'third')")
model['my-data-model'].append('ordinal', 'third')

print("model['my-data-model'].get('ordinal', None) ->", end='') 
print(model['my-data-model'].get('ordinal', None)) 
print("model['my-data-model'].aslist('ordinal') ->   ", end='') 
print(model['my-data-model'].aslist('ordinal')) 

model['my-data-model'].get('ordinal', None) ->None
model['my-data-model'].aslist('ordinal') ->   []

model['my-data-model'].append('ordinal', 'first')
model['my-data-model'].get('ordinal', None) ->first
model['my-data-model'].aslist('ordinal') ->   ['first']

model['my-data-model'].append('ordinal', 'second')
model['my-data-model'].get('ordinal', None) ->['first', 'second']
model['my-data-model'].aslist('ordinal') ->   ['first', 'second']

model['my-data-model'].append('ordinal', 'third')
model['my-data-model'].get('ordinal', None) ->['first', 'second', 'third']
model['my-data-model'].aslist('ordinal') ->   ['first', 'second', 'third']
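
The behavior shown above can be reproduced with a small dict subclass. `ListyDict` is a hypothetical class written only to make the described logic explicit; it is not how DataModelDict is implemented.

```python
class ListyDict(dict):
    """Dict whose values may be single items or lists of items."""

    def append(self, key, value):
        """Assign value if key is new; otherwise grow the existing value into a list."""
        if key not in self:
            self[key] = value
        else:
            if not isinstance(self[key], list):
                self[key] = [self[key]]
            self[key].append(value)

    def aslist(self, key):
        """Return the value(s) for key as a list, even if absent or a single value."""
        if key not in self:
            return []
        value = self[key]
        return value if isinstance(value, list) else [value]


d = ListyDict()
history = [(d.get('ordinal'), d.aslist('ordinal'))]
for word in ('first', 'second', 'third'):
    d.append('ordinal', word)
    # Copy the list so later appends do not change earlier snapshots
    history.append((d.get('ordinal'), list(d.aslist('ordinal'))))

for stored, as_list in history:
    print(stored, '|', as_list)
```

Code that consumes parsed XML can then always iterate over `aslist(key)` without checking whether one or several subelements were present.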


#### File removal to keep Notebook directory clean.

In [19]:
jsonfile.unlink()
xmlfile.unlink()