# `emdfile` package walkthrough

Here we demonstrate the user-facing functionality of `emdfile`.


`emdfile` is a Python package providing an interface between Python runtimes and EMD 1.0 (HDF5) files.  It does so by defining
1. write and read functions, and 
2. a set of classes which hold data as well as tree-like relationships between data elements, intended to simplify logical grouping and reading/writing of related sets of data.


The focus here is on `emdfile`, and thus on the Python half of the interface; however, some discussion of the EMD 1.0 file format is also included, beginning with a description of its basic elements in [Section 2](#emdtrees).  The complete EMD 1.0 file specification is [available here](https://emdatasets.com/format/).



## See also


- [emdfile_intro_example.ipynb](./emdfile_intro_example.ipynb) for a motivating example
- [test_custom_class.py](./test_custom_class.py) and [sample_custom_class_module/](./sample_custom_class_module) for an example of a Python module or package built on top of `emdfile`, using it to handle read and write functionality through class inheritance
- [the EMD file specification](https://emdatasets.com/format/)


## Authorship

Benjamin H. Savitzky

April, 2023

<hr style="border:2px solid gray">


## Contents

- 1. [Simple Python objects](#simplePython)
    - 1.1 [numpy arrays](#nparrays)
    - 1.2 [Python dictionaries](#dicts)
    - 1.3 [lists](#lists)

- 2. [EMD Trees](#emdtrees)
    - 2.1 [Trees](#trees)
    - 2.2 [Writing and reading EMD trees](#writingtrees)
        - 2.2.1. [Write/read a tree](#write_one_tree)
        - 2.2.2. [Read a subset of a tree](#read_a_tree_subset)
            - 2.2.2.1 [Read a single node](#read_a_single_node)
            - 2.2.2.2 [Read a branch](#read_a_branch)
            - 2.2.2.3 [Read a branch, excluding its source node](#read_a_branch_exclude_source_node)
        - 2.2.3. [Write a subset of a tree](#write_a_tree_subset)
            - 2.2.3.1 [Write a single node](#write_a_single_node)
            - 2.2.3.2 [Write a branch](#write_a_branch)
            - 2.2.3.3 [Write a branch, excluding its source node](#write_a_branch_exclude_source_node)            
        - 2.2.4. [Multiple trees](#write_and_read_multiple_trees)
        - 2.2.5. [Unrooted nodes](#unrooted_nodes)

- 3. [EMD Nodes](#emdnodes)
    - 3.1 [Node](#node)
    - 3.2 [Array](#array)
        - 3.2.1 [single array](#singlearray)
        - 3.2.2 [array stacks](#arraystacks)
    - 3.3 [PointList](#pointlist)
    - 3.4 [PointListArray](#pointlistarray)
    - 3.5 [Root](#rootnode)
    - 3.6 [Custom](#customnode)

- 4. [EMD Metadata](#emdmetadata)
    - 4.1 [The Metadata class](#metadataclass)
    - 4.2 [The .metadata property](#metadataproperty)
    - 4.3 [Root metadata](#rootmetadata)

- 5. [More trees, branches, cutting, grafting, and appending](#moretrees)
    - 5.1 [An example](#buildingtrees)
    - 5.2 [Cut and graft branches](#cutgraft)
        - 5.2.1 [Cut/graft root metadata options](#cutgraft_root_metadata)
            - 5.2.1.1 [Cutting](#cut_rootmetadata)
            - 5.2.1.2 [Grafting](#graft_rootmetadata)
    - 5.3 [Overwrite mode](#overwrite)
    - 5.4 [Appending to EMD files](#append)
        - 5.4.1 [Append a new tree](#appendtree)
        - 5.4.2 [Append to an existing tree](#appendbranch)
        - 5.4.3 [Merge a runtime tree and an EMD tree](#append_diffmerge)
        - 5.4.4 [Append conflict resolution and 'appendover' mode](#appendover)
        - 5.4.5 [Appending and root metadata](#appendrootmetadata)
    


<hr style="border:2px solid gray">

In [1]:
## Imports

import emdfile as emd
import numpy as np

In [2]:
## Filepath utilities

# update to a valid location on your filesystem!
filepath = "/Users/Ben/Desktop/test.h5"


from os.path import exists
from os import remove
def clean():
    if exists(filepath):
        remove(filepath)
clean()
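The path above has to be edited by hand. If you'd rather not hard-code one, a portable option (our assumption; `emdfile` doesn't require it) is to put the scratch file in the system temp directory using the standard-library `tempfile` module. The `clean()` helper behaves the same way:

```python
import os
import tempfile

# Alternative to the hard-coded path above: a scratch file in the
# OS temp directory, which should be writable on most systems.
filepath = os.path.join(tempfile.gettempdir(), "test.h5")

def clean():
    """Delete the scratch file if a previous cell left one behind."""
    if os.path.exists(filepath):
        os.remove(filepath)

clean()
```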

<a id='simplePython'></a>


# 1. Writing simple Python objects

Write 
- a numpy array,
- a Python dictionary, or 
- a list of arrays/dictionaries

into an HDF5 file.  Then read them.  

<a id='nparrays'></a>

## 1.1 numpy arrays


In [3]:
clean()

In [4]:
# Make a numpy array

ar = np.arange(12).reshape((3,4))
ar

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [5]:
# Save it

emd.save(filepath, ar)

In [6]:
# Read it

loaded_data = emd.read(filepath)

In [7]:
# The result is an emdfile Array class instance

loaded_data

Array( A 2-dimensional array of shape (3, 4) called 'np.array',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
)

In [8]:
# The original numpy array is stored in the `.data` attribute

loaded_data.data

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [9]:
# Confirm the saved and loaded arrays are identical

assert(np.array_equal( ar , loaded_data.data ))

<a id='dicts'></a>

## 1.2 Python dictionaries

In [10]:
clean()

In [11]:
# Make a dictionary

very_important_knowledge = {
    'hey' : 1,
    'diddlediddle' : 2,
    'thecat' : 3,
    'andthe' : 4,
    'fiddle' : 5
}

very_important_knowledge

{'hey': 1, 'diddlediddle': 2, 'thecat': 3, 'andthe': 4, 'fiddle': 5}

In [12]:
# Save it

emd.save(filepath, very_important_knowledge)

In [13]:
# Load it

loaded_data = emd.read(filepath)

In [14]:
# The loaded data is an emdfile Metadata class instance

loaded_data

Metadata( A Metadata instance called 'dictionary', containing the following fields:

          andthe:         4
          diddlediddle:   2
          fiddle:         5
          hey:            1
          thecat:         3
)

In [15]:
# We can retrieve and assign values from/to it like a normal Python dictionary

loaded_data['andthe']

4

In [16]:
loaded_data['thecow'] = 1e6

loaded_data

Metadata( A Metadata instance called 'dictionary', containing the following fields:

          andthe:         4
          diddlediddle:   2
          fiddle:         5
          hey:            1
          thecat:         3
          thecow:         1000000.0
)

In [17]:
loaded_data.keys

dict_keys(['andthe', 'diddlediddle', 'fiddle', 'hey', 'thecat', 'thecow'])
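In this example the loaded fields are listed alphabetically rather than in the order they were defined. For plain `dict`s that doesn't affect equality, as this stdlib-only sketch shows (`loaded` is an ordinary `dict` standing in for the read-back fields, not an `emdfile` `Metadata` object). When values include numpy arrays, whose `==` is elementwise, an item-by-item comparison like the one at the end of Section 1.3 is needed instead.

```python
# The dictionary saved in Section 1.2
original = {'hey': 1, 'diddlediddle': 2, 'thecat': 3, 'andthe': 4, 'fiddle': 5}

# A stand-in for the loaded fields: same contents, alphabetical key order
loaded = {k: original[k] for k in sorted(original)}

print(list(loaded))                    # ['andthe', 'diddlediddle', 'fiddle', 'hey', 'thecat']
print(original == loaded)              # True: dict equality ignores key order
print(list(original) == list(loaded))  # False: key sequences do not
```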

<a id='lists'></a>

## 1.3 `list`s

In [18]:
clean()

In [19]:
# Make some data: 2 arrays, and one dictionary

ar1 = np.zeros((4,5))
ar2 = np.eye(3)
dic = {'cow':'moo','tuple':(1,2,3),'array':np.arange(9).reshape((3,3))}

In [20]:
# Save it

emd.save(filepath, [ar1,ar2,dic])

In [21]:
# Load it

loaded_data = emd.read(filepath)

In [22]:
# This time, since we saved multiple pieces of data, they were all saved together in a single EMD tree.
# More on EMD trees in the next section - for now, that means that when we load the data we get an
# emdfile Root class instance which contains some data

# The printout indicates that the Root contains two arrays, but doesn't say anything about our dictionary...

loaded_data

Root( A Node called 'root', containing the following top-level objects in its tree:

          np.array_0               	 (Array)
          np.array_1               	 (Array)
)

In [23]:
# The arrays are accessible via the `.tree` method by passing it a string key:

l_ar1 = loaded_data.tree('np.array_0')
l_ar2 = loaded_data.tree('np.array_1')

print(l_ar1)
print(l_ar2)
print(l_ar1.data)
print(l_ar2.data)

Array( A 2-dimensional array of shape (4, 5) called 'np.array_0',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
)
Array( A 2-dimensional array of shape (3, 3) called 'np.array_1',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
)
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [24]:
assert(np.array_equal( ar1, l_ar1.data ))
assert(np.array_equal( ar2, l_ar2.data ))

In [25]:
# In emdfile, dictionaries correspond to Metadata class objects, and every data-like class instance can contain
# any number of Metadata instances, stored in the `.metadata` property.  For instance, here's what the `.metadata`
# property of the Root object we loaded looks like:

loaded_data.metadata

{'dictionary_0': Metadata( A Metadata instance called 'dictionary_0', containing the following fields:
 
           array:   2D-array
           cow:     moo
           tuple:   (1, 2, 3)
 )}

In [26]:
# We can retrieve our dictionary by passing the appropriate key to the Root's .metadata property, like this:

md = loaded_data.metadata['dictionary_0']

md

Metadata( A Metadata instance called 'dictionary_0', containing the following fields:

          array:   2D-array
          cow:     moo
          tuple:   (1, 2, 3)
)

In [27]:
md['array']

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [28]:
# Confirm that everything worked by comparing our original and loaded dictionaries item-by-item

for k,v in dic.items():
    assert(k in md.keys)
    if isinstance(v,np.ndarray): assert(np.array_equal(v,md[k]))
    else: assert(v == md[k])
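When the list was saved, each element received an auto-generated name: the two arrays became `np.array_0` and `np.array_1`, and the dictionary became `dictionary_0`. Those are the keys we used with `.tree(...)` and `.metadata[...]` above. The pattern looks like a separate counter per kind of object; the hypothetical `auto_names` helper below reproduces it. It is a guess reconstructed from the printouts, not `emdfile`'s actual code.

```python
from collections import defaultdict

def auto_names(kinds):
    """Name each item '<prefix>_<n>', counting separately for each prefix.

    Hypothetical reconstruction of the naming pattern seen above.
    """
    counters = defaultdict(int)
    names = []
    for kind in kinds:
        names.append(f"{kind}_{counters[kind]}")
        counters[kind] += 1
    return names

# The list saved above held two arrays and one dictionary
print(auto_names(['np.array', 'np.array', 'dictionary']))
# ['np.array_0', 'np.array_1', 'dictionary_0']
```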

<a id='emdtrees'></a>

# 2 EMD Trees

This section defines the basic elements of EMD 1.0 files - trees and nodes - and demonstrates the interface to create, write, and read these elements in Python using `emdfile`.

Note that this section does **not** cover the data itself.  Here the focus is the structures that relate and group different pieces of data.  In [Section 3: EMD Nodes](#emdnodes) we look at how to put data into these structures, and the kinds of data that can be stored.

<a id='trees'></a>

## 2.1 Trees

The core structure of the EMD 1.0 file is the EMD tree.  An EMD tree is a tree in the usual computer-science sense.  Think of directories on a filesystem: they are nested containers, with a single highest-level container plus any number of additional containers and data nested to any depth inside.  The highest-level container is called the "root directory" on a filesystem, and an "EMD root" (or, when it's clear from context, just a "root") in an EMD 1.0 file.  The lower-level containers and data are called "files and folders" or "files and directories" on a filesystem, and are generically referred to as "nodes" in an EMD 1.0 file.  In this analogy, EMD nodes do the job of both files and folders: they may contain data (like a file) as well as other nested nodes (like a folder).

The image below shows two EMD trees.  On the left is a simple tree, consisting of a root and a single downstream node, and on the right is a complex tree, consisting of a root and many nested downstream nodes.

<img src="./pngs/tree.png" alt="Trees" width="500"/>
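To make the picture concrete, here is a toy model in plain Python. It is an illustration only, not how `emdfile` stores trees: nested dicts stand in for nodes, and a small hypothetical `render` function draws them in the same `|---` style that `.tree()` prints. The example tree is the one built in the cells that follow.

```python
def render(children, prefix=""):
    """Return the lines of an ASCII drawing of a nested-dict tree."""
    lines = []
    items = list(children.items())
    for i, (name, grandchildren) in enumerate(items):
        last = (i == len(items) - 1)
        lines.append(prefix + "|---" + name)
        # Entries with siblings still to come keep the vertical bar; the last one doesn't
        lines.extend(render(grandchildren, prefix + ("    " if last else "|   ")))
    return lines

# The tree built in the cells below, as nested dicts
tree = {
    "node": {"snowed": {}, "flowed": {}, "crowed": {}},
    "lode": {"sewed": {"glowed": {"reload": {}}}},
    "abode": {},
}
print("\n".join(["/"] + render(tree)))
```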

In [29]:
# EMD trees must have roots, so to build one from scratch we start with a Root object

root = emd.Root( name='an_EMD_root' )

In [30]:
root

Root( A Node called 'an_EMD_root', containing the following top-level objects in its tree:

)

In [31]:
# `emdfile` objects have a `.tree` method, which serves as an interface to the EMD tree it is a part of.
# Calling `.tree()` with no arguments prints the tree to the screen.  Right now, there's nothing in the tree of
# our new root:

root.tree()

/


In [32]:
# Generate some nodes, and add them to the tree.
# More on nodes in section 3.

node1 = emd.Node( name='node' )
node2 = emd.Node( name='lode' )
node3 = emd.Node( name='abode' )


# Passing the `.tree` method an EMD object as its argument, like the nodes we just made,
# adds that object to the tree

root.tree(node1)
root.tree(node2)
root.tree(node3)

root

Root( A Node called 'an_EMD_root', containing the following top-level objects in its tree:

          node                     	 (Node)
          lode                     	 (Node)
          abode                    	 (Node)
)

In [33]:
root.tree()

/
|---node
|---lode
|---abode


In [34]:
# Passing a string argument to `.tree` returns the node of that name, if it exists

root.tree('abode')

Node( A Node called 'abode', containing the following top-level objects in its tree:

)

In [35]:
# The location that new nodes are added to the tree depends on which existing node is used
# to call `.tree`.  Here we make some new nodes, and add them to different places in the tree.

node4 = emd.Node( name='snowed' )
node5 = emd.Node( name='flowed' )
node6 = emd.Node( name='crowed' )

node7 = emd.Node( name='sewed' )
node8 = emd.Node( name='glowed' )
node9 = emd.Node( name='reload' )


node1.tree(node4)    # these nodes are all added onto `node1` (which we named "node")
node1.tree(node5)
node1.tree(node6)

node2.tree(node7)    # these nodes are added to different nodes in the tree
node7.tree(node8)
node8.tree(node9)


root.tree()

/
|---node
|   |---snowed
|   |---flowed
|   |---crowed
|---lode
|   |---sewed
|       |---glowed
|           |---reload
|---abode


In [36]:
# From a non-root node, we can display the branch below this node...

node2.tree()

/
|---sewed
    |---glowed
        |---reload


In [37]:
# ...or the whole tree from the root

node2.tree( show=True )

/
|---node
|   |---snowed
|   |---flowed
|   |---crowed
|---lode
|   |---sewed
|       |---glowed
|           |---reload
|---abode


In [38]:
# We can retrieve objects at nested points in the tree using '/' delimiters

root.tree('lode/sewed')

Node( A Node called 'sewed', containing the following top-level objects in its tree:

          glowed                   	 (Node)
)

In [39]:
# This works from any node

node2.tree()

print()

node2.tree('sewed/glowed/reload')

/
|---sewed
    |---glowed
        |---reload



Node( A Node called 'reload', containing the following top-level objects in its tree:

)

In [40]:
# We can also retrieve objects using a root path even if 
# we're calling .tree from a downstream node, by using a leading '/'

node2.tree(show=True)

print()

node2.tree('/node/snowed')

/
|---node
|   |---snowed
|   |---flowed
|   |---crowed
|---lode
|   |---sewed
|       |---glowed
|           |---reload
|---abode



Node( A Node called 'snowed', containing the following top-level objects in its tree:

)
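The lookup rules shown in the last few cells (a relative path starts at the calling node; a leading '/' starts at the root) can be sketched with the same nested-dict model. `resolve` is a hypothetical helper written for illustration, not part of `emdfile`.

```python
def resolve(tree, current, path):
    """Look up `path` in a nested-dict tree.

    `current` is the location of the calling node, as a list of names
    from the root. Relative paths start there; a leading '/' starts at
    the root. Returns (location, subtree); raises KeyError if missing.
    """
    location = [] if path.startswith("/") else list(current)
    for name in path.strip("/").split("/"):
        if name:
            location.append(name)
    subtree = tree
    for name in location:
        subtree = subtree[name]
    return location, subtree

tree = {
    "node": {"snowed": {}, "flowed": {}, "crowed": {}},
    "lode": {"sewed": {"glowed": {"reload": {}}}},
    "abode": {},
}
# Like node2.tree('sewed/glowed/reload'), called from 'lode'
print(resolve(tree, ["lode"], "sewed/glowed/reload")[0])  # ['lode', 'sewed', 'glowed', 'reload']
# Like node2.tree('/node/snowed'): the leading '/' jumps to the root
print(resolve(tree, ["lode"], "/node/snowed")[0])         # ['node', 'snowed']
```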

In [41]:
# The root node of a tree is accessible from any node with the .root property

print(root)
print(node1.root)
print(node8.root)

assert(root == root.root == node1.root == node2.root == node3.root == node4.root == node5.root)

Root( A Node called 'an_EMD_root', containing the following top-level objects in its tree:

          node                     	 (Node)
          lode                     	 (Node)
          abode                    	 (Node)
)
Root( A Node called 'an_EMD_root', containing the following top-level objects in its tree:

          node                     	 (Node)
          lode                     	 (Node)
          abode                    	 (Node)
)
Root( A Node called 'an_EMD_root', containing the following top-level objects in its tree:

          node                     	 (Node)
          lode                     	 (Node)
          abode                    	 (Node)
)


<a id='writingtrees'></a>

## 2.2 Writing and reading EMD trees

In this section we
- 2.2.1. [write an EMD tree to a file, and read it](#write_one_tree)
- 2.2.2. [read a subset of an EMD tree](#read_a_tree_subset)
- 2.2.3. [write subsets of an `emdfile` tree to a file](#write_a_tree_subset)
- 2.2.4. [write several trees to one file, then read from them](#write_and_read_multiple_trees)
- 2.2.5. [look at unrooted nodes](#unrooted_nodes)

Note that these sections use the tree built in Section 2.1, so run those cells first!

<a id='write_one_tree'></a>

### 2.2.1 Write an EMD tree to a file, and read it

In [42]:
clean()

In [43]:
# Inspect the tree we want to write

root.tree()

/
|---node
|   |---snowed
|   |---flowed
|   |---crowed
|---lode
|   |---sewed
|       |---glowed
|           |---reload
|---abode


In [44]:
# Write it to a file

emd.save(filepath, root)

In [45]:
# Inspect the resulting file

emd.print_h5_tree(filepath)

/
|---an_EMD_root
    |---abode
    |---lode
    |   |---sewed
    |       |---glowed
    |           |---reload
    |---node
        |---crowed
        |---flowed
        |---snowed




In [46]:
# Read the tree

loaded_data = emd.read(filepath)

In [47]:
# Inspect the loaded tree

print(loaded_data)

print()

loaded_data.tree()

Root( A Node called 'an_EMD_root', containing the following top-level objects in its tree:

          abode                    	 (Node)
          lode                     	 (Node)
          node                     	 (Node)
)

/
|---abode
|---lode
|   |---sewed
|       |---glowed
|           |---reload
|---node
    |---crowed
    |---flowed
    |---snowed


<a id='read_a_tree_subset'></a>

### 2.2.2 Read a subset of an EMD tree

There are three different subsets of an EMD tree that can be specified:
- a single node
- a branch, including its source node
- a branch, excluding its source node
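These three options correspond to `tree=False`, `tree=True` and `tree='branch'` in the `emd.read` calls in the next three subsections. Using the nested-dict model, the hypothetical `select` helper below sketches what each one keeps from the file's tree. One simplification: in `'branch'` mode `emdfile` returns a `Root` holding the branch, whereas the toy just returns the branch's dict.

```python
def select(tree, path, mode):
    """Return the part of a nested-dict tree that each read mode keeps.

    mode=False    -> the selected node only, with no children
    mode=True     -> the selected node plus everything below it
    mode='branch' -> everything below the selected node, without the node itself
    """
    names = [n for n in path.strip("/").split("/") if n]
    subtree = tree
    for name in names:
        subtree = subtree[name]
    selected = names[-1]
    if mode is False:
        return {selected: {}}
    if mode is True:
        return {selected: subtree}
    if mode == "branch":
        return subtree
    raise ValueError(f"unknown mode: {mode!r}")

file_tree = {"an_EMD_root": {"lode": {"sewed": {"glowed": {"reload": {}}}}}}
for mode in (False, True, "branch"):
    print(repr(mode), select(file_tree, "/an_EMD_root/lode/sewed", mode))
# False {'sewed': {}}
# True {'sewed': {'glowed': {'reload': {}}}}
# 'branch' {'glowed': {'reload': {}}}
```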

In [48]:
# Inspect the file we want to read from

emd.print_h5_tree(filepath)

/
|---an_EMD_root
    |---abode
    |---lode
    |   |---sewed
    |       |---glowed
    |           |---reload
    |---node
        |---crowed
        |---flowed
        |---snowed




<a id='read_a_single_node'></a>

#### 2.2.2.1 Read a single node

In [49]:
# To read a subset of an EMD tree, we need to specify where it is in the file.
# This is done with the `emdpath` argument

# The type of subset that will be read is specified with the `tree` argument.
# Here we set `tree=False`, which means: read the selected node only

loaded_data = emd.read(
    filepath,
    emdpath = '/an_EMD_root/lode/sewed',
    tree = False
)

In [50]:
# Check the data we loaded.  As expected, it's the node called "sewed"...

loaded_data

Node( A Node called 'sewed', containing the following top-level objects in its tree:

)

In [51]:
# ...and the tree underneath it is empty, i.e. we've loaded this node only

loaded_data.tree()

/


<a id='read_a_branch'></a>

#### 2.2.2.2 Read a branch of a tree

In [52]:
# To read an entire branch of a tree, starting at the selected node, we set `tree=True`

loaded_data = emd.read(
    filepath,
    emdpath = '/an_EMD_root/lode/sewed',
    tree = True
)

In [53]:
# The loaded data is again the node called "sewed"...

loaded_data

Node( A Node called 'sewed', containing the following top-level objects in its tree:

          glowed                   	 (Node)
)

In [54]:
# ...but this time it also contains the nodes that were downstream of it in the saved tree

loaded_data.tree()

/
|---glowed
    |---reload


<a id='read_a_branch_exclude_source_node'></a>

#### 2.2.2.3 Read a branch of a tree, excluding the selected starting node

In [55]:
# To read a branch of a tree but exclude the selected starting node, we set `tree='branch'`

loaded_data = emd.read(
    filepath,
    emdpath = '/an_EMD_root/lode/sewed',
    tree = 'branch'
)

In [56]:
# This time the loaded data is not the selected node (we didn't want to read that node!).
# Instead it is a Root, which takes its name from the root of the whole tree.

loaded_data

Root( A Node called 'an_EMD_root', containing the following top-level objects in its tree:

          glowed                   	 (Node)
)

In [57]:
# Incidentally, because an EMD tree must always have a root, whether it lives in an EMD 1.0 HDF5
# file or in an `emdfile` Python representation, the root was also present in `loaded_data` in the
# previous examples; it just wasn't what the read function returned.  You can check that it was
# there each time by running `loaded_data.root` in the previous examples.

# Roots are discussed further in Section 3.

In [58]:
# The root that's been returned carries the data we asked for - the tree branch downstream of the node "sewed"

loaded_data.tree()

/
|---glowed
    |---reload


<a id='write_a_tree_subset'></a>

### 2.2.3 Write a subset of an `emdfile` tree to a file

Writing a subset of an `emdfile` tree parallels reading a subset of an EMD tree.  Here we'll write:
- a single node
- a branch, including its source node
- a branch, excluding its source node
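On the write side, the file printouts in the next three subsections show which groups each option creates, always under a root group named after the tree's root (`an_EMD_root`). Note that `emd.save` excludes the source node with `tree=None`, whereas `emd.read` uses `tree='branch'`. The hypothetical helpers below (plain Python, not `emdfile` code) reproduce those listings for the node "sewed":

```python
def descendant_paths(children, prefix):
    """Yield the '/'-joined path of every node below `prefix`, depth first."""
    for name, grandchildren in children.items():
        path = f"{prefix}/{name}"
        yield path
        yield from descendant_paths(grandchildren, path)

def written_groups(root_name, node_name, below, tree):
    """List the groups a save would create for each value of `tree`."""
    node_path = f"{root_name}/{node_name}"
    if tree is False:    # the node only
        return [node_path]
    if tree is True:     # the node and its branch
        return [node_path] + list(descendant_paths(below, node_path))
    if tree is None:     # the branch, without the node
        return list(descendant_paths(below, root_name))
    raise ValueError(f"unexpected tree value: {tree!r}")

below_sewed = {"glowed": {"reload": {}}}
for option in (False, True, None):
    print(option, written_groups("an_EMD_root", "sewed", below_sewed, option))
```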

<a id='write_a_single_node'></a>

#### 2.2.3.1 Write a single node

In [59]:
clean()

In [60]:
# Inspect the tree we're writing from

root.tree()

/
|---node
|   |---snowed
|   |---flowed
|   |---crowed
|---lode
|   |---sewed
|       |---glowed
|           |---reload
|---abode


In [61]:
# Let's say we want to save the node 'sewed'.  First let's retrieve the node:

my_favorite_node = root.tree('lode/sewed')

my_favorite_node

Node( A Node called 'sewed', containing the following top-level objects in its tree:

          glowed                   	 (Node)
)

In [62]:
# To save this node, we pass it to the save function with `tree=False`

emd.save(
    filepath,
    my_favorite_node,
    tree = False
)

In [63]:
# Inspect the resulting file

emd.print_h5_tree(filepath)

/
|---an_EMD_root
    |---sewed




In [64]:
# Because this file holds only one node in one tree, we can load it by passing
# `emd.read` just the filepath

loaded_data = emd.read(filepath)

print(loaded_data)
print()
loaded_data.tree()

Node( A Node called 'sewed', containing the following top-level objects in its tree:

)

/


<a id='write_a_branch'></a>

#### 2.2.3.2 Write a branch of a tree

In [65]:
clean()

In [66]:
# Let's write the branch starting at our favorite node "sewed"

print(my_favorite_node)
print()
my_favorite_node.tree()

Node( A Node called 'sewed', containing the following top-level objects in its tree:

          glowed                   	 (Node)
)

/
|---glowed
    |---reload


In [67]:
# Save the data

emd.save(
    filepath,
    my_favorite_node,
    # tree = True        # optional: True is the default, so this line can be omitted
)

In [68]:
# Inspect the resulting file

emd.print_h5_tree(filepath)

/
|---an_EMD_root
    |---sewed
        |---glowed
            |---reload




<a id='write_a_branch_exclude_source_node'></a>

#### 2.2.3.3 Write a branch of a tree, excluding its source node

In [69]:
clean()

In [70]:
# We'll use node 'sewed' again.  All we need to change is the `tree` argument:
# `tree=None` writes the branch below the node, but not the node itself

emd.save(
    filepath,
    my_favorite_node,
    tree = None
)

In [71]:
# Inspect the resulting file

emd.print_h5_tree(filepath)

/
|---an_EMD_root
    |---glowed
        |---reload




<a id='write_and_read_multiple_trees'></a>

### 2.2.4 Write/read several EMD trees to a single file

In [72]:
clean()

In [73]:
# Make a second `emdfile` tree

# Note that this syntax is condensed but not very nice...
# for real data, I wouldn't recommend doing it like this!
root2 = emd.Root('another_root_to_boot')
root2.tree( emd.Node('hello') )
root2.tree('hello').tree( emd.Node('hellohello') )
root2.tree( emd.Node('yousay') )
root2.tree('yousay').tree( emd.Node('goodbye') )
root2.tree('yousay/goodbye').tree( emd.Node('eyesay') )
root2.tree('yousay/goodbye/eyesay').tree( emd.Node('HELLOWORLD') )

root2.tree()

/
|---hello
|   |---hellohello
|---yousay
    |---goodbye
        |---eyesay
            |---HELLOWORLD


In [74]:
# Save both trees to a file

emd.save(
    filepath,
    [root,root2]
)

In [75]:
# Show the results

emd.print_h5_tree(filepath)

/
|---an_EMD_root
|   |---abode
|   |---lode
|   |   |---sewed
|   |       |---glowed
|   |           |---reload
|   |---node
|       |---crowed
|       |---flowed
|       |---snowed
|---another_root_to_boot
    |---hello
    |   |---hellohello
    |---yousay
        |---goodbye
            |---eyesay
                |---HELLOWORLD




In [76]:
# Alternatively, we can save one tree first, then
# come back and save the second later by invoking append mode

clean()

# save the first tree
emd.save(
    filepath,
    root
)

# append the second tree
emd.save(
    filepath,
    root2,
    mode = 'a'         # aliases for 'a' include 'append' or '+'
)

In [77]:
# Show the file

emd.print_h5_tree(filepath)

/
|---an_EMD_root
|   |---abode
|   |---lode
|   |   |---sewed
|   |       |---glowed
|   |           |---reload
|   |---node
|       |---crowed
|       |---flowed
|       |---snowed
|---another_root_to_boot
    |---hello
    |   |---hellohello
    |---yousay
        |---goodbye
            |---eyesay
                |---HELLOWORLD




In [78]:
# If a file contains multiple trees, reading it without specifying
# which data we want returns a list of the available root names

loaded_data = emd.read(filepath)

print()
print(loaded_data)

Multiple root groups detected - returning root names. Please specify the `emdpath` argument. Returning the list of rootgroups.

['an_EMD_root', 'another_root_to_boot']
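Because `emd.read` returns a list of root names, rather than data, when a file holds several trees and no `emdpath` is given, a script that may meet either kind of file can check for that case. A minimal sketch, assuming only the behaviour shown above: `read_tree` is a hypothetical wrapper, and it takes the read function as an argument so the logic can be exercised with a stand-in instead of a real file.

```python
def read_tree(read, filepath, prefer=None):
    """Read a file; if it holds several trees, re-read the preferred one.

    `read` is assumed to behave like emd.read does above: it returns the
    data for a single-tree file, or a list of root names when several
    trees are present and no emdpath is given.
    """
    result = read(filepath)
    if isinstance(result, list):
        if prefer is None:
            prefer = result[0]   # fall back to the first root listed
        if prefer not in result:
            raise KeyError(f"no root called {prefer!r}; found {result}")
        result = read(filepath, emdpath=prefer)
    return result

# Stand-in for emd.read on the two-tree file written above
def fake_read(filepath, emdpath=None):
    roots = ["an_EMD_root", "another_root_to_boot"]
    return roots if emdpath is None else f"<Root {emdpath}>"

print(read_tree(fake_read, "test.h5", prefer="another_root_to_boot"))
```

With `emdfile` installed, the intended call would be something like `read_tree(emd.read, filepath, prefer='an_EMD_root')`.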


In [79]:
# We can load a tree by specifying it with the `emdpath` argument

loaded_data = emd.read(
    filepath,
    emdpath = 'an_EMD_root'
)

print(loaded_data)

Root( A Node called 'an_EMD_root', containing the following top-level objects in its tree:

          abode                    	 (Node)
          lode                     	 (Node)
          node                     	 (Node)
)


<a id='unrooted_nodes'></a>

### 2.2.5 Unrooted nodes

EMD trees and `emdfile` trees must have a root node.  It is possible to create and use nodes that don't have roots; however, it is **not** possible to use their `.tree` methods until they are added to a root or to an existing (rooted) tree.

In [80]:
# Make an unrooted node
# Creating and inspecting an unrooted node is fine

unrooted_node = emd.Node( name='unrooted_node')

unrooted_node

Node( A Node called 'unrooted_node', containing the following top-level objects in its tree:

)

In [81]:
# However, trying to build a tree with an unrooted node will throw an error

another_node = emd.Node( name='another_node' )

try:
    unrooted_node.tree( another_node )   # raises an AssertionError!

except AssertionError:
    print('hit the road, Jack')

hit the road, Jack
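Both the `.root` property from Section 2.1 and the error above can be modelled with parent pointers: a node finds its root by walking up until it reaches a node with no parent, and refuses new children if that walk doesn't end at a root. The `ToyNode`/`ToyRoot` classes below are a hypothetical sketch of that idea, not `emdfile`'s classes.

```python
class ToyNode:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = {}

    @property
    def root(self):
        """Walk up the parent pointers; return the ToyRoot, or None if unrooted."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node if isinstance(node, ToyRoot) else None

    def add(self, child):
        # Mirrors the behaviour shown above: building onto an unrooted node fails
        assert self.root is not None, f"'{self.name}' is not part of a rooted tree"
        child.parent = self
        self.children[child.name] = child


class ToyRoot(ToyNode):
    """A node that terminates the upward walk."""


root = ToyRoot("an_EMD_root")
lode = ToyNode("lode")
root.add(lode)
sewed = ToyNode("sewed")
lode.add(sewed)
print(sewed.root is root)   # True

unrooted = ToyNode("unrooted_node")
try:
    unrooted.add(ToyNode("another_node"))
except AssertionError:
    print("hit the road, Jack")
```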


<a id='emdnodes'></a>

# 3. EMD Nodes

Each EMD node can store both data and metadata.  The kind of data stored, and how it's stored and retrieved, depends on the node type.  Metadata is stored and retrieved identically in all node types.  This section covers data; [Section 4](#emdmetadata) covers metadata.


The generic node structure as stored in an EMD 1.0 H5 file is shown below.  For the purpose of using `emdfile` to write and read data, the important part here is the yellow block called "Data", which varies from one node type to the next.


<img src="./pngs/node.png" alt="Nodes" width="1000"/>


The `emdfile` node types, and the kind of data stored in their "Data" block, are:

- [`Node`](#node): no "Data" block
- [`Array`](#array): array-like data
- [`PointList`](#pointlist): point-like data, representing N points in M string-labelled dimensions
- [`PointListArray`](#pointlistarray): ragged-array-like, representing 2D grids of variable-length pointlist-like data
- [`Root`](#rootnode): no "Data" block; a `Root` is like a `Node` plus a few special properties
- [`Custom`](#customnode): container for arbitrary flat or tree-like combinations of the prior data types

<a id='node'></a>

## 3.1 `Node`

The `Node` class does not contain a "Data" block.

It does include the structure for
- the `.tree` method, for building and interacting with `emdfile` trees ([Section 2](#emdtrees)), and
- the `.metadata` property, for storing and retrieving arbitrarily many dictionary-like `Metadata` objects ([Section 4](#emdmetadata))

All the other public `emdfile` classes aside from `Metadata` inherit from `Node`, and therefore also have these routines.

In [82]:
# Nodes need names, so they can be referenced and retrieved later!

node = emd.Node( name='some_node' )

node

Node( A Node called 'some_node', containing the following top-level objects in its tree:

)

In [83]:
# Nodes possess the `.tree` method.  See Section 2.

node.tree()

/


In [84]:
# Nodes possess the `.metadata` property. See Section 4.

node.metadata

{}

<a id='array'></a>

## 3.2 `Array`

In an `Array` the data block is array-like.

In addition to the `.metadata` property, which stores general/arbitrary metadata, each of the data-containing node types contains some self-descriptive metadata describing the space/dimensions its data quantifies. In an `Array` instance these are the units of the pixel values and, for an N-dimensional data array, N dim vectors giving the coordinate values along each axis, plus a name and units for each axis.

In addition to holding a single array, `Array` supports stacks of arrays, in which the data is (N+1)-dimensional and represents M distinct N-dimensional arrays, each with a string name, all sharing the N-dimensional space defined by the N dim vectors.

<a id='singlearray'></a>

### 3.2.1 Single arrays

In [85]:
# generate some data

data = np.arange(60).reshape((3,4,5))

data

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

In [86]:
# make the Array

ar = emd.Array(
    name = 'arrrrrr',
    data = data
)

ar

Array( A 3-dimensional array of shape (3, 4, 5) called 'arrrrrr',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
       dim2 = [0,1,...] pixels
)

In [87]:
ar.data

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

In [88]:
# We can slice into the array using numpy-like syntax on Array directly

ar[0,:,1:5]

array([[ 1,  2,  3,  4],
       [ 6,  7,  8,  9],
       [11, 12, 13, 14],
       [16, 17, 18, 19]])

In [89]:
# Self-descriptive metadata

print(ar.name)
print(ar.units)
print()
print(ar.dims[0])
print(ar.dims[1])
print(ar.dims[2])
print()
print(ar.dim_names[0])
print(ar.dim_names[1])
print(ar.dim_names[2])
print()
print(ar.dim_units[0])
print(ar.dim_units[1])
print(ar.dim_units[2])

arrrrrr


[0 1 2]
[0 1 2 3]
[0 1 2 3 4]

dim0
dim1
dim2

pixels
pixels
pixels


In [90]:
# The units attribute can be modified directly

ar.units = 'cows'

In [91]:
# The dims should be set using the .set_dim method (in IPython/Jupyter, a trailing `?` shows its docstring)

ar.set_dim?

In [92]:
ar.set_dim(
    0,                   # which dimension
    dim = [0,5],         # when two numbers are passed, the vector is extrapolated linearly
)

ar.dims[0]

array([ 0,  5, 10])

In [93]:
# the name and units can be set with 
# their own method calls...

ar.set_dim_name(
    0,
    'x-axis'
)
ar.set_dim_units(
    0,
    'pastures'
)

In [94]:
# ...or inside a call to `.set_dim`

ar.set_dim(
    1,
    dim = 2,             # when one number x is passed, the vector is extrapolated linearly from [0,x]
    name = 'y-axis',
    units = 'fields'
)

ar.dims[1]

array([0, 2, 4, 6])

In [95]:
ar.set_dim(
    2,
    dim = np.logspace(-2,2,5),   # when a 1D array is passed, its length must match the array dim length 
    name = 'z-axis',
    units = 'tracts'
)

ar.dims[2]

array([1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02])
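The three calls above use the three ways of specifying a dim described in their comments. The hypothetical `make_dim` function below is a plain-Python sketch of those rules (not `emdfile`'s code) that returns the full coordinate vector for an axis of length `n`:

```python
def make_dim(spec, n):
    """Expand a dim specification into n coordinate values.

    - one number x        -> [0, x, 2x, ...]
    - two numbers [a, b]  -> linear, starting at a with step b - a
    - a sequence of n     -> used as-is
    """
    if isinstance(spec, (int, float)):
        return [k * spec for k in range(n)]
    spec = list(spec)
    if len(spec) == 2 and n != 2:
        start, step = spec[0], spec[1] - spec[0]
        return [start + k * step for k in range(n)]
    if len(spec) != n:
        raise ValueError(f"dim has length {len(spec)}, but the axis has length {n}")
    return spec

print(make_dim([0, 5], 3))                          # [0, 5, 10]
print(make_dim(2, 4))                               # [0, 2, 4, 6]
print(make_dim([0.01, 0.1, 1.0, 10.0, 100.0], 5))   # used as-is
```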

In [96]:
print(ar.name)
print(ar.units)
print()
print(ar.dims[0])
print(ar.dims[1])
print(ar.dims[2])
print()
print(ar.dim_names[0])
print(ar.dim_names[1])
print(ar.dim_names[2])
print()
print(ar.dim_units[0])
print(ar.dim_units[1])
print(ar.dim_units[2])

arrrrrr
cows

[ 0  5 10]
[0 2 4 6]
[1.e-02 1.e-01 1.e+00 1.e+01 1.e+02]

x-axis
y-axis
z-axis

pastures
fields
tracts


In [97]:
# Alternatively, the Array can be initialized with the name and dims specified

ar = emd.Array(
    data = np.array(
        [[1,2,3],
         [4,5,6]]
    ),
    name = 'my_array',
    units = 'intensity',
    dims = [[0,3],
            [0,0.6]],
    dim_names = ['x','y'],
    dim_units = ['nm','km']
)

In [98]:
ar

Array( A 2-dimensional array of shape (2, 3) called 'my_array',
       with dimensions:

       x = [0,3,...] nm
       y = [0.0,0.6,...] km
)

In [99]:
ar.dims

([0, 3], array([0. , 0.6, 1.2]))

In [100]:
ar.data

array([[1, 2, 3],
       [4, 5, 6]])

<a id='arraystacks'></a>

### 3.2.2 Stack arrays

In [101]:
# make some data
# The code below makes five 3x4 arrays, then stacks them into a single array of shape (5,3,4)

data = np.stack([ np.arange(12).reshape(3,4) + 10**x for x in range(5)])

data

array([[[    1,     2,     3,     4],
        [    5,     6,     7,     8],
        [    9,    10,    11,    12]],

       [[   10,    11,    12,    13],
        [   14,    15,    16,    17],
        [   18,    19,    20,    21]],

       [[  100,   101,   102,   103],
        [  104,   105,   106,   107],
        [  108,   109,   110,   111]],

       [[ 1000,  1001,  1002,  1003],
        [ 1004,  1005,  1006,  1007],
        [ 1008,  1009,  1010,  1011]],

       [[10000, 10001, 10002, 10003],
        [10004, 10005, 10006, 10007],
        [10008, 10009, 10010, 10011]]])

In [102]:
# make the Array

ar = emd.Array(
    data = data,
    name = 'my_stack_array',
    units = 'intensity',
    dims = [[0,3],                  # only two dim vectors: the stack axis is labelled by slicelabels instead
            [0,0.6]],
    dim_names = ['x','y'],
    dim_units = ['nm','km'],
    slicelabels = ['a','b','c','d','e']
)

In [103]:
ar

Array( A stack of 5 Arrays with 2-dimensions and shape (3, 4), called 'my_stack_array'

       The labels are:
           a
           b
           c
           d
           e


       The Array dimensions are:
           x = [0,3,...] nm
           y = [0.0,0.6,...] km
)

In [104]:
# `.data` still points to the full 3D data stack

ar.data

array([[[    1,     2,     3,     4],
        [    5,     6,     7,     8],
        [    9,    10,    11,    12]],

       [[   10,    11,    12,    13],
        [   14,    15,    16,    17],
        [   18,    19,    20,    21]],

       [[  100,   101,   102,   103],
        [  104,   105,   106,   107],
        [  108,   109,   110,   111]],

       [[ 1000,  1001,  1002,  1003],
        [ 1004,  1005,  1006,  1007],
        [ 1008,  1009,  1010,  1011]],

       [[10000, 10001, 10002, 10003],
        [10004, 10005, 10006, 10007],
        [10008, 10009, 10010, 10011]]])

In [105]:
# note the difference between `.shape` and `.data.shape` - in a non-stacked array,
# these would be the same, while here, `.shape` refers to the shared N-D space
# spanned by the dim vectors

print(ar.data.shape)
print(ar.shape)

(5, 3, 4)
(3, 4)


In [106]:
ar.dims

(array([0, 3, 6]), array([0. , 0.6, 1.2, 1.8]))

In [107]:
# We can slice into the array with strings to get the labelled subarrays back
# Note that when accessed in this way, the return value is a new Array instance

ar['a']

Array( A 2-dimensional array of shape (3, 4) called 'my_stack_array_a',
       with dimensions:

       x = [0,3,...] nm
       y = [0.0,0.6,...] km
)

In [108]:
ar['a'].data

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [109]:
# labelled subarrays can be sliced in the same [...] call

ar['a',:2,2]

array([3, 7])

In [110]:
ar['a'].data[:2,2]

array([3, 7])

In [111]:
# The order of the labels is fixed, and when slicing we can always replace the labels
# with their corresponding index.  Note that in this case, a regular numpy array is
# returned and no new Array instances are created

ar[0]

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [112]:
ar[0,:2,2]

array([3, 7])

In [113]:
# For a stack of N-dimensional arrays, the `.rank` attribute gives N while `.data.ndim` gives N+1.
# For non-stack arrays, `.rank` and `.data.ndim` are identical

print(ar.rank)
print(ar.data.ndim)

2
3


In [114]:
# the `.depth` property gives the number of slices. For non-stack arrays, it's 0

ar.depth

5

In [115]:
# Accessing a subarray with ar['label'] syntax returned an Array named "name_label".
# With `.get_slice`, we can grab a subarray and give the new Array some other name

ar.get_slice('a','dingo')

Array( A 2-dimensional array of shape (3, 4) called 'dingo',
       with dimensions:

       x = [0,3,...] nm
       y = [0.0,0.6,...] km
)
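Conceptually, a stack Array is an (N+1)-dimensional numpy array whose first axis is indexed by `slicelabels`. The plain-numpy sketch below illustrates that mapping. `get_by_label` is a hypothetical helper for illustration, not how emdfile implements slicing.

```python
import numpy as np

# the same 5x3x4 stack built in the cells above
stack = np.stack([np.arange(12).reshape(3, 4) + 10**x for x in range(5)])
labels = ['a', 'b', 'c', 'd', 'e']

def get_by_label(stack, labels, label):
    """Hypothetical helper: a label selects one slice along the first axis."""
    return stack[labels.index(label)]

slice_a = get_by_label(stack, labels, 'a')
print(slice_a[:2, 2])                    # compare with ar['a', :2, 2]
print(stack.ndim - 1, stack.shape[0])    # analogous to ar.rank and ar.depth
```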

<a id='pointlist'></a>


## 3.3 `PointList`

A `PointList`'s data attribute can have any length and any number of string-named fields, and each field may have its own data type.  PointList wraps numpy structured arrays.

PointLists have variable length that can change at runtime.  Routines are provided to add points, remove points, sort by field, copy a pointlist, or copy a pointlist with additional dimensions (fields) added.
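Because a PointList wraps a numpy structured array, the routines listed above correspond roughly to familiar numpy operations. The sketch below uses only numpy to illustrate this; it is not emdfile's implementation, and the emdfile methods themselves (`.add`, `.remove`, `.sort`) are demonstrated in the cells that follow.

```python
import numpy as np

dtype = [('x', int), ('y', float)]
pts = np.zeros(3, dtype=dtype)

# "add": concatenate another structured array with the same dtype
pts = np.concatenate([pts, np.ones(2, dtype=dtype)])

# "remove": build a boolean mask (True = remove) and keep everything else
remove_mask = np.zeros(len(pts), dtype=bool)
remove_mask[0] = True
pts = pts[~remove_mask]

# "sort": order by one field, then reverse for descending order
pts = np.sort(pts, order='y')[::-1]

print(pts['x'], pts['y'])
```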

In [116]:
# make some data
# we define the fields by specifying a custom `dtype` for numpy

data = np.zeros(
    3,
    dtype = [
        ('x',int),
        ('y',float)
    ]
)

data

array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('x', '<i8'), ('y', '<f8')])

In [117]:
# make a PointList

pointlist = emd.PointList(
    name = 'my_pointlist',
    data = data
)

pointlist

PointList( A length 3 PointList called 'my_pointlist',
           with 2 fields:

           x   (int64)
           y   (float64)
)

In [118]:
pointlist.data

array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('x', '<i8'), ('y', '<f8')])

In [119]:
# the fields can be accessed by slicing directly into the PointList

print(pointlist['x'])
print(pointlist['y'])

[0 0 0]
[0. 0. 0.]


In [120]:
# add points

# the new data can be either a numpy structured array or another PointList;
# either way, it must have the same dtype as the existing data

# make the new data
new_data = np.ones(
    3,
    dtype = [
        ('x',int),
        ('y',float)
    ]
)

# add it to the pointlist
pointlist.add(new_data)

# show
pointlist.data

array([(0, 0.), (0, 0.), (0, 0.), (1, 1.), (1, 1.), (1, 1.)],
      dtype=[('x', '<i8'), ('y', '<f8')])

In [121]:
# add points using another PointList

# make the new data
more_new_data = emd.PointList(
    name = 'a_pointlist',
    data = np.full(3,2,
        dtype = [
            ('x',int),
            ('y',float)
        ])
)

# add it to the pointlist
pointlist.add(more_new_data)

# show
pointlist.data

array([(0, 0.), (0, 0.), (0, 0.), (1, 1.), (1, 1.), (1, 1.), (2, 2.),
       (2, 2.), (2, 2.)], dtype=[('x', '<i8'), ('y', '<f8')])

In [122]:
# add data as 1D vectors corresponding to the fields; the i-th vector in `data` fills the i-th field named in `fields`

pointlist.add_data_by_field(
    data = (np.linspace(5,6,num=5),np.arange(5,10)),
    fields = ('y','x')
)

pointlist.data

array([(0, 0.  ), (0, 0.  ), (0, 0.  ), (1, 1.  ), (1, 1.  ), (1, 1.  ),
       (2, 2.  ), (2, 2.  ), (2, 2.  ), (5, 5.  ), (6, 5.25), (7, 5.5 ),
       (8, 5.75), (9, 6.  )], dtype=[('x', '<i8'), ('y', '<f8')])
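Adding points field by field amounts to filling a structured array one named column at a time, with the i-th vector going into the i-th field name. A numpy-only sketch of that idea (an illustration, not emdfile's code):

```python
import numpy as np

dtype = [('x', int), ('y', float)]
fields = ('y', 'x')
vectors = (np.linspace(5, 6, num=5), np.arange(5, 10))

# allocate a structured array of the right length, then fill each named field
new_points = np.zeros(len(vectors[0]), dtype=dtype)
for name, vec in zip(fields, vectors):
    new_points[name] = vec

print(new_points)
```

The filled array could then be concatenated onto existing points, as in the earlier `add` examples.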

In [123]:
# remove points

# this is a little clunky! it uses a boolean mask, where True flags a point for removal

# remove the first six points
rm = np.zeros(pointlist.length,dtype = bool)
rm[:6] = True                   # flag points for removal
pointlist.remove(rm)

# show
pointlist.data

array([(2, 2.  ), (2, 2.  ), (2, 2.  ), (5, 5.  ), (6, 5.25), (7, 5.5 ),
       (8, 5.75), (9, 6.  )], dtype=[('x', '<i8'), ('y', '<f8')])

In [124]:
# make a new pointlist like this one, with some additional fields added

pointlist_copy = pointlist.add_fields(
    [('z',bool)],
    name = 'another_pointlist',
)

pointlist_copy.data

array([(2, 2.  , False), (2, 2.  , False), (2, 2.  , False),
       (5, 5.  , False), (6, 5.25, False), (7, 5.5 , False),
       (8, 5.75, False), (9, 6.  , False)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])
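The result above is consistent with allocating a wider structured dtype and copying the existing fields across, with the new field zero-initialised (hence `False`). A numpy-only sketch of that approach, offered as an assumption about the behaviour rather than emdfile's code:

```python
import numpy as np

old = np.array([(2, 2.0), (5, 5.0)], dtype=[('x', int), ('y', float)])

# new dtype = the old fields plus the extra field
new_dtype = np.dtype(old.dtype.descr + [('z', bool)])

# zero-initialised, so the new boolean field starts out as False
widened = np.zeros(len(old), dtype=new_dtype)
for name in old.dtype.names:
    widened[name] = old[name]

print(widened)
```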

In [125]:
# modify values in an existing field

pointlist_copy['z'][4:] = True

pointlist_copy.data

array([(2, 2.  , False), (2, 2.  , False), (2, 2.  , False),
       (5, 5.  , False), (6, 5.25,  True), (7, 5.5 ,  True),
       (8, 5.75,  True), (9, 6.  ,  True)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])

In [126]:
# Sort the pointlist by one of its fields

pointlist_copy.sort('y', order='descending')

pointlist_copy.data

array([(9, 6.  ,  True), (8, 5.75,  True), (7, 5.5 ,  True),
       (6, 5.25,  True), (5, 5.  , False), (2, 2.  , False),
       (2, 2.  , False), (2, 2.  , False)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])

<a id='pointlistarray'></a>


## 3.4 `PointListArray`

`emd.PointListArray` represents a 2D grid of PointList instances that all share the same data fields.  In effect, it stores a 2D ragged array: each grid position holds a vector of any length, with string-accessible fields.

In [127]:
# make a PointListArray

shape = (5,6)
dtype = [('x',int),('y',int)]

pointlistarray = emd.PointListArray(
    name = 'my_pointlistarray',
    shape = shape,
    dtype = dtype
)

pointlistarray

PointListArray( A shape (5, 6) PointListArray called 'my_pointlistarray',
                with 2 fields:

                x   (int64)
                y   (int64)
)

In [128]:
# the pointlists can be accessed by slicing into the pointlistarray

pointlistarray[0,0]

PointList( A length 0 PointList called '0,0',
           with 2 fields:

           x   (int64)
           y   (int64)
)

In [129]:
# and are instantiated empty

pointlistarray[3,4].data

array([], dtype=[('x', '<i8'), ('y', '<i8')])

In [130]:
# we can populate the pointlists with the `add` method

for ii in range(pointlistarray.shape[0]):
    for jj in range(pointlistarray.shape[1]):
        
        # set an integer value that varies sinusoidally from 0 to 8
        val = int(np.round((np.sin((ii*shape[1]+jj) * 2*np.pi / np.prod(shape)) + 1) * 4))
        
        # add to the pointlist
        pointlistarray[ii,jj].add(
            np.full(
                shape = val,
                fill_value= val,
                dtype = dtype
            )
        )

In [131]:
pointlistarray[0,0].data

array([(4, 4), (4, 4), (4, 4), (4, 4)], dtype=[('x', '<i8'), ('y', '<i8')])

In [132]:
for x in range(pointlistarray.shape[0]):
    for y in range(pointlistarray.shape[1]):
        print(pointlistarray[x,y].data)

[(4, 4) (4, 4) (4, 4) (4, 4)]
[(5, 5) (5, 5) (5, 5) (5, 5) (5, 5)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(5, 5) (5, 5) (5, 5) (5, 5) (5, 5)]
[(4, 4) (4, 4) (4, 4) (4, 4)]
[(3, 3) (3, 3) (3, 3)]
[(2, 2) (2, 2)]
[(2, 2) (2, 2)]
[(1, 1)]
[(1, 1)]
[]
[]
[]
[]
[(1, 1)]
[(1, 1)]
[(2, 2) (2, 2)]
[(2, 2) (2, 2)]
[(3, 3) (3, 3) (3, 3)]
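One way to picture this ragged structure with plain numpy is a 2D object array whose cells each hold their own structured array. The sketch below (an illustration, not emdfile's storage scheme) rebuilds the grid above and tabulates the per-cell lengths printed there.

```python
import numpy as np

grid_shape = (5, 6)
grid_dtype = [('x', int), ('y', int)]

# a 2D object array: every cell holds its own structured array
grid = np.empty(grid_shape, dtype=object)
for ii in range(grid_shape[0]):
    for jj in range(grid_shape[1]):
        # same sinusoidal length/value rule as the cell above
        k = ii * grid_shape[1] + jj
        val = int(np.round((np.sin(k * 2 * np.pi / np.prod(grid_shape)) + 1) * 4))
        cell = np.zeros(val, dtype=grid_dtype)
        cell['x'] = val
        cell['y'] = val
        grid[ii, jj] = cell

# the number of points at each grid position
lengths = np.array([[len(cell) for cell in row] for row in grid])
print(lengths)
```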


<a id='rootnode'></a>


## 3.5 `Root`

`emd.Root` represents the root node of an `emdfile` tree, and contains no "Data" block.  A root node is required to build a tree structure (by calling `.tree(node)`) with `emdfile` objects.  The root node of a tree is accessible from any node in the tree with `.root`.

Loaded data always comes with the root of the EMD tree it was read from, along with that root's metadata.  If a single node is loaded, the node itself is returned and its root can be accessed with `.root`.  If a larger tree is loaded, the root node is returned.
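The rules described above (a tree needs a root, and every node can reach its root) can be sketched with parent links. The classes below are hypothetical toys written for illustration, not emdfile's implementation; only the behaviour they mimic comes from the cells that follow.

```python
class ToyNode:
    """Hypothetical stand-in for an emdfile node, not emdfile's implementation."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = {}

    @property
    def root(self):
        # walk up the parent links; the top node counts only if it is a ToyRoot
        node = self
        while node.parent is not None:
            node = node.parent
        return node if isinstance(node, ToyRoot) else None

    def attach(self, child):
        # mimic the rule demonstrated below: no root, no tree
        assert self.root is not None, 'I need a root!'
        child.parent = self
        self.children[child.name] = child


class ToyRoot(ToyNode):
    """Hypothetical root node: the top of every tree."""


toy_root = ToyRoot('groot')
toy_node1, toy_node2 = ToyNode('nodoubt'), ToyNode('depechenode')
try:
    toy_node1.attach(toy_node2)    # fails: toy_node1 is not in a rooted tree yet
except AssertionError as err:
    print(err)
toy_root.attach(toy_node1)
toy_node1.attach(toy_node2)        # now succeeds
print(toy_node2.root.name)
```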

In [133]:
root = emd.Root( name='groot' )

root

Root( A Node called 'groot', containing the following top-level objects in its tree:

)

In [134]:
# Without a root, we can't build a tree

node1 = emd.Node( name='nodoubt' )
node2 = emd.Node( name='depechenode' )

try:
    node1.tree(node2)
    
except AssertionError:
    print('I need a root!')
    
print()
node1.tree()

I need a root!

/


In [135]:
# With a root, we can

root.tree(node1)

try:
    node1.tree(node2)
    
except AssertionError:
    print('I need a root!')
    
print()
node1.tree()


/
|---depechenode


In [136]:
print(node1.root)
print(node2.root)
print(root.root)

Root( A Node called 'groot', containing the following top-level objects in its tree:

          nodoubt                  	 (Node)
)
Root( A Node called 'groot', containing the following top-level objects in its tree:

          nodoubt                  	 (Node)
)
Root( A Node called 'groot', containing the following top-level objects in its tree:

          nodoubt                  	 (Node)
)


<a id='customnode'></a>


## 3.6 `Custom`

`emd.Custom` represents data made up of an arbitrary grouping or tree hierarchy of the data blocks of the other data-carrying node types (Array, PointList, and PointListArray).

An example implementation can be found in [test_custom_class.py](./test_custom_class.py) and [sample_custom_class_module/](./sample_custom_class_module).

<a id='emdmetadata'></a>


# 4. Metadata

Each `emdfile` node can store any number of Metadata objects, each of which wraps a Python dictionary of various kinds of data including strings, numbers, booleans, None, arrays, and lists or tuples of the aforementioned types.  The full list of supported value types is given in the EMD 1.0 file specification.

Note that at present, the Metadata class does not check whether the values placed in it are of read/write-supported types.  If an unsupported value type is placed in a Metadata instance and the instance is subsequently saved, the save call will fail with a TypeError.
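Since Metadata does not validate value types when they are set, a quick check on a dictionary before saving can catch problems early. The sketch below is hypothetical (it is not part of emdfile) and deliberately conservative: it accepts only the types named above, and only flat lists or tuples of scalars. The EMD 1.0 specification remains the authoritative list.

```python
import numpy as np

# Conservative, hypothetical whitelist built only from the types named above
SCALAR_TYPES = (str, bool, int, float, type(None), np.integer, np.floating, np.bool_)

def is_supported(value):
    """Return True if `value` is one of the types named in the text above."""
    if isinstance(value, SCALAR_TYPES + (np.ndarray,)):
        return True
    if isinstance(value, (list, tuple)):
        # only flat sequences of scalars are accepted here
        return all(isinstance(v, SCALAR_TYPES) for v in value)
    return False

def unsupported_keys(params):
    """Hypothetical pre-save check: keys whose values fail `is_supported`."""
    return [key for key, value in params.items() if not is_supported(value)]

print(unsupported_keys({'answer': 42, 'tup': (1, 2, 3), 'nested': {'a': 1}}))
```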

<a id='metadataclass'></a>


## 4.1 The `Metadata` class

`emd.Metadata` represents many (typically small) pieces of data, each of which is accessible using a string key.

In [137]:
# Make a new Metadata instance

metadata = emd.Metadata( name='i_never_metadata_i_didnt_like' )

metadata

Metadata( A Metadata instance called 'i_never_metadata_i_didnt_like', containing the following fields:

)

In [138]:
# These work like Python dictionaries -
# we can slice into them to either get or set items with square brackets and string keys

metadata['key'] = 'value'
metadata['answer'] = 42
metadata['bool'] = True

print(metadata['key'])
print(metadata['answer'])
print(metadata['bool'])

value
42
True


In [139]:
metadata.keys

dict_keys(['key', 'answer', 'bool'])

<a id='metadataproperty'></a>


## 4.2 The `.metadata` property

Each `emdfile` node type has the `.metadata` property, which provides an interface to store any number of Metadata instances with the node, and retrieve them again.
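The "assignment adds rather than replaces" behaviour shown in the next cells is what a Python property setter that writes into a dictionary would give. A toy sketch, with hypothetical class names that are not emdfile's:

```python
class ToyMetadata:
    """Hypothetical stand-in for emd.Metadata: a name plus a dictionary."""

    def __init__(self, name, data=None):
        self.name = name
        self._params = dict(data or {})

    def __getitem__(self, key):
        return self._params[key]

    def __setitem__(self, key, value):
        self._params[key] = value


class ToyMetadataNode:
    """Hypothetical node whose `.metadata` setter adds rather than replaces."""

    def __init__(self):
        self._metadata = {}

    @property
    def metadata(self):
        return self._metadata

    @metadata.setter
    def metadata(self, md):
        # only Metadata-like objects are accepted...
        assert isinstance(md, ToyMetadata), 'expected a ToyMetadata instance'
        # ...and each one is stored under its own name
        self._metadata[md.name] = md


toy = ToyMetadataNode()
toy.metadata = ToyMetadata('first', {'answer': 42})
toy.metadata = ToyMetadata('second', {'bool': True})
print(list(toy.metadata))
print(toy.metadata['first']['answer'])
```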

In [140]:
# To add Metadata to a node, assign it to the node.metadata property

# Note that this does *not* overwrite the value of `.metadata`. Instead, it checks that the value
# it's been passed is a Metadata instance and, if so, adds it to a dictionary where
# it can be accessed again using the Metadata instance's name as a key

node2.metadata = metadata

node2.metadata

{'i_never_metadata_i_didnt_like': Metadata( A Metadata instance called 'i_never_metadata_i_didnt_like', containing the following fields:
 
           key:      value
           answer:   42
           bool:     True
 )}

In [141]:
node2.metadata['i_never_metadata_i_didnt_like']

Metadata( A Metadata instance called 'i_never_metadata_i_didnt_like', containing the following fields:

          key:      value
          answer:   42
          bool:     True
)

In [142]:
# Trying to assign a non-Metadata value to `.metadata` fails

try:
    node2.metadata = 5
except AssertionError:
    print('no dice!')

no dice!


In [143]:
# Assigning a second Metadata instance to the same node's `.metadata` property will
# simply add the new one to a running dictionary, such that both are now stored and accessible


# make a new Metadata instance
more_metadata = emd.Metadata( name='more_metadata' )

# add info to it
more_metadata['an_array'] = np.arange(12).reshape((3,4))
more_metadata['none'] = None
more_metadata['tup'] = (1,2,3)
more_metadata['list'] = ['a','b','c']

# add it to node2
node2.metadata = more_metadata

# show the metadata in node2
node2.metadata

{'i_never_metadata_i_didnt_like': Metadata( A Metadata instance called 'i_never_metadata_i_didnt_like', containing the following fields:
 
           key:      value
           answer:   42
           bool:     True
 ),
 'more_metadata': Metadata( A Metadata instance called 'more_metadata', containing the following fields:
 
           an_array:   2D-array
           none:       None
           tup:        (1, 2, 3)
           list:       ['a', 'b', 'c']
 )}

In [144]:
node2.metadata['more_metadata']

Metadata( A Metadata instance called 'more_metadata', containing the following fields:

          an_array:   2D-array
          none:       None
          tup:        (1, 2, 3)
          list:       ['a', 'b', 'c']
)

In [145]:
node2.metadata['more_metadata']['an_array']

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [146]:
# You can build and populate a new Metadata instance and add it to a node all in one command
# by passing a dictionary to the `data` argument in the `Metadata` constructor


node2.metadata = emd.Metadata(
    name = 'even_more_metadata',
    data = {
        'x' : True,
        'y' : False
    }
)

node2.metadata

{'i_never_metadata_i_didnt_like': Metadata( A Metadata instance called 'i_never_metadata_i_didnt_like', containing the following fields:
 
           key:      value
           answer:   42
           bool:     True
 ),
 'more_metadata': Metadata( A Metadata instance called 'more_metadata', containing the following fields:
 
           an_array:   2D-array
           none:       None
           tup:        (1, 2, 3)
           list:       ['a', 'b', 'c']
 ),
 'even_more_metadata': Metadata( A Metadata instance called 'even_more_metadata', containing the following fields:
 
           x:   True
           y:   False
 )}

In [147]:
# metadata does not show up in the emd tree - you have to get it from the nodes

root.tree()

/
|---nodoubt
    |---depechenode


In [148]:
root.tree('nodoubt').metadata

{}

In [149]:
root.tree('nodoubt/depechenode').metadata

{'i_never_metadata_i_didnt_like': Metadata( A Metadata instance called 'i_never_metadata_i_didnt_like', containing the following fields:
 
           key:      value
           answer:   42
           bool:     True
 ),
 'more_metadata': Metadata( A Metadata instance called 'more_metadata', containing the following fields:
 
           an_array:   2D-array
           none:       None
           tup:        (1, 2, 3)
           list:       ['a', 'b', 'c']
 ),
 'even_more_metadata': Metadata( A Metadata instance called 'even_more_metadata', containing the following fields:
 
           x:   True
           y:   False
 )}

In [150]:
# Metadata is carried along during write/read. In the output below, the loaded
# Metadata instances and their fields appear in alphabetical order

clean()
emd.save(filepath, root)
loaded_data = emd.read(filepath)
loaded_data.tree('depechenode').metadata

{'even_more_metadata': Metadata( A Metadata instance called 'even_more_metadata', containing the following fields:
 
           x:   True
           y:   False
 ),
 'i_never_metadata_i_didnt_like': Metadata( A Metadata instance called 'i_never_metadata_i_didnt_like', containing the following fields:
 
           answer:   42
           bool:     True
           key:      value
 ),
 'more_metadata': Metadata( A Metadata instance called 'more_metadata', containing the following fields:
 
           an_array:   2D-array
           list:       ['a', 'b', 'c']
           none:       None
           tup:        (1, 2, 3)
 )}

<a id='rootmetadata'></a>


## 4.3 Root metadata

The root of an EMD tree is always loaded along with any nodes read from that tree, regardless of where they sit in the tree.  This means that, by default, root metadata stays attached to every piece of data read from that tree.  This can be useful if, for instance, some calibrations apply to many related pieces of data all living in one tree.

In [151]:
# Add some root metadata

root.metadata = emd.Metadata(
    name = 'i_am_root_metadata',
    data = {
        'thelittledog' : 'laughed',
        'thedish' : 'thespoon'
    }
)

root.metadata['i_am_root_metadata']

Metadata( A Metadata instance called 'i_am_root_metadata', containing the following fields:

          thelittledog:   laughed
          thedish:        thespoon
)

In [152]:
root.tree()

/
|---nodoubt
    |---depechenode


In [153]:
# Let's save the whole tree then load only the last node, 'depechenode'

clean()
emd.save(filepath, root)
loaded_data = emd.read(
    filepath,
    emdpath = 'groot/nodoubt/depechenode',
    tree = False
)

loaded_data

Node( A Node called 'depechenode', containing the following top-level objects in its tree:

)

In [154]:
# The root comes attached to the node...

loaded_data.root

Root( A Node called 'groot', containing the following top-level objects in its tree:

          depechenode              	 (Node)
)

In [155]:
# ...and its metadata comes with it

loaded_data.root.metadata

{'i_am_root_metadata': Metadata( A Metadata instance called 'i_am_root_metadata', containing the following fields:
 
           thedish:        thespoon
           thelittledog:   laughed
 )}

In [156]:
loaded_data.root.metadata['i_am_root_metadata']

Metadata( A Metadata instance called 'i_am_root_metadata', containing the following fields:

          thedish:        thespoon
          thelittledog:   laughed
)

<a id='moretrees'></a>


# 5. More trees, branches, cutting, grafting, and appending

This section covers the remaining, more advanced functionality not described in the prior sections, including
- building, saving, and reading a tree comprised of various data-containing node types ([section 5.1](#buildingtrees))
- removing a branch from a tree to create a new tree, or removing a branch from a tree and attaching it somewhere on another tree ([section 5.2](#cutgraft))
- appending new data to an EMD tree in an existing EMD 1.0 file ([section 5.3](#append))

<a id='buildingtrees'></a>


## 5.1. An example

Here we build a tree comprised of various data-containing node types, save it, and load it again.

In [157]:
clean()

In [158]:
# make some data

ar1 = emd.Array(
    name = 'ar1',
    data = np.arange(12).reshape((3,4))
)
ar2 = emd.Array(
    name = 'ar2',
    data = np.arange(24).reshape((3,4,2)),
    slicelabels = ('a','b')
)
node = emd.Node(
    name = 'immanode'
)
pointlist1 = emd.PointList(
    name = 'pointlist1',
    data = np.ones(
        5,
        dtype = [('rx',int),('ry',int)]
    )
)
pointlist2 = emd.PointList(
    name = 'pointlist2',
    data = np.zeros(
        6,
        dtype = [('qx',float),('qy',float)]
    )
)
pointlistarray = emd.PointListArray(
    name = 'pointlistarray',
    shape = (3,4),
    dtype = [('yes',bool),('no',bool)]
)
for rx in range(pointlistarray.shape[0]):
    for ry in range(pointlistarray.shape[1]):
        pointlistarray[rx,ry].add(
            np.ones(
                int(ry + rx*pointlistarray.shape[1]),
                dtype = [('yes',bool),('no',bool)]
            )
        )
        
# add some metadata
pointlist1.metadata = emd.Metadata(
    name = 'evolution',
    data = {
        'pikachu' : 'raichu',
        'thunderstone' : True
    }
)
pointlistarray.metadata = emd.Metadata(
    name = 'is_rodent',
    data = {
        'gerbil' : True,
        'mouse' : True,
        'pikachu' : True,
        'bulbasaur' : 'False'
    }
)

In [159]:
# Make a tree

# start with a Root
root = emd.Root( name='worldtree' )

# and add data
root.tree(node)
node.tree(pointlistarray)
pointlistarray.tree(pointlist1)
root.tree(ar1)
root.tree(pointlist2)
node.tree(ar2)

# show the tree
root.tree()

/
|---immanode
|   |---pointlistarray
|   |   |---pointlist1
|   |---ar2
|---ar1
|---pointlist2


In [160]:
# save

emd.save(filepath,root)

100%|████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 1988.76it/s]


In [161]:
# load

loaded_data = emd.read(filepath)

loaded_data

Reading PointListArray: 100%|█████████████████████████████████████| 12/12 [00:00<00:00, 2770.80PointList/s]


Root( A Node called 'worldtree', containing the following top-level objects in its tree:

          ar1                      	 (Array)
          immanode                 	 (Node)
          pointlist2               	 (PointList)
)

In [162]:
loaded_data.tree()

/
|---ar1
|---immanode
|   |---ar2
|   |---pointlistarray
|       |---pointlist1
|---pointlist2


In [163]:
# check that the data is the same

assert(np.array_equal( loaded_data.tree('ar1').data, ar1.data ))
assert(np.array_equal( loaded_data.tree('immanode/ar2').data, ar2.data ))
assert(np.array_equal( loaded_data.tree('immanode/pointlistarray/pointlist1').data, pointlist1.data ))
assert(np.array_equal( loaded_data.tree('pointlist2').data, pointlist2.data ))

In [164]:
# check that the metadata is the same

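def check_metadata(obj1, obj2):
    """ Asserts that each Metadata instance in obj1 matches the instance of the
    same name in obj2. Array-valued entries are compared with np.array_equal.
    """
    for name in obj1.metadata.keys():
        md_i, md_f = obj1.metadata[name], obj2.metadata[name]
        # `.keys` is a property of Metadata, so no parentheses here
        for key in md_i.keys:
            if isinstance(md_i[key], np.ndarray):
                assert np.array_equal(md_i[key], md_f[key])
            else:
                assert md_i[key] == md_f[key]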
            
check_metadata(
    loaded_data.tree('immanode/pointlistarray/pointlist1'),
    pointlist1
)
check_metadata(
    loaded_data.tree('immanode/pointlistarray'),
    pointlistarray
)

In [165]:
# get some node from root

data = root.tree('immanode/pointlistarray/pointlist1')

data

PointList( A length 5 PointList called 'pointlist1',
           with 2 fields:

           rx   (int64)
           ry   (int64)
)

In [166]:
# get some node from another, upstream node

data = pointlistarray.tree('pointlist1')

data

PointList( A length 5 PointList called 'pointlist1',
           with 2 fields:

           rx   (int64)
           ry   (int64)
)

In [167]:
# get some node from another node, using a path referenced to the root

data = pointlistarray.tree('/immanode/pointlistarray/pointlist1')

data

PointList( A length 5 PointList called 'pointlist1',
           with 2 fields:

           rx   (int64)
           ry   (int64)
)

<a id='cutgraft'></a>


## 5.2 Cutting and grafting branches


In this section we'll make a new tree and use it to demonstrate
- cutting branches off a parent tree to yield some new, smaller tree, and
- grafting a branch from one tree to another

In [168]:
# make some data

ar3 = emd.Array(
    name = 'ar3',
    data = np.arange(12,22).reshape((5,2))
)
node2 = emd.Node(
    name = 'node2'
)
node3 = emd.Node(
    name = 'node3'
)
pointlist3 = emd.PointList(
    name = 'pointlist3',
    data = np.ones(
        3,
        dtype = [('rx',int),('ry',int)]
    )
)
pointlist4 = emd.PointList(
    name = 'pointlist4',
    data = np.zeros(
        7,
        dtype = [('qx',float),('qy',float)]
    )
)

In [169]:
# make a tree

root2 = emd.Root( name='treeofknowledge')
root2.tree(node2)
node2.tree(ar3)
node2.tree(pointlist3)
pointlist3.tree(node3)
root2.tree(pointlist4)

root2.tree()

/
|---node2
|   |---ar3
|   |---pointlist3
|       |---node3
|---pointlist4


In [170]:
# Cut a branch off of the tree

new_root = pointlist3.tree(cut=True)

new_root

Root( A Node called 'treeofknowledge_cut_pointlist3', containing the following top-level objects in its tree:

          pointlist3               	 (PointList)
)

In [171]:
# Show the original tree and cut off branch

root2.tree()
print()
new_root.tree()

/
|---node2
|   |---ar3
|---pointlist4

/
|---pointlist3
    |---node3


In [172]:
# Show the two trees
root2.tree()
print()
root.tree()

/
|---node2
|   |---ar3
|---pointlist4

/
|---immanode
|   |---pointlistarray
|   |   |---pointlist1
|   |---ar2
|---ar1
|---pointlist2


In [173]:
# Graft a branch from root2 at node2 onto root at pointlist1


# perform the graft

node2.graft(pointlist1)               # Note that these two lines are just different syntax
#node2.tree( graft=pointlist1 )         # They perform an identical graft operation


# showing the two trees
root2.tree()
print()
root.tree()

/
|---pointlist4

/
|---immanode
|   |---pointlistarray
|   |   |---pointlist1
|   |       |---node2
|   |           |---ar3
|   |---ar2
|---ar1
|---pointlist2


<a id='cutgraft_rootmetadata'></a>


### 5.2.1 Cut/graft root metadata options

`emdfile` tries to keep roots and their metadata with all the associated data in their tree.  When cutting a branch, a new root is created.  When grafting a branch, the root of the target tree becomes the new root of the branch being moved.  This raises a question about the original root's metadata: should it be carried along to the new tree's root, or not?

Three options are available.  They are:
- no, leave the old root metadata behind
- yes, copy the old root metadata to the new root
- yes, create a pointer in the new root to the old root metadata

In the last case, the new and old roots will both point to the same Metadata instances, and changing values in one will change both.
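The difference between the copy and pointer options is ordinary Python object identity. Using plain dictionaries as stand-ins for Metadata instances (an illustration, not emdfile code):

```python
import copy

# root metadata of the original tree: name -> dictionary of values
old_root_metadata = {'metadata1': {'x': 1}}

# option 1: leave it behind -> the new root starts empty
left_behind = {}

# option 2: copy -> independent objects (renamed with a '_copy' suffix,
# echoing the names in the Case 1 output below)
copied = {name + '_copy': copy.deepcopy(md) for name, md in old_root_metadata.items()}

# option 3: pointers -> the new root references the very same objects
pointed = dict(old_root_metadata)

# change a value through the old root
old_root_metadata['metadata1']['x'] = 99
print(copied['metadata1_copy']['x'], pointed['metadata1']['x'])
```

This mirrors the `is` comparisons printed in the cases below: `False` after copying, `True` with pointers.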

In [174]:
# make two data trees, each containing some root metadata

def make_trees():

    # roots
    root1 = emd.Root( name='root1' )
    root2 = emd.Root( name='root2' )

    # nodes
    node1 = emd.Node( name = 'node1' )
    node2 = emd.Node( name = 'node2' )
    node3 = emd.Node( name = 'node3' )
    node4 = emd.Node( name = 'node4' )

    # tree 1
    root1.tree(node1)
    node1.tree(node2)
    node2.tree(node3)
    # add root metadata
    root1.metadata = emd.Metadata(
        name = 'metadata1',
        data = {'x':1}
    )
    root1.metadata = emd.Metadata(
        name = 'metadata2',
        data = {'y':2}
    )
    
    # tree 2
    root2.tree(node4)
    # add root metadata
    root2.metadata = emd.Metadata(
        name = 'metadata3',
        data = {'z':3}
    )
    
    return root1,root2

root1,root2 = make_trees()

In [175]:
# show the trees

root1.tree()
print()
root2.tree()

/
|---node1
    |---node2
        |---node3

/
|---node4


In [176]:
# show the root metadata

print(root1.metadata)
print()
print(root2.metadata)

{'metadata1': Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
), 'metadata2': Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
)}

{'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          z:   3
)}


<a id='cut_rootmetadata'></a>


#### 5.2.1.1 Cutting branches and root metadata

In [177]:
# Case 1: cut a branch, copy root metadata



# make new trees
root1,_ = make_trees()

# show the original trees
print("ORIGINAL TREE")
root1.tree()

# get the target node
target_node = root1.tree('node1/node2')

# show the tree under this node
print()
print("TARGET BRANCH")
target_node.tree()

# cut off the branch
new_root = target_node.tree( cut='copy' )   # specify copying the root metadata



# Show results

# show the old and new trees
print()
print("OLD/NEW TREES")
root1.tree()
print()
new_root.tree()

# show the metadata in the new tree
print()
print("NEW ROOT METADATA")
print(new_root.metadata)

# check whether the new tree's Metadata instances are the same objects as,
# or different objects from, the old tree's corresponding Metadata
print()
print("COMPARE OLD/NEW ROOT METADATA")
print(root1.metadata['metadata1'] is new_root.metadata['metadata1_copy'])

ORIGINAL TREE
/
|---node1
    |---node2
        |---node3

TARGET BRANCH
/
|---node3

OLD/NEW TREES
/
|---node1

/
|---node2
    |---node3

NEW ROOT METADATA
{'metadata1_copy': Metadata( A Metadata instance called 'metadata1_copy', containing the following fields:

          x:   1
), 'metadata2_copy': Metadata( A Metadata instance called 'metadata2_copy', containing the following fields:

          y:   2
)}

COMPARE OLD/NEW ROOT METADATA
False


In [178]:
# Case 2: cut a branch, add pointers to root metadata



# make new trees
root1,_ = make_trees()

# show the original trees
print("ORIGINAL TREE")
root1.tree()

# get the target node
target_node = root1.tree('node1/node2')

# show the tree under this node
print()
print("TARGET BRANCH")
target_node.tree()

# cut off the branch
new_root = target_node.tree( cut=True )   # specify adding pointers to the root metadata



# Show results

# show the old and new trees
print()
print("OLD/NEW TREES")
root1.tree()
print()
new_root.tree()

# show the metadata in the new tree
print()
print("NEW ROOT METADATA")
print(new_root.metadata)

# check whether the new tree's Metadata is the same object as the
# corresponding Metadata in the old tree
print()
print("COMPARE OLD/NEW ROOT METADATA")
print(root1.metadata['metadata1'] is new_root.metadata['metadata1'])

ORIGINAL TREE
/
|---node1
    |---node2
        |---node3

TARGET BRANCH
/
|---node3

OLD/NEW TREES
/
|---node1

/
|---node2
    |---node3

NEW ROOT METADATA
{'metadata1': Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
), 'metadata2': Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
)}

COMPARE OLD/NEW ROOT METADATA
True


In [179]:
# Case 3: cut a branch, leave root metadata behind



# make new trees
root1,_ = make_trees()

# show the original tree
print("ORIGINAL TREE")
root1.tree()

# get the target node
target_node = root1.tree('node1/node2')

# show the tree under this node
print()
print("TARGET BRANCH")
target_node.tree()

# cut off the branch
new_root = target_node.tree( cut=False )   # specify *not* carrying the root metadata to the new tree



# Show results

# show the old and new trees
print()
print("OLD/NEW TREES")
root1.tree()
print()
new_root.tree()

# show the metadata in the new tree
print()
print("NEW ROOT METADATA")
print(new_root.metadata)

ORIGINAL TREE
/
|---node1
    |---node2
        |---node3

TARGET BRANCH
/
|---node3

OLD/NEW TREES
/
|---node1

/
|---node2
    |---node3

NEW ROOT METADATA
{}
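The three cases differ only in what happens to the old root's metadata. As a rough mental model of the outputs above, and not the `emdfile` implementation, the rule can be sketched with plain dicts standing in for `Root.metadata` and a hypothetical helper `cut_root_metadata`. The `'copy'`/`True`/`False` values mirror the `cut` argument used above, and the `_copy` suffix is taken from the Case 1 output.

```python
import copy


def cut_root_metadata(old_root_md, cut):
    """Toy model of how a cut branch's new root gets its metadata.

    old_root_md: dict mapping names to metadata (stands in for Root.metadata)
    cut: 'copy', True, or False, mirroring .tree(cut=...) above
    """
    if cut == 'copy':
        # independent copies, renamed with a '_copy' suffix (Case 1)
        return {name + '_copy': copy.deepcopy(md) for name, md in old_root_md.items()}
    if cut is True:
        # the very same objects, shared by the old and new roots (Case 2)
        return dict(old_root_md)
    if cut is False:
        # root metadata is left behind (Case 3)
        return {}
    raise ValueError(f"unrecognized cut option: {cut!r}")


old = {'metadata1': {'x': 1}, 'metadata2': {'y': 2}}

copied = cut_root_metadata(old, 'copy')
shared = cut_root_metadata(old, True)
dropped = cut_root_metadata(old, False)

print(sorted(copied))                                # ['metadata1_copy', 'metadata2_copy']
print(old['metadata1'] is copied['metadata1_copy'])  # False
print(old['metadata1'] is shared['metadata1'])       # True
print(dropped)                                       # {}
```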


<a id='graft_rootmetadata'></a>


#### 4.2.1.2 Grafting branches and root metadata

In [180]:
# Case 1: graft a branch, copy root metadata



# make new trees
root1,root2 = make_trees()

# show the original trees
print("ORIGINAL TREES")
root1.tree()
print()
root2.tree()

# get the source and target nodes
source_node = root1.tree('node1/node2')
target_node = root2.tree('node4')

# show the tree under these nodes
print()
print("SOURCE AND TARGET BRANCHES")
source_node.tree()
target_node.tree()

# perform the graft
source_node.tree(graft = (target_node,'copy'))   # This syntax is a little clunky :|




# Show results

# show the old and new trees
print()
print("OLD/NEW TREES")
root1.tree()
print()
root2.tree()

# show the metadata in the new tree
print()
print("NEW ROOT METADATA")
print(root2.metadata)

# check whether the new tree's Metadata is the same object as the
# corresponding Metadata in the old tree
print()
print("COMPARE OLD/NEW ROOT METADATA")
print(root1.metadata['metadata1'] is root2.metadata['metadata1_copy'])

ORIGINAL TREES
/
|---node1
    |---node2
        |---node3

/
|---node4

SOURCE AND TARGET BRANCHES
/
|---node3
/

OLD/NEW TREES
/
|---node1

/
|---node4
    |---node2
        |---node3

NEW ROOT METADATA
{'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          z:   3
), 'metadata1_copy': Metadata( A Metadata instance called 'metadata1_copy', containing the following fields:

          x:   1
), 'metadata2_copy': Metadata( A Metadata instance called 'metadata2_copy', containing the following fields:

          y:   2
)}

COMPARE OLD/NEW ROOT METADATA
False


In [181]:
# Case 2: graft a branch, add pointers to root metadata




# make new trees
root1,root2 = make_trees()

# show the original trees
print("ORIGINAL TREES")
root1.tree()
print()
root2.tree()

# get the source and target nodes
source_node = root1.tree('node1/node2')
target_node = root2.tree('node4')

# show the tree under these nodes
print()
print("SOURCE AND TARGET BRANCHES")
source_node.tree()
target_node.tree()

# perform the graft
source_node.tree(graft = (target_node,True))   # This syntax is a little clunky :|
#source_node.tree(graft = target_node)          # This is equivalent to the above line
                                               # (i.e. this is the default behavior)




# Show results

# show the old and new trees
print()
print("OLD/NEW TREES")
root1.tree()
print()
root2.tree()

# show the metadata in the new tree
print()
print("NEW ROOT METADATA")
print(root2.metadata)

# check whether the new tree's Metadata is the same object as the
# corresponding Metadata in the old tree
print()
print("COMPARE OLD/NEW ROOT METADATA")
print(root1.metadata['metadata1'] is root2.metadata['metadata1'])

ORIGINAL TREES
/
|---node1
    |---node2
        |---node3

/
|---node4

SOURCE AND TARGET BRANCHES
/
|---node3
/

OLD/NEW TREES
/
|---node1

/
|---node4
    |---node2
        |---node3

NEW ROOT METADATA
{'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          z:   3
), 'metadata1': Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
), 'metadata2': Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
)}

COMPARE OLD/NEW ROOT METADATA
True


In [182]:
# Case 3: graft a branch, leave root metadata behind




# make new trees
root1,root2 = make_trees()

# show the original trees
print("ORIGINAL TREES")
root1.tree()
print()
root2.tree()

# get the source and target nodes
source_node = root1.tree('node1/node2')
target_node = root2.tree('node4')

# show the tree under these nodes
print()
print("SOURCE AND TARGET BRANCHES")
source_node.tree()
target_node.tree()

# perform the graft
source_node.tree(graft = (target_node,False))   # This syntax is a little clunky :|




# Show results

# show the old and new trees
print()
print("OLD/NEW TREES")
root1.tree()
print()
root2.tree()

# show the metadata in the new tree
print()
print("NEW ROOT METADATA")
print(root2.metadata)

ORIGINAL TREES
/
|---node1
    |---node2
        |---node3

/
|---node4

SOURCE AND TARGET BRANCHES
/
|---node3
/

OLD/NEW TREES
/
|---node1

/
|---node4
    |---node2
        |---node3

NEW ROOT METADATA
{'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          z:   3
)}
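The `is` checks above matter because pointers mean sharing: after a Case 2 graft both roots hold the very same Metadata objects, so a change made through one tree is visible from the other, while the Case 1 copies are independent. The sketch below illustrates this with plain dicts standing in for Metadata instances; `graft_root_metadata` is a hypothetical helper modelling the observed behaviour, not part of `emdfile`.

```python
import copy


def graft_root_metadata(source_md, target_md, mode):
    """Toy model: bring a source root's metadata into a target root.

    mode mirrors the second element of graft=(target_node, mode) above:
    'copy' adds independent copies with a '_copy' suffix, True adds the
    same objects (pointers), and False leaves the target unchanged.
    """
    if mode == 'copy':
        for name, md in source_md.items():
            target_md[name + '_copy'] = copy.deepcopy(md)
    elif mode is True:
        target_md.update(source_md)
    elif mode is not False:
        raise ValueError(f"unrecognized mode: {mode!r}")
    return target_md


source_md = {'metadata1': {'x': 1}, 'metadata2': {'y': 2}}

shared = graft_root_metadata(source_md, {'metadata3': {'z': 3}}, True)
copied = graft_root_metadata(source_md, {'metadata3': {'z': 3}}, 'copy')
untouched = graft_root_metadata(source_md, {'metadata3': {'z': 3}}, False)

# edit a field through the source tree's metadata
source_md['metadata1']['x'] = 100

print(shared['metadata1']['x'])       # 100, the same object is shared
print(copied['metadata1_copy']['x'])  # 1, the copy is independent
print(sorted(untouched))              # ['metadata3']
```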


<a id='overwrite'></a>


## 5.3 Overwrite mode

Attempting to save to a path where a file already exists raises an error, unless overwrite mode is used; in that case the existing file is deleted and the new file is written in its place.
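The same guard can be mimicked with ordinary files. This is a stdlib-only sketch of the rule, not `emdfile` code: `write_file` is a hypothetical helper, its default mode `'w'` is an assumption, and the `'o'`/`'overwrite'` strings are borrowed from this notebook.

```python
import os
import tempfile


def write_file(path, text, mode='w'):
    """Write `text` to `path`, refusing to replace an existing file
    unless overwrite mode ('o' or 'overwrite') is requested."""
    if os.path.exists(path):
        if mode in ('o', 'overwrite'):
            os.remove(path)  # delete the existing file, then write anew
        else:
            raise FileExistsError(
                f"A file already exists at {path}; use overwrite mode or choose a new path."
            )
    with open(path, 'x') as f:  # 'x' creates the file and fails if it exists
        f.write(text)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.txt')
    write_file(path, 'tree 1')

    # default mode: the existing file is protected
    try:
        write_file(path, 'tree 2')
        refused = False
    except FileExistsError:
        refused = True

    # overwrite mode: the existing file is replaced
    write_file(path, 'tree 2', mode='o')
    with open(path) as f:
        contents = f.read()

print(refused, contents)  # True tree 2
```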

In [183]:
# remove files currently at `filepath` 
clean()

In [184]:
# make new trees
root1,root2 = make_trees()

In [185]:
# write a new file
emd.save(
    filepath,
    root1
)

# examine the file
emd.print_h5_tree(filepath)

/
|---root1
    |---node1
        |---node2
            |---node3




In [186]:
# attempt to write to the same path without overwrite mode,
# raising an error

emd.save(
    filepath,
    root2
)

AssertionError: A file already exists at this destination; use append or overwrite mode, or choose a new file path.

In [None]:
# write the new tree to the same filepath, using overwrite mode

emd.save(
    filepath,
    root2,
    mode = 'o'
    #mode = 'overwrite'    # identical to the line above
)

# examine the file
emd.print_h5_tree(filepath)

<a id='append'></a>


## 5.4 Appending to EMD files


The simplest case of file appending is adding a new tree to an existing EMD file.

We can also append new nodes or branches to existing trees in EMD files.  When appending an entire branch to an existing tree, the runtime branch may consist entirely of new nodes, in which case no conflict resolution is necessary.  Alternatively, the new branch may contain some new nodes and some nodes already present in the target H5 branch; in this case we can either skip or overwrite the conflicting nodes.

When appending branches to existing trees, the root metadata in the new branch may not match the root metadata of the existing EMD tree.  New root metadata is added to the existing root.  In the case of conflicting root metadata, we can either ignore the new root metadata, or overwrite the old root metadata.

In [209]:
# Define some trees for append examples

def make_append_trees():
    
    # tree 1
    root1 = emd.Root( name = 'tree1' )
    x1 = emd.Array( name='x', data=np.full((2,2),1) )
    y1 = emd.Array( name='y', data=np.full((2,2),1) )
    z1 = emd.Array( name='z', data=np.full((2,2),1) )
    w1 = emd.Array( name='w', data=np.full((2,2),1) )
    root1.tree(x1)
    x1.tree(y1)
    x1.tree(z1)
    z1.tree(w1)
    
    # tree 2
    root2 = emd.Root( name = 'tree2' )
    alpha = emd.Node( name='alpha' )
    beta = emd.Node( name='beta' )
    root2.tree(alpha)
    alpha.tree(beta)
    
    return root1,root2


# make the trees
root1,root2 = make_append_trees()

# show the trees
root1.tree()
print()
root2.tree()

/
|---x
    |---y
    |---z
        |---w

/
|---alpha
    |---beta


<a id='appendtree'></a>


### 5.4.1 Append a new tree

If we append a tree to an existing EMD file and no tree with this name (i.e. the root name) exists there, a new tree is added to the file.

In [210]:
clean()

In [211]:
# Write a new file with only tree1

emd.save(
    filepath,
    root1
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
    |---x
        |---y
        |---z
            |---w




In [212]:
# Append tree2

emd.save(
    filepath,
    root2,
    mode = 'a'
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
|   |---x
|       |---y
|       |---z
|           |---w
|---tree2
    |---alpha
        |---beta




<a id='appendbranch'></a>


### 5.4.2 Append from a runtime tree to a different EMD tree

The `emdpath` argument specifies a target node inside the existing EMD file that we'd like to append onto.  If the data we pass is a runtime tree that does not match an existing EMD tree, and we specify a node in an existing tree with `emdpath`, the data is appended onto that target node.
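In the next example, the children of the runtime root (`alpha`, with `beta` beneath it) end up under `/tree1/x/y`, while the runtime root itself is not written as a separate group. A toy model of that placement, using nested dicts in place of HDF5 groups and hypothetical helpers `resolve` and `append_at` (this is not how `emdfile` is implemented, and it assumes the appended names are new to the target):

```python
def resolve(tree, emdpath):
    """Walk a nested-dict tree along a '/'-separated path and return the
    dict holding that node's children (KeyError if the path is missing)."""
    node = tree
    for name in emdpath.strip('/').split('/'):
        node = node[name]
    return node


def append_at(emd_file, emdpath, runtime_root_children):
    """Toy model: attach a runtime root's children beneath the node at `emdpath`."""
    target = resolve(emd_file, emdpath)
    for name, subtree in runtime_root_children.items():
        target[name] = subtree
    return emd_file


# file contents after In [214], and the children of the runtime root 'tree2'
emd_file = {'tree1': {'x': {'y': {}, 'z': {'w': {}}}}}
tree2_children = {'alpha': {'beta': {}}}

append_at(emd_file, '/tree1/x/y', tree2_children)
print(emd_file)
# {'tree1': {'x': {'y': {'alpha': {'beta': {}}}, 'z': {'w': {}}}}}
```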

In [213]:
clean()

In [214]:
# Write a new file

emd.save(
    filepath,
    root1
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
    |---x
        |---y
        |---z
            |---w




In [215]:
# Show the runtime tree we want to append

root2.tree()

/
|---alpha
    |---beta


In [216]:
# Append to a target node in an existing EMD tree

emd.save(
    filepath,
    root2,
    #root2.tree('alpha'),
    mode = 'a',
    emdpath = '/tree1/x/y',
    #tree = False
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
    |---x
        |---y
        |   |---alpha
        |       |---beta
        |---z
            |---w




<a id='append_diffmerge'></a>


### 5.4.3 Merge a runtime tree and an EMD tree

If we pass a runtime tree with the same name as an existing EMD tree, append mode attempts to match the runtime tree to the existing EMD tree and then performs a diffmerge; that is, it looks for nodes in the source (runtime) tree which are not present in the target (EMD) tree, and adds them.

If `emdpath` is set, then only nodes at or downstream of the specified target node are modified in the diffmerge.  New nodes in the runtime tree which are not at or downstream of the target node (e.g. nodes on some other branch) are not written.  Similarly, if the node passed as data is not the root node, only data at or downstream of this source node is included in the append.

If appendover mode (`mode = 'ao'`) is set, append also overwrites any nodes present in both the source and target trees.  Note that overwriting in this way unlinks the overwritten data but **does not** free the associated storage space.
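A diffmerge of this kind amounts to a recursive walk over both trees. Below is a toy model, assuming trees are nested dicts that map node names to their children; `diffmerge` is a hypothetical helper, not an `emdfile` function, and it covers only the plain append case (shared nodes are left untouched). Restricting the merge with `emdpath` corresponds to starting the walk at the matching node of each tree; the two calls mirror the results of In [221] and In [223].

```python
import copy


def diffmerge(source, target):
    """Toy diffmerge: add to `target` every node of `source` that it lacks.

    Trees are nested dicts mapping node names to dicts of child nodes.
    Nodes present in both trees are kept as they are (plain append mode).
    """
    for name, source_children in source.items():
        if name in target:
            diffmerge(source_children, target[name])  # shared node: recurse
        else:
            target[name] = copy.deepcopy(source_children)  # new node: add subtree
    return target


# the modified runtime tree from In [220]
runtime = {
    'x': {'y': {'gamma': {}}, 'z': {'w': {'delta': {}}}, 'beta': {}},
    'alpha': {},
}

# full append (cf. In [221]): every new node is added
on_disk = {'x': {'y': {}, 'z': {'w': {}}}}
diffmerge(runtime, on_disk)

# emdpath 'tree1/x' (cf. In [223]): start the walk at node x in both trees
on_disk_x = {'x': {'y': {}, 'z': {'w': {}}}}
diffmerge(runtime['x'], on_disk_x['x'])

print(on_disk == runtime)    # True
print('alpha' in on_disk_x)  # False, alpha lies outside the target branch
```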

In [217]:
clean()

In [218]:
# make a tree
root1,_ = make_append_trees()

# show the tree
root1.tree()

/
|---x
    |---y
    |---z
        |---w


In [219]:
# Write a new file

emd.save(
    filepath,
    root1
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
    |---x
        |---y
        |---z
            |---w




In [220]:
# Modify the tree

alpha = emd.Node( name='alpha' )
beta = emd.Node( name='beta' )
gamma = emd.Node( name='gamma' )
delta = emd.Node( name='delta' )

root1.tree(alpha)
root1.tree('x').tree(beta)
root1.tree('x/y').tree(gamma)
root1.tree('x/z/w').tree(delta)

root1.tree()

/
|---x
|   |---y
|   |   |---gamma
|   |---z
|   |   |---w
|   |       |---delta
|   |---beta
|---alpha


In [221]:
# Append

emd.save(
    filepath,
    root1,
    mode = 'a'
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
    |---alpha
    |---x
        |---beta
        |---y
        |   |---gamma
        |---z
            |---w
                |---delta




In [222]:
# Do the same, but this time specify an `emdpath`

clean()

# make a tree
root1,_ = make_append_trees()

# Write a new file
emd.save(
    filepath,
    root1
)

# Modify the tree
alpha = emd.Node( name='alpha' )
beta = emd.Node( name='beta' )
gamma = emd.Node( name='gamma' )
delta = emd.Node( name='delta' )

root1.tree(alpha)
root1.tree('x').tree(beta)
root1.tree('x/y').tree(gamma)
root1.tree('x/z/w').tree(delta)


# Show the current state of the EMD file and the runtime tree
emd.print_h5_tree(filepath)
root1.tree()

/
|---tree1
    |---x
        |---y
        |---z
            |---w


/
|---x
|   |---y
|   |   |---gamma
|   |---z
|   |   |---w
|   |       |---delta
|   |---beta
|---alpha


In [223]:
# Append

emd.save(
    filepath,
    root1,
    mode = 'a',
    emdpath = 'tree1/x'
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
    |---x
        |---beta
        |---y
        |   |---gamma
        |---z
            |---w
                |---delta




In [224]:
# Do the same, but this time specify an `emdpath` and use a non-root runtime node

clean()

# make a tree
root1,_ = make_append_trees()

# Write a new file
emd.save(
    filepath,
    root1
)

# Modify the tree
alpha = emd.Node( name='alpha' )
beta = emd.Node( name='beta' )
gamma = emd.Node( name='gamma' )
delta = emd.Node( name='delta' )

root1.tree(alpha)
root1.tree('x').tree(beta)
root1.tree('x/y').tree(gamma)
root1.tree('x/z/w').tree(delta)


# Show the current state of the EMD file and the runtime tree
emd.print_h5_tree(filepath)
root1.tree()

/
|---tree1
    |---x
        |---y
        |---z
            |---w


/
|---x
|   |---y
|   |   |---gamma
|   |---z
|   |   |---w
|   |       |---delta
|   |---beta
|---alpha


In [225]:
# Append

emd.save(
    filepath,
    root1.tree('x'),
    mode = 'a',
    emdpath = 'tree1/x'
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
    |---x
        |---beta
        |---y
        |   |---gamma
        |---z
            |---w
                |---delta




<a id='appendover'></a>


### 5.4.4 Append from a runtime tree to the same EMD tree

As in the previous section, when we pass a runtime tree with the same name as an existing EMD tree, append mode performs a diffmerge, restricted by `emdpath` and by the choice of source node in the same way.

Here we additionally set appendover mode (`mode = 'ao'`), which also overwrites any nodes present in both the source and target trees.  Note that overwriting in this way unlinks the overwritten data but **does not** free the associated storage space.
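To see which nodes get overwritten, here is a toy model of an appendover merge that starts from a source node, under stated assumptions: each node is a dict holding a `data` payload and named `children`, the integers 1 and 2 stand in for the arrays `np.full((2,2),1)` and `np.full((2,2),2)`, and `node`, `merge` and `file_tree` are hypothetical helpers rather than `emdfile` functions. The first call mirrors In [250] (source `x/y`, `emdpath = 'tree1/x/'`); the second shows plain append mode, where the conflicting node is skipped but its new child is still added.

```python
import copy


def node(data, **children):
    """Build a toy node: a data payload plus a dict of named child nodes."""
    return {'data': data, 'children': children}


def merge(name, source, target_children, overwrite):
    """Toy append: merge the source node `name` into a target's children.

    New nodes are added with their whole subtree.  A node present in both
    trees takes the source's data only if `overwrite` is True (appendover);
    otherwise its stored data is kept.  Children are merged recursively.
    """
    if name not in target_children:
        target_children[name] = copy.deepcopy(source)
        return
    existing = target_children[name]
    if overwrite:
        existing['data'] = copy.deepcopy(source['data'])
    for child_name, child in source['children'].items():
        merge(child_name, child, existing['children'], overwrite)


def file_tree():
    """The on-disk tree1 from In [246]: every node stores 1."""
    return {'x': node(1, y=node(1), z=node(1, w=node(1)))}


# runtime source node x/y, modified to store 2 and given a new child gamma
runtime_y = node(2, gamma=node(None))

# appendover: emdpath 'tree1/x/' selects x's children as the target
over = file_tree()
merge('y', runtime_y, over['x']['children'], overwrite=True)

# plain append: the conflicting node y is skipped, but gamma is still added
plain = file_tree()
merge('y', runtime_y, plain['x']['children'], overwrite=False)

x = over['x']
# prints: 1 2 1 1
print(x['data'], x['children']['y']['data'],
      x['children']['z']['data'], x['children']['z']['children']['w']['data'])
# prints: 1 ['gamma']
print(plain['x']['children']['y']['data'], sorted(plain['x']['children']['y']['children']))
```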

In [244]:
clean()

In [245]:
# make a tree
root1,_ = make_append_trees()

# show the tree
root1.tree()

/
|---x
    |---y
    |---z
        |---w


In [246]:
# Write a new file

emd.save(
    filepath,
    root1
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
    |---x
        |---y
        |---z
            |---w




In [247]:
# Currently all the nodes store the array ([[1,1],[1,1]])

print(root1.tree('x').data)
print()
print(root1.tree('x/y').data)
print()
print(root1.tree('x/z').data)
print()
print(root1.tree('x/z/w').data)
print()

[[1 1]
 [1 1]]

[[1 1]
 [1 1]]

[[1 1]
 [1 1]]

[[1 1]
 [1 1]]



In [248]:
# Modify the tree

# Here we both add new nodes, AND modify some existing nodes


# new nodes
alpha = emd.Node( name='alpha' )
beta = emd.Node( name='beta' )
gamma = emd.Node( name='gamma' )
delta = emd.Node( name='delta' )
root1.tree(alpha)
root1.tree('x').tree(beta)
root1.tree('x/y').tree(gamma)
root1.tree('x/z/w').tree(delta)

# modify existing nodes
root1.tree('x').data = np.full((2,2),2)
root1.tree('x/y').data = np.full((2,2),2)
root1.tree('x/z').data = np.full((2,2),2)
root1.tree('x/z/w').data = np.full((2,2),2)

root1.tree()

/
|---x
|   |---y
|   |   |---gamma
|   |---z
|   |   |---w
|   |       |---delta
|   |---beta
|---alpha


In [249]:
# Now we have new nodes, and all the original nodes store the array ([[2,2],[2,2]])

print(root1.tree('x').data)
print()
print(root1.tree('x/y').data)
print()
print(root1.tree('x/z').data)
print()
print(root1.tree('x/z/w').data)
print()

[[2 2]
 [2 2]]

[[2 2]
 [2 2]]

[[2 2]
 [2 2]]

[[2 2]
 [2 2]]



In [250]:
# Append and overwrite nodes

emd.save(
    filepath,
    root1.tree('x/y'),
    mode = 'ao',           # appendover mode
    emdpath = 'tree1/x/'
)

# show
emd.print_h5_tree(filepath)

/
|---tree1
    |---x
        |---y
        |   |---gamma
        |---z
            |---w




In [251]:
# Load the results and check which nodes were overwritten

loaded_data = emd.read(filepath).root

loaded_data.tree()

print(loaded_data.tree('x').data)
print()
print(loaded_data.tree('x/y').data)
print()
print(loaded_data.tree('x/z').data)
print()
print(loaded_data.tree('x/z/w').data)
print()

/
|---x
    |---y
    |   |---gamma
    |---z
        |---w
[[1 1]
 [1 1]]

[[2 2]
 [2 2]]

[[1 1]
 [1 1]]

[[1 1]
 [1 1]]



<a id='appendrootmetadata'></a>


### 5.4.5 Appending and root metadata


In append operations where an existing tree is being modified, any root metadata in the source tree root which is not in the EMD tree root is added to the EMD tree root.  If appendover mode is on, any root Metadata instances in the EMD tree root with the same name as Metadata instances in the source tree root are overwritten.
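A minimal sketch of that rule, with plain dicts standing in for root Metadata collections; `merge_root_metadata` and its `appendover` flag are hypothetical names used only for illustration, not part of `emdfile`.

```python
def merge_root_metadata(source_md, target_md, appendover=False):
    """Toy model of root-metadata handling when appending to an existing tree.

    Entries missing from the target are added; entries with the same name
    are replaced only in appendover mode and otherwise left as they were.
    """
    for name, md in source_md.items():
        if name not in target_md or appendover:
            target_md[name] = md
    return target_md


on_disk_md = {'metadata1': {'x': 1}}
runtime_md = {'metadata1': {'x': 10}, 'metadata2': {'y': 2}}

plain = merge_root_metadata(runtime_md, dict(on_disk_md))
over = merge_root_metadata(runtime_md, dict(on_disk_md), appendover=True)

print(plain)  # {'metadata1': {'x': 1}, 'metadata2': {'y': 2}}
print(over)   # {'metadata1': {'x': 10}, 'metadata2': {'y': 2}}
```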