> "We aim to develop a completely open file format flexible enough to store any possible type of electron microscopy data, while also allowing metadata of any type to be included."
>> Colin Ophus, https://emdatasets.com/format/


# `emd` package walkthrough

This notebook is a somewhat detailed walkthrough of the `emd` package from a user-facing perspective. If you haven't looked at `emd_package_basic examples.ipynb`, you may want to start there. For an example implementation of the `emd.Custom` class, see `emd_custom_class_example.py`.

`emd` defines a write and a read function, along with a set of classes which interface between long term storage (HDF5 files) and Python runtime objects.  The classes are designed to quickly build, save to, and read from filetree-like representations of data and metadata.  While the focus here is the Python package, along the way we'll arrive at the EMD 1.0 file specification, to which this package provides a Pythonic interface.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from py4DSTEM import emd

In [3]:
# utility for writing and cleaning a filepath

fp = "/Users/Ben/Desktop/test.h5"

from os.path import exists
from os import remove

def clean():
    if exists(fp):
        remove(fp)
        
clean()

# 1. numpy arrays, Python dictionaries, and lists

This section demos the simplest write operations, which don't even use the `emd` objects yet.  Here we write a numpy array to file, then read the file and confirm that the contents are correct.  Then we do the same thing with a Python dictionary, then with a list of arrays and dictionaries.

## 1.1 `np.array` io


In [4]:
clean()

In [5]:
import numpy as np

In [6]:
# Make a numpy array

ar = np.arange(12).reshape((3,4))
ar

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [7]:
# Save it

emd.save(fp, ar)

In [8]:
# Read it

loaded_data = emd.read(fp)

Reading group /np.array/np.array


In [9]:
# The loaded object is a `Root` - the base file storage unit of the format.
# The message here tells us that the root we've loaded contains a single
# data object, that its type is Array, and that its name is 'np.array'

loaded_data

Root( A Node called 'np.array', containing the following top-level objects in its tree:

          np.array 		 (Array)
)

In [10]:
# We can access the data with the .tree method, using the object's name as a key

loaded_ar = loaded_data.tree('np.array')

In [11]:
# The data is now stored inside an emd.Array object instance

loaded_ar

Array( A 2-dimensional array of shape (3, 4) called 'np.array',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
)

In [12]:
# Its data, like that of all EMD data objects(*), is stored in its .data attribute

loaded_ar.data

# (*) excludes Metadata(**)
# (**) should it?

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [13]:
# Confirm the saved and loaded arrays are identical

assert(np.array_equal( ar , loaded_ar.data ))

## 1.2 `dict`s

In [14]:
clean()

In [15]:
# Make a dictionary

very_important_knowledge = {
    'hey' : 1,
    'diddlediddle' : 1,
    'thecat' : 2,
    'andthe' : 3,
    'fiddle' : 5
}

very_important_knowledge

{'hey': 1, 'diddlediddle': 1, 'thecat': 2, 'andthe': 3, 'fiddle': 5}

In [16]:
# Save it

emd.save(fp, very_important_knowledge)

In [17]:
# Load it

loaded_data = emd.read(fp)

In [18]:
# The loaded data again lives inside a Root
# This time, however, no data appears to live in the tree under this root!

loaded_data

Root( A Node called 'dictionary', containing the following top-level objects in its tree:

)

In [19]:
# The reason is that metadata is handled differently from data...
# The root has a .metadata property, which points to a dictionary of Metadata objects.
# Each Metadata object wraps a single, depth-1 (i.e. non-nested) dictionary
# We only saved a single dictionary, so the .metadata dictionary contains one member:
# a Metadata instance storing the data we saved

loaded_data.metadata

{'dictionary': Metadata( A Metadata instance called 'dictionary', containing the following fields:
 
           andthe:         3
           diddlediddle:   1
           fiddle:         5
           hey:            1
           thecat:         2
 )}

In [20]:
# We can
# ...access the Metadata instance with its key, 'dictionary'

md = loaded_data.metadata['dictionary']

md

Metadata( A Metadata instance called 'dictionary', containing the following fields:

          andthe:         3
          diddlediddle:   1
          fiddle:         5
          hey:            1
          thecat:         2
)

In [21]:
# ...get the dictionary keys

md.keys

dict_keys(['andthe', 'diddlediddle', 'fiddle', 'hey', 'thecat'])

In [22]:
# ...and slice into the object like a normal dictionary

md['thecat']

2

In [23]:
# Metadata instances wrap python dictionaries, with the wrapped dict itself living in ._params

md._params

{'andthe': 3, 'diddlediddle': 1, 'fiddle': 5, 'hey': 1, 'thecat': 2}

In [24]:
# so ._params should match the original, saved dictionary

assert( very_important_knowledge == loaded_data.metadata['dictionary']._params )

In [25]:
# The .__getitem__ and .__setitem__ methods point to ._params, supporting element assignment
# and retrieval while keeping the dictionary itself protected from roving users

md['thelittledoglaughed'] = 256

md

Metadata( A Metadata instance called 'dictionary', containing the following fields:

          andthe:                3
          diddlediddle:          1
          fiddle:                5
          hey:                   1
          thecat:                2
          thelittledoglaughed:   256
)
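
The wrapping pattern described above can be sketched in a few lines. This is only an illustration, not the `emd.Metadata` source: the class name `DictWrapper` is hypothetical, and it mirrors just the behaviour seen in the cells above (a `_params` dictionary, a `keys` property, and item get/set).

```python
class DictWrapper:
    """Hypothetical stand-in for a Metadata-like object (not the emd source)."""

    def __init__(self, name="dictionary", data=None):
        self.name = name
        # Copy, so later edits don't touch the caller's dictionary
        self._params = dict(data) if data is not None else {}

    @property
    def keys(self):
        # Exposed as a property, matching the `md.keys` usage above
        return self._params.keys()

    def __getitem__(self, key):
        return self._params[key]

    def __setitem__(self, key, value):
        self._params[key] = value


md_sketch = DictWrapper(data={"hey": 1, "thecat": 2})
md_sketch["thelittledoglaughed"] = 256
print(list(md_sketch.keys))
```

Routing access through `__getitem__` and `__setitem__` lets users read and assign elements without rebinding the wrapped dictionary itself.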

## 1.3 `list`s

In [26]:
clean()

In [27]:
# Make some data

ar1 = np.zeros((4,5))
ar2 = np.eye(3)
dic = {'cow':'moo','tuple':(1,2,3),'array':np.arange(9).reshape((3,3))}

In [28]:
# Save it

emd.save(fp, [ar1,ar2,dic])

In [29]:
# Load it

loaded_data = emd.read(fp)

Reading group /np.array/np.array_0
Reading group /np.array/np.array_1


In [30]:
# All the loaded data lives in a Root, whose name is determined by the first entry in the list

loaded_data

Root( A Node called 'np.array', containing the following top-level objects in its tree:

          np.array_0 		 (Array)
          np.array_1 		 (Array)
)

In [31]:
# The arrays are accessible via the tree

l_ar1 = loaded_data.tree('np.array_0')
l_ar2 = loaded_data.tree('np.array_1')

assert(np.array_equal( ar1, l_ar1.data ))
assert(np.array_equal( ar2, l_ar2.data ))

In [32]:
# And the dictionary is accessible as metadata

loaded_data.metadata

{'dictionary_0': Metadata( A Metadata instance called 'dictionary_0', containing the following fields:
 
           array:   2D-array
           cow:     moo
           tuple:   (1, 2, 3)
 )}

In [33]:
md = loaded_data.metadata['dictionary_0']

md

Metadata( A Metadata instance called 'dictionary_0', containing the following fields:

          array:   2D-array
          cow:     moo
          tuple:   (1, 2, 3)
)

In [34]:
md['array']

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [35]:
# We'd like to confirm the dictionary wrote/read correctly, however
# this naive approach fails because the dictionary stores an array, which breaks `==` comparison...

assert( dic == md._params )

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [None]:
# ...so we can check that everything worked another way:

for k,v in dic.items():
    assert(k in md.keys)
    if isinstance(v,np.ndarray): assert(np.array_equal(v,md[k]))
    else: assert(v == md[k])
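
The same check can be packaged as a small reusable helper. The function name `params_equal` is hypothetical (it is not part of `emd`); the sketch assumes flat dictionaries whose values are either numpy arrays or objects that support `==`.

```python
import numpy as np


def params_equal(d1, d2):
    """Compare two flat dictionaries whose values may be numpy arrays."""
    if d1.keys() != d2.keys():
        return False
    for key, value in d1.items():
        other = d2[key]
        if isinstance(value, np.ndarray) or isinstance(other, np.ndarray):
            # `==` on arrays is elementwise, so use array_equal instead
            if not np.array_equal(value, other):
                return False
        elif value != other:
            return False
    return True


original = {"cow": "moo", "tuple": (1, 2, 3), "array": np.arange(9).reshape((3, 3))}
round_tripped = {"cow": "moo", "tuple": (1, 2, 3), "array": np.arange(9).reshape((3, 3))}
print(params_equal(original, round_tripped))
```

In this notebook the equivalent call would be `params_equal(dic, md._params)`.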

**Important Note:** 

You may have noticed that the read step looked a little complicated in this section - we write a numpy array, but then when we read it again, it's some crazy object called a Root which you need to slice into with some method called .tree().  WTF?!  It's true that this may seem a little extra in the context of just writing and then reading an array.  If I wrote an array, can't the read function just give me my gosh damned array back?  The answer is that it could - but all the extra structure you're seeing and have to muddle through at this step is what will enable us to store and retrieve arbitrary data and metadata later on.

Could we write a little extra code here to make the package just return the numpy array or Python dictionary in these super simple cases?  Definitely.  This would not be very hard.  It just is not what this package was designed for, so, this is what it looks like right now.  If you'd like to complain about this, please first wait at least 3 hours.  If you're still irked, please find a seashell, hold it to your ear, and listen.  Take three breaths.  Then, if you still feel cranky, please direct any complaints to the shell, as I am confident it will be glad to listen.

# 2 `emd` basics: `Array`, `Metadata`, and `.tree`

In this section we first look at the `Array` class, which represents the original EMD 0.1 file specification as a Python runtime object. We'll look at how `Array` instances carry their own basic metadata with them, and can additionally hold arbitrarily many Python dictionaries in the form of `Metadata` instances. Then, we build a runtime emd tree of data instances.  For each of these cases, we show how these data collections can be written and read to/from file.

## 2.1 `emd.Array`

The `emd.Array` class is our first example of a class representing an EMD data tree node.  Below we look at several attributes and properties of `Array`s.

In [None]:
emd.Array?

### 2.1.1 `emd.Array` as a simple `np.ndarray` wrapper

In [36]:
clean()

In [37]:
# Make an array with no bells or whistles...
# in this case, it's a simple wrapper for a numpy array

ar = emd.Array(
    data = np.array(
        [[1,2,3],
         [7,8,9],
         [40,50,60]]
    )
)

In [38]:
ar

Array( A 2-dimensional array of shape (3, 3) called 'array',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
)

In [39]:
ar.data

array([[ 1,  2,  3],
       [ 7,  8,  9],
       [40, 50, 60]])

In [40]:
# The `emd.Array` class has some properties beyond the data
# These all get carried along in write/read
# Here these were uninitialized, so have default values

print(ar.name)
print(ar.units)
print()
print(ar.dims[0])
print(ar.dims[1])
print()
print(ar.dim_names[0])
print(ar.dim_names[1])
print()
print(ar.dim_units[0])
print(ar.dim_units[1])

array


[0 1 2]
[0 1 2]

dim0
dim1

pixels
pixels


In [41]:
# Save it

emd.save(fp, ar)

In [42]:
# Load it

loaded_data = emd.read(fp)

Reading group /array/array


In [43]:
# As always, the loaded data is a Root, which contains our data

loaded_data

Root( A Node called 'array', containing the following top-level objects in its tree:

          array 		 (Array)
)

In [44]:
# and we can access it via the `.tree` method

l_ar = loaded_data.tree('array')

In [45]:
# and all the properties are the same as for the saved object

assert(np.array_equal(ar.data, l_ar.data))

for idx in range(2):
    assert(ar.dim_names[idx] == l_ar.dim_names[idx])
    assert(ar.dim_units[idx] == l_ar.dim_units[idx])
    assert(np.array_equal(ar.dims[idx], l_ar.dims[idx]))

In [46]:
l_ar.data

array([[ 1,  2,  3],
       [ 7,  8,  9],
       [40, 50, 60]])

### 2.1.2 `emd.Array` with builtin metadata

The `Array` class comes bundled with the basic metadata that calibrates the array.  This includes a name, units, and, for an N-dimensional array, N vectors describing the name, units, and step size of the pixels along each dimension. This metadata is all carried along for the ride at write/read time.

In [47]:
clean()

In [48]:
# Make an emd.Array with the name and dims customized

ar = emd.Array(
    data = np.array(
        [[1,2,3,4],
         [5,6,7,8],
         [9,10,11,12],
         [13,14,15,16]]
    ),
    name = 'my_array',
    units = 'intensity',
    dims = [[0,3],     # setting two numbers will extrapolate the full vector linearly
            [0,0.6]],
    dim_names = ['x','y'],
    dim_units = ['nm','km']
)

In [49]:
ar

Array( A 2-dimensional array of shape (4, 4) called 'my_array',
       with dimensions:

       x = [0,3,...] nm
       y = [0.0,0.6,...] km
)

In [50]:
ar.dims[0]

array([0, 3, 6, 9])
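
One way to picture the two-number shorthand: the pair is read as the first two sample positions, and the rest of the vector continues with the same step. The sketch below reproduces that behaviour with plain numpy. The function name `extrapolate_dim` is hypothetical, and this is not necessarily how `emd.Array` computes the vector internally.

```python
import numpy as np


def extrapolate_dim(first_two, length):
    """Extend a (start, second) pair into a linearly spaced vector."""
    start, second = first_two
    step = second - start
    return start + step * np.arange(length)


dim_x = extrapolate_dim([0, 3], 4)     # integer step of 3
dim_y = extrapolate_dim([0, 0.6], 4)   # float step of 0.6
print(dim_x)
print(dim_y)
```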

In [51]:
# The values of the Array metadata fields were initialized as expected

print(ar.name)
print(ar.units)
print()
print(ar.dims[0])
print(ar.dims[1])
print()
print(ar.dim_names[0])
print(ar.dim_names[1])
print()
print(ar.dim_units[0])
print(ar.dim_units[1])

my_array
intensity

[0 3 6 9]
[0.  0.6 1.2 1.8]

x
y

nm
km


In [52]:
# Save it

emd.save(fp, ar)

In [53]:
# Load it

loaded_data = emd.read(fp)

Reading group /my_array/my_array


In [54]:
# The loaded data is a Root

loaded_data

Root( A Node called 'my_array', containing the following top-level objects in its tree:

          my_array 		 (Array)
)

In [55]:
# and the key to access our data is now the name we set when we made the Array

l_ar = loaded_data.tree('my_array')

In [56]:
# as before, all properties loaded correctly

assert(np.array_equal(ar.data, l_ar.data))

for idx in range(2):
    assert(ar.dim_names[idx] == l_ar.dim_names[idx])
    assert(ar.dim_units[idx] == l_ar.dim_units[idx])
    assert(np.array_equal(ar.dims[idx], l_ar.dims[idx]))

In [57]:
# or we can look at the values directly

print(l_ar.data)
print()
print(l_ar.name)
print(l_ar.units)
print()
print(l_ar.dims[0])
print(l_ar.dims[1])
print()
print(l_ar.dim_names[0])
print(l_ar.dim_names[1])
print()
print(l_ar.dim_units[0])
print(l_ar.dim_units[1])

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]

my_array
intensity

[0 3 6 9]
[0.  0.6 1.2 1.8]

x
y

nm
km


### 2.1.3 `emd.Array` as "stacked" arrays, representing N+1 dimensional data

The syntax below is meant to support multiple arrays which represent N+1 dimensional data - i.e. several data arrays, each with N normal dimensions which are described by normal calibration dim vectors, stacked into a single object.  The individual arrays are then accessible individually by string keys.

In [58]:
clean()

In [59]:
# make some data
# we'll assume we have (2+1)D data - i.e. several 2D arrays in a single Array instance.
# The code below makes 3 4x4 arrays, and then combines them into a single array

data = np.dstack([
    np.arange(16).reshape(4,4),
    np.eye(4),
    np.full((4,4),7.2)
])

data.shape

(4, 4, 3)

In [60]:
# make the Array

ar = emd.Array(
    data = data,
    name = 'my_stack_array',
    units = 'intensity',
    dims = [[0,3],
            [0,0.6]],
    dim_names = ['x','y'],
    dim_units = ['nm','km'],
    slicelabels = ['a','b','c']  # the last dimension is the 'stacking' dimension
)

In [61]:
ar

Array( A stack of 3 Arrays with 2-dimensions and shape (4, 4), called 'my_stack_array'

       The labels are:
           a
           b
           c


       The Array dimensions are:
           x = [0,3,...] nm
           y = [0.0,0.6,...] km
)

In [62]:
# for some (N+1) dimensional stack array, the `.rank` attribute returns N
# for non-stack arrays, `.rank` is identical to the numpy array property `.ndim`
# for stack arrays, `.rank` is one less than `.ndim`

print(ar.rank)
print(ar.data.ndim)

2
3


In [63]:
# the `.depth` property gives the number of slices. For non-stack arrays, it is 0

ar.depth

3
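
The relationship between a stack's labels, rank, and depth can be sketched with plain numpy. This is an illustration under stated assumptions, not the `emd.Array` implementation: the helper `get_slice` and the `index_of` lookup are hypothetical names, and the stacking dimension is taken to be the last axis, as in the cells above.

```python
import numpy as np

stack = np.dstack([
    np.arange(16).reshape(4, 4),
    np.eye(4),
    np.full((4, 4), 7.2),
])
labels = ["a", "b", "c"]

# Map each label to its position along the last (stacking) axis
index_of = {label: i for i, label in enumerate(labels)}


def get_slice(label):
    """Return the 2D array stored under a string label."""
    return stack[..., index_of[label]]


rank = stack.ndim - 1     # number of ordinary, calibrated dimensions
depth = stack.shape[-1]   # number of stacked slices
print(rank, depth, get_slice("b").shape)
```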

#### **TODO:** fix dim vector propagation!

In [64]:
# the slices can be accessed by indexing into the array with string keys,
# returning another Array

ar['a']

Array( A 2-dimensional array of shape (4, 4) called 'my_stack_array_a',
       with dimensions:

       x = [0,3,...] nm
       dim1 = [0,1,...] pixels
)

In [65]:
# Save it

emd.save(fp, ar)

In [66]:
# Load it

loaded_data = emd.read(fp)

Reading group /my_stack_array/my_stack_array


In [67]:
# The loaded data is a Root

loaded_data

Root( A Node called 'my_stack_array', containing the following top-level objects in its tree:

          my_stack_array 		 (Array)
)

In [68]:
l_ar = loaded_data.tree('my_stack_array')

In [69]:
l_ar

Array( A stack of 3 Arrays with 2-dimensions and shape (4, 4), called 'my_stack_array'

       The labels are:
           a
           b
           c


       The Array dimensions are:
           x = [0,3,...] nm
           y = [0.0,0.6,...] km
)

In [70]:
# check that all properties loaded correctly

assert(np.array_equal(ar.data, l_ar.data))

for idx in range(2):
    assert(ar.dim_names[idx] == l_ar.dim_names[idx])
    assert(ar.dim_units[idx] == l_ar.dim_units[idx])
    assert(np.array_equal(ar.dims[idx], l_ar.dims[idx]))

In [71]:
# check that all slices are the same

for key in ar.slicelabels:
    assert(np.array_equal( ar[key].data, l_ar[key].data ))

## 2.2 adding `Metadata` dictionaries

We've seen above that the `emd.Metadata` class wraps Python dictionaries.  The `emd.Array` class, along with the other data classes we'll see a little later, can store any number of these metadata dictionaries, and they'll be carried along during write/read steps in the background.

In [72]:
clean()

In [73]:
# Make some metadata instances

md1 = emd.Metadata(name = 'very_important_metadata')
md1['clouds'] = 'taste metallic'
md1['yoshimi'] = 'battles the pink robots'

md2 = emd.Metadata(name = 'questionable_important_metadata')
md2['quartz'] = 7
md2['TaS2'] = [2,4,6]

print(md1)
print(md2)

Metadata( A Metadata instance called 'very_important_metadata', containing the following fields:

          clouds:    taste metallic
          yoshimi:   battles the pink robots
)
Metadata( A Metadata instance called 'questionable_important_metadata', containing the following fields:

          quartz:   7
          TaS2:     [2, 4, 6]
)


In [74]:
# Make an Array

ar = emd.Array(
    name = 'pi',
    data = np.array(
        [[3,1,4],
         [1,5,9],
         [2,6,5]]
    )
)

In [75]:
ar

Array( A 2-dimensional array of shape (3, 3) called 'pi',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
)

In [76]:
ar.metadata

{}

In [77]:
# Add the metadata

# Note that `.metadata` is a property, not a regular attribute, so the code below
# doesn't overwrite `md1` with `md2` - instead, both are added to the `.metadata` dictionary,
# with keys corresponding to their names

ar.metadata = md1
ar.metadata = md2

In [78]:
ar.metadata

{'very_important_metadata': Metadata( A Metadata instance called 'very_important_metadata', containing the following fields:
 
           clouds:    taste metallic
           yoshimi:   battles the pink robots
 ),
 'questionable_important_metadata': Metadata( A Metadata instance called 'questionable_important_metadata', containing the following fields:
 
           quartz:   7
           TaS2:     [2, 4, 6]
 )}
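
The "assignment adds instead of overwriting" behaviour is what a Python property setter makes possible. The sketch below shows the general pattern only. `Holder` and `Named` are hypothetical classes, not part of `emd`, and the only assumption is that each assigned object exposes a `.name` attribute.

```python
class Named:
    """Hypothetical minimal object carrying just a name."""

    def __init__(self, name):
        self.name = name


class Holder:
    """Hypothetical object whose `.metadata` setter adds rather than replaces."""

    def __init__(self):
        self._metadata = {}

    @property
    def metadata(self):
        return self._metadata

    @metadata.setter
    def metadata(self, md):
        # The assigned object's name becomes its key in the dictionary
        self._metadata[md.name] = md


holder = Holder()
holder.metadata = Named("very_important_metadata")
holder.metadata = Named("questionable_important_metadata")
print(list(holder.metadata))
```

With this pattern, assigning a second object with the same name would replace the first, since the name is the dictionary key.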

In [79]:
# Save it

emd.save(fp, ar)

In [80]:
# Load it

loaded_data = emd.read(fp)

Reading group /pi/pi


In [81]:
loaded_data

Root( A Node called 'pi', containing the following top-level objects in its tree:

          pi 		 (Array)
)

In [82]:
l_ar = loaded_data.tree('pi')

l_ar

Array( A 2-dimensional array of shape (3, 3) called 'pi',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
)

In [83]:
l_ar.data

array([[3, 1, 4],
       [1, 5, 9],
       [2, 6, 5]])

In [84]:
# confirm that the metadata is all there

l_ar.metadata

{'questionable_important_metadata': Metadata( A Metadata instance called 'questionable_important_metadata', containing the following fields:
 
           TaS2:     [2, 4, 6]
           quartz:   7
 ),
 'very_important_metadata': Metadata( A Metadata instance called 'very_important_metadata', containing the following fields:
 
           clouds:    taste metallic
           yoshimi:   battles the pink robots
 )}

In [85]:
# and confirm the same programmatically

for name in ar.metadata.keys():
    md_i,md_f = ar.metadata[name],l_ar.metadata[name]
    for k in md_i.keys:
        assert( md_i[k] == md_f[k] )

## 2.3 EMD data trees

The ability to build, save, extend, and read whole trees of data at once is the core utility of the `emd` package.  Various aspects of this functionality are accessed via the `.tree` function.  Trees can only be built on top of roots.  Here we build, save, and retrieve a tree made entirely of Arrays.

In [86]:
clean()

In [87]:
# Build some arrays

ar1 = emd.Array(
    name = 'array1',
    data = np.arange(np.prod((3,4,5,6))).reshape((3,4,5,6))
)
ar2 = emd.Array(
    name = 'array1_mean',
    data = np.mean(ar1.data,axis=(2,3))
)
ar3 = emd.Array(
    name = 'array1_max',
    data = np.max(ar1.data,axis=(2,3))
)
ar4 = emd.Array(
    name = 'array1_mean_sq',
    data = ar2.data**2
)

In [88]:
# The `.tree` method has several modes, depending on the arguments it receives
# When nothing is passed, it prints the tree of objects under the current data instance
# Because we've added nothing to the tree so far, the tree under `ar1` is nothing,
# represented here by the string '/'

ar1.tree()

/


In [89]:
# When `data1.tree(data2)` is called, if both `data1` and `data2` are emd data objects, 
# `data2` is added to the tree of `data1`. However, the code below will fail......

ar1.tree(ar2)
ar1.tree()

AssertionError: Can't add objects to an unrooted node. See the Node docstring for more info.

In [90]:
# ...because trees can't exist without roots.
# To build a tree, first we need to make a root, and then we can build from there

root = emd.Root(name = 'root')
root

Root( A Node called 'root', containing the following top-level objects in its tree:

)

In [91]:
# Now we can add an array to the root

root.tree(ar1)
root.tree()

/
|--array1


In [92]:
# ...and now that `ar1` is rooted, adding `ar2` to it will work

ar1.tree(ar2)
ar1.tree()

/
|--array1_mean


In [93]:
root.tree()

/
|--array1
	|--array1_mean


In [94]:
ar2.tree()

/


In [95]:
# we can display the whole tree, from the root, from any node 

ar2.tree(show=True)

/
|--array1
	|--array1_mean


In [96]:
ar1.tree(ar3)
ar2.tree(ar4)

root.tree()

/
|--array1
	|--array1_mean
	|	|--array1_mean_sq
	|--array1_max


In [97]:
# the root is accessible from anywhere in the tree

ar4.root

Root( A Node called 'root', containing the following top-level objects in its tree:

          array1 		 (Array)
)

In [98]:
assert(root == ar1.root == ar2.root == ar3.root == ar4.root)

In [99]:
# Let's add some metadata

ar1.metadata = emd.Metadata(
    name = 'some_metadata',
    data = {
        'x' : 1,
        'y' : 2
    }
)
ar4.metadata = emd.Metadata(
    name = 'other_metadata',
    data = {
        'q' : 7,
        'r' : 2.1
    }
)
root.metadata = emd.Metadata(
    name = 'root_metadata',
    data = {
        'zamboni' : 'baloney',
        'pineapple' : 'house'
    }
)

In [100]:
# Save

emd.save(fp, root)

In [101]:
# Read

l_root = emd.read(fp)

Reading group /root/array1
Reading group /root/array1/array1_max
Reading group /root/array1/array1_mean
Reading group /root/array1/array1_mean/array1_mean_sq


In [102]:
l_root

Root( A Node called 'root', containing the following top-level objects in its tree:

          array1 		 (Array)
)

In [103]:
l_root.tree()

/
|--array1
	|--array1_max
	|--array1_mean
		|--array1_mean_sq


In [104]:
# data is accessed by passing the `.tree` function string inputs,
# delimiting nodes with '/' characters

l_ar1 = l_root.tree('array1')
l_ar2 = l_root.tree('array1/array1_mean')
l_ar3 = l_root.tree('array1/array1_max')
l_ar4 = l_root.tree('array1/array1_mean/array1_mean_sq')

assert(np.array_equal( ar1.data, l_ar1.data ))
assert(np.array_equal( ar2.data, l_ar2.data ))
assert(np.array_equal( ar3.data, l_ar3.data ))
assert(np.array_equal( ar4.data, l_ar4.data ))

In [105]:
# data can be accessed from downstream nodes with the same kind of syntax

l_ar4_v2 = l_ar2.tree('array1_mean_sq')

assert( l_ar4_v2 is l_ar4 )

In [106]:
# data can be accessed from downstream nodes using root paths by adding a leading '/'

l_ar4_v3 = l_ar2.tree('/array1/array1_mean/array1_mean_sq')

assert( l_ar4_v3 is l_ar4_v2 is l_ar4 )
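
The path rules used above (relative paths resolve from the current node, a leading '/' resolves from the root, and only rooted nodes accept children) can be sketched with a small tree class. `TreeNode` and its methods are hypothetical and much simpler than the real `emd` classes. They are meant only to make the lookup logic concrete.

```python
class TreeNode:
    """Hypothetical tree node with '/'-delimited lookup (not the emd source)."""

    def __init__(self, name, is_root=False):
        self.name = name
        self.children = {}
        # A root is its own root; other nodes stay unrooted until attached
        self.root = self if is_root else None

    def add(self, child):
        # Mirrors the rule shown earlier: unrooted nodes can't accept children
        if self.root is None:
            raise ValueError("can't add objects to an unrooted node")
        self.children[child.name] = child
        child.root = self.root

    def get(self, path):
        # A leading '/' means the path starts at the root, not at this node
        node = self.root if path.startswith("/") else self
        for part in (p for p in path.split("/") if p):
            node = node.children[part]
        return node


root_s = TreeNode("root", is_root=True)
n1, n2, n4 = TreeNode("array1"), TreeNode("array1_mean"), TreeNode("array1_mean_sq")
root_s.add(n1)
n1.add(n2)
n2.add(n4)

print(n2.get("array1_mean_sq") is n4)
print(n2.get("/array1/array1_mean/array1_mean_sq") is n4)
```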

In [107]:
# finally, the metadata all came along for the ride

assert( root.metadata['root_metadata']._params == l_root.metadata['root_metadata']._params)
assert( ar1.metadata['some_metadata']._params == l_ar1.metadata['some_metadata']._params)
assert( ar4.metadata['other_metadata']._params == l_ar4.metadata['other_metadata']._params)

# 3. Other data types

The `emd.Array` class represents one type of data we can store at a node of an EMD tree.  In this section we show the other standard classes: `Node`, `PointList`, and `PointListArray`.  A final data node class type, `Custom`, allows a single node to contain any number of each of the other types - this will not be covered in this tutorial. 

## 3.1 `emd.Node`

`emd.Node`s are the most basic data objects.  They have a name, and may contain any number of Metadata instances.

In [108]:
clean()

In [109]:
# Make the node

node = emd.Node( name = 'my_node' )

node

Node( A Node called 'my_node', containing the following top-level objects in its tree:

)

In [110]:
# The .metadata dictionary is instantiated empty

node.metadata

{}

In [111]:
# Let's add some metadata

# make dictionaries
some_metadata = {
    'twas' : 'brillig',
    'slithy' : 'toves'
}
some_other_metadata = {
    'woods' : 'Arcady',
    'antique' : 'joy'
}

# add as Metadata to node
node.metadata = emd.Metadata(
    name = 'jabberwock',
    data = some_metadata
)
node.metadata = emd.Metadata(
    name = 'shepherd',
    data = some_other_metadata
)

# take a look
node.metadata

{'jabberwock': Metadata( A Metadata instance called 'jabberwock', containing the following fields:
 
           twas:     brillig
           slithy:   toves
 ),
 'shepherd': Metadata( A Metadata instance called 'shepherd', containing the following fields:
 
           woods:     Arcady
           antique:   joy
 )}

In [112]:
# write

emd.save(fp, node)

In [113]:
# read

loaded_data = emd.read(fp)

loaded_data

Reading group /my_node/my_node


Root( A Node called 'my_node', containing the following top-level objects in its tree:

          my_node 		 (Node)
)

In [114]:
# Get the loaded node

l_node = loaded_data.tree('my_node')

l_node

Node( A Node called 'my_node', containing the following top-level objects in its tree:

)

In [115]:
l_node.metadata

{'jabberwock': Metadata( A Metadata instance called 'jabberwock', containing the following fields:
 
           slithy:   toves
           twas:     brillig
 ),
 'shepherd': Metadata( A Metadata instance called 'shepherd', containing the following fields:
 
           antique:   joy
           woods:     Arcady
 )}

In [116]:
# and confirm that everything is the same

for name in node.metadata.keys():
    md_i,md_f = node.metadata[name],l_node.metadata[name]
    for k in md_i.keys:
        assert( md_i[k] == md_f[k] )

## 3.2 `emd.PointList`

`emd.PointList` instances wrap 1D numpy structured arrays.  A `PointList`'s data attribute can have any length, with any number of string-named fields, and each field may have its own data type.  Unlike Arrays, PointLists have variable length that can change at runtime.  Like all other emd data objects, they can hold any number of Metadata instances.

In [117]:
clean()

In [118]:
# make some data
# here we define the fields by specifying a custom `dtype` for numpy

data = np.zeros(
    5,
    dtype = [
        ('x',int),
        ('y',float)
    ]
)

data

array([(0, 0.), (0, 0.), (0, 0.), (0, 0.), (0, 0.)],
      dtype=[('x', '<i8'), ('y', '<f8')])

In [119]:
# make a PointList

pointlist = emd.PointList(
    name = 'my_pointlist',
    data = data
)

pointlist

PointList( A length 5 PointList called 'my_pointlist',
           with 2 fields:

           x   (int64)
           y   (float64)
)

In [120]:
pointlist.data

array([(0, 0.), (0, 0.), (0, 0.), (0, 0.), (0, 0.)],
      dtype=[('x', '<i8'), ('y', '<f8')])

In [121]:
# the fields can be accessed by slicing directly into the PointList

print(pointlist['x'])
print(pointlist['y'])

[0 0 0 0 0]
[0. 0. 0. 0. 0.]


#### Runtime modifications

PointLists support runtime modifications, including adding or removing data,
adding data by field, adding new fields, sorting data by field, and copying the object.

In [122]:
# remove points

# make a boolean mask
rm = np.zeros(5,dtype = bool)
rm[3:] = True        # flag the last two points

# remove the last two points
pointlist.remove(rm)

# show
pointlist.data

array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('x', '<i8'), ('y', '<f8')])

In [123]:
# add points
# the new data must have the same dtype as the existing data

# make the new data
new_data = np.ones(
    3,
    dtype = [
        ('x',int),
        ('y',float)
    ]
)

# add it to the array
pointlist.add(new_data)

# show
pointlist.data

array([(0, 0.), (0, 0.), (0, 0.), (1, 1.), (1, 1.), (1, 1.)],
      dtype=[('x', '<i8'), ('y', '<f8')])

In [124]:
# add data as 1D vectors corresponding to the fields

pointlist.add_data_by_field(
    data = (np.arange(5,10),np.linspace(5,6,num=5)),
    fields = ('x','y')
)

pointlist.data

array([(0, 0.  ), (0, 0.  ), (0, 0.  ), (1, 1.  ), (1, 1.  ), (1, 1.  ),
       (5, 5.  ), (6, 5.25), (7, 5.5 ), (8, 5.75), (9, 6.  )],
      dtype=[('x', '<i8'), ('y', '<f8')])
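
For intuition, the remove and add operations above have direct equivalents on a bare numpy structured array. The sketch below uses only numpy and reproduces the same sequence of steps. It is not the `PointList` implementation, just one plausible way to do the same thing on the underlying data.

```python
import numpy as np

dtype = [("x", int), ("y", float)]
points = np.zeros(5, dtype=dtype)

# Remove: keep only the entries where the mask is False
mask = np.zeros(5, dtype=bool)
mask[3:] = True
points = points[~mask]

# Add: concatenate a block of new points with the same dtype
points = np.concatenate([points, np.ones(3, dtype=dtype)])

# Add by field: fill a new block one field at a time, then concatenate
block = np.zeros(5, dtype=dtype)
block["x"] = np.arange(5, 10)
block["y"] = np.linspace(5, 6, num=5)
points = np.concatenate([points, block])

print(len(points))
```

Each step builds a new array, because numpy arrays have a fixed size once allocated.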

In [125]:
# make a new pointlist like this one, with some additional fields added

pointlist_copy = pointlist.add_fields(
    [('z',bool)],
    name = 'another_pointlist',
)

pointlist_copy.data

array([(0, 0.  , False), (0, 0.  , False), (0, 0.  , False),
       (1, 1.  , False), (1, 1.  , False), (1, 1.  , False),
       (5, 5.  , False), (6, 5.25, False), (7, 5.5 , False),
       (8, 5.75, False), (9, 6.  , False)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])
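
Adding a field to structured data means building a new array with an extended dtype and copying the existing fields across, which is consistent with `add_fields` returning a new object above. A minimal numpy-only sketch, with illustrative variable names and no claim to match the `PointList` internals:

```python
import numpy as np

old = np.zeros(3, dtype=[("x", int), ("y", float)])
old["x"] = [5, 6, 7]

# Extend the dtype with a new boolean field 'z'
new_dtype = [(name, old.dtype[name]) for name in old.dtype.names] + [("z", bool)]

# Allocate a zero-filled array, then copy the shared fields across
new = np.zeros(old.shape, dtype=new_dtype)
for name in old.dtype.names:
    new[name] = old[name]

print(new.dtype.names)
```

The new field starts out zero-filled, which for a boolean field means `False`.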

In [126]:
# modify values in an existing field

pointlist_copy['z'][6:] = True

pointlist_copy.data

array([(0, 0.  , False), (0, 0.  , False), (0, 0.  , False),
       (1, 1.  , False), (1, 1.  , False), (1, 1.  , False),
       (5, 5.  ,  True), (6, 5.25,  True), (7, 5.5 ,  True),
       (8, 5.75,  True), (9, 6.  ,  True)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])

In [127]:
# Sort the pointlist by one of its fields

pointlist_copy.sort('y', order='descending')

pointlist_copy.data

array([(9, 6.  ,  True), (8, 5.75,  True), (7, 5.5 ,  True),
       (6, 5.25,  True), (5, 5.  ,  True), (1, 1.  , False),
       (1, 1.  , False), (1, 1.  , False), (0, 0.  , False),
       (0, 0.  , False), (0, 0.  , False)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])
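
Sorting structured data by one field amounts to computing an ordering from that field and applying it to every row. The numpy-only sketch below shows one way to get a descending sort. It is an illustration and may differ from what `PointList.sort` does internally, particularly in how ties are ordered.

```python
import numpy as np

pts = np.array(
    [(0, 0.0), (9, 6.0), (1, 1.0), (7, 5.5)],
    dtype=[("x", int), ("y", float)],
)

# argsort gives ascending order; reversing it gives descending order
order = np.argsort(pts["y"])[::-1]
pts_sorted = pts[order]

print(pts_sorted["y"])
```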

In [128]:
# Add some metadata

pointlist_copy.metadata = emd.Metadata(
    name = 'evolution',
    data = {
        'pikachu' : 'raichi',
        'thunderstone' : True
    }
)

pointlist_copy.metadata['evolution']

Metadata( A Metadata instance called 'evolution', containing the following fields:

          pikachu:        raichi
          thunderstone:   True
)

In [129]:
# save

emd.save(fp, pointlist_copy)

In [130]:
# read

loaded_data = emd.read(fp)

loaded_data

Reading group /another_pointlist/another_pointlist


Root( A Node called 'another_pointlist', containing the following top-level objects in its tree:

          another_pointlist 		 (PointList)
)

In [131]:
# Get the pointlist

l_pointlist = loaded_data.tree('another_pointlist')

l_pointlist.data

array([(9, 6.  ,  True), (8, 5.75,  True), (7, 5.5 ,  True),
       (6, 5.25,  True), (5, 5.  ,  True), (1, 1.  , False),
       (1, 1.  , False), (1, 1.  , False), (0, 0.  , False),
       (0, 0.  , False), (0, 0.  , False)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])

In [132]:
# Check that the data loaded correctly

assert(np.array_equal(pointlist_copy.data, l_pointlist.data))

In [133]:
# Check that the metadata loaded correctly

for name in pointlist_copy.metadata.keys():
    md_i,md_f = pointlist_copy.metadata[name],l_pointlist.metadata[name]
    for k in md_i.keys:
        assert( md_i[k] == md_f[k] )

## 3.3 `emd.PointListArray`

`emd.PointListArray` represents 2D grids of PointList instances with the same data fields.  It stores 2D ragged arrays of vectors of any length with string-accessible fields.

In [134]:
clean()

In [135]:
# make a PointListArray

shape = (5,6)
dtype = [('x',int),('y',int)]

pointlistarray = emd.PointListArray(
    name = 'my_pointlistarray',
    shape = shape,
    dtype = dtype
)

pointlistarray

PointListArray( A shape (5, 6) PointListArray called 'my_pointlistarray',
                with 2 fields:

                x   (int64)
                y   (int64)
)

In [136]:
# the pointlists can be accessed by slicing into the pointlistarray

pointlistarray[0,0]

PointList( A length 0 PointList called '0,0',
           with 2 fields:

           x   (int64)
           y   (int64)
)

In [137]:
# and are instantiated empty

pointlistarray[3,4].data

array([], dtype=[('x', '<i8'), ('y', '<i8')])

In [138]:
# we can populate the pointlists with the `add` method

for ii in range(pointlistarray.shape[0]):
    for jj in range(pointlistarray.shape[1]):
        
        # set an integer value that varies sinusoidally from 0 to 8
        val = int(np.round((np.sin((ii*shape[1]+jj) * 2*np.pi / np.prod(shape)) + 1) * 4))
        
        # add to the pointlist
        pointlistarray[ii,jj].add(
            np.full(
                shape = val,
                fill_value= val,
                dtype = dtype
            )
        )
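
The list lengths produced by the loop above can be previewed without building any `PointList` instances. The sketch below recomputes `val` for each grid position using only the standard library. `cell_length` is a hypothetical helper name, and Python's built-in `round` stands in for `np.round`, on the assumption that no value here lands exactly on a half.

```python
import math

shape = (5, 6)

def cell_length(ii, jj, shape=shape):
    """Number of points added at grid position (ii, jj) (hypothetical helper)."""
    n_cells = shape[0] * shape[1]
    flat_index = ii * shape[1] + jj
    # one full sine period across the flattened grid,
    # rescaled from [-1, 1] to [0, 8] and rounded to an integer
    return int(round((math.sin(flat_index * 2 * math.pi / n_cells) + 1) * 4))

# a nested list with the same layout as the PointListArray grid
lengths = [[cell_length(ii, jj) for jj in range(shape[1])]
           for ii in range(shape[0])]

for row in lengths:
    print(row)
```

Each entry is both the length of the pointlist at that position and the value used to fill it, which is why the grid is ragged.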

In [139]:
pointlistarray[0,0].data

array([(4, 4), (4, 4), (4, 4), (4, 4)], dtype=[('x', '<i8'), ('y', '<i8')])

In [140]:
print(pointlistarray[0,0].data)
print(pointlistarray[0,1].data)
print(pointlistarray[0,2].data)
print(pointlistarray[0,3].data)
print(pointlistarray[0,4].data)
print(pointlistarray[0,5].data)
print(pointlistarray[1,0].data)
print(pointlistarray[1,1].data)
print(pointlistarray[1,2].data)
print(pointlistarray[1,3].data)
print(pointlistarray[1,4].data)
print(pointlistarray[1,5].data)
print(pointlistarray[2,0].data)
print(pointlistarray[2,1].data)
print(pointlistarray[2,2].data)
print(pointlistarray[2,3].data)

[(4, 4) (4, 4) (4, 4) (4, 4)]
[(5, 5) (5, 5) (5, 5) (5, 5) (5, 5)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(5, 5) (5, 5) (5, 5) (5, 5) (5, 5)]
[(4, 4) (4, 4) (4, 4) (4, 4)]


In [141]:
# add some metadata

pointlistarray.metadata = emd.Metadata(
    name = 'is_rodent',
    data = {
        'gerbil' : True,
        'mouse' : True,
        'pikachu' : True,
        'bulbasaur' : 'False'
    }
)

In [142]:
# save

emd.save(fp, pointlistarray)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 2585.35it/s]


In [143]:
# load

loaded_data = emd.read(fp)

loaded_data

Reading PointListArray: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 4469.79PointList/s]

Reading group /my_pointlistarray/my_pointlistarray





Root( A Node called 'my_pointlistarray', containing the following top-level objects in its tree:

          my_pointlistarray 		 (PointListArray)
)

In [144]:
# get the pointlistarray

l_pointlistarray = loaded_data.tree('my_pointlistarray')

l_pointlistarray

PointListArray( A shape (5, 6) PointListArray called 'my_pointlistarray',
                with 2 fields:

                x   (int64)
                y   (int64)
)

In [145]:
# check it

for ii in range(l_pointlistarray.shape[0]):
    for jj in range(l_pointlistarray.shape[1]):
        
        print(l_pointlistarray[ii,jj].data)
        assert(np.array_equal(pointlistarray[ii,jj].data,
                              l_pointlistarray[ii,jj].data))

[(4, 4) (4, 4) (4, 4) (4, 4)]
[(5, 5) (5, 5) (5, 5) (5, 5) (5, 5)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(5, 5) (5, 5) (5, 5) (5, 5) (5, 5)]
[(4, 4) (4, 4) (4, 4) (4, 4)]
[(3, 3) (3, 3) (3, 3)]
[(2, 2) (2, 2)]
[(2, 2) (2, 2)]
[(1, 1)]
[(1, 1)]
[]
[]
[]
[]
[(1, 1)]
[(1, 1)]
[(2, 2) (2, 2)]
[(2, 2) (2, 2)]
[(3, 3) (3, 3) (3, 3)]


In [146]:
# metadata

for name in pointlistarray.metadata.keys():
    md_i,md_f = pointlistarray.metadata[name],l_pointlistarray.metadata[name]
    for k in md_i.keys:
        assert( md_i[k] == md_f[k] )

# 4. Trees

The tree structure is the core of the `emd` package and format.  In this section we first look at building, saving, and restoring trees from file.  We look at basic tree methods for showing and retrieving data in trees.  Then we look at methods for modifying existing trees by cutting off branches, or grafting branches from one tree to another.  Finally, we discuss root metadata, which is handled specially.

In [147]:
# All the functionality shown in this section is accessed via the `.tree` method.
# The usage of `.tree` can be specified by passing a keyword argument in (show, add, get, cut, graft).
# In the first three cases, the keyword can be omitted, as long as the data passed has the correct type.

node.tree?
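
To make the dispatch described in the comment above concrete, here is a toy stand-in that chooses between show, add, and get from the type of its argument. `ToyNode` and everything inside it are hypothetical; this is not how `emd` implements `.tree`, only an illustration of the calling convention.

```python
class ToyNode:
    """Hypothetical stand-in for a tree node; not the emd implementation."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def tree(self, arg=None):
        if arg is None:
            # no argument: show the tree below this node
            print("/")
            self._show(indent=0)
        elif isinstance(arg, ToyNode):
            # a node: add it as a child of this node
            self.children[arg.name] = arg
        elif isinstance(arg, str):
            # a string: treat it as a '/'-delimited path and get that node
            node = self
            for part in arg.strip("/").split("/"):
                node = node.children[part]
            return node
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")

    def _show(self, indent):
        for child in self.children.values():
            print("|\t" * indent + "|--" + child.name)
            child._show(indent + 1)


root = ToyNode("root")
a = ToyNode("a")
b = ToyNode("b")

root.tree(a)               # add
a.tree(b)                  # add
found = root.tree("a/b")   # get
root.tree()                # show
```

Dispatching on type keeps the common calls short, at the cost that operations like cut and graft still need an explicit keyword.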

## 4.1 Build a tree

Let's make a tree with all these data types, write it, read it, and confirm that it worked.

In [148]:
clean()

In [149]:
# make some data

ar1 = emd.Array(
    name = 'ar1',
    data = np.arange(12).reshape((3,4))
)
ar2 = emd.Array(
    name = 'ar2',
    data = np.arange(24).reshape((3,4,2)),
    slicelabels = ('a','b')
)
node = emd.Node(
    name = 'immanode'
)
pointlist1 = emd.PointList(
    name = 'pointlist1',
    data = np.ones(
        5,
        dtype = [('rx',int),('ry',int)]
    )
)
pointlist2 = emd.PointList(
    name = 'pointlist2',
    data = np.zeros(
        6,
        dtype = [('qx',float),('qy',float)]
    )
)
pointlistarray = emd.PointListArray(
    name = 'pointlistarray',
    shape = (3,4),
    dtype = [('yes',bool),('no',bool)]
)
for rx in range(pointlistarray.shape[0]):
    for ry in range(pointlistarray.shape[1]):
        pointlistarray[rx,ry].add(
            np.ones(
                int(ry + rx*pointlistarray.shape[1]),
                dtype = [('yes',bool),('no',bool)]
            )
        )
        
# add some metadata
pointlist1.metadata = emd.Metadata(
    name = 'evolution',
    data = {
        'pikachu' : 'raichi',
        'thunderstone' : True
    }
)
pointlistarray.metadata = emd.Metadata(
    name = 'is_rodent',
    data = {
        'gerbil' : True,
        'mouse' : True,
        'pikachu' : True,
        'bulbasaur' : 'False'
    }
)

In [150]:
# Make a tree

# start with a Root
root = emd.Root( name='treeoflife' )

# and add data
root.tree(node)
node.tree(pointlistarray)
pointlistarray.tree(pointlist1)
root.tree(ar1)
root.tree(pointlist2)
node.tree(ar2)

# show the tree
root.tree()

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


In [151]:
# save

emd.save(fp,root)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 3066.76it/s]


In [152]:
# load

loaded_data = emd.read(fp)

loaded_data

Reading PointListArray: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 3148.09PointList/s]

Reading group /treeoflife/ar1
Reading group /treeoflife/immanode
Reading group /treeoflife/immanode/ar2
Reading group /treeoflife/immanode/pointlistarray
Reading group /treeoflife/immanode/pointlistarray/pointlist1
Reading group /treeoflife/pointlist2





Root( A Node called 'treeoflife', containing the following top-level objects in its tree:

          ar1 		 (Array)
          immanode 		 (Node)
          pointlist2 		 (PointList)
)

In [153]:
loaded_data.tree()

/
|--ar1
|--immanode
|	|--ar2
|	|--pointlistarray
|		|--pointlist1
|--pointlist2


In [154]:
# check that the data is the same

assert(np.array_equal( loaded_data.tree('ar1').data, ar1.data ))
assert(np.array_equal( loaded_data.tree('immanode/ar2').data, ar2.data ))
assert(np.array_equal( loaded_data.tree('immanode/pointlistarray/pointlist1').data, pointlist1.data ))
assert(np.array_equal( loaded_data.tree('pointlist2').data, pointlist2.data ))

In [155]:
# check that the metadata is the same

def check_metadata(obj1, obj2):
    """ Asserts that the metadata in obj1 and obj2 are equivalent.
    Fails for array-like metadata.
    """
    for name in obj1.metadata.keys():
        md_i,md_f = obj1.metadata[name],obj2.metadata[name]
        for k in md_i.keys:
            assert( md_i[k] == md_f[k] )
            
check_metadata(
    loaded_data.tree('immanode/pointlistarray/pointlist1'),
    pointlist1
)
check_metadata(
    loaded_data.tree('immanode/pointlistarray'),
    pointlistarray
)

## 4.2 Show and retrieve data

We've already done most of these operations above!

In [156]:
# show the tree, from root

root.tree()

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


In [157]:
# show the tree, from some node

node.tree()

/
|--pointlistarray
|	|--pointlist1
|--ar2


In [158]:
# show the whole tree from root, using some node

node.tree(show=True)
print()
print()
node.tree(True)

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


In [159]:
# get some node from root

data = root.tree('immanode/pointlistarray/pointlist1')

data

PointList( A length 5 PointList called 'pointlist1',
           with 2 fields:

           rx   (int64)
           ry   (int64)
)

In [160]:
# get some node from another, upstream node

data = pointlistarray.tree('pointlist1')

data

PointList( A length 5 PointList called 'pointlist1',
           with 2 fields:

           rx   (int64)
           ry   (int64)
)

In [161]:
# get some node from another node, using a path referenced to the root

data = pointlistarray.tree('/immanode/pointlistarray/pointlist1')

data

PointList( A length 5 PointList called 'pointlist1',
           with 2 fields:

           rx   (int64)
           ry   (int64)
)
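
The last three cells show that a path can be read relative to the calling node or, with a leading `/`, relative to the root. A minimal sketch of that lookup rule follows. `ToyNode`, `add`, and `get` are hypothetical names for illustration, not the `emd` API.

```python
class ToyNode:
    """Hypothetical node that resolves relative and root-referenced paths."""

    def __init__(self, name):
        self.name = name
        self.children = {}
        self.root = self  # in this toy, a lone node is its own root

    def add(self, child):
        self.children[child.name] = child
        child._set_root(self.root)
        return child

    def _set_root(self, root):
        # propagate the root pointer to this node and everything below it
        self.root = root
        for c in self.children.values():
            c._set_root(root)

    def get(self, path):
        # a leading '/' means "start at the root"; otherwise start here
        node = self.root if path.startswith("/") else self
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node


root = ToyNode("root")
immanode = root.add(ToyNode("immanode"))
pla = immanode.add(ToyNode("pointlistarray"))
pl1 = pla.add(ToyNode("pointlist1"))

from_root = root.get("immanode/pointlistarray/pointlist1")
relative = pla.get("pointlist1")
absolute = pla.get("/immanode/pointlistarray/pointlist1")
print(from_root is relative is absolute)
```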

## 4.3 Cutting and grafting branches

In this section we'll make a new tree.  We use it first to demonstrate cutting branches off a parent tree to yield some new, smaller tree.  Then we demonstrate grafting a branch from one tree to another.

In [162]:
# make some data

ar3 = emd.Array(
    name = 'ar3',
    data = np.arange(12,22).reshape((5,2))
)
node2 = emd.Node(
    name = 'node2'
)
node3 = emd.Node(
    name = 'node3'
)
pointlist3 = emd.PointList(
    name = 'pointlist3',
    data = np.ones(
        3,
        dtype = [('rx',int),('ry',int)]
    )
)
pointlist4 = emd.PointList(
    name = 'pointlist4',
    data = np.zeros(
        7,
        dtype = [('qx',float),('qy',float)]
    )
)

In [163]:
# make a tree

root2 = emd.Root( name='treeofknowledge')
root2.tree(node2)
node2.tree(ar3)
node2.tree(pointlist3)
pointlist3.tree(node3)
root2.tree(pointlist4)

root2.tree()

/
|--node2
|	|--ar3
|	|--pointlist3
|		|--node3
|--pointlist4


In [164]:
# Cut a branch off of the tree

new_root = pointlist3.tree(cut=True)

new_root

Root( A Node called 'pointlist3', containing the following top-level objects in its tree:

          pointlist3 		 (PointList)
)

In [165]:
# Show the original tree and cut off branch

root2.tree()
print()
new_root.tree()

/
|--node2
|	|--ar3
|--pointlist4

/
|--pointlist3
	|--node3


In [166]:
# Graft a branch from one tree onto another
# let's graft from root2 at `node2` onto root at `pointlist1`

# start by showing the two trees
root2.tree()
print()
root.tree()

/
|--node2
|	|--ar3
|--pointlist4

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


In [167]:
# perform the graft
node2.graft(pointlist1)

# showing the two trees
root2.tree()
print()
root.tree()

/
|--pointlist4

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|		|--node2
|	|			|--ar3
|	|--ar2
|--ar1
|--pointlist2


## 4.4 Root metadata

`emd` defines trees which are *almost* unidirectional: each node knows about and points to only the nodes downstream of itself, *plus* the root node of its tree.  This is helpful in a number of ways; one is that metadata living in the root is available to every object in that tree.  When cutting or grafting branches, the question then arises: should the root metadata be carried along to the new tree's root, or not?
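
The "downstream pointers plus one root pointer" layout can be sketched in a few lines. The classes and the `attach` helper below are hypothetical and only model the idea; they are not the `emd` classes.

```python
class ToyNode:
    """Hypothetical node: knows its children and its root, but not its parent."""

    def __init__(self, name):
        self.name = name
        self.children = {}   # pointers to downstream nodes only
        self.root = None     # plus a single pointer back to the root


class ToyRoot(ToyNode):
    """Hypothetical root: its own root, and the home of tree-wide metadata."""

    def __init__(self, name):
        super().__init__(name)
        self.root = self
        self.metadata = {}


def attach(parent, child):
    # build top-down, so that parent.root is already set when we copy it
    parent.children[child.name] = child
    child.root = parent.root
    return child


root = ToyRoot("root1")
root.metadata["metadata1"] = {"x": 1}
node1 = attach(root, ToyNode("node1"))
node2 = attach(node1, ToyNode("node2"))

# node2 holds no pointer to node1, but can still reach the root metadata
print(node2.root.metadata["metadata1"])
```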

In [168]:
node.tree?

In [184]:
# make data trees

def make_trees():

    # roots
    root1 = emd.Root( name='root1' )
    root2 = emd.Root( name='root2' )

    # nodes
    node1 = emd.Node( name = 'node1' )
    node2 = emd.Node( name = 'node2' )
    node3 = emd.Node( name = 'node3' )
    node4 = emd.Node( name = 'node4' )

    # tree 1
    root1.tree(node1)
    node1.tree(node2)
    node2.tree(node3)
    # add root metadata
    root1.metadata = emd.Metadata(
        name = 'metadata1',
        data = {'x':1}
    )
    root1.metadata = emd.Metadata(
        name = 'metadata2',
        data = {'y':2}
    )
    
    # tree 2
    root2.tree(node4)
    # add root metadata
    root2.metadata = emd.Metadata(
        name = 'metadata3',
        data = {'z':3}
    )
    
    return root1,root2

root1,root2 = make_trees()

In [185]:
# show the trees

root1.tree()
print()
root2.tree()

/
|--node1
	|--node2
		|--node3

/
|--node4


### 4.4.1 Cutting

First we'll cut a branch off of the first tree.  When cutting a branch off an existing tree, a new root is created at the base of the new branch.  The old root metadata can be left behind, or pointers to the same metadata can be added to the new root, or copies of the metadata can be placed in the new root.
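
The difference between the three options comes down to object identity versus equality. The toy function below models root metadata as a plain dict of dicts. `new_root_metadata` and its `mode` argument are hypothetical, and the `_copy` suffix simply mirrors the key naming that appears in this notebook's copy example.

```python
import copy

def new_root_metadata(old_metadata, mode):
    """Toy model of the three choices for a new root's metadata.

    mode=True   : share pointers to the same metadata objects
    mode='copy' : independent deep copies, stored under '<name>_copy'
    mode=False  : leave the old root metadata behind
    """
    if mode is True:
        return dict(old_metadata)  # new container, same value objects
    if mode == "copy":
        return {name + "_copy": copy.deepcopy(md)
                for name, md in old_metadata.items()}
    if mode is False:
        return {}
    raise ValueError(f"unknown mode: {mode!r}")


old = {"metadata1": {"x": 1}, "metadata2": {"y": 2}}

linked = new_root_metadata(old, True)
copied = new_root_metadata(old, "copy")
empty = new_root_metadata(old, False)

print(linked["metadata1"] is old["metadata1"])        # same object
print(copied["metadata1_copy"] == old["metadata1"])   # equal contents
print(copied["metadata1_copy"] is old["metadata1"])   # different object
print(empty)
```

With shared pointers, a later change to the metadata is visible from both roots; with copies, the two roots evolve independently.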

In [186]:
# Check the existing metadata

root1.metadata.keys()

dict_keys(['metadata1', 'metadata2'])

In [187]:
print(root1.metadata['metadata1'])
print(root1.metadata['metadata2'])

Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
)
Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
)


#### 4.4.1.1 add root metadata pointers

In [188]:
# Before performing the cut, we'll find the node we want to cut at

# show the tree
root1.tree()

# get the node of interest
target_node = root1.tree('node1/node2')

# show the tree under this node
target_node.tree()

/
|--node1
	|--node2
		|--node3
/
|--node3


In [189]:
# When we cut a branch from a tree, it creates its own new root under the node where we made the cut.
# Here, pointers to the old root metadata are included in the new root metadata

# cut off the branch
new_root = target_node.tree(cut=True)

# show the old and new trees
root1.tree()
print()
new_root.tree()

/
|--node1

/
|--node2
	|--node3


In [190]:
# The old and new root metadata contain the same information

check_metadata(root1, new_root)

In [191]:
# And are in fact the same objects

for key in root1.metadata.keys():
    
    assert( root1.metadata[key] is new_root.metadata[key] )

#### 4.4.1.2 add root metadata copies

In [192]:
# make fresh trees

root1,_ = make_trees()

In [193]:
# Find the node we want to cut at

# show the tree
root1.tree()

# get the node of interest
target_node = root1.tree('node1/node2')

# show the tree under this node
target_node.tree()

/
|--node1
	|--node2
		|--node3
/
|--node3


In [194]:
# This time we'll copy the metadata

# cut off the branch
new_root = target_node.tree(cut='copy')

# show the old and new trees
root1.tree()
print()
new_root.tree()

/
|--node1

/
|--node2
	|--node3


In [195]:
# The old and new root metadata contain the same information

for name in root1.metadata.keys():
    md_i,md_f = root1.metadata[name],new_root.metadata[name+"_copy"]
    for k in md_i.keys:
        assert( md_i[k] == md_f[k] )

In [196]:
# But they are *not* the same objects

for k in root1.metadata.keys():
    assert( root1.metadata[k] is not new_root.metadata[k+"_copy"] )

#### 4.4.1.3 don't transfer metadata

In [None]:
# make fresh trees

root1,_ = make_trees()

In [None]:
# Find the node we want to cut at

# show the tree
root1.tree()

# get the node of interest
target_node = root1.tree('node1/node2')

# show the tree under this node
target_node.tree()

In [197]:
# Cut the branch, but don't transfer any root metadata

# cut off the branch
new_root = target_node.tree(cut=False)

# show the old and new trees
root1.tree()
print()
new_root.tree()

/
|--node1

/
|--node2
	|--node3


In [198]:
# Check

print(root1.metadata.keys())
print()
print(new_root.metadata.keys())

dict_keys(['metadata1', 'metadata2'])

dict_keys([])


### 4.4.2 Grafting

The same options exist when grafting a branch from tree_source to tree_target.  In this case, however, tree_target may already have root metadata of its own.  Adding pointers or copying metadata works normally if there are no name conflicts between metadata in the source and target trees.  If there is a name conflict, linked or copied source root metadata will overwrite target root metadata of the same name.
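
The overwrite rule behaves like a dictionary update. The sketch below assumes root metadata can be modeled as a plain dict keyed by name; the variable names are hypothetical and no `emd` objects are involved.

```python
# target root metadata before the graft
target_metadata = {"metadata3": {"z": 3}}

# source root metadata, including a name that collides with the target
source_metadata = {
    "metadata1": {"x": 1},
    "metadata2": {"y": 2},
    "metadata3": {"thats a horse": "of a different color"},
}

# linking: source entries are added, and win on a name conflict
merged = dict(target_metadata)
merged.update(source_metadata)

for name, md in merged.items():
    print(name, md)
```

After the update, `metadata3` refers to the source's entry, and the target's original `{'z': 3}` is no longer reachable from the merged metadata.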

In [209]:
# Make the trees

root1,root2 = make_trees()


# and show their contents

root1.tree()
print()
root2.tree()

/
|--node1
	|--node2
		|--node3

/
|--node4


In [210]:
# Examine the metadata

print(root1.metadata)
print()
print(root2.metadata)

{'metadata1': Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
), 'metadata2': Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
)}

{'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          z:   3
)}


In [211]:
# We'll graft from root1 onto root2. So let's add a new Metadata instance to root1
# that has the same name as the one in root2, but different contents

root1.metadata = emd.Metadata(
    name = 'metadata3',
    data = {
        'thats a horse' : 'of a different color'
    }
)

In [212]:
# Find the nodes we want to graft from and onto

# show the trees
root1.tree()
print()
root2.tree()
print()
print()

# get the source and target nodes
source_node = root1.tree('node1/node2')
target_node = root2.tree('node4')


/
|--node1
	|--node2
		|--node3

/
|--node4




In [213]:
# perform the graft

source_node.tree(graft = target_node)

Root( A Node called 'root2', containing the following top-level objects in its tree:

          node4 		 (Node)
)

In [214]:
# show the trees after the graft

root1.tree()
print()
root2.tree()

/
|--node1

/
|--node4
	|--node2
		|--node3


In [216]:
# show the root metadata

print(root1.metadata)
print()
print(root2.metadata)

{'metadata1': Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
), 'metadata2': Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
), 'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          thats a horse:   of a different color
)}

{'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          thats a horse:   of a different color
), 'metadata1': Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
), 'metadata2': Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
)}


#### 4.4.2.1 graft root metadata options

The same options for copying, linking, or discarding root metadata that were available when cutting off a branch are available when grafting, with a little extra syntax.  The default behavior, shown above, is to link the root metadata.  To instead copy the root metadata, use
    
    >>> source_node.tree(graft = (target_node,'copy'))
    
and to avoid transferring root metadata, use

    >>> source_node.tree(graft = (target_node,False))

# 5. The EMD 1.0 File

In [None]:
clean()

## 5.1 Roots

aka `topgroups` in the older parlance, aka homes for `trees`.  Each EMD 1.0 file may contain any number of EMD object trees.  Each tree must begin with a root.  To

## 5.2 Appending new roots

## 5.3 Reading from different roots

# 6. Append and other fancy read / write operations

## 6.1 Read from any node

## 6.2 Read from a single node, read a whole branch, read a branch without its source node

## 6.3 Append to an existing tree

## 6.4 Append conflict resolution

## 6.5 Append root metadata handling

## 6.6 Storage and re-writes to free up space