# `emd` package walkthrough

This notebook is a somewhat detailed walkthrough of the user-facing functionality in `emd`. If you're interested in the motivation for the package and a simple demo implementation, check out `emd_intro_example.ipynb`.  For an example implementation of the `emd.Custom` class in a dependent script or package, see `sample_custom_class_module`.


`emd` is a Python package which defines (1) write and read functions and (2) a set of classes which together interface between long-term storage (HDF5 files) and Python runtime objects.  The classes are designed to quickly build, save to, and read from filetree-like representations of data and metadata.  At present, the data in `emd` class instances is held in system RAM.  While the focus here is the Python package, along the way we'll arrive at the EMD 1.0 file specification, to which this package provides a Pythonic interface.


## Contents

- 1. [Writing simple Python objects](#simplePython)
   - 1.1 [numpy arrays](#nparrays)
   - 1.2 [Python dicts](#dicts)
   - 1.3 [lists](#lists)

- 2. [`emd` basics](#emd)
   - 2.1 [Trees](#trees)
   - 2.2 [Writing and reading a tree](#writingtrees)
   - 2.3 [Metadata](#metadata)

- 3. [Class types](#classes)
   - 3.1 [Array](#array)
      - 3.1.1 [as a np.array wrapper](#arraywrapper)
      - 3.1.2 [using built-in metadata](#arraymetadata)
      - 3.1.3 [array stacks](#arraystacks)
   - 3.2 [PointList](#pointlist)
   - 3.3 [PointListArray](#pointlistarray)
   - 3.4 [Saving unrooted nodes](#savingnodes)

- 4. [More on trees](#trees)
   - 4.1 [Building trees](#buildingtrees)
   - 4.2 [Show and retrieve data](#showtree)
   - 4.3 [Cutting and grafting branches](#cutgraft)
   - 4.4 [Cut/graft root metadata options](#rootmetadata)
      - 4.4.1 [Cutting](#rootmetadatacutting)
         - 4.4.1.1 [add pointers](#rootmetadatapointers)
         - 4.4.1.2 [add copies](#rootmetadatacopies)
         - 4.4.1.3 [no metadata](#rootmetadatanometadata) 
      - 4.4.2 [Grafting](#graft)
         - 4.4.2.1 [root options](#graftrootoptions)

- 5. [The EMD 1.0 file](#emdspec)
   - 5.1 [Trees and roots](#treesandroots)

___

In [1]:
# %load_ext autoreload
# %autoreload 2

In [3]:
## Import emd

import emdfile as emd

In [4]:
## Demo utilities

fp = "/Users/Ben/Desktop/test.h5"
from os.path import exists
from os import remove
def clean():
    if exists(fp):
        remove(fp)
clean()

<a id='simplePython'></a>


# 1. Writing simple Python objects

Write a numpy array, a Python dictionary, or a list of those into an HDF5 file, then read them back.

When `emd` writes simple Python objects, it wraps them in some `emd` structures behind the scenes.  Reading thus requires pulling the data back out of those objects.  Below we'll just do it; the structures themselves are discussed in Section 2.


<a id='nparrays'></a>

## 1.1 `np.array`


In [5]:
clean()

In [6]:
import numpy as np

In [7]:
# Make a numpy array

ar = np.arange(12).reshape((3,4))
ar

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [8]:
# Save it

emd.save(fp, ar)

In [9]:
# Read it

loaded_data = emd.read(fp)

In [10]:
# The loaded object is an emd `Root`

loaded_data

Root( A Node called 'np.array', containing the following top-level objects in its tree:

          np.array 		 (Array)
)

In [11]:
# We can access the data like this -

loaded_array = loaded_data.tree('np.array')

In [12]:
# Which gives us an emd `Array` object

loaded_array

Array( A 2-dimensional array of shape (3, 4) called 'np.array',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
)

In [13]:
# with our numpy array living here

loaded_array.data

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [14]:
# Confirm the saved and loaded arrays are identical

assert(np.array_equal( ar , loaded_array.data ))

<a id='dicts'></a>

## 1.2 `dict`s

In [15]:
clean()

In [16]:
# Make a dictionary

very_important_knowledge = {
    'hey' : 1,
    'diddlediddle' : 1,
    'thecat' : 2,
    'andthe' : 3,
    'fiddle' : 5
}

very_important_knowledge

{'hey': 1, 'diddlediddle': 1, 'thecat': 2, 'andthe': 3, 'fiddle': 5}

In [17]:
# Save it

emd.save(fp, very_important_knowledge)

In [18]:
# Load it

loaded_data = emd.read(fp)

In [19]:
# The loaded data is an emd `Root`

loaded_data

Root( A Node called 'dictionary', containing the following top-level objects in its tree:

)

In [20]:
# Because we saved a dictionary, it was saved as metadata

loaded_data.metadata

{'dictionary': Metadata( A Metadata instance called 'dictionary', containing the following fields:
 
           andthe:         3
           diddlediddle:   1
           fiddle:         5
           hey:            1
           thecat:         2
 )}

In [21]:
# Retrieve the metadata

md = loaded_data.metadata['dictionary']

md

Metadata( A Metadata instance called 'dictionary', containing the following fields:

          andthe:         3
          diddlediddle:   1
          fiddle:         5
          hey:            1
          thecat:         2
)

In [22]:
# This emd `Metadata` object can be sliced into like a regular Python dict

md['thecat']

2

In [23]:
# And contains the same data as the original dictionary

assert( very_important_knowledge == loaded_data.metadata['dictionary']._params )

<a id='lists'></a>

## 1.3 `list`s

In [24]:
clean()

In [25]:
# Make some data

ar1 = np.zeros((4,5))
ar2 = np.eye(3)
dic = {'cow':'moo','tuple':(1,2,3),'array':np.arange(9).reshape((3,3))}

In [26]:
# Save it

emd.save(fp, [ar1,ar2,dic])

In [27]:
# Load it

loaded_data = emd.read(fp)

In [28]:
# All the loaded data lives in a Root, whose name is determined by the first entry in the list

loaded_data

Root( A Node called 'np.array', containing the following top-level objects in its tree:

          np.array_0 		 (Array)
          np.array_1 		 (Array)
)

In [29]:
# The arrays are accessible via  `.tree`

l_ar1 = loaded_data.tree('np.array_0')
l_ar2 = loaded_data.tree('np.array_1')

assert(np.array_equal( ar1, l_ar1.data ))
assert(np.array_equal( ar2, l_ar2.data ))

In [30]:
# And the dictionary is accessible as metadata

loaded_data.metadata

{'dictionary_0': Metadata( A Metadata instance called 'dictionary_0', containing the following fields:
 
           array:   2D-array
           cow:     moo
           tuple:   (1, 2, 3)
 )}

In [31]:
md = loaded_data.metadata['dictionary_0']

md

Metadata( A Metadata instance called 'dictionary_0', containing the following fields:

          array:   2D-array
          cow:     moo
          tuple:   (1, 2, 3)
)

In [32]:
md['array']

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [33]:
# Check that everything worked:

for k,v in dic.items():
    assert(k in md.keys)
    if isinstance(v,np.ndarray): assert(np.array_equal(v,md[k]))
    else: assert(v == md[k])
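
The names in the outputs above follow a simple pattern: arrays in a list land in the tree as `np.array_0`, `np.array_1`, and so on, while dictionaries land in the root's metadata as `dictionary_0`, and so on. The sketch below reproduces only that naming pattern, as inferred from the outputs in this section. `plan_names` is a hypothetical helper, not part of `emd`, and nested lists stand in for numpy arrays so the snippet needs no imports.

```python
def plan_names(objects):
    """Hypothetical helper: mimic the names seen in the outputs above.

    Dictionaries are routed to metadata; everything else is treated
    as an array and routed to the tree.
    """
    tree_names, metadata_names = [], []
    for obj in objects:
        if isinstance(obj, dict):
            metadata_names.append(f'dictionary_{len(metadata_names)}')
        else:
            tree_names.append(f'np.array_{len(tree_names)}')
    return tree_names, metadata_names


# nested lists stand in for the two arrays saved above
tree_names, metadata_names = plan_names([[[0, 0]], [[1, 0]], {'cow': 'moo'}])
print(tree_names)
print(metadata_names)
```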

<a id='emd'></a>

# 2. `emd` basics


Conceptually, the core structure of `emd` is a rooted tree.  I think the easiest way to [think about this is visually](https://www.youtube.com/watch?v=ykwqXuMPsoc), but in words, I mean this in the usual computer-science-y sort of sense: a set of nested containers, plus data in those containers.  In a computer filesystem, the containers are directories, and the data are files. HDF5 blurs those lines a little: there are H5 groups, which are containers, H5 attributes, which are (usually small bits of) data, and H5 datasets, which are both containers and data.  In both cases, filesystems and HDF5 files, these are 'trees' in the sense that each container can be thought of as a node on a directed graph, with each contained item represented as a child node.  The resulting trees are 'rooted' in the sense that each such structure must have a single highest-level container, corresponding to a single node at the base of the graphical tree: the root node.


If HDF5 is guilty of blurring the lines between containers and data, in EMD we blur the lines even more: there's just one thing, `emd` nodes, and each may store data in addition to containing any number of downstream nodes.  The specific data stored in any given node is determined by its emd data type.  The data types are described in Section 3.  In this section we show how to build emd trees, how to read and write them, and how to add metadata.
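
To make the idea concrete, here is a minimal plain-Python sketch of a rooted tree in which every node can both hold data and contain child nodes. `SketchNode` and its attributes are hypothetical names chosen for illustration, and this is not how `emd` is implemented. As a simplification, a node with no parent counts as its own root here.

```python
class SketchNode:
    """Illustrative only: a node that holds data and contains child nodes."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = data        # every node may carry its own data...
        self.children = {}      # ...and any number of downstream nodes
        self.parent = None

    def add(self, child):
        """Attach `child` below this node and return it."""
        child.parent = self
        self.children[child.name] = child
        return child

    @property
    def root(self):
        # walk upstream until we reach the node with no parent
        node = self
        while node.parent is not None:
            node = node.parent
        return node


top = SketchNode('top', data='root-level data')
branch = top.add(SketchNode('branch', data=[1, 2, 3]))
leaf = branch.add(SketchNode('leaf'))

print(leaf.root.name)
```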

<a id='trees'></a>

## 2.1 Trees


In [34]:
# A rooted tree should have a root!

root = emd.Root( name='groot' )

In [35]:
root

Root( A Node called 'groot', containing the following top-level objects in its tree:

)

In [36]:
# add some nodes

node1 = emd.Node( name='node' )
node2 = emd.Node( name='lode' )
node3 = emd.Node( name='abode' )

root.tree(node1)
root.tree(node2)
root.tree(node3)

root

Root( A Node called 'groot', containing the following top-level objects in its tree:

          node 		 (Node)
          lode 		 (Node)
          abode 		 (Node)
)

In [37]:
root.tree()

/
|--node
|--lode
|--abode


In [38]:
root.tree('abode')

Node( A Node called 'abode', containing the following top-level objects in its tree:

)

In [39]:
# You may have noticed that the `.tree` method is overloaded - i.e.
# it has a few different behaviors, depending on what we pass to it

# If we give it...
#                 ...a node, it adds that node to the calling object's tree
#                 ...nothing, it prints the tree
#                 ...a string, it returns the node of that name
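
One common way to get this kind of overloaded behavior in Python is to branch on the type of the argument. The sketch below illustrates that pattern only. `MiniNode` is a hypothetical stand-in, and nothing here is taken from the `emd` source.

```python
class MiniNode:
    """Hypothetical stand-in for a tree node; not the emd implementation."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def tree(self, arg=None):
        if arg is None:
            # nothing passed: print the names of the child nodes
            for name in self.children:
                print('|--' + name)
            return None
        if isinstance(arg, str):
            # a string: return the child of that name
            return self.children[arg]
        if isinstance(arg, MiniNode):
            # a node: add it to this node's tree
            self.children[arg.name] = arg
            return None
        raise TypeError(f'unsupported argument type: {type(arg).__name__}')


parent = MiniNode('parent')
parent.tree(MiniNode('kid'))    # add
found = parent.tree('kid')      # get
parent.tree()                   # show
```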

In [40]:
# We can add to a node's tree the same way we just added to the root's tree

node4 = emd.Node( name='snowed' )
node5 = emd.Node( name='flowed' )
node6 = emd.Node( name='crowed' )

node7 = emd.Node( name='load' )
node8 = emd.Node( name='glowed' )
node9 = emd.Node( name='reload' )


node1.tree(node4)
node1.tree(node5)
node1.tree(node6)

node2.tree(node7)
node7.tree(node8)
node8.tree(node9)


root.tree()

/
|--node
|	|--snowed
|	|--flowed
|	|--crowed
|--lode
|	|--load
|		|--glowed
|			|--reload
|--abode


In [41]:
# from a non-root node, we can display the branch below this node...

node2.tree()

/
|--load
	|--glowed
		|--reload


In [42]:
# ...or the whole tree from root

node2.tree(show=True)

/
|--node
|	|--snowed
|	|--flowed
|	|--crowed
|--lode
|	|--load
|		|--glowed
|			|--reload
|--abode


In [43]:
# We can retrieve objects using '/' delimiters

root.tree()

/
|--node
|	|--snowed
|	|--flowed
|	|--crowed
|--lode
|	|--load
|		|--glowed
|			|--reload
|--abode


In [44]:
root.tree('lode/load')

Node( A Node called 'load', containing the following top-level objects in its tree:

          glowed 		 (Node)
)

In [45]:
# This works from any node

node2.tree()

/
|--load
	|--glowed
		|--reload


In [46]:
node2.tree('load/glowed/reload')

Node( A Node called 'reload', containing the following top-level objects in its tree:

)

In [47]:
# We can also retrieve objects using a root path even if 
# we're calling .tree from a downstream node, by using a leading '/'

node2.tree(show=True)

/
|--node
|	|--snowed
|	|--flowed
|	|--crowed
|--lode
|	|--load
|		|--glowed
|			|--reload
|--abode


In [48]:
node2.tree('/node/snowed')

Node( A Node called 'snowed', containing the following top-level objects in its tree:

)
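
The two lookup styles above, relative paths and root paths with a leading '/', can be pictured with nested dictionaries. This is a sketch of the idea only. `resolve` is a hypothetical function, and the dictionary simply mirrors the tree built in this section.

```python
# nested dicts mirroring the tree built above: name -> subtree
tree = {
    'node': {'snowed': {}, 'flowed': {}, 'crowed': {}},
    'lode': {'load': {'glowed': {'reload': {}}}},
    'abode': {},
}


def resolve(root, current, path):
    """Return the subtree found at `path`.

    root    -- nested dict for the whole tree
    current -- nested dict for the node the lookup starts from
    path    -- '/'-delimited names; a leading '/' means "start at the root"
    """
    node = root if path.startswith('/') else current
    for name in (part for part in path.split('/') if part):
        node = node[name]
    return node


lode = tree['lode']
print(sorted(resolve(tree, lode, 'load/glowed')))   # relative path
print(sorted(resolve(tree, lode, '/node')))         # root path
```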

In [49]:
# The root node of a tree is accessible from any node with the .root property

assert(root == root.root == node1.root == node2.root == node3.root == node4.root == node5.root)

<a id='writingtrees'></a>

## 2.2 Writing and reading a tree

In [50]:
clean()

In [51]:
# save a tree to an HDF5 file

emd.save(fp, root)

In [52]:
# load it

loaded_data = emd.read(fp)

In [53]:
# look inside (the nodes come back in alphabetical order, not the order they were added)

loaded_data

Root( A Node called 'groot', containing the following top-level objects in its tree:

          abode 		 (Node)
          lode 		 (Node)
          node 		 (Node)
)

In [54]:
loaded_data.tree()

/
|--abode
|--lode
|	|--load
|		|--glowed
|			|--reload
|--node
	|--crowed
	|--flowed
	|--snowed


In [55]:
# You can print the tree of a file to screen without loading it with print_h5_tree()

emd.print_h5_tree(fp)

/
|--groot
	|--abode
	|--lode
	|	|--load
	|		|--glowed
	|			|--reload
	|--node
		|--crowed
		|--flowed
		|--snowed




<a id='metadata'></a>

## 2.3 Metadata


`emd` enables arbitrarily many Python dict-like objects to be stored with each data node.  Those objects are instances of the emd `Metadata` class, and support normal dictionary-style access with string keys.

In [56]:
# Metadata is represented in emd by its `Metadata` class

metadata = emd.Metadata( name='some_metadata' )

metadata

Metadata( A Metadata instance called 'some_metadata', containing the following fields:

)

In [57]:
# These work like Python dictionaries -
# we can slice into them to either get or set items with square brackets and string keys

metadata['key'] = 'value'
metadata['answer'] = 42
metadata['bool'] = True

In [58]:
metadata.keys

dict_keys(['key', 'answer', 'bool'])

In [59]:
metadata['answer']

42

In [60]:
# To add Metadata to a node, assign it to the node.metadata property

node2.metadata = metadata

node2.metadata

{'some_metadata': Metadata( A Metadata instance called 'some_metadata', containing the following fields:
 
           key:      value
           answer:   42
           bool:     True
 )}

In [61]:
# Note that node.metadata is a dictionary
# so we need to slice into it to retrieve the Metadata instance we just added to it

node2.metadata['some_metadata']

Metadata( A Metadata instance called 'some_metadata', containing the following fields:

          key:      value
          answer:   42
          bool:     True
)

In [62]:
# The reason it's built this way is that now we can add as many Metadata instances (dictionaries)
# as we like to each node


# make a new Metadata instance
more_metadata = emd.Metadata( name='more_metadata' )

# add info to it
more_metadata['an_array'] = np.arange(12).reshape((3,4))
more_metadata['none'] = None
more_metadata['tup'] = (1,2,3)
more_metadata['list'] = ['a','b','c']

# add it to node2
node2.metadata = more_metadata

# show the metadata in node2
node2.metadata

{'some_metadata': Metadata( A Metadata instance called 'some_metadata', containing the following fields:
 
           key:      value
           answer:   42
           bool:     True
 ),
 'more_metadata': Metadata( A Metadata instance called 'more_metadata', containing the following fields:
 
           an_array:   2D-array
           none:       None
           tup:        (1, 2, 3)
           list:       ['a', 'b', 'c']
 )}
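
Note that assigning to `node2.metadata` added an entry rather than replacing what was already there. In Python, one way to get that behavior is a property whose setter files each new instance under its name. The sketch below shows that pattern with hypothetical classes `Meta` and `Holder`. It illustrates the idea and is not a copy of the `emd` code.

```python
class Meta(dict):
    """Hypothetical named dictionary, standing in for a Metadata instance."""

    def __init__(self, name, data=None):
        super().__init__(data or {})
        self.name = name


class Holder:
    """Hypothetical node-like object with an additive `metadata` property."""

    def __init__(self):
        self._metadata = {}

    @property
    def metadata(self):
        return self._metadata

    @metadata.setter
    def metadata(self, value):
        # assignment adds to the collection, keyed by name,
        # instead of overwriting the whole dictionary
        self._metadata[value.name] = value


holder = Holder()
holder.metadata = Meta('some_metadata', {'answer': 42})
holder.metadata = Meta('more_metadata', {'tup': (1, 2, 3)})

print(list(holder.metadata))
print(holder.metadata['some_metadata']['answer'])
```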

In [63]:
node2.metadata['more_metadata']['an_array']

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [64]:
# By the way, you can build and populate a new Metadata instance
# all in one command by passing a dictionary to the `data` argument...


node2.metadata = emd.Metadata(
    name = 'even_more_metadata',
    data = {
        'x' : True,
        'y' : False
    }
)


node2.metadata

{'some_metadata': Metadata( A Metadata instance called 'some_metadata', containing the following fields:
 
           key:      value
           answer:   42
           bool:     True
 ),
 'more_metadata': Metadata( A Metadata instance called 'more_metadata', containing the following fields:
 
           an_array:   2D-array
           none:       None
           tup:        (1, 2, 3)
           list:       ['a', 'b', 'c']
 ),
 'even_more_metadata': Metadata( A Metadata instance called 'even_more_metadata', containing the following fields:
 
           x:   True
           y:   False
 )}

In [65]:
# metadata does not show up in the emd tree - you have to get it from the nodes

root.tree()

/
|--node
|	|--snowed
|	|--flowed
|	|--crowed
|--lode
|	|--load
|		|--glowed
|			|--reload
|--abode


In [66]:
loaded_data.tree('lode').metadata

{}

In [67]:
# Metadata is taken along for the ride during write/read

clean()
emd.save(fp, root)
loaded_data = emd.read(fp)
loaded_data.tree('lode').metadata

{'even_more_metadata': Metadata( A Metadata instance called 'even_more_metadata', containing the following fields:
 
           x:   True
           y:   False
 ),
 'more_metadata': Metadata( A Metadata instance called 'more_metadata', containing the following fields:
 
           an_array:   2D-array
           list:       ['a', 'b', 'c']
           none:       None
           tup:        (1, 2, 3)
 ),
 'some_metadata': Metadata( A Metadata instance called 'some_metadata', containing the following fields:
 
           answer:   42
           bool:     True
           key:      value
 )}

<a id='classes'></a>

# 3. Class types

In addition to storing metadata, each node in an EMD tree can hold some data.  The structure of that data is determined by the node's `emd` class type:

- `Node` instances store only metadata
- `Array` instances store array-like data
- `PointList` instances store N points in M string labeled / sliceable dimensions
- `PointListArray` instances store 2D grids of variable length PointLists, i.e. a form of ragged (2+1)D array

A final data node class type, `Custom`, allows a single node to contain any number of each of the other types - this will not be covered in this tutorial.  See `sample_custom_class_module` for more on custom nodes.

<a id='array'></a>

## 3.1 `Array`

Here we demo
- using `Array` as a simple wrapper for numpy arrays
- using the built-in array metadata
- array stacks

<a id='arraywrapper'></a>

### 3.1.1 simple `np.ndarray` wrapper

In [68]:
# generate data
data = np.arange(60).reshape((3,4,5))

data

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

In [69]:
# make the Array
ar = emd.Array(
    name = 'arrrrrr',
    data = data
)

ar

Array( A 3-dimensional array of shape (3, 4, 5) called 'arrrrrr',
       with dimensions:

       dim0 = [0,1,...] pixels
       dim1 = [0,1,...] pixels
       dim2 = [0,1,...] pixels
)

In [70]:
ar.data

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

<a id='arraymetadata'></a>

### 3.1.2 built-in metadata

The `Array` class comes with the basic metadata which calibrates the array it's bundled with.  These include a name, units, and N vectors (for N-dimensional arrays) describing the name, units, and step size of the pixels along each dimension of the array. The dimension vectors ('dim vectors') are carried along for the ride at write/read time.

In [71]:
# This metadata is 'built-in' in the sense that it is always included with Arrays.
# The Array we just made will be populated with the default values:

print(ar.name)
print(ar.units)
print()
print(ar.dims[0])
print(ar.dims[1])
print(ar.dims[2])
print()
print(ar.dim_names[0])
print(ar.dim_names[1])
print(ar.dim_names[2])
print()
print(ar.dim_units[0])
print(ar.dim_units[1])
print(ar.dim_units[2])

arrrrrr


[0 1 2]
[0 1 2 3]
[0 1 2 3 4]

dim0
dim1
dim2

pixels
pixels
pixels


In [72]:
# The units attribute can be modified directly

ar.units = 'cows'

In [73]:
# The dims should be set using the .set_dim method

ar.set_dim?

In [74]:
ar.set_dim(
    0,                   # which dimension
    dim = [0,5],         # when two numbers are passed the vector is extrapolated linearly
)

ar.dims[0]

array([ 0,  5, 10])

In [75]:
# the name and units can be set with 
# their own method calls (this cell) or
# inside a call to `.set_dim` (next cell)

ar.set_dim_name(
    0,
    'x-axis'
)
ar.set_dim_units(
    0,
    'pastures'
)

In [76]:
ar.set_dim(
    1,
    dim = 2,             # when one number x is passed, the vector is extrapolated linearly from [0,x]
    name = 'y-axis',
    units = 'fields'
)

ar.dims[1]

array([0, 2, 4, 6])

In [77]:
ar.set_dim(
    2,
    dim = np.logspace(-2,2,5),   # when a 1D array is passed, its length must match the array dim length 
    name = 'z-axis',
    units = 'tracts'
)

ar.dims[2]

array([1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02])
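
The three input forms used in the last few cells (two numbers, one number, a full-length vector) can be summarized in a few lines of plain Python. `expand_dim` is a hypothetical function written to match the outputs shown above. It is a sketch of the rule, not the `set_dim` implementation.

```python
def expand_dim(dim, length):
    """Sketch: build a dim vector of `length` entries from `dim`.

    dim may be one number x (treated as [0, x]), two numbers
    (extrapolated linearly), or a full vector of `length` entries.
    """
    if isinstance(dim, (int, float)):
        dim = [0, dim]
    dim = list(dim)
    if len(dim) == 2:
        start, step = dim[0], dim[1] - dim[0]
        return [start + i * step for i in range(length)]
    if len(dim) != length:
        raise ValueError(f'expected 1, 2, or {length} values, got {len(dim)}')
    return dim


print(expand_dim([0, 5], 3))    # [0, 5, 10]
print(expand_dim(2, 4))         # [0, 2, 4, 6]
```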

In [78]:
print(ar.name)
print(ar.units)
print()
print(ar.dims[0])
print(ar.dims[1])
print(ar.dims[2])
print()
print(ar.dim_names[0])
print(ar.dim_names[1])
print(ar.dim_names[2])
print()
print(ar.dim_units[0])
print(ar.dim_units[1])
print(ar.dim_units[2])

arrrrrr
cows

[ 0  5 10]
[0 2 4 6]
[1.e-02 1.e-01 1.e+00 1.e+01 1.e+02]

x-axis
y-axis
z-axis

pastures
fields
tracts


In [79]:
# Alternatively, the Array can be initialized with the name and dims specified

ar = emd.Array(
    data = np.array(
        [[1,2,3],
         [4,5,6]]
    ),
    name = 'my_array',
    units = 'intensity',
    dims = [[0,3],     # setting two numbers will extrapolate the full vector linearly
            [0,0.6]],
    dim_names = ['x','y'],
    dim_units = ['nm','km']
)

In [80]:
ar

Array( A 2-dimensional array of shape (2, 3) called 'my_array',
       with dimensions:

       x = [0,3,...] nm
       y = [0.0,0.6,...] km
)

In [81]:
ar.dims[0]

[0, 3]

<a id='arraystacks'></a>

### 3.1.3 array stacks

The syntax below supports holding multiple arrays in a single Array instance, each using the same set of dim vectors.  In other words, an array stack holds (N+1) dimensional data, where the first N dimensions are described by the dim vectors, and N-dimensional slices along the last dimension are accessible using string keys.

In [82]:
# make some data
# we'll assume we have (2+1)D data - i.e. several 2D arrays in a single Array instance.
# The code below makes 5 3x4 arrays, and then combines them into a single array

data = np.dstack([
    np.arange(12).reshape(3,4),
    np.arange(12).reshape(3,4)+10,
    np.arange(12).reshape(3,4)+100,
    np.arange(12).reshape(3,4)+1000,
    np.arange(12).reshape(3,4)+10000,
])

data

array([[[    0,    10,   100,  1000, 10000],
        [    1,    11,   101,  1001, 10001],
        [    2,    12,   102,  1002, 10002],
        [    3,    13,   103,  1003, 10003]],

       [[    4,    14,   104,  1004, 10004],
        [    5,    15,   105,  1005, 10005],
        [    6,    16,   106,  1006, 10006],
        [    7,    17,   107,  1007, 10007]],

       [[    8,    18,   108,  1008, 10008],
        [    9,    19,   109,  1009, 10009],
        [   10,    20,   110,  1010, 10010],
        [   11,    21,   111,  1011, 10011]]])

In [83]:
data.shape

(3, 4, 5)

In [84]:
# make the Array

ar = emd.Array(
    data = data,
    name = 'my_stack_array',
    units = 'intensity',
    dims = [[0,3],                  # we want only two dim vectors
            [0,0.6]],
    dim_names = ['x','y'],
    dim_units = ['nm','km'],
    slicelabels = ['a','b','c','d','e']
)

In [85]:
ar

Array( A stack of 5 Arrays with 2-dimensions and shape (3, 4), called 'my_stack_array'

       The labels are:
           a
           b
           c
           d
           e


       The Array dimensions are:
           x = [0,3,...] nm
           y = [0.0,0.6,...] km
)

In [86]:
# the `.rank` attribute returns N and the `.data.ndim` attribute gives N+1
# for some N+1 dimensional stack array.
# for non-stack arrays, `.rank` and `data.ndim` are identical

print(ar.rank)
print(ar.data.ndim)

2
3


In [87]:
# the `.depth` property gives the number of slices. For non-stack arrays, it is 0

ar.depth

5

In [88]:
# the slices can be accessed by indexing into the array with string keys,
# returning another Array

ar['a']

Array( A 2-dimensional array of shape (3, 4) called 'my_stack_array_a',
       with dimensions:

       x = [0,3,...] nm
       y = [0.0,0.6,...] km
)
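
Conceptually, a string key selects one position along the last axis of the stacked data. The sketch below shows that lookup with nested lists and a hypothetical `get_slice` function. The real `Array` returns another `Array` with the dim vectors attached, which this sketch leaves out.

```python
labels = ['a', 'b', 'c']

# a (2, 2, 3) nested list: the last axis is the stack axis
stack = [
    [[0, 10, 100], [1, 11, 101]],
    [[2, 12, 102], [3, 13, 103]],
]


def get_slice(stack, labels, key):
    """Return the 2D slice of `stack` whose label is `key`."""
    k = labels.index(key)   # position of the label along the last axis
    return [[cell[k] for cell in row] for row in stack]


print(get_slice(stack, labels, 'b'))    # [[10, 11], [12, 13]]
```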

<a id='pointlist'></a>


## 3.2 `PointList`

A `PointList`'s data attribute can have any length, with any number of string-named fields, and each field may have its own data type.  PointLists have variable length that can change at runtime.  `PointList` wraps numpy structured arrays.

In [89]:
# make some data
# we define the fields by specifying a custom `dtype` for numpy

data = np.zeros(
    5,
    dtype = [
        ('x',int),
        ('y',float)
    ]
)

data

array([(0, 0.), (0, 0.), (0, 0.), (0, 0.), (0, 0.)],
      dtype=[('x', '<i8'), ('y', '<f8')])

In [90]:
# make a PointList

pointlist = emd.PointList(
    name = 'my_pointlist',
    data = data
)

pointlist

PointList( A length 5 PointList called 'my_pointlist',
           with 2 fields:

           x   (int64)
           y   (float64)
)

In [91]:
pointlist.data

array([(0, 0.), (0, 0.), (0, 0.), (0, 0.), (0, 0.)],
      dtype=[('x', '<i8'), ('y', '<f8')])

In [92]:
# the fields can be accessed by slicing directly into the PointList

print(pointlist['x'])
print(pointlist['y'])

[0 0 0 0 0]
[0. 0. 0. 0. 0.]


In [93]:
# remove points

# make a boolean mask
rm = np.zeros(5,dtype = bool)
rm[3:] = True        # flag the last two points

# remove the last two points
pointlist.remove(rm)

# show
pointlist.data

array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('x', '<i8'), ('y', '<f8')])

In [94]:
# add points
# the new data must have the same dtype as the existing data

# make the new data
new_data = np.ones(
    3,
    dtype = [
        ('x',int),
        ('y',float)
    ]
)

# add it to the pointlist
pointlist.add(new_data)

# show
pointlist.data

array([(0, 0.), (0, 0.), (0, 0.), (1, 1.), (1, 1.), (1, 1.)],
      dtype=[('x', '<i8'), ('y', '<f8')])

In [95]:
# add data as 1D vectors corresponding to the fields

pointlist.add_data_by_field(
    data = (np.arange(5,10),np.linspace(5,6,num=5)),
    fields = ('x','y')
)

pointlist.data

array([(0, 0.  ), (0, 0.  ), (0, 0.  ), (1, 1.  ), (1, 1.  ), (1, 1.  ),
       (5, 5.  ), (6, 5.25), (7, 5.5 ), (8, 5.75), (9, 6.  )],
      dtype=[('x', '<i8'), ('y', '<f8')])

In [96]:
# make a new pointlist like this one, with some additional fields added

pointlist_copy = pointlist.add_fields(
    [('z',bool)],
    name = 'another_pointlist',
)

pointlist_copy.data

array([(0, 0.  , False), (0, 0.  , False), (0, 0.  , False),
       (1, 1.  , False), (1, 1.  , False), (1, 1.  , False),
       (5, 5.  , False), (6, 5.25, False), (7, 5.5 , False),
       (8, 5.75, False), (9, 6.  , False)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])

In [97]:
# modify values in an existing field

pointlist_copy['z'][6:] = True

pointlist_copy.data

array([(0, 0.  , False), (0, 0.  , False), (0, 0.  , False),
       (1, 1.  , False), (1, 1.  , False), (1, 1.  , False),
       (5, 5.  ,  True), (6, 5.25,  True), (7, 5.5 ,  True),
       (8, 5.75,  True), (9, 6.  ,  True)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])

In [98]:
# Sort the pointlist by one of its fields

pointlist_copy.sort('y', order='descending')

pointlist_copy.data

array([(9, 6.  ,  True), (8, 5.75,  True), (7, 5.5 ,  True),
       (6, 5.25,  True), (5, 5.  ,  True), (1, 1.  , False),
       (1, 1.  , False), (1, 1.  , False), (0, 0.  , False),
       (0, 0.  , False), (0, 0.  , False)],
      dtype=[('x', '<i8'), ('y', '<f8'), ('z', '?')])
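
Since `PointList` wraps numpy structured arrays, the remove, add, and sort operations above have close analogues in plain numpy. The sketch below shows those analogues on a bare structured array. It makes no claim about how `PointList` implements its methods internally.

```python
import numpy as np

dtype = [('x', int), ('y', float)]
points = np.zeros(5, dtype=dtype)

# remove: keep only the entries where the mask is False
mask = np.zeros(5, dtype=bool)
mask[3:] = True
points = points[~mask]

# add: concatenate new records with the same dtype
points = np.concatenate([points, np.ones(3, dtype=dtype)])

# sort by one field, in descending order
order = np.argsort(points['y'])[::-1]
points = points[order]

print(points['x'])
print(points['y'])
```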

<a id='pointlistarray'></a>


## 3.3 `PointListArray`

`emd.PointListArray` represents 2D grids of PointList instances with the same data fields.  It stores 2D ragged arrays of vectors of any length with string-accessible fields.

In [99]:
# make a PointListArray

shape = (5,6)
dtype = [('x',int),('y',int)]

pointlistarray = emd.PointListArray(
    name = 'my_pointlistarray',
    shape = shape,
    dtype = dtype
)

pointlistarray

PointListArray( A shape (5, 6) PointListArray called 'my_pointlistarray',
                with 2 fields:

                x   (int64)
                y   (int64)
)

In [100]:
# the pointlists can be accessed by slicing into the pointlistarray

pointlistarray[0,0]

PointList( A length 0 PointList called '0,0',
           with 2 fields:

           x   (int64)
           y   (int64)
)

In [101]:
# and are instantiated empty

pointlistarray[3,4].data

array([], dtype=[('x', '<i8'), ('y', '<i8')])

In [102]:
# we can populate the pointlists with the `add` method

for ii in range(pointlistarray.shape[0]):
    for jj in range(pointlistarray.shape[1]):
        
        # set an integer value that varies sinusoidally from 0 to 8
        val = int(np.round((np.sin((ii*shape[1]+jj) * 2*np.pi / np.prod(shape)) + 1) * 4))
        
        # add to the pointlist
        pointlistarray[ii,jj].add(
            np.full(
                shape = val,
                fill_value= val,
                dtype = dtype
            )
        )

In [103]:
pointlistarray[0,0].data

array([(4, 4), (4, 4), (4, 4), (4, 4)], dtype=[('x', '<i8'), ('y', '<i8')])

In [104]:
for x in range(pointlistarray.shape[0]):
    for y in range(pointlistarray.shape[1]):
        print(pointlistarray[x,y].data)

[(4, 4) (4, 4) (4, 4) (4, 4)]
[(5, 5) (5, 5) (5, 5) (5, 5) (5, 5)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8) (8, 8)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7) (7, 7)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(6, 6) (6, 6) (6, 6) (6, 6) (6, 6) (6, 6)]
[(5, 5) (5, 5) (5, 5) (5, 5) (5, 5)]
[(4, 4) (4, 4) (4, 4) (4, 4)]
[(3, 3) (3, 3) (3, 3)]
[(2, 2) (2, 2)]
[(2, 2) (2, 2)]
[(1, 1)]
[(1, 1)]
[]
[]
[]
[]
[(1, 1)]
[(1, 1)]
[(2, 2) (2, 2)]
[(2, 2) (2, 2)]
[(3, 3) (3, 3) (3, 3)]
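
If 'ragged' is an unfamiliar term, a plain-Python analogue may help: a 2D grid in which each cell holds its own list, and the lists may all have different lengths. The sketch below fills such a grid using the same sinusoidal rule as the cell above. It uses only the standard library and is an analogy, not a description of how `PointListArray` stores its data.

```python
import math

shape = (5, 6)

# one independent, initially empty list per grid position
grid = [[[] for _ in range(shape[1])] for _ in range(shape[0])]

for ii in range(shape[0]):
    for jj in range(shape[1]):
        k = ii * shape[1] + jj
        # integer that varies sinusoidally between 0 and 8
        val = int(round((math.sin(k * 2 * math.pi / (shape[0] * shape[1])) + 1) * 4))
        grid[ii][jj].extend([(val, val)] * val)

# the number of points differs from cell to cell
lengths = [[len(cell) for cell in row] for row in grid]
for row in lengths:
    print(row)
```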


<a id='savingnodes'></a>

## 3.4 Saving unrooted nodes


In [105]:
# In this last section we made an Array, a PointList, and a PointListArray

ar

Array( A stack of 5 Arrays with 2-dimensions and shape (3, 4), called 'my_stack_array'

       The labels are:
           a
           b
           c
           d
           e


       The Array dimensions are:
           x = [0,3,...] nm
           y = [0.0,0.6,...] km
)

In [106]:
pointlist

PointList( A length 11 PointList called 'my_pointlist',
           with 2 fields:

           x   (int64)
           y   (float64)
)

In [107]:
pointlistarray

PointListArray( A shape (5, 6) PointListArray called 'my_pointlistarray',
                with 2 fields:

                x   (int64)
                y   (int64)
)

In [108]:
# None of them have been placed in a tree, so none have roots

print( ar.root is None )
print( pointlist.root is None )
print( pointlistarray.root is None )

True
True
True


In [109]:
# We can't add objects to one another with the .tree method if they're not rooted

try:
    ar.tree(pointlist)
    
except AssertionError:
    print("No can do, bucko")

No can do, bucko


In [110]:
# If you only want to save a single, unrooted object, you can pass that object to the save function.
# A root with the same name as the object will be created, the object placed inside of it, and
# the whole thing saved to the file.  That root will then be removed, so the runtime object - i.e. the
# node you just saved - isn't modified.

clean()
emd.save(fp, ar)
loaded_data = emd.read(fp)
loaded_data

Root( A Node called 'my_stack_array', containing the following top-level objects in its tree:

          my_stack_array 		 (Array)
)

In [111]:
loaded_data.tree('my_stack_array')

Array( A stack of 5 Arrays with 2-dimensions and shape (3, 4), called 'my_stack_array'

       The labels are:
           a
           b
           c
           d
           e


       The Array dimensions are:
           x = [0,3,...] nm
           y = [0.0,0.6,...] km
)

In [112]:
ar.root is None

True
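
The wrap, write, unwrap sequence described above can be sketched with a `try`/`finally` block, which guarantees the temporary root is detached even if writing fails. Everything in this sketch is hypothetical (`Item`, `save_unrooted`, and the dictionary used as a root). It illustrates the pattern and is not the `emd.save` implementation.

```python
class Item:
    """Hypothetical unrooted object."""

    def __init__(self, name):
        self.name = name
        self.root = None


def save_unrooted(item, write):
    """Wrap `item` in a temporary root, pass the root to `write`, then detach."""
    temp_root = {'name': item.name, 'children': [item]}
    item.root = temp_root
    try:
        write(temp_root)
    finally:
        # leave the runtime object as we found it
        item.root = None


written = []                        # stands in for the file on disk
obj = Item('my_stack_array')
save_unrooted(obj, written.append)

print(written[0]['name'])
print(obj.root is None)
```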

In [113]:
# Alternatively, you can make a new root and name it whatever you like. This way you can also
# put as many objects as you like inside it, or build a tree of objects as in section 2.1

clean()

root = emd.Root( name='baby_groot' )
root.tree(ar)
root.tree(pointlist)
root.tree(pointlistarray)

emd.save(fp,root)
loaded_data = emd.read(fp)
loaded_data.tree()

100%|█████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 2220.07it/s]
Reading PointListArray: 100%|██████████████████████████████████████| 30/30 [00:00<00:00, 3162.17PointList/s]

/
|--my_pointlist
|--my_pointlistarray
|--my_stack_array





<a id='trees'></a>


# 4. More on trees

The tree structure is the core of the `emd` package and format.  In this section we first look at building, saving, and restoring trees from file.  We look at basic tree methods for showing and retrieving data in trees.  Then we look at methods for modifying existing trees by cutting off branches, or grafting branches from one tree to another.  Finally, we discuss root metadata, which is handled specially during cuts and grafts.

In [114]:
# All the functionality shown in this section is accessed via the `.tree` method.
# The usage of `.tree` can be specified by passing a keyword argument in (show, add, get, cut, graft).
# In the first three cases, the keyword can be omitted, as long as the data passed has the correct type.

# `node` is not defined until section 4.1, so look the method up on the class instead
emd.Node.tree?
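
The dispatch described in the cell comment above can be pictured with a small stand-alone sketch. This is a toy model and not `emd`'s implementation: `ToyNode` and its `_show`, `_add`, and `_get` helpers are hypothetical names, and only the three cases in which the keyword may be omitted (show, add, get) are modelled.

```python
class ToyNode:
    """Hypothetical stand-in for a tree node (not an emd class)."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def tree(self, arg=None):
        """Dispatch on the type of `arg`, in the spirit of `.tree`."""
        if arg is None:                 # no argument: show
            return self._show()
        if isinstance(arg, ToyNode):    # a node: add it as a child
            return self._add(arg)
        if isinstance(arg, str):        # a string: get a node by path
            return self._get(arg)
        raise TypeError(f"unsupported argument type: {type(arg).__name__}")

    def _show(self, depth=0):
        # Build one line per downstream node, indented by depth
        lines = []
        for child in self.children.values():
            lines.append("\t" * depth + "|--" + child.name)
            lines.extend(child._show(depth + 1))
        return lines

    def _add(self, node):
        self.children[node.name] = node

    def _get(self, path):
        # Walk the path one name at a time
        node = self
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node


top = ToyNode("top")
a = ToyNode("a")
b = ToyNode("b")
top.tree(a)                 # add
a.tree(b)                   # add
found = top.tree("a/b")     # get
listing = top.tree()        # show
print("\n".join(listing))
```

Dispatching on argument type keeps the common calls short, at the cost of needing explicit keywords (such as `cut` and `graft` in `emd`) for operations whose arguments would otherwise be ambiguous.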


<a id='buildingtrees'></a>


## 4.1 Building trees

Let's make a tree with all these data types, write it, read it, and confirm that it worked.

In [115]:
clean()

In [116]:
# make some data

ar1 = emd.Array(
    name = 'ar1',
    data = np.arange(12).reshape((3,4))
)
ar2 = emd.Array(
    name = 'ar2',
    data = np.arange(24).reshape((3,4,2)),
    slicelabels = ('a','b')
)
node = emd.Node(
    name = 'immanode'
)
pointlist1 = emd.PointList(
    name = 'pointlist1',
    data = np.ones(
        5,
        dtype = [('rx',int),('ry',int)]
    )
)
pointlist2 = emd.PointList(
    name = 'pointlist2',
    data = np.zeros(
        6,
        dtype = [('qx',float),('qy',float)]
    )
)
pointlistarray = emd.PointListArray(
    name = 'pointlistarray',
    shape = (3,4),
    dtype = [('yes',bool),('no',bool)]
)
for rx in range(pointlistarray.shape[0]):
    for ry in range(pointlistarray.shape[1]):
        pointlistarray[rx,ry].add(
            np.ones(
                int(ry + rx*pointlistarray.shape[1]),
                dtype = [('yes',bool),('no',bool)]
            )
        )
        
# add some metadata
pointlist1.metadata = emd.Metadata(
    name = 'evolution',
    data = {
        'pikachu' : 'raichu',
        'thunderstone' : True
    }
)
pointlistarray.metadata = emd.Metadata(
    name = 'is_rodent',
    data = {
        'gerbil' : True,
        'mouse' : True,
        'pikachu' : True,
        'bulbasaur' : 'False'
    }
)

In [117]:
# Make a tree

# start with a Root
root = emd.Root( name='treeoflife' )

# and add data
root.tree(node)
node.tree(pointlistarray)
pointlistarray.tree(pointlist1)
root.tree(ar1)
root.tree(pointlist2)
node.tree(ar2)

# show the tree
root.tree()

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


In [118]:
# save

emd.save(fp,root)

100%|█████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 3017.30it/s]


In [119]:
# load

loaded_data = emd.read(fp)

loaded_data

Reading PointListArray: 100%|██████████████████████████████████████| 12/12 [00:00<00:00, 2423.29PointList/s]


Root( A Node called 'treeoflife', containing the following top-level objects in its tree:

          ar1 		 (Array)
          immanode 		 (Node)
          pointlist2 		 (PointList)
)

In [120]:
loaded_data.tree()

/
|--ar1
|--immanode
|	|--ar2
|	|--pointlistarray
|		|--pointlist1
|--pointlist2


In [121]:
# check that the data is the same

assert(np.array_equal( loaded_data.tree('ar1').data, ar1.data ))
assert(np.array_equal( loaded_data.tree('immanode/ar2').data, ar2.data ))
assert(np.array_equal( loaded_data.tree('immanode/pointlistarray/pointlist1').data, pointlist1.data ))
assert(np.array_equal( loaded_data.tree('pointlist2').data, pointlist2.data ))

In [122]:
# check that the metadata is the same

def check_metadata(obj1, obj2):
    """ Asserts that each Metadata instance in obj1 matches the instance of the
    same name in obj2. Fails for array-like metadata values.
    """
    for name in obj1.metadata.keys():
        md_i,md_f = obj1.metadata[name],obj2.metadata[name]
        # use a separate loop variable so the outer name isn't shadowed
        for k in md_i.keys:
            assert( md_i[k] == md_f[k] )
            
check_metadata(
    loaded_data.tree('immanode/pointlistarray/pointlist1'),
    pointlist1
)
check_metadata(
    loaded_data.tree('immanode/pointlistarray'),
    pointlistarray
)

<a id='showtree'></a>


## 4.2 Show and retrieve data

We've already done most of these operations above!

In [123]:
# show the tree, from root

root.tree()

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


In [124]:
# show the tree, from some node

node.tree()

/
|--pointlistarray
|	|--pointlist1
|--ar2


In [125]:
# show the whole tree from root, using some node

node.tree(show=True)
print()
print()
node.tree(True)

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


In [126]:
# get some node from root

data = root.tree('immanode/pointlistarray/pointlist1')

data

PointList( A length 5 PointList called 'pointlist1',
           with 2 fields:

           rx   (int64)
           ry   (int64)
)

In [127]:
# get some node from another, upstream node

data = pointlistarray.tree('pointlist1')

data

PointList( A length 5 PointList called 'pointlist1',
           with 2 fields:

           rx   (int64)
           ry   (int64)
)

In [128]:
# get some node from another node, using a path referenced to the root

data = pointlistarray.tree('/immanode/pointlistarray/pointlist1')

data

PointList( A length 5 PointList called 'pointlist1',
           with 2 fields:

           rx   (int64)
           ry   (int64)
)
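
The three retrievals above differ only in where the path lookup starts. The sketch below is one way to picture that behaviour, assuming a leading `/` means "start from the root" and anything else means "start from this node". `ToyNode` and `ToyRoot` are hypothetical classes written for illustration, not `emd` code.

```python
class ToyNode:
    """Hypothetical node with downstream children and a pointer to its root."""

    def __init__(self, name):
        self.name = name
        self.children = {}
        self.root = None    # set once the node is added to a rooted tree

    def add(self, child):
        self.children[child.name] = child
        child._set_root(self.root)

    def _set_root(self, root):
        # Propagate the root pointer to everything downstream
        self.root = root
        for c in self.children.values():
            c._set_root(root)

    def get(self, path):
        # A leading "/" references the path to the root; otherwise to self
        node = self.root if path.startswith("/") else self
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node


class ToyRoot(ToyNode):
    """Hypothetical root: a node whose root pointer is itself."""

    def __init__(self, name):
        super().__init__(name)
        self.root = self


toy_root = ToyRoot("treeoflife")
toy_node = ToyNode("immanode")
toy_pla = ToyNode("pointlistarray")
toy_pl = ToyNode("pointlist1")
toy_root.add(toy_node)
toy_node.add(toy_pla)
toy_pla.add(toy_pl)

from_root = toy_root.get("immanode/pointlistarray/pointlist1")
from_upstream = toy_pla.get("pointlist1")
from_root_path = toy_pla.get("/immanode/pointlistarray/pointlist1")
```

All three lookups return the same object in this toy, mirroring the three cells above.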

<a id='cutgraft'></a>


## 4.3 Cutting and grafting branches


In this section we'll make a new tree.  We use it first to demonstrate cutting branches off a parent tree to yield some new, smaller tree.  Then we demonstrate grafting a branch from one tree to another.

In [129]:
# make some data

ar3 = emd.Array(
    name = 'ar3',
    data = np.arange(12,22).reshape((5,2))
)
node2 = emd.Node(
    name = 'node2'
)
node3 = emd.Node(
    name = 'node3'
)
pointlist3 = emd.PointList(
    name = 'pointlist3',
    data = np.ones(
        3,
        dtype = [('rx',int),('ry',int)]
    )
)
pointlist4 = emd.PointList(
    name = 'pointlist4',
    data = np.zeros(
        7,
        dtype = [('qx',float),('qy',float)]
    )
)

In [130]:
# make a tree

root2 = emd.Root( name='treeofknowledge')
root2.tree(node2)
node2.tree(ar3)
node2.tree(pointlist3)
pointlist3.tree(node3)
root2.tree(pointlist4)

root2.tree()

/
|--node2
|	|--ar3
|	|--pointlist3
|		|--node3
|--pointlist4


In [131]:
# Cut a branch off of the tree

new_root = pointlist3.tree(cut=True)

new_root

Root( A Node called 'pointlist3', containing the following top-level objects in its tree:

          pointlist3 		 (PointList)
)

In [132]:
# Show the original tree and cut off branch

root2.tree()
print()
new_root.tree()

/
|--node2
|	|--ar3
|--pointlist4

/
|--pointlist3
	|--node3


In [133]:
# Graft a branch from one tree onto another
# let's graft from root2 at `node2` onto root at `pointlist1`

# start by showing the two trees
root2.tree()
print()
root.tree()

/
|--node2
|	|--ar3
|--pointlist4

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|--ar2
|--ar1
|--pointlist2


In [134]:
# perform the graft
node2.graft(pointlist1)

# showing the two trees
root2.tree()
print()
root.tree()

/
|--pointlist4

/
|--immanode
|	|--pointlistarray
|	|	|--pointlist1
|	|		|--node2
|	|			|--ar3
|	|--ar2
|--ar1
|--pointlist2


<a id='rootmetadata'></a>


## 4.4 Cut/graft root metadata options

`emd` defines trees which are *almost* unidirectional: each node knows about and points to only the nodes downstream of itself, *plus* the root node of its tree.  This is helpful in a number of ways; one is that metadata living in the root is available to every object in that tree.  When cutting or grafting branches, the question then arises: should the root metadata be carried along to the new tree's root, or not?

In [135]:
node.tree?

In [136]:
# make two data trees

def make_trees():

    # roots
    root1 = emd.Root( name='root1' )
    root2 = emd.Root( name='root2' )

    # nodes
    node1 = emd.Node( name = 'node1' )
    node2 = emd.Node( name = 'node2' )
    node3 = emd.Node( name = 'node3' )
    node4 = emd.Node( name = 'node4' )

    # tree 1
    root1.tree(node1)
    node1.tree(node2)
    node2.tree(node3)
    # add root metadata
    root1.metadata = emd.Metadata(
        name = 'metadata1',
        data = {'x':1}
    )
    root1.metadata = emd.Metadata(
        name = 'metadata2',
        data = {'y':2}
    )
    
    # tree 2
    root2.tree(node4)
    # add root metadata
    root2.metadata = emd.Metadata(
        name = 'metadata3',
        data = {'z':3}
    )
    
    return root1,root2

root1,root2 = make_trees()

In [137]:
# show the trees

root1.tree()
print()
root2.tree()

/
|--node1
	|--node2
		|--node3

/
|--node4


<a id='rootmetadatacutting'></a>

### 4.4.1 Cutting

First we'll cut a branch off of the first tree.  When cutting a branch off an existing tree, a new root is created at the base of the new branch.  The old root metadata can be left behind, or pointers to the same metadata can be added to the new root, or copies of the metadata can be placed in the new root.
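
The difference between the three options comes down to object identity. The sketch below models it with plain dicts standing in for `Metadata` instances. The function `new_root_metadata` is hypothetical, and the option values (`True`, `'copy'`, `False`) and the `_copy` name suffix simply mirror what the cells in this section use and check; this is not `emd`'s implementation.

```python
import copy


def new_root_metadata(old_root_md, cut=True):
    """Toy model of the three root-metadata options for a cut.

    `old_root_md` is a plain dict standing in for the old root's metadata.
    """
    if cut is True:
        # link: a new container holding the very same objects
        return dict(old_root_md)
    if cut == "copy":
        # copy: independent objects, stored under '<name>_copy'
        return {name + "_copy": copy.deepcopy(md)
                for name, md in old_root_md.items()}
    if cut is False:
        # leave the root metadata behind
        return {}
    raise ValueError(f"unrecognized option: {cut!r}")


old = {"metadata1": {"x": 1}, "metadata2": {"y": 2}}
linked = new_root_metadata(old, True)
copied = new_root_metadata(old, "copy")
bare = new_root_metadata(old, False)
```

In this toy, a later change to `old["metadata1"]` is visible through `linked` but not through `copied`, which is the practical reason to choose between pointers and copies.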

In [138]:
# Check the existing metadata

root1.metadata.keys()

dict_keys(['metadata1', 'metadata2'])

In [139]:
print(root1.metadata['metadata1'])
print(root1.metadata['metadata2'])

Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
)
Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
)


<a id='rootmetadatapointers'></a>

#### 4.4.1.1 add pointers

In [140]:
# Before performing the cut, we'll find the node we want to cut at

# show the tree
root1.tree()

# get the node of interest
target_node = root1.tree('node1/node2')

# show the tree under this node
target_node.tree()

/
|--node1
	|--node2
		|--node3
/
|--node3


In [141]:
# When we cut a branch from a tree, a new root is created for the branch, directly above
# the node where we made the cut.
# Here, pointers to the old root metadata are included in the new root metadata

# cut off the branch
new_root = target_node.tree(cut=True)

# show the old and new trees
root1.tree()
print()
new_root.tree()

/
|--node1

/
|--node2
	|--node3


In [142]:
# The old and new root metadata contain the same information

check_metadata(root1, new_root)

In [143]:
# And are in fact the same objects

for key in root1.metadata.keys():
    
    assert( root1.metadata[key] is new_root.metadata[key] )

<a id='rootmetadatacopies'></a>


#### 4.4.1.2 add copies

In [144]:
# make fresh trees

root1,_ = make_trees()

In [145]:
# Find the node we want to cut at

# show the tree
root1.tree()

# get the node of interest
target_node = root1.tree('node1/node2')

# show the tree under this node
target_node.tree()

/
|--node1
	|--node2
		|--node3
/
|--node3


In [146]:
# This time we'll copy the metadata

# cut off the branch
new_root = target_node.tree(cut='copy')

# show the old and new trees
root1.tree()
print()
new_root.tree()

/
|--node1

/
|--node2
	|--node3


In [147]:
# The old and new root metadata contain the same information

for name in root1.metadata.keys():
    md_i,md_f = root1.metadata[name],new_root.metadata[name+"_copy"]
    for k in md_i.keys:
        assert( md_i[k] == md_f[k] )

In [148]:
# But they are *not* the same objects

for k in root1.metadata.keys():
    assert( root1.metadata[k] is not new_root.metadata[k+"_copy"] )

<a id='rootmetadatanometadata'></a>

#### 4.4.1.3 no metadata

In [149]:
# make fresh trees

root1,_ = make_trees()

In [150]:
# Find the node we want to cut at

# show the tree
root1.tree()

# get the node of interest
target_node = root1.tree('node1/node2')

# show the tree under this node
target_node.tree()

/
|--node1
	|--node2
		|--node3
/
|--node3


In [151]:
# Cut the branch, but don't transfer any root metadata

# cut off the branch
new_root = target_node.tree(cut=False)

# show the old and new trees
root1.tree()
print()
new_root.tree()

/
|--node1

/
|--node2
	|--node3


In [152]:
# Check

print(root1.metadata.keys())
print()
print(new_root.metadata.keys())

dict_keys(['metadata1', 'metadata2'])

dict_keys([])


<a id='graft'></a>

### 4.4.2 Grafting

The same options exist when grafting a branch from tree_source to tree_target.  The difference is that tree_target may already have root metadata of its own.  Adding pointers or copying metadata works as before if there are no name conflicts between metadata in the source and target trees.  If there is a name conflict, the linked or copied source root metadata will overwrite the target root metadata of the same name.
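
The overwrite rule for the default (linking) case can be sketched with plain dicts standing in for the two roots' metadata. Using `dict.update` here is an assumption made for illustration, not a statement about how `emd` performs the merge. The names and values match the example that follows in this section.

```python
# Plain dicts stand in for the root metadata of the source and target trees
source_md = {
    "metadata1": {"x": 1},
    "metadata2": {"y": 2},
    "metadata3": {"thats a horse": "of a different color"},
}
target_md = {"metadata3": {"z": 3}}

# Link the source entries into the target.
# Entries with a name already present in the target are overwritten.
target_md.update(source_md)

names = list(target_md)
print(names)
```

After the update, the target's `metadata3` entry is the source's object, and the target's original `{'z': 3}` is no longer reachable from it.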

In [153]:
# Make the trees

root1,root2 = make_trees()


# and show their contents

root1.tree()
print()
root2.tree()

/
|--node1
	|--node2
		|--node3

/
|--node4


In [154]:
# Examine the metadata

print(root1.metadata)
print()
print(root2.metadata)

{'metadata1': Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
), 'metadata2': Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
)}

{'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          z:   3
)}


In [155]:
# We'll graft from root1 onto root2. So let's add a new Metadata instance to root1
# that has the same name as the Metadata instance in root2, but different contents

root1.metadata = emd.Metadata(
    name = 'metadata3',
    data = {
        'thats a horse' : 'of a different color'
    }
)

In [156]:
# Find the source and target nodes for the graft

# show the trees
root1.tree()
print()
root2.tree()
print()
print()

# get the source and target nodes
source_node = root1.tree('node1/node2')
target_node = root2.tree('node4')


/
|--node1
	|--node2
		|--node3

/
|--node4




In [157]:
# perform the graft

source_node.tree(graft = target_node)

Root( A Node called 'root2', containing the following top-level objects in its tree:

          node4 		 (Node)
)

In [158]:
# show the trees after the graft

root1.tree()
print()
root2.tree()

/
|--node1

/
|--node4
	|--node2
		|--node3


In [159]:
# show the root metadata

print(root1.metadata)
print()
print(root2.metadata)

{'metadata1': Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
), 'metadata2': Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
), 'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          thats a horse:   of a different color
)}

{'metadata3': Metadata( A Metadata instance called 'metadata3', containing the following fields:

          thats a horse:   of a different color
), 'metadata1': Metadata( A Metadata instance called 'metadata1', containing the following fields:

          x:   1
), 'metadata2': Metadata( A Metadata instance called 'metadata2', containing the following fields:

          y:   2
)}


<a id='graftrootoptions'></a>

#### 4.4.2.1 graft root metadata options

The same options for copying, linking, or discarding root metadata that were available when cutting off a branch are available when grafting, with a little extra syntax.  The default behavior, shown above, is to link the root metadata.  To instead copy the root metadata, use
    
    >>> source_node.tree(graft = (target_node,'copy'))
    
and to avoid transferring root metadata, use

    >>> source_node.tree(graft = (target_node,False))

<a id='emdspec'></a>

# 5. The EMD 1.0 File

In [160]:
clean()

<a id='treesandroots'></a>

## 5.1 Trees and roots

Roots, aka `topgroups` in the older parlance, are the homes for `trees`.  Each EMD 1.0 file may contain any number of EMD object trees.  Each tree must begin with a root.

## 5.2 Appending new roots

## 5.3 Reading from different roots

# 6. Append and other fancy read / write operations

## 6.1 Read from any node

## 6.2 Read from a single node, read a whole branch, read a branch without its source node

## 6.3 Append to an existing tree

## 6.4 Append conflict resolution

## 6.5 Append root metadata handling

## 6.6 Storage and re-writes to free up space