# `emdfile` Basic Usage

Some basic examples are given below.  See also the emdfile documentation pages. [[TODO link]]

### Contents

- [Basics](#basics)
  - [Save & read an array](#basics-array)
  - [Save & read a dictionary](#basics-dict)
  - [Save & read several arrays](#basics-array-collection)
  - [Save & read several dictionaries](#basics-dict-collection)
- [Trees](#build-trees)
  - [Build a tree](#build-a-tree)
  - [Inspect a Python tree](#inspect-runtime-tree)
  - [Save a tree](#save-a-tree)
  - [Inspect an HDF5 tree](#inspect-filesystem-tree)
  - [Read from an HDF5 tree](#read-from-a-tree)
- [Metadata](#include-metadata)
  - [Read dictionaries to ``Metadata``](#read-dicts-to-metadata)
  - [Use ``Metadata`` like dictionaries](#read-dicts-to-metadata)
  - [Store various data types](#metadata-dtypes)
- [Nodes](#work-with-nodes)
  - [Nodes have names](#node-names)
  - [Nodes hold arbitrary metadata](#node-metadata)
  - [Nodes have a versatile `.tree` method](#node-trees)
- [Arrays](#arrays-calibrations)
  - [Minimal ``Array`` instantiation](#minimal-array)
  - [``Array`` with calibrations](#calibrated-array)
- [More Data Classes](#more-data-classes)
  - [``PointList``](#pointlists)
  - [``PointListArray``](#pointlistarrays)
- [Append Mode](#append)
  - [Append two EMD trees to one file](#two-trees-one-file)
  - [Append new data into an existing EMD tree](#append-diffmerge-1)
  - [Append-over mode to overwrite data](#append-diffmerge-2)

In [1]:
import emdfile as emd, numpy as np
path = "/Users/Ben_1/Desktop/test-emdfile.h5"
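
The path above is specific to one machine. A minimal, portable alternative is to build the path inside the system temporary directory using only the standard library. The choice of directory is an assumption made here for illustration, not something `emdfile` requires.

```python
import os
import tempfile

# Build a scratch-file path that resolves on any machine.
# The file name is reused from the cell above; the directory is an assumption.
path = os.path.join(tempfile.gettempdir(), "test-emdfile.h5")
print(path)
```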

<a name="basics"></a>
## Basics

<a name="basics-array"></a>
#### Save & read an array

In [2]:
ar = np.random.random((5,5))
emd.save(path, ar, mode='o')
_ar = emd.read(path)

print(_ar)
print(_ar.data)

Array( A 2-dimensional array of shape (5, 5) called 'np.array',
       with dimensions:

           dim0 = [0,1,2,...] pixels
           dim1 = [0,1,2,...] pixels
)
[[0.08066823 0.3855435  0.41075189 0.6103807  0.38235884]
 [0.46849205 0.5654167  0.86886856 0.30828458 0.52199845]
 [0.23115325 0.40894516 0.96305373 0.73815512 0.86366473]
 [0.91220756 0.08353876 0.97418133 0.68328141 0.02632076]
 [0.68015992 0.32665775 0.59773418 0.82131054 0.94496314]]


<a name="basics-dict"></a>
#### Save and read a Python dictionary

In [3]:
dic = {'a':1, 'b':2}
emd.save(path, dic, mode='o')
_dic = emd.read(path)

print(_dic)
print(_dic._params)

Metadata( A Metadata instance called 'dictionary', containing the following fields:

          a:   1
          b:   2
)
{'a': 1, 'b': 2}


<a name="basics-array-collection"></a>
#### Save and read several arrays

In [4]:
ar_A = np.array([1,2,3])
ar_B = np.zeros((4,4),dtype=bool)
ar_C = np.zeros((2,3,4),dtype=np.complex64)
emd.save(path, [ar_A,ar_B,ar_C], mode='o')

# and read them again

data = emd.read(path)
_ar_A = data.tree('array_0')
_ar_B = data.tree('array_1')
_ar_C = data.tree('array_2')
print(_ar_A.data)
print(_ar_B.data)
print(_ar_C.data)

[1 2 3]
[[False False False False]
 [False False False False]
 [False False False False]
 [False False False False]]
[[[0.+0.j 0.+0.j 0.+0.j 0.+0.j]
  [0.+0.j 0.+0.j 0.+0.j 0.+0.j]
  [0.+0.j 0.+0.j 0.+0.j 0.+0.j]]

 [[0.+0.j 0.+0.j 0.+0.j 0.+0.j]
  [0.+0.j 0.+0.j 0.+0.j 0.+0.j]
  [0.+0.j 0.+0.j 0.+0.j 0.+0.j]]]


<a name="basics-dict-collection"></a>
#### Save and read several dictionaries

In [5]:
dic_A = {'a':1, 'b':3.14}
dic_B = {'x':True, 'y':None}
dic_C = {'q':np.zeros((2,2)), 'r':(4,5,6)}
emd.save(path, [dic_A,dic_B,dic_C], mode='o')

data = emd.read(path)
_dic_A = data.metadata['dictionary_0']
_dic_B = data.metadata['dictionary_1']
_dic_C = data.metadata['dictionary_2']
print(_dic_A)
print(_dic_B)
print(_dic_C)

Metadata( A Metadata instance called 'dictionary_0', containing the following fields:

          a:   1
          b:   3.14
)
Metadata( A Metadata instance called 'dictionary_1', containing the following fields:

          x:   True
          y:   None
)
Metadata( A Metadata instance called 'dictionary_2', containing the following fields:

          q:   2D-array
          r:   (4, 5, 6)
)


<a name="build-trees"></a>
## Trees

Build trees by adding parent-child relationships between data nodes.

<a name="build-a-tree"></a>
#### Build a tree

In [6]:
R = emd.Root()
A = emd.Node('A')
B = emd.Node('B')
C = emd.Node('C')

R.tree(A)
R.tree(B)
B.tree(C)

<a name="inspect-runtime-tree"></a>
#### Inspect a runtime (Python) tree

In [7]:
R.tree()

/
|---A
|---B
    |---C


<a name="save-a-tree"></a>
#### Save a tree

In [8]:
emd.save(path, R, mode='o')

<a name="inspect-filesystem-tree"></a>
#### Inspect a filesystem (HDF5) tree

In [9]:
emd.printtree(path)

/
|---root
    |---A
    |---B
        |---C




<a name="read-from-a-tree"></a>
#### Read from an HDF5 tree

In [10]:
data_0 = emd.read(path)
data_1 = emd.read(path, emdpath='root/A')                  # reads A
data_2 = emd.read(path, emdpath='root/B')                  # reads B---C
data_3 = emd.read(path, emdpath='root/B', tree=False)      # reads B only

print(data_0)
print(data_1)
print(data_2)
print(data_3)

Root( A Node called 'root', containing the following top-level objects in its tree:

          A                        	 (Node)
          B                        	 (Node)
)
Node( A Node called 'A', containing the following top-level objects in its tree:

)
Node( A Node called 'B', containing the following top-level objects in its tree:

          C                        	 (Node)
)
Node( A Node called 'B', containing the following top-level objects in its tree:

)


<a name="include-metadata"></a>
## Metadata

Store data in nested key-value dictionary-like structures.  Many ``Metadata`` instances can be stored in each ``Node``.

<a name="read-dicts-to-metadata"></a>
#### Read dictionaries to ``Metadata``

In [11]:
# Saving then reading a dictionary returns a Metadata instance

emd.save(path, {'a':1,'b':2}, mode='o')
x = emd.read(path)
print(x)

Metadata( A Metadata instance called 'dictionary', containing the following fields:

          a:   1
          b:   2
)


<a name="read-dicts-to-metadata"></a>
#### Use ``Metadata`` like dictionaries

In [12]:
# Return a value

x['a']

1

In [13]:
# Assign a value

x['c'] = 3

print(x)

Metadata( A Metadata instance called 'dictionary', containing the following fields:

          a:   1
          b:   2
          c:   3
)


<a name="metadata-dtypes"></a>
#### ``Metadata`` stores various data types

In [14]:
m = emd.Metadata( name='my_metadata' )
m['x'] = True
m['y'] = np.random.random((3,4,5))
m['z'] = {
    'alpha' : None,
    'beta' : {
        'gamma' : [10,11,12]
    }
}
emd.save(path, m, mode='o')

In [15]:
_m = emd.read(path)
print(_m)

Metadata( A Metadata instance called 'my_metadata', containing the following fields:

          x:   True
          y:   3D-array
          z:   {'alpha': None, 'beta': {'gamma': [10, 11, 12]}}
)


<a name="work-with-nodes"></a>
## Nodes

Every ``emdfile`` class other than ``Metadata`` inherits from ``Node``, enabling tree operations, arbitrary metadata storage, and reading & writing.

<a name="node-names"></a>
#### Nodes have names

In [16]:
node = emd.Node( name='my_node' )
print(node.name)

my_node


<a name="node-metadata"></a>
#### Nodes hold arbitrary metadata

In [17]:
node.metadata = emd.Metadata('md1',{'x':1,'y':2})
node.metadata = emd.Metadata('md2',{'a':1,'b':{'c':2,'d':3}})

In [18]:
node.metadata

{'md1': Metadata( A Metadata instance called 'md1', containing the following fields:
 
           x:   1
           y:   2
 ),
 'md2': Metadata( A Metadata instance called 'md2', containing the following fields:
 
           a:   1
           b:   {'c': 2, 'd': 3}
 )}

In [19]:
node.metadata['md1']

Metadata( A Metadata instance called 'md1', containing the following fields:

          x:   1
          y:   2
)

<a name="node-trees"></a>
#### Nodes have a `.tree` method

``.tree`` is versatile, enabling building, displaying, and modification of trees.

In [20]:
node.tree?

Signature: node.tree(arg=None, **kwargs)
Docstring:
Usages -

    >>> node.tree()                # show the tree downstream of this node
    >>> node.tree(show=True)       # show the full tree from the root node
    >>> node.tree(show=False)      # show from current node
    >>> node.tree('path/to/node')  # return the node at the chosen location
    >>> node.tree('/path/to/node') # specify the location starting from root
    >>> node.tree(node)            # add a child node; must be a Node instance
    >>> node.tree(cut=True)        # remove & return a branch; include root metadata
    >>> node.tree(cut=False)       # discard root metadata
    >>> node.tree(cut='copy')      # copy root metadata
    >>> node.tree(graft=node)      # remove & graft a branch; add new root metadata
    >>> node.tree(graft=(node,False))   # discard root
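
The path-based usages in the docstring above (a relative path resolved from the current node, and a path with a leading `/` resolved from the root) can be pictured with a toy model in plain Python. This is only a sketch of the idea: `ToyNode` and its methods are hypothetical and are not `emdfile`'s `Node` implementation.

```python
class ToyNode:
    """A toy stand-in for a tree node; hypothetical, not emdfile's Node."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = {}

    def add(self, child):
        # Adding a child records the relationship in both directions.
        child.parent = self
        self.children[child.name] = child
        return child

    def root(self):
        # Walk up the parent links until a node with no parent is reached.
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def get(self, path):
        # A leading '/' means "start from the root"; otherwise start here.
        node = self.root() if path.startswith("/") else self
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node


R = ToyNode("root")
A = R.add(ToyNode("A"))
B = R.add(ToyNode("B"))
C = B.add(ToyNode("C"))

print(B.get("C").name)      # relative path, resolved from B
print(C.get("/B/C").name)   # absolute path, resolved from the root
```

Storing a parent link on each node is what lets any node, however deep, resolve an absolute path without being handed the root explicitly.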

<a name="arrays-calibrations"></a>
## Arrays

Each data-holding class carries some built-in calibration metadata. Below are examples with ``Array``.

<a name="minimal-array"></a>
#### Minimal ``Array`` instantiation

In [21]:
array = emd.Array(np.random.random((3,3)))

<a name="calibrated-array"></a>
#### ``Array`` with calibrations

In [22]:
ar = emd.Array(
    np.ones((20,40,1000)),
    name = '3ddatacube',
    units = 'intensity',
    dims = [
        [0,5],
        [0,5],
        [0,0.02],
    ],
    dim_units = [
        'nm',
        'nm',
        'eV'
    ],
    dim_names = [
        'x',
        'y',
        'E',
    ],
)

print(ar)

Array( A 3-dimensional array of shape (20, 40, 1000) called '3ddatacube',
       with dimensions:

           x = [0,5,10,...] nm
           y = [0,5,10,...] nm
           E = [0.0,0.02,0.04,...] eV
)


In [23]:
#print(ar.dims)

In [24]:
ar.get_dim(0)

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80,
       85, 90, 95])
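
The printed output suggests that a two-element dim such as `[0,5]` is extended linearly along its axis: the first element acts as the origin and the difference between the two elements as the step. A minimal sketch of that reading in plain Python follows. `expand_dim` is a hypothetical helper, not part of `emdfile`; the length rule it enforces is taken from the `set_dim` docstring shown in the next cell.

```python
def expand_dim(dim, n):
    """Return a length-n coordinate vector from a dim specification."""
    dim = list(dim)
    if len(dim) == n:
        # A full-length dim is used as given.
        return dim
    if len(dim) == 2:
        # Two elements: treat as [origin, origin + step] and extrapolate.
        start, step = dim[0], dim[1] - dim[0]
        return [start + i * step for i in range(n)]
    raise ValueError("dim must have length 2 or match the axis length")


x = expand_dim([0, 5], 20)
print(x[:3], x[-1])
```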

In [25]:
ar.set_dim?
# ar.set_dim_units?
# ar.set_dim_name?

Signature:
ar.set_dim(
    n: int,
    dim: Union[list, numpy.ndarray],
    units: Optional[str] = None,
    name: Optional[str] = None,
)
Docstring:
Sets the n'th dim vector, using ``dim`` as described in the Array
documentation. If ``units`` and/or ``name`` are passed, sets these
values for the n'th dim vector.

Parameters
----------
n : int
    specifies which dim vector
dim : list or array
    length must be either 2, or match the length of the n'th axis
units 

<a name="more-data-classes"></a>
## More data classes

Normal data-holding classes include ``Array``, ``PointList``, and ``PointListArray``.

<a name="pointlists"></a>
#### ``PointList``

In [26]:
emd.PointList?

Init signature: emd.PointList(data: numpy.ndarray, name: Optional[str] = 'pointlist')
Docstring:
PointList instances represent sets of points in some M dimensional space.
Each dimension is given by a named field and has its own dtype. See also
the documentation for `numpy structured arrays <https://numpy.org/doc/stable/user/basics.rec.html>`_.

.. topic:: Instantiation

    For some numpy structured array like

        >>> x = np.ones(
        >>>     10,
        >>>     dtype = [('x',float),('y',int)]
        >>> )

    then calling

        >>> pl = PointList(
        >>>     x,
        >>>     name = 'my_pointlist',
        >>> )

    will create a pointlist of length 10 with fields 'x' and 'y'.

.. topic:: Data Access

    The data can be accessed 
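
Because a ``PointList`` is built from a numpy structured array, it may help to see how such arrays behave on their own. The sketch below uses only numpy. How ``PointList`` itself exposes its fields is cut off in the docstring above, so no `emdfile` calls are made here.

```python
import numpy as np

# A structured array of 10 points with a float field 'x' and an int field 'y',
# matching the dtype used in the PointList docstring.
x = np.ones(10, dtype=[('x', float), ('y', int)])

# Fields are addressed by name; each field behaves like an ordinary 1D array.
x['x'] *= 2.5
x['y'][3] = 7

print(x.dtype.names)   # the field names
print(x['x'][:3])      # the first three values of field 'x'
print(x[3])            # a single point: one value per field
```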

<a name="pointlistarrays"></a>
#### ``PointListArray``

In [27]:
emd.PointListArray?

Init signature: emd.PointListArray(dtype, shape, name: Optional[str] = 'pointlistarray')
Docstring:
A PointListArray instance comprises a 2D grid of PointLists, each sharing a
single dtype and set of fields, and each having any variable length. It
therefore represents a "ragged array" in 2+1 dimensions, i.e. with two
dimensions of a fixed shape and one of variable length, embedded in an
M dimensional space for PointLists with M fields.

.. topic:: Instantiation

    Calling

        >>> pla = PointListArray(
        >>>     [('a',float),('b',float)],
        >>>     (5,5)
        >>> )

    will create a 5x5 PointListArray instance with fields 'a' and 'b', and

        >>> dt = np.dtype([('a',float),('b',float)])
        >>> for x in range(5):
        >>>     for y in ran

<a name="append"></a>
## Append Mode

<a name="two-trees-one-file"></a>
#### Append two EMD trees to one file

In [28]:
# Make the first tree, then save it

root1 = emd.Root('root1')
root1.tree( emd.Node('A') )
root1.tree( emd.Node('B') )
emd.save(path, root1, mode='o')

emd.printtree(path)

/
|---root1
    |---A
    |---B




In [29]:
# Make the second tree, then append it

root2 = emd.Root('root2')
root2.tree( emd.Node('C') )
root2.tree( emd.Node('D') )
emd.save(path, root2, mode='a') # Note mode='a'

emd.printtree(path)

/
|---root1
|   |---A
|   |---B
|---root2
    |---C
    |---D




<a name="append-diffmerge-1"></a>
#### Append new data into an existing EMD tree

In [30]:
# Make a simple tree
root = emd.Root( 'my_root' )
ar1 = emd.Array(np.ones((5,5)),'array1')
root.tree(ar1)

# Save it
emd.save(path, root, mode='o')

# Print the runtime and filesystem trees
root.tree()
emd.printtree(path)

/
|---array1
/
|---my_root
    |---array1




In [31]:
# Make a new data node
ar2 = emd.Array(np.zeros((3,3,3)),'array2')

# Add it to the runtime (Python) tree
ar1.tree(ar2)

# Inspect it
root.tree()

/
|---array1
    |---array2


In [32]:
# Append to the file
emd.save(path, root, mode='a') # Note the mode

# Inspect the HDF5 file
emd.printtree(path)

/
|---my_root
    |---array1
        |---array2




<a name="append-diffmerge-2"></a>
#### Append-over mode to overwrite data

In [33]:
# Modify the first array's data

print(ar1.data)
ar1.data += 1
print(ar1.data)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]]


In [34]:
# Notice that the data in array 1 is now different in the runtime object and on the filesystem

_ar1 = emd.read(path, emdpath='my_root/array1')
print(_ar1)
print(_ar1.data)
print(np.array_equal(ar1.data, _ar1.data))

Array( A 2-dimensional array of shape (5, 5) called 'array1',
       with dimensions:

           dim0 = [0,1,2,...] pixels
           dim1 = [0,1,2,...] pixels
)
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
False


In [35]:
# A normal append-mode save will *NOT* update the data

# Perform a 'normal' append
emd.save(path, root, mode='a')

# Inspect the file tree - it should look the same
emd.printtree(path)

# Notice that the data in Python and on the filesystem are still different
_ar1 = emd.read(path, emdpath='my_root/array1')
print(_ar1)
print(_ar1.data)
print(np.array_equal(ar1.data, _ar1.data))

/
|---my_root
    |---array1
        |---array2


Array( A 2-dimensional array of shape (5, 5) called 'array1',
       with dimensions:

           dim0 = [0,1,2,...] pixels
           dim1 = [0,1,2,...] pixels
)
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
False


In [36]:
# An "append-over" mode save *WILL* update the data

# Perform an appendover operation
emd.save(path, root, mode='ao')

# Inspect the file tree - it should look the same
emd.printtree(path)

# Now the data in Python and on the filesystem match
_ar1 = emd.read(path, emdpath='my_root/array1')
print(_ar1)
print(_ar1.data)
print(np.array_equal(ar1.data, _ar1.data))

/
|---my_root
    |---array1
        |---array2


Array( A 2-dimensional array of shape (5, 5) called 'array1',
       with dimensions:

           dim0 = [0,1,2,...] pixels
           dim1 = [0,1,2,...] pixels
)
[[2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]]
True
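
The three write modes used above can be summarized with a toy model in which the file is a plain dictionary mapping node paths to data. This is only a sketch of the behaviour observed in the preceding cells, not `emdfile`'s implementation; `toy_save` and the flat path-to-data layout are hypothetical.

```python
def toy_save(file, tree, mode):
    """Write `tree` (path -> data) into `file` (path -> data), in place."""
    if mode == 'o':
        # Overwrite: discard whatever the file held before.
        file.clear()
        file.update(tree)
    elif mode == 'a':
        # Append: add nodes the file lacks; leave existing nodes untouched.
        for path, data in tree.items():
            file.setdefault(path, data)
    elif mode == 'ao':
        # Append-over: add new nodes and replace existing ones.
        file.update(tree)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return file


file = {}
toy_save(file, {'my_root/array1': 1.0}, mode='o')

# The runtime tree gains a node, and array1's data changes.
tree = {'my_root/array1': 2.0, 'my_root/array1/array2': 0.0}

toy_save(file, tree, mode='a')
print(file['my_root/array1'])   # still the old data; array2 was added

toy_save(file, tree, mode='ao')
print(file['my_root/array1'])   # now matches the runtime tree
```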


### Note

Some caution may be in order when using append-over mode, particularly when dealing with larger data blocks.  See the discussion in the emdfile doc pages. ((TODO - link))