Rename FINF to ASDF
mdboom committed Jul 29, 2014
1 parent 3ca4383 commit 48c086c
Showing 70 changed files with 535 additions and 544 deletions.
6 changes: 3 additions & 3 deletions .gitmodules
@@ -1,6 +1,6 @@
[submodule "astropy_helpers"]
path = astropy_helpers
url = https://github.com/astropy/astropy-helpers.git
[submodule "finf-standard"]
path = finf-standard
url = git://github.com/spacetelescope/finf-standard.git
[submodule "asdf-standard"]
path = asdf-standard
url = https://github.com/spacetelescope/asdf-standard.git
20 changes: 5 additions & 15 deletions README.rst
@@ -1,31 +1,21 @@
PyFINF
pyasdf
======

Python library for reading and writing FINF files.
Python library for reading and writing ASDF files.

FINF (FINF is not FITS) is a next generation interchange format for
astronomical data.
Advanced Scientific Data Format (ASDF) is a next generation
interchange format for scientific data.

Installation
------------
This package uses a ``git submodule`` to get the schema information
from the FINF standard itself. Therefore, you need to run the
following once::

git submodule init

and then every time you update the repository::

git submodule update

[We'll try to automate this in a future revision].

To install::

python setup.py install

Testing
-------

To run the unit tests::

python setup.py test
1 change: 1 addition & 0 deletions asdf-standard
Submodule asdf-standard added at ddd8b1
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -1,7 +1,7 @@
PyFINF Documentation
pyasdf Documentation
====================

.. toctree::
:maxdepth: 2

pyfinf/index.rst
pyasdf/index.rst
130 changes: 65 additions & 65 deletions docs/pyfinf/examples.rst → docs/pyasdf/examples.rst
@@ -4,46 +4,46 @@ Examples
Hello World
-----------

In its simplest form, Finf is a way of saving nested data structures
In its simplest form, ASDF is a way of saving nested data structures
to YAML. Here we save a dictionary with the key/value pair ``'hello':
'world'``.

.. runcode::

from pyfinf import FinfFile
from pyasdf import AsdfFile

# Make the tree structure, and create a FinfFile from it.
# Make the tree structure, and create an AsdfFile from it.
tree = {'hello': 'world'}
ff = FinfFile(tree)
ff = AsdfFile(tree)

# You can also make the FinfFile first, and modify its tree directly:
ff = FinfFile()
# You can also make the AsdfFile first, and modify its tree directly:
ff = AsdfFile()
ff.tree['hello'] = 'world'

# Use the `with` construct so the file is automatically closed
with ff.write_to("test.finf"):
with ff.write_to("test.asdf"):
pass

.. finf:: test.finf
.. asdf:: test.asdf

Saving arrays
-------------

Beyond the basic data types of dictionaries, lists, strings and
numbers, the most important thing Finf can save is arrays. It's as
numbers, the most important thing ASDF can save is arrays. It's as
simple as putting a Numpy array somewhere in the tree. Here, we save
an 8x8 array of random floating-point numbers. Note that the YAML
part contains information about the structure (size and data type) of
the array, but the actual array content is in a binary block.

.. runcode::

from pyfinf import FinfFile
from pyasdf import AsdfFile
import numpy as np

tree = {'my_array': np.random.rand(8, 8)}
ff = FinfFile(tree)
with ff.write_to("test.finf"):
ff = AsdfFile(tree)
with ff.write_to("test.asdf"):
pass

.. note::
@@ -54,44 +54,44 @@ the array, but the actual array content is in a binary block.
page.


.. finf:: test.finf
.. asdf:: test.asdf
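
As a rough illustration of that split between a textual description and a binary block, here is a sketch that uses only the Python standard library. It does not reproduce the ASDF on-disk layout: the ``describe_array`` and ``read_array`` helpers and the metadata keys are made up for this example, and a flat list of 64 floats stands in for the 8x8 Numpy array.

```python
import struct


def describe_array(values, shape):
    """Return (metadata, payload) for a flat list of floats.

    The metadata is small and text-friendly; the payload holds the
    values as raw little-endian float64 bytes. Both are hypothetical
    stand-ins, not the real ASDF structures.
    """
    metadata = {
        "shape": list(shape),
        "datatype": "float64",
        "byteorder": "little",
    }
    payload = struct.pack("<%dd" % len(values), *values)
    return metadata, payload


def read_array(metadata, payload):
    """Rebuild the flat list of values, sized from the metadata alone."""
    count = 1
    for dim in metadata["shape"]:
        count *= dim
    return list(struct.unpack("<%dd" % count, payload))


values = [float(i) for i in range(64)]  # stands in for an 8x8 array
metadata, payload = describe_array(values, (8, 8))
restored = read_array(metadata, payload)
```

The point of the design is that a reader can learn the shape and data type from the text part without touching the (potentially large) binary part.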

Schema validation
-----------------

In the current draft of the FINF schema, there are very few elements
In the current draft of the ASDF schema, there are very few elements
defined at the top-level -- for the most part, the top-level can
contain any ad hoc elements. One of the few specified elements is
``data``: it must be an array, and is used to specify the "main" data
content (for some definition of "main") so that tools that merely want
to view or preview the FINF file have a standard location to find the
to view or preview the ASDF file have a standard location to find the
most interesting data. If you set this to anything but an array,
pyfinf will complain::
pyasdf will complain::

>>> from pyfinf import FinfFile
>>> from pyasdf import AsdfFile
>>> tree = {'data': 'Not an array'}
>>> FinfFile(tree)
>>> AsdfFile(tree)
Traceback (most recent call last):
...
ValidationError: mismatched tags, wanted
'tag:stsci.edu:finf/0.1.0/core/ndarray', got
'tag:stsci.edu:asdf/0.1.0/core/ndarray', got
'tag:yaml.org,2002:str'
...

This validation happens only when a `FinfFile` is instantiated, read
This validation happens only when an `AsdfFile` is instantiated, read
or saved, so it's still possible to get the tree into an invalid
intermediate state::

>>> from pyfinf import FinfFile
>>> ff = FinfFile()
>>> from pyasdf import AsdfFile
>>> ff = AsdfFile()
>>> ff.tree['data'] = 'Not an array'
>>> # The FINF file is now invalid, but pyfinf will tell us when
>>> # The ASDF file is now invalid, but pyasdf will tell us when
>>> # we write it out.
>>> ff.write_to('test.finf')
>>> ff.write_to('test.asdf')
Traceback (most recent call last):
...
ValidationError: mismatched tags, wanted
'tag:stsci.edu:finf/0.1.0/core/ndarray', got
'tag:stsci.edu:asdf/0.1.0/core/ndarray', got
'tag:yaml.org,2002:str'
...
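
The "validate on construction and on write, but not on every change" behavior can be sketched in plain Python. This is only an illustration under stated assumptions: ``CheckedFile`` and its ``serialize`` method are hypothetical names, ``array.array`` stands in for a Numpy array, and the check is far simpler than real schema validation.

```python
from array import array


class CheckedFile:
    """Hypothetical stand-in for a file object wrapping a tree."""

    def __init__(self, tree=None):
        self.tree = {} if tree is None else tree
        self.validate()  # checked once, at construction

    def validate(self):
        # Only rule in this sketch: 'data', if present, must be an array.
        data = self.tree.get("data")
        if data is not None and not isinstance(data, array):
            raise ValueError(
                "'data' must be an array, got %s" % type(data).__name__
            )

    def serialize(self):
        self.validate()  # checked again just before output
        return repr(self.tree)


ff = CheckedFile()
ff.tree["data"] = "Not an array"  # no error here: the tree is not watched

try:
    ff.serialize()
except ValueError as exc:
    error = str(exc)  # the problem surfaces only at output time
```

Checking only at these boundaries keeps ordinary dictionary edits cheap, at the cost of allowing a temporarily invalid tree.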

@@ -105,7 +105,7 @@ data being saved.

.. runcode::

from pyfinf import FinfFile
from pyasdf import AsdfFile
import numpy as np

my_array = np.random.rand(8, 8)
@@ -114,121 +114,121 @@ data being saved.
'my_array': my_array,
'subset': subset
}
ff = FinfFile(tree)
with ff.write_to("test.finf"):
ff = AsdfFile(tree)
with ff.write_to("test.asdf"):
pass

.. finf:: test.finf
.. asdf:: test.asdf


Saving inline arrays
--------------------

For this sort of small array, you may not care about the efficiency
of a binary representation and want to just save the content directly
in the YAML tree. The `~pyfinf.FinfFile.set_block_type` method
in the YAML tree. The `~pyasdf.AsdfFile.set_block_type` method
can be used to set the type of block of the associated data, either
``internal``, ``external`` or ``inline``.

.. runcode::

from pyfinf import FinfFile
from pyasdf import AsdfFile
import numpy as np

my_array = np.random.rand(8, 8)
tree = {'my_array': my_array}
ff = FinfFile(tree)
ff = AsdfFile(tree)
ff.set_block_type(my_array, 'inline')
with ff.write_to("test.finf"):
with ff.write_to("test.asdf"):
pass

.. finf:: test.finf
.. asdf:: test.asdf
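
To make the trade-off between inline and binary storage concrete, here is a small standard-library sketch. The ``encode`` helper is hypothetical, JSON text stands in for the YAML tree, and only two of the three block types are modeled.

```python
import json
import struct


def encode(values, block_type):
    """Hypothetical helper: encode a flat list of floats.

    'inline' returns readable text that could sit directly in the
    tree; 'internal' returns raw little-endian float64 bytes.
    """
    if block_type == "inline":
        return json.dumps(values)
    if block_type == "internal":
        return struct.pack("<%dd" % len(values), *values)
    raise ValueError("unsupported block type: %r" % (block_type,))


values = [i / 7 for i in range(64)]  # stands in for a small 8x8 array
inline_text = encode(values, "inline")
binary_block = encode(values, "internal")
```

For 64 values the text form is larger than the 512-byte binary form, but it stays human-readable, which is often the better choice for small arrays.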

Saving external arrays
----------------------

For various reasons discussed in the "Exploded Form" section of the
FINF specification, you may want to save the data in an external
ASDF specification, you may want to save the data in an external
block.

.. runcode::

from pyfinf import FinfFile
from pyasdf import AsdfFile
import numpy as np

my_array = np.random.rand(8, 8)
tree = {'my_array': my_array}
ff = FinfFile(tree)
ff = AsdfFile(tree)
ff.set_block_type(my_array, 'external')
with ff.write_to("test.finf"):
with ff.write_to("test.asdf"):
pass

.. finf:: test.finf
.. asdf:: test.asdf

.. finf:: test0000.finf
.. asdf:: test0000.asdf

Streaming array data
--------------------

In certain scenarios, you may want to stream data to disk, rather than
writing an entire array of data at once. For example, it may not be
possible to fit the entire array in memory, or you may want to save
data from a device as it comes in to prevent loss. The FINF standard
data from a device as it comes in to prevent loss. The ASDF standard
allows exactly one streaming block per file where the size of the
block isn't included in the block header, but instead is implicitly
determined to include all of the remaining contents of the file. By
definition, it must be the last block in the file.

To use streaming, rather than including a Numpy array object in the
tree, you include a `pyfinf.Stream` object which sets up the structure
tree, you include a `pyasdf.Stream` object which sets up the structure
of the streamed data, but will not write out the actual content. The
`~pyfinf.FinfFile.write_to_stream` method is then later used to
`~pyasdf.AsdfFile.write_to_stream` method is then later used to
manually write out the binary data.

.. runcode::

from pyfinf import FinfFile, Stream
from pyasdf import AsdfFile, Stream
import numpy as np

tree = {
# Each "row" of data will have 128 entries.
'my_stream': Stream([128], np.float64)
}

ff = FinfFile(tree)
with ff.write_to('test.finf'):
ff = AsdfFile(tree)
with ff.write_to('test.asdf'):
# Write 100 rows of data, one row at a time.
# write_to_stream expects the raw binary bytes, not an array,
# so we use `tobytes()` (`tostring()` is a deprecated alias)
for i in range(100):
ff.write_to_stream(np.array([i] * 128, np.float64).tobytes())

.. finf:: test.finf
.. asdf:: test.asdf
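
The idea of a final block whose size is implied by the end of the file can be sketched without pyasdf. The header below is a made-up placeholder, not the ASDF block header, and ``append_row`` and ``count_rows`` are hypothetical helpers. The sketch only shows that the row count can be recovered from the file size when the streamed block comes last.

```python
import os
import struct
import tempfile

ROW_FORMAT = "<128d"  # one row: 128 little-endian float64 values
ROW_SIZE = struct.calcsize(ROW_FORMAT)
HEADER = b"hypothetical header\n"  # placeholder, not a real block header


def append_row(fd, values):
    """Write one row of raw bytes to the end of the stream."""
    fd.write(struct.pack(ROW_FORMAT, *values))


def count_rows(path, header_size):
    """Infer the row count from whatever follows the header."""
    remaining = os.path.getsize(path) - header_size
    return remaining // ROW_SIZE


path = os.path.join(tempfile.mkdtemp(), "stream.bin")
with open(path, "wb") as fd:
    fd.write(HEADER)
    # No length is recorded anywhere; rows are simply appended.
    for i in range(100):
        append_row(fd, [float(i)] * 128)

n_rows = count_rows(path, len(HEADER))
```

Because nothing after the header records a length, the writer can stop at any time and the file remains readable up to the last complete row.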

References
----------

FINF files may reference items in the tree in other FINF files. The
ASDF files may reference items in the tree in other ASDF files. The
syntax used in the file for this is called "JSON Pointer", but the
Python programmer can largely ignore that.
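
For readers curious about what such a reference looks like when taken apart, here is a minimal sketch using only the standard library. ``split_ref`` and ``resolve_pointer`` are hypothetical helpers, not part of the library. The sketch accepts fragments with or without a leading slash, since the examples on this page write ``target.asdf#b``, and it handles the ``~0`` and ``~1`` escapes that JSON Pointer defines.

```python
from urllib.parse import urldefrag


def split_ref(ref):
    """Split 'file#fragment' into (file, fragment)."""
    url, fragment = urldefrag(ref)
    return url, fragment


def resolve_pointer(tree, pointer):
    """Walk nested dicts and lists following a pointer such as 'a/0'."""
    node = tree
    for token in pointer.strip("/").split("/"):
        if token == "":
            continue  # an empty pointer refers to the whole tree
        # JSON Pointer escapes: '~1' means '/', '~0' means '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


filename, fragment = split_ref("target.asdf#b")
tree = {"a": [0, 1, 2], "b": [10, 11, 12]}
value = resolve_pointer(tree, fragment)
```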

First, we'll create a FINF file with a couple of arrays in it:
First, we'll create an ASDF file with a couple of arrays in it:

.. runcode::

from pyfinf import FinfFile
from pyasdf import AsdfFile
import numpy as np

tree = {
'a': np.arange(0, 10),
'b': np.arange(10, 20)
}

target = FinfFile(tree)
with target.write_to('target.finf'):
target = AsdfFile(tree)
with target.write_to('target.asdf'):
pass

.. finf:: target.finf
.. asdf:: target.asdf

Then we will reference those arrays in a couple of different ways.
First, we'll load the source file in Python and use the
@@ -239,46 +239,46 @@ to the target file.

.. runcode::

ff = FinfFile()
ff = AsdfFile()

with FinfFile.read('target.finf') as target:
with AsdfFile.read('target.asdf') as target:
ff.tree['my_ref_a'] = target.make_reference(['a'])

ff.tree['my_ref_b'] = {'$ref': 'target.finf#b'}
ff.tree['my_ref_b'] = {'$ref': 'target.asdf#b'}

with ff.write_to('source.finf'):
with ff.write_to('source.asdf'):
pass

.. finf:: source.finf
.. asdf:: source.asdf

Calling `~pyfinf.FinfFile.find_references` will look up all of the
Calling `~pyasdf.AsdfFile.find_references` will look up all of the
references so they can be used as if they were local to the tree. It
doesn't actually move any of the data, and keeps the references as
references.

.. runcode::

ff = FinfFile.read('source.finf')
ff = AsdfFile.read('source.asdf')
ff.find_references()
assert ff.tree['my_ref_b'].shape == (10,)

On the other hand, calling `~pyfinf.FinfFile.resolve_references`
On the other hand, calling `~pyasdf.AsdfFile.resolve_references`
places all of the referenced content directly in the tree, so when we
write it out again, all of the external references are gone, with the
literal content in its place.

.. runcode::

ff = FinfFile.read('source.finf')
ff = AsdfFile.read('source.asdf')
ff.resolve_references()
with FinfFile(ff).write_to('resolved.finf'):
with AsdfFile(ff).write_to('resolved.asdf'):
pass

.. finf:: resolved.finf
.. asdf:: resolved.asdf
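
The effect of resolving references can be illustrated with plain dictionaries. This is a simplified sketch, not the library's implementation: ``resolve_refs`` is a hypothetical function, the ``documents`` mapping stands in for files on disk, and only single-key fragments such as ``target.asdf#b`` are handled.

```python
def resolve_refs(node, documents):
    """Return a copy of ``node`` with every {'$ref': ...} replaced
    by the content it points to."""
    if isinstance(node, dict):
        if "$ref" in node:
            name, _, key = node["$ref"].partition("#")
            return documents[name][key]
        return {k: resolve_refs(v, documents) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, documents) for item in node]
    return node


# In-memory stand-ins for the files used above.
documents = {"target.asdf": {"a": [0, 1, 2], "b": [10, 11, 12]}}
source = {
    "my_ref_a": {"$ref": "target.asdf#a"},
    "my_ref_b": {"$ref": "target.asdf#b"},
}

resolved = resolve_refs(source, documents)
```

After this step ``resolved`` holds literal content only, while ``source`` still contains the references, mirroring the difference between keeping references and resolving them.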

A similar feature provided by YAML, anchors and aliases, also provides
a way to support references within the same file. These are supported
by pyfinf; however, the JSON Pointer approach is generally favored because:
by pyasdf; however, the JSON Pointer approach is generally favored because:

- It is possible to reference elements in another file
