
Future Data Storage

Chris Meyer edited this page Apr 5, 2024 · 1 revision

This page describes a feature planned for Nion Swift 0.15.

Organization

The application provides user profiles. Each profile has a list of projects. Each project is represented either by an index file and subfolders or by a self-contained file such as HDF5. Projects can be designated as application-wide, in which case they appear in all profiles. Read-only projects can also be provided by a package.

Items within a project can be organized into collections. Collections can be regular or smart. Items can be added to or removed from a regular collection; a smart collection uses a filter to control its contents. Collections can be associated with the application, in which case they are available to all profiles, or with a profile, in which case they are specific to that profile.
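A minimal sketch of the difference between the two collection kinds, assuming items are identified by UUID and a smart collection's filter can be modeled as a plain predicate. The names `Item`, `RegularCollection`, and `SmartCollection` are hypothetical and not part of Nion Swift. Note that removing an item from a collection only changes membership; the item itself is untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set
from uuid import UUID, uuid4


@dataclass
class Item:
    # Hypothetical minimal item; real items carry far more state.
    title: str
    item_id: UUID = field(default_factory=uuid4)


@dataclass
class RegularCollection:
    # Membership is explicit: the set of item UUIDs the user added.
    name: str
    members: Set[UUID] = field(default_factory=set)

    def add(self, item: Item) -> None:
        self.members.add(item.item_id)

    def remove(self, item: Item) -> None:
        # Removing from a collection never deletes the underlying item.
        self.members.discard(item.item_id)

    def contents(self, items: Iterable[Item]) -> List[Item]:
        return [i for i in items if i.item_id in self.members]


@dataclass
class SmartCollection:
    # Membership is computed on demand from a filter predicate.
    name: str
    predicate: Callable[[Item], bool]

    def contents(self, items: Iterable[Item]) -> List[Item]:
        return [i for i in items if self.predicate(i)]
```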

User Interface

The user activates a profile and is presented with panels for projects, collections, filters, and data.

The projects panel shows all projects in the profile. The user can add and remove projects at any time.

The projects panel will display badges for the following conditions: needs update, missing items, incomplete (contains references to items outside the project), missing, unmounted.

A special Working project is required to hold acquisition results and other items that are not stored directly in another project. The Working project can be configured to be either application-wide or specific to a profile. The user can specify its location (to put it on an SSD, for instance) by using a menu item such as Use as Working.

A special Trash project may exist too.

The folder in which a project is stored establishes the hierarchy of projects. A future extension may use tags or another mechanism to establish an alternative view/hierarchy.

The collections panel shows the collections for the profile. Collections can be created and deleted. Deleting a collection will not delete the items within the collection. The user can add and remove items from a regular collection. The user can delete underlying items from both regular and smart collections.

The filter associated with a smart collection can be edited.

The special All and Current Session collections are predefined smart collections.

A future extension may allow a hierarchy of collections.

When projects or collections are selected and have focus, the inspector provides the ability to change attributes of the selection.

The filter panel allows the user to specify a filter. A filter can be saved permanently as a smart collection. The filter panel includes an interface for text search, rating and flag search, and can be expanded with additional criteria.
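One way to model the filter panel's criteria is sketched below, assuming each criterion is optional and all active criteria must match. The `Entry` fields (`rating` as a 0 to 5 star value, `flagged` as a boolean) and the `Filter` class are hypothetical. Saving a filter as a smart collection could then amount to persisting these fields alongside the collection name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical item fields that the filter panel can search on.
    title: str
    rating: int = 0        # assumed 0-5 star rating
    flagged: bool = False


@dataclass
class Filter:
    # Each criterion is optional; None means "do not filter on this".
    text: Optional[str] = None
    min_rating: Optional[int] = None
    flagged: Optional[bool] = None

    def matches(self, entry: Entry) -> bool:
        # Case-insensitive substring search on the title.
        if self.text is not None and self.text.lower() not in entry.title.lower():
            return False
        if self.min_rating is not None and entry.rating < self.min_rating:
            return False
        if self.flagged is not None and entry.flagged != self.flagged:
            return False
        return True


def apply_filter(f: Filter, entries: List[Entry]) -> List[Entry]:
    return [e for e in entries if f.matches(e)]
```

Additional criteria can be added as further optional fields without changing how existing saved filters behave.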

The data panel shows the items in the selected project(s) or collection(s), filtered by the filter specified in the filter panel. Items may include data items, display items, connections, data structures, and computations.

The user can move items between projects by selecting a project and dragging the desired items to a different project. The user can copy items by using snapshot or duplicate and then dragging the new item to the desired target.

The user can add items to a regular collection by selecting the desired items and dragging them to a regular collection. Dragging items from a collection to a project is not allowed.

The user should be able to determine where an item is stored, whether as an individual file or as part of a self-contained project. Show in Project and Show in Finder/Explorer menu items are provided.

  • Libraries

    • Working
    • Project 1
    • Project 2
  • Collections

    • Current Session
    • Collection 1
    • Smart Collection 2
  • Filter

    • By text
  • Data

    • Spectrum 32
    • Image 16

Startup

When the application launches, it reads its preferences from the Application Data folder. The preferences file stores the current profile (by UUID). The application then reads profiles from the Application Data folder and activates the current profile. Each profile is stored in its own file and contains a list of project files. The project files are loaded and the document model (library) merges all project items into single lists.
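The startup sequence above can be sketched as follows, assuming (hypothetically) that all files are JSON, that preferences live in `preferences.json` under a `current_profile_uuid` key, that profiles live in a `profiles` subfolder with `uuid` and `projects` keys, and that each project file has an `items` list. None of these file names or keys come from Nion Swift.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_json(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def startup(app_data: Path) -> List[dict]:
    # 1. Preferences identify the current profile by UUID.
    prefs = load_json(app_data / "preferences.json")
    current = prefs["current_profile_uuid"]

    # 2. Each profile is stored in its own file; index them by UUID.
    profiles: Dict[str, dict] = {}
    for profile_path in sorted((app_data / "profiles").glob("*.json")):
        profile = load_json(profile_path)
        profiles[profile["uuid"]] = profile
    active = profiles[current]

    # 3. Load each listed project and merge its items into one list.
    #    Relative paths resolve against app_data; an absolute path on the
    #    right-hand side of "/" replaces app_data entirely.
    merged: List[dict] = []
    for project_file in active["projects"]:
        project = load_json(app_data / project_file)
        merged.extend(project["items"])
    return merged
```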

Profiles

A profile stores project references, workspaces, and collections.

Projects

A project stores data items, display items, connections, data structures, and computations.

A project reference may be a project index file, which describes how to find or generate the data items (on the web or in a subfolder, for instance), and directly stores display items, connections, data structures, and computations. It may also store metadata about the data items if the data item files cannot store the information directly or are read-only.

A project reference may also be a self-contained project file, which stores data items, connections, data structures, and computations directly in the file. An example of this would be an HDF5 file.

A project index file may be configured to recursively or non-recursively load all files matching its data item file criteria, starting from its own folder or from an explicitly listed folder, or it may explicitly list specific files by absolute or relative path.

A project index file may contain additional metadata about individual data items if the data item file format is read-only or otherwise cannot store the required information. In this case, a UUID will be assigned and associated with a file path.
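The discovery options for a project index file can be sketched with `pathlib`, assuming the data item criteria are glob patterns (the `*.h5` default is only an example). `IndexConfig` and `discover` are hypothetical names; the sketch treats an explicit file list as taking precedence over folder scanning, which is an assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class IndexConfig:
    # Hypothetical fields mirroring the options described above.
    patterns: List[str] = field(default_factory=lambda: ["*.h5"])
    recursive: bool = True
    data_folder: Optional[Path] = None  # None: use the index file's folder
    explicit_files: List[str] = field(default_factory=list)


def discover(index_path: Path, config: IndexConfig) -> List[Path]:
    base = index_path.parent
    if config.explicit_files:
        # Relative entries resolve against the index file's folder;
        # absolute entries are kept as-is.
        return [base / f for f in config.explicit_files]
    root = config.data_folder if config.data_folder is not None else base
    found = set()
    for pattern in config.patterns:
        matches = root.rglob(pattern) if config.recursive else root.glob(pattern)
        found.update(p for p in matches if p.is_file())
    return sorted(found)
```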

Caching

TODO: where do cached previews and other data go?

Switching Profiles

  • TODO: stop all acquisition
  • TODO: stop all computations
  • TODO: unload all projects
  • TODO: load new projects
  • TODO: establish data item references
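The TODO steps above imply an ordering: nothing should write into a project while it is being unloaded, and references can only be established once every new project is loaded. A minimal sketch of that ordering, with each step passed in as a hypothetical hook (no such hooks exist in Nion Swift today):

```python
from typing import Callable


def switch_profile(
    stop_acquisition: Callable[[], None],
    stop_computations: Callable[[], None],
    unload_projects: Callable[[], None],
    load_projects: Callable[[], None],
    establish_references: Callable[[], None],
) -> None:
    # Stop all writers first, then swap projects, then reconnect
    # data item references once every new project is available.
    for step in (stop_acquisition, stop_computations, unload_projects,
                 load_projects, establish_references):
        step()
```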

HDF5 Files

There are several ways in which HDF5 files may be used.

A project index file may place HDF5 files representing data items within its folder.

A project index file may reference read-only HDF5 files. In this case, metadata about individual items within the HDF5 file is stored in the project index file.

A project itself may be stored in an HDF5 file, with the project's data items and other items stored together.

Importing

A user may import a folder, recursively or non-recursively. The user can choose which file types are accepted as input and which output type to use for each data class.

Data files can be imported by converting to a native format and copying into a read/write project. The file becomes managed.

Data files can also be imported by copying the original file into the project as read-only or read-write, depending on file format and user preference. The file becomes managed.

Data files can also be imported by referencing the original file as read-only or read-write, depending on file format and user preference. The file becomes referenced.

Data files have associated data item info (data description, metadata) that is stored separately for non-native formats that don't already contain it. The data item info is always managed.
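The import rules above can be summarized as a small decision function. This is a sketch under the assumption that the only inputs are the import mode and whether the source file can already hold its own data item info; `ImportMode`, `ImportResult`, and `import_result` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class ImportMode(Enum):
    CONVERT = "convert"      # convert to native format and copy into project
    COPY = "copy"            # copy the original file into the project
    REFERENCE = "reference"  # leave the file where it is


@dataclass(frozen=True)
class ImportResult:
    file_managed: bool            # does the project own the data file?
    info_managed: bool            # data item info (description, metadata)
    info_stored_separately: bool  # info kept outside the data file


def import_result(mode: ImportMode, contains_info: bool) -> ImportResult:
    # CONVERT and COPY make the file managed; REFERENCE leaves it referenced.
    file_managed = mode is not ImportMode.REFERENCE
    # After conversion the file is native and can hold its own info.
    holds_info = contains_info or mode is ImportMode.CONVERT
    return ImportResult(
        file_managed=file_managed,
        info_managed=True,  # data item info is always managed
        info_stored_separately=not holds_info,
    )
```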

A single data file may have multiple data items associated with it (for example, an HDF5 file containing multiple data sets).

Data can be exported to various formats, but to retain all info such as data description, metadata, computations, relationships, displays and provenance, only the native HDF5-based project item will suffice.

Exporting data can be a copy or a move operation.

When importing an HDF5-based project, the user can either import it as a read-only frozen reference, or recreate the data in native format so that it becomes managed, severing ties to the original project.

Importing folders; exporting folders. Watched folders.

Project Properties

All projects have properties:

  • Locked vs unlocked (when locked, no data can be overwritten)
  • Writeable vs read-only (when writeable, metadata can be written)
  • Available vs missing (disk may be offline)
  • Loaded vs unloaded (project may be tracked but explicitly not loaded)

A project index file has properties:

  • Data folder locations (absolute paths, or the folder containing the project if None)
  • Read file types
  • Write file types (per data class: 1D, 2D, SI, 4D, EELS spectra, etc.)
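The properties listed above could be modeled as plain dataclasses. In this sketch the rules that an unavailable or unloaded project accepts no writes, and that overwriting data requires the project to be writeable as well as unlocked, are assumptions; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class ProjectProperties:
    locked: bool = False     # when locked, no data can be overwritten
    writeable: bool = True   # when writeable, metadata can be written
    available: bool = True   # False if, e.g., the disk is offline
    loaded: bool = True      # tracked but explicitly not loaded if False

    def can_write_metadata(self) -> bool:
        return self.available and self.loaded and self.writeable

    def can_overwrite_data(self) -> bool:
        # Assumption: data writes need everything metadata writes need.
        return self.can_write_metadata() and not self.locked


@dataclass
class IndexProperties:
    # None stands for the folder containing the project index file.
    data_folders: List[Optional[Path]] = field(default_factory=lambda: [None])
    read_file_types: List[str] = field(default_factory=list)
    write_file_types: Dict[str, str] = field(default_factory=dict)  # per data class

    def resolved_folders(self, index_path: Path) -> List[Path]:
        return [index_path.parent if f is None else f for f in self.data_folders]
```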

The project panel also tracks whether the project is active when building the visible display items.

Data Item Discovery

Data item discovery is specific to the project type.

For a project index file, data item discovery is performed by scanning through the directories and looking for files matching the project criteria for being a data item.

Ideally, this can happen asynchronously so that the UI is available almost immediately.

Asynchronous discovery requires that projects be able to gracefully connect dependent and source objects as they become available.

Duplicate items are noted, but only the first one encountered is active. The ordering is arbitrary. The only way to resolve duplicate item issues is to ensure that only one of the items is loaded.
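A sketch of how discovery could connect objects as they arrive while applying the first-one-wins rule for duplicates. `ItemRegistry` is hypothetical; items are identified by UUID and located by a string path, and dependents register a callback that fires once their source item appears.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple
from uuid import UUID


class ItemRegistry:
    """Hypothetical registry: first item per UUID wins; dependents wait."""

    def __init__(self) -> None:
        self.active: Dict[UUID, str] = {}
        self.duplicates: List[Tuple[UUID, str]] = []
        self._waiting: Dict[UUID, List[Callable[[str], None]]] = defaultdict(list)

    def add(self, item_id: UUID, location: str) -> None:
        if item_id in self.active:
            # Duplicate: note it, but keep the first one active.
            self.duplicates.append((item_id, location))
            return
        self.active[item_id] = location
        # Connect any dependents that were waiting for this source.
        for connect in self._waiting.pop(item_id, []):
            connect(location)

    def when_available(self, item_id: UUID, connect: Callable[[str], None]) -> None:
        # Connect immediately if the source is known, otherwise defer.
        if item_id in self.active:
            connect(self.active[item_id])
        else:
            self._waiting[item_id].append(connect)
```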

Sanity Checks

The loader should have sanity checks to avoid trying to load huge files that were improperly written.

The loader should also have other sanity checks to avoid cases where the application cannot launch due to a corrupt project.
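Both sanity checks can be sketched for a JSON project index, assuming a hypothetical size limit `MAX_INDEX_BYTES`. The size is checked with `stat` before reading, so an oversized file is never pulled into memory, and any read or parse error skips the project instead of aborting startup.

```python
import json
import logging
from pathlib import Path
from typing import Optional

MAX_INDEX_BYTES = 64 * 1024 * 1024  # hypothetical limit; tune as needed


def load_project_safely(path: Path, max_bytes: int = MAX_INDEX_BYTES) -> Optional[dict]:
    # Check the size before reading so a huge, improperly written file
    # is rejected without being loaded.
    try:
        size = path.stat().st_size
    except OSError as e:
        logging.warning("cannot stat %s: %s", path, e)
        return None
    if size > max_bytes:
        logging.warning("skipping %s: %d bytes exceeds limit", path, size)
        return None
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError) as e:
        # A corrupt project is skipped rather than preventing launch.
        logging.warning("skipping corrupt project %s: %s", path, e)
        return None
    return data if isinstance(data, dict) else None
```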

Versioning

Both projects and data files are versioned.

Projects that are older than the current version are displayed, but not loaded. The user can explicitly upgrade them.

Data files within projects that are older than the current version are not loaded. The user can explicitly upgrade them.

Projects or data files that are newer than the current version are not loaded or displayed.

If the working project is older or newer than the current version, a new working project is created and used.
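The versioning rules above reduce to a small decision table. A sketch, assuming versions are plain integers; `LoadDecision`, `project_decision`, and `data_file_should_load` are hypothetical names.

```python
from enum import Enum


class LoadDecision(Enum):
    LOAD = "load"
    DISPLAY_ONLY = "display, do not load (user may upgrade)"
    SKIP = "neither load nor display"
    REPLACE_WORKING = "create and use a new working project"


def project_decision(version: int, current: int, is_working: bool = False) -> LoadDecision:
    if version == current:
        return LoadDecision.LOAD
    if is_working:
        # An older or newer working project is replaced by a fresh one.
        return LoadDecision.REPLACE_WORKING
    if version < current:
        return LoadDecision.DISPLAY_ONLY
    return LoadDecision.SKIP


def data_file_should_load(version: int, current: int) -> bool:
    # Older data files wait for an explicit upgrade; newer ones are ignored.
    return version == current
```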

Migrating from Older File Layouts

There are two main scenarios involving older file layouts.

For libraries older than version 12, the library file is named without a version number.

For libraries version 12 or newer, the library file is named with a version number.

The migration may need to read data files from version 14 and older.

The version folders are loaded starting with the latest version folder and working backwards.

For any version folder, if the data item has already been loaded, as determined by UUID, it is ignored and a log message is printed.

For the most recent version folder, if the data item version is older than the latest version, it is migrated to the latest version and rewritten to disk and a log message is printed.
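The loading order above can be sketched as a planning pass, assuming each version folder has already been scanned into a hypothetical list of `(uuid, item_version)` pairs. The sketch follows the text literally: only items from the most recent folder are marked for rewriting, and items found in older folders are simply recorded with their source folder, since the text does not say how they are migrated.

```python
import logging
from typing import Dict, List, Tuple

# Hypothetical input: {folder_version: [(item_uuid, item_version), ...]}
VersionFolders = Dict[int, List[Tuple[str, int]]]


def plan_migration(folders: VersionFolders, latest: int) -> Dict[str, Tuple[int, bool]]:
    """Return {uuid: (source folder version, needs_rewrite)}."""
    plan: Dict[str, Tuple[int, bool]] = {}
    if not folders:
        return plan
    newest_folder = max(folders)
    # Walk folders from the latest version backwards.
    for folder_version in sorted(folders, reverse=True):
        for item_uuid, item_version in folders[folder_version]:
            if item_uuid in plan:
                logging.info("ignoring %s in version %d folder: already loaded",
                             item_uuid, folder_version)
                continue
            needs_rewrite = folder_version == newest_folder and item_version < latest
            if needs_rewrite:
                logging.info("migrating %s from version %d to %d",
                             item_uuid, item_version, latest)
            plan[item_uuid] = (folder_version, needs_rewrite)
    return plan
```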

Old library files and folders are removed during upgrade. The user is warned.

Exporting

The user can create a new project and move or copy items into it.

The user can also export selected items to a new project. When exporting rather than just moving or copying into a new project, items are not given a new UUID. It is up to the user to manage the duplication.

New Items

New items that are created as a result of processing are placed into the same project in which their first data item is located. The user must move the item to a different project if desired. If that project is read-only, the new item is placed into the Working project.

Acquisition always places items into the Working project.

If the user specifies a target project, then new items that would otherwise go into the Working project will go into the target project.
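The placement rules above can be sketched as one function. Treating a processing result with no source project as a Working-project item is an assumption, as are the function and parameter names.

```python
from typing import Optional


def target_project(
    source_project: Optional[str],
    source_read_only: bool,
    is_acquisition: bool,
    user_target: Optional[str] = None,
    working: str = "Working",
) -> str:
    # Processing results follow their first data item's project if writable.
    if not is_acquisition and source_project is not None and not source_read_only:
        return source_project
    # Anything that would otherwise go to Working goes to the user's target.
    return user_target if user_target is not None else working
```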

Implementation Notes

How are objects with the same UUID handled?

If only the first item is used, should references be reconstituted if the first item is unloaded and only the second item remains?

If an item is loaded and a subsequent item with the same UUID is ignored, and then the subsequent item is unloaded, how does the document model know which one was unloaded?

How are modifications to read-only projects handled?

For instance, if a read-only project has a spectrum image and the pick tool places a graphic for position on the read-only data item, what happens?

One potentially useful idea is to allow display modifiers (such as graphics or masks) to be added to a display using external objects (computations). So the graphic for a pick would be stored with the computation rather than directly on the display.

But the user may want to edit the displays of read-only projects. Do these just get modified during the session but not written to disk?

Read-only projects may require a shadow project and the user will probably need an option to snapshot a read-only project so that it can be modified.

How are smart collections edited?

Choices are to use the inspector or the filter panel.