Chad Trabant edited this page Apr 10, 2017 · 18 revisions

The mseedindex program reads Mini-SEED, determines contiguous data sections of time series (same network, station, location, channel and quality) and synchronizes information about each data section with a database.

For a quick introduction see the how to index data set instructions. For detailed usage information see the mseedindex manual in the 'doc' directory.

The database schema is designed to serve as an index and summary of time series data to fulfill multiple needs. One of the main use cases is to efficiently search for and access data. Other important use cases are to build various summaries of data holdings.

The location of a given section is represented in the database as starting at a byte offset in a file and a count of bytes that follow. Each section of data is a row in the schema.
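As an illustration of how a byte offset and byte count locate a section, here is a minimal Python sketch. The function name `read_section` and its parameters are hypothetical and not part of mseedindex; in practice the path, offset and count would come from a row in the index.

```python
def read_section(path, byte_offset, byte_count):
    """Return the raw bytes of one indexed data section.

    Hypothetical helper: path, byte_offset and byte_count stand in for
    the values stored in an index row.
    """
    with open(path, "rb") as f:
        f.seek(byte_offset)          # jump to the start of the section
        data = f.read(byte_count)    # read exactly the indexed span
    if len(data) != byte_count:
        # The file is shorter than the index claims, so the index is stale.
        raise IOError("section extends past end of file: %s" % path)
    return data
```

Because only the indexed span is read, a reader can extract one section from a large file without scanning the rest of it.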

Data files should be scanned or synchronized when they are in place, that is, at the location from which they will later be read. By default the absolute path to each input file will be resolved and stored.

Any existing rows in the database that match the file being synchronized will be replaced during synchronization. This operation is done as a single database transaction containing all deletions and all insertions. See the FILE VERSIONING section in the manual for a description of how to avoid race conditions while simultaneously updating data files and extracting data.

Supported databases

PostgreSQL (version >= 9.1) and SQLite3 are supported as target databases. For PostgreSQL the hstore extension is required and is included with any modern PostgreSQL installation.

When using PostgreSQL the specified table is expected to exist. When using SQLite both the database file and table will be created as needed, along with some indexes on common fields.

Program flow

The general program flow follows this pattern:

  • Read all the data in the specified file(s) and determine the summary details
  • For each file:
    • Check the database for existing rows for file
      • If rows already exist for the same data that was scanned, compare MD5 sums and retain the updated time values if the data has not changed.
    • Delete all rows for file
    • Insert new rows for file

The DELETE and INSERT statements are performed in a transaction in order to guarantee that any queries reading this information do not miss any data.
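The delete-then-insert step can be sketched with Python's standard `sqlite3` module. This is an illustration of the transaction pattern only, not the mseedindex implementation: the table `tsindex_demo`, its reduced set of columns, and the function `sync_file_rows` are assumptions made for this example.

```python
import sqlite3

# Hypothetical, simplified table; the real schema has more fields.
DEMO_SCHEMA = (
    "CREATE TABLE tsindex_demo "
    "(filename TEXT, byteoffset INTEGER, bytes INTEGER, md5 TEXT)"
)

def sync_file_rows(conn, filename, rows):
    """Replace all rows for `filename` in one transaction.

    `rows` is an iterable of (byteoffset, bytes, md5) tuples.
    """
    # Using the connection as a context manager commits on success and
    # rolls back if an exception is raised, so readers see either the
    # old rows or the new rows, never an empty gap in between.
    with conn:
        conn.execute(
            "DELETE FROM tsindex_demo WHERE filename = ?", (filename,)
        )
        conn.executemany(
            "INSERT INTO tsindex_demo (filename, byteoffset, bytes, md5) "
            "VALUES (?, ?, ?, ?)",
            [(filename, off, count, md5) for off, count, md5 in rows],
        )
```

If any insertion fails, the deletion is rolled back as well and the previous rows for the file remain in place.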

Tracking time series data updates

The indexing table contains an MD5 sum that mseedindex uses to determine when a data section changes. Because the sum depends only on the content of the section, a data section can be moved within a file and the MD5 will remain the same, so the indexer will know that the data in that section has not changed.
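The position independence described above can be demonstrated with a short Python sketch. It assumes, for illustration only, that the sum is computed over the raw bytes of the section; the exact bytes that mseedindex hashes are not specified here, and `section_md5` is a hypothetical helper.

```python
import hashlib

def section_md5(file_bytes, byte_offset, byte_count):
    """MD5 hex digest of one section, given its offset and length.

    Assumption: the digest covers exactly the section's raw bytes.
    """
    section = file_bytes[byte_offset:byte_offset + byte_count]
    return hashlib.md5(section).hexdigest()

# The same section stored at two different offsets.
section = b"example record bytes"
prefix = b"new data inserted first"
original_file = section + b"other data"
modified_file = prefix + section

before = section_md5(original_file, 0, len(section))
after = section_md5(modified_file, len(prefix), len(section))
assert before == after  # the offset changed, the content did not
```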

The updated field tracks the last known time that a data section was changed. If there are no existing entries for given data (initial load) the updated field is set to the file modification time.
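A minimal sketch of this decision, combined with the MD5 comparison from the program flow, might look like the following. The function `choose_updated` is hypothetical. The initial-load and unchanged-data branches follow the text above; using the file modification time when the MD5 differs is an assumption of this sketch.

```python
def choose_updated(existing, new_md5, file_mtime):
    """Pick the value for the updated field of a data section.

    existing   -- None on initial load, else an (md5, updated) tuple
                  taken from the row already in the database
    new_md5    -- MD5 of the section as just scanned
    file_mtime -- modification time of the file being synchronized
    """
    if existing is not None:
        old_md5, old_updated = existing
        if old_md5 == new_md5:
            # Data unchanged: retain the previously recorded time.
            return old_updated
    # Initial load (per the text), or changed data (assumption).
    return file_mtime
```

For example, `choose_updated(("abc", 100), "abc", 200)` keeps the earlier value `100`, while `choose_updated(None, "abc", 200)` returns `200`.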