Python command line script that uses R packages to calculate seismology data quality metrics.

ISPAQ - IRIS System for Portable Assessment of Quality

ISPAQ is a Python client that allows seismic data scientists and instrumentation operators to run data quality metrics on their own workstation, using much of the same code as used in IRIS's MUSTANG data quality web service. It can be installed on Linux and macOS.

Users have the ability to create personalized preference files that list combinations of station specifiers and statistical metrics of interest, such that they can be run repeatedly over data from many different time periods. Alternatively, single station specifiers and metrics can be specified on the command line for simple runs or for use in shell scripting.

ISPAQ offers the option for access to FDSN Web Services to retrieve seismic data and metadata directly from selected data centers supporting the FDSN protocol. Users also have the option to read local miniSEED files and metadata on their own workstations and construct on-the-spot data quality analyses on that data.

Output is provided in CSV format for tabular metrics. In addition, Probability Density Functions (PDF) for single days can be plotted to PNG image files.

The business logic for MUSTANG metrics is emulated through ObsPy and custom Python code and the core calculations are performed using the same R packages as used by MUSTANG.

Background

IRIS (Incorporated Research Institutions for Seismology) has developed a comprehensive quality assurance system called MUSTANG.

The MUSTANG system was built to operate at the IRIS DMC and is not generally portable. However, its key components, the Metric Calculators, are publicly available. While the results of MUSTANG calculations are stored in a database and provided to users via web services, ISPAQ is intended to carry out the process of calculating these metrics locally on the user's workstation. This has the benefit of allowing users to generate just-in-time metrics on data of their choosing, whether stored at an FDSN data center or on the user's own data store.

IRIS has over 40 MUSTANG metrics algorithms, most written in R, that are now available in the CRAN (Comprehensive R Archive Network) repository under the name IRISMustangMetrics. ISPAQ comes with the latest version of these packages available in CRAN and ISPAQ has an update capability to allow users to seamlessly upgrade these R packages as new releases become available.

ISPAQ contains business logic similar to MUSTANG, such that the computed metrics produced are identical (or very similar) to the results you will see in MUSTANG. The end result is a lightweight and portable version of MUSTANG that users are free to leverage on their own hardware.

Questions or comments can be directed to the IRIS DMC Quality Assurance Group at dmc_qa@iris.washington.edu.

Installation

ISPAQ is distributed through GitHub, via IRIS's public repository (iris-edu). You will use a git client command to get a copy of the latest stable release. In addition, you will use the miniconda python package manager to create a customized Python environment designed to run ISPAQ properly. This will include a localized installation of ObsPy and R.

If running macOS, Xcode command line tools should be installed. Check for existence and install if missing:

xcode-select --install

Follow the steps below to begin running ISPAQ.

Download the Source Code

You must first have git installed on your system. Git is a commonly used source code management system and serves well as a mode of software distribution because it makes updates easy to capture. See the Git Home Page to begin installation of git before proceeding further.

After you have git installed, you will download the ISPAQ distribution into a directory of your choosing from GitHub by opening a text terminal and typing:

git clone https://github.com/iris-edu/ispaq.git

This will produce a copy of this code distribution in the directory you have chosen. When new ispaq versions become available, you can update ISPAQ by typing:

cd ispaq
git pull origin master

Install the Anaconda Environment

Anaconda is quickly becoming the de facto package manager for scientific applications written in Python or R. Miniconda is a trimmed-down version of Anaconda that contains the bare necessities without loading a large list of data science packages up front. With Miniconda, you can set up a custom Python environment with just the packages you need to run ISPAQ.

Proceed to the Miniconda web site to find the installer for your operating system before proceeding with the instructions below. If you can run conda from the command line, then you know you have it successfully installed.

By setting up a conda virtual environment, we assure that our ISPAQ installation is entirely separate from any other installed software.

Creating the ispaq environment for macOS or Linux

You will go into the ispaq directory that you created with git, update miniconda, then create an environment specially for ispaq. You have to activate the ISPAQ environment whenever you perform installs, updates, or run ISPAQ.

cd ispaq
conda update conda
conda create --name ispaq python=2.7 readline=6.2
source activate ispaq
conda install -c conda-forge -c r -c r-old -c bioconda --file ispaq-conda-install.txt

Note: if source activate ispaq does not work because your shell is csh/tcsh instead of bash you will need to start a bash shell first. Type bash in a terminal window and then proceed with source activate ispaq.

See what is installed in our (ispaq) environment with:

conda list

Now install the IRIS R packages for ISPAQ. This will be a good test that R is installed properly:

R CMD INSTALL seismicRoll_1.1.2.tar.gz 
R CMD INSTALL IRISSeismic_1.4.8.tar.gz
R CMD INSTALL IRISMustangMetrics_2.1.0.tar.gz 

Using ISPAQ

Every time you use ISPAQ you must ensure that you are running in the proper Anaconda environment. If you followed the instructions above you only need to type:

cd ispaq
source activate ispaq

after which your prompt should begin with (ispaq). You run ISPAQ using the run_ispaq.py Python script; the leading ./ in the example below indicates that the script is in the current directory.

A list of command-line options is available with the --help flag:

(ispaq)$ ./run_ispaq.py -h
usage: run_ispaq.py [-h] [-P PREFERENCES_FILE] [-M METRICS] [-S STATIONS]
                    [--starttime STARTTIME] [--endtime ENDTIME]
                    [--dataselect_url DATASELECT_URL]
                    [--station_url STATION_URL] [--event_url EVENT_URL]
                    [--resp_dir RESP_DIR] [--csv_dir CSV_DIR]
                    [--png_dir PNG_DIR] [--sncl_format SNCL_FORMAT]
                    [--sigfigs SIGFIGS]
                    [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [-A]
                    [-V] [-U] [-L]

ISPAQ version 1.0.0

single arguments:
  -h, --help                       show this help message and exit
  -V, --version                    show program's version number and exit
  -U, --update-r                   check for and install newer CRAN IRIS Mustang packages, and exit
  -L, --list-metrics               list names of available metrics and exit

arguments for running metrics:
  -P PREFERENCES_FILE, --preferences-file PREFERENCES_FILE
                                   path to preference file, default=./preference_files/default.txt
  -M METRICS, --metrics METRICS    metrics alias as defined in preference file or metric name, required
  -S STATIONS, --stations STATIONS
                                   stations alias as defined in preference file or station SNCL, required
  --starttime STARTTIME            starttime in ObsPy UTCDateTime format, required for webservice requests and 
                                   defaults to earliest data file for local data 
                                   examples: YYYY-MM-DD, YYYYMMDD, YYYY-DDD, YYYYDDD[THH:MM:SS]
  --endtime ENDTIME                endtime in ObsPy UTCDateTime format, default=starttime + 1 day; 
                                   if starttime is also not specified then it defaults to the latest data file for local data 
                                   examples: YYYY-MM-DD, YYYYMMDD, YYYY-DDD, YYYYDDD[THH:MM:SS]
  --dataselect_url DATASELECT_URL  FDSN webservice or path to directory with miniSEED files, overrides preference file
  --station_url STATION_URL        FDSN webservice or path to stationXML file, overrides preference file
  --event_url EVENT_URL            FDSN webservice or path to QuakeML file, overrides preference file
  --resp_dir RESP_DIR              path to directory with RESP files, overrides preference file
  --csv_dir CSV_DIR                directory to write generated metrics .csv files, overrides preference file
  --png_dir PNG_DIR                directory to write generated metrics .png files, overrides preference file
  --sncl_format SNCL_FORMAT        format of SNCL aliases and miniSEED file names, overrides preference file
                                   examples:"N.S.L.C","S.N.L.C"
                                   where N=network code, S=station code, L=location code, C=channel code
  --sigfigs SIGFIGS                number of significant figures used for output columns named "value",
                                   overrides preference file
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                                   log level printed to console, default="INFO"
  -A, --append                     append to TRANSCRIPT file rather than overwriting

If no preference file is specified and the default file ./preference_files/default.txt cannot be found:
--csv_dir defaults to "."
--png_dir defaults to "."
--sncl_format defaults to "N.S.C.L"
--sigfigs defaults to "6"

For those that prefer to run ISPAQ as a package, you can use the following invocation (using help example):

(ispaq) $ python -m ispaq.ispaq --help

When calculating metrics, valid arguments for -M and -S are required. If -P is not provided, ISPAQ uses the default preference file located at ispaq/preference_files/default.txt. All entries in the preference file can, however, be overridden by command-line options. If --log-level is not specified, the default log level is INFO.
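
The precedence described above (command-line options over preference file entries over built-in defaults) can be sketched as a simple dictionary merge. This is only an illustration: `resolve_options` and `BUILTIN_DEFAULTS` are hypothetical names, not part of ISPAQ, and the default values are the ones listed at the end of the help text.

```python
# Hypothetical illustration of option precedence; not ISPAQ's actual code.
BUILTIN_DEFAULTS = {"csv_dir": ".", "png_dir": ".", "sigfigs": "6"}


def resolve_options(preference_entries, cli_overrides):
    """Merge settings so that command-line values win over preference file
    values, which in turn win over built-in defaults."""
    resolved = dict(BUILTIN_DEFAULTS)
    resolved.update(preference_entries)
    # Only options actually given on the command line (not None) override.
    resolved.update({k: v for k, v in cli_overrides.items() if v is not None})
    return resolved


prefs = {"csv_dir": "./csv", "sigfigs": "4"}
cli = {"csv_dir": "/tmp/out", "png_dir": None}
options = resolve_options(prefs, cli)
print(options)
```

Here `csv_dir` comes from the command line, `sigfigs` from the preference file, and `png_dir` falls back to the built-in default.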

When --starttime is invoked without --endtime, metrics are run for a single day. Metrics that are defined as day-long metrics (24 hour windows, see metrics documentation at MUSTANG) will be calculated for the time period 00:00:00-23:59:59.9999. An endtime of YYYY-MM-DD is interpreted as YYYY-MM-DD 00:00:00 so that, e.g., --starttime=2016-01-01 --endtime=2016-01-02 will also calculate one day of metrics. When a time range longer than one day is requested, metrics will be calculated by cycling through multiple single days to produce a measurement for each day. Additionally, and only if using local data files, you can run metrics without specifying a start time. In this case, ISPAQ will use a start time corresponding to the earliest file found that matches the requested station(s). If end time is also not specified, ISPAQ will use an end time corresponding to the latest file found that matches the requested station(s).
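
The way a requested time range breaks down into single-day windows can be sketched as follows. This is a minimal illustration of the rules in the paragraph above, not ISPAQ's own code; `daily_windows` is a hypothetical helper and, for brevity, it only accepts the YYYY-MM-DD form.

```python
from datetime import datetime, timedelta


def daily_windows(starttime, endtime=None):
    """Return a list of (start, end) datetime pairs, one per day.

    If endtime is omitted it defaults to starttime + 1 day. A date without
    a time of day is taken to mean 00:00:00 on that date.
    """
    start = datetime.strptime(starttime, "%Y-%m-%d")
    if endtime:
        end = datetime.strptime(endtime, "%Y-%m-%d")
    else:
        end = start + timedelta(days=1)
    windows = []
    while start < end:
        next_day = start + timedelta(days=1)
        windows.append((start, min(next_day, end)))
        start = next_day
    return windows


for window_start, window_end in daily_windows("2016-01-01", "2016-01-04"):
    print(window_start, window_end)
```

With these rules, `daily_windows("2016-01-01")` and `daily_windows("2016-01-01", "2016-01-02")` describe the same single day, while the three-day request above yields three windows.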

Preference files

The ISPAQ system is designed to be configurable through the use of preference files. These are usually located in the preference_files/ directory. The default preference file is preference_files/default.txt. This file is self-describing, with the following comments in the header:

# Preferences fall into four categories:
#  * Metrics -- aliases for user defined combinations of metrics (Use with -M)
#  * Station_SNCLs -- aliases for user defined combinations of SNCL patterns (Use with -S)
#                     SNCL patterns are station names formatted as Network.Station.Location.Channel
#                     wildcards * and ? are allowed. SNCL pattern format can be modified 
#                     using the Preferences sncl_format.          
#  * Data_Access -- FDSN web services or local files
#  * Preferences -- additional user preferences
#
# This file is in a very simple format.  After each category heading, all lines containing a colon 
# will be interpreted as key:value and made available to ISPAQ.
#

Metric aliases can be any one of the predefined options or any user-created alias: metric combination, where metric can be a single metric name or a comma-separated list of valid metric names. Aliases cannot be combinations of other aliases. Example: myMetrics: num_gaps, sample_mean, cross_talk.

Station_SNCL aliases are user-created alias: Network.Station.Location.Channel combinations. Station SNCLs can be comma-separated lists. * or ? wildcards can be used in any of the network, station, location, channel elements. Example: myStations: IU.ANMO.10.BHZ, IU.*.00.BH?, IU.ANMO.*.?HZ, II.PFO.??.*. By default, aliases are formatted as Network.Station.Location.Channel. This format pattern can be modified using the sncl_format entry discussed below.

Note: When directly specifying a SNCL pattern on the command line, SNCLs containing wildcards should be enclosed by quotes to avoid a possible error of unrecognized arguments.
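
To make the wildcard behaviour concrete, here is a small sketch that matches SNCLs against a comma-separated alias value element by element. It uses Python's standard `fnmatch` module as a stand-in; how ISPAQ itself expands wildcards is not shown in this README, and `expand_alias` and `sncl_matches` are hypothetical helper names.

```python
from fnmatch import fnmatchcase


def expand_alias(alias_value):
    """Split a comma-separated alias value into individual SNCL patterns."""
    return [p.strip() for p in alias_value.split(",") if p.strip()]


def sncl_matches(sncl, pattern):
    """Compare a SNCL to a pattern one period-separated element at a time,
    so that a wildcard never spans more than one element."""
    sncl_parts = sncl.split(".")
    pattern_parts = pattern.split(".")
    if len(sncl_parts) != 4 or len(pattern_parts) != 4:
        return False
    return all(fnmatchcase(s, p) for s, p in zip(sncl_parts, pattern_parts))


patterns = expand_alias("IU.ANMO.10.BHZ, IU.*.00.BH?, IU.ANMO.*.?HZ, II.PFO.??.*")
for sncl in ["IU.ANMO.00.BH1", "II.PFO.00.LHZ", "IU.COLA.10.BHZ"]:
    print(sncl, any(sncl_matches(sncl, p) for p in patterns))
```

On the command line the same patterns need quoting, for example `-S "IU.ANMO.*.BH?"`, so that the shell does not try to expand the wildcards itself.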

Data_Access has four entries describing where to find data, metadata, events, and optionally response files.

  • dataselect_url: should indicate a miniSEED data resource as one of the FDSN web service aliases used by ObsPy (e.g. IRIS), an explicit URL pointing to an FDSN web service domain (e.g. http://service.iris.edu ), or a file path to a directory containing miniSEED files (See: "Using Local Data Files", below).

  • station_url: should indicate a metadata location as an FDSN web service alias, an explicit URL, or a path to a file containing metadata in StationXML format (schema). For web services, this should point to the same place as dataselect_url (e.g. http://service.iris.edu). For local metadata, StationXML is read at the channel level and any response information is ignored. Local instrument response (if used) is expected to be in RESP file format and specified in the resp_dir entry (see below). If neither web services nor StationXML is available for metadata, the station_url entry should be left unspecified (blank). In this case, metrics that do not require metadata will still be calculated. Metrics that do require metadata information (cross_talk, polarity_check, orientation_check, transfer_function) will not be calculated and will return a log message stating "No available waveforms".

    If you are starting from a dataless SEED metadata file, you can create StationXML from this using the FDSN StationXML-SEED Converter.

  • event_url: should indicate an event catalog resource as an FDSN web service alias (e.g. USGS), an explicit URL (e.g. https://earthquake.usgs.gov), or a path to a file containing event information in QuakeML format (schema). Only web service providers that can output text format can be used at this time. This entry will only be used by metrics that require event information in order to be calculated (cross_talk, polarity_check, orientation_check).

  • resp_dir: should be unspecified or absent if local response files are not used. The default behavior is to retrieve response information from IRIS Evalresp. To use local instrument responses instead of IRIS Evalresp, this parameter should indicate a path to a directory containing response files in RESP format. Local response files are expected to be named RESP.network.station.location.channel or RESP.station.network.location.channel. Filenames with extension .txt are also acceptable. E.g., RESP.IU.CASY.00.BH1, RESP.CASY.IU.00.BH1, RESP.IU.CASY.00.BH1.txt.

    Response information is only needed when generating PSD derived metrics, PDF plots, or the transfer_function metric.

    If you are starting from a dataless SEED, you can create RESP files using rdseed.

Preferences has four entries describing ispaq output.

  • csv_dir: should be followed by a directory path for output of generated metric text files (CSV). If the directory does not exist, then it defaults to the current working directory.

  • png_dir: should be followed by a directory path for output of generated PDF plots (PNG). If the directory does not exist, then it defaults to the current working directory.

  • sigfigs: should indicate the number of significant figures used for output columns named "value". Default is 6.

  • sncl_format: should be the format of sncl aliases and miniSEED file names, must be some combination of period separated N=network, S=station, L=location, C=channel (e.g., N.S.L.C, S.N.L.C). If no sncl_format exists, it defaults to N.S.L.C.
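
As an illustration of what the sigfigs entry means, the following sketch rounds a value to a given number of significant figures using Python's general ("g") number format. It is an assumption that this reproduces ISPAQ's own rounding exactly; `round_sigfigs` is a hypothetical helper.

```python
def round_sigfigs(value, sigfigs=6):
    """Round value to the requested number of significant figures."""
    return float("{:.{}g}".format(value, sigfigs))


print(round_sigfigs(123.456789))        # six significant figures
print(round_sigfigs(0.000123456789, 3))  # three significant figures
```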

Any of these preference file entries can be overridden by command-line arguments: -M "metric name", -S "station SNCL", --dataselect_url, --station_url, --event_url, --resp_dir, --csv_dir, --png_dir, --sigfigs, --sncl_format.

More information about using local files can be found below in the section "Using Local Data Files".
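
The preference file format described in this section (category headings followed by key:value lines) can be read with a few lines of Python. The sketch below is not ISPAQ's parser. It assumes that each category heading appears on its own line as the category name followed by a colon, and the sample text is made up for illustration.

```python
CATEGORIES = ("Metrics", "Station_SNCLs", "Data_Access", "Preferences")


def parse_preferences(text):
    """Parse preference text into {category: {key: value}}.

    Assumption: a heading is a line consisting of a category name plus ":".
    Lines starting with "#" are comments. Values may themselves contain
    colons (e.g. URLs), so each line is split on the first colon only.
    """
    prefs = {name: {} for name in CATEGORIES}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":") and line[:-1] in CATEGORIES:
            current = line[:-1]
            continue
        if current is not None and ":" in line:
            key, value = line.split(":", 1)
            prefs[current][key.strip()] = value.strip()
    return prefs


sample = """
# illustrative preference file
Metrics:
  myMetrics: num_gaps, sample_mean, cross_talk
Station_SNCLs:
  myStations: IU.ANMO.10.BHZ, IU.*.00.BH?
Data_Access:
  dataselect_url: http://service.iris.edu
  station_url: http://service.iris.edu
  event_url: USGS
  resp_dir:
Preferences:
  csv_dir: .
  sigfigs: 6
"""
prefs = parse_preferences(sample)
print(prefs["Data_Access"])
```

Splitting on the first colon only is what keeps a URL value such as http://service.iris.edu intact, and an entry left blank (resp_dir above) comes back as an empty string.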

Output files

ISPAQ will always create a log file named ISPAQ_TRANSCRIPT.log to record actions taken and messages generated during processing.

Results of most metrics calculations will be written to .csv files using the following naming scheme:

  • MetricAlias_StationAlias_startdate__businessLogic.csv

when a single day is specified on the command line or

  • MetricAlias_StationAlias_startdate_enddate_businessLogic.csv

when multiple days are specified from the command line.

businessLogic corresponds to which script is invoked:

| businessLogic | ISPAQ script | metrics |
|---|---|---|
| simpleMetrics | simple_metrics.py | most metrics |
| SNRMetrics | SNR_metrics.py | sample_snr |
| PSDMetrics | PSD_metrics.py | pct_above_nhnm, pct_below_nlnm, dead_channel_{exp,lin,gsn}, psd_corrected, pdf_text |
| crossTalkMetrics | crossTalk_metrics.py | cross_talk |
| pressureCorrelationMetrics | pressureCorrelation_metrics.py | pressure_effects |
| crossCorrelationMetrics | crossCorrelation_metrics.py | polarity_check |
| orientationCheckMetrics | orientationCheck_metrics.py | orientation_check |
| transferMetrics | transferFunction_metrics.py | transfer_function |

The metric alias psdText (or any user defined set with metrics psd_corrected or pdf_text) will generate corrected PSDs and PDFs in files named:

  • SNCL_startdate_PSDcorrected.csv
  • SNCL_startdate_PDF.csv

while the metric alias PDF (metric pdf_plot) will generate PDF plot images as:

  • SNCL_startdate_PDF.png

If specifying metrics and station SNCLs from the command line instead of using preference file aliases, the metric name and station SNCL will be used instead of the MetricAlias and StationAlias in the output file name. In addition, any instances of command-line wildcards "*" or "?" will be replaced with the letter "x" in the output file name.
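
The naming scheme above can be summarised in a short sketch. `csv_file_name` is a hypothetical helper written for this README, not a function from the ISPAQ code base. It assumes dates are written as YYYY-MM-DD, as in the example log output.

```python
def csv_file_name(metric, station, startdate, business_logic, enddate=None):
    """Build an output file name following the documented scheme.

    For a single day the enddate slot is left empty, which produces the
    double underscore seen in single-day file names.
    """
    def clean(name):
        # Command-line wildcards are replaced with the letter "x".
        return name.replace("*", "x").replace("?", "x")

    end = enddate if enddate else ""
    return "{}_{}_{}_{}_{}.csv".format(
        clean(metric), clean(station), startdate, end, business_logic)


print(csv_file_name("basicStats", "basicStats", "2010-04-20", "simpleMetrics"))
print(csv_file_name("num_gaps", "IU.ANMO.*.BH?", "2010-04-20", "simpleMetrics",
                    enddate="2010-04-25"))
```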

Command line invocation

Example invocations are found in the EXAMPLES section and at the end of this README.

You can modify the information printed to the console by modifying the --log-level. To see detailed progress information use --log-level DEBUG. To hide everything other than an outright crash use --log-level CRITICAL. If --log-level is not invoked, the default is to print information at the INFO level. The other available levels are WARNING and ERROR.

The following example demonstrates what you should see. Note: Please ignore the warning message from matplotlib. It will only occur on first use.

(ispaq) $ run_ispaq.py -M basicStats -S basicStats --starttime 2010-04-20 --log-level INFO
2017-05-26 13:58:12 - INFO - Running ISPAQ version 1.0.0 on Fri May 26 13:58:12 2017
~/miniconda2/envs/ispaq/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is 
building the font cache using fc-list. This may take a moment. 
warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
2017-05-26 13:58:22 - INFO - Calculating simple metrics for 3 SNCLs on 2010-04-20
2017-05-26 13:58:22 - INFO - 000 Calculating simple metrics for IU.ANMO.00.BH1
2017-05-26 13:58:24 - INFO - 001 Calculating simple metrics for IU.ANMO.00.BH2
2017-05-26 13:58:25 - INFO - 002 Calculating simple metrics for IU.ANMO.00.BHZ
2017-05-26 13:58:26 - INFO - Writing simple metrics to basicStats_basicStats_2010-04-20__simpleMetrics.csv
2017-05-26 13:58:26 - INFO - ALL FINISHED!
(ispaq) $

Additional information about running ISPAQ on the command line can be found by invoking run_ispaq.py --help.

Using Local Data Files

Local data files should be in miniSEED format and organized in network-station-channel-day files with naming convention

Network.Station.Location.Channel.Year.JulianDay.Quality

where Quality is optional (e.g., TA.P19K..BHZ.2016.214.M or TA.P19K..BHZ.2016.214). This naming convention can be modified by using the sncl_format entry in the preferences file or the --sncl_format option on the command line. sncl_format allows you to specify a different order for Network.Station.Location.Channel, although all these elements must be present in the file name.
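
The sketch below shows how a day file name could be assembled from its parts for a given sncl_format. It is illustrative only: `day_file_name` is a hypothetical helper, and zero-padding the Julian day to three digits is an assumption based on the example names above.

```python
from datetime import date


def day_file_name(network, station, location, channel, day,
                  sncl_format="N.S.L.C", quality=None):
    """Return a miniSEED day file name in the documented convention.

    sncl_format gives the order of the N, S, L and C elements.
    """
    codes = {"N": network, "S": station, "L": location, "C": channel}
    parts = [codes[letter] for letter in sncl_format.split(".")]
    parts.append("{:04d}".format(day.year))
    parts.append("{:03d}".format(day.timetuple().tm_yday))  # day of year
    if quality:
        parts.append(quality)
    return ".".join(parts)


print(day_file_name("TA", "P19K", "", "BHZ", date(2016, 8, 1), quality="M"))
print(day_file_name("TA", "P19K", "", "BHZ", date(2016, 8, 1),
                    sncl_format="S.N.L.C"))
```

An empty location code simply leaves two adjacent periods in the name, as in TA.P19K..BHZ.2016.214.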

ISPAQ will search for miniSEED files in the directory specified by dataselect_url in the preferences file or --dataselect_url on the command line. Furthermore, it will recursively follow that directory structure and look for miniSEED files in directories contained within the dataselect_url directory. If more than one file name is found that matches the same requested network, station, location, channel, year, and julian day, then the metrics will be run on the first file that is found. To request all data files, use preference file Station_SNCL alias: *.*.*.*, or -S "*.*.*.*" from the command line. Wildcarding every element is strongly discouraged when using FDSN webservices instead of local files.
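
The recursive search with "first match wins" behaviour can be pictured with the sketch below. It is not ISPAQ's implementation. In particular, the alphabetical traversal order used here is an assumption, since this README does not say which file counts as "first". `find_first_day_file` is a hypothetical helper, demonstrated on a temporary directory.

```python
import os
import shutil
import tempfile


def find_first_day_file(root_dir, base_name):
    """Walk root_dir recursively and return the first matching file path.

    A file matches when its name equals base_name or is base_name plus a
    trailing quality code (for example ".M"). Returns None if none match.
    """
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # assumption: deterministic, alphabetical traversal
        for name in sorted(filenames):
            if name == base_name or name.startswith(base_name + "."):
                return os.path.join(dirpath, name)
    return None


# Demonstration on a throwaway directory tree.
root = tempfile.mkdtemp()
try:
    os.makedirs(os.path.join(root, "2016", "TA"))
    target = os.path.join(root, "2016", "TA", "TA.P19K..BHZ.2016.214.M")
    open(target, "w").close()
    found = find_first_day_file(root, "TA.P19K..BHZ.2016.214")
    missing = find_first_day_file(root, "TA.P19K..BHZ.2016.215")
finally:
    shutil.rmtree(root)

print(found)
print(missing)
```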

Note: All data is expected to be in the day file that matches its timestamp; if records do not break on the day boundary, data that is not in the correct day file will not be used in the metrics calculation. This can lead to cases where, for example, a gap is calculated at the start of a day when the data for that time period is in the previous day file.

If your miniSEED files are not already split on day boundaries, one tool that can be used for this task is the dataselect command-line tool available at https://github.com/iris-edu/dataselect. Follow the releases link in the README to download the latest version of the source code. The following example reads the input miniSEED files, splits the records on day boundaries, and writes to files named network.station.location.channel.year.julianday.quality.

Example: dataselect -Sd -A %n.%s.%l.%c.%Y.%j.%q inputfiles

Updating CRAN packages

The command-line argument -U, --update-r can be used to check CRAN for newer IRISSeismic, seismicRoll, and IRISMustangMetrics R packages.

(ispaq) $ ./run_ispaq.py -U
2017-05-25 12:37:19 - INFO - Running ISPAQ version 1.0.0 on Thu May 25 12:37:19 2017
2017-05-25 12:37:19 - INFO - Checking for IRIS R package updates...
--- Please select a CRAN mirror for use in this session ---
HTTPS CRAN mirror 

 1: 0-Cloud [https]                 2: Algeria [https]              
 3: Australia (Canberra) [https]    4: Australia (Melbourne) [https]
 5: Australia (Perth) [https]       6: Austria [https]              
...

Selection: 1

              package installed   CRAN upgrade
0         seismicRoll     1.1.2  1.1.2   False
1         IRISSeismic     1.4.5  1.4.3   False
2  IRISMustangMetrics     2.0.8  2.0.8   False

2017-05-25 12:37:34 - INFO - No packages need updating.

If a newer CRAN package does exist, the -U option will then automatically download the package from CRAN and install it. ISPAQ code can be updated using git pull origin master. Sometimes it is necessary to update the ISPAQ python code in conjunction with the CRAN code.
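
The "upgrade" column in the table above reflects a comparison between the installed and CRAN version numbers. A comparison of that kind can be sketched as follows. `needs_upgrade` is a hypothetical helper, and the actual check performed by the -U option may differ.

```python
def needs_upgrade(installed, cran):
    """Return True when the CRAN version is newer than the installed one.

    Versions are compared numerically, element by element, so that
    "1.10.0" is correctly treated as newer than "1.9.0".
    """
    def as_tuple(version):
        # R package versions may use "-" as well as "." as a separator.
        return tuple(int(part) for part in version.replace("-", ".").split("."))

    return as_tuple(cran) > as_tuple(installed)


print(needs_upgrade("1.1.2", "1.1.2"))
print(needs_upgrade("1.4.5", "1.4.3"))
print(needs_upgrade("2.0.8", "2.1.0"))
```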

List of Metrics

The command-line argument -L will list the names of available metrics.

Note: When using local data files, metrics based on miniSEED activity flags, I/O flags, and timing blockette 1001 are not valid. These metrics are calibration_signal, clock_locked, event_begin, event_end, event_in_progress, timing_correction, and timing_quality. ISPAQ will not return values for these metrics.

Brief Descriptions and Links to Documentation

  • amplifier_saturation: The number of times that the 'Amplifier saturation detected' bit in the 'dq_flags' byte is set within a miniSEED file. This data quality flag is set by some dataloggers in the fixed section of the miniSEED header. The flag was intended to indicate that the preamp is being overdriven, but the exact meaning is datalogger-specific. Documentation

  • calibration_signal: The number of times that the 'Calibration signals present' bit in the 'act_flags' byte is set within a miniSEED file. A value of 1 indicates that a calibration signal was being sent to that channel. Documentation

  • clock_locked: The number of times that the 'Clock locked' bit in the 'io_flags' byte is set within a miniSEED file. This clock flag is set to 1 by some dataloggers in the fixed section of the miniSEED header to indicate that its GPS has locked with enough satellites to obtain a time/position fix. Documentation

  • cross_talk: The correlation coefficient of channel pairs from the same sensor. Data windows are defined by seismic events. Correlation coefficients near 1 may indicate cross-talk between those channels. Documentation

  • dead_channel_exp: Dead channel metric - exponential fit. This metric is calculated from the mean of all the PSDs generated (typically 47 for a 24 hour period). Values of the PSD mean curve over the band expLoPeriod:expHiPeriod are fit to an exponential curve by a least squares linear regression of log(PSD mean) ~ log(period). The dead_channel_exp metric is the standard deviation of the fit residuals of this regression. Lower numbers indicate a better fit and a higher likelihood that the mean PSD is exponential - an indication of a dead channel. Documentation

    • channels = [BCDHM][HX].
  • dead_channel_gsn: A boolean measurement providing a TRUE or FALSE indication that the median PSD values of a channel exhibit an average 5 dB deviation below the NLNM in the 4 to 8 s period band as measured using a McNamara PDF noise matrix. The TRUE condition is represented numerically as '1' and the FALSE condition as '0'. Documentation

    • channels = [BCDHLM][HX].
  • dead_channel_lin: Dead channel metric - linear fit. This metric is calculated from the mean of all the PSDs generated (typically 47 for a 24 hour period). Values of the PSD mean curve over the band linLoPeriod:linHiPeriod are fit to a linear curve by a least squares linear regression of PSD mean ~ log(period). The dead_channel_lin metric is the standard deviation of the fit residuals of this regression. Lower numbers indicate a better fit and a higher likelihood that the mean PSD is linear - an indication that the sensor is not returning expected seismic energy. Documentation

    • channels = [BCDHM][HX].
  • digital_filter_charging: The number of times that the 'A digital filter may be charging' bit in the 'dq_flags' byte is set within a miniSEED file. Data samples acquired while a datalogger is loading filter parameters - such as after a reboot - may contain a transient. Documentation

  • digitizer_clipping: The number of times that the 'Digitizer clipping detected' bit in the 'dq_flags' byte is set within a miniSEED file. This flag indicates that the input voltage has exceeded the maximum range of the ADC. Documentation

  • event_begin: The number of times that the 'Beginning of an event, station trigger' bit in the 'act_flags' byte is set within a miniSEED file. This metric can be used to quickly identify data days that may have events. It may also indicate when trigger parameters need adjusting at a station. Documentation

  • event_end: The number of times that the 'End of an event, station detrigger' bit in the 'act_flags' byte is set within a miniSEED file. This metric can be used to quickly identify data days that may have events. It may also indicate when trigger parameters need adjusting at a station. Documentation

  • event_in_progress: The number of times that the 'Event in progress' bit in the 'act_flags' byte is set within a miniSEED file. This metric can be used to quickly identify data days that may have events. It may also indicate when trigger parameters need adjusting at a station. Documentation

  • glitches: The number of times that the 'Glitches detected' bit in the 'dq_flags' byte is set within a miniSEED file. This metric can be used to identify data with large filled values that data users may need to handle in a way that they don't affect their research outcomes. Documentation

  • max_gap: Indicates the size of the largest gap encountered within a 24-hour window. Documentation

  • max_overlap: Indicates the size of the largest overlap in seconds encountered within a 24-hour window. Documentation

  • max_stalta: The STALTAMetric function calculates the maximum of STA/LTA of the incoming seismic signal over a 24 hour period. In order to reduce computation time of the rolling averages, the averaging window is advanced in 1/2 second increments. Documentation

    • channels = [BHCDES][HPLX].
  • missing_padded_data: The number of times that the 'Missing/padded data present' bit in the 'dq_flags' byte is set within a miniSEED file. This metric can be used to identify data with padded values that data users may need to handle in a way that they don't affect their research outcomes. Documentation

  • num_gaps: This metric reports the number of gaps encountered within a 24-hour window. Documentation

  • num_overlaps: This metric reports the number of overlaps encountered in a 24-hour window. Documentation

  • num_spikes: This metric uses a rolling Hampel filter, a median absolute deviation (MAD) test, to find outliers in a timeseries. The number of discrete spikes is determined after adjacent outliers have been combined into individual spikes. NOTE: not to be confused with the spikes metric, which is an SOH flag only. Documentation

    • channels = [BH][HX].
  • orientation_check: Determine channel orientations by rotating horizontal channels until the resulting radial component maximizes cross-correlation with the Hilbert transform of the vertical component. This metric uses Rayleigh waves from large, shallow events. Documentation

    • channels = [BCHLM][HX].
  • pct_above_nhnm: Percent above New High Noise Model. Percentage of Probability Density Function values that are above the New High Noise Model. This value is calculated over the entire time period. Documentation

    • channels = [BCDHM][HX].
  • pct_below_nlnm: Percent below New Low Noise Model. Percentage of Probability Density Function values that are below the New Low Noise Model. This value is calculated over the entire time period. Documentation

    • channels = [BCDHM][HX].
  • pdf_plot: Probability density function plots. Generates one plot per station-day. Reference

  • pdf_text: Probability density function text output (frequency, power, hits, target, starttime, endtime) Reference

  • percent_availability: The portion of data available for each day is represented as a percentage. 100% data available means full coverage of data for the reported start and end time. Documentation

  • polarity_check: The signed cross-correlation peak value based on the cross-correlation of two neighboring station channels in proximity to a large earthquake signal. A negative peak close to -1.0 can indicate reversed polarity. Documentation

    • channels = [BCFHLM][HX].
  • pressure_effects: The correlation coefficient of a seismic channel and an LDO pressure channel. Large correlation coefficients may indicate the presence of atmospheric effects in the seismic data. Documentation

    • channels = LH., LDO
  • psd_corrected: Power spectral density values, corrected for instrument response, in text format (starttime, endtime, frequency, power). Documentation

    • channels = .[HLGNPYXD].
  • sample_max: This metric reports the largest amplitude value in counts encountered within a 24-hour window. Documentation

  • sample_mean: This metric reports the average amplitude value in counts over a 24-hour window. This mean is one measure of the central tendency of the amplitudes that is calculated from every amplitude value present in the time series. The mean value itself may not occur as an amplitude value in the time series. Documentation

  • sample_median: This metric reports the middle amplitude value in counts of sorted amplitude values from a 24-hour window. This median is one measure of the central tendency of the amplitudes in a time series when values are arranged in sorted order. The median value itself always occurs as an amplitude value in the time series. Documentation

  • sample_min: This metric reports the smallest amplitude value in counts encountered within a 24-hour window. Documentation

  • sample_rms: Displays the RMS variance of trace amplitudes within a 24-hour window. Documentation

  • sample_snr: A ratio of the RMS variance calculated from data 30 seconds before and 30 seconds following the predicted first-arriving P phase. Documentation

    • channels = .[HLGNPYX].
  • sample_unique: This metric reports the number (count) of unique values in the data trace over a 24-hour window. Documentation

  • spikes: The number of times that the 'Spikes detected' bit in the 'dq_flags' byte is set within a miniSEED file. This data quality flag is set by some dataloggers in the fixed section of the miniSEED header when short-duration spikes have been detected in the data. Because spikes have shorter duration than the natural period of most seismic sensors, spikes often indicate a problem introduced at or after the datalogger. Documentation

  • suspect_time_tag: The number of times that the 'Time tag is questionable' bit in the 'dq_flags' byte is set within a miniSEED file. This metric can be used to identify stations with GPS locking problems and data days with potential timing issues. Documentation

  • telemetry_sync_error: The number of times that the 'Telemetry synchronization error' bit in the 'dq_flags' byte is set within a miniSEED file. This metric can be searched to determine which stations may have telemetry problems or to identify or omit gappy data from a data request. Documentation

  • timing_correction: The number of times that the 'Time correction applied' bit in the 'act_flags' byte is set within a miniSEED file. This clock quality flag is set by the network operator in the fixed section of the miniSEED header when a timing correction stored in field 16 of the miniSEED fixed header has been applied to the data's original time stamp. A value of 0 means that no timing correction has been applied. Documentation

  • timing_quality: Daily average of the SEED timing quality stored in miniSEED blockette 1001. This value is vendor specific and expressed as a percentage of maximum accuracy. Percentage is NULL if not present in the miniSEED. Documentation

  • transfer_function: Transfer function metric consisting of the gain ratio, phase difference and magnitude squared of two co-located sensors. Documentation

    • channels = [BCFHLM][HX].
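
As an illustration of the num_spikes description above, here is a minimal pure-Python sketch of a rolling Hampel (MAD) outlier test followed by merging of adjacent outliers. It is not the code ISPAQ runs (the real calculation is done in the R packages); the function name `count_spikes`, the `window` and `threshold` defaults, and the rule that only directly neighboring outliers are merged are all assumptions made for this example.

```python
import math
from statistics import median


def count_spikes(samples, window=41, threshold=10.0):
    """Count discrete spikes with a rolling Hampel (MAD) test.

    window and threshold are illustrative values, not ISPAQ settings.
    """
    half = window // 2
    flagged = [False] * len(samples)
    # Points closer than half a window to either end are not tested.
    for i in range(half, len(samples) - half):
        win = samples[i - half:i + half + 1]
        med = median(win)
        # 1.4826 scales the MAD to match the standard deviation
        # of normally distributed data.
        mad = 1.4826 * median(abs(v - med) for v in win)
        if mad > 0 and abs(samples[i] - med) > threshold * mad:
            flagged[i] = True
    # Combine runs of adjacent outliers into single spikes.
    spikes = 0
    previous = False
    for is_outlier in flagged:
        if is_outlier and not previous:
            spikes += 1
        previous = is_outlier
    return spikes


# Synthetic trace: a sine wave with one two-sample spike and one single spike.
trace = [math.sin(0.3 * i) for i in range(500)]
trace[100] += 50.0
trace[101] += 50.0
trace[300] -= 50.0
print(count_spikes(trace))
```

The two neighboring outliers at samples 100 and 101 are counted as one spike, which is the "adjacent outliers have been combined" behavior the metric description refers to.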

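Several metrics in the list above (missing_padded_data, spikes, suspect_time_tag, telemetry_sync_error, timing_correction) count how often a single bit is set in a miniSEED header flag byte. The sketch below shows that counting step only. It assumes the flag bytes have already been extracted, one integer per record, and it takes the bit positions from the SEED 2.4 fixed-header layout as best understood here; the dictionaries and the `count_flag` helper are hypothetical names, not part of ISPAQ.

```python
# Assumed bit positions in the 'dq_flags' (data quality) byte.
DQ_FLAG_BITS = {
    "spikes": 2,                # 'Spikes detected'
    "missing_padded_data": 4,   # 'Missing/padded data present'
    "telemetry_sync_error": 5,  # 'Telemetry synchronization error'
    "suspect_time_tag": 7,      # 'Time tag is questionable'
}

# Assumed bit position in the 'act_flags' (activity) byte.
ACT_FLAG_BITS = {
    "timing_correction": 1,     # 'Time correction applied'
}


def count_flag(flag_bytes, bit):
    """Return how many flag bytes have the given bit set."""
    mask = 1 << bit
    return sum(1 for value in flag_bytes if value & mask)


# One dq_flags byte per record; the values are made up for the example.
dq_flags = [0b00000000, 0b00010000, 0b10010100]
for metric, bit in DQ_FLAG_BITS.items():
    print(metric, count_flag(dq_flags, bit))
```
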
Examples Using default.txt Preference File

Note: omitting -P on the command line is the same as specifying -P preference_files/default.txt

cd ispaq
source activate ispaq
./run_ispaq.py -M basicStats -S basicStats --starttime 2010-100             # starttime specified as julian day (YYYY-DDD)
./run_ispaq.py -M gaps -S gaps --starttime 2013-01-05                       # starttime specified as calendar day (YYYY-MM-DD)
./run_ispaq.py -M numSpikes -S numSpikes --starttime 2013-01-03 --endtime 2013-01-08
./run_ispaq.py -M stalta -S stalta --starttime 2013-153
./run_ispaq.py -M snr -S snr --starttime 2013-06-02
./run_ispaq.py -M psdDerived -S psd --starttime 2011-138 --endtime 2011-140
./run_ispaq.py -M psdText -S psd --starttime 2011-05-18 
./run_ispaq.py -M pdf -S pdf --starttime 2013-06-01
./run_ispaq.py -M crossTalk -S crossTalk --starttime 2013-09-21
./run_ispaq.py -M pressureCorrelation -S pressureCorrelation --starttime 2013-05-02
./run_ispaq.py -M crossCorrelation -S crossCorrelation --starttime 2011-01-01
./run_ispaq.py -M orientationCheck -S orientationCheck --starttime 2015-11-24
./run_ispaq.py -M transferFunction -S transferFunction --starttime=2012-10-03 --endtime=2012-10-05 
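
The examples above mix two date styles: year plus day of year (2011-138) and calendar dates (2011-05-18). The following sketch converts either style to a calendar date with the standard library, which can help when checking that two commands cover the same days. It is not ISPAQ's own date parsing, and `parse_start_date` is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_start_date(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-DDD' (day of year) into a date."""
    for fmt in ("%Y-%m-%d", "%Y-%j"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError("unrecognized date format: %r" % text)


# The psdDerived and psdText examples start on the same day.
print(parse_start_date("2011-138") == parse_start_date("2011-05-18"))
print(parse_start_date("2013-153").isoformat())
```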

Example Using Command-line Options to Override Preference File

./run_ispaq.py -M sample_mean -S II.KAPI.00.BHZ --starttime 2013-01-05 --dataselect_url ./test_data --station_url ./test_data/II.KAPI_station.xml --csv_dir ./output_files
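
When the same run has to be repeated for many stations or days, it can be convenient to assemble the command line from a script. The sketch below rebuilds the command shown above using only options that appear in this README. `build_ispaq_command` is a hypothetical helper, not part of ISPAQ, and the example assumes Python 3.8 or later for `shlex.join`.

```python
import shlex


def build_ispaq_command(metrics, stations, starttime, endtime=None, **options):
    """Return the run_ispaq.py argument list for one run.

    Extra keyword arguments are passed through as --name value pairs,
    for example dataselect_url="./test_data".
    """
    cmd = ["./run_ispaq.py", "-M", metrics, "-S", stations,
           "--starttime", starttime]
    if endtime is not None:
        cmd += ["--endtime", endtime]
    for name, value in options.items():
        cmd += ["--" + name, str(value)]
    return cmd


cmd = build_ispaq_command(
    "sample_mean", "II.KAPI.00.BHZ", "2013-01-05",
    dataselect_url="./test_data",
    station_url="./test_data/II.KAPI_station.xml",
    csv_dir="./output_files",
)
print(shlex.join(cmd))
# To execute it from the ispaq directory instead of printing:
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems if a path ever contains spaces.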