Submission Details

This page lists information on the submission procedure, algorithm and data formats, as well as the evaluation metrics.

Submission procedure

If you and/or your group want to participate in the mock data challenge, we kindly ask you to contact us at mlgwsc@aei.mpg.de by December 31st, 2021. As all evaluation is done manually and we aim to be in close correspondence with any participants, we limit the number of submissions to roughly 30.

Once registered, you have time to work on your final submission until March 31st, 2022, which is the final submission deadline. To submit your algorithm, we ask you to send us an e-mail with the required code attached or to share a Git repository. In case files are too large to be transferred via standard e-mail, we offer the use of Cryptshare to transfer large files. If neither of these options works, feel free to get in contact with us at mlgwsc@aei.mpg.de.

We will evaluate each submission on a smaller validation set and return the metrics derived on this validation set to the submitting group. We then ask the group to verify that we are able to reproduce the expected search behavior within a margin of error. This allows groups to make sure that the submitted algorithm was the correct version and that no unexpected error occurred. We explicitly ask groups not to use this as a means to optimize their code, but to view it only as a verification step.

If the results on the validation set are signed off by the submitting group as correct, the algorithm will then be applied to a secret data set that is the same for all groups. Results on this set will not be communicated back to the group prior to the submission deadline.

The submission may be retracted at any point prior to final publication of the paper.

Algorithm input/output format

Input

Each algorithm is expected to process an HDF5 file containing the raw detector data. Details on this data can be found here. It is expected that any required pre-processing is done by the algorithm itself.

The input file contains two groups H1 and L1 which store one or multiple datasets for the detectors Hanford and Livingston, respectively. The datasets are named by their integer start times. The exact start time is stored in the attributes of each of these datasets under the key start_time.

The attributes of the file contain meta-data required to reproduce this file. This meta-data will be stripped from the final test set to prevent its use by the algorithm.

The file structure may look something like this:

root
├── attrs
├── H1
│   ├── 0
│   │   └── attrs
│   │        ├── start_time
│   │        └── delta_t
│   ├── 7200
│   │   └── attrs
│   │        ├── start_time
│   │        └── delta_t
│   └── ...
└── L1
    ├── 0
    │   └── attrs
    │        ├── start_time
    │        └── delta_t
    ├── 7200
    │   └── attrs
    │        ├── start_time
    │        └── delta_t
    └── ...

In this case there are (at least) two datasets per detector with the start times 0 and 7200 (in seconds). Additional datasets may be present, as indicated by .... The start times are always the same in both detectors. To view the available datasets, you can use h5py:

import h5py

file_path = "" #Insert path to file

with h5py.File(file_path, 'r') as file:
    print(file['H1'].keys())

The data can be easily loaded into a PyCBC TimeSeries using the following code:

from pycbc.types import load_timeseries

file_path = "" #Insert path to file here
start_time = 0 #Insert integer start time here

h1_ts = load_timeseries(file_path, group=f'H1/{start_time}')
l1_ts = load_timeseries(file_path, group=f'L1/{start_time}')
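
If you want to iterate over all segments and access the stored attributes directly, a loop along the following lines may be helpful; the variable names are purely illustrative, and the attribute keys start_time and delta_t are those shown in the file structure above:

import h5py

file_path = "" #Insert path to file here

with h5py.File(file_path, 'r') as file:
    # Dataset names are the integer start times and are identical in both detectors.
    for key in sorted(file['H1'].keys(), key=int):
        for det in ('H1', 'L1'):
            dataset = file[det][key]
            start_time = dataset.attrs['start_time']
            delta_t = dataset.attrs['delta_t']
            strain = dataset[()]  # raw strain as a NumPy array
            print(det, key, start_time, delta_t, len(strain))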

Output

The algorithm is expected to output a single HDF5 file for a given input file. This file is expected to contain exactly 3 datasets, all of equal length. The datasets must be named and filled as detailed in the table below:

| Key | Data description |
| --- | --- |
| time | The GPS times at which the algorithm expects an injection to be present. This must be a 1D array containing all suspected event times, concatenated over all analyzed input segments. |
| stat | A ranking-statistic-like quantity, where a larger value corresponds to a higher confidence of the algorithm that it has found a true signal. |
| var | The timing accuracy of the prediction. The returned time may not coincide with an injection perfectly. This number gives the allowed tolerance around an injection time for accepting true positives, i.e. the maximum allowed separation between an injection and the recovered event time. Note that this value may be constant for all returned events. |
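
As an illustration of the required structure, such an output file could be written with h5py roughly as follows; the event values shown are placeholders, not actual results:

import h5py
import numpy as np

output_path = "" #Insert path to output file here

# Placeholder results; in practice these would come from your search algorithm.
event_times = np.array([2.5, 3781.0])  # suspected GPS event times (example values)
event_stats = np.array([8.2, 5.1])     # ranking statistic; larger means more confident
event_vars = np.array([0.3, 0.3])      # allowed timing tolerance in seconds

with h5py.File(output_path, 'w') as file:
    file.create_dataset('time', data=event_times)
    file.create_dataset('stat', data=event_stats)
    file.create_dataset('var', data=event_vars)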

Algorithm format

You may submit either an executable that can be run via ./ using the installed software stack, or a Singularity image if you have further software dependencies. In case of any problems or questions, feel free to contact us directly or join our monthly support calls.

Either way, the program needs to accept two positional arguments: the first is the absolute path to the input file and the second is the absolute path at which the output file should be stored.

Example:

./script.py input_path.hdf output_path.hdf
singularity run singularity_image.sif input_path.hdf output_path.hdf

Working examples can be found in the examples folder of the repository.
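
For orientation only, a submission script that accepts the two positional arguments might be structured like the sketch below; run_search is a hypothetical placeholder for whatever analysis your algorithm performs:

#!/usr/bin/env python
import sys
import h5py
import numpy as np

def run_search(input_path):
    # Hypothetical placeholder for the actual analysis. It should return
    # arrays of event times, ranking statistics, and timing tolerances.
    return np.array([]), np.array([]), np.array([])

def main():
    # First positional argument: absolute path to the input file.
    # Second positional argument: absolute path at which the output file is stored.
    input_path, output_path = sys.argv[1], sys.argv[2]
    times, stats, variances = run_search(input_path)
    with h5py.File(output_path, 'w') as file:
        file.create_dataset('time', data=times)
        file.create_dataset('stat', data=stats)
        file.create_dataset('var', data=variances)

if __name__ == '__main__':
    main()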

Evaluation metrics

We will calculate the following metrics for each submitted algorithm. Each of these metrics is described in detail in the subsections below.

| Metric | Short description |
| --- | --- |
| false-alarm rate | Every event is assigned a false-alarm rate: the number of false positives per unit time with a ranking statistic larger than that of the considered event. The false positives are determined on data containing no injections. |
| sensitive distance | The search as a whole is assigned a sensitive distance. It measures out to which distance sources can be detected at or below a given false-alarm rate and is directly related to the number of detected signals at that false-alarm rate. |
| runtime | Because all searches are run on the same hardware, we also report the wall-time used by the search algorithm to perform pre-processing, evaluation, and post-processing. |

false-alarm rate

The false-alarm rate (FAR) is a measure of how often the search is expected to return a false-positive event with a ranking statistic larger than a given threshold. It is calculated by applying the search algorithm to pure noise data and applying thresholds to the returned events. At each threshold, the number of remaining events is divided by the total analyzed time. We use every event returned on the background as a threshold.
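
A minimal sketch of this counting, using placeholder values and the convention of counting events at or above each threshold, could look like this:

import numpy as np

# Ranking statistics of events returned on the injection-free (background) data
# and the total duration of that data; both are placeholder values.
background_stats = np.array([5.2, 7.9, 6.1, 4.3])
background_duration = 2592000.0  # analyzed background time in seconds

# Use every background event as a threshold: sort descending, then the FAR at
# the i-th threshold is the number of events at or above it divided by the time.
thresholds = np.sort(background_stats)[::-1]
far = np.arange(1, len(thresholds) + 1) / background_duration
for threshold, rate in zip(thresholds, far):
    print(f'threshold {threshold}: FAR {rate:.2e} per second')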

sensitive distance

The sensitive distance is a measure of how well the search is able to detect signals at a given false-alarm rate and takes the distribution of injected signals into account. It is, therefore, more informative than the true-positive rate, which can be altered by changing the test-set parameters. For details on the calculation for arbitrary distributions, please refer to [1]. Since the signals in our test set are distributed uniformly in volume, the calculation of the sensitive volume reduces to the fraction of detected signals multiplied by the volume of a sphere whose radius is the maximum injected distance. This volume is finally converted to a distance by taking the radius of a sphere of that volume. The sensitive distance is calculated for every false-alarm rate.
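
For the uniform-in-volume case described above, the calculation can be sketched as follows; the counts and the maximum injected distance are placeholder values:

import numpy as np

n_found = 350          # injections recovered at the given false-alarm rate (placeholder)
n_total = 1000         # total injections in the analyzed data (placeholder)
max_distance = 6000.0  # maximum injected distance, e.g. in Mpc (placeholder)

# Uniform-in-volume injections: the sensitive volume is the detected fraction times
# the volume of a sphere with radius equal to the maximum injected distance.
sensitive_volume = (n_found / n_total) * 4.0 / 3.0 * np.pi * max_distance**3

# Convert back to a distance: the radius of a sphere with that volume.
sensitive_distance = (3.0 * sensitive_volume / (4.0 * np.pi))**(1.0 / 3.0)
print(sensitive_distance)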

runtime

The runtime is tracked by noting the time when the script that evaluates the input data is called (see the section Algorithm format above) and the time when it returns. We measure it twice: once when evaluating data containing injections and once when evaluating pure noise. Both values will be reported. We do not average the wall-time over multiple runs. This metric is not calculated by the provided evaluate.py script.
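
Purely as an illustration of what is being measured (this is not the tooling we use), timing such a call could look like this; the command is the example from the Algorithm format section:

import subprocess
import time

# Example invocation; the actual command depends on the submission format.
start = time.perf_counter()
subprocess.run(['./script.py', 'input_path.hdf', 'output_path.hdf'], check=True)
wall_time = time.perf_counter() - start
print(f'Wall-time: {wall_time:.1f} s')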

Available Hardware for testing

We will perform all validation and testing manually on the hardware resources listed below:

| Hardware | Available |
| --- | --- |
| CPU | 2 × 8 cores (16 threads) @ 2.5 GHz |
| GPU | 8 × NVIDIA RTX 2070 Super (8 GB VRAM) |
| Memory | 192 GB |

References