Data Sets

Marlin Schäfer edited this page Oct 9, 2021 · 10 revisions

This page describes the four data sets that are used to test the submitted algorithms. Software to create instances of these data sets is provided in the script generate_data.py of this repository. For details on the usage of this script, please refer to the main page about the available software.

There are four distinct data sets labeled by an increasing integer index. The higher the index, the more realistic the data set.

For the final evaluation, each data set will be one month long. The data sets will be generated using the provided script with an undisclosed seed. All submissions are then applied to these data sets and evaluated as detailed here.

Details

All data sets contain time-domain data from the two detectors in Hanford and Livingston, sampled at 2048 Hz. The data is raw in the sense that no whitening, bandpassing, or similar pre-processing has been performed on it; the only exception is a high-pass filter at 15 Hz. Injected waveforms are generated using the waveform approximant IMRPhenomXPHM with a lower frequency cutoff of 20 Hz. The parameter distributions listed below are the same for all four data sets. Other parameters are varied as described in the following sections.

Fixed parameter distributions: sky location isotropic; inclination uniform in the cosine of the angle; coalescence phase and polarization angle each uniform from 0 to 2π.
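The fixed distributions above can be sketched with the Python standard library alone. This is a minimal illustration, not the code in generate_data.py: the function name and the dictionary keys are hypothetical, and the convention of expressing sky location as right ascension and declination is an assumption. An isotropic sky position is uniform in right ascension and uniform in the sine of the declination; an inclination that is uniform in cosine is obtained by drawing the cosine uniformly and inverting it.

```python
import math
import random


def draw_fixed_parameters(rng):
    """Draw one set of the parameters whose distributions are fixed.

    Hypothetical helper for illustration; angles are in radians.
    """
    two_pi = 2.0 * math.pi
    return {
        # Isotropic sky location: uniform in RA, uniform in sin(dec).
        "ra": rng.uniform(0.0, two_pi),
        "dec": math.asin(rng.uniform(-1.0, 1.0)),
        # Inclination uniform in cosine angle.
        "inclination": math.acos(rng.uniform(-1.0, 1.0)),
        # Coalescence phase and polarization uniform from 0 to 2 pi.
        "coa_phase": rng.uniform(0.0, two_pi),
        "polarization": rng.uniform(0.0, two_pi),
    }


rng = random.Random(0)  # arbitrary seed, for reproducibility of the sketch
example = draw_fixed_parameters(rng)
```

Drawing the angle itself uniformly instead of its sine or cosine would over-populate the poles, which is why the inverse trigonometric step is needed.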

The data of every data set is split into multiple parts, each with a minimum duration of 2 hours. Because consecutive parts may be uncorrelated, no signals are injected in the first and final 30 seconds of each part.
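The 30-second exclusion rule can be written as a small check. The sketch below uses hypothetical function names and assumes that the rule is applied to a single reference time per signal (for example the coalescence time); the page does not state which time is used.

```python
def injection_window(part_start, part_end, padding=30.0):
    """Return the (earliest, latest) time at which a signal may be placed.

    Times are in seconds. The padding of 30 s at both ends follows the
    description above; the function itself is illustrative only.
    """
    if part_end - part_start <= 2.0 * padding:
        raise ValueError("part is too short to hold any injection")
    return part_start + padding, part_end - padding


def is_allowed(time, part_start, part_end, padding=30.0):
    """True if `time` lies outside the excluded first and final seconds."""
    earliest, latest = injection_window(part_start, part_end, padding)
    return earliest <= time <= latest
```

For a part of exactly 2 hours starting at 0 s, `injection_window(0, 7200)` gives the interval from 30 s to 7170 s.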

Data Set 1

All noise is simulated from the power spectral density aLIGOZeroDetHighPower and, therefore, Gaussian. The same power spectral density is used in both detectors. Signals are simulated with component masses ranging from 10 to 50 solar masses and spins are set to 0. Only the fundamental mode is simulated.

Data Set 2

All noise is simulated and Gaussian. The power spectral density is the same for all data from one detector but is unknown. It may differ between the two detectors. The component masses are drawn such that signals last no longer than 20 seconds (with a lower frequency cutoff of 20 Hz). Spins are aligned with the orbital angular momentum, and the spin z-component is uniformly distributed between -0.99 and 0.99. Only the fundamental mode is simulated.
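To give a feeling for the 20-second duration constraint, the sketch below rejects mass pairs whose signal would be too long. It makes several assumptions that are not taken from this page: the duration is estimated with the leading-order (Newtonian) chirp time rather than whatever estimate generate_data.py uses, the masses are drawn uniformly, and the mass bounds are caller-supplied placeholders. All function names are hypothetical.

```python
import math
import random

G_MSUN_OVER_C3 = 4.925491e-6  # G * M_sun / c^3 in seconds


def newtonian_chirp_time(mass1, mass2, f_lower):
    """Leading-order time to coalescence in seconds (masses in solar masses)."""
    chirp_mass = (mass1 * mass2) ** 0.6 / (mass1 + mass2) ** 0.2
    mc_seconds = chirp_mass * G_MSUN_OVER_C3
    return (5.0 / 256.0
            * (math.pi * f_lower) ** (-8.0 / 3.0)
            * mc_seconds ** (-5.0 / 3.0))


def draw_aligned_spin_source(rng, mass_min, mass_max, f_lower=20.0,
                             max_duration=20.0, max_tries=10000):
    """Rejection-sample masses, then draw aligned spin z-components.

    mass_min and mass_max are assumptions supplied by the caller.
    """
    for _ in range(max_tries):
        mass1 = rng.uniform(mass_min, mass_max)
        mass2 = rng.uniform(mass_min, mass_max)
        if newtonian_chirp_time(mass1, mass2, f_lower) <= max_duration:
            return {
                "mass1": max(mass1, mass2),
                "mass2": min(mass1, mass2),
                # Spin z-components uniform between -0.99 and 0.99.
                "spin1z": rng.uniform(-0.99, 0.99),
                "spin2z": rng.uniform(-0.99, 0.99),
            }
    raise RuntimeError("no mass pair satisfied the duration limit")
```

Under this approximation a 10 + 10 solar mass binary lasts roughly 6 seconds from 20 Hz, while a 1.4 + 1.4 solar mass binary lasts well over two minutes and would be rejected.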

Data Set 3

All noise is simulated and Gaussian. The data is split into multiple parts, and each part has a random power spectral density. The power spectral density may differ between detectors. The component masses are drawn such that signals last no longer than 20 seconds (with a lower frequency cutoff of 20 Hz). Spin orientations are isotropically distributed and the spin magnitude is uniform between 0 and 0.99. This means that precession effects will be present in the waveform. All higher-order multipoles available for the approximant IMRPhenomXPHM are calculated ((2, 2), (2, -2), (2, 1), (2, -1), (3, 3), (3, -3), (3, 2), (3, -2), (4, 4), (4, -4)).
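The spin distribution described above (isotropic orientation, uniform magnitude) can be sketched as follows. The function name and the choice to return Cartesian components are assumptions made for illustration; they are not taken from generate_data.py.

```python
import math
import random


def draw_precessing_spin(rng, max_magnitude=0.99):
    """Draw one spin vector with isotropic direction and uniform magnitude.

    Returns the Cartesian components (x, y, z) of the dimensionless spin.
    """
    magnitude = rng.uniform(0.0, max_magnitude)
    # Isotropic direction: uniform in cos(theta) and uniform in phi.
    cos_theta = rng.uniform(-1.0, 1.0)
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (
        magnitude * sin_theta * math.cos(phi),
        magnitude * sin_theta * math.sin(phi),
        magnitude * cos_theta,
    )


rng = random.Random(0)  # arbitrary seed for the sketch
spin1 = draw_precessing_spin(rng)
spin2 = draw_precessing_spin(rng)
```

Whenever the in-plane components (x, y) are non-zero, the spin is misaligned with the orbital angular momentum, which is what gives rise to precession.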

Data Set 4

Real noise from the O3a run is used for both detectors. The data is required to have the DATA data-quality flag, and none of the categories CBC_CAT1, CBC_CAT2, CBC_HW_INJ, and BURST_HW_INJ may be active. Only segments in which the data from both detectors overlap are used, and each such segment spans a minimum duration of 2 hours. Signal injection parameters are drawn from the same distributions and include the same effects as for Data Set 3. The code used to prepare the real noise data can be found in downsample.py.
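The segment selection can be pictured as intersecting the lists of good-data segments of the two detectors and keeping only overlaps of at least 2 hours. The sketch below is not the logic of downsample.py; it uses a hypothetical function name and assumes that each input list already contains only times that pass the data-quality requirements and that the segments within one list do not overlap each other.

```python
def coincident_segments(segments_a, segments_b, min_duration=7200.0):
    """Intersect two segment lists and keep overlaps of sufficient length.

    Each segment is a (start, end) tuple in seconds.
    """
    a = sorted(segments_a)
    b = sorted(segments_b)
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end - start >= min_duration:
            result.append((start, end))
        # Advance whichever segment ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Made-up example times, not real detector segments.
hanford = [(0, 10000), (12000, 30000), (50000, 52000)]
livingston = [(2000, 9500), (15000, 40000), (49000, 60000)]
kept = coincident_segments(hanford, livingston)
```

In this made-up example the third overlap lasts only 2000 seconds and is dropped, leaving two coincident segments.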
