<img style="float: left;padding: 1.3em" src="https://indico.in2p3.fr/event/18313/logo-786578160.png">  

This notebook brings together tutorials available at the [Gravitational-Wave Open Science Center (GWOSC) website](https://www.gw-openscience.org).

Topics:

* Discover Gravitational Wave Open Data
* Introduction to GWpy: the TimeSeries class. Plotting and simple data manipulation.
* Spectral analysis, FFTs, PSDs, and time-frequency representation of the signals. The $Q$-transform
* Working with segment lists and Timelines
* Plot spectrograms to identify glitches, signals, and hardware injections


# Part 1.1: Discovering open data from GW observatories

## Software installation

Install the necessary software.  See [here](https://github.com/gw-odw/odw-2021/blob/master/setup.md) for further information.

In [None]:
# -- Run the following line if you are using Google Colab
! pip install -q 'gwosc==0.5.4'

Check the version of the package gwosc you are using: you should get `0.5.4`. 

If this is not the case, check all the steps in the [Software Setup Instructions](https://github.com/gw-odw/odw-2021/blob/master/setup.md).

In [None]:
import gwosc
print(gwosc.__version__)

## Querying for event information

The module `gwosc.datasets` provides tools for searching for datasets, including events, catalogs and full run strain data releases.

For example, we can search for events in the catalogs released from the **first three observing runs** (O1, O2, and O3) and see how many confident gravitational-wave detections have been made so far.
* [GWTC-1 catalog](https://www.gw-openscience.org/eventapi/html/GWTC-1-confident/)
* [GWTC-2.1 catalog](https://www.gw-openscience.org/eventapi/html/GWTC-2.1-confident/)
* [GWTC-3 catalog](https://www.gw-openscience.org/eventapi/html/GWTC-3-confident/)


In [None]:
from gwosc.datasets import find_datasets
from gwosc import datasets

#-- List all available catalogs
print("List of available catalogs")
print(find_datasets(type="catalog"))
print("")

#-- Find the GW events in the GWTC-1, GWTC-2.1, and GWTC-3 catalogs
gwtc1 = datasets.find_datasets(type='event', catalog='GWTC-1-confident')
gwtc2 = datasets.find_datasets(type='event', catalog='GWTC-2.1-confident')
gwtc3 = datasets.find_datasets(type='event', catalog='GWTC-3-confident')
print('GWTC-1 events:', gwtc1)
print("")
#-- Total number of confident events across the three catalogs
print('Total number of confident events:', len(gwtc1) + len(gwtc2) + len(gwtc3))

In [None]:
#-- Print all the large strain data sets from LIGO/Virgo observing runs
runs = find_datasets(type='run')
print('Large data sets:', runs)

**Attention** -- _Note that the most recent observing runs, e.g. O2, are labeled with names containing the name of the run (e.g. O2), the sampling rate (4 or 16 kHz) and the release version (e.g. R1). This means that for O2 there are two labels, 'O2_4KHZ_R1' and 'O2_16KHZ_R1', depending on the desired sampling rate._
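
As a small illustration of this naming scheme, the sketch below splits a label into its parts. `parse_run_label` is a hypothetical helper written for this notebook, not part of `gwosc`, and it assumes labels follow the `RUN_<rate>KHZ_R<n>` pattern described above (labels such as `'O1'` carry no rate or release field).

```python
def parse_run_label(label):
    """Split a run label such as 'O2_16KHZ_R1' into (run, rate_khz, release).

    Hypothetical helper: fields missing from the label are returned as None.
    """
    parts = label.split("_")
    run = parts[0]
    rate_khz = None
    release = None
    for part in parts[1:]:
        if part.endswith("KHZ"):
            rate_khz = int(part[:-3])   # nominal sampling rate in kHz
        elif part.startswith("R"):
            release = part              # release version, e.g. 'R1'
    return run, rate_khz, release


print(parse_run_label("O2_16KHZ_R1"))  # ('O2', 16, 'R1')
print(parse_run_label("O1"))           # ('O1', None, None)
```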

`datasets.find_datasets` also accepts a `segment` and `detector` keyword to narrow results based on GPS time and detector:

In [None]:
#-- Detector and segments keywords limit search result
print(datasets.find_datasets(type='events', catalog='GWTC-1-confident', detector="L1", segment=(1164556817, 1187733618)))

_Note that the event names also include the version of the latest release._

Using `gwosc.datasets.event_gps`, we can query for the GPS time of a specific event (this also works without the version number). Let's take the case of the first GW ever observed: **GW150914**.

In [None]:
from gwosc.datasets import event_gps
gps = event_gps('GW150914')
print(gps)

<div class="alert alert-info">All of these times are returned in the GPS time system, which counts the number of seconds that have elapsed since the start of the GPS epoch at midnight (00:00) on January 6th 1980. GWOSC provides a <a href="https://www.gw-openscience.org/gps/">GPS time converter</a> you can use to translate into datetime, or you can use <a href="https://gwpy.github.io/docs/stable/time/"><code>gwpy.time</code></a> in Python code.</div>
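
To make the GPS time system concrete, here is a minimal standard-library sketch of the conversion to UTC. It is only an illustration: GPS time does not include leap seconds, so the GPS-UTC offset has to be looked up, and the table below is an assumption-laden shortcut that covers only times from 2015-07-01 onward (and only until the next leap second is introduced). For real work, prefer `gwpy.time` or the GWOSC converter.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# (first GPS second, GPS-UTC offset in seconds), newest first.
# Deliberately incomplete: only valid for times from 2015-07-01 onward.
LEAP_TABLE = [(1167264018, 18), (1119744017, 17)]


def gps_to_utc(gps):
    """Convert a GPS time (seconds) to a UTC datetime using LEAP_TABLE."""
    for first_second, offset in LEAP_TABLE:
        if gps >= first_second:
            return GPS_EPOCH + timedelta(seconds=gps - offset)
    raise ValueError("GPS time predates this simplified leap-second table")


# GPS time of GW150914, truncated to an integer second
print(gps_to_utc(1126259462))  # 2015-09-14 09:50:45+00:00
```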

We can query for the GPS time interval for an observing run:

In [None]:
from gwosc.datasets import run_segment
print(run_segment('O1'))

Let's see how many confident events came from O1, O2, and O3 (=O3a+O3b), respectively:

In [None]:
O1_events = datasets.find_datasets(type='events', catalog='GWTC-1-confident', segment=run_segment('O1'))
O2_events = datasets.find_datasets(type='events', catalog='GWTC-1-confident', segment=run_segment('O2_4KHZ_R1'))
O3a_events = datasets.find_datasets(type='events', catalog='GWTC-2.1-confident', segment=run_segment('O3a_4KHZ_R1'))
O3b_events = datasets.find_datasets(type='events', catalog='GWTC-3-confident', segment=run_segment('O3b_4KHZ_R1'))
print("O1:", len(O1_events))
print("O2:", len(O2_events))
print("O3:", len(O3a_events)+len(O3b_events), "( =", len(O3a_events),"+",len(O3b_events), ")")
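
Conceptually, restricting a search to a run segment amounts to asking whether each event time falls inside a half-open `[start, end)` GPS interval. The sketch below shows that test in plain Python. The numbers are quoted for illustration only (check them with `run_segment('O1')` and `event_gps(...)`), and the actual filtering done by `find_datasets` may differ in detail.

```python
def in_segment(gps, segment):
    """Return True if gps lies in the half-open interval [start, end)."""
    start, end = segment
    return start <= gps < end


# Illustrative values: the O1 run segment and two event GPS times
o1_segment = (1126051217, 1137254417)
event_times = {"GW150914": 1126259462.4, "GW170817": 1187008882.4}

in_o1 = [name for name, gps in event_times.items() if in_segment(gps, o1_segment)]
print(in_o1)  # ['GW150914']
```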

## Querying for data files

The `gwosc.locate` module provides a function to find the URLs of data files associated with a given dataset.

For event datasets, one can get the list of URLs using only the event name:

In [None]:
from gwosc.locate import get_event_urls
urls = get_event_urls('GW150914')
print(urls)

By default, this function returns all of the files associated with a given event, which is not immediately helpful. However, we can filter the results using keyword arguments, for example to get the URL for the 32-second file for the LIGO-Livingston detector:

In [None]:
urls = get_event_urls('GW150914', duration=32, detector='L1')
print(urls)
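
The file names in these URLs follow the usual gravitational-wave frame-file convention, `<observatory>-<description>-<GPS start>-<duration>.<extension>`, so the time span of a file can be read off its name. The sketch below parses one such name. `parse_data_filename` is a hypothetical helper, and the URL is an illustrative example rather than a guaranteed download location.

```python
def parse_data_filename(url):
    """Extract observatory, description, GPS start and duration from a file URL."""
    name = url.rsplit("/", 1)[-1]            # drop the directory part
    stem, extension = name.rsplit(".", 1)    # drop the file extension
    observatory, description, start, duration = stem.split("-")
    return {
        "observatory": observatory,
        "description": description,
        "start": int(start),
        "duration": int(duration),
        "extension": extension,
    }


# Illustrative URL (hypothetical host and path)
info = parse_data_filename("https://example.org/L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5")
print(info)

# Check that the file covers a GPS time of interest
gps_of_interest = 1126259462.4
covers = info["start"] <= gps_of_interest < info["start"] + info["duration"]
print(covers)  # True
```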

# Part 1.2: GWpy – Plots and simple GW data manipulation

## Software installation

Note: we use [`pip`](https://docs.python.org/3.6/installing/), but **it is recommended** to use [conda](https://docs.ligo.org/lscsoft/conda/) on your own machine, as explained in the [installation instructions](https://github.com/gw-odw/odw-2021/blob/master/setup.md). The usage here might look a little different from normal, simply because we want to do this directly from the notebook.

In [None]:
# -- Run the following line if you are using Google Colab
! pip install -q 'gwpy==2.0.2'

**Important:** _With Google Colab, you may need to restart the runtime after running the cell above._

## Initialization

In [None]:
import gwpy
print(gwpy.__version__)

## Handling data in the time domain

### Finding open data

Let's actually read some open data. Let's start with event **GW190412**, the first detection of a gravitational-wave signal from an unequal-mass BBH (binary black hole system).

We can use the [`TimeSeries.fetch_open_data`](https://gwpy.github.io/docs/stable/api/gwpy.timeseries.TimeSeries.html#gwpy.timeseries.TimeSeries.fetch_open_data) method to download data directly from https://www.gw-openscience.org, but we need to know the GPS times for the event we want to request.

In [None]:
gps = event_gps('GW190412')
print(gps)

Now we can build a `[start, end)` GPS segment covering 10 seconds around this time, using integers for convenience:

In [None]:
segment = (int(gps)-5, int(gps)+5)
print(segment)

and can now query for the full data.
For this example we choose to retrieve data for the LIGO-Livingston interferometer, using the identifier `'L1'`.
We could have chosen any of

- `'G1'` - GEO600
- `'H1'` - LIGO-Hanford
- `'L1'` - LIGO-Livingston
- `'V1'` - (Advanced) Virgo
- `'K1'` - KAGRA

In [None]:
from gwpy.timeseries import TimeSeries
ldata = TimeSeries.fetch_open_data('L1', *segment, verbose=True)
print(ldata)
print(type(ldata))

##### The `verbose=True` flag lets us see that GWpy has:
1. discovered one file that provides the data for the given interval, 
1. downloaded it, and
1. loaded the data.

The files are not stored permanently, so the next time you make the same call, the data will be downloaded again. If you expect to repeat the same call many times, you can use `cache=True` to store the file on your computer.

Notes: 

* To control which dataset your data come from, you can use the `dataset` keyword. It is recommended to use data from a run if they are available, because they contain the most up-to-date version of the calibration. For 4 kHz sampling, the complete command to get data from this dataset is then: `TimeSeries.fetch_open_data('L1', *segment, verbose=True, dataset='O3a_4KHZ_R1')`.

* To read data from a local file instead of from the GWOSC server, we can use the [`TimeSeries.read`](https://gwpy.github.io/docs/stable/api/gwpy.timeseries.TimeSeries.html#gwpy.timeseries.TimeSeries.read) method.

We have now downloaded real LIGO data for GW190412! **These are the actual data used in the analysis that identified the first unequal-mass binary black hole merger.**

To sanity check things, we can easily make a plot, using the [`plot()`](https://gwpy.github.io/docs/stable/timeseries/plot.html) method of the `ldata` `TimeSeries`.

<div class="alert alert-info">
Since this is the first time we are plotting something in this notebook, we need to configure <code>matplotlib</code> (the plotting library) to work within the notebook properly:
</div>

### Plotting the data

In [None]:
%matplotlib inline
plot = ldata.plot()

Notes: There are alternative ways to access the GWOSC data. 

* [`readligo`](https://losc.ligo.org/s/sample_code/readligo.py) is a light-weight Python module that returns the time series as a NumPy array.
* The [PyCBC](http://github.com/ligo-cbc/pycbc) package has the `pycbc.frame.query_and_read_frame` and `pycbc.frame.read_frame` methods.

## Handling data in the frequency domain using the Fourier transform

We can calculate the Fourier transform of our `TimeSeries` using the [`fft()`](https://gwpy.github.io/docs/stable/api/gwpy.timeseries.TimeSeries.html#gwpy.timeseries.TimeSeries.fft) method:

In [None]:
fft = ldata.fft()
print(fft)

The result is a [`FrequencySeries`](https://gwpy.github.io/docs/stable/frequencyseries/), with complex amplitude, representing the **amplitude and phase** of each frequency in our data.
We can use `abs()` to extract the amplitude and then plot it:

In [None]:
plot = fft.abs().plot(xscale="log", yscale="log")
plot.show()

If you are familiar with GW detector spectra, this does not look correct at all!
The problem is that the FFT works under the assumption that our data are periodic, which means that the edges of our data look like discontinuities when transformed.

We need to apply a window function to our time-domain data before transforming: this can be done with the [`scipy.signal`](https://docs.scipy.org/doc/scipy/reference/signal.html) module:

<img style="float: left;padding: 1.3em" src="https://www.gaussianwaves.com/gaussianwaves/wp-content/uploads/2020/09/Coherent-Power-gain-of-Hann-window-1024x297.png">  

In [None]:
from scipy.signal import get_window
window = get_window('hann', ldata.size)
lwin = ldata * window

We transform the windowed data this time and see what we get.

In [None]:
fftamp = lwin.fft().abs()
plot = fftamp.plot(xscale="log", yscale="log")
plot.show(warn=False)

This looks a little more like what we expect for the amplitude spectral density of a gravitational-wave detector.
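
The effect of the window can be reproduced on a toy signal with plain NumPy. The sketch below is an illustration, not part of the GWpy workflow: it takes one second of a cosine that completes 10.5 cycles (so its edges do not match up), and compares how much spectral content leaks into a band far from the true frequency with and without a Hann window. It uses `numpy.hanning` rather than `scipy.signal.get_window`; the two differ slightly in their endpoint convention, which does not matter here.

```python
import numpy as np

fs = 1024                              # sample rate [Hz]
t = np.arange(fs) / fs                 # one second of samples
x = np.cos(2 * np.pi * 10.5 * t)       # 10.5 cycles: discontinuous at the edges


def leakage(samples):
    """Mean spectrum magnitude in the 100-400 Hz bins, relative to the peak."""
    spectrum = np.abs(np.fft.rfft(samples))
    return spectrum[100:400].mean() / spectrum.max()


raw = leakage(x)
windowed = leakage(x * np.hanning(x.size))
print(f"no window: {raw:.1e}   Hann window: {windowed:.1e}")
```

With the window applied, the relative leakage into the distant band drops by orders of magnitude, which is the same effect that cleans up the detector spectrum above.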

## Calculating the amplitude spectral density

In practice, we typically use a large number of FFTs to estimate an average power spectral density over a long period of data.
We can do this using the [`asd()`](https://gwpy.github.io/docs/stable/api/gwpy.timeseries.TimeSeries.html#gwpy.timeseries.TimeSeries.asd) method, which uses [Welch's method](https://en.wikipedia.org/wiki/Welch%27s_method) to combine FFTs of overlapping, windowed chunks of data.
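
To show what this averaging does, here is a minimal NumPy sketch of Welch's method applied to simulated white noise. It is not GWpy's implementation: `welch_asd` is a hypothetical helper that uses 50% overlapping Hann-windowed segments and a *mean* average, whereas the cells in this notebook ask for a median. For white noise of unit variance sampled at `fs`, the one-sided ASD should be close to $\sqrt{2/f_s}$.

```python
import numpy as np


def welch_asd(x, fs, fftlength):
    """Mean-averaged Welch ASD using 50% overlapping Hann-windowed segments."""
    nperseg = int(fftlength * fs)
    step = nperseg // 2
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)       # one-sided density normalisation
    periodograms = [
        2.0 * np.abs(np.fft.rfft(window * x[i:i + nperseg])) ** 2 / scale
        for i in range(0, x.size - nperseg + 1, step)
    ]
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    # Note: the factor 2 is not strictly correct for the DC and Nyquist bins
    return freqs, np.sqrt(np.mean(periodograms, axis=0))


rng = np.random.default_rng(0)
fs = 256
noise = rng.normal(size=64 * fs)           # 64 s of unit-variance white noise
freqs, asd = welch_asd(noise, fs, fftlength=1)

expected = np.sqrt(2.0 / fs)
ratio = np.median(asd[1:-1]) / expected    # skip the DC and Nyquist bins
print(f"measured / expected ASD: {ratio:.3f}")
```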

In [None]:
asd = ldata.asd(fftlength=4, method="median")
plot = asd.plot()
plot.show(warn=False)

In [None]:
ax = plot.gca()
ax.set_xlim(10, 1400)
ax.set_ylim(1e-24, 1e-20)
plot

The ASD is a standard tool used to study the frequency-domain sensitivity of a gravitational-wave detector.
For the LIGO-Livingston data we loaded, we can see large spikes at certain frequencies, including

- ~300 Hz
- ~500 Hz
- ~1000 Hz

The [O2 spectral lines](https://www.gw-openscience.org/o2speclines/) page on GWOSC describes a number of these spectral features for O2: some of them are forced upon us, while others are deliberately introduced to help with interferometer control.

Loading a longer stretch of data allows for more FFTs to be averaged during the ASD calculation, meaning random variations get averaged out, and we can see more detail:

In [None]:
ldata2 = TimeSeries.fetch_open_data('L1', int(gps)-512, int(gps)+512, cache=True)
lasd2 = ldata2.asd(fftlength=4, method="median")
plot = lasd2.plot()
ax = plot.gca()
ax.set_xlim(10, 1400)
ax.set_ylim(1e-24, 1e-20)
ax.set_ylabel(r'Strain noise [$1/\sqrt{\mathrm{Hz}}$]')
plot.show(warn=False)

Now we can see some more features, including sets of lines around ~30 Hz and ~65 Hz, and some more isolated lines through the more sensitive region.

For comparison, we can load the LIGO-Hanford and Virgo data and plot them as well:

In [None]:
# Get Hanford data
hdata2 = TimeSeries.fetch_open_data('H1', int(gps)-512, int(gps)+512, cache=True)
hasd2 = hdata2.asd(fftlength=4, method="median")

# Get Virgo data
vdata2 = TimeSeries.fetch_open_data('V1', int(gps)-512, int(gps)+512, cache=True)
vasd2 = vdata2.asd(fftlength=4, method="median")

# And plot using standard colours
ax.plot(hasd2, label='LIGO-Hanford', color='gwpy:ligo-hanford')
ax.plot(vasd2, label='Virgo', color='gwpy:virgo')

# Update the Livingston line to use standard colour, and have a label
lline = ax.lines[0]
lline.set_color('gwpy:ligo-livingston')  # Change colour of Livingston data
lline.set_label('LIGO-Livingston')

ax.set_ylabel(r'Strain noise [$1/\sqrt{\mathrm{Hz}}$]')
ax.legend()
plot

Now we can clearly see the relative sensitivity of each instrument, the features common to all of them, and those unique to each observatory.

# Part 1.3: Data representations in GWpy

This includes
* the spectrogram
* the Q-transform

## Showing the time-evolution of FFTs

The FFT, and the ASD, show us a snapshot of the frequency-domain content of our signal, referred to a single time interval.
It is commonly useful to show how this frequency-domain content evolves over time.

For this we use **spectrograms**, which show the FFT (or ASD) at each time step on a **time-frequency representation**.
The `TimeSeries` in GWpy includes two methods for this:

- [`spectrogram()`](https://gwpy.github.io/docs/stable/api/gwpy.timeseries.TimeSeries.html#gwpy.timeseries.TimeSeries.spectrogram) - which includes a `stride` parameter, and shows an averaged ASD for each time interval of length `stride`, and 
- [`spectrogram2()`](https://gwpy.github.io/docs/stable/api/gwpy.timeseries.TimeSeries.html#gwpy.timeseries.TimeSeries.spectrogram2) - which shows a single-FFT ASD, defined by its `fftlength`, at each time step. These FFTs can include overlapping segments of data, as specified by the `overlap` parameter.

Which one should be used? The short answer is: use `spectrogram2()` for short(ish) chunks of data, less than a minute or so, and `spectrogram()` for longer chunks where the averaging helps remove very short noise bursts.

[ _The long answer is that averaging the FFTs computed over each stride of data reduces the variance of the resulting ASD estimate. However, this averaging also reduces the time resolution of the corresponding spectrogram, which is no longer `fftlength` $-$ `overlap` but is now determined by the duration of the `stride`. This is called [Welch's spectral density estimation method](https://en.wikipedia.org/wiki/Welch%27s_method). 
The choice of the overlap is instead determined by how independent we want our FFTs to be (no overlap) and how dense we want them to be (`overlap` equal to a large fraction of `fftlength`). Refer to [this GWpy example](https://gwpy.github.io/docs/stable/examples/spectrogram/spectrogram2.html) on an over-dense spectrogram of GW150914._ ]
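
The bookkeeping behind these choices is simple arithmetic, sketched below for a generic sliding FFT. `sliding_fft_grid` is a hypothetical helper, not a GWpy function: it assumes each FFT spans `fftlength` seconds and that consecutive FFTs are shifted by `fftlength - overlap` seconds.

```python
def sliding_fft_grid(duration, fftlength, overlap):
    """Return (time step [s], frequency resolution [Hz], number of FFTs)."""
    step = fftlength - overlap                  # shift between consecutive FFTs
    n_ffts = int((duration - overlap) // step)  # complete FFTs that fit in the data
    return step, 1.0 / fftlength, n_ffts


# 1024 s of data, 4 s FFTs overlapping by 2 s
print(sliding_fft_grid(1024, 4, 2))  # (2, 0.25, 511)
```

Longer FFTs give finer frequency resolution (`1 / fftlength`) at the cost of coarser time resolution, and more overlap gives a denser but less independent set of FFTs.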

Let's focus on event **GW170817**, the **first binary neutron star coalescence ever observed**. 

In [None]:
gps = event_gps('GW170817')
print("GW170817 GPS:", gps)

ldata = TimeSeries.fetch_open_data('L1', int(gps)-512, int(gps)+512, cache=True)
print("GW170817 data")
print(ldata)

Now, we can generate our spectrogram using a specific FFT length (remembering to use a window):

<div class="alert alert-info">
Each of these methods returns the output as stacked power spectral densities, so we take the square root to get back to a familiar amplitude spectral density
</div>

In [None]:
specgram = ldata.spectrogram2(fftlength=4, overlap=2, window='hann') ** (0.5)
plot = specgram.plot()
plot.add_colorbar()

Hmmm... something is not right. Can you spot the only "hot" point in this colormap? Maybe the default scale of the color axis is not suitable to fit the excursion of values in this map. **Pro Tip**: check `specgram.min()` and `specgram.max()` values to see if the previous guess is right or not.

In [None]:
print(specgram.min())
print(specgram.max())

We can fix this by passing a few more arguments to our plot to control the display (especially the colouring):

In [None]:
ax = plot.gca()
ax.set_yscale('log')
ax.set_ylim(10, 1400)
ax.colorbar(
    clim=(1e-24, 1e-20),
    norm="log",
    label=r"Strain noise [$1/\sqrt{\mathrm{Hz}}$]",
)
plot  # refresh

Here we can see how the ASD for LIGO-Livingston evolves over a ~17 minute span around GW170817. The ASD values span four orders of magnitude, which is why the previous attempt resulted in an (almost!) fully dark blue image.
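
The effect of the logarithmic colour scale can be checked with a little arithmetic. The sketch below is only an illustration of the idea, not of matplotlib's internals: it computes where a value lands between the colour limits, as a fraction from 0 to 1, under linear and logarithmic scaling, using the same limits as the `clim` passed above.

```python
import math


def linear_fraction(value, vmin, vmax):
    """Position of value between vmin and vmax on a linear scale."""
    return (value - vmin) / (vmax - vmin)


def log_fraction(value, vmin, vmax):
    """Position of value between vmin and vmax on a logarithmic scale."""
    return (math.log10(value) - math.log10(vmin)) / (math.log10(vmax) - math.log10(vmin))


vmin, vmax = 1e-24, 1e-20
for asd in (1e-23, 1e-22, 1e-21):
    lin = linear_fraction(asd, vmin, vmax)
    log = log_fraction(asd, vmin, vmax)
    print(f"{asd:.0e}: linear {lin:.4f}, log {log:.2f}")
```

On the linear scale, everything below about 1e-21 is squeezed into the bottom tenth of the colour range, while the logarithmic scale spreads each decade evenly.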

We can see that the low-frequency noise (<30 Hz) rumbles along with some variation, but high frequencies (>100 Hz) are relatively stable. Between 30-100 Hz we can see some narrow features appearing and disappearing as non-stationary noise affects the measurement.

## $Q$-transforms in GWpy

The **spectrogram** above is a useful way to show the variation of an amplitude spectral density (ASD) estimate over time. It is best used to see general trends in how the sensitivity of the GW detectors is changing **over longish periods (minutes or hours)**.

In this section, we will see how we can use a special filter, called a **Q-transform**, to create a time-frequency representation of our data that allows us to pick out features at different frequencies, and how they evolve **over very short times**, without much prior knowledge of the signal morphology.

See [this article](https://doi.org/10.1088/0264-9381/21/20/024) for more details on the Q-transform and its application to gravitational-wave data.
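
As a rough guide to what the quality factor controls, the sketch below takes $Q$ loosely as the ratio of a tile's centre frequency to its bandwidth, so that a tile lasts on the order of $Q$ cycles. This is an order-of-magnitude illustration only: the exact numerical factors depend on the convention used (see the article above), and `tile_scales` is a hypothetical helper, not a GWpy function.

```python
def tile_scales(frequency, q):
    """Approximate (bandwidth [Hz], duration [s]) of a tile with quality factor q.

    Order-of-magnitude only: convention-dependent numerical factors are ignored.
    """
    bandwidth = frequency / q    # narrower band for larger q
    duration = q / frequency     # time taken by q cycles at this frequency
    return bandwidth, duration


for q in (10, 100):
    bandwidth, duration = tile_scales(100.0, q)
    print(f"Q={q}: bandwidth ~{bandwidth:g} Hz, duration ~{duration:g} s at 100 Hz")
```

A large $Q$ gives fine frequency resolution but poor time resolution, which suits long, slowly evolving signals; a small $Q$ suits short bursts.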

First, let's reload some data from LIGO-Hanford around GW170817:

In [None]:
segment = (int(gps) - 30, int(gps) + 2)
hdata = TimeSeries.fetch_open_data('H1', *segment, verbose=True, cache=True)

We can now use the `q_transform()` method of the `hdata` `TimeSeries` to create our time-frequency representation (as a [spectrogram](https://gwpy.github.io/docs/stable/spectrogram/)).

In [None]:
hq = hdata.q_transform(frange=(30, 500))
plot = hq.plot()
plot.colorbar(label="Normalised energy")

From this we can see a different representation of the data. Because the Q-transform returns (by default) normalised energy, the low-frequency rumbling is now much less obvious, and we can better see some noise features at higher frequencies.

However, we can clean up the display to better visualise the data:

In [None]:
ax = plot.gca()
ax.set_epoch(gps)
ax.set_ylim(30, 500)
ax.set_yscale("log")
plot  # refresh

Now we can see a more prominent feature starting at ~-6 seconds that looks a little familiar.
Here we can use our knowledge of the Q-transform, and our hunch about the origin of the "feature" to choose a more specific range of 'Q' for the Q-transform, so as to better resolve the feature:

In [None]:
hq = hdata.q_transform(frange=(30, 500), qrange=(100, 110))
plot = hq.plot()
ax = plot.gca()
ax.set_epoch(gps)
ax.set_yscale('log')
ax.colorbar(label="Normalised energy")

Now we see the beautiful, clear track of a BNS merger, visible from about -4 seconds (maybe -10 if you squint), all the way through to the merger at T=0.

We can also use the `outseg` option to zoom in around the merger:

In [None]:
#-- Use OUTSEG for small time range
hq2 = hdata.q_transform(frange=(30, 500), qrange=(80, 110), outseg=(gps-3,gps+0.5)) 
plot = hq2.plot()
ax = plot.gca()
ax.set_epoch(gps)
ax.set_yscale('log')
ax.colorbar(label="Normalised energy")

We can repeat the exercise using LIGO-Livingston data to see something even more remarkable.
First we download the Livingston data:

In [None]:
ldata = TimeSeries.fetch_open_data('L1', *segment, verbose=True)

In [None]:
lq = ldata.q_transform(frange=(30, 500), qrange=(100, 110))
plot = lq.plot()
ax = plot.gca()
ax.set_epoch(gps)
ax.set_yscale('log')
ax.colorbar(label="Normalised energy")

Now we can see a large blob of energy that is 1000 times louder than what we see in the LIGO-Hanford data.
As luck would have it, an instrumental glitch almost exactly overlaps the BNS signal in LIGO-Livingston.
But we can rescale things to see the astrophysical signal better:

In [None]:
plot.colorbars[0].mappable.set_clim(0,20)
plot.refresh()
plot

Now we can see the BNS signal track all the way back to T=-28 seconds in LIGO-Livingston data!
 
This is basically the same procedure (and the same code) that was used to produce Figures 1 and 2 of the BNS discovery article '_Observation of Gravitational Waves from a Binary Neutron Star Inspiral_' [[link](https://doi.org/10.1103/PhysRevLett.119.161101)]