<img style="float: left;padding: 1.3em" src="https://indico.in2p3.fr/event/18313/logo-786578160.png">  

#  Gravitational Wave Open Data Workshop #3


#### Tutorial 1.1: Discovering open data from GW observatories

This notebook describes how to discover what data are available from the [Gravitational-Wave Open Science Center (GWOSC)](https://www.gw-openscience.org).
    
[Click this link to view this tutorial in Google Colaboratory](https://colab.research.google.com/github/gw-odw/odw-2020/blob/master/Day_1/Tuto%201.1%20Discovering%20Open%20Data.ipynb)

## Software installation (execute only if you are running on a cloud platform or haven't done the installation yet!)

First, we need to install the software, which we do by following the instructions in [Software Setup Instructions](https://github.com/gw-odw/odw-2020/blob/master/setup.md):

In [1]:
# -- Uncomment following line if running in Google Colab
#! pip install -q 'gwpy==1.0.1'

**Important:** With Google Colab, you may need to restart the runtime after running the cell above.

In [2]:
# -- Check the version of the gwosc package you are using
import gwosc
print(gwosc.__version__)

0.5.3


The version you get should be 0.5.3. If it's not, check that you have followed all the steps in [Software Setup Instructions](https://github.com/gw-odw/odw-2020/blob/master/setup.md).

## Querying for event information

The `gwosc.datasets` module provides tools to search for datasets, including filtering on GPS times.

For example, we can list the event datasets that are available:

In [3]:
from gwosc.datasets import find_datasets
events = find_datasets(type="event")
print(events)

['GRB051103', 'GW150914', 'GW150914_R1', 'GW151012_R1', 'GW151226', 'GW151226_R1', 'GW170104', 'GW170104_R1', 'GW170608', 'GW170608_R1', 'GW170729_R1', 'GW170809_R1', 'GW170814', 'GW170814_R1', 'GW170817', 'GW170817_R1', 'GW170818_R1', 'GW170823_R1', 'LVT151012', 'MC151008_R1', 'MC151012A_R1', 'MC151116_R1', 'MC161202_R1', 'MC161217_R1', 'MC170208_R1', 'MC170219_R1', 'MC170405_R1', 'MC170412_R1', 'MC170423_R1', 'MC170616_R1', 'MC170630_R1', 'MC170705_R1', 'MC170720_R1', 'O1_O2_Preliminary_GW150914_R1', 'O1_O2_Preliminary_GW170608_R1', 'O1_O2_Preliminary_GW170814_R1', 'O1_O2_Preliminary_GW170817_R1', 'O1_O2_Preliminary_LVT151012_R1', 'O3_Discovery_Papers_GW190412_R1', 'O3_Discovery_Papers_GW190412_R2', 'O3_Discovery_Papers_GW190425_R1', 'blind_injection']


The event datasets labeled with `'GW'` represent confident detections. Of those, 11 were detected during O1 and O2 (see [GWTC-1](https://www.gw-openscience.org/GWTC-1/) for more details); the others were detected during O3 and are labeled `'O3_Discovery_Papers'`. Many event dataset names contain `'_R'+number`: this is the release number, so R1 marks the first release of an event dataset, R2 the second, and so on. If no release number is given, the dataset coincides with the first release. The prefix `'MC'` refers to _marginal_ detections (again see [GWTC-1](https://www.gw-openscience.org/GWTC-1/) for more details). The prefix `'O1_O2_Preliminary'` is used for events published before the GWTC-1 catalog.
As you can see, several datasets can be available for a single event. For example, GW150914 appears in the datasets `'GW150914'`, `'GW150914_R1'`, and `'O1_O2_Preliminary_GW150914_R1'`.
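
The naming rules above can be applied mechanically. The following is a minimal sketch and not part of the `gwosc` API: the helper `parse_dataset_name` and its regular expression are hypothetical, written only from the conventions described in this section, and the short `names` list is copied from the output of `find_datasets(type="event")`.

```python
import re
from collections import defaultdict

# Hypothetical pattern built from the naming conventions described above:
# an optional catalog prefix, an event name, and an optional "_R<number>" suffix.
NAME_PATTERN = re.compile(
    r"^(?P<prefix>O1_O2_Preliminary_|O3_Discovery_Papers_)?"
    r"(?P<event>.+?)"
    r"(?:_R(?P<release>\d+))?$"
)


def parse_dataset_name(name):
    """Split a dataset name into (prefix, event, release number)."""
    match = NAME_PATTERN.match(name)
    prefix = (match.group("prefix") or "").rstrip("_")
    # A missing release number means the dataset coincides with release 1.
    release = int(match.group("release") or 1)
    return prefix, match.group("event"), release


# A few names copied from the output of find_datasets(type="event").
names = [
    "GW150914", "GW150914_R1", "O1_O2_Preliminary_GW150914_R1",
    "O3_Discovery_Papers_GW190412_R1", "O3_Discovery_Papers_GW190412_R2",
    "MC151008_R1",
]

# Group the dataset names by the event they refer to.
datasets_by_event = defaultdict(list)
for name in names:
    prefix, event, release = parse_dataset_name(name)
    datasets_by_event[event].append(name)

for event, datasets in sorted(datasets_by_event.items()):
    print(event, datasets)
```

Grouping in this way makes it easier to see which events have several releases or a preliminary dataset.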

`find_datasets` also accepts a `detector` keyword to return only those datasets that include data for that detector:

In [4]:
print(find_datasets(type="event", detector="V1"))
print(find_datasets(type="event", detector="G1"))

['GW170729_R1', 'GW170809_R1', 'GW170814', 'GW170814_R1', 'GW170817', 'GW170817_R1', 'GW170818_R1', 'O1_O2_Preliminary_GW170814_R1', 'O1_O2_Preliminary_GW170817_R1', 'O3_Discovery_Papers_GW190412_R1', 'O3_Discovery_Papers_GW190412_R2', 'O3_Discovery_Papers_GW190425_R1']
['GW170817', 'GW170817_R1']


`find_datasets` also accepts a `segment` keyword to narrow results based on GPS time:

In [5]:
print(find_datasets(type="event", detector="L1", segment=(1164556817, 1187733618)))

['GW170104', 'GW170104_R1', 'GW170608', 'GW170608_R1', 'GW170729_R1', 'GW170809_R1', 'GW170814', 'GW170814_R1', 'GW170817', 'GW170817_R1', 'GW170818_R1', 'GW170823_R1', 'MC161202_R1', 'MC161217_R1', 'MC170208_R1', 'MC170219_R1', 'MC170405_R1', 'MC170412_R1', 'MC170423_R1', 'MC170616_R1', 'MC170630_R1', 'MC170705_R1', 'MC170720_R1', 'O1_O2_Preliminary_GW170608_R1', 'O1_O2_Preliminary_GW170814_R1', 'O1_O2_Preliminary_GW170817_R1']


Using `gwosc.datasets.event_gps`, we can query for the GPS time of a specific event:

In [6]:
from gwosc.datasets import event_gps
gps = event_gps('GW170817')
print(gps)

1187008882.4


<div class="alert alert-info">All of these times are returned in the GPS time system, which counts the number of seconds that have elapsed since the start of the GPS epoch at midnight (00:00) on January 6th 1980. GWOSC provides a <a href="https://www.gw-openscience.org/gps/">GPS time converter</a> you can use to translate into datetime, or you can use <a href="https://gwpy.github.io/docs/stable/time/"><code>gwpy.time</code></a>.</div>
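
The relationship between GPS time and UTC can be sketched with the Python standard library alone. This is an illustration under stated assumptions rather than a replacement for the converters mentioned above: the helper `gps_to_utc` is hypothetical, and it takes the GPS-UTC leap-second offset as a parameter instead of looking it up.

```python
from datetime import datetime, timedelta, timezone

# Start of the GPS epoch: midnight (00:00) UTC on January 6th 1980.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def gps_to_utc(gps, leap_seconds=18):
    """Convert GPS seconds to a UTC datetime.

    Assumption: `leap_seconds` is the GPS-UTC offset at the time of interest.
    It has been 18 s since 2017-01-01, so the default is wrong for earlier
    dates (for example during O1).
    """
    return GPS_EPOCH + timedelta(seconds=gps - leap_seconds)


# GPS time of GW170817, as printed by event_gps('GW170817').
print(gps_to_utc(1187008882.4))
```

For GW170817 this gives 12:41:04 UTC on 17 August 2017. GPS time does not include leap seconds, which is why the offset has to be subtracted explicitly.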

We can query for the GPS time interval for an observing run:

In [7]:
from gwosc.datasets import run_segment
print(run_segment('O1'))

(1126051217, 1137254417)
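
Because a run segment is just a pair of GPS times in seconds, its length and the question of whether an event falls inside it reduce to simple arithmetic. The sketch below uses only the values printed above. The helper `in_segment` is hypothetical, and treating the segment as a half-open interval `[start, end)` is an assumption.

```python
from datetime import timedelta

# O1 segment copied from the output of run_segment('O1').
o1_start, o1_end = (1126051217, 1137254417)

# GPS times are plain seconds, so the run length is a simple difference.
duration = timedelta(seconds=o1_end - o1_start)
print(duration.days, "days")

# Rough length in months, assuming an average month of 30.44 days.
months = duration.total_seconds() / 86400 / 30.44
print(round(months, 1), "months")


def in_segment(gps, segment):
    """Return True if a GPS time lies in the half-open interval [start, end)."""
    start, end = segment
    return start <= gps < end


# GPS time of GW170817 from event_gps: this event was not observed during O1.
print(in_segment(1187008882.4, (o1_start, o1_end)))
```

The same arithmetic works for any other run name accepted by `run_segment`.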


## Querying for data files

The `gwosc.locate` module provides a function to find the URLs of data files associated with a given dataset.

For event datasets, one can get the list of URLs using only the event name:

In [8]:
from gwosc.locate import get_event_urls
urls = get_event_urls('GW150914')
print(urls)

['https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/H-H1_GWOSC_4KHZ_R1-1126259447-32.hdf5', 'https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/H-H1_GWOSC_4KHZ_R1-1126257415-4096.hdf5', 'https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5', 'https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126257415-4096.hdf5']


By default, this function returns all of the files associated with a given event, which isn't particularly helpful. However, we can filter the results using keyword arguments, for example to get the URL of the 32-second file for the LIGO-Livingston detector:

In [9]:
urls = get_event_urls('GW150914', duration=32, detector='L1')
print(urls)

['https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5']
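
The file names in these URLs encode the detector, the GPS start time and the duration. As a hedged illustration, the sketch below parses them without any network access. The helper `parse_data_url` is hypothetical, and the file-name layout it assumes is inferred only from the URLs printed above.

```python
import posixpath
from urllib.parse import urlparse

# URLs copied from the output of get_event_urls('GW150914').
BASE = "https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/"
urls = [
    BASE + "H-H1_GWOSC_4KHZ_R1-1126259447-32.hdf5",
    BASE + "H-H1_GWOSC_4KHZ_R1-1126257415-4096.hdf5",
    BASE + "L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5",
    BASE + "L-L1_GWOSC_4KHZ_R1-1126257415-4096.hdf5",
]


def parse_data_url(url):
    """Extract detector, tag, GPS start and duration from a data file URL.

    Assumption: file names look like
    <observatory>-<detector>_<description>-<GPS start>-<duration>.hdf5,
    as in the URLs listed above.
    """
    filename = posixpath.basename(urlparse(url).path)
    stem, _extension = posixpath.splitext(filename)
    _observatory, tag, start, duration = stem.split("-")
    return {
        "detector": tag.split("_")[0],
        "tag": tag,
        "start": int(start),
        "duration": int(duration),
    }


# Print the GPS interval covered by each file.
for url in urls:
    info = parse_data_url(url)
    print(info["detector"], info["start"], info["start"] + info["duration"])

# Local equivalent of filtering on duration=32 and detector='L1'.
short_l1 = [
    url for url in urls
    if parse_data_url(url)["detector"] == "L1"
    and parse_data_url(url)["duration"] == 32
]
print(short_l1)
```

In practice the keyword arguments of `get_event_urls` are the simpler route. Parsing the names yourself is mainly useful for checking which GPS interval a downloaded file covers.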


# Exercises

Now that you've seen examples of how to query for dataset information using the `gwosc` package, please try to complete the following exercises using that interface:

- How many months did S6 last?
- How many events were detected during O1 (only confident detections)?
- Which file URL contains 4096 seconds of V1 data around GW170817?