# Query the OpenScience datasets using `gwosc`

This pre-tutorial shows how to use the [`gwosc`](//gwosc.readthedocs.io) Python module to query information about gravitational-wave (GW) open data.

First, let's install it:

In [1]:
import sys
!{sys.executable} -m pip install "gwosc"

Collecting gwosc
  Using cached https://files.pythonhosted.org/packages/74/f6/1c7ad8effc4f770000b4779e8462d0a1932678d9c2b78c5c81b61b8eda66/gwosc-0.3.3-py2.py3-none-any.whl
Installing collected packages: gwosc
Successfully installed gwosc-0.3.3


## Querying for event information

The `gwosc.datasets` module provides tools to search for datasets, including filtering on GPS times.

For example, we can search for what event datasets are available:

In [2]:
from gwosc.datasets import find_datasets
events = find_datasets(type='event')
print(events)

[u'GW150914', u'GW151226', u'GW170104', u'GW170608', u'GW170814', u'GW170817', u'LVT151012']


This list contains the confirmed detections (prefixed with 'GW') and one likely detection (prefixed with 'LVT'). `find_datasets` also accepts a `detector` keyword that restricts the results to datasets containing data for that detector.

We can query for the GPS time of a given event:

In [3]:
from gwosc.datasets import event_gps
gps = event_gps('GW170817')
print(gps)

1187008882.43


<div class="alert alert-info">All of these times are returned in the GPS time system, which counts the number of seconds that have elapsed since the start of the GPS epoch at midnight (00:00) on January 6th 1980. LOSC provides a <a href="https://losc.ligo.org/gps/">GPS time converter</a> you can use to translate into datetime, or you can use <a href="https://gwpy.github.io/docs/stable/time/"><code>gwpy.time</code></a>.</div>
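
As a rough illustration of the conversion described above, the sketch below turns a GPS time into a UTC `datetime` using only the standard library. The helper name `gps_to_utc` and the fixed 18-second GPS-UTC offset are assumptions made for this example: the offset only holds for dates on or after 1 January 2017, so earlier times need a different value or a leap-second-aware tool such as `gwpy.time`.

```python
from datetime import datetime, timedelta, timezone

# Start of the GPS epoch: midnight (00:00 UTC) on 6 January 1980.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def gps_to_utc(gps, leap_seconds=18):
    """Convert a GPS time in seconds to a UTC datetime.

    Hypothetical helper for illustration. GPS time does not include leap
    seconds, so the GPS-UTC offset is subtracted; the default of 18 s is an
    assumption that is only valid for dates from 2017-01-01 onwards.
    """
    return GPS_EPOCH + timedelta(seconds=gps - leap_seconds)


# GPS time of GW170817, as printed above
print(gps_to_utc(1187008882.43).isoformat())
```

For GW170817 this corresponds to 12:41:04 UTC on 17 August 2017.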

We can query for the GPS time interval for an observing run:

In [4]:
from gwosc.datasets import run_segment
print(run_segment('S6'))

(931035615, 971622015)


## Querying for data files

### Events during O1

The `gwosc.locate` module provides a function to find the URLs of data files associated with a given dataset.

For event datasets, one can get the list of URLs using only the event name:

In [5]:
from gwosc.locate import get_event_urls
urls = get_event_urls('GW150914')
print(urls)

[u'https://losc.ligo.org//s/events/GW150914/H-H1_LOSC_4_V2-1126259446-32.hdf5', u'https://losc.ligo.org//s/events/GW150914/L-L1_LOSC_4_V2-1126259446-32.hdf5', u'https://losc.ligo.org//s/events/GW150914/H-H1_LOSC_4_V2-1126257414-4096.hdf5', u'https://losc.ligo.org//s/events/GW150914/L-L1_LOSC_4_V2-1126257414-4096.hdf5']


By default, this function returns every file associated with the event, which is rarely what you want. You can narrow the results with keyword arguments, for example to get the URL of the 32-second file for the LIGO-Livingston detector:

In [6]:
urls = get_event_urls('GW150914', duration=32, detector='L1')
print(urls)

[u'https://losc.ligo.org//s/events/GW150914/L-L1_LOSC_4_V2-1126259446-32.hdf5']
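
To see what the `duration` and `detector` keywords are selecting on, it can help to take apart the file names in the URLs printed above. The sketch below does this offline with a regular expression. The helper `parse_losc_url` and the pattern are assumptions inferred from the four file names shown here; they are not part of the `gwosc` API.

```python
import re
from posixpath import basename
from urllib.parse import urlparse

# Pattern inferred from the file names above (an assumption, not a documented
# format): <site>-<detector>_<description>-<GPS start>-<duration>.hdf5
FILENAME = re.compile(
    r"^(?P<site>[A-Z])-(?P<detector>[A-Z]\d)_(?P<description>.+)"
    r"-(?P<start>\d+)-(?P<duration>\d+)\.hdf5$"
)


def parse_losc_url(url):
    """Return the detector, GPS start and duration encoded in a file URL."""
    match = FILENAME.match(basename(urlparse(url).path))
    if match is None:
        raise ValueError("unrecognised file name in %r" % url)
    return {
        "detector": match.group("detector"),
        "start": int(match.group("start")),
        "duration": int(match.group("duration")),
    }


# URLs copied from the unfiltered GW150914 output above
urls = [
    "https://losc.ligo.org//s/events/GW150914/H-H1_LOSC_4_V2-1126259446-32.hdf5",
    "https://losc.ligo.org//s/events/GW150914/L-L1_LOSC_4_V2-1126259446-32.hdf5",
    "https://losc.ligo.org//s/events/GW150914/H-H1_LOSC_4_V2-1126257414-4096.hdf5",
    "https://losc.ligo.org//s/events/GW150914/L-L1_LOSC_4_V2-1126257414-4096.hdf5",
]

# Hand-written equivalent of duration=32, detector='L1'
selected = []
for url in urls:
    info = parse_losc_url(url)
    if info["detector"] == "L1" and info["duration"] == 32:
        selected.append(url)
print(selected)
```

In practice the keyword arguments to `get_event_urls` are the simpler route; the parser is only meant to show which parts of the file name those filters correspond to.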


### Events during O2

For events during O2 (and beyond), several types of data were released for each event, typically including the calibrated strain data and a cleaned dataset with numerous well-defined instrumental noise sources removed.

When querying for O2 events, you must use the `tag` keyword to say which type of data you want; otherwise an error is raised:

In [7]:
urls = get_event_urls('GW170817')

ValueError: multiple LOSC URL tags discovered in dataset, please select one of: u'C00', u'CLN'

The error message tells you what tags are available, and you can refer to the [release page](https://losc.ligo.org/events/GW170817/) for documentation on what each tagged set contains.

If you specify the tag, the function behaves as it did for the O1 events:

In [8]:
urls = get_event_urls('GW170817', tag='CLN')
print(urls)

[u'https://losc.ligo.org//s/events/GW170817/H-H1_LOSC_CLN_4_V1-1187007040-2048.hdf5', u'https://losc.ligo.org//s/events/GW170817/L-L1_LOSC_CLN_4_V1-1187007040-2048.hdf5', u'https://losc.ligo.org//s/events/GW170817/V-V1_LOSC_CLN_4_V1-1187007040-2048.hdf5']


For this cleaned dataset there is just a single file for each interferometer, so we can select one using the `detector` keyword:

In [9]:
urls = get_event_urls('GW170817', tag='CLN', detector='V1')
print(urls)

[u'https://losc.ligo.org//s/events/GW170817/V-V1_LOSC_CLN_4_V1-1187007040-2048.hdf5']
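
The observation that the cleaned dataset holds one file per interferometer can also be checked offline from the unfiltered list printed earlier. This is a minimal sketch that assumes the detector prefix is the text between the first `-` and the first `_` of each file name, as it is in the URLs shown in this tutorial; `detector_of` is a hypothetical helper, not a `gwosc` function.

```python
from collections import Counter

# URLs copied from the GW170817 tag='CLN' output earlier in this section
urls = [
    "https://losc.ligo.org//s/events/GW170817/H-H1_LOSC_CLN_4_V1-1187007040-2048.hdf5",
    "https://losc.ligo.org//s/events/GW170817/L-L1_LOSC_CLN_4_V1-1187007040-2048.hdf5",
    "https://losc.ligo.org//s/events/GW170817/V-V1_LOSC_CLN_4_V1-1187007040-2048.hdf5",
]


def detector_of(url):
    """Extract the detector prefix (for example 'V1') from a data-file URL."""
    filename = url.rsplit("/", 1)[-1]
    # Assumed layout: <site>-<detector>_<rest of the file name>
    return filename.split("-", 1)[1].split("_", 1)[0]


# Count how many files each detector contributes
files_per_detector = Counter(detector_of(url) for url in urls)
print(sorted(files_per_detector.items()))
```

Each of H1, L1 and V1 appears exactly once in the count.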


# Exercises

Now that you've seen how to query for dataset information using the `gwosc` package, try to complete the following exercises using that interface:

- How many months did S6 last?
- How many events were detected during O1?
- Which event releases include data for the Virgo detector?