# Tutorial Step 1: Download GWOSC Data

Before we begin, let's outline the content of the tutorials:
  - Notebook 1 explains how to download data from the GWOSC website.
  - Notebook 2 presents details about reading the content of a file.
  - Notebook 3 presents quality flags, an important concept when working with gravitational wave data.
  - Notebook 4 presents another important concept for gravitational wave data, the frequency domain.
  - Notebook 5 presents a higher-level interface that hides many details to provide easier access to the data.

If you want to understand the details of GWOSC files, simply run the tutorials in order.
If you're comfortable with Python and don't want to bother with the details, go directly to [notebook 5](<05 - GWpy Examples.ipynb>).

This tutorial will show you how to download data from the [GWOSC website](https://gwosc.org).

## Browse available data sets

Go to the [GWOSC website](https://gwosc.org/) and open "Get Data" in the menu bar and then click on "Download".
The [Observatory Data Sets](https://gwosc.org/data/) page will display the list of data sets.

## Use timelines to show times of available data

Data for the LVK detectors is only publicly available for times when the detectors were operating under normal conditions. These times are labeled with the DATA flag (i.e. data is available at these times).
The GWOSC Timelines provide a convenient, graphical tool for discovering times when detectors were collecting data.

Let's start by looking at the DATA Timeline for H1 during the Second Part of the Third Observing Run (O3b):
  - On the [Observatory Data Sets](https://gwosc.org/data/) page, find the "O3b" Data Release.
  - Click the [Timeline icon](https://gwosc.org/timeline/show/O3b_16KHZ_R1/H1_DATA*L1_DATA*V1_DATA/1256655618/12708000/). You should see plots like this: ![timeline](./img/timeline.png)

Here are some notes to help understand the GWOSC Timelines:
  - The label on the far left indicates which instrument is represented in the plot. In this example, H1 corresponds to the "Hanford One" detector.
  - The curve shows whether the detector was collecting data at a particular time: a vertical bar indicates that the detector was collecting data, while the absence of a bar indicates that it was not.
  - This data set spans the Second Part of the Third Observing Run (O3b), i.e. from 2019-11-01T15:00:00 UTC (GPS=1256655618) to 2020-03-27T17:00:00 UTC (GPS=1269363618).
  - You can use your mouse to zoom and pan the graphs and download the figure as an image on your computer by clicking on the "..." icon in the top-right corner of the plot.
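
As a quick sanity check on the dates above, the length of O3b can be computed from the two GPS times. This is only a sketch; it also suggests that the last number in the timeline link (12708000) is the duration of the displayed interval in seconds, a reading inferred from the numbers rather than taken from the GWOSC documentation.

```python
# GPS times of the start and end of O3b, as quoted in the notes above.
O3B_GPS_START = 1256655618
O3B_GPS_END = 1269363618

SECONDS_PER_DAY = 86400

# Length of the run in seconds and in days.
o3b_duration = O3B_GPS_END - O3B_GPS_START
o3b_days = o3b_duration / SECONDS_PER_DAY

print(f"O3b spans {o3b_duration} s, i.e. about {o3b_days:.1f} days")
```

The difference is 12708000 seconds, or roughly 147 days.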

## Download data files from the data archive

Now that we've got some idea when data is available, let's get the actual data files.
We'll try to find data for the month of January 2020.
We can query the O3b archive for the data we want:

  - Click the "Get Data" link in the menu bar and then on "Download".
  - Under "O3b Data Release", click the "4 kHz Data" icon:

    - Use the radio buttons to select "H1".
    - In the web form, enter the dates 2020-01-01T00:00:00 UTC (GPS=1261872018) and 2020-01-31T23:59:59 UTC (GPS=1264550417).
    - Then, click the continue button.

As you can see, the form uses GPS times.
To convert a UTC time to a GPS time (or vice versa), you can use the [time conversion tool](https://gwosc.org/gps/).
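
For dates in this tutorial, the conversion can also be sketched in a few lines of Python. The sketch below assumes a fixed offset of 18 leap seconds between GPS and UTC, which holds for dates from 2017-01-01 onward (and until a new leap second is introduced); for earlier dates, use the conversion tool instead. The names `utc_to_gps` and `gps_to_utc` are illustrative helpers, not part of any GWOSC package.

```python
from datetime import datetime, timedelta, timezone

# GPS time counts seconds since 1980-01-06T00:00:00 UTC.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# Assumption: GPS is ahead of UTC by 18 s (valid from 2017-01-01 onward).
LEAP_SECONDS = 18


def utc_to_gps(utc):
    """Convert a timezone-aware UTC datetime to an integer GPS time."""
    return int((utc - GPS_EPOCH).total_seconds()) + LEAP_SECONDS


def gps_to_utc(gps):
    """Convert an integer GPS time to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps - LEAP_SECONDS)


print(utc_to_gps(datetime(2020, 1, 1, tzinfo=timezone.utc)))
print(gps_to_utc(1264550417).isoformat())
```

With the dates entered in the web form, this reproduces the GPS times quoted above (1261872018 and 1264550417).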

Clicking the continue button queries the database for data files between the entered dates.
You should see a list that looks like this: ![archive content](./img/archive_content.png)

Each line of the table corresponds to a data file (or tile) covering 4096 seconds of calendar time.
A given instrument may be up or down for any fraction of that time, and the far right column shows what percentage of the 4096 seconds contains science mode data.
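
The file start times that appear in this tutorial (for example 1264312320 and 1261965312) are exact multiples of 4096. Assuming this holds for every file in the archive, which is an observation from these examples and not something the table states, the start time of the file containing a given GPS time can be estimated with integer arithmetic. `tile_start` is a hypothetical helper written for this illustration.

```python
TILE_DURATION = 4096  # seconds covered by one data file


def tile_start(gps_time):
    """Return the start of the 4096 s tile containing `gps_time`.

    Assumption: tiles start at exact multiples of 4096 GPS seconds.
    """
    return (gps_time // TILE_DURATION) * TILE_DURATION


# A GPS time a few minutes into a tile maps back to the tile's start.
print(tile_start(1264313000))
```

This can help locate the row of the table that covers a time of interest.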

Let's download one data file, which we'll use for the rest of this tutorial.
Since we want a file with mostly science mode data, let's download the file that starts at GPS time 1264312320.

To download the file, just click the link in the column under the heading "HDF5".
The downloaded file should be named `H-H1_GWOSC_O3b_4KHZ_R1-1264312320-4096.hdf5`.
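
The file name itself carries useful information. The sketch below assumes the name follows the pattern `<observatory>-<description>-<GPS start>-<duration>.<extension>`, which matches the file above; `parse_gwosc_filename` is a hypothetical helper, not a GWOSC function.

```python
from pathlib import Path


def parse_gwosc_filename(filename):
    """Split a file name such as the one above into its parts.

    Assumption: the name is
    <observatory>-<description>-<GPS start>-<duration>.<extension>.
    """
    path = Path(filename)
    # The last two dash-separated fields are the GPS start and the duration.
    prefix, gps_start, duration = path.stem.rsplit("-", 2)
    # The first field is the observatory letter; the rest is the description.
    observatory, description = prefix.split("-", 1)
    return {
        "observatory": observatory,
        "description": description,
        "gps_start": int(gps_start),
        "duration": int(duration),
        "format": path.suffix.lstrip("."),
    }


info = parse_gwosc_filename("H-H1_GWOSC_O3b_4KHZ_R1-1264312320-4096.hdf5")
print(info)
```

For this file, the parsed GPS start is 1264312320 and the duration is 4096 seconds, in agreement with the table.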

<div class="alert alert-block alert-warning">
<div><b>&#9888; Warning</b></div>
    After the download completes, move the file into the directory where the notebooks are stored.
    If you use Binder or Colab to run the notebooks, upload this file to the Binder/Colab instance.
    Alternatively, the next section shows how to download this file programmatically into the current folder.
</div>

## Programmatic download

It's possible to interact with the GWOSC website using Python code.
By sending dedicated web requests, a program can get the same information in a form it can use directly.
You might have recognized what is known as a web API (if you're not familiar with this concept, don't pay attention to it for now).

In the following example, we show how to download the same file with web requests.
For this, we use the [requests Python package](https://requests.readthedocs.io/en/latest/), which makes it simple to send requests to the server.

First, let's define a function that returns the list of strain files between two GPS times for a given run and detector.

In [1]:
import requests


def fetch_strain_list(run, detector, gps_start, gps_end):
    """Return the list of strain file info for `run` and `detector`."""

    # Build the URL of the JSON listing for this run, detector and GPS range
    fetch_url = (
        "https://gwosc.org/archive/links/"
        f"{run}/{detector}/{gps_start}/{gps_end}/json/"
    )
    # A timeout avoids waiting forever if the server does not answer
    response = requests.get(fetch_url, timeout=60)
    response.raise_for_status()
    return response.json()["strain"]

Now let's use this with the same GPS times as before:

In [2]:
strain_files = fetch_strain_list("O3b_4KHZ_R1", "H1", 1261872018, 1264550417)
print(f"Found {len(strain_files)} files")
print(strain_files[0:5])

Found 1062 files
[{'GPSstart': 1261965312, 'UTCstart': '2020-01-02T01:54:54', 'detector': 'H1', 'sampling_rate': 4096, 'duration': 4096, 'format': 'hdf5', 'url': 'http://gwosc.org/archive/data/O3b_4KHZ_R1/1261436928/H-H1_GWOSC_O3b_4KHZ_R1-1261965312-4096.hdf5', 'min_strain': -5.393283677409768e-19, 'max_strain': 4.920883533490361e-19, 'mean_strain': 5.0291205486927025e-25, 'stdev_strain': 9.167596975962463e-20, 'duty_cycle': 91.2353515625, 'BLRMS200': 5.472092079816551e-24, 'BLRMS1000': 1.9148069663502894e-21, 'BNS': 106.15558044159869}, {'GPSstart': 1261965312, 'UTCstart': '2020-01-02T01:54:54', 'detector': 'H1', 'sampling_rate': 4096, 'duration': 4096, 'format': 'gwf', 'url': 'http://gwosc.org/archive/data/O3b_4KHZ_R1/1261436928/H-H1_GWOSC_O3b_4KHZ_R1-1261965312-4096.gwf', 'min_strain': -5.393283677409768e-19, 'max_strain': 4.920883533490361e-19, 'mean_strain': 5.0291205486927025e-25, 'stdev_strain': 9.167596975962463e-20, 'duty_cycle': 91.2353515625, 'BLRMS200': 5.472092079816551e-2

Take some time to inspect the `strain_files` variable.
You will see that it contains the same data as the table above.
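
As an illustration of how these records can be used, the sketch below filters a list of file records by format and duty cycle. It assumes that `duty_cycle` is the percentage of the file's duration containing science mode data (the far right column of the table); that interpretation is inferred from the values, not from a documented schema. The two sample records are copied from the output above and trimmed to the keys used here; `select_files` and `science_seconds` are hypothetical helpers.

```python
# Two records copied from the printed output, keeping only the keys needed.
sample_files = [
    {"GPSstart": 1261965312, "duration": 4096, "format": "hdf5",
     "duty_cycle": 91.2353515625},
    {"GPSstart": 1261965312, "duration": 4096, "format": "gwf",
     "duty_cycle": 91.2353515625},
]


def select_files(files, file_format="hdf5", min_duty_cycle=90.0):
    """Keep the records with the given format and at least `min_duty_cycle` %."""
    return [
        f for f in files
        if f["format"] == file_format and f["duty_cycle"] >= min_duty_cycle
    ]


def science_seconds(file_info):
    """Seconds of data in the file, assuming `duty_cycle` is a percentage."""
    return file_info["duty_cycle"] * file_info["duration"] / 100


for f in select_files(sample_files):
    print(f["GPSstart"], f["format"], science_seconds(f))
```

The same functions can be applied to the full `strain_files` list returned by `fetch_strain_list`.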

To find the HDF5 file starting at GPS time 1264312320, we can proceed as follows:

In [3]:
def download_strain_file(download_url):
    """Download the strain file at the given URL and save it to disk."""
    # The file name is taken from the last part of the download URL.
    # Ideally, it should be read from the
    # Content-Disposition response header.
    filename = download_url.split("/")[-1]
    # Stream the response so the whole file is never held in memory
    with requests.get(download_url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return filename


# Look for the HDF5 file that starts at the chosen GPS time
for a_file in strain_files:
    if a_file["GPSstart"] == 1264312320 and a_file["format"] == "hdf5":
        print(f"Downloading {a_file['url']}")
        fname = download_strain_file(a_file["url"])
        break  # only one file matches, so stop searching
else:
    print("No matching file found")

Downloading http://gwosc.org/archive/data/O3b_4KHZ_R1/1263534080/H-H1_GWOSC_O3b_4KHZ_R1-1264312320-4096.hdf5


The file should now be in the current folder.
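
One way to check that the download produced a plausible file is to look at its first bytes. This is a minimal sketch: it assumes the HDF5 signature sits at the very start of the file, which is the usual case but is not guaranteed by the HDF5 format, and `looks_like_hdf5` is a hypothetical helper. It does not validate the content of the file.

```python
from pathlib import Path

# The 8-byte signature that marks the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` exists and starts with the HDF5 signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Prints True only if the file is present in the current folder
# and starts with the expected signature.
print(looks_like_hdf5("H-H1_GWOSC_O3b_4KHZ_R1-1264312320-4096.hdf5"))
```

If this prints `False`, the file is missing or the download was likely interrupted, and it is worth downloading it again.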

<div class="alert alert-block alert-warning">
<div><b>&#9888; Warning</b></div>
    You could be tempted to use such code to mass-download the data by looping over all the files.
    However, this would put a heavy load on the GWOSC servers and will usually fail because of network glitches.
    Remember that with great power comes great responsibility, and don't do this.
    If you want to download a large amount of data, have a look at <a href="https://gwosc.org/cvmfs/">CVMFS</a>.
</div>

## What's next?

In the [next step](<02 - What's in a GWOSC Data File.ipynb>), you will learn how gravitational wave data are stored in this file.