<a id="topD"></a>

# Downloading COS Data

# Learning Goals
### This Notebook is designed to walk the user (*you*) through: **Downloading Existing Cosmic Origins Spectrograph (*COS*) data from the online archive**
   #### 1. [**Using the web browser interface**](#mastD)
   ##### - 1.1. [The MAST Web Search](#mastD)
   ##### - 1.2. [Searching for a Series of Observations on the MAST Web Search](#WebSearchSeriesD)
   #### 2. [**Using the `Python` module `Astroquery`**](#astroqueryD)
   ##### - 2.1. [Searching for a single source with Astroquery](#Astroquery1D)
   ##### - 2.2. [Narrowing Search with Observational Parameters](#NarrowSearchD)
   ##### - 2.3. [Choosing and Downloading Data Products](#dataprodsD)
   ##### - 2.4. [Using astroquery to find data on a series of sources](#Astroquery2D)
   

# 0. Introduction
#### The Cosmic Origins Spectrograph ([*COS*](https://www.nasa.gov/content/hubble-space-telescope-cosmic-origins-spectrograph)) is an ultraviolet spectrograph on board the Hubble Space Telescope ([*HST*](https://www.stsci.edu/hst/about)) with capabilities in the near ultraviolet (*NUV*) and far ultraviolet (*FUV*).

#### This tutorial aims to prepare you to access the existing COS data of your choice by walking you through downloading a processed spectrum, as well as various calibration files obtained with COS.

- For an in-depth manual to working with COS data and a discussion of caveats and user tips, see the [COS Data Handbook](https://hst-docs.stsci.edu/display/COSDHB/).
- For a detailed overview of the COS instrument, see the [COS Instrument Handbook](https://hst-docs.stsci.edu/display/COSIHB/).


## We will import the following packages:

- numpy to handle array functions
- astropy.table Table for creating tidy tables of the data
- astroquery.mast Mast, Observations, and Catalogs for finding and downloading data from the [MAST](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html) archive
- csv reader for reading in from a csv file of source names

In [None]:
#Make matplotlib look good in a notebook
%matplotlib inline
# Manipulating arrays
import numpy as np
# Reading in data
from astropy.table import Table

# Downloading data from archive
from astroquery.mast import Mast
from astroquery.mast import Observations
from astroquery.mast import Catalogs

# Reading in multiple source names from a csv file
from csv import reader

## We will also define a few directories in which to place our data.

In [None]:
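# These will be important directories for the notebook
from pathlib import Path
cwd = str(Path.cwd())
datadir = cwd + '/data/'
# Create the data directory; exist_ok avoids an error if it already exists
Path(datadir).mkdir(exist_ok=True)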

<a id="downloadD"></a>
# 1. Downloading the Data through the Browser interface

One can search for COS data from both a browser-based GUI and a command-line `Python` library.

##### *A more in-depth MAST archive tutorial can be found [here](https://mast.stsci.edu/api/v0/MastApiTutorial.html).*

<a id="mastD"></a>
## 1.1 The MAST Web Search
A browser GUI for searching HST archival data can be found [here](http://archive.stsci.edu/hst/search.php).

The search page is laid out as in Fig. 1.1:
### Fig 1.1
<center><img src=./figures/Mast_hst_searchformQSO.png width ="900" title="MAST Archive search form for a COS data query"> </center>

where we have indicated that we would like to find all archival science data from the **COS far-ultraviolet (FUV) configuration**, taken with any grating while looking at Quasi-Stellar Objects (QSOs) within a 3 arcminute radius of (01h 37m 40s, +33d 09m 32s). The output columns we have selected are visible in the bottom left of Fig 1.1.

Note that if you have a list of coordinates, Observation ID(s), etc. for a series of targets you can click on the "File Upload Form" and attach your list of OBSIDs or identifying features. Then specify which type of data your list contains using the "File Contents" drop-down menu.

Figure 1.2 shows the results of our search shown in Fig 1.1.
### Fig 1.2
<center><img src=figures/QSO_MastSearchRes.png width ="900" title="MAST Archive search results for a COS data query"> </center>


#### We now choose our dataset.
We rather arbitrarily select LCXV13050 because of its long exposure time, taken under a program described as:
> "Project AMIGA: Mapping the Circumgalactic Medium of Andromeda"

The target is a quasar known as [3C48](http://simbad.u-strasbg.fr/simbad/sim-basic?Ident=3c48&submit=SIMBAD+search), one of the first quasars discovered.

Clicking on the dataset, we are taken to a page displaying a preview spectrum (Fig 1.3).
### Fig 1.3

<center><img src=./figures/QSOPreviewSpec.png width ="900" title="MAST Archive preview spectrum of LCXV13050"> </center>

We now return to the [search page](http://archive.stsci.edu/hst/search.php) and enter in LCXV13050 under "Dataset" with no other parameters set. Hitting search, now we see a single-rowed table with *just* our dataset, and the option to download datasets. We mark the row we wish to download and click "Submit marked data for retrieval from STDADS". See Fig 1.4.

### Fig. 1.4

<center><img src =figures/LCXV13050_res.png width ="900" title="MAST Archive dataset overview of LCXV13050"> </center>

Now we see a page like that in Fig 1.5, where we can either sign in with STScI credentials, or simply provide our email to proceed (relatively) anonymously. Make sure to select "Deliver the data to the Archive staging area". Hit "Send Retrieval Request to STDADS" and you will receive an email with instructions on downloading with FTP. You will need to do this step from the command line.

### Fig. 1.5
<center><img src =figures/DownloadOptions.png width ="900" title="Download Options for LCXV13050"> </center>

In the case of this request, the command to retrieve the data was:
>`wget -r --ftp-user=anonymous --ask-password ftps://archive.stsci.edu/stage/anonymous/anonymous42822 --directory-prefix=datadir`

where the password was the email address used, and datadir is the directory defined at the beginning of this notebook. All the data is now in the subdirectory `"/archive.stsci.edu/stage/anonymous/anonymous42822/"`.
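
Once the transfer finishes, it can help to confirm which files actually arrived. The following is a minimal sketch rather than part of the archive's instructions: `list_fits_files` is a hypothetical helper that walks a download directory recursively and returns the FITS files it finds. The demonstration runs on a throwaway directory with made-up file names; in practice you would pass the `datadir` defined earlier.

```python
import tempfile
from pathlib import Path


def list_fits_files(download_dir):
    """Return a sorted list of all FITS files found below download_dir."""
    root = Path(download_dir)
    # rglob searches every subdirectory, which matters here because a
    # recursive wget recreates the remote directory tree under the prefix.
    return sorted(p for p in root.rglob("*.fits") if p.is_file())


# Demonstration on a temporary directory tree that mimics a nested download.
with tempfile.TemporaryDirectory() as tmp:
    stage = Path(tmp) / "archive.stsci.edu" / "stage"
    stage.mkdir(parents=True)
    (stage / "example_x1dsum.fits").touch()  # made-up file name
    (stage / "notes.txt").touch()            # non-FITS file, should be skipped
    found = list_fits_files(tmp)
    print([p.name for p in found])
```

Searching recursively avoids having to hard-code the long staging path that the archive assigns to each request.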

#### Well Done!


### Exercise 1: *Searching the archive for TRAPPIST-1 data*

[TRAPPIST-1](https://en.wikipedia.org/wiki/TRAPPIST-1) is a cool red dwarf with a multiple-exoplanet system. 
- Find its coordinates using the [SIMBAD Basic Search](http://simbad.u-strasbg.fr/simbad/sim-fbasic).
- Use those coordinates in the [MAST web search](https://archive.stsci.edu/hst/search.php) to find all COS exposures of the system.
- Limit the search terms to find the COS dataset taken in the COS far-UV configuration with the grating G130M.

#### What is the dataset ID, and how long was the exposure?

Place your answer in the cell below.

In [None]:
# Your answer here



<a id=WebSearchSeriesD></a>
## 1.2. Searching for a Series of Observations on the MAST Web Search

Now let's try using the web interface's [file upload form](http://archive.stsci.edu/hst/search.php?form=fuf) to search for a series of observations by their dataset IDs. We're going to look for three observations of the same object, the white dwarf WD1057+719, taken with three different COS gratings. Two are in the FUV and one in the NUV. The dataset IDs are
- LDYR52010
- LBNM01040
- LBBD04040

We write these three dataset IDs to a comma-separated text file and save it as `obsId_list.txt`. 

In [None]:
obsIdList = ['LDYR52010','LBNM01040','LBBD04040']

# Write the IDs on a single line, separated by commas (no trailing comma)
with open('./obsId_list.txt', 'w') as f:
    f.write(",".join(obsIdList))

Then we link to this file under the **Local File Name** browse menu on the file upload form. We must set the **File Contents** term to Data ID, as that is the identifier we have provided in our file, and we change the **delimiter** to a comma.
Because we are searching by Dataset ID, we don't need to specify any additional parameters to narrow down the data.

### Fig 1.6
<center><img src =figures/FUF_search.png width ="900" title="File Upload Search Form"> </center>

#### Perfect! We now can access all the datasets, as shown in Fig. 1.7:

### Fig. 1.7
<center><img src =figures/FUF_res.png width ="900" title="File Upload Search Results"> </center>

Now, to download all of the relevant files, we can check the **mark** box for all of them, and again hit "Submit marked data for retrieval from STDADS". This time, we want to retrieve **all the calibration files** associated with each dataset, so we check the following boxes:
- Uncalibrated
- Calibrated
- Used Reference Files

See Fig. 1.8.

### Fig 1.8
<center><img src =./figures/DownloadOptions_FUF.png width ="900" title="Download Options for multiple datasets"> </center>


The procedure from here is the same as described above in Section 1.1. Now, when we download with wget, we obtain multiple subdirectories, one for each dataset.

<a id = astroqueryD></a>
# 2. The Python Package `astroquery.mast`
Another way to search for and download archived datasets is within `Python` using the module [`astroquery.mast`](https://astroquery.readthedocs.io/en/latest/mast/mast.html). We have already imported two of this module's key tools: `Observations` and `Mast`.

<a id=Astroquery1D></a>
## 2.1. Searching for a single source with Astroquery

There are *many* options for searching the archive with astroquery, but we will begin with a very general search using the coordinates we found for WD1057+719 in the last section to find the dataset with the longest exposure time using the G160M grating (which the archive lists under the `filters` column). 
- Our coordinates were:      (11:00:34.126 +71:38:02.80). 
    - We can search these coordinates as sexagesimal coordinates, or convert them to decimal degrees.
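
To make the conversion to decimal degrees concrete, here is a minimal pure-Python sketch. The helper `sexagesimal_to_degrees` is hypothetical and written only for illustration; `astropy.coordinates.SkyCoord` can perform the same conversion and is usually the better choice in practice.

```python
def sexagesimal_to_degrees(ra_hms, dec_dms):
    """Convert 'HH:MM:SS.s' right ascension and '+DD:MM:SS.s' declination to degrees."""
    h, m, s = (float(x) for x in ra_hms.split(":"))
    ra_deg = 15.0 * (h + m / 60.0 + s / 3600.0)  # 1 hour of RA = 15 degrees

    # Take the sign from the string itself so that values like '-00:30:00' work.
    sign = -1.0 if dec_dms.strip().startswith("-") else 1.0
    d, dm, ds = (abs(float(x)) for x in dec_dms.split(":"))
    dec_deg = sign * (d + dm / 60.0 + ds / 3600.0)
    return ra_deg, dec_deg


ra, dec = sexagesimal_to_degrees("11:00:34.126", "+71:38:02.80")
print(f"RA = {ra:.4f} deg, Dec = {dec:.4f} deg")
```

The result, roughly 165.14 and 71.63 degrees, falls inside the `s_ra` and `s_dec` ranges used for the criteria search in Section 2.2.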

In [None]:
query_1 = Observations.query_object("11:00:34.126 +71:38:02.80", radius="5 sec")

This command has generated a table of objects called **"query_1"**. We can see what information we have on the objects in the table by printing its *`keys`*, and see how many objects are in the table with `len(query_1)`.

In [None]:
print(f"We have table information on {len(query_1)} observations in the following categories/columns:\n")
q1_keys = (query_1.keys())
q1_keys

<a id=NarrowSearchD></a>
## 2.2. Narrowing Search with Observational Parameters
#### Now we narrow down a bit with some additional parameters (wavelength_region, instrument_name / configuration, dataproduct_type), and sort by exposure time:

In [None]:
query_2 = Observations.query_criteria(s_ra=[165., 166.], s_dec=[+71.,+72.],
                                        wavelength_region="UV", instrument_name=["COS/NUV","COS/FUV"], 
                                        dataproduct_type = "spectrum", filters = 'G160M')

# Next line just simplifies the columns of data we see to some useful data we want to look at right now
limq2 = query_2['obsid','obs_id', 'target_name', 'dataproduct_type', 'instrument_name', 'project', 'filters', 'wavelength_region', 't_exptime'] 
sort_order = query_2.argsort('t_exptime') # This is the index list in order of exposure time, increasing
print(limq2[sort_order])
chosenObs = limq2[sort_order][-1] # Grab the last value of the sorted list
print(f"\n\nThe longest exposure with the G160M filter is: \n\n{chosenObs}") 
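
The `argsort` step above returns row indices rather than a sorted table. A small pure-Python sketch of the same idea follows; the exposure times and IDs are made up for illustration and are not archive results.

```python
# Hypothetical exposure times in seconds, standing in for the 't_exptime' column.
exptimes = [1200.0, 350.5, 4800.0, 2100.0]
obs_ids = ["obs_a", "obs_b", "obs_c", "obs_d"]  # placeholder IDs, not real datasets

# Equivalent of argsort: the indices that would put exptimes in increasing order.
sort_order = sorted(range(len(exptimes)), key=lambda i: exptimes[i])

# Indexing with sort_order reorders the rows; the last one is the longest exposure.
sorted_ids = [obs_ids[i] for i in sort_order]
longest = sorted_ids[-1]
print(sort_order, longest)
```

Because the index list is computed once, it can be reused to reorder any table that shares the same rows, which is why the cell above sorts `limq2` with indices derived from `query_2`.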

#### Caution! 
<img src=./figures/warning.png width ="60" title="CAUTION"> 

Please note that these query results are Astropy tables, which do not always behave like other data structures such as Pandas DataFrames. For instance, the first way of filtering a table shown below is correct, but the second will consistently produce the *wrong result*.

In [None]:
# Searching a table generated with a query
## First correct way using masking
mask = (query_1['obs_id'] == 'lbbd01020') # NOTE, obs_id must be lower-case
print("Correct way yields: \n" , query_1[mask]['obs_id'],"\n\n")

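# Second incorrect way: 'obs_id' == 'LBBD01020' compares two strings and evaluates to False,
# so this indexes the table with False instead of filtering it
print("Incorrect way yields: \n" , query_1['obs_id' == 'LBBD01020']['obs_id'], "\nwhich is NOT what we're looking for!")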
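
The reason the second form goes wrong can be seen without any archive data. This sketch uses a plain Python list as a stand-in for the table rows, on the assumption that the Astropy table treats a `False` index as the integer 0 in the way a list does.

```python
# Two different string literals are never equal, so this is simply False.
index = ('obs_id' == 'LBBD01020')

# bool is a subclass of int in Python, so False behaves like the integer 0.
rows = ["first row", "second row", "third row"]  # stand-in for table rows
picked = rows[index]
print(index, picked)
```

The comparison never touches the column contents, so the outcome is the same whatever ID is typed on the right-hand side.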

<a id=dataprodsD></a>
## 2.3. Choosing and Downloading Data Products

### Now we can choose and download our data products from the archive dataset.

We will first generate a list of data products in the dataset: `product_list`.

In [None]:
product_list = Observations.get_product_list(chosenObs)
product_list[:10] #NOT THE WHOLE THING

Now, we will download *just the* **minimum recommended products** (*mrp*), which are the fully calibrated spectrum (denoted by the suffix `_x1d` or here `x1dsum`) and the association file (denoted by the suffix `_asn`). This association file contains no data, but rather the metadata explaining which exposures produced the `x1dsum` dataset. 
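
As a rough illustration of how these suffixes identify file types, the sketch below filters a list of file names by suffix. `select_by_suffix` is a hypothetical helper, and apart from `ldlm40010_asn.fits` the file names are invented for the example.

```python
def select_by_suffix(filenames, suffixes=("_x1dsum.fits", "_asn.fits")):
    """Keep only the file names that end with one of the given suffixes."""
    # str.endswith accepts a tuple, so a single call checks every suffix.
    return [name for name in filenames if name.endswith(tuple(suffixes))]


example_files = [
    "ldlm40010_asn.fits",
    "ldlm40010_x1dsum.fits",
    "ldlm40alq_rawtag_a.fits",  # illustrative name
    "ldlm40alq_jit.fits",       # illustrative name
]
selected = select_by_suffix(example_files)
print(selected)
```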

If we wanted to download *all* the data from the observation, we would set `mrp_only=False`. This would also include: 
- support files such as the spacecraft's pointing data over time (`jit` files).
- intermediate data products such as calibrated TIME-TAG data (`corrtag` or `corrtag_a`/`corrtag_b` files) and extracted 1-dimensional spectra averaged over exposures with a specific `FP-POS` value (`x1dsum<n>` files).

<img src=./figures/warning.png width ="60" title="CAUTION">

However, use caution when downloading all files: in this case, setting `mrp_only=False` results in the transfer of **7 Gigabytes** of data, which can take a long time to transfer and eat away at your computer's storage! In general, only download the files you need. Since here we only need the final `x1dsum` and `asn` files, we only need to download 2 Megabytes.
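
One way to check the cost before committing to a download is to add up the file sizes in the product list. The sketch below works on mock rows with made-up sizes. It assumes the product table has a `size` column in bytes and marks minimum recommended products with the `productGroupDescription` value shown; verify both against your own `product_list` before relying on them. `total_size_mb` is a hypothetical helper.

```python
MRP_LABEL = "Minimum Recommended Products"  # assumed label, check your table


def total_size_mb(products, mrp_only=False):
    """Sum the 'size' field (bytes) of the given products and return megabytes."""
    total_bytes = sum(
        p["size"]
        for p in products
        if not mrp_only or p["productGroupDescription"] == MRP_LABEL
    )
    return total_bytes / 1e6


# Mock rows with invented sizes, standing in for rows of product_list.
mock_products = [
    {"size": 1_500_000, "productGroupDescription": MRP_LABEL},
    {"size": 500_000, "productGroupDescription": MRP_LABEL},
    {"size": 40_000_000, "productGroupDescription": ""},
]
print(total_size_mb(mock_products), total_size_mb(mock_products, mrp_only=True))
```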

In [None]:
downloads = Observations.download_products(product_list, download_dir=datadir , extension='fits', mrp_only=True, cache=False)

### Exercise 2: *Download the raw counts data on TRAPPIST-1*

In the previous exercise, we found a COS observation of the TRAPPIST-1 system. In case you skipped Exercise 1, the observation's Dataset ID is `LDLM40010`.

Use `astroquery.mast` to download the raw `TIME-TAG` data, rather than the x1d spectra files. See the [COS Data Handbook Ch. 2](https://hst-docs.stsci.edu/cosdhb/chapter-2-cos-data-files/2-4-cos-data-products) for details on TIME-TAG data files. Make sure to get the data from both segments of the FUV detector (i.e. both `RAWTAG_A` and `RAWTAG_B` files). If you do this correctly, there should be five data files for each detector segment.

*Note that some of the obs_id may appear in the table as slightly different, i.e.: ldlm40alq and ldlm40axq, rather than ldlm40010. The main obs_id they fall under is still ldlm40010, and this will still work as a search term. They are linked together by the association file described here in section 2.3.*

In [None]:
# Your answer here




<a id=Astroquery2D></a>
## 2.4. Using astroquery to find data on a series of sources
In this case, we'll look for COS data around several bright globular clusters:
- omega Centauri
- M5
- M13
- M15
- M53

We will first write a comma-separated-value file `objectname_list.csv` listing these sources by their common name.

In [None]:
sourcelist = ['omega Centauri','M5','M13','M15','M53']

# Write the source names on a single line, separated by commas (no trailing comma)
with open('./objectname_list.csv', 'w') as f:
    f.write(",".join(sourcelist))

In [None]:
with open('./objectname_list.csv', 'r', newline = '') as csvFile:
    objList = list(reader(csvFile, delimiter = ','))[0]

print("The input csv file contained the following sources:\n", objList)

globular_cluster_queries = {}
for obj in objList:
    query_x = Observations.query_criteria(objectname = obj, radius = "5 min", instrument_name=['COS/FUV', 'COS/NUV'])
    globular_cluster_queries[obj] = (query_x)
    
globular_cluster_queries
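
With one result table per source, a quick summary of how many observations each query returned helps decide which targets are worth following up. The sketch below uses plain lists with placeholder rows in place of the real query tables; `summarize_queries` is a hypothetical helper that relies only on `len()`, which the query tables also support.

```python
def summarize_queries(queries):
    """Map each target name to the number of rows its query returned."""
    return {name: len(result) for name, result in queries.items()}


# Stand-in results: lists of placeholder rows instead of real query tables.
mock_queries = {"M5": ["row1", "row2"], "M13": [], "M15": ["row1"]}
counts = summarize_queries(mock_queries)
targets_with_data = [name for name, n in counts.items() if n > 0]
print(counts, targets_with_data)
```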

#### Excellent! You've now done the hardest part - finding and downloading the right data. From here, it's generally straightforward to read in and plot the spectrum. We recommend you look into our tutorial on [Viewing a COS Spectrum](#NEEDSLINK).

## Congratulations! You finished this notebook!
### There are more COS data walkthrough notebooks on different topics. You can find them [here](https://github.com/spacetelescope/COS-Notebooks).


---
## About this Notebook
**Author:** [Nat Kerman](mailto:nkerman@stsci.edu)
**Updated On:** 2020-10-28

> *This tutorial was generated to be in compliance with the [STScI style guides](https://github.com/spacetelescope/style-guides) and would like to cite the [Jupyter guide](https://github.com/spacetelescope/style-guides/blob/master/templates/example_notebook.ipynb) in particular.*

## Citations

If you use `astropy`, `matplotlib`, `astroquery`, or `numpy` for published research, please cite the
authors. Follow these links for more information about citations:

* [Citing `astropy`/`numpy`/`matplotlib`](https://www.scipy.org/citing.html)
* [Citing `astroquery`](https://astroquery.readthedocs.io/en/latest/)

---

[Top of Page](#topD)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/> 

<br></br>
<br></br>
<br></br>

## Exercise Solutions:
Note that for many of these, there are multiple ways to get an answer.

In [None]:
## Ex. 1 soln:
dataset_id_ = 'LDLM40010'
exptime_ = 12403.904
print(f"The TRAPPIST-1 COS data is in dataset {dataset_id_}, taken with an exposure time of {exptime_} seconds")

In [None]:
## Ex. 2 soln:
query_3 = Observations.query_criteria(obs_id = 'LDLM40010',
                                        wavelength_region="UV", instrument_name="COS/FUV", filters = 'G130M')

product_list2 = Observations.get_product_list(query_3)
rawRowsA = np.where(product_list2['productSubGroupDescription'] == "RAWTAG_A")
rawRowsB = np.where(product_list2['productSubGroupDescription'] == "RAWTAG_B")
rawRows = np.append(rawRowsA,rawRowsB)
!mkdir ./data/Ex2/
downloads2 = Observations.download_products(product_list2[rawRows], download_dir=datadir+'Ex2/' , extension='fits', mrp_only=False, cache=True)
downloads3 = Observations.download_products(product_list2, download_dir=datadir+'Ex2/' , extension='fits', mrp_only=True, cache=True)

asn_data = Table.read('./data/Ex2/mastDownload/HST/ldlm40010/ldlm40010_asn.fits', hdu = 1)
print(asn_data)