<a id="topD"></a>

# Viewing COS Data

# Learning Goals
### This Notebook is designed to walk the user (*you*) through:
#### - **Downloading Existing Cosmic Origins Spectrograph (*COS*) data from the online archive**
   - [**Using the web browser interface**](#mastD)
   - [**Using the command line interface and the `Python` module `Astroquery`**](#astroqueryD)

# 0. Introduction
#### The Cosmic Origins Spectrograph ([*COS*](https://www.nasa.gov/content/hubble-space-telescope-cosmic-origins-spectrograph)) is an ultraviolet spectrograph on board the Hubble Space Telescope ([*HST*](https://www.stsci.edu/hst/about)) with capabilities in the near ultraviolet (*NUV*) and far ultraviolet (*FUV*).

#### This tutorial aims to prepare you to access the existing COS data of your choice by walking you through downloading a processed spectrum, as well as various calibration files obtained with COS.

- For an in-depth manual to working with COS data and a discussion of caveats and user tips, see the [COS Data Handbook](https://hst-docs.stsci.edu/display/COSDHB/).
- For a detailed overview of the COS instrument, see the [COS Instrument Handbook](https://hst-docs.stsci.edu/display/COSIHB/).


## We will import the following packages:

- numpy to handle array functions
- astropy.io fits for accessing FITS files
- astropy.table Table for creating tidy tables of the data
- astropy.units and astropy.visualization.quantity_support for dealing with units
- matplotlib.pyplot for plotting data
- astroquery.mast Mast and Observations for finding and downloading data from the [MAST](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html) archive

In [1]:
#Make matplotlib look good in a notebook
%matplotlib inline
# Manipulating arrays
import numpy as np
# Reading in data
from astropy.io import fits
from astropy.table import Table
# Dealing with units
import astropy.units as u
# Plotting data
import matplotlib.pyplot as plt

# Downloading data from archive
from astroquery.mast import Mast
from astroquery.mast import Observations
from astroquery.mast import Catalogs

## We will also define a few directories in which to place our data.

In [2]:
# These will be important directories for the notebook
cwd = !pwd
cwd = cwd[0]
# The -p flag avoids an error if ./data already exists
!mkdir -p ./data
datadir = cwd + '/data/'


<a id="downloadD"></a>
# 1. Downloading the Data through the Browser interface

One can search for COS data with either a browser-based GUI or a command-line `Python` library.

##### *A more in-depth MAST archive tutorial can be found [here](https://mast.stsci.edu/api/v0/MastApiTutorial.html).*

<a id="mastD"></a>
## 1.1 The MAST Web Search
A browser GUI for searching HST archival data can be found [here](http://archive.stsci.edu/hst/search.php).

The search page is laid out as in fig. 1.1:
### Fig 1.1
<center><img src=./figures/Mast_hst_searchformQSO.png width ="900" title="MAST Archive search form for a COS data query"> </center>

In this form, we have indicated that we would like to find all archival science data from the **COS far-ultraviolet (FUV) configuration**, taken with any grating while looking at Quasi-Stellar Objects (QSOs) within a 3 arcminute radius of (01h 37m 40s, +33d 09m 32s). The output columns we have selected are visible in the bottom left of Fig 1.1.

Note that if you have a list of coordinates, Observation ID(s), etc. for a series of targets you can click on the "File Upload Form" and attach your list of OBSIDs or identifying features. Then specify which type of data your list contains using the "File Contents" drop-down menu.

Figure 1.2 shows the results of our search shown in Fig 1.1.
### Fig 1.2
<center><img src=figures/QSO_MastSearchRes.png width ="900" title="MAST Archive search results for a COS data query"> </center>


#### We now choose our dataset.
We select LCXV13050 because of its long exposure time. It was taken under a calibration program described as:
> "Project AMIGA: Mapping the Circumgalactic Medium of Andromeda"

The target is a quasar known as [3C48](http://simbad.u-strasbg.fr/simbad/sim-basic?Ident=3c48&submit=SIMBAD+search), one of the first quasars discovered.

Clicking on the dataset, we are taken to a page displaying a preview spectrum (Fig 1.3).
### Fig 1.3

<center><img src=./figures/QSOPreviewSpec.png width ="900" title="MAST Archive preview spectrum of LCXV13050"> </center>

We now return to the [search page](http://archive.stsci.edu/hst/search.php) and enter LCXV13050 under "Dataset" with no other parameters set. After hitting search, we see a single-row table with *just* our dataset, and the option to download datasets. We mark the row we wish to download and click "Submit marked data for retrieval from STDADS". See Fig 1.4.

### Fig. 1.4

<center><img src =figures/LCXV13050_res.png width ="900" title="MAST Archive dataset overview of LCXV13050"> </center>

Now we see a page like the one in Fig 1.5, where we can either sign in with STScI credentials, or simply provide our email to proceed (relatively) anonymously. Make sure to select "Deliver the data to the Archive staging area". Hit "Send Retrieval Request to STDADS" and you will receive an email with instructions on downloading with ftp. You will need to do this step from the command line.

### Fig. 1.5
<center><img src =figures/DownloadOptions.png width ="900" title="Download Options for LCXV13050"> </center>

In the case of this request, the command to retrieve the data was:
>`wget -r --ftp-user=anonymous --ask-password ftps://archive.stsci.edu/stage/anonymous/anonymous42822 --directory-prefix=datadir`

where the password was the email address used, and `datadir` is the directory defined at the beginning of this notebook. All the data is now in the subdirectory `"/archive.stsci.edu/stage/anonymous/anonymous42822/"`.
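After a recursive `wget`, the retrieved files sit several directories deep. As a rough sketch of how one might take stock of them, the helper below walks a directory tree and groups FITS files by the suffix at the end of their name. The function name `group_fits_by_suffix` and the assumption that filenames follow a `<rootname>_<suffix>.fits` pattern are ours, not something the archive guarantees.

```python
from collections import defaultdict
from pathlib import Path


def group_fits_by_suffix(root):
    """Map each filename suffix (e.g. 'x1dsum', 'asn') to the FITS files found under root."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.fits")):
        # Assumes names of the form '<rootname>_<suffix>.fits';
        # a name without an underscore is filed under its whole stem.
        suffix = path.stem.rpartition("_")[2]
        groups[suffix].append(path)
    return dict(groups)
```

Calling `group_fits_by_suffix(datadir)` after the download would then return a dictionary whose keys show which kinds of files arrived, which is a quick way to confirm the retrieval contained what was requested.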

#### Well Done!


### Exercise 1: *Searching the archive for TRAPPIST-1 data*

[TRAPPIST-1](https://en.wikipedia.org/wiki/TRAPPIST-1) is a cool red dwarf with a multiple-exoplanet system. 
- Find its coordinates using the [SIMBAD Basic Search](http://simbad.u-strasbg.fr/simbad/sim-fbasic).
- Use those coordinates in the [MAST web search](https://archive.stsci.edu/hst/search.php) to find all COS exposures of the system.
- Limit the search terms to find the COS dataset taken in the COS far-UV configuration with the grating G130M.

#### What is the dataset ID, and how long was the exposure?

Place your answer in the cell below.

In [3]:
# Your answer here



## 1.2. Searching for a series of observations

Now let's try using the web interface's [file upload form](http://archive.stsci.edu/hst/search.php?form=fuf) to search for a series of observations by their dataset IDs. We're going to look for three observations of the same object, the white dwarf WD1057+719, taken with three different COS gratings. Two are in the FUV and one in the NUV. The dataset IDs are
- LDYR52010
- LBNM01040
- LBBD04040

We make a comma-separated-value txt file with these three obs_ids, and save it as `obsId_list.txt`. Then we link to this file under the **Local File Name** browse menu on the file upload form. We must set the **File Contents** term to Data ID, as that is the identifier we have provided in our file, and we change the **delimiter** to a comma.
Because we are searching by Dataset ID, we don't need to specify any additional parameters to narrow down the data.
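The list file can be written by hand in any text editor, but it can also be generated from `Python`. The sketch below assumes that a single line of comma-separated dataset IDs is an acceptable layout for the upload form; the helper names `write_id_list` and `read_id_list` are hypothetical.

```python
def write_id_list(ids, filename="obsId_list.txt"):
    """Write dataset IDs to a text file as one comma-separated line."""
    with open(filename, "w") as f:
        f.write(",".join(ids) + "\n")
    return filename


def read_id_list(filename="obsId_list.txt"):
    """Read the IDs back, to check the file before uploading it."""
    with open(filename) as f:
        return [item.strip() for item in f.read().split(",") if item.strip()]


# The three dataset IDs used in this section
dataset_ids = ["LDYR52010", "LBNM01040", "LBBD04040"]
```

Running `write_id_list(dataset_ids)` creates `obsId_list.txt` in the current directory, and `read_id_list()` should return the same three IDs.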

### Fig 1.6
<center><img src =figures/FUF_search.png width ="900" title="File Upload Search Form"> </center>

#### Perfect! We now can access all the datasets, as shown in Fig. 1.7:

### Fig. 1.7
<center><img src =figures/FUF_res.png width ="900" title="File Upload Search Results"> </center>

Now, to download all of the relevant files, we can check the **mark** box for all of them, and again hit "Submit marked data for retrieval from STDADS". This time, we want to retrieve **all the calibration files** associated with each dataset, so we check the following boxes:
- Uncalibrated
- Calibrated
- Used Reference Files

See Fig. 1.8.

### Fig 1.8
<center><img src =./figures/DownloadOptions_FUF.png width ="900" title="Download Options for multiple datasets"> </center>


The procedure from here is the same as described above in Section 1.1. Now, when we download with wget, we obtain multiple subdirectories, one for each dataset.

<a id = astroqueryD></a>
# 2. Downloading the data with astroquery.mast
Another way to search for and download archived datasets is within `Python` using the module [`astroquery.mast`](https://astroquery.readthedocs.io/en/latest/mast/mast.html). We have already imported 2 of this module's key tools: `Observations` and `Mast`.

There are *many* options for searching the archive with astroquery, but we will begin with a very general search using the coordinates of WD1057+719, the white dwarf from the last section. Our goal is to find the dataset with the longest exposure time taken with the G160M grating (gratings are listed in the `filters` column of the results).
- Our coordinates were:      (11:00:34.126 +71:38:02.80). 
    - We can search these coordinates as sexagesimal coordinates, or convert them to decimal degrees.
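For readers who want to see the conversion to decimal degrees spelled out, here is a minimal sketch in plain `Python`. The function name `sexagesimal_to_degrees` is our own; in practice `astropy.coordinates.SkyCoord` handles this kind of conversion.

```python
def sexagesimal_to_degrees(ra_hms, dec_dms):
    """Convert 'hh:mm:ss.s' right ascension and '+dd:mm:ss.s' declination to decimal degrees."""
    h, m, s = (float(part) for part in ra_hms.split(":"))
    ra_deg = 15.0 * (h + m / 60.0 + s / 3600.0)  # 1 hour of RA = 15 degrees

    # Take the sign from the string itself so that values like '-00:30:00' stay negative
    sign = -1.0 if dec_dms.strip().startswith("-") else 1.0
    d, dm, ds = (abs(float(part)) for part in dec_dms.split(":"))
    dec_deg = sign * (d + dm / 60.0 + ds / 3600.0)
    return ra_deg, dec_deg


ra, dec = sexagesimal_to_degrees("11:00:34.126", "+71:38:02.80")
print(f"RA = {ra:.5f} deg, Dec = {dec:.5f} deg")
```

The result is roughly RA = 165.142 and Dec = 71.634 degrees, which falls inside the `s_ra=[165., 166.]` and `s_dec=[+71.,+72.]` box used in a later query.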

In [4]:
query_1 = Observations.query_object("11:00:34.126 +71:38:02.80", radius="5 sec")

This command has generated a table of objects called **"query_1"**. We can see what information we have on the objects in the table by printing its *`keys`*, and see how many objects are in the table with `len(query_1)`.

In [5]:
print(f"We have table information on {len(query_1)} observations in the following categories/columns:\n")
q1_keys = (query_1.keys())
q1_keys

We have table information on 708 observations in the following categories/columns:



['intentType',
 'obs_collection',
 'provenance_name',
 'instrument_name',
 'project',
 'filters',
 'wavelength_region',
 'target_name',
 'target_classification',
 'obs_id',
 's_ra',
 's_dec',
 'dataproduct_type',
 'proposal_pi',
 'calib_level',
 't_min',
 't_max',
 't_exptime',
 'em_min',
 'em_max',
 'obs_title',
 't_obs_release',
 'proposal_id',
 'proposal_type',
 'sequence_number',
 's_region',
 'jpegURL',
 'dataURL',
 'dataRights',
 'mtFlag',
 'srcDen',
 'obsid',
 'distance']

#### Now we narrow down a bit with some additional parameters (`wavelength_region`, `instrument_name`, `dataproduct_type`, and `filters`), and sort by exposure time:

In [6]:
query_2 = Observations.query_criteria(s_ra=[165., 166.], s_dec=[+71.,+72.],
                                        wavelength_region="UV", instrument_name=["COS/NUV","COS/FUV"], 
                                        dataproduct_type = "spectrum", filters = 'G160M')

limq2 = query_2['obsid','obs_id', 'target_name', 'dataproduct_type', 'instrument_name', 'project', 'filters', 'wavelength_region', 't_exptime'] # This just simplifies the columns of data we see to some useful data we want to look at right now
sort_order = query_2.argsort('t_exptime') # This is the index list in order of exposure time, increasing
print(limq2[sort_order])
chosenObs = limq2[sort_order][-1] # Grab the last value of the sorted list
print(f"\n\nThe longest exposure with the G160M filter is: \n{chosenObs}") 

  obsid      obs_id  target_name ... filters wavelength_region t_exptime
---------- --------- ----------- ... ------- ----------------- ---------
2003183667 lbe702iqs  WD1057+719 ...   G160M                UV       1.0
2003183129 lbb916lbq  WD1057+719 ...   G160M                UV       1.0
2003183190 lbb9x3ckq  WD1057+719 ...   G160M                UV       1.0
2003177779 la9r02dfq WD-1057+719 ...   G160M                UV       1.0
2003183137 lbb917k7q  WD1057+719 ...   G160M                UV       1.0
2003183145 lbb918scq  WD1057+719 ...   G160M                UV       1.0
2003177778 la9r02deq WD-1057+719 ...   G160M                UV     108.0
2003183144 lbb918sbq  WD1057+719 ...   G160M                UV     108.0
2003183136 lbb917k6q  WD1057+719 ...   G160M                UV     108.0
2003183128 lbb916laq  WD1057+719 ...   G160M                UV     108.0
       ...       ...         ... ...     ...               ...       ...
2003887628 lbnm03030  WD1057+719 ...   G160M       

#### Caution! 
<img src=./figures/warning.png width ="60" title="MAST Archive preview spectrum of LBBD01020"> 

Please note that these query results are Astropy tables, and they do not always behave the way other data structures, such as Pandas DataFrames, do. For instance, the first way of filtering a table shown below is correct, but the second will consistently produce the *wrong result*.

In [7]:
# Searching a table generated with a query
## First correct way using masking
mask = (query_1['obs_id'] == 'lbbd01020') # NOTE, obs_id must be lower-case
print("Correct way yields: \n" , query_1[mask]['obs_id'],"\n\n")

# Second incorrect way: 'obs_id' == 'LBBD01020' compares two plain strings and evaluates to False,
# which indexes the table like 0, so this returns the first row regardless of its obs_id
print("Incorrect way yields: \n" , query_1['obs_id' == 'LBBD01020']['obs_id'], "\nwhich is NOT what we're looking for")

Correct way yields: 
   obs_id 
---------
lbbd01020 


Incorrect way yields: 
 tess-s0014-4-3 
which is NOT what we're looking for
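The reason the second form misbehaves can be shown without any archive query. In the sketch below a plain list stands in for the table rows (the row values are illustrative), and the comparison between two string literals is evaluated by `Python` before any indexing happens.

```python
rows = ["tess-s0014-4-3", "lbbd01020", "lbnm01040"]  # stand-in for a table's rows

# Two different string literals are compared, so the result is simply False
index = ("obs_id" == "LBBD01020")
print(index, int(index))

# False behaves like the integer 0 when used as an index: we get the first row
print(rows[index])

# Comparing each row's value selects by content instead of by position
matches = [row for row in rows if row == "lbbd01020"]
print(matches)
```

The boolean mask `query_1['obs_id'] == 'lbbd01020'` plays the role of the last comprehension here: it compares the column's values element by element rather than comparing the column's *name* to a string.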


### Now we can choose and download our data products from the archive dataset.

We will first generate a list of data products in the dataset: `product_list`. Then we will download *just the* **minimum recommended products**, which are the fully calibrated spectrum (denoted by the suffix `_x1d` or, here, `_x1dsum`) and the association file (denoted by the suffix `_asn`).

In [8]:
product_list = Observations.get_product_list(chosenObs)
product_list

obsID,obs_collection,dataproduct_type,obs_id,description,type,dataURI,productType,productGroupDescription,productSubGroupDescription,productDocumentationURL,project,prvversion,proposal_id,productFilename,size,parent_obsid
str10,str3,str8,str9,str62,str1,str44,str9,str28,str9,str1,str6,str5,str5,str27,int64,str10
2003886225,HST,spectrum,lbek02020,DADS JIF file,C,mast:HST/product/lbek02020_jif.fits,AUXILIARY,--,JIF,--,CALCOS,--,12086,lbek02020_jif.fits,216000,2003886225
2003886225,HST,spectrum,lbek02020,DADS JIT file,C,mast:HST/product/lbek02020_jit.fits,AUXILIARY,--,JIT,--,CALCOS,--,12086,lbek02020_jit.fits,279360,2003886225
2003886225,HST,spectrum,lbek02020,DADS TRL file - Processing log,C,mast:HST/product/lbek02020_trl.fits,AUXILIARY,--,TRL,--,CALCOS,--,12086,lbek02020_trl.fits,184320,2003886225
2003886225,HST,spectrum,lbek02020,DADS X1S file - Summed 1D spectrum COS,C,mast:HST/product/lbek02020_x1dsum1.fits,AUXILIARY,--,X1DSUM1,--,CALCOS,3.3.9,12086,lbek02020_x1dsum1.fits,1287360,2003886225
2003886225,HST,spectrum,lbek02020,DADS X2S file - Summed 1D spectrum COS,C,mast:HST/product/lbek02020_x1dsum2.fits,AUXILIARY,--,X1DSUM2,--,CALCOS,3.3.9,12086,lbek02020_x1dsum2.fits,1287360,2003886225
2003886225,HST,spectrum,lbek02020,DADS X3S file - Summed 1D spectrum COS,C,mast:HST/product/lbek02020_x1dsum3.fits,AUXILIARY,--,X1DSUM3,--,CALCOS,3.3.9,12086,lbek02020_x1dsum3.fits,1287360,2003886225
2003886225,HST,spectrum,lbek02020,DADS X4S file - Summed 1D spectrum COS,C,mast:HST/product/lbek02020_x1dsum4.fits,AUXILIARY,--,X1DSUM4,--,CALCOS,3.3.9,12086,lbek02020_x1dsum4.fits,1287360,2003886225
2003886225,HST,spectrum,lbek02020,DADS ASN file - Association ACS/WFC3/STIS,C,mast:HST/product/lbek02020_asn.fits,AUXILIARY,Minimum Recommended Products,ASN,--,CALCOS,3.3.9,12086,lbek02020_asn.fits,11520,2003886225
2003886225,HST,spectrum,lbek02020,DADS XSM file - Calibrated combined extracted 1D spectrum COS,C,mast:HST/product/lbek02020_x1dsum.fits,SCIENCE,Minimum Recommended Products,X1DSUM,--,CALCOS,3.3.9,12086,lbek02020_x1dsum.fits,1287360,2003886225
2003886225,HST,spectrum,lbek02020,Preview-Thumb,C,mast:HST/product/lbek02020_x1dsum_thumb.png,THUMBNAIL,--,--,--,CALCOS,3.3.9,12086,lbek02020_x1dsum_thumb.png,6474,2003886225
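To make the idea of "minimum recommended products" concrete, the sketch below copies four rows from the product table above into plain dictionaries and selects those whose `productGroupDescription` is `Minimum Recommended Products`. It illustrates the selection only; it is not how `astroquery` implements `mrp_only`, and the function name is our own.

```python
# Four rows copied from the product table above (only the columns needed here)
products = [
    {"productFilename": "lbek02020_jif.fits", "productGroupDescription": "--"},
    {"productFilename": "lbek02020_x1dsum1.fits", "productGroupDescription": "--"},
    {"productFilename": "lbek02020_asn.fits",
     "productGroupDescription": "Minimum Recommended Products"},
    {"productFilename": "lbek02020_x1dsum.fits",
     "productGroupDescription": "Minimum Recommended Products"},
]


def minimum_recommended(records):
    """Return the filenames flagged as minimum recommended products."""
    return [
        rec["productFilename"]
        for rec in records
        if rec["productGroupDescription"] == "Minimum Recommended Products"
    ]


print(minimum_recommended(products))
```

The two filenames selected, `lbek02020_asn.fits` and `lbek02020_x1dsum.fits`, are the same two files that the download cell below retrieves.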


In [9]:
downloads = Observations.download_products(product_list, download_dir=datadir , extension='fits', mrp_only=True, cache=False)

Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/lbek02020_asn.fits to /Users/nkerman/Projects/Walkthroughs/DataDL/data/mastDownload/HST/lbek02020/lbek02020_asn.fits ... [Done]
Downloading URL https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/lbek02020_x1dsum.fits to /Users/nkerman/Projects/Walkthroughs/DataDL/data/mastDownload/HST/lbek02020/lbek02020_x1dsum.fits ... [Done]


### Exercise 2: *Download the raw counts data on TRAPPIST-1*

In the previous exercise, we found a COS observation of the TRAPPIST-1 system. In case you skipped Exercise 1, the observation's Dataset ID is `LDLM40010`.

Use `astroquery.mast` to download the raw `TIME-TAG` data, rather than the x1d spectra files. See the [COS Data Handbook Ch. 2](https://hst-docs.stsci.edu/cosdhb/chapter-2-cos-data-files/2-4-cos-data-products) for details on TIME-TAG data files. Make sure to get the data from both segments of the FUV detector (i.e. both `RAWTAG_A` and `RAWTAG_B` files). If you do this correctly, there should be five data files for each detector segment.

*Note that some of the obs_id values may appear in the table in a slightly different form, e.g. ldlm40alq and ldlm40axq, rather than ldlm40010. The main obs_id they fall under is still ldlm40010, and this will still work as a search term. They are linked together by the association file described here in section 2.3.*

In [10]:
# Your answer here




### We will now demonstrate using astroquery to find observations for a series of sources
In this case, we'll look for COS data around several bright globular clusters. The clusters themselves are specified by common name in the file `objectname_list.csv`.
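The reading code in the next cell keeps only the first row of the CSV file, so it expects all the names on a single comma-separated line. The sketch below shows that layout with hypothetical contents: only `omega Centauri` appears in this notebook's output, and the other two names are placeholders. An in-memory buffer stands in for the file.

```python
import csv
import io

# Hypothetical contents of objectname_list.csv: one line of comma-separated names.
# Only 'omega Centauri' is taken from this notebook; the other names are placeholders.
example_contents = "omega Centauri,M5,47 Tuc\n"


def read_object_names(handle):
    """Return the names on the first line of a CSV file-like object."""
    first_row = next(csv.reader(handle, delimiter=","))
    return [name.strip() for name in first_row if name.strip()]


names = read_object_names(io.StringIO(example_contents))
print(names)
```

A file with one name per line would need different reading code, since only the first line is used here.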

In [11]:
from csv import reader

with open('./objectname_list.csv', 'r', newline = '') as csvFile:
    objList = list(reader(csvFile, delimiter = ','))[0]


globular_cluster_queries = {}
for obj in objList:
    query_x = Observations.query_criteria(objectname = obj, radius = "5 min", instrument_name=['COS/FUV', 'COS/NUV'])
    globular_cluster_queries[obj] = (query_x)
    
globular_cluster_queries

{'omega Centauri': <Table masked=True length=15>
 dataproduct_type calib_level obs_collection ...   objID1        distance     
       str8          int64         str3      ...   str10         float64      
 ---------------- ----------- -------------- ... ---------- ------------------
         spectrum           2            HST ... 2048682699  97.04717859213693
            image           2            HST ... 2048665922 295.35269365022396
         spectrum           1            HST ... 2048802296  87.65983662909147
            image           2            HST ... 2048650864  94.47272986343471
         spectrum           1            HST ... 2048803154  87.65983662909147
         spectrum           3            HST ... 2050133345  94.47272986343471
         spectrum           3            HST ... 2050133503  87.65983662909147
         spectrum           3            HST ... 2050137845  97.04717859213693
            image           2            HST ... 2048651830  97.04717859213693
   

#### Excellent! You've now done the hardest part - finding and downloading the data. From here, it's generally straightforward to read in and plot the spectrum. We recommend you look into our tutorial on [Viewing a COS Spectrum](#NEEDSLINK).

## Congratulations! You finished this notebook!
### There are more COS data walkthrough notebooks on different topics. You can find them [here](https://github.com/nkerman/Walkthroughs).


---
## About this Notebook
**Author:** [Nat Kerman](mailto:nkerman@stsci.edu)
**Updated On:** 2020-10-06

> *This tutorial was generated to be in compliance with the [STScI style guides](https://github.com/spacetelescope/style-guides) and would like to cite the [Jupyter guide](https://github.com/spacetelescope/style-guides/blob/master/templates/example_notebook.ipynb) in particular.*

## Citations

If you use `astropy`, `matplotlib`, `astroquery`, or `numpy` for published research, please cite the
authors. Follow these links for more information about citations:

* [Citing `astropy`/`numpy`/`matplotlib`](https://www.scipy.org/citing.html)
* [Citing `astroquery`](https://astroquery.readthedocs.io/en/latest/)

---

[Top of Page](#topD)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/> 



# Any miscellaneous cells/quick hacks/ exercise solutions below:

## Exercise Solutions:

In [12]:
## Ex. 1 soln:
import astropy.units as u # Needed for the exposure-time unit below

dataset_id_ = 'LDLM40010'
exptime_ = 12403.904*u.s
print(f"The TRAPPIST-1 COS data is in dataset {dataset_id_}, taken with an exposure time of {exptime_}")

In [None]:
## Ex. 2 soln:
query_3 = Observations.query_criteria(obs_id = 'LDLM40010',
                                        wavelength_region="UV", instrument_name="COS/FUV", filters = 'G130M')

product_list2 = Observations.get_product_list(query_3)
# Find the rows holding raw TIME-TAG data from each FUV detector segment, then combine the indices
rawRowsA = np.where(product_list2['productSubGroupDescription'] == "RAWTAG_A")
rawRowsB = np.where(product_list2['productSubGroupDescription'] == "RAWTAG_B")
rawRows = np.append(rawRowsA,rawRowsB)

# First download the raw TIME-TAG files, then the minimum recommended products (which include the association file)
downloads2 = Observations.download_products(product_list2[rawRows], download_dir=datadir+'Ex2/' , extension='fits', mrp_only=False, cache=True)
downloads3 = Observations.download_products(product_list2, download_dir=datadir+'Ex2/' , extension='fits', mrp_only=True, cache=True)

# The association file lists the exposures that belong to this dataset
asn_data = Table.read('./data/Ex2/mastDownload/HST/ldlm40010/ldlm40010_asn.fits', hdu = 1)
print(asn_data)

<a id="readInV"></a>
# 2. Reading in the data

The calibrated spectrum data is now on our local machine as: **`current-working-directory`**`/data/mastDownload/HST/LBBD01020/LBBD01020_x1dsum.fits`

## 2.1. Investigating the Data - *Basics*
We want to learn a bit about this file, then input the data.

We can learn a great deal about our data from its primary fits header (see cell below).

In [None]:
x1d_filepath = './data/mastDownload/HST/LBBD01020/LBBD01020_x1dsum.fits' # Make sure these point to your new data!
asn_filepath = './data/mastDownload/HST/LBBD01020/LBBD01020_asn.fits'    # This is the association file


header_x1d = fits.getheader(x1d_filepath)
header_asn = fits.getheader(asn_filepath)

header_x1d[:18],"...",header_x1d[45:50] #This is the main 0th header; THIS IS NOT THE WHOLE THING!

For instance, we notice that the data was taken in [TIME-TAG mode](https://hst-docs.stsci.edu/cosdhb/chapter-5-cos-data-analysis/5-4-working-with-time-tag-data) and calibrated with `calcos` version `3.3.9`

However, some metadata information, such as the time of observation and calculated exposure time, can be found in the **1st header** rather than the 0th. We will read and print this below:

In [None]:
with fits.open(x1d_filepath) as hdul:
    # It's also perfectly valid to access the 1st extension header
    # using fits.getheader(x1d_filepath, ext=1)
    header1_x1d = hdul[1].header
    date = header1_x1d['DATE-OBS']
    time = header1_x1d['TIME-OBS']
    exptime = header1_x1d['EXPTIME']

print(f"This data was taken on {date} starting at {time} with a net exposure time of {exptime} seconds.")

## 2.2. Reading in the `x1d` Main Data
#### The simplest way to read in the `x1d` data from FITS extension \#1 is with the `Table.read` method of `astropy.table`, used in the cell below; the [`astropy.io.fits.getdata`](https://docs.astropy.org/en/stable/io/fits/api/files.html#astropy.io.fits.getdata) function is an alternative.
We can then display all the fields contained in this data table using the `.colnames` attribute. You can ignore the warnings about multiple slashes that come up while reading in the data. The proper units are displayed in LaTeX as:

- 'erg /s /cm\**2 /angstrom'  ==> $$\ \ erg\ s^{-1}\ cm^{-2}\ Angstrom^{-1}$$
- 'count /s /pixel'          ==> $$\ \ counts\ s^{-1}\ pixel^{-1}$$


#### In the case of the NUV data, we see an astropy-style table of 3 rows (next Python cell). These rows contain data from the 3 stripes of the NUV spectrum (see Fig. 2.1).

The columns of this table include some scalar values which *describe* the data (e.g. EXPTIME), while the columns containing actual data hold it in arrays (e.g. WAVELENGTH, FLUX, etc.).

### Fig. 2.1 from [COS DHB Fig. 1.10](https://hst-docs.stsci.edu/cosdhb/chapter-1-cos-overview/1-2-cos-physical-configuration#id-1.2COSPhysicalConfiguration-Figure1.10)

<center><img src=figures/ch1_cos_overview3.10.jpg width ="900" title="An example COS NUV spectrum"> </center>

An important thing to note about this data in particular is that with the grating used here (G230L), segment C is actually a 2nd order spectrum with a higher dispersion (x2) and ~5% contamination from the 1st order spectrum. See the [COS Data Handbook](https://hst-docs.stsci.edu/cosdhb/chapter-1-cos-overview/1-1-instrument-capabilities-and-design), *especially Fig. 1.3,* for more information.

In [None]:
x1d_data = Table.read(x1d_filepath)
columns = x1d_data.colnames

print("\n\n",columns, "\n\n")

x1d_data

## 2.3. The Association `_asn` file

#### It's also likely we will want to see what observations went into making this calibrated spectrum. This information is contained in the Association (`_asn`) file, under the MEMNAME column.

In [None]:
fits.info(asn_filepath) # Prints a summary of the file itself and returns None, so we don't wrap it in print()
print('\n\n----\n')
asn_data = Table.read(asn_filepath, hdu = 1)
print(asn_data)

We see that our data has MEMTYPE = `PROD-FP`, meaning it is an output science product (see COS DHB [Table 2.6](https://hst-docs.stsci.edu/cosdhb/chapter-2-cos-data-files/2-4-cos-data-products)).
This particular association file lists only one `EXP-FP` (Input science exposure), with the `MEMNAME` (Dataset ID) LBBD01HPQ. We could search for this dataset, if we needed the raw data.
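As a small illustration of how the `MEMTYPE` column separates inputs from outputs, the sketch below represents the association table as plain dictionaries, using the two member names mentioned above, and picks out the input exposures. The helper name `members_of_type` is hypothetical.

```python
# Stand-in for the association table described above (MEMNAME and MEMTYPE columns only)
asn_rows = [
    {"MEMNAME": "LBBD01HPQ", "MEMTYPE": "EXP-FP"},   # input science exposure
    {"MEMNAME": "LBBD01020", "MEMTYPE": "PROD-FP"},  # output science product
]


def members_of_type(rows, memtype):
    """Return the MEMNAME of every association member with the given MEMTYPE."""
    return [row["MEMNAME"] for row in rows if row["MEMTYPE"].strip() == memtype]


print(members_of_type(asn_rows, "EXP-FP"))
```

With the real table, a boolean mask such as `asn_data[asn_data['MEMTYPE'] == 'EXP-FP']` should give the equivalent selection, in the same way the query tables were masked earlier.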

<a id="plottingV"></a>
# 3. Plotting our Data

## 3.1. Examining the first-order spectrum

#### Let's grab the simplest data we need to plot a spectrum: **WAVELENGTH, FLUX, and ERROR**.
- *Here, ERROR is flux error*

We will limit this first plot to only our first-order spectra (so we will exclude segment C). We will plot these two spectra together (in the top panel) and then segment-by-segment in the lower two panels.
- The former view gives a better sense of the continuum
- The latter views show zooms which allow us to meaningfully view the errors and see specific emission/absorption features.

In [None]:
fig, (ax0, ax1, ax2) = plt.subplots(3, 1, figsize = (16, 16))

# One color per segment, keyed by the SEGMENT values in the table.
# segment_colors is not defined elsewhere in this notebook; these color choices are arbitrary.
segment_colors = {x1d_data[j]["SEGMENT"]: f"C{j}" for j in range(len(x1d_data))}

for i in range(2): # Only the first-order spectra (the first two rows of the table)
    wvln, flux, fluxErr = x1d_data[i]["WAVELENGTH"], x1d_data[i]["FLUX"], x1d_data[i]["ERROR"]
    segment = x1d_data[i]["SEGMENT"]

    # Top panel: both segments together
    ax0.plot(wvln, flux,
                linestyle = "-", label = segment, c = segment_colors[segment])
    ax0.legend(fontsize = 20 )
    ax0.set_title("Fig 3.1\nFirst-order NUV Spectra with G230L Grating", size = 35)
    
    if i == 0: # Middle panel: zoom on the first segment, with errorbars
        ax1.errorbar(x = wvln, y = flux, yerr = fluxErr,
                    linestyle = "",  label = segment, c = segment_colors[segment] )
        ax1.set_xlim(2100,2510)
        ax1.set_ylim(0.5E-13,1.5E-13)
        ax1.legend(fontsize = 20 )
        ax1.set_ylabel(r'Flux [$erg\ s^{-1}\ cm^{-2}\ Angstrom^{-1}$]', size = 30)
        
    if i == 1: # Bottom panel: zoom on the second segment, with errorbars
        ax2.errorbar(x = wvln, y = flux, yerr = fluxErr,
                    linestyle = "",  label = segment, c = segment_colors[segment] )
        ax2.set_xlim(3190,3600)
        ax2.set_ylim(1E-14,4.5E-14)
        ax2.legend(fontsize = 20 )
        ax2.set_xlabel(r'Wavelength [$\AA$]', size = 30)

plt.tight_layout()
plt.show()

## 3.2. Examining the second-order spectrum

On segment C, we have a more dispersed spectrum over a smaller chunk of the NUV. Below, we plot this portion over the first-order spectrum from segment A.

Clearly, our errorbars on the second-order spectrum are *much larger*. However, if we need a very high dispersion, for instance, to split close-together lines, the lower panel (zoom) shows a potential advantage of segment C. Its higher spectral sampling rate allows for finer distinctions in wavelength.

In [None]:
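fig, (ax0,ax1) = plt.subplots(2,1,figsize = (16, 8))

# One color per segment, keyed by the SEGMENT values in the table (the color choices are arbitrary)
segment_colors = {x1d_data[j]["SEGMENT"]: f"C{j}" for j in range(len(x1d_data))}

for i in [2,0]: #We reverse this order, so that the 0th segment (A) is plotted OVER the 2nd segment (C). It's purely aesthetic
    wvln, flux, fluxErr = x1d_data[i]["WAVELENGTH"], x1d_data[i]["FLUX"], x1d_data[i]["ERROR"]
    segment = x1d_data[i]["SEGMENT"]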
    
    if i == 0:
        ax0.errorbar(x = wvln, y = flux, yerr = fluxErr,
                    linestyle = "",  label = segment, c = 'k', alpha = 0.3)
        
        ax1.errorbar(x = wvln, y = flux, yerr = fluxErr,
                    linestyle = "",  label = segment, c = 'k', alpha = 0.3)
        
    if i == 2:
        ax0.errorbar(x = wvln, y = flux, yerr = fluxErr,
                    linestyle = "",  label = segment, c = segment_colors[segment] , alpha = 0.8)
        
        ax1.errorbar(x = wvln, y = flux, yerr = fluxErr,
                    linestyle = "",  label = segment, c = segment_colors[segment] , alpha = 0.8)

 
ax0.set_xlim(2100,2510)
ax0.set_ylim(0.48E-13,1.75E-13)

ax0.set_title("Fig 3.2\nOverlay of First and Second-order Spectrum with G230L Grating", size = 25)

ax1.set_xlim(2200,2210)
ax1.set_ylim(0.6E-13,1.75E-13)

ax1.set_xlabel(r'Wavelength [$\AA$]', size = 20)
fig.text(-0.015, 0.5, r'Flux [$erg\ s^{-1}\ cm^{-2}\ Angstrom^{-1}$]', size = 20, va='center', rotation='vertical')       

#Let's add a dashed rectangle to show where we are zooming into in the lower panel.
ax0.plot([2210,2200,2200,2210,2210],[0.6E-13,0.6E-13,1.7E-13,1.7E-13,0.6E-13], 
        'b', linewidth = 5, linestyle = '--', alpha = 0.7, label = "Lower panel zoom bounds")


handles,labels = ax0.get_legend_handles_labels() # These lines just ensure that the legend is ordered correctly (first ax0)
handles = [handles[2], handles[1], handles[0]]
labels = [labels[2], labels[1], labels[0]]
ax0.legend(handles, labels, fontsize = 20 , loc = 'upper right')
handles,labels = ax1.get_legend_handles_labels() # Now for ax1
handles = [handles[1], handles[0]]
labels = [labels[1], labels[0]]
ax1.legend(handles,labels, fontsize = 20 , loc = 'upper right')

plt.tight_layout()
plt.show()
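One way to put a number on "higher spectral sampling" is to compare the typical wavelength step between neighbouring pixels. The sketch below uses two made-up wavelength grids, built so that one is sampled twice as finely as the other; they are not taken from the COS data, and `median_step` is a hypothetical helper.

```python
from statistics import median


def median_step(wavelengths):
    """Median spacing between consecutive wavelength samples."""
    steps = [b - a for a, b in zip(wavelengths, wavelengths[1:])]
    return median(steps)


# Made-up wavelength grids in Angstroms covering the same 10 Angstrom window
coarse_grid = [2200.0 + 0.4 * i for i in range(26)]
fine_grid = [2200.0 + 0.2 * i for i in range(51)]

ratio = median_step(coarse_grid) / median_step(fine_grid)
print(f"The fine grid has about {ratio:.1f} times as many samples per Angstrom")
```

Applied to the real data, the same function could be called on the non-zero portion of each segment's `WAVELENGTH` array to compare segments A and C directly.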

### Exercise 3: *Removing the zeros*

All of the segments have real, useful data, bookended on each side by zeros (see Fig. 3.1, upper panel).

1. Plot a histogram of each segment's FLUX to visualize this 
2. Plot segments A and B as in Fig. 3.1, without these zero values

In [None]:
# Your answer here




## 3.3. Reading in and plotting with `specutils`
### *(Optional)*
An alternative way to read in spectral data is with the [`specutils` package](https://specutils.readthedocs.io/en/stable/), which contains quite a bit of functionality for working with spectra. It also can make dealing with units easier, as it works well with astropy units and other modules.

Below is a simple example of using `specutils` to read-in, plot, and continuum-normalize our spectrum.

In [None]:
import specutils
from specutils.fitting import fit_generic_continuum
spec1d = specutils.Spectrum1D.read(x1d_filepath)

fig, (ax0,ax1) = plt.subplots(2,1,figsize = (10,8))

#Plot the non-normalized flux
ax0.plot(spec1d.wavelength , spec1d.flux)
ax0.set_title("Un-normalized flux")

# Continuum Normalize the flux:
cont_norm_spec1d = spec1d /fit_generic_continuum(spec1d)(spec1d.spectral_axis) 

#Plot the normalized flux
ax1.plot(cont_norm_spec1d.wavelength , cont_norm_spec1d.flux)
ax1.set_title("Normalized flux")

plt.tight_layout()
plt.show()
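To show what continuum normalization means at its simplest, the sketch below fits a straight line to the flux by ordinary least squares and divides the flux by that line. This is a deliberately simplified stand-in written in plain `Python`: it is not the model that `fit_generic_continuum` fits, and the function names are our own.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize_by_line(wavelength, flux):
    """Divide the flux by a straight-line continuum fitted to the whole spectrum."""
    slope, intercept = fit_line(wavelength, flux)
    return [f / (slope * w + intercept) for w, f in zip(wavelength, flux)]


# Made-up spectrum: a sloped continuum with no spectral lines
wavelength = [2200.0, 2210.0, 2220.0, 2230.0]
flux = [1.0e-13, 1.1e-13, 1.2e-13, 1.3e-13]
normalized = normalize_by_line(wavelength, flux)
print(normalized)
```

For a featureless, linear continuum the normalized values all come out close to 1. Real spectra contain absorption and emission lines that pull a simple fit away from the true continuum, which is one reason to prefer a dedicated tool such as `specutils`.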

In [None]:
## Ex. 3 Soln:
fig, (ax0,ax1) = plt.subplots(2,1,figsize = (16, 8))

for q in range(3):
    ax0.hist(x1d_data[q]["FLUX"], alpha = 0.5, label = x1d_data[q]["SEGMENT"], bins = np.linspace(0,1.8E-13,40))

for q in range(2):
    mask_no_zeros = (x1d_data[q]["FLUX"] > 0)
    ax1.plot(x1d_data[q]["WAVELENGTH"][mask_no_zeros],x1d_data[q]["FLUX"][mask_no_zeros], alpha = 0.5, label = x1d_data[q]["SEGMENT"])

ax0.legend()
ax0.set_title("Flux distributions", size = 30)

ax1.legend()
ax1.set_title("First-order spectrum without zeros", size = 30)

ax0.set_xlabel(r'Flux [$erg\ s^{-1}\ cm^{-2}\ Angstrom^{-1}$]' + '\n----\n', size = 20)
ax1.set_xlabel(r'Wavelength [$\AA$]', size = 20)
ax1.set_ylabel(r'Flux [$erg\ s^{-1}\ cm^{-2}\ Angstrom^{-1}$]', size = 12)

plt.tight_layout()
