# JWST Fetch Crowded Field Spectral Products

## Introduction

This tutorial will demonstrate how to use [astroquery.mast](https://astroquery.readthedocs.io/en/latest/mast/mast.html) to retrieve and process JWST data products. 
Our goal is to retrieve the set of spectral data products needed to run the JWST processing pipeline. Once we have these products, we'll perform our own extractions within crowded fields (e.g., within a stellar cluster, the Galactic Center, or a deep image of a galaxy cluster). 

This notebook walks through the process as follows:

* First we will demonstrate how to use `astroquery.mast` to perform the three basic steps of data search and retrieval. We will use images from the Early Release Observations (ERO) of NGC 3132 (the Southern Ring Nebula) for this first part. The basic steps to download data from MAST are:
    1. Conduct a search for observations.
    2. Retrieve the set of products for those observations.
    3. Download the products you want.
    
    
* Then we will walk through these steps for a wide-field slitless spectroscopy program, and explain some of the challenges involved when there can be thousands of products associated with the observations. We will show how you can filter these products to only get the ones you need, and demonstrate the batch download technique we recommend when requesting large amounts of data, to make your retrieval process as smooth as possible. 

## Table of Contents
- [Imports](#Imports)<br>
- [The Basics Of Searching and Downloading Data Through astroquery.mast](#The-Basics-Of-Searching-and-Downloading-Data-Through-astroquery.mast)<br>
    - [Step 1: Searching For Observations](#Step-1:-Searching-For-Observations)<br>
        - [Search By Object Coordinates](#Search-By-Object-Coordinates)<br>
        - [Search By Resolvable Object Name](#Search-By-Resolvable-Object-Name)<br>
        - [Search By Observational Metadata](#Search-By-Observational-Metadata)<br>
    - [Step 2: Retrieving Data Products](#Step-2:-Retrieving-Data-Products)
    - [Step 3: Downloading Data Products](#Step-3:-Downloading-Data-Products)
- [Downloading Calibrated Data For Multi-Object Spectral Programs](#Downloading-Calibrated-Data-For-Multi-Object-Spectral-Programs)

## Imports
* We use the `astropy.units` module to define physical units.
* We use the `SkyCoord` class from the `astropy.coordinates` module to define coordinates.
* We use the `fits` module from `astropy.io` to read data contained in FITS files.
* We import several classes from `astropy.visualization` for applying scaling and stretches.
* The `Observations` class from the `astroquery.mast` module allows you to query by coordinates, resolvable target names, or observational metadata like program IDs, filters, or exposure times.

* We use `matplotlib.pyplot` for plotting and image display.

In [None]:
from astropy import units as u
from astropy.coordinates import SkyCoord
from astropy.io import fits
from astropy.visualization import ZScaleInterval, SquaredStretch, ImageNormalize
from astroquery.mast import Observations
import matplotlib.pyplot as plt

## The Basics Of Searching and Downloading Data Through astroquery.mast

We'll use the ERO observations of the Southern Ring Nebula to show you how to search for observations. You can query the `Observations` class of `astroquery.mast` in a variety of ways, including [by position](https://astroquery.readthedocs.io/en/latest/mast/mast.html#positional-queries) or [by observational metadata](https://astroquery.readthedocs.io/en/latest/mast/mast.html#observation-criteria-queries).

### Step 1: Searching For Observations

The first step in downloading JWST data products through `astroquery.mast` is to search for observations of interest.  There are several ways to find observations of interest, so in this section, we will demonstrate how to do: 

* a cone search by providing coordinates, 
* a cone search by providing a resolvable target name, 
* and a search based on observational metadata.

#### Search By Object Coordinates

First, let's search for observations by sky coordinates of the object directly.

In [None]:
# Define the coordinates of the object.
obj_coords = SkyCoord("10:07:01", "-40:26:14", unit=(u.hourangle, u.deg))

# Conduct a cone search centered on these coordinates.  The default search radius is 0.2 degrees.  Let's use a much
# smaller search radius of one arcminute.
obs_table = Observations.query_region(obj_coords, radius="1.0 arcmin")

We get back an Astropy `Table` containing observations whose footprints fall within our search radius.  But notice that this includes data from lots of missions, e.g., TESS, WUPPE, etc.  This is because the simple cone search operates across all the missions in our cross-mission database.  We'll show later how to do a search and only get back JWST mission data. Descriptions of the columns returned from an observation search are [documented here](https://mast.stsci.edu/api/v0/_c_a_o_mfields.html).
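To make the cone-search geometry concrete, here is a minimal pure-Python sketch of the arithmetic involved: converting the sexagesimal coordinates used above to decimal degrees, then testing whether a second position lies within the search radius. The helper names (`sexagesimal_to_degrees`, `angular_separation_deg`) and the offset test position are illustrative assumptions. In practice `SkyCoord` handles the conversion for you, and MAST matches on observation footprints rather than on single points.

```python
import math


def sexagesimal_to_degrees(value, hours=False):
    """Convert 'DD:MM:SS' (or 'HH:MM:SS' if hours=True) to decimal degrees."""
    sign = -1.0 if value.strip().startswith("-") else 1.0
    first, minutes, seconds = (abs(float(part)) for part in value.split(":"))
    degrees = first + minutes / 60.0 + seconds / 3600.0
    # One hour of right ascension corresponds to 15 degrees.
    return sign * degrees * (15.0 if hours else 1.0)


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


# Coordinates used in the cone search above.
ra_deg = sexagesimal_to_degrees("10:07:01", hours=True)
dec_deg = sexagesimal_to_degrees("-40:26:14")

# A hypothetical position 30 arcseconds north of the search center.
test_ra, test_dec = ra_deg, dec_deg + 30.0 / 3600.0

radius_deg = 1.0 / 60.0  # one arcminute
separation = angular_separation_deg(ra_deg, dec_deg, test_ra, test_dec)
inside_cone = separation <= radius_deg
print(f"RA = {ra_deg:.5f} deg, Dec = {dec_deg:.5f} deg")
print(f"Separation = {separation * 3600:.1f} arcsec, inside cone: {inside_cone}")
```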

In [None]:
# The show_in_notebook() function allows us to explore an Astropy table with pagination and search capability.
# Try searching for the JWST entries.

# One of the return columns is called the 's_region', which contains the polygonal representation of the footprint
# on the sky.  This is a very long string, and makes viewing the rest of the table difficult in a notebook. So
# we will get a list of all the columns except this one for display purposes.
display_columns = [x for x in obs_table.columns if x != "s_region"]

# Show the contents of the table.
obs_table[display_columns].show_in_notebook(display_length=5)

#### Search By Resolvable Object Name

Instead of specifying the coordinates, you can provide the name of an object, as long as it can be resolved by SIMBAD or NED, or is a KIC, EPIC, or TIC catalog name. This still performs a cone search on the sky: the resolver service translates the string you provide into sky coordinates. This method does NOT do a string match for the object name in the MAST database!

**Note:** When resolving KIC, EPIC, and TIC names, make sure you include the catalog name. For example, search for `'TIC 100100827'` rather than `'100100827'`.



In [None]:
# Conduct a cone search by passing a resolvable target name.
obs_table = Observations.query_region("NGC 3132", radius="1.0 arcmin")
len(obs_table)

In [None]:
# Show the contents of the table.
display_columns = [x for x in obs_table.columns if x != "s_region"]
obs_table[display_columns].show_in_notebook(display_length=5)

#### Search By Observational Metadata

You can also perform a search by observational metadata; for example, program ID, instrument, or filter. We used the `query_region` method above to do cone searches. For metadata queries, we'll use the `query_criteria` method.

First, let's see what metadata is at our disposal to query with. You can view the list of observation metadata on [this webpage](https://mast.stsci.edu/api/v0/_c_a_o_mfields.html), or you can run the following Python command to see the same list: 

In [None]:
Observations.get_metadata('observations').show_in_notebook()

In [None]:
# Search for all observations for a given JWST proposal ID.  In this case, Program ID 2733, the ERO program.
# Make sure to specify the obs_collection (mission) = JWST to avoid any other data sets that might have a proposal
# ID of 2733, e.g. HST.
obs_table = Observations.query_criteria(obs_collection=["JWST"], proposal_id=[2733])

In [None]:
# Show the contents of the table.
display_columns = [x for x in obs_table.columns if x != "s_region"]
obs_table[display_columns].show_in_notebook(display_length=5)

For our final example, let's only search for the MIRI images from this program ID.  You can combine a metadata
query WITH a cone search on the sky by including a `coordinates` and optional `radius` argument.  Let's do that
now just for demonstration purposes.

In [None]:
# A metadata search that includes a cone search component.
obj_coords = SkyCoord("10:07:01", "-40:26:14", unit=(u.hourangle, u.deg))

obs_table = Observations.query_criteria(obs_collection=["JWST"], 
                                        proposal_id=[2733], 
                                        instrument_name=["MIRI"],
                                        coordinates=obj_coords, 
                                        radius="1 arcmin")

In [None]:
# Show the contents of the table.
display_columns = [x for x in obs_table.columns if x != "s_region"]
obs_table[display_columns].show_in_notebook(display_length=5)

### Step 2: Retrieving Data Products

The second step is to retrieve the data products associated with a table of observations. Sometimes an observation might have a single product. Others may have thousands. For the purpose of this demo, let's just get the data products for the first MIRI observation.

In [None]:
data_products = Observations.get_product_list(obs_table[0])

In [None]:
# Let's just take a peek at the first 10 products in the returned Astropy table.
data_products[:10].show_in_notebook()

### Step 3: Downloading Data Products

The final step is to download the data products. As the example above shows, you may not want to download all of them. One quick way to eliminate extraneous products is with the "Minimum Recommended Products" (MRP) flag. Setting the `mrp_only` argument to `True` returns the "most useful" products; in the case of JWST, this will exclude guide-star products and return only the most calibrated results.

By default, `download_products` will have this set to False and you will download all available products.
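When the MRP flag is too coarse, products can also be narrowed by their column values. The sketch below is a pure-Python illustration of that idea, not the `astroquery` implementation: `filter_rows` is a hypothetical helper and the rows are made-up placeholders. Only the column names (`productType`, `calib_level`, `productFilename`) mirror those in a MAST product table. It assumes the usual convention that the accepted values for one column are alternatives, while the different columns must all match.

```python
def filter_rows(rows, **criteria):
    """Keep rows matching every criterion; each criterion lists accepted values."""
    kept = []
    for row in rows:
        if all(row[column] in accepted for column, accepted in criteria.items()):
            kept.append(row)
    return kept


# Made-up placeholder rows; only the column names follow the MAST product table.
products = [
    {"productFilename": "example_a_i2d.fits", "productType": "SCIENCE", "calib_level": 3},
    {"productFilename": "example_a_cal.fits", "productType": "SCIENCE", "calib_level": 2},
    {"productFilename": "example_a_i2d.jpg", "productType": "PREVIEW", "calib_level": 3},
    {"productFilename": "example_gs_cal.fits", "productType": "AUXILIARY", "calib_level": 2},
]

# Keep only science products at calibration level 3.
selected = filter_rows(products, productType=["SCIENCE"], calib_level=[3])
selected_names = [row["productFilename"] for row in selected]
print(selected_names)
```

With a real product table, the corresponding call is `Observations.filter_products(data_products, productType=["SCIENCE"], calib_level=[3])`.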

In [None]:
# Download the Minimum Recommended Products for our MIRI Observations.
manifest = Observations.download_products(data_products, mrp_only=True)

In [None]:
# The return is an Astropy table that contains status of your download and the local path where it saved the file.
manifest

In [None]:
# To complete this part of the tutorial, let's show the image! Let's select the row containing the i2d.fits file.
for idx, path in enumerate(manifest['Local Path']):
    if '_i2d.fits' in path:
        index = idx

# Store the local path as a scalar string.
i2d_file = manifest[index]['Local Path']
print(i2d_file)

In [None]:
# Read in the image data.
sci_data = fits.getdata(i2d_file, ext=1)

In [None]:
# Show the image.
norm = ImageNormalize(sci_data, 
                      interval=ZScaleInterval(),
                      stretch=SquaredStretch())
plt.figure(figsize=(8, 8))
plt.imshow(sci_data, cmap='gray', norm=norm, origin='lower')

# Colorbar
im_ratio = sci_data.shape[0]/sci_data.shape[1]
plt.colorbar(fraction=0.047*im_ratio)

# Short Pause!

Any questions or issues following along as we covered searching and downloading data from the ERO observations of the Southern Ring Nebula?

Next: let's walk through downloading data for a program with LOTS of observations and products.

## Downloading Calibrated Data For Multi-Object Spectral Programs

Now to our original motivation: downloading data from a multi-object spectroscopic program.  For this tutorial, we are going to use NIRISS data from [Program 2736](https://www.stsci.edu/cgi-bin/get-proposal-info?id=2736&observatory=JWST), the ERO program targeting the galaxy cluster SMACS 0723. 

Let's first do a search for this program to get the number of observations available.

In [None]:
# Get all JWST observations from Program 2736.  We only want the spectroscopic (wide-field slitless
# spectroscopy, or "WFSS") observations from NIRISS, so exclude the imaging mode observations by specifying a
# product type in our observation query.
obs_table = Observations.query_criteria(obs_collection=["JWST"], 
                                        proposal_id=[2736],
                                        instrument_name=["NIRISS"], 
                                        dataproduct_type="spectrum")
n_obs = len(obs_table)
print("Number of observations: {0:d}".format(n_obs))

This one program has nearly two _thousand_ NIRISS observations at MAST.  Most of these are Stage 3 products in the form of extracted spectra. If we request all the files associated with these observations, the service will need to return so many files it's likely to time out!

In fact, for this program, there are 24,862 files across all the different instruments and stages of the pipeline products. Unfortunately, many files are associated with more than one observation. As a result, a product query can return even more rows than that, because a shared file is listed once for each observation it belongs to. This is a quirk we are actively working on eliminating.
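Because the same file can be listed under several observations, it can help to de-duplicate a combined product list before downloading. Below is a minimal sketch of that idea in plain Python. The rows and `obsID` values are made-up placeholders, and `unique_by_filename` is a hypothetical helper, not part of `astroquery`.

```python
def unique_by_filename(rows):
    """Return rows with only the first occurrence of each productFilename kept."""
    seen = set()
    unique_rows = []
    for row in rows:
        name = row["productFilename"]
        if name not in seen:
            seen.add(name)
            unique_rows.append(row)
    return unique_rows


# Made-up rows: the same file appears under two different observations.
combined = [
    {"obsID": "1001", "productFilename": "example_rate.fits"},
    {"obsID": "1002", "productFilename": "example_rate.fits"},
    {"obsID": "1002", "productFilename": "example_x1d.fits"},
]

deduplicated = unique_by_filename(combined)
print(f"{len(combined)} rows listed, {len(deduplicated)} unique files")
```

For an Astropy `Table`, `astropy.table.unique(products, keys="productFilename")` should give a comparable result.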

At any rate, even if we limit ourselves to the highest stage of calibrated products, there are more than 5,800 NIRISS extracted spectral FITS files we can retrieve. We mention these numbers to caution you: <b>depending on the type of observation, there can be many products underneath a single observation at MAST</b>.  

So how *do* we download all the calibrated products for these observations? 

To avoid a timeout error, we will not try to retrieve products for all of the nearly 2,000 observations at once. Instead, we break our request into subsets of observations and call `get_product_list()` on each subset. Subsets of about five observations generally perform well without running the risk of timing out; the cell below uses batches of two. Once we have the product list for a subset, we filter it so that only the calibrated science products are downloaded. Let's do that now.
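The batching pattern itself is independent of MAST. As a minimal sketch (the `batched` helper and the integer identifiers are illustrative, not real `obsid` values), Python slicing already copes with a final batch that is shorter than the rest, so the last subset needs no special case:

```python
def batched(items, size):
    """Yield consecutive slices of `items` containing at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        # Slicing past the end of a sequence is safe; it returns whatever is left.
        yield items[start:start + size]


# Illustrative identifiers standing in for the `obsid` column.
fake_obsids = list(range(1, 13))

batches = list(batched(fake_obsids, 5))
for number, batch in enumerate(batches, start=1):
    print(f"Batch #{number}: {batch}")
```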

In [None]:
# We will request products for the CAOM Observations in bunches to minimize
# the number of requests made, without trying to get back too many at once.
num_at_once = 2

# For the purposes of our tutorial, we don't *actually* want to download thousands of files, so instead of looping
# through the complete set of 1,900+ observations, let's trim our Astropy table to the first 20 rows.
# If you want to try to retrieve all the products on your own time, just comment out these two lines below.
obs_table = obs_table[:20]
n_obs = len(obs_table)

# Loop through all the observations, and request the list of associated products in subsets.  We will pass the
# `obsid` column in our Astropy table of observations to `get_product_list()` so it knows what records we want
# products for, then filter the returned products before downloading.

for batch, index in enumerate(range(0, n_obs, num_at_once)):
    
    # Make a list of the `obsid` identifiers from our Astropy table of observations to get products for.
    # We grab `num_at_once` rows at a time.
    # Slicing past the end of the table is safe, so the final (possibly shorter) batch needs no special case.
    obsids = list(obs_table[index:index+num_at_once]['obsid'])
        
    # Progress indicator...
    print(f'Batch #{batch+1}')
    print(f'For ``obsids``:\n{obsids}\n')
    
    # Get list of products 
    products = Observations.get_product_list(obsids)
    
    # Filter the products to only get science products of calibration level 3 
    filtered_products = Observations.filter_products(products, 
                                                     productType=["SCIENCE"], 
                                                     calib_level=[3])
    
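    # List the products selected for these records.  The download call is commented out for this tutorial;
    # uncomment the last line to actually download the files.
    print('Products selected for download:')
    print(filtered_products['productFilename'])
    # manifest = Observations.download_products(filtered_products)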

<div class="alert alert-block alert-info">
NOTE: the `obsid` parameter is an internal reference ID, defined by a unique integer, and is only used by specific functions to look up products in our cross-mission database tables.  Since it is just an arbitrary, unique number, it has no value to a user by itself, and MAST reserves the right to change this value over time.

This is in stark contrast to the `Observation ID` column you see in the Portal browser (the `obs_id` column if using `astroquery.mast`), which *is* a more useful identifier for a given MAST observation, since it is a string that encodes information about the observation following rules defined by the mission.

In summary: `obsid` is a more-or-less random integer that offers no useful insight other than serving as a pointer to a particular observation at MAST.  It is used only by MAST functions behind the scenes, or when calling `download_products()` without relying on the return value of `get_product_list()`.  `Observation ID` (in the Portal GUI) or `obs_id` (in `astroquery.mast`) is a unique string that follows a naming convention defined by the mission.
</div>

***


## About this notebook

This notebook was developed by Scott Fleming and Jenny Medina. For support, please contact the Archive HelpDesk at archive@stsci.edu, or through the [JWST HelpDesk Portal](https://jwsthelp.stsci.edu). 
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/>