# Get observations from program ID
This notebook uses the Python [astroquery.mast Observations](https://astroquery.readthedocs.io/en/latest/mast/mast_obsquery.html) class of the [MAST API](https://mast.stsci.edu/api/v0/) to query and download selected data products from a single program. We are looking for NIRISS imaging and WFSS files from the [NGDEEP program](https://www.stsci.edu/jwst/phase2-public/2079.pdf) (ID 2079). The observations use three [NIRISS filters](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-pupil-and-filter-wheels): F115W, F150W, and F200W, each with both the GR150R and GR150C grisms.

**Use case**: use MAST to download data products.<br>
**Data**: JWST/NIRISS images and spectra from program 2079.<br>
**Tools**: astropy, astroquery, numpy, os, glob, (yaml)<br>
**Cross-instrument**: all<br>

**Content**
- [Imports](#imports)
- [Querying for Observations](#setup)
  - [Search with Proposal ID](#propid)
  - [Search with Observation ID](#obsid)
- [Filter and Download Products](#filter)
  - [Filtering Data Before Downloading](#filter_data)
  - [Downloading Data](#downloading)
- [Reorganize Directory Structure](#reorg)


**Author**: Camilla Pacifici (cpacifici@stsci.edu) & Rachel Plesha (rplesha@stsci.edu)<br>
**Last modified**: January 2024

This notebook was inspired by the [JWebbinar session about MAST](https://github.com/spacetelescope/jwebbinar_prep/blob/main/mast_session/Crowded_Field/Crowded_Field.ipynb).

<a id='imports'></a>
## Imports

In [1]:
from astropy import units as u
from astropy.coordinates import SkyCoord
from astropy.io import fits
from astroquery.mast import Observations
import numpy as np
import os
import glob

<a id='setup'></a>
## Querying for Observations

The ``Observations`` class in ``astroquery.mast`` is used to query and download JWST data. Use its ``get_metadata`` function to see the available search options and their descriptions.

Note that for JWST, the instrument names have a specific format. More information about that can be found at: https://outerspace.stsci.edu/display/MASTDOCS/JWST+Instrument+Names 

In [None]:
Observations.get_metadata("observations")

The two most common ways to download specific datasets are by using the [proposal ID](https://www.stsci.edu/jwst/science-execution/program-information) or by using the [observation ID](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/file_naming.html).

<a id='propid'></a>
#### Search with Proposal ID

In [None]:
# Select the proposal ID, instrument, and some useful keywords (filters in this case).
obs_table = Observations.query_criteria(obs_collection=["JWST"], 
                                        instrument_name=["NIRISS/IMAGE", "NIRISS/WFSS"],
                                        provenance_name=["CALJWST"], # Executed observations
                                        filters=['F115W','F150W','F200W'],
                                        proposal_id=[2079],
                                       )

print(len(obs_table), 'observations found')
# look at what was obtained in this query for a select number of column names of interest
obs_table[['obs_collection', 'instrument_name', 'filters', 'target_name', 'obs_id', 's_ra', 's_dec', 't_exptime', 'proposal_id']]

<a id='obsid'></a>
#### Search with Observation ID
Because JWST file names encode both the program ID and the observation number, searching on the observation ID (obs_id) lets you select either a whole program or a single observation within it. More information about the JWST file naming conventions can be found at: https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/file_naming.html.

The search criteria also accept wildcards. For example, instead of specifying both "NIRISS/IMAGE" and "NIRISS/WFSS", we can specify "NIRISS*", which matches both modes. The wildcard also works within the obs_id, so we do not have to list all of the different IDs.

In [None]:
# Obtain a list to download from a specific list of observation IDs instead
obs_id_table = Observations.query_criteria(instrument_name=["NIRISS*"],
                                           provenance_name=["CALJWST"], # Executed observations
                                           obs_id=['jw02079-o004*'], # Searching for PID 2079 observation 004
                                          ) 

print(len(obs_id_table), 'observations found')
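
The wildcard string used above can also be built from the program and observation numbers rather than typed by hand. The helper below is a minimal sketch: the name `obs_id_pattern` is hypothetical (it is not part of astroquery), and it assumes the `jw<5-digit program>-o<3-digit observation>` form seen in the query above.

```python
def obs_id_pattern(program_id, observation=None):
    """Build a wildcard obs_id string (hypothetical helper, not an astroquery function)."""
    # Program IDs are zero-padded to five digits, e.g. 2079 -> "jw02079".
    pattern = f"jw{int(program_id):05d}"
    if observation is not None:
        # Observation numbers are zero-padded to three digits, e.g. 4 -> "-o004".
        pattern += f"-o{int(observation):03d}"
    # The trailing wildcard matches whatever follows in the obs_id.
    return pattern + "*"


print(obs_id_pattern(2079, 4))
print(obs_id_pattern(2079))
```

The resulting string could then be passed as `obs_id=[obs_id_pattern(2079, 4)]` in `Observations.query_criteria`.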

<a id='filter'></a>
## Filter and Download Products

If too many files are requested at once, the API may time out. It is better to divide the observations into batches and download one batch at a time.

In [None]:
batch_size = 5 # 5 files at a time maximizes the download speed.

# Let's split up our list of files, ``obs_table``, into batches according to our batch size.
obs_batches = [obs_table[i:i+batch_size] for i in range(0, len(obs_table), batch_size)]
print("How many batches?", len(obs_batches))

single_group = obs_batches[0] # Useful to inspect the files obtained from one group
print("Inspect the first batch to ensure that it matches expectations of what you want downloaded:")
single_group['obs_collection', 'instrument_name', 'filters', 'target_name', 'obs_id', 's_ra', 's_dec', 't_exptime', 'proposal_id']

Select the type of products needed. The various levels are:
- uncalibrated files
    - productType=["SCIENCE"]
    - productSubGroupDescription=['UNCAL']
    - calib_level=[1]
- rate images
    - productType=["SCIENCE"]
    - productSubGroupDescription=['RATE']
    - calib_level=[2]
- level 2 associations for both spectroscopy and imaging
    - productType=["INFO"]
    - productSubGroupDescription=['ASN']
    - calib_level=[2]
- level 3 associations for imaging
    - productType=["INFO"]
    - productSubGroupDescription=['ASN']
    - dataproduct_type=["image"]
    - calib_level=[3]

<a id='filter_data'></a>
#### Filtering Data Before Downloading

In [None]:
# creating a dictionary of the above information to use for inspection of the filtering function
file_dict = {'uncal' : {'product_type' : 'SCIENCE',
                        'productSubGroupDescription' : 'UNCAL',
                        'calib_level' : [1]},
             'rate' : {'product_type' : 'SCIENCE',
                       'productSubGroupDescription' : 'RATE',
                       'calib_level' : [2]},
             'level2_association' : {'product_type' : 'INFO',
                                     'productSubGroupDescription' : 'ASN',
                                     'calib_level' : [2]},
             'level3_association' : {'product_type' : 'INFO',
                                     'productSubGroupDescription' : 'ASN',
                                     'calib_level' : [3]},
             }

In [None]:
## Look at the files existing for each of these different levels
files_to_download = []
for index, batch_exposure in enumerate(single_group):
    
    print('*'*50)
    print(f"Exposure #{index+1} ({batch_exposure['obs_id']})")
    # pull out the product names from the list to filter
    products = Observations.get_product_list(batch_exposure)
    
    for filetype, query_dict in file_dict.items():
        print('File type:', filetype)
        filtered_products = Observations.filter_products(products,
                                                         productType=query_dict['product_type'],
                                                         productSubGroupDescription=query_dict['productSubGroupDescription'],
                                                         calib_level=query_dict['calib_level'],
                                                         )
        print(filtered_products['productFilename'])
        files_to_download.extend(filtered_products['productFilename'])
        print()
    print('*'*50)


From above, we can see that for each exposure name in the observation list (`obs_table`), there are many associated files in the background that need to be downloaded as well. This is why we need to work in batches to download.
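
To get a quick sense of how many files of each kind were found, the collected file names can be tallied by their suffix. This is a small sketch: `count_by_suffix` is a hypothetical helper, and the example names below are made up for illustration (they only imitate the JWST naming pattern).

```python
from collections import Counter


def count_by_suffix(filenames):
    """Tally file names by the text after the last underscore (e.g. 'rate.fits')."""
    return Counter(name.rsplit("_", 1)[-1] for name in filenames)


# Illustrative, made-up names that only imitate the JWST naming pattern.
example_names = [
    "jw02079004001_02101_00001_nis_uncal.fits",
    "jw02079004001_02101_00001_nis_rate.fits",
    "jw02079004001_02101_00002_nis_rate.fits",
    "jw02079-o004_00000000t000000_image2_00001_asn.json",
]
suffix_counts = count_by_suffix(example_names)
print(suffix_counts)
```

In the notebook, `count_by_suffix(files_to_download)` would summarize the list built in the cell above.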

<a id='downloading'></a>
#### Downloading Data
To actually download the products, provide ``Observations.download_products()`` with a list of the filtered products. 

If the data are proprietary, you may also need to set up your API token. **NEVER** commit your token to a public repository. An alternative is to create a separate configuration file (config_file.yaml) that is readable only to you and has a key: 'mast_token' : _API token_

To create a new API token, visit the following link: 
    https://auth.mast.stsci.edu/token?suggested_name=Astroquery&suggested_scope=mast:exclusive_access
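
One possible way to read the token from such a configuration file is sketched below. It makes several assumptions: the file contains a simple `mast_token: <value>` line, the permission check relies on POSIX mode bits, and `read_mast_token` is a hypothetical helper. Plain-text parsing is used here to avoid an extra dependency; for a real YAML file, `yaml.safe_load` would be the more robust choice. A throwaway file with a placeholder token stands in for the real configuration file.

```python
import os
import stat
import tempfile


def read_mast_token(config_path):
    """Return the 'mast_token' value from a file of simple `key: value` lines."""
    # Refuse files that group or other users can read (POSIX permission bits).
    mode = os.stat(config_path).st_mode
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        raise PermissionError(f"{config_path} is readable by other users")
    with open(config_path) as config:
        for line in config:
            key, sep, value = line.partition(":")
            # Tolerate optional quotes around the key and the value.
            if sep and key.strip().strip("'\"") == "mast_token":
                return value.strip().strip("'\"")
    raise KeyError("mast_token")


# Demonstration with a throwaway file and a placeholder token.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config_file.yaml")
    with open(demo_path, "w") as config:
        config.write("mast_token: not-a-real-token\n")
    os.chmod(demo_path, 0o600)  # owner read/write only
    demo_token = read_mast_token(demo_path)
print(demo_token)
```

In the notebook, the returned string could be passed to ``Observations.login(token=...)`` before calling ``Observations.download_products()``.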

In [None]:
download_dir = 'data'

# Now let's get the products for each batch of observations, and filter down to only the products of interest.
for index, batch in enumerate(obs_batches):
    
    # Progress indicator...
    print('\n'+f'Batch #{index+1} / {len(obs_batches)}')
    
    # Make a list of the `obsid` identifiers from our Astropy table of observations to get products for.
    obsids = batch['obsid']
    print('Working with the following ``obsid``s:')
    for number, obs_text in zip(obsids, batch['obs_id']):
        print(f"{number} : {obs_text}")
    
    # Get list of products 
    products = Observations.get_product_list(obsids)
    
    # Filter the products to get only the products of interest
    filtered_products = Observations.filter_products(products, 
                                                     productType=["SCIENCE", "INFO"],
                                                     productSubGroupDescription=["RATE", "ASN"], # Not using "UNCAL" here since we can start with the rate files
                                                     calib_level=[2, 3] # not using 1 here since not getting the UNCAL files
                                                    )
    # Download products for these records.
    manifest = Observations.download_products(filtered_products,
                                              download_dir=download_dir,
                                              flat=True, # astroquery v0.4.7 or later only
                                              ) 
    print('Products downloaded:\n', filtered_products['productFilename'])
    
    # only downloading the first batch of 5 observations
    break # comment this out if you want to download everything

In [None]:
downloaded_files = glob.glob(os.path.join(download_dir, '*.fits')) + glob.glob(os.path.join(download_dir, '*.json'))
print(len(downloaded_files), 'files downloaded to:', download_dir)
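
The file count above does not show whether any individual download failed. The table returned by ``Observations.download_products()`` can be inspected for that. The sketch below assumes the manifest has `Status` and `Local Path` columns and that successful rows are marked `COMPLETE`, as in recent astroquery versions; `incomplete_downloads` is a hypothetical helper, and the demo rows are made up.

```python
def incomplete_downloads(manifest_rows):
    """Return the local paths of rows whose status is anything other than COMPLETE."""
    return [row["Local Path"] for row in manifest_rows if row["Status"] != "COMPLETE"]


# Made-up rows standing in for the manifest table (assumed column names).
demo_manifest = [
    {"Local Path": "data/example_a_rate.fits", "Status": "COMPLETE"},
    {"Local Path": "data/example_b_rate.fits", "Status": "ERROR"},
]
needs_attention = incomplete_downloads(demo_manifest)
print(needs_attention)
```

Because `manifest` is overwritten on every pass of the download loop, a call such as `incomplete_downloads(manifest)` would need to sit inside the loop to cover every batch.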

<a id='reorg'></a>
## Reorganize Directory Structure

This section takes the downloaded data and sorts it into a slightly different file structure for use in later notebooks. The expected format for this section is that all downloaded data are in a single directory (`flat=True`). The new file structure is as follows:
- all rate files under NGDEEP/rate
- level 2 association files under NGDEEP/asn_level2
- level 3 association files under NGDEEP/asn_level3

In [None]:
# first, make all of the new directories
topdir = 'data/NGDEEP'
if not os.path.exists(topdir):
    os.makedirs(topdir) # also creates the parent 'data' directory if it is missing
    print('Created:', topdir)

for subdir in ['rate', 'asn_level2', 'asn_level3']:
    new_subdir = os.path.join(topdir, subdir)
    if not os.path.exists(new_subdir):
        os.mkdir(new_subdir)
        print('Created:', new_subdir)

file_dict = {'rate' : 'rate',
             'image2' : 'asn_level2',
             'spec2' : 'asn_level2',
             'image3' : 'asn_level3',
             'spec3' : 'asn_level3',
            }

# now move all of the files to the appropriate locations
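for filename in glob.glob('data/*.fits') + glob.glob('data/*.json'):
    # split the base name rather than the full path so that underscores in directory names cannot shift the fields
    name_parts = os.path.basename(filename).split('_')
    index = name_parts[2] if len(name_parts) > 2 else None # json files; looking for image2/image3 or spec2/spec3
    index2 = name_parts[-1].split('.')[0] # rate files
    if index in file_dict:
        subdir = file_dict[index]
    elif index2 in file_dict:
        subdir = file_dict[index2]
    else:
        print(f'Unrecognized index: {index} or {index2}')
        continue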
    
    new_file = os.path.join(topdir, subdir, os.path.basename(filename))
    os.rename(filename, new_file)
    print(f'Moved: {filename} to {new_file}')
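
After the move, it can be useful to confirm that each subdirectory holds the expected number of files. The following is a minimal sketch: `count_files_per_subdir` is a hypothetical helper, and the demonstration runs on a temporary tree that only mimics the layout described above.

```python
import os
import tempfile


def count_files_per_subdir(topdir):
    """Map each immediate subdirectory of `topdir` to the number of entries it holds."""
    counts = {}
    for entry in sorted(os.listdir(topdir)):
        path = os.path.join(topdir, entry)
        if os.path.isdir(path):
            counts[entry] = len(os.listdir(path))
    return counts


# Demonstration on a temporary tree that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    for subdir in ["rate", "asn_level2", "asn_level3"]:
        os.makedirs(os.path.join(tmp, subdir))
    # One empty placeholder file stands in for a downloaded rate file.
    open(os.path.join(tmp, "rate", "example_rate.fits"), "w").close()
    demo_counts = count_files_per_subdir(tmp)
print(demo_counts)
```

In the notebook, `count_files_per_subdir(topdir)` would report the contents of `data/NGDEEP`.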

<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/>