# MIRI MRS IFU Spectroscopy Part 1: 
# Downloading Data

Aug 2023

**Use case:** Reduce MRS Data With User Defined Master Background Step. This is particularly relevant if you did not obtain a Dedicated Background with your observations. While the pipeline will subtract a sky background derived from an annulus, the underlying background may be prohibitively complicated and the user may wish to measure their own background from elsewhere in the cube.<br>
**Data:** Publicly available science data for SN 1987A (Program 1232). For this notebook, we will follow the science workflow outlined by [Jones et al. 2023](https://ui.adsabs.harvard.edu/abs/2023arXiv230706692J/abstract).<br>
**Tools:** jwst, jdaviz, matplotlib, astropy.<br>
**Cross-instrument:** NIRSpec, MIRI.<br>
**Documentation:** This notebook is part of STScI's larger [post-pipeline Data Analysis Tools Ecosystem](https://jwst-docs.stsci.edu/jwst-post-pipeline-data-analysis) and can be [downloaded](https://github.com/spacetelescope/dat_pyinthesky/tree/main/jdat_notebooks/MRS_Mstar_analysis) directly from the [JDAT Notebook Github directory](https://github.com/spacetelescope/jdat_notebooks).<br>

### Introduction: Spectral extraction in the JWST calibration pipeline

The JWST calibration pipeline performs spectral extraction for all spectroscopic data using basic default assumptions that are tuned to produce accurately calibrated spectra for the majority of science cases. This default method is a simple fixed-width boxcar extraction, where the spectrum is summed over a number of pixels along the cross-dispersion axis, over the valid wavelength range. An aperture correction is applied at each pixel along the spectrum to account for flux lost from the finite-width aperture.

The ``extract_1d`` step uses the following inputs for its algorithm:
- the spectral extraction reference file: this is a json-formatted file, available as a reference file from the [JWST CRDS system](https://jwst-crds.stsci.edu)
- the bounding box: the ``assign_wcs`` step attaches a bounding box definition to the data, which defines the region over which a valid calibration is available. We will demonstrate below how to visualize this region. 

However, the ``extract_1d`` step can also perform more complex spectral extractions, which require manually editing some parameters and re-running the pipeline step.


### Aims

This notebook will demonstrate how to re-run the spectral extraction step with different settings to illustrate the capabilities of the JWST calibration pipeline. 


### Assumptions

We will demonstrate the spectral extraction methods on resampled, calibrated spectral images. The basic demo and two examples run on Level 3 data, in which the nod exposures have been combined into a single spectral image. Two examples will use the Level 2b data - one of the nodded exposures. 


### Test data

The data used in this notebook is an observation of the Type Ia supernova SN2021aefx, observed by Jha et al in PID 2072 (Obs 1). These data were taken with zero exclusive access period, and published in [Kwok et al 2023](https://ui.adsabs.harvard.edu/abs/2023ApJ...944L...3K/abstract). You can retrieve the data from [this Box folder](https://stsci.box.com/s/i2xi18jziu1iawpkom0z2r94kvf9n9kb), and we recommend you place the files in the ``data/`` folder of this repository, or change the directory settings in the notebook prior to running. 

You can of course use your own data instead of the demo data. 


### JWST pipeline version and CRDS context

This notebook was written using the calibration pipeline version 1.10.2. We set the CRDS context explicitly to 1089 to match the current latest version in MAST. If you use different pipeline versions or CRDS context, please read the relevant release notes ([here for pipeline](https://github.com/spacetelescope/jwst), [here for CRDS](https://jwst-crds.stsci.edu)) for possibly relevant changes.

### Contents

1. [The Level 3 data products](#l3data)
2. [The spectral extraction reference file](#x1dref)
3. [Example 1: Changing the aperture width](#ex1)
4. [Example 2: Changing the aperture location](#ex2)
5. [Example 3: Extraction with background subtraction](#ex3)
6. [Example 4: Tapered column extraction](#ex4)

## Import Packages

- `astropy.io` fits for accessing FITS files
- `os` for managing system paths
- `matplotlib` for plotting data
- `urllib` for downloading data
- `tarfile` for unpacking data
- `numpy` for basic array manipulation
- `jwst` for running JWST pipeline and handling data products
- `json` for working with json files
- `crds` for working with JWST reference files

In [None]:
# Set CRDS variables first
import os

os.environ['CRDS_CONTEXT'] = 'jwst_1089.pmap'
os.environ['CRDS_PATH'] = os.environ['HOME']+'/crds_cache'
os.environ['CRDS_SERVER_URL'] = 'https://jwst-crds.stsci.edu'
print(f'CRDS cache location: {os.environ["CRDS_PATH"]}')
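
The cell above overwrites any CRDS settings already present in your environment and assumes that `HOME` is defined. As a minimal sketch of a more defensive variant, the helper below (the name `configure_crds` is hypothetical, not part of `crds` or `jwst`) keeps existing values and falls back to the same defaults used in this notebook.

```python
import os


def configure_crds(context="jwst_1089.pmap", cache_dir=None,
                   server="https://jwst-crds.stsci.edu"):
    """Set the CRDS environment variables, keeping any values already defined."""
    if cache_dir is None:
        # expanduser also works when the HOME variable itself is not set
        cache_dir = os.path.join(os.path.expanduser("~"), "crds_cache")
    # setdefault only writes a value when the variable is missing
    os.environ.setdefault("CRDS_CONTEXT", context)
    os.environ.setdefault("CRDS_PATH", cache_dir)
    os.environ.setdefault("CRDS_SERVER_URL", server)
    return {key: os.environ[key]
            for key in ("CRDS_CONTEXT", "CRDS_PATH", "CRDS_SERVER_URL")}


settings = configure_crds()
for key, value in settings.items():
    print(f"{key} = {value}")
```

As in the cell above, this should run before `jwst` is imported so that the pipeline picks up the intended context and cache location.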

In [None]:
import sys
import os
import pdb
# Basic system utilities for interacting with files
import glob
import time
import shutil
import warnings
import zipfile
import urllib.request

# Astropy utilities for opening FITS and ASCII files
from astropy.io import fits
from astropy.io import ascii
from astropy.utils.data import download_file
from regions import Regions
from astropy import units as u

from astroquery.mast import Observations

# Astropy utilities for making plots
from astropy.visualization import (LinearStretch, LogStretch, ImageNormalize, ZScaleInterval)

# Numpy for doing calculations
import numpy as np

# Matplotlib for making plots
import matplotlib.pyplot as plt
from matplotlib import rc

# Import the base JWST package
import jwst

# JWST pipelines (encompassing many steps)
from jwst.pipeline import Detector1Pipeline
from jwst.pipeline import Spec2Pipeline
from jwst.pipeline import Spec3Pipeline

# JWST pipeline utilities
from jwst import datamodels # JWST datamodels
from jwst.associations import asn_from_list as afl # Tools for creating association files
from jwst.associations.lib.rules_level2_base import DMSLevel2bBase # Definition of a Lvl2 association file
from jwst.associations.lib.rules_level3_base import DMS_Level3_Base # Definition of a Lvl3 association file

from stcal import dqflags # Utilities for working with the data quality (DQ) arrays

# Import packages for multiprocessing.  These won't be used on the online demo, but can be
# very useful for local data processing unless/until they get integrated natively into
# the cube building code.  These need to be imported before anything else.

import multiprocessing
#multiprocessing.set_start_method('fork')
from multiprocessing import Pool

# Set the maximum number of processes to spawn based on available cores
usage = 'all' # Either 'none' (single thread), 'quarter', 'half', or 'all' available cores

from specutils import Spectrum1D
from matplotlib.pyplot import cm

from jdaviz import Cubeviz

In [None]:
# Set parameters to be changed here.
# It should not be necessary to edit cells below this in general unless modifying pipeline processing steps.

import sys,os, pdb

# CRDS context (if overriding)
#%env CRDS_CONTEXT jwst_0771.pmap

# Point to where the uncalibrated FITS files are from the science observation
input_dir = './mastDownload/1232/uncal/'

# Point to where you want the output science results to go
output_dir = './output/87A/'

# Point to where the uncalibrated FITS files are from the background observation
# If no background observation, leave this as an empty string
input_bgdir = ''

# Point to where the output background observations should go
# If no background observation, leave this as an empty string
output_bgdir = ''

# Whether or not to run a given pipeline stage
# Science and background are processed independently through det1+spec2, and jointly in spec3

# Science processing
dodet1=True
dospec2=True
dospec3=True

# Background processing
dodet1bg=True
dospec2bg=True

# If there is no background folder, ensure we don't try to process it
# (strip() also treats a whitespace-only string as blank)
if not input_bgdir.strip():
    dodet1bg = False
    dospec2bg = False

In [None]:
## Output subdirectories to keep science data products organized
## Note that the pipeline might complain about this as it is intended to work with everything in a single
## directory, but it nonetheless works fine for the examples given here.
det1_dir = os.path.join(output_dir, 'stage1/') # Detector1 pipeline outputs will go here
spec2_dir = os.path.join(output_dir, 'stage2/') # Spec2 pipeline outputs will go here
spec3_dir = os.path.join(output_dir, 'stage3/') # Spec3 pipeline outputs will go here

# We need to check that the desired output directories exist, and if not create them
if not os.path.exists(det1_dir):
    os.makedirs(det1_dir)
if not os.path.exists(spec2_dir):
    os.makedirs(spec2_dir)
if not os.path.exists(spec3_dir):
    os.makedirs(spec3_dir)

In [None]:
# Output subdirectories to keep background data products organized
det1_bgdir = os.path.join(output_bgdir, 'stage1/') # Detector1 pipeline outputs will go here
spec2_bgdir = os.path.join(output_bgdir, 'stage2/') # Spec2 pipeline outputs will go here

# We need to check that the desired output directories exist, and if not create them
# (skipped when no background output directory is set, including a whitespace-only value)
if output_bgdir.strip():
    if not os.path.exists(det1_bgdir):
        os.makedirs(det1_bgdir)
    if not os.path.exists(spec2_bgdir):
        os.makedirs(spec2_bgdir)
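
The existence checks above can be folded into the directory creation itself. The following is a minimal sketch using `pathlib`; the helper name `make_stage_dirs` is hypothetical, and the stage names simply mirror the `stage1/`, `stage2/`, and `stage3/` folders used in this notebook.

```python
import tempfile
from pathlib import Path


def make_stage_dirs(base, stages=("stage1", "stage2", "stage3")):
    """Create base/<stage> for each stage and return a {stage: Path} mapping."""
    created = {}
    for stage in stages:
        stage_dir = Path(base) / stage
        # parents=True builds missing parent folders; exist_ok=True makes reruns harmless
        stage_dir.mkdir(parents=True, exist_ok=True)
        created[stage] = stage_dir
    return created


# Demonstrated in a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as scratch:
    dirs = make_stage_dirs(Path(scratch) / "output" / "87A")
    for stage, path in dirs.items():
        print(stage, path.is_dir())
```

To use it for real, pass your own output directory, for example `make_stage_dirs(output_dir)`.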

# 1. Download all MRS data from SN 1987A PID 1232 (Public)

#### If you want to run the entire MRS pipeline from start to finish, you will need to download nearly 100 GB of data. The vast majority of these data are the Level0 raw ramp (uncal.fits) and Level1 ramp (rate.fits and rateints.fits) files. For our purposes, we encourage you to simply download the Level2 calibrated data (cal.fits), which totals only 3 GB.

In [None]:
# Let's get a list of all observations associated with this proposal
obs_list = Observations.query_criteria(proposal_id=1232)

# We can choose the columns we want to display in our table
disp_col = ['dataproduct_type','instrument_name','calib_level','obs_id',
            'target_name','filters','proposal_pi', 'obs_collection']
obs_list[disp_col].show_in_notebook()

In [None]:
# Level 2b cal.fits

mask = (obs_list['instrument_name'] == 'MIRI/IFU')
data_products = Observations.get_product_list(obs_list[mask])

filtered_prod = Observations.filter_products(data_products, calib_level=[2], productType="SCIENCE", productSubGroupDescription="CAL")

# Again, we choose columns of interest for convenience
disp_col = ['obsID','dataproduct_type','productFilename','size','calib_level']
filtered_prod[disp_col].show_in_notebook(display_length=10)

In [None]:
total = sum(filtered_prod['size'])
print('{:.2f} GB'.format(total/10**9))
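
The total above is always printed in GB, which reads poorly for very small or very large selections. As a small sketch, the hypothetical helper `format_size` below picks a decimal unit automatically, using the same factor-of-1000 convention as the cell above.

```python
def format_size(num_bytes):
    """Format a byte count with a decimal (power-of-1000) unit."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        if size < 1000:
            return f"{size:.2f} {unit}"
        size /= 1000
    # Anything that survived four divisions is expressed in TB
    return f"{size:.2f} TB"


for example in (999, 1500, 3_000_000_000):
    print(example, "->", format_size(example))
```

With the table from this notebook, the call would be `format_size(sum(filtered_prod['size']))`.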

In [None]:
# Don't forget to login, if accessing non-public data! You can un-comment the line below:
# Observations.login()

# The line below downloads every filtered product. To test with a small subset first,
# slice the table instead, e.g. filtered_prod[:5]
manifest = Observations.download_products(filtered_prod)
print(manifest['Status'])
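
Printing the whole `Status` column gets unwieldy for hundreds of files. A minimal sketch of a summary follows, assuming the manifest has `Status` and `Local Path` columns and that `COMPLETE` and `SKIPPED` mark files that are usable on disk. The helper name `summarize_manifest` and the stand-in rows are hypothetical; the rows only imitate the shape of a manifest so the example runs without a download.

```python
from collections import Counter


def summarize_manifest(rows):
    """Count download statuses and list the local paths that did not succeed."""
    rows = list(rows)  # allows any iterable of row-like mappings
    counts = Counter(str(row["Status"]) for row in rows)
    problems = [str(row["Local Path"]) for row in rows
                if str(row["Status"]) not in ("COMPLETE", "SKIPPED")]
    return counts, problems


# Stand-in rows (made-up file names) that imitate a download manifest
example_rows = [
    {"Local Path": "example_a_cal.fits", "Status": "COMPLETE"},
    {"Local Path": "example_b_cal.fits", "Status": "SKIPPED"},
    {"Local Path": "example_c_cal.fits", "Status": "ERROR"},
]
counts, problems = summarize_manifest(example_rows)
print(dict(counts))
print("Files to retry:", problems)
```

Since iterating over an astropy table yields rows that support `row["Status"]`, `summarize_manifest(manifest)` should work on the real manifest as well.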

In [None]:
# Check to see if the input and output directories exist. If not, create them.
# Then move all _uncal.fits files into input_dir and all _cal.fits files into spec2_dir.

# os.makedirs also creates any missing parent directories (e.g. ./mastDownload/1232),
# which os.mkdir would not do
for directory in [input_dir, "./output", output_dir, det1_dir, spec2_dir, spec3_dir]:
    if os.path.exists(directory):
        print(directory+" already exists")
    else:
        print("Creating Directory "+directory)
        os.makedirs(directory)
    
print("Moving All Uncal Files To Input Directory")
for file in glob.glob('./mastDownload/JWST/*/*_uncal.fits'):
    root = file.split('/')
    print(root[-1])
    if os.path.isfile(input_dir+'/'+root[-1]):
        print('Deleting '+input_dir+'/'+root[-1])
        os.remove(input_dir+'/'+root[-1])
    print('Moving '+input_dir+'/'+root[-1])
    shutil.move(file, input_dir)
    
print("Moving All Cal Files To Input Directory")
for file in glob.glob('./mastDownload/JWST/*/*_cal.fits'):
    root = file.split('/')
    print(root[-1])
    if os.path.isfile(spec2_dir+'/'+root[-1]):
        print('Deleting '+spec2_dir+'/'+root[-1])
        os.remove(spec2_dir+'/'+root[-1])
    print('Moving '+spec2_dir+'/'+root[-1])
    shutil.move(file, spec2_dir)
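
The two loops above differ only in their glob pattern and destination. As a sketch of how they could share one routine, the hypothetical helper `move_matching` below moves every match of a pattern into a destination folder and replaces any older copy, as the loops above do. It is demonstrated on made-up files in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def move_matching(src_root, pattern, dest_dir):
    """Move files under src_root that match pattern into dest_dir, replacing old copies."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(src_root).glob(pattern)):
        target = dest / path.name
        if target.exists():
            target.unlink()  # delete the stale copy before moving the new one in
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved


# Demonstration on made-up files in a scratch directory
with tempfile.TemporaryDirectory() as scratch:
    obs_dir = Path(scratch) / "mastDownload" / "JWST" / "example_obs"
    obs_dir.mkdir(parents=True)
    (obs_dir / "example_cal.fits").write_text("placeholder")
    (obs_dir / "example_uncal.fits").write_text("placeholder")
    moved = move_matching(Path(scratch) / "mastDownload" / "JWST",
                          "*/*_cal.fits", Path(scratch) / "stage2")
    print([p.name for p in moved])
```

In this notebook the equivalent calls would be `move_matching('./mastDownload/JWST', '*/*_uncal.fits', input_dir)` and `move_matching('./mastDownload/JWST', '*/*_cal.fits', spec2_dir)`.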

# 2. Detector1 Pipeline

#### It is not necessary to run the Detector1 stage of the pipeline for this notebook, because the cells above download Level 2 calibrated (cal.fits) products.
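
If you do want to reprocess from the raw ramps, a minimal sketch could look like the following. It assumes the uncal.fits files sit in the `input_dir` defined in the parameter cell and uses the pipeline defaults for every step; the wrapper name `run_det1` is hypothetical. The loop does nothing if no uncal files are found.

```python
import glob
import os


def run_det1(uncal_file, out_dir):
    """Run the Detector1 stage on one uncal.fits file and save its outputs in out_dir."""
    # Imported inside the function so that defining it does not load the pipeline
    from jwst.pipeline import Detector1Pipeline
    return Detector1Pipeline.call(uncal_file, output_dir=out_dir, save_results=True)


# Paths copied from the parameter cell of this notebook
input_dir = './mastDownload/1232/uncal/'
det1_dir = './output/87A/stage1/'

uncal_files = sorted(glob.glob(os.path.join(input_dir, 'jw*mirifu*_uncal.fits')))
print('Found ' + str(len(uncal_files)) + ' uncal files to process')
for uncal_file in uncal_files:
    run_det1(uncal_file, det1_dir)
```

Keep in mind the data volume quoted in the download section: processing every raw ramp is far more expensive than starting from the cal.fits files.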

# 3. Spec2 Pipeline


#### It is not necessary to run the Spec2 stage of the pipeline for this notebook, because the downloaded cal.fits files are already Spec2 outputs.
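
Should you need to regenerate the cal.fits files yourself, the sketch below shows one way to call Spec2 on the rate.fits files produced by Detector1. The names `spec2_call_kwargs` and `run_spec2` are hypothetical. The `steps` dictionary is the standard way to pass per-step parameters to `.call()`; the `residual_fringe` override in the printed example is only an illustration and assumes that step exists and is skipped by default in your pipeline version.

```python
import glob
import os


def spec2_call_kwargs(out_dir, step_overrides=None):
    """Build the keyword arguments passed to Spec2Pipeline.call."""
    kwargs = {'output_dir': out_dir, 'save_results': True}
    if step_overrides:
        # e.g. {'residual_fringe': {'skip': False}} (illustrative assumption)
        kwargs['steps'] = step_overrides
    return kwargs


def run_spec2(rate_file, out_dir, step_overrides=None):
    """Run the Spec2 stage on one rate.fits file."""
    # Imported inside the function so that defining it does not load the pipeline
    from jwst.pipeline import Spec2Pipeline
    return Spec2Pipeline.call(rate_file, **spec2_call_kwargs(out_dir, step_overrides))


# Paths copied from the directory cells of this notebook
det1_dir = './output/87A/stage1/'
spec2_dir = './output/87A/stage2/'

rate_files = sorted(glob.glob(os.path.join(det1_dir, 'jw*mirifu*_rate.fits')))
print('Found ' + str(len(rate_files)) + ' rate files to process')
print(spec2_call_kwargs(spec2_dir, {'residual_fringe': {'skip': False}}))
for rate_file in rate_files:
    run_spec2(rate_file, spec2_dir)
```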

# 4. Spec3 Pipeline

In [None]:
# Define a useful function to write out a Lvl3 association file from an input list
# Note that any background exposures have to be of type x1d.
def writel3asn(scifiles, bgfiles, asnfile, prodname):
    # Define the basic association of science files
    asn = afl.asn_from_list(scifiles, rule=DMS_Level3_Base, product_name=prodname)
        
    # Add background files to the association
    nbg=len(bgfiles)
    for ii in range(0,nbg):
        asn['products'][0]['members'].append({'expname': bgfiles[ii], 'exptype': 'background'})
        
    # Write the association to a json file
    _, serialized = asn.dump()
    with open(asnfile, 'w') as outfile:
        outfile.write(serialized)
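
Before handing an association to Spec3 it can be worth checking what it contains. The sketch below reads an association JSON file and counts its members by `exptype`, relying only on the `products[0]['members']` layout used in `writel3asn` above. The helper name `summarize_asn` and the toy association are hypothetical.

```python
import json
import os
import tempfile
from collections import Counter


def summarize_asn(asn_path):
    """Count the members of the first product in an association file by exptype."""
    with open(asn_path) as infile:
        asn = json.load(infile)
    members = asn['products'][0]['members']
    return Counter(member['exptype'] for member in members)


# Toy association with made-up file names, written to a scratch directory
toy_asn = {'products': [{'name': 'Level3', 'members': [
    {'expname': 'example_a_cal.fits', 'exptype': 'science'},
    {'expname': 'example_b_cal.fits', 'exptype': 'science'},
    {'expname': 'example_c_x1d.fits', 'exptype': 'background'},
]}]}
with tempfile.TemporaryDirectory() as scratch:
    toy_path = os.path.join(scratch, 'toy_asn.json')
    with open(toy_path, 'w') as outfile:
        json.dump(toy_asn, outfile)
    toy_counts = summarize_asn(toy_path)
print(dict(toy_counts))
```

Running `summarize_asn` on the association file written by this notebook gives a quick confirmation that the expected number of science and background members made it in.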

In [None]:
# Find and sort all of the input files

# Science Files need the cal.fits files
sstring = spec2_dir + 'jw*mirifu*cal.fits'
calfiles = np.array(sorted(glob.glob(sstring)))

# Background Files need the x1d.fits files
sstring = spec2_bgdir + 'jw*mirifu*x1d.fits'
bgfiles = np.array(sorted(glob.glob(sstring)))

print('Found ' + str(len(calfiles)) + ' science files to process')
print('Found ' + str(len(bgfiles)) + ' background files to process')

In [None]:
# Make an association file that includes all of the different exposures
#asnfile=os.path.join(output_dir, 'spec2_l3asn.json')
asnfile='spec2_l3asn.json'
dospec3 = True
if dospec3:
    writel3asn(calfiles, bgfiles, asnfile, 'Level3')

In [None]:
asnfile

#### Running spec3 in 'multi' output mode to create a single cube with all sub-bands stitched together.

In [None]:
# Define a function that will call the spec3 pipeline with our desired set of parameters
# This is designed to run on an association file
def runspec3(filename):
    crds_config = Spec3Pipeline.get_config_from_reference(filename)
    spec3 = Spec3Pipeline.from_config_section(crds_config)

    spec3.output_dir = spec3_dir
    spec3.save_results = True

    # Cube building configuration options
    spec3.cube_build.output_type = 'multi' # 'band', 'channel', or 'multi' type cube output

    # Overrides for whether or not certain steps should be skipped
    spec3.master_background.skip = True
    spec3.subtract_background = False
    spec3.extract_1d.subtract_background=False
    spec3(filename)
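
The same overrides can also be expressed as a `steps` dictionary passed to `Spec3Pipeline.call`. This is an alternative sketch rather than the approach used in this notebook; the names `spec3_overrides` and `run_spec3_call` are hypothetical, and the parameter values are the ones set in `runspec3` above.

```python
def spec3_overrides(output_type='multi'):
    """Per-step parameters mirroring the settings used in runspec3."""
    return {
        'cube_build': {'output_type': output_type},  # 'band', 'channel', or 'multi'
        'master_background': {'skip': True},
        'extract_1d': {'subtract_background': False},
    }


def run_spec3_call(asn_file, out_dir, output_type='multi'):
    """Run Spec3 on an association file using the class-level call interface."""
    # Imported inside the function so that defining it does not load the pipeline
    from jwst.pipeline import Spec3Pipeline
    return Spec3Pipeline.call(asn_file, output_dir=out_dir, save_results=True,
                              steps=spec3_overrides(output_type))


print(spec3_overrides())
```

Keeping the overrides in one dictionary makes it easy to record exactly which settings were used for a given run, for instance by writing the dictionary alongside the output cubes.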


In [None]:
dospec3 = True
if dospec3:
    runspec3(asnfile)
else:
    print('Skipping Spec3 processing')