<img style="float: center;" src='https://github.com/spacetelescope/jwst-pipeline-notebooks/raw/main/_static/stsci_header.png' alt="stsci_logo" width="900px"/> 

# NIRISS Wide Field Slitless Spectroscopy (WFSS) Pipeline Notebook

**Authors**: R. Plesha<br>
**Last Updated**: August 19, 2025<br>
**Pipeline Version**: 1.19.1 (Build 12.0)

# **Purpose**:

This notebook provides a framework for processing generic Near-Infrared Imager and Slitless Spectrograph (NIRISS) wide field slitless spectroscopy (WFSS) data through the James Webb Space Telescope (JWST) pipeline. Data is assumed to be located in one observation folder according to paths set up below. It should not be necessary to edit any cells other than in the [Configuration](#1.-Configuration) section unless modifying the standard pipeline processing options. Additional notebooks showing how to optimize and modify the sources being extracted for NIRISS WFSS data can be found in the [JDAT notebooks GitHub repository](https://github.com/spacetelescope/jdat_notebooks/tree/main/notebooks/NIRISS/NIRISS_WFSS_advanced).

**Data**:
This example uses data from [Program ID 2079](https://www.stsci.edu/jwst/science-execution/program-information?program=2079) observation 004 (PI: Finkelstein), which observes the Hubble Ultra Deep Field (HUDF). The observations use three [NIRISS filters](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-pupil-and-filter-wheels) (F115W, F150W, and F200W), each with both the GR150R and GR150C [grisms](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-gr150-grisms). In this example we are only looking at data using the F200W filter. A [NIRISS WFSS observation sequence](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-observing-strategies/niriss-wfss-recommended-strategies) typically consists of a direct image followed by a grism observation in the same blocking filter to help identify the sources in the field. In program 2079, the exposure sequence follows the pattern: direct image -> GR150R -> direct image -> GR150C -> direct image.

Example input data to use will be downloaded automatically unless disabled (i.e., to use local files instead).

**JWST pipeline version and CRDS context**: This notebook was written for the calibration pipeline version given above. The JWST Calibration Reference Data System (CRDS) context used is the one associated with that pipeline version, as listed [here](https://jwst-crds.stsci.edu/display_build_contexts/). If you use a different pipeline version or CRDS context, please read the relevant release notes ([here for pipeline](https://github.com/spacetelescope/jwst), [here for CRDS](https://jwst-crds.stsci.edu/)) for changes that may affect your results.<BR>

**Updates**:
This notebook is regularly updated as improvements are made to the pipeline. Find the most up to date version of this notebook at: https://github.com/spacetelescope/jwst-pipeline-notebooks/

**Recent Changes**:<br>
August 19, 2025: original notebook released<br>

<hr style="border:1px solid gray"> </hr>

## Table of Contents
1. [Configuration](#1.-Configuration) 
2. [Package Imports](#2.-Package-Imports)
3. [Demo Mode Setup](#3.-Demo-Mode-Setup)
4. [Directory Setup](#4.-Directory-Setup)
5. [Detector1 Pipeline](#5.-Detector1-Pipeline)
6. [Image2 Pipeline](#6.-Image2-Pipeline)
7. [Image3 Pipeline](#7.-Image3-Pipeline)
8. [Visualize the source catalog](#8.-Visualize-the-source-catalog)
9. [Spec2 Pipeline](#9.-Spec2-Pipeline)
10. [Spec3 Pipeline](#10.-Spec3-Pipeline)
11. [Visualize the Spectra](#11.-Visualize-the-spectra)

<hr style="border:1px solid gray"> </hr>

## 1. Configuration
------------------
Set basic configuration for running notebook.

#### Install dependencies and parameters

To make sure that the pipeline version is compatible with the steps
discussed below and the required dependencies and packages are installed,
you can create a fresh conda environment and install the provided
`requirements.txt` file:
```
conda create -n niriss_wfss_pipeline python=3.11
conda activate niriss_wfss_pipeline
pip install -r requirements.txt
```

Set the basic parameters to use with this notebook. These determine
which data are used, where the data are located (if already on disk), and
which pipeline modules are run on the data. The parameters are:

* demo_mode
* sci_dir (directory where the data is / will be)
* pipeline modules:
  * dodet1
  * doimage2
  * doimage3
  * dospec2
  * dospec3
* doviz (show visualizations of the data within the notebook)

In [None]:
# Basic import necessary for configuration
import os

<div class="alert alert-block alert-warning">
Adjust any parameters in the cell directly below this before running to ensure <code>demo_mode</code> runs correctly.
</div>

Set <code>demo_mode = True</code> to run in demonstration mode. In this mode this notebook will download example data from the Barbara A.
Mikulski Archive for Space Telescopes (MAST) and process everything through the pipeline. This will all happen in a local directory unless modified in [Section 3](#3.-Demo-Mode-Setup) below.

Set <code>demo_mode = False</code> if you want to process your own data that has already been downloaded and provide the location of the data in the `sci_dir` variable in the cell below.<br>

In [None]:
# Set parameters for demo_mode, data directories, and
# processing steps.

# -----------------------------Demo Mode---------------------------------
demo_mode = True

if demo_mode:
    print('Running in demonstration mode using online example data!')

# --------------------------User Mode Directories------------------------
# If demo_mode = False, look for user data in these paths
if not demo_mode:
    # Set directory paths for processing specific data; these will need
    # to be changed to your local directory setup (below are given as
    # examples)
    user_home_dir = os.path.expanduser('~')

    # Point to where science observation data are
    # Assumes uncalibrated data in sci_dir/uncal/ and results in stage1,
    # stage2, stage3 directories
    sci_dir = os.path.join(user_home_dir, 'nis_wfss_demo_data/PID2079/obs004/')

    print(f'Running using user input data from: {sci_dir}')

cwd = os.getcwd()
# --------------------------Set Processing Steps--------------------------
# Individual pipeline stages can be turned on/off here.  Note that a later
# stage won't be able to run unless data products have already been
# produced from the prior stage.

# Science processing
dodet1 = True  # calwebb_detector1
doimage2 = True  # calwebb_image2
doimage3 = True  # calwebb_image3
dospec2 = True  # calwebb_spec2
dospec3 = True  # calwebb_spec3
doviz = True  # Visualize outputs

### Set CRDS context and server
Before importing <code>CRDS</code> and <code>JWST</code> modules, we need to configure our environment. This includes defining a CRDS cache directory in which to keep the reference files that will be used by the calibration pipeline. The pipeline will fetch and download the needed reference files to this directory.

If the root directory for the local CRDS cache has not already been set, it defaults to a `crds` directory in the user's home directory.
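
The same "set only if missing" logic can be sketched with `dict.setdefault`, which fills in a value only when the key is absent. This is a minimal illustration rather than part of the pipeline: the helper name `configure_crds` is hypothetical, and the default cache location, server URL, and example context name are simply the ones that appear in this notebook.

```python
import os


def configure_crds(env, context=None):
    """Fill in CRDS settings on a mapping (e.g. os.environ) without
    overwriting values that are already present."""
    env.setdefault('CRDS_PATH', os.path.join(os.path.expanduser('~'), 'crds'))
    env.setdefault('CRDS_SERVER_URL', 'https://jwst-crds.stsci.edu')
    if context is not None:
        # Optional: pin a specific CRDS context instead of the default
        env['CRDS_CONTEXT'] = context
    return env


# Demonstrate on a plain dictionary so the real environment is untouched
example_env = configure_crds({}, context='jwst_1413.pmap')
print(example_env['CRDS_SERVER_URL'])

# A value that is already set is left alone
existing_env = configure_crds({'CRDS_PATH': '/my/crds/cache'})
print(existing_env['CRDS_PATH'])
```

Passing `os.environ` instead of an empty dictionary would apply the settings to the current process. As noted above, this needs to happen before the `crds` and `jwst` modules are imported.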

In [None]:
# ------------------------Set CRDS context and paths----------------------

# Set CRDS context (if overriding to use a specific version of reference
# files; leave commented out to use latest reference files by default)
#%env CRDS_CONTEXT  jwst_1413.pmap

# Check whether the local CRDS cache directory has been set.
# If not, set it to the user home directory
if (os.getenv('CRDS_PATH') is None):
    os.environ['CRDS_PATH'] = os.path.join(os.path.expanduser('~'), 'crds')
# Check whether the CRDS server URL has been set.  If not, set it.
if (os.getenv('CRDS_SERVER_URL') is None):
    os.environ['CRDS_SERVER_URL'] = 'https://jwst-crds.stsci.edu'

# Echo CRDS path in use
print(f"CRDS local filepath: {os.environ['CRDS_PATH']}")
print(f"CRDS file server: {os.environ['CRDS_SERVER_URL']}")

<hr style="border:1px solid gray"> </hr>

## 2. Package Imports
------------------

In [None]:
# Basic system utilities for interacting with files
# ----------------------General Imports------------------------------------
import glob
import time

# Data calculations and manipulation
import numpy as np
import pandas as pd

# -----------------------Plotting Imports----------------------------------
from matplotlib import pyplot as plt
# display plots inline within the notebook
%matplotlib inline

# -----------------------Astronomy Imports--------------------------------
# Querying and downloading demo files from MAST
from astroquery.mast import MastMissions

# Astropy routines for visualizing detected sources:
from astropy.io import fits
from astropy.io.fits import getheader
from astropy.table import Table

# for JWST calibration pipeline
import jwst
import crds

from jwst.pipeline import Detector1Pipeline
from jwst.pipeline import Image2Pipeline
from jwst.pipeline import Image3Pipeline
from jwst.pipeline import Spec2Pipeline
from jwst.pipeline import Spec3Pipeline

# JWST pipeline utilities
from jwst.associations import asn_from_list  # Tools for creating association files
from jwst.associations.lib.rules_level2_base import DMSLevel2bBase  # Definition of a Lvl2 association file
from jwst.associations.lib.rules_level3_base import DMS_Level3_Base  # Definition of a Lvl3 association file

# Echo pipeline version and CRDS context in use
print(f"JWST Calibration Pipeline Version: {jwst.__version__}")
print(f"Using CRDS Context: {crds.get_context_name('jwst')}")

### Define convenience functions

#### Plotting Spec2 & Spec3 convenience functions

In [None]:
# this function will be used to plot the i2d image for a specific source as well as the catalog x/y centroid for that source
def plot_i2d_plus_source(catname, source_id, ax):
    # open the i2d & catalog and find the associated source number            
    i2dname = catname.replace('cat.ecsv', 'i2d.fits')
    
    cat = Table.read(catname)
    cat_line = cat[cat['label'] == source_id]
    
    # plot the image
    with fits.open(i2dname) as i2d:
        display_vals = [np.nanpercentile(i2d[1].data, 1), np.nanpercentile(i2d[1].data, 98)]
        ax.imshow(i2d[1].data, vmin=display_vals[0], vmax=display_vals[1], origin='lower', cmap='gist_gray')
    
    # plot up the source catalog
    xcentroid = cat_line['xcentroid'][0]
    ycentroid = cat_line['ycentroid'][0]
    ax.set_xlim(xcentroid-20, xcentroid+20)
    ax.set_ylim(ycentroid-20, ycentroid+20)
    ax.scatter(xcentroid, ycentroid, s=20, facecolors='None', edgecolors='red', alpha=0.9)
    ax.annotate(source_id, 
                (xcentroid+0.5, ycentroid+0.5), 
                fontsize=10,
                color='red')
    
    return ax

In [None]:
# this function is used to plot the wavelength vs. flux values for x1d & c1d spectra for a specific source
def plot_spectrum(specfile, source_fluxes, ax, image3_dir, ext=1, wavemin=1.75, wavemax=2.2, legend=True):
    
    with fits.open(specfile) as spec:

        # pull out relevant keywords
        grism = spec[0].header['FILTER']
        catname = os.path.join(image3_dir, spec[0].header['SCATFILE'])
        try:
            label = f"{grism} dither {spec[0].header['DIT_PATT']}"
        except KeyError:
            label = f"{grism}" # there is no dither in the c1d files

        # find where in the file the source data are
        # NOTE: source_id is not an argument of this function; it is read from the notebook's global scope
        wh_spec_source = np.where(spec[ext].data['SOURCE_ID'] == source_id)[0]
        
        # if the source isn't in the file, then return a blank axis
        if not len(wh_spec_source):
            print(f'Source {source_id} not found in {specfile}')
            return ax, catname, source_fluxes, grism
                  
        # grab the wavelength & flux data and trim off the edges for visualization purposes
        wave = spec[ext].data['WAVELENGTH'][wh_spec_source]
        flux = spec[ext].data['FLUX'][wh_spec_source]
        
        wh_wave = np.where((wave >= wavemin) & (wave <= wavemax)) # cutting off the edges
        wave = wave[wh_wave]
        flux = flux[wh_wave]
        
        source_fluxes.extend(flux) # keep the flux to set the limits of the plot later
    
    if grism == 'GR150R':
        linestyle = '-'
    else:
        linestyle = '--'

    ax.plot(wave, flux, label=label, ls=linestyle)
    if legend:
        ax.legend(bbox_to_anchor=(1, 1))

    return ax, catname, source_fluxes, grism

In [None]:
# Start a timer to keep track of runtime
time0 = time.perf_counter()

<hr style="border:1px solid gray"> </hr>

## 3. Demo Mode Setup
#### (skip if not using demo data)
------------------

If running in demonstration mode, set up the program information to retrieve the uncalibrated data automatically from MAST using [astroquery](https://astroquery.readthedocs.io/en/latest/mast/mast.html). Here we will be using the [MastMissions](https://spacetelescope.github.io/mast_notebooks/notebooks/multi_mission/missions_mast_search/missions_mast_search.html) interface which allows for flexibility in search criteria, and is equivalent to using the [JWST mission specific search](https://mast.stsci.edu/search/ui/#/jwst) on MAST. <br>

For illustrative purposes, we focus on data taken through the NIRISS [F200W filter](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-filters) and start with uncalibrated data products. To search for additional filters, update the `niriss_pupil` field in `query_criteria` to be a comma-separated list of filters in a single string value, e.g. "F200W, F115W". To search for a specific grism, add the `opticalElements` field to `query_criteria` and set its value to "GR150R" or "GR150C". Note that searching on a specific grism will not download the associated direct images.
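
As a rough sketch of how these search fields fit together, the criteria can be assembled in a dictionary first. The helper `build_query_kwargs` is hypothetical and only builds the keyword arguments; the field names (`instrume`, `niriss_pupil`, `program`, `observtn`, `opticalElements`) are the ones used in this notebook.

```python
def build_query_kwargs(program, observtn, filters, grism=None):
    """Assemble keyword arguments for a MastMissions query (hypothetical helper)."""
    kwargs = {
        'instrume': 'NIRISS',
        'program': program,
        'observtn': observtn,
        # Multiple filters go into a single comma-separated string
        'niriss_pupil': ', '.join(filters),
    }
    if grism is not None:
        # Restricting on the grism excludes the associated direct images
        kwargs['opticalElements'] = grism
    return kwargs


criteria = build_query_kwargs(2079, '004', ['F200W', 'F115W'])
print(criteria['niriss_pupil'])

grism_only = build_query_kwargs(2079, '004', ['F200W'], grism='GR150R')
print(grism_only['opticalElements'])
```

The resulting dictionary could then be unpacked into the query call as `missions.query_criteria(**criteria)`.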

Information about the JWST file naming conventions can be found at: https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/file_naming.html
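
To make the naming convention concrete, here is a small sketch that splits an exposure-level file name into its parts. It assumes the pattern `jw<ppppp><ooo><vvv>_<gg><s><aa>_<eeeee>_<detector>_<prodType>.fits` described in that documentation; the helper name and the example file name are made up for illustration and are not taken from the downloaded data.

```python
import re

# Assumed exposure-level pattern:
# jw<ppppp><ooo><vvv>_<gg><s><aa>_<eeeee>_<detector>_<prodType>.fits
JWST_EXPOSURE_RE = re.compile(
    r'^jw(?P<program>\d{5})(?P<observation>\d{3})(?P<visit>\d{3})'
    r'_(?P<visit_group>\d{2})(?P<parallel_seq>\d)(?P<activity>[0-9a-z]{2})'
    r'_(?P<exposure>\d{5})_(?P<detector>[a-z0-9]+)_(?P<product_type>[a-z0-9]+)\.fits$'
)


def parse_exposure_name(filename):
    """Return the name components as a dict, or None if the name does not match."""
    match = JWST_EXPOSURE_RE.match(filename)
    return match.groupdict() if match else None


# Hypothetical file name, used only to exercise the parser
info = parse_exposure_name('jw02079004001_02101_00001_nis_uncal.fits')
print(info['program'], info['observation'], info['detector'], info['product_type'])
```

Names that do not follow this pattern (for example, association-based stage 3 products) return `None` here.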

In [None]:
# Set up the program information and paths for demo program
if demo_mode:
    print('Running in demonstration mode and will download example data from MAST')
    program = 2079
    sci_observtn = '004'
    
    # creating a directory for the data called "nis_wfss_demo_data" 
    #   located in the directory you are currently in
    data_dir = os.path.join(cwd, 'nis_wfss_demo_data')
    sci_dir = os.path.join(data_dir, f"PID{program}/obs{sci_observtn}")
    uncal_dir = os.path.join(sci_dir, 'uncal')

    # Create the directories if they do not exist
    os.makedirs(sci_dir, exist_ok=True)
    os.makedirs(uncal_dir, exist_ok=True)

<div class="alert alert-block alert-warning">
This demo selects only filter <b>F200W</b> data by default; this observation also contains data for the F115W and F150W filters.
</div>

In [None]:
if demo_mode:
    print(f'Using the Missions MAST interface to find data for Program {program} observation {sci_observtn}:')
    missions = MastMissions(mission='jwst')

    # query the data; sometimes this step can take a bit of time
    datasets = missions.query_criteria(instrume='NIRISS',  # From Near-Infrared Imager and Slitless Spectrograph
                                       #opticalElements='GR150R', # uncomment to filter on only GR150R grism data (no direct images)
                                       niriss_pupil='F200W',  # Download only the F200W filter data for this example
                                       program=program,  # Proposal number 2079
                                       observtn=sci_observtn, # observation 004
                                       )
    products = missions.get_unique_product_list(datasets)
    print(f'Total number of unique products found: {len(products)}')

    # filter down to only the files that we need from MAST
    files_to_download = missions.filter_products(products, file_suffix=['_uncal'])
    
    print(f'Total number of uncal files to download: {len(files_to_download)}')

Download all the uncal files for the provided program, observation, and filter.

<div class="alert alert-block alert-warning">
Warning: If this notebook is halted during this step the downloaded file
may be incomplete, and cause crashes later on!
</div>

In [None]:
if demo_mode:
    print('Downloading the data:')
    # download uncal file
    manifest = missions.download_products(files_to_download, flat=True, download_dir=uncal_dir)

In [None]:
# Print out the time benchmark
time_download_end = time.perf_counter()
print(f"Runtime so far: {(time_download_end - time0)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 4. Directory Setup
------------------
Set up detailed paths to input/output stages here.

In [None]:
# Define output subdirectories to keep science data products organized
uncal_dir = os.path.join(sci_dir, 'uncal')  # Uncalibrated pipeline inputs should be here
det1_dir = os.path.join(sci_dir, 'stage1')  # calwebb_detector1 pipeline outputs will go here
image2_dir = os.path.join(sci_dir, 'stage2_ima')  # calwebb_image2 pipeline outputs will go here
image3_dir = os.path.join(sci_dir, 'stage3_ima')  # calwebb_image3 pipeline outputs will go here
spec2_dir = os.path.join(sci_dir, 'stage2_spec')  # calwebb_spec2 pipeline outputs will go here
spec3_dir = os.path.join(sci_dir, 'stage3_spec')  # calwebb_spec3 pipeline outputs will go here

# Create the desired output directories if they do not already exist
for out_dir in (det1_dir, image2_dir, image3_dir, spec2_dir, spec3_dir):
    os.makedirs(out_dir, exist_ok=True)

In [None]:
# Print out the time benchmark
time1 = time.perf_counter()
print(f"Runtime so far: {time1 - time0:0.4f} seconds")

<hr style="border:1px solid gray"> </hr>

## 5. Detector1 Pipeline
In this section we run the `*_uncal.fits` files through the [Detector1](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/stages-of-jwst-data-processing/calwebb_detector1) stage of the pipeline to apply detector level calibrations and create a countrate data product where slopes are fit to the integration ramps. These `*_rate.fits` products are 2D (nrows x ncols), averaged over all integrations. 3D countrate data products (`*_rateints.fits`) are also created (nintegrations x nrows x ncols) which have the fitted ramp slopes for each integration.

If no modifications to the steps at this stage are needed, you can save time by downloading the `*_rate.fits` files directly from MAST and starting at stage2. However, it is best to ensure that your pipeline version matches the one used by MAST, which is recorded in the `CAL_VER` header keyword. 

In stage1, both the direct images (`EXP_TYPE=NIS_IMAGE`) and dispersed grism images (`EXP_TYPE=NIS_WFSS`) are calibrated. In addition to the `EXP_TYPE` keyword, the keyword `FILTER` can be used to distinguish exposure types for NIRISS WFSS data. `FILTER=CLEAR` indicates a direct image while `FILTER=GR150R` or `FILTER=GR150C` indicates a dispersed image. The keyword `PUPIL` is the blocking filter used in both direct images and dispersed images. We can also use the `PATT_NUM`, `XOFFSET`, and `YOFFSET` header keywords to see the dither pattern that was used for both the direct images and the dispersed images. The multiple direct image dithers will be combined in image3, while the multiple dithers in the dispersed images are combined as individual sources after extraction in spec3. 
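
The keyword logic above can be sketched as a small classifier. It works on any mapping with `EXP_TYPE` and `FILTER` entries, so a FITS primary header could be passed in place of the plain dictionaries used here. The function name and the example headers are hypothetical.

```python
def classify_exposure(header):
    """Label an exposure as 'direct' or 'dispersed' from its header keywords."""
    exp_type = header['EXP_TYPE']
    filt = header['FILTER']
    if exp_type == 'NIS_IMAGE' and filt == 'CLEAR':
        return 'direct'
    if exp_type == 'NIS_WFSS' and filt in ('GR150R', 'GR150C'):
        return 'dispersed'
    return 'unknown'


# Made-up headers mimicking part of a WFSS exposure sequence
example_headers = [
    {'EXP_TYPE': 'NIS_IMAGE', 'FILTER': 'CLEAR', 'PUPIL': 'F200W'},
    {'EXP_TYPE': 'NIS_WFSS', 'FILTER': 'GR150R', 'PUPIL': 'F200W'},
    {'EXP_TYPE': 'NIS_WFSS', 'FILTER': 'GR150C', 'PUPIL': 'F200W'},
]

labels = [classify_exposure(hdr) for hdr in example_headers]
print(labels)
```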

In [None]:
# Set up a dictionary to define how the Detector1 pipeline should be configured

# Boilerplate dictionary setup
det1dict = {}
det1dict['group_scale'], det1dict['dq_init'], det1dict['saturation'] = {}, {}, {}
det1dict['ipc'], det1dict['superbias'], det1dict['refpix'] = {}, {}, {}
det1dict['linearity'], det1dict['persistence'], det1dict['dark_current'] = {}, {}, {}
det1dict['charge_migration'], det1dict['jump'], det1dict['ramp_fit'] = {}, {}, {}
det1dict['gain_scale'] = {}

# Overrides for whether or not certain steps should be skipped
# skipping the persistence step
#det1dict['persistence']['skip'] = True

# Overrides for various reference files
# Files should be in the base local directory or provide full path
#det1dict['dq_init']['override_mask'] = 'myfile.fits' # Bad pixel mask
#det1dict['saturation']['override_saturation'] = 'myfile.fits' # Saturation
#det1dict['reset']['override_reset'] = 'myfile.fits' # Reset (MIRI-only step; not applicable to NIRISS)
#det1dict['linearity']['override_linearity'] = 'myfile.fits' # Linearity
#det1dict['rscd']['override_rscd'] = 'myfile.fits' # RSCD (MIRI-only step; not applicable to NIRISS)
#det1dict['dark_current']['override_dark'] = 'myfile.fits' # Dark current subtraction
#det1dict['jump']['override_gain'] = 'myfile.fits' # Gain used by jump step
#det1dict['ramp_fit']['override_gain'] = 'myfile.fits' # Gain used by ramp fitting step
#det1dict['jump']['override_readnoise'] = 'myfile.fits' # Read noise used by jump step
#det1dict['ramp_fit']['override_readnoise'] = 'myfile.fits' # Read noise used by ramp fitting step

# Turn on multi-core processing (off by default).  Choose what fraction of cores to use (quarter, half, or all)
det1dict['jump']['maximum_cores'] = 'half'

# Alter parameters to optimize removal of snowball residuals (example)
#det1dict['jump']['expand_large_events'] = True
#det1dict['charge_migration']['signal_threshold'] = X

In [None]:
uncal_files = sorted(glob.glob(os.path.join(uncal_dir, '*_uncal.fits')))

# Run Detector1 stage of pipeline, specifying:
#   output directory to save *_rate.fits and *_rateints.fits files
#   save_results flag set to True so the files are saved locally
if dodet1:
    for uncal in uncal_files:
        rate_result = Detector1Pipeline.call(uncal, output_dir=det1_dir, steps=det1dict, save_results=True)
else:
    print('Skipping Detector1 processing')

In [None]:
# Print information about each rate file
rate_files = sorted(glob.glob(os.path.join(det1_dir, "*rate.fits")))

for file_num, ratefile in enumerate(rate_files):
    rate_hdr = fits.getheader(ratefile) # Primary header for each rate file
    
    # information we want to store that might be useful to us later for evaluating the data
    temp_hdr_dict = {"PATHNAME": ratefile, # full path to the filename to be used later
                     "FILENAME": rate_hdr['FILENAME'],
                     "FILTER": [rate_hdr["FILTER"]], # Grism (GR150R/GR150C), or CLEAR for direct images
                     "PUPIL": [rate_hdr["PUPIL"]], # Filter used; F090W, F115W, F140M, F150W, F158M, F200W
                     "EXPSTART": [rate_hdr['EXPSTART']], # Exposure start time (MJD)
                     "PATT_NUM": [rate_hdr["PATT_NUM"]], # Position number within dither pattern for WFSS
                     "NUMDTHPT": [rate_hdr["NUMDTHPT"]], # Total number of points in entire dither pattern
                     "XOFFSET": [rate_hdr["XOFFSET"]], # X offset from pattern starting position for NIRISS (arcsec)
                     "YOFFSET": [rate_hdr["YOFFSET"]], # Y offset from pattern starting position for NIRISS (arcsec)
                     "CAL_VER": [rate_hdr["CAL_VER"]], # JWST pipeline calibration version
                     }

    # Turn the dictionary into a pandas dataframe to make it easier to read
    if file_num == 0:
        # if this is the first file, make an initial dataframe
        rate_df = pd.DataFrame(temp_hdr_dict)
    else:
        # otherwise, append to the dataframe for each file
        new_data_df = pd.DataFrame(temp_hdr_dict)
        # merge the two dataframes together to create a dataframe with all files
        rate_df = pd.concat([rate_df, new_data_df], ignore_index=True, axis=0)

rate_dfsort = rate_df.sort_values('EXPSTART', ignore_index=False)
# Look at the resulting dataframe
rate_dfsort[['FILENAME', 'FILTER', 'PUPIL', 'EXPSTART', 'PATT_NUM', 'NUMDTHPT', 'XOFFSET', 'YOFFSET', 'CAL_VER']]

In [None]:
# Quick quality control plot to illustrate the direct and dispersed image rate files
if doviz:
    # plot set up
    fig = plt.figure(figsize=(20, 35))
    cols = 3
    rows = int(np.ceil(len(rate_dfsort['PATHNAME']) / cols))
    
    # loop over the rate files and plot them
    for plt_num, rf in enumerate(rate_dfsort['PATHNAME']):
    
        # determine where the subplot should be
        xpos = (plt_num % 40) % cols
        ypos = ((plt_num % 40) // cols) # // to make it an int.
    
        # make the subplot
        ax = plt.subplot2grid((rows, cols), (ypos, xpos))
    
        # open the data and plot it
        with fits.open(rf) as hdu:
            data = hdu[1].data
            data[np.isnan(data)] = 0 # filling in nan data with 0s to help with the matplotlib color scale.
            
            display_vals = [np.nanpercentile(data, 1), np.nanpercentile(data, 99.5)]
            ax.imshow(data, vmin=display_vals[0], vmax=display_vals[1], origin='lower')
    
            # adding in grid lines as a visual aid
            for gridline in [500, 1000, 1500]:
                ax.axhline(gridline, color='black', alpha=0.5)
                ax.axvline(gridline, color='black', alpha=0.5)
    
            ax.set_title(f"#{plt_num+1}: {hdu[0].header['FILTER']} {hdu[0].header['PUPIL']} Dither{hdu[0].header['PATT_NUM']}")

In [None]:
# Print out the time benchmark
time_det1_end = time.perf_counter()
print(f"Runtime for Detector1: {(time_det1_end - time1)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 6. Image2 Pipeline

This section focuses on calibrating only the direct images in order to obtain a source catalog and segmentation map of the field to use as input to the Spec2 stage later. 

In the [Image2 stage of the pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image2.html), calibrated unrectified data products are created (`*_cal.fits` files). 

In this pipeline processing stage, the [world coordinate system (WCS)](https://jwst-pipeline.readthedocs.io/en/latest/jwst/assign_wcs/index.html#assign-wcs-step) is assigned, the data are [flat fielded](https://jwst-pipeline.readthedocs.io/en/latest/jwst/flatfield/index.html#flatfield-step), and a [photometric calibration](https://jwst-pipeline.readthedocs.io/en/latest/jwst/photom/index.html#photom-step) is applied to convert from units of countrate (ADU/s) to surface brightness (MJy/sr).

By default, the [background subtraction step](https://jwst-pipeline.readthedocs.io/en/latest/jwst/background_step/index.html#background-step)
and the [resampling step](https://jwst-pipeline.readthedocs.io/en/latest/jwst/resample/index.html#resample-step) are not performed for NIRISS at this stage of the pipeline. The background subtraction is turned off since there is no background template for the imaging mode and the local background is removed during the background correction for photometric measurements around individual sources. The resampling step occurs during the `Image3` stage by default. While the resampling step can be turned on during the `Image2` stage to, e.g., generate a source catalog for each image, the data quality from the `Image3` stage will be better since the bad pixels, which adversely affect
both the centroids and photometry in individual images, will be mostly removed.

The `*_rate.fits` products will be calibrated into `*_cal.fits` files. More information about the steps performed in the Image2 part of the pipeline can be found in the [Image2 pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image2.html).
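
Conceptually, the flat-field and photometric calibration steps amount to a division and a multiplication on the science array. The sketch below shows only that arithmetic with made-up numbers; the real steps also propagate errors and data quality flags, and the conversion factor used here is a hypothetical value, not one from a reference file.

```python
import numpy as np

# Made-up 2x2 countrate image and flat field
rate = np.array([[10.0, 12.0],
                 [8.0, 11.0]])
flat = np.array([[1.0, 1.2],
                 [0.8, 1.1]])

# Hypothetical scalar conversion from countrate to MJy/sr
photmjsr = 0.5

flat_fielded = rate / flat       # flat fielding: divide by the flat
cal = flat_fielded * photmjsr    # photometric calibration: scale to MJy/sr

print(cal)
```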

In [None]:
# Set up a dictionary to define how the Image2 pipeline should be configured.

# Boilerplate dictionary setup
image2dict = {}
image2dict['assign_wcs'], image2dict['flat_field'] = {}, {}
image2dict['photom'], image2dict['resample'] = {}, {}

# Overrides for whether or not certain steps should be skipped (example)
#image2dict['resample']['skip'] = False

# Overrides for various reference files
# Files should be in the base local directory or provide full path
#image2dict['assign_wcs']['override_distortion'] = 'myfile.asdf'  # Spatial distortion (ASDF file)
#image2dict['assign_wcs']['override_filteroffset'] = 'myfile.asdf'  # Imager filter offsets (ASDF file)
#image2dict['assign_wcs']['override_specwcs'] = 'myfile.asdf'  # Spectral distortion (ASDF file)
#image2dict['assign_wcs']['override_wavelengthrange'] = 'myfile.asdf'  # Wavelength channel mapping (ASDF file)
#image2dict['flat_field']['override_flat'] = 'myfile.fits'  # Pixel flatfield
#image2dict['photom']['override_photom'] = 'myfile.fits'  # Photometric calibration array

Find and sort all of the input files, ensuring that absolute paths are used.

In [None]:
sstring = os.path.join(det1_dir, 'jw*rate.fits')  # Use files from the detector1 output folder
rate_files = sorted(glob.glob(sstring))
filetype = np.repeat('UNKNOWNUNKNOWN', len(rate_files))
for ii in range(0, len(rate_files)):
    rate_files[ii] = os.path.abspath(rate_files[ii])
    hdr = getheader(rate_files[ii])
    filetype[ii] = hdr['EXP_TYPE']
rate_files = np.array(rate_files)

# Select only NIS_IMAGE files
indx = np.where(filetype == 'NIS_IMAGE')
rate_files = rate_files[indx]

print(f"Found {len(rate_files)} science imaging files")

In [None]:
# Run Image2 stage of pipeline, specifying:
# output directory to save *_cal.fits files
# save_results flag set to True so the cal files are saved

if doimage2:
    for rate in rate_files:
        img2 = Image2Pipeline.call(rate, output_dir=image2_dir, steps=image2dict, save_results=True)
else:
    print("Skipping Image2 processing.")

In [None]:
# Print out the time benchmark
time_image2_end = time.perf_counter()
print(f"Runtime for Image2: {(time_image2_end - time_det1_end)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 7. Image3 Pipeline

In this section we continue calibrating the direct images with the Image3 stage of the pipeline to obtain a source catalog and segmentation map of the field to use as input to the Spec2 stage later. In the [Image3 stage of the pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image3.html), the individual `*_cal.fits` files for each of the dither positions are combined into a single distortion-corrected image (`*_i2d.fits` file).

By default, the Image3 stage of the pipeline performs the following steps on NIRISS data:
* [tweakreg](https://jwst-pipeline.readthedocs.io/en/latest/jwst/tweakreg/README.html) - creates source catalogs of pointlike sources for each input image. The source catalog for each input image is compared to each other to derive coordinate transforms to align the images relative to each other.
  * As of CRDS context jwst_1156.pmap and later, the pars-tweakreg parameter reference file for NIRISS performs an absolute astrometric correction to GAIA data release 3 by default (i.e., the abs_refcat parameter is set to GAIADR3). Though this default correction generally improves results compared with not doing this alignment, it could potentially result in poor performance in crowded or sparse fields, so users are encouraged to check astrometric accuracy and revisit this step if necessary.
  * As of pipeline version 1.14.0, the default source finding algorithm for NIRISS is IRAFStarFinder which testing shows returns good accuracy for undersampled NIRISS PSFs at short wavelengths ([Goudfrooij 2022](https://www.stsci.edu/files/live/sites/www/files/home/jwst/documentation/technical-documents/_documents/JWST-STScI-008324.pdf)).
* [skymatch](https://jwst-pipeline.readthedocs.io/en/latest/jwst/skymatch/description.html) - measures the background level from the sky to use as input into the subsequent outlier detection and resample steps.
* outlier detection - flags any remaining cosmic rays, bad pixels, or other artifacts not already flagged during the detector1 stage of the pipeline, using all input images to create a median image so that outliers in individual images can be identified.
* [resample](https://jwst-pipeline.readthedocs.io/en/latest/jwst/resample/main.html) - resamples each input image based on its WCS and distortion information and creates a single undistorted image.
* [source catalog](https://jwst-pipeline.readthedocs.io/en/latest/jwst/source_catalog/main.html) - creates a catalog of detected sources along with their measured photometry and morphology (i.e., point-like vs extended). Useful for quicklooks, but optimization is likely needed for specific science cases, which is an ongoing investigation for the NIRISS team. Users may wish to experiment with changing the `snr_threshold` and `deblend` options. Modifications to the following parameters will not significantly improve data quality and it is advised to keep them at their default values: `aperture_ee1`, `aperture_ee2`, `aperture_ee3`, `ci1_star_threshold`, `ci2_star_threshold`.
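
To make the idea behind the outlier detection step more concrete, here is a toy, pure-Python sketch of median-based flagging across a stack of aligned images. It is not the pipeline's algorithm, which works on resampled images and uses noise information; the function name `flag_outliers`, the `threshold` parameter, and the pixel values are all made up for illustration.

```python
from statistics import median


def flag_outliers(stack, threshold=5.0):
    """Flag pixels that deviate strongly from the per-pixel median of a stack.

    stack: list of equally sized 2-D lists (one per aligned exposure).
    Returns a list of 2-D boolean lists, True where a pixel is an outlier.
    """
    nrows, ncols = len(stack[0]), len(stack[0][0])
    flags = [[[False] * ncols for _ in range(nrows)] for _ in stack]
    for r in range(nrows):
        for c in range(ncols):
            values = [img[r][c] for img in stack]
            med = median(values)
            # Median absolute deviation as a robust estimate of the scatter
            mad = median(abs(v - med) for v in values)
            scatter = max(mad, 1e-6)  # guard against zero scatter
            for k, v in enumerate(values):
                flags[k][r][c] = abs(v - med) > threshold * scatter
    return flags


# Three hypothetical 1x3 "exposures"; the third has a cosmic-ray-like spike
toy_stack = [
    [[1.0, 1.1, 0.9]],
    [[1.1, 1.0, 1.0]],
    [[0.9, 50.0, 1.1]],
]
flags = flag_outliers(toy_stack)
print(flags[2][0])
```

Only the spiked pixel in the third toy exposure is flagged, because it is the only value far from the median of its pixel stack.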

In [None]:
# Set up a dictionary to define how the Image3 pipeline should be configured
# Boilerplate dictionary setup
image3dict = {}
image3dict['assign_mtwcs'], image3dict['tweakreg'], image3dict['skymatch'] = {}, {}, {}
image3dict['outlier_detection'], image3dict['resample'], image3dict['source_catalog'] = {}, {}, {}

# Overrides for whether or not certain steps should be skipped (example)
#image3dict['outlier_detection']['skip'] = True

# Example parameters for the source_catalog step
#image3dict['source_catalog']['kernel_fwhm'] = 5.0
#image3dict['source_catalog']['snr_threshold'] = 10.0
#image3dict['source_catalog']['npixels'] = 50
#image3dict['source_catalog']['deblend'] = True

# Example parameters for the tweakreg step
#image3dict['tweakreg']['snr_threshold'] = 20
#image3dict['tweakreg']['abs_refcat'] = 'GAIADR3'
#image3dict['tweakreg']['searchrad'] = 3.0
#image3dict['tweakreg']['kernel_fwhm'] = 2.302
#image3dict['tweakreg']['fitgeometry'] = 'shift'


# Overrides for various reference files
# Files should be in the base local directory or provide full path
#image3dict['source_catalog']['override_apcorr'] = 'myfile.fits'  # Aperture correction parameters
#image3dict['source_catalog']['override_abvegaoffset'] = 'myfile.asdf'  # Data to convert from AB to Vega magnitudes (ASDF file)

Find and sort all of the input files, ensuring use of absolute paths

In [None]:
# Science Files need the cal.fits files
sstring = os.path.join(image2_dir, 'jw*cal.fits')
cal_files = sorted(glob.glob(sstring))
for ii in range(0, len(cal_files)):
    cal_files[ii] = os.path.abspath(cal_files[ii])
cal_files = np.array(cal_files)

print(f'Found {str(len(cal_files))} science imaging files to process')

### Create Association Files

An association file lists the exposures to be calibrated together in `Stage 3`
of the pipeline. Note that an association file is available for download
from MAST, with a filename of `*_asn.json`. Here we show how to create an
association file to point to the data products created when processing data
through the pipeline. Note that the output products will have a rootname
that is specified by the `product_name` in the association file. For
this tutorial, the `product_name` is built from the program ID, observation
number, instrument, filter, and pupil values found in the file headers
(`jw<program>-<obs>_<instrument>_<filter>-<pupil>`).
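
For orientation, the sketch below shows the rough shape of an association as a plain dictionary and how the product name and member list can be read back from the JSON. It is a hand-written stand-in and not a replacement for `asn_from_list`: the file names, program number, and the helper `summarize_asn` are placeholders, and a real association file carries additional bookkeeping keys.

```python
import json

# Hand-written stand-in for an association; all values are placeholders
example_asn = {
    "asn_type": "image3",
    "program": "00000",
    "products": [
        {
            "name": "example_product",
            "members": [
                {"expname": "example_dither1_cal.fits", "exptype": "science"},
                {"expname": "example_dither2_cal.fits", "exptype": "science"},
            ],
        }
    ],
}


def summarize_asn(asn):
    """Return (product name, member file names) for each product."""
    return [(prod["name"], [m["expname"] for m in prod["members"]])
            for prod in asn["products"]]


# Round-trip through JSON, as would happen when reading a file from disk
serialized = json.dumps(example_asn, indent=4)
summary = summarize_asn(json.loads(serialized))
print(summary)
```

The same helper could be pointed at an association written to disk by loading it with `json.load`, which is a quick way to confirm which exposures will be combined.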

In [None]:
# Create Level 3 Associations for each pupil (blocking filter) type
if doimage3:
    # What are the PUPIL values of all the images?
    pupil = np.repeat('UNKNOWN', len(cal_files))
    for ii in range(0, len(cal_files)):
        hdr = getheader(cal_files[ii])
        pupil[ii] = hdr['PUPIL']
        
    # What were the unique values of PUPIL?
    uqpupil = np.unique(pupil)
    
    # Loop over unique pupil values
    for thispupil in uqpupil:
        indx = np.where(pupil == thispupil)[0]
        these_files = cal_files[indx]
        hdr = getheader(these_files[0])
        pid = hdr['PROGRAM']
        obs = hdr['OBSERVTN']
        filt = hdr['FILTER']
        instrum = hdr['INSTRUME']
        product_name = 'jw' + pid + '-' + obs + '_' + instrum + '_' + filt + '-' + thispupil
        asn_filename = product_name + '_image3_asn.json'
    
        association = asn_from_list.asn_from_list(these_files, rule=DMS_Level3_Base,
                                                  product_name=product_name)
    
        association.data['asn_type'] = 'image3'
        association.data['program'] = pid
    
        # Format association as .json file
        _, serialized = association.dump(format="json")

        # Write out association file
        association_im3 = os.path.join(sci_dir, asn_filename)
        with open(association_im3, "w") as fd:
            fd.write(serialized)

### Run Image3

In Image3, the `*_cal.fits` individual pointing files will be calibrated into a single combined `*_i2d.fits` image. More information about the steps performed in the Image3 part of the pipeline can be found in the [Image3 pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image3.html).

In [None]:
# Run Stage 3
if doimage3:
    asn_files = np.array(sorted(glob.glob(sci_dir + '/*image3_asn.json')))
    for asn in asn_files:
        img3 = Image3Pipeline.call(asn, output_dir=image3_dir, steps=image3dict, save_results=True)
else:
    print('Skipping Image3 processing')

In [None]:
# Print out the time benchmark
time_image3_end = time.perf_counter()
print(f"Runtime for Image3: {(time_image3_end - time_image2_end)/60:0.0f} minutes")

In [None]:
# Find the outputs of the Image3 pipeline, which will be needed for processing the spectral data
# Print which outputs were found for reference

# Combined image over multiple dithers/mosaic
image3_i2d = np.array(sorted(glob.glob(os.path.join(image3_dir, '*i2d.fits'))))
print('Direct images:')
for file in image3_i2d:
    print(file.split('/')[-1])

# Segmentation map that defines the extent of a source
image3_segm = np.array(sorted(glob.glob(os.path.join(image3_dir, '*segm.fits'))))
print('Segmentation maps:')
for file in image3_segm:
    print(file.split('/')[-1])
    
# Source catalog that defines the RA/Dec of a source at a particular pixel
image3_cat = np.array(sorted(glob.glob(os.path.join(image3_dir, '*cat.ecsv'))))
print('Source catalogs:')
for file in image3_cat:
    print(file.split('/')[-1])

<hr style="border:1px solid gray"> </hr>

## 8. Visualize the source catalog

Using the `*_i2d.fits` combined image and the source catalog produced by Image3, we can visually inspect if we're happy with where the pipeline found the sources to use in the Spec2 stage of the pipeline. In the following figures, what has been defined as an extended source by the pipeline is shown in light blue, and what has been defined as a point source by the pipeline is shown in light red. This definition affects the extraction box in the WFSS images as well as in the contamination correction step of the pipeline.

The segmentation maps are also a product of the Image3 pipeline, and they are used to help determine the source catalog. Let's take a look at those to ensure we are happy with what the pipeline is defining as a source.

In the segmentation map on the figure to the right, each blob should correspond to a physical target. There are cases where sources can be blended, in which case the parameters for making the segmentation map and source catalog should be changed. If using the demo data, an example of this can be seen in the Observation 004 F200W filter image where two galaxies at ~(1600, 1300) have been blended into one source. This is discussed in more detail in the custom Image3 run in the [NIRISS WFSS JDAT notebooks](https://github.com/spacetelescope/jdat_notebooks/tree/main/notebooks/NIRISS/NIRISS_WFSS_advanced).

In [None]:
if doviz:            
    cols = 2
    rows = len(image3_i2d)
    
    fig = plt.figure(figsize=(15, 15*(rows/2)))
    
    for plt_num, img in enumerate(np.sort(np.concatenate([image3_segm, image3_i2d]))):
    
        # determine where the subplot should be
        xpos = (plt_num % 40) % cols
        ypos = ((plt_num % 40) // cols) # // to make it an int.
    
        # make the subplot
        ax = plt.subplot2grid((rows, cols), (ypos, xpos))
    
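        # read the catalog that shares a rootname with this i2d or segm file,
        # so the overlay does not depend on the order in which files are plotted
        cat = Table.read(img.replace('i2d.fits', 'cat.ecsv').replace('segm.fits', 'cat.ecsv'))
        cmap = 'gist_gray'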
            
        # plot the image
        with fits.open(img) as hdu:
            display_vals = [np.nanpercentile(hdu[1].data, 1), np.nanpercentile(hdu[1].data, 99)]
            ax.imshow(hdu[1].data, vmin=display_vals[0], vmax=display_vals[1], origin='lower', cmap=cmap)
            title = f"{hdu[0].header['PUPIL']}"
    
        # also plot the associated catalog
        extended_sources = cat[cat['is_extended'] == 1] # 1 is True; i.e. is extended
        point_sources = cat[cat['is_extended'] == 0] # 0 is False; i.e. is a point source
        
        for color, sources, source_type in zip(['cyan', 'pink'], [extended_sources, point_sources], ['Extended Source', 'Point Source']):
            # plotting the sources
            ax.scatter(sources['xcentroid'], sources['ycentroid'], s=150, facecolors='None', edgecolors=color, alpha=0.9)
    
            # adding source labels 
            for i, source_num in enumerate(sources['label']):
                ax.annotate(source_num, 
                            (sources['xcentroid'][i]+0.5, sources['ycentroid'][i]+0.5), 
                            fontsize=10,
                            color=color)
            ax.scatter(-999, -999, label=source_type, s=20, facecolors='None', edgecolors=color, alpha=0.9)
        if 'i2d' in img:
            ax.set_title(f"{title} combined image\n(i2d)")
        else:
            ax.set_title(f"{title} segmentation map\n(segm)")
        
        # zooming in on a smaller region
        ax.set_xlim(1250, 1750)
        ax.set_ylim(1250, 1750)

        ax.legend(framealpha=0.6)
    
    # Helps to make the axes not overlap ; you can also set this manually if this doesn't work
    plt.tight_layout()

In addition to the segmentation map, the source catalog itself is useful for examining the source centroids, calculated fluxes, and source extents.

In [None]:
# Print a source catalog to illustrate the contents
cat = Table.read(image3_cat[0])
cat

In all likelihood, you will need to rerun Image3 with different parameters in order to return an optimal source catalog to use with your NIRISS WFSS data. You can additionally refine the source catalog so that Spec2 and Spec3 only run on the sources that you care most about. Some examples of this can be found in the [NIRISS WFSS JDAT notebooks](https://github.com/spacetelescope/jdat_notebooks/tree/main/notebooks/NIRISS/NIRISS_WFSS_advanced).
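
As a rough sketch of what refining the catalog can mean, the selection logic below keeps only rows that pass a brightness cut or appear in a hand-picked list of source labels. Plain dictionaries stand in for catalog rows so that the snippet runs without any JWST products. The `label` and `is_extended` columns are used earlier in this notebook, while the `isophotal_abmag` column name, the cut value, the helper `select_sources`, and the row values are assumptions for illustration only.

```python
def select_sources(rows, max_abmag=None, keep_labels=()):
    """Keep rows brighter than max_abmag or whose label is in keep_labels."""
    keep = set(keep_labels)
    selected = []
    for row in rows:
        bright_enough = (max_abmag is not None
                         and row["isophotal_abmag"] <= max_abmag)
        if bright_enough or row["label"] in keep:
            selected.append(row)
    return selected


# Hypothetical catalog rows (values are invented)
toy_rows = [
    {"label": 1, "is_extended": True, "isophotal_abmag": 21.3},
    {"label": 2, "is_extended": False, "isophotal_abmag": 25.8},
    {"label": 3, "is_extended": True, "isophotal_abmag": 23.9},
]

refined = select_sources(toy_rows, max_abmag=22.0, keep_labels=[3])
print([row["label"] for row in refined])
```

With an astropy `Table`, the same selection would normally be expressed as a boolean mask on the columns. The trimmed catalog would then need to be written to a new file and referenced in the spec2 association in place of the original.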

<hr style="border:1px solid gray"> </hr>

## 9. Spec2 Pipeline

After running Image3 and obtaining the segmentation map and source catalog, the [Spec2 pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec2.html#calwebb-spec2) is ready to be run. The Spec2 pipeline first applies the [assign_wcs](https://jwst-pipeline.readthedocs.io/en/latest/jwst/assign_wcs/main.html), [background](https://jwst-pipeline.readthedocs.io/en/latest/jwst/background_subtraction/description.html), and [flat_field](https://jwst-pipeline.readthedocs.io/en/latest/jwst/flatfield/main.html) corrections to the full-frame `*_rate.fits` files. The [srctype](https://jwst-pipeline.readthedocs.io/en/latest/jwst/srctype/description.html) step is then run to determine the extent of the extraction box before the [extract_2d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_2d/main.html) step is run, producing individual cutouts for the brightest 100 sources defined in the Image3 source catalog. The [wfss_contam](https://jwst-pipeline.readthedocs.io/en/latest/jwst/wfss_contam/description.html) step is run towards the end of the [extract_2d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_2d/main.html) step and is currently not on by default as the step is being improved. The [photom](https://jwst-pipeline.readthedocs.io/en/latest/jwst/photom/main.html) step is then run on the cutouts, producing flux-calibrated 2-D spectral (`*_cal.fits`) files. The [extract_1d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_1d/description.html) step is run last, producing level 2 `*_x1d.fits` files.

In [None]:
# Set up a dictionary to define how the Spec2 pipeline should be configured.

# -------------------------Boilerplate dictionary setup-------------------------
spec2dict = {}
spec2dict['assign_wcs'], spec2dict['badpix_selfcal'] = {}, {}
spec2dict['msa_flagging'], spec2dict['nsclean'] = {}, {}
spec2dict['imprint_subtract'], spec2dict['bkg_subtract'] = {}, {}
spec2dict['srctype'], spec2dict['wavecorr'] = {}, {}
spec2dict['flat_field'], spec2dict['pathloss'] = {}, {}
spec2dict['photom'], spec2dict['pixel_replace'] = {}, {}
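spec2dict['cube_build'], spec2dict['extract_1d'] = {}, {}
# WFSS-specific steps; these keys must exist before the overrides below can be set
spec2dict['extract_2d'], spec2dict['wfss_contam'] = {}, {}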

# ---------------------------Override reference files---------------------------

# Overrides for various reference files (example).
# Files should be in the base local directory or provide full path.
#spec2dict['extract_1d']['override_extract1d'] = 'myfile.json'

# -----------------------------Set step parameters------------------------------

# Overrides for whether or not certain steps should be skipped (example).
#spec2dict['bkg_subtract']['skip'] = True
#spec2dict['bkg_subtract']['save_results'] = True # save background subtracted full-frame images
#spec2dict['wfss_contam']['skip'] = False # uncomment to turn on contamination correction

### Create Association File

As with the imaging part of the pipeline, there are association files for spec2. These are a bit more complex in that they need to have the science (WFSS) data, direct image, source catalog, and segmentation map included as members. For the science data, the rate files are used as inputs, similarly to Image2. Also like Image2, there should be one association file for each dispersed image dither position in an observing sequence. In this case, that should match the number of rate files where `FILTER=GR150R` or `FILTER=GR150C`.
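
Because a spec2 WFSS association needs all four member types, a small check like the one sketched below can catch an incomplete association before the pipeline is run. The member `exptype` strings match those used in this notebook; the helper `missing_member_types`, the constant `REQUIRED_EXPTYPES`, and the example file names are hypothetical.

```python
# Member types that a WFSS spec2 association is expected to contain
REQUIRED_EXPTYPES = ("science", "direct_image", "sourcecat", "segmap")


def missing_member_types(asn):
    """Return the required exptypes that are absent from the first product."""
    present = {m["exptype"] for m in asn["products"][0]["members"]}
    return [t for t in REQUIRED_EXPTYPES if t not in present]


# Hand-written example in which the segmentation map was left out
example_l2_asn = {
    "asn_type": "spec2",
    "products": [{
        "name": "example_wfss_exposure",
        "members": [
            {"expname": "example_rate.fits", "exptype": "science"},
            {"expname": "example_i2d.fits", "exptype": "direct_image"},
            {"expname": "example_cat.ecsv", "exptype": "sourcecat"},
        ],
    }],
}

missing = missing_member_types(example_l2_asn)
print(missing)
```

An empty list means every required member type is present.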

Here we define a function to create the necessary association files for each dispersed image.

In [None]:
def writel2asn(scifile, dimagefiles, catalogfiles, segmfiles, prodname):
    # Define the basic association of science files
    asn = asn_from_list.asn_from_list([scifile], rule=DMSLevel2bBase, product_name=prodname)  # Wrap in array since input was single exposure

    # Which pupil element (blocking filter) does the dispersed image use?
    scihdr = getheader(scifile)
    pupil = scihdr['PUPIL']
    
    # Ensure that the direct image uses the same pupil (e.g., in case there were multiple)
    dimage_pupil = np.repeat('UNKNOWN', len(dimagefiles))
    for ii in range(0, len(dimagefiles)):
        hdr = getheader(dimagefiles[ii])
        dimage_pupil[ii] = hdr['PUPIL']
    thisdimage = dimagefiles[dimage_pupil == pupil]
    
    # Ensure that the segmentation map uses the same pupil (e.g., in case there were multiple)
    segmap_pupil = np.repeat('UNKNOWN', len(segmfiles))
    for ii in range(0, len(segmfiles)):
        hdr = getheader(segmfiles[ii])
        segmap_pupil[ii] = hdr['PUPIL']
    thissegm = segmfiles[segmap_pupil == pupil]
    
    # Ensure that the catalog uses the same pupil (e.g., in case there were multiple)
    # No metadata in ECSV file, so require the pupil name in the filename
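    cat_pupil = np.repeat(False, len(catalogfiles))  # one flag per catalog file
    for ii in range(0, len(catalogfiles)):
        # Match on the file name only so that directory names cannot cause a false match
        if pupil in os.path.basename(catalogfiles[ii]):
            cat_pupil[ii] = True
    thiscat = catalogfiles[cat_pupil]
    
    print(f'Using catalog {thiscat[0]}')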
    
    # Add the direct image, catalog, and segmentation files
    asn['products'][0]['members'].append({'expname': thisdimage[0], 'exptype': 'direct_image'})
    asn['products'][0]['members'].append({'expname': thiscat[0], 'exptype': 'sourcecat'})
    asn['products'][0]['members'].append({'expname': thissegm[0], 'exptype': 'segmap'})
    
    asnfile = os.path.join(sci_dir, os.path.basename(scifile).replace('rate.fits', 'spec2_asn.json'))

    # Write the association to a json file
    _, serialized = asn.dump()
    with open(asnfile, 'w') as outfile:
        outfile.write(serialized)
        
    return asnfile

Find and sort all of the input files, ensuring use of absolute paths

In [None]:
# Find the input dispersed spectra files from the detector1 pipeline stage
sstring = os.path.join(det1_dir, 'jw*rate.fits')  # Use files from the detector1 output folder
rate_files = sorted(glob.glob(sstring))
filetype = np.repeat('UNKNOWNUNKNOWN', len(rate_files))
for ii in range(0, len(rate_files)):
    rate_files[ii] = os.path.abspath(rate_files[ii])
    hdr = getheader(rate_files[ii])
    filetype[ii] = hdr['EXP_TYPE']
rate_files = np.array(rate_files)
# Select only NIS_WFSS files
indx = np.where(filetype == 'NIS_WFSS')
rate_files = rate_files[indx]

print(f"Found {len(rate_files)} science WFSS files")
print(f"Found {len(image3_i2d)} direct image files")
print(f"Found {len(image3_cat)} catalog files")
print(f"Found {len(image3_segm)} segmentation map files")

In [None]:
if dospec2:
    os.chdir(image3_dir) # This is necessary since the pipeline looks in the current directory for the catalog
    for file in rate_files:
        asnfile = writel2asn(file, image3_i2d, image3_cat, image3_segm, 'Level2')
        Spec2Pipeline.call(asnfile, steps=spec2dict, save_results=True, output_dir=spec2_dir)
else:
    print('Skipping Spec2 processing for SCI data')

In [None]:
# Print out the time benchmark
time_spec2_end = time.perf_counter()
print(f"Runtime for Spec2: {(time_spec2_end - time_image3_end)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 10. Spec3 Pipeline

NIRISS WFSS data are minimally processed through the [Spec3 stage of the pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec3.html) to combine calibrated data from multiple dithers within an observation. The spec3 products are unique for a specific grism and blocking filter combination; the different grism data are not combined by default. As of pipeline version 1.19.1, the level 3 source-based `*_cal.fits` files created in the [exp_to_source](https://jwst-pipeline.readthedocs.io/en/latest/jwst/exp_to_source/main.html) step are no longer saved by default. The `*_x1d.fits` files created in the [extract_1d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_1d/description.html) step and the `*_c1d.fits` files created in the [combine_1d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/combine_1d/description.html) step are now each saved as a single file per grism and filter combination, with all of the extracted sources contained within that file.

In [None]:
# Set up a dictionary to define how the Spec3 pipeline should be configured.

# -------------------------Boilerplate dictionary setup-------------------------
spec3dict = {}
spec3dict['assign_mtwcs'], spec3dict['master_background'] = {}, {}
spec3dict['outlier_detection'], spec3dict['pixel_replace'] = {}, {}
spec3dict['cube_build'], spec3dict['extract_1d'] = {}, {}

# ---------------------------Override reference files---------------------------

# Overrides for various reference files.
# Files should be in the base local directory or provide full path.
#spec3dict['extract_1d']['override_extract1d'] = 'myfile.json'

# -----------------------------Set step parameters------------------------------

# Overrides for whether or not certain steps should be skipped (example).
#spec3dict['outlier_detection']['skip'] = True

In [None]:
# Find the cal.fits files
sstring = os.path.join(spec2_dir, 'jw*cal.fits')
cal_files = sorted(glob.glob(sstring))
for ii in range(0, len(cal_files)):
    cal_files[ii] = os.path.abspath(cal_files[ii])
cal_files = np.array(cal_files)

print(f'Found {str(len(cal_files))} science spectroscopy files to process')

### Create Association Files

There will be one spec3 association per blocking filter and grism combination, in which all of the extracted 1-D spectra within an observation with that filter and grism combination are coadded into a single spectrum for each source. If using only one blocking filter (e.g., F200W) with both grisms (GR150R & GR150C) for example, we would expect two spec3 association files, each of which contains all of the corresponding cal.fits files to combine.
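
The grouping described above can be sketched with a dictionary keyed on the (blocking filter, grism) pair. This is only an illustration of the bookkeeping: small dictionaries stand in for FITS headers, and the helper `group_by_filter_and_grism` and the file names are hypothetical.

```python
from collections import defaultdict


def group_by_filter_and_grism(headers):
    """Group file names by their (PUPIL, FILTER) header values.

    headers: dict mapping file name -> dict with 'PUPIL' and 'FILTER' keys.
    """
    groups = defaultdict(list)
    for fname, hdr in sorted(headers.items()):
        groups[(hdr["PUPIL"], hdr["FILTER"])].append(fname)
    return dict(groups)


# Stand-in header values for three hypothetical cal files
toy_headers = {
    "exp1_cal.fits": {"PUPIL": "F200W", "FILTER": "GR150R"},
    "exp2_cal.fits": {"PUPIL": "F200W", "FILTER": "GR150R"},
    "exp3_cal.fits": {"PUPIL": "F200W", "FILTER": "GR150C"},
}

groups = group_by_filter_and_grism(toy_headers)
for (pupil, grism), files in groups.items():
    print(pupil, grism, files)
```

Each key corresponds to one spec3 association, so one blocking filter observed with both grisms yields two groups.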

In [None]:
# Create Level 3 Associations for each pupil (blocking filter) type
if dospec3:
    # What are the PUPIL values of all the images?
    pupil = np.repeat('UNKNOWN', len(cal_files))
    for ii in range(0, len(cal_files)):
        hdr = getheader(cal_files[ii])
        pupil[ii] = hdr['PUPIL']
        
    # What were the unique values of PUPIL?
    uqpupil = np.unique(pupil)
    
    # Loop over unique pupil values
    for thispupil in uqpupil:
        pindx = np.where(pupil == thispupil)[0]
        temp_files = cal_files[pindx]
        
        # What were the values of FILTER (e.g., GR150R vs GR150C)?
        filt = np.repeat('UNKNOWN', len(temp_files))
        for ii in range(0, len(temp_files)):
            hdr = getheader(temp_files[ii])
            filt[ii] = hdr['FILTER']
            
        # What were the unique values of FILTER?
        uqfilter = np.unique(filt)
        
        # Loop over unique filter values
        for thisfilter in uqfilter:
            findx = np.where(filt == thisfilter)[0]
            these_files = temp_files[findx]
        
            hdr = getheader(these_files[0])
            pid = hdr['PROGRAM']
            obs = hdr['OBSERVTN']
            instrum = hdr['INSTRUME']
            product_name = 'jw' + pid + '-' + obs + '_' + instrum + '_' + thisfilter + '-' + thispupil
            asn_filename = product_name + '_spec3_asn.json'
    
            association = asn_from_list.asn_from_list(these_files, rule=DMS_Level3_Base,
                                                      product_name=product_name)
    
            association.data['asn_type'] = 'spec3'
            association.data['program'] = pid
    
            # Format association as .json file
            _, serialized = association.dump(format="json")

            # Write out association file
            association_sp3 = os.path.join(sci_dir, asn_filename)
            with open(association_sp3, "w") as fd:
                fd.write(serialized)
                
    # Get info about all associations that we created
    asn_files = np.array(sorted(glob.glob(sci_dir + '/*spec3_asn.json')))
    print('Using', len(asn_files), 'associations:')
    for asn in asn_files:
        print(asn.split('/')[-1])

In [None]:
# Run Stage 3
if dospec3:
    os.chdir(spec3_dir) # This seems necessary to get spec3 results into the right place
    for asn in asn_files:
        spec3 = Spec3Pipeline.call(asn, output_dir=spec3_dir, steps=spec3dict, save_results=True)
else:
    print('Skipping Spec3 processing')

In [None]:
# Print a list of the spec3 output x1d and c1d files
spec3_x1ds = sorted(glob.glob(os.path.join(spec3_dir, "jw*x1d.fits")))
print('X1d files:')
for file in spec3_x1ds:
    print(file.split('/')[-1])

spec3_c1ds = sorted(glob.glob(os.path.join(spec3_dir, "jw*c1d.fits")))
print('C1d files:')
for file in spec3_c1ds:
    print(file.split('/')[-1])

In [None]:
# Print out the time benchmark
time_spec3_end = time.perf_counter()
print(f"Runtime for Spec3: {(time_spec3_end - time_spec2_end)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 11. Visualize the spectra

The outputs of spec3 are `*_x1d.fits` and `*_c1d.fits` files. Here we do a quick look into some important parts of these files.

Each extension of the spec3 `*_x1d.fits` files contains the extracted 1-D spectra for an individual dither for a single grism, filter, and extracted order combination. The specific filenames and extracted order can be verified with the `FILENAME` and `SPORDER` keywords in the header of each extension, respectively. Within each extension, all of the extracted sources across all dithers are listed, with the values being empty if the particular dither did not contain data for that source. Also contained within each extension is information related to the extraction of a particular source, including the extents and starting size of the extraction box in the full reference frame. More information about the columns contained within the `*_x1d.fits` files can be found in the [x1d filetype documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/science_products.html#extracted-1-d-spectroscopic-data-x1d-and-x1dints).

In [None]:
# Print information about the structure of the x1d files by reading in the first one
if doviz:
    sample_x1d = fits.open(spec3_x1ds[0])

    print("***Format of the level 3 x1d file:")
    sample_x1d.info()

    print("\n***Cal files used to create this level 3 x1d file:")
    for ext in range(len(sample_x1d))[1:-1]:
        print(f"Extension {ext}: {sample_x1d[ext].header['FILENAME']}, order {sample_x1d[ext].header['SPORDER']}")

    print("\n***Columns contained in each extension of the level 3 x1d file:")
    print(sample_x1d[1].data.columns)

The `*_c1d.fits` files combine the extensions of the spec3 `*_x1d.fits` files that share the same order into a single file. The source numbers in the `*_c1d.fits` files match those in the level 3 `*_x1d.fits` files. More information about the columns contained within the `*_c1d.fits` files can be found in the [c1d filetype documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/science_products.html#combined-1-d-spectroscopic-data-c1d).

In [None]:
# Print information about the structure of the c1d files by reading in the first one
if doviz:
    sample_c1d = fits.open(spec3_c1ds[0])

    print("***Format of the c1d file:")
    sample_c1d.info()

    print("\n***Files contained in the c1d file:")
    for ext in range(len(sample_c1d))[1:-1]:
        print(f"Extension {ext}: order {sample_c1d[ext].header['SPORDER']}")
    
    print("\n***Columns contained in each extension of the c1d file:")
    print(sample_c1d[1].data.columns)

Digging a little bit further into the different source IDs and how those are handled, you can see that in each extension the source IDs are now identical.  

In [None]:
if doviz:
    for ext in np.arange(len(sample_x1d))[1:-1]:
        print(f"Extension {ext}: {sample_x1d[ext].header['FILENAME']}, Order {sample_x1d[ext].header['SPORDER']}")
        print("  Sources:\n", sample_x1d[ext].data['SOURCE_ID'])

If a source wasn't extracted for a given extension, the values will be filled in with a value of "0" or "nan". The column `N_ALONGDISP` is zero for sources that aren't extracted in certain files. `N_ALONGDISP` represents the number of pixels in the trace along the dispersion direction, so if it is zero, no pixels were used.
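
A minimal sketch of how `N_ALONGDISP` can be used to list only the sources that were actually extracted in an extension is shown below. Plain dictionaries stand in for table rows so the snippet runs on its own; the helper `extracted_source_ids` and the row values are invented for illustration.

```python
def extracted_source_ids(rows):
    """Return SOURCE_ID values for rows with at least one pixel along the trace."""
    return [row["SOURCE_ID"] for row in rows if row["N_ALONGDISP"] > 0]


# Hypothetical rows from one x1d extension
toy_extension = [
    {"SOURCE_ID": 10, "N_ALONGDISP": 95},
    {"SOURCE_ID": 11, "N_ALONGDISP": 0},   # not extracted in this dither
    {"SOURCE_ID": 12, "N_ALONGDISP": 120},
]

extracted = extracted_source_ids(toy_extension)
print(extracted)
```

For a real extension, the equivalent selection would be a boolean mask on the `N_ALONGDISP` column of `sample_x1d[ext].data`.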

### Spec3 Visualization

Look at some sources, plotting the final `*_c1d.fits` files for each grism. Show the `*_i2d.fits` image for a specific source, followed by the level 3 `*_x1d.fits` individual spectra for each of the two grisms, followed by the `*_c1d.fits` combined spectrum for each of the grisms if available.

In [None]:
# make sure you have run the cells in the convenience functions section that define plot_i2d_plus_source & plot_spectrum
# this cell looks at the i2d images, the level 3 x1d spectra, and the combined c1d spectra for both grisms for several sources
if doviz:
    spec3_c1ds = np.array(sorted(glob.glob(os.path.join(spec3_dir, "jw*c1d.fits"))))
    # If there are multiple pupils (blocking filters) pick one for illustration
    pupil = np.repeat('UNKNOWN', len(spec3_c1ds))
    for ii in range(0, len(spec3_c1ds)):
        hdr = getheader(spec3_c1ds[ii])
        pupil[ii] = hdr['PUPIL']
    uqpupil = np.unique(pupil)
    spec3_c1ds = spec3_c1ds[pupil == uqpupil[-1]]

    # define some cool sources to look at
    #sources = [417, 422, 505, 1296, 606]
    #nsources = len(sources)
    
    # or grab some sources from the first c1d file
    nsources = 4 # 100 sources are extracted by default
    source_offset = 10 # offsetting what nsources to plot
    with fits.open(spec3_c1ds[0]) as temp_c1d:
        sources = temp_c1d[1].data['SOURCE_ID'][source_offset:nsources+source_offset]

    # setting up the figure
    cols = 4
    rows = nsources
    fig_c1d = plt.figure(figsize=(15, 4*(rows/2)))

    # looping through the different sources to plot; one per row
    for nsource, source_id in enumerate(sources):

        # setting up the subplots for a single source
        ypos = nsource
        ax_i2d = plt.subplot2grid((rows, cols), (ypos, 0)) 
        ax_x1d_r = plt.subplot2grid((rows, cols), (ypos, 1))
        ax_x1d_c = plt.subplot2grid((rows, cols), (ypos, 2))
        ax_c1d = plt.subplot2grid((rows, cols), (ypos, 3))
    
        source_fluxes = [] # save the source flux to set the plot limits

        # plot all of the 1-D combined spectra from the c1d files
        for nfile, c1dfile in enumerate(spec3_c1ds):
            
            # plotting the c1d spectra
            ax_c1d, catname, source_fluxes, grism = plot_spectrum(c1dfile, source_fluxes, ax_c1d, image3_dir)
                
            # plot the level 3 x1d files
            x1dfile = c1dfile.replace('c1d', 'x1d')
            with fits.open(x1dfile) as x1d:
                for ext in range(len(x1d))[1:-1]:
                    if grism == 'GR150R':
                        ax_x1d_r, catname, source_fluxes, grism = plot_spectrum(x1dfile, source_fluxes, ax_x1d_r, image3_dir, ext=ext, legend=False)
                    else:
                        ax_x1d_c, catname, source_fluxes, grism = plot_spectrum(x1dfile, source_fluxes, ax_x1d_c, image3_dir, ext=ext, legend=False)
            
            # plot the direct image of the source based on the source number from the source catalog
            if nfile == 0:
                ax_i2d = plot_i2d_plus_source(catname, source_id, ax_i2d)

        # plot labels and such
        if len(source_fluxes):
            # there may not have been data to extract if everything was saturated
            ax_c1d.set_ylim(np.nanmin(source_fluxes), np.nanmax(source_fluxes))
            
        # Add labels to the subplots
        if nsource == 0:
            ax_i2d.set_title('Direct Image\n(i2d)')
            ax_x1d_r.set_title('Individual GR150R 1-D Spectrum\n(level 3 x1d)')
            ax_x1d_c.set_title('Individual GR150C 1-D Spectrum\n(level 3 x1d)')
            ax_c1d.set_title('Combined 1-D Spectrum\n(c1d)')
        ax_i2d.set_ylabel(f'Source\n{source_id}', fontsize=15)
            
    fig_c1d.tight_layout()
    fig_c1d.show()

<hr style="border:1px solid gray"> </hr>

<img style="float: center;" src="https://github.com/spacetelescope/jwst-pipeline-notebooks/raw/main/_static/stsci_footer.png" alt="stsci_logo" width="200px"/> 