<img style="float: center;" src='https://github.com/spacetelescope/jwst-pipeline-notebooks/raw/main/_static/stsci_header.png' alt="stsci_logo" width="900px"/> 

# NIRISS Wide Field Slitless Spectroscopy (WFSS) Pipeline Notebook

**Authors**: R. Plesha<br>
**Last Updated**: August 13, 2025<br>
**Pipeline Version**: 1.19.1 (Build 12.0)

# **Purpose**:

This notebook provides a framework for processing generic Near-Infrared Imager and Slitless Spectrograph (NIRISS) wide field slitless spectroscopy (WFSS) data through the James Webb Space Telescope (JWST) pipeline.  Data is assumed to be located in one observation folder according to paths set up below. It should not be necessary to edit any cells other than in the [Configuration](#1.-Configuration) section unless modifying the standard pipeline processing options. Additional notebooks showing how to optimize and modify the sources extracted for NIRISS WFSS data can be found in the [JDAT notebooks GitHub repository](https://github.com/spacetelescope/jdat_notebooks/tree/main/notebooks/NIRISS/NIRISS_WFSS_advanced).

**Data**:
This example uses data from [Program ID 2079](https://www.stsci.edu/jwst/science-execution/program-information?program=2079) observation 004 (PI: Finkelstein), which observes the Hubble Ultra Deep Field (HUDF). The observations use three [NIRISS filters](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-pupil-and-filter-wheels), F115W, F150W, and F200W, each paired with both the GR150R and GR150C [grisms](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-gr150-grisms). In this example we only look at data taken with the F200W filter. A [NIRISS WFSS observation sequence](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-observing-strategies/niriss-wfss-recommended-strategies) typically consists of a direct image followed by a grism observation in the same blocking filter; the direct image helps identify the sources in the field. In program 2079, the exposure sequence follows the pattern: direct image -> GR150R -> direct image -> GR150C -> direct image.

Example input data to use will be downloaded automatically unless disabled (i.e., to use local files instead).

**JWST pipeline version and CRDS context** This notebook was written for the calibration pipeline version given above. The JWST Calibration Reference Data System (CRDS) context used is associated with the pipeline version as listed [here](https://jwst-crds.stsci.edu/display_build_contexts/). If you use a different pipeline version or CRDS context, please read the relevant release notes ([here for pipeline](https://github.com/spacetelescope/jwst), [here for CRDS](https://jwst-crds.stsci.edu/)) for possibly relevant changes.<BR>

**Updates**:
This notebook is regularly updated as improvements are made to the pipeline. Find the most up to date version of this notebook at: https://github.com/spacetelescope/jwst-pipeline-notebooks/

**Recent Changes**:<br>
August 13, 2025: original notebook released<br>

<hr style="border:1px solid gray"> </hr>

## Table of Contents
1. [Configuration](#1.-Configuration) 
2. [Package Imports](#2.-Package-Imports)
3. [Demo Mode Setup](#3.-Demo-Mode-Setup)
4. [Detector 1 Pipeline](#4.-Detector1-Pipeline)
5. [Image2 Pipeline](#5.-Image2-Pipeline)
6. [Image3 Pipeline](#6.-Image3-Pipeline)
7. [Spec2 Pipeline](#7.-Spec2-Pipeline)
8. [Spec3 Pipeline](#8.-Spec3-Pipeline)

<hr style="border:1px solid gray"> </hr>

## 1. Configuration
------------------
Set basic configuration for running notebook.

#### Install dependencies and parameters

To make sure that the pipeline version is compatible with the steps
discussed below and that the required dependencies and packages are installed,
you can create a fresh conda environment and install the provided
`requirements.txt` file:
```
conda create -n niriss_wfss_pipeline python=3.11
conda activate niriss_wfss_pipeline
pip install -r requirements.txt
```

Set the basic parameters to use with this notebook. These will affect
what data is used, where data is located (if already on disk), and
which pipeline modules are run on this data. The list of parameters is:

* demo_mode
* sci_dir (directory where the data is / will be)
* pipeline modules:
  * dodet1
  * doimage2
  * doimage3
  * dospec2
  * dospec3
* doviz (show visualizations of the data within the notebook)

In [None]:
# Basic import necessary for configuration
import os

<div class="alert alert-block alert-warning">
Adjust any parameters in the cell directly below this before running to ensure <code>demo_mode</code> runs correctly.
</div>

Set <code>demo_mode = True</code> to run in demonstration mode. In this mode this notebook will download example data from the Barbara A.
Mikulski Archive for Space Telescopes (MAST) and process everything through the pipeline. This will all happen in a local directory unless modified in [Section 3](#3.-Demo-Mode-Setup) below.

Set <code>demo_mode = False</code> if you want to process your own data that has already been downloaded and provide the location of the data in the `sci_dir` variable in the cell below.<br>

In [None]:
# Set parameters for demo_mode, data directories, and
# processing steps.

# -----------------------------Demo Mode---------------------------------
demo_mode = True

if demo_mode:
    print('Running in demonstration mode using online example data!')

# --------------------------User Mode Directories------------------------
# If demo_mode = False, look for user data in these paths
if not demo_mode:
    # Set directory paths for processing specific data; these will need
    # to be changed to your local directory setup (below are given as
    # examples)
    user_home_dir = os.path.expanduser('~')

    # Point to where science observation data are
    # Assumes uncalibrated data in sci_dir/uncal/ and results in stage1,
    # stage2, stage3 directories
    sci_dir = os.path.join(user_home_dir, 'nis_wfss_demo_data/2079/obs004/')

    print(f'Running using user input data from: {sci_dir}')

cwd = os.getcwd()
# --------------------------Set Processing Steps--------------------------
# Individual pipeline stages can be turned on/off here.  Note that a later
# stage won't be able to run unless data products have already been
# produced from the prior stage.

# Science processing
dodet1 = True  # calwebb_detector1
doimage2 = True  # calwebb_image2
doimage3 = True  # calwebb_image3
dospec2 = True  # calwebb_spec2
dospec3 = True  # calwebb_spec3
doviz = True  # Visualize outputs

### Set CRDS context and server
Before importing <code>CRDS</code> and <code>JWST</code> modules, we need to configure our environment. This includes defining a CRDS cache directory in which to keep the reference files that will be used by the calibration pipeline. The pipeline will fetch and download the needed reference files to this directory.

If the root directory for the local CRDS cache has not already been set, it defaults to a `crds` directory in your home directory.
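The cell below only sets each CRDS variable when it is missing. As a sketch, the same "keep an existing value, otherwise use a default" logic can be written with `os.environ.setdefault`; like the cell below, it must run before `crds` or `jwst` is imported for the values to take effect.

```python
import os

# Defaults used by this notebook; a value you have already exported always wins.
crds_defaults = {
    "CRDS_PATH": os.path.join(os.path.expanduser("~"), "crds"),
    "CRDS_SERVER_URL": "https://jwst-crds.stsci.edu",
}

for name, default in crds_defaults.items():
    # setdefault assigns only when the variable is not already in the environment
    os.environ.setdefault(name, default)

print(f"CRDS local filepath: {os.environ['CRDS_PATH']}")
print(f"CRDS file server: {os.environ['CRDS_SERVER_URL']}")
```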

In [None]:
# ------------------------Set CRDS context and paths----------------------

# Set CRDS context (if overriding to use a specific version of reference
# files; leave commented out to use latest reference files by default)
#%env CRDS_CONTEXT  jwst_1413.pmap

# Check whether the local CRDS cache directory has been set.
# If not, set it to the user home directory
if (os.getenv('CRDS_PATH') is None):
    os.environ['CRDS_PATH'] = os.path.join(os.path.expanduser('~'), 'crds')
# Check whether the CRDS server URL has been set.  If not, set it.
if (os.getenv('CRDS_SERVER_URL') is None):
    os.environ['CRDS_SERVER_URL'] = 'https://jwst-crds.stsci.edu'

# Echo CRDS path in use
print(f"CRDS local filepath: {os.environ['CRDS_PATH']}")
print(f"CRDS file server: {os.environ['CRDS_SERVER_URL']}")

<hr style="border:1px solid gray"> </hr>

## 2. Package Imports
------------------

In [None]:
# Basic system utilities for interacting with files
# ----------------------General Imports------------------------------------
import glob
import time
import json

# Data calculations and manipulation
import numpy as np
import pandas as pd

# -----------------------Plotting Imports----------------------------------
from matplotlib import pyplot as plt
# display plots inline within the notebook
%matplotlib inline

# -----------------------Astronomy Imports--------------------------------
# for querying MAST and downloading the demo files
from astroquery.mast import MastMissions

# Astropy routines for visualizing detected sources:
from astropy.io import fits
from astropy.table import Table

# for JWST calibration pipeline
import jwst
import crds

from jwst.pipeline import Detector1Pipeline
from jwst.pipeline import Image2Pipeline
from jwst.pipeline import Image3Pipeline
from jwst.pipeline import Spec2Pipeline
from jwst.pipeline import Spec3Pipeline

# JWST pipeline utilities
from jwst import datamodels

# Echo pipeline version and CRDS context in use
print(f"JWST Calibration Pipeline Version: {jwst.__version__}")
print(f"Using CRDS Context: {crds.get_context_name('jwst')}")

### Define convenience functions

#### Plotting Spec2 & Spec3 convenience functions

In [None]:
# this function plots the i2d image around a specific source and marks the catalog x/y centroid for that source
def plot_i2d_plus_source(catname, source_id, ax):
    # derive the i2d filename from the catalog name and find the catalog row for this source
    i2dname = catname.replace('cat.ecsv', 'i2d.fits')

    cat = Table.read(catname)
    cat_line = cat[cat['label'] == source_id]

    # plot the image on the axis that was passed in
    with fits.open(i2dname) as i2d:
        ax.imshow(i2d[1].data, vmin=0, vmax=0.3, origin='lower', cmap='gist_gray')

    # zoom in on the source and mark its catalog centroid
    xcentroid = cat_line['xcentroid'][0]
    ycentroid = cat_line['ycentroid'][0]
    ax.set_xlim(xcentroid-20, xcentroid+20)
    ax.set_ylim(ycentroid-20, ycentroid+20)
    ax.scatter(xcentroid, ycentroid, s=20, facecolors='None', edgecolors='black', alpha=0.9)
    ax.annotate(source_id,
                (xcentroid+0.5, ycentroid+0.5),
                fontsize=10,
                color='black')

    return ax

In [None]:
# this function plots wavelength vs. flux for the x1d & c1d spectra of a specific source
# note: it reads the source ID from the global variable `source_id`, which must be defined before calling it
def plot_spectrum(specfile, source_fluxes, ax, sci_dir, ext=1, wavemin=1.75, wavemax=2.2, legend=True):
    
    with fits.open(specfile) as spec:

        # pull out relevant keywords
        grism = spec[0].header['FILTER']
        catname = os.path.join(sci_dir, spec[0].header['SCATFILE'])
        try:
            label = f"{grism} dither {spec[0].header['DIT_PATT']}"
        except KeyError:
            label = f"{grism}" # there is no dither in the c1d files

        # find where in the file the source data are
        wh_spec_source = np.where(spec[ext].data['SOURCE_ID'] == source_id)[0]
        
        # if the source isn't in the file, then return a blank axis
        if not len(wh_spec_source):
            print(f'Source {source_id} not found in {specfile}')
            return ax, catname, source_fluxes, grism
                  
        # grab the wavelength & flux data and trim off the edges for visualization purposes
        wave = spec[ext].data['WAVELENGTH'][wh_spec_source]
        flux = spec[ext].data['FLUX'][wh_spec_source]
        
        wh_wave = np.where((wave >= wavemin) & (wave <= wavemax)) # cutting off the edges
        wave = wave[wh_wave]
        flux = flux[wh_wave]
        
        source_fluxes.extend(flux) # keep the flux to set the limits of the plot later
    
    if grism == 'GR150R':
        linestyle = '-'
    else:
        linestyle = '--'

    ax.plot(wave, flux, label=label, ls=linestyle)
    if legend:
        ax.legend(bbox_to_anchor=(1,1))

    return ax, catname, source_fluxes, grism

In [None]:
# this plots the 2-D spectrum for a single source; it flips the data & axes so that dispersion appears to increase to the right
def plot_spec2_cal(x1dfile, source_id, ax, transpose=False, vmin=0, vmax=10):

    cal_file = x1dfile.replace('x1d.fits', 'cal.fits')
    with fits.open(cal_file) as cal_hdu:
        wh_cal = find_source_ext(cal_hdu, source_id)

        # if the source isn't in the file, then return a blank axis
        if wh_cal == -999:
            print(f'Source {source_id} not found in {cal_file}')
            return ax

        if transpose:
            # transpose the GR150R data so that we can look at the two cal images along the same dispersion axis
            cal_data = np.transpose(cal_hdu[wh_cal].data)
        else:
            cal_data = cal_hdu[wh_cal].data

        # for bright sources, raise the upper color limit to 20% above the median
        if np.nanmedian(cal_data) > vmax:
            vmax = np.nanmedian(cal_data) * 1.2

        ax.imshow(cal_data, vmin=vmin, vmax=vmax, origin='lower', aspect='auto')

        # the dispersion is in the -x direction, so flip the axis for ease in visualization
        ax.invert_xaxis()

    return ax

#### Other convenience functions

In [None]:
# this function prints the pipeline steps run on a given file
def check_steps_run(filename):

    # Read in file as datamodel; the context manager closes the file afterwards
    with datamodels.open(filename) as dm:
        # Check which steps were run
        print(f"{dm.meta.filename} - {dm.meta.exposure.type}")
        for step, status in dm.meta.cal_step.instance.items():
            print(f"{step}: {status}")
    print()

In [None]:
# this function prints the reference files used on a given file
def check_ref_file_used(filename):

    # Read in file as datamodel; the context manager closes the file afterwards
    with datamodels.open(filename) as dm:
        # Check which reference files were used
        print(f"{dm.meta.filename} - {dm.meta.exposure.type}")
        for step, status in dm.meta.ref_file.instance.items():
            print(f"{step}: {status}")
    print()

In [None]:
# this function finds which extension of a cal file holds the data for a given source
def find_source_ext(cal_hdu, source_id, info=True):
    # read SOURCEID only from the SCI extensions, skipping the primary header and the last (ASDF) extension;
    # fill every other extension with -999 so that list positions still map to extension numbers
    cal_source_ids = np.array([cal_hdu[ext].header['SOURCEID'] if cal_hdu[ext].header['EXTNAME'] == 'SCI'
                               else -999 for ext in range(len(cal_hdu))[1:-1]])

    try:
        wh_cal = np.where(cal_source_ids == source_id)[0][0] + 1  # add 1 to account for the primary header
    except IndexError:
        # this source doesn't exist in the file
        return -999

    if info:
        print(f"Extension {wh_cal} in {cal_hdu[0].header['FILENAME']} contains the data for source {source_id} from our catalog")

    return wh_cal

<hr style="border:1px solid gray"> </hr>

## 3. Demo Mode Setup
#### (skip if not using demo data)
------------------

If running in demonstration mode, set up the program information to retrieve the uncalibrated data automatically from MAST using [astroquery](https://astroquery.readthedocs.io/en/latest/mast/mast.html). Here we will be using the [MastMissions](https://spacetelescope.github.io/mast_notebooks/notebooks/multi_mission/missions_mast_search/missions_mast_search.html) interface which allows for flexibility in search criteria, and is equivalent to using the [JWST mission specific search](https://mast.stsci.edu/search/ui/#/jwst) on MAST. <br>

For illustrative purposes, we focus on data taken through the NIRISS [F200W filter](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-filters) and start with uncalibrated data products. To search for additional filters, update the `niriss_pupil` field in `query_criteria` to be a comma-separated list of filters in a single string value, e.g., "F200W, F115W". To search for a specific grism, add the `opticalElements` field in `query_criteria`, setting the value equal to "GR150R" or "GR150C". Note that searching based on a specific grism will not download the associated direct images.

Information about the JWST file naming conventions can be found at: https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/file_naming.html
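As an illustration of that convention, the sketch below splits an exposure-level file name of the form `jw<ppppp><ooo><vvv>_<gg><s><aa>_<eeeee>_<detector>_<suffix>.fits` into its fields. The helper name and the example file name are hypothetical; association files such as `*_asn.json` follow a different pattern and simply return `None` here.

```python
import os
import re

# jw<program><observation><visit>_<visit group><parallel seq><activity>_<exposure>_<detector>_<suffix>.fits
EXPOSURE_NAME = re.compile(
    r"^jw(?P<program>\d{5})(?P<observation>\d{3})(?P<visit>\d{3})"
    r"_(?P<visit_group>\d{2})(?P<parallel_seq>\d)(?P<activity>[0-9a-zA-Z]{2})"
    r"_(?P<exposure>\d{5})_(?P<detector>[a-z0-9]+)_(?P<suffix>[a-z0-9]+)\.fits$"
)


def parse_exposure_name(filename):
    """Split a JWST exposure file name into its fields, or return None if it does not match."""
    match = EXPOSURE_NAME.match(os.path.basename(filename))
    return match.groupdict() if match else None


# hypothetical file name, for illustration only
fields = parse_exposure_name("jw02079004001_03101_00001_nis_uncal.fits")
print(fields)
```

Grouping files by the `suffix` field (`uncal`, `rate`, `rateints`, `cal`, ...) is one way to see which pipeline stage produced each file in `sci_dir`.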

In [None]:
time_download_start = time.perf_counter()

In [None]:
# Set up the program information and paths for demo program
if demo_mode:
    print('Running in demonstration mode and will download example data from MAST')
    program = 2079
    sci_observtn = '004'
    
    # creating a directory for the data called "nis_wfss_demo_data" 
    #   located in the directory you are currently in
    data_dir = os.path.join(cwd, 'nis_wfss_demo_data')
    sci_dir = os.path.join(data_dir, f"{program}/obs{sci_observtn}")

    # Create the directories if they do not exist
    os.makedirs(sci_dir, exist_ok=True)

<div class="alert alert-block alert-warning">
This demo selects only filter <b>F200W</b> data by default; this observation also contains data for the F115W and F150W filters.
</div>

In [None]:
if demo_mode:
    print(f'Using the Missions MAST interface to find data for Program {program} observation {sci_observtn}:')
    missions = MastMissions(mission='jwst')

    # query the data; sometimes this step can take a bit of time
    datasets = missions.query_criteria(instrume='NIRISS',  # From Near-Infrared Imager and Slitless Spectrograph
                                       #opticalElements='GR150R', # uncomment to filter on only GR150R grism data (no direct images)
                                       niriss_pupil='F200W',  # Download only the F200W filter data for this example
                                       program=program,  # Proposal number 2079
                                       observtn=sci_observtn, # observation 004
                                      )
    products = missions.get_unique_product_list(datasets)
    print(f'Total number of unique products found: {len(products)}')

    # filter down to only the files that we need from MAST
    files_to_download = missions.filter_products(products, file_suffix=['_uncal'])
    asns_to_download = missions.filter_products(products, file_suffix=['_asn']) # '_pool' to download the pool.csv file instead
    
    print(f'Total number of uncal files to download: {len(files_to_download)}')
    print(f'Total number of associations to download: {len(asns_to_download)}')

Download all the uncal and association files for the provided program, observation, and filter.

<div class="alert alert-block alert-warning">
Warning: If this notebook is halted during this step the downloaded file
may be incomplete, and cause crashes later on!
</div>

In [None]:
if demo_mode:
    print('Downloading the data:')
    # download uncal file
    manifest = missions.download_products(files_to_download, flat=True, download_dir=sci_dir)
    
    # download the association files to the same top-level science directory
    asns_manifest = missions.download_products(asns_to_download, flat=True, download_dir=sci_dir)

There is currently a bug in downloading through the MAST Missions interface: level 3 association files are downloaded for all filters, not only the one requested. As a temporary workaround, we remove the F115W and F150W image3 and spec3 association files so that the Image3 and Spec3 pipelines do not crash.

In [None]:
if demo_mode:
    lvl3_asns = glob.glob(os.path.join(sci_dir, '*spec3*_asn.json')) + glob.glob(os.path.join(sci_dir, '*image3*_asn.json'))
    for lvl3_asn in lvl3_asns:
        # close the file before it may be removed below
        with open(lvl3_asn) as f:
            temp_asn = json.load(f)
        # check the associations to see which have the F200W filter in the output product name
        #   we look at what is in a spec3 association file later in the notebook
        if 'f200w' not in temp_asn['products'][0]['name']:
            os.remove(lvl3_asn)

In [None]:
# Print out the time benchmark
time_download_end = time.perf_counter()
print(f"Runtime for data download: {(time_download_end - time_download_start)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 4. Detector1 Pipeline
In this section we run the `*_uncal.fits` files through the [Detector1](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/stages-of-jwst-data-processing/calwebb_detector1) stage of the pipeline to apply detector level calibrations and create a countrate data product where slopes are fit to the integration ramps. These `*_rate.fits` products are 2D (nrows x ncols), averaged over all integrations. 3D countrate data products (`*_rateints.fits`) are also created (nintegrations x nrows x ncols) which have the fitted ramp slopes for each integration.
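To make "fitting slopes to the integration ramps" concrete, the sketch below fits an ordinary least-squares line to one pixel's accumulated counts versus time in each integration, then averages the per-integration slopes. This is a simplified stand-in, not the pipeline's `ramp_fit` algorithm: the real step uses optimally weighted fits, handles jumps and saturation, and weights integrations when combining them. The group time and ramp values are hypothetical.

```python
def ols_slope(times, counts):
    """Ordinary least-squares slope of counts vs. time (DN per second)."""
    n = len(times)
    t_mean = sum(times) / n
    c_mean = sum(counts) / n
    num = sum((t - t_mean) * (c - c_mean) for t, c in zip(times, counts))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


group_time = 10.0  # hypothetical seconds between groups
times = [group_time * (g + 1) for g in range(5)]

# hypothetical ramps (DN) for a single pixel in two integrations
ramps = [
    [12.0, 22.0, 32.0, 42.0, 52.0],
    [15.0, 27.0, 39.0, 51.0, 63.0],
]

# one slope per integration: analogous to one pixel of a *_rateints.fits cube
per_int_slopes = [ols_slope(times, ramp) for ramp in ramps]

# single combined value: analogous to one pixel of *_rate.fits (unweighted mean here)
rate = sum(per_int_slopes) / len(per_int_slopes)
print(per_int_slopes, rate)
```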

In [None]:
time_det1_start = time.perf_counter()

In [None]:
uncal_files = sorted(glob.glob(os.path.join(sci_dir, '*_uncal.fits')))

# Run Detector1 stage of pipeline, specifying:
#   output directory to save the *_rate.fits and *_rateints.fits files
#   save_results flag set to True so the files are saved locally
if dodet1:
    for uncal in uncal_files:
        rate_result = Detector1Pipeline.call(uncal,
                                             output_dir=sci_dir,
                                             save_results=True)
else:
    print('Skipping Detector1 processing')

### Inspect Level 1 Output Products
Below we look at the `*_rate.fits` file outputs of the Detector1Pipeline. If there are no modifications to the steps at this stage needed, you can also save time by downloading these `*_rate.fits` files directly from MAST and starting at stage2. However, it is best to ensure that you are using the same pipeline version as MAST which can be checked in the `CAL_VER` header keyword. 
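To automate the `CAL_VER` comparison suggested above, here is a small sketch that compares release numbers. It assumes plain `X.Y.Z` release strings and ignores development suffixes such as `.dev12`; in the notebook you would pass `jwst.__version__` and a header value such as `rate_hdr['CAL_VER']`. The helper names and the example versions are hypothetical.

```python
import re


def version_tuple(version):
    """Leading numeric release fields of a version string, e.g. '1.19.1' -> (1, 19, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def same_release(local_version, cal_ver):
    """True when the local pipeline and the CAL_VER header share the same release number."""
    return version_tuple(local_version) == version_tuple(cal_ver)


# hypothetical versions, for illustration only
print(same_release("1.19.1", "1.19.1"))
print(same_release("1.19.1", "1.18.0"))
```

Note that `"1.19"` and `"1.19.0"` compare as different here; padding the tuples to a common length would be a reasonable refinement.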

In stage1, both the direct images (`EXP_TYPE=NIS_IMAGE`) and dispersed grism images (`EXP_TYPE=NIS_WFSS`) were calibrated. In addition to the `EXP_TYPE` keyword, the keyword `FILTER` can be used to distinguish exposure types for NIRISS WFSS data. `FILTER=CLEAR` indicates a direct image while `FILTER=GR150R` or `FILTER=GR150C` indicates a dispersed image. The keyword `PUPIL` is the blocking filter used in both direct images and dispersed images. We can also use the `PATT_NUM`, `XOFFSET`, and `YOFFSET` header keywords to see the dither pattern that was used for both the direct images and the dispersed images. The multiple direct image dithers will be combined in image3, while the multiple dithers in the dispersed images are combined as individual sources after extraction in spec3. 
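The header logic described above can be sketched as a pure function: `FILTER=CLEAR` marks a direct image, `GR150R`/`GR150C` mark dispersed images, and `PATT_NUM`, `XOFFSET`, and `YOFFSET` describe the dither positions. The function names and header values below are hypothetical; in the notebook, `rate_dfsort.to_dict('records')` could provide the list of header records.

```python
from collections import defaultdict

DISPERSERS = {"GR150R", "GR150C"}


def classify_exposure(header):
    """Label an exposure as a direct or dispersed (grism) image from its FILTER keyword."""
    filt = header["FILTER"]
    if filt == "CLEAR":
        return "direct"
    if filt in DISPERSERS:
        return "dispersed"
    return "unknown"


def group_dithers(headers):
    """Collect (PATT_NUM, XOFFSET, YOFFSET) for each (kind, FILTER, PUPIL) combination."""
    groups = defaultdict(list)
    for hdr in headers:
        key = (classify_exposure(hdr), hdr["FILTER"], hdr["PUPIL"])
        groups[key].append((hdr["PATT_NUM"], hdr["XOFFSET"], hdr["YOFFSET"]))
    return dict(groups)


# hypothetical header values, for illustration only
headers = [
    {"FILTER": "CLEAR", "PUPIL": "F200W", "PATT_NUM": 1, "XOFFSET": 0.0, "YOFFSET": 0.0},
    {"FILTER": "GR150R", "PUPIL": "F200W", "PATT_NUM": 1, "XOFFSET": 0.0, "YOFFSET": 0.0},
    {"FILTER": "GR150R", "PUPIL": "F200W", "PATT_NUM": 2, "XOFFSET": 0.5, "YOFFSET": 0.3},
]
for key, dithers in group_dithers(headers).items():
    print(key, dithers)
```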

In [None]:
# first look for all of the rate files you have downloaded
rate_files = sorted(glob.glob(os.path.join(sci_dir, "*rate.fits")))

for file_num, ratefile in enumerate(rate_files):

    rate_hdr = fits.getheader(ratefile) # Primary header for each rate file

    # information we want to store that might be useful to us later for evaluating the data
    temp_hdr_dict = {"PATHNAME": ratefile, # full path to the filename to be used later
                     "FILENAME": rate_hdr['FILENAME'],
                     "FILTER": [rate_hdr["FILTER"]], # CLEAR (direct image) or grism: GR150R/GR150C
                     "PUPIL": [rate_hdr["PUPIL"]], # Blocking filter used: F090W, F115W, F140M, F150W, F158M, F200W
                     "EXPSTART": [rate_hdr['EXPSTART']], # Exposure start time (MJD)
                     "PATT_NUM": [rate_hdr["PATT_NUM"]], # Position number within dither pattern for WFSS
                     "NUMDTHPT": [rate_hdr["NUMDTHPT"]], # Total number of points in entire dither pattern
                     "XOFFSET": [rate_hdr["XOFFSET"]], # X offset from pattern starting position for NIRISS (arcsec)
                     "YOFFSET": [rate_hdr["YOFFSET"]], # Y offset from pattern starting position for NIRISS (arcsec)
                     "CAL_VER": [rate_hdr["CAL_VER"]], # JWST pipeline calibration version
                     }

    # Turn the dictionary into a pandas dataframe to make it easier to read
    if file_num == 0:
        # if this is the first file, make an initial dataframe
        rate_df = pd.DataFrame(temp_hdr_dict)
    else:
        # otherwise, append to the dataframe for each file
        new_data_df = pd.DataFrame(temp_hdr_dict)

        # concatenate the two dataframes to build one dataframe covering all files
        rate_df = pd.concat([rate_df, new_data_df], ignore_index=True, axis=0)

rate_dfsort = rate_df.sort_values('EXPSTART', ignore_index=False)

# Look at the resulting dataframe
rate_dfsort[['FILENAME', 'FILTER', 'PUPIL', 'EXPSTART', 'PATT_NUM', 'NUMDTHPT', 'XOFFSET', 'YOFFSET', 'CAL_VER']]

Shown below are the rate files, giving a visual sense of the sequence above. Grid lines are shown as a visual guide for the dithers.

In [None]:
if doviz:
    
    # plot set up
    fig = plt.figure(figsize=(20, 35))
    cols = 3
    rows = int(np.ceil(len(rate_dfsort['PATHNAME']) / cols))
    
    # loop over the rate files and plot them
    for plt_num, rf in enumerate(rate_dfsort['PATHNAME']):
    
        # determine where the subplot should be
        xpos = plt_num % cols
        ypos = plt_num // cols  # integer division gives the row index
    
        # make the subplot
        ax = plt.subplot2grid((rows, cols), (ypos, xpos))
    
        # open the data and plot it
        with fits.open(rf) as hdu:
            data = hdu[1].data
            data[np.isnan(data)] = 0 # filling in nan data with 0s to help with the matplotlib color scale.
            
            ax.imshow(data, vmin=0, vmax=1.5, origin='lower')
    
            # adding in grid lines as a visual aid
            for gridline in [500, 1000, 1500]:
                ax.axhline(gridline, color='black', alpha=0.5)
                ax.axvline(gridline, color='black', alpha=0.5)
    
            ax.set_title(f"#{plt_num+1}: {hdu[0].header['FILTER']} {hdu[0].header['PUPIL']} Dither{hdu[0].header['PATT_NUM']}")

Additionally, you can check which steps were performed during the Detector1 stage of the pipeline:

In [None]:
# first look at the direct images
dir_img_rate = rate_dfsort[rate_dfsort['FILTER'] == 'CLEAR']['PATHNAME'].iloc[0]
check_steps_run(dir_img_rate)

# then look at the dispersed, grism images
grism_img_rate = rate_dfsort[rate_dfsort['FILTER'] == 'GR150C']['PATHNAME'].iloc[0]
check_steps_run(grism_img_rate)

You can also check which reference files were used to calibrate the dataset:

In [None]:
check_ref_file_used(dir_img_rate) # direct image
check_ref_file_used(grism_img_rate) # dispersed image

In [None]:
# Print out the time benchmark
time_det1_end = time.perf_counter()
print(f"Runtime for Detector1: {(time_det1_end - time_det1_start)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 5. Image2 Pipeline

This section focuses only on calibrating the direct images in order to obtain a source catalog and segmentation map of the field to use as input into the Spec2 stage later.

In the [Image2 stage of the pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image2.html), calibrated unrectified data products are created (`*_cal.fits` files). 

In this pipeline processing stage, the [world coordinate system (WCS)](https://jwst-pipeline.readthedocs.io/en/latest/jwst/assign_wcs/index.html#assign-wcs-step) is assigned, the data are [flat fielded](https://jwst-pipeline.readthedocs.io/en/latest/jwst/flatfield/index.html#flatfield-step), and a [photometric calibration](https://jwst-pipeline.readthedocs.io/en/latest/jwst/photom/index.html#photom-step) is applied to convert from units of countrate (ADU/s) to surface brightness (MJy/sr).
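As a rough sketch of the unit conversion, assume the photometric calibration for an image reduces to one multiplicative factor, typically recorded as the `PHOTMJSR` keyword (MJy/sr per unit count rate) in the SCI extension header, with the nominal pixel area recorded as `PIXAR_SR`. The numbers below are hypothetical, not taken from a real reference file.

```python
def rate_to_surface_brightness(rate, photmjsr):
    """Count rate -> surface brightness (MJy/sr) via a single PHOTMJSR-style factor."""
    return rate * photmjsr


def surface_brightness_to_jy_per_pixel(sb_mjy_sr, pixar_sr):
    """Surface brightness (MJy/sr) -> flux density in one pixel (Jy), given the pixel area in sr."""
    return sb_mjy_sr * pixar_sr * 1e6  # 1 MJy = 1e6 Jy


# hypothetical values, for illustration only
rate = 2.0          # count rate in one pixel
photmjsr = 0.25     # MJy/sr per unit count rate
pixar_sr = 1.0e-13  # pixel area in steradians

sb = rate_to_surface_brightness(rate, photmjsr)
flux_jy = surface_brightness_to_jy_per_pixel(sb, pixar_sr)
print(sb, flux_jy)
```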

By default, the [background subtraction step](https://jwst-pipeline.readthedocs.io/en/latest/jwst/background_step/index.html#background-step)
and the [resampling step](https://jwst-pipeline.readthedocs.io/en/latest/jwst/resample/index.html#resample-step) are not performed for NIRISS at this stage of the pipeline. The background subtraction is turned off since there is no background template for the imaging mode and the local background is removed during the background correction for photometric measurements around individual sources. The resampling step occurs during the `Image3` stage by default. While the resampling step can be turned on during the `Image2` stage to, e.g., generate a source catalog for each image, the data quality from the `Image3` stage will be better since the bad pixels, which adversely affect
both the centroids and photometry in individual images, will be mostly removed.

In [None]:
time_image2 = time.perf_counter()

### Image2 Association Files

First, we will take a look inside the association files to better understand everything that is contained in them. For image2 association files, there should be one asn file for each dither position in an observing sequence which is set by the [exposure strategy](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-observing-strategies/niriss-wfss-recommended-strategies). In this case, that should match the number of direct images (`FILTER=CLEAR`) in `rate_df` because each direct image is at a unique dither position (XOFFSET, YOFFSET) within an observing sequence. For this program and observation, there is one direct image with only one dither before the grism images, another direct image with four dithers between the change in grisms, and a direct image with three dithers at the end of a blocking filter sequence. This leads to a total of eight images per observing sequence, with five observing sequences in the observation using the blocking filters F115W -> F115W -> F150W -> F150W -> F200W. In demo mode, we have only downloaded the final observing sequence with F200W.
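The expected association count follows from simple arithmetic on the dither pattern described above. The sketch below assumes, for the observation-wide total only, that all five observing sequences share the same direct-image pattern; in demo mode, only the per-sequence number should match `len(image2_asns)`.

```python
# Direct-image dithers in one observing sequence, as described above:
# before the GR150R grism, between the grisms, and at the end of the sequence
direct_dithers_per_block = [1, 4, 3]

# one image2 association per direct image (dither position)
expected_image2_asns = sum(direct_dithers_per_block)

# Assumption: all five observing sequences follow the same direct-image pattern
n_sequences = 5
expected_direct_images_total = expected_image2_asns * n_sequences

print(f"Image2 associations per sequence: {expected_image2_asns}")
print(f"Direct images in the full observation (if all sequences match): {expected_direct_images_total}")
```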

In [None]:
if doimage2:
    
    image2_asns = glob.glob(os.path.join(sci_dir, "*image2*_asn.json"))

    # Verify the number of associations
    print(len(image2_asns), 'Image2 ASN files') # there should be 8 asn files for image2 in demo mode
    # the number of association files should match the number of direct image rate files
    print(len(rate_df[rate_df['FILTER'] == 'CLEAR']), 'Direct Image rate files')

In [None]:
if doimage2:
    # look at one of the association files
    with open(image2_asns[0]) as f:
        image2_asn_data = json.load(f)
    for key, data in image2_asn_data.items():
        print(f"{key} : {data}")

From this association, we can tell many things about the observation:
1. From `asn_type` and `asn_rule`, we can see that this is an image2 association
2. From `code_version` we can see what version of the code this association was created with. If this does not match the version of the pipeline you are currently using, you are encouraged to create new associations directly from the [pool file](https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/asn_generate.html) or by providing [a list of exposures](https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/asn_from_list.html) and their associated exposure type to ensure there are no errors downstream in the pipeline.
3. From `degraded_status` we can see that there are no degraded exposures to exclude from the calibration.
4. From `constraints`, we can see that this is not a time series observation (TSO) and that the observation is part of program 2079, observed with NIRISS using CLEAR (i.e., direct imaging for WFSS) and the F200W blocking filter.
5. From `products` we can see there is only one exposure associated. This is typical for image2 where there is usually only one exposure per dither per observing sequence.
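The fields listed above can also be pulled out programmatically. The sketch below summarizes an association dictionary using the keys printed earlier (`asn_type`, `products`, `members`, `expname`, `exptype`); the example association is a minimal, hypothetical stand-in, since real association files contain more keys. In the notebook you could pass `image2_asn_data` instead.

```python
import json


def summarize_association(asn):
    """Return the association type and (product name, [(expname, exptype), ...]) pairs."""
    products = [
        (product["name"], [(m["expname"], m["exptype"]) for m in product["members"]])
        for product in asn["products"]
    ]
    return asn["asn_type"], products


# minimal, hypothetical image2 association for illustration only
example_asn = json.loads("""
{
  "asn_type": "image2",
  "code_version": "1.19.1",
  "products": [
    {"name": "jw02079004001_03101_00001_nis",
     "members": [{"expname": "jw02079004001_03101_00001_nis_rate.fits",
                  "exptype": "science"}]}
  ]
}
""")

asn_type, products = summarize_association(example_asn)
print(asn_type)
for name, members in products:
    print(name, members)
```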

We can also take a closer look at the products section of the association to better understand the types of files and the members associated together. For NIRISS images at the image2 stage, this will only be one `*_rate.fits` file.

In [None]:
if doimage2:
    print(f'asn_type : {image2_asn_data["asn_type"]}')
    print(f'code_version : {image2_asn_data["code_version"]}')
    
    # in particular, take a closer look at the product filenames with the association file:
    for product in image2_asn_data['products']:
        for key, value in product.items():
            if key == 'members':
                print(f"{key}:")
                for member in value:
                    print(f"    {member['expname']} {member['exptype']}")
            else:
                print(f"{key}: {value}")

### Run Image2

The `*_rate.fits` products will be calibrated into `*_cal.fits` files. More information about the steps performed in the Image2 part of the pipeline can be found in the [Image2 pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image2.html).

In [None]:
if doimage2:    
    os.chdir(sci_dir)
    for img2_asn in image2_asns:
        # calibrate with the image2 pipeline
        img2 = Image2Pipeline.call(img2_asn, 
                                   output_dir=sci_dir,
                                   save_results=True)
    os.chdir(cwd)

We can look at which steps were turned on and which reference files were used on the direct images up to the Image2 stage of the pipeline.

In [None]:
# we take a look at the same direct image rate file that is now a *_cal.fits file
img2_filename = dir_img_rate.replace('rate', 'cal')
check_steps_run(img2_filename)
check_ref_file_used(img2_filename)

In [None]:
# Print out the time benchmark
time_image2_end = time.perf_counter()
print(f"Runtime for Image2: {(time_image2_end - time_image2)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 6. Image3 Pipeline

In this section we continue calibrating the direct images with the Image3 stage of the pipeline to obtain a source catalog and segmentation map of the field to use as input to the Spec2 stage later. In the [Image3 stage of the pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image3.html), the individual `*_cal.fits` files for each of the dither positions are combined into a single distortion-corrected image (`*_i2d.fits` file).

By default, the Image3 stage of the pipeline performs the following steps on NIRISS data:
* [tweakreg](https://jwst-pipeline.readthedocs.io/en/latest/jwst/tweakreg/README.html) - creates source catalogs of point-like sources for each input image. The source catalogs of the input images are compared with each other to derive coordinate transforms that align the images relative to each other.
    * As of CRDS context jwst_1156.pmap and later, the pars-tweakreg parameter reference file for NIRISS performs an absolute astrometric correction to Gaia Data Release 3 by default (i.e., the `abs_refcat` parameter is set to `GAIADR3`). Although this default correction generally improves results compared with skipping the alignment, it can perform poorly in crowded or sparse fields, so users are encouraged to check the astrometric accuracy and revisit this step if necessary.
    * As of pipeline version 1.14.0, the default source-finding algorithm for NIRISS is IRAFStarFinder, which testing shows returns good accuracy for undersampled NIRISS PSFs at short wavelengths ([Goudfrooij 2022](https://www.stsci.edu/files/live/sites/www/files/home/jwst/documentation/technical-documents/_documents/JWST-STScI-008324.pdf)).
* [skymatch](https://jwst-pipeline.readthedocs.io/en/latest/jwst/skymatch/description.html) - measures the background level from the sky to use as input into the subsequent outlier detection and resample steps.
* outlier detection - flags any remaining cosmic rays, bad pixels, or other artifacts not already flagged during the detector1 stage of the pipeline, using all input images to create a median image so that outliers in individual images can be identified.
* [resample](https://jwst-pipeline.readthedocs.io/en/latest/jwst/resample/main.html) - resamples each input image based on its WCS and distortion information and creates a single undistorted image.
* [source catalog](https://jwst-pipeline.readthedocs.io/en/latest/jwst/source_catalog/main.html) - creates a catalog of detected sources along with measured photometry and morphology (i.e., point-like vs. extended). It is useful for quick looks, but optimization is likely needed for specific science cases, which is an ongoing investigation for the NIRISS team. Users may wish to experiment with changing the `snr_threshold` and `deblend` options. Modifying the following parameters will not significantly improve data quality, and it is advised to keep them at their default values: `aperture_ee1`, `aperture_ee2`, `aperture_ee3`, `ci1_star_threshold`, `ci2_star_threshold`.
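To make the idea behind median-based outlier rejection concrete, here is a deliberately simplified, numpy-only sketch. It assumes the input images are already aligned on a common pixel grid and estimates the per-pixel scatter with the median absolute deviation (MAD). This is not the pipeline's `outlier_detection` algorithm, which roughly works on resampled images, blots the median back to each input frame, and uses the error arrays; the function name `flag_outliers` and its parameters are hypothetical.

```python
import numpy as np

def flag_outliers(stack, nsigma=5.0, sigma_floor=1e-6):
    """Flag pixels that deviate strongly from the per-pixel median of a stack.

    stack : array of shape (n_images, ny, nx), assumed already aligned.
    Returns a boolean array of the same shape; True marks a suspected outlier.
    """
    stack = np.asarray(stack, dtype=float)
    # The median across images is robust to a cosmic ray in any single image
    median = np.nanmedian(stack, axis=0)
    resid = np.abs(stack - median)
    # Convert the MAD to a Gaussian-equivalent sigma
    sigma = 1.4826 * np.nanmedian(resid, axis=0)
    # Avoid a zero sigma where all images agree exactly
    sigma = np.maximum(sigma, sigma_floor)
    return resid > nsigma * sigma
```

Usage would look like `flag_outliers(np.stack([img_a, img_b, img_c]))`, where `img_a` etc. are hypothetical aligned 2-D arrays. The median is used because, with several dithers, a cosmic ray in one frame barely moves it.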

In [None]:
time_image3 = time.perf_counter()

### Image3 Association Files
The contents of image3 association files are quite similar to those for image2, but notice that many more members are now associated together, and they are the individual pointing `*_cal.fits` files from image2. Image3 resamples and combines images taken with the same blocking filter (stored in the `PUPIL` keyword for NIRISS WFSS) from all dithers and observing sequences to form a single image.

In [None]:
if doimage3:
    image3_asns = glob.glob(os.path.join(sci_dir, "*image3*_asn.json"))
    print(len(image3_asns), 'Image3 ASN files') # there should be 1 image3 association file

    # the number of image3 association files should match the number of unique blocking filters used
    uniq_filters = np.unique(rate_df[rate_df['FILTER'] == 'CLEAR']['PUPIL'])
    print(f"{len(uniq_filters)} unique filter(s) used: {uniq_filters}")

In [None]:
if doimage3:
    # open the image3 association to look at
    image3_asn_data = json.load(open(image3_asns[0]))
    print(f'asn_type : {image3_asn_data["asn_type"]}')
    print(f'code_version : {image3_asn_data["code_version"]}')
    
    # in particular, take a closer look at the product filenames with the association file:
    for product in image3_asn_data['products']:
        for key, value in product.items():
            if key == 'members':
                print(f"{key}:")
                for member in value:
                    print(f"    {member['expname']} {member['exptype']}")
            else:
                print(f"{key}: {value}")

### Run Image3

In Image3, the `*_cal.fits` individual pointing files will be calibrated into a single combined `*_i2d.fits` image. More information about the steps performed in the Image3 part of the pipeline can be found in the [Image3 pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image3.html).

Some commented-out parameters are provided below that can be tweaked to optimize the Image3 outputs for your particular data. The full list of parameters can be found in the [tweakreg](https://jwst-pipeline.readthedocs.io/en/latest/jwst/tweakreg/README.html) and [sourcecatalog](https://jwst-pipeline.readthedocs.io/en/latest/jwst/source_catalog/main.html) documentation.

In [None]:
if doimage3:    
    os.chdir(sci_dir)
    for img3_asn in image3_asns:
        # calibrate with the image3 pipeline
        img3 = Image3Pipeline.call(img3_asn, 
                                   output_dir=sci_dir,
                                   # steps={'source_catalog': {'kernel_fwhm': 5.0,
                                   #                           'snr_threshold': 10.0,
                                   #                           'npixels': 50,
                                   #                           'deblend': True,
                                   #                          },
                                   #        'tweakreg': {'snr_threshold': 20,
                                   #                     'abs_refcat': 'GAIADR3',
                                   #                     'searchrad': 3.0,
                                   #                     'kernel_fwhm': 2.302,
                                   #                     'fitgeometry': 'shift',
                                   #                     },
                                   #       },
                                   save_results=True)
    os.chdir(cwd)

We can again check which steps were run and which reference files were used on the direct images.

In [None]:
# look at an i2d file to see the steps & reference files used
img3_filename = glob.glob(os.path.join(sci_dir, '*i2d.fits'))[0]
check_steps_run(img3_filename)
check_ref_file_used(img3_filename)

### Inspect Image3 Output Products

Using the `*_i2d.fits` combined image and the source catalog produced by Image3, we can visually inspect whether we're happy with where the pipeline found the sources to use in the Spec2 stage of the pipeline. In the following figures, sources classified by the pipeline as extended are shown in maroon, and sources classified as point sources are shown in black. This classification affects the size of the extraction box in the WFSS images as well as the contamination correction step of the pipeline.

In [None]:
# These are all results from the Image3 pipeline
image3_i2d = sorted(glob.glob(os.path.join(sci_dir, '*i2d.fits'))) # combined image over multiple dithers/mosaic
image3_segm = sorted(glob.glob(os.path.join(sci_dir, '*segm.fits'))) # segmentation map that defines the extent of a source
image3_cat = sorted(glob.glob(os.path.join(sci_dir, '*cat.ecsv'))) # Source catalog that defines the RA/Dec of a source at a particular pixel

#### i2d & segmentation map
The segmentation maps are also a product of the Image3 pipeline and are used to help determine the source catalog. Let's take a look at them to ensure we are happy with what the pipeline is defining as a source.

In the segmentation map in the right-hand panel of the figure, each blue blob should correspond to a physical target. Sources can sometimes be blended, in which case the parameters for making the segmentation map and source catalog should be changed. An example of this can be seen below in the observation 004 F200W image, where two galaxies at ~(1600, 1300) have been blended into one source. This is discussed in more detail in the custom Image3 run in the [NIRISS WFSS JDAT notebooks](https://github.com/spacetelescope/jdat_notebooks/tree/main/notebooks/NIRISS/NIRISS_WFSS_advanced).
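A quick way to check a suspected blend numerically is to ask which segmentation label covers a given pixel and how many pixels each label spans. The sketch below assumes the segmentation map has been loaded as a 2-D integer array (for example from `hdu[1].data` of one of the `image3_segm` files, as the plotting cell does), with 0 meaning background; the helper names `label_at` and `label_areas` are hypothetical.

```python
import numpy as np

def label_at(segm, x, y):
    """Return the segmentation label covering pixel (x, y); 0 means background."""
    # numpy indexes images as [row, column], i.e. [y, x]
    return int(segm[y, x])

def label_areas(segm):
    """Map each nonzero segmentation label to its area in pixels."""
    labels, counts = np.unique(segm[segm > 0], return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))
```

For example, `label_at(segm, 1600, 1300)` reports which catalog label covers the blended pair, and a label with an unusually large area in `label_areas(segm)` is one hint (not proof) that several objects were merged.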

In [None]:
if doviz:            
    cols = 2
    rows = len(image3_i2d)
    
    fig = plt.figure(figsize=(15, 15*(rows/2)))
    
    for plt_num, img in enumerate(np.sort(np.concatenate([image3_segm, image3_i2d]))):
    
        # determine where the subplot should be
        xpos = (plt_num % 40) % cols
        ypos = ((plt_num % 40) // cols) # // to make it an int.
    
        # make the subplot
        ax = plt.subplot2grid((rows, cols), (ypos, xpos))
    
        if 'i2d' in img:
            cat = Table.read(img.replace('i2d.fits', 'cat.ecsv'))
            cmap = 'gist_gray'
        else:
            cmap = 'tab20c_r'
            
        # plot the image
        with fits.open(img) as hdu:
            ax.imshow(hdu[1].data, vmin=0, vmax=0.3, origin='lower', cmap=cmap)
            title = f"{hdu[0].header['PUPIL']}"
    
        # also plot the associated catalog
        extended_sources = cat[cat['is_extended'] == 1] # 1 is True; i.e. is extended
        point_sources = cat[cat['is_extended'] == 0] # 0 is False; i.e. is a point source
        
        for color, sources, source_type in zip(['maroon', 'black'], [extended_sources, point_sources], ['Extended Source', 'Point Source']):
            # plotting the sources
            ax.scatter(sources['xcentroid'], sources['ycentroid'], s=20, facecolors='None', edgecolors=color, alpha=0.9)
    
            # adding source labels 
            for i, source_num in enumerate(sources['label']):
                ax.annotate(source_num, 
                            (sources['xcentroid'][i]+0.5, sources['ycentroid'][i]+0.5), 
                            fontsize=10,
                            color=color)
            ax.scatter(-999, -999, label=source_type, s=20, facecolors='None', edgecolors=color, alpha=0.9)
        if 'i2d' in img:
            ax.set_title(f"{title} combined image\n(i2d)")
        else:
            ax.set_title(f"{title} segmentation map\n(segm)")
        
        # zooming in on a smaller region
        ax.set_xlim(1250, 1750)
        ax.set_ylim(1250, 1750)

        ax.legend(framealpha=0.3)
    
    # Helps to make the axes not overlap ; you can also set this manually if this doesn't work
    plt.tight_layout()

#### Source Catalog

In addition to the segmentation map, the source catalog itself is useful for examining the source centroids, calculated fluxes, and source extents.

In [None]:
# first, look at the default pipeline source catalog for the F200W filter
cat = Table.read(image3_cat[0])
cat

In all likelihood, you will need to rerun Image3 with different parameters in order to return an optimal source catalog to use with your NIRISS WFSS data. You can additionally refine the source catalog so that Spec2 and Spec3 only run on the sources that you care most about. Some examples of this can be found in the [NIRISS WFSS JDAT notebooks](https://github.com/spacetelescope/jdat_notebooks/tree/main/notebooks/NIRISS/NIRISS_WFSS_advanced).

In [None]:
# Print out the time benchmark
time_image3_end = time.perf_counter()
print(f"Runtime for Image3: {(time_image3_end - time_image3)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 7. Spec2 Pipeline

After running Image3 and obtaining the segmentation map and source catalog, the [Spec2 pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec2.html#calwebb-spec2) is ready to be run. The Spec2 pipeline first runs the [assign_wcs](https://jwst-pipeline.readthedocs.io/en/latest/jwst/assign_wcs/main.html), [background](https://jwst-pipeline.readthedocs.io/en/latest/jwst/background_subtraction/description.html), and [flat_field](https://jwst-pipeline.readthedocs.io/en/latest/jwst/flatfield/main.html) corrections on the full-frame `*_rate.fits` files. The [srctype](https://jwst-pipeline.readthedocs.io/en/latest/jwst/srctype/description.html) step is run to determine the extraction box size before the [extract_2d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_2d/main.html) step produces individual cutouts for the 100 brightest sources defined in the Image3 source catalog. The [wfss_contam](https://jwst-pipeline.readthedocs.io/en/latest/jwst/wfss_contam/description.html) step runs after the [extract_2d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_2d/main.html) step and is currently off by default while the step is being improved. The [photom](https://jwst-pipeline.readthedocs.io/en/latest/jwst/photom/main.html) step is then run on the cutouts, producing flux-calibrated 2-D spectral (`*_cal.fits`) files. The [extract_1d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_1d/description.html) step is run last, producing level 2 `*_x1d.fits` files.

In [None]:
time_spec2 = time.perf_counter()

<a id="spec2_asn"></a>
### Spec2 Association Files

As with the imaging part of the pipeline, there are association files for spec2. These are a bit more complex in that they need to have the science (WFSS) data, direct image, source catalog, and segmentation map included as members. For the science data, the rate files are used as inputs, similarly to Image2. Also like Image2, there should be one association file for each dispersed image dither position in an observing sequence. In this case, that should match the number of rate files where `FILTER=GR150R` or `FILTER=GR150C`. For this program and observation, there are three dithers per grism, and both GR150R and GR150C are used, totaling six exposures per observing sequence with five observing sequences in the observation using the blocking filters F115W -> F115W -> F150W -> F150W -> F200W.

In [None]:
if dospec2:
    spec2_asns = glob.glob(os.path.join(sci_dir, "*spec2*_asn.json"))

    # the number of spec2 association files should match the number of grism image rate files
    print(len(spec2_asns), 'Spec2 ASN files')
    print(len(rate_df[(rate_df['FILTER'] == 'GR150R') | (rate_df['FILTER'] == 'GR150C')]), 'Dispersed image rate files')
    

Each individual exposure within a spec2 association contains a science image, a direct image, a source catalog, and a segmentation map all to be used within spec2.
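Before running Spec2 it can be worth confirming that every product in each association really has all four kinds of members. The sketch below assumes the member `exptype` values are `science`, `direct_image`, `sourcecat`, and `segmap`; compare this against the printout of the cell below for your own files. The function name `missing_member_types` is hypothetical.

```python
import json

# Member types assumed to be required in a NIRISS WFSS spec2 association
REQUIRED_EXPTYPES = {"science", "direct_image", "sourcecat", "segmap"}

def missing_member_types(asn_path):
    """Return {product name: sorted list of required exptypes that are absent}."""
    with open(asn_path) as f:  # close the file handle after reading
        asn = json.load(f)
    missing = {}
    for product in asn["products"]:
        present = {member["exptype"] for member in product["members"]}
        missing[product.get("name", "<unnamed>")] = sorted(REQUIRED_EXPTYPES - present)
    return missing
```

Running `[missing_member_types(asn) for asn in spec2_asns]` should give empty lists for every product if the associations are complete.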

In [None]:
if dospec2:
    # look at one of the association files
    asn_data = json.load(open(spec2_asns[0]))
    print(f'asn_type : {asn_data["asn_type"]}')
    print(f'code_version : {asn_data["code_version"]}')
    
    # in particular, take a closer look at the product filenames with the association file:
    for product in asn_data['products']:
        for key, value in product.items():
            if key == 'members':
                print(f"{key}:")
                for member in value:
                    print(f"    {member['expname']} : {member['exptype']}")
            else:
                print(f"{key}: {value}")

<a id="spec2_run"></a>
### Run spec2
In Spec2, the `*_rate.fits` files run through various corrections before using the source catalog to extract the 100 brightest sources into 1-D spectra (level 2 `*_x1d.fits` files). More information about the steps performed during the spec2 stage of the pipeline can be found in the [Spec2 pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec2.html).

We also show how to turn on the contamination correction step, although there are still several known bugs in this step as of pipeline version 1.19.1, so we advise caution when using it. We also show how to save the background-subtracted full-frame images as an intermediate product (`*_bsub.fits`), as well as the flat-fielded and background-subtracted full-frame images (`*_flat_field.fits`), which are the output of the last step performed before the individual source cutouts are created.

In [None]:
if dospec2:
    for spec2_asn in spec2_asns:
        os.chdir(sci_dir)
        # calibrate with spec2 pipeline
        spec2 = Spec2Pipeline.call(spec2_asn,
                                   output_dir=sci_dir,
                                   steps={'bkg_subtract' : {'save_results' : True}, # save background subtracted full-frame images
                                          #'flat_field' : {'save_results' : True}, # save bkg subtracted & flat-fielded full-frame images
                                          #'wfss_contam' : {'skip' : False}, # uncomment to turn on contamination correction
                                          },
                                   save_results=True)
        os.chdir(cwd)

Again, we can check the steps that are run up through the spec2 pipeline and the corresponding reference files used.

In [None]:
example_x1d = glob.glob(os.path.join(sci_dir, '*nis_x1d.fits'))[0]
check_steps_run(example_x1d) # steps run on the dispersed image
check_ref_file_used(example_x1d) # reference files used for the dispersed image

<a id="spec2_examine"></a>
### Examining the Outputs of Spec2

The outputs of spec2 are `*_cal.fits` and `*_x1d.fits` files. Here we do a quick look into some important parts of these files.
- [_cal.fits file format further reading](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/science_products.html#calibrated-data-cal-and-calints)
- [_x1d.fits file format further reading](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/science_products.html#extracted-1-d-spectroscopic-data-x1d-and-x1dints)

As of pipeline version 1.19.1, the level 2 `*_x1d.fits` files contain all extracted sources for a single exposure in a single extension for each extracted order. Only the sources that were extracted in that exposure are included in the level 2 products, unlike the level 3 products discussed later. Additional information about the extraction of each source is provided as columns within the data extension.

In the `*_cal.fits` files, the 0th and final extensions do not contain science data; the remaining extensions correspond to the extracted sources. Each source has seven extensions containing its 2-D cutout information (SCI, DQ, ERR, WAVELENGTH, VAR_POISSON, VAR_RNOISE, VAR_FLAT).
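Given that layout, the number of sources in a level 2 `*_cal.fits` file follows from simple arithmetic: total HDUs = 2 + 7 × (number of sources). The sketch below assumes exactly seven extensions per source, as listed above; the function name `n_sources_in_cal` is hypothetical.

```python
def n_sources_in_cal(n_hdus, ext_per_source=7):
    """Number of extracted sources in a level 2 WFSS *_cal.fits file.

    Assumes one primary HDU, one final non-science HDU, and `ext_per_source`
    extensions (SCI, DQ, ERR, WAVELENGTH, VAR_POISSON, VAR_RNOISE, VAR_FLAT)
    for each source.
    """
    n_science = n_hdus - 2  # drop the 0th and final extensions
    if n_science < 0 or n_science % ext_per_source:
        raise ValueError(f"{n_hdus} HDUs is not 2 + {ext_per_source} * n_sources")
    return n_science // ext_per_source
```

With a grism cal file open as `hdul` (via `fits.open`), `n_sources_in_cal(len(hdul))` can be cross-checked against `sum(hdu.name == 'SCI' for hdu in hdul)`; if the two disagree, the seven-extension assumption does not hold for that file.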

Notice that there are more sources in the source catalog than there are extracted sources in these files. This is because the pipeline defaults to extracting only the 100 brightest sources. To change this behavior, supply the pipeline with the `wfss_nbright` parameter of the extract_2d step.
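One way to pass that parameter is through the same `steps` dictionary used in the Spec2 run above. The sketch below builds such a dictionary from an existing set of overrides without modifying the original; `spec2_steps_with_nbright` is a hypothetical helper, and the value 200 in the usage note is only an example.

```python
def spec2_steps_with_nbright(nbright, base_steps=None):
    """Return a Spec2Pipeline `steps` override dict with extract_2d.wfss_nbright set.

    base_steps: an existing overrides dict; it is copied, not modified.
    """
    # Copy each step's parameter dict so the caller's dict is left untouched
    steps = {name: dict(params) for name, params in (base_steps or {}).items()}
    steps.setdefault("extract_2d", {})["wfss_nbright"] = int(nbright)
    return steps
```

For example, `Spec2Pipeline.call(spec2_asn, output_dir=sci_dir, steps=spec2_steps_with_nbright(200, {'bkg_subtract': {'save_results': True}}), save_results=True)` would request up to 200 sources while keeping the background-subtracted output. Extracting more sources increases the runtime of Spec2 and Spec3.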

In [None]:
spec2_x1ds = sorted(glob.glob(os.path.join(sci_dir, "*nis_x1d.fits")))
spec2_sample_x1d = fits.open(spec2_x1ds[0])

print("***Format of the level 2 x1d file:")
spec2_sample_x1d.info() # info() prints the summary itself and returns None

print("\n***Columns contained in each extension of the level 2 x1d file:")
print(spec2_sample_x1d[1].data.columns)

print(f"\n***Sources extracted for order 1 in the level 2 x1d file {spec2_sample_x1d[0].header['FILENAME']}:")
print(spec2_sample_x1d[1].data['SOURCE_ID'])

NIRISS WFSS data contain many sources of interest. In this visualization we show, for five selected sources, the source as it appears in the i2d image, two example grism `*_cal.fits` 2-D spectral cutouts, and the level 2 `*_x1d.fits` 1-D extracted spectra for all grism dithers. Because the contamination step is currently turned off, contamination is easily visible when comparing the 1-D and 2-D spectra from the two grisms. This is especially clear for source 505, where an order 0 contaminant at ~(75, 5) in the example GR150C `*_cal.fits` image appears as a large emission line in the GR150C 1-D spectrum.

Note that in the figure below, the GR150R `*_cal.fits` cutouts are transposed so that the dispersion direction is along the -x axis. For both the GR150R and GR150C `*_cal.fits` cutouts, the axis is then flipped for visualization purposes.
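The array operations behind that reorientation can be sketched in a few lines of numpy. The plotting helper `plot_spec2_cal` is defined elsewhere in the notebook and is not reproduced here; `orient_cutout` is a hypothetical stand-in, and reversing the column (x) axis for display is an assumption about how the flip is done.

```python
import numpy as np

def orient_cutout(cutout, transpose=False):
    """Return a 2-D spectral cutout oriented for display.

    transpose=True swaps rows and columns (as done for GR150R) so that both
    grisms disperse along x; the x axis is then reversed for display.
    """
    data = np.asarray(cutout)
    if transpose:
        data = data.T  # rows become columns
    return data[:, ::-1]  # reverse the column (x) order
```

Transposing rather than rotating keeps each pixel's cross-dispersion position intact, which makes the GR150R and GR150C cutouts directly comparable side by side.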

In [None]:
# make sure you have run the cells in the convenience functions section that define plot_i2d_plus_source, plot_spec2_cal, & plot_spectrum
# here we look at the source as identified by the source catalog in the i2d image, the two grism cal files, and the x1d files
if doviz:
    # grab the spec2 x1d output products
    spec2_x1d_files = sorted(glob.glob(os.path.join(sci_dir, '*nis_x1d.fits*')))

    # define some cool sources to look at
    sources = [417, 422, 505, 1296, 606]
    nsources = len(sources)
    
    # or grab some sources from the first x1d file
    # nsources = 4 # 100 sources are extracted by default
    # source_offset = 10 # index of the first source to plot
    # with fits.open(spec2_x1d_files[0]) as temp_x1d:
    #     sources = temp_x1d[1].data['SOURCE_ID'][source_offset:nsources+source_offset]

    # setting up the figure
    cols = 4
    rows = nsources
    fig = plt.figure(figsize=(15, 4*(rows/2)))
    
    # looping through the different sources to plot; one per row
    for nsource, source_id in enumerate(sources):
        # we are only plotting a single cal file cutout for each grism
        plot_gr150r = True
        plot_gr150c = True

        # setting up the subplots for a single source
        ypos = nsource
        ax_i2d = plt.subplot2grid((rows, cols), (ypos, 0)) 
        ax_cal_r = plt.subplot2grid((rows, cols), (ypos, 1)) 
        ax_cal_c = plt.subplot2grid((rows, cols), (ypos, 2)) 
        ax_x1d = plt.subplot2grid((rows, cols), (ypos, 3))
    
        source_fluxes = [] # save the source flux to set the plot limits
                
        # plot all of the 1-D spectra from the x1d files
        for nfile, x1dfile in enumerate(spec2_x1d_files):

            ax_x1d, catname, source_fluxes, grism = plot_spectrum(x1dfile, source_fluxes, ax_x1d, sci_dir)
            
            # plot the direct image of the source based on the source number from the source catalog
            if nfile == 0:
                
                ax_i2d = plot_i2d_plus_source(catname, source_id, ax_i2d)
            
            # plot one example cal image from the GR150R grism, transposed to disperse in the same direction as GR150C
            if plot_gr150r and grism == 'GR150R':
                ax_cal_r = plot_spec2_cal(x1dfile, source_id, ax_cal_r, transpose=True)
                plot_gr150r = False
                
            # plot one example cal image from the GR150C grism
            if plot_gr150c and grism == 'GR150C':
                ax_cal_c = plot_spec2_cal(x1dfile, source_id, ax_cal_c)
                plot_gr150c = False

        if len(source_fluxes):
            # there may not have been data to extract if everything was saturated
            ax_x1d.set_ylim(np.nanmin(source_fluxes), np.nanmax(source_fluxes))
            ax_x1d.legend(bbox_to_anchor=(1,1))
        
        # Add labels to the subplots
        if nsource == 0:
            ax_cal_r.set_title('Example Transposed GR150R cutout\n(cal)')
            ax_cal_c.set_title('Example GR150C cutout\n(cal)')
            ax_i2d.set_title('Direct Image\n(i2d)')
            ax_x1d.set_title('All Collapsed 1-D Spectrum\n(level 2 x1d)')
        ax_i2d.set_ylabel(f'Source\n{source_id}', fontsize=15)

            
    fig.tight_layout()
    fig.show()

In [None]:
# Print out the time benchmark
time_spec2_end = time.perf_counter()
print(f"Runtime for Spec2: {(time_spec2_end - time_spec2)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

## 8. Spec3 Pipeline

NIRISS WFSS data are minimally processed through the [Spec3 stage of the pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec3.html) to combine calibrated data from multiple dithers within an observation. The spec3 products are unique to a specific grism and blocking filter combination; data from the different grisms are not combined by default. As of pipeline version 1.19.1, the level 3 source-based `*_cal.fits` files created by the [exp_to_source](https://jwst-pipeline.readthedocs.io/en/latest/jwst/exp_to_source/main.html) step are no longer saved by default, and the `*_x1d.fits` files created by the [extract_1d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_1d/description.html) step and the `*_c1d.fits` files created by the [combine_1d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/combine_1d/description.html) step are now each saved as a single file per grism and filter combination, with all of the extracted sources contained within that file.

In [None]:
time_spec3 = time.perf_counter()

### Spec3 Association Files

There will be one spec3 association per blocking filter and grism combination, in which all of the extracted 1-D spectra within an observation with that filter and grism combination are coadded into a single spectrum for each source. In the demo case, we are looking at only one blocking filter (F200W) with both grisms (GR150R & GR150C), so we would expect two spec3 association files. There are three dithered exposures per grism for the F200W observation, so each spec3 association file will contain those three `*_cal.fits` files to combine.
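The expected association count can be reasoned out by grouping the dispersed exposures by grism and blocking filter: each group should map to one spec3 association, and its length is the number of dithers being combined. The sketch below takes plain `(filename, FILTER, PUPIL)` tuples as a hypothetical input format; with the notebook's `rate_df` you would build them from its filename, `FILTER`, and `PUPIL` columns. The function name is hypothetical.

```python
from collections import defaultdict

def group_grism_exposures(exposures):
    """Group dispersed exposures by (grism, blocking filter).

    exposures: iterable of (filename, FILTER, PUPIL) tuples, where FILTER holds
    the grism (GR150R/GR150C) and PUPIL the blocking filter.
    Returns {(grism, pupil): [filenames]}; one spec3 association per key.
    """
    groups = defaultdict(list)
    for fname, grism, pupil in exposures:
        if grism in ("GR150R", "GR150C"):  # skip direct images (FILTER=CLEAR)
            groups[(grism, pupil)].append(fname)
    return dict(groups)
```

For the F200W-only data in this demo, this grouping should yield two keys, one per grism, each with three dithered exposures.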

In [None]:
if dospec3:
    spec3_asns = glob.glob(os.path.join(sci_dir, "*spec3*_asn.json"))

    # the number of spec3 association files should match the number of grism + filter combinations
    print(len(spec3_asns), 'Spec3 ASN files')
    grism_df = rate_df[(rate_df['FILTER'] == 'GR150R') | (rate_df['FILTER'] == 'GR150C')]
    grism_filter = grism_df['FILTER'] + grism_df['PUPIL']
    print(len(np.unique(grism_filter)), 'unique filter+grism combinations')

In [None]:
if dospec3:
    # look at one of the association files
    asn_data = json.load(open(spec3_asns[1]))
    print(f'asn_type : {asn_data["asn_type"]}')
    print(f'code_version : {asn_data["code_version"]}')

    # in particular, take a closer look at the product filenames with the association file:
    for product in asn_data['products']:
        for key, value in product.items():
            if key == 'members':
                print(f"{key}:")
                for member in value:
                    print(f"    {member['expname']} : {member['exptype']}")
            else:
                print(f"{key}: {value}")

### Run spec3

In Spec3, the `*_cal.fits` files are reorganized based on source number from the Image3 Pipeline's source catalog, extracted into level 3 `*_x1d.fits` files, and then combined into a single 1-D spectrum (`*_c1d.fits` files) for each source. More information about the steps performed during the spec3 stage of the pipeline can be found in the [Spec3 pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec3.html).

In [None]:
if dospec3:
    os.chdir(sci_dir)
    for spec3_asn in spec3_asns:
        # calibrate with spec3 pipeline
        spec3 = Spec3Pipeline.call(spec3_asn, save_results=True)
    os.chdir(cwd)

We can look at the steps run and the corresponding reference files used for the Spec3 c1d output files.

In [None]:
example_c1d = glob.glob(os.path.join(sci_dir, '*c1d.fits'))[0]
check_steps_run(example_c1d) # steps run on the combined dispersed-image spectra
check_ref_file_used(example_c1d) # reference files used for the combined dispersed-image spectra

<a id="spec3_examine"></a>
### Examining the Outputs of Spec3

The outputs of spec3 are `*_x1d.fits` and `*_c1d.fits` files. Here we do a quick look into some important parts of these files.

#### File Structure

Each extension of the spec3 `*_x1d.fits` files contains the extracted 1-D spectra for an individual dither for a single grism, filter, and extracted order combination. The specific filename and extracted order can be verified with the `FILENAME` and `SPORDER` keywords, respectively, in the header of each extension. Within each extension, all extracted sources across all dithers are listed, with zero or NaN values if that particular dither did not contain data for a source. Each extension also contains information related to the extraction of each source, including the starting position and extent of the extraction box in full-frame coordinates. More information about the columns contained within the `*_x1d.fits` files can be found in the [x1d filetype documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/science_products.html#extracted-1-d-spectroscopic-data-x1d-and-x1dints).

In [None]:
# the level 3 products have a different naming scheme
spec3_x1ds = sorted(glob.glob(os.path.join(sci_dir, "jw?????-o???_*x1d.fits")))
sample_x1d = fits.open(spec3_x1ds[0])

print("***Format of the level 3 x1d file:")
sample_x1d.info()

print("\n***Files contained in the level 3 x1d file:")
for ext in range(len(sample_x1d))[1:-1]:
    print(f"Extension {ext}: {sample_x1d[ext].header['FILENAME']}, order {sample_x1d[ext].header['SPORDER']}")

print("\n***Columns contained in each extension of the level 3 x1d file:")
sample_x1d[1].data.columns

The `*_c1d.fits` files combine the spectra of the same order from the extensions of the level 3 `*_x1d.fits` file into a single spectrum per source. The source numbers in the `*_c1d.fits` files match those in the level 3 `*_x1d.fits` files. More information about the columns contained within the `*_c1d.fits` files can be found in the [c1d filetype documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/science_products.html#combined-1-d-spectroscopic-data-c1d).
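Conceptually, combining means a weighted average of the per-dither spectra of a source. The sketch below assumes all input spectra already share one wavelength grid and uses exposure time as the weight, skipping NaN samples from dithers that did not cover a wavelength. It is not the combine_1d implementation; see the combine_1d documentation linked above for how the pipeline actually weights and resamples the inputs. The function name `combine_spectra` is hypothetical.

```python
import numpy as np

def combine_spectra(fluxes, exptimes):
    """Exposure-time weighted mean of 1-D spectra on a shared wavelength grid.

    fluxes  : array (n_spectra, n_wave); NaN marks missing samples
    exptimes: array (n_spectra,) of exposure times used as weights
    """
    fluxes = np.asarray(fluxes, dtype=float)
    weights = np.broadcast_to(np.asarray(exptimes, dtype=float)[:, None], fluxes.shape)
    good = np.isfinite(fluxes)
    # Give missing samples zero weight and zero contribution
    wsum = np.sum(np.where(good, weights, 0.0), axis=0)
    fsum = np.sum(np.where(good, fluxes * weights, 0.0), axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Wavelengths with no valid input at all stay NaN
        return np.where(wsum > 0, fsum / wsum, np.nan)
```

Skipping NaNs rather than propagating them matters here because, as described above, a source absent from one dither has empty values in that dither's extension.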

In [None]:
# the level 3 products have a different naming scheme
spec3_c1ds = sorted(glob.glob(os.path.join(sci_dir, "jw?????-o???_*c1d.fits")))
sample_c1d = fits.open(spec3_c1ds[0])

print("***Format of the c1d file:")
sample_c1d.info()

print("\n***Files contained in the c1d file:")
for ext in range(len(sample_c1d))[1:-1]:
    print(f"Extension {ext}: order {sample_c1d[ext].header['SPORDER']}")
    
print("\n***Columns contained in each extension of the c1d file:")
sample_c1d[1].data.columns

Digging a little further into how the different source IDs are handled, you can see that the list of source IDs is identical in every extension.

In [None]:
for ext in np.arange(len(sample_x1d))[1:-1]:
    print(f"Extension {ext}: {sample_x1d[ext].header['FILENAME']}, Order {sample_x1d[ext].header['SPORDER']}")
    print("  Sources:\n", sample_x1d[ext].data['SOURCE_ID'])

If a source wasn't extracted in a given file, its values are filled with 0 or NaN. In this example, we look at where the `N_ALONGDISP` column is zero to find which sources aren't extracted in certain files. `N_ALONGDISP` is the number of pixels in the trace along the dispersion direction, so if it is zero, no pixels were used.
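The same test can be turned around to list which sources were extracted in a given extension. This minimal sketch takes the `SOURCE_ID` and `N_ALONGDISP` columns as plain arrays; the function name `split_by_extraction` is hypothetical.

```python
import numpy as np

def split_by_extraction(source_ids, n_alongdisp):
    """Split source IDs into (extracted, not_extracted) using N_ALONGDISP.

    A source with N_ALONGDISP == 0 had no pixels along the dispersion
    direction in that exposure, i.e. it was not extracted there.
    """
    source_ids = np.asarray(source_ids)
    extracted = np.asarray(n_alongdisp) > 0
    return source_ids[extracted], source_ids[~extracted]
```

For example, `split_by_extraction(sample_x1d[1].data['SOURCE_ID'], sample_x1d[1].data['N_ALONGDISP'])` returns both lists for the first file, which is handy for deciding which dithers contribute to a given source.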

In [None]:
# looking at extension 1 (first file) as an example of what a source looks like if it's not extracted
ext = 1
wh_no_source = np.where(sample_x1d[ext].data['N_ALONGDISP'] == 0)[0]
if len(wh_no_source) > 0:
    print(f"{sample_x1d[ext].header['FILENAME']} does not extract the following sources:")
    print(f"  {sample_x1d[ext].data['SOURCE_ID'][wh_no_source]}")
    print("Unique values in the different columns:")
    for colname in sample_x1d[ext].data.names:
        print(f"  {colname} : {np.unique(sample_x1d[ext].data[colname][wh_no_source[0]])}")
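The same `N_ALONGDISP == 0` test can be wrapped in a small function and applied to every extension. This is a sketch assuming each table has `SOURCE_ID` and `N_ALONGDISP` columns, as in the cell above; `missing_sources` is a hypothetical helper name.

```python
import numpy as np


def missing_sources(table):
    """Return the SOURCE_IDs with N_ALONGDISP == 0 (no pixels extracted).

    Hypothetical helper; assumes SOURCE_ID and N_ALONGDISP columns.
    """
    n_along = np.asarray(table['N_ALONGDISP'])
    return np.asarray(table['SOURCE_ID'])[n_along == 0]
```

For example, looping over `range(len(sample_x1d))[1:-1]` and printing `len(missing_sources(sample_x1d[ext].data))` gives a quick count of unextracted sources per file.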

### Spec3 Visualization

To compare with the Spec2 output products above, we look at the same sources, this time plotting the final `*_c1d.fits` files for each grism. Each row of the figure below shows the `*_i2d.fits` direct image of a source, its level 3 `*_x1d.fits` individual spectra for GR150R and GR150C, and the `*_c1d.fits` combined spectrum for each grism.

In [None]:
# make sure you have run the cells in the convenience functions section that define plot_i2d_plus_source & plot_spectrum
# this cell looks at the i2d images, the level 3 x1d spectra, and the combined c1d spectra for both grisms for several sources
if doviz:

    spec3_c1ds = sorted(glob.glob(os.path.join(sci_dir, "jw?????-o???_*c1d.fits")))

    # define some cool sources to look at
    sources = [417, 422, 505, 1296, 606]
    nsources = len(sources)
    
    # or grab some sources from the first c1d file
    # nsources = 4 # 100 sources are extracted by default
    # source_offset = 10 # index of the first source to plot
    # with fits.open(spec3_c1ds[0]) as temp_c1d:
    #     sources = temp_c1d[1].data['SOURCE_ID'][source_offset:nsources+source_offset]

    # setting up the figure
    cols = 4
    rows = nsources
    fig_c1d = plt.figure(figsize=(15, 4*(rows/2)))

    # looping through the different sources to plot; one per row
    for nsource, source_id in enumerate(sources):

        # setting up the subplots for a single source
        ypos = nsource
        ax_i2d = plt.subplot2grid((rows, cols), (ypos, 0)) 
        ax_x1d_r = plt.subplot2grid((rows, cols), (ypos, 1))
        ax_x1d_c = plt.subplot2grid((rows, cols), (ypos, 2))
        ax_c1d = plt.subplot2grid((rows, cols), (ypos, 3))
    
        source_fluxes = [] # save the source flux to set the plot limits

        # plot all of the 1-D combined spectra from the c1d files
        for nfile, c1dfile in enumerate(spec3_c1ds):
            
            # plotting the c1d spectra
            ax_c1d, catname, source_fluxes, grism = plot_spectrum(c1dfile, source_fluxes, ax_c1d, sci_dir)
                
            # plot the level 3 x1d files
            x1dfile = c1dfile.replace('c1d', 'x1d')
            with fits.open(x1dfile) as x1d:
                for ext in range(len(x1d))[1:-1]:
                    if grism == 'GR150R':
                        ax_x1d_r, catname, source_fluxes, grism = plot_spectrum(x1dfile, source_fluxes, ax_x1d_r, sci_dir, ext=ext, legend=False)
                    else:
                        ax_x1d_c, catname, source_fluxes, grism = plot_spectrum(x1dfile, source_fluxes, ax_x1d_c, sci_dir, ext=ext, legend=False)
            
            # plot the direct image of the source based on the source number from the source catalog
            if nfile == 0:
                ax_i2d = plot_i2d_plus_source(catname, source_id, ax_i2d)

        # plot labels and such
        if len(source_fluxes):
            # there may not have been data to extract if everything was saturated
            ax_c1d.set_ylim(np.nanmin(source_fluxes), np.nanmax(source_fluxes))
        
        # Add labels to the subplots
        if nsource == 0:
            ax_i2d.set_title('Direct Image\n(i2d)')
            ax_x1d_r.set_title('Individual GR150R 1-D Spectrum\n(level 3 x1d)')
            ax_x1d_c.set_title('Individual GR150C 1-D Spectrum\n(level 3 x1d)')
            ax_c1d.set_title('Combined 1-D Spectrum\n(c1d)')
        ax_i2d.set_ylabel(f'Source\n{source_id}', fontsize=15)

            
    fig_c1d.tight_layout()
    fig_c1d.show()

In [None]:
# Print out the time benchmark
time_spec3_end = time.perf_counter()
print(f"Runtime for Spec3: {(time_spec3_end - time_spec3)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>