<img style="float: center;" src='https://github.com/spacetelescope/jwst-pipeline-notebooks/raw/main/_static/stsci_header.png' alt="stsci_logo" width="900px"/> 

# NIRISS Wide Field Slitless Spectroscopy (WFSS) Pipeline Notebook

**Authors**: R. Plesha<br>
**Last Updated**: September 10, 2025<br>
**Pipeline Version**: 1.19.1 (Build 12.0)

# **Purpose**:

This notebook provides a framework for processing generic Near-Infrared Imager and Slitless Spectrograph (NIRISS) wide field slitless spectroscopy (WFSS) data through the James Webb Space Telescope (JWST) pipeline. Data from a single proposal and observation ID is assumed to be located in one observation folder according to paths set up below. It should not be necessary to edit any cells other than in the [Configuration](#1.-Configuration) section unless modifying the standard pipeline processing options. Additional notebooks showing how to optimize and modify sources being extracted for NIRISS WFSS data can be found in the [JDAT notebooks GitHub repository](https://github.com/spacetelescope/jdat_notebooks/tree/main/notebooks/NIRISS/NIRISS_WFSS_advanced).

**Data**:
This example uses data from the [Program ID 2079](https://www.stsci.edu/jwst/science-execution/program-information?program=2079) observation 004 (PI: Finkelstein) observing the Hubble Ultra Deep Field (HUDF). The observations use three [NIRISS filters](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-pupil-and-filter-wheels) (F115W, F150W, and F200W), each with both the GR150R and GR150C [grisms](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-gr150-grisms). In this example we only look at data from one of the program's two observations, and only at the F200W filter. A [NIRISS WFSS observation sequence](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-observing-strategies/niriss-wfss-recommended-strategies) typically consists of a direct image followed by a grism observation in the same blocking filter; the direct image is used to identify the sources in the field. In program 2079, the exposure sequence follows the pattern: direct image -> GR150R -> direct image -> GR150C -> direct image.

The example input data are downloaded automatically unless downloading is disabled with the `dodownload` parameter (e.g., to use local files instead).

**JWST pipeline version and CRDS context**: 
This notebook was written for the calibration pipeline version given above. The JWST Calibration Reference Data System (CRDS) context used is associated with the pipeline version as listed [here](https://jwst-crds.stsci.edu/display_build_contexts/). If you use a different pipeline version or CRDS context, please read the relevant release notes ([here for pipeline](https://github.com/spacetelescope/jwst/releases), [here for CRDS](https://jwst-crds.stsci.edu/display_context_history/)) for possibly relevant changes. The results of this notebook may differ from the products hosted on the [MAST archive](https://mast.stsci.edu/search/ui/#/jwst) depending on the pipeline version and CRDS context you are using.

**Updates**:
This notebook is regularly updated as improvements are made to the pipeline. Find the most up to date version of this notebook at: https://github.com/spacetelescope/jwst-pipeline-notebooks/

**Recent Changes**:<br>
September 10, 2025: original notebook released<br>

<hr style="border:1px solid gray"> </hr>

## Table of Contents
1. [Configuration](#1.-Configuration) 
2. [Package Imports](#2.-Package-Imports)
3. [Directory Setup](#3.-Directory-Setup)
4. [Demo Mode Setup (Data Download)](#4.-Demo-Mode-Setup-(Data-Download))
5. [Detector 1 Pipeline](#5.-Detector1-Pipeline)
6. [Image2 Pipeline](#6.-Image2-Pipeline)
7. [Image3 Pipeline](#7.-Image3-Pipeline)
8. [Visualize the source catalog](#8.-Visualize-the-source-catalog)
9. [Spec2 Pipeline](#9.-Spec2-Pipeline)
10. [Spec3 Pipeline](#10.-Spec3-Pipeline)
11. [Visualize the Spectra](#11.-Visualize-the-spectra)

<hr style="border:1px solid gray"> </hr>

## 1. Configuration
------------------
Set basic configuration for running notebook.

#### Install dependencies and parameters

To make sure that the pipeline version is compatible with the steps
discussed below and that the required dependencies and packages are installed,
you can create a fresh conda environment and install the provided
`requirements.txt` file:
```
conda create -n niriss_wfss_pipeline python=3.12
conda activate niriss_wfss_pipeline
pip install -r requirements.txt
```
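
After activating the environment, one way to confirm which version of the `jwst` package is actually installed is to query the package metadata. The sketch below uses only the standard library's `importlib.metadata`; `installed_version_matches` is a hypothetical helper name, and the expected version string `1.19.1` is taken from the header of this notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version_matches(package, expected):
    """Return True/False if `package` is installed and does/doesn't match
    `expected`, or None if the package is not installed at all.

    Hypothetical helper for this notebook; not part of the jwst package.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        return None
    return installed == expected


# Compare against the pipeline version this notebook was written for
status = installed_version_matches("jwst", "1.19.1")
if status is None:
    print("jwst is not installed in this environment")
elif status:
    print("jwst 1.19.1 is installed")
else:
    print(f"jwst {version('jwst')} is installed; results may differ from this notebook")
```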

Set the basic parameters to use with this notebook. These will affect
what data is used, where data is located (if already on disk), names of any outputs, and
which pipeline modules are run on the data. The parameters are:

* demo_mode (run on the example data from program 2079)
* sci_dir (directory where the data is / will be)
* dodownload (download the data locally)
* pipeline modules:
  * dodet1 (run detector1)
  * doimage2 (run image2)
  * doimage3 (run image3)
  * dospec2 (run spec2)
  * dospec3 (run spec3)
* doviz (show visualizations of the data within the notebook)
* program (proposal ID of your data, used for the level 3 association files)
* sci_observtn (observation ID of your data, used for the level 3 association files)

In [None]:
# Basic import necessary for configuration
import os

# establishing what directory we're currently working in
cwd = os.getcwd()

<div class="alert alert-block alert-warning">
Adjust any parameters in the cell directly below this before running to ensure <code>demo_mode</code> runs correctly.
</div>

Set <code>demo_mode = True</code> to run in demonstration mode. In this mode this notebook will download example data from the Barbara A.
Mikulski Archive for Space Telescopes (MAST) and process everything through the pipeline. This will all happen in a local directory unless modified in the configuration below (variable `data_dir`).

In [None]:
# -----------------------------Demo Mode---------------------------------
# use the provided example for demonstration purposes
demo_mode = True

if demo_mode:
    program = '02079'
    sci_observtn = '004' # as an example; 001 also exists for this program

    # creating a directory for the data called "nis_wfss_demo_data" 
    #   located in the directory you are currently in
    data_dir = os.path.join(cwd, 'nis_wfss_demo_data')
    sci_dir = os.path.join(data_dir, f"PID{program}/obs{sci_observtn}")

    print(f'Running in demonstration mode using example data from program {program} obs{sci_observtn}!')
    print(f'Data located in: {sci_dir}')

    # you will want to download the demo data
    dodownload = True

Set <code>demo_mode = False</code> if you want to process your own data that has already been downloaded. To do so, in the cell below, provide the program ID in the `program` variable, the observation ID in the `sci_observtn` variable, and the top path level location of the data in the `sci_dir` variable. The notebook expects that the `uncal` files are in a directory under `sci_dir` called `uncal`.

If you would also like to download the data for a specific program through this notebook, set the `dodownload` variable to True below, and the data will be downloaded to the `uncal` directory under the provided `sci_dir` directory.

In [None]:
# --------------------------User Mode Directories------------------------
# If demo_mode = False, look for user data in these paths
if not demo_mode:
    # Set directory paths for processing specific data; these will need
    # to be changed to your local directory setup (below are given as
    # examples)
    user_home_dir = os.path.expanduser('~')

    # Point to where science observation data are
    # Assumes uncalibrated data in sci_dir/uncal/ and results in stage1,
    # stage2, stage3 directories
    program = '02079' # modify this to your specific program
    sci_observtn = '004' # modify this to your specific observation
    sci_dir = os.path.join(user_home_dir, f'nis_wfss_demo_data/PID{program}/obs{sci_observtn}/')
    dodownload = False # if you would like to download your data using astroquery, set to True & don't skip Demo mode setup section

    print(f'Running using user input data from: {sci_dir} for program {program} obs{sci_observtn}')
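
If you enter the program and observation numbers by hand, it is easy to forget the zero padding that the `PID{program}/obs{sci_observtn}` layout (and the MAST `obs_id` search later on) relies on. A minimal sketch of a hypothetical helper, `build_sci_dir`, that normalizes the IDs and builds the expected paths, assuming the same directory layout as this notebook:

```python
import os


def build_sci_dir(base_dir, program, observation):
    """Build the PID/obs directory layout used in this notebook.

    Hypothetical helper: zero-pads the program ID to 5 digits and the
    observation number to 3 digits, matching strings like '02079' and '004'.
    """
    program = f"{int(program):05d}"
    observation = f"{int(observation):03d}"
    sci_dir = os.path.join(base_dir, f"PID{program}", f"obs{observation}")
    uncal_dir = os.path.join(sci_dir, "uncal")  # where *_uncal.fits files are expected
    return program, observation, sci_dir, uncal_dir


prog, obs, sdir, udir = build_sci_dir("/data/nis_wfss", 2079, 4)
print(prog, obs)  # 02079 004
print(sdir)
print(udir)
```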

Set any of the variables below to be True (do the processing) or False (don't do the processing). To run this notebook from start to completion, it is expected that the output products from each of the stages below are located in the appropriate directories as set in [Directory Setup](#3.-Directory-Setup). If these output products do not exist, any of the later stages of the pipeline may not work as intended.

In [None]:
# --------------------------Set Processing Steps--------------------------
# Individual pipeline stages can be turned on/off here.  Note that a later
# stage won't be able to run unless data products have already been
# produced from the prior stage.

# visualization of products within the notebook
doviz = True # Visualize outputs

# Science processing
dodet1 = True  # calwebb_detector1; files saved in "stage1" directory
doimage2 = True  # calwebb_image2; files saved in "stage2_img" directory
doimage3 = True  # calwebb_image3; files saved in "stage3_img" directory
dospec2 = True # calwebb_spec2; files saved in "stage2_spec" directory
dospec3 = True # calwebb_spec3; files saved in "stage3_spec" directory

### Set CRDS context and server
Before importing <code>CRDS</code> and <code>JWST</code> modules, we need to configure our environment. This includes defining a CRDS cache directory in which to keep the reference files that will be used by the calibration pipeline. The pipeline will fetch and download the needed reference files to this directory.

If the local CRDS cache directory (`CRDS_PATH`) has not already been set, it defaults to a `crds` directory in your home directory.

In [None]:
# ------------------------Set CRDS context and paths----------------------

# Set CRDS context (if overriding to use a specific version of reference
# files; leave commented out to use latest reference files by default)
# %env CRDS_CONTEXT  jwst_1413.pmap

# Check whether the local CRDS cache directory has been set.
# If not, set it to the user home directory
if (os.getenv('CRDS_PATH') is None):
    os.environ['CRDS_PATH'] = os.path.join(os.path.expanduser('~'), 'crds')
# Check whether the CRDS server URL has been set.  If not, set it.
if (os.getenv('CRDS_SERVER_URL') is None):
    os.environ['CRDS_SERVER_URL'] = 'https://jwst-crds.stsci.edu'

# Echo CRDS path in use
print(f"CRDS local filepath: {os.environ['CRDS_PATH']}")
print(f"CRDS file server: {os.environ['CRDS_SERVER_URL']}")
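
The same "only set it if it is not already set" logic can be written with `dict.setdefault`, which `os.environ` also supports. A sketch, assuming you want identical defaults to the cell above; `configure_crds` is a hypothetical name, and it takes an optional plain dictionary so it can be tried without touching your real environment. As noted above, these variables need to be set before `crds` and `jwst` are imported.

```python
import os


def configure_crds(env=None, home=None):
    """Fill in CRDS_PATH and CRDS_SERVER_URL only if they are not already set.

    Sketch equivalent to the cell above, written with dict.setdefault.
    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    env.setdefault("CRDS_PATH", os.path.join(home, "crds"))
    env.setdefault("CRDS_SERVER_URL", "https://jwst-crds.stsci.edu")
    return env


test_env = configure_crds(env={"CRDS_PATH": "/scratch/crds"}, home="/home/user")
print(test_env["CRDS_PATH"])        # /scratch/crds (existing value kept)
print(test_env["CRDS_SERVER_URL"])  # https://jwst-crds.stsci.edu (default filled in)
```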

<hr style="border:1px solid gray"> </hr>

## 2. Package Imports
------------------

In [None]:
# ----------------------General Imports------------------------------------
# Basic system utilities for interacting with files
import glob
import time
from pathlib import Path

# Data calculations and manipulation
import numpy as np
import pandas as pd
import json
from collections import defaultdict

# -----------------------Plotting Imports----------------------------------
from matplotlib import pyplot as plt
# display plots inline within the notebook
%matplotlib inline

# -----------------------Astronomy Imports--------------------------------
# Querying MAST and downloading demo files
from astroquery.mast import Observations
from astroquery.mast.utils import remove_duplicate_products

# Astropy routines for visualizing detected sources:
import astropy.units as u
from astropy.io import fits
from astropy.table import Table
from astropy.coordinates import SkyCoord

# for JWST calibration pipeline
import jwst
import crds

from jwst.pipeline import Detector1Pipeline
from jwst.pipeline import Image2Pipeline
from jwst.pipeline import Image3Pipeline
from jwst.pipeline import Spec2Pipeline
from jwst.pipeline import Spec3Pipeline

# JWST pipeline utilities
from jwst import datamodels
from jwst.associations import asn_from_list  # Tools for creating association files
from jwst.associations.lib.rules_level2_base import DMSLevel2bBase  # Definition of a Lvl2 association file
from jwst.associations.lib.rules_level3_base import DMS_Level3_Base  # Definition of a Lvl3 association file

# Echo pipeline version and CRDS context in use
print(f"JWST Calibration Pipeline Version: {jwst.__version__}")
print(f"Using CRDS Context: {crds.get_context_name('jwst')}")

### Define convenience functions

These functions are used within the notebook and assist with plotting, finding the appropriate extension for a specific source in spec2 cal data, and verifying what steps and reference files were used for a provided file. These may be useful for your own analysis outside of this notebook, but are written for this notebook in particular.

#### Plotting Spec2 & Spec3 convenience functions

In [None]:
# this function will be used to plot the i2d image for a specific source as well as the catalog x/y centroid for that source
def plot_i2d_plus_source(catname, source_id, ax):
    # open the i2d & catalog and find the associated source number
    i2dname = catname.replace('cat.ecsv', 'i2d.fits')

    cat = Table.read(catname)
    cat_line = cat[cat['label'] == source_id]

    # plot the image on the axis that was passed in
    with fits.open(i2dname) as i2d:
        display_vals = [np.nanpercentile(i2d[1].data, 1), np.nanpercentile(i2d[1].data, 98)]
        ax.imshow(i2d[1].data, vmin=display_vals[0], vmax=display_vals[1], origin='lower', cmap='gist_gray_r')

    # zoom in on the catalog centroid of the source
    xcentroid = cat_line['xcentroid'][0]
    ycentroid = cat_line['ycentroid'][0]
    ax.set_xlim(xcentroid-20, xcentroid+20)
    ax.set_ylim(ycentroid-20, ycentroid+20)

    # mark extended sources with blue circles and compact sources with pink squares
    if cat_line['is_extended'][0]:
        cat_color = 'deepskyblue'
        cat_marker = 'o'
    else:
        cat_color = 'deeppink'
        cat_marker = 's'
    ax.scatter(xcentroid, ycentroid, s=20, facecolors='None', edgecolors=cat_color, marker=cat_marker, alpha=0.9)
    ax.annotate(str(source_id),
                (xcentroid+0.5, ycentroid+0.5),
                fontsize=10,
                color=cat_color)

    return ax

In [None]:
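# this function is used to plot the wavelength vs. flux values for x1d & c1d spectra for a specific source
# note: the source to plot is taken from the global variable `source_id`, which must be set before calling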
def plot_spectrum(specfile, source_fluxes, ax, image3_dir, ext=1, legend=True):

    # trimming off some of the edges where the flux is unreliable
    plot_limits = {'F090W': {'wavemin': 0.85, 'wavemax': 0.9},
                   'F115W': {'wavemin': 0.9, 'wavemax': 1.25},
                   'F150W': {'wavemin': 1.35, 'wavemax': 1.65},
                   'F200W': {'wavemin': 1.75, 'wavemax': 2.2},
                   'F140M': {'wavemin': 1.25, 'wavemax': 1.5},
                   'F158M': {'wavemin': 1.45, 'wavemax': 1.65},
                   }

    with fits.open(specfile) as spec:

        # pull out relevant keywords
        grism = spec[0].header['FILTER']
        pupil = spec[0].header['PUPIL']
        catname = os.path.join(image3_dir, spec[0].header['SCATFILE'])
        try:
            label = f"{grism} dither {spec[0].header['DIT_PATT']}"
        except KeyError:
            label = f"{grism}" # there is no dither in the c1d files

        # find where in the file the source data are
        wh_spec_source = np.where(spec[ext].data['SOURCE_ID'] == source_id)[0]
        
        # if the source isn't in the file, then return a blank axis
        if not len(wh_spec_source):
            print(f'Source {source_id} not found in {specfile}')
            return ax, catname, source_fluxes, grism
                  
        # grab the wavelength & flux data and trim off the edges for visualization purposes
        wave = spec[ext].data['WAVELENGTH'][wh_spec_source]
        flux = spec[ext].data['FLUX'][wh_spec_source]
        
        wavemin = plot_limits[pupil]['wavemin']
        wavemax = plot_limits[pupil]['wavemax']
        wh_wave = np.where((wave >= wavemin) & (wave <= wavemax)) # cutting off the edges
        wave = wave[wh_wave]
        flux = flux[wh_wave]

        if np.all(np.isnan(flux)):
            print(f'There are no valid pixels for {os.path.basename(specfile)} source {source_id} {grism}. Source likely on edge of detector; not plotting')
            return ax, catname, source_fluxes, grism

        source_fluxes.extend(flux) # keep the flux to set the limits of the plot later
    
    if grism == 'GR150R':
        linestyle = '-'
    else:
        linestyle = '--'

    ax.plot(wave, flux, label=label, ls=linestyle)
    if legend:
        ax.legend(bbox_to_anchor=(1, 1))

    return ax, catname, source_fluxes, grism
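
The wavelength trimming and all-NaN check inside `plot_spectrum` can be isolated and tried without any FITS files. A minimal numpy sketch, assuming 1D wavelength and flux arrays and reusing two of the window limits (in microns) from the function above; `trim_spectrum` is a hypothetical name, not a pipeline function:

```python
import numpy as np

# Same trimming limits (microns) as plot_spectrum above, for two filters only
PLOT_LIMITS = {"F150W": (1.35, 1.65), "F200W": (1.75, 2.2)}


def trim_spectrum(wave, flux, pupil, limits=PLOT_LIMITS):
    """Keep only samples inside the filter's plotting window.

    Returns (wave, flux, has_valid) where has_valid is False when every
    remaining flux value is NaN (e.g. a source on the detector edge).
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    wavemin, wavemax = limits[pupil]
    keep = (wave >= wavemin) & (wave <= wavemax)  # boolean mask of in-window samples
    wave, flux = wave[keep], flux[keep]
    has_valid = bool(np.any(np.isfinite(flux)))
    return wave, flux, has_valid


wave = np.array([1.70, 1.80, 2.00, 2.30])
flux = np.array([1.0, np.nan, 3.0, 4.0])
w, f, ok = trim_spectrum(wave, flux, "F200W")
print(w.tolist(), ok)
```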

In [None]:
# this function is used to plot the spec2 cal files for a specific source
def plot_spec2_cal(x1dfile, source_id, ax, transpose=False):

    cal_file = x1dfile.replace('x1d.fits', 'cal.fits')
    with fits.open(cal_file) as cal_hdu:
        wh_cal = find_source_ext(cal_hdu, source_id)

        # if the source isn't in the file, then return a blank axis
        if wh_cal == -999:
            print(f'Source {source_id} not found in {cal_file}')
            return ax
            
        if transpose is True:
            # we flip the GR150R data so that we can look at the two cal images along the same dispersion axis
            cal_data = np.transpose(cal_hdu[wh_cal].data)
        else:
            cal_data = cal_hdu[wh_cal].data

        cal_display_vals = [np.nanpercentile(cal_data, 5), np.nanpercentile(cal_data, 90)]        
        ax.imshow(cal_data, vmin=cal_display_vals[0], vmax=cal_display_vals[1], origin='lower', aspect='auto')

        # the dispersion is in the -x direction, so flip the axis for ease in visualization
        ax.invert_xaxis()
   
    return ax

#### Other convenience functions

In [None]:
# a function to use to find the extension the source is located in the cal files
def find_source_ext(cal_hdu, source_id, info=True):
    # Collect the SOURCEID of every SCI extension, skipping the primary HDU and the final ASDF extension.
    # All other extensions get a placeholder ID of -999 so that list positions still map to extension numbers.
    cal_source_ids = np.array([cal_hdu[ext].header['SOURCEID'] if cal_hdu[ext].header['EXTNAME'] == 'SCI'
                               else -999 for ext in range(len(cal_hdu))[1:-1]])

    try:
        wh_cal = np.where(cal_source_ids == source_id)[0][0] + 1 # need to add 1 for the primary header
    except IndexError:
        # this source doesn't exist
        return -999

    if info:
        print(f"Extension {wh_cal} in {cal_hdu[0].header['FILENAME']} contains the data for source {source_id} from our catalog")

    return wh_cal
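
`find_source_ext` relies on the fact that each source in a WFSS `cal` file occupies its own group of extensions, and only the `SCI` extension of each group is searched for the `SOURCEID` keyword. The same lookup written as a plain loop over header dictionaries, as a sketch; the extension list below is a simplified, made-up example rather than a real `cal` file layout, and `find_sci_ext` is a hypothetical name:

```python
def find_sci_ext(headers, source_id):
    """Return the HDU index of the SCI extension whose SOURCEID matches.

    `headers` is a list of dict-like headers in HDU order (index 0 is the
    primary header). Returns -999 if the source is not present, mirroring
    find_source_ext above.
    """
    for index, header in enumerate(headers):
        if index == 0:
            continue  # skip the primary header
        if header.get("EXTNAME") == "SCI" and header.get("SOURCEID") == source_id:
            return index
    return -999


# Simplified example: each source contributes a block of extensions (SCI, DQ, ERR, ...)
example_headers = [
    {"EXTNAME": "PRIMARY"},
    {"EXTNAME": "SCI", "SOURCEID": 12},
    {"EXTNAME": "DQ"},
    {"EXTNAME": "ERR"},
    {"EXTNAME": "SCI", "SOURCEID": 57},
    {"EXTNAME": "DQ"},
]
print(find_sci_ext(example_headers, 57))  # 4
print(find_sci_ext(example_headers, 99))  # -999
```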

In [None]:
# a function to quickly see all of the steps that were run on a specified file
def check_steps_run(filename):

    # Read in file as datamodel (closed automatically when done)
    with datamodels.open(filename) as dm:
        # Check which steps were run
        print(f"{dm.meta.filename} - {dm.meta.exposure.type}")
        for step, status in dm.meta.cal_step.instance.items():
            print(f"{step}: {status}")
    print()

In [None]:
# a function to quickly see all of the reference files that were used on a specified file
def check_ref_file_used(filename):

    # Read in file as datamodel (closed automatically when done)
    with datamodels.open(filename) as dm:
        # Check which reference files were used
        print(f"{dm.meta.filename} - {dm.meta.exposure.type}")
        for reftype, ref_info in dm.meta.ref_file.instance.items():
            print(f"{reftype}: {ref_info}")
    print()

In [None]:
# a function to find the closest source_id for a given RA/Dec
def find_closest_source_id(ra, dec, catalog_name):
    
    # open the source catalog
    cat = Table.read(catalog_name)
    
    # set up a skycoord object for the given RA/Dec
    c = SkyCoord(ra=ra*u.degree, dec=dec*u.degree)

    # Find the closest match to those coordinates in the source catalog
    nearest_id, distance_2d, distance_3d = c.match_to_catalog_sky(cat['sky_centroid']) 

    return cat['label'][nearest_id]
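
`match_to_catalog_sky` finds, for each input coordinate, the catalog entry with the smallest on-sky separation. For a single position this reduces to computing great-circle distances to every catalog source and taking the minimum, as in this numpy sketch using the haversine formula. The coordinates below are made up for illustration and are not taken from the program 2079 catalog; `nearest_label` is a hypothetical name, and in the notebook the astropy-based function above is the one to use.

```python
import numpy as np


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


def nearest_label(ra, dec, cat_ra, cat_dec, labels):
    """Return (label, separation in arcsec) of the catalog entry closest to (ra, dec)."""
    sep = angular_separation_deg(ra, dec, np.asarray(cat_ra), np.asarray(cat_dec))
    idx = int(np.argmin(sep))
    return labels[idx], float(sep[idx] * 3600.0)


# Made-up catalog positions (degrees) and labels
cat_ra = [53.1600, 53.1620, 53.1650]
cat_dec = [-27.7800, -27.7810, -27.7790]
labels = [101, 102, 103]
label, sep_arcsec = nearest_label(53.1621, -27.7809, cat_ra, cat_dec, labels)
print(label)  # 102
```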

In [None]:
# Start a timer to keep track of runtime
time0 = time.perf_counter()

<hr style="border:1px solid gray"> </hr>

# 3. Directory Setup
------------------
Set up detailed paths to input/output stages here.

In [None]:
# Define output subdirectories to keep science data products organized
uncal_dir = os.path.join(sci_dir, 'uncal')  # Uncalibrated pipeline inputs should be here
det1_dir = os.path.join(sci_dir, 'stage1')  # calwebb_detector1 pipeline outputs will go here
image2_dir = os.path.join(sci_dir, 'stage2_img')  # calwebb_image2 pipeline outputs will go here
image3_dir = os.path.join(sci_dir, 'stage3_img')  # calwebb_image3 pipeline outputs will go here
spec2_dir = os.path.join(sci_dir, 'stage2_spec')  # calwebb_spec2 pipeline outputs will go here
spec3_dir = os.path.join(sci_dir, 'stage3_spec')  # calwebb_spec3 pipeline outputs will go here

# We need to check that the desired output directories exist, and if not create them
for cal_dir in [sci_dir, uncal_dir, det1_dir, image2_dir, image3_dir, spec2_dir, spec3_dir]:
    os.makedirs(cal_dir, exist_ok=True)
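
Because each later stage reads the products of the one before it, it can help to check what already exists in these directories before turning stages on or off in the Configuration section. A sketch under the assumption that each stage's main output matches the file patterns in the hypothetical `EXPECTED_PRODUCTS` mapping below (other products such as `_i2d.fits` or `_segm.fits` are also written but not counted here); in the notebook you would call `summarize_stage_products(sci_dir)`:

```python
import glob
import os
import tempfile

# Hypothetical mapping of stage directory name -> main file pattern for that stage
EXPECTED_PRODUCTS = {
    "uncal": "*_uncal.fits",
    "stage1": "*_rate.fits",
    "stage2_img": "*_cal.fits",
    "stage3_img": "*_cat.ecsv",
    "stage2_spec": "*_x1d.fits",
    "stage3_spec": "*_c1d.fits",
}


def summarize_stage_products(sci_dir, expected=EXPECTED_PRODUCTS):
    """Count files matching each stage's pattern under sci_dir."""
    counts = {}
    for subdir, pattern in expected.items():
        counts[subdir] = len(glob.glob(os.path.join(sci_dir, subdir, pattern)))
    return counts


# Demonstration on a throwaway directory containing a single fake rate file
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "stage1"))
    open(os.path.join(tmp, "stage1", "jw_example_rate.fits"), "w").close()
    demo_counts = summarize_stage_products(tmp)

print(demo_counts["stage1"], demo_counts["stage2_img"])  # 1 0
```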

<hr style="border:1px solid gray"> </hr>

# 4. Demo Mode Setup (Data Download)
------------------

If running in demonstration mode, set up the program information to
retrieve the uncalibrated data automatically from MAST using
[astroquery](https://astroquery.readthedocs.io/en/latest/mast/mast.html).
MAST allows for flexibility of searching by the proposal ID and the
observation ID instead of just filenames.<br>

For illustrative purposes, we focus on data taken through the NIRISS [F200W filter](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-filters) and start with uncalibrated data products, or `_uncal` files. To search for additional filters, add them to the `filters` list in `query_criteria`, e.g. `['F200W', 'F115W']`. Note that if the observation does not have a level 3 product for any reason, the `obs_id` value in `query_criteria` will need to be changed to a string of the form `'jw*' + program + sci_observtn + '*'` (e.g., `jw*02079004*`).

Information about the JWST file naming conventions can be found at: https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/file_naming.html

Note: if this section does not work for some reason, you can instead download the same `_uncal.fits` files from this MAST search:<br>
https://mast.stsci.edu/search/ui/#/jwst/results?instruments=NIRISS&program_id=2079&obs_id=004&custom_col_val_0=1b&custom_col_sel_1=niriss_pupil&custom_col_val_1=F200W&

<div class="alert alert-block alert-warning">
This demo selects only filter <b>F200W</b> data by default, but the demo program also contains data for the F115W and F150W filters.
</div>
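
The `obs_id` wildcards in the query below follow the JWST file naming conventions linked above. A small sketch showing how the two pattern styles described in the text are built from `program` and `sci_observtn`, checked locally with the standard library's `fnmatch`; `mast_obs_id_patterns` is a hypothetical helper, and the example identifiers are illustrative strings in the documented format rather than a listing of actual MAST products. MAST does its own wildcard matching server-side, so this is only a sanity check of the pattern strings.

```python
from fnmatch import fnmatch


def mast_obs_id_patterns(program, observation):
    """Return (level-3 style, exposure-level style) obs_id wildcard patterns.

    program/observation are zero-padded strings such as '02079' and '004'.
    """
    level3_pattern = "jw*" + program + "-o" + observation + "*"  # same form as the query below
    exposure_pattern = "jw*" + program + observation + "*"       # fallback without level 3 products
    return level3_pattern, exposure_pattern


l3_pat, exp_pat = mast_obs_id_patterns("02079", "004")
print(l3_pat)   # jw*02079-o004*
print(exp_pat)  # jw*02079004*
print(fnmatch("jw02079-o004_t001_niriss_clear-f200w", l3_pat))  # True
print(fnmatch("jw02079004001_02101_00001_nis", exp_pat))        # True
```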

In [None]:
if dodownload:
    if demo_mode:
        # Obtain a list of observation IDs for the specified demo program
        sci_obs_id_table = Observations.query_criteria(instrument_name=["NIRISS/IMAGE", "NIRISS/WFSS"],
                                                       provenance_name=["CALJWST"],  # Executed observations
                                                       filters=['F200W'],  # Data for Specific Filter
                                                       obs_id=['jw*' + program + '-o' + sci_observtn + '*']
                                                       )
    
    else:
        sci_obs_id_table = Observations.query_criteria(instrument_name=["NIRISS/IMAGE", "NIRISS/WFSS"],
                                                       provenance_name=["CALJWST"],  # Executed observations
                                                       obs_id=['jw*' + program + '-o' + sci_observtn + '*']
                                                       )

In [None]:
# Turn the list of visits into a list of uncalibrated data files
if dodownload:
    # Define types of files to select
    file_dict = {'uncal': {'product_type': 'SCIENCE',
                           'productSubGroupDescription': 'UNCAL',
                           'calib_level': [1]}}

    # Query the product lists for a few observations at a time; this is
    # necessary when there are many exposures in a program.
    batch_size = 5

    # split up our table of observations into batches according to our batch size
    obs_batches = [sci_obs_id_table[i:i+batch_size] for i in range(0, len(sci_obs_id_table), batch_size)]

    # Science files
    sci_files_to_download = []
    # Loop over batches of observations, identifying the uncalibrated files
    # that are associated with them
    for obs_batch in obs_batches:
        products = Observations.get_product_list(obs_batch)
        for filetype, query_dict in file_dict.items():
            filtered_products = Observations.filter_products(products, productType=query_dict['product_type'],
                                                             productSubGroupDescription=query_dict['productSubGroupDescription'],
                                                             calib_level=query_dict['calib_level'])
            sci_files_to_download.extend(filtered_products['dataURI'])
 
    sci_files_to_download = sorted(sci_files_to_download)
    sci_files_to_download = remove_duplicate_products(sci_files_to_download, 'filename')
    
    print(f"Science files selected for downloading: {len(sci_files_to_download)}")
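
The batching and de-duplication above can be sketched with plain Python lists. This is a simplified illustration, not the astroquery implementation: `make_batches` and `unique_sorted` are hypothetical helpers, and `unique_sorted` only drops exact duplicate strings, which is the case that matters when the same product URI is returned for more than one observation in the query.

```python
def make_batches(items, batch_size=5):
    """Split a sequence into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def unique_sorted(uris):
    """Sort URIs and drop exact duplicates, keeping one copy of each."""
    return sorted(set(uris))


batches = make_batches(list(range(12)), batch_size=5)
print([len(b) for b in batches])  # [5, 5, 2]
print(unique_sorted(["mast:JWST/b.fits", "mast:JWST/a.fits", "mast:JWST/b.fits"]))
```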

Download all the uncal files and place them into the appropriate
directories.

<div class="alert alert-block alert-warning">
Warning: If this notebook is halted during this step the downloaded file
may be incomplete, and cause crashes later on!
</div>

In [None]:
if dodownload:
    for filename in sci_files_to_download:
        sci_manifest = Observations.download_file(filename,
                                                  local_path=os.path.join(uncal_dir, Path(filename).name))

In [None]:
# Print out the time benchmark
time_download_end = time.perf_counter()
print(f"Runtime for downloading data: {(time_download_end - time0)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

# 5. Detector1 Pipeline
------------------
In this section we run the `*_uncal.fits` files through the [Detector1](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/stages-of-jwst-data-processing/calwebb_detector1) stage of the pipeline to apply detector level calibrations and create a countrate data product where slopes are fit to the integration ramps. These `*_rate.fits` products are 2D (nrows x ncols), averaged over all integrations. 3D countrate data products (`*_rateints.fits`) are also created (nintegrations x nrows x ncols) which have the fitted ramp slopes for each integration.

If no modifications to the steps at this stage are needed, you can also save time by downloading the `*_rate.fits` files directly from MAST and starting at stage 2. However, make sure that you are using the same pipeline version as the one used for the MAST products, which can be checked in the `CAL_VER` header keyword.

The parameters in each of the [Detector1 steps](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_detector1.html#calwebb-detector1) can be modified from the default values, including overriding the reference files that are used. A dictionary of the modified parameters for each step is then passed to the `steps` parameter of the `Detector1Pipeline` call.

In [None]:
time_det1_start = time.perf_counter()

In [None]:
# Set up a dictionary to define how the Detector1 pipeline should be configured

# this sets up any entry to det1dict to be a dictionary itself
det1dict = defaultdict(dict)

# ---------------------------Override reference files---------------------------
# Example overrides for various reference files
#   Files should be in the base local directory or provide full path
# det1dict['dq_init']['override_mask'] = 'myfile.fits' # Bad pixel mask
# det1dict['saturation']['override_saturation'] = 'myfile.fits' # Saturation
# det1dict['linearity']['override_linearity'] = 'myfile.fits' # Linearity
# det1dict['dark_current']['override_dark'] = 'myfile.fits' # Dark current subtraction
# det1dict['jump']['override_gain'] = 'myfile.fits' # Gain used by jump step
# det1dict['ramp_fit']['override_gain'] = 'myfile.fits' # Gain used by ramp fitting step
# det1dict['jump']['override_readnoise'] = 'myfile.fits' # Read noise used by jump step
# det1dict['ramp_fit']['override_readnoise'] = 'myfile.fits' # Read noise used by ramp fitting step

# -----------------------------Set step parameters------------------------------
# Example overrides for whether or not certain steps should be skipped;
# det1dict['persistence']['skip'] = True # skipping the persistence step

# Example of turning on multi-core processing for the jump step (a single core is used by default).
#   Choose what fraction of cores to use (quarter, half, all, or the default=1); This will speed up the calibration time
# det1dict['jump']['maximum_cores'] = 'half'

# Example of altering parameters to optimize removal of snowball residuals
# det1dict['jump']['expand_large_events'] = True
# det1dict['charge_migration']['signal_threshold'] = X
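
The `defaultdict(dict)` above means that assigning `det1dict['jump']['maximum_cores']` creates the inner `'jump'` dictionary automatically, so `steps=` receives a mapping of step names to parameter dictionaries. A standalone sketch of what that structure looks like once a couple of the commented-out example overrides are enabled (the variable names here are illustrative, and the values are examples, not recommendations):

```python
import json
from collections import defaultdict

# Nested dictionary: step name -> {parameter name: value}
det1_example = defaultdict(dict)
det1_example["jump"]["maximum_cores"] = "half"  # parallelize the jump step
det1_example["persistence"]["skip"] = True      # skip the persistence step

# Converting to a plain dict makes the structure easy to print or save for your records
steps = {step: dict(params) for step, params in det1_example.items()}
print(steps)
# {'jump': {'maximum_cores': 'half'}, 'persistence': {'skip': True}}
print(json.dumps(steps))
```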

In [None]:
uncal_files = sorted(glob.glob(os.path.join(uncal_dir, '*_uncal.fits')))

# Run Detector1 stage of pipeline, specifying:
#   output directory to save *_rate.fits and *_rateints.fits files
#   save_results flag set to True so the files are saved locally
if dodet1:
    for uncal in uncal_files:
        rate_result = Detector1Pipeline.call(uncal, output_dir=det1_dir, steps=det1dict, save_results=True)
else:
    print('Skipping Detector1 processing')
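
Detector1 is usually the slowest stage, so when re-running the notebook it can be convenient to process only the `uncal` files that do not yet have a `rate` product. A sketch, assuming the `*_uncal.fits` -> `*_rate.fits` naming used in this notebook; `rate_name_for` and `files_needing_detector1` are hypothetical helpers, and the file names in the demonstration are made up. In the notebook, the loop above would become `for uncal in files_needing_detector1(uncal_files, det1_dir):`.

```python
import os
import tempfile


def rate_name_for(uncal_path, output_dir):
    """Expected Detector1 rate file for an uncal file, assuming the
    *_uncal.fits -> *_rate.fits naming used in this notebook."""
    base = os.path.basename(uncal_path).replace("_uncal.fits", "_rate.fits")
    return os.path.join(output_dir, base)


def files_needing_detector1(uncal_paths, output_dir):
    """Return only the uncal files that do not yet have a rate file in output_dir."""
    return [path for path in uncal_paths
            if not os.path.exists(rate_name_for(path, output_dir))]


# Small demonstration with throwaway files and made-up names
with tempfile.TemporaryDirectory() as tmp:
    uncals = [os.path.join(tmp, "jw_example_a_uncal.fits"),
              os.path.join(tmp, "jw_example_b_uncal.fits")]
    # pretend exposure 'a' was already processed
    open(os.path.join(tmp, "jw_example_a_rate.fits"), "w").close()
    todo = [os.path.basename(p) for p in files_needing_detector1(uncals, tmp)]

print(todo)  # ['jw_example_b_uncal.fits']
```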

### Inspect Detector1 Output Products
In the Detector1 stage, both the direct images (`EXP_TYPE=NIS_IMAGE`) and dispersed grism images (`EXP_TYPE=NIS_WFSS`) are calibrated. In addition to the `EXP_TYPE` keyword, the keyword `FILTER` can be used to distinguish exposure types for NIRISS WFSS data. `FILTER=CLEAR` indicates a direct image while `FILTER=GR150R` or `FILTER=GR150C` indicates a dispersed image. The keyword `PUPIL` is the blocking filter used in both direct images and dispersed images. We can also use the `PATT_NUM`, `XOFFSET`, and `YOFFSET` header keywords to see the dither pattern that was used for both the direct images and the dispersed images. The multiple direct image dithers will be combined in image3, while the multiple dithers in the dispersed images are combined as individual sources after extraction in spec3. 
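
The `FILTER`/`PUPIL` rules described above can be expressed as a small classification function. A sketch that works on any dict-like header (an astropy `Header` from `fits.getheader` also supports `header["FILTER"]`); `classify_exposure` and `group_by_pupil` are hypothetical names, and the file names in the example are made up:

```python
from collections import defaultdict


def classify_exposure(header):
    """Return 'direct' or 'grism' from the FILTER keyword."""
    filt = header["FILTER"]
    if filt == "CLEAR":
        return "direct"
    if filt in ("GR150R", "GR150C"):
        return "grism"
    raise ValueError(f"Unexpected FILTER value for NIRISS WFSS: {filt}")


def group_by_pupil(headers):
    """Group exposures by blocking filter (PUPIL), then by direct/grism type."""
    groups = defaultdict(lambda: defaultdict(list))
    for hdr in headers:
        groups[hdr["PUPIL"]][classify_exposure(hdr)].append(hdr["FILENAME"])
    return groups


example = [
    {"FILENAME": "direct_1_rate.fits", "FILTER": "CLEAR", "PUPIL": "F200W"},
    {"FILENAME": "grismR_1_rate.fits", "FILTER": "GR150R", "PUPIL": "F200W"},
    {"FILENAME": "direct_2_rate.fits", "FILTER": "CLEAR", "PUPIL": "F200W"},
]
grouped = group_by_pupil(example)
print(dict(grouped["F200W"]))
# {'direct': ['direct_1_rate.fits', 'direct_2_rate.fits'], 'grism': ['grismR_1_rate.fits']}
```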

In [None]:
# Print information about each rate file
rate_files = sorted(glob.glob(os.path.join(det1_dir, "*rate.fits")))

for file_num, ratefile in enumerate(rate_files):
    rate_hdr = fits.getheader(ratefile) # Primary header for each rate file
    
    # information we want to store that might be useful to us later for evaluating the data
    temp_hdr_dict = {"PATHNAME": os.path.abspath(ratefile), # full path to the filename to be used later
                     "FILENAME": rate_hdr['FILENAME'], # base filename for printing readability
                     "EXP_TYPE": [rate_hdr['EXP_TYPE']], # NIS_IMAGE or NIS_WFSS
                     "FILTER": [rate_hdr["FILTER"]], # Grism; GR150R/GR150C
                     "PUPIL": [rate_hdr["PUPIL"]], # Filter used; F090W, F115W, F140M, F150W F158M, F200W
                     "EXPSTART": [rate_hdr['EXPSTART']], # Exposure start time (MJD)
                     "PATT_NUM": [rate_hdr["PATT_NUM"]], # Position number within dither pattern for WFSS
                     "NUMDTHPT": [rate_hdr["NUMDTHPT"]], # Total number of points in entire dither pattern
                     "XOFFSET": [rate_hdr["XOFFSET"]], # X offset from pattern starting position for NIRISS (arcsec)
                     "YOFFSET": [rate_hdr["YOFFSET"]], # Y offset from pattern starting position for NIRISS (arcsec)
                     "CAL_VER": [rate_hdr["CAL_VER"]], # JWST pipeline calibration version
                     }

    # Turn the dictionary into a pandas dataframe to make it easier to read & use later
    if file_num == 0:
        # if this is the first file, make an initial dataframe
        rate_df = pd.DataFrame(temp_hdr_dict)
    else:
        # otherwise, append to the dataframe for each file
        new_data_df = pd.DataFrame(temp_hdr_dict)
        # concatenate so the dataframe accumulates one row per rate file
        rate_df = pd.concat([rate_df, new_data_df], ignore_index=True, axis=0)

rate_dfsort = rate_df.sort_values('EXPSTART', ignore_index=False) # sort by exposure start time

# Look at the resulting dataframe
rate_dfsort[['FILENAME', 'EXP_TYPE', 'FILTER', 'PUPIL', 'EXPSTART', 'PATT_NUM', 'NUMDTHPT', 'XOFFSET', 'YOFFSET', 'CAL_VER']]

Shown below are the rate files, plotted in order of exposure start time, to illustrate the observing sequence in the table above. Grid lines are drawn at fixed pixel positions as a visual guide for the dither offsets.

In [None]:
# Quick plot to visually illustrate the table above showing the
#   direct image and grism sequence for the downloaded data
if doviz:
    # plot set up
    fig = plt.figure(figsize=(20, 35))
    cols = 3
    rows = int(np.ceil(len(rate_dfsort['PATHNAME']) / cols))
    
    # loop over the rate files and plot them
    for plt_num, rf in enumerate(rate_dfsort['PATHNAME']):
    
        # determine where the subplot should be
        xpos = (plt_num % 40) % cols
        ypos = ((plt_num % 40) // cols) # // to make it an int.
    
        # make the subplot
        ax = plt.subplot2grid((rows, cols), (ypos, xpos))
    
        # open the data and plot it
        with fits.open(rf) as hdu:
            data = hdu[1].data
            data[np.isnan(data)] = 0 # filling in nan data with 0s to help with the matplotlib color scale.
            
            display_vals = [np.nanpercentile(data, 1), np.nanpercentile(data, 99.5)]
            ax.imshow(data, vmin=display_vals[0], vmax=display_vals[1], origin='lower')
    
            # adding in grid lines as a visual aid
            for gridline in [500, 1000, 1500]:
                ax.axhline(gridline, color='black', alpha=0.5)
                ax.axvline(gridline, color='black', alpha=0.5)
            
            ax.set_title(f"#{plt_num+1}: {hdu[0].header['EXP_TYPE']} {hdu[0].header['FILTER']} {hdu[0].header['PUPIL']} Dither{hdu[0].header['PATT_NUM']}")
            
    fig.suptitle(f'PID{program} o{sci_observtn} Observing Sequence rate Images (pixel space)', fontsize=16, x=0.5, y=0.9)
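To get a rough sense of how far each dither moves sources on the detector (for example, relative to the grid lines above), the `XOFFSET`/`YOFFSET` values in arcseconds can be divided by the pixel scale. This is a sketch under stated assumptions: the pixel scale below is an approximate value we assume for NIRISS imaging, any rotation or flip between the offset frame and the detector axes is ignored, and the offsets are hypothetical rather than taken from program 2079.

```python
import numpy as np

# ASSUMPTION: approximate NIRISS imaging pixel scale in arcsec/pixel. The true scale
# differs slightly between the x and y axes; check the JWST User Documentation.
NIRISS_PIXSCALE_ARCSEC = 0.0656


def offsets_to_pixels(xoffset_arcsec, yoffset_arcsec, pixscale=NIRISS_PIXSCALE_ARCSEC):
    """Convert XOFFSET/YOFFSET dither offsets (arcsec) into approximate pixel shifts.

    Rotation between the offset frame and the detector axes is ignored, so the
    result only indicates the approximate size of each shift.
    """
    dx = np.asarray(xoffset_arcsec, dtype=float) / pixscale
    dy = np.asarray(yoffset_arcsec, dtype=float) / pixscale
    return dx, dy


# Hypothetical offsets for a three-point dither (not values from program 2079)
xoff = [0.0, 0.656, -0.328]
yoff = [0.0, 0.328, 0.656]
dx, dy = offsets_to_pixels(xoff, yoff)
for x, y in zip(dx, dy):
    print(f"dx = {x:6.1f} pix, dy = {y:6.1f} pix")
```

With the dataframe built earlier, the same helper could be applied as `offsets_to_pixels(rate_dfsort['XOFFSET'], rate_dfsort['YOFFSET'])`.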

Additionally, you can check which steps were performed and which reference files were used during the Detector1 stage of the pipeline. These calls can be used at any stage of the pipeline to confirm which steps or reference files were used. Below, we check the first direct image and the first dispersed (grism) image.

In [None]:
# first look at the direct images
dir_img_rate = rate_dfsort[rate_dfsort['EXP_TYPE'] == 'NIS_IMAGE']['PATHNAME'].iloc[0]
check_steps_run(dir_img_rate)

# then look at the dispersed, grism images
grism_img_rate = rate_dfsort[rate_dfsort['EXP_TYPE'] == 'NIS_WFSS']['PATHNAME'].iloc[0]
check_steps_run(grism_img_rate)

In [None]:
check_ref_file_used(dir_img_rate) # direct image
check_ref_file_used(grism_img_rate) # dispersed image
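The notebook's `check_steps_run` and `check_ref_file_used` helpers are defined earlier and are not reproduced here. As a rough standalone illustration of the same idea, the sketch below assumes (our assumption, not confirmed in this section) that step status is recorded in primary-header keywords beginning with `S_` (values such as `COMPLETE` or `SKIPPED`) and reference file names in keywords beginning with `R_`. The helper `header_keywords_with_prefix` and the header values are hypothetical.

```python
def header_keywords_with_prefix(header, prefix):
    """Return {keyword: value} for every header keyword that starts with `prefix`.

    Works with a plain dict or, we assume, an astropy.io.fits.Header,
    since iterating over either yields keyword names.
    """
    return {key: header[key] for key in header if key.startswith(prefix)}


# Hypothetical header excerpt; with real data, use fits.getheader(dir_img_rate)
example_header = {
    "EXP_TYPE": "NIS_IMAGE",
    "S_DQINIT": "COMPLETE",   # assumed convention: step status keywords start with S_
    "S_JUMP": "COMPLETE",
    "S_DARK": "SKIPPED",
    "R_MASK": "example_mask.fits",  # assumed convention: reference files start with R_
}

steps = header_keywords_with_prefix(example_header, "S_")
refs = header_keywords_with_prefix(example_header, "R_")
skipped = [key for key, status in steps.items() if status == "SKIPPED"]

print("Step status:", steps)
print("Reference files:", refs)
print("Skipped steps:", skipped)
```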

In [None]:
# Print out the time benchmark
time_det1_end = time.perf_counter()
print(f"Runtime for Detector1: {(time_det1_end - time_det1_start)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

# 6. Image2 Pipeline
------------------

This section focuses on calibrating only the direct images in order to obtain a source catalog and segmentation map of the field to use as input to the Spec2 stage later.

In the [Image2 stage of the pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image2.html), calibrated unrectified data products are created (`*_cal.fits` files). In this pipeline processing stage, the [world coordinate system (WCS)](https://jwst-pipeline.readthedocs.io/en/latest/jwst/assign_wcs/index.html#assign-wcs-step) is assigned, the data are [flat fielded](https://jwst-pipeline.readthedocs.io/en/latest/jwst/flatfield/index.html#flatfield-step), and a [photometric calibration](https://jwst-pipeline.readthedocs.io/en/latest/jwst/photom/index.html#photom-step) is applied to convert from units of count rate (DN/s) to surface brightness (MJy/sr).
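The photometric calibration amounts to multiplying each pixel by a conversion factor. The sketch below assumes a single scalar factor in (MJy/sr) per (DN/s) and a pixel solid angle in steradians; in calibrated JWST files these are, to our understanding, recorded in the `PHOTMJSR` and `PIXAR_SR` header keywords. The helper names and all numbers are hypothetical, not NIRISS calibration values.

```python
import numpy as np


def rate_to_surface_brightness(rate_dn_per_s, photmjsr):
    """Convert a count-rate image (DN/s) to surface brightness (MJy/sr).

    `photmjsr` is assumed to be one scale factor in (MJy/sr) per (DN/s).
    """
    return np.asarray(rate_dn_per_s, dtype=float) * photmjsr


def surface_brightness_to_pixel_flux(sb_mjy_per_sr, pixar_sr):
    """Convert surface brightness (MJy/sr) to flux per pixel (MJy) via the pixel solid angle (sr)."""
    return np.asarray(sb_mjy_per_sr, dtype=float) * pixar_sr


# Hypothetical values for illustration only
rate = np.array([[1.0, 2.5], [0.0, 10.0]])  # DN/s
photmjsr = 0.5                              # (MJy/sr) per (DN/s)
pixar_sr = 1.0e-13                          # sr per pixel

sb = rate_to_surface_brightness(rate, photmjsr)
flux = surface_brightness_to_pixel_flux(sb, pixar_sr)
print(sb)
print(flux)
```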

By default, the [background subtraction step](https://jwst-pipeline.readthedocs.io/en/latest/jwst/background_subtraction/index.html) and the [resampling step](https://jwst-pipeline.readthedocs.io/en/latest/jwst/resample/index.html#resample-step) are not performed for NIRISS at this stage of the pipeline. The background subtraction is turned off because there is no background template for the imaging mode; instead, the local background around individual sources is removed during the photometric measurements. The resampling step occurs during the Image3 stage by default. While the resampling step can be turned on during the Image2 stage to, e.g., generate a source catalog for each image, the data quality from the Image3 stage will be better, since bad pixels, which adversely affect both the centroids and photometry in individual images, will be mostly removed by then.

For NIRISS imaging, running the Image2 pipeline directly on the imaging rate files is equivalent to running it on the Image2 association files. Therefore, here we simply use the dataframe we set up in the Detector1 stage to select the imaging rate files and calibrate those directly rather than calibrating with the association files. To use the association files instead, replace the rate filename in the call with the association filename.

In [None]:
time_image2 = time.perf_counter()

The parameters in each of the [Image2 steps](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image2.html) can be modified from the default values, including overriding the reference files used during this stage. A dictionary of the modified parameters for each step is then passed to the `steps` parameter of the `Image2Pipeline` call. The syntax for modifying some of these parameters is shown below.

In [None]:
# Set up a dictionary to define how the Image2 pipeline should be configured.

# this sets up any entry to image2dict to be a dictionary itself
image2dict = defaultdict(dict)

# -----------------------------Set step parameters------------------------------
# Example overrides for whether or not certain steps should be skipped
# image2dict['resample']['skip'] = False

# ---------------------------Override reference files---------------------------
# Example overrides for various reference files
#   Files should be in the base local directory or provide full path
# image2dict['assign_wcs']['override_distortion'] = 'myfile.asdf'  # Spatial distortion (ASDF file)
# image2dict['assign_wcs']['override_filteroffset'] = 'myfile.asdf'  # Imager filter offsets (ASDF file)
# image2dict['assign_wcs']['override_specwcs'] = 'myfile.asdf'  # Spectral distortion (ASDF file)
# image2dict['assign_wcs']['override_wavelengthrange'] = 'myfile.asdf'  # Wavelength channel mapping (ASDF file)
# image2dict['flat_field']['override_flat'] = 'myfile.fits'  # Pixel flatfield
# image2dict['photom']['override_photom'] = 'myfile.fits'  # Photometric calibration array

In [None]:
img_rate_files = rate_dfsort[rate_dfsort['EXP_TYPE'] == 'NIS_IMAGE']['PATHNAME']

print(f'Found {str(len(img_rate_files))} imaging rate files to process for level 2')

In [None]:
# Run Image2 stage of pipeline, specifying:
# output directory to save *_cal.fits files
# save_results flag set to True so the cal files are saved

if doimage2:
    for rate in img_rate_files:
        img2 = Image2Pipeline.call(rate, output_dir=image2_dir, steps=image2dict, save_results=True)
else:
    print("Skipping Image2 processing.")

In [None]:
# Print out the time benchmark
time_image2_end = time.perf_counter()
print(f"Runtime for Image2: {(time_image2_end - time_image2):0.0f} seconds")

<hr style="border:1px solid gray"> </hr>

# 7. Image3 Pipeline
------------------

In this section we continue calibrating the direct images with the Image3 stage of the pipeline to obtain a source catalog and segmentation map of the field to use as input to the Spec2 stage later. In the [Image3 stage of the pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image3.html), the individual `*_cal.fits` files for each of the dither positions are combined into a single distortion-corrected image (`*_i2d.fits` file).

By default, the Image3 stage of the pipeline performs the following steps on NIRISS data:
* [tweakreg](https://jwst-pipeline.readthedocs.io/en/latest/jwst/tweakreg/README.html) - creates source catalogs of pointlike sources for each input image. The source catalogs for the input images are compared with each other to derive coordinate transforms that align the images relative to each other.
    * As of CRDS context jwst_1156.pmap and later, the pars-tweakreg parameter reference file for NIRISS performs an absolute astrometric correction to Gaia Data Release 3 by default (i.e., the `abs_refcat` parameter is set to `GAIADR3`). Though this default correction generally improves results compared with not doing this alignment, it could result in poor performance in crowded or sparse fields, so users are encouraged to check the astrometric accuracy and revisit this step if necessary.
    * As of pipeline version 1.14.0, the default source finding algorithm for NIRISS is IRAFStarFinder, which testing shows returns good accuracy for undersampled NIRISS PSFs at short wavelengths ([Goudfrooij 2022](https://www.stsci.edu/files/live/sites/www/files/home/jwst/documentation/technical-documents/_documents/JWST-STScI-008324.pdf)).
* [skymatch](https://jwst-pipeline.readthedocs.io/en/latest/jwst/skymatch/description.html) - measures the sky background level to use as input to the subsequent outlier detection and resample steps.
* outlier detection - flags any remaining cosmic rays, bad pixels, or other artifacts not already flagged during the Detector1 stage of the pipeline, using all input images to create a median image so that outliers in individual images can be identified.
* [resample](https://jwst-pipeline.readthedocs.io/en/latest/jwst/resample/main.html) - resamples each input image based on its WCS and distortion information and creates a single undistorted image.
* [source catalog](https://jwst-pipeline.readthedocs.io/en/latest/jwst/source_catalog/main.html) - creates a catalog of detected sources along with measured photometry and morphology (i.e., point-like vs. extended). This is useful for quick looks, but optimization is likely needed for specific science cases, which is an ongoing investigation for the NIRISS team. Users may wish to experiment with changing the `snr_threshold` and `deblend` options. Modifying the following parameters will not significantly improve data quality, and it is advised to keep them at their default values: `aperture_ee1`, `aperture_ee2`, `aperture_ee3`, `ci1_star_threshold`, `ci2_star_threshold`.
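The median-based idea behind outlier detection can be illustrated with a deliberately simplified sketch: compare each image in an aligned stack with the per-pixel median and flag pixels whose residual exceeds `nsigma` times their uncertainty. This is not the pipeline's implementation (which, among other things, resamples the inputs and blots the median back onto each input frame); the function name `flag_outliers`, the threshold, and the simulated data are assumptions for illustration.

```python
import numpy as np


def flag_outliers(stack, err, nsigma=5.0):
    """Flag pixels that deviate strongly from the per-pixel median of an image stack.

    stack: array of shape (n_images, ny, nx), assumed already aligned on a common grid.
    err:   per-pixel 1-sigma uncertainty, broadcastable to `stack`.
    Returns a boolean array shaped like `stack`; True marks a flagged pixel.
    """
    stack = np.asarray(stack, dtype=float)
    err = np.asarray(err, dtype=float)
    median = np.nanmedian(stack, axis=0)  # a single outlier barely moves the median
    return np.abs(stack - median) > nsigma * err


rng = np.random.default_rng(42)
stack = 100.0 + rng.normal(0.0, 1.0, size=(5, 32, 32))  # five simulated dithers, sigma = 1
err = np.ones_like(stack)                               # matching 1-sigma uncertainties
stack[2, 10, 10] += 500.0                               # inject a fake cosmic ray
flags = flag_outliers(stack, err)
print("flagged pixels (image, y, x):", np.argwhere(flags).tolist())
```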

In [None]:
time_image3 = time.perf_counter()

Find and sort all of the input Image2 cal files, ensuring the use of absolute paths.

In [None]:
# Science Files need the cal.fits files
sstring = os.path.join(image2_dir, 'jw*cal.fits')
img3_cal_files = sorted(glob.glob(sstring))
for ii, cal_relpath in enumerate(img3_cal_files):
    img3_cal_files[ii] = os.path.abspath(cal_relpath)
img3_cal_files = np.array(img3_cal_files)

print(f'Found {str(len(img3_cal_files))} imaging cal files to process for level 3')

### Create Image3 Association Files

An association file lists the exposures to be calibrated together in the Image3 stage of the pipeline. Note that an association file is available for download from MAST, with a filename of `*image3_asn.json`. Additionally, you can download the `_pool.csv` file for a specific observation and create associations directly from the pool file using the [asn_generate](https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/asn_generate.html) function with the latest version of the pipeline. In both of these cases, the pipeline expects the files being calibrated to exist in the same directory as the association. Below, we show how to create an image3 association file by providing a [list of exposures](https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/asn_from_list.html) that we have processed through the pipeline and saved in separate directories. Also note that the output products will have a rootname that is specified by the `product_name` in the association file. For this tutorial, the product name follows the default pipeline Level 3 naming (`jw<program>-o<observation>_niriss_clear-<pupil>`), and the association filenames end with `_image3_asn.json`.
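To make the structure of an association concrete, the sketch below assembles a simplified image3-style association as a plain Python dictionary and serializes it to JSON. It includes only a subset of the keys that the pipeline's `asn_from_list` tool writes (for instance, it omits `code_version`, which is printed later in this notebook); the helper name `build_image3_asn`, the product name, and the file paths are hypothetical.

```python
import json


def build_image3_asn(cal_files, product_name, program):
    """Build a simplified image3-style association dictionary (subset of real keys)."""
    return {
        "asn_type": "image3",
        "program": program,
        "products": [
            {
                "name": product_name,  # rootname used for the Level 3 output products
                "members": [{"expname": f, "exptype": "science"} for f in cal_files],
            }
        ],
    }


# Hypothetical inputs for illustration
asn = build_image3_asn(
    ["/data/image2/exposure1_cal.fits", "/data/image2/exposure2_cal.fits"],
    product_name="jw02079-o004_niriss_clear-f200w",
    program="02079",
)
serialized = json.dumps(asn, indent=2)
print(serialized)
```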

In [None]:
# Create Level 3 Associations for each pupil (blocking filter) type
if doimage3:

    # Parameters to be used for the NIRISS imaging 3 association creation
    img3_pid = str(program) # associations are only set up to combine for the same program & observation
    img3_obs = str(sci_observtn) # associations are only set up to combine for the same program & observation
    img3_filt = 'CLEAR' # For imaging mode, the second filter wheel is set to clear
    img3_ins = 'NIRISS'

    # Identify the unique filters used (keyword=PUPIL) for the NIRISS images
    img3_all_pupils = np.array([fits.getval(cf, 'PUPIL') for cf in img3_cal_files])
    img3_uniq_pupils = np.unique(img3_all_pupils)
    
    # Loop over unique pupil values
    for img3_pupil in img3_uniq_pupils:
        img3_indx = np.where(img3_all_pupils == img3_pupil)[0]
        img3_pupil_files = img3_cal_files[img3_indx]

        # setting up the association filename to match the default pipeline level3 naming output
        img3_product_name = f"jw{img3_pid}-o{img3_obs}_{img3_ins}_{img3_filt}-{img3_pupil}".lower()
        img3_asn_filename = img3_product_name + '_image3_asn.json'
    
        img3_association = asn_from_list.asn_from_list(img3_pupil_files, rule=DMS_Level3_Base,
                                                       product_name=img3_product_name)
    
        img3_association.data['asn_type'] = 'image3'
        img3_association.data['program'] = img3_pid
    
        # Format association as .json file
        _, serialized = img3_association.dump(format="json")

        # Write out association file
        img3_asn_pathname = os.path.join(sci_dir, img3_asn_filename)
        with open(img3_asn_pathname, "w") as fd:
            fd.write(serialized)
        print(f'Writing image3 association: {img3_asn_pathname}')

Take a quick look at the contents of the first image3 association file to get a feel for what is being associated.

In [None]:
if doimage3:
    image3_asns = glob.glob(os.path.join(sci_dir, "*image3*_asn.json"))
    
    # open the image3 association to look at
    with open(image3_asns[0]) as asn_fp:
        image3_asn_data = json.load(asn_fp)
    print(f'asn_type : {image3_asn_data["asn_type"]}')
    print(f'code_version : {image3_asn_data["code_version"]}')
    
    # in particular, take a closer look at the product filenames with the association file:
    for product in image3_asn_data['products']:
        for key, value in product.items():
            if key == 'members':
                print(f"{key}:")
                for member in value:
                    print(f"    {member['expname']} {member['exptype']}")
            else:
                print(f"{key}: {value}")

### Run Image3

In Image3, the individual `*_cal.fits` dither files are combined into a single `*_i2d.fits` image. The parameters in each of the [Image3 steps](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image3.html) can be modified from the default values, including overriding the reference files used during this stage. A dictionary of the modified parameters for each step is then passed to the `steps` parameter of the `Image3Pipeline` call. The syntax for modifying some of these parameters is shown below; the full list of parameters can be found in the [tweakreg](https://jwst-pipeline.readthedocs.io/en/latest/jwst/tweakreg/README.html) and [sourcecatalog](https://jwst-pipeline.readthedocs.io/en/latest/jwst/source_catalog/main.html) documentation.


In [None]:
# Set up a dictionary to define how the Image3 pipeline should be configured

# this sets up any entry to image3dict to be a dictionary itself
image3dict = defaultdict(dict)

# -----------------------------Set step parameters------------------------------
# Example overrides for whether or not certain steps should be skipped
#   Some of these example values differ from default values to improve the demo scene
# image3dict['outlier_detection']['skip'] = True

# Example parameters for the source_catalog step
# image3dict['source_catalog']['kernel_fwhm'] = 5.0
# image3dict['source_catalog']['snr_threshold'] = 10.0
# image3dict['source_catalog']['npixels'] = 50
# image3dict['source_catalog']['deblend'] = True

# Example parameters for the tweakreg step
# image3dict['tweakreg']['snr_threshold'] = 20
# image3dict['tweakreg']['abs_refcat'] = 'GAIADR3'
# image3dict['tweakreg']['searchrad'] = 3.0
# image3dict['tweakreg']['kernel_fwhm'] = 2.302
# image3dict['tweakreg']['fitgeometry'] = 'shift'

# ---------------------------Override reference files---------------------------
# Example overrides for various reference files
#   Files should be in the base local directory or provide full path
# image3dict['source_catalog']['override_apcorr'] = 'myfile.fits'  # Aperture correction parameters
# image3dict['source_catalog']['override_abvegaoffset'] = 'myfile.asdf'  # Data to convert from AB to Vega magnitudes (ASDF file)

In [None]:
# Run Image3
if doimage3:
    asn_files = np.sort(glob.glob(os.path.join(sci_dir, '*image3_asn.json')))
    for asn in asn_files:
        img3 = Image3Pipeline.call(asn, output_dir=image3_dir, steps=image3dict, save_results=True)
else:
    print('Skipping Image3 processing')

In [None]:
# Print out the time benchmark
time_image3_end = time.perf_counter()
print(f"Runtime for Image3: {(time_image3_end - time_image3)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

# 8. Visualize Image3 Output Products
------------------

Using the combined image (`*_i2d.fits`), the segmentation map (`*_segm.fits`), and the source catalog (`*cat.ecsv`) produced by the Image3 stage of the pipeline, we can visually inspect whether we agree with where the sources were found before using them in the Spec2 stage of the pipeline.

In [None]:
# Find the outputs of the Image3 pipeline, which will be needed for processing the spectral data
# Print which outputs were found for reference

# Combined image over multiple dithers/mosaic
image3_i2d = np.sort(glob.glob(os.path.join(image3_dir, '*i2d.fits')))
print('Direct images:')
for i2d_filename in image3_i2d:
    print(f"  {os.path.basename(i2d_filename)}")

# Segmentation map that defines the extent of a source
image3_segm = np.sort(glob.glob(os.path.join(image3_dir, '*segm.fits')))
print('Segmentation maps:')
for seg_filename in image3_segm:
    print(f"  {os.path.basename(seg_filename)}")
    
# Source catalog that defines the RA/Dec of a source at a particular pixel
image3_cat = np.sort(glob.glob(os.path.join(image3_dir, '*cat.ecsv')))
print('Source catalogs:')
for cat_filename in image3_cat:
    print(f"  {os.path.basename(cat_filename)}")

### i2d & segmentation map

The segmentation maps are used to help determine the source catalog. Let's take a look at them to ensure we agree with what is being defined as a source. In the following figures, the combined image is shown on the left and the segmentation map is shown on the right, where each black blob in the segmentation map should correspond to a physical target. The sources identified in the source catalog are overlaid on both panels: sources the pipeline classified as extended are shown as blue circles, and sources classified as point sources are shown as pink squares. This classification affects the extraction box in the WFSS images as well as the contamination correction step of the pipeline, so it is important to get it right.
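A segmentation map is an integer image in which 0 marks background and every other value is a source label (the same `label` used to annotate sources in the plots). The sketch below, using a small hypothetical array and a hypothetical helper `segment_stats`, counts each source's pixels and finds its bounding box; two blended galaxies would share a single label and therefore a single box.

```python
import numpy as np


def segment_stats(segm):
    """Return {label: (npix, (ymin, ymax, xmin, xmax))} for each nonzero label."""
    segm = np.asarray(segm)
    stats = {}
    for label in np.unique(segm):
        if label == 0:  # 0 marks background pixels
            continue
        ys, xs = np.nonzero(segm == label)
        bbox = (int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max()))
        stats[int(label)] = (ys.size, bbox)
    return stats


# Hypothetical 5x6 segmentation map containing two sources
segm = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
])
print(segment_stats(segm))
# → {1: (4, (1, 2, 1, 2)), 2: (2, (2, 3, 4, 4))}
```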

There are cases where sources can be blended, in which case the parameters for making the segmentation map and source catalog should be modified. If using the demo data, an example of this can be seen in the Observation 004 F200W filter image where two galaxies at ~(1600, 1300) have been blended into one source. This is discussed in more detail in the custom Image3 run in the [NIRISS WFSS JDAT notebooks](https://github.com/spacetelescope/jdat_notebooks/tree/main/notebooks/NIRISS/NIRISS_WFSS_advanced).

In [None]:
if doviz:            
    cols = 2
    rows = len(image3_i2d)
    
    fig = plt.figure(figsize=(15, 15*(rows/2)))
    
    for plt_num, img in enumerate(np.sort(np.concatenate([image3_segm, image3_i2d]))):
    
        # determine where the subplot should be
        xpos = (plt_num % 40) % cols
        ypos = ((plt_num % 40) // cols) # // to make it an int.
    
        # make the subplot
        ax = plt.subplot2grid((rows, cols), (ypos, xpos))
    
        # the i2d and segm files share a rootname with the source catalog, so load it for both panels
        cat = Table.read(img.replace('i2d.fits', 'cat.ecsv').replace('segm.fits', 'cat.ecsv'))
        cmap = 'gist_gray_r'
            
        # plot the image
        with fits.open(img) as hdu:
            display_vals = [np.nanpercentile(hdu[1].data, 1), np.nanpercentile(hdu[1].data, 99)]
            ax.imshow(hdu[1].data, vmin=display_vals[0], vmax=display_vals[1], origin='lower', cmap=cmap)
            title = f"{hdu[0].header['PUPIL']}"
    
        # also plot the associated catalog
        extended_sources = cat[cat['is_extended'] == 1] # 1 is True; i.e. is extended
        point_sources = cat[cat['is_extended'] == 0] # 0 is False; i.e. is a point source

        for color, sources, source_type, marker in zip(['deepskyblue', 'deeppink'], [extended_sources, point_sources], ['Extended Source', 'Point Source'], ['o', 's']):
            # plotting the sources
            ax.scatter(sources['xcentroid'], sources['ycentroid'], marker=marker, s=150, facecolors='None', edgecolors=color, alpha=0.9)
    
            # adding source labels 
            for i, source_num in enumerate(sources['label']):
                ax.annotate(source_num, 
                            (sources['xcentroid'][i]+1, sources['ycentroid'][i]+1), 
                            fontsize=10,
                            color=color)
            ax.scatter(-999, -999, marker=marker, label=source_type, s=150, facecolors='None', edgecolors=color, alpha=0.9)
            
        # setting titles
        if 'i2d' in img:
            ax.set_title(f"{title} combined image\n(i2d)", fontsize=16)
        else:
            ax.set_title(f"{title} segmentation map\n(segm)", fontsize=16)
        
        # zooming in on a smaller region
        ax.set_xlim(1250, 1750)
        ax.set_ylim(1250, 1750)

        ax.legend(framealpha=0.6, fontsize=14, loc='upper left')

    # more labels 
    fig.supxlabel('x-pixel', fontsize=14)
    fig.supylabel('y-pixel', fontsize=14, x=0)
    
    # Helps to make the axes not overlap ; you can also set this manually if this doesn't work
    plt.tight_layout()

In addition to the segmentation map, the source catalog itself is useful for examining the source centroids, calculated fluxes, and source extents.

In [None]:
# Print a source catalog to illustrate its contents
cat = Table.read(image3_cat[0])
cat

In all likelihood, you will need to rerun Image3 with different parameters in order to return an optimal source catalog to use with your NIRISS WFSS data. You can additionally refine the source catalog so that Spec2 and Spec3 only run on the sources that you care most about. Some examples of this can be found in the [NIRISS WFSS JDAT notebooks](https://github.com/spacetelescope/jdat_notebooks/tree/main/notebooks/NIRISS/NIRISS_WFSS_advanced).

<hr style="border:1px solid gray"> </hr>

# 9. Spec2 Pipeline
------------------
After running Image3 and obtaining the segmentation map and source catalog, the [Spec2 pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec2.html#calwebb-spec2) is ready to be run. The Spec2 pipeline first runs the [assign_wcs](https://jwst-pipeline.readthedocs.io/en/latest/jwst/assign_wcs/main.html), [background](https://jwst-pipeline.readthedocs.io/en/latest/jwst/background_subtraction/description.html), and [flat_field](https://jwst-pipeline.readthedocs.io/en/latest/jwst/flatfield/main.html) corrections on the full-frame `*_rate.fits` files. The [srctype](https://jwst-pipeline.readthedocs.io/en/latest/jwst/srctype/description.html) step is run to determine the extraction box size before the [extract_2d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_2d/main.html) step is run, producing individual cutouts for the 100 brightest sources defined in the Image3 source catalog. The [wfss_contam](https://jwst-pipeline.readthedocs.io/en/latest/jwst/wfss_contam/description.html) step is run towards the end of the [extract_2d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_2d/main.html) step and is currently off by default while the step is being improved. The [photom](https://jwst-pipeline.readthedocs.io/en/latest/jwst/photom/main.html) step is then run on the cutouts, producing flux-calibrated 2-D spectral (`*_cal.fits`) files. The [extract_1d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_1d/description.html) step is run last, producing level 2 `*_x1d.fits` files.
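The default limit on how many sources are extracted is controlled by the `wfss_nbright` parameter of `extract_2d`. The sketch below illustrates one way to rank sources, by ascending AB magnitude; which catalog column and ordering the pipeline actually uses is an assumption here, and the helper `brightest_labels` and its inputs are hypothetical.

```python
import numpy as np


def brightest_labels(labels, abmag, nbright=100):
    """Return the labels of the `nbright` brightest sources (smallest AB magnitude).

    np.argsort places NaN magnitudes (e.g., failed photometry) at the end,
    so those sources are selected last.
    """
    labels = np.asarray(labels)
    abmag = np.asarray(abmag, dtype=float)
    order = np.argsort(abmag, kind="stable")  # ascending magnitude = descending brightness
    return labels[order[:nbright]]


# Hypothetical catalog columns
labels = np.array([1, 2, 3, 4])
abmag = np.array([25.0, 22.5, np.nan, 24.0])
print(brightest_labels(labels, abmag, nbright=2))
```

With the notebook's catalog this might look like `brightest_labels(cat['label'], cat['isophotal_abmag'], nbright=10)`, assuming an `isophotal_abmag` column is present.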

In [None]:
time_spec2 = time.perf_counter()

### Create Spec2 Association File

As with the imaging part of the pipeline, there are association files for spec2. These are a bit more complex in that they need to have the science (WFSS) data, direct image, source catalog, and segmentation map included as members. For the science data, the rate files are used as inputs, similar to Image2. Also like Image2, there should be one association file for each dispersed image dither position in an observing sequence.

Like Image3, we create a spec2 association file manually by providing a [list of exposures](https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/asn_from_list.html) that we have processed through the pipeline and saved in separate directories, rather than downloading the association directly from MAST or using the [pool files](https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/asn_generate.html). Note that the output products will have a rootname that is specified by the `product_name` in the association file. In this tutorial, the association files themselves are named after the input rate files, with the suffix `_spec2_asn.json`.
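Because Spec2 needs all four kinds of members, a quick sanity check on an association can save a failed run. The sketch below, with a hypothetical helper `missing_spec2_roles`, reports which roles are absent. It assumes the grism exposure carries the `exptype` value `science` (the `asn_from_list` default, as we understand it), while `direct_image`, `sourcecat`, and `segmap` match the member types added by `write_spec2asn`.

```python
# Member roles expected in a WFSS spec2 association (assumption based on write_spec2asn)
REQUIRED_SPEC2_ROLES = {"science", "direct_image", "sourcecat", "segmap"}


def missing_spec2_roles(asn):
    """Return the set of required member roles absent from a spec2 association dict."""
    present = {
        member["exptype"]
        for product in asn["products"]
        for member in product["members"]
    }
    return REQUIRED_SPEC2_ROLES - present


# Hypothetical association, e.g. as loaded with json.load from a *_spec2_asn.json file
example_asn = {
    "products": [{
        "name": "Level2",
        "members": [
            {"expname": "grism_rate.fits", "exptype": "science"},
            {"expname": "direct_i2d.fits", "exptype": "direct_image"},
            {"expname": "direct_cat.ecsv", "exptype": "sourcecat"},
        ],
    }],
}
print(missing_spec2_roles(example_asn))  # → {'segmap'}
```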

In [None]:
def write_spec2asn(grismfile, dimagefiles, catalogfiles, segmfiles, prodname):
    
    # Define the basic association of science files
    asn = asn_from_list.asn_from_list([grismfile], rule=DMSLevel2bBase, product_name=prodname)  # Wrap in list since input is single exposure

    # Which pupil element (blocking filter) does the dispersed image use?
    grism_pupil = fits.getval(grismfile, 'PUPIL')
    grism_pid = fits.getval(grismfile, 'PROGRAM')
    grism_obs = fits.getval(grismfile, 'OBSERVTN')

    # Find the direct images with the same matching program, observation, and pupil to use
    dir_img_match = []
    for dir_img in dimagefiles:
        img_pid = fits.getval(dir_img, 'PROGRAM')
        img_obs = fits.getval(dir_img, 'OBSERVTN')
        img_pupil = fits.getval(dir_img, 'PUPIL')

        if img_pupil == grism_pupil and img_pid == grism_pid and img_obs == grism_obs:
            dir_img_match.append(dir_img)

    # ensure that there is only one match found for the grism image
    if len(dir_img_match) == 0:
        raise ValueError(f'Could not find a matching i2d image for {grismfile}. Please ensure that you have processed the appropriate data for {grism_pupil} PID {grism_pid} o{grism_obs}')
    elif len(dir_img_match) > 1:
        raise ValueError(f'Multiple i2ds found matching: {grism_pupil} PID {grism_pid} o{grism_obs}. Please download the associations directly from MAST to proceed further.')
    else:
        # there should only be one match per filter/program/observation combination, so grab that one
        dir_img_match = dir_img_match[0]
        
    # There should be a set of i2d, segm, and cat that have the same rootname, so we will just replace the filetype suffix
    dir_seg_match = dir_img_match.replace('_i2d.fits', '_segm.fits')
    dir_cat_match = dir_img_match.replace('_i2d.fits', '_cat.ecsv')
        
    # Add the direct image, catalog, and segmentation files
    asn['products'][0]['members'].append({'expname': dir_img_match, 'exptype': 'direct_image'})
    asn['products'][0]['members'].append({'expname': dir_cat_match, 'exptype': 'sourcecat'})
    asn['products'][0]['members'].append({'expname': dir_seg_match, 'exptype': 'segmap'})
    
    spec2_asnfile = os.path.join(sci_dir, os.path.basename(grismfile).replace('rate.fits', 'spec2_asn.json'))

    # Write the association to a json file
    _, serialized = asn.dump()
    with open(spec2_asnfile, 'w') as outfile:
        outfile.write(serialized)

    print(f'Writing spec2 association: {spec2_asnfile}')
    return spec2_asnfile

In [None]:
# find the rate files using our dataframe table that we created in the detector1 stage of the notebook
grism_rate_files = rate_dfsort[rate_dfsort['EXP_TYPE'] == 'NIS_WFSS']['PATHNAME']

print(f'Found {str(len(grism_rate_files))} grism rate files to process for level 2')

# use the rate files and image3 output products to define spec2 association files
if dospec2:
    for file in grism_rate_files:
        asnfile = write_spec2asn(file, image3_i2d, image3_cat, image3_segm, 'Level2')

Take a quick look at the contents of an example spec2 association file to get a feel for what is being associated.

In [None]:
if dospec2:
    spec2_asns = glob.glob(os.path.join(sci_dir, "*spec2*_asn.json"))
    
    # look at one of the association files
    with open(spec2_asns[0]) as asn_fp:
        asn_data = json.load(asn_fp)
    print(f'asn_type : {asn_data["asn_type"]}')
    print(f'code_version : {asn_data["code_version"]}')
    
    # in particular, take a closer look at the product filenames with the association file:
    for product in asn_data['products']:
        for key, value in product.items():
            if key == 'members':
                print(f"{key}:")
                for member in value:
                    print(f"    {member['expname']} : {member['exptype']}")
            else:
                print(f"{key}: {value}")

### Run Spec2

In Spec2, the `*_rate.fits` files go through various corrections before the source catalog is used to extract the 100 brightest sources (by default) into 1-D spectra (level 2 `*_x1d.fits` files). The parameters in each of the [Spec2 steps](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec2.html) can be modified from the default values, including overriding the reference files used during this stage and saving additional files. A dictionary of the modified parameters for each step is then passed to the `steps` parameter of the `Spec2Pipeline` call. The syntax for modifying some of these parameters is shown below. In particular, we show the option of extracting only the 10 brightest sources by setting `wfss_nbright`, as well as several options related to the contamination step, including turning on the step, saving the simulated images, and using additional cores to process more quickly. There are still several known bugs in the contamination step as of pipeline version 1.19.1, so use it with caution. We also show how to save the background-subtracted full-frame file as an intermediate product (`*_bsub.fits`). These background products are expected to be a default output in an upcoming pipeline build.

In [None]:
# Set up a dictionary to define how the Spec2 pipeline should be configured.

# this sets up any entry to spec2dict to be a dictionary itself
spec2dict = defaultdict(dict)

# ---------------------------Override reference files---------------------------
# Overrides for various reference files (example).
#   Files should be in the base local directory or provide full path.
# spec2dict['assign_wcs']['override_distortion'] = 'myfile.asdf'  # Spatial distortion (ASDF file)
# spec2dict['assign_wcs']['override_filteroffset'] = 'myfile.asdf'  # Imager filter offsets (ASDF file)
# spec2dict['assign_wcs']['override_specwcs'] = 'myfile.asdf'  # Spectral distortion (ASDF file)
# spec2dict['assign_wcs']['override_wavelengthrange'] = 'myfile.asdf'  # Wavelength channel mapping (ASDF file)
# spec2dict['bkg_subtract']['override_bkg'] = 'myfile.fits' # WFSS Background subtraction
# spec2dict['extract_2d']['override_wavelengthrange'] = 'myfile.asdf'  # Wavelength channel mapping (ASDF file)
# spec2dict['flat_field']['override_flat'] = 'myfile.fits'  # Pixel flatfield
# spec2dict['wfss_contam']['override_wavelengthrange'] = 'myfile.asdf'  # Wavelength channel mapping (ASDF file)
# spec2dict['wfss_contam']['override_photom'] = 'myfile.fits'  # Photometric calibration array
# spec2dict['photom']['override_photom'] = 'myfile.fits'  # Photometric calibration array

# -----------------------------Set step parameters------------------------------
# Overrides for whether or not certain steps should be skipped (example).
# spec2dict['bkg_subtract']['skip'] = True # don't perform the background subtraction
# spec2dict['bkg_subtract']['save_results'] = True # save background subtracted full-frame images
# spec2dict['flat_field']['save_results'] = True # save the background subtracted, flat-field corrected, full-frame images
# spec2dict['extract_2d']['wfss_nbright'] = 10 # only extract the 10 brightest sources
# spec2dict['wfss_contam']['skip'] = False # uncomment to turn on contamination correction
# spec2dict['wfss_contam']['save_simulated_image'] = True # save the simulated images produced by the pipeline
# spec2dict['wfss_contam']['maximum_cores'] = 'half' # (quarter, half, all, or the default=1); This will speed up the calibration time

In [None]:
if dospec2:
    for spec2_asn in spec2_asns:
        os.chdir(image3_dir) # This is necessary since the pipeline looks in the current directory for the catalog
        spec2 = Spec2Pipeline.call(spec2_asn, steps=spec2dict, save_results=True, output_dir=spec2_dir)
        os.chdir(cwd) # change back into your original directory
else:
    print('Skipping Spec2 processing for SCI data')

In [None]:
# Print out the time benchmark
time_spec2_end = time.perf_counter()
print(f"Runtime for Spec2: {(time_spec2_end - time_spec2)/60:0.0f} minutes")

### Visualize Spec2 Outputs

NIRISS WFSS data contain many sources of interest. For each of five selected sources, this visualization shows the source as it appears in the i2d image, two example 2-D spectral cutouts from the grism `*_cal.fits` files (if available; otherwise these panels may appear blank), and the level 2 `*_x1d.fits` 1-D extracted spectra for all grism dithers where available. With the contamination step turned off (the default configuration above), contamination is readily visible when comparing the 1-D and 2-D spectra from the two grisms.

Note that the GR150R `*_cal.fits` data are transposed so that the dispersion direction is along the -x axis. For both GR150R and GR150C `*_cal.fits` files, the axis is then flipped for visualization purposes.
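The display convention above can be sketched with NumPy. The toy cutout below is an assumption: each pixel stores its wavelength index, and wavelength increases toward -y, so transposing puts the dispersion along -x and a left-right flip then makes wavelength increase to the right. The helper `orient_for_display` is hypothetical; the notebook's `plot_spec2_cal` helper may achieve the flip by inverting the plot axis rather than the array.

```python
import numpy as np


def orient_for_display(cutout, transpose=False):
    """Optionally transpose a 2-D spectral cutout, then flip it left-right."""
    data = np.asarray(cutout)
    if transpose:
        # swap rows and columns so the dispersion runs along x
        data = data.T
    # reverse the x axis so wavelength increases to the right
    return np.fliplr(data)


# Toy GR150R-like cutout (assumption): wavelength index decreases with row
# number, i.e. wavelength increases toward -y. Shape is (5 rows, 3 columns).
nrows, ncols = 5, 3
toy_gr150r = np.repeat(np.arange(nrows)[::-1][:, None], ncols, axis=1)

# after the transpose, wavelength increases toward -x; the flip reverses that
display = orient_for_display(toy_gr150r, transpose=True)
print(display[0])  # [0 1 2 3 4]
```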

In [None]:
# here we look at the source as identified by the source catalog in the i2d image, the two grism cal files, and the x1d files
#   this cell is grabbing the files & sources to look at
if doviz:
    # grab the spec2 x1d output products
    spec2_x1d_files = sorted(glob.glob(os.path.join(spec2_dir, '*nis_x1d.fits*')))

    # If there are multiple pupils (blocking filters) pick one for illustration
    spec2_unique_pupils = np.unique([fits.getval(x1d, 'PUPIL') for x1d in spec2_x1d_files])
    pupil_x1ds = [x1d for x1d in spec2_x1d_files if fits.getval(x1d, 'PUPIL') == spec2_unique_pupils[0]]

    if demo_mode:
        # find the source catalog for this set of filters
        source_catfile = os.path.join(image3_dir, fits.getval(pupil_x1ds[0], 'SCATFILE'))

        # define some cool sources to look at if using the demo mode data
        ra_decs = [(53.149299532671414, -27.788593590014425), # galaxy strong emission
                   (53.1490537204553, -27.774406172992315), # galaxy with contamination
                   (53.14917151184476, -27.79305522163517), # galaxy with contamination
                   (53.17659115324354, -27.785519434446663), # larger footprint galaxy
                   (53.158405995297684, -27.794984326598932)] # point source
        sources = [find_closest_source_id(ra, dec, source_catfile) for ra, dec in ra_decs]
        nsources = len(sources)
    else:
        # or grab some sources from the first x1d file
        nsources = 5 # 100 sources are extracted by default
        source_offset = 10 # start 10 sources into the list to skip past the very brightest sources
        with fits.open(pupil_x1ds[0]) as temp_x1d:
            sources = temp_x1d[1].data['SOURCE_ID'][source_offset:nsources+source_offset]

In [None]:
# here we look at the source as identified by the source catalog in the i2d image, the two grism cal files, and the x1d files
#   this cell is doing the figure set-up and plotting
if doviz:
    # setting up the figure
    cols = 4
    rows = nsources
    fig = plt.figure(figsize=(15, 4*(rows/2)))
    fig.suptitle(f"Spec2 Products for PID{program} o{sci_observtn} {spec2_unique_pupils[0]}")
    
    # looping through the different sources to plot; one per row
    for nsource, source_id in enumerate(sources):
        # we are only plotting a single cal file cutout for each grism
        plot_gr150r = True
        plot_gr150c = True

        # setting up the subplots for a single source
        ypos = nsource
        ax_i2d = plt.subplot2grid((rows, cols), (ypos, 0)) 
        ax_cal_r = plt.subplot2grid((rows, cols), (ypos, 1)) 
        ax_cal_c = plt.subplot2grid((rows, cols), (ypos, 2)) 
        ax_x1d = plt.subplot2grid((rows, cols), (ypos, 3))
    
        source_fluxes = [] # save the source flux to set the plot limits
                
        # plot all of the 1-D spectra from the x1d files
        for nfile, x1dfile in enumerate(pupil_x1ds):

            ax_x1d, catname, source_fluxes, grism = plot_spectrum(x1dfile, source_fluxes, ax_x1d, image3_dir, legend=False)
            
            # plot the direct image of the source based on the source number from the source catalog
            if nfile == 0:
                
                ax_i2d = plot_i2d_plus_source(catname, source_id, ax_i2d)
            
            # plot one example cal image from the GR150R grism, transposed to disperse in the same direction as GR150C
            if plot_gr150r and grism == 'GR150R':
                ax_cal_r = plot_spec2_cal(x1dfile, source_id, ax_cal_r, transpose=True)
                plot_gr150r = False
                
            # plot one example cal image from the GR150C grism
            if plot_gr150c and grism == 'GR150C':
                ax_cal_c = plot_spec2_cal(x1dfile, source_id, ax_cal_c)
                plot_gr150c = False

        if len(source_fluxes):
            # there may not have been data to extract if everything was saturated
            ax_x1d.set_ylim(np.nanmin(source_fluxes), np.nanmax(source_fluxes))
            ax_x1d.legend(bbox_to_anchor=(1, 1), ncols=int(np.ceil(len(pupil_x1ds)/6))) # ncols must be an integer
        
        # Add labels to the subplots
        if nsource == 0:
            ax_cal_r.set_title('Example Transposed GR150R cutout\n(cal)')
            ax_cal_c.set_title('Example GR150C cutout\n(cal)')
            ax_i2d.set_title('Direct Image\n(i2d)')
            ax_x1d.set_title('All Collapsed 1-D Spectra\n(level 2 x1d)')
            
        ax_i2d.set_ylabel(f'Source\n{source_id}', fontsize=15)
        ax_cal_r.set_xlabel('dispersion --->')
        ax_cal_c.set_xlabel('dispersion --->')
        ax_x1d.set_xlabel('Wavelength (microns)')
        ax_x1d.set_ylabel('F_nu (Jy)')
        ax_x1d.ticklabel_format(axis='y', style='sci', scilimits=(0, 0)) # forcing scientific notation for the spectra

    fig.tight_layout()
    fig.show()

<hr style="border:1px solid gray"> </hr>

# 10. Spec3 Pipeline
------------------

NIRISS WFSS data are minimally processed through the [Spec3 stage of the pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec3.html) to combine calibrated data from multiple dithers within an observation. The spec3 products are unique to a specific grism and blocking filter combination; data from the two grisms are not combined by default. As of pipeline version 1.19.1, the level 3 source-based `*_cal.fits` files created by the [exp_to_source](https://jwst-pipeline.readthedocs.io/en/latest/jwst/exp_to_source/main.html) step are no longer saved by default. In addition, the `*_x1d.fits` files created by the [extract_1d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_1d/description.html) step and the `*_c1d.fits` files created by the [combine_1d](https://jwst-pipeline.readthedocs.io/en/latest/jwst/combine_1d/description.html) step are now each saved as a single file per grism and filter combination that contains all of the extracted sources.

In [None]:
time_spec3 = time.perf_counter()

In [None]:
# Find the cal.fits files and convert them to absolute paths
sstring = os.path.join(spec2_dir, 'jw*cal.fits')
spec2_cal_files = np.array([os.path.abspath(cal_relpath) for cal_relpath in sorted(glob.glob(sstring))])

print(f'Found {len(spec2_cal_files)} grism spectroscopy cal files to process for level 3')

### Create Spec3 Association Files

There will be one spec3 association per blocking filter and grism combination, in which all of the extracted 1-D spectra within an observation with that filter and grism combination are coadded into a single spectrum for each source. For example, if using one blocking filter (F200W) with both grisms (GR150R and GR150C), we would expect two spec3 association files, each of which contains all of the corresponding `*_cal.fits` files to combine.
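The association cell below loops over every unique `PUPIL` value and every unique `FILTER` value independently, so it also visits combinations that were never observed. A minimal alternative sketch, using only the standard library, groups file paths by their actual `(PUPIL, FILTER)` header pair so that only observed combinations yield associations. The toy header dictionary stands in for values you would read with `fits.getval`, and `group_by_pupil_and_grism` is a hypothetical helper. For NIRISS WFSS, the `PUPIL` keyword holds the blocking filter and the `FILTER` keyword holds the grism.

```python
from collections import defaultdict


def group_by_pupil_and_grism(file_headers):
    """Group cal file paths by their (PUPIL, FILTER) header pair.

    file_headers maps a file path to a dict with 'PUPIL' and 'FILTER' values.
    Only combinations that actually occur produce a group.
    """
    groups = defaultdict(list)
    for path, hdr in file_headers.items():
        groups[(hdr["PUPIL"], hdr["FILTER"])].append(path)
    return {key: sorted(paths) for key, paths in groups.items()}


# toy header values for four hypothetical cal files
toy_headers = {
    "a_cal.fits": {"PUPIL": "F200W", "FILTER": "GR150R"},
    "b_cal.fits": {"PUPIL": "F200W", "FILTER": "GR150C"},
    "c_cal.fits": {"PUPIL": "F200W", "FILTER": "GR150R"},
    "d_cal.fits": {"PUPIL": "F200W", "FILTER": "GR150C"},
}

for (pupil, grism), paths in sorted(group_by_pupil_and_grism(toy_headers).items()):
    print(pupil, grism, paths)
```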

As with Image3 and Spec2 before, we will create Spec3 associations by providing a [list of exposures](https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/asn_from_list.html) that we have processed through the pipeline and saved in separate directories, rather than downloading them directly from MAST or using the [pool files](https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/asn_generate.html). Note that the output products will have a rootname specified by the `product_name` in the association file. For this tutorial, `product_name` follows the pattern `jw<program>-o<observation>_niriss_<grism>-<blocking filter>`, and each association file is named with that rootname followed by `_spec3_asn.json`.
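The naming convention used in the association cell can be checked in isolation. The sketch below reproduces the notebook's f-string for `product_name` and the `_spec3_asn.json` association filename. The helper `spec3_names` is hypothetical, and the program and observation strings are example values; whether they are zero-padded depends on how `program` and `sci_observtn` are set in the Configuration section.

```python
def spec3_names(program, observation, grism, pupil, instrument="NIRISS"):
    """Build the product rootname and association filename used in this notebook."""
    product_name = f"jw{program}-o{observation}_{instrument}_{grism}-{pupil}".lower()
    return product_name, product_name + "_spec3_asn.json"


# example values (assumed zero-padded program and observation strings)
product_name, asn_filename = spec3_names("02079", "004", "GR150R", "F200W")
print(product_name)  # jw02079-o004_niriss_gr150r-f200w
print(asn_filename)  # jw02079-o004_niriss_gr150r-f200w_spec3_asn.json
```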

In [None]:
# Create Level 3 Associations for each pupil (blocking filter) type
if dospec3:

    # Parameters to be used for the NIRISS spec3 association creation
    spec3_pid = str(program) # associations are only set up to combine for the same program & observation
    spec3_obs = str(sci_observtn) # associations are only set up to combine for the same program & observation
    spec3_ins = 'NIRISS'

    # Identify the unique filters & grisms used for the NIRISS WFSS cal files
    spec3_dict = {}
    spec3_dict['PUPIL'] = np.array([fits.getval(cf, 'PUPIL') for cf in spec2_cal_files])
    spec3_dict['FILTER'] = np.array([fits.getval(cf, 'FILTER') for cf in spec2_cal_files])
    spec3_dict['PATHNAMES'] = np.array(spec2_cal_files)

    spec3_df = pd.DataFrame(spec3_dict)

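    # Loop over unique pupil values (PUPIL holds the blocking filter for NIRISS WFSS)
    for spec3_filter in spec3_df['PUPIL'].unique():
        # Loop over unique filter values (FILTER holds the grism for NIRISS WFSS)
        for spec3_grism in spec3_df['FILTER'].unique():
            # find the files specific to each of the filters & grisms
            spec3_files = spec3_df[(spec3_df['PUPIL'] == spec3_filter) & (spec3_df['FILTER'] == spec3_grism)]['PATHNAMES']

            # skip filter/grism combinations that were not observed
            if len(spec3_files) == 0:
                continue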

            # build the association names to match the default names from the pipeline
            product_name = f"jw{spec3_pid}-o{spec3_obs}_{spec3_ins}_{spec3_grism}-{spec3_filter}".lower()
            spec3_asn_filename = product_name + '_spec3_asn.json'
    
            spec3_association = asn_from_list.asn_from_list(spec3_files, rule=DMS_Level3_Base,
                                                            product_name=product_name)
    
            spec3_association.data['asn_type'] = 'spec3'
            spec3_association.data['program'] = spec3_pid
    
            # Format association as .json file
            _, serialized = spec3_association.dump(format="json")

            # Write out association file
            association_spec3 = os.path.join(sci_dir, spec3_asn_filename)
            with open(association_spec3, "w") as fd:
                fd.write(serialized)

            print(f'Writing spec3 association: {association_spec3}')

Take a quick look at the contents of the first spec3 association file to get a feel for what is being associated.

In [None]:
if dospec3:
    spec3_asns = sorted(glob.glob(os.path.join(sci_dir, "*spec3_asn.json")))
    
    # open the spec3 association to look at
    with open(spec3_asns[0]) as asn_file:
        spec3_asn_data = json.load(asn_file)
    print(f'asn_type : {spec3_asn_data["asn_type"]}')
    print(f'code_version : {spec3_asn_data["code_version"]}')
    
    # in particular, take a closer look at the product filenames within the association file:
    for product in spec3_asn_data['products']:
        for key, value in product.items():
            if key == 'members':
                print(f"{key}:")
                for member in value:
                    print(f"    {member['expname']} : {member['exptype']}")
            else:
                print(f"{key}: {value}")

### Run Spec3

In Spec3, the `*_cal.fits` files are reorganized by source number from the Image3 pipeline's source catalog, extracted into level 3 `*_x1d.fits` files, and then combined into a single 1-D spectrum (`*_c1d.fits` files) for each source. The parameters in each of the [Spec3 steps](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec3.html) can be modified from their default values, including overriding the reference files used during this stage. This dictionary of modified parameters for each step is then passed to the `steps` parameter of the `Spec3Pipeline` call. The syntax for modifying some of these parameters is shown below.
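The reorganization from per-exposure to per-source data can be pictured with plain Python. This toy sketch (not the pipeline's `exp_to_source` implementation) maps each source ID to the list of exposures that contain it. The helper `regroup_by_source`, the exposure names, source IDs, and spectra are all hypothetical; a source that falls off the detector in one dither simply has fewer entries.

```python
from collections import defaultdict


def regroup_by_source(exposures):
    """Reorganize per-exposure spectra into per-source lists.

    exposures maps an exposure name to {source_id: spectrum}. The result maps
    each source_id to a list of (exposure name, spectrum) pairs.
    """
    by_source = defaultdict(list)
    for expname in sorted(exposures):
        for source_id, spectrum in exposures[expname].items():
            by_source[source_id].append((expname, spectrum))
    return dict(by_source)


# hypothetical exposures: source 7 is missing from the second dither
toy_exposures = {
    "dither1_cal": {5: [1.0, 1.1], 7: [0.4, 0.5]},
    "dither2_cal": {5: [0.9, 1.2]},
}
grouped = regroup_by_source(toy_exposures)
print(grouped[5])  # [('dither1_cal', [1.0, 1.1]), ('dither2_cal', [0.9, 1.2])]
print(grouped[7])  # [('dither1_cal', [0.4, 0.5])]
```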

In [None]:
# Set up a dictionary to define how the Spec3 pipeline should be configured.

# this sets up any entry to spec3dict to be a dictionary itself
spec3dict = defaultdict(dict)

# -----------------------------Set step parameters------------------------------

# Overrides for whether or not certain steps should be skipped (example).
# spec3dict['pixel_replace']['skip'] = True

In [None]:
# Run Stage 3
if dospec3:
    for spec3_asn in spec3_asns:
        os.chdir(spec3_dir)
        spec3 = Spec3Pipeline.call(spec3_asn, output_dir=spec3_dir, steps=spec3dict, save_results=True)
        os.chdir(cwd) # change back into the directory you started in
else:
    print('Skipping Spec3 processing')

In [None]:
# Print out the time benchmark
time_spec3_end = time.perf_counter()
print(f"Runtime for Spec3: {(time_spec3_end - time_spec3)/60:0.0f} minutes")

<hr style="border:1px solid gray"> </hr>

# 11. Understanding the Spec3 Outputs
------------------
The outputs of Spec3 are `*_x1d.fits` and `*_c1d.fits` files. Here we take a quick look at some important parts of these files.

Each extension of the spec3 `*_x1d.fits` files contains the extracted 1-D spectra from an individual dither for a single grism, filter, and spectral order combination. The specific filename and extracted order can be verified with the `FILENAME` and `SPORDER` keywords, respectively, in the header of each extension. Each extension lists every source extracted across all dithers; the values are empty if that particular dither did not contain data for the source. Each extension also contains information about the extraction of each source, including the position and size of the extraction box in the full-frame reference. More information about the columns contained within the `*_x1d.fits` files can be found in the [x1d filetype documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/science_products.html#extracted-1-d-spectroscopic-data-x1d-and-x1dints).
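To pull one source's spectrum out of every extension, you can match on `SOURCE_ID` and skip rows with no extracted pixels. The sketch below uses NumPy structured arrays as stand-ins for the extension tables; the three-column layout, the fixed-length `FLUX` arrays, and the helpers `make_toy_extension` and `spectra_for_source` are assumptions for illustration, not the full x1d column set.

```python
import numpy as np

# Stand-in for the table in one x1d extension (an assumption for illustration):
# one row per source, with a fixed-length FLUX array padded with NaN.
row_dtype = np.dtype([("SOURCE_ID", "i4"), ("N_ALONGDISP", "i4"), ("FLUX", "f8", (3,))])


def make_toy_extension(rows):
    """Build a toy extension table from (SOURCE_ID, N_ALONGDISP, FLUX) tuples."""
    return np.array(rows, dtype=row_dtype)


def spectra_for_source(extensions, source_id):
    """Return {extension number: FLUX} for every extension where the source was extracted."""
    found = {}
    # number extensions from 1, as in the FITS file (0 is the primary header)
    for ext, table in enumerate(extensions, start=1):
        match = table[table["SOURCE_ID"] == source_id]
        # skip extensions where the source is absent or has no extracted pixels
        if len(match) and match["N_ALONGDISP"][0] > 0:
            found[ext] = match["FLUX"][0]
    return found


nan3 = (np.nan, np.nan, np.nan)
ext1 = make_toy_extension([(5, 3, (1.0, 1.1, 1.2)), (7, 3, (0.4, 0.5, 0.6))])
ext2 = make_toy_extension([(5, 3, (0.9, 1.0, 1.3)), (7, 0, nan3)])

# only extension 1 has data for source 7
print(spectra_for_source([ext1, ext2], 7))
```

Because `astropy.io.fits` table data (`FITS_rec`) is a NumPy record array, the same boolean indexing should work on `sample_x1d[ext].data`, though this sketch has not been run against real x1d files.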

In [None]:
# Print a list of the spec3 output x1d and c1d files
spec3_x1ds = sorted(glob.glob(os.path.join(spec3_dir, "*x1d.fits")))
print('spec3 x1d files:')
for x1d_filename in spec3_x1ds:
    print(f"   {os.path.basename(x1d_filename)}")

spec3_c1ds = sorted(glob.glob(os.path.join(spec3_dir, "*c1d.fits")))
print('spec3 c1d files:')
for c1d_filename in spec3_c1ds:
    print(f"   {os.path.basename(c1d_filename)}")

In [None]:
# Print information about the structure of the x1d files by reading in the first one
if doviz:
    sample_x1d = fits.open(spec3_x1ds[0])

    print("***Format of the level 3 x1d file:")
    sample_x1d.info()

    print("\n***cal files used to create this level 3 x1d file:")
    # skip the primary (first) and ASDF metadata (last) extensions
    for ext in range(len(sample_x1d))[1:-1]:
        print(f"Extension {ext}: {sample_x1d[ext].header['FILENAME']}, order {sample_x1d[ext].header['SPORDER']}")

    print("\n***Columns contained in each extension of the level 3 x1d file:")
    print(sample_x1d[1].data.columns)

Each `*_c1d.fits` file combines the spec3 `*_x1d.fits` extensions of the same spectral order into a single spectrum per source. The source numbers in the `*_c1d.fits` files match those in the level 3 `*_x1d.fits` files. More information about the columns contained within the `*_c1d.fits` files can be found in the [c1d filetype documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/science_products.html#combined-1-d-spectroscopic-data-c1d).
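Conceptually, combining spectra of the same order is a weighted average across dithers. The sketch below computes a NaN-aware weighted mean for spectra that already share a wavelength grid. Weighting by exposure time, the common-grid assumption, and the helper `weighted_combine` are simplifications for illustration; see the combine_1d step documentation for the weights and wavelength handling the pipeline actually uses.

```python
import numpy as np


def weighted_combine(fluxes, weights):
    """NaN-aware weighted mean of spectra that share one wavelength grid.

    fluxes: shape (n_spectra, n_wavelengths); weights: one value per spectrum
    (assumed here to be exposure times). Wavelengths where every input is NaN
    remain NaN in the output.
    """
    fluxes = np.asarray(fluxes, dtype=float)
    # give every pixel in a spectrum that spectrum's weight
    w = np.broadcast_to(np.asarray(weights, dtype=float)[:, None], fluxes.shape)
    valid = ~np.isnan(fluxes)
    # ignore NaN pixels in both the weighted sum and the sum of weights
    wsum = np.where(valid, w, 0.0).sum(axis=0)
    total = np.where(valid, fluxes * w, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(wsum > 0, total / wsum, np.nan)


# two hypothetical dithers; the second has one more missing pixel
combined = weighted_combine([[1.0, 2.0, np.nan], [3.0, np.nan, np.nan]], [1.0, 3.0])
print(combined)
```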

In [None]:
# Print information about the structure of the c1d files by reading in the first one
if doviz:
    sample_c1d = fits.open(spec3_c1ds[0])

    print("***Format of the c1d file:")
    sample_c1d.info()

    print("\n***Extracted orders contained in the c1d file:")
    # skip the primary (first) and ASDF metadata (last) extensions
    for ext in range(len(sample_c1d))[1:-1]:
        print(f"Extension {ext}: order {sample_c1d[ext].header['SPORDER']}")
    
    print("\n***Columns contained in each extension of the c1d file:")
    print(sample_c1d[1].data.columns)

Looking more closely at the source IDs and how they are handled, you can see that the list of source IDs is now identical in every extension, which is not always the case in the level 2 x1d files.
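A quick programmatic version of this check compares the `SOURCE_ID` arrays from every extension. The helper `source_ids_match` and the toy arrays are hypothetical; the level 2-like example mimics a case where one dither lists fewer sources.

```python
import numpy as np


def source_ids_match(id_arrays):
    """Return True if every array of SOURCE_IDs is identical (same IDs, same order)."""
    first = np.asarray(id_arrays[0])
    return all(np.array_equal(first, np.asarray(ids)) for ids in id_arrays[1:])


# hypothetical SOURCE_ID columns from several extensions
level3_like = [np.array([3, 5, 7]), np.array([3, 5, 7]), np.array([3, 5, 7])]
level2_like = [np.array([3, 5, 7]), np.array([3, 5])]

print(source_ids_match(level3_like))  # True
print(source_ids_match(level2_like))  # False
```

Applied to the file opened above, the input would be `[sample_x1d[ext].data['SOURCE_ID'] for ext in range(1, len(sample_x1d) - 1)]`.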

In [None]:
if doviz:
    for ext in np.arange(len(sample_x1d))[1:-1]:
        print(f"Extension {ext}: {sample_x1d[ext].header['FILENAME']}, Order {sample_x1d[ext].header['SPORDER']}")
        print("  Sources:\n", sample_x1d[ext].data['SOURCE_ID'])

If a source was not extracted in a given extension, its values are filled with `0` or `NaN`. The `N_ALONGDISP` column is a useful tracer for finding sources that were not extracted: it gives the number of pixels in the trace along the dispersion direction, so a value of zero means no pixels were used.
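The same `N_ALONGDISP` test can be written as a boolean mask that splits the source list in one pass. The helper `split_by_extraction` and the toy values are hypothetical.

```python
import numpy as np


def split_by_extraction(source_ids, n_alongdisp):
    """Split source IDs into (extracted, not_extracted) using N_ALONGDISP."""
    source_ids = np.asarray(source_ids)
    # a source counts as extracted if any pixels were used along the dispersion
    extracted = np.asarray(n_alongdisp) > 0
    return source_ids[extracted], source_ids[~extracted]


ids = np.array([3, 5, 7, 9])
n_pix = np.array([120, 0, 87, 0])  # hypothetical N_ALONGDISP values
done, missing = split_by_extraction(ids, n_pix)
print("extracted:", done)         # extracted: [3 7]
print("not extracted:", missing)  # not extracted: [5 9]
```

In this notebook the inputs would be `sample_x1d[ext].data['SOURCE_ID']` and `sample_x1d[ext].data['N_ALONGDISP']`.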

In [None]:
# looking at extension 1 (first file) as an example of what a source looks like if it's not extracted
if doviz:
    ext = 1
    wh_no_source = np.where(sample_x1d[ext].data['N_ALONGDISP'] == 0)[0]
    if len(wh_no_source) > 0:
        print(f"{sample_x1d[ext].header['FILENAME']} does not extract the following sources:")
        print(f"  {sample_x1d[ext].data['SOURCE_ID'][wh_no_source]}")
        print("Column default values when a source is not extracted:")
        for colname in sample_x1d[ext].data.names:
            print(f"  {colname} : {np.unique(sample_x1d[ext].data[colname][wh_no_source[0]])}")

### Visualize Spec3 Outputs

To compare with the Spec2 output products above, we look at the same sources, this time plotting the final `*_c1d.fits` files for each grism. We again show the `*_i2d.fits` image for each source, followed by the individual level 3 `*_x1d.fits` spectra for each of the two grisms (if only one grism was used, the other column will be blank), and finally the `*_c1d.fits` combined spectrum for each grism, if available.

In [None]:
# looking at the i2d images, the level 3 x1d spectra, and the combined c1d spectra for both grisms for several sources
#   this cell is grabbing the files & sources to look at
if doviz:
    # grab the c1d files to plot
    spec3_c1ds = np.sort(glob.glob(os.path.join(spec3_dir, "*c1d.fits")))

    # If there are multiple pupils (blocking filters) pick one for illustration
    unique_pupils = np.unique([fits.getval(c1d, 'PUPIL') for c1d in spec3_c1ds])
    pupil_c1ds = [c1d for c1d in spec3_c1ds if fits.getval(c1d, 'PUPIL') == unique_pupils[0]]
    
    if demo_mode:
        # find the source catalog for this set of filters
        source_catfile = os.path.join(image3_dir, fits.getval(pupil_x1ds[0], 'SCATFILE'))

        # define some cool sources to look at if using the demo mode data
        ra_decs = [(53.149299532671414, -27.788593590014425), # galaxy strong emission
                   (53.1490537204553, -27.774406172992315), # galaxy with contamination
                   (53.14917151184476, -27.79305522163517), # galaxy with contamination
                   (53.17659115324354, -27.785519434446663), # larger footprint galaxy
                   (53.158405995297684, -27.794984326598932)] # point source
        sources = [find_closest_source_id(ra, dec, source_catfile) for ra, dec in ra_decs]
        nsources = len(sources)
    else:
        # or grab some sources from the first c1d file
        nsources = 5 # 100 sources are extracted by default
        source_offset = 10 # start 10 sources into the list to skip past the very brightest sources
        with fits.open(pupil_c1ds[0]) as temp_c1d:
            sources = temp_c1d[1].data['SOURCE_ID'][source_offset:nsources+source_offset]    

In [None]:
# make sure you have run the cells in the convenience functions section that define plot_i2d_plus_source & plot_spectrum
# this cell looks at the i2d images, the level 3 x1d spectra, and the combined c1d spectra for both grisms for several sources
if doviz:
    # setting up the figure
    cols = 4
    rows = nsources
    fig_c1d = plt.figure(figsize=(15, 4*(rows/2)))

    # looping through the different sources to plot; one per row
    for nsource, source_id in enumerate(sources):

        # setting up the subplots for a single source
        ypos = nsource
        ax_i2d = plt.subplot2grid((rows, cols), (ypos, 0)) 
        ax_x1d_r = plt.subplot2grid((rows, cols), (ypos, 1))
        ax_x1d_c = plt.subplot2grid((rows, cols), (ypos, 2))
        ax_c1d = plt.subplot2grid((rows, cols), (ypos, 3))
    
        source_fluxes = [] # save the source flux to set the plot limits

        # plot all of the 1-D combined spectra from the c1d files
        for nfile, c1dfile in enumerate(pupil_c1ds):
            
            # plotting the c1d spectra
            ax_c1d, catname, source_fluxes, grism = plot_spectrum(c1dfile, source_fluxes, ax_c1d, image3_dir)
                
            # plot the level 3 x1d files
            x1dfile = c1dfile.replace('c1d', 'x1d')
            with fits.open(x1dfile) as x1d:
                for ext in range(len(x1d))[1:-1]:
                    if grism == 'GR150R':
                        ax_x1d_r, catname, source_fluxes, grism = plot_spectrum(x1dfile, source_fluxes, ax_x1d_r, image3_dir, ext=ext, legend=False)
                    else:
                        ax_x1d_c, catname, source_fluxes, grism = plot_spectrum(x1dfile, source_fluxes, ax_x1d_c, image3_dir, ext=ext, legend=False)
            
            # plot the direct image of the source based on the source number from the source catalog
            if nfile == 0:
                ax_i2d = plot_i2d_plus_source(catname, source_id, ax_i2d)

        # plot labels and such
        if len(source_fluxes):
            # there may not have been data to extract if everything was saturated
            ax_c1d.set_ylim(np.nanmin(source_fluxes), np.nanmax(source_fluxes))
            
        # Add labels to the subplots
        if nsource == 0:
            ax_i2d.set_title('Direct Image\n(i2d)')
            ax_x1d_r.set_title('Individual GR150R 1-D Spectrum\n(level 3 x1d)')
            ax_x1d_c.set_title('Individual GR150C 1-D Spectrum\n(level 3 x1d)')
            ax_c1d.set_title('Combined 1-D Spectrum\n(c1d)')
        ax_i2d.set_ylabel(f'Source\n{source_id}', fontsize=15)

        for ax in [ax_x1d_r, ax_x1d_c, ax_c1d]:
            ax.set_xlabel('Wavelength (microns)')
            ax.set_ylabel('F_nu (Jy)')
            ax.ticklabel_format(axis='y', style='sci', scilimits=(0, 0)) # forcing scientific notation for the spectra
        
    fig_c1d.tight_layout()
    fig_c1d.show()

<img style="float: center;" src="https://github.com/spacetelescope/jwst-pipeline-notebooks/raw/main/_static/stsci_footer.png" alt="stsci_logo" width="200px"/> 