# NIRCam Custom Pipeline Example
---
**Author**: Ben Sunnquist (bsunnquist@stsci.edu) | **Latest Update**: 6 Feb 2024

<div class="alert alert-block alert-warning">
    <h3><u><b>Notebook Goals</b></u></h3>  
<ul>
    <li>Correct snowballs in rate images using custom jump step settings</li>
    <li>Manually flag anomalous regions (e.g. persistence, scattered light) in data quality arrays to remove them from the final drizzled image</li>
    <li>Correct 1/f noise and amplifier offset residuals</li>
    <li>Correct alignment issues in data with few good sources by feeding custom source catalogs into tweakreg</li>
</ul>
</div>

## Table of Contents
* [Introduction](#intro)
* [Pipeline Resources and Documentation](#resources)
* [Imports](#imports)
* [Convenience Functions](#funcs)
* [Data](#data)
* [Run the default pipeline](#default_pipeline)
* [Check reference files](#reffile_check)
* [Run the detector1 pipeline with custom jump step settings for snowballs](#detector1)
* [Flag persistence regions in data quality arrays](#persistence)
* [Run the image2 pipeline](#image2)
* [Correct 1/f residuals and amplifier offsets](#1overf)
    * [Create image segmentation maps with blotting](#segmaps)
    * [Run the 1/f and amplifier offset correction](#1overf_corr)
* [Correct tweakreg alignment issues](#tweakreg)
    * [Generate custom source catalogs for tweakreg](#catalogs)
    * [Run the image3 pipeline with the custom tweakreg catalogs](#image3)
* [Compare the results of the default and custom pipeline runs](#compare)

<a id='intro'></a>
## Introduction

This notebook will calibrate NIRCam imaging data all the way through the JWST pipeline, from uncalibrated images to a combined final drizzled image. 

After processing with the default pipeline settings, several issues will be discovered and investigated, including snowballs/asteroids, persistence, 1/f and amplifier offset residuals, and image alignment/source catalog issues. We'll solve each of these issues through a combination of re-running the pipeline steps with custom pipeline settings and manually modifying the images in between pipeline stages. Along the way, we'll leverage several useful pipeline functions to help investigate and correct these issues. 

At the end of the notebook, we'll compare the final products of the default pipeline run to our custom pipeline run.

<a id='resources'></a>
## Pipeline Resources and Documentation

* [JWST Pipeline Documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/user_documentation/introduction.html) includes descriptions and parameters for all pipeline steps.

* [JWST Pipeline Github](https://github.com/spacetelescope/jwst) contains the source code and installation instructions for the pipeline.

* [CRDS Reference Files](https://jwst-crds.stsci.edu/) is where all JWST reference files used by the pipeline are located.

* Submit a ticket to the [Help Desk](https://stsci.service-now.com/jwst?id=sc_cat_item&sys_id=27a8af2fdbf2220033b55dd5ce9619cd&sysparm_category=e15706fc0a0a0aa7007fc21e1ab70c2f) if you have any questions or problems regarding the pipeline.

<a id='imports'></a>
## Imports

In [None]:
# Changes for the Science Platform environment
# Preparing cached data
import os
import glob

preloaded_fits_dir = "/home/shared/preloaded-fits/jwebbinar_31/nircam/nircam_custom_pipeline_example/"
for filename in glob.glob(os.path.join(preloaded_fits_dir, "*.fits")):
    basename = os.path.basename(filename)
    if not os.path.exists(basename):
        os.symlink(filename, basename)


In [None]:
# How I made this environment:
# conda create -n jwst1125 python=3.11 notebook
# source activate jwst1125
# pip install jwst==1.12.5

import os
os.environ['CRDS_PATH'] = '/home/jovyan/crds_cache/' 
os.environ['CRDS_SERVER_URL'] = 'https://jwst-crds.stsci.edu'

from astropy.convolution import convolve, Gaussian2DKernel
from astropy.stats import sigma_clipped_stats, sigma_clip
from astropy.table import Table
from astropy.visualization import ZScaleInterval
import astropy.io.fits as fits
import glob
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
from photutils.detection import DAOStarFinder
from photutils.segmentation import detect_sources, detect_threshold
from PIL import Image, ImageDraw
import shutil
import numpy as np

import crds
import jwst
from jwst.associations import asn_from_list
from jwst.associations.lib.rules_level3_base import DMS_Level3_Base
from jwst.datamodels import dqflags, ImageModel
from jwst.outlier_detection.outlier_detection import gwcs_blot
from jwst.pipeline import Detector1Pipeline, Image2Pipeline, Image3Pipeline
print('Using JWST Pipeline v{}'.format(jwst.__version__))


<a id='funcs'></a>
## Convenience Functions

In [None]:
def plot_ramp_data(file, x, y, cutout=5):
    """Plots up-the-ramp values and displays the science and
    data quality arrays for all groups in the jump image as well
    as the rate image.

    Parameters
    ----------
    file : str
        The rate image file name. A corresponding jump file should be
        in the same location.

    x : int
        The x location of the pixel of interest.

    y : int
        The y location of the pixel of interest.

    cutout : int
        The cutout size in pixels around the pixel of interest for
        the image displays. The full dimension will be cutout*2 pixels.
    """

    # Get relevant data
    data = fits.getdata(file, 'SCI')
    dq = fits.getdata(file, 'DQ')
    ramp_data = fits.getdata(file.replace('rate.fits', 'jump.fits'), 'SCI')
    ramp_dq = fits.getdata(file.replace('rate.fits', 'jump.fits'), 'GROUPDQ')
    rate = data[y,x]
    ramp_vals = ramp_data[0,:,y,x]  # first integration, matching the group panels below
    groups = np.arange(0, ramp_data.shape[1]).astype(int)
    ngroups = len(groups)
    
    # Plot up-the-ramp signal
    fig, ax = plt.subplots(3, ngroups, figsize=(ngroups*4, ngroups*3), constrained_layout=True)
    ax[0,0].scatter(groups+1, ramp_vals)
    ax[0,0].grid(ls='--')
    ax[0,0].set_xlabel('Group #')
    ax[0,0].set_ylabel('Signal [DN]')
    ax[0,0].set_title('Pixel {}, {}'.format(x,y))
    
    # Plot rate image and dq
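    # Remove unused panels in the top row (beyond the ramp plot, rate image, and rate DQ)
    for unused_ax in ax[0, 3:]:
        fig.delaxes(unused_ax)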
    z = ZScaleInterval()
    vmin, vmax = z.get_limits(data)
    ax[0,1].imshow(data[y-cutout:y+cutout+1, x-cutout:x+cutout+1], 
                   vmin=vmin, vmax=vmax, origin='lower', cmap='gray')
    ax[0,1].set_title('Rate Image\n{:.3f} DN/s'.format(data[y,x]))
    ax[0,2].imshow(dq[y-cutout:y+cutout+1, x-cutout:x+cutout+1], 
                   vmin=0, vmax=0.1, origin='lower', cmap='gray')
    dq_vals = dqflags.dqflags_to_mnemonics(dq[y,x], dqflags.pixel)
    if len(dq_vals)==0:
        dq_vals = "{'GOOD'}"
    ax[0,2].set_title('Rate DQ\n{}'.format(dq_vals))
    
    # Plot each group image and dq
    for i in groups:
        vmin, vmax = z.get_limits(ramp_data[0,i])
        ax[1,i].imshow(ramp_data[0,i,y-cutout:y+cutout+1, x-cutout:x+cutout+1], 
                       vmin=vmin, vmax=vmax, origin='lower', cmap='gray')
        ax[1,i].set_title('Group {} Image\n{} DN'.format(i+1, int(ramp_data[0,i,y,x])))
        ax[2,i].imshow(ramp_dq[0,i,y-cutout:y+cutout+1, x-cutout:x+cutout+1], 
                       vmin=0, vmax=0.1, origin='lower', cmap='gray')
        dq_vals = dqflags.dqflags_to_mnemonics(ramp_dq[0,i,y,x], dqflags.pixel)
        if len(dq_vals)==0:
            dq_vals = "{'GOOD'}"
        ax[2,i].set_title('Group {} DQ\n{}'.format(i+1, dq_vals))


<a id='data'></a>
## Data

We'll use 5 dithered detector B3 images of a sparse calibration field near the ecliptic (PID-4443 observation 2). The data uses a readout pattern of DEEP8 with 4 groups/integration and one integration/exposure. The filter/pupil used are F070W/CLEAR. We'll also use the 5 corresponding detector BLONG F277W/CLEAR calibrated images for source identification.

In [None]:
print('Files used:')
for file in sorted(glob.glob('*nrcb3*_uncal.fits')) + sorted(glob.glob('*nrcblong*_cal.fits')):
    print(file)

<a id='default_pipeline'></a>
## Run default pipeline

First, we'll run all stages of the pipeline with default parameters, with one exception: we'll skip the dark current correction because it takes a very long time to run. In the shortwave channel, skipping it changes the results very little, since most pixels in the dark reference file are set to zero. It does leave a handful of warm/hot pixels uncorrected, however, so skipping this step isn't recommended for normal processing. We'll save the jump step output and the tweakreg catalogs, and write all output files with the "_default" suffix. We'll use these outputs to identify and debug issues, and to compare against the custom pipeline run.

[Next Cell](#default_pipeline_output)

In [None]:
# Run all stages of the pipeline using all default parameters (estimated runtime ~8 min)

# Run detector1 pipeline
uncal_files = sorted(glob.glob('*nrcb3*_uncal.fits'))
for file in uncal_files:
    result = Detector1Pipeline.call(file, save_results=True,
                                    steps={'jump': {'save_results': True},
                                           'dark_current': {'skip': True}}, 
                                    output_file='{}default'.format(os.path.basename(file).split('uncal')[0]))

# Run image2 pipeline
rate_files = sorted(glob.glob('*nrcb3_default_rate.fits'))
for file in rate_files:
    result = Image2Pipeline.call(file, save_results=True)

# Create association file for image3
cal_files = sorted(glob.glob('*nrcb3_default_cal.fits'))
asn = asn_from_list.asn_from_list(cal_files, rule=DMS_Level3_Base, product_name='nircam_f070w_default')
with open('nircam_f070w_default.json', 'w') as outfile:
    outfile.write(asn.dump()[1])

# Run image3 pipeline
result = Image3Pipeline.call('nircam_f070w_default.json', save_results=True, steps={'tweakreg': {'save_catalogs': True}})


<a id='default_pipeline_output'></a>

In [None]:
# View the final default drizzle product

data = fits.getdata('nircam_f070w_default_i2d.fits', 'SCI')
plt.figure(figsize=(8,8))
plt.imshow(data, origin='lower', cmap='gray', vmin=0.1, vmax=0.5)
plt.title('Default Drizzle', fontsize=20)

Several issues are apparent in the data from the default pipeline run:
* Multiple copies of sources are visible, i.e. the data is misaligned
* The upper right corners of each image have high signal levels
* Strong horizontal noise
* Several smaller artifacts with bright halos
* Lots of outlier/bad pixels in areas with low coverage

For the remainder of this notebook, we'll investigate and correct each of these issues and generate a new final product using custom pipeline processing.

<a id='reffile_check'></a>
## Check pipeline reference files

A good first step when investigating issues in your data is confirming the data was calibrated with the most recent reference files available in CRDS, as these reference files are routinely updated and improved. All reference files used by the pipeline are stored in [CRDS](https://jwst-crds.stsci.edu/), and those used for your specific data are written out while running the pipeline and stored in the data's primary headers.

In [None]:
# Check that data is using the most recent reference files from CRDS.
# This cell can be used with any input file type e.g. rate, cal, i2d.

file = 'nircam_f070w_default_i2d.fits'
header = fits.getheader(file, 'PRIMARY')
reffile_mapping = crds.getrecommendations(header)
for reffile in reffile_mapping:
    reffile_match = os.path.basename(reffile_mapping[reffile])
    if 'n/a' in reffile_mapping[reffile]:  # not all reffiles are relevant to dataset
        continue
    try:
        reffile_used = os.path.basename(header[crds.jwst.locate.filekind_to_keyword(reffile)])
        if reffile_used != reffile_match:
            print('WARNING: Mismatch for {} reference file: \n \t Expected: {} \n \t     Used: {}'.format(reffile, 
                                                                                                          reffile_match, reffile_used))
        else:
            print('Successfully matched {} reference file: \n \t Expected: {} \n \t     Used: {}'.format(reffile, 
                                                                                                         reffile_match, reffile_used))
    except KeyError:
        print('{} reference file not found in image header.'.format(reffile))

<a id='detector1'></a>
## Run the detector1 pipeline with custom jump step settings

Let's check the up-the-ramp signal and data quality flags for a couple of the artifacts with bright halos.  

In [None]:
plot_ramp_data('jw04443002001_02101_00012_nrcb3_default_rate.fits', 1155, 346, cutout=40)

These objects are ["Snowballs"](https://jwst-docs.stsci.edu/data-artifacts-and-features/snowballs-and-shower-artifacts). Snowballs are bright circular sources caused by large cosmic ray impacts that appear on a timescale much shorter than the detector readout time. The cores are often saturated and the full extent of their impact is often missed by the jump step in the pipeline, resulting in a bright halo in the rate images due to these outer pixels being unflagged during the ramp fit step.

The jump step in the pipeline offers several parameters to catch and flag the extent of these snowballs (see "Parameters that affect Near-IR Snowball Flagging" section in the [jump step arguments page](https://jwst-pipeline.readthedocs.io/en/latest/jwst/jump/arguments.html)). We'll tweak several of these snowball parameters in the custom detector1 pipeline run below.

In [None]:
plot_ramp_data('jw04443002001_02101_00004_nrcb3_default_rate.fits', 1895, 185, cutout=80)

This other bright halo artifact appears to be an asteroid or some other moving object. Similar to the snowballs, the full extent of this object is not flagged in the jump step, resulting in a bright halo visible in the rate image due to the outer wings being unflagged in the ramp fit step. The solution for this artifact will be similar to the snowballs: we'll use custom jump settings in the detector1 pipeline run to expand the flagged regions, which keeps the bright wings out of the ramp fit step ([jump step arguments page](https://jwst-pipeline.readthedocs.io/en/latest/jwst/jump/arguments.html)).

[Next Cell](#detector1_output)

In [None]:
# Run detector1 pipeline with expanded snowball/asteroid flagging. Expand the dq flags of large jump events by 2.5x,
# where large events are defined as at least 100 contiguous pixels flagged as a jump (no saturation required).
# Similar to the default pipeline run, skip the dark current step to save time.

# Run detector1 pipeline
uncal_files = sorted(glob.glob('*nrcb3*_uncal.fits'))
for file in uncal_files:
    result = Detector1Pipeline.call(file, save_results=True, steps={'jump': {'save_results': True, 
                                                                             'expand_large_events': True, 
                                                                             'sat_required_snowball': False, 
                                                                             'min_jump_area': 100, 
                                                                             'expand_factor': 2.5},
                                                                    'dark_current': {'skip': True}})


<a id='detector1_output'></a>

After using these custom jump settings, we see the jump data quality flags have been expanded to include the full extent of these objects. Since the bright halo areas are now flagged, they aren't included in the ramp fit step and don't show up in the rate image anymore.

In [None]:
plot_ramp_data('jw04443002001_02101_00012_nrcb3_rate.fits', 1155, 346, cutout=40)

In [None]:
plot_ramp_data('jw04443002001_02101_00004_nrcb3_rate.fits', 1895, 185, cutout=80)
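
As a rough picture of what the expanded flagging does, the sketch below grows a toy jump region by a multiplicative factor. It is not the pipeline's algorithm: according to the step documentation the jump step scales an enclosing ellipse, whereas this sketch assumes a circle with the same area as the flagged region. The names `min_jump_area` and `expand_factor` mirror the step parameters used above; the array and all other names are made up for illustration.

```python
import numpy as np

# Toy 64x64 jump mask containing one roughly circular flagged event
yy, xx = np.mgrid[0:64, 0:64]
jump = (xx - 32) ** 2 + (yy - 32) ** 2 <= 6 ** 2

min_jump_area = 100   # same value as in the custom detector1 run
expand_factor = 2.5   # same value as in the custom detector1 run

area = int(jump.sum())
if area >= min_jump_area:
    # Assumption: approximate the event by a circle of equal area
    cy, cx = yy[jump].mean(), xx[jump].mean()
    radius = np.sqrt(area / np.pi)
    expanded = (xx - cx) ** 2 + (yy - cy) ** 2 <= (expand_factor * radius) ** 2
else:
    # Events below the area threshold are left untouched
    expanded = jump.copy()

print('Flagged pixels before: {}, after: {}'.format(area, int(expanded.sum())))
```

Because the radius is scaled, the flagged area grows by roughly the square of `expand_factor`, which is why a modest factor is enough to cover faint halos.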

<a id='persistence'></a>
## Flag persistence regions in data quality arrays

The upper right corner of each of our images shows increased signal that decays with time, and the affected region is fixed in detector space. This is characteristic of [persistence](https://jwst-docs.stsci.edu/jwst-near-infrared-camera/nircam-performance/nircam-persistence), a residual signal caused by bright sources imaged prior to the affected data. In our case, a main belt asteroid survey field containing many bright sources was imaged ~6 hours prior to our data.

Several detector areas on NIRCam are especially sensitive to persistence (see these areas [here](https://jwst-docs.stsci.edu/jwst-near-infrared-camera/nircam-performance/nircam-persistence#NIRCamPersistence-Persistencemaps)), including the upper right corner of B3. Large areas on A3 and B4 show similar behavior.  

In [None]:
# Plot the individual images to view the persistence signal in the upper right corner

fig, ax = plt.subplots(1, 5, figsize=(50,10))
files = sorted(glob.glob('*nrcb3_default_cal.fits'))
for i,file in enumerate(files):
    data = fits.getdata(file, 'SCI')
    date, time = fits.getheader(file)['DATE-OBS'], fits.getheader(file)['TIME-OBS']
    ax[i].imshow(data, origin='lower', cmap='gray', vmin=0.2, vmax=.5)
    ax[i].set_title('{}\n{}'.format(date, time), fontsize=30)


Currently, the persistence step in the pipeline doesn't do anything and persistence modeling is still ongoing, so we'll opt to manually flag these areas with high persistence in each image's data quality array before running the remaining pipeline stages. This will prevent these affected areas from being used when creating the final drizzled image. 

While we're flagging persistence here, this same method can be used to manually flag any other anomalous features in your data, such as the NIRCam scattered light ["claws"](https://jwst-docs.stsci.edu/jwst-near-infrared-camera/nircam-instrument-features-and-caveats/nircam-claws-and-wisps).

Before we start, let's first demonstrate how the data quality arrays work and explore some related pipeline convenience functions.

In [None]:
# View all available jwst pipeline data quality flags.
# Data quality flags are added together in the image's data quality array, e.g. a DEAD and DO_NOT_USE pixel will
# have a value of 1024+1 = 1025

dqflags.pixel

More details on the various data quality flags can be found [here](https://jwst-pipeline.readthedocs.io/en/latest/jwst/references_general/references_general.html#data-quality-flags).
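
To make the bit arithmetic concrete, here is a minimal sketch that decodes a data quality value by testing each bit. The `FLAG_SUBSET` dictionary and `decode_dq` function are hypothetical names introduced for illustration; the dictionary copies only a handful of values, and `dqflags.pixel` remains the authoritative table.

```python
# Illustrative subset of flag values; dqflags.pixel is the authoritative table.
FLAG_SUBSET = {
    'DO_NOT_USE': 2**0,
    'SATURATED': 2**1,
    'JUMP_DET': 2**2,
    'PERSISTENCE': 2**5,
    'DEAD': 2**10,
}


def decode_dq(value, flags=FLAG_SUBSET):
    """Return the names of all flags in `flags` whose bit is set in `value`."""
    return {name for name, bit in flags.items() if value & bit}


# 1025 = 1024 + 1, i.e. the DEAD and DO_NOT_USE bits are both set
print(decode_dq(1025))
# A value of 0 has no bits set, so the pixel is good
print(decode_dq(0))
```

In practice, `dqflags.dqflags_to_mnemonics` does this for the full flag table, so a hand-written decoder like this is only useful for understanding how the values combine.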

In [None]:
# Example of how to translate a specific value in an image's data quality array to words

dqflags.dqflags_to_mnemonics(262657, dqflags.pixel)

In [None]:
# Example of how to calculate the total number of pixels flagged as a certain bad pixel type and, alternatively,
# how many pixels are NOT flagged as that bad pixel type.

file = 'jw04443002001_02101_00004_nrcb3_rate.fits'
dq = fits.getdata(file, 'DQ')

n_dead = len(dq[dq&dqflags.pixel['DEAD']!=0])
print('{} dead pixels in {}.'.format(n_dead, file))

n_not_dead = len(dq[dq&dqflags.pixel['DEAD']==0])
print('{} pixels NOT dead in {}.'.format(n_not_dead, file))


In this example, we'll draw a region around the persistence area in ds9, and translate that region file into a mask with the same image dimensions as our data. We'll then flag the area as PERSISTENCE and DO_NOT_USE in each image's DQ arrays. The DO_NOT_USE flag is necessary to make the pipeline ignore the pixels during the image3 pipeline processing.

In [None]:
# Create a mask to flag the region with high persistence.
# If you don't want to bother with ds9 region files, you can simply manually create the polygon
# list by inputting the x,y coordinates of the bad region.

# How to create the persistence region file in ds9:
# 1. Open the _cal image in ds9 (Analysis -> Smooth is useful here to smooth the image to ensure you flag the full extent 
# of the bad regions)
# 2. Click Edit -> Region and then click Region -> Shape -> Polygon
# 3. Draw a polygon around the persistence area
# 4. Save the region file (Region -> Save Regions) as e.g. persistence.reg; when saving, set format=ds9, coord system=image

with open('persistence.reg') as file:
    for line in file:
        if line.startswith('polygon'):
            coords = [float(coord) for coord in line.split('(')[1].split(')')[0].split(',')]
            polygon = list(zip(coords[::2], coords[1::2]))
            polygon.append(polygon[0])
            print(polygon)

# Create the persistence mask using the polygon region created above
img = Image.new('L', (2048, 2048), 0)
ImageDraw.Draw(img).polygon(polygon, outline=1, fill=1)
pers_mask = np.array(img)
plt.imshow(pers_mask, origin='lower', cmap='gray')
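
The polygon parsing in the cell above can be tried without a region file. The sketch below applies the same string splitting to a made-up region line; `parse_ds9_polygon` is a hypothetical helper name and the vertex values are invented for illustration.

```python
def parse_ds9_polygon(line):
    """Convert a ds9 'polygon(x1,y1,x2,y2,...)' line into a list of (x, y) tuples."""
    inside = line.split('(')[1].split(')')[0]
    coords = [float(c) for c in inside.split(',')]
    if len(coords) % 2 != 0 or len(coords) < 6:
        raise ValueError('expected at least three x,y pairs')
    return list(zip(coords[::2], coords[1::2]))


# Made-up vertices roughly covering an upper right corner region
example_line = 'polygon(1500.5,1400.0,2048.0,1400.0,2048.0,2048.0,1500.5,2048.0)'
vertices = parse_ds9_polygon(example_line)
print(vertices)
```

Note that ds9 image coordinates are 1-indexed while array indices are 0-indexed, so a mask built directly from these vertices is offset by about one pixel. For a hand-drawn region with some margin this is usually negligible.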


In [None]:
# Flag the persistence region in each image's DQ arrays as PERSISTENCE and DO_NOT_USE.

files = sorted(glob.glob('*nrcb3_rate.fits'))
for file in files:
    h = fits.open(file)
    original_dq = h['DQ'].data
    new_dq = np.copy(original_dq)
    # Set the PERSISTENCE and DO_NOT_USE bits for all pixels in the persistence region.
    # A bitwise OR preserves any other existing dq flags and leaves already-set bits unchanged.
    new_dq[pers_mask==1] |= dqflags.pixel['PERSISTENCE'] | dqflags.pixel['DO_NOT_USE']
    h['DQ'].data = new_dq
    h.writeto(file, overwrite=True)
    h.close()

# Plot the old and new data quality arrays for one of the images
fig, ax = plt.subplots(1, 2, figsize=(10,5))
ax[0].imshow(original_dq, origin='lower', cmap='gray', vmin=4, vmax=5)
ax[0].set_title('Original DQ')
ax[1].imshow(new_dq, origin='lower', cmap='gray', vmin=4, vmax=5)
ax[1].set_title('New DQ')
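
The flagging logic can be checked on a toy array before touching real files. This is a minimal sketch, assuming the flag values 1 (DO_NOT_USE) and 32 (PERSISTENCE) from `dqflags.pixel`; the arrays are made up. It uses a bitwise OR, which sets the requested bits, keeps all other flags, and has no effect on bits that are already set.

```python
import numpy as np

DO_NOT_USE = 2**0    # same value as dqflags.pixel['DO_NOT_USE']
PERSISTENCE = 2**5   # same value as dqflags.pixel['PERSISTENCE']

# Toy DQ array: good, JUMP_DET, DO_NOT_USE, and DO_NOT_USE+PERSISTENCE pixels
dq = np.array([[0, 4], [1, 33]], dtype=np.uint32)
# Toy region mask: 1 marks pixels to flag
mask = np.array([[1, 1], [0, 1]], dtype=np.uint8)

new_dq = dq.copy()
new_dq[mask == 1] |= np.uint32(DO_NOT_USE | PERSISTENCE)
print(new_dq)

# Applying the same flags a second time changes nothing
again = new_dq.copy()
again[mask == 1] |= np.uint32(DO_NOT_USE | PERSISTENCE)
print(np.array_equal(again, new_dq))
```

The idempotence matters if the flagging cell is re-run on files that were already modified: plain addition without a guard would corrupt the flags, while OR does not.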


<a id='image2'></a>
## Run the image2 pipeline

Next we'll run the image2 pipeline using our modified rate images. The data quality modifications and custom snowball handling above will be carried over by default, so no custom settings are needed here.

[Next Cell](#image2_output)

In [None]:
# Run image2 pipeline on our modified rate files

rate_files = sorted(glob.glob('*nrcb3_rate.fits'))
for file in rate_files:
    result = Image2Pipeline.call(file, save_results=True)


<a id='image2_output'></a>

<a id='1overf'></a>
## Correct 1/f residuals and amplifier offsets

1/f noise is correlated noise that appears as horizontal banding in NIRCam images. While the [reference pixel step](https://jwst-pipeline.readthedocs.io/en/stable/jwst/refpix/description.html) in the detector1 pipeline is designed to correct this noise, it isn't perfect, and remaining horizontal banding can be seen throughout the later stages of the pipeline.

Several [community tools](https://www.stsci.edu/jwst/science-planning/tools-from-the-community) have been created to correct this remaining 1/f noise in NIRCam images. In this notebook, we'll apply a basic correction by subtracting the source-masked, median-collapsed row values relative to the pedestal from each image. While we're correcting this 1/f noise, we'll also tweak the overall signal levels in the 4 amplifiers to better match each other, as the borders between neighboring amplifiers often show an abrupt change in background levels.
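
The row-based part of this correction can be sketched on synthetic data. The example below is a toy illustration under stated assumptions: the image, the per-row offsets standing in for 1/f banding, and the square "source" are all made up, and only NumPy is used.

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx = 64, 64
pedestal = 0.3

# Toy image: flat pedestal + one random offset per row (banding) + pixel noise
row_offsets = rng.normal(0, 0.02, ny)
image = pedestal + row_offsets[:, None] + rng.normal(0, 0.005, (ny, nx))

# Toy bright source and the mask that hides it from the statistics
image[20:30, 20:30] += 5.0
source_mask = np.zeros((ny, nx), dtype=bool)
source_mask[20:30, 20:30] = True

# Median-collapse each row relative to the full-frame median, ignoring sources
masked = np.where(source_mask, np.nan, image)
med = np.nanmedian(masked)
collapsed_rows = np.nanmedian(masked - med, axis=1)

# Subtract the per-row offsets from the unmasked image
corrected = image - collapsed_rows[:, None]

# Row offsets remaining after the correction
residual = np.nanmedian(np.where(source_mask, np.nan, corrected) - med, axis=1)
print('Row scatter before: {:.4f}, after: {:.2e}'.format(np.std(collapsed_rows), np.std(residual)))
```

Subtracting offsets relative to the full-frame median, rather than the raw row medians, keeps the overall background level of the image unchanged, and masking the source keeps its flux out of the row estimates.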

In [None]:
# Images show both 1/f noise residuals as horizontal bands and small amplifier offsets

data = fits.getdata('jw04443002001_02101_00008_nrcb3_default_cal.fits')
data_conv = convolve(data, Gaussian2DKernel(x_stddev=3))
plt.figure(figsize=(5,5))
plt.imshow(data_conv, cmap='gray', origin='lower', vmin=.25 ,vmax=.4)

<a id='segmaps'></a>
### Generate segmentation maps

Before correcting the 1/f noise, we'll first need to generate segmentation maps for each input image. Because the sources appear more clearly in the longwave images, we'll generate the segmentation maps using the longwave data, and then use a pipeline function to blot back the results onto the shortwave images. The upper left quadrant of BLONG has approximately the same field of view as B3 ([NIRCam FOV](https://jwst-docs.stsci.edu/jwst-near-infrared-camera/nircam-instrumentation/nircam-field-of-view)).

In [None]:
# Show SW image and corresponding LW image, to highlight why we're making segmentation maps using the LW

data_sw = fits.getdata('jw04443002001_02101_00008_nrcb3_cal.fits')
data_lw = fits.getdata('jw04443002001_02101_00008_nrcblong_cal.fits')
fig, ax = plt.subplots(1, 2, figsize=(20,10))
ax[0].imshow(data_sw, origin='lower', cmap='gray', vmin=0.2, vmax=.6)
ax[0].set_title('SW image', size=20)
ax[1].imshow(data_lw[1024:,0:1024], origin='lower', cmap='gray', vmin=0.13, vmax=0.21)
ax[1].set_title('LW image cutout', size=20)


We'll generate segmentation maps for each longwave image by Gaussian-smoothing the image and flagging every group of at least 8 connected pixels that lie more than 1 sigma above the background as a source. To make sure we catch the wings of the large bright sources, we'll also Gaussian-smooth the segmentation map itself before writing it out.

In [None]:
# Make segmentation maps for each corresponding NRCBLONG image

files = sorted(glob.glob('*nrcb3_cal.fits'))
for file in files:
    # Find sources using corresponding lw image
    lw_file = file.replace('nrcb3', 'nrcblong')
    data  = fits.getdata(lw_file, 'SCI')
    dq = fits.getdata(lw_file, 'DQ')
    data = np.ma.masked_array(data, mask=dq!=0)  # avoids bad pixels as sources

    # Make segmentation map
    mean, median, stddev = sigma_clipped_stats(data, sigma=3.0)
    data -= median  # subtract background
    threshold = 1.0 * stddev
    data_conv = convolve(data, Gaussian2DKernel(x_stddev=3))
    segmap_orig = detect_sources(data_conv, threshold, npixels=8).data.astype(int)
    segmap_orig[segmap_orig!=0] = 1

    # Smooth segmap to catch faint wings of sources
    segmap = convolve(segmap_orig, Gaussian2DKernel(x_stddev=3))
    segmap[segmap<0.05] = 0
    segmap[segmap>=0.05] = 1

    # Write out the final segmap
    fits.writeto(lw_file.replace('.fits', '_seg.fits'), segmap, overwrite=True)

# Plot one of the images and its corresponding segmap
fig, ax = plt.subplots(1, 2, figsize=(20,10))
data[data.mask==True] = 0  # don't display masked pixels
ax[0].imshow(data, origin='lower', cmap='gray', vmin=-.05, vmax=.05)
ax[0].set_title('LW image', size=20)
ax[1].imshow(segmap, origin='lower', cmap='gray', vmin=0, vmax=0.1)
ax[1].set_title('LW image segmap', size=20)
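
The mask-growing step in the cell above (smooth the binary segmentation map, then re-threshold at 0.05) can be illustrated with NumPy alone. This is a minimal sketch on a toy map; `gaussian_smooth` is a hypothetical stand-in for the `convolve` and `Gaussian2DKernel` calls, built from separable 1D convolutions with zero-padded edges.

```python
import numpy as np


def gaussian_smooth(image, sigma=3.0):
    """Separable Gaussian smoothing built from 1D convolutions (zero-padded edges)."""
    half = int(4 * sigma)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    smooth_rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode='same')
    return np.apply_along_axis(np.convolve, 0, smooth_rows, kernel, mode='same')


# Toy binary segmentation map with one 5x5 source
seg = np.zeros((64, 64))
seg[30:35, 30:35] = 1

# Smooth the map and keep everything above a low threshold
grown = (gaussian_smooth(seg, sigma=3.0) >= 0.05).astype(int)
print('Source pixels before: {}, after: {}'.format(int(seg.sum()), int(grown.sum())))
```

Lowering the 0.05 threshold or increasing `sigma` grows the mask further, at the cost of masking more background pixels that would otherwise contribute to the 1/f statistics.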


Next, we'll blot the segmentation maps from the longwave data onto the shortwave pixel-space, and write out the resulting segmentation map for each shortwave image.

In [None]:
# Make segmentation maps for each SW image by blotting back the segmap from the corresponding LW image

files = sorted(glob.glob('*nrcb3_cal.fits'))
for file in files:
    # Create an image model of the lw segmap
    lw_file = file.replace('nrcb3', 'nrcblong')
    lw_model = ImageModel(lw_file)
    lw_segmap = fits.getdata(lw_file.replace('.fits', '_seg.fits'))
    lw_model.data = lw_segmap
    
    # Blot the segmap data from the lw image onto the corresponding sw image
    model = ImageModel(file)
    blotted_data = gwcs_blot(lw_model, model, interp='nearest')
    fits.writeto(file.replace('.fits', '_seg.fits'), blotted_data, overwrite=True)

# Plot the sw image and segmap, as well as the corresponding lw segmap
fig, ax = plt.subplots(1, 3, figsize=(30,10))
ax[0].imshow(model.data, origin='lower', cmap='gray', vmin=0.2, vmax=.6)
ax[0].set_title('SW image', size=30)
ax[1].imshow(blotted_data, origin='lower', cmap='gray', vmin=0, vmax=.1)
ax[1].set_title('SW segmap', size=30)
ax[2].imshow(lw_segmap, origin='lower', cmap='gray', vmin=0, vmax=.1)
ax[2].set_title('LW segmap', size=30)
ax[2].add_patch(Rectangle((0, 1024), 1024, 1024, linewidth=5, edgecolor='red', facecolor='none'))  # highlight rough SW FOV


<a id='1overf_corr'></a>
### Run the 1/f and amplifier offset correction

We'll now correct the 1/f noise residuals and amplifier offsets in each shortwave image, using the segmentation maps created above for masking purposes.

In [None]:
# Correct 1/f noise residuals and amplifier offsets

files = sorted(glob.glob('*nrcb3_cal.fits'))
for file in files:
    # Get data and segmap
    h = fits.open(file)
    data = h['SCI'].data
    dq = h['DQ'].data
    segmap = fits.getdata(file.replace('.fits', '_seg.fits'))

    # Mask bad pixels, persistence, and sources
    data_masked = np.copy(data)
    data_masked[(dq!=0) | (segmap!=0)] = np.nan
    clipped = sigma_clip(data_masked, sigma=3)
    data_masked[clipped.mask==True] = np.nan

    # Get full-frame median
    med = np.nanmedian(data_masked)

    # Get median-collapsed row/column offsets, representing the 1/f residuals and amp offsets.
    # Bin the column offsets since they're larger-scale.
    collapsed_rows = np.nanmedian(data_masked - med, axis=1)
    collapsed_cols = np.nanmedian(data_masked - med, axis=0)
    bin_size = 16
    collapsed_cols_binned = [np.nanmedian(collapsed_cols[idx:idx+bin_size]) 
                             for idx in np.arange(0, len(collapsed_cols), bin_size)]

    # Create a correction image combining the collapsed row/column offsets
    correction_image = np.tile(np.repeat(collapsed_cols_binned, bin_size), (2048, 1)) + \
                       np.swapaxes(np.tile(collapsed_rows, (2048, 1)), 0, 1)
    
    # Apply the correction image to the original data
    data_new = data - correction_image

    # Write out the corrected file
    h['SCI'].data = data_new
    h.writeto(file.replace('.fits', '_corr.fits'), overwrite=True)
    h.close()


In [None]:
# Plot one image, the correction applied to it, and the corrected image

data = fits.getdata('jw04443002001_02101_00008_nrcb3_cal.fits')
data_new = fits.getdata('jw04443002001_02101_00008_nrcb3_cal_corr.fits')

fig, ax = plt.subplots(1, 3, figsize=(30,10))
ax[0].imshow(data, origin='lower', cmap='gray', vmin=0.2, vmax=.45)
ax[0].set_title('Original', size=30)
final_correction = data - data_new
final_correction[~np.isfinite(final_correction)] = 0  # ignore nans in display
ax[1].imshow(final_correction, origin='lower', cmap='gray', vmin=-.1, vmax=.1)
ax[1].set_title('Model', size=30)
ax[2].imshow(data_new, origin='lower', cmap='gray', vmin=0.2, vmax=.45)
ax[2].set_title('Corrected', size=30)
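
The binned column offsets used for the amplifier correction can be sketched in the same spirit. The toy example below assumes a 32-column image with four 8-column "amplifiers" and made-up offsets; the bin size of 4 is chosen only so the bins divide the toy amplifier width evenly.

```python
import numpy as np

rng = np.random.default_rng(1)
ny, nx, bin_size = 32, 32, 4

# Toy image: flat background + one constant offset per amplifier + pixel noise
amp_offsets = np.repeat([0.00, 0.02, -0.01, 0.03], nx // 4)
image = 0.3 + amp_offsets[None, :] + rng.normal(0, 0.002, (ny, nx))

# Median-collapse the columns relative to the full-frame median
med = np.median(image)
collapsed_cols = np.median(image - med, axis=0)

# Bin the column offsets, then expand the bins back to full width
binned = np.median(collapsed_cols.reshape(-1, bin_size), axis=1)
correction = np.repeat(binned, bin_size)[None, :]
corrected = image - correction

# Peak-to-peak spread of the column medians before and after
before = collapsed_cols.max() - collapsed_cols.min()
after_cols = np.median(corrected, axis=0)
after = after_cols.max() - after_cols.min()
print('Column spread before: {:.4f}, after: {:.4f}'.format(before, after))
```

The `reshape` requires the image width to be a multiple of `bin_size`, which is the same implicit assumption as the 2048-pixel width and 16-pixel bins used in the correction cell.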


<a id='tweakreg'></a>
## Correct tweakreg alignment issues

The drizzled image from the default pipeline run showed multiple images of individual sources, suggesting an issue with the image alignments in the image3 pipeline tweakreg step. Since the images were misaligned, the sources themselves in some images were flagged as outliers, and only their fainter wings can be seen in the final drizzled image. This can be confirmed by inspecting the cosmic ray flagged (i.e. _crf.fits) data quality arrays, which are generated for each input image during the image3 outlier detection step.

In [None]:
# Plot default drizzled image cutout and the outlier flags in a single exposure

fig, ax = plt.subplots(1, 3, figsize=(15,5))
data = fits.getdata('nircam_f070w_default_i2d.fits', 'SCI')
ax[0].imshow(data[500:1300, 250:1050], origin='lower', cmap='gray', vmin=0.2, vmax=.5)
ax[0].set_title('Default Drizzle', size=15)
data = fits.getdata('jw04443002001_02101_00008_nrcb3_default_a3001_crf.fits', 'SCI')
ax[1].imshow(data[0:600, 0:600], origin='lower', cmap='gray', vmin=0.15, vmax=0.6)
ax[1].set_title('Default Image', size=15)
dq = fits.getdata('jw04443002001_02101_00008_nrcb3_default_a3001_crf.fits', 'DQ')
dq[dq&dqflags.pixel['OUTLIER']==0] = 0  # only show pixels flagged as OUTLIER
ax[2].imshow(dq[0:600, 0:600], origin='lower', cmap='gray', vmin=0, vmax=0.1)
ax[2].set_title('OUTLIER DQ flags', size=15)


By inspecting the output logging from the default image3 pipeline run, we can see that most images failed to find enough sources for tweakreg alignment, so the tweakreg step was skipped for these images: 

<code>WARNING - Not enough matches (< 9) found for image catalog 'GROUP ID: jw04443002001_02101_00016_nrcb3_default_cal'
</code>

The few images that did find enough sources had large, unexpected shifts (XSH, YSH). Shifts this large are not expected, because aligning and blinking the images in ds9 (Frame -> Match -> Frame -> WCS) shows reasonable WCS alignment.

<code>Computed 'shift' fit for GROUP ID: jw04443002001_02101_00008_nrcb3_default_cal:
XSH: -0.575567  YSH: 2.02581</code>

The issue here is likely due to a combination of this image having few good sources to match on, and tweakreg matching on bad pixels/sources instead. Let's plot the image and its data quality array, and overlay the sources identified by tweakreg.

In [None]:
# Plot a single image cutout and its jump flags, and overplot sources identified by tweakreg

fig, ax = plt.subplots(1, 2, figsize=(20,10))
data = fits.getdata('jw04443002001_02101_00008_nrcb3_default_cal.fits', 'SCI')
dq = fits.getdata('jw04443002001_02101_00008_nrcb3_default_cal.fits', 'DQ')
dq[dq&dqflags.pixel['JUMP_DET']==0] = 0  # only show pixels flagged as JUMP
t = Table.read('jw04443002001_02101_00008_nrcb3_default_cal_cat.ecsv')
ax[0].imshow(data, origin='lower', cmap='gray', vmin=0.2, vmax=.5)
ax[0].set_title('Default Image', size=30)
ax[0].scatter(t['x'], t['y'], marker='o', facecolor='none', edgecolor='limegreen', s=500, linewidths=3)
xlim, ylim = [1000,1500], [1775, 2044]
ax[0].set_xlim(xlim)
ax[0].set_ylim(ylim)
ax[1].imshow(dq, origin='lower', cmap='gray', vmin=0, vmax=.1)
ax[1].set_title('JUMP DQ Flags', size=30)
ax[1].set_xlim(xlim)
ax[1].set_ylim(ylim)
ax[1].scatter(t['x'], t['y'], marker='o', facecolor='none', edgecolor='limegreen', s=500, linewidths=3)


Here we see that every source identified by tweakreg is on top of a JUMP_DET flag. These sources only show up in a single image in WCS-space, so we know they are not real sources. Let's see what's happening up-the-ramp for one of these outlier JUMP_DET pixels:

In [None]:
plot_ramp_data('jw04443002001_02101_00004_nrcb3_rate.fits', 1152, 1357, cutout=5)

As seen in the ramp data, the jump step does not appear to be handling these cosmic rays correctly. Group 2 is correctly flagged as a jump, but group 3 is not, resulting in group 3 being included in the ramp fit step and thus a high value in the rate image. 

Group 4 is also mysteriously flagged as a jump even though it looks like not much signal was accumulated between groups 3 and 4. This is potentially an issue with how the pipeline handles images with a low number of groups; the jump step works by comparing group differences to the median group difference, and in this case a low difference between groups (e.g. groups 3 to 4) is actually an outlier compared to the more common, larger group differences seen between the other groups. Group 2 may be flagged just due to being such a high jump compared to the median group differences.
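
To make this failure mode concrete, here is a toy illustration of why a median-based comparison becomes fragile with few groups. It is not the pipeline's jump algorithm, which works with noise-normalized differences; the ramp values are invented and only the comparison against the median is shown.

```python
import statistics

# Invented group values for a 5-group ramp (arbitrary units). In this toy,
# only the step from group 3 to group 4 reflects the true, low sky rate;
# the other steps are inflated by cosmic-ray signal.
ramp = [10.0, 410.0, 500.0, 505.0, 605.0]

# First differences between consecutive groups
diffs = [b - a for a, b in zip(ramp[:-1], ramp[1:])]
median_diff = statistics.median(diffs)

# Absolute deviation of each difference from the median difference
deviations = [abs(d - median_diff) for d in diffs]

for i, (d, dev) in enumerate(zip(diffs, deviations), start=1):
    print(f'groups {i}->{i + 1}: diff={d:6.1f}  |diff - median|={dev:6.1f}')
```

In this toy, the clean 3->4 step deviates from the median more than the two moderately contaminated steps do, so a median-based test can single out the wrong group. With more groups per integration, the median would be set by the clean differences instead.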

This case highlights the importance of, if possible, increasing the number of groups/int in programs. At the moment, there is no pipeline setting to better handle these scenarios, so we'll instead create custom catalogs for each image after cleaning these bad pixels, and feed these custom catalogs into the tweakreg step.

<a id='catalogs'></a>
### Generate custom source catalogs for tweakreg

Here we'll create custom source catalogs for each input image. To avoid identifying the bad jump pixels described above as sources, we'll first identify them as pixels that are flagged as either JUMP_DET or DO_NOT_USE and whose values differ from their Gaussian-smoothed values by more than 100%. Once identified, we'll replace these pixels with their Gaussian-smoothed values and proceed with source identification on these cleaned images. We'll also avoid identifying anything within the previously flagged persistence regions as a source.
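
The replacement rule can be sketched on a toy array before applying it to real data. This is a simplified illustration, not the notebook's implementation: a 3x3 box mean of unflagged neighbours stands in for the masked Gaussian convolution, and the image values and mask are invented.

```python
import numpy as np

# Toy 5x5 image: flat background of 1.0 with one hot pixel (invented values)
data = np.ones((5, 5))
data[2, 2] = 50.0

# Stand-in for the JUMP_DET/DO_NOT_USE mask: the hot centre pixel and a
# corner pixel that is flagged but has a normal value
mask = np.zeros_like(data, dtype=bool)
mask[2, 2] = True
mask[0, 0] = True

# Smoothed image: mean of the unflagged pixels in each 3x3 neighbourhood
# (a box mean stands in for the masked Gaussian convolution)
smooth = np.empty_like(data)
for y in range(data.shape[0]):
    for x in range(data.shape[1]):
        ys = slice(max(y - 1, 0), y + 2)
        xs = slice(max(x - 1, 0), x + 2)
        good = ~mask[ys, xs]
        smooth[y, x] = data[ys, xs][good].mean()

# Replace only flagged pixels that differ from the smoothed image by >100%
ratio = np.abs(1 - data / smooth)
replace = mask & (ratio > 1)
cleaned = np.where(replace, smooth, data)

print(cleaned[2, 2], cleaned[0, 0], int(replace.sum()))
```

Requiring both conditions means a flagged pixel with an ordinary value (the corner pixel here) is left untouched, so only flagged pixels that are also strongly discrepant get replaced.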

In [None]:
# Create custom catalogs to feed into tweakreg, avoiding bad pixels as sources

files = sorted(glob.glob('*nrcb3_cal_corr.fits'))
# make sure displayed image file for this cell matches example file used in previous cells
files = [files[0]] + files[2:] + [files[1]]
for file in files:
    data = fits.getdata(file, 'SCI')
    data_orig = np.copy(data)
    dq = fits.getdata(file, 'DQ')
    mask = (dq&dqflags.pixel['JUMP_DET']!=0) | (dq&dqflags.pixel['DO_NOT_USE']!=0)
    data_conv = convolve(data, Gaussian2DKernel(x_stddev=2), mask=mask)  # bad pixels excluded from the Gaussian smoothing
    ratio = abs(1 - (data/data_conv))
    bad = mask & (ratio > 1)  # flagged pixels that differ from the smoothed image by >100%
    data[bad] = data_conv[bad]  # replace bad pixels
    data[dq&dqflags.pixel['PERSISTENCE']!=0] = np.nan  # ignore persistence region
    mean, median, stddev = sigma_clipped_stats(data, sigma=3.0)
    daofind = DAOStarFinder(fwhm=0.935, threshold=10*stddev, brightest=200, min_separation=10)
    sources = daofind(data - median)
    t = Table()
    t['id'] = sources['id']
    t['x'] = sources['xcentroid']
    t['y'] = sources['ycentroid']
    t['flux'] = sources['flux']
    t.write(file.replace('.fits', '.ecsv'), format='ascii.ecsv', overwrite=True)

# Plot cutout of one of the original and cleaned images
fig, ax = plt.subplots(1, 2, figsize=(20,10))
data_orig[~np.isfinite(data_orig)] = 0  # ignore nans
ax[0].imshow(data_orig, origin='lower', cmap='gray', vmin=0.2, vmax=.5)
ax[0].set_title('Original', size=30)
xlim, ylim = [1000,1500], [1775, 2044]
ax[0].set_xlim(xlim)
ax[0].set_ylim(ylim)
data[~np.isfinite(data)] = 0  # ignore nans
ax[1].imshow(data, origin='lower', cmap='gray', vmin=0.2, vmax=.5)
ax[1].set_title('Cleaned', size=30)
ax[1].set_xlim(xlim)
ax[1].set_ylim(ylim)


In [None]:
# Confirm that the new catalogs are ignoring JUMPS now and finding real sources

# Plot a single image cutout and its jump flags, and overplot sources from the custom catalog
fig, ax = plt.subplots(1, 2, figsize=(20,10))
data = fits.getdata('jw04443002001_02101_00008_nrcb3_default_cal.fits', 'SCI')
dq = fits.getdata('jw04443002001_02101_00008_nrcb3_default_cal.fits', 'DQ')
dq[dq&dqflags.pixel['JUMP_DET']==0] = 0  # only show pixels flagged as JUMP
t = Table.read('jw04443002001_02101_00008_nrcb3_cal_corr.ecsv')
ax[0].imshow(data, origin='lower', cmap='gray', vmin=0.2, vmax=.5)
ax[0].set_title('Default Image', size=30)
ax[0].scatter(t['x'], t['y'], marker='o', facecolor='none', edgecolor='limegreen', s=500, linewidths=3)
xlim, ylim = [1000,1500], [1775, 2044]
ax[0].set_xlim(xlim)
ax[0].set_ylim(ylim)
ax[1].imshow(dq, origin='lower', cmap='gray', vmin=0, vmax=.1)
ax[1].set_title('JUMP DQ Flags', size=30)
ax[1].set_xlim(xlim)
ax[1].set_ylim(ylim)
ax[1].scatter(t['x'], t['y'], marker='o', facecolor='none', edgecolor='limegreen', s=500, linewidths=3)


As seen above, these new catalogs are now finding several good sources to align on and are generally avoiding the bad pixels.

<a id='image3'></a>
### Run the image3 pipeline with the custom tweakreg catalogs

Now we'll run the final stage of the pipeline, image3, to combine all of our custom images into a final drizzled product. We need to make sure to feed in our custom catalogs created above to the tweakreg step. To do this, we need to both add the catalog name to the ```tweakreg_catalog``` parameter in the association file for each member and set ```use_custom_catalogs``` to True in the tweakreg step.
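
As a rough picture of what the association edit does, the sketch below builds a minimal stand-in for the `products` section of an association and attaches a catalog to each member. The exposure names are hypothetical, and a real association produced by `asn_from_list` contains additional top-level keys that are omitted here.

```python
import json

# Hypothetical exposure names, used only to illustrate the structure
cal_files = ['exposure_a_cal_corr.fits', 'exposure_b_cal_corr.fits']

# Minimal stand-in for the 'products' part of a Level 3 association
asn = {
    'products': [{
        'name': 'nircam_f070w',
        'members': [{'expname': f, 'exptype': 'science'} for f in cal_files],
    }]
}

# Point each member at the catalog that shares its file stem
for member in asn['products'][0]['members']:
    member['tweakreg_catalog'] = member['expname'].replace('.fits', '.ecsv')

print(json.dumps(asn, indent=2))
```

Each member ends up with its own `tweakreg_catalog` entry, which tweakreg reads only when `use_custom_catalogs` is set to True.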

Since the images have reasonable alignment by default, we'll also decrease the search radius and tolerance values for source matching. Because we're still dealing with a small number of good sources in these images, we'll also decrease the minimum number of objects to match on. The tweakreg parameters are all described in detail [here](https://jwst-pipeline.readthedocs.io/en/latest/jwst/tweakreg/README.html#step-arguments).

[Next Cell](#image3_output)

In [None]:
# Create association file for image3 using custom catalogs
cal_files = sorted(glob.glob('*nrcb3_cal_corr.fits'))
asn = asn_from_list.asn_from_list(cal_files, rule=DMS_Level3_Base, product_name='nircam_f070w')
for member in asn['products'][0]['members']:  # add tweakreg catalogs to asn file
    member['tweakreg_catalog'] = member['expname'].replace('.fits', '.ecsv')
with open('nircam_f070w.json', 'w') as outfile:
    outfile.write(asn.dump()[1])

# Run image3 pipeline
result = Image3Pipeline.call(
    'nircam_f070w.json',
    save_results=True,
    steps={'tweakreg': {'use_custom_catalogs': True,
                        'minobj': 8,
                        'searchrad': 1,
                        'tolerance': 0.5}},
)


<a id='image3_output'></a>

As seen in the output log above, the tweakreg step is now successfully aligning all of the images, and the shifts applied all appear reasonable (XSH,YSH ~0.03" or ~1 pixel).
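
The arcsecond-to-pixel conversion quoted above can be checked with a short calculation. The pixel scale used here (about 0.031 arcsec per pixel for the NIRCam short-wavelength channel) is an approximate, assumed value, and the helper function is purely illustrative.

```python
# Approximate NIRCam short-wavelength pixel scale (assumed value)
PIXEL_SCALE_ARCSEC = 0.031


def arcsec_to_pixels(shift_arcsec, pixel_scale=PIXEL_SCALE_ARCSEC):
    """Convert an angular shift in arcseconds to detector pixels."""
    return shift_arcsec / pixel_scale


shift_pix = arcsec_to_pixels(0.03)
print(f'0.03 arcsec is about {shift_pix:.2f} pixels')
```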

<a id='compare'></a>
## Compare the results of the default and custom pipeline runs

Below, we compare the final result of the default pipeline run to our custom pipeline run. In the custom image, the snowball/asteroid residuals are removed, the horizontal banding and amplifier offsets are decreased, the areas of high persistence are removed, and the images are well-aligned.

In [None]:
# Compare the default and custom drizzle

data = fits.getdata('nircam_f070w_default_i2d.fits')
data_new = fits.getdata('nircam_f070w_i2d.fits')
fig, ax = plt.subplots(1, 2, figsize=(20,10))
ax[0].imshow(data, origin='lower', cmap='gray', vmin=0.2, vmax=.45)
ax[0].set_title('Default', size=30)
ax[1].imshow(data_new, origin='lower', cmap='gray', vmin=0.2, vmax=.45)
ax[1].set_title('Custom', size=30)


In [None]:
# Plot the column-wise median (rows 700:1300 collapsed) of the default and custom drizzles

collapsed = np.nanmedian(data[700:1300, :], axis=0)
x = np.arange(len(collapsed))
plt.scatter(x, collapsed, label='Default', alpha=0.1)
collapsed = np.nanmedian(data_new[700:1300, :], axis=0)
x = np.arange(len(collapsed))
plt.scatter(x, collapsed, label='Custom', alpha=0.1)
plt.ylim(.28, .34)
plt.legend()
plt.xlabel('Column #')
plt.ylabel('Signal [MJy/sr]')
plt.grid(ls='--', color='gray', alpha=0.5)


In [None]:
# Plot histograms and print image stats of default and custom drizzles

h = plt.hist(data.flatten(), bins=100, range=(.2,.45), alpha=0.5, label='Default')
h = plt.hist(data_new.flatten(), bins=100, range=(.2,.45), alpha=0.5, label='Custom')
plt.legend()
plt.ylabel('Number of Pixels')
plt.xlabel('Signal [MJy/sr]')
plt.grid(ls='--', color='gray', alpha=0.5)

# print out image stats
mean, med, stddev = sigma_clipped_stats(data[1250:1750, 1250:1750], sigma=3)
print('Default \n mean: {:.4f} \n med: {:.4f} \n stddev: {:.5f}'.format(mean, med, stddev))
mean, med, stddev = sigma_clipped_stats(data_new[1250:1750, 1250:1750], sigma=3)
print('Custom \n mean: {:.4f} \n med: {:.4f} \n stddev: {:.5f}'.format(mean, med, stddev))
