<a id='top'></a>
# CEERS: Reducing NIRCam Imaging Data
---
**Author**: Micaela Bagley (mbagley@utexas.edu) 

**Latest Update**: 17 November 2021

This notebook follows the example of and includes some text and explanations from STScI's [JWebbinar 3: “Pipeline in Imaging mode”](https://www.stsci.edu/jwst/science-execution/jwebbinars). See the notebooks from the JWebbinar for more detailed information about running the pipeline.

<div class="alert alert-block alert-info">
    <h3><u><b>Notebook Goals</b></u></h3>
    <ul>Take a CEERS NIRCam pointing through all three stages of the JWST Calibration Pipeline. Specifically, we will:</ul>
    <ul>    
      <li>demonstrate calling the pipeline on a single image using all three calling methods; </li>
      <li>create partial mosaics in two filters, combining three dithered exposures; </li>
      <li>describe how to reduce all images for the CEERS pointing and produce full mosaics using the command line and batch scripts. </li>    
    </ul>
</div>

## Table of Contents
* [Introduction](#intro)
   * [Simulated Data](#sims)
* [Pipeline Resources and Documentation](#resources)
   * [Installation](#installation)
   * [Reference Files](#reference_files)
   * [System Requirements](#system_requirements)
* [Imports](#imports)
* [Methods for Calling Steps/Pipelines](#calling_methods)
* [Parameter Reference Files](#parameter_reffiles)
* [calwebb_detector1 - Ramps to Slopes](#detector1) 
   * [run() method](#run_method_detector1)
   * [call() method](#call_method_detector1)
   * [command line](#command_line_detector1)
* [Custom Step - Correction for Image Striping](#striping)
* [calwebb_image2 - Calibrated Slope Images](#image2)
   * [run() method](#run_method_image2)
   * [call() method](#call_method_image2)
   * [command line](#command_line_image2)
* [Custom Step - Removing A5 Detector Feature](#a5_detector)
* [Association Files](#associations)
* [Custom Step - Sky Subtraction](#skymatch)
* [Break - Reducing Additional Images](#break)
* [calwebb_image3 - Ensemble calibrations](#image3) 
   * [run() method](#run_method_image3)
   * [call() method](#call_method_image3)
   * [command line](#command_line_image3)

<a id='intro'></a>
## Introduction

<img align="left" width=15% src="CEERSlogo.png">

### The Cosmic Evolution Early Release Science Survey

CEERS will cover 100 sq. arcmin of the EGS field with JWST imaging and spectroscopy using NIRCam, MIRI, and NIRSpec. CEERS will demonstrate, test, and validate efficient extragalactic surveys with coordinated, overlapping parallel observations in a field supported by a rich set of HST/CANDELS multi-wavelength data.

<img src="CEERSmap.png">

CEERS has 10 NIRCam imaging pointings, shown in blue in the figure above with each module labeled with the pointing number. Pointings 1-6 are taken in parallel to prime NIRSpec MSA observations (shown in green), while pointings 7-10 are taken in parallel to prime MIRI imaging observations (shown in red). We also observe pointings 5-8 with the NIRCam WFSS (outlined in cyan) with MIRI imaging in parallel. 

In this notebook, we'll take a subset of simulated CEERS NIRCam images through the full JWST Calibration Pipeline. **We demonstrate the process with NIRCam imaging from CEERS pointing 5, which is also covered by NIRCam WFSS and MIRI simulated data.**

<a id='sims'></a>
### Simulated Data

We have simulated a CEERS observation using [Mirage (the Multi-Instrument RAmp Generator)](https://mirage-data-simulator.readthedocs.io/en/latest/) with input sources taken from a mock catalog created with the Santa Cruz Semi-Analytic Model (SAM). 

The base of the mock catalog is a lightcone constructed from a dark matter N-body simulation. The lightcone provides mock positions in the sky for sources, and the SAM provides an estimate of the DM halo merger histories. The SAM simulates the properties of galaxies in the halos using recipes for the physical processes that shape galaxy evolution (accretion, cooling, star formation, etc.). The physical properties are then forward modeled to create synthetic SEDs and NIRCam photometry. Nebular emission lines were added to the SEDs and are included in the broad and medium band photometry. 

For more information on the SAM, please see the following references:

* The lightcone:       [Somerville et al. 2021](https://ui.adsabs.harvard.edu/abs/2021MNRAS.502.4858S/abstract); Yung et al. (in prep)

* JWST photometry:      [Yung et al. 2019a](https://ui.adsabs.harvard.edu/abs/2019MNRAS.483.2983Y/abstract)
* Physical properties:  [Yung et al. 2019b](https://ui.adsabs.harvard.edu/abs/2019MNRAS.490.2855Y/abstract)
* Morphology and size:  [Somerville et al. 2018](https://ui.adsabs.harvard.edu/abs/2018MNRAS.473.2714S/abstract)
* Nebular emission lines:   [Hirschmann et al. 2017](https://ui.adsabs.harvard.edu/abs/2017MNRAS.472.2468H/abstract); [2019](https://ui.adsabs.harvard.edu/abs/2019MNRAS.487..333H/abstract)
* Santa Cruz SAM:      
  * Base framework: [Somerville & Primack 1999](https://ui.adsabs.harvard.edu/abs/1999MNRAS.310.1087S/abstract)
  * Latest update:  [Somerville, Popping, Trager 2015](https://ui.adsabs.harvard.edu/abs/2015MNRAS.453.4337S/abstract)
  * Calibration and configuration: [Yung et al. 2019a](https://ui.adsabs.harvard.edu/abs/2019MNRAS.483.2983Y/abstract)
                     
* Based on DM simulation:  [Klypin et al. 2016](https://ui.adsabs.harvard.edu/abs/2016MNRAS.457.4340K/abstract)                  (Bolshoi-Planck simulation)
* Darkcone construct:   
  * [Behroozi et al. 2013a](https://ui.adsabs.harvard.edu/abs/2013ApJ...762..109B/abstract); [2013b](https://ui.adsabs.harvard.edu/abs/2013ApJ...763...18B/abstract)            (Rockstar, Consistent-Tree)
  * [Behroozi et al. 2019](https://ui.adsabs.harvard.edu/abs/2019MNRAS.488.3143B/abstract)                (UniverseMachine)


The simulated images were created with Mirage version 2.1.0. We used the pointing and XML files exported from the newest version of the [CEERS APT file](https://www.stsci.edu/jwst/phase2-public/1345.aptx) to get the correct observation specifications (dithers, groups, readout patterns, etc). As the NIRCam imaging is taken in parallel, these files required some modifications for Mirage to simulate the custom primary-parallel dither patterns planned for CEERS observations.

Galaxies are added to the images as Sersic profiles. Mirage also adds real point sources with magnitudes V<16 based on RA,Dec from 2MASS, WISE, and Gaia. We added fainter point sources (16 < V < 29) using the [Besancon Model](https://model.obs-besancon.fr/) to approximate the correct stellar density and luminosity distribution in the EGS field. 

**CEERS 5**

The CEERS 5 pointing includes 90 raw (`uncals/*uncal.fits`) simulated images:

* 24 each for F115W, F150W and F200W (8 short wavelength detectors x 3 dithers)
* 6 each for F277W, F356W and F444W (2 long wavelength detectors x 3 dithers)
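
As a quick sanity check on the total, the counts above can be tallied in a few lines of Python. The variable names here are illustrative only and are not used elsewhere in the notebook.

```python
# Tally the CEERS 5 raw exposures from the numbers quoted above
n_dithers = 3
n_sw_detectors = 8   # short wavelength detectors per exposure
n_lw_detectors = 2   # long wavelength detectors per exposure
sw_filters = ['F115W', 'F150W', 'F200W']
lw_filters = ['F277W', 'F356W', 'F444W']

per_sw_filter = n_sw_detectors * n_dithers
per_lw_filter = n_lw_detectors * n_dithers
total = len(sw_filters) * per_sw_filter + len(lw_filters) * per_lw_filter

print(per_sw_filter, per_lw_filter, total)
```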

**In this notebook, we will demonstrate how to run the JWST Calibration Pipeline on two of the raw images:** 

* jw01345005001_01101_00001_nrca1_uncal.fits - an F115W image from detector A1, and
* jw01345005001_01101_00001_nrca5_uncal.fits - an F277W image from detector A5
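
The file names encode the program, observation, visit, exposure, and detector. The sketch below splits a name into those pieces, assuming the exposure-level naming pattern `jw<ppppp><ooo><vvv>_<gg><s><aa>_<eeeee>_<detector>_<suffix>.fits`. The function name `parse_jwst_filename` and the dictionary keys are our own choices, not part of the `jwst` package.

```python
import re

# Assumed pattern for exposure-level file names:
# jw<ppppp><ooo><vvv>_<gg><s><aa>_<eeeee>_<detector>_<suffix>.fits
FILENAME_RE = re.compile(
    r'^jw(?P<program>\d{5})(?P<observation>\d{3})(?P<visit>\d{3})'
    r'_(?P<visit_group>\d{2})(?P<parallel_seq>\d)(?P<activity>[0-9a-z]{2})'
    r'_(?P<exposure>\d{5})_(?P<detector>[a-z0-9]+)_(?P<suffix>[a-z0-9]+)\.fits$'
)


def parse_jwst_filename(name):
    """Return a dict of the fields encoded in an exposure-level file name."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f'Unrecognized file name: {name}')
    return match.groupdict()


info = parse_jwst_filename('jw01345005001_01101_00001_nrca5_uncal.fits')
print(info['program'], info['observation'], info['detector'], info['suffix'])
```

A helper like this makes it easy to pick out, for example, only the A5 exposures when applying a detector-specific step.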

We choose these two images as an example. Running Stages 1 and 2 of the pipeline on all images is identical to the process we demonstrate here on these two images, with one exception. We have created a custom step that we apply only to A5 detectors to account for a feature present in our simulated data. We will discuss this step in the section on [Removing A5 Detector Feature](#a5_detector).

<div class="alert alert-block alert-success">
<strong>Changes for Simulated data</strong> 
    
**Note:** In a few places, we will make custom changes to the default pipeline to account for specifics of the simulated data. These additional or custom steps are necessary to:
    
1. ensure the data are reduced with the same reference files that were used in simulating them, and 
    
2. remove or correct for features that were introduced during the simulation but are not expected to be present in real data in the same way.
    
We will note these special cases with green boxes like this one.
</div>

<a id='resources'></a>
## Pipeline Resources and Documentation

There are several different places to find information on installing and running the pipeline. This notebook will provide examples of running the pipeline on a handful of images, but will not demonstrate all options and features. Please see the following links for more in-depth instructions and documentation.

* [High-level description of all pipeline stages and steps](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/main.html) from the `jwst` software documentation pages.

* JWST Documentation (JDox) for each pipeline stage, including a short summary of what each step does:

  * [JDox page for the Stage 1 pipeline](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/stages-of-jwst-data-processing/calwebb_detector1) 

  * [JDox page for the Stage 2 pipeline](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/stages-of-jwst-data-processing/calwebb_image2)
  
  * [JDox page for the Stage 3 pipeline](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/stages-of-jwst-data-processing/calwebb_image3)
  
* [JWebbinar 3 notebooks and presentations](https://stsci.app.box.com/s/m3rb85ts4qzcz8t6tpgclcous2zdxaey)

* [`jwst` package documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/introduction.html) including how to run the pipeline, input/output files, etc.

* [`jwst` package GitHub repository](https://github.com/spacetelescope/jwst/blob/master/README.md), including installation instructions

* [Help Desk](https://stsci.service-now.com/jwst?id=sc_cat_item&sys_id=27a8af2fdbf2220033b55dd5ce9619cd&sysparm_category=e15706fc0a0a0aa7007fc21e1ab70c2f): If you have any questions or problems regarding the pipeline, submit a ticket to the Help Desk

<a id='installation'></a>
### Installation

<div class="alert alert-block alert-info">
    Before running this notebook, you will have to first install the <code>jwst</code> package. We recommend installing <strong>version 1.3.3</strong>, as that is the latest version tested with this notebook. 
    
**NOTE:** The `jwst` package requires Python 3.7+ <br><br>
    
The recommended way to install the pipeline is via `pip`. Follow the steps below to create a new conda environment, activate that environment, and then install the latest released version of the pipeline. You can name your environment anything you like. In the lines below, replace `<env_name>` with your chosen environment name.

>`conda create -n <env_name> python`<br>
>`conda activate <env_name>`<br>
>`pip install jwst==1.3.3`

You can download the latest released version by excluding `==1.3.3` from the `pip install jwst` command. For more detailed instructions on the various ways to install the package, including installing more recent development versions of the pipeline, see the [installation instructions](https://github.com/spacetelescope/jwst/blob/master/README.md) on GitHub.
    
</div>

<a id='reference_files'></a>
### Reference Files

[Calibration reference files](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/jwst-data-calibration-reference-files) are a collection of FITS and ASDF files that are used to remove instrumental signatures and calibrate JWST data.
For example, there are reference files handling the identification of bad pixels or those affected by saturation or persistence, the removal of dark current or flat field structure, flux calibration, etc.

When running a pipeline or pipeline step, the pipeline will automatically look for any required reference files in a pre-defined local directory. If the required reference files are not present, they will automatically be downloaded from the Calibration Reference Data System (CRDS) at STScI.
    
<div class="alert alert-block alert-info">
    
You will have to specify a local directory in which to store reference files, along with the server to use to download the reference files from CRDS. To accomplish this, there are two environment variables that should be set prior to calling the pipeline. These are the `CRDS_PATH` and `CRDS_SERVER_URL` variables. In the example below, reference files will be downloaded to the "crds_cache" directory under the home directory.

>`$ export CRDS_PATH=$HOME/crds_cache`<br>
>`$ export CRDS_SERVER_URL=https://jwst-crds.stsci.edu`<br>
OR:<br>
`os.environ["CRDS_PATH"] = "/user/myself/crds_cache"`<br>
`os.environ["CRDS_SERVER_URL"] = "https://jwst-crds.stsci.edu"`<br>

The first time you run the pipeline, the CRDS server should download all of the context and reference files that are needed for that pipeline run, and dump them into the `CRDS_PATH` directory. Subsequent executions of the pipeline will first look to see if it has what it needs in `CRDS_PATH` and anything it doesn't have will be downloaded from the STScI cache. 
</div>

<strong>Note:</strong> The <code>CRDS_PATH</code> directory will likely hold **~7-9 GB** after running the pipeline steps in this notebook.</div>

<div class="alert alert-block alert-success">
<strong>Changes for Simulated data</strong> 

Finally, to ensure that you reduce the simulated images using reference files that are as close as possible to those used to create them, please set the following environment variable:

>`$ export CRDS_CONTEXT='jwst_0764.pmap'`
    
This will force the CRDS reference file mapping that is closest to that used to create the simulated images. This step will not be necessary with real data, as you will likely want to use the most recent reference files in your reduction. 
    
Note: The reference file mapping used to create the simulated images has since been deprecated. The main difference between the old mapping and `jwst_0764.pmap` is in the bad pixel maps. Therefore, a small number of pixels in your reduced images may be incorrectly flagged as bad or incorrectly assumed to be good.

</div>

You can either relaunch this notebook after setting these environment variables, or you can set them using the following cells:

In [16]:
# Set CRDS_PATH, CRDS_SERVER_URL, and CRDS_CONTEXT
# (comment these lines out if the variables are already set in your shell)

# Make sure to replace with the path to your CRDS cache directory
%env CRDS_PATH=/home/jovyan/crds_cache/
%env CRDS_SERVER_URL=https://jwst-crds.stsci.edu
%env CRDS_CONTEXT=jwst_0764.pmap

env: CRDS_PATH=/home/jovyan/crds_cache/
env: CRDS_SERVER_URL=https://jwst-crds.stsci.edu
env: CRDS_CONTEXT=jwst_0764.pmap
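
The `%env` magic only works inside IPython or Jupyter. In a plain Python script, the same three variables can be set through `os.environ`, as in the minimal sketch below. The helper name `set_crds_env` and the `~/crds_cache` location are placeholders for illustration. To be safe, set the variables before importing or calling the pipeline.

```python
import os


def set_crds_env(cache_dir, server_url='https://jwst-crds.stsci.edu',
                 context='jwst_0764.pmap'):
    """Set the three CRDS variables for the current Python process."""
    os.environ['CRDS_PATH'] = os.path.expanduser(cache_dir)
    os.environ['CRDS_SERVER_URL'] = server_url
    os.environ['CRDS_CONTEXT'] = context
    keys = ('CRDS_PATH', 'CRDS_SERVER_URL', 'CRDS_CONTEXT')
    return {key: os.environ[key] for key in keys}


# Placeholder cache location; replace with your own directory
settings = set_crds_env('~/crds_cache')
for key, value in settings.items():
    print(f'{key}={value}')
```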


<a id='system_requirements'></a>
### System Requirements

NIRCam image files are large, and can quickly fill up hard drive space. 

Please note that after running the steps in this notebook, your working directory and sub directories will take up **~8 GB** of space. This includes:

* ~900 MB of raw images and auxiliary files (galaxy seed images)
* ~400 MB of custom reference files (gain maps and an additional flat field for detector A5)
* ~6.8 GB of pipeline outputs, including some interim products that can be deleted to save space.

Additionally, as noted above, the `CRDS_PATH` directory will store an additional **7-9 GB**, with the NIRCam darks taking up the most space (~3.5 GB each).

Stage 3 of the pipeline also requires a considerable amount of memory, with the peak memory usage occurring during the resampling step. The partial mosaics we create in this notebook will require ~4 GB of memory, reasonable for most laptops. However, the full CEERS 5 mosaics can require 90+ GB! (See `part2/README.txt` for information on how to produce the full mosaics.)
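
Before starting, it can help to confirm that the working disk has room for the ~8 GB quoted above. The sketch below uses only the standard library. The helper name `has_free_space` and the 8 GB threshold are illustrative; the `CRDS_PATH` disk can be checked the same way by passing that directory instead.

```python
import shutil


def has_free_space(path='.', required_gb=8.0):
    """Return (enough, free_gb) for the disk that holds `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb, free_gb


enough, free_gb = has_free_space('.', required_gb=8.0)
print(f'{free_gb:.1f} GB free in the working directory; enough: {enough}')
```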

[Top of Notebook](#top)

<a id='imports'></a>
## Imports

In [17]:
import os
import numpy as np
import asdf
import json

# JWST pipeline-related modules
from jwst.datamodels import dqflags

# The entire jwst pipeline
from jwst.pipeline import calwebb_detector1
from jwst.pipeline import calwebb_image2
from jwst.pipeline import calwebb_image3
from jwst import datamodels

# importing an individual pipeline step
from jwst.skymatch import SkyMatchStep

# Custom scripts for use later
from plotimages import plot_images
from remstriping import measure_striping
from applyflat import apply_custom_flat

Set up matplotlib for plotting

In [18]:
import matplotlib.pyplot as plt
from matplotlib import rcParams

# Use this version for non-interactive plots (easier scrolling of the notebook)
#%matplotlib inline

# Use this version if you want interactive plots
%matplotlib notebook

# These gymnastics are needed to make the sizes of the figures
# be the same in both the inline and notebook versions
#%config InlineBackend.print_figure_kwargs = {'bbox_inches': None}

# You may want to change the following configurations to customize 
# figure sizes and resolutions
rcParams['figure.figsize'] = [11,8]
rcParams['figure.dpi'] = 80
rcParams['savefig.dpi'] = 80

Check which version of the pipeline you are running (we recommend v1.3.3 with these simulated images):

In [19]:
import jwst
print(jwst.__version__)

1.3.3


Check that the CRDS environment variables are set

In [20]:
try:
    print(os.environ['CRDS_PATH'])
except KeyError:
    print('CRDS_PATH environment variable not set!')

try:
    print(os.environ['CRDS_SERVER_URL'])
except KeyError:
    print('CRDS_SERVER_URL environment variable not set!')

try:
    print(os.environ['CRDS_CONTEXT'])
except KeyError:
    print('CRDS_CONTEXT environment variable not set!')

/home/jovyan/crds_cache/
https://jwst-crds.stsci.edu
jwst_0764.pmap


Set the directories for pipeline outputs and source data 

In [21]:
data_dir = '/home/shared/preloaded-fits/ceers-data/nircam/part1'
output_dir = os.path.join(os.getcwd(), 'calibrated')
# Create the output directory if it does not already exist
os.makedirs(output_dir, exist_ok=True)


<a id='calling_methods'></a>
## Methods for Calling Steps/Pipelines

There are three common methods by which the pipeline or pipeline steps can be called. From within python, the `run()` and `call()` methods of the pipeline or step classes can be used. Alternatively, the `strun` command can be used from the command line. Within this notebook, we show examples of all three methods. 

When using the `call()` method or `strun`, optional input parameters can be specified via [parameter reference files](#parameter_reffiles). When using the `run()` method, these parameters are instead specified within python.

As a quick example, the following three cells demonstrate how to call the pipeline with all default parameter values. In these cases, the pipeline falls back to retrieving default parameter values from the pipeline code itself, or by retrieving the default parameter reference file stored in CRDS.

<div class="alert alert-block alert-info">
    
Using the run() method: default parameter values come from the pipeline itself
```    
 >>>   detector1 = calwebb_detector1.Detector1Pipeline()
 >>>   run_output = detector1.run(uncal_file)
```    
</div>

<div class="alert alert-block alert-info">
    
Using the call() method: default parameter reference file retrieved from CRDS
```
>>>    # call() is a class method: it creates and runs the pipeline in one command
>>>    call_output = calwebb_detector1.Detector1Pipeline.call(uncal_file)
```    
</div>

<div class="alert alert-block alert-info">
    
Using strun on the command line with all default parameter values:    
```
    strun jwst.pipeline.Detector1Pipeline jw01345005001_01101_00001_nrca1_uncal.fits
```
    
</div>

<a id='parameter_reffiles'></a>
## Parameter Reference Files

When calling a pipeline or pipeline step using the `call()` method or `strun` on the command line, [parameter reference files](https://jwst-pipeline.readthedocs.io/en/stable/jwst/stpipe/config_asdf.html#config-asdf-files) can be used to specify values for input parameters. These reference files are [asdf](https://asdf.readthedocs.io/en/stable/) format and appear somewhat similar to json files when examined in a text editor. 

Versions of parameter reference files containing default parameter values for each step and pipeline are available in CRDS. When using the `call()` method, if you do not specify a parameter reference file name in the call, the pipeline or step will retrieve and use the appropriate file from CRDS, which will then run the pipeline or step with the parameter values in that file. If you provide the name of a parameter reference file, then the parameter values in that file will take precedence. For any parameter not specified in your parameter reference file, the pipeline will use the default value.

When using `strun`, the parameter reference file is a required input in order to specify non-default parameter values. 

As an example, you can save a copy of the default parameter file with the command:

    strun calwebb_detector1 jw01345005001_01101_00001_nrca1_uncal.fits --save-parameters detector1_params.asdf
    
This file can then be edited to change the default values and used when calling the pipeline. We have provided parameter files for each pipeline stage that we have edited to reflect how we run the pipeline on CEERS data.
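
When many files need the same treatment, it can be convenient to assemble the `strun` command in Python and hand it to a batch script or `subprocess.run`. The sketch below only builds and prints the command; it does not run the pipeline. The helper `build_strun_command` is hypothetical, and passing a parameter file as the first argument to `strun` reflects our understanding of its usage, so check the `jwst` documentation for your installed version.

```python
import shlex


def build_strun_command(pipeline_or_paramfile, input_file, save_parameters=None):
    """Assemble an strun command as a list of arguments (nothing is executed)."""
    cmd = ['strun', pipeline_or_paramfile, input_file]
    if save_parameters is not None:
        # Ask strun to write the parameters it used to this file
        cmd += ['--save-parameters', save_parameters]
    return cmd


cmd = build_strun_command('detector1_edited.asdf',
                          'jw01345005001_01101_00001_nrca1_uncal.fits')
command_line = ' '.join(shlex.quote(part) for part in cmd)
print(command_line)
```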

In [22]:
# Define the parameter files here for convenience
detector1_paramfile = os.path.join(data_dir, 'detector1_edited.asdf')
image2_paramfile = os.path.join(data_dir, 'image2_edited.asdf')
image3_swc_paramfile = os.path.join(data_dir, 'image3_swc_edited.asdf')
image3_lwc_paramfile = os.path.join(data_dir, 'image3_lwc_edited.asdf')

Let's take a look at the contents of a parameter reference file. We'll open it using the asdf package, and use the `tree` attribute to see what's inside:

In [23]:
det1_reffile = asdf.open(detector1_paramfile)
det1_reffile.tree

{'asdf_library': {'author': 'The ASDF Developers',
  'homepage': 'http://github.com/asdf-format/asdf',
  'name': 'asdf',
  'version': '2.8.1'},
 'history': {'extensions': [{'extension_class': 'asdf.extension.BuiltinExtension',
    'software': {'name': 'asdf', 'version': '2.8.1'}}]},
 'class': 'jwst.pipeline.calwebb_detector1.Detector1Pipeline',
 'meta': {'author': '<SPECIFY>',
  'date': '2021-06-21T12:52:24',
  'description': 'Parameters for calibration step jwst.pipeline.calwebb_detector1.Detector1Pipeline',
  'instrument': {'name': '<SPECIFY>'},
  'origin': '<SPECIFY>',
  'pedigree': '<SPECIFY>',
  'reftype': '<SPECIFY>',
  'telescope': '<SPECIFY>',
  'useafter': '<SPECIFY>'},
 'name': 'Detector1Pipeline',
 'parameters': {'input_dir': 'uncals',
  'output_dir': 'calibrated',
  'output_ext': '.fits',
  'output_file': None,
  'output_use_index': True,
  'output_use_model': False,
  'post_hooks': [],
  'pre_hooks': [],
  'save_calibrated_ramp': False,
  'save_results': True,
  'search_ou

The top part of the file contains various metadata entries about the file itself. Below that, you'll see a `'name'` entry, which lists `Detector1Pipeline` as the class to which these parameters apply. The next entry, `parameters`, lists parameters and values attached to the pipeline itself. Below this is the `steps` entry, which contains a list of dictionaries. Each dictionary refers to one step within the pipeline, and specifies parameters and values that apply to that step. If you look through these entries, you'll see the same parameters and values that we specify manually when using the `run()` method below.

In [24]:
# Don't forget to close the file
det1_reffile.close()
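
Because the tree is an ordinary nested dictionary, it is easy to summarize programmatically. The sketch below lists the steps that are marked as skipped. To keep it self-contained, it uses a small hand-written dictionary (`example_tree`) that mimics the layout described above; it is not the actual content of `detector1_edited.asdf`, and the per-step keys `name` and `parameters` are an assumption about that layout. With a real file, you would pass the `tree` attribute of the open file instead.

```python
# Hand-written stand-in for the tree of a pipeline parameter file
example_tree = {
    'name': 'Detector1Pipeline',
    'parameters': {'save_results': True, 'output_dir': 'calibrated'},
    'steps': [
        {'name': 'ipc', 'parameters': {'skip': False}},
        {'name': 'persistence', 'parameters': {'skip': True}},
        {'name': 'jump', 'parameters': {'skip': False}},
    ],
}


def skipped_steps(tree):
    """Return the names of the steps whose 'skip' parameter is True."""
    return [step['name'] for step in tree.get('steps', [])
            if step.get('parameters', {}).get('skip', False)]


print(skipped_steps(example_tree))
```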

[Top of Notebook](#top)

---
<a id='detector1'></a>
## The calwebb_detector1 pipeline: Ramps to Slopes

**Description**

The Stage 1 [*calwebb_detector1* pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_detector1.html#calwebb-detector1) applies basic detector-level corrections to all exposure types (imaging, spectroscopic, coronagraphic, etc.). It is applied to one exposure at a time, beginning with an uncalibrated multiaccum ramp (*_uncal.fits file). Each input raw data file is composed of one or more ramps (integrations) containing increasing count values from the non-destructive detector readouts. For details on multiaccum files and data collection, see the JDox page on [how up-the-ramp readouts work](https://jwst-docs.stsci.edu/understanding-exposure-times#UnderstandingExposureTimes-uptherampHowup-the-rampreadoutswork). The final output from this call is an uncalibrated slope image which is ready to go into the Stage 2 pipeline. "Uncalibrated" in this case means that the data are in units of DN/sec. In Stage 2 the flux calibration will be applied, at which point the data will be in physical units (e.g. MJy/sr) and referred to as "calibrated".

All JWST data, regardless of instrument and observing mode, are processed through the Stage 1 pipeline. The corrections performed are the same across all near-IR instruments. See [Figure 1](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/stages-of-jwst-data-processing/calwebb_detector1) on the *calwebb_detector1* algorithm page for a map of which steps are performed on NIR data.

In the sections below, we will run the entire Stage 1 pipeline on two uncalibrated NIRCam files. The pipeline is a wrapper which will string together all of the appropriate steps in the proper order. To explore how each individual step of the Stage 1 pipeline changes the input data, please see the ['Ramps to Slopes' notebook from JWebbinar 3](https://stsci.app.box.com/s/z5bznws56f9m1j505vhnpud35nxrjr50). 

**Inputs**

* A raw exposure (`*_uncal.fits`) containing the 4-dimensional raw data from all detector readouts: (ncols x nrows x ngroups x nintegrations).

**Outputs**

* A 2D countrate image (`*_rate.fits`) resulting from averaging over the exposure's integrations.
* A 3D countrate image (`*_rateints.fits`) containing the results of each integration in separate extensions.

**Note:** The CEERS 5 exposures only have one integration, and so the `*_rate.fits` and `*_rateints.fits` files will be identical. The `*_rateints.fits` files can be deleted to save disk space.
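
To make the idea of "ramps to slopes" concrete, the toy example below fits a straight line to the up-the-ramp samples of a single pixel and reports the slope in DN/sec. This is an ordinary least-squares fit on made-up numbers, not the algorithm used by the pipeline's ramp fitting step, which also handles noise weighting, saturation, and jumps. The group time, count rate, and bias level are hypothetical values.

```python
def fit_slope(times, counts):
    """Ordinary least-squares slope of counts versus time (DN/sec)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(counts) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counts))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


# Toy single-pixel ramp with 9 groups (all values below are made up)
group_time = 100.0   # seconds between groups
true_rate = 2.5      # DN/sec
bias_level = 1000.0  # DN
times = [group_time * (i + 1) for i in range(9)]
counts = [bias_level + true_rate * t for t in times]

rate = fit_slope(times, counts)
print(f'Recovered count rate: {rate:.2f} DN/sec')
```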

<div class="alert alert-block alert-success">
<strong>Changes for Simulated data</strong>
    
For the reduction of simulated CEERS data, we have made the following changes to the defaults for *calwebb_detector1*:
    
* We **do not** skip IPCStep (which is skipped by default). This step corrects for interpixel capacitance, which Mirage adds to the simulated data. 
    
* To save on time, we skip PersistenceStep as the simulated data do not include persistence. The [Persistence step](https://jwst-pipeline.readthedocs.io/en/stable/jwst/persistence/description.html) will be a relevant correction for real data.
    
We have made these changes to <code>detector1_edited.asdf</code>, and will specify them in the pipeline call using the <code>run()</code> method below.
    
We will also supply **custom-made gain maps for the Jump and RampFit steps**. These gain maps were created to match the average value that Mirage uses in creating the images, rather than the pixel-dependent gain corrections that are present in the gain reference files from CRDS.

</div>

<a id='run_method_detector1'></a>
#### Call the pipeline using the run() method

When using the `run()` method to execute a pipeline (or step), the pipeline class is first instantiated without the data to be processed. Optional input parameters are specified using attributes of the class instance. Finally, the call to the `run()` method is made and the data are supplied.  See here for [more examples of the run() method](https://jwst-pipeline.readthedocs.io/en/stable/jwst/stpipe/call_via_run.html).

The `run()` method does not take any kind of parameter reference file as input. If you wish to set values for various parameters, you must do that manually. Below, we set several parameters in order to show how it's done. 

Note that you can use the `spec` property to see the available parameters and default values for a pipeline step. For example:

> ```from jwst.refpix import RefPixStep```  
> ```print(RefPixStep.spec)``` 

will list the parameters and default values associated with the reference pixel subtraction step. The `spec` property is less useful for the pipelines themselves, as it does not show the parameters for the steps comprising the pipeline.

All steps and pipelines have several common parameters that can be set. 

* `save_results` specifies whether or not to save the output of that step/pipeline to a file. The default is False.
* `output_dir` is the directory into which the output files will be saved.
* `output_file` is the base filename to use for the saved result. Note that each step/pipeline will add a custom suffix onto output_file. 

We will use the `run()` method on the first of our raw files: `jw01345005001_01101_00001_nrca1_uncal.fits`

The following step can take a few minutes to run and will output a lot of logging information with details about what the pipeline is doing.

In [26]:
uncal_file = os.path.join(data_dir, 'uncals/jw01345005001_01101_00001_nrca1_uncal.fits')

# Create an instance of the pipeline class
detector1 = calwebb_detector1.Detector1Pipeline()

# Set some parameters that pertain to the
# entire pipeline
detector1.output_dir = output_dir
detector1.save_results = True

# Set some parameters that pertain to some of the individual steps
# turn on IPCStep
detector1.ipc.skip = False
# turn off PersistenceStep
detector1.persistence.skip = True

# Specify the name of the gain file that will override 
# the existing gain reference file used for the jump and ramp_fit steps
detector1.jump.override_gain = os.path.join(data_dir, 'gains_v2.1.0/jwst_nircam_gain_nrca1.fits')
detector1.ramp_fit.override_gain = os.path.join(data_dir, 'gains_v2.1.0/jwst_nircam_gain_nrca1.fits')

# Call the run() method
run_output = detector1.run(uncal_file)

2022-01-26 15:14:18,562 - stpipe.Detector1Pipeline - INFO - Detector1Pipeline instance created.
2022-01-26 15:14:18,563 - stpipe.Detector1Pipeline.group_scale - INFO - GroupScaleStep instance created.
2022-01-26 15:14:18,565 - stpipe.Detector1Pipeline.dq_init - INFO - DQInitStep instance created.
2022-01-26 15:14:18,567 - stpipe.Detector1Pipeline.saturation - INFO - SaturationStep instance created.
2022-01-26 15:14:18,568 - stpipe.Detector1Pipeline.ipc - INFO - IPCStep instance created.
2022-01-26 15:14:18,569 - stpipe.Detector1Pipeline.superbias - INFO - SuperBiasStep instance created.
2022-01-26 15:14:18,571 - stpipe.Detector1Pipeline.refpix - INFO - RefPixStep instance created.
2022-01-26 15:14:18,572 - stpipe.Detector1Pipeline.rscd - INFO - RscdStep instance created.
2022-01-26 15:14:18,574 - stpipe.Detector1Pipeline.firstframe - INFO - FirstFrameStep instance created.
2022-01-26 15:14:18,575 - stpipe.Detector1Pipeline.lastframe - INFO - LastFrameStep instance created.
2022-01-26 1

<div class="alert alert-block alert-info">
    
**Note:** One of the most time-intensive steps of the *calwebb_detector1* pipeline is downloading the dark reference file (>3 GB) from CRDS (see the [Reference Files](#reference_files) section). It can sometimes happen that the CRDS server times out before the file is fully downloaded. In these cases, the above call to `run()` will crash. If this happens repeatedly, you may choose to manually download the dark at the following link: 
    
[https://jwst-crds.stsci.edu/browse/jwst_nircam_dark_0040.fits](https://jwst-crds.stsci.edu/browse/jwst_nircam_dark_0040.fits)
<br><br>
    
The downloaded dark should then be placed in the NIRCam references subdirectory of your `$CRDS_PATH` directory:

    $CRDS_PATH/references/jwst/nircam

</div>

You'll notice that in the `calibrated` directory there are now two new files: `jw01345005001_01101_00001_nrca1_rate.fits` and `jw01345005001_01101_00001_nrca1_rateints.fits`. They are both count rate images (DN/sec), and the `rate.fits` file will be passed to the Stage 2 pipeline. As described above, the `rateints.fits` file is identical to the `rate.fits` file because the CEERS observations involve only a single integration.
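
The output names follow directly from the input name: the `_uncal` suffix is replaced with `_rate` and `_rateints`. The sketch below builds the two expected paths and reports whether each exists, which is a convenient check when processing many files. The helper `expected_detector1_outputs` is hypothetical, and the relative `uncals` and `calibrated` paths are placeholders.

```python
import os


def expected_detector1_outputs(uncal_file, output_dir):
    """Return the rate and rateints paths expected for a given uncal file."""
    base = os.path.basename(uncal_file)
    if not base.endswith('_uncal.fits'):
        raise ValueError(f'Not an uncal file: {uncal_file}')
    root = base[:-len('_uncal.fits')]
    return [os.path.join(output_dir, f'{root}_{suffix}.fits')
            for suffix in ('rate', 'rateints')]


outputs = expected_detector1_outputs(
    'uncals/jw01345005001_01101_00001_nrca1_uncal.fits', 'calibrated')
for path in outputs:
    print(path, 'exists' if os.path.exists(path) else 'missing')
```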

Let's compare the raw input file with the countrate output file. We'll plot the last group of the raw multiaccum ramp file (`group=8` below) to see the counts from the full ramp.

In [27]:
# Specify which group of the uncal exposure using the group keyword
# There are 9 groups in the CEERS exposures, counting from 0 
# Let's look at the last one (group=8)
plot_images(os.path.join(data_dir, 'uncals/jw01345005001_01101_00001_nrca1_uncal.fits'),
            os.path.join(output_dir, 'jw01345005001_01101_00001_nrca1_rate.fits'), 
            title1='uncal', title2='rate', group=8)


<a id='call_method_detector1'></a>
#### Call the pipeline using the call() method

When using the `call()` method, a single command will instantiate and run the pipeline (or step). The input data and optional parameter reference files are supplied in this single command. See here for [example usage of call() method](https://jwst-pipeline.readthedocs.io/en/stable/jwst/stpipe/call_via_call.html).

There are two options for calling the pipeline with the `call()` method: 

1. providing a nested dictionary of parameter values, and
2. using the parameter reference file.

We will demonstrate method (1) with the second uncal file, `jw01345005001_01101_00001_nrca5_uncal.fits`. 

We also show how to use method (2) in a raw cell so as not to execute another call. If you wish to try it out, use the pull-down menu above to change the cell type to 'Code' (or click 'Cell' > 'Cell Type' > 'Code'), and then execute it.

<div class="alert alert-block alert-info">

<b>Method #1:</b>
In this case, build a nested dictionary that specifies parameter values for various steps, and provide it in the call to `call()`.
</div>

In [None]:
uncal_file = 'uncals/jw01345005001_01101_00001_nrca5_uncal.fits'
parameter_dict = {'ipc': {'skip': False},
                  'persistence': {'skip': True},
                  'jump': {'override_gain': 'gains_v2.1.0/jwst_nircam_gain_nrca5.fits'},
                  'ramp_fit': {'override_gain': 'gains_v2.1.0/jwst_nircam_gain_nrca5.fits'}
                 }
call_output = calwebb_detector1.Detector1Pipeline.call(uncal_file, output_dir=output_dir, save_results=True,
                                                       steps=parameter_dict)


<div class="alert alert-block alert-info">
    
**Note:** If the dark reference file for the NRCA5 detector did not fully download, you can manually download it at the following link: 
    
[https://jwst-crds.stsci.edu/browse/jwst_nircam_dark_0043.fits](https://jwst-crds.stsci.edu/browse/jwst_nircam_dark_0043.fits)
<br><br>
    
The downloaded dark should then be placed in your `$CRDS_PATH` directory at:

    $CRDS_PATH/crds_cache/references/jwst/nircam

</div>

<div class="alert alert-block alert-info">

<b>Method #2:</b>
Provide the name of the observation file, the pipeline-specific input parameters, and the name of the parameter reference file that specifies step-specific parameters.
</div>

<a id='command_line_detector1'></a>
#### Call the pipeline using the command line

Calling a pipeline or step from the command line is similar to using the `call()` method. The data file to be processed, along with an optional parameter reference file and optional parameter/value pairs can be provided to the `strun` command. See here for [additional examples of command line calls](https://jwst-pipeline.readthedocs.io/en/stable/jwst/introduction.html?highlight=%22command%20line%22#running-from-the-command-line).

In the cell below we provide two commands that use `strun` to call the *calwebb_detector1* pipeline on the two uncal files. The pipeline class is contained in the parameter reference file, and so there is no need to specify it in the command itself. We also override the gain files as in the above examples. 

<div class="alert alert-block alert-info">

    
```
strun detector1_edited.asdf uncals/jw01345005001_01101_00001_nrca1_uncal.fits --steps.jump.override_gain='gains_v2.1.0/jwst_nircam_gain_nrca1.fits' --steps.ramp_fit.override_gain='gains_v2.1.0/jwst_nircam_gain_nrca1.fits' 
    
strun detector1_edited.asdf uncals/jw01345005001_01101_00001_nrca5_uncal.fits --steps.jump.override_gain='gains_v2.1.0/jwst_nircam_gain_nrca5.fits' --steps.ramp_fit.override_gain='gains_v2.1.0/jwst_nircam_gain_nrca5.fits' 
```
</div>
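
The `--steps.<step>.<parameter>=<value>` flags in the two commands above mirror the nested dictionary passed to `call()`. As a minimal sketch of that correspondence, the helper below flattens such a dictionary into an `strun` argument list. The function name `build_strun_command` is our own invention for illustration and is not part of the `jwst` package; the sketch assumes every parameter value can be written as plain text on the command line.

```python
def build_strun_command(paramfile, datafile, steps=None):
    """Return an strun argument list for the given step overrides.

    Hypothetical helper for illustration only: it assembles strings
    and does not run the pipeline.
    """
    args = ['strun', paramfile, datafile]
    for step, params in (steps or {}).items():
        for name, value in params.items():
            # One flag per step parameter, e.g. --steps.jump.override_gain=...
            args.append(f'--steps.{step}.{name}={value}')
    return args


gain = 'gains_v2.1.0/jwst_nircam_gain_nrca5.fits'
cmd = build_strun_command(
    'detector1_edited.asdf',
    'uncals/jw01345005001_01101_00001_nrca5_uncal.fits',
    steps={'jump': {'override_gain': gain},
           'ramp_fit': {'override_gain': gain}},
)
print(' '.join(cmd))
```

In an environment where `strun` is installed, the resulting list could be handed to `subprocess.run(cmd, check=True)`.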

[Top of Notebook](#top)

---
<a id='striping'></a>
## Custom Step - Correction for Image Striping

<div class="alert alert-block alert-success">
<strong>Changes for Simulated data</strong> --     
This is a custom step that we have added for CEERS simulated data.

    
You may notice some horizontal and vertical striping patterns present in the `*_rate.fits` images. The striping is most likely due to [1/f noise related to the detector readout electronics](https://jwst-docs.stsci.edu/near-infrared-camera/nircam-instrumentation/nircam-detector-overview/nircam-detector-performance). <br><br>

We have found that the [RefPix step of *calwebb_detector1*](https://jwst-pipeline.readthedocs.io/en/stable/jwst/refpix/description.html) with `odd_even_columns=True` and `use_side_ref_pixels=True` does not fully remove the pattern, no matter what value is chosen for `side_smoothing_length`. Instead, we have developed a script, `remstriping.py`, to measure and remove the striping pattern from a countrate image. <br><br>

Below we use the `measure_striping` function to clean the two countrate images we have created so far. The striping patterns are measured using the following steps:
1. The appropriate flat field is applied to the countrate image, allowing for a cleaner measure of the striping patterns.
2. Source flux is masked out using the seed images output by Mirage. This method works for simulated data because we know the input positions of all sources. For real data, we would instead perform an iterative reduction, using the stacked exposures to determine the positions of sources and then repeating the reduction masking these source positions to measure the striping. We have included the Mirage seed images with the `*_uncal.fits` images in the `uncals` directory.
3. The background pedestal is measured and removed. 
4. The image is collapsed (using a sigma-clipped median) first along columns to measure the horizontal striping and then along rows to measure the vertical striping. 

The horizontal and vertical patterns are then subtracted from the input countrate image.<br><br>

**Note:** The original rate image file is copied to `*_rate_orig.fits`, and the output of `measure_striping` is saved to `*_rate.fits`, overwriting the input file.

    Args:

        image (str): image filename, including full relative path
        apply_flat (Optional [bool]): if True, identifies and applies the 
            corresponding flat field before measuring striping pattern. 
            Applying the flat first allows for a cleaner measure of the 
            striping, especially for the long wavelength detectors. 
            Default is True.
        mask_sources (Optional [bool]): If True, masks out sources in image
            before measuring the striping pattern so that source flux is 
            not included in the calculation of the sigma-clipped median.
            Sources are identified using the Mirage seed images.
            Default is True.
        seedim_directory (Optional [str]): Directory containing 
            Mirage seed images, used if mask_sources is True. 
            Default is working directory.
        threshold (Optional [float]): threshold (in ADU/s) to use in the 
            seed images when identifying pixels to mask. This will depend on 
            the seed image and brightness of input sources. Default is 0.01
    
</div>
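
To make the collapse-and-subtract idea in step 4 concrete, here is a minimal, dependency-free sketch on a toy image. It is not the contents of `remstriping.py`: a plain median stands in for the sigma-clipped median, and the flat fielding, source masking and pedestal removal of steps 1 to 3 are omitted. The function name `remove_striping` is hypothetical.

```python
from statistics import median


def remove_striping(image):
    """Subtract row-wise, then column-wise, medians from a 2D list of floats.

    Simplified illustration: a plain median replaces the sigma-clipped
    median, and no flat fielding or source masking is applied.
    """
    # Horizontal striping: collapse along columns to get one offset per row
    row_offsets = [median(row) for row in image]
    cleaned = [[pix - off for pix in row]
               for row, off in zip(image, row_offsets)]
    # Vertical striping: collapse along rows to get one offset per column
    col_offsets = [median(col) for col in zip(*cleaned)]
    cleaned = [[pix - off for pix, off in zip(row, col_offsets)]
               for row in cleaned]
    return cleaned, row_offsets, col_offsets


# Toy 3x3 image made of a per-row pattern plus a per-column pattern
row_pattern = [0.5, -0.5, 0.0]
col_pattern = [0.25, 0.0, -0.25]
image = [[r + c for c in col_pattern] for r in row_pattern]

cleaned, row_offsets, col_offsets = remove_striping(image)
print(row_offsets)
print(col_offsets)
print(cleaned)
```

On this toy input the measured offsets reproduce the injected row and column patterns, and the cleaned image is zero everywhere.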

In [None]:
# measure and remove the horizontal and vertical striping from the two countrate images
rates = [os.path.join(output_dir,'jw01345005001_01101_00001_nrca1_rate.fits'),
         os.path.join(output_dir,'jw01345005001_01101_00001_nrca5_rate.fits')]
for rate in rates:
    measure_striping(rate, apply_flat=True, mask_sources=True, seedim_directory='uncals', threshold=0.01)
    
# There will be some warnings related to empty slices in the images, where the rows and columns 
# of reference pixels along the image edges have been masked out of the median calculation. 

Let's compare one of the countrate images before and after this correction.

In [None]:
plot_images(os.path.join(output_dir, 'jw01345005001_01101_00001_nrca1_rate_orig.fits'),
            os.path.join(output_dir, 'jw01345005001_01101_00001_nrca1_rate.fits'), 
            title1='original rate', title2='striping removed')

We have removed the completely horizontal and vertical striping. There are still some striping patterns present in the image on the right, especially visible along the left edge, though these are at a slight angle and will be removed as part of the flat field correction in Stage 2. The difference between the left and right images may be easiest to see if you are using matplotlib in interactive mode and zoom in on a subsection of each image.  



[Top of Notebook](#top)

---
<a id='image2'></a>
## calwebb_image2 - Calibrated Slope Images

The Stage 2 [*calwebb_image2* pipeline](https://jwst-pipeline.readthedocs.io/en/stable/jwst/pipeline/calwebb_image2.html) applies instrumental corrections and calibrations to the slope images output from Stage 1. This includes background subtraction, the creation of a full World Coordinate System (WCS) for the data, application of the flat field, and flux calibration. In most cases the final output is an image in units of surface brightness. Whereas the input files have the suffix `*_rate.fits`, the output files have the suffix `*_cal.fits`.

In addition to the steps above, by default the Stage 2 pipeline will also run the [Resample](https://jwst-pipeline.readthedocs.io/en/stable/jwst/resample/main.html) step on the calibrated images, in order to remove the effects of instrument distortion. This step outputs files with the suffix `*_i2d.fits` that contain "rectified" images. However, these files are meant only for user examination of the data. It is the `*_cal.fits` files that are passed on to Stage 3 of the pipeline.

All JWST imaging mode data, regardless of instrument, are processed through the *calwebb\_image2* pipeline. The steps and the order in which they are performed are the same for all data. See [Figure 1](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/stages-of-jwst-data-processing/calwebb_image2) on the *calwebb_image2* algorithm page for a map of the steps that are performed on the input data.

**Inputs**
* A 2D countrate image (`*_rate.fits`) in units of DN/sec. The user can input a single image file or an association file listing several files, in which case the processing steps will be applied to each input exposure, one at a time.

**Outputs**
* A 2D calibrated, but unrectified, exposure (`*_cal.fits`) in units of MJy/sr
* A 2D resampled, or rectified, image (`*_i2d.fits`) in units of MJy/sr


**Note:** At this stage, the resampled `*_i2d.fits` images are intended for **quick-look use only**, while the `*_cal.fits` files are passed through for Stage 3 processing. We have chosen to **skip ResampleStep of *calwebb_image2* to save on both processing time and disk space**. If you wish to perform this step to inspect the outputs, change the `skip: true` in the `jwst.resample.resample_step.ResampleStep` dictionary of the `image2_edited.asdf` parameter file (line 136) to `skip: false`. Alternatively, comment out the line `image2.resample.skip = True` in the cell using the `run()` method.

<a id='run_method_image2'></a>
#### Call the pipeline using the run() method

As before, we will use the `run()` method on the first of our countrate files: `jw01345005001_01101_00001_nrca1_rate.fits`

In [None]:
rate_file = 'calibrated/jw01345005001_01101_00001_nrca1_rate.fits'

# Create an instance of the pipeline class
image2 = calwebb_image2.Image2Pipeline()

# Set some parameters that pertain to the
# entire pipeline
image2.output_dir = output_dir
image2.save_results = True
# turn off the ResampleStep, comment out to produce the 
# individual rectified *_i2d.fits for quick-look checks
image2.resample.skip = True

# Call the run() method
image2.run(rate_file)

You'll notice that in the `calibrated` directory there is now another new file: `jw01345005001_01101_00001_nrca1_cal.fits`. It is a calibrated image in units of MJy/sr. 

<a id='call_method_image2'></a>
#### Call the pipeline using the call() method

We do not have any recommended changes to the *calwebb_image2* pipeline for simulated CEERS images, and so the two options for using the `call()` method are essentially the same for *calwebb_image2*. As we are accepting the default parameter values, there is no need to specify a parameter dictionary, and so we only provide the paramfile. 

We will demonstrate the `call()` method with the second countrate file, `jw01345005001_01101_00001_nrca5_rate.fits`. 

In [None]:
rate_file = 'calibrated/jw01345005001_01101_00001_nrca5_rate.fits'
call_output = calwebb_image2.Image2Pipeline.call(rate_file, output_dir=output_dir,
                                                  save_results=True,
                                                  config_file=image2_paramfile)

<a id='command_line_image2'></a>
#### Call the pipeline using the command line

In the cell below we provide two commands that use `strun` to call the *calwebb_image2* pipeline on the two countrate files. 

<div class="alert alert-block alert-info">

    
```
strun image2_edited.asdf calibrated/jw01345005001_01101_00001_nrca1_rate.fits 
   
strun image2_edited.asdf calibrated/jw01345005001_01101_00001_nrca5_rate.fits 
    
```
</div>

After running *calwebb_image2*, let's compare the countrate input file with the calibrated output file for the A5 detector.

In [None]:
plot_images(os.path.join(output_dir, 'jw01345005001_01101_00001_nrca5_rate.fits'),
            os.path.join(output_dir, 'jw01345005001_01101_00001_nrca5_cal.fits'), 
            title1='rate', title2='cal')

The most notable difference between the `rate.fits` image on the left and the `cal.fits` image on the right is the application of the flat field correction. The `cal.fits` image is also now in units of MJy/sr rather than DN/sec.

[Top of Notebook](#top)

---
<a id='a5_detector'></a>
## Custom Step - Removing A5 Detector Feature

<div class="alert alert-block alert-success">
<strong>Changes for Simulated data</strong> --     
This is a custom step that we have added for CEERS simulated data.

    
You may notice that there is a large-scale but low-level feature present in the bottom center of the `nrca5` calibrated image. We believe this feature has been artificially introduced by an inversion about x for some reference files during the simulated image creation. This feature is therefore not expected to be present in real data. <br><br>

We have determined that this is a multiplicative feature that can be removed similar to applying a flat field. We have therefore created a set of custom flat fields for the CEERS 5 pointing and observation specification. They are the result of combining 30 simulated `cal.fits` files with no input sources in each filter, and are stored in the `customflats` directory. <br><br>

Below we use the `apply_custom_flat` function from `applyflat.py` to remove this feature from the A5 image. <br><br>


**Note:** The original cal image file is copied to `*_unflat.fits`, and the output of `apply_custom_flat` is saved to `*_cal.fits`, overwriting the input file.

    Args:

        image (str): NRCA5 *cal.fits file, including full relative path
        suffix (Optional [str]): suffix to add to original input files.
            Default is '_unflat.fits'
    
</div>
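
As a rough illustration of how a multiplicative feature can be removed with a flat built from source-free images, the sketch below median-combines a few blank frames, normalizes the result, and divides a science frame by it. This is not the contents of `applyflat.py`: the choice of a median combination, the normalization to a median of 1, and the function names `build_custom_flat` and `apply_flat` are all assumptions made for the example.

```python
from statistics import median


def build_custom_flat(blank_images):
    """Median-combine source-free images pixel by pixel, then normalize.

    The normalization to a median of 1 is an assumption of this sketch.
    """
    n_rows, n_cols = len(blank_images[0]), len(blank_images[0][0])
    stack = [[median(img[y][x] for img in blank_images)
              for x in range(n_cols)] for y in range(n_rows)]
    norm = median(pix for row in stack for pix in row)
    return [[pix / norm for pix in row] for row in stack]


def apply_flat(image, flat):
    """Divide an image by a flat, pixel by pixel."""
    return [[pix / f for pix, f in zip(image_row, flat_row)]
            for image_row, flat_row in zip(image, flat)]


# Three toy blank frames (1 row x 3 columns) sharing a feature in the last pixel
blanks = [[[2.0, 2.0, 4.0]],
          [[2.0, 2.0, 4.0]],
          [[1.0, 3.0, 5.0]]]
flat = build_custom_flat(blanks)

# A toy science frame carrying the same multiplicative feature
science = [[3.0, 3.0, 6.0]]
corrected = apply_flat(science, flat)
print(flat)
print(corrected)
```

Dividing by the normalized flat leaves the overall level of the science frame unchanged while flattening the pixel that carries the feature.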

In [None]:
# apply a custom flat to the NRCA5 detector
nrca5_cal = os.path.join(output_dir,'jw01345005001_01101_00001_nrca5_cal.fits')
apply_custom_flat(nrca5_cal) 

Let's compare the NRCA5 image before and after this correction.

In [None]:
plot_images(os.path.join(output_dir, 'jw01345005001_01101_00001_nrca5_unflat.fits'),
            os.path.join(output_dir, 'jw01345005001_01101_00001_nrca5_cal.fits'), 
            title1='before correction', title2='after correction')

[Top of Notebook](#top)

---
<a id='associations'></a>
## Association Files

The Stage 3 pipeline must be called using a JSON-formatted file called an ["association" (ASN) file](https://jwst-pipeline.readthedocs.io/en/stable/jwst/associations/index.html). The association file presents your data files in organized groups. When retrieving your observations from MAST, you will be able to download the association files for your data along with the FITS files containing the observations.

We have created ASN files for the Stage 3 runs we demonstrate in this notebook:

* `f115w_nrca1.json` - groups the three F115W calibrated images we will combine into a mosaic 
* `f277w_nrca5.json` - groups the three F277W calibrated images we will combine into a mosaic 
* `jw0134500500*_01101_0000*_nrca*.json` - ASN files for running the Stage 3 pipeline step SkyMatch on individual images

See the [`asn_from_list()` function](https://jwst-pipeline.readthedocs.io/en/stable/api/jwst.associations.asn_from_list.asn_from_list.html#jwst.associations.asn_from_list.asn_from_list) for information on creating your own association files.

Let's open one asn file here as an example:

In [None]:
# Open the association file and load into a json object
with open('f115w_nrca1.json') as f_obj:
    asn_data = json.load(f_obj)

In [None]:
asn_data

Here we see that the association file begins with a few lines of data that give high-level information about the association. The most important entry here is the `asn_rule` field. Association files have different formats for the different stages of the pipeline. You should be sure that the `asn_rule` matches the pipeline that you will be running. In this case we'll be running the Stage 3 pipeline, and we see that the `asn_rule` mentions "Level3", which is what we want.

Beneath these lines, we see the `products` field. This field contains a list of dictionaries that specify the files that belong to this association, and the types of those files. When the Stage 3 pipeline is run on this association file, all files listed here will be run through the calibration steps.
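
For readers without the file at hand, the sketch below builds a pared-down, hand-written dictionary with the same overall layout (an `asn_rule` entry plus a `products` list whose `members` name the input files) and pulls out the member file names. The field values are illustrative assumptions, not a copy of `f115w_nrca1.json`, which carries additional bookkeeping fields; `list_members` is a hypothetical helper.

```python
import json

# Hand-written illustration of a Level 3 association; values are assumed
example_asn = {
    "asn_rule": "DMS_Level3_Base",
    "products": [
        {
            "name": "f115w_nrca1",
            "members": [
                {"expname": "jw01345005001_01101_00001_nrca1_skymatchstep.fits",
                 "exptype": "science"},
                {"expname": "jw01345005002_01101_00002_nrca1_skymatchstep.fits",
                 "exptype": "science"},
                {"expname": "jw01345005003_01101_00003_nrca1_skymatchstep.fits",
                 "exptype": "science"},
            ],
        }
    ],
}


def list_members(asn):
    """Return the file names of every member in every product."""
    return [member["expname"]
            for product in asn["products"]
            for member in product["members"]]


# Round-trip through JSON text, mimicking what json.load() returns from a file
loaded = json.loads(json.dumps(example_asn))
print(loaded["asn_rule"])
for name in list_members(loaded):
    print(name)
```

The same `list_members` call can be applied to the `asn_data` object loaded above to check which exposures an association will combine.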

[Top of Notebook](#top)

---
<a id='skymatch'></a>
## Custom Step - Sky Subtraction

<div class="alert alert-block alert-success">
<strong>Changes for Simulated data</strong> --     
This is a custom step that we have added for CEERS simulated data.

The final stage of the pipeline, *calwebb_image3*, includes a step called [SkyMatch](https://jwst-pipeline.readthedocs.io/en/stable/jwst/skymatch/README.html). SkyMatch computes sky values in a collection of images that contain both sky and source signal. It can do so in a way that matches the sky levels of several images before they are combined to form a mosaic.<br><br>
    

We have found that the SkyMatch step does not properly remove the background in simulated CEERS images when run on a collection of images. This is likely due to a mismatch between the input photometric calibration parameters used by Mirage in simulating the data and those used by the `jwst` calibration pipeline. Specifically, Mirage translates input magnitudes into count rates using HST-style PHOTFLAM values derived from filter throughput curves. Mirage uses the same PHOTFLAM value for all short wavelength detectors for a given module and filter. The `jwst` calibration pipeline, however, converts count rates to MJy/sr using the 'photmjsr' parameter in the flux calibration reference file, which depends on the pixel area and a mean gain value, both of which vary detector to detector. A single value does not exist that can bring all simulated detector images to the same background level. Additionally, the CEERS dithers are not large enough to cover the gaps between detectors, and so there are many exposures with no overlap area in common for globally matching the sky values.<br><br>

We find that the background levels in the final mosaics are significantly improved if SkyMatchStep is run on each `*_cal.fits` file individually before running *calwebb_image3*. <br><br>
    
    
As a step in the Stage 3 pipeline, SkyMatchStep requires an association file, which we have created for each individual `*_cal.fits` file. We have also edited a parameter file (`skymatch_edited.asdf`) that includes some minor changes to the default sky statistics parameter values used by the step. We have tested a grid of parameter values and find that these yield the best sky calculations for the CEERS simulated data:

* lsigma, usigma = 2.0  - Lower and upper clipping limits, in sigmas, used when computing the sky value (Default=4.0)
* upper = 1.0  - An optional value indicating the upper limit of usable pixel values for computing the sky (Default=None)
* nclip = 10  - Number of clipping iterations to use when computing the sky value (Default=5)

In the following cells, we will demonstrate running SkyMatchStep individually on each image using the three methods for calling the pipeline.
    
</div>
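
To show what the `lsigma`, `usigma`, `nclip` and `upper` parameters control, here is a minimal sketch of an iteratively sigma-clipped sky estimate on a toy list of pixel values. It is an illustration only: the real SkyMatchStep offers several sky statistics and handles masks and data quality flags, whereas this sketch assumes a simple median. The function name `clipped_sky` is hypothetical.

```python
from statistics import median, pstdev


def clipped_sky(pixels, lsigma=2.0, usigma=2.0, nclip=10, upper=None):
    """Estimate a sky level as the median of iteratively sigma-clipped pixels."""
    # Discard pixels above the hard upper limit before any clipping
    values = [p for p in pixels if upper is None or p <= upper]
    for _ in range(nclip):
        center = median(values)
        sigma = pstdev(values)
        kept = [v for v in values
                if center - lsigma * sigma <= v <= center + usigma * sigma]
        if len(kept) == len(values):
            break  # converged: nothing left to clip
        values = kept
    return median(values)


# Toy sky pixels around 0.10 MJy/sr, plus one faint and one bright source pixel
pixels = [0.10, 0.11, 0.09, 0.10, 0.12, 0.08, 0.10, 5.0, 0.9]
sky = clipped_sky(pixels, lsigma=2.0, usigma=2.0, nclip=10, upper=1.0)
print(sky)
```

Here the `upper` limit removes the brightest pixel outright, and the first clipping pass rejects the remaining source pixel, so the estimate settles on the sky level of the uncontaminated pixels.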

In [None]:
# using the run() method with an ASN file created for 
# jw01345005001_01101_00001_nrca1_cal.fits
asn = 'jw01345005001_01101_00001_nrca1.json'
skymatch = SkyMatchStep()
skymatch.save_results = True
skymatch.output_dir = output_dir
# specifying output_file provides the base name to which 'skymatchstep' will
# be automatically added. If we do not specify output_file, SkyMatchStep will
# append 'skymatchstep' to the input filename, resulting in 
# jw01345005001_01101_00001_nrca1_cal_skymatchstep.fits
skymatch.output_file = 'jw01345005001_01101_00001_nrca1'

# sky statistics parameters
skymatch.skymethod = 'local' # the default is global+match, doesn't matter as we're processing files individually
skymatch.lsigma = 2.0
skymatch.usigma = 2.0
skymatch.nclip = 10
skymatch.upper = 1.0

# set the 'subtract' parameter so the calculated sky value is removed from the image
# (subtracting the calculated sky value from the image is off by default)
skymatch.subtract = True 

sky = skymatch.run(asn)

In [None]:
# using the call() method with an ASN file created for 
# jw01345005001_01101_00001_nrca5_cal.fits
asn = 'jw01345005001_01101_00001_nrca5.json'
call_output = SkyMatchStep.call(asn, output_dir=output_dir, input_dir='calibrated',
                                output_file='jw01345005001_01101_00001_nrca5',
                                save_results=True,
                                config_file='skymatch_edited.asdf')

Let's compare the distributions of pixel fluxes before and after running SkyMatchStep.

In [None]:
# helper that reads an image and histograms its unflagged, non-zero pixels

def plot_hist(image, ax, bins, color, label):
    with datamodels.ImageModel(os.path.join(output_dir,image)) as im:
        data = im.data
        # consider only non-zero and unflagged pixels
        data = data[(im.data != 0) & (im.dq == 0)]
    ax.hist(data, bins=bins, color=color, label=label, alpha=0.5)

# array of flux bins 
fluxbins = np.arange(-0.15, 0.5, 0.01)
 
fig,ax1 = plt.subplots(1, 1, tight_layout=True) 
                               
plot_hist('jw01345005001_01101_00001_nrca1_cal.fits', ax1, 
          fluxbins, 'k', 'Input cal')
plot_hist('jw01345005001_01101_00001_nrca1_skymatchstep.fits', 
          ax1, fluxbins, 'C0', 'Sky-subtracted')
ax1.axvline(0, color='k', ls='dashed')
ax1.legend(fontsize=20)
ax1.set_xlabel('flux/pixel (MJy/sr)')

<div class="alert alert-block alert-info">
    Using <code>strun</code> on the command line:
    
```
strun skymatch_edited.asdf jw01345005001_01101_00001_nrca1.json --output_dir=calibrated --output_file=jw01345005001_01101_00001_nrca1.fits

strun skymatch_edited.asdf jw01345005001_01101_00001_nrca5.json --output_dir=calibrated --output_file=jw01345005001_01101_00001_nrca5.fits    
```
</div>

[Top of Notebook](#top)

---
<a id='break'></a>
## Break - Reducing Additional Images

<div class="alert alert-block alert-info">
We have now taken two images through pipeline stages 1 and 2 as well as the custom reduction steps developed for CEERS simulated data. However, we have only processed one detector image from a single dither in each of the F115W and F277W filters. While the Stage 3 pipeline can run on a single file, it is more instructive to run it on an association of files. Before proceeding to the next step, please run the script <code>rundithers</code> that can be found in this directory. This script will perform the same steps as above on 4 additional images:

* <code>jw01345005002_01101_00002_nrca1_uncal.fits</code> - F115W, A1 detector, dither 2
* <code>jw01345005003_01101_00003_nrca1_uncal.fits</code> - F115W, A1 detector, dither 3
* <code>jw01345005002_01101_00002_nrca5_uncal.fits</code> - F277W, A5 detector, dither 2
* <code>jw01345005003_01101_00003_nrca5_uncal.fits</code> - F277W, A5 detector, dither 3
    
Pipeline steps will be run using the <code>strun</code> method, and the custom steps will be performed by calling the appropriate python scripts. <br><br>

<strong>Important:</strong> Make sure to [activate your conda environment](#installation) and that the [<code>CRDS_PATH</code>, <code>CRDS_SERVER_URL</code>, and <code>CRDS_CONTEXT</code> environment variables](#reference_files) are set in your terminal before running the script. <br><br>
    
For example, to run the script in a bash terminal, type:
    
    sh rundithers
</div>

Following completion of `rundithers`, you will see 22 new files in the `calibrated` directory. For each of the 4 input `*_uncal.fits` files, there should be:

* `*_rate_orig.fits` - The 2D countrate image averaged over all integrations, output by *calwebb_detector1*
* `*_rateints.fits` - The 3D countrate image with each integration in a separate extension, output by *calwebb_detector1*
* `*_rate.fits` - The countrate image with striping removed by our custom processing step
* `*_cal.fits` - The calibrated image, output by *calwebb_image2*
* `*_skymatchstep.fits` - The calibrated, sky-subtracted image, output by SkyMatchStep

For the A5 detector F277W images, you will also see `*_unflat.fits` images, which are the original output by *calwebb_image2* before our custom processing step removed the detector feature by applying the custom flat field.
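
As a quick sanity check on the count of 22, the sketch below generates the expected file names from the exposure names, detectors and suffixes listed above, and reports which of them are missing from the `calibrated` directory. It assumes the output directory is named `calibrated` relative to the working directory, as elsewhere in this notebook.

```python
import os
from itertools import product

exposures = ['jw01345005002_01101_00002', 'jw01345005003_01101_00003']
detectors = ['nrca1', 'nrca5']
suffixes = ['rate_orig', 'rateints', 'rate', 'cal', 'skymatchstep']

# Five products for each of the four exposure/detector combinations
expected = [f'{exposure}_{detector}_{suffix}.fits'
            for exposure, detector, suffix
            in product(exposures, detectors, suffixes)]
# The A5 (F277W) images also keep a copy of the pre-correction cal file
expected += [f'{exposure}_nrca5_unflat.fits' for exposure in exposures]

missing = [name for name in expected
           if not os.path.exists(os.path.join('calibrated', name))]
print(f'{len(expected)} files expected, {len(missing)} missing')
```

If `rundithers` completed successfully, the list of missing files should be empty.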

In the following cells, we will pass the `*_skymatchstep.fits` images to the Stage 3 pipeline to create combined mosaics in each filter.

---
<a id='image3'></a>
## calwebb_image3 - Ensemble Calibrations

The Stage 3 [*calwebb_image3* pipeline](https://jwst-pipeline.readthedocs.io/en/stable/jwst/pipeline/calwebb_image3.html) takes one or more calibrated slope images (`*_cal.fits` files) and combines them into a final mosaic image. It then creates a source catalog from this mosaic. Several steps are performed in order to prepare the data for the mosaic creation. These steps largely mirror what is done by [DrizzlePac](https://www.stsci.edu/scientific-community/software/drizzlepac.html) software when working with HST data. 

First, using common sources found across the input images, the WCS of each image is refined. Background levels are then matched across the inputs. Spurious sources (e.g. cosmic rays that were not flagged in the Jump step during Stage 1 processing) are removed by comparing each individual input image to a median image. The individual images are then combined into a single mosaic image, and a source catalog is created from that mosaic. Finally, the individual exposures are updated using the information from the preceding steps: new versions of the individual calibrated slope images are produced that contain matched backgrounds, flagged spurious sources, and improved WCS objects. 

All JWST imaging mode data, regardless of instrument, are processed through the *calwebb\_image3* pipeline. The steps and the order in which they are performed are the same for all data. The pipeline is a wrapper which will string together all of the appropriate steps in the proper order. See [Figure 1](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/stages-of-jwst-data-processing/calwebb_image3) on the *calwebb_image3* algorithm page for a map of the steps that are performed on the input data.

**Inputs**
* 2D calibrated images (`*_cal.fits`), organized in an ASN file 

**Outputs**
* 2D cosmic-ray flagged images (`*_crf.fits`), created during the OutlierDetection step
* 2D resampled, combined mosaic image (`*_i2d.fits`) including all exposures in the association, created during the Resample step  
* 2D segmentation map (`*_segm.fits`) based on the `*_i2d.fits` image, created by the SourceCatalog step
* Catalog of photometry (`*_cat.ecsv`) saved as an ASCII file in `ecsv` format, created by the SourceCatalog step

We have provided ASN files (created using `asn_from_list` as described in [Association files](#associations)) for running *calwebb_image3* on each set of F115W and F277W exposures reduced in this notebook.

<div class="alert alert-block alert-success">
<strong>Changes for Simulated data</strong> 
  
We have turned off two steps in the *calwebb_image3* pipeline:
    
* As the CEERS simulated images have not been created with mis-registrations, there is no need to run the TweakReg step. For real data, there may be small errors in the pointing for each exposure due to an imperfect knowledge of the guidestar positions. In this case TweakRegStep will need to be used to properly align each image. However, for these simulated images with perfect alignment, we have turned off the step to save on processing time and memory.

* We have also turned off SkyMatchStep, since we have already run this step on each image individually.
    
We have also determined that the optimal drizzle parameters for the final CEERS mosaics are:
    
* A pixel scale of 0.015"/pixel for the short wavelength images (F115W, F150W, F200W)
* A pixel scale of 0.03"/pixel for the long wavelength images (F277W, F356W, F444W)
* No "shrinking" of input pixels before drizzling them onto the output image grid (i.e., pixfrac = 1.0)

The drizzling is performed in the [Resample](https://jwst-pipeline.readthedocs.io/en/stable/jwst/resample/main.html) step. The output pixel scale is set through the parameter `pixel_scale_ratio`, which we specify as the output pixel scale divided by the input (native) pixel scale, so that values below 1 produce finer output pixels. The `pixfrac` parameter is already set to 1.0 by default. We have therefore adopted the following:

* `pixel_scale_ratio = 0.48` --  output/input = 0.015/0.031 -- for the short wavelength images, and 
* `pixel_scale_ratio = 0.4762` -- 0.03/0.063 -- for the long wavelength images.
  
We have made these changes to `image3_swc_edited.asdf` for the short wavelength images and `image3_lwc_edited.asdf` for the long wavelength images. We will also specify them in the pipeline call using the <code>run()</code> method below.

</div>
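
The two `pixel_scale_ratio` values follow from simple arithmetic on the pixel scales quoted above. The short sketch below reproduces them, taking the native scales of 0.031 and 0.063 arcsec/pixel and the target scales of 0.015 and 0.03 arcsec/pixel directly from the list above.

```python
# Approximate native NIRCam pixel scales (arcsec/pixel), as quoted above
native_scale = {'short': 0.031, 'long': 0.063}
# Target mosaic pixel scales (arcsec/pixel)
output_scale = {'short': 0.015, 'long': 0.03}

# Ratio of output to input pixel scale for each channel
ratios = {channel: output_scale[channel] / native_scale[channel]
          for channel in native_scale}

for channel, ratio in ratios.items():
    print(f'{channel}: pixel_scale_ratio = {ratio:.4f}')
# short: pixel_scale_ratio = 0.4839
# long: pixel_scale_ratio = 0.4762
```

The short wavelength ratio is rounded to 0.48 in the adopted parameters, which corresponds to an output scale of about 0.0149 arcsec/pixel.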

<a id='run_method_image3'></a>
#### Call the pipeline using the run() method

As before, we will use the `run()` method on one set of our calibrated images. We will use the ASN file `f115w_nrca1.json`, which will combine the following three F115W images:

* `jw01345005001_01101_00001_nrca1_skymatchstep.fits`
* `jw01345005002_01101_00002_nrca1_skymatchstep.fits`
* `jw01345005003_01101_00003_nrca1_skymatchstep.fits`

This will create an F115W mosaic combining all three dithers, but including just one of the 8 short wavelength detectors. 

In [None]:
asn_file = 'f115w_nrca1.json'

# Create an instance of the pipeline class
image3 = calwebb_image3.Image3Pipeline()

# Set some parameters that pertain to the entire pipeline
image3.output_dir = output_dir
image3.save_results = True

# Set some parameters that pertain to some of the individual steps
# Turn off TweakRegStep
image3.tweakreg.skip = True  
# Turn off SkyMatchStep
image3.skymatch.skip = True
# Set the ratio of output to input pixel scale to create an output mosaic 
# on a 0.015"/pixel scale
image3.resample.pixel_scale_ratio = 0.48

# Call the run() method
image3.run(asn_file)

You'll notice that in the `calibrated` directory there are now several new files: 

* `jw0134500500*_01101_0000*_nrca1_a3001_crf.fits` are the individual, cosmic ray-flagged images.
* `f115w_nrca1_i2d.fits` is the resampled, rectified output mosaic.
* `f115w_nrca1_segm.fits` is the segmentation map associated with the mosaic.
* `f115w_nrca1_cat.ecsv` is the catalog of detected source positions and photometry. We have not modified any parameters for the SourceCatalog step in the above example, but many parameters are available to control source detection and photometry.
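
A quick way to confirm that the run completed is to check for the mosaic-level products listed above. This is a minimal sketch: the helper names are hypothetical, the suffixes are taken from the list above, and `'calibrated'` is assumed to be the value of `output_dir`.

```python
import os

# Suffixes of the mosaic-level products listed above
PRODUCT_SUFFIXES = ("_i2d.fits", "_segm.fits", "_cat.ecsv")


def expected_image3_products(product_name, output_dir):
    """Build the paths of the mosaic, segmentation map, and catalog."""
    return [os.path.join(output_dir, product_name + suffix)
            for suffix in PRODUCT_SUFFIXES]


def missing_products(product_name, output_dir):
    """Return the expected product paths that do not exist on disk."""
    return [path for path in expected_image3_products(product_name, output_dir)
            if not os.path.exists(path)]


# 'calibrated' is an assumption; substitute your own output_dir.
missing = missing_products("f115w_nrca1", "calibrated")
if missing:
    print("Missing products:", *missing, sep="\n  ")
else:
    print("All expected products are present.")
```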

<a id='call_method_image3'></a>
#### Call the pipeline using the call() method

We will use the `call()` method on the second set of calibrated images. We will use the ASN file `f277w_nrca5.json`, which will combine the following three F277W images:

* `jw01345005001_01101_00001_nrca5_skymatchstep.fits`
* `jw01345005002_01101_00002_nrca5_skymatchstep.fits`
* `jw01345005003_01101_00003_nrca5_skymatchstep.fits`

This will create an F277W mosaic combining all three dithers, but including just one of the 2 long wavelength detectors. 

Remember that in this example we are turning off TweakRegStep and SkyMatchStep, and changing the `pixel_scale_ratio` parameter of ResampleStep. These three changes are specified in the edited paramfile, `image3_lwc_edited.asdf`. We provide this paramfile in the call below, and so there is no need to specify a parameter dictionary. 

In [None]:
asn_file = 'f277w_nrca5.json'
# For the LWC filter F277W, we are using 'image3_lwc_edited.asdf' 
# (which has been saved to the variable image3_lwc_paramfile)
call_output = calwebb_image3.Image3Pipeline.call(asn_file, output_dir=output_dir,
                                                 save_results=True, config_file=image3_lwc_paramfile)
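
If you would rather not edit a paramfile, the same three changes could instead be passed to `call()` as a parameter dictionary through its `steps` argument. The dictionary below is a sketch that mirrors the edits described above for the long wavelength images. The variable name is illustrative, and the commented call has not been run here.

```python
# Per-step overrides mirroring the edits described for image3_lwc_edited.asdf
image3_lwc_steps = {
    'tweakreg': {'skip': True},                  # turn off TweakRegStep
    'skymatch': {'skip': True},                  # turn off SkyMatchStep
    'resample': {'pixel_scale_ratio': 0.4762},   # 0.03"/pixel output scale
}

# Illustrative only: requires the jwst package and the notebook variables
# asn_file and output_dir defined in earlier cells.
# call_output = calwebb_image3.Image3Pipeline.call(
#     asn_file, output_dir=output_dir, save_results=True,
#     steps=image3_lwc_steps)

for step_name, params in image3_lwc_steps.items():
    print(step_name, params)
```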

Now that the mosaics in both filters have been created, let's plot them side-by-side.

In [None]:
plot_images(os.path.join(output_dir, 'f115w_nrca1_i2d.fits'),
            os.path.join(output_dir, 'f277w_nrca5_i2d.fits'), 
            title1='F115W A1 Detector', title2='F277W A5 Detector')

The A1 detector overlaps the lower left quadrant of the A5 detector, and so the F115W image on the left covers the same region as the first ~1024 pixels in both *x* and *y* of the F277W image on the right. 

<a id='command_line_image3'></a>
#### Call the pipeline using the command line

In the cell below we provide two commands that use `strun` to call the *calwebb_image3* pipeline with the two ASN files.

<div class="alert alert-block alert-info">

    
```
strun image3_swc_edited.asdf f115w_nrca1.json 
   
strun image3_lwc_edited.asdf f277w_nrca5.json
    
```
</div>
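
When many ASN files need processing, the matching `strun` commands can be generated rather than typed by hand. The sketch below is one possible approach, not the batch scripts distributed with this release. It assumes ASN files are named `<filter>_<detector>.json` as in this notebook, and it relies on the NIRCam convention that detector names ending in `5` belong to the long wavelength channel. The helper names are hypothetical.

```python
import os
import shlex

# Edited paramfiles described above, keyed by channel
PARAMFILES = {
    "swc": "image3_swc_edited.asdf",
    "lwc": "image3_lwc_edited.asdf",
}


def channel_for(asn_file):
    """Infer the channel from an ASN name such as 'f277w_nrca5.json'."""
    stem = os.path.splitext(os.path.basename(asn_file))[0]
    detector = stem.split("_")[-1]
    # Assumption: detectors ending in 5 (nrca5, nrcb5) are long wavelength
    return "lwc" if detector.endswith("5") else "swc"


def strun_command(asn_file):
    """Return the strun command line for one ASN file as a string."""
    return shlex.join(["strun", PARAMFILES[channel_for(asn_file)], asn_file])


commands = [strun_command(name)
            for name in ("f115w_nrca1.json", "f277w_nrca5.json")]
for command in commands:
    print(command)
```

The commands are only printed here. They could be written to a shell script or run with `subprocess.run` once the pairing of paramfile and ASN file has been checked.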

This concludes the processing steps for this notebook. We have now brought 6 raw `*_uncal.fits` files through the full JWST Calibration Pipeline, including customized interim steps necessary to reduce the simulated data. 

If you would like to try reducing the full CEERS 5 pointing in all 6 filters, check out the second part of this data release, `CEERS5/part2`. 

**The full pointing requires a significant amount of disk space and memory, so please read `CEERS5/README.txt` before downloading the data, and `CEERS5/part2/README.txt` before running the reduction scripts in that directory.**

[Top of Notebook](#top)