<img style="float: center;" src='https://github.com/STScI-MIRI/MRS-ExampleNB/raw/main/assets/banner1.png' alt="stsci_logo" width="1000px"/> 


<!-- # TSO JWebbinar Notebook 1: Downloading and Calibrating `uncal` TSO Products -->
# NIRISS/SOSS Notebook 1: Downloading and Calibrating 'uncal' TSO Products
-----

**Authors**:
- **Tyler Baines** | Science Support Analyst | NIRISS Branch | tbaines@stsci.edu
- **Néstor Espinoza** | AURA Assistant Astronomer | Mission Scientist for Exoplanet Science | nespinoza@stsci.edu
- **Aarynn Carter** | AURA Assistant Astronomer | NIRISS Branch | aacarter@stsci.edu

**Date Published**: May 1st, 2024

**Last Updated**: May 1st, 2024

<!-- **Pipeline Version**: 1.12.5 -->

## Table of contents
1. [Introduction](#introduction)<br>
      1.1 [Purpose of this Notebook](#purpose)<br>
      1.2 [Data & Context of the Observations](#data)<br>
2. [Imports](#imports)<br>
3. [Downloading & Quick Looks at JWST TSO data](#download)<br>
      3.1 [Downloading TSO data from MAST](#mast)<br>
      3.2 [Quicklook, pt. I: Target Acquisition](#ta)<br>
      3.3 [Quicklook, pt. II: `datamodels` & TSO Science Data Products](#science)<br>
4. [A TSO tour through the `Detector1` stage](#detector1)<br>
      4.1 [Checking data quality flags](#dqflags)<br>
      4.2 [Identifying saturated pixels](#saturation)<br>
      4.3 [Removing detector-level effects: the `superbias` and `refpix` steps](#refpix)<br>
      4.4 [Linearity corrections](#linearity)<br>
      4.5 [Removing the dark current](#dark-current)<br>
      4.6 [Correcting 1/f noise](#one_over_f)<br>
      4.7 [Detecting "jumps" on up-the-ramp sampling](#jump)<br>
      4.8 [Fitting ramps with the `ramp_fit` step](#rampfit)<br>
5. [Final words](#final-words)<br>


CRDS Context used: jwst_1225.pmap

## 1. Introduction <a class="anchor" id="introduction"></a>
<hr style="border:1px solid black">

### 1.1 Purpose of this Notebook<a class="anchor" id="purpose"></a>

In this Notebook, we aim to perform an exploration of Time Series Observations (TSO) products, focusing in particular on products obtained by the [Transiting Exoplanet JWST Early Release Science (ERS) team](https://www.stsci.edu/jwst/science-execution/approved-programs/dd-ers/program-1366) --- a real science dataset that we will reduce starting from the most "raw" forms of data products that can be downloaded from MAST. We will learn how to download those products, as well as how to load them and make them interact with the JWST Calibration Pipeline to calibrate them. In a companion Notebook, we then perform spectroscopic analyses on this dataset.

### 1.2 Data & Context of the Observations<a class="anchor" id="data"></a> 

The input data for this Notebook are observations from the ERS [Program 1366](https://www.stsci.edu/jwst/science-execution/program-information) where we will explore observations of the exoplanet WASP-39b obtained with the JWST [NIRISS/SOSS](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-observing-modes/niriss-single-object-slitless-spectroscopy) mode. This mode uses a unique grism ([GR700XD](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-instrumentation/niriss-gr700xd-grism)) to separate the light of a source across three diffraction orders covering a broad wavelength range from 0.6 to 2.8 $\mu m$ with a moderate spectral resolution (R $\approx$ 700 at 1.4 $\mu m$). 

<div class="alert alert-block alert-info"> <b>Notes on the validity of this notebook</b>: It is important to realize that this notebook, as it is, is likely to become outdated quickly as new algorithms and fixes are implemented into the JWST Calibration pipeline, and as new methodologies and studies update our knowledge of how to optimally calibrate JWST data products. An up-to-date list of known JWST pipeline issues (some of which we touch on in this notebook) can be found on the <a href="https://jwst-docs.stsci.edu/jwst-calibration-pipeline-caveats/known-issues-with-jwst-data-products">Known Issues with the JWST Data Products</a> JDox page. If in doubt, or for any questions, please contact <a href="https://jwst-docs.stsci.edu/jwst-help-desk">the JWST Helpdesk</a>!</div>

## 2. Imports <a class="anchor" id="imports"></a>
<hr style="border:1px solid black">

For this demonstration we will need the following packages to be installed in your Python environment:
1. numpy 
2. scipy
3. astropy
4. matplotlib
5. astroquery
6. jwst
7. pastasoss

Alternatively, you can create a conda environment using the provided requirements file by following the steps below:

```bash
conda create -n jwst-soss-demo-py3.10 python=3.10 pip
conda activate jwst-soss-demo-py3.10
pip install -r requirements-soss.txt
```
With the working Python environment set up, let's begin importing the packages:

In [None]:
# ------ General Imports ------
import os

if "CRDS_PATH" not in os.environ:
    %set_env CRDS_PATH $HOME/crds_cache
    %set_env CRDS_SERVER_URL https://jwst-crds.stsci.edu
else:
    print(f"CRDS_PATH: {os.environ['CRDS_PATH']}")
    print(f"CRDS_SERVER_URL: {os.environ['CRDS_SERVER_URL']}")

import numpy as np

# ------ Plotting/Stats Imports ------
import matplotlib.pyplot as plt

# ------ Downloading MAST data products ------
from astroquery.mast import Observations

# ------ JWST Calibration Pipeline Imports ------
import jwst
from jwst import datamodels
from jwst.pipeline import calwebb_detector1

Lastly, let's configure some of the plotting parameters:

In [None]:
plt.rcParams['figure.figsize'] = (12, 4)
plt.rcParams['figure.dpi'] = 100
plt.rcParams['image.origin'] = 'lower'
plt.rcParams['image.aspect'] = 'auto'
plt.rcParams['image.interpolation'] = 'none'
plt.rcParams['image.cmap'] = 'inferno'

## 3. Downloading & Quick Looks at JWST TSO data <a class="anchor" id="download"></a>
<hr style="border:1px solid black">

The very first step when it comes to analyzing a JWST dataset is to download the data and perform some quick looks, so we know the data quality is acceptable to begin with. Here, we will download the `uncal` products, which are one of the "raw"-est forms of data products users can download from MAST. We will perform our data download from MAST using `astroquery.mast` and then use the JWST Calibration pipeline to read and have quick looks at this data. Let's begin!

### 3.1 Downloading TSO data from MAST<a class="anchor" id="mast"></a> 

To download JWST data from MAST, we will use the `Observations` class from the `astroquery.mast` library. To do this, we need to indicate the properties of the dataset of interest: which instrument, filter, program ID _and_ target was observed. Options for JWST TSO instruments are `NIRISS/SOSS`, `NIRSPEC/SLIT`, `NIRCAM/GRISM`, `MIRI/SLITLESS`, etc. Here, we search for `NIRISS/SOSS` and the `CLEAR;GR700XD` filter/grating combination, which corresponds to the dataset we want. We define the proposal ID for the ERS program (`1366`) and the name of the target, `WASP-39`:

In [None]:
# Query MAST data
query_results = Observations.query_criteria(
    instrument_name='NIRISS/SOSS',
    filters='CLEAR;GR700XD',
    proposal_id='1366',
    target_name='WASP-39'
)

# Define columns to display
columns_to_display = [
    'obs_collection', 'instrument_name', 'filters', 'target_name', 'obs_id',
    's_ra', 's_dec', 't_exptime', 'proposal_id'
]

# Display results
query_results[columns_to_display].show_in_notebook(display_length=3)


This stores _all_ matching observations in the `query_results` variable. Next, we retrieve the list of data products associated with those observations:

In [None]:
# Get data products:
data_products = Observations.get_product_list(query_results)
# data_products.show_in_notebook(display_length=5)

Now we'll filter the results to obtain the uncalibrated data products:

In [None]:
uncals = Observations.filter_products(
    data_products, 
    productType='SCIENCE',
    productSubGroupDescription='UNCAL')

In [None]:
uncals[['obs_id', 'productSubGroupDescription', 'size']].show_in_notebook()

In [None]:
print(f"Number of final data products {len(uncals)}")

Note how there are 9 data products. The ones with the lowest `size` values are the Target Acquisition exposures, used to align the telescope with the target of interest. These are _very_ useful to check the quality of the observations (and whether or not they were successful!). The _actual_ TSO data are all the products that follow. Note that the latter data are segmented; this is done on the ground to facilitate the processing of the data. 

Let's download all the data, including the Target Acquisition frames, which might be useful to diagnose the quality of observations (this might take some time ~5-7 mins):

In [None]:
# Specify the location where you want to download your data to
download = True
download_dir = "data/"

# Make sure the download directory exists; if not, create it
os.makedirs(download_dir, exist_ok=True)

if download:
    Observations.download_products(uncals, download_dir=download_dir)

Great, all downloads are complete! Let's perform some quick looks at those datasets, which we can do right away without the need to actually calibrate our data products.
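The file paths used in the following cells all share one layout: `<download_dir>/mastDownload/JWST/<obs_id>/<obs_id>_uncal.fits`. As a minimal sketch, assuming that layout holds for every product, the paths can be built programmatically instead of being typed out. The helper name `mast_uncal_path` is hypothetical and is not part of `astroquery` or `jwst`:

```python
from pathlib import Path


def mast_uncal_path(download_dir, obs_id, suffix="uncal"):
    """Build the expected local path of a downloaded JWST product.

    Assumes the layout <download_dir>/mastDownload/JWST/<obs_id>/<obs_id>_<suffix>.fits,
    which matches the file names used in this notebook.
    """
    return Path(download_dir) / "mastDownload" / "JWST" / obs_id / f"{obs_id}_{suffix}.fits"


# Observation IDs of the four Target Acquisition exposures used in this notebook
ta_obs_ids = [f"jw01366001001_02101_{i:05d}-seg001_nis" for i in range(1, 5)]
ta_paths = [mast_uncal_path("data/", obs_id) for obs_id in ta_obs_ids]

for path in ta_paths:
    print(path)
```

Using `pathlib` also avoids the doubled slash that appears when a directory string ending in `/` is concatenated with a path starting with `/`.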

### 3.2 Quicklook, pt. I: Target Acquisition<a class="anchor" id="ta"></a>

The first set of data products we will have a look at are the Target Acquisition (TA) frames. These are frames that are used to precisely center objects in JWST, so as to correct for any JWST blind pointing errors. These frames are taken before the optical element that disperses the light for our TSO observations is put into place, and before doing any small slews to the actual science targets (which should be in the worst case scenario a few tens of arcseconds away from the science target).

Note there are 4 TA frames. [This is expected](https://jwst-docs.stsci.edu/jwst-near-infrared-spectrograph/nirspec-operations/nirspec-target-acquisition/nirspec-wide-aperture-target-acquisition); the usual TA WATA procedure (which is used for TSOs) has one exposure that is used to correct for any pointing errors, and a post-correction TA, which is taken as a "confirmation" exposure. We can load those frames with the JWST `datamodels`, which as we will see below are extremely useful models to deal with JWST data, as follows: 

In [None]:
ta1 = datamodels.RampModel(download_dir + '/mastDownload/JWST/jw01366001001_02101_00001-seg001_nis/jw01366001001_02101_00001-seg001_nis_uncal.fits')
ta2 = datamodels.RampModel(download_dir + '/mastDownload/JWST/jw01366001001_02101_00002-seg001_nis/jw01366001001_02101_00002-seg001_nis_uncal.fits')
ta3 = datamodels.RampModel(download_dir + '/mastDownload/JWST/jw01366001001_02101_00003-seg001_nis/jw01366001001_02101_00003-seg001_nis_uncal.fits')
ta4 = datamodels.RampModel(download_dir + '/mastDownload/JWST/jw01366001001_02101_00004-seg001_nis/jw01366001001_02101_00004-seg001_nis_uncal.fits')

TAs = [ta1, ta2, ta3, ta4]

The `data` attribute of those `datamodels` (e.g., `ta1.data`) stores the actual data from those frames. Let's check the dimensions of those first to familiarize ourselves with those data products:

In [None]:
for ta in TAs:
    print(f'Dimensions of the TA frame of {ta.meta.filename}: {ta.shape}')

The dimensions come in the form `(integrations, groups, pixel, pixel)`, so all four are 1-integration exposures of 13 groups each, on a 64x64 pixel frame. This is exactly what is expected from WATA TA frames!

Let's take a look at the TA frames:

In [None]:
int_index = 0
group_index = -1

vmin = 9000
vmax = 16000

fig, axes = plt.subplots(2, 4, figsize=(10, 5), sharex=True, sharey=True)

# Plot the TA images
for i, ta in enumerate(TAs):
    axes[0, i].imshow(ta.data[int_index, group_index, :, :], vmin=vmin, vmax=vmax)
    axes[0, i].set_title(f'TA_{i+1}')

# Plot the difference images
for i in range(len(TAs)):
    if i == 0:
        diff = TAs[i].data[int_index, group_index, :, :] - TAs[-1].data[int_index, group_index, :, :]
    else:
        diff = TAs[i].data[int_index, group_index, :, :] - TAs[i-1].data[int_index, group_index, :, :]
    axes[1, i].imshow(diff, interpolation=None)
    axes[1, i].set_title(f'TA_{i+1} - TA_{i if i > 0 else 4}')

# add vertical and horizontal crosshair. 
for ax in axes.ravel():
    ax.axvline(64//2, ls='--', color='white', lw=0.75)
    ax.axhline(64//2, ls='--', color='white', lw=0.75)

plt.tight_layout()
plt.show()

Nice! There is a source, although it is rather faint. Differencing the TA frames suppresses the background variations, so the target becomes more apparent as a pair of positive and negative residuals in the difference frames. The target is reasonably well placed around the center of the frame by the 4th exposure. 

We won't need the TA frames anymore, so let's go ahead and release them from memory. 

In [None]:
del ta1, ta2, ta3, ta4, TAs

### 3.3 Quicklook, pt. II: `datamodels` & TSO Science Data Products<a class="anchor" id="science"></a> 

Next up, we will load the **TSO science** data products so we can interact with them. Once again, we open them through the JWST `datamodels`, store all segments of data in a list, and explore the dimensions of the data products:

In [None]:
files = [
    '/mastDownload/JWST/jw01366001001_04101_00001-seg001_nis/jw01366001001_04101_00001-seg001_nis_uncal.fits',
    '/mastDownload/JWST/jw01366001001_04101_00001-seg002_nis/jw01366001001_04101_00001-seg002_nis_uncal.fits',
    '/mastDownload/JWST/jw01366001001_04101_00001-seg003_nis/jw01366001001_04101_00001-seg003_nis_uncal.fits',
    '/mastDownload/JWST/jw01366001001_04101_00001-seg004_nis/jw01366001001_04101_00001-seg004_nis_uncal.fits'
]

uncal_nis = [datamodels.RampModel(download_dir + file) for file in files]

In [None]:
for uncal in uncal_nis:
    print(f'Dimensions of the TSO frame: {uncal.shape}')

Note how we load each segment of data for each detector in simple python `list`s! This is the simplicity that these `datamodels` offer. We explore them a bit more below before continuing to the next Section.

<div class="alert alert-block alert-info"> <b>Note on memory usage</b>: Loading data products in lists is very useful, but be aware that for TSOs in particular, which typically involve large data volumes, Random-Access Memory (RAM) might be severely impacted. The data products loaded above, for instance, take on the order of 5 GB, and this will only grow as we run the pipeline steps below, which convert data products from, e.g., <code>int</code>s to <code>float</code>s, taking even more space. For a typical TSO, expect on the order of 50 GB to be used when running the pipeline steps below. If you don't have 50 GB of RAM, consider alternatives such as a server, or run the files individually.</div>
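To make the memory figures above more concrete, here is a rough back-of-the-envelope estimate for one segment of 158 integrations, 9 groups and a 256 x 2048 subarray. It is a sketch under two assumptions that are not stated in this notebook: that the `uncal` ramps are stored as 16-bit unsigned integers and that the pipeline promotes them to 32-bit floats. The helper `ramp_size_gb` is hypothetical.

```python
import numpy as np


def ramp_size_gb(n_int, n_groups, n_rows, n_cols, dtype):
    """Return the size, in GB (1e9 bytes), of a 4D ramp array of the given dtype."""
    n_elements = n_int * n_groups * n_rows * n_cols
    return n_elements * np.dtype(dtype).itemsize / 1e9


# (integrations, groups, rows, columns) for one science segment
shape = (158, 9, 256, 2048)

uint16_gb = ramp_size_gb(*shape, np.uint16)    # assumed dtype of the uncal data
float32_gb = ramp_size_gb(*shape, np.float32)  # assumed dtype after calibration steps

print(f"One segment as uint16:  {uint16_gb:.2f} GB")
print(f"One segment as float32: {float32_gb:.2f} GB")
```

Under these assumptions a single segment is about 1.5 GB on disk-like precision and about 3 GB as floats, before counting the data-quality and error arrays that accompany the science array.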

JWST `datamodels` simplify accessing vital information from `fits` files, such as instrument/mode, observation dates, and other header values, without needing to know the exact header keywords. For example, to find the observation date in NIS detector data, the datamodels `search` function quickly locates this information, demonstrating its practicality and ease of use.

In [None]:
uncal_nis[0].search('date')

Note how we just added a word similar to what we were looking for, and then this function will take a look to find where similar words are located in the `datamodels` attribute tree. From the above, it seems this information is in `meta.date`. Let's check:

In [None]:
uncal_nis[0].meta.date

It works! Note `date_beg` is the one we would be typically interested in checking (which was when the observations happened). Let's try another one. Suppose we wanted to know the name of the PI of this program. Again, the key word here is `pi` so let's insert that one in the `search` function:

In [None]:
uncal_nis[0].search('pi')

So this exists under `meta.program.pi_name`:

In [None]:
uncal_nis[0].meta.program.pi_name

As for the science frames, dimensions come in the form `(integrations, groups, pixel, pixel)`. So this is a 158-integration segment, with 9 groups per integration, on a subarray of dimensions 256 x 2048. That sounds about right for NIRISS CLEAR/GR700XD exposures. Let's explore the last group of integration number 10 to get an idea of what NIRISS/SOSS data look like:

In [None]:
segment_index = 0
int_index = 10
group_index = -1

data = uncal_nis[segment_index].data[int_index, group_index, :, :]

vmin = data.min()
vmax = data.mean() * 2.0

plt.figure(figsize=(12, 3))
plt.title('Uncal NIS CLEAR/GR700XD data; first segment, integration 10, last group')
plt.imshow(data, vmin=vmin, vmax=vmax)
plt.colorbar(label='Counts')
plt.show()

The data look great! The sweeping streaks across the detector columns are the spectrum of WASP-39 dispersed over 3 spectral orders. The brightest trace corresponds to order 1 (primary science), which spans a wavelength range from about 0.8 to 2.8 $\mu m$; the next brightest corresponds to order 2 (secondary science); order 3 has the lowest throughput of the three and may not be visible in raw images. The remaining structure in the image is mostly dominated by detector-level effects that will be dealt with in the next section of this notebook. 

## 4. A TSO tour through the `Detector1` stage <a class="anchor" id="detector1"></a>
<hr style="border:1px solid black">

The `uncal` data products we loaded above contain a series of detector systematic effects that we need to remove before our data is ready for science. Now, we will move to calibrating those TSO `uncal` data products, which will take care of most of those effects. 

To perform this calibration, here we will follow most of the steps outlined in the `calwebb_detector1` or "Stage 1" processing described in the [JWST Calibration pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_detector1.html) --- in particular, the one defined for "Near-IR" instruments, such as NIRISS. This Stage 1 processing for Near-IR TSOs is defined by a series of steps, which in order are:

1. `group_scale` (not relevant, there is no impact at all for SOSS data)
2. `dq_init`
3. `saturation`
4. `superbias`
5. `refpix`
6. `linearity`
7. `dark_current`
8. `jump`
9. `ramp_fitting`
10. `gain_scale` (not relevant, there is no impact at all for SOSS data)

We will slightly modify and/or add some steps to suit our TSO needs below --- let's get started!

### 4.1 Checking data quality flags <a class="anchor" id="dqflags"></a>

An important component of any TSO analysis is to flag bad pixels, pixels hit by cosmic rays, and saturated pixels. Bad pixels are, in fact, curated by the instrument teams in what we colloquially refer to as a "bad pixel mask": a mask one can "attach" to the data products with the JWST Calibration pipeline. This is exactly what the first step in the pipeline, the Data Quality Initialization (`dq_init`) step, does. 

#### 4.1.1 Running & understanding the `dq_init` step

Let's run the `dq_init` step on the first segment of our NIS data products: 

In [None]:
# Let's run the DQ init step; first for the first segment of the NIS detector:
print('Running dq_init on NIS:')
nis_seg1_dqinit = calwebb_detector1.dq_init_step.DQInitStep.call(uncal_nis[0], save_results=False)

All right, data-quality flags have been attached to our uncalibrated data products. To figure out why these are so useful, let's take a look at this bad pixel mask that was attached to our data products; in particular, let's peek at the one attached to the NIS detector products. This mask lives in the `pixeldq` attribute of our products (e.g., `nis_seg1_dqinit.pixeldq`). To familiarize ourselves with this, let's print the dimensions of this mask:

In [None]:
print(f"Shape of resulting PixelDQ array: {nis_seg1_dqinit.pixeldq.shape}")

As expected, it has the same size as our subarray data. [As per the documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/references_general/references_general.html?highlight=data%20quality%20flags#data-quality-flags), most data-quality (DQ) flags should be zero in the subarray; let's plot this array to see how many of them get away from this value and where:

In [None]:
plt.figure(figsize=(12, 3))
plt.title('Non-zero data-quality values across the subarray')
plt.imshow(nis_seg1_dqinit.pixeldq, vmin=-0.5, vmax=0.5)
plt.show()

All right --- so there are "special" pixels all over the place! But, what are the `pixeldq` values telling us? Let's print the pixel in the very corner of the subarray:

In [None]:
nis_seg1_dqinit.pixeldq[0, 0]

[According to the documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/references_general/references_general.html?highlight=data%20quality%20flags#data-quality-flags), this pixel is a **reference pixel**. This makes sense: for NIRISS in this CLEAR/GR700XD mode, the 4 pixel columns on the left-most end, the 4 pixel columns on the right-most end, and the 4 top-most rows are indeed reference pixels.

#### 4.1.2 Dynamically translating data-quality flags to human-readable form

Going back and forth between the documentation page and the data-quality flag values we read from our data products is a very tedious task. In addition, as we will see below, a pixel can eventually carry several flags at once (e.g., saturated, hit by a cosmic ray, etc.), and its data-quality value changes to account for all of them. 
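The reason a single number can carry several flags is that each flag occupies one bit of an integer, so a pixel's data-quality value is the sum of the flags that apply to it. The sketch below decodes such values by hand. It uses only a small subset of the flag values listed in the pipeline documentation, and the helper `decode_dq` is a hypothetical illustration, not the pipeline's own function.

```python
# Subset of the JWST pixel DQ flags (values as listed in the pipeline documentation)
FLAGS = {
    "DO_NOT_USE": 2**0,
    "SATURATED": 2**1,
    "JUMP_DET": 2**2,
    "DEAD": 2**10,
    "HOT": 2**11,
    "REFERENCE_PIXEL": 2**31,
}


def decode_dq(value, flags=FLAGS):
    """Return the set of flag names whose bit is set in `value`."""
    return {name for name, bit in flags.items() if int(value) & bit}


print(decode_dq(2**31))  # a pixel flagged only as a reference pixel
print(decode_dq(3))      # 3 = 1 + 2, i.e. DO_NOT_USE and SATURATED together
```

Because the flags are independent bits, adding a new flag to a pixel never erases the ones it already had.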

A handy function to convert those data-quality flag numbers to "human-readable" form lives in the `datamodels.dqflags` module: `dqflags_to_mnemonics`. It takes in a data-quality value and returns a `set` of strings describing what that value is telling us, given a so-called "mnemonic map", one of which is already loaded in the `datamodels.dqflags.pixel` dictionary.

Let's try it out on the data-quality value we observed above:

In [None]:
datamodels.dqflags.dqflags_to_mnemonics(2147483648, mnemonic_map=datamodels.dqflags.pixel)

Indeed, we get back what we knew --- that is a reference pixel! With this handy-dandy function, we can write a simple snippet to figure out the total tally of all bad pixels as follows:

In [None]:
# Create a dictionary that will save all types of bad pixels:
bad_pixels = {}

rows, columns = nis_seg1_dqinit.pixeldq.shape

# Iterate through every row and column:
for row in range(rows):
    
    for column in range(columns):

        # Extract the bad pixel flag(s) for the current pixel at (row, column):
        bps = datamodels.dqflags.dqflags_to_mnemonics(
            nis_seg1_dqinit.pixeldq[row, column], 
            mnemonic_map=datamodels.dqflags.pixel
            )

        # Iterate through the possible flags (it can be more than one!):
        for bp in bps:

            # If already in the bad_pixels dict, simply add 1 to the counter. If not, create and instantiate to one:
            if bp in bad_pixels.keys():

                bad_pixels[bp] += 1

            else:

                bad_pixels[bp] = 1

In [None]:
# Now print total tally:
total_pixels = rows * columns

# Table headers
header = f"| {'PIXEL DQ FLAG':<20} | {'# of Bad pixels':^19} | {'Fraction of Bad pixel (%)':^25} |"

# Table separator
separator = "+" + "-" * 22 + "+" + "-" * 21 + "+" + "-" * 27 + "+"

# Printing the table
print(separator)
print(header)
print(separator)
for bp in bad_pixels.keys():
    val = bad_pixels[bp]
    frac = 100*(val/float(total_pixels))
    print(f"| {bp:<20} | {val:^19} | {frac:^25.2f} |")
print(separator)

Based on our discussion above, some of these numbers make sense. For instance, the 10208 `REFERENCE_PIXEL` entries match the layout described earlier: the 4 left-most and 4 right-most columns span the full subarray height of 256 pixels (8 x 256 = 2048 pixels), and the 4 top-most rows contribute 4 x 2040 = 8160 more once the 8 columns already counted are excluded from the 2048-pixel width. That gives 2048 + 8160 = 10208 reference pixels, as expected. 
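The reference pixel count can be cross-checked without looping over pixels. The sketch below builds a synthetic mask with the layout described above (it does not read the real `pixeldq` array) and counts the flagged pixels with a vectorized bitwise test; the same `&` test could be applied to a real mask.

```python
import numpy as np

REFERENCE_PIXEL = np.uint32(2**31)
n_rows, n_cols = 256, 2048

# Synthetic mask for illustration only: 4 columns on each side and the 4 top rows
pixeldq = np.zeros((n_rows, n_cols), dtype=np.uint32)
pixeldq[:, :4] |= REFERENCE_PIXEL    # left-most columns
pixeldq[:, -4:] |= REFERENCE_PIXEL   # right-most columns
pixeldq[-4:, :] |= REFERENCE_PIXEL   # top-most rows

# A pixel carries the flag if the corresponding bit is set, whatever other flags it has
n_reference = int(np.count_nonzero(pixeldq & REFERENCE_PIXEL))
print(n_reference)  # 10208
```

The corner pixels are set twice by the assignments above but counted once, which is the same bookkeeping as excluding the 8 side columns from the top rows.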

Let's go ahead now and attach this bad pixel mask to all the segments of data:

In [None]:
nsegments = len(uncal_nis)
for i in range(nsegments):
    uncal_nis[i] = calwebb_detector1.dq_init_step.DQInitStep.call(uncal_nis[i])

<div class="alert alert-block alert-info"> <b>Note on saving data products with the JWST Calibration Pipeline</b>: Sometimes, one might find it useful to save data products after running each step into <code>.fits</code> files, so we can have "intermediate steps" stored in our system that we can check at a later time. This can be done when running any of the steps by adding the <code>save_results = True</code> flag to the step calls, e.g., <code>calwebb_detector1.dq_init_step.DQInitStep.call(uncal_nis[i], save_results = True)</code>. An output directory can also be defined by using the <code>output_dir</code> parameter.</div>

### 4.2 Identifying saturated pixels <a class="anchor" id="saturation"></a>

One very important detail in JWST data analysis involves checking which pixels are "saturated". Saturation in the JWST context is an [instrument-by-instrument defined upper signal level](https://jwst-docs.stsci.edu/methods-and-roadmaps/jwst-time-series-observations/jwst-time-series-observations-tso-saturation), typically defined as a limit above which detector effects not modeled by the pipeline might start to kick in. As such, identifying which pixels are above this limit is important. In particular, the pipeline tends to omit those pixels from most analyses, as they might introduce unwanted systematic effects.

#### 4.2.1 Running and understanding the `saturation` step

Through the analysis of calibration datasets, the JWST instrument teams have defined signal values for each pixel above which they are considered as "saturated". This identification is done through the `saturation` step --- the next step of the JWST pipeline for Detector 1. Let's run it for the very first segment of data for NIS:

In [None]:
# Run saturation step:
saturation_results = calwebb_detector1.saturation_step.SaturationStep.call(uncal_nis[0])

The saturation step works by primarily comparing the observed count values with the saturation signal-levels defined for each pixel in a reference file. As can be seen above, that reference file is indicated by the line `stpipe.SaturationStep - INFO - Using SATURATION reference file [yourfile]`. In the case of our run at the time of writing, this was the `jwst_niriss_saturation_0015.fits` file --- but this might change as new analyses are made and the reference files get updated. 

In addition, at the time of writing, the `saturation` step in the JWST Calibration pipeline [by default flags not only pixels that exceed the signal limit defined by the instrument teams but also all `n_pix_grow_sat` pixels around it](https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline-overview/jwst-operations-pipeline-build-information/jwst-operations-pipeline-build-8-0-release-notes#JWSTOperationsPipelineBuild8.0ReleaseNotes-charge_spilling); this parameter is currently set to a default of `1`. That means that if a given pixel exceeds the signal limit, all 8 pixels around it will be marked as saturated as well. This is done because it has been observed that "charge spilling" can be an issue, i.e., charge going from one pixel to another. While such migration of charge happens at a wide range of count levels, it is particularly dramatic when a pixel saturates, which is why this behavior is set in the pipeline.
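The effect of growing saturation flags by one pixel can be pictured with a small synthetic example. This is only an illustration of the idea using `scipy.ndimage.binary_dilation` with a square structuring element; it is not the pipeline's own implementation, and the 7 x 7 array is made up.

```python
import numpy as np
from scipy.ndimage import binary_dilation

# Synthetic saturation mask with a single saturated pixel in the middle
saturated = np.zeros((7, 7), dtype=bool)
saturated[3, 3] = True

# Grow the flag by n_pix_grow_sat pixels in every direction (including diagonals)
n_pix_grow_sat = 1
width = 2 * n_pix_grow_sat + 1
structure = np.ones((width, width), dtype=bool)
grown = binary_dilation(saturated, structure=structure)

print(grown.astype(int))
print(f"Flagged pixels: {grown.sum()}")  # the original pixel plus its 8 neighbours
```

With `n_pix_grow_sat = 1`, the single saturated pixel becomes a 3 x 3 block of 9 flagged pixels, which is the "all 8 pixels around it" behavior described above.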

We can check which pixels are saturated in a similar way as to how we checked the data-quality flags in [Section 4.1](#dqflags). The only difference with that analysis is that saturated pixels are integration and group-dependent, i.e., a property of a given pixel _in a given integration and group_. In other words, a pixel that is saturated in one integration and group might have "recovered" by the next integration and group.

To figure out the data-quality for all integrations and all groups we look at the `groupdq` attribute of our data products instead of the `pixeldq` which we used above. To familiarize ourselves with this, let's print the dimensions of this array first:

In [None]:
saturation_results.groupdq.shape

As expected, it has dimensions `(integrations, groups, row pixels, column pixels)`, just like the `data` array. The flags in the `groupdq` array follow the same structure as [all the data-quality flags described in the documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/references_general/references_general.html?highlight=data%20quality%20flags#data-quality-flags). 
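Since `groupdq` is a regular 4D integer array, saturated pixels can be counted for every integration and group at once with a bitwise test. The sketch below runs on a small synthetic array rather than on the real `groupdq`; the only value taken from the documentation is `SATURATED = 2`.

```python
import numpy as np

SATURATED = 2  # value of the SATURATED flag in the pipeline documentation
JUMP_DET = 4   # value of the JUMP_DET flag

# Synthetic groupdq with dimensions (integrations, groups, rows, columns)
groupdq = np.zeros((2, 5, 4, 4), dtype=np.uint8)
groupdq[0, 3:, 1, 2] = SATURATED           # saturates from group index 3 onwards
groupdq[1, 4, 0, 0] = SATURATED | JUMP_DET  # saturated and also flagged as a jump

# True wherever the SATURATED bit is set, regardless of any other flag
is_saturated = (groupdq & SATURATED) > 0

# Number of saturated pixels in each (integration, group)
n_sat_per_group = is_saturated.sum(axis=(2, 3))
print(n_sat_per_group)
```

Testing the bit with `&` rather than comparing with `==` matters here: the second pixel has the value 6, not 2, but is still saturated.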

#### 4.2.2 Exploring saturated pixels via the `groupdq` array

To illustrate how to use the `groupdq`, let's pick the last group of integration 10 again and see if any pixels seem to be saturated --- we also count all of the saturated pixels:

In [None]:
# Iterate through every row and column of integration number 10, last group:
integration, group = 10, -1
nsaturated = 0

# Index locations (row, column) of the saturated pixels found
row_idx = []
column_idx = []

verbose = False
for row in range(rows):
    
    for column in range(columns):

        # Extract the bad pixel flag(s) for the current pixel at (row, column):
        bps = datamodels.dqflags.dqflags_to_mnemonics(
            saturation_results.groupdq[integration, group, row, column], 
            mnemonic_map=datamodels.dqflags.pixel
            )
        
        # Check if pixel is saturated; if it is...
        if 'SATURATED' in bps:

            # ...print which pixel it is, and...
            if verbose:
                print('Pixel ({0:},{1:}) is saturated in integration 10, last group'.format(row, column))

            # ...count it:
            nsaturated += 1

            column_idx.append(column)
            row_idx.append(row)

print('\nA total of {0:} out of {1:} pixels ({2:.2f}%) are saturated'.format(nsaturated, 
                                                                             rows*columns, 
                                                                             100 * nsaturated / float(rows * columns)
                                                                            )
     )

As can be seen, not many pixels are saturated in a given group. Let's see what the up-the-ramp samples look like for one of those pixels, say pixel `(176, 1503)`. Let's show in the same plot the group data-quality flags at each group:

In [None]:
# pixel_row, pixel_column = row_idx[60], column_idx[60]
pixel_row, pixel_column = 176, 1503

plt.figure(figsize=(7, 4))
plt.title(f'Saturated Pixel: ({pixel_row}, {pixel_column})')
plt.plot(np.arange(saturation_results.data.shape[1])+1, 
         saturation_results.data[integration, :, pixel_row, pixel_column], 
         'o-', color='tomato')

plt.xlim(0.5, saturation_results.data.shape[1]+1.5)
plt.xlabel('Group number', fontsize=16)
plt.ylabel('Counts', fontsize=16, color='tomato')

plt.twinx()

plt.plot(np.arange(saturation_results.data.shape[1])+1, 
         saturation_results.groupdq[integration, :, pixel_row, pixel_column], 
         'o-', color='cornflowerblue')

plt.xlim(0.5, saturation_results.data.shape[1]+1.5)
plt.ylabel('Group Data-quality', fontsize=16, color='cornflowerblue')

plt.show()

Very interesting plot! Note that all groups appear to be saturated after group ~6 in this example. Likely a cosmic-ray hit happened at this group which left the pixel at a very high count number from group 6 up to the end of the ramp.
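The group at which a pixel first saturates can also be extracted programmatically instead of being read off a plot. The sketch below does this on a synthetic `groupdq` array in which one pixel saturates at the sixth group; the array and the `-1` sentinel for never-saturated pixels are assumptions made for the illustration.

```python
import numpy as np

SATURATED = 2  # value of the SATURATED flag in the pipeline documentation

# Synthetic groupdq: 1 integration, 9 groups, 3 x 3 pixels
groupdq = np.zeros((1, 9, 3, 3), dtype=np.uint8)
groupdq[0, 5:, 1, 1] = SATURATED  # central pixel saturated from the 6th group (index 5) onwards

is_saturated = (groupdq & SATURATED) > 0

# argmax along the group axis returns the first True; pixels that never
# saturate would return 0, so they are masked out with a -1 sentinel
ever_saturated = is_saturated.any(axis=1)
first_saturated_group = np.where(ever_saturated, is_saturated.argmax(axis=1), -1)

print(first_saturated_group[0])
```

The result has dimensions `(integrations, rows, columns)` and holds zero-based group indices, so a value of 5 corresponds to group number 6 in the plot's one-based numbering.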

#### 4.2.3 Setting custom saturation limits with the `saturation` reference file

TSOs often obtain data from bright stars that might quickly (i.e., within the first few groups) give rise to saturated pixels. As described in some early JWST results (see, e.g., [Rustamkulov et al., 2023](https://www.nature.com/articles/s41586-022-05677-y)), in some cases one might even want to be a bit more aggressive on the level of saturation allowed in a given dataset in order to improve the reliability of the results. As such, understanding how to modify the level of saturation allowed in a given dataset might turn out to be an important skill in real TSO data analysis. 

The key file that sets the limits used to call a pixel "saturated" is the reference file of the `saturation` step.

As discussed above, the name of this file can be seen directly in the outputs of the `saturation` step while it's running, but it's also saved in our data products:

In [None]:
saturation_results.meta.ref_file.saturation.name

We can actually load this reference file using the `SaturationModel` as follows:

In [None]:
# Base directory where reference files are stored (this was defined in the Setup section above):
base_ref_files = os.environ["CRDS_PATH"]+"/references/jwst/niriss/"

# Read it in (the [7:] slice strips the leading "crds://" from the reference file name):
saturation_ref_file = datamodels.SaturationModel(base_ref_files+saturation_results.meta.ref_file.saturation.name[7:])

More often than not, however, the saturation reference file might not match exactly the dimensions of our subarray. This is because the reference file might be padded to match several other subarrays, and thus we have to figure out how to "cut" it to match our data. This is, in fact, our case:

In [None]:
saturation_ref_file.data.shape

Luckily, the JWST calibration pipeline has a handy function to cut a reference file to the dimensions of a given dataset --- this is the `jwst.lib.reffile_utils.get_subarray_model` function, which receives an input data model (e.g., the one from our data) along with the reference file, and returns the same reference file model but with the right dimensions. Let's use it:

In [None]:
tailored_saturation_ref_file = jwst.lib.reffile_utils.get_subarray_model(saturation_results, saturation_ref_file)

Indeed, now our "tailored" reference file matches our science data dimensions:

In [None]:
tailored_saturation_ref_file.data.shape
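Conceptually, tailoring a reference file to a subarray amounts to slicing the larger array at the position of the subarray. A minimal sketch with plain `numpy` follows; it assumes the subarray location is described by 1-indexed corner coordinates and sizes (in the spirit of the `xstart`, `ystart`, `xsize` and `ysize` subarray metadata), and the numbers used are made up for illustration rather than read from the real NIRISS files. `cut_to_subarray` is a hypothetical helper:

```python
import numpy as np

def cut_to_subarray(full_frame, xstart, ystart, xsize, ysize):
    """Slice a full-frame 2D array down to a subarray.

    xstart and ystart are assumed to be 1-indexed corner coordinates."""
    y0, x0 = ystart - 1, xstart - 1
    return full_frame[y0:y0 + ysize, x0:x0 + xsize]

# Toy "full-frame" reference array (values are arbitrary).
full_frame = np.arange(2048 * 2048, dtype=np.float32).reshape(2048, 2048)

# Illustrative (made-up) subarray position: 256 rows by 2048 columns.
subarray = cut_to_subarray(full_frame, xstart=1, ystart=1793, xsize=2048, ysize=256)
print(subarray.shape)
```

The pipeline function used above does this bookkeeping for us, reading the subarray location from the metadata of the science data model.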

Let's see what the saturation map looks like for our subarray:

In [None]:
plt.figure(figsize=(10, 3))
plt.title('Saturation map for NIS (SUBSSTRIP256 subarray)')
im = plt.imshow(tailored_saturation_ref_file.data)
plt.colorbar(label='Counts')
plt.show()

There's clearly some structure, although it is not exactly clear what values different pixels take. To get a feel for this, let's print the saturation limit for pixel `(176, 1503)`, the one we explored above:

In [None]:
# Saturation limit for the saturated pixel explored above:
tailored_saturation_ref_file.data[pixel_row, pixel_column] 

If the counts surpass this limit, the pixel will be considered saturated. To see if this was the case, let's repeat the plot above marking this signal limit:

In [None]:
# Same pixel as in the plot above:
pixel_row, pixel_column = 176, 1503

plt.figure(figsize=(7, 4))
plt.title(f'Saturated Pixel: ({pixel_row}, {pixel_column})')
plt.plot(np.arange(saturation_results.data.shape[1])+1, 
         saturation_results.data[integration, :, pixel_row, pixel_column], 
         'o-', color='tomato')

plt.plot([1, saturation_results.data.shape[1]+1], 
         [tailored_saturation_ref_file.data[pixel_row, pixel_column], 
          tailored_saturation_ref_file.data[pixel_row, pixel_column]],
         'r--', 
         label='Signal limit in reference file'
        )

plt.xlim(0.5, saturation_results.data.shape[1]+1.5)
plt.xlabel('Group number', fontsize=16)
plt.ylabel('Counts', fontsize=16, color='tomato')
plt.legend()

plt.twinx()

plt.plot(np.arange(saturation_results.data.shape[1])+1, 
         saturation_results.groupdq[integration, :, pixel_row, pixel_column], 
         'o-', color='cornflowerblue')

plt.xlim(0.5, saturation_results.data.shape[1]+1.5)
plt.ylabel('Group Data-quality', fontsize=16, color='cornflowerblue')

plt.show()

Indeed, this is the case! Note that, as described above, by default for NIRISS not only this pixel gets marked as saturated, but all pixels around it as well. To see this, let's make the same plot as above for one of the neighboring pixels, pixel `(175, 1502)`:

In [None]:
pixel_row, pixel_column = 175, 1502

plt.figure(figsize=(7, 4))

plt.title(f'Same as above, but for neighboring pixel ({pixel_row},{pixel_column})')
plt.plot(np.arange(saturation_results.data.shape[1])+1, 
         saturation_results.data[integration, :, pixel_row, pixel_column], 
         'o-', color='tomato'
         )

plt.plot([1, saturation_results.data.shape[1]+1], 
         [tailored_saturation_ref_file.data[pixel_row, pixel_column], 
          tailored_saturation_ref_file.data[pixel_row, pixel_column]],
         'r--', 
         label='Signal limit in reference file'
        )

plt.xlim(0.5, saturation_results.data.shape[1]+1.5)
plt.xlabel('Group number', fontsize=16)
plt.ylabel('Counts', fontsize=16, color='tomato')
plt.legend()

plt.twinx()

plt.plot(np.arange(saturation_results.data.shape[1])+1, 
         saturation_results.groupdq[integration, :, pixel_row, pixel_column], 
         'o-', color='cornflowerblue')

plt.xlim(0.5, saturation_results.data.shape[1]+1.5)
plt.ylabel('Group Data-quality', fontsize=16, color='cornflowerblue')

plt.show()


Note how the signal level has not gone above the limit in the reference file, but the pixel is marked as saturated because pixel (176,1503) is. Again, this is to account for possible charge spilling to the pixel.
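The flagging of neighbors can be pictured as growing the saturation mask by one pixel in every direction. The sketch below is a toy illustration using `numpy` only and assuming a 3x3 neighborhood; it is not the pipeline's implementation, and `grow_mask` is a hypothetical helper:

```python
import numpy as np

def grow_mask(mask):
    """Return a copy of a 2D boolean mask in which every True pixel
    also flags its eight immediate neighbors."""
    padded = np.pad(mask, 1, mode='constant', constant_values=False)
    grown = np.zeros_like(mask, dtype=bool)
    nrows, ncols = mask.shape
    # OR together the nine shifted copies of the padded mask.
    for dy in range(3):
        for dx in range(3):
            grown |= padded[dy:dy + nrows, dx:dx + ncols]
    return grown

# Toy example: a single saturated pixel at (2, 2) in a 5x5 frame.
saturated = np.zeros((5, 5), dtype=bool)
saturated[2, 2] = True
grown = grow_mask(saturated)
print(grown.astype(int))
```

After growing, the pixels adjacent to `(2, 2)` are flagged even though their own counts never crossed any limit, which mirrors what the plot above shows for the neighboring pixel.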

Now, what if we wanted to mark as saturated all pixels whose counts exceed, say, 50\% of these saturation values? Well, we can directly modify the reference file and repeat the calculation pointing at it:

In [None]:
saturation_ref_file.data = saturation_ref_file.data * 0.5

To incorporate this new reference file, we simply use the `override_saturation` argument, passing this new `SaturationModel` along:

In [None]:
# Run saturation step:
saturation_results2 = calwebb_detector1.saturation_step.SaturationStep.call(uncal_nis[0], 
                                                                            override_saturation=saturation_ref_file)

Let's see how many pixels are now counted as saturated:

In [None]:
# Iterate through every row and column of integration number 10, last group:
integration, group = 10, -1
nsaturated = 0

verbose = False
for row in range(rows):
    
    for column in range(columns):

        # Extract the bad pixel flag(s) for the current pixel at (row, column):
        bps = datamodels.dqflags.dqflags_to_mnemonics(
            saturation_results2.groupdq[integration, group, row, column], 
            mnemonic_map=datamodels.dqflags.pixel
            )
        
        # Check if pixel is saturated; if it is...
        if 'SATURATED' in bps:

            # ...print which pixel it is, and...
            if verbose:
                print('Pixel ({0:},{1:}) is saturated in integration 10, last group'.format(row, column))

            # ...count it:
            nsaturated += 1

print('\nA total of {0:} out of {1:} pixels ({2:.2f}%) are saturated'.format(nsaturated, 
                                                                             rows*columns, 
                                                                             100 * nsaturated / float(rows * columns)
                                                                            )
     )

As expected, a much bigger portion! About 2.5\% of the pixels in the subarray are now flagged as saturated (against 0.01\% from before) thanks to our lower, more aggressive threshold for flagging.
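The pixel-by-pixel loop above is easy to follow but slow. The same count can be obtained with a single bitwise operation over the whole frame; the sketch below demonstrates this on a synthetic `groupdq` frame, assuming that `SATURATED` corresponds to bit value 2 in the JWST data-quality convention. `saturated_fraction` is a hypothetical helper:

```python
import numpy as np

SATURATED = 2  # assumed bit value of the SATURATED flag

def saturated_fraction(groupdq_frame):
    """Return (number, percentage) of pixels with the SATURATED bit set
    in a 2D group data-quality frame."""
    nsat = int(np.count_nonzero(groupdq_frame & SATURATED))
    return nsat, 100.0 * nsat / groupdq_frame.size

# Synthetic 256 x 2048 frame with 100 flagged pixels, ten of which carry a second flag.
rng = np.random.default_rng(42)
toy_dq = np.zeros((256, 2048), dtype=np.uint8)
flat_idx = rng.choice(toy_dq.size, size=100, replace=False)
toy_dq.flat[flat_idx] = 2
toy_dq.flat[flat_idx[:10]] = 6   # SATURATED plus another (assumed) flag with bit value 4

nsat, percent = saturated_fraction(toy_dq)
print('A total of {0:} pixels ({1:.4f}%) are saturated'.format(nsat, percent))
```

With the products of this notebook, the call would be `saturated_fraction(saturation_results2.groupdq[integration, group])`.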

<div class="alert alert-block alert-info"> <b>Note on manually setting the saturation limit</b>: Setting the saturation limit manually should be done with care, and we recommend trying different saturation levels to check whether TSO science is impacted by this choice. In particular, we suggest to <i>never</i> set limits that are above the thresholds defined by the instrument teams, as these are typically set to levels above which the non-linearity correction (see below) is not expected to work.</div>

Before moving to the next step, let's run the saturation step on the other NIS segments:

In [None]:
for i in range(nsegments):
    uncal_nis[i] = calwebb_detector1.saturation_step.SaturationStep.call(uncal_nis[i])

### 4.3 Removing detector-level effects: the `superbias` and `refpix` steps <a class="anchor" id="refpix"></a>

So far, we have focused on flagging pixels for various effects (e.g., bad pixels, saturation) but we haven't worked directly with the actual counts on our data. In this Section, we deal with various (non-astrophysical) detector-level effects present in our data through two steps in the JWST Calibration pipeline: the `superbias` and the `refpix` steps. 

#### 4.3.1 Removing the pedestal from the detector: the `superbias` step

All detectors have mostly stable, factory-defined pedestal levels, which can be closely monitored with the right calibration exposures. Indeed, instrument teams closely monitor and refine this via what is called the "super" bias --- the spatial shape of this pedestal. The JWST Calibration pipeline subtracts this pedestal from the data via the `superbias` step.
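Conceptually, the correction amounts to subtracting the same 2D pedestal map from every group of every integration. A minimal sketch with synthetic arrays follows; the array sizes and values are made up, and the bookkeeping the real step performs (e.g., data-quality flags and subarray extraction) is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pedestal map with shape (rows, columns).
superbias = rng.normal(10000.0, 500.0, size=(8, 16))

# Synthetic ramps with shape (integrations, groups, rows, columns):
# 5 groups accumulating 100 counts per group on top of the pedestal.
signal = np.arange(1, 6, dtype=float)[None, :, None, None] * 100.0
raw = signal + superbias + np.zeros((3, 5, 8, 16))

# NumPy broadcasting applies the 2D map to every group of every integration.
corrected = raw - superbias

print(raw[0, -1].mean(), corrected[0, -1].mean())
```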

Applying this correction to the data is very simple to do; let's apply it once again to the first segment of data for the NIS detector, so we can check how our data changes after applying the step:

In [None]:
superbias_results = calwebb_detector1.superbias_step.SuperBiasStep.call(uncal_nis[0])

Once again, we can see that there is a particular reference file being used to remove the pedestal, `jwst_niriss_superbias_0200.fits`, which can be explored in a similar way as how we explored the reference file for the `saturation` step above. Let's see how our data changed after applying this pedestal removal --- let's again take the last group of integration 10 as an example:

In [None]:
fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True, sharey=True)

ax1, ax2 = axes.ravel()

# Plot before step
ax1.set_title('Before the Superbias step:')
ax1.set_ylabel('y [pixel]')
im1 = ax1.imshow(uncal_nis[0].data[10, -1, :, :] / np.nanmedian(uncal_nis[0].data[10, -1, :, :]))
im1.set_clim(-3, 2)
fig.colorbar(im1, ax=ax1, label='Normalized (to median) fluence')

# Plot after step
ax2.set_title('After the Superbias step:')
ax2.set_xlabel('x [pixel]')
ax2.set_ylabel('y [pixel]')
im2 = ax2.imshow(superbias_results.data[10, -1, :, :] / np.nanmedian(superbias_results.data[10, -1, :, :]))
im2.set_clim(-3, 2)
fig.colorbar(im2, ax=ax2, label='Normalized (to median) fluence')

plt.tight_layout()
plt.show()

Wow! That's a huge change. Overall, this looks much better, and the 3rd diffraction order that was previously being suppressed by detector effects is now visible. Let's plot the profiles of pixel column index 1500 to have a closer look:

In [None]:
column_index = 1500

fig, axes = plt.subplots(1, 2, figsize=(15, 5))

ax1, ax2 = axes.ravel()

ax1.plot(uncal_nis[0].data[10, -1, :, column_index], label='Before the Superbias step')
ax1.plot(superbias_results.data[10, -1, :, column_index], label='After the Superbias step')
ax1.set_xlabel('Row pixel index', fontsize=14)
ax1.set_ylabel('Counts', fontsize=14)
ax1.set_title('Comparison before/after Superbias step', fontsize=14)
ax1.legend()

ax2.set_title('Same, but median-subtracted counts', fontsize=14)
# Subtract the median of the same integration (10) that is being plotted:
ax2.plot(uncal_nis[0].data[10, -1, :, column_index] - np.nanmedian(uncal_nis[0].data[10, -1, :, column_index]))
ax2.plot(superbias_results.data[10, -1, :, column_index] - np.nanmedian(superbias_results.data[10, -1, :, column_index]))
ax2.set_xlabel('Row pixel index', fontsize=14)
ax2.set_ylabel('Counts - Median Counts', fontsize=14)

plt.show()

As can be seen, a ton of structure has been removed. Also, all the pixels seem to be at the same background level. This is a good sign that the Superbias correction has worked, in principle, correctly. 

However, if we take a more detailed look at background pixels, we can note an interesting pattern. Let's plot a similar cut to the one above, but for column 250 --- which is far away from any illuminated pixels in the detector. Let's also plot the last superbias-corrected group and the second-to-last superbias-corrected group:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

ax1, ax2 = axes.ravel()

ax1.plot(superbias_results.data[10, -1, :, 250])
ax1.plot([0, 32], [0, 0], 'k--')
ax1.set_xlabel('Row pixel index', fontsize=14)
ax1.set_ylabel('Counts', fontsize = 14)
ax1.set_title('Superbias-corrected close-up, last group, integration 10', fontsize=14)
ax1.set_ylim(-250, 250)
ax1.set_xlim(0, 31)

ax2.set_title('Superbias-corrected close-up, second-to-last group, integration 10', fontsize=14)
ax2.plot(superbias_results.data[10, -2, :, 250])
ax2.plot([0, 32], [0, 0], 'k--')
ax2.set_ylim(-250,250)
ax2.set_xlim(0,31)
ax2.set_xlabel('Row pixel index', fontsize=14)
ax2.set_ylabel('Counts', fontsize=14)

plt.show()

Note how the pedestal correction applied by the `superbias` step has **not** brought the background down to **exactly** zero. The reason for this behavior is that there are other, group-dependent detector effects that need to be removed. These are the ones the so-called "reference pixels" in the detector aim to correct for, which is done in the JWST Calibration pipeline via the `refpix` step --- the step we cover next in this Notebook.

#### 4.3.2 Removing group-dependent detector effects: the `refpix` step

All the JWST detectors contain reference pixels, typically located along some (or all) of the edges of the detectors. These are pixels whose "sensitivity to light" has been deactivated, and they are thus useful for tracking detector-level effects happening at the time of our observations. While all detectors have those, **not all detector subarrays** contain reference pixels. Some, like in our case, contain reference pixels only in certain portions of the subarray.

Let's visualize where those reference pixels are in our subarray by using the `pixeldq` flags:

In [None]:
# Create an array that will save locations of reference pixels:
reference_pixels = np.zeros([rows, columns])

# Iterate through every row and column:
for row in range(rows):
    
    for column in range(columns):

        # Extract the bad pixel flag(s) for the current pixel at (row, column):
        bps = datamodels.dqflags.dqflags_to_mnemonics(
            superbias_results.pixeldq[row, column], 
            mnemonic_map=datamodels.dqflags.pixel)

        if 'REFERENCE_PIXEL' in bps:

            reference_pixels[row, column] = 1

In [None]:
plt.figure(figsize=(12, 4))

plt.title('Location of reference pixels in the subarray: 4 columns/rows')
im = plt.imshow(reference_pixels)

# Arrows to indicate edges:
plt.text(1800-70, 32, 'Right Edge Ref Pixels', color='yellow')
plt.arrow(1780, 16, 150, -1, width=5, head_width=10, head_length=100, color='white')

plt.text(30, 32, 'Left Edge Ref Pixels', color='yellow')
plt.arrow(268, 16, -150, -1, width=5, head_width=10, head_length=100, color='white')

# Plot vertical arrow for the top reference pixels:
plt.text(1044, 170, 'Top Ref Pixels', color='yellow', rotation=90)
plt.arrow(1024, 180, 0, 150 // 4, width=5 * 2, head_width=10 * 2, head_length=100 / 2 / 2, color='white')

im.set_clim(-0.5, 0.5)

plt.show()



Note the white/yellow edges to the left and the right of the plot above, indicated by arrows, which show the location of the reference pixels for our subarray. In other words, our subarray has reference pixels at the top of the frame and on the left- and right-most sides, but not at the bottom.

Let's apply the `refpix` step to check what our data looks like after it:

In [None]:
refpix_results = calwebb_detector1.refpix_step.RefPixStep.call(superbias_results)

Let's plot the before and after applying this step:

<!-- Let's plot once again the figures above. First, a vertical cut of the profile at pixel column index 250: -->

In [None]:
fig, axes = plt.subplots(3, 1, figsize=(10, 9), sharex=True, sharey=True)

ax1, ax2, ax3 = axes.ravel()

# Plot before step
ax1.set_title('Before the RefPix step:')
ax1.set_ylabel('y [pixel]')
im1 = ax1.imshow(superbias_results.data[0, -1, :, :] / np.nanmedian(superbias_results.data[0, -1, :, :]))
im1.set_clim(-2, 2)
fig.colorbar(im1, ax=ax1, label='Normalized (to median) fluence')

# Plot after step
ax2.set_title('After the RefPix step:')
ax2.set_xlabel('x [pixel]')
ax2.set_ylabel('y [pixel]')
im2 = ax2.imshow(refpix_results.data[0, -1, :, :] / np.nanmedian(refpix_results.data[0, -1, :, :]))
im2.set_clim(-2, 2)
fig.colorbar(im2, ax=ax2, label='Normalized (to median) fluence')

# Plot difference
ax3.set_title('Difference:')
ax3.set_xlabel('x [pixel]')
ax3.set_ylabel('y [pixel]')
im3 = ax3.imshow(superbias_results.data[0, -1, :, :] - refpix_results.data[0, -1, :, :])
fig.colorbar(im3, ax=ax3, label='Difference (counts)')

plt.tight_layout()
plt.show()

That looks much better. Several things have been removed, including some interesting high-frequency noise happening in the rows, as shown in the difference between the before and after. That's the so-called odd-even effect, which the `refpix` step takes care of efficiently thanks to the reference pixels (pixels insensitive to light) in the detector.
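The idea behind an odd-even correction can be illustrated with a toy frame in which alternating columns carry different offsets, which are estimated from light-insensitive reference pixels and then subtracted. This is only a schematic sketch on synthetic data: the real `refpix` step is considerably more elaborate, and the orientation of the effect in a given subarray may differ from the one assumed here. `subtract_odd_even` is a hypothetical helper:

```python
import numpy as np

def subtract_odd_even(frame, ref_rows):
    """Subtract the mean reference-pixel level of even and odd columns separately.

    frame    : 2D array (rows, columns)
    ref_rows : slice selecting the light-insensitive reference rows
    """
    corrected = frame.astype(float).copy()
    for parity in (0, 1):
        offset = np.mean(frame[ref_rows, parity::2])
        corrected[:, parity::2] -= offset
    return corrected

rng = np.random.default_rng(1)
frame = rng.normal(0.0, 1.0, size=(32, 64))
frame[:, 0::2] += 30.0   # even columns sit 30 counts high
frame[:, 1::2] -= 20.0   # odd columns sit 20 counts low

# Treat the last four rows as reference pixels for this toy example.
cleaned = subtract_odd_even(frame, slice(-4, None))

print(frame[:, 0::2].mean() - frame[:, 1::2].mean())      # odd-even offset before
print(cleaned[:, 0::2].mean() - cleaned[:, 1::2].mean())  # odd-even offset after
```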

As can be seen, most of the detector structure is taken care of up to this point. The backgrounds now sit nicely slightly above zero, as they should; most detector effects are gone and the group looks much cleaner. It is very instructive to do this kind of visual check on real data, as these effects can significantly impact the final achieved S/N if not properly accounted for.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 6))

ax1, ax2 = axes.ravel()

ax1.plot(superbias_results.data[0, -1, :, column_index], label='Before the RefPix step')
ax1.plot(refpix_results.data[0, -1, :, column_index], label='After the RefPix step')
ax1.set_xlabel('Row pixel index', fontsize=14)
ax1.set_ylabel('Counts', fontsize=14)
ax1.set_title('Comparison before/after RefPix step', fontsize=14)
ax1.legend()


ax2.set_title('Same, but median-subtracted counts', fontsize=14)
ax2.plot(superbias_results.data[0, -1, :, column_index] - np.nanmedian(superbias_results.data[0, -1, :, column_index]))
ax2.plot(refpix_results.data[0, -1, :, column_index] - np.nanmedian(refpix_results.data[0, -1, :, column_index]))
ax2.set_xlabel('Row pixel index', fontsize=14)
ax2.set_ylabel('Counts - Median Counts', fontsize=14)

plt.tight_layout()
plt.show()

We can see there's been a minor improvement in the results. 

It is interesting to note that the "banding" on the columns, as discussed above, has not disappeared. This is more evident when plotting a series of groups from different integrations; let's plot the last group of integrations 10, 11 and 12:

In [None]:
# Integration 10, last group
plt.figure(figsize=(10, 3))
plt.title('Integration 10, last group')
im = plt.imshow(refpix_results.data[10, -1, :, :])
im.set_clim(-200, 400)
plt.colorbar(label='Counts')

# Integration 11, last group
plt.figure(figsize=(10, 3))
plt.title('Integration 11, last group')
im = plt.imshow(refpix_results.data[11, -1, :, :])
im.set_clim(-200, 400)
plt.colorbar(label='Counts')

# Integration 12, last group
plt.figure(figsize=(10, 3))
plt.title('Integration 12, last group')
im = plt.imshow(refpix_results.data[12, -1, :, :])
im.set_clim(-200, 400)
plt.colorbar(label='Counts')

plt.show()

This is, once again, expected as there are no reference pixels in the columns. We will explore how to correct this after going through the `linearity` correction/step, which we discuss next. Before moving on, we apply the superbias and reference pixel steps to all NIS segments:

In [None]:
for i in range(nsegments):
    # Apply superbias and refpix to all NIS segments:
    uncal_nis[i] = calwebb_detector1.superbias_step.SuperBiasStep.call(uncal_nis[i])
    uncal_nis[i] = calwebb_detector1.refpix_step.RefPixStep.call(uncal_nis[i])

### 4.4 Linearity corrections <a class="anchor" id="linearity"></a>

As a pixel accumulates charge, it becomes less and less efficient at generating charge-carriers and/or holding that charge in place. A consequence of this is that the raw -uncalibrated- up-the-ramp samples in JWST detectors are non-linear, with the pixels at lower fluences being almost linear and pixels near the saturation ranges deviating significantly from this behavior. This is the behavior that the `linearity` step in the JWST Calibration pipeline aims to fix.
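The pipeline documentation describes this correction as a per-pixel polynomial evaluated on the measured counts. The sketch below illustrates the idea for a single synthetic ramp; the non-linearity model and the resulting coefficients are invented for illustration and are not taken from any reference file:

```python
import numpy as np

# Synthetic "true" linear ramp: 9 groups at 3000 counts per group.
group = np.arange(1, 10)
true_counts = 3000.0 * group

# Invented non-linearity: progressively fewer counts are recorded at high fluence.
measured = true_counts - 4e-6 * true_counts**2

# Calibrate a cubic polynomial that maps measured counts back to true counts
# (in practice, such coefficients would come from a reference file).
coeffs = np.polyfit(measured, true_counts, deg=3)

# Apply the correction: corrected = c0 + c1*S + c2*S**2 + c3*S**3.
corrected = np.polyval(coeffs, measured)

print(np.max(np.abs(measured - true_counts)))   # largest deviation before the correction
print(np.max(np.abs(corrected - true_counts)))  # largest deviation after the correction
```

Because the polynomial is only an approximation to the true detector behavior, small residuals remain after the correction, which is why checks of its accuracy are worthwhile.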

#### 4.4.1 Visualizing and correcting for non-linearity with the `linearity` step

To visualize the non-linearity of the up-the-ramp samples, let's take a look at the samples of one of the brightest pixels in our subarray, pixel `(45, 1600)` --- say for integration number 10. Let's plot on top a line fitted to the first groups of the ramp (the number of groups used is set by `first_groups` in the cell below), which should be the most "linear" of all groups:

In [None]:
ngroups = uncal_nis[0].data.shape[1]
group = np.arange(ngroups) + 1

first_groups = 9
i1, i2 = 45, 1600

coeff = np.polyfit(group[:first_groups], uncal_nis[0].data[10, :first_groups, i1, i2], 1)

fig, axes = plt.subplots(2, 1, figsize=(6, 6), constrained_layout=True)

ax1, ax2 = axes.ravel()

# plot ramp samples
ax1.set_title(f'Up-the-ramp sample, integration 10, pixel ({i1}, {i2})')
ax1.plot(group, uncal_nis[0].data[10, :, i1, i2], 'o-', color='black', mfc='white', label='Up-the-ramp samples')
ax1.plot(group, np.polyval(coeff, group), 'r--', label='Linear fit to first ' + str(first_groups) + ' groups')
ax1.set_xlabel('Group number', fontsize=16)
ax1.set_ylabel('Counts', fontsize=16)
ax1.legend()
ax1.set_xlim(0.5, 9.5)

# plot residuals
ax2.plot(group, uncal_nis[0].data[10, :, i1, i2] - np.polyval(coeff, group), 'o-')
ax2.set_xlabel('Group number', fontsize=16)
ax2.set_ylabel("residuals")
plt.show()

Ah --- the ramp is _clearly_ non-linear! Let's apply the `linearity` step to the very first segment to see how well this gets corrected:

In [None]:
# Run linearity step:
linearity_results = calwebb_detector1.linearity_step.LinearityStep.call(uncal_nis[0])

Let's try the same plot as above, but with the linearity-corrected data:

In [None]:
ngroups = uncal_nis[0].data.shape[1]
group = np.arange(ngroups) + 1

first_groups = 9

coeff = np.polyfit(group[:first_groups],  linearity_results.data[10, :first_groups, i1, i2], 1)

fig, axes = plt.subplots(2, 1, figsize=(6, 6), constrained_layout=True)

ax1, ax2 = axes.ravel()

# plot ramp samples
ax1.set_title('Same as above, linearity-corrected')
ax1.plot(group, linearity_results.data[10, :, i1, i2], 'o-', color='black', mfc='white', label='Up-the-ramp samples')
ax1.plot(group, np.polyval(coeff, group), 'r--', label='Linear fit to first ' + str(first_groups) + ' groups')
ax1.set_xlabel('Group number', fontsize=16)
ax1.set_ylabel('Counts', fontsize=16)
ax1.legend()
ax1.set_xlim(0.5, 9.5)

# plot residuals
ax2.plot(group, linearity_results.data[10, :, i1, i2] - np.polyval(coeff, group), 'o-')
ax2.set_xlabel('Group number', fontsize=16)
ax2.set_ylabel("residuals")
plt.show()

Ah, much better! Notice how the residuals have improved and are much closer to zero.

#### 4.4.2 Testing the accuracy of the `linearity` step

It is important to realize that the linearity corrections that the JWST Calibration pipeline applies through the `linearity` step are _not_ perfect. While this is difficult to see with a single integration, this can be studied with multiple integrations --- which helps us beat the noise embedded on single up-the-ramp samples. 

One trick to glance at how the linearity of the up-the-ramp samples evolves as one goes up the ramp is to note that, if the detector is linear, it doesn't matter which up-the-ramp sample one looks at: the **fluence level should change from group-to-group at _the same rate_ on average**. So one can quickly investigate if linearity is an issue (and if the pipeline is correctly correcting for it) by:

1. Taking the difference in fluence between two subsequent groups (say, the last two).
2. Taking the difference in fluence between two _other_ subsequent groups (say, the first two).
3. Taking the ratio between those differences.

If the detector is linear, then all the pixels should fall around a ratio of 1. Do they? Let's try this experiment out. Let's first take the difference of the last two and first two groups for all the pixels of all the integrations of the **uncorrected** data --- then take the ratio of those. As we saw above, this should scream "non-linearity" all over!

In [None]:
last_pair = uncal_nis[0].data[:, -1, :, :] - uncal_nis[0].data[:, -2, :, :]
first_pair = uncal_nis[0].data[:, 1, :, :] - uncal_nis[0].data[:, 0, :, :]

In [None]:
ratio = last_pair / first_pair

Let's now flatten those arrays and plot them as a function of total fluence at the very last group. If linearity weren't an issue, all of these should line up around 1 (this may take a few seconds):

In [None]:
flattened_ratio = ratio.flatten()
flattened_fluences = uncal_nis[0].data[:, -1, :, :].flatten()

In [None]:
plt.figure(figsize = (6, 4))
plt.plot(flattened_fluences, flattened_ratio, '.', alpha=0.01, color='black')
plt.plot([0,35000], [1., 1.], 'r--')
plt.ylim(0.5, 1.5)
plt.xlim(0, 35000)
plt.xlabel('Fluence at the last group (counts)', fontsize=14)
plt.ylabel('(Last / First) Group differences', fontsize=14)
plt.title('No linearity correction', fontsize=14)

plt.show()

Indeed, the data does _not_ line up around 1. So linearity _is_ an issue, and more so the larger the flux received (as we already observed in the up-the-ramp samples before)!

All right, let's try the same experiment but now on the linearity-corrected data:

In [None]:
corrected_last_pair = linearity_results.data[:, -1, :, :] - linearity_results.data[:, -2, :, :]
corrected_first_pair = linearity_results.data[:, 1, :, :] - linearity_results.data[:, 0, :, :]
corrected_ratio = corrected_last_pair / corrected_first_pair

Let's plot:

In [None]:
flattened_corrected_ratio = corrected_ratio.flatten()
flattened_corrected_fluences = linearity_results.data[:, -1, :, :].flatten()

In [None]:
plt.figure(figsize = (6, 4))
plt.plot(flattened_corrected_fluences, flattened_corrected_ratio, '.', alpha=0.005, color='black')
plt.plot([0, 35000], [1., 1.], 'r--')
plt.ylim(0.5, 1.5)
plt.xlim(0, 35000)
plt.xlabel('Fluence at the last group (counts)', fontsize=14)
plt.ylabel('(Last / First) Group differences', fontsize=14)
plt.title('After linearity correction', fontsize=14)
plt.show()

That looks **much** better. Note, however, that as discussed above the corrections are *not* perfect. In particular, below about 20,000 counts the correction seems to make the last group difference slightly larger than the first group difference; this changes for the larger fluences, where the last group difference seems to be _smaller_ than the first group difference. This is actually consistent with a _charge migration_ hypothesis, in which pixels that receive larger fluences _lose_ charge to their neighboring pixels. Testing this hypothesis is, of course, outside the scope of the present Notebook --- but this showcases that plots like the ones above are fundamental to make sense of data and the overall accuracy and precision of non-linearity corrections.
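Because the scatter of individual pixels is large, trends like these are easier to quantify by taking the median ratio in bins of fluence. A minimal sketch of such a binning is given below, run here on synthetic data; `binned_median` is a hypothetical helper and the bin edges are arbitrary choices:

```python
import numpy as np

def binned_median(x, y, bin_edges):
    """Return bin centers and the median of y in each bin of x.

    Non-finite values are ignored; empty bins yield NaN."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    good = np.isfinite(x) & np.isfinite(y)
    x, y = x[good], y[good]
    which_bin = np.digitize(x, bin_edges) - 1
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    medians = np.full(centers.size, np.nan)
    for i in range(centers.size):
        in_bin = which_bin == i
        if in_bin.any():
            medians[i] = np.median(y[in_bin])
    return centers, medians

# Synthetic stand-in for the flattened fluences and ratios: the ratio decreases
# linearly from 1.0 to 0.9 across the fluence range, with added noise.
rng = np.random.default_rng(3)
fluences = rng.uniform(0.0, 35000.0, size=50000)
ratios = 1.0 - 0.1 * fluences / 35000.0 + rng.normal(0.0, 0.05, size=fluences.size)

edges = np.linspace(0.0, 35000.0, 8)
centers, medians = binned_median(fluences, ratios, edges)
print(np.column_stack([centers, medians]))
```

With the arrays of this notebook, the call would be `binned_median(flattened_corrected_fluences, flattened_corrected_ratio, edges)`; overplotting the result on the figure above makes departures from 1 much easier to read.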

Before moving to the next step, we apply the `linearity` step to all our data:

In [None]:
for i in range(nsegments):
    # Apply the linearity step to NIS segments:
    uncal_nis[i] = calwebb_detector1.linearity_step.LinearityStep.call(uncal_nis[i])

### 4.5 Removing the Dark Current <a class="anchor" id="dark-current"></a>

One of the last steps before the most computationally expensive steps in the pipeline is the `dark_current` step. This step grabs a reference file that contains the dark current at each group, and applies the same correction to every integration.

It is unclear if this step is helpful at all for TSOs, where signals are typically high (and thus, the dark current is but a very small addition to the total current gathered in a TSO), but we go ahead and apply this step nonetheless to our data. First, to check what changes this step makes to our data, we apply it on the first NIS segment:

In [None]:
# Run the darkcurrent step:
darkcurrent_results = calwebb_detector1.dark_current_step.DarkCurrentStep.call(uncal_nis[0])

Let's compare the products before and after the dark current correction:

In [None]:
# Plot them:
plt.figure(figsize=(10, 3))
im = plt.imshow(uncal_nis[0].data[10, -1, :, :] / np.nanmedian(uncal_nis[0].data[10, -1, :, :]))
im.set_clim(-3, 2)
plt.colorbar(label='Normalized (to median) fluence')
plt.title('Before the DarkCurrent step:')

# Plot them:
plt.figure(figsize=(10, 3))
im = plt.imshow(darkcurrent_results.data[10, -1, :, :] / np.nanmedian(darkcurrent_results.data[10, -1, :, :]))
im.set_clim(-3, 2)
plt.title('After the DarkCurrent step:')
plt.colorbar(label='Normalized (to median) fluence')

plt.show()

Difficult to see the impact from those simple plots. 

Let's quantify "how much" dark current there is by simply calculating the average (across integrations) percentage of dark current on the last group. The reason for doing this in the last group is that this is the group that accumulates _the most_ dark current --- as dark current grows as a function of the number of groups.

To do this, we consider that for a non-dark current corrected signal $S_{DC}$, if we subtract the dark-current corrected signal $S_{DC, corrected}$ we get the dark current signal back, i.e., $S_{DC} - S_{DC, corrected} = DC$; dividing this by the dark-current corrected signal (and multiplying by 100) gives us the percentage of dark current signal on each pixel. Let's calculate a map of this for all integrations and take the median of those:
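Written out explicitly in the notation just introduced (with $P$ a symbol introduced here for the percentage), the quantity computed for each pixel is

$$P = 100 \times \frac{S_{DC} - S_{DC, corrected}}{S_{DC, corrected}} = 100 \times \frac{DC}{S_{DC, corrected}}.$$

As a purely hypothetical example, a pixel with 1,050 counts before the correction and 1,000 counts after it would have $DC = 50$ counts and thus $P = 5\%$.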

In [None]:
# Calculate the (average) percent change of the signal across all integrations --- this is (Dark Signal) / ("Real" signal):
percent = ((linearity_results.data[:, -1, :, :] - darkcurrent_results.data[:, -1, :, :]) / 
           darkcurrent_results.data[:, -1, :, :]) * 100

percent = np.nanmedian(percent, axis=0)

# Plot --- the color scale is bounded between 0 and 25%:
plt.figure(figsize=(10, 3))
im = plt.imshow(percent)
im.set_clim(0, 25)
plt.colorbar(label='% of Dark Signal')
plt.title('Median impact of dark signal on the last group')
plt.show()

All right, so there _is_ an impact on the order of ~3-7% for the last group, at least on the left-hand side of the detector where there is not a lot of signal --- i.e., to the left of pixel column 500. To the right of this, it seems the impact is very low, of order ~5\% in background pixels (i.e., close to the upper and lower edges) and even lower at the location of the spectrum itself --- less than ~0.03\% at the peak signal level.

How much the above impacts a given TSO must be defined on a target-by-target basis. In the worst-case scenarios, this impact might not be purely aesthetic --- this dark current can give rise to transit depth dilutions in transiting exoplanet science, for instance; just as any unaccounted-for background signal would.

In the case of this notebook, we apply it nonetheless to all the detector-level data in all segments, but we leave it as an exercise to the reader to perform a full re-reduction with and without the dark-current correction to see the impact of this step on this particular dataset:

In [None]:
for i in range(nsegments):
    # Apply the dark_current step to NIS segments:
    uncal_nis[i] = calwebb_detector1.dark_current_step.DarkCurrentStep.call(uncal_nis[i])

### 4.6 Correcting 1/f noise <a class="anchor" id="one_over_f"></a>

For TSOs, there is some discussion in the literature about whether attempting to remove 1/f noise at the group-level, at the rate-level (i.e., after all the steps in <code>detector1</code>) or both is the way to go --- and whether simplistic algorithms provide a quick means of removing this source of noise. The reality is that, at the time of writing, the jury is still out on the final answer. We thus encourage readers to try different methodologies and find the one that works best for their scientific use-case. As a start, an interested reader might, e.g., skip any group-level 1/f removal and simply try to remove it at the rate-level --- or perform no removal at all, and see differences in the final lightcurve precision.
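As an illustration of what a simplistic group-level algorithm might look like, the sketch below removes a per-column offset from each group, estimated as the median over unilluminated pixels of the difference between that group and the median frame across integrations. This is a toy example on synthetic data and not the method of any particular pipeline; the function name, the background mask, the assumption that the noise appears as column-wise offsets, and the assumption that the astrophysical scene is constant in time (only approximately true for a TSO) are all choices made here for illustration:

```python
import numpy as np

def remove_column_offsets(data, background_mask):
    """Subtract per-column offsets from every group of every integration.

    data            : 4D array (integrations, groups, rows, columns)
    background_mask : 2D boolean array, True where pixels are unilluminated
    """
    # The median frame per group (across integrations) stands in for the static scene.
    median_frame = np.nanmedian(data, axis=0, keepdims=True)
    residuals = data - median_frame
    # Ignore illuminated pixels when estimating the offsets.
    residuals = np.where(background_mask[None, None, :, :], residuals, np.nan)
    offsets = np.nanmedian(residuals, axis=2, keepdims=True)
    return data - offsets

# Synthetic data: a toy spectral trace plus column-wise stripes and white noise.
rng = np.random.default_rng(7)
nints, ngroups, nrows, ncols = 20, 3, 32, 16
scene = np.zeros((nrows, ncols))
scene[12:20, :] = 1000.0
stripes = rng.normal(0.0, 20.0, size=(nints, ngroups, 1, ncols))
data = scene + stripes + rng.normal(0.0, 1.0, size=(nints, ngroups, nrows, ncols))

background = scene == 0
cleaned = remove_column_offsets(data, background)

# Scatter in time of one background pixel, before and after the correction:
print(np.std(data[:, 0, 0, 0]), np.std(cleaned[:, 0, 0, 0]))
```

Note that this approach only removes the time-variable part of the offsets; anything common to all integrations is absorbed by the median frame.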


### 4.7 Detecting "jumps" in up-the-ramp samples <a class="anchor" id="jump"></a>

When a cosmic ray hits a JWST detector, it affects the up-the-ramp samples by making them "[jump](https://www.youtube.com/watch?v=SwYN7mTi6HM)" from one group to the next. We already saw this happen [when we discussed saturation](#saturation): a pixel was suddenly pushed above the saturation limit and the `saturation` step flagged it. However, other jumps are not as dramatic, and the data after the jump might actually be as usable as the data before it.


#### 4.7.1 Understanding jumps and the `jump` step

To illustrate how jumps look in up-the-ramp samples, let's look at an example. Consider the behavior of pixel index `(182, 213)` in integration `67`:

In [None]:
jump_results = calwebb_detector1.jump_step.JumpStep.call(refpix_results, maximum_cores='all')

In [None]:
plt.imshow(uncal_nis[0].data[67, -1] - uncal_nis[0].data[68, -1], vmin=-100, vmax=100)
plt.colorbar()

In [None]:
plt.figure(figsize=(6, 4))

# possible examples
# column_index = 998
# row_index = 8

column_index = 213
row_index = 182

plt.title(f'Pixel index ({row_index}, {column_index})')

group = np.arange(uncal_nis[0].data.shape[1])
plt.plot(group+1, uncal_nis[0].data[67, :, row_index, column_index], 'o-', 
         color='black', mfc='white', label='Integration 67')
plt.plot(group+1, uncal_nis[0].data[66, :, row_index, column_index], 'o-', 
         color='tomato', mfc='white', label='Integration 66', alpha=0.5)
plt.plot(group+1, uncal_nis[0].data[68, :, row_index, column_index], 'o-', 
         color='cornflowerblue', mfc='white', label='Integration 68', alpha=0.5)

plt.xlabel('Group number', fontsize=16)
plt.ylabel('Counts', fontsize=16)
plt.legend()
plt.show()

While the intercepts of the different up-the-ramp samples differ slightly, their _slopes_ (i.e., the count rates) are fairly similar for integrations 66, 67 and 68. However, integration 67 shows a clear jump at group 4, likely from a cosmic ray. Let's take a look at what happened in this integration and group in the 2D spectrum:

In [None]:
i_group = 3
i_integration = 66

plt.figure(figsize=(15, 4))

plt.subplot(1, 3, 1)
im = plt.imshow(uncal_nis[0].data[i_integration, i_group, :, :])
im.set_clim(-100, 1000)
plt.xlim(column_index-5, column_index+5)
plt.ylim(row_index-5, row_index+5)
plt.title(f'Integration {i_integration}, group {i_group + 1}')

plt.subplot(1, 3, 2)
im = plt.imshow(uncal_nis[0].data[i_integration+1, i_group, :, :])
im.set_clim(-100, 1000)
plt.xlim(column_index-5, column_index+5)
plt.ylim(row_index-5, row_index+5)
plt.title(f'Integration {i_integration + 1}, group {i_group + 1}')

plt.subplot(1, 3, 3)
im = plt.imshow(uncal_nis[0].data[i_integration+2, i_group, :, :])
im.set_clim(-100, 1000)
plt.xlim(column_index-5, column_index+5)
plt.ylim(row_index-5, row_index+5)
plt.title(f'Integration {i_integration + 2}, group {i_group + 1}')
plt.show()

Ah! A cosmic ray clearly hit around pixel `(182, 213)`, affecting an area of about a pixel. Note that the `groupdq` doesn't show anything unusual so far:

In [None]:
uncal_nis[0].groupdq[67, -1, row_index, column_index]

The JWST Calibration Pipeline has an algorithm that aims to detect those jumps, appropriately named the `jump` step. An important consideration is that the step can use multiprocessing, which can offer dramatic speed improvements, in particular on large subarrays of data. The number of cores to use is set by the `maximum_cores` parameter, which can be an integer number or `all`, which will use all available cores.

Let's run the step using all cores (this takes some time, ~4 minutes):

In [None]:
for i in range(nsegments):
    uncal_nis[i] = calwebb_detector1.jump_step.JumpStep.call(uncal_nis[i], maximum_cores='all')

It's not too obvious from the messages in the pipeline what happened, but the algorithm was used to _detect_ jumps, and these are added as new data-quality flags in the `groupdq`. Let's see what happened with the pixel identified by eye above:

In [None]:
uncal_nis[0].groupdq.shape

In [None]:
uncal_nis[0].groupdq[67, -1, row_index, column_index]

Aha! It changed. What does this mean? Let's repeat the trick we learned with the `saturation` step:

In [None]:
datamodels.dqflags.dqflags_to_mnemonics(
    uncal_nis[0].groupdq[67, -1, row_index, column_index], 
    mnemonic_map=datamodels.dqflags.pixel
    )

Nice! We now have a flag that identifies when a jump detection happened. 
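
To see what the mnemonic lookup is doing under the hood, here is a minimal sketch that decodes a `groupdq` value by testing individual bits. The helper `decode_groupdq` is hypothetical, and the three bit values listed are a small subset that we assume matches the pipeline's definitions; in practice, `datamodels.dqflags` remains the reference.

```python
# Subset of the group DQ bit values; assumed to match the pipeline's definitions.
GROUP_FLAGS = {'DO_NOT_USE': 1, 'SATURATED': 2, 'JUMP_DET': 4}


def decode_groupdq(value, flags=GROUP_FLAGS):
    """Return the set of mnemonics whose bit is set in `value`."""
    return {name for name, bit in flags.items() if int(value) & bit}


print(decode_groupdq(0))   # no flags set
print(decode_groupdq(4))   # jump detected
print(decode_groupdq(6))   # saturated and jump detected
```

Because each flag occupies its own bit, several conditions can be stored in a single integer and tested independently with a bitwise AND.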

#### 4.7.2 Jump rates per integration

For fun, let's use the `groupdq` changes to figure out how many jumps happened per integration on this first segment of data by simple differencing with the products from the previous step, the `dark_current` step:

In [None]:
# Create an array that will store the number of jumps per integration:
njumps = np.zeros(uncal_nis[0].groupdq.shape[0])

# Iterate through integrations counting how many pixels changed in all groups:
for integration in range(uncal_nis[0].groupdq.shape[0]):

    groupdq_difference = uncal_nis[0].groupdq[integration, :, :, :] - darkcurrent_results.groupdq[integration, :, :, :]
    wherejumps = np.where(groupdq_difference != 0.)
    njumps[integration] = len(wherejumps[0])

Let's plot this:

In [None]:
integrations = np.arange(uncal_nis[0].groupdq.shape[0]) + 1

plt.figure(figsize=(6, 4))
plt.title('Number of jumps on the first segment of data for NIS')
plt.plot(integrations, njumps, 'o-', color='black', mfc='white' )
plt.xlabel('Integration', fontsize=16)
plt.ylabel('Number of jumps', fontsize=16)
plt.xlim(0.5, uncal_nis[0].groupdq.shape[0] + 0.5)
plt.show()

Very interesting! On average, on the order of ~3,500 jumps are detected per integration. Each integration has (ngroups) x (number of pixels) = 70 x 32 x 2048 = 4,587,520 opportunities for jumps to appear, so this means an average rate of (detected events) / (total opportunities) ≈ 0.08% per integration for this particular segment, detector and dataset.
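
As an alternative to differencing two `groupdq` arrays, the jump flag can be counted directly with a bitmask. The sketch below runs on a small made-up array; the function `jumps_per_integration` is hypothetical and `JUMP_DET = 4` is the bit value we assume for the jump flag. Note that this counts every flagged sample, not only the flags added by the latest step.

```python
import numpy as np

JUMP_DET = 4  # assumed bit value of the jump flag in the group DQ array


def jumps_per_integration(groupdq):
    """Count groups x pixels flagged as JUMP_DET in each integration.

    groupdq : 4D integer array (integrations, groups, rows, columns).
    """
    flagged = (groupdq & JUMP_DET) > 0
    return flagged.sum(axis=(1, 2, 3))


# Toy groupdq with made-up dimensions: 3 integrations, 5 groups, 4 x 6 pixels.
toy_groupdq = np.zeros((3, 5, 4, 6), dtype=np.uint8)
toy_groupdq[0, 2, 1, 1] = JUMP_DET          # one jump in integration 0
toy_groupdq[1, 3, 0, 5] = JUMP_DET | 2      # a jump that is also saturated
toy_groupdq[1, 4, 2, 2] = JUMP_DET
toy_groupdq[2, 1, 3, 3] = 2                 # saturated only: not a jump

toy_njumps = jumps_per_integration(toy_groupdq)
opportunities = np.prod(toy_groupdq.shape[1:])
print(toy_njumps, toy_njumps / opportunities)

# Arithmetic quoted in the text: ~3,500 jumps out of 70 x 32 x 2048 samples.
print(f'{100 * 3500 / (70 * 32 * 2048):.3f}%')
```

On the real products, the analogous call would be `jumps_per_integration(uncal_nis[0].groupdq)`.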

<div class="alert alert-block alert-info"> <b>Note on the effectiveness of the <code>jump</code> detection step</b>: The <code>jump</code> detection step uses, by default, <a href="https://jwst-pipeline.readthedocs.io/en/latest/jwst/jump/description.html#multiprocessing">a two-point difference method</a> that relies on appropriate knowledge of the read noise of the detector. In some cases, this might be significantly off (or the <code>detector1</code> corrections might not be optimal, leaving significant detector effects), such that the algorithm turns out to be too aggressive. Similarly, the algorithm relies on a decent number of groups in the integration to work properly (more than about 5). It is thus important to try different parameters to identify jumps in a given dataset and study their impact on the final products. One of the most important parameters is the <code>rejection_threshold</code>. The default value is <code>4</code>, but TSO studies in the literature have sometimes opted for more conservative values (typically larger than 10). For this particular dataset, which has a large number of groups (70), the default value works well, but it might not be optimal for other datasets.</div>
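
To build some intuition for how the read noise and the rejection threshold enter the detection, here is a heavily simplified sketch of a two-point difference test on a single simulated ramp. It is not the pipeline's implementation: the function `flag_jumps`, the noise model, and all the numbers below are assumptions made for illustration.

```python
import numpy as np


def flag_jumps(ramp, read_noise, rejection_threshold=4.0):
    """Flag groups whose increase over the previous group is an outlier.

    ramp       : 1D array of accumulated counts (electrons) per group.
    read_noise : assumed single-read noise of the pixel (electrons).
    Returns a boolean array, True for groups flagged as jumps.
    """
    diffs = np.diff(ramp)
    median_diff = np.median(diffs)
    # Expected scatter of a difference: photon noise plus two reads.
    sigma = np.sqrt(np.abs(median_diff) + 2.0 * read_noise**2)
    outliers = np.abs(diffs - median_diff) / sigma > rejection_threshold
    # A jump seen in diff k affects group k + 1; group 0 is never flagged.
    return np.concatenate(([False], outliers))


# Simulated ramp: 20 groups at 50 e-/group, with 400 e- deposited
# by a cosmic ray between groups 9 and 10 (0-based indices).
rng = np.random.default_rng(1)
ngroups, rate, read_noise = 20, 50.0, 10.0
ramp = np.cumsum(rng.poisson(rate, size=ngroups)).astype(float)
ramp += rng.normal(0.0, read_noise, size=ngroups)
ramp[10:] += 400.0

for threshold in (4.0, 10.0, 30.0):
    flagged = np.flatnonzero(flag_jumps(ramp, read_noise, threshold))
    print(f'rejection_threshold = {threshold}: flagged groups {flagged}')
```

In this toy case the jump is roughly 25 times the expected scatter, so it is caught with thresholds of 4 and 10 but missed with 30. An over- or under-estimated `read_noise` shifts that balance in the same way, which is why it is worth experimenting with both.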

Before moving to the next step, we showcase one additional function from the `datamodels` that allows saving products to files: the `save` function. This step is optional; if you want to keep the `jump` step products for later use, uncomment the lines below:

In [None]:
# Uncomment these lines to run:
# if not os.path.exists('nis_jumpstep_seg001.fits'):
#     nsegments = 3
#     for i in range(nsegments):
#         # Save the jump step products, with zero-padded segment numbers:
#         uncal_nis[i].save(f'nis_jumpstep_seg{i + 1:03d}.fits')

### 4.8 Fitting ramps with the `ramp_fit` step <a class="anchor" id="rampfit"></a>

The last step of `detector1` is the `ramp_fit` step. This step does something that might _appear_ quite simple, but in reality is not as trivial as it seems: fitting a line to the up-the-ramp samples and getting the associated uncertainties. This is not straightforward because samples up-the-ramp are correlated. That is, because signal is accumulated up-the-ramp, group number 2 has a non-zero covariance with group number 1, and so on.
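
A quick simulation makes this correlation visible. The sketch below considers photon noise only (no read noise) and uses a made-up flux of 100 electrons per group; under those assumptions, the covariance between groups $i$ and $j$ (counting from 1) should approach $100 \times \min(i, j)$.

```python
import numpy as np

rng = np.random.default_rng(0)
nramps, ngroups = 200_000, 5
electrons_per_group = 100.0  # made-up flux, for illustration only

# Each group adds an independent Poisson number of electrons...
increments = rng.poisson(electrons_per_group, size=(nramps, ngroups))
# ...but what is read out is the accumulated signal up the ramp.
ramps = np.cumsum(increments, axis=1)

# Empirical covariance between groups (columns are the variables).
covariance = np.cov(ramps, rowvar=False)
print(np.round(covariance, 1))
```

The off-diagonal terms are far from zero, so an ordinary least-squares fit that treats every group as an independent measurement would misjudge the uncertainty on the slope.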

In addition, we will save the results of this step to a desired `output_dir` location so that they can be used in the next notebook, where we'll generate some lightcurves.

#### 4.8.1 Applying the `ramp_fit` step to JWST data

The JWST Calibration Pipeline algorithm performs a sensible weighting of each group in order to account for that correlation when fitting a slope to the samples. Let's run this step and save the products to files as we go, so we can use them in the next notebook. Note that, as in the `jump` step, we can also run this step via multiprocessing, and we do just that below (if not run already):

In [None]:
# Set the desired output directory:
output_dir = "data/calibrated"

# Create it (and any missing parent directories) if needed:
os.makedirs(output_dir, exist_ok=True)

for i in range(nsegments):
    uncal_nis[i] = calwebb_detector1.ramp_fit_step.RampFitStep.call(
        uncal_nis[i], 
        maximum_cores='all', 
        save_results=True, 
        output_dir=output_dir, )

All right, note that the products of this step for TSOs are actually a list:

In [None]:
len(uncal_nis[0])

The data associated with the zeroth element of this list (`uncal_nis[0][0].data`) has dimensions equal to the size of the frames (rows and columns). The first element (`uncal_nis[0][1].data`) has three dimensions: the same two as the zeroth, but for each integration. We usually refer to this latter product as the `rateints` product, i.e., the rates per integration:

In [None]:
uncal_nis[0][0].data.shape

In [None]:
uncal_nis[0][1].data.shape

To familiarize ourselves with these products, let's plot the rates of the 10th integration for NIS:

In [None]:
plt.figure(figsize=(12, 3))
plt.title('NIS data; rates for integration 10')
im = plt.imshow(uncal_nis[0][1].data[10, :, :])
im.set_clim(-1, 10)
plt.colorbar(label='Counts/s')
plt.show()

In case you were unsure of the units in the colorbar, you can double-check them through the `datamodels` themselves:

In [None]:
uncal_nis[0][1].search('unit')

These rates look very pretty. Let's check the first-element (`rateints`) results for integration 10 once more, this time with a wider color scale:

In [None]:
plt.figure(figsize=(12, 3))
plt.title('NIS data; rates for integration 10')
im = plt.imshow(uncal_nis[0][1].data[10, :, :])
im.set_clim(-1, 30)
plt.colorbar(label='Counts/s')
plt.show()

These rates look _very_ good as well. 

## 5. Final words <a class="anchor" id="final-words"></a>

This completes this notebook, in which we have reduced and calibrated NIRISS/SOSS data of WASP-39b from program 1366 using Stage 1 of the JWST pipeline. In the next notebook, `02_niriss_soss_spec2_generate_lightcurves`, we will use the calibrated data products to extract the spectra of WASP-39b and generate some lightcurves, performing steps similar to those in Stage 2 of the JWST pipeline. I would like to thank the JWST NIRISS team, especially Néstor Espinoza and Aarynn Carter, for their feedback and support in writing these NIRISS/SOSS demonstration notebooks.