# Calibrating WFI Exposures with RomanCal

## Kernel Information and Read-Only Status

To run this notebook, please select the "Roman Research Nexus" kernel at the top right of your window.

This notebook is read-only. You can run cells and make edits, but you must save changes to a different location. We recommend saving the notebook within your home directory, or to a new folder within your home (e.g. <span style="font-variant:small-caps;">file > save notebook as > my-nbs/nb.ipynb</span>). Note that a directory must exist before you attempt to add a notebook to it.
    

## Introduction
The purpose of this notebook is to calibrate Level 1 (L1; uncalibrated ramp cube) data with the Roman WFI science calibration pipeline RomanCal (Python package name `romancal`) to produce Level 2 (L2; calibrated rate image) exposure level data. To learn more, please visit the [RDox pages on the Exposure Level Pipeline](https://roman-docs.stsci.edu/data-handbook-home/roman-stsci-data-pipelines/exposure-level-pipeline) (ELP). We also discuss calibration reference files including how to access and examine them and how to run the pipeline with custom reference files.

Details about the Roman data levels, including file naming conventions and file array names and data types, can be found in the RDox article [Data Levels and Products](https://roman-docs.stsci.edu/data-handbook-home/wfi-data-format/data-levels-and-products). 

An L1 file contains uncalibrated ramps in units of Data Numbers (DN). L1 files are three-dimensional data cubes, with one dimension for time and two dimensions for image coordinates, stored as arrays of shape (N resultants, 4096 image rows, 4096 image columns). For a given pixel, a resultant contains either one read or the arithmetic mean of multiple reads.
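
To make the definition of a resultant concrete, the sketch below averages groups of individual reads into resultants with NumPy. The read pattern (`read_pattern`), the tiny 2 x 2 pixel grid, and the read values are all made up for illustration and do not correspond to a real WFI observation.

```python
import numpy as np

# Hypothetical stack of 6 reads for a tiny 2 x 2 pixel image (values in DN).
# Read k has the value 10 * k in every pixel, so the ramp grows linearly.
reads = np.stack([np.full((2, 2), 10.0 * k) for k in range(1, 7)])

# Hypothetical read pattern: each inner list gives the (1-indexed) reads
# that are combined into one resultant.
read_pattern = [[1], [2, 3], [4, 5, 6]]

# A resultant is a single read or the arithmetic mean of several reads.
resultants = np.stack([
    reads[[r - 1 for r in group]].mean(axis=0) for group in read_pattern
])

print(resultants.shape)     # (3, 2, 2): (N resultants, rows, columns)
print(resultants[:, 0, 0])  # ramp of resultants for the first pixel
```

The output cube has one plane per resultant, which is the same (N resultants, rows, columns) layout described above, only much smaller.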

L2 WFI files are calibrated rate images in instrumental units of DN / second.  They are two-dimensional arrays shaped as (4088 image rows, 4088 image columns). Note the smaller image size of L2 files, which is due to the removal of the 4-pixel border of reference pixels around the image during pipeline processing.
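
The change in image size is simple arithmetic: removing 4 pixels from each edge of a 4096-pixel axis leaves 4096 - 2 × 4 = 4088 pixels. The sketch below reproduces this with array slicing. It is only an illustration of the geometry; the pipeline handles the reference pixels internally, and the names `N_REF` and `full_frame` are placeholders.

```python
import numpy as np

N_REF = 4  # width of the reference-pixel border, in pixels
full_shape = (4096, 4096)

# Use a small integer dtype to keep this illustration light on memory.
full_frame = np.zeros(full_shape, dtype=np.uint8)

# Drop N_REF rows and columns from every edge of the frame.
science = full_frame[N_REF:-N_REF, N_REF:-N_REF]

print(science.shape)  # (4088, 4088)
```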

### Local Run Settings

If you want to run the notebook on your local machine, refer to the [local installation](../../markdown/local-run.md) instructions before proceeding with the notebook. The instructions provide important information about setting up your environment and installing dependencies.

## Imports
Libraries used:
- *astropy* for coordinates manipulation and image normalization
- *copy* for making copies of Python objects
- *crds* for access to calibration reference files
- *matplotlib* and *mpl_toolkits* for plotting images
- *numpy* for array manipulation
- *romancal* for running the Roman WFI science data pipeline
- *roman_datamodels* for opening Roman WFI ASDF files
- *asdf* for opening Roman WFI ASDF files
- *os* for operating system functions
- *s3fs* for streaming files from an AWS S3 bucket

In [None]:
# --- Data & Environment Setup ---
# This cell configures reference data and environment variables.
# On the Roman Research Nexus (RRN), everything is pre-configured and
# this cell completes instantly.  For local or CI execution it will
# download any missing reference data automatically.

import os, sys
from pathlib import Path

REQUIRED_DATA = []

# Load the setup module ---------------------------------------------------
try:
    import notebook_data_dependencies as ndd
except ImportError:
    # Walk up from the notebook directory to find shared/notebook_data_dependencies.py
    _here = Path(os.getcwd())
    for _parent in [_here] + list(_here.parents):
        _candidate = _parent / 'shared' / 'notebook_data_dependencies.py'
        if _candidate.exists():
            import importlib.util
            _spec = importlib.util.spec_from_file_location('notebook_data_dependencies', _candidate)
            ndd = importlib.util.module_from_spec(_spec)
            _spec.loader.exec_module(ndd)
            break
    else:
        raise FileNotFoundError(
            'Cannot find shared/notebook_data_dependencies.py in any parent directory.\n'
            'Make sure you are running from within the roman_notebooks repo.'
        )

# Download missing reference data (no-op when env vars are already set) ---
result = ndd.install_files(packages=REQUIRED_DATA) if REQUIRED_DATA else {}
ndd.setup_env(result)  # Sets data paths + CRDS env vars


In [None]:
import os

from astropy.coordinates import SkyCoord
from astropy.visualization import simple_norm
import copy

import matplotlib.pyplot as plt
from matplotlib import colors, colormaps as cm
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np
import roman_datamodels as rdm
import s3fs

### The Calibration Reference Data System (CRDS)

The Roman ELP, which corrects L1 images for detector-level effects to produce L2 images, uses calibration reference and parameter files from the [CRDS](https://roman-crds.stsci.edu/static/users_guide/overview.html). These reference files, developed and validated by STScI’s Science Operations Center, are continually updated as new WFI data become available. CRDS assigns the most appropriate reference file for each calibration step using metadata keywords and file-specific matching criteria. To use the best-available reference files for an observation, no action is needed as RomanCal will query for the best reference files for each calibration step in the pipeline.

In this tutorial, we will focus on the **[`crds`](https://roman-crds.stsci.edu/static/users_guide/index.html)** Python application programming interface (API). The [**CRDS** webserver](https://roman-crds.stsci.edu) can also be accessed to browse calibration reference files in a tabular interface. Note that there are multiple CRDS servers, though most users will interact with the Operations (OPS) instance. Please be sure to navigate to the correct webserver for the instance in which you are interested.

For more details, see the [RDox page on CRDS for Roman WFI](https://roman-docs.stsci.edu/data-handbook-home/accessing-wfi-data/crds-for-reference-files) and the [CRDS documentation](https://jwst-crds.stsci.edu/static/users_guide/web_site_use.html).

In [None]:
import crds

In [None]:
# Import romancal packages
import romancal
from romancal.pipeline import ExposurePipeline

## Tutorial Data

In this tutorial, we use L1 WFI data files simulated with Roman I-Sim. As an example, we take the output product from the [Roman I-Sim](../romanisim/romanisim.ipynb) tutorial notebook. If you have not run that simulation, the files are available in the Nexus S3 bucket. For more information on accessing these data, see the [Data Discovery and Access](../data_discovery_and_access/data_discovery_and_access.ipynb) tutorial.

## Running the ELP on L1 Data

To run the ELP on L1 data, you have two options:
1. **Basic:** Use `romancal.pipeline.ExposurePipeline` to run all steps.
2. **Advanced:** Run one or more individual steps.

### Basic Example: Full Pipeline
The input file for this example is a WFI L1 ASDF file. First, we check whether the file is already saved on disk (if the Roman I-Sim tutorial was run). If not, we stream the L1 file into memory (as a datamodel) from the Nexus S3 bucket:

In [None]:
l1_file = 'r0003201001001001004_0001_wfi01_f106_uncal.asdf'

if os.path.exists(l1_file):
    dm_l1 = rdm.open(l1_file)
else:
    s3_uri = 's3://stpubdata/roman/nexus/soc_simulations/tutorial_data/'
    fs = s3fs.S3FileSystem(anon=True)
    dm_l1 = rdm.open(fs.open(s3_uri + l1_file, 'rb'))

We begin by examining the data type using the `type()` function:

In [None]:
type(dm_l1)

Reading the ASDF file with `roman_datamodels` returns a `ScienceRawModel` datamodel, which is the L1 file datamodel. At this point, we can use the `.info()` method on the data to look at the file contents:


In [None]:
dm_l1.info()

We can see that this L1 file was created with Roman I-Sim as it contains the "romanisim" block.

Next, we present a basic example of running the complete pipeline.

The optional `save_results` parameter determines whether the resulting L2 datamodel is saved as a file on your Nexus storage. Setting this parameter to `True` enables file saving. In this example, we retain the calibrated L2 datamodel in memory as the variable `result` without saving it to disk.

In [None]:
result = ExposurePipeline.call(dm_l1, save_results=False)

In [None]:
type(result)

The output from the Exposure Pipeline is an `ImageModel` object, which serves as the datamodel for L2 files.

In addition, the pipeline created three other files in the working directory with names similar to the input L1 file. These files end with `*_cat.parquet`, `*_segm.asdf`, and `*_wcs.asdf`, corresponding to a Level 4 (L4) single-band source catalog, an L4 segmentation map, and an updated L1 WCS file, respectively. The L1 WCS file provides an L2-quality World Coordinate System (WCS) for use with L1 files, which have a different number of pixels than L2 files, without updating the original L1 file on disk. More information about working with L4 products will be added in a future tutorial. **Note:** L4 products are still being validated and should be used with caution during development.

Optional parameters can also be passed to individual pipeline steps through the steps dictionary. For example, the code below demonstrates how to skip both the source catalog step and the step that aligns the image with the Gaia astrometric catalog (the TweakReg step, named after the software used to update the WCS). Other parameters can be configured in the same way; for details, see the [romancal documentation](https://roman-pipeline.readthedocs.io/en/latest/index.html).

In [None]:
result = ExposurePipeline.call(dm_l1, save_results=False, steps={'source_catalog': {'skip': True}, 'tweakreg': {'skip': True}})

At the end of the pipeline log messages, you will see that the SourceCatalog and TweakReg steps were skipped, as expected. Because the source catalog step was omitted, the L4 source catalog and segmentation map files were not regenerated. To verify the status of a step, you can also inspect the metadata of the output datamodel:

In [None]:
result.meta.cal_step.source_catalog
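
The nested `steps` dictionary used above can also be built programmatically, which may be convenient when several steps are skipped at once. The helper `skip_steps` below is a hypothetical convenience function written for this illustration; it is not part of `romancal`.

```python
def skip_steps(*step_names):
    """Return a ``steps`` dictionary that marks each named step as skipped.

    Hypothetical convenience helper, not part of romancal.
    """
    return {name: {'skip': True} for name in step_names}


steps_to_skip = skip_steps('source_catalog', 'tweakreg')
print(steps_to_skip)
# {'source_catalog': {'skip': True}, 'tweakreg': {'skip': True}}
```

Passing `steps=steps_to_skip` to `ExposurePipeline.call()` would then be equivalent to writing out the dictionary by hand as in the cell above.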

More information on these steps can be found in the [Exposure Pipeline](https://roman-docs.stsci.edu/data-handbook-home/roman-data-pipelines/exposure-level-pipeline#ExposureLevelPipeline-PipelineStepDescriptions) article on RDox.

Note that the ramp-fitting step transforms the structure of the data — in other words, the data models before and after ramp fitting are intrinsically different. Therefore, steps following ramp fitting cannot be applied to data that has not undergone it, and similarly, steps preceding ramp fitting cannot be applied once the data has.
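
The change in data structure can be illustrated with a toy calculation. The sketch below fits a straight line to each pixel's ramp with an ordinary least-squares fit, which is a deliberate simplification and not the algorithm implemented in `romancal`. The resultant times, rates, and array sizes are made up.

```python
import numpy as np

# Hypothetical ramp: 5 resultants for a 2 x 3 pixel image.
times = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # seconds (made up)
true_rate = np.array([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])             # DN / s (made up)
ramp = times[:, None, None] * true_rate + 100.0     # shape (5, 2, 3), DN

# Fit a straight line to every pixel's ramp. np.polyfit fits each column of
# a 2-D array independently, so flatten the image axes first.
n_res, n_rows, n_cols = ramp.shape
slope, intercept = np.polyfit(times, ramp.reshape(n_res, -1), deg=1)
rate = slope.reshape(n_rows, n_cols)                # shape (2, 3), DN / s

print(ramp.shape, '->', rate.shape)
```

The input is a three-dimensional cube and the output is a two-dimensional rate image, which is why steps written for one structure cannot be applied to the other.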

We can save our L2 datamodel to disk with the `.save()` method:

In [None]:
result.save('my_roman_l2_file.asdf')

If you look at the file browser in the directory where you ran this tutorial, you should see a new file called "my_roman_l2_file.asdf". Note that you may need to wait a moment or manually refresh the file browser before it appears.

### Advanced Example: Running Individual Pipeline Steps

Now, for a more advanced use case, let's update the WCS based on the pointing information. For example, suppose we simulated an L1 file, processed it with the ELP, and now want to try shifting the pointing information and creating a new WCS to test the Gaia alignment. After editing any of the `meta.wcsinfo` values we wish to change, we can generate a new WCS by running the AssignWcsStep on our L2 ASDF file.

Let's start by reading in a fresh L2 file:

In [None]:
l2_file = 'r0003201001001001004_0001_wfi01_f106_cal.asdf'

if os.path.exists(l2_file):
    dm = rdm.open(l2_file)
else:
    s3_uri = 's3://stpubdata/roman/nexus/soc_simulations/tutorial_data/'
    fs = s3fs.S3FileSystem(anon=True)
    dm = rdm.open(fs.open(s3_uri + l2_file, 'rb'))

# Keep a copy of the original WCS for comparison later.
# This must run whether the file was read from disk or streamed from S3.
original_wcs = copy.deepcopy(dm.meta.wcs)

Let's take a quick look at the file we just opened:

In [None]:
dm.info()

Before aligning the images with the Gaia coordinates, the WCS in an L2 file is populated using the telescope pointing information, resulting in the so-called "coarse WCS". The `meta.pointing` section of the metadata describes the spacecraft pointing, while the detector-dependent information used to construct the coarse WCS is contained in the `meta.wcsinfo` section. Although the values in `meta.pointing` and `meta.wcsinfo` are linked, the coarse WCS relies only on `meta.wcsinfo`. Let’s examine our `meta.wcsinfo` values:

In [None]:
dm.meta.wcsinfo

Let's focus on the `ra_ref`, `dec_ref`, and `roll_ref` keywords. Let's first take a look at the descriptions of these fields:

In [None]:
print(f"ra_ref = {dm.schema_info(path='roman.meta.wcsinfo.ra_ref')['description']}")
print(f"dec_ref = {dm.schema_info(path='roman.meta.wcsinfo.dec_ref')['description']}")
print(f"roll_ref = {dm.schema_info(path='roman.meta.wcsinfo.roll_ref')['description']}")

The "reference pixel position" mentioned in these descriptions is located at the center of each WFI detector (each detector has its own WCS). We can edit these values to make the pipeline create a WCS solution that is slightly different from the original. Let's make a copy of the data (for comparison later) and apply a simple shift of 1 arcsecond in right ascension:

In [None]:
original_ra_ref = copy.copy(dm.meta.wcsinfo.ra_ref)
dm.meta.wcsinfo.ra_ref += (1 / 3600)

print(f'Original ra_ref = {original_ra_ref},\nUpdated ra_ref = {dm.meta.wcsinfo.ra_ref}')

Next, let's run AssignWcsStep on the data:

In [None]:
result = romancal.assign_wcs.AssignWcsStep.call(dm)

Let’s now verify whether the coordinate system has changed as expected by comparing the right ascension and declination of a given pixel in the two WCS reference systems. For simplicity, we’ll use the center of the L2 image, which corresponds to (x, y) = (2043.5, 2043.5) in 0-indexed pixels. The corresponding sky coordinates can be easily obtained using `astropy.coordinates.SkyCoord` objects:

In [None]:
# Get SkyCoord object for new position at center of detector
ra, dec = result.meta.wcs(2043.5, 2043.5)
result_coord = SkyCoord(ra=ra, dec=dec, unit='deg')
print('Updated position: ', result_coord)

# Get SkyCoord object for original position at center of detector
ra0, dec0 = original_wcs(2043.5, 2043.5)
original_coord = SkyCoord(ra=ra0, dec=dec0, unit='deg')
print('Original position:', original_coord)

# Compute the separation between the updated and original positions
separation = result_coord.separation(original_coord)
print(f'Separation: {separation.arcsec:.4f} arcsec')

As we can see, the newly updated WCS is shifted by approximately 1 arcsecond relative to the WCS in the original L2 file. The offset is not exactly 1 arcsecond because the WFI pixel grid is slightly rotated with respect to the celestial coordinate system. In other words, the detector axes are not perfectly aligned with the vertical (declination) and horizontal (right ascension) directions on the sky.
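
A further geometric effect can be checked independently of the WCS: an offset in the right ascension coordinate corresponds to an on-sky angular distance scaled by the cosine of the declination. The sketch below uses only the Python standard library and made-up coordinates (not the pointing of the tutorial file) to show how a 1 arcsecond change in right ascension maps to a smaller separation away from the celestial equator. The function `angular_separation_arcsec` is written for this illustration.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(hav))) * 3600


shift_deg = 1 / 3600  # 1 arcsecond expressed in degrees

for dec in (0.0, 30.0, 60.0):  # made-up declinations
    sep = angular_separation_arcsec(10.0, dec, 10.0 + shift_deg, dec)
    print(f'dec = {dec:4.1f} deg: separation = {sep:.4f} arcsec')
```

At a declination of 60 degrees, for example, the separation is half an arcsecond because cos(60°) = 0.5.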

As in our pipeline example above, we can also pass optional arguments to individual steps. This is useful if we want to use our own version, or an older version, of a reference file. 

## Reference Files

As mentioned above, several of the pipeline steps apply reference files to the data to correct for specific detector effects.
Common examples of reference files applied during the processing of imaging data include the **Bad Pixel Mask**, **Dark**, and **Flat** (details on each are given below). More information on WFI reference file types can be found in the RDox article [CRDS for Reference Files](https://roman-docs.stsci.edu/data-handbook-home/accessing-wfi-data/crds-for-reference-files).

**IMPORTANT NOTE:** Reference files are a work in progress and will be updated several times before Roman launch. If you notice irregularities or missing information, please understand that they may be a known issue. If you have questions, please contact the [Roman Help Desk](https://romanhelp.stsci.edu).

### Bad Pixel Masks

Bad pixels are masked during the Data Quality (DQ) initialization step.  The MASK reference file, which contains static bad pixel flags (i.e., locations of dead pixels, telegraph pixels, etc.) for each detector, populates the data quality (DQ) `dq` array of the L2 calibrated rate image files after processing by RomanCal. During RomanCal processing, the MASK file is used in the `romancal.dq_init.DQInitStep()` step (the first step in the ELP) to populate an array called `pixeldq` in the RampModel datamodel, which is used within the pipeline. DQ flags are combined with additional DQ flags from reference file DQ arrays during subsequent processing steps using `bitwise_or`.

These mask reference files are created from dark or flat calibration datasets. Different types of pixels are flagged based on their pixel values (e.g., dead pixels with significantly reduced detector response) or behavior up-the-ramp (e.g., telegraph pixels that jump between two electronic states).

For more details, see the [romancal documentation](https://roman-pipeline.readthedocs.io/en/latest/roman/dq_init/index.html) and [RDox documentation](https://roman-docs.stsci.edu/data-handbook-home/roman-data-pipelines/exposure-level-pipeline#ExposureLevelPipeline-dq_init) for DQ initialization.
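
The way DQ flags accumulate through `bitwise_or` can be sketched with two small arrays. The flag values `FLAG_A` and `FLAG_B` below are placeholders chosen for this illustration and are not the actual Roman DQ flag definitions.

```python
import numpy as np

# Hypothetical flag values; each flag occupies its own bit. These are
# placeholders and not the actual Roman DQ flag definitions.
FLAG_A = 1 << 0   # 1
FLAG_B = 1 << 2   # 4

# DQ array initialized from a (made-up) mask reference file.
pixeldq = np.array([[0, FLAG_A],
                    [0, 0]], dtype=np.uint32)

# DQ array from a later (made-up) reference file.
ref_dq = np.array([[0, FLAG_B],
                   [FLAG_B, 0]], dtype=np.uint32)

# bitwise_or keeps every flag that is set in either array.
combined = np.bitwise_or(pixeldq, ref_dq)
print(combined)
# [[0 5]
#  [4 0]]
```

The pixel with value 5 carries both flags (1 + 4), so no information is lost when a later step adds its own flags.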

### Darks 

The DARK reference file is selected based on the WFI mode (imaging or spectroscopy) used to obtain the science data. During the `romancal.dark_current.DarkCurrentStep()` step, the dark current is subtracted on a pixel-by-pixel and resultant-by-resultant basis. For pixels that are undefined in the dark reference file, nothing is subtracted from the science data.

The dark reference files are created from dark calibration datasets. A set of dark files is sigma-clipped and stacked resultant-by-resultant to create a super dark.

For more details, see the [romancal documentation](https://roman-pipeline.readthedocs.io/en/latest/roman/dark_current/index.html) and [RDox documentation](https://roman-docs.stsci.edu/data-handbook-home/roman-data-pipelines/exposure-level-pipeline#ExposureLevelPipeline-dark_current) for dark current subtraction.
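
A minimal sketch of a resultant-by-resultant dark subtraction is shown below. It is not the `romancal` implementation. The array values are made up, and representing an "undefined" dark value with NaN is an assumption made for this illustration.

```python
import numpy as np

# Made-up science ramp: 3 resultants for a 2 x 2 pixel image (DN).
science = np.array([[[10., 10.], [10., 10.]],
                    [[20., 20.], [20., 20.]],
                    [[30., 30.], [30., 30.]]])

# Made-up dark cube with the same shape. NaN is used here to stand in for
# an "undefined" dark value; this representation is an assumption.
dark_cube = np.array([[[1., 1.], [1., np.nan]],
                      [[2., 2.], [2., np.nan]],
                      [[3., 3.], [3., np.nan]]])

# Subtract resultant-by-resultant and pixel-by-pixel, leaving the science
# value unchanged wherever the dark is undefined.
corrected = science - np.where(np.isfinite(dark_cube), dark_cube, 0.0)

print(corrected[:, 0, 0])  # dark subtracted from every resultant
print(corrected[:, 1, 1])  # undefined dark: science values unchanged
```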


### Flats

The FLAT reference file is selected based on the optical element used to obtain the science data.  It contains the flatfield data which corrects for both large-scale and pixel-to-pixel sensitivity variations, ensuring uniform data for a specific imaging filter.  During the `romancal.flatfield.FlatFieldStep()` step, the science array is divided by the flatfield reference array for the matching filter, effectively "flattening" the variations in the data.  Pixels with negative values or that are flagged will be skipped and not updated.

The flat reference file is created from flat calibration datasets. A set of flat exposures taken with the same filter and within some date range of the science data is used to compute flat rate images. These are averaged together and normalized, producing a filter-dependent flat rate image. Spectroscopic mode observations are not flatfielded by RomanCal, and no flat reference files for the WFI grism or prism are available from the Science Operations Center at STScI. For grism and prism observations, the flatfield step will be skipped.

For more details, see the [romancal documentation](https://roman-pipeline.readthedocs.io/en/latest/roman/flatfield/index.html) and [RDox documentation](https://roman-docs.stsci.edu/data-handbook-home/roman-data-pipelines/exposure-level-pipeline#ExposureLevelPipeline-flatfield) for flat fielding.
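
The flatfield division described above can be sketched as follows. This is a simplified illustration, not the `romancal` implementation. The values are made up, the `flagged` array is a hypothetical stand-in for whichever DQ flags cause a pixel to be skipped, and "negative values" is interpreted here as negative flatfield values.

```python
import numpy as np

# Made-up rate image (DN / s) and flatfield for a 2 x 2 pixel image.
rate_image = np.array([[100., 100.],
                       [100., 100.]])
flat_values = np.array([[1.00, 0.80],
                        [-1.0, 1.25]])       # one invalid negative value
flagged = np.array([[False, False],
                    [False, True]])          # hypothetical skipped pixels

# Divide only where the flat is positive and the pixel is not flagged;
# every other pixel keeps its original value.
usable = (flat_values > 0) & ~flagged
flattened = rate_image.copy()
flattened[usable] = rate_image[usable] / flat_values[usable]

print(flattened)
```

Only the two pixels in the top row are divided by the flat; the pixel with the negative flat value and the flagged pixel are left unchanged.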


### Retrieving Reference Files

As you run the exposure pipeline, the most up-to-date reference files will be automatically selected for each step. However, if you would like to use a specific reference file, you can retrieve it through the `crds` Python API and run the ELP with it (more on that later). Let's begin with how to access reference files from CRDS.

First, let's start by looking at the `crds.getrecommendations()` function. This function returns a dictionary of file names that match the criteria that you supply. Selection criteria are specified in a dictionary of key-value pairs. Each Roman WFI metadata keyword in the dictionary is all-caps and always begins with "ROMAN.META.". The remaining parts of the string correspond to the metadata keyword locations in the science data file schema. Different reference file types require different combinations of science metadata to match to the reference files. In general, all reference file types will require the instrument name ("INSTRUMENT.NAME") and start time ("EXPOSURE.START_TIME"). Most file types require the detector name ("INSTRUMENT.DETECTOR"), and some file types require the exposure type ("EXPOSURE.TYPE") or optical element ("INSTRUMENT.OPTICAL_ELEMENT").

For the mask, dark, and flat files in particular, the required keywords are:
- mask
    - ROMAN.META.INSTRUMENT.NAME
    - ROMAN.META.INSTRUMENT.DETECTOR
    - ROMAN.META.EXPOSURE.START_TIME
- dark
    - ROMAN.META.INSTRUMENT.NAME
    - ROMAN.META.INSTRUMENT.DETECTOR
    - ROMAN.META.EXPOSURE.TYPE
    - ROMAN.META.EXPOSURE.START_TIME
- flat
    - ROMAN.META.INSTRUMENT.NAME
    - ROMAN.META.INSTRUMENT.DETECTOR
    - ROMAN.META.INSTRUMENT.OPTICAL_ELEMENT
    - ROMAN.META.EXPOSURE.START_TIME
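
The list above can be transcribed into a small lookup table to check whether a selection dictionary contains every keyword needed for a set of reference file types. The helpers `required_keys` and `missing_keys` are hypothetical and written only for this illustration; they are not part of the `crds` package.

```python
# Keywords shared by the three reference file types listed above.
COMMON_KEYS = [
    'ROMAN.META.INSTRUMENT.NAME',
    'ROMAN.META.INSTRUMENT.DETECTOR',
    'ROMAN.META.EXPOSURE.START_TIME',
]

# Additional keyword required by each type (transcribed from the list above).
EXTRA_KEYS = {
    'mask': [],
    'dark': ['ROMAN.META.EXPOSURE.TYPE'],
    'flat': ['ROMAN.META.INSTRUMENT.OPTICAL_ELEMENT'],
}


def required_keys(reftypes):
    """Return the sorted union of keywords needed for the given types."""
    keys = set(COMMON_KEYS)
    for reftype in reftypes:
        keys.update(EXTRA_KEYS[reftype])
    return sorted(keys)


def missing_keys(selection, reftypes):
    """List required keywords that are absent from a selection dictionary."""
    return [key for key in required_keys(reftypes) if key not in selection]


# Example selection that lacks the optical element needed for a flat.
example = {
    'ROMAN.META.INSTRUMENT.NAME': 'WFI',
    'ROMAN.META.INSTRUMENT.DETECTOR': 'WFI01',
    'ROMAN.META.EXPOSURE.TYPE': 'WFI_IMAGE',
    'ROMAN.META.EXPOSURE.START_TIME': '2024-01-01 00:00:00',
}
print(missing_keys(example, ['mask', 'dark', 'flat']))
# ['ROMAN.META.INSTRUMENT.OPTICAL_ELEMENT']
```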

These keywords may be combined into a single dictionary to find multiple reference file types using `crds.getrecommendations()`. For example, if you would like to find the names of the mask, dark, and flat reference files used by the pipeline, you could run the following example:

In [None]:
meta = {'ROMAN.META.INSTRUMENT.NAME': 'WFI',
        'ROMAN.META.INSTRUMENT.DETECTOR': 'WFI01',
        'ROMAN.META.INSTRUMENT.OPTICAL_ELEMENT': 'F158',
        'ROMAN.META.EXPOSURE.TYPE': 'WFI_IMAGE',
        'ROMAN.META.EXPOSURE.START_TIME': '2024-01-01 00:00:00'
       }

ref_files = crds.getrecommendations(meta, reftypes=['mask', 'dark', 'flat'], observatory='roman')

The `ref_files` variable is now a dictionary with one entry for each of the reference file types you requested (MASK, DARK, and FLAT). These are the reference files that correspond to a science observation taken at midnight UTC on January 1, 2024 in the WFI imaging mode with optical element F158 and detector WFI01. Let's take a look at the names of the files CRDS returned:

In [None]:
ref_files

We can also use `crds.getreferences()` to accomplish the same thing; however, `getreferences()` goes one step further than `getrecommendations()` and will download the reference files if they are not already in your local cache. Using the same example as above:

In [None]:
meta = {'ROMAN.META.INSTRUMENT.NAME': 'WFI',
        'ROMAN.META.INSTRUMENT.DETECTOR': 'WFI01',
        'ROMAN.META.INSTRUMENT.OPTICAL_ELEMENT': 'F158',
        'ROMAN.META.EXPOSURE.TYPE': 'WFI_IMAGE',
        'ROMAN.META.EXPOSURE.START_TIME': '2024-01-01 00:00:00'
       }

ref_files = crds.getreferences(meta, reftypes=['mask', 'dark', 'flat'], observatory='roman')

And once again we can examine the output of `ref_files`:

In [None]:
ref_files

This time, `ref_files` contains the path to the file in the local cache (controlled by the `CRDS_PATH` environment variable) since we did not simply ask for the file name but also checked if the file was in our cache, and if it was not then we downloaded it.

### CRDS Mapping Files

CRDS organizes the reference files using mapping files. The first mapping file, the IMAP, simply describes the instruments available for the observatory. In this case, we use the WFI IMAP and no action is needed by the user to select this. The PMAP file (commonly referred to as the CRDS context) is set by the `CRDS_CONTEXT` environment variable. The context also depends on the CRDS server, which is set by the environment variable `CRDS_SERVER_URL`. Let's look at the values for both of those in this environment:

In [None]:
print(f"CRDS server location: {os.environ.get('CRDS_SERVER_URL')}")
print(f"CRDS context file: {os.environ.get('CRDS_CONTEXT')}")

The PMAP file contains the name of the IMAP and a list of RMAP file names as well as their contents. The RMAP files, one for each reference file type, list the names of the reference files and the selection criteria that match them to science observations. For more information on the mapping rules, see the [CRDS documentation on mapping syntax](https://roman-crds.stsci.edu/static/users_guide/rmap_syntax.html). Let's take a look at our PMAP file first:

In [None]:
pmap = crds.rmap.load_mapping(os.environ.get('CRDS_CONTEXT'))

This `pmap` object contains all of the mapping information for every reference file in this context. This is a lot to look at, so let's get the names of the mapping files contained within so we can look at one more closely:

In [None]:
maps = pmap.mapping_names()

Now let's isolate the name of just the MASK rmap. We can pick the name above, but here we show how to do this programmatically:

In [None]:
# Stop at the first mapping name that contains 'mask'
for m in maps:
    if 'mask' in m:
        mask_rmap = m
        break

masks = crds.rmap.load_mapping(mask_rmap)

Now that we have loaded up the RMAP, let's take a look at it using the `todict()` method to turn it into a dictionary we can more easily visualize:

In [None]:
masks.todict()

The `parkey` key tells us the matching criteria unique to this reference file type from the dictionary we created when we used `crds.getrecommendations()` and `crds.getreferences()` (not shown are matching criteria always required by CRDS). Also notice that the `parameters` field tells us the column names for the `selections` part of the dictionary. In this case, we see they include "ROMAN.META.INSTRUMENT.DETECTOR", "USEAFTER", and "REFERENCE". The REFERENCE column is simply the name of the reference file that matches those criteria. The USEAFTER date is the date after which a file should be used for a science observation. CRDS will match the file with the closest **preceding** USEAFTER date to the science observation date (given by "ROMAN.META.EXPOSURE.START_TIME").
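
The "closest preceding USEAFTER" rule can be sketched with the standard library. The table of dates and file names below is made up, `select_reference` is a hypothetical helper, and the code only illustrates the selection logic; it is not how the `crds` package is implemented. Treating a USEAFTER date equal to the observation date as a match is an assumption.

```python
from bisect import bisect_right
from datetime import datetime

# Made-up USEAFTER dates and file names for a single detector.
useafter_table = [
    (datetime(2020, 1, 1), 'example_mask_0001.asdf'),
    (datetime(2023, 6, 1), 'example_mask_0002.asdf'),
    (datetime(2025, 1, 1), 'example_mask_0003.asdf'),
]


def select_reference(start_time, table):
    """Pick the entry with the closest USEAFTER date not later than start_time.

    Illustrative only; this is not how the crds package is implemented.
    """
    table = sorted(table)
    dates = [date for date, _ in table]
    index = bisect_right(dates, start_time) - 1
    if index < 0:
        raise LookupError('No reference file precedes the observation date.')
    return table[index][1]


print(select_reference(datetime(2024, 1, 1), useafter_table))
# example_mask_0002.asdf
```

An observation on January 1, 2024 falls after the second USEAFTER date but before the third, so the second file is selected.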

### Download a File by Name

If you know the specific reference file name and would like to download it directly from the CRDS on-premise server, you can do so with a command-line call:

```
crds sync --files <filename> --output-dir=<pathname>
```

where `<filename>` is the name of the reference file (e.g. "roman_wfi_mask_0022.asdf") and `<pathname>` is the location where the file will be saved. **Note:** in the future, Roman reference files will also be available via an AWS S3 bucket, and these instructions will be updated to describe how to access them there. 

To run this in a Jupyter notebook, we write out the command as a string and pass it to the command line script code:

In [None]:
cmd = f"crds.sync --files {ref_files['mask']}"
_ = crds.sync.SyncScript(cmd)()

In this case, nothing happened since we had already previously downloaded this file into our cache using `crds.getreferences()`. But if you know the file name of a reference file that you want to retrieve from CRDS without using the matching criteria, the cell above would download the file to your cache. Simply replace `{ref_files['mask']}` in the `cmd` string with the name of the ASDF file.

### Examining Reference Files

Reference files use `roman_datamodels` just like WFI science data products and can be accessed in the same way (see the tutorial [Working with ASDF](../working_with_asdf/working_with_asdf.ipynb) for more information). Let's take a closer look at the files we retrieved from our `crds.getreferences()` example starting with the mask file:

In [None]:
mask = rdm.open(ref_files['mask'])
mask.info()

We see that the mask file contains metadata and a single array called `dq`. If we display the `dq` array, then we can see all of the features that have been flagged. The [Working with ASDF](../working_with_asdf/working_with_asdf.ipynb) tutorial gives more information about how to parse the meanings of the DQ flags. For now, let's plot two versions of the file, one with each bitwise sum of DQ flags on a color map and another flattened version with "good" (DQ = 0; black pixels in the right-hand panel of the plot below) and "bad" (DQ >= 1; white pixels in the right-hand panel of the plot below) flags:

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(14, 14))

# Make a copy of the colormap so we can reset non-finite numbers to black
my_cmap = copy.copy(cm.get_cmap('nipy_spectral'))
my_cmap.set_bad((0, 0, 0))

# Display the mask file with a log normalization using the updated color map
im = axs[0].imshow(mask.dq, origin='lower', norm=colors.LogNorm(vmin=1, vmax=5e5), cmap=my_cmap)
divider = make_axes_locatable(axs[0])
cax = divider.append_axes("right", size="5%", pad=0.05)
axs[0].set_xlabel('Science X (pixels)')
axs[0].set_ylabel('Science Y (pixels)')
axs[0].set_title('Color-Coded DQ Flags')
fig.colorbar(im, cax=cax)

# Display the mask file with boolean good and bad values on a grayscale map
axs[1].imshow(np.asarray(mask.dq) != 0, origin='lower', cmap='binary_r')
axs[1].set_xlabel('Science X (pixels)')
axs[1].set_ylabel('Science Y (pixels)')
axs[1].set_title('Good vs Bad Flags')

plt.tight_layout();

Note that not all DQ flags > 0 are necessarily bad. Many DQ flags are informational about detector effects or note something about the data processing. It is important for you to decide what effects are important for your science case.
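
One way to act on this is to decode the flags and treat only a chosen subset as "bad". The sketch below uses placeholder flag names and bit values (`FLAG_A`, `FLAG_B`, `FLAG_C`) that are not the real Roman DQ definitions, and a tiny made-up `dq` array.

```python
import numpy as np

# Hypothetical flag definitions (placeholders, not the real Roman DQ bits).
FLAGS = {'FLAG_A': 1 << 0, 'FLAG_B': 1 << 1, 'FLAG_C': 1 << 3}

demo_dq = np.array([[0, 1],
                    [8, 3]], dtype=np.uint32)


def decode(value):
    """Return the names of all flags set in a single DQ value."""
    return [name for name, bit in FLAGS.items() if value & bit]


print(decode(3))   # ['FLAG_A', 'FLAG_B']

# Treat only a chosen subset of flags as "bad" for a given science case.
bad_bits = FLAGS['FLAG_A'] | FLAGS['FLAG_B']
bad = (demo_dq & bad_bits) != 0
print(bad)
```

Here the pixel with DQ = 8 carries only `FLAG_C`, which was not selected, so it is kept as good even though its DQ value is greater than zero.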

Next, we'll take a look at a dark. This isn't as visually interesting for the WFI as the dark current is so low, so we will simply display the file information and discuss the contents.

**Note:** Dark reference files from the Science Operations Center at STScI are still being developed. The dark files currently in CRDS are placeholders with all-zero values to enable simulations and pipeline processing. Updated darks are expected in early 2026.

In [None]:
dark = rdm.open(ref_files['dark'])
dark.info()

In addition to metadata, we see there are several arrays. The `data` array contains a cube of dark values up-the-ramp. This is used in the current version of the ELP; however, changes are being made towards a dark rate subtraction after ramp fitting. The `dark_slope` array contains the dark rate per pixel, and the `dark_slope_error` is the uncertainty in the dark rate. Finally, a `dq` array is present to flag effects in the dark current (such as hot pixels).

Finally, let's examine a flat reference file:

In [None]:
flat = rdm.open(ref_files['flat'])
flat.info()

Once again, we see a `data` array, which in this case contains the flatfield values, a `dq` array for flagging effects in the flatfield, and an `err` array. Let's take a look at the flatfield for this detector:

In [None]:
fig, axs = plt.subplots(1, 3, figsize=(14, 14))

norm = simple_norm(flat.data, percent=99.5)
data = axs[0].imshow(flat.data, cmap='gray', norm=norm, origin='lower')
axs[0].set_xlabel('Science X (pixels)')
axs[0].set_ylabel('Science Y (pixels)')
axs[0].set_title('Flatfield')
divider = make_axes_locatable(axs[0])
cax = divider.append_axes("right", size="5%", pad=0.05)
fig.colorbar(data, cax=cax)

norm = simple_norm(flat.err, percent=99.5)
err = axs[1].imshow(flat.err, cmap='gray', norm=norm, origin='lower')
axs[1].set_xlabel('Science X (pixels)')
axs[1].set_ylabel('Science Y (pixels)')
axs[1].set_title('Flatfield Error')
divider = make_axes_locatable(axs[1])
cax = divider.append_axes("right", size="5%", pad=0.05)
fig.colorbar(err, cax=cax)

axs[2].imshow(np.asarray(flat.dq) != 0, cmap='binary_r', origin='lower')
axs[2].set_xlabel('Science X (pixels)')
axs[2].set_ylabel('Science Y (pixels)')
axs[2].set_title('Flatfield DQ')

plt.tight_layout();

## How to Override RomanCal with Local Reference Files

If you have a local reference file that you would like to use in RomanCal processing, either one you retrieved from CRDS or one that you made yourself, you can supply it as an optional argument when you run the ELP. The reference file you specify overrides the best reference file selected by CRDS. Each reference file type has its own override parameter name (e.g., `override_mask`), which must be passed as an argument to the relevant step. Let's look at an example in which we run the pipeline and override the mask and flat files. First, we need new reference files to use, so we run `crds.getreferences()` for a different set of observation parameters and use the returned files to override the best references. Substituting mismatched files like this is generally a bad idea; the example only demonstrates the override mechanism. In real applications, you will likely already have a file on disk that you want to use.

For our example, we will process a WFI01 L1 file with the ELP, but we will override the mask and flat files to be those from WFI04. Again, **this is not recommended** but simply designed to show how to override the reference file selection used by the pipeline if you have your own calibration reference files you want to use.

**Note:** In `romancal` version 0.20.2 (August 2025), there is a bug when overriding reference files. Please make sure to disable the source catalog and tweakreg steps (as shown below) when overriding reference files to bypass this bug. This bug is already fixed in the next `romancal` version (0.21.0; November 2025), which is undergoing testing and validation.

In [None]:
meta = {'ROMAN.META.INSTRUMENT.NAME': 'WFI',
        'ROMAN.META.INSTRUMENT.DETECTOR': 'WFI04',
        'ROMAN.META.INSTRUMENT.OPTICAL_ELEMENT': 'F158',
        'ROMAN.META.EXPOSURE.START_TIME': '2024-01-01 00:00:00'
       }

ref_files = crds.getreferences(meta, reftypes=['mask', 'flat'], observatory='roman')

result = ExposurePipeline.call(dm_l1, save_results=False, steps={
                'source_catalog': {'skip': True},
                'tweakreg': {'skip': True},
                'dq_init': {'override_mask': ref_files['mask']},
                'flatfield': {'override_flat': ref_files['flat']}})

Now let's take a look at our `result` variable, which is our L2 datamodel in memory. Specifically, we can check the `meta.ref_file` section to see the names of the reference files used:

In [None]:
for k, v in result.meta.ref_file.items():
    print(f'{k} = {v}')

Indeed, we see that many of the files used came from CRDS (note the file names begin with "crds://") whereas the flat and mask files contain a local path to the files we selected.
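
The same distinction can be made programmatically by checking for the "crds://" prefix. The sketch below operates on a made-up dictionary rather than on the real `result.meta.ref_file` contents, and `split_by_origin` is a hypothetical helper. It assumes every string entry is either a "crds://" name or a local path, and it ignores non-string entries.

```python
# Made-up excerpt of a ``meta.ref_file``-style mapping. The file names and
# the local directory are placeholders.
example_ref_files = {
    'dark': 'crds://example_dark_0001.asdf',
    'flat': '/home/user/refs/example_flat_0002.asdf',
    'mask': '/home/user/refs/example_mask_0003.asdf',
    'gain': 'crds://example_gain_0004.asdf',
}


def split_by_origin(ref_files):
    """Separate CRDS-selected entries from locally overridden ones."""
    from_crds, overridden = {}, {}
    for reftype, name in ref_files.items():
        if not isinstance(name, str):
            continue  # skip non-string entries
        if name.startswith('crds://'):
            from_crds[reftype] = name
        else:
            overridden[reftype] = name
    return from_crds, overridden


from_crds, overridden = split_by_origin(example_ref_files)
print('From CRDS:', sorted(from_crds))
print('Overridden:', sorted(overridden))
```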

## About this Notebook
**Author:** T. Desjardins.

**Updated On:** 2025-12-10

<table width="100%" style="border:none; border-collapse:collapse;">
  <tr style="border:none;">
    <td style="border:none; width:180px; white-space:nowrap;">
       <a href="#top" style="text-decoration:none; color:#0066cc;">↑ Top of page</a> 
    </td>
    <td style="border:none; text-align:center;">
       <img src="../../roman_logo.png" width="50">
    </td>
    <td style="border:none; text-align:right;">
       <img src="../../stsci_logo2.png" width="90">
    </td>
  </tr>
</table>