# How to Open Roman Data Files (ASDF)

***

## Kernel Information and Read-Only Status

To run this notebook, please select the "Roman Calibration" kernel at the top right of your window.

This notebook is read-only. You can run cells and make edits, but you must save changes to a different location. We recommend saving the notebook within your home directory, or to a new folder within your home (e.g. <span style="font-variant:small-caps;">file > save notebook as > my-nbs/nb.ipynb</span>). Note that a directory must exist before you attempt to add a notebook to it.

## Imports

- *numpy* for array operations
- *asdf* for ASDF input/output
- *roman_datamodels* to handle input/output and validation of data models
- *matplotlib.pyplot* for plotting data
- *astropy.units* to handle units
- *astropy.time* to handle time
- *astropy.coordinates* to handle celestial coordinates
- *pprint* for enhanced printing

In [None]:
%matplotlib inline
import numpy as np
import asdf
import roman_datamodels as rdm
from roman_datamodels.dqflags import pixel as dqflags
import matplotlib.pyplot as plt
import astropy.units as u
import astropy.time
from astropy.coordinates import SkyCoord
from pprint import pprint
import s3fs

## Introduction

The main goal of this notebook is to illustrate how to open and handle Roman Wide Field Instrument (WFI) data. WFI data are stored in [Advanced Scientific Data Format (ASDF)](https://asdf-standard.readthedocs.io/) files, which combine a human-readable hierarchical metadata structure with binary array data. ASDF files are self-validating using predefined schemas.

There are tools to interact with ASDF files in Python, Julia, C/C++, and IDL. In this example we focus on the Python interface.

Roman ASDF files can be opened and manipulated using two main approaches: 
1. using the `roman_datamodels` library, and
2. using the `asdf` library.

Using `roman_datamodels` offers the advantage of loading different data blocks as `stnode`-based objects, providing access to their methods. In contrast, the `asdf` library loads the data blocks as they were serialized on disk. While this approach loses some of the `roman_datamodels` capabilities, it also provides more flexibility. In this notebook, we illustrate both approaches, starting with loading data via `roman_datamodels`.

Additional information about Roman ASDF files can be found in the [Introduction to ASDF](https://roman-docs.stsci.edu/data-handbook-home/wfi-data-format/introduction-to-asdf) article on RDox.

***

## Quick start

All Roman data products conform to one of the datamodels described by the [`roman_datamodels`](https://roman-datamodels.readthedocs.io/en/latest/) package. This package wraps the `asdf` library and provides utilities to read and save data conforming to the official data models. We illustrate how to use `roman_datamodels` to load data from an ASDF file containing simulated Roman data.

In [None]:
asdf_dir_uri = 's3://roman-sci-test-data-prod-summer-beta-test/'
fs = s3fs.S3FileSystem()

asdf_file_uri_l2 = asdf_dir_uri + 'AAS_WORKSHOP/r0003201001001001004_0001_wfi01_f106_cal.asdf'

with fs.open(asdf_file_uri_l2, 'rb') as fb:
    af = asdf.open(fb)
    f = rdm.open(af).copy()

Notice that we used the `asdf.open()` command to open the byte stream, and then passed that object to `roman_datamodels.open()`. This is necessary at present as `roman_datamodels` does not allow for reading of a byte stream in this manner.

A high-level summary of the file can be retrieved with the `info()` method. Here we limit the output to 30 rows; to see more or fewer, set `max_rows` to a different number, or to `None` to print every row. A similar `max_cols` option controls the horizontal cutoff per line. The defaults are 24 rows and 120 columns.

In [None]:
f.info(max_rows=30)

Note that, by default, the `open()` method does not load the data into memory unless told to do so explicitly, which makes opening ASDF files a quick operation.

At this point, we have information about the names and types of the different data blocks, but the arrays themselves are only loaded when we first access them. For example:

In [None]:
f.data

An ASDF object can be used, effectively, like a nested dictionary. Each block can be explored via the `.keys()` method. For example, we can retrieve the list of keys in a Level 2 calibrated rate image file as:

In [None]:
for key in f.keys():
    print(key)

We can also find all of the keys within one of these blocks, such as the metadata. Note that here we are using the dot syntax notation (i.e., `f.meta`) to retrieve the metadata. You can also use brackets to subscript the datamodel (e.g., `f['meta']`). Dot syntax is allowed by datamodel objects in `roman_datamodels`, whereas ASDF objects (shown later in the tutorial) can only use the bracket subscript notation.

In [None]:
for key in f.meta.keys():
    print(key)
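To make the difference between the two access styles concrete, here is a toy sketch of a dictionary that also allows dot access. The `DotDict` class and the metadata values are hypothetical and purely illustrative; this is not how `roman_datamodels` implements its node objects.

```python
class DotDict(dict):
    """Toy stand-in (hypothetical): a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() keep working as usual
        try:
            value = self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc
        # Wrap nested dictionaries so that dot access can be chained
        return DotDict(value) if isinstance(value, dict) else value


# Illustrative values only, not read from a real file
tree = {'meta': {'instrument': {'name': 'WFI', 'detector': 'WFI01'}}}
node = DotDict(tree)

print(node.meta.instrument.detector)            # dot syntax
print(node['meta']['instrument']['detector'])   # bracket syntax
print(list(node.meta.keys()))
```

Both access styles reach the same value; a plain `dict` supports only the bracket form.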

We focus on the data block, containing the science image of interest. First, how do we know which array in the file is the primary data array? It could have any name, for example "data" or "science." If we are not sure, we can ask the file itself:

In [None]:
f.get_primary_array_name()

The creators of the datamodel have told us explicitly that the primary array name in this case is "data." This may not be true for all Roman WFI ASDF files (e.g., calibration reference files), so it is always worth checking if you are not sure. Next, let's look at the type of the `data` block:

In [None]:
type(f.data)

Note that Roman images are expressed as `numpy.ndarray` objects. The units are available in the schema descriptions for the arrays (see below), but as a quick reference, the data arrays at each level are:

- Level 1 (L1; uncalibrated 3-D ramp cubes) are in units of Data Numbers (DN)
- Level 2 (L2; calibrated 2-D rate images) are in units of DNs per second (DN/s)
- Level 3 (L3; 2-D mosaic co-adds) are in units of megaJanskys per steradian (MJy/sr)

Error arrays are in the same units as data, and variance arrays are the same units squared (e.g., DN^2 / s^2).

Let's take a look at the size of our image and some sample values in a small 3x3 cutout from the bottom-left corner of the array:

In [None]:
print('Size of f.data: ', f.data.shape)
print('\nExploring the values of f.data: \n', f.data[:3, :3])

Since we have image data, let's also take a quick look at what the image actually contains. This is quite simple, and a more detailed explanation about visualizing Roman ASDF files can be found in the [Data Visualization](../data_visualization/data_visualization.ipynb) tutorial. Below is a 1,000 x 1,000 pixel section of the data array:

In [None]:
plt.imshow(f.data[:1000, :1000], vmin=0, vmax=2, origin='lower');

As with array data stored in other file types, we can perform analyses on the arrays in memory. For example, we can check the image content by building a 1-D histogram of its values:

In [None]:
fig, ax = plt.subplots(figsize=(12, 6), layout='tight')
ax.hist(f.data.flatten(), histtype='step', range=(-0.2, 1.7), bins=200);
ax.set_xlabel('Pixel Value', fontsize=14)
ax.set_ylabel('Number of pixels', fontsize=14)
ax.tick_params(axis='both', labelsize=14);

We can explore other data blocks such as the data quality (DQ) array. The values of the DQ array are the bitwise sum of the individual flags representing specific effects. These flags are defined in the [RomanCal documentation](https://roman-pipeline.readthedocs.io/en/latest/roman/references_general/references_general.html#data-quality-flags). They can also be retrieved from the `roman_datamodels.dqflags.pixel` class. As a reminder, we imported `roman_datamodels.dqflags.pixel` at the start of the tutorial under the alias `dqflags`. Let's start by making a list of all of the unique values in the DQ array:

In [None]:
unique_dq = np.unique(f.dq)
print(unique_dq)

Now that we have the list of unique DQ values, we can decompose the values into individual flags and print the number of pixels with each unique DQ value:

In [None]:
size = np.size(f.dq)

# Number of good pixels (no flags set)
npix = np.count_nonzero(f.dq == 0)
print("------------")
print(f'Flag 0 (affected pixels = {npix}; {npix / size:.2%})')
print(f'0: {dqflags(0).name}')

# Pixels with non-zero DQ values: decompose each value into its individual flags
for uu in unique_dq[1:]:
    br = np.binary_repr(uu)
    npix = np.count_nonzero(f.dq == uu)
    print("------------")
    print(f'Flag {uu} (affected pixels = {npix}; {npix / size:.2%})')
    # Walk the binary representation starting from the least significant bit
    for ii, cc in enumerate(br[::-1]):
        if cc == '1':
            print(f'{2**ii}: {dqflags(2**ii).name}')

If we want a report of how many pixels are impacted by a specific DQ flag (e.g., all saturated pixels) regardless of any other flags set, we can get that, too, using the Python `&` operator (bitwise AND):

In [None]:
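bit = 2
definition = dqflags(bit).name
# Count the pixels that have this bit set. Summing `f.dq & bit` directly would
# add `bit` (not 1) for every flagged pixel and overstate the count.
n_pix = np.count_nonzero(f.dq & bit)
print(f'Bit value {bit} corresponds to {definition}')
print(f'Number of {definition} pixels: {n_pix:,} ({n_pix / f.dq.size:.2%})')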
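The flag arithmetic above can be tried on a small synthetic array without opening any file. The sketch below is a minimal illustration: the `FLAGS` dictionary is an assumed, partial stand-in for `roman_datamodels.dqflags.pixel` (only the value 2 for saturation is taken from this tutorial), and the `dq` array is made up.

```python
import numpy as np

# Assumed, partial flag table for illustration only; the authoritative list
# lives in roman_datamodels.dqflags.pixel and the RomanCal documentation
FLAGS = {1: 'DO_NOT_USE', 2: 'SATURATED', 4: 'JUMP_DET'}

# Made-up 3x3 DQ array; each value is a bitwise sum of flags
dq = np.array([[0, 2, 3],
               [4, 6, 0],
               [1, 0, 2]], dtype=np.uint32)


def decompose(value):
    """Return the names of all flags set in a DQ value."""
    return [name for bit, name in FLAGS.items() if value & bit]


for value in np.unique(dq):
    print(int(value), decompose(int(value)) or ['GOOD'])

# Count pixels with the saturation bit set, regardless of other flags
n_saturated = np.count_nonzero(dq & 2)
print('Saturated pixels:', n_saturated)
```

Here the values 2, 3, and 6 all include the saturation bit, so four pixels are counted even though only two of them have a DQ value of exactly 2.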

## Exploring metadata

One of the advantages of ASDF is its extensibility, and the ability to store human-readable hierarchical metadata. Let's further explore the metadata.

In [None]:
meta = f['meta']
type(meta)

As we can see, `meta` is a dictionary type object. What if instead of using the bracket notation we use the dot notation discussed previously?

In [None]:
meta = f.meta
type(meta)

Suddenly it's a `roman_datamodels.stnode._node.DNode` object! Despite this difference in object type, we can treat both this and a dictionary the same in most ways. However, an advantage of the dot syntax and the `roman_datamodels.stnode._node.DNode` object is that we retain information about the schema, which we lose if we convert the metadata to a dictionary object. We previously showed how to get the list of keys in the metadata, but as a reminder let's do it again here for easy reference:

In [None]:
for key in meta.keys():
    print(key)

Printing the whole of the metadata is quite long, so we will instead print a small subsection:

In [None]:
print(meta.instrument)

As shown above, the `meta` data block contains a lot of useful information. Two commonly used keys, for example, are the `wcs` key, containing information about the World Coordinate System (WCS; see below), and the `photometry` key, containing information about how to transform units from instrumental (DN/s) to physical (MJy/sr).

Let's take a look at the schema information for `meta.instrument`. Note that this can be quite difficult to read, but is very rich in information about the contents, data types, allowed values, and mapping to other information (e.g., the storage location of a metadata field in the MAST Archive Catalog database) for every component of Roman WFI ASDF files.

In [None]:
pprint(meta.instrument.get_schema())

We can also use this to get the description of a specific metadata field:

In [None]:
print(meta.instrument.get_schema()['properties']['detector']['description'])

This can be alternatively written as:

In [None]:
print(f.schema_info(path='roman.meta.instrument.detector'))

### Taking advantage of `astropy.time.Time` objects in the metadata

Another feature in WFI ASDF metadata is the storage of times as `astropy.time.Time` objects, which provide numerous convenient methods for converting to different reference systems and formats. Here we illustrate a few examples. For a more comprehensive view of `astropy.time`, please check the [astropy.time](https://docs.astropy.org/en/stable/time/) documentation. Unless otherwise noted, WFI times are stored in Coordinated Universal Time (UTC), which is indicated in the schema descriptions for any time-related fields. Be sure to check the field descriptions if you are unsure.

In [None]:
start_time = meta.exposure.start_time
print('Start time of the exposure:', start_time, '; datatype:', type(start_time))

We can convert the format of this start time to a modified Julian date (MJD) very easily:

In [None]:
start_time.mjd

If instead we want to convert the scale of the time (i.e., from UTC to International Atomic Time (TAI)), we can do that, too:

In [None]:
start_time.tai

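Notice that the time changed by 37 seconds when we converted from UTC to TAI. This offset is expected: UTC includes leap seconds while TAI does not, and the accumulated difference between the two scales has been 37 seconds since January 2017. We can combine the scale change with the format change as well: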

In [None]:
start_time.tai.mjd

We can also do arithmetic with `Time` objects. For example, we can compute the difference between the start and end times of the exposure (this creates an `astropy.time.TimeDelta` object):

In [None]:
end_time = meta.exposure.end_time
exp_delta = end_time - start_time

And then express the exposure length in different units:

In [None]:
print('Exposure length in seconds:', exp_delta.to(u.s))
print('Exposure length in days:', exp_delta.to(u.day))
print('Exposure length in years:', exp_delta.to(u.year))

### Accessing WCS Information

Roman uses the Generalized World Coordinate System ([GWCS](https://gwcs.readthedocs.io)) standard. The WCS can be found in the `wcs` key within the `meta` block.

In [None]:
gwcs = f.meta.wcs
print(type(gwcs))

If we use the pretty-print (`pprint()`) function, we can see the full contents of the WCS object.

In [None]:
pprint(gwcs)

If instead we use the `print()` function, we get a summary of the transforms available:

In [None]:
print(gwcs)

The `gwcs` object can be used to convert between image pixel and sky coordinates.

**Important note:** the `gwcs` object uses Python 0-indexing, therefore the center of the first pixel in Python is (0, 0), while the formal definition of the WFI science coordinate system uses FITS-style 1-indexing (i.e., the center of the bottom-left pixel is (1, 1)). More information about the Roman coordinate systems can be found in the [PySIAF for Roman](https://roman-docs.stsci.edu/simulation-tools-handbook-home/simulation-development-utilities/pysiaf-for-roman) article on RDox. **All** archived L1-4 data products (e.g., WCS transforms, catalogs, etc.) will use the Python 0-indexed system.

As an example, let's convert the central pixel position of the detector to the corresponding right ascension and declination on the sky. The 4-pixel reference border was removed during processing, so the total L2 image size is 4088 rows x 4088 columns. Because the center of the first pixel in Python is (0, 0) and the array size is even, the center of the L2 image array in the zero-indexed science coordinate frame falls between pixels, at (x, y) = (2043.5, 2043.5). Also note that GWCS expects inputs in the order (x, y) and not the NumPy array index order (y, x).

In [None]:
print(gwcs(2043.5, 2043.5))
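The central coordinate used above follows from simple arithmetic on the array size. The sketch below is illustrative only; `array_center` is a hypothetical helper, not part of `gwcs` or `roman_datamodels`.

```python
def array_center(n_pixels, one_indexed=False):
    """Center of an axis with n_pixels pixels, in pixel-center coordinates."""
    # With 0-indexing, pixel centers run from 0 to n_pixels - 1
    center = (n_pixels - 1) / 2
    # FITS-style 1-indexing shifts every coordinate by one pixel
    return center + 1 if one_indexed else center


n_l1 = 4096               # L1 array size along one axis, including reference pixels
border = 4                # reference-pixel border on each side
n_l2 = n_l1 - 2 * border  # L2 array size along one axis

print(n_l2)                                  # 4088
print(array_center(n_l2))                    # 2043.5 (Python 0-indexed)
print(array_center(n_l2, one_indexed=True))  # 2044.5 (FITS-style 1-indexed)
```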

Likewise, we can convert from celestial coordinates to pixel coordinates using the inverse transform via the `.invert()` method. For example, using a slightly different position still within this detector:

In [None]:
print(gwcs.invert(270.8719, -0.164399))

Notice that `gwcs` assumed our inputs were the right ascension and declination, respectively, in degrees. If we want to be more specific, then the `gwcs` object can also take as input an `astropy.coordinates.SkyCoord` object:

In [None]:
cdt = SkyCoord(270.8719, -0.164399, unit='deg')
print(gwcs.invert(cdt))

## Reading Roman data using the ASDF library

We now illustrate how to read Roman WFI data using the basic `asdf` library.

The main way to read a generic ASDF file is via the `open()` function in the `asdf` package. This returns an `AsdfFile` object.

In [None]:
asdf_dir_uri = 's3://roman-sci-test-data-prod-summer-beta-test/'
fs = s3fs.S3FileSystem()

asdf_file_uri_l2 = asdf_dir_uri + 'AAS_WORKSHOP/r0003201001001001004_0001_wfi01_f106_cal.asdf'

with fs.open(asdf_file_uri_l2, 'rb') as fb:
    f = asdf.open(fb).copy()

Another useful way to explore the contents of an ASDF file is the `.tree` attribute:

In [None]:
pprint(f.tree) # This cell will print a lot of information, please feel free to skim or skip

For WFI ASDF files, the three high-level blocks are: 
* `asdf_library`: It contains information about the `asdf` library used to create the file.
* `history`: It contains metadata information about the extensions used to create the file.
* `roman`: This block contains Roman data and metadata.

Within the `roman` block, the `data` block contains the data, which corresponds to an uncalibrated ramp in L1 products, a calibrated rate image in L2 products, and a mosaic image in L3 products.

Other interesting data blocks are: 
- `meta`: metadata information
- `err`: estimated uncertainties
- `dq`: data quality flags

For more information about these data blocks and Level 2 data products, please visit the [RDox pages on data levels and products](https://roman-docs.stsci.edu/data-handbook-home/wfi-data-format/data-levels-and-products#DataLevelsandProducts-level2).

We further showcase the usage of the basic `asdf` library below using an L1 file.

In [None]:
asdf_file_uri_l1 = asdf_dir_uri + 'AAS_WORKSHOP/r0003201001001001004_0001_wfi01_f106_uncal.asdf'

with fs.open(asdf_file_uri_l1, 'rb') as fb:
    g = asdf.open(fb).copy()

In [None]:
g.info()

Loading the data follows exactly the same procedure as above. When working with L1 data, notice that the `data` block is now a cube of size (N, 4096, 4096), where N is the number of resultants up-the-ramp. A resultant is either a single read or the arithmetic mean of multiple reads of the WFI detectors. The L1 data array also contains the 4-pixel reference pixel border that is trimmed during processing from L1 to L2. As previously mentioned, the L1 `data` array is in units of DN.
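As a rough illustration of these ideas, the sketch below builds a tiny synthetic ramp, averages groups of reads into resultants, and trims a 4-pixel border. The array size, count rate, and grouping of reads are all made up for the example, and the trimming only mimics the change in array geometry, not the actual L1 to L2 processing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-in for the detector: 6 reads of a 16 x 16 array
n_reads, ny, nx = 6, 16, 16
# Accumulating signal up the ramp (made-up count rate)
reads = np.cumsum(rng.poisson(5, size=(n_reads, ny, nx)), axis=0).astype(float)

# Hypothetical grouping: each resultant is one read or the mean of several
groups = [[0], [1, 2], [3, 4, 5]]
resultants = np.stack([reads[group].mean(axis=0) for group in groups])

# Remove the 4-pixel reference border on every side of each resultant
border = 4
trimmed = resultants[:, border:-border, border:-border]

print(resultants.shape)  # (3, 16, 16)
print(trimmed.shape)     # (3, 8, 8)
```

With the real detector size, the same trimming turns 4096 x 4096 pixels into 4088 x 4088.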

Let's plot the value of a single pixel up-the-ramp:

In [None]:
plt.figure(figsize=(6, 6), layout='tight')
plt.title('Up-the-ramp samples for pixel 1000, 1000')
plt.plot(g['roman']['data'][:, 1000, 1000])
plt.xlabel('Resultant number', fontsize=16)
plt.ylabel('Pixel value [DN]', fontsize=16);

The L1 data array contains all the uncalibrated resultants that, after processing, yield the L2 rate images.

The ASDF tree shows another section of the file called `romanisim` that contains information about the simulation that created the L1 file. This section is not part of the datamodel definition in `roman_datamodels`, so it cannot be accessed with the dot notation. Instead, we can access it, and any other additional information not stored by the datamodel definition, using the ASDF tree and bracket notation:

In [None]:
g.tree['romanisim']

Similarly, we can access the previously mentioned history section of the file using the ASDF tree and bracket notation to find some package version information that may be useful to us. This includes, for example, the `roman_datamodels` version used to create the file.

In [None]:
g.tree['history']

During Roman development, you may have an outdated version of a file that does not conform to the installed version of `roman_datamodels`, but you may want to open the file anyway. This may be to just get something out of the file that you need, or you may want to try manually fixing the file to conform to the latest schema. In any case, you can still open the file with `asdf.open()` if you disable the schema validation like so:

In [None]:
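asdf_file_uri_l1 = asdf_dir_uri + 'AAS_WORKSHOP/r0003201001001001004_0001_wfi01_f106_uncal.asdf'

with fs.open(asdf_file_uri_l1, 'rb') as fb:
    # Temporarily disable schema validation while reading the file
    with asdf.config_context() as cfg:
        cfg.validate_on_read = False
        # Copy, as in the cells above, so the contents remain usable after the stream closes
        af = asdf.open(fb).copy()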

Note that if your file does not conform to the installed version of `roman_datamodels`, then you will need to leave it as an `AsdfFile` object and not try to pass it to `roman_datamodels.open()`.

## Additional Resources

For more information about Roman data products and additional resources please consider visiting the links below:

- [Roman User Documentation -- RDox](https://roman-docs.stsci.edu/)
- [MAST](https://archive.stsci.edu)
- [ASDF python API](https://asdf.readthedocs.io/en/latest/)
- [ASDF standard](https://asdf-standard.readthedocs.io/)

## About this Notebook

**Authors:** Javier Sánchez, William Schultz, Tyler Desjardins

**Updated On:** 2025-05-26

***

[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/> 