# How to open Roman Data Files (ASDF)

***

## Kernel Information and Read-Only Status

To run this notebook, please select the "Roman Calibration" kernel at the top right of your window.

This notebook is read-only. You can run cells and make edits, but you must save changes to a different location. We recommend saving the notebook within your home directory, or to a new folder within your home (e.g. <span style="font-variant:small-caps;">file > save notebook as > my-nbs/nb.ipynb</span>). Note that a directory must exist before you attempt to add a notebook to it.

## Imports

- *numpy* for array operations
- *asdf* for ASDF input/output
- *roman_datamodels* to handle input/output and validation of data models
- *matplotlib.pyplot* for plotting data
- *astropy.units* to handle units
- *astropy.time* to handle time
- *astropy.coordinates* to handle celestial coordinates
- *pprint* for enhanced printing
- *s3fs* for reading files stored in an S3 bucket

In [None]:
%matplotlib inline
import numpy as np
import asdf
import roman_datamodels as rdm
import matplotlib.pyplot as plt
import astropy.units as u
import astropy.time
from astropy.coordinates import SkyCoord
from pprint import pprint
import s3fs

## Introduction

The main goal of this notebook is to illustrate how to open and handle Roman Wide Field Instrument (WFI) data. WFI data are stored in [Advanced Scientific Data Format (ASDF)](https://asdf-standard.readthedocs.io/) files, which combine a human-readable hierarchical metadata structure with binary array data. ASDF files are self-validating using pre-defined schemas.

There are tools to interact with ASDF files in Python, Julia, C/C++, and IDL. In this example we focus on the Python interface.

Roman ASDF files can be opened and manipulated using two main approaches: (1) the `roman_datamodels` library, and (2) the `asdf` library.

Both approaches give access to the full data. The advantage of `roman_datamodels` is that the different data blocks are loaded as `stnode`-based objects, which gives us access to their methods. The `asdf` library, on the other hand, loads the data blocks as they were serialized on disk, which loses some of the `roman_datamodels` capabilities but can allow more flexibility. We illustrate both approaches in this notebook, starting with loading via `roman_datamodels`.

Additional information about ASDF in the context of Roman can be found in RDox: https://roman-docs.stsci.edu/data-handbook-home/wfi-data-format.

**Note**: This notebook assumes familiarity with Python, Python dictionaries, and Jupyter notebooks, as well as some basic familiarity with `matplotlib`, `numpy`, and `astropy`. 

***

## Quick start

All Roman data products conform to one of the data models described by the [`roman_datamodels`](https://roman-datamodels.readthedocs.io/en/latest/) package. This package wraps the `asdf` library and provides utilities to read and save data conforming to the official data models. We illustrate how to use `roman_datamodels` to load data from an ASDF file containing simulated Roman data.

In [None]:
asdf_dir_uri = 's3://roman-sci-test-data-prod-summer-beta-test/'
fs = s3fs.S3FileSystem()

asdf_file_uri_l2 = asdf_dir_uri + 'ROMANISIM/DENSE_REGION/R0.5_DP0.5_PA0/r0000101001001001001_01101_0001_WFI01_cal.asdf'

with fs.open(asdf_file_uri_l2, 'rb') as fb:
    f = rdm.open(fb).copy()

A high-level summary of the file can be retrieved by using the `info()` method:

In [None]:
f.info(max_rows=30)

We have limited the number of rows printed to 30. To see more, increase that number, or set it to `None` to see all nodes.

Note that, by default, the `open()` function does not load the array data into memory unless told to do so explicitly, which makes opening ASDF files a quick operation.

At this point, we have information about the shape and type of the different data blocks, but we don't have access to the data until we load them. We can load a data block either by accessing it or by setting `lazy_load=False` when opening the file.

An ASDF object can be used, effectively, like a nested dictionary. Each block can be explored via the `.keys()` method.

In [None]:
pprint(f.keys())

For a level-2 image, the list of blocks includes:

In [None]:
for key in f.keys():
    print(key)
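Because the object behaves like a nested dictionary, its layout can also be listed with a short recursive walk. The following is a minimal sketch on a hypothetical stand-in dictionary (the name `example_tree` and its contents are assumptions for illustration, not the real file contents):

```python
def list_paths(node, prefix=""):
    """Return the dotted path of every leaf in a nested dictionary."""
    paths = []
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested dictionaries
            paths.extend(list_paths(value, path))
        else:
            paths.append(path)
    return paths


# Hypothetical stand-in tree; a real file has many more entries
example_tree = {
    "meta": {
        "exposure": {"start_time": "<time>", "end_time": "<time>"},
        "wcs": "<wcs>",
    },
    "data": "<array>",
    "dq": "<array>",
}

for path in list_paths(example_tree):
    print(path)
```

The same idea should carry over to any object that exposes `.keys()` and item access, although node types other than `dict` would need their own `isinstance` check.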

We focus on the `data` block, containing the science image of interest.

In [None]:
img = f['data']

In [None]:
type(img)

Note that Roman images are expressed as `astropy.units.Quantity` objects, often with units attached to them. This functionality can only be used in Python. However, the images will still be loaded correctly using other languages (although the units will not automatically load).

Using `Quantity` objects can help prevent confusion with units. However, sometimes it is convenient to handle the images as plain NumPy arrays (`numpy.ndarray`), which are available through the `.value` attribute of the `Quantity` object. For example, a typical operation with images is visualization; however, `matplotlib`'s `imshow` cannot render `Quantity` objects, in which case it is necessary to use the `.value` attribute. For more information on how to visualize data in ASDF files, see the Data Visualization notebook tutorial.

In [None]:
print('Exploring the values of `img`: ', img.value)
print('Exploring the data type of `img.value`: ', type(img.value))
print('Exploring the units of `img`: ', img.unit)
print('Exploring the type of `img.unit`: ', type(img.unit))

As with array data stored in other file types, we can perform analyses on the arrays in memory. For example, we can check the image content by building a 1-D histogram of its values:

In [None]:
plt.figure(figsize=(12, 6), layout='tight')
plt.hist(img.value.flatten(), histtype='step', range=(-0.6, 0.6), bins=300);
plt.xlabel(f'Pixel value [{img.unit}]', fontsize=16)
plt.ylabel('Pixels/bin', fontsize=16);
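The same binning can be inspected numerically with `np.histogram`, which is useful for checking how many pixels fall outside the plotted range. This is a minimal sketch on synthetic data (`fake_img` and its noise level are assumptions standing in for `img.value`):

```python
import numpy as np

# Synthetic stand-in for img.value: Gaussian noise around zero
rng = np.random.default_rng(seed=0)
fake_img = rng.normal(loc=0.0, scale=0.2, size=(256, 256))

# Same range and number of bins as in the plot above
counts, edges = np.histogram(fake_img, bins=300, range=(-0.6, 0.6))

# Values outside `range` are not counted in any bin
n_outside = fake_img.size - counts.sum()
print(f"{counts.sum()} pixels inside the range, {n_outside} outside")

# Locate the most populated bin
peak = counts.argmax()
print(f"Most populated bin spans {edges[peak]:.3f} to {edges[peak + 1]:.3f}")
```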

We can explore other data blocks, for example, the data quality (DQ) flags. These flags are summarized [here](https://roman-pipeline.readthedocs.io/en/latest/roman/references_general/references_general.html#data-quality-flags). Let's take a look at the DQ values. Each value is the bitwise OR (equivalently, the sum of the distinct flag values) of all DQ bits flagged for that pixel during data processing.

In [None]:
unique_dq = np.unique(f['dq'])

In [None]:
unique_dq

In [None]:
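for uu in unique_dq:
    br = np.binary_repr(uu)  # binary string, most significant bit first
    print("------------")
    print('Flag', uu)
    # Reverse the string so that index ii corresponds to bit number ii
    for ii, cc in enumerate(br[::-1]):
        if cc == '1':
            print('Bits on:', ii, 2**ii)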
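Once the bits of interest are known, a boolean mask of the affected pixels can be built with a bitwise AND. The sketch below uses a small hypothetical array (`fake_dq`) in place of `f['dq']`, and the helper `bits_set` is an illustrative name rather than part of any library:

```python
import numpy as np


def bits_set(value):
    """Return the indices of the bits that are set in a non-negative integer."""
    value = int(value)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Hypothetical DQ array: some pixels have bit 0 and/or bit 2 set
fake_dq = np.array([[0, 1], [4, 5]], dtype=np.uint32)

# Select every pixel that has bit 2 set, regardless of its other bits
bit = 2
mask = (fake_dq & (1 << bit)) != 0

print(bits_set(5))
print(mask)
```

Testing a single bit with `&` rather than comparing the full DQ value with `==` keeps pixels that carry several flags at once.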

## Exploring metadata

One of the advantages of ASDF is its extensibility, and the ability to store human-readable hierarchical metadata. Let's further explore the metadata.

In [None]:
meta = f['meta']  # This way we get a dictionary

In [None]:
type(meta)

In [None]:
meta # Expect a long-ish output here

We retrieved the `meta` data block as a dictionary, which itself contains a collection of dictionaries. We iterate over its keys to see what they contain:

In [None]:
for key in meta.keys():
    print(key)

As shown above, the `meta` data block contains a lot of useful information. Two commonly used keys, for example, are the `wcs` key, containing information about the World Coordinate System (see below), and the `photometry` key, containing information about how to transform units from instrumental (DN / sec) to physical (MJy / sr).
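As a rough illustration of the unit transformation mentioned above, the sketch below assumes that the conversion is a single multiplicative factor. The value of `conversion` is hypothetical; the real factor should be read from the `photometry` metadata of the file.

```python
import numpy as np

# Hypothetical conversion factor, in (MJy / sr) per (DN / s)
conversion = 0.3

# Small stand-in for a rate image in DN / s
rate_img = np.array([[1.0, 2.0], [4.0, 0.5]])

# Scale the instrumental rates to surface brightness in MJy / sr
surface_brightness = rate_img * conversion
print(surface_brightness)
```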

We continue going deeper in the metadata tree. In this case, we select the `aperture` key.

In [None]:
for key in meta['aperture'].keys():
    print(key)
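When a deeply nested value is needed repeatedly, chained key access can be wrapped in a small helper. This is a minimal sketch, where `get_by_path` is an illustrative function (not part of `asdf` or `roman_datamodels`) and the dictionary contents are hypothetical:

```python
from functools import reduce


def get_by_path(tree, dotted_path):
    """Follow a dotted path such as 'meta.aperture.name' through nested dictionaries."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), tree)


# Hypothetical stand-in for the file contents
example = {"meta": {"aperture": {"name": "EXAMPLE_APERTURE"}}}

print(get_by_path(example, "meta.aperture.name"))
```

A missing key raises `KeyError`, exactly as it would with chained `[...]` access.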

Alternatively, if you have opened the file with `roman_datamodels`, then you can retrieve the data blocks as `stnode._node.DNode` objects:

In [None]:
meta2 = f.meta

In [None]:
type(meta2)

And you can go deeper in the metadata tree as shown below:

In [None]:
ap = meta2.aperture

In [None]:
type(ap)

The advantage of this latter approach is that you have access to the schema of each node.

In [None]:
pprint(ap.get_schema())

### Taking advantage of `astropy.time.Time` objects in the metadata

Another feature of WFI ASDF metadata is that times are stored as `astropy.time.Time` objects, which provide numerous convenient methods for converting between reference systems and formats. We illustrate a few examples here; for a more comprehensive view of `astropy.time`, please check the documentation at https://docs.astropy.org/en/stable/time/.

In [None]:
start_time = meta2['exposure']['start_time']
print('Start time of the exposure:', start_time, '; datatype:', type(start_time))

We can convert this start time to MJD very easily:

In [None]:
start_time.mjd
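For intuition about what the MJD value represents, the sketch below reproduces the arithmetic with the standard library only: MJD counts days since 1858-11-17 00:00. This simplified version ignores leap seconds and time scales, which `astropy.time` handles properly, and `utc_to_mjd` is an illustrative helper rather than a library function.

```python
from datetime import datetime, timezone

# MJD zero point: 1858-11-17 00:00 (JD 2400000.5)
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def utc_to_mjd(moment):
    """Days elapsed since the MJD epoch, ignoring leap seconds and time scales."""
    return (moment - MJD_EPOCH).total_seconds() / 86400.0


# The J2000.0 reference date, 2000-01-01 12:00, corresponds to MJD 51544.5
print(utc_to_mjd(datetime(2000, 1, 1, 12, 0, 0, tzinfo=timezone.utc)))
```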

We can also do arithmetic with `Time` objects. For example, we can get the exposure length by subtracting the start time from the end time:

In [None]:
end_time = meta2['exposure']['end_time']
exp_len = end_time - start_time

And then express the exposure length in different units:

In [None]:
print('Exposure length in seconds:', exp_len.to(u.s))
print('Exposure length in days:', exp_len.to(u.day))
print('Exposure length in years:', exp_len.to(u.year))

### Accessing WCS Information

Roman uses the Generalized World Coordinate System ([GWCS](https://gwcs.readthedocs.io)) standard. The WCS can be found in the `wcs` key within the `meta` block.

In [None]:
gwcs = f['meta']['wcs']
pprint(gwcs)

The WCS can be retrieved as a `gwcs` object, which is built upon and is compatible with `astropy.wcs` utilities.

In [None]:
print(type(gwcs))

The `gwcs` object can be used to convert between image pixel and sky coordinates.

**Note:** the `gwcs` object uses Python zero-indexing, therefore the center of the first pixel in Python is (0, 0), while in the formal definition of the WFI science coordinate system the center of the bottom-left pixel is (1, 1). More information about the Roman coordinate systems can be found following the link [here](https://roman-docs.stsci.edu/simulation-tools-handbook-home/simulation-development-utilities/pysiaf-for-roman).

In this example, let's convert the central pixel position of the detector to the corresponding right ascension and declination on the sky. The center of the L2 image array in the science coordinate frame is (x, y) = (2044.5, 2044.5) pixels (note that the 4-pixel reference border was removed during processing). Recall that we must subtract 1 from both axes to convert to Python's zero-indexed system: 

In [None]:
print(gwcs(2043.5, 2043.5))
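The index arithmetic behind the numbers used above can be written out explicitly. This is a minimal sketch, where `science_to_python` is an illustrative helper and 4088 is the Level 2 image size along each axis:

```python
def science_to_python(x, y):
    """Convert 1-indexed science-frame pixel coordinates to 0-indexed ones."""
    return x - 1, y - 1


n_pix = 4088  # Level 2 image size along each axis

# With 1-indexed pixel centers running from 1 to n_pix, the array center is (n_pix + 1) / 2
center_science = ((n_pix + 1) / 2, (n_pix + 1) / 2)
center_python = science_to_python(*center_science)

print("Science frame:", center_science)
print("Python indexing:", center_python)
```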

Likewise, we can convert from celestial coordinates to pixel coordinates using the inverse transform via the `.invert()` method. For example, using a slightly different position still within this detector:

In [None]:
print(gwcs.invert(0.43, 0.53))

Notice that `gwcs` assumed our inputs were the right ascension and declination, respectively, in degrees. If we want to be more specific, then the `gwcs` object can also take as input an `astropy.coordinates.SkyCoord` object:

In [None]:
cdt = SkyCoord(0.43, 0.53, unit='deg')
print(gwcs.invert(cdt))

## Reading Roman data using the basic ASDF library

We now illustrate how to read Roman WFI data using the basic `asdf` library.

The main way to read a generic ASDF file is the `open` function in the `asdf` package, which returns an `AsdfFile` object.

In [None]:
asdf_dir_uri = 's3://roman-sci-test-data-prod-summer-beta-test/'
fs = s3fs.S3FileSystem()

asdf_file_uri_l2 = asdf_dir_uri + 'ROMANISIM/DENSE_REGION/R0.5_DP0.5_PA0/r0000101001001001001_01101_0001_WFI01_cal.asdf'

with fs.open(asdf_file_uri_l2, 'rb') as fb:
    f = asdf.open(fb).copy()

Another useful way to explore the contents of an ASDF file is the `.tree` attribute:

In [None]:
pprint(f.tree) # This cell will print a lot of information, please feel free to skim or skip

For WFI ASDF files, the three high-level blocks are: 
* `asdf_library`: It contains information about the `asdf` library used to create the file.
* `history`: It contains metadata information about the extensions used to create the file.
* `roman`: This block contains Roman data and metadata.

Within the `roman` block, the `data` block contains the data, which corresponds to an uncalibrated ramp in Level 1 products, a calibrated rate image in Level 2 products, and a mosaic image in Level 3 products.

Other interesting data blocks are: 
- `meta`: metadata information
- `err`: estimated uncertainties
- `dq`: data quality flags

For more information about these data blocks and Level 2 data products we recommend visiting the Roman Data Handbook following the link [here](https://roman-docs.stsci.edu/data-handbook-home/wfi-data-format/data-levels-and-products#DataLevelsandProducts-level2).

We further showcase the usage of the `asdf` basic library below using a Level 1 file.

## Exploring Level 1 Data

In the previous section we illustrated how to use `asdf` to read a Level 2 image. In Level 2 products, the reference pixels and the 33rd amplifier (reference pixel) data have been trimmed away. In this section, we will demonstrate some examples of using Level 1 data.

In [None]:
asdf_file_uri_l1 = asdf_dir_uri + 'ROMANISIM/DENSE_REGION/R0.5_DP0.5_PA0/r0000101001001001001_01101_0001_WFI01_uncal.asdf'

with fs.open(asdf_file_uri_l1, 'rb') as fb:
    g = asdf.open(fb).copy()

In [None]:
g.info()

Loading the data follows exactly the same procedure as above. Comparing the data structures, we notice an extra data block: `amp33`, which contains the data from the 33rd amplifier. Additionally, the Level 1 arrays have sizes (4096, 4096) pixels, different from the previous Level 2 image size of (4088, 4088) pixels. On top of that, our `data` array is now a 3-D datacube rather than a 2-D image, in units of DN rather than DN / sec.
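The size difference is consistent with a 4-pixel reference border on each side of the array. The sketch below only illustrates that geometry on an empty stand-in cube (`fake_l1`, with an assumed number of resultants and data type); the actual pipeline does considerably more than slice the array.

```python
import numpy as np

border = 4  # width of the reference-pixel border on each side

# Empty stand-in for a Level 1 cube: (resultants, rows, columns)
fake_l1 = np.zeros((2, 4096, 4096), dtype=np.uint16)

# Drop the border along both spatial axes; slicing returns a view, not a copy
trimmed = fake_l1[:, border:-border, border:-border]

print(fake_l1.shape, "->", trimmed.shape)
```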

Let's plot the value of a single pixel up-the-ramp:

In [None]:
plt.figure(figsize=(6, 6), layout='tight')
plt.title('Up-the-ramp samples for pixel 1000, 1000')
plt.plot(g['roman']['data'][:, 1000, 1000])
plt.xlabel('Resultant number', fontsize=16)
plt.ylabel('Pixel value [DN]', fontsize=16);

The Level 1 datacube contains all the uncalibrated resultants that, after processing, yield the Level 2 rate images.
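To give a sense of how a ramp relates to a rate, the sketch below fits a straight line to a synthetic single-pixel ramp with unweighted least squares. This is a simplification and not the pipeline's ramp-fitting algorithm, and the time between resultants, count rate, and noise level are all hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

frame_time = 3.0  # hypothetical time between resultants, in seconds
true_rate = 5.0   # hypothetical count rate, in DN / s

# Synthetic ramp: constant offset plus linear accumulation plus noise
times = frame_time * np.arange(1, 9)
ramp = 100.0 + true_rate * times + rng.normal(scale=2.0, size=times.size)

# The slope of the best-fit line estimates the rate in DN / s
slope, intercept = np.polyfit(times, ramp, deg=1)
print(f"Fitted rate: {slope:.2f} DN / s")
```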

We can also pass the `AsdfFile` object to `roman_datamodels.open`:

In [None]:
data_rdm = rdm.open(g)

In [None]:
print(type(data_rdm))

`roman_datamodels` understood our Level 1 data and identified it as a `ScienceRawModel`, which we explore further below.

Once more, we can use the general `.info()` method to gather information about the data.

In [None]:
data_rdm.info()

In [None]:
for key in data_rdm.keys():
    print(key)

Note that although the `.info()` method shows the key `roman`, the only keys displayed for the `ScienceRawModel` object are those inside the `roman` group. This is because `data_rdm` is no longer an `AsdfFile` object, but a `ScienceRawModel` object.

We can still retrieve its data blocks easily, either by accessing the corresponding attributes/nodes or by using the keys as dictionary keys. The former method will yield the corresponding `roman_datamodels` node, whereas the latter will yield a dictionary.

In [None]:
type(data_rdm.meta), type(data_rdm['meta'])

## Additional Resources

For more information about Roman data products and additional resources please consider visiting the links below:

- [Roman User Documentation -- RDox](https://roman-docs.stsci.edu/)
- [MAST](https://archive.stsci.edu)
- [ASDF python API](https://asdf.readthedocs.io/en/latest/)
- [ASDF standard](https://asdf-standard.readthedocs.io/)

## About this notebook

**Authors:** Javier Sánchez, Andra Stroe

**Updated On:** 2024-05-15

***

[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/> 