# How to open Roman Data Files (ASDF)

***

## Imports

- *numpy* to handle array operations
- *asdf* to handle ASDF input/output
- *roman_datamodels* to handle input/output and validation of data models
- *matplotlib.pyplot* for plotting data
- *astropy.units* to handle units
- *astropy.time* to handle time
- *astropy.visualization* for image display helpers
- *pprint* to pretty-print nested structures

In [None]:
%matplotlib widget
import numpy as np
import asdf
import roman_datamodels as rdm
import matplotlib.pyplot as plt
%config InlineBackend.figure_format='retina'
import astropy.units as u
import astropy.time
from astropy.visualization import quantity_support
quantity_support()
from pprint import pprint

## Introduction
The main goal of this notebook is to illustrate how to open and handle Roman data.

Roman data are stored in [Advanced Scientific Data Format (ASDF)](https://asdf-standard.readthedocs.io/) files.

ASDF files combine a human-readable, hierarchical metadata structure with binary array data, and the structure of an ASDF file can be validated automatically against schemas.

There are tools to interact with ASDF files in Python, Julia, C/C++, and IDL. In this example we focus on the Python interface.


***

## Quick start



We will start by illustrating how to read the data using the basic `asdf` library.

The main way to read an ASDF file is the `open` function in the `asdf` package, which returns an `AsdfFile` object.

In [None]:
# This path is only an example; it will need to change on the Roman Science Portal
path_l2 = '/grp/roman/SCIENCE_PLATFORM_DATA/ROMANISIM/DENSE_REGION/R0.5_DP0.5_PA0/r0000101001001001001_01101_0001_WFI01_cal.asdf'

In [None]:
f = asdf.open(path_l2)

A high-level summary of the file can be retrieved by using the `info()` method:

In [None]:
f.info()

Another useful way to explore the contents of an ASDF file is the `.tree` attribute:

In [None]:
pprint(f.tree) # This cell will print a lot of information, please feel free to skim or skip

Note that, by default, `asdf.open()` does not load array data into memory unless explicitly told to, which makes opening ASDF files a quick operation.

We do have information about the shape and type of the different data blocks, but the array data are not read until we access them. We can load the data blocks we want either by accessing them (for example, copying them into a NumPy array), or by passing `lazy_load=False` to `asdf.open()`.

An ASDF object can effectively be used like a nested dictionary, and the contents of each block can be listed with the `.keys()` method.

In [None]:
f['roman']['meta']['wcs']

In [None]:
f.keys()

For Roman ASDF files, the three high-level blocks are: 
* `asdf_library`: It contains information about the `asdf` library used to create the file.
* `history`: It can contain metadata about the extensions used to create the file.
* `roman`: This is the block containing Roman's data and metadata information. Different data products will have different blocks under this one.

Within the `roman` block, the `data` block contains the image data, which are calibrated for level-2 and level-3 data products and uncalibrated for level-1 products.

Other potentially interesting data blocks are `meta`, containing the metadata, `err`, containing estimated uncertainties, and `dq`, containing data quality flags.

For a level-2 image, the list of blocks includes:

In [None]:
for key in f['roman'].keys():
    print(key)
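The loop above lists only the first level under `roman`. A small recursive helper can list the whole hierarchy. In the sketch below, `tree_lines` is our own hypothetical helper, not part of `asdf`; it treats anything with a `keys()` method as a branch, so it should also work on dictionary-like blocks such as `f['roman']` or `f['roman']['meta']`. It is demonstrated on a toy dictionary.

```python
def tree_lines(node, max_depth=2, depth=0):
    """Return indented key names of a nested, dictionary-like tree."""
    lines = []
    if depth >= max_depth or not hasattr(node, "keys"):
        return lines  # leaf value, or depth limit reached
    for key in node.keys():
        lines.append("  " * depth + str(key))
        lines.extend(tree_lines(node[key], max_depth, depth + 1))
    return lines


# Hypothetical toy tree with a structure loosely resembling a Roman file
toy = {
    "meta": {"exposure": {"start_time": 0.0, "end_time": 1.0},
             "aperture": {"name": "WFI_01"}},
    "data": [[0.0, 1.0], [2.0, 3.0]],
}
lines = tree_lines(toy, max_depth=3)
print("\n".join(lines))
```

On the notebook's file, `print("\n".join(tree_lines(f['roman'], max_depth=2)))` would list the first two levels; deeper levels can produce a long output.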

We focus on the `data` block, containing the science image of interest.

In [None]:
img = f['roman']['data']

In [None]:
type(img)

Note that Roman images are stored as `astropy.units.Quantity` objects, i.e., arrays with units attached. Quantities are a Python (`astropy`) feature; other languages will still load the images correctly, but the units will not be loaded automatically.

Using `astropy.units.Quantity` objects can help prevent confusion with units. Sometimes it is convenient to handle just the plain data arrays, which are stored in the `value` attribute of the Quantity object.

In [None]:
print('Exploring the values of `img`: ', img.value)
print('Exploring the data type of `img.value`: ', type(img.value))
print('Exploring the units of `img`: ', img.unit)
print('Exploring the type of `img.unit`: ', type(img.unit))

A typical operation with images is visualization. However, `matplotlib`'s `imshow` does not handle arrays with units well, so we plot only the values.

In [None]:
from astropy.visualization import (ZScaleInterval, SinhStretch,
                                   ImageNormalize)  # optional imports to show the image with a nicer stretch

plt.figure(figsize=(12, 12), layout='tight')
# Build the normalization from the plain values (no units)
norm = ImageNormalize(img.value, interval=ZScaleInterval(),
                      stretch=SinhStretch())
plt.imshow(img.value, origin='lower', norm=norm)
plt.colorbar(label=f'{img.unit}')
plt.xlabel('X [px]', fontsize=16)
plt.ylabel('Y [px]', fontsize=16)
plt.title('Image data');

We can also check the distribution of pixel values with a 1D histogram:

In [None]:
plt.figure(figsize=(12, 6), layout='tight')
plt.hist(img.value.ravel(), histtype='step', range=(-0.6, 0.6), bins=300);  # plain values, so `range` needs no units
plt.xlabel(f'Pixel value [{img.unit}]', fontsize=16)
plt.ylabel('Pixels/bin', fontsize=16);

We can explore other image blocks, for example, the data quality (DQ) flags. These flags are summarized [here](https://roman-pipeline.readthedocs.io/en/latest/roman/references_general/references_general.html#data-quality-flags).

In [None]:
plt.figure(figsize=(12, 12), layout='tight')
plt.imshow(f['roman']['dq'], origin='lower')
plt.colorbar(label='DQ')
plt.xlabel('X [px]', fontsize=16)
plt.ylabel('Y [px]', fontsize=16)
plt.title('DQ flags');

Let's take a look at these values. Each DQ value is the sum of all the flags (bits) set for that pixel during processing, where every flag is a distinct power of two.

In [None]:
unique_dq = np.unique(f['roman']['dq'])

In [None]:
unique_dq

In [None]:
for uu in unique_dq:
    br = np.binary_repr(uu)
    print("------------")
    print('Flag', uu)
    for ii, cc in enumerate(br[::-1]):
        if int(cc)==1:
            print('Bits on:', ii, 2**ii)
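Decoding each unique value works well for inspection. To find *which pixels* carry a particular flag, a vectorized bitwise AND is more convenient. The sketch below uses a small synthetic DQ array and a hypothetical flag value; check the meaning of each bit in the DQ flag table linked above before applying it to `f['roman']['dq']`.

```python
import numpy as np

# Synthetic DQ array; each value is a sum of distinct powers of two
dq = np.array([[0, 1, 4],
               [5, 2, 6]], dtype=np.uint32)


def set_bits(value):
    """Return the powers of two whose bits are set in an integer DQ value."""
    value = int(value)
    return [1 << bit for bit in range(value.bit_length()) if value & (1 << bit)]


flag = 4  # hypothetical flag of interest
has_flag = (dq & flag) != 0  # boolean mask of pixels with that bit set
print(has_flag.sum(), "pixels have bit value", flag)
print({int(v): set_bits(v) for v in np.unique(dq)})
```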

## Exploring metadata

One of the advantages of ASDF is its extensibility, and the ability to store human-readable hierarchical metadata. Let's further explore the metadata in our `roman` data block.

In [None]:
meta = f['roman']['meta']  # This way we get a dictionary

In [None]:
type(meta)

In [None]:
meta # Expect a long-ish output here

We retrieved the `meta` data block as a dictionary, which contains a collection of dictionaries. We iterate over its keys to see what it contains:

In [None]:
for key in meta.keys():
    print(key)

We can continue going deeper into the tree:

In [None]:
for key in meta['aperture'].keys():
    print(key)

Alternatively, one can retrieve the data blocks as `stnode._node.DNode` objects (this requires `roman_datamodels` to be installed):

In [None]:
meta2 = f['roman'].meta

In [None]:
type(meta2)

One can go deeper into the tree following the same approach:

In [None]:
ap = meta2.aperture

In [None]:
type(ap)

The advantage of this approach is that you still have access to the schema of each node:

In [None]:
pprint(ap.get_schema())

### Taking advantage of `astropy.time.Time` objects in the metadata

Another new feature of Roman data is that times in the metadata are stored as `astropy.time.Time` objects, which offer many convenience methods for converting between time scales and formats. We illustrate a few examples here; for a more comprehensive view of `astropy.time`, please check the documentation [here](https://docs.astropy.org/en/stable/time/).

In [None]:
start_time = meta2['exposure']['start_time']
print('Start time of the exposure:', start_time, '; datatype:', type(start_time))

We can express this start time as, for example, an MJD very easily:

In [None]:
start_time.mjd

We can use `Time` objects and operate with them. For example, we can get the exposure length by just subtracting the start time from the end time:

In [None]:
end_time = meta2['exposure']['end_time']
exp_len = end_time - start_time

And then express the exposure length in different units:

In [None]:
print('Exposure length in seconds:', exp_len.to(u.s))
print('Exposure length in days:', exp_len.to(u.day))
print('Exposure length in years:', exp_len.to(u.year))

### Plotting with world coordinates

Roman uses the Generalized World Coordinate System ([GWCS](https://gwcs.readthedocs.io)). The WCS can be found under the `wcs` key in the `meta` block.

In [None]:
gwcs = f['roman']['meta']['wcs']
pprint(gwcs)

The WCS is retrieved as a `gwcs` object, which builds on `astropy` and is compatible with many `astropy.wcs` utilities.

In [None]:
print(type(gwcs))

The `gwcs` object can then be used to plot the image in world coordinates.

In [None]:
plt.figure(figsize=(12, 12), layout='tight')
plt.subplot(projection=gwcs)
norm = ImageNormalize(img.value, interval=ZScaleInterval(),
                      stretch=SinhStretch())
plt.imshow(img.value, origin='lower', norm=norm)
plt.colorbar(label=f'{img.unit}')
plt.grid(color='white', ls='solid')
plt.xlabel('RA', fontsize=16)
plt.ylabel('DEC', fontsize=16)
plt.title('Image data');

## Exploring L1 data

In the previous section we used `asdf` to read a level-2 image, from which the reference pixels and the 33rd-amplifier data have been trimmed. In this section, we show some example usage of level-1 (raw) data.

In [None]:
path_l1 = '/grp/roman/SCIENCE_PLATFORM_DATA/ROMANISIM/DENSE_REGION/R0.5_DP0.5_PA0/r0000101001001001001_01101_0001_WFI01_uncal.asdf'
g = asdf.open(path_l1)

In [None]:
g.info()

Loading the data follows exactly the same procedure as before, but we now see an extra data block, `amp33`, which contains the data from the 33rd amplifier. Additionally, the images are now 4096 $\times$ 4096 pixels, rather than the 4088 $\times$ 4088 pixels of the level-2 image, because the reference pixels are included. Finally, the `data` array is now a 3D datacube rather than a 2D image, and its units are different ($\mathrm{DN}$ instead of $\mathrm{DN}/\mathrm{s}$).
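The 4096 - 4088 = 8 pixel difference per axis is consistent with a 4-pixel border of reference pixels on each edge of the detector. Assuming that layout, the sketch below shows how the science area of a single level-1 frame could be sliced out with NumPy. It uses a synthetic frame, and it is only an illustration of the geometry, not a substitute for the pipeline's reference-pixel correction.

```python
import numpy as np

n_ref = 4  # assumed width of the reference-pixel border on each edge

# Synthetic level-1 frame (4096 x 4096); reference pixels marked with -1
frame = np.zeros((4096, 4096), dtype=np.int8)
frame[:n_ref, :] = -1
frame[-n_ref:, :] = -1
frame[:, :n_ref] = -1
frame[:, -n_ref:] = -1

# Drop the border on all four sides; for a cube use cube[:, n_ref:-n_ref, n_ref:-n_ref]
science = frame[n_ref:-n_ref, n_ref:-n_ref]
print(science.shape)
```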

In [None]:
plt.figure(figsize=(6, 6), layout='tight')
plt.title('Up-the-ramp samples for pixel 1000, 1000')
plt.plot(g['roman']['data'][:, 1000, 1000])
plt.xlabel('Resultant number', fontsize=16)
plt.ylabel('Pixel value [DN]', fontsize=16);

The level-1 datacube contains all of the uncalibrated resultants; processing these up-the-ramp samples (ramp fitting) yields the count-rate images stored in the level-2 products.
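Conceptually, the slope of each pixel's up-the-ramp samples is its count rate. The sketch below fits a straight line to synthetic resultants with `np.polyfit`. The read times, offset, and count rate are made-up values, and an unweighted least-squares fit is a deliberate simplification of the pipeline's ramp fitting, which is more sophisticated (e.g., weighting the resultants and handling cosmic-ray jumps).

```python
import numpy as np

# Hypothetical ramp: mean read times of six resultants [s]
times = np.array([3.0, 6.0, 9.0, 12.0, 15.0, 18.0])
true_rate = 5.0   # DN/s, assumed
offset = 1000.0   # DN, assumed

rng = np.random.default_rng(1)
resultants = offset + true_rate * times + rng.normal(0.0, 0.5, size=times.size)

# Unweighted linear fit: slope = count rate, intercept = offset
slope, intercept = np.polyfit(times, resultants, deg=1)
print(f"fitted rate = {slope:.2f} DN/s, offset = {intercept:.1f} DN")
```

For a whole cube, `np.polyfit` also accepts a 2D `y`, so reshaping an `(n_resultants, ny, nx)` cube to `(n_resultants, ny * nx)` would fit every pixel at once.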

## Reading data using `roman_datamodels`

All Roman data products conform to one of the data models described by the [`roman_datamodels`](https://roman-datamodels.readthedocs.io/en/latest/) package.

This package provides the information (schemas) that the `asdf` library uses to validate the files, as well as utilities to read and write data that conform to the official data models.

We will illustrate how to use `roman_datamodels` in order to load data from an `asdf` file.

In [None]:
data_rdm = rdm.open(path_l1)

In [None]:
print(type(data_rdm))

`roman_datamodels` understood our level-1 data and identified it as a `ScienceRawModel`, which we explore further below.

Again, the general `.info` method gives us information about the data.

In [None]:
data_rdm.info()

In [None]:
for key in data_rdm.keys():
    print(key)

Note that, although `.info` shows the `roman` key, the only keys present in the `ScienceRawModel` object are those inside the displayed `roman` group. This is because `data_rdm` is no longer an `AsdfFile` but a `ScienceRawModel` object.

We can still retrieve its data blocks easily, either by accessing them as attributes or by using the keys as dictionary keys. The former yields the corresponding `roman_datamodels` node, whereas the latter yields a dictionary.

In [None]:
type(data_rdm.meta), type(data_rdm['meta'])

In [None]:
# Mean pixel value of each resultant, with its standard error (std / sqrt(N))
mean_values = np.mean(data_rdm.data.value, axis=(1, 2))
stdev = np.std(data_rdm.data.value, axis=(1, 2)) # standard deviation
npix = data_rdm.data.shape[1]*data_rdm.data.shape[2] # number of pixels
resultant = np.arange(1, mean_values.shape[0]+1)
plt.figure(figsize=(6, 6), layout='tight')
plt.errorbar(resultant, mean_values, stdev/np.sqrt(npix), marker='o', ls='none', fillstyle='none')
plt.xlabel('Resultant number', fontsize=16)
plt.ylabel('Mean pixel value with standard errors [DN]', fontsize=16);


## Additional Resources

For more information about Roman data products and additional resources, please consider visiting the links below:

- [Roman User Documentation -- RDox](https://roman-docs.stsci.edu/)
- [MAST](https://archive.stsci.edu)
- [ASDF python API](https://asdf.readthedocs.io/en/latest/)
- [ASDF standard](https://asdf-standard.readthedocs.io/)

## About this notebook

**Author:** Javier Sánchez, Associate Scientist.  
**Updated On:** 2024-05-23

***

[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/> 