<a id="title_ID"></a>
# JWST Data Products: Uncalibrated Data 
--------------------------------------------------------------
**Author**: Alicia Canipe (acanipe@stsci.edu) | **Latest update**: March 2, 2021.

## Table of contents
1. [Introduction](#intro)
   1. [Resources](#resources)   
2. [Data in MAST](#mast)
3. [Example data for this exercise](#example)
4. [Examining an exposure with astropy](#astro)
   1. [Format](#astro-format)
   2. [Metadata](#astro-meta)
   3. [Visualizing data](#astro-viz)
5. [A different perspective: JWST data models](#model) 
   1. [Current models](#list)
   2. [Format](#model-format)
   3. [Metadata](#model-meta)
6. [Other ways to use the models](#use)
   1. [Create data from scratch](#scratch)
   2. [Create data from a file](#file)
7. [Simulations](#simulations)

1.<font color='white'>-</font>Introduction <a class="anchor" id="intro"></a>
------------------

Welcome to the first module about JWST data products! JWST is a complex observatory with four instruments and many modes, so there is a lot to learn about the different types of data, their formats, and the tools available to help observers examine and analyze them. In this session, we will examine JWST data products and how they change as they go through the pipeline. We will start with uncalibrated data and proceed through the processing stages of the JWST data calibration pipeline (hereafter, the pipeline) in separate modules, highlighting important notes along the way. Detailed information about how to run the pipeline will be saved for the next couple of JWebbinars.

Most JWST science data products are in FITS format, which should be familiar to observers. However, there are ancillary input and output files for the pipeline that are not; there are JSON files (used to associate different observations), ASDF files (typically pipeline configuration files), and ECSV files (for ASCII table data, such as catalogs). 

In the following sections, we will begin by exploring an example uncalibrated JWST observation to get a sense of the format, and then we will demonstrate a very important tool designed to simplify the complexity of JWST data: data models. In the notebook for the next module, we will explore the input and output data products for the first stage of processing in the pipeline. 

### A.<font color='white'>-</font>Resources<a class="anchor" id="resources"></a>


Visit (the webpage for JWebbinars - TBD) to find resources for:
* The Mikulski Archive for Space Telescopes (MAST) 
* JWST Documentation (JDox) for JWST data products
* The most up-to-date information about JWST data products in the pipeline readthedocs
* Pipeline roadmaps for when to recalibrate your data

Before we begin, import the libraries used in this notebook:

In [None]:
# Module with functions to get information about objects:
import os
import inspect

# Numpy library:
import numpy as np

# Astropy tools:
from astropy.utils.data import download_file
from astropy.io import fits

# The JWST models:
from jwst import datamodels

And set up matplotlib for plotting:

In [None]:
import matplotlib.pyplot as plt
import matplotlib as mpl

# Use this version for non-interactive plots (easier scrolling of the notebook)
# %matplotlib inline

# Use this version if you want interactive plots
%matplotlib notebook

# These gymnastics are needed to make the sizes of the figures
# be the same in both the inline and notebook versions
%config InlineBackend.print_figure_kwargs = {'bbox_inches': None}

mpl.rcParams['savefig.dpi'] = 80
mpl.rcParams['figure.dpi'] = 80

[Top of Page](#title_ID)

2.<font color='white'>-</font>Data in MAST <a class="anchor" id="mast"></a>
------------------

The JWST Data Management System (DMS) produces many products for each JWST observation, including the science files generated by the pipeline. The exact type and number of products depend on the instrument, its configuration, and observing mode. Observers should consult the [MAST documentation for information about standard data products](https://jwst-docs.stsci.edu/obtaining-data/data-discovery#DataDiscovery-Dataproducttypes). 

Most observers will find that, of the many data products produced by the calibration pipeline, the science data files in MAST are sufficient for their analysis. However, other data products such as guide star data, associations, and engineering data are also available. 

Standard science data files include:

* [uncalibrated raw data](https://jwst-pipeline.readthedocs.io/en/stable/jwst/data_products/science_products.html#uncalibrated-raw-data-uncal), identified by the suffix ```uncal```
* [countrate data](https://jwst-pipeline.readthedocs.io/en/stable/jwst/data_products/science_products.html#countrate-data-rate-and-rateints) produced by applying the Stage 1 (detector-level) corrections in order to compute count rates from the original accumulating signal ramps, identified by the suffix ```rate``` or ```rateints```
* [calibrated single exposures](https://jwst-pipeline.readthedocs.io/en/stable/jwst/data_products/science_products.html#calibrated-data-cal-and-calints), identified by the suffix ```cal```
* [resampled and/or combined exposures](https://jwst-pipeline.readthedocs.io/en/stable/jwst/data_products/science_products.html#resampled-2-d-data-i2d-and-s2d), identified by the suffixes ```i2d``` or ```s2d```
* [extracted spectroscopic 1D data](https://jwst-pipeline.readthedocs.io/en/stable/jwst/data_products/science_products.html#extracted-1-d-spectroscopic-data-x1d-and-x1dints), identified by the suffix ```x1d``` or ```x1dints```

In addition, there are also [several other products depending on the observing mode](https://jwst-pipeline.readthedocs.io/en/stable/jwst/data_products/science_products.html#source-catalog-cat), such as source and photometry catalogs, stacked PSF data, and NIRISS AMI derived data.  
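
Because the product type is encoded in the filename suffix, a small helper can sort files by processing level. The sketch below is a minimal illustration and not part of the pipeline: `product_suffix` and `SUFFIX_DESCRIPTIONS` are hypothetical names, the filename is made up, and the code assumes the suffix is the last underscore-separated token before the `.fits` extension.

```python
from pathlib import Path

# Suffixes of the standard science products described in this section.
SUFFIX_DESCRIPTIONS = {
    "uncal": "uncalibrated raw data",
    "rate": "countrate data",
    "rateints": "countrate data",
    "cal": "calibrated single exposure",
    "i2d": "resampled and/or combined exposure",
    "s2d": "resampled and/or combined exposure",
    "x1d": "extracted 1-D spectroscopic data",
}


def product_suffix(filename):
    """Return the token after the last underscore in the file stem."""
    stem = Path(filename).stem  # drops the ".fits" extension
    return stem.rsplit("_", 1)[-1]


# Hypothetical filename, used only for illustration.
example = "jw00001001001_01101_00001_nrca1_uncal.fits"
suffix = product_suffix(example)
print(suffix, "->", SUFFIX_DESCRIPTIONS.get(suffix, "unknown product type"))
```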

[Top of Page](#title_ID)

3.<font color='white'>-</font>Example data for this exercise <a class="anchor" id="example"></a>
------------------

For this module, we will use an uncalibrated NIRCam simulated imaging exposure that is stored in Box (**note: this data isn't in the jwebbinar box**). Let's grab it:

In [None]:
main_box_url1 ="https://stsci.box.com/shared/static/5xsvwzwncpulnxobjxoodjihzeu5gyhf.fits" # NRC 1
# main_box_url1 ="https://stsci.box.com/shared/static/dpft0rvv2q9ga7gqa8wiw8ew2j1kgjp9.fits" # NRC 2
# main_box_url2 ="https://stsci.box.com/shared/static/ueb9amnwnm37dzyc9tvzkelbdniz3266.fits" # NIS

# Looking at NIRCam, but exposure is missing the INT_TIMES extension ... 
demo_file = download_file(main_box_url1)

# Save the file so that we can use it later
with fits.open(demo_file) as f:
    uncal_obs = "uncal_file_part1.fits"
    f.writeto(uncal_obs, overwrite=True)

Now, make sure we were able to successfully import the data into the notebook:

In [None]:
fits.info(uncal_obs)

[Top of Page](#title_ID)

4.<font color='white'>-</font>Examining an exposure with astropy<a class="anchor" id="astro"></a>
------------------

Many of you may be familiar with using [astropy](https://docs.astropy.org/en/stable/) to examine data. Here, we will take a look at the format and headers using standard ```astropy``` tools. 

### A.<font color='white'>-</font>Format<a class="anchor" id="astro-format"></a>

Below, we see the typical extensions in a raw JWST data file. All data related to the product are contained in one or more FITS IMAGE or BINTABLE extensions, and the header of each extension may contain keywords that are uniquely related to that extension.

* PRIMARY: The primary Header Data Unit (HDU) contains only header information, in the form of keyword records, with an empty data array (indicated by NAXIS=0 in the primary header). Metadata that pertains to the entire product is stored in keywords in the primary header. Metadata related to specific extensions (see below) is stored in keywords in the headers of each extension.
* SCI: 4-D data array containing the raw pixel values. The first two dimensions are equal to the size of the detector readout, with the data from multiple groups (NGROUPS) within each integration stored along the 3rd axis, and the multiple integrations (NINTS) stored along the 4th axis.
* ZEROFRAME: 3-D data array containing the pixel values of the zero-frame for each integration in the exposure, where each plane of the cube corresponds to a given integration. Only appears if the zero-frame data were requested to be downlinked separately.
* GROUP: A table of metadata for some (or all) of the data groups.
* INT_TIMES: A table of beginning, middle, and end time stamps for each integration in the exposure.
* ASDF: The data model metadata.

Additional extensions can be included for certain instruments and readout types. The [JWST software readthedocs](https://jwst-pipeline.readthedocs.io/en/latest/jwst/data_products/science_products.html) contains the most up-to-date information about JWST formats. 
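
To see how this layout maps onto `astropy` objects without downloading anything, the sketch below builds a tiny stand-in file in memory. It is a toy under stated assumptions: the array sizes are arbitrary, only the `PRIMARY` and `SCI` extensions are mimicked, and none of the JWST-specific keywords are populated.

```python
import numpy as np
from astropy.io import fits

# Arbitrary toy dimensions: 2 integrations, 5 groups, 8 rows, 16 columns.
nints, ngroups, nrows, ncols = 2, 5, 8, 16
toy_ramps = np.zeros((nints, ngroups, nrows, ncols), dtype=np.uint16)

# A primary HDU with no data array, plus a 4-D image extension named SCI.
toy_hdul = fits.HDUList([
    fits.PrimaryHDU(),
    fits.ImageHDU(data=toy_ramps, name="SCI"),
])

toy_hdul.info()

# The primary header reports an empty data array.
print("PRIMARY NAXIS:", toy_hdul["PRIMARY"].header["NAXIS"])

# FITS axes are numbered in the reverse order of the numpy shape:
# NAXIS1 is the fastest-varying axis (columns), NAXIS4 the slowest (integrations).
sci_header = toy_hdul["SCI"].header
print("numpy shape:", toy_hdul["SCI"].data.shape)
print("NAXIS1..4:", [sci_header[f"NAXIS{i}"] for i in range(1, 5)])
```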

In [None]:
fits.info(uncal_obs)

We can grab the data to examine it: 

In [None]:
science_data = fits.getdata(uncal_obs,1)

# or 

with fits.open(uncal_obs) as hdu:
    science_data = hdu['SCI'].data

In [None]:
science_data.shape

The shape of the science data lists the number of integrations, groups, rows (pixels), and columns (pixels). This is the reverse of the FITS axis order described above, because ```numpy``` indexes the slowest-varying axis first. The layout reflects up-the-ramp sampling (also referred to as MULTIACCUM), the standardized readout scheme for all JWST detectors (read more in the [JWST User Documentation](https://jwst-docs.stsci.edu/understanding-exposure-times)). We'll talk about this more in the following sections. For now, let's look at the associated headers and other metadata.
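
The axis order matters when slicing. As a minimal sketch with a made-up array (the dimensions and pixel coordinates are arbitrary), the indexing pattern used later in this notebook looks like this:

```python
import numpy as np

# Made-up dimensions standing in for (integrations, groups, rows, columns).
nints, ngroups, nrows, ncols = 3, 6, 32, 32
toy_data = np.arange(nints * ngroups * nrows * ncols, dtype=np.float32)
toy_data = toy_data.reshape(nints, ngroups, nrows, ncols)

# All groups of one pixel in the first integration: a 1-D ramp.
toy_ramp = toy_data[0, :, 10, 20]

# The full detector image for the last group of the first integration.
toy_last_group = toy_data[0, -1, :, :]

print("ramp shape:", toy_ramp.shape)          # one value per group
print("image shape:", toy_last_group.shape)   # rows x columns
```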

### B.<font color='white'>-</font>Metadata<a class="anchor" id="astro-meta"></a>

Headers containing information about the observation and the data parameters can be accessed in the standard way:

In [None]:
primary_headers = fits.getheader(uncal_obs,0)
science_headers = fits.getheader(uncal_obs,1)

# or 

with fits.open(uncal_obs) as hdu:
    primary_headers = hdu['PRIMARY'].header
    science_headers = hdu['SCI'].header

In [None]:
print('Observation ID: ', primary_headers['OBS_ID'])
print('Instrument: ', primary_headers['INSTRUME'])
print('Exposure type: ', primary_headers['EXP_TYPE'])
print('Detector: ', primary_headers['DETECTOR'])

In [None]:
print('\nNumber of data dimensions: ', len(science_data.shape))
print('Number of integrations: ', primary_headers['NINTS'])
print('Number of groups: ', primary_headers['NGROUPS'])
# SUBSIZE1 is the size along axis 1 (columns); SUBSIZE2 is along axis 2 (rows)
print('Number of rows: ', primary_headers['SUBSIZE2'])
print('Number of columns: ', primary_headers['SUBSIZE1'])

# or 

print('\nNumber of data dimensions: ', science_headers['NAXIS'])
print('Number of integrations: ', science_headers['NAXIS4'])
print('Number of groups: ', science_headers['NAXIS3'])
print('Number of rows: ', science_headers['NAXIS2'])
print('Number of columns: ', science_headers['NAXIS1'])

Additional metadata is stored in the ASDF extension. This extension uses the Advanced Scientific Data Format (ASDF), a next-generation format for scientific data, and can be read with the ```asdf``` Python package, a tool for reading and writing ASDF files. More information about the ASDF file standard is in the [ASDF software readthedocs](https://asdf.readthedocs.io/en/stable/). The format has the following features:

* A hierarchical, human-readable metadata format (implemented using YAML)
* Numerical arrays are stored as binary data blocks which can be memory mapped. Data blocks can optionally be compressed.
* The structure of the data can be automatically validated using schemas (implemented using JSON Schema)
* Native Python data types (numerical types, strings, dicts, lists) are serialized automatically
* ASDF can be extended to serialize custom data types

Right now, you don't need to worry about ASDF too much. We'll talk about it more when we discuss configuration files and accessing the WCS information in the following modules. Below, we provide a simple example of how to access the ASDF extension:

In [None]:
with fits.open(uncal_obs) as hdu:
    asdf_metadata = hdu['ASDF'].header
    asdf_data = hdu['ASDF'].data    

In [None]:
asdf_metadata

In [None]:
asdf_data

### C.<font color='white'>-</font>Visualizing data<a class="anchor" id="astro-viz"></a>

In the previous section, we mentioned [up-the-ramp sampling](https://jwst-docs.stsci.edu/understanding-exposure-times) for IR detectors. During an integration, the detectors accumulate charge while being read out multiple times according to a readout pattern. Each read is non-destructive: it leaves the charge unaffected and in place (charge is not transferred between pixels as in CCDs). At the end of each integration, the pixels are read out a final time and then reset, releasing their charge. 

Multiple non-destructive *frames* are averaged into a *group*, depending on the readout pattern selected. Breaking exposures into multiple *integrations* is most useful for bright sources that would saturate in longer integrations.  

As such, the components of each up-the-ramp exposure are: 
* NINTS: number of integrations per exposure.
* NGROUPS: number of groups per integration.
* NFRAMES: number of frames per group (this parameter is encoded in the definition of the different readout patterns for the instruments). 
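
To make the relationship between frames and groups concrete, the sketch below simulates one noiseless ramp for a single pixel. Everything in it is an illustrative assumption: the bias level, the accumulation rate per frame, and the values of NFRAMES and NGROUPS are invented, and no frames are skipped between groups.

```python
import numpy as np

nframes = 4      # frames averaged into each group (assumed)
ngroups = 5      # groups in the integration (assumed)
bias = 1000.0    # starting level in DN (assumed)
rate = 2.5       # signal accumulated per frame in DN (assumed)

# Non-destructive reads: the signal keeps growing from frame to frame.
frame_index = np.arange(nframes * ngroups)
frames = bias + rate * frame_index

# Average each set of NFRAMES consecutive frames into one group.
groups = frames.reshape(ngroups, nframes).mean(axis=1)

# A straight-line fit to the groups recovers the accumulation rate.
slope_per_group, intercept = np.polyfit(np.arange(ngroups), groups, 1)
rate_per_frame = slope_per_group / nframes

print("group values (DN):", groups)
print("recovered rate (DN/frame):", rate_per_frame)
```

With real data the ramp also contains noise and detector effects, which is why the pipeline applies its Stage 1 corrections before computing count rates.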

Let's select one integration for a particular pixel and examine the ramp. **Note**: this is uncalibrated data, so detector effects are still present, and the signal in each group will vary because corrections (for bias drift, the reference pixel correction, etc.) have not yet been applied.  

In [None]:
integration = 0
pixel_y = 598
pixel_x = 558
group = -1

In [None]:
groups = np.arange(0, primary_headers['NGROUPS'])
signal_adu = science_data[integration, :, pixel_y, pixel_x]

fig = plt.figure(figsize=(8, 8))
ax = plt.subplot()
plt.plot(groups, signal_adu, label=f'Pixel ({pixel_y},{pixel_x})')
plt.legend()  # display the pixel label defined above
plt.xlabel('Groups')
plt.ylabel('Signal (DN)')
fig.tight_layout()
plt.subplots_adjust(left=0.15)

We can also visualize the full NIRCam array for the last group in our integration, below. Again, this is a raw exposure, so none of the detector effects have been removed. The four amplifiers of the detector are visible, along with other features (e.g., an epoxy void region). 

In [None]:
fig = plt.figure(figsize=(8, 8))
ax = plt.subplot()
plt.imshow(science_data[integration, group, :, :], origin='lower', cmap='gray', vmin=4000, vmax=12000)
plt.xlabel('Pixel column')
plt.ylabel('Pixel row')
fig.tight_layout()
plt.subplots_adjust(left=0.15)
plt.colorbar(label='DN')
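
The display limits used above (`vmin=4000`, `vmax=12000`) were chosen by hand for this exposure. For other data, one common alternative is to derive the limits from percentiles of the image. The sketch below uses a random stand-in image, and the 1st and 99th percentiles are an arbitrary choice, not a recommendation from the JWST documentation.

```python
import numpy as np

# Random stand-in for one group of a raw exposure (values in DN).
rng = np.random.default_rng(seed=0)
toy_image = rng.normal(loc=8000.0, scale=500.0, size=(64, 64))
toy_image[5, 5] = 60000.0  # a single hot pixel that would dominate min/max scaling

# Percentile-based limits ignore the most extreme pixels.
vmin, vmax = np.percentile(toy_image, [1, 99])
print(f"display range: {vmin:.0f} to {vmax:.0f} DN")
```

The resulting values could then be passed as `vmin` and `vmax` to `plt.imshow`.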

[Top of Page](#title_ID)

5.<font color='white'>-</font>A different perspective: JWST data models<a class="anchor" id="model"></a>
------------------

Now that we've tried using [astropy](https://docs.astropy.org/en/stable/) to examine the data, we can explore an alternative method that removes some of the complexity and peculiarities of JWST data. Here, we will take a look at the format and headers using [JWST data models](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/). 

There are different data model classes for different kinds of data. Each model generally has several arrays that are associated with it. For example, the ImageModel class has the following arrays associated with it:

* data: The science data
* dq: The data quality array
* err: The error array

The structure and design of the data models take advantage of the ASDF features and functionality, so they can easily be searched, edited, updated, and saved. You can always use ```<model>.info()``` to look at the contents of a data model. It renders the underlying ASDF tree, showing information about the metadata, data arrays, formats, etc.

### A.<font color='white'>-</font>Current models <a class="anchor" id="list"></a>
--------------------------------------------------------------------
The data model package includes specific and general models to use for both science data and calibration reference files. For example, to generate a FITS file that is compatible with the [Stage 1 calibration pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_detector1.html), you would need to use a model for [up-the-ramp  sampled](https://jwst-docs.stsci.edu/understanding-exposure-times#UnderstandingExposureTimes-uptherampHowup-the-rampreadoutswork) IR data: the [RampModel](https://jwst-pipeline.readthedocs.io/en/latest/api/jwst.datamodels.RampModel.html#jwst.datamodels.RampModel). If instead you would like to analyze a 2-D JWST image, you could use the [ImageModel](https://jwst-pipeline.readthedocs.io/en/latest/api/jwst.datamodels.ImageModel.html#jwst.datamodels.ImageModel). Or, if you are unsure, you could let the data model package [guess for you](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/models.html#opening-a-file).

The full list of current models is maintained in the [JWST pipeline software](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/attributes.html#list-of-current-models). 

In [None]:
# print a list of the current data models
inspect.getmembers(datamodels, inspect.isclass)

You can examine the [contents of an existing model](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/models.html#looking-at-the-contents-of-a-model) with the ```.info()``` method:

In [None]:
ramp = datamodels.RampModel()

In [None]:
ramp.info()

### B.<font color='white'>-</font>Format<a class="anchor" id="model-format"></a>

Let's go back and examine our uncalibrated file, but this time we will use a JWST data model. Below, we access the typical FITS extensions using the model attributes. All data related to the product are contained in one or more data arrays, and the headers are stored in the model metadata.

In the model info, you see familiar names: ```data``` (```SCI``` extension for FITS), ```zeroframe``` (```ZEROFRAME```), ```group``` (```GROUP```), etc.  

In [None]:
with datamodels.open(uncal_obs) as model:
    model.info()
    
## or use a specific model:

model = datamodels.RampModel(uncal_obs)
model.info()

The exposure data is accessed through the ```data``` member of the model, instead of, for instance, the ```SCI``` extension of a FITS file. So instead of:

```python
hdulist['SCI'].data
```

you would use:

In [None]:
science_data = model.data
science_data.shape

This data can be used the same way as before:

In [None]:
integration = 2
group = -1

fig = plt.figure(figsize=(8, 8))
ax = plt.subplot()
plt.imshow(science_data[integration, group, :, :], origin='lower', cmap='gray', vmin=4000, vmax=12000)
plt.xlabel('Pixel column')
plt.ylabel('Pixel row')
fig.tight_layout()
plt.subplots_adjust(left=0.15)
plt.colorbar(label='DN')

### C.<font color='white'>-</font>Metadata<a class="anchor" id="model-meta"></a>

The [metadata](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/metadata.html#metadata) for a model contains the details about the observation, i.e., the header keywords. The ```jwst.datamodels``` library defines the structure of its metadata using the JSON Schema specification, but writes the schemas in YAML syntax (more details are in the [JSON Schema documentation](https://json-schema.org/understanding-json-schema/index.html)). 

In [None]:
model.schema

You can search through the data model schema for particular elements:

In [None]:
model.search_schema('target')

The values for the metadata are checked automatically when added to the model. As an example, see the warning we get when we try to add a string for the RA value in the metadata, instead of a number:

In [None]:
model.meta.target.ra = str(model.meta.target.ra)

Let's change it back:

In [None]:
model.meta.target.ra = float(model.meta.target.ra)
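
Conceptually, this kind of check compares each new value against the type declared in the schema. The toy class below is only a sketch of that idea and is not how ```jwst.datamodels``` is implemented: `ToyTarget` and `TOY_SCHEMA` are hypothetical, and the schema is reduced to a mapping from field name to accepted Python types.

```python
import warnings

# Hypothetical, heavily simplified schema: field name -> accepted types.
TOY_SCHEMA = {"ra": (int, float), "dec": (int, float), "catalog_name": (str,)}


class ToyTarget:
    """Warn when a value does not match the type declared in TOY_SCHEMA."""

    def __setattr__(self, name, value):
        expected = TOY_SCHEMA.get(name)
        if expected is not None and not isinstance(value, expected):
            warnings.warn(
                f"{value!r} is not valid for '{name}'; expected {expected}"
            )
        super().__setattr__(name, value)


toy_target = ToyTarget()
toy_target.ra = 80.5          # a number: accepted silently

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    toy_target.ra = "80.5"    # a string where a number is expected

print(len(caught), "warning(s):", caught[0].message)
```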

The data model hides direct access to FITS header keywords. Instead, use [the Metadata tree](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/metadata.html#metadata).

In [None]:
model.find_fits_keyword('DATE-OBS')

In [None]:
model.meta.observation.date
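
The lookup works because each entry in the schema records which FITS keyword it corresponds to. As a rough sketch of that idea (not the library's actual code), the function below walks a tiny hand-written schema fragment and returns the dotted metadata paths that match a keyword. `TOY_META_SCHEMA` and `find_keyword_paths` are hypothetical names, and the fragment contains only a few entries chosen for illustration.

```python
# Hand-written fragment that mimics the nesting of a metadata schema.
TOY_META_SCHEMA = {
    "meta": {
        "observation": {
            "date": {"fits_keyword": "DATE-OBS"},
            "time": {"fits_keyword": "TIME-OBS"},
        },
        "instrument": {
            "name": {"fits_keyword": "INSTRUME"},
        },
    }
}


def find_keyword_paths(schema, keyword, prefix=""):
    """Return dotted paths of all entries tagged with the given FITS keyword."""
    paths = []
    for name, node in schema.items():
        if not isinstance(node, dict):
            continue  # skip leaf values such as the keyword strings themselves
        path = f"{prefix}.{name}" if prefix else name
        if node.get("fits_keyword") == keyword:
            paths.append(path)
        else:
            paths.extend(find_keyword_paths(node, keyword, path))
    return paths


print(find_keyword_paths(TOY_META_SCHEMA, "DATE-OBS"))
```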

[Top of Page](#title_ID)

6.<font color='white'>-</font>Other ways to use the models <a class="anchor" id="use"></a>
--------------------------------------------------------------------
The data models can be used to [create data from scratch](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/models.html#creating-a-data-model-from-scratch) or to [read in an existing FITS file or data array](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/models.html#creating-a-data-model-from-a-file). This is useful if you are trying to run an exposure through the JWST pipeline or read an exposure into a JWST software tool or data analysis notebook, because certain checks on the data and metadata are performed when they are added to a model. Simulated data created using ```Mirage``` or ```Mirisim``` is directly compatible with the JWST pipeline, because both software tools use the data models during the creation of the simulations. 

### A.<font color='white'>-</font>Create data from scratch<a class="anchor" id="scratch"></a>

To create a new ```ImageModel``` where all of the arrays will have default values, simply provide a shape as the first argument (as described [here](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/models.html#creating-a-data-model-from-scratch)):

In [None]:
with datamodels.ImageModel((1024, 1024)) as im:
    print(im.search_schema('instrument'))

or similarly:

In [None]:
data = np.empty((50, 50))
dq = np.empty((50, 50))
with datamodels.ImageModel(data=data, dq=dq) as im:
    print(im.search_schema('exposure'))

Populate the metadata as needed:

In [None]:
im.meta.instrument.name = 'NIRCAM'

In [None]:
im.meta.instrument.name

and access the data model contents as described in the previous section:

In [None]:
im.data.shape

In [None]:
im.dq

And save:
    
```python
im.save("my-updated-image.fits")
```

### B.<font color='white'>-</font>Create data from a file<a class="anchor" id="file"></a>

The ```jwst.datamodels.open``` function allows you to create a model from a file on disk (as described [here](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/models.html#creating-a-data-model-from-a-file)). It may be passed any of the following:

* a path to a FITS file
* a path to an ASDF file
* an astropy.io.fits.HDUList object
* a readable file-like object

The file will be opened, and based on the nature of the data in the file, the correct data model class will be returned. This is comparable to the following in ```astropy```:

```python
astropy.io.fits.open("myimage.fits")
```

For example, if the file contains 2-dimensional data, an ImageModel instance will be returned. You will generally want to instantiate a model using a ```with``` statement so that the file will be closed automatically when exiting the with block.

```python
from jwst import datamodels
with datamodels.open("myimage.fits") as im:
    assert isinstance(im, datamodels.ImageModel)
```

or if you know the type of model you would like to use:

```python
from jwst.datamodels import ImageModel
with ImageModel("myimage.fits") as im:
    # raises exception if myimage.fits is not an image file
    pass
```

And save:

```python
im.save("my-updated-image.fits")
```

[Top of Page](#title_ID)

7.<font color='white'>-</font>Simulations<a class="anchor" id="simulations"></a>
--------------------------------------------------------------------
The benefit of using existing simulation software such as [Mirage](https://jwst-docs.stsci.edu/jwst-other-tools/mirage-data-simulator) (for NIRCam, NIRISS, and FGS simulations) or [Mirisim](https://www.stsci.edu/jwst/science-planning/proposal-planning-toolbox/mirisim) (for MIRI simulations) is that the outputs are directly compatible with JWST software, such as the [calibration pipeline](https://jwst-docs.stsci.edu/jwst-data-reduction-pipeline). 

As such, you can read your Mirage data into a data model and examine it, or run it through the pipeline, as normal. For example, read the uncalibrated Mirage output into a ```RampModel``` and feed it to the [Stage 1 processing pipeline](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_detector1.html):

```python
uncal_data = datamodels.RampModel("mirage-uncal-file.fits")
```

Import the pipeline, and run it with the ramp data:

```python
from jwst.pipeline import Detector1Pipeline

result = Detector1Pipeline.call(uncal_data)
```

The next module will discuss data products in more detail, as they travel through different stages of the JWST pipeline. 

[Top of Page](#title_ID)