# Example: Using MIRAGE to Generate Wide Field Slitless Exposures

This notebook shows how to use Mirage to create Wide Field Slitless Spectroscopy (WFSS) data, beginning with an APT file. This can be done for NIRCam or NIRISS.

*Table of Contents:*
* [Getting Started](#getting_started)
* [Create input yaml files from an APT proposal](#yaml_from_apt)
* [Make WFSS simulated observations](#make_wfss)
   * [Provide a single wfss mode yaml file](#single_yaml)
   * [Provide multiple yaml files](#multiple_yamls)
   * [Provide a single yaml file and an hdf5 file containing SED curves of the sources](#yaml_plus_hdf5)
   * [Outputs](#wfss_outputs)
* [Make imaging simulated observations](#make_imaging)
   * [Outputs](#imaging_outputs)

---
<a id='getting_started'></a>
## Getting Started

<div class="alert alert-block alert-warning">
**Important:** 
Before proceeding, ensure you have set the MIRAGE_DATA environment variable to point to the directory that contains the reference files associated with MIRAGE.
<br/><br/>
If you want JWST pipeline calibration reference files to be downloaded in a specific directory, you should also set the CRDS_PATH environment variable to point to that directory. This directory will also be used by the JWST calibration pipeline during data reduction.
<br/><br/>
You may also want to set the CRDS_SERVER_URL environment variable to https://jwst-crds.stsci.edu. This is not strictly necessary, and Mirage will do it for you if you do not set it, but if you import the crds package, or any package that imports the crds package, you should set this environment variable first, in order to avoid an error.
</div>
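
As a quick sanity check before importing Mirage, the environment variables used in this notebook can be verified with a small helper. This is a minimal sketch rather than part of Mirage: the function name `missing_environment_variables` is hypothetical, and it only checks that the variables are defined, not that the paths they contain are valid.

```python
import os

# MIRAGE_DATA is required by Mirage; the two CRDS variables are optional
# but recommended in the note above.
REQUIRED_VARIABLES = ("MIRAGE_DATA",)
OPTIONAL_VARIABLES = ("CRDS_PATH", "CRDS_SERVER_URL")


def missing_environment_variables(environ=None):
    """Return (missing_required, missing_optional) lists of variable names.

    Hypothetical helper for illustration only; it is not part of Mirage.
    """
    if environ is None:
        environ = os.environ
    missing_required = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    missing_optional = [name for name in OPTIONAL_VARIABLES if not environ.get(name)]
    return missing_required, missing_optional


required, optional = missing_environment_variables()
if required:
    print("Set before running Mirage:", ", ".join(required))
if optional:
    print("Optional, currently not set:", ", ".join(optional))
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without changing your shell configuration.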

<div class="alert alert-block alert-info">
**Dependencies:**<br>

1) Install GRISMCONF from https://github.com/npirzkal/GRISMCONF<br>

2) Install NIRCAM_Gsim from https://github.com/npirzkal/NIRCAM_Gsim. This is the disperser software, which works for both NIRCam and NIRISS.
</div>

In [None]:
import os

In [None]:
# Set environment variables
# It may be helpful to set these within your .bashrc or .cshrc file, so that CRDS will
# know where to look for reference files during future runs of the JWST calibration
# pipeline.

#os.environ["MIRAGE_DATA"] = "/my/mirage_data/"
os.environ["CRDS_PATH"] = os.path.join(os.path.expandvars('$HOME'), "crds_cache")
os.environ["CRDS_SERVER_URL"] = "https://jwst-crds.stsci.edu"

In [None]:
from glob import glob
import pkg_resources
import yaml

from astropy.io import fits
import astropy.units as u
from astropy.visualization import simple_norm, imshow_norm
import h5py
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from mirage import imaging_simulator
from mirage import wfss_simulator
from mirage.utils.constants import FLAMBDA_CGS_UNITS, FLAMBDA_MKS_UNITS, FNU_CGS_UNITS 
from mirage.yaml import yaml_generator

In [None]:
TEST_DATA_DIRECTORY = os.path.normpath(os.path.join(pkg_resources.resource_filename('mirage', ''),
                                                    '../examples/wfss_example_data'))

---
<a id='yaml_from_apt'></a>
## Create a series of yaml files from an [APT](https://jwst-docs.stsci.edu/display/JPP/JWST+Astronomers+Proposal+Tool+Overview) proposal

With your proposal file open in APT, export the "xml" and "pointing" files. These will serve as the inputs to the yaml file generator function.

In [None]:
# Input files from APT
xml_file = os.path.join(TEST_DATA_DIRECTORY, 'niriss_wfss_example.xml')
pointing_file = os.path.join(TEST_DATA_DIRECTORY, 'niriss_wfss_example.pointing')

See [Mirage's yaml_generator documentation](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html#additional-yaml-generator-inputs "Yaml Generator Inputs")
for details on the formatting options for the inputs listed below. The formats will vary based on the complexity of your inputs and observations (number of targets, number of observations, instruments used).

In [None]:
# Source catalogs to be used. In this relatively simple case with a single target
# and a single instrument, there are two ways to supply the source catalogs. You
# may specify with or without the target name from the APT file as a dictionary key.

#catalogs = {'MAIN-TARGET': {'point_source': os.path.join(TEST_DATA_DIRECTORY,'point_sources.cat')}}
catalogs = {'point_source': os.path.join(TEST_DATA_DIRECTORY,'point_sources.cat')}

In [None]:
# Set reference file values. 
# Setting to 'crds_full_name' will search for and download needed
# calibration reference files (commonly referred to as CRDS reference files) when
# the yaml_generator is run. 
# 
# Setting to 'crds' will put placeholders in the yaml files and save the downloading
# for when the simulated images are created.
reffile_defaults = 'crds'

In [None]:
# Optionally set the cosmic ray library and rate
cosmic_rays = {'library': 'SUNMAX', 'scale': 1.0}

In [None]:
# Optionally set the background signal rates to be used
background = 'medium'

In [None]:
# Optionally set the telescope roll angle (PAV3) for the observations
pav3 = 12.5

In [None]:
# Optionally set the observation date to use for the data. Note that this information
# is placed in the headers of the output files, but not used by Mirage in any way.
dates = '2022-10-31'

You can specify the data reduction state of the Mirage outputs.
Options are 'raw', 'linear', or 'linear, raw'. 

If 'raw' is specified, the output is a completely uncalibrated file, with a filename ending in "uncal.fits"

If 'linear' is specified, the output is a file with linearized signals, ending in "linear.fits". This is equivalent to having been run through the dq_init, saturation flagging, superbias subtraction, reference pixel subtraction, and non-linearity correction steps of the calibration pipeline. Note that this product does not include dark current subtraction.

If 'linear, raw', both outputs are saved.

In order to fully process the Mirage output with the default steps used by the pipeline, it would be best to use the 'raw' output and run the entire calibration pipeline.
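
The mapping between the `datatype` string and the expected filename endings described above can be sketched as follows. The helper name `expected_output_suffixes` is hypothetical and is not part of Mirage; it only restates the naming rules given in this section.

```python
def expected_output_suffixes(datatype):
    """Return the filename endings expected for a given datatype string.

    Hypothetical helper for illustration; it restates the rules above:
    'raw' -> 'uncal.fits', 'linear' -> 'linear.fits'.
    """
    suffix_for_state = {"raw": "uncal.fits", "linear": "linear.fits"}
    # Split entries such as 'linear, raw' into individual states
    states = [state.strip().lower() for state in datatype.split(",")]
    unknown = [state for state in states if state not in suffix_for_state]
    if unknown:
        raise ValueError("Unrecognized datatype entries: {}".format(unknown))
    return [suffix_for_state[state] for state in states]


print(expected_output_suffixes("linear, raw"))
```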

In [None]:
datatype = 'linear, raw'

Provide the output directory for the yaml files themselves, as well as the output directory where you want the simulated files to eventually be saved. This information will be placed in the yaml files.

In [None]:
print(catalogs)

In [None]:
# Create a series of Mirage input yaml files
# using the APT files
yaml_output_dir = '/where/to/put/yaml/files'
simulations_output_dir = '/where/to/put/simulated/data'
# Run the yaml generator
yam = yaml_generator.SimInput(input_xml=xml_file, pointing_file=pointing_file,
                              catalogs=catalogs, cosmic_rays=cosmic_rays,
                              background=background, roll_angle=pav3,
                              dates=dates, reffile_defaults=reffile_defaults,
                              verbose=True, output_dir=yaml_output_dir,
                              simdata_output_dir=simulations_output_dir,
                              datatype=datatype)
yam.create_inputs()

One yaml file will be created for each exposure and detector. The naming convention of the files follows that for [JWST exposure filenames](https://jwst-docs.stsci.edu/display/JDAT/File+Naming+Conventions+and+Data+Products). For example, the first exposure in proposal number 12345, Observation 3, Visit 2, assuming it is made using NIRCam (the A1 detector in this case), will produce a simulated file named jw12345003002_01101_00001_nrca1_uncal.fits, with the corresponding yaml file named jw12345003002_01101_00001_nrca1.yaml. Note that Mirage does not yet create activity IDs in the same way as the JWST flight software, so filenames will differ slightly from the in-flight names for the same APT proposal.
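
To make the naming convention concrete, the sketch below splits a filename into its components with a regular expression. The helper `parse_exposure_name` and the field names are hypothetical, chosen for illustration. The pattern assumes the standard `jw<ppppp><ooo><vvv>_<gg><s><aa>_<eeeee>_<detector>` layout and has only been checked against the filenames that appear in this notebook.

```python
import re

# jw<proposal><observation><visit>_<visit group><parallel seq><activity>_<exposure>_<detector>
FILENAME_PATTERN = re.compile(
    r"^jw(?P<proposal>\d{5})(?P<observation>\d{3})(?P<visit>\d{3})"
    r"_(?P<visit_group>\d{2})(?P<parallel_sequence>\d)(?P<activity>[0-9a-z]{2})"
    r"_(?P<exposure>\d{5})"
    r"_(?P<detector>[a-z0-9]+)"
    r"(?:_(?P<suffix>[a-z0-9]+))?"      # e.g. 'uncal' or 'linear'; absent for yaml files
    r"\.(?P<extension>fits|yaml)$"
)


def parse_exposure_name(filename):
    """Split a JWST-style exposure filename into a dictionary of fields.

    Hypothetical helper for illustration; not part of Mirage.
    """
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError("Unrecognized filename: {}".format(filename))
    return match.groupdict()


fields = parse_exposure_name("jw12345003002_01101_00001_nrca1_uncal.fits")
print(fields["proposal"], fields["observation"], fields["visit"], fields["detector"])
```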

Check which yaml files are for WFSS and which are for imaging.

In [None]:
yaml_files = glob(os.path.join(yam.output_dir,"jw*.yaml"))

yaml_WFSS_files = []
yaml_imaging_files = []
for f in yaml_files:
    # Use a context manager so that each file is closed after reading
    with open(f) as infile:
        my_dict = yaml.safe_load(infile)
    mode = my_dict["Inst"]["mode"]
    if mode == "wfss":
        yaml_WFSS_files.append(f)
    elif mode == "imaging":
        yaml_imaging_files.append(f)
    
print("WFSS files:",len(yaml_WFSS_files))
print("Imaging files:",len(yaml_imaging_files))

Each output yaml file contains details on the simulation.

In [None]:
with open(yaml_WFSS_files[0], 'r') as infile:
    parameters = yaml.safe_load(infile)
for key in parameters:
    for level2_key in parameters[key]:
        print('{}: {}: {}'.format(key, level2_key, parameters[key][level2_key]))

---
<a id='make_wfss'></a>
## Make WFSS simulated observations

Create simulated data from the WFSS yaml files. This is accomplished using the **wfss_simulator** module, which wraps around the various stages of Mirage. There are several input options available for the **wfss_simulator**.

* [Provide a single wfss mode yaml file](#single_yaml)
* [Provide multiple yaml files](#multiple_yamls)
* [Provide a single yaml file and an hdf5 file containing SED curves of the sources](#yaml_plus_hdf5)

A brief explanation of the available keywords for the **wfss_simulator**:

* If an appropriate (linearized, or linearized and cut to the proper number of groups) dark current exposure already exists, the dark current preparation step can be skipped by providing the name of the dark file in **override_dark**.

* The **save_dispersed_seed** option will save the dispersed seed image to a fits file. 

* The name of the fits file can be given in the **disp_seed_filename** keyword or, if that is left as None, Mirage will create a filename based on the simulated data output name in the WFSS mode yaml file.

* If **extrapolate_SED** is set to True, then the continuum calculated by Mirage will be extrapolated to cover the necessary wavelengths if the filters in the input yaml files do not span the entire wavelength range.

* If the **source_stamps_file** is set to the name of an [hdf5](https://www.h5py.org/) file, then the disperser will save 2D stamp images of the dispersed spectral orders for each target. These are intended as aids for spectral extraction. (**NOTE that turning this option on will lead to significantly longer run times for Mirage, as so much more data will be generated.**)

* The **SED_file** keyword can be used to input an existing hdf5 file containing source spectra to be used in the simulation.

* If you have source spectra created within your notebook or python session, these can be added using the **SED_dict** keyword.

* If there are normalized spectra within your **SED_file** or **SED_dict**, you must also provide the **SED_normalizing_catalog_column**. This is the magnitude column name within the ascii source catalog to use for scaling the normalized spectra. Only spectra with units specified as "normalized" will be scaled.

* The **create_continuum_seds** keyword declares whether or not Mirage will use the information in the ascii source catalog to create a set of source SEDs, save them to an hdf5 file, and provide them to the disperser. The user-input value of this keyword is respected only when multiple yaml files (and no hdf5 file) are input into the **wfss_simulator**. Only in this situation is it possible to run the disperser using either the multiple imaging seed images alone, or multiple imaging seed images plus an hdf5 file.
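
Two of the rules in the list above lend themselves to a quick pre-flight check: normalized spectra require **SED_normalizing_catalog_column**, and **create_continuum_seds** is only honored for multiple yaml files without an hdf5 file. The sketch below encodes just those two statements. The function `check_wfss_keywords` and its `has_normalized_spectra` argument are hypothetical and are not part of Mirage, which performs its own checks internally.

```python
def check_wfss_keywords(yaml_files, sed_file=None, has_normalized_spectra=False,
                        normalizing_column=None):
    """Check keyword combinations described in the list above.

    Hypothetical helper for illustration; not part of Mirage.

    Returns True if the user-supplied create_continuum_seds value would be
    respected (multiple yaml files and no hdf5 file), otherwise False.
    """
    # A single yaml file may be given as a plain string
    if isinstance(yaml_files, str):
        yaml_files = [yaml_files]

    if has_normalized_spectra and normalizing_column is None:
        raise ValueError("Normalized spectra require SED_normalizing_catalog_column.")

    return len(yaml_files) > 1 and sed_file is None


print(check_wfss_keywords(["a.yaml", "b.yaml"]))            # multiple yamls, no hdf5
print(check_wfss_keywords("a.yaml", sed_file="seds.hdf5"))  # single yaml plus hdf5
```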

<a id='single_yaml'></a>
### Provide a single wfss mode yaml file

Here, we provide a single yaml file as input. In this case, Mirage will create a direct (undispersed) seed image for the yaml file. For each source, Mirage will construct a continuum spectrum by either:

1. Interpolating the filtered magnitudes in the catalogs listed in the yaml file, or
2. Extrapolating to produce a flat continuum, if only a single filter's magnitude is given

This continuum spectrum will then be placed in the dispersed seed image, which will then be combined with a dark current exposure in order to create the final simulated exposure.
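
The two cases above (interpolation between filters, and a flat continuum for a single filter) can be illustrated with a few lines of plain Python. This is only a conceptual sketch and not Mirage's actual algorithm. It assumes linear interpolation in magnitude, holds the end values constant outside the filter range, and uses made-up filter wavelengths and magnitudes. The function name `interpolate_magnitudes` is hypothetical.

```python
from bisect import bisect_right


def interpolate_magnitudes(filter_wavelengths, magnitudes, wavelength_grid):
    """Linearly interpolate catalog magnitudes onto a wavelength grid.

    Illustrative only. Outside the range covered by the filters, or when
    only one filter is given, the nearest magnitude is held constant,
    which produces a flat continuum.
    """
    pairs = sorted(zip(filter_wavelengths, magnitudes))
    waves = [wave for wave, _ in pairs]
    mags = [mag for _, mag in pairs]

    result = []
    for wave in wavelength_grid:
        if wave <= waves[0]:
            result.append(mags[0])
        elif wave >= waves[-1]:
            result.append(mags[-1])
        else:
            upper = bisect_right(waves, wave)
            lower = upper - 1
            fraction = (wave - waves[lower]) / (waves[upper] - waves[lower])
            result.append(mags[lower] + fraction * (mags[upper] - mags[lower]))
    return result


grid = [1.0, 1.325, 2.5]  # microns
# Three filters: interpolate between them, flat beyond the ends
print(interpolate_magnitudes([1.15, 1.5, 2.0], [20.0, 19.0, 18.5], grid))
# One filter: flat continuum everywhere
print(interpolate_magnitudes([2.0], [18.5], grid))
```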

In [None]:
m = wfss_simulator.WFSSSim(yaml_WFSS_files[0], override_dark=None, save_dispersed_seed=True,
                           extrapolate_SED=True, disp_seed_filename=None, source_stamps_file=None,
                           SED_file=None, SED_normalizing_catalog_column=None, SED_dict=None,
                           create_continuum_seds=True)
m.create()

<a id='multiple_yamls'></a>
### Provide multiple yaml files

Here, we provide multiple yaml files as input. There are two options when operating in this way.

* [Set **create_continuum_seds=False**](#multiple_yamls_no_sed). In this case, Mirage will create a direct (undispersed) seed image for each yaml file. For each source, the disperser determines an object's SED by *interpolating that object's signal across the seed images*. This continuum spectrum will then be placed in the dispersed seed image, which will then be combined with a dark current exposure in order to create the final simulated exposure.
* [Set **create_continuum_seds=True**](#multiple_yamls_make_sed). In this case Mirage will produce the SEDs by *interpolating the source magnitudes given in the ascii source catalog*. These SEDs are saved to an hdf5 file. The hdf5 file is then provided to the disperser along with one undispersed seed image. The advantage of this option is processing time. In this case, the **wfss_simulator** only produces a single undispersed seed image, whereas if no hdf5 file is produced, Mirage will construct seed images from all of the input yaml files.

NOTE: In this case, all of the supplied yaml files MUST have the same pointing!
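
One way to confirm the pointing requirement before running the simulator is to compare the pointing entries of the parsed yaml files. The sketch below is hedged: it assumes the pointing is stored under a `Telescope` section with `ra`, `dec`, and `rotation` keys, which you should confirm against your own yaml files, and the helper `same_pointing` is hypothetical rather than part of Mirage. It works on dictionaries such as those returned by `yaml.safe_load`.

```python
def same_pointing(parameter_sets, tolerance=1e-6):
    """Return True if all parameter dictionaries share the same pointing.

    Hypothetical helper for illustration. Assumes each dictionary has a
    'Telescope' section with 'ra', 'dec', and 'rotation' entries (degrees).
    """
    keys = ("ra", "dec", "rotation")
    reference = parameter_sets[0]["Telescope"]
    for params in parameter_sets[1:]:
        telescope = params["Telescope"]
        for key in keys:
            if abs(float(telescope[key]) - float(reference[key])) > tolerance:
                return False
    return True


# Made-up values standing in for the contents of three yaml files
example = [
    {"Telescope": {"ra": 53.1, "dec": -27.8, "rotation": 12.5}},
    {"Telescope": {"ra": 53.1, "dec": -27.8, "rotation": 12.5}},
    {"Telescope": {"ra": 53.1, "dec": -27.8, "rotation": 12.5}},
]
print(same_pointing(example))
```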

In [None]:
test_yaml_files = ['jw00042001001_01101_00003_nis.yaml', 'jw00042001001_01101_00005_nis.yaml',
                   'jw00042001001_01101_00009_nis.yaml']
test_yaml_files = [os.path.join(yaml_output_dir, yfile) for yfile in test_yaml_files]

<a id='multiple_yamls_no_sed'></a>
#### Multiple yaml files, do not create continuum SED file

In [None]:
disp_seed_image = 'multiple_yaml_input_no_continuua_dispersed_seed_image.fits'
m = wfss_simulator.WFSSSim(test_yaml_files, override_dark=None, save_dispersed_seed=True,
                           extrapolate_SED=True, disp_seed_filename=disp_seed_image, source_stamps_file=None,
                           SED_file=None, SED_normalizing_catalog_column=None, SED_dict=None,
                           create_continuum_seds=False)
m.create()

<a id='multiple_yamls_make_sed'></a>
#### Multiple yaml files, create continuum SED file

In [None]:
disp_seed_image = 'multiple_yaml_input_with_continuua_dispersed_seed_image.fits'
m = wfss_simulator.WFSSSim(test_yaml_files, override_dark=None, save_dispersed_seed=True,
                           extrapolate_SED=True, disp_seed_filename=disp_seed_image, source_stamps_file=None,
                           SED_file=None, SED_normalizing_catalog_column=None, SED_dict=None,
                           create_continuum_seds=True)
m.create()

<a id='yaml_plus_hdf5'></a>
### Provide a single yaml file and an hdf5 file containing SED curves of the sources

In this case, a single WFSS mode yaml file is provided as input to Mirage, along with an [hdf5](https://www.h5py.org/) file. This file contains a Spectral Energy Distribution (SED) curve for each target, given in one of three forms: F_lambda (`erg / second / cm^2 / Angstrom` or `W / m^2 / micron`, or units that can be converted to F_lambda), F_nu (`erg / second / cm^2 / Hz` or `W / m^2 / Hz`), or a normalized SED. Along with the SED, the user must provide a set of wavelengths or frequencies. See the [hdf5 example](#make_sed_file) and [manual example](#manual_seds) below for more information on units.

The advantage of this input scenario is that you are not limited to simple continuum spectra for your targets. Emission and absorption features can be added. Normalized SEDs will be scaled by the magnitudes listed in one of the magnitude columns of the ascii input catalog. The desired column name is provided through the `SED_normalizing_catalog_column` keyword.
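
For reference, the F_nu and F_lambda flux densities mentioned above are related by F_lambda = F_nu * c / lambda^2. The sketch below applies that relation in CGS units using plain Python; it is an illustration of the unit relationship, not the conversion code Mirage uses. The function name `fnu_to_flambda` is hypothetical. In practice, astropy's `spectral_density` equivalency performs the same conversion on quantities with units attached.

```python
# Speed of light in Angstrom per second
SPEED_OF_LIGHT_ANGSTROM_PER_S = 2.99792458e18


def fnu_to_flambda(fnu_cgs, wavelength_micron):
    """Convert F_nu (erg/s/cm^2/Hz) to F_lambda (erg/s/cm^2/Angstrom).

    Illustrative helper; not part of Mirage.
    """
    wavelength_angstrom = wavelength_micron * 1.0e4
    return fnu_cgs * SPEED_OF_LIGHT_ANGSTROM_PER_S / wavelength_angstrom ** 2


# F_nu of 1e-26 erg/s/cm^2/Hz at 2 microns
print(fnu_to_flambda(1.0e-26, 2.0))
```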

The disperser software will then use the SED along with the segmentation map in the direct seed image to place spectra into the dispersed seed image. In the cell below, we show a simple example of how to create an hdf5 file with SEDs. In this case the spectrum is flat, with no emission or absorption features.

In [None]:
target_1_wavelength = np.arange(1.0, 5.5, 0.1)
target_1_flux = np.repeat(1e-16, len(target_1_wavelength))
wavelengths = [target_1_wavelength]
fluxes = [target_1_flux]

# Examples for the case where you want to include data on more sources

# Add fluxes for target number 2
#target_2_wavelength = np.arange(0.8, 5.4, 0.05)
#target_2_flux = np.repeat(1.4e-16, len(target_2_wavelength))
#wavelengths.append(target_2_wavelength)
#fluxes.append(target_2_flux)

# Add a normalized input spectrum
#target_3_wavelength = np.arange(0.8, 5.4, 0.05)
#target_3_flux = np.linspace(1.3, 0.75, len(target_3_wavelength))
#wavelengths.append(target_3_wavelength)
#fluxes.append(target_3_flux)

<a id='make_sed_file'></a>
#### Create HDF5 file containing object SEDs

If you wish to add information about the units of the wavelengths and fluxes, that can be done by setting attributes of each dataset as it is created. See the example below where the file **test_sed_file.hdf5** is created. If units are not provided, Mirage assumes wavelength units of `microns` and flux density units of F_lambda in CGS units `(erg / second / cm^2 / Angstrom)`. hdf5 dataset attributes cannot hold astropy unit objects, so we specify units using strings. Mirage will convert these strings to astropy units when working with the data.

Also note that in this hdf5 file (as well as in the manually created source SEDs below), each SED can have its own units.

In [None]:
wavelength_units = 'microns'
flux_units = 'flam'

In [None]:
sed_file = 'test_sed_file.hdf5'
sed_file = os.path.join(yaml_output_dir, sed_file)
with h5py.File(sed_file, "w") as file_obj:
    for i in range(len(fluxes)):
        dset = file_obj.create_dataset(str(i+1), data=[wavelengths[i], fluxes[i]], dtype='f',
                                       compression="gzip", compression_opts=9)
        dset.attrs['wavelength_units'] = wavelength_units
        # The first two targets use physical flux units; any others are normalized
        if i < 2:
            dset.attrs['flux_units'] = flux_units
        else:
            dset.attrs['flux_units'] = 'normalized'

<a id='manual_seds'></a>
#### Manual SED inputs

Also in this example we show the option to manually provide an SED. In this case the SED must be a dictionary where the key is the index number of the object (corresponding to the index number in the ascii catalog). The dictionary entry must contain a `'wavelengths'` and a `'fluxes'` entry for each object. Both of these must be lists or numpy arrays. Astropy units can optionally be attached to each list. Currently Mirage supports only `F_lambda` (or equivalent) units, `F_nu` (or equivalent) units, or normalized units, which can be specified using astropy's `pct` unit. In the example below, note the use of `FLAMBDA_CGS_UNITS`, `FLAMBDA_MKS_UNITS`, and `FNU_CGS_UNITS`, which have been imported from Mirage. *Target_7* also uses a set of frequencies (note the specification of Hz for units), rather than wavelengths. Convertible frequency units (e.g. MHz, GHz) are also allowed.
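
The structural requirements just described can be checked before handing the dictionary to Mirage. The sketch below is a hypothetical helper (`validate_sed_dict` is not part of Mirage). It checks the integer keys and the two required entries named above, and additionally assumes that wavelengths and fluxes should have equal lengths.

```python
from numbers import Integral


def validate_sed_dict(sed_dict):
    """Check the structure of a manually created SED dictionary.

    Hypothetical helper for illustration; not part of Mirage.
    Raises an exception describing the first problem found.
    """
    for index, entry in sed_dict.items():
        # Keys are catalog index numbers (plain or numpy integers)
        if not isinstance(index, Integral):
            raise TypeError("Key {!r} is not an integer index.".format(index))
        for key in ("wavelengths", "fluxes"):
            if key not in entry:
                raise KeyError("Entry {} is missing '{}'.".format(index, key))
        # Assumption: one flux value per wavelength (or frequency) value
        if len(entry["wavelengths"]) != len(entry["fluxes"]):
            raise ValueError("Entry {} has mismatched lengths.".format(index))
    return True


example_sed = {2: {"wavelengths": [0.8, 0.85, 0.9], "fluxes": [1.1, 1.0, 0.95]}}
print(validate_sed_dict(example_sed))
```

Because only `len` and key lookups are used, the same check works for plain lists, numpy arrays, and astropy quantities.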

In [None]:
my_sed = {}
target_2_wavelength = np.arange(0.8, 5.4, 0.05) * u.micron
target_2_flux = np.linspace(1.1, 0.95, len(target_2_wavelength)) * u.pct
my_sed[2] = {"wavelengths": target_2_wavelength,
             "fluxes": target_2_flux}

# Examples in the case you want to add information for other sources

#target_5_wavelength = np.arange(0.8, 5.4, 0.05) * u.micron
#target_5_flux = np.linspace(1e-16, 1e-17, len(target_5_wavelength)) * FLAMBDA_CGS_UNITS
#my_sed[5] = {"wavelengths": target_5_wavelength,
#             "fluxes": target_5_flux}

#target_6_wavelength = np.arange(0.8, 5.4, 0.05) * u.micron
#target_6_flux = np.linspace(1e-15, 1e-16, len(target_6_wavelength)) * FLAMBDA_MKS_UNITS
#my_sed[6] = {"wavelengths": target_6_wavelength,
#             "fluxes": target_6_flux}

# Frequencies (in Hz) may be given in place of wavelengths
#target_7_wavelength = np.linspace(5.6e13, 3.7e14, 10) * u.Hz
#target_7_flux = np.linspace(1.6e-26, 1.6e-27, len(target_7_wavelength)) * FNU_CGS_UNITS
#my_sed[7] = {"wavelengths": target_7_wavelength,
#             "fluxes": target_7_flux}

In [None]:
# Input the SED file and SED dictionary along with a WFSS mode yaml file to Mirage
m = wfss_simulator.WFSSSim(test_yaml_files[0], override_dark=None, save_dispersed_seed=True,
                           extrapolate_SED=True, disp_seed_filename=None, source_stamps_file=None,
                           SED_file=sed_file, SED_normalizing_catalog_column='niriss_f200w_magnitude',
                           SED_dict=my_sed, create_continuum_seds=True)
m.create()

<a id='wfss_outputs'></a>
### Outputs

Regardless of whether the **wfss_simulator** is called with multiple yaml files or a yaml and an hdf5 file, the outputs will be the same. The final output will be **jw\*uncal.fits** (or **jw\*linear.fits**, depending on whether raw or linear outputs are specified in the yaml files) files in your output directory. These files are in DMS format and can be fed directly into the **calwebb_detector1** pipeline for further calibration, if desired.

The seed image is also saved, as an intermediate output. This seed image is a noiseless rate image of the same scene in the final output file. The seed image can be thought of as an ideal version of the scene that excludes (most) detector effects.

#### Examine the dispersed seed image

In [None]:
with fits.open(m.disp_seed_filename) as seedfile:
    dispersed_seed = seedfile[1].data

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))
norm = simple_norm(dispersed_seed, stretch='log', min_cut=0.25, max_cut=10)
cax = ax.imshow(dispersed_seed, norm=norm)
cbar = fig.colorbar(cax)
plt.show()

#### Examine the final output file

In [None]:
# Simulated exposures are written to the simulated data output directory
final_file = os.path.join(simulations_output_dir, 'jw00042001001_01101_00003_nis_uncal.fits')
with fits.open(final_file) as hdulist:
    data = hdulist['SCI'].data
    hdulist.info()

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))
norm = simple_norm(data[0, 4, :, :], stretch='log', min_cut=5000, max_cut=50000)
cax = ax.imshow(data[0, 4, :, :], norm=norm)
cbar = fig.colorbar(cax)
plt.show()

---
<a id='make_imaging'></a>
## Make imaging simulated observations

Similar to the **wfss_simulator** module for WFSS observations, imaging data can be created using the **imaging_simulator** module. This can be used to create the data for the direct (in NIRCam and NIRISS), and Out of Field (NIRCam) exposures that accompany WFSS observations, as well as the shortwave channel data for NIRCam, which is always imaging while the longwave detector is observing through the grism.

In [None]:
for yaml_imaging_file in yaml_imaging_files[0:1]:
    print("Imaging simulation for {}".format(yaml_imaging_file))
    img_sim = imaging_simulator.ImgSim()
    img_sim.paramfile = yaml_imaging_file
    img_sim.create()

<a id='imaging_outputs'></a>
### Outputs

As with WFSS outputs, the **imaging_simulator** will create **jw\*uncal.fits** or **jw\*linear.fits** files, depending on which was specified in the associated yaml files.

#### Examine the seed image

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))
norm = simple_norm(img_sim.seedimage, stretch='log', min_cut=0.25, max_cut=1000)
cax = ax.imshow(img_sim.seedimage, norm=norm)
cbar = fig.colorbar(cax)
plt.show()

#### Examine the output file

In [None]:
# Simulated exposures are written to the simulated data output directory
final_file = os.path.join(simulations_output_dir, 'jw00042001001_01101_00001_nis_uncal.fits')
with fits.open(final_file) as hdulist:
    data = hdulist['SCI'].data
    hdulist.info()

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))
norm = simple_norm(data[0, 4, :, :], stretch='log', min_cut=5000, max_cut=50000)
cax = ax.imshow(data[0, 4, :, :], norm=norm)
cbar = fig.colorbar(cax)
plt.show()