# Example: Using MIRAGE to Generate Imaging Exposures

Author: Bryan Hilbert
<br>Last update: 15 Nov 2021

This notebook shows the general workflow for creating simulated data with Mirage, beginning with an APT file. For users without an APT file, Mirage will work with manually-created instrument/exposure parameter files, generally referred to as [input yaml files](#example_yaml). This notebook focuses on creating a NIRCam imaging mode simulation. For other instruments or observing modes (NIRCam and NIRISS WFSS, NIRCam TSO, NIRISS SOSS, as well as imaging mode using non-sidereal or moving targets), see the [example notebooks in the Mirage repository](https://github.com/spacetelescope/mirage/tree/master/examples).

<a id="toc"></a>
*Table of Contents:*
* [<b>1. Getting Started</b>](#getting_started)
* [<b>2. Define Convenience Function</b>](#convenience_function)
* [<b>3. Create Source Catalogs</b>](#source_catalogs)
    * [Using built-in convenience functions](#catalogs_conv_funcs)
        * [for_proposal()](#for_proposal) -- point source and galaxy catalogs
        * [get_all_catalogs()](#get_all_catalogs) -- point source catalogs
        * [galaxy_background()](#galaxy_background) -- galaxy catalogs
    * [Manual creation](#catalogs_manual)
        * [Point source catalog](#manual_point_sources)
            * [From an existing JHK catalog](#existing_jhk)
            * [From scratch](#ptsrc_from_scratch)
        * [Galaxy catalog from scratch](#gal_from_scratch)
        * [Extended source catalog from scratch](#extended_cats)
* [<b>4. Generating Input Yaml Files</b>](#make_yaml)
    * [Examine an Example Yaml File](#example_yaml)
* [<b>5. Create Simulated Data</b>](#run_steps_together)
    * [Call the imaging simulator](#call_img_sim)
    * [Examine output](#examine_output)
* [<b>6. Running Simulation Steps Independently</b>](#run_steps_independently)
    * [Seed image](#indep_seed)
        * [Examine seed image](#examine_seed)
        * [Examine other output products](#other_outputs)
    * [Prepare dark current exposure](#prep_dark)
    * [Create final exposure](#final_exposure)
* [<b>7. Simulating Multiple Exposures</b>](#mult_sims)
* [<b>8. Extra time: Simulate deep field via galaxy catalog</b>](#deep_field) 
* [<b>9. Calibrate the data</b>](#calibrate_data)

---
<a id="getting_started"></a>
# 1. Getting Started

<div class="alert alert-block alert-warning">
**Important:** 
Before proceeding, ensure you have set the MIRAGE_DATA environment variable to point to the directory that contains the reference files associated with MIRAGE.  
<br/><br/>
If you want JWST pipeline calibration reference files to be downloaded in a specific directory, you should also set the CRDS_DATA environment variable to point to that directory. This directory will also be used by the JWST calibration pipeline during data reduction.
<br/><br/>
You may also want to set the CRDS_SERVER_URL environment variable to https://jwst-crds.stsci.edu. This is not strictly necessary, and Mirage will do it for you if you do not set it, but if you import the crds package, or any package that imports the crds package, you should set this environment variable first in order to avoid an error.
</div>
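A missing or mistyped `MIRAGE_DATA` path usually only surfaces once Mirage tries to read its reference files, so it can help to check the environment up front. The sketch below uses a hypothetical helper, `check_mirage_environment` (it is not part of Mirage): it only verifies that `MIRAGE_DATA` is set and points to an existing directory, and fills in `CRDS_SERVER_URL` with the URL given above if it is not already set.

```python
import os


def check_mirage_environment(default_crds_url="https://jwst-crds.stsci.edu"):
    """Hypothetical helper: sanity-check the environment variables Mirage relies on.

    Returns a list of problems found (an empty list if everything looks fine).
    """
    problems = []
    mirage_data = os.environ.get("MIRAGE_DATA")
    if not mirage_data:
        problems.append("MIRAGE_DATA is not set")
    elif not os.path.isdir(mirage_data):
        problems.append(f"MIRAGE_DATA points to a missing directory: {mirage_data}")

    # Set the CRDS server URL (if absent) before anything imports the crds package
    os.environ.setdefault("CRDS_SERVER_URL", default_crds_url)
    return problems
```

You could call `print(check_mirage_environment())` right after the cell that sets the environment variables; an empty list means no problems were detected.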

In [None]:
import os

In [None]:
# For use during JWebbinar
os.environ["MIRAGE_DATA"] = "/home/shared/mirage-data"
os.environ["CRDS_DATA"] = "/home/jovyan/crds_cache"
os.environ["CRDS_SERVER_URL"] = "https://jwst-crds.stsci.edu"

# Example when running notebook locally
#os.environ["MIRAGE_DATA"] = "/path/to/your/mirage_data"
#os.environ["CRDS_DATA"] = "$HOME/crds_cache"
#os.environ["CRDS_SERVER_URL"] = "https://jwst-crds.stsci.edu"

In [None]:
# For examining outputs
from glob import glob
import numpy as np
from astropy.io import ascii, fits
from astropy.table import Table
import matplotlib.pyplot as plt
import urllib.request
import yaml
%matplotlib inline

In [None]:
# mirage imports
from mirage import imaging_simulator
from mirage.catalogs import create_catalog
from mirage.catalogs import catalog_generator
from mirage.seed_image import catalog_seed_image
from mirage.dark import dark_prep
from mirage.ramp_generator import obs_generator
from mirage.yaml import yaml_generator

Define the APT files that will be used for these simulations. As we will see throughout this notebook, the easiest way to create a consistent set of simulated exposures is to start with an APT program. Export the xml and pointing files from APT.

In [None]:
xml_filename = 'sample_imaging.xml'
pointing_filename = 'sample_imaging.pointing'

In [None]:
# Download the example xml and pointing files from APT
box_xml_file = 'https://stsci.box.com/shared/static/e8idi6u8yauvz1y8prwpe39d8e2lmigc.xml'
box_pointing_file = 'https://stsci.box.com/shared/static/izlyvihtzrefqzo5rn7zs2gmwg1lqb27.pointing'

In [None]:
urllib.request.urlretrieve(box_xml_file, xml_filename)
urllib.request.urlretrieve(box_pointing_file, pointing_filename)
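If you re-run the notebook, the cell above downloads the files again each time. A small wrapper can skip the download when the file is already present. `download_if_missing` is a hypothetical helper, and note that it does not check whether an existing file is complete or up to date.

```python
import os
import urllib.request


def download_if_missing(url, filename):
    """Download url to filename unless filename already exists.

    Returns True if a download was performed, False if the existing file was kept.
    """
    if os.path.exists(filename):
        return False
    urllib.request.urlretrieve(url, filename)
    return True
```

For example, `download_if_missing(box_xml_file, xml_filename)` would replace the first `urlretrieve` call above.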

<a id='convenience_function'></a>
# 2. Define convenience function for image display

In [None]:
def show(array, title, min=0, max=1000):
    """Display a 2D array with a colorbar, clipping the color scale to [min, max]."""
    plt.figure(figsize=(12, 12))
    plt.imshow(array, clim=(min, max), origin='lower')
    plt.title(title)
    plt.colorbar().set_label('DN/s')

---
<a id='source_catalogs'></a>
# 3. Create Source Catalogs

See the [Catalog Generator Notebook](https://github.com/spacetelescope/mirage/blob/master/examples/Catalog_Generation_Tools.ipynb) for the full suite of catalog creation examples.

There are [9 different types of source catalogs](https://mirage-data-simulator.readthedocs.io/en/latest/catalogs.html) accepted by Mirage. The types of catalogs that are accepted depend on the observing mode being simulated, as well as the type of sources in the catalogs. 

In this notebook we focus on imaging mode simulations of sidereal targets. In this case, there are three main types of catalogs that can be used:

* <b>Point sources</b>

    Point source catalogs contain only point sources.

    
* <b>Galaxies</b>

    Galaxy catalogs contain galaxies to be added to the simulation. Galaxies are simulated as 2D Sersic profiles.


* <b>"Extended" sources</b>

    "Extended" source catalogs are a catch-all, intended for sources with more complex morphologies. In this case, the user provides a stamp image of each source, which is added to the scene.


Examples of these catalogs are shown below.

One important note on catalogs: every source must have its own unique index number. This includes a scenario where you are using multiple catalogs (e.g. a point source and a galaxy catalog). This is because Mirage creates a segmentation map of the scene using the index numbers. In the examples below, the 'starting_index' parameter is used to set the starting index number in a given catalog. If creating multiple catalogs, be sure to adjust indexes so that there is no overlap.
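Since overlapping indexes are easy to introduce when combining catalogs, a quick check before running Mirage can save you from a confusing segmentation map later. The sketch below uses a hypothetical helper (not part of Mirage) and assumes that each catalog numbers its sources consecutively starting from its `starting_index`, as the catalogs created later in this notebook do.

```python
def find_index_overlaps(index_ranges):
    """Hypothetical helper: report catalogs whose source indexes overlap.

    index_ranges maps a catalog name to (starting_index, number_of_sources).
    Returns a list of (name_a, name_b) pairs whose index ranges overlap.
    """
    # Inclusive [first, last] index for each catalog
    spans = {name: (start, start + count - 1)
             for name, (start, count) in index_ranges.items()}
    names = sorted(spans)
    overlaps = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_lo, a_hi = spans[a]
            b_lo, b_hi = spans[b]
            if a_lo <= b_hi and b_lo <= a_hi:
                overlaps.append((a, b))
    return overlaps
```

With the values used in the manual catalogs below (450 point sources starting at 1, 20 galaxies starting at 451, one extended source at 99999), the check returns an empty list.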

<a id='catalogs_conv_funcs'></a>
## Using built-in convenience functions

Mirage contains a number of convenience functions for creating source catalogs. These functions query and create source catalogs from 2MASS, Gaia, WISE, and optionally, the Besancon Galaxy model. For each source, Mirage will use the reported magnitudes through the passbands from the various surveys, and interpolate to find the source's magnitude in the requested JWST passbands.

Getting results from the Besancon query is a multi-step process, which is outlined in the [Catalog Generator Notebook](https://github.com/spacetelescope/mirage/blob/master/examples/Catalog_Generation_Tools.ipynb). For the purposes of this example, we will skip the Besancon query. Note that this is an optional input to the catalog creation function below.

More details on the catalog generation convenience tools are given on the [Catalog Creation](https://mirage-data-simulator.readthedocs.io/en/latest/catalog_creation.html) documentation page.

The highest-level of these is the `for_proposal()` function. This function will look at the targets in your APT file and create point source and/or galaxy catalogs in the areas around those targets.

Another useful high-level function is `get_all_catalogs()`. This function creates a point source catalog for a given central RA, Dec and box width on the sky. Under the hood, `for_proposal()` calls `get_all_catalogs()` once for each target in your APT file.

* Point source catalogs created with the functions above are generated through queries to the 2MASS, Gaia, and WISE source catalogs. This will limit the sources in the resulting catalog to the magnitude limits of those three surveys. You can also optionally provide a file containing the results of a query of the Besancon Galaxy model. With this, you can add dimmer sources with a realistic source density and distribution of stellar types, but the sources will not be real stars.


* The catalog of extragalactic sources is created from the sources in the 3DHST catalog. As with the Besancon model results, representative galaxies will be added to the source catalog in order to create a realistic scene, but the sources will not correspond to real galaxies at those positions. Note that in this case, the density of galaxies will match that seen in 3DHST (i.e., that of a deep exposure).

<a id='for_proposal'></a>
<b>for_proposal()</b>

The `for_proposal()` function will generate point source and/or galaxy catalogs for the target locations contained in your APT proposal. If your proposal contains multiple targets, separate catalogs will be generated for each. The catalogs contain source magnitudes for all sources in all NIRCam/NIRISS filters specified in the proposal.

In [None]:
catalog_dir = 'catalogs'

In [None]:
catalog_results = create_catalog.for_proposal(xml_filename, pointing_filename,
                                              point_source=True, extragalactic=True,
                                              catalog_splitting_threshold=1.,
                                              besancon_catalog_file=None,
                                              ra_column_name='RAJ2000',
                                              dec_column_name='DECJ2000',
                                              out_dir=catalog_dir,
                                              save_catalogs=True)
ptsrc_cats, gal_cats, ptsrc_filenames, gal_filenames, ptsrc_mapping, gal_mapping = catalog_results

Let's take a quick look at a couple of the catalogs:

In [None]:
len(ptsrc_cats)

In [None]:
print(ptsrc_filenames)

In [None]:
ptsrc_mapping

In [None]:
ptsrc_cats[1].table

In [None]:
ptsrc_cats[0].table

The galaxy source catalogs are very large! We will look at the catalogs here, but not include them when creating the simulated data, in order to save time.

In [None]:
print(gal_mapping)

In [None]:
gal_cats[1].table

<a id='get_all_catalogs'></a>
<b>get_all_catalogs()</b>

Next let's look at a call of `get_all_catalogs()`, to show how to construct a catalog for a given area on the sky. In this case, we need to specify the JWST filters to include in the catalog. There is also an optional keyword to specify the starting index number of the catalog. This is important, because when using multiple catalogs to create a simulation, every source must have a unique index. In this case we'll assume there are no other catalogs being used, and will start with an index of 1. As with `for_proposal()`, there is an option to supply a file containing the results of a Besancon query. 

In [None]:
center_ra = 12.0  # degrees
center_dec = 12.0  # degrees
width = 140  # arcseconds

In [None]:
sim_filters = ['F150W', 'F444W']

In [None]:
cat, filt_list = create_catalog.get_all_catalogs(center_ra, center_dec, width,
                                                 besancon_catalog_file=None,
                                                 instrument='NIRCAM', filters=sim_filters,
                                                 starting_index=1)

In [None]:
cat.table

To use this catalog in a Mirage simulation, save it to an ascii file.

In [None]:
cat_filename = os.path.join(catalog_dir, 'ptsrc_from_get_all_catalogs.cat')
cat.save(cat_filename)

<a id='galaxy_background'></a>
<b>galaxy_background()</b>

Galaxies are simulated as 2D Sersic profiles. The `galaxy_background()` function uses a catalog from the [3DHST project](https://archive.stsci.edu/prepds/3d-hst/) to create catalogs of galaxies that can be simulated in Mirage. The function selects a number of galaxies from the 3DHST catalog such that the density of sources on the sky matches that in the 3DHST catalog. This is a good option for simulating a deep field observation. The `galaxy_background()` function is called by `for_proposal()`, as seen in the [example above](#for_proposal).

In [None]:
center_ra = 12.0
center_dec = 12.0
v3_angle = 0.  # degrees
width = 140  # arcseconds
instrument = 'nircam'
filter_list = ['F444W', 'F150W']
background_galaxy_catalog, used_seed_value = create_catalog.galaxy_background(center_ra, center_dec, v3_angle,
                                                                              width, instrument, filter_list,
                                                                              boxflag=False, brightlimit=14.0,
                                                                              starting_index=1000)

In [None]:
background_galaxy_catalog.table

In [None]:
background_galaxy_catalog.save(os.path.join(catalog_dir, 'background_galaxies_from_3DHST.cat'))

<a id='catalogs_manual'></a>
## Manual creation

<a id='manual_point_sources'></a>
### Point Sources

<a id='existing_jhk'></a>
#### From an existing JHK catalog

Let's create a small catalog containing RA, Dec, and J, H, K, and V magnitudes, plus extinction, $A_{v}$. Then we'll use a convenience function to convert it into a Mirage-formatted point source catalog.

In [None]:
ra_vals = [12.0, 12.001, 12.002, 12.003, 12.004]
dec_vals = [34.5, 34.5001, 34.5002, 34.5003, 34.5004]
num_stars = len(ra_vals)

In [None]:
J = np.random.uniform(low=14, high=16, size=num_stars)
H = np.random.uniform(low=14, high=16, size=num_stars)
K = np.random.uniform(low=14, high=16, size=num_stars)
V = np.random.uniform(low=14, high=16, size=num_stars)
Av = np.repeat(1.0, num_stars)

In [None]:
orig_cat = Table()
orig_cat['RA'] = ra_vals
orig_cat['Dec'] = dec_vals
orig_cat['J'] = J
orig_cat['H'] = H
orig_cat['K'] = K
orig_cat['V'] = V
orig_cat['Av'] = Av

In [None]:
orig_cat

Save our "existing" catalog to an ascii file.

In [None]:
orig_cat_file = os.path.join(catalog_dir, 'original_ptsrc_catalog.cat')

In [None]:
ascii.write(orig_cat, orig_cat_file, overwrite=True)

Now convert the catalog into a Mirage-formatted catalog, with magnitudes converted into those for the filters of interest. For NIRCam, each filter needs to be specified as a filter/pupil pair. For NIRISS, only the filter name is needed.

In [None]:
filters = {}
filters['nircam'] = ['F150W/CLEAR', 'F444W/CLEAR']
filters['niriss'] = ['F200W', 'F277W']

In [None]:
# Name of the file to save the Mirage catalog into
mirage_ptsrc_cat_file = os.path.join(catalog_dir, 'mirage_formatted_point_sources.cat')

In [None]:
# Be sure to specify the column names in the original catalog
# that contain the RA and Dec data.
#
# Since this is the first catalog we are creating, start the index counter
# at 1. (Don't start at 0 since this will be used to create a segmentation
# map)
ptsrc_cat = create_catalog.johnson_catalog_to_mirage_catalog(orig_cat_file, filters,
                                                             ra_column_name='RA',
                                                             dec_column_name='Dec',
                                                             magnitude_system='abmag',
                                                             output_file=mirage_ptsrc_cat_file,
                                                             starting_index=1)

In [None]:
ptsrc_cat.table

<a id='ptsrc_from_scratch'></a>
#### Manually create a Mirage-formatted point source catalog from scratch

In this case we create a Mirage point source catalog directly. We'll use this catalog in the simulation below, since we can tailor this catalog more easily to create a pretty picture.

In [None]:
# Generate point source RA, Dec values that cover the field of view
# for all detectors. 
min_ra = 11.980270819703372
max_ra = 12.050540819703372
min_dec = 11.965394574641623
max_dec = 12.035664574641622
delta_ra = max_ra - min_ra
delta_dec = max_dec - min_dec
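Note that the RA range above spans a slightly smaller angle on the sky than the Dec range, because an interval in RA shrinks by $\cos(\mathrm{Dec})$ away from the celestial equator. The sketch below estimates the on-sky size of the box in arcseconds using the small-angle approximation; `box_extent_arcsec` is a hypothetical helper introduced only for illustration.

```python
import math


def box_extent_arcsec(min_ra, max_ra, min_dec, max_dec):
    """Approximate on-sky (width, height) in arcsec of an RA/Dec box given in degrees.

    Small-angle approximation: the RA interval is scaled by cos(Dec)
    evaluated at the central declination of the box.
    """
    center_dec = 0.5 * (min_dec + max_dec)
    width = (max_ra - min_ra) * math.cos(math.radians(center_dec)) * 3600.
    height = (max_dec - min_dec) * 3600.
    return width, height
```

With the limits above, `box_extent_arcsec(min_ra, max_ra, min_dec, max_dec)` gives roughly 247 by 253 arcseconds.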

In [None]:
# Generate a list of RA, Dec pairs
num_stars = 450
random_number_generator = np.random.RandomState(2021)
ra = random_number_generator.rand(num_stars) * delta_ra + min_ra
dec = random_number_generator.rand(num_stars) * delta_dec + min_dec

# Create a list of magnitudes for two filters. Let's keep the magnitudes
# between 14 and 20, just to keep things easily visible in this short
# exposure
mag_rand_num_gen = np.random.RandomState(1066)
mags1 = mag_rand_num_gen.rand(num_stars) * 6 + 14.
mags2 = mag_rand_num_gen.rand(num_stars) * 6 + 14.

In [None]:
# Create a PointSourceCatalog object and supply the RA and Dec values.
# This is the point source catalog used in the simulations below.
ptsrc = catalog_generator.PointSourceCatalog(ra=ra, dec=dec, starting_index=1)

In [None]:
# Now add magnitude columns for each filter
ptsrc.add_magnitude_column(mags1, instrument='nircam',
                           filter_name='f150w', magnitude_system='abmag')
ptsrc.add_magnitude_column(mags2, instrument='nircam',
                           filter_name='f444w', magnitude_system='abmag')
ptsrc.add_magnitude_column(mags2, instrument='niriss',
                           filter_name='f200w', magnitude_system='abmag')

In [None]:
manually_generated_ptsrc_catalog = os.path.join(catalog_dir, 'manually_generated_ptsrc.cat')
ptsrc.save(manually_generated_ptsrc_catalog)

In [None]:
ptsrc.table

See the [Catalog Generator Notebook](https://github.com/spacetelescope/mirage/blob/master/examples/Catalog_Generation_Tools.ipynb) for more examples of creating source catalogs using queries to 2MASS/GAIA/WISE/Besancon.

<a id='gal_from_scratch'></a>
### Galaxy catalog from scratch

In this case we create a Mirage galaxy source catalog directly. We'll use this catalog in the simulation below, since the galaxy catalog created above is so large, and would take longer to run.

In [None]:
num_galaxies = 20
gal_rand_num_gen = np.random.RandomState(1564)
ra_galaxy_vals = gal_rand_num_gen.rand(num_galaxies) * delta_ra + min_ra
dec_galaxy_vals = gal_rand_num_gen.rand(num_galaxies) * delta_dec + min_dec

In [None]:
radius = np.random.uniform(low=0.06, high=0.5, size=num_galaxies)  # arcsec
ellip = np.random.uniform(low=0., high=0.8, size=num_galaxies)
posang = np.random.uniform(low=0, high=359, size=num_galaxies)     # degrees
sersic = np.random.uniform(low=1.0, high=4.0, size=num_galaxies)
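Since each galaxy is described only by these Sersic parameters, it may help to see how they turn into an image. The sketch below evaluates a 2D elliptical Sersic profile, $I(r) = I_e \exp\{-b_n[(r/r_e)^{1/n} - 1]\}$, with NumPy. It is a generic implementation of the standard profile, not Mirage's code, and its conventions are assumptions: `r_eff` is in pixels (the catalog above uses arcsec), ellipticity is taken as $1 - b/a$, the position angle is measured counterclockwise from the +x axis, and $b_n$ uses the Ciotti & Bertin (1999) approximation.

```python
import numpy as np


def sersic_b(n):
    """Approximate b_n (Ciotti & Bertin 1999 asymptotic expansion)."""
    return 2. * n - 1. / 3. + 4. / (405. * n) + 46. / (25515. * n ** 2)


def sersic_image(shape, x0, y0, r_eff, n, ellipticity, pa_deg, i_eff=1.0):
    """Evaluate a 2D elliptical Sersic profile on a pixel grid of the given shape.

    Assumed conventions: r_eff in pixels, ellipticity = 1 - b/a,
    pa_deg measured counterclockwise from the +x axis.
    """
    y, x = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    theta = np.radians(pa_deg)
    dx, dy = x - x0, y - y0
    # Rotate pixel offsets into a frame aligned with the major axis
    x_maj = dx * np.cos(theta) + dy * np.sin(theta)
    y_min = -dx * np.sin(theta) + dy * np.cos(theta)
    q = 1. - ellipticity  # axis ratio b/a
    r = np.sqrt(x_maj ** 2 + (y_min / q) ** 2)
    return i_eff * np.exp(-sersic_b(n) * ((r / r_eff) ** (1. / n) - 1.))
```

For example, `show(sersic_image((101, 101), 50, 50, 10, 1.5, 0.7, 35.), 'Sersic sketch', max=5)` displays an elongated profile similar in shape to the galaxy placed at a known location below.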

In [None]:
# Manually add a galaxy at a known location so we can examine it later
ra_galaxy_vals[-1] = 12.007490819703373
dec_galaxy_vals[-1] = 11.992614574641623

In [None]:
# Tweak the properties of the galaxy in the known location
radius[-1] = 0.1
ellip[-1] = 0.7
posang[-1] = 35.
sersic[-1] = 1.5

In [None]:
# Since we already have the point source catalog, we know the minimum index
# value that we can use in order to ensure that all sources have a unique index.
# Set the minimum index number to be one larger than the number of point sources.
gal_cat = catalog_generator.GalaxyCatalog(ra=ra_galaxy_vals, dec=dec_galaxy_vals,
                                          ellipticity=ellip,
                                          radius=radius,
                                          sersic_index=sersic,
                                          position_angle=posang,
                                          radius_units='arcsec',
                                          starting_index=num_stars+1)

In [None]:
gal_mag_f150w = mag_rand_num_gen.rand(num_galaxies) + 15.
gal_mag_f444w = mag_rand_num_gen.rand(num_galaxies) + 15.5

In [None]:
gal_cat.add_magnitude_column(gal_mag_f150w, instrument='nircam', filter_name='f150w',
                             magnitude_system='abmag')
gal_cat.add_magnitude_column(gal_mag_f444w, instrument='nircam', filter_name='f444w',
                             magnitude_system='abmag')

In [None]:
manually_generated_galaxy_catalog = os.path.join(catalog_dir, 'manually_generated_galaxies.cat')
gal_cat.save(manually_generated_galaxy_catalog)

In [None]:
gal_cat.table

<a id='extended_cats'></a>
### "Extended" source catalog from scratch

"Extended" sources can be used for sources where you have a fits file containing a stamp image of your source. In this way, Mirage can simulate sources with more complex morphologies than simple point sources and Sersic profiles. Here, we'll create an extended source catalog with one source, in order to show its use. We'll set it up so the extended source falls onto the B4 detector.

In [None]:
# Download a stamp image to use for this example
box_file = 'https://stsci.box.com/shared/static/lkfn6oz03pbyorka0644x2x8344q3oyg.fits'
stamp_file = 'extended_stamp.fits'

In [None]:
urllib.request.urlretrieve(box_file, stamp_file)

Let's take a quick look at this stamp image. This will be the object we are adding to the scene via the extended source catalog.

In [None]:
obj = fits.getdata(stamp_file)

In [None]:
show(obj, 'Extended source stamp image', min=0, max=10)

In [None]:
extended_ra = [12.0088178]
extended_dec = [11.9911822]
extended_stamp_file = ['extended_stamp.fits']
extended_pa = [0.]

In [None]:
# Note that the highest index number in the galaxy catalog is
# num_stars + num_galaxies (470), so set the starting_index to something higher.
extended_cat = catalog_generator.ExtendedCatalog(filenames=extended_stamp_file,
                                                 ra=extended_ra,
                                                 dec=extended_dec,
                                                 position_angle=extended_pa,
                                                 starting_index=99999)

For extended sources, you can either specify a magnitude, as with the point sources/galaxies, or you can specify the magnitude as 'None'. In the latter case, the stamp image is interpreted as being in units of counts per second, and is added directly to the simulation without any scaling.
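To make the two cases concrete, here is one common way to scale a stamp to a requested magnitude: normalize the stamp to unit total signal, then multiply by the count rate implied by the magnitude through a photometric zeropoint, $\mathrm{rate} = 10^{-0.4(m - ZP)}$. This is a sketch under stated assumptions, not Mirage's implementation: `scale_stamp` is hypothetical, and the zeropoint in the usage example is a placeholder rather than a real NIRCam calibration value.

```python
import numpy as np


def scale_stamp(stamp, magnitude=None, zeropoint=None):
    """Return a stamp image in counts/sec.

    If magnitude is None, the stamp is assumed to already be in counts/sec
    and is returned unchanged. Otherwise it is normalized to a total of 1
    and scaled to a total count rate of 10**(-0.4 * (magnitude - zeropoint)).
    """
    stamp = np.asarray(stamp, dtype=float)
    if magnitude is None:
        return stamp
    if zeropoint is None:
        raise ValueError("A zeropoint is needed to convert a magnitude to counts/sec")
    total_rate = 10. ** (-0.4 * (magnitude - zeropoint))
    return stamp / stamp.sum() * total_rate
```

For instance, with a placeholder zeropoint of 25, a magnitude of 20 corresponds to a total of 100 counts/sec spread over the stamp's shape.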

In [None]:
extended_mag_f150w = [14.]
extended_mag_f444w = [16.]

In [None]:
extended_cat.add_magnitude_column(extended_mag_f444w, instrument='nircam', 
                                  filter_name='f444w', magnitude_system='abmag')
extended_cat.add_magnitude_column(extended_mag_f150w, instrument='nircam', 
                                  filter_name='f150w', magnitude_system='abmag')

In [None]:
extended_cat.table

In [None]:
manually_generated_extended_catalog = os.path.join(catalog_dir, 'manually_generated_extended.cat')
extended_catalog_file = os.path.join(catalog_dir, 'manually_generated_extended.cat')
extended_cat.save(manually_generated_extended_catalog)

<i>Back to the [Table of contents](#toc)</i>

---
<a id='make_yaml'></a>
# 4. Generating Input Yaml Files

For convenience, observing programs with multiple pointings and detectors can be simulated starting with the program's APT file. The xml and pointing files must be exported from APT, and are then used as input to the *yaml_generator*, which will generate a series of yaml input files. The [yaml generator documentation page](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html) explains the creation of yaml files in more detail.

## Optional user inputs

See [Mirage's yaml_generator documentation](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html#additional-yaml-generator-inputs "Yaml Generator Inputs") for details on the formatting options for the inputs listed below. The formats will vary based on the complexity of your inputs.

### Catalogs

Catalogs are organized by target name. If starting from an APT file, these must be the target names in your proposal. Mirage will then map the specified catalogs to the appropriate observations that use each target.

There are several ways to specify the catalogs when calling the `yaml_generator`. These are detailed on the [yaml generation documentation page](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html#source-catalogs). In this notebook, we will specify one catalog of each type for each target in the APT file.

```python
# When using the for_proposal() or get_all_catalogs() convenience functions, you can
# populate catalog names directly from the output names. 

cat_dict = {'STARFIELD': {'point_source': os.path.join(catalog_dir, ptsrc_mapping['001']),
                          'extended': manually_generated_extended_catalog
                      },
            'EXTRAGALACTIC': {'point_source': os.path.join(catalog_dir, ptsrc_mapping['002']),
                              'galaxy': os.path.join(catalog_dir, gal_mapping['002'])
                      }
           }
```

In this case, let's use our manually-created catalogs for the STARFIELD target, where we were able to control the number of sources in the field of view. Our manually-created catalogs do not cover the EXTRAGALACTIC target, so for that target we continue to use the outputs from `for_proposal()`.

In [None]:
cat_dict = {'STARFIELD': {'point_source': manually_generated_ptsrc_catalog,
                          'galaxy': manually_generated_galaxy_catalog, 
                          'extended': manually_generated_extended_catalog
                      },
            'EXTRAGALACTIC': {'point_source': os.path.join(catalog_dir, ptsrc_mapping['002']),
                              'galaxy': os.path.join(catalog_dir, gal_mapping['002'])
                      }
           }
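Because a typo in a catalog path may only show up when the yaml files are used, you may want to confirm that every file in `cat_dict` exists before calling the `yaml_generator`. The hypothetical helper below assumes the nested target → {catalog type: path} form used here; the `yaml_generator` also accepts other forms of catalog specification, which this sketch does not handle.

```python
import os


def missing_catalog_files(catalogs):
    """Hypothetical helper: find catalog paths that do not exist on disk.

    catalogs maps target name -> {catalog_type: path}.
    Returns a list of (target, catalog_type, path) tuples for missing files.
    """
    missing = []
    for target, cats in catalogs.items():
        for cat_type, path in cats.items():
            if not os.path.isfile(path):
                missing.append((target, cat_type, path))
    return missing
```

Calling `missing_catalog_files(cat_dict)` should return an empty list if all of the catalogs above were saved successfully.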

### Pipeline reference files

Set reference file values. Setting this to 'crds_full_name' when calling the `yaml_generator` will cause the yaml_generator to search for and download needed calibration reference files (commonly referred to as CRDS reference files) when the yaml_generator is run. The names of these CRDS reference files will then be placed in the appropriate entries of the yaml files. This option can be useful if you want to be able to guarantee the use of the same reference files no matter when the yaml file is used to create a simulation. 
 
Setting this to 'crds' will put placeholders (the string "crds") in the yaml files' entries for CRDS reference files. In this case, the CRDS reference files are not identified and downloaded until the simulated data are created from the yaml file. With this method, you are guaranteed to use the latest CRDS reference files when creating the simulated data, even if your yaml files are old.

In [None]:
reffile_defaults = 'crds'

### Cosmic rays

You can control the library and cosmic ray rate if desired. If you omit this line, Mirage will use the default values. More details on rates are given on the [Observation Generation documentation page](https://mirage-data-simulator.readthedocs.io/en/latest/observation_generator.html#add-cosmic-rays) as well as the [yaml generation documentation page](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html#cosmic-ray-rates).

In [None]:
cosmic_rays = {'library': 'SUNMAX', 'scale': 1.0}

### Background

This option controls the background signal in the simulations. Mirage uses the same definitions as the ETC for the options 'low', 'medium', and 'high'. Alternatively, if you set background to a number, Mirage will assume that this is the background in counts per second per pixel.

Another option for the background is to specify that you want the background level associated with a particular date. If the `dateobs_for_background` parameter is set to True in the call to the yaml_generator, then any background value given here is ignored and the background level will be calculated using the `jwst_backgrounds` tool.

More details are given on the [yaml generation documentation page](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html#background-specification).

In [None]:
# Set the background for all observations
background = 'medium'

# Give a different background value for each observation,
# where the keys here are observation numbers from the APT file
#background = {'001': 'high', '002': 'medium'}
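The background can therefore be given in three forms: a named level, a number, or a dictionary keyed by APT observation number. The hypothetical helper below checks a value against those forms before it is passed to the `yaml_generator`. Rejecting negative numbers is our own assumption rather than a documented Mirage rule, and remember that the value is ignored entirely when `dateobs_for_background` is True.

```python
import numbers


def check_background(background):
    """Hypothetical helper: validate a background specification.

    Accepts 'low', 'medium', or 'high', a non-negative number (counts/sec/pixel),
    or a dict mapping APT observation numbers (e.g. '001') to one of those.
    Returns True if valid, otherwise raises ValueError.
    """
    levels = ('low', 'medium', 'high')

    def _check_single(value):
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, numbers.Real) and not isinstance(value, bool):
            if value < 0:
                raise ValueError(f"Background must be non-negative, got {value}")
            return
        if value in levels:
            return
        raise ValueError(f"Unrecognized background value: {value!r}")

    if isinstance(background, dict):
        for value in background.values():
            _check_single(value)
    else:
        _check_single(background)
    return True
```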

### Telescope Roll Angle

You can set the telescope roll angle on a per proposal or per observation basis. Note that this is the roll angle about the V1 axis in degrees east of North. If you omit this parameter, the default is a roll angle of 0. From this value, Mirage will calculate the local roll angle of the detector to be simulated.

More details are shown on the [yaml generation documentation page](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html#roll-angle).

In [None]:
# Set one roll angle for all APT observations:
pav3 = 0.

# Or set a different roll angle for each APT observation. In this way you
# can simulate different epochs.
#pav3 = {'001': 34.5, '002': 154.5}
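Whether you pass one value or a per-observation dictionary, it can be handy to see the effective roll angle for each observation. The hypothetical helper below (not part of Mirage) expands a single value into a dictionary keyed by APT observation number and wraps each angle into [0, 360) degrees. Requiring a value for every observation in the dictionary form is our assumption, made so that missing entries are caught early.

```python
def roll_angles_by_observation(pav3, observation_ids):
    """Return {observation_id: angle} with angles wrapped into [0, 360) degrees."""
    if isinstance(pav3, dict):
        missing = set(observation_ids) - set(pav3)
        if missing:
            raise KeyError(f"No roll angle given for observations: {sorted(missing)}")
        return {obs: pav3[obs] % 360. for obs in observation_ids}
    return {obs: pav3 % 360. for obs in observation_ids}
```

For example, `roll_angles_by_observation(pav3, ['001', '002'])` returns `{'001': 0.0, '002': 0.0}` for the single value set above.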

### Date

Set the observation date to use for the data. A single date can be given for the proposal, or separate dates can be provided for each observation within the proposal. This information is placed in the headers of the output files. If `dateobs_for_background` is set to True in the call to the `yaml_generator`, then the date will be used to calculate the background signal.

See the [yaml generator documentation page](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html#observation-dates) for more details.

In [None]:
# Set one date for all APT observations
dates = '2022-10-31'

# Specify a different date for each APT observation
#dates = {'001': '2022-06-25', '002': '2022-11-15'}
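A mistyped date is easy to miss because it ends up in the output headers and, optionally, drives the background calculation. The hypothetical helper below parses dates in the 'YYYY-MM-DD' form used above, either a single string or a dictionary keyed by APT observation number, and raises `ValueError` for invalid dates. It assumes that this ISO format is the one you are using, and it does not cover any other formats Mirage may accept.

```python
import datetime


def parse_observation_dates(dates):
    """Hypothetical helper: convert 'YYYY-MM-DD' strings to datetime.date objects.

    Accepts a single string or a dict keyed by APT observation number.
    Raises ValueError if any date is malformed or invalid.
    """
    if isinstance(dates, dict):
        return {obs: datetime.date.fromisoformat(value) for obs, value in dates.items()}
    return datetime.date.fromisoformat(dates)
```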

### Ghosts

For NIRISS simulations, users can add optical ghosts to the data. By default, ghosts will be added for point sources only. Ghosts can also be added for galaxy or extended targets if you have a stamp image for each source. See the [documentation for adding ghosts](https://mirage-data-simulator.readthedocs.io/en/latest/ghosts.html) for details. For NIRCam simulations, such as those created in this notebook, the addition of ghosts is not supported and Mirage will ignore the keywords below, but we include them here for completeness.

In [None]:
ghosts = False
convolve_ghosts = False

## Run the yaml_generator

This will create a collection of yaml files that will be used as inputs when creating the simulated data. There will be one yaml file for each detector and exposure, so there can be quite a few files created if your program has lots of exposures or dithers. A more complete description of a call to the `yaml_generator` is given on the [yaml generation documentation page](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html#run-the-yaml-generator).
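Because there is one yaml file per detector per exposure, you can estimate how many files to expect before running the generator. In the sketch below, the exposure counts are hypothetical (each dither position counts as a separate exposure), and the detector counts assume NIRCam imaging: 10 detectors when both modules are used (A1-A5 and B1-B5, including the two long-wavelength detectors), or 5 for a single module.

```python
def expected_yaml_count(exposures_per_observation, detectors_per_exposure):
    """Estimate the number of yaml files: one per detector per exposure.

    Both arguments are dicts keyed by APT observation number.
    """
    return sum(n_exp * detectors_per_exposure[obs]
               for obs, n_exp in exposures_per_observation.items())
```

For example, a hypothetical program with 4 exposures on both modules in observation '001' and 2 exposures on module A only in observation '002' gives `expected_yaml_count({'001': 4, '002': 2}, {'001': 10, '002': 5})`, i.e. 50 files. You can compare such an estimate with `len(yfiles)` after running the generator.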

In [None]:
# Set the directory into which the yaml files will be written
output_dir = './yaml_files/'

In [None]:
# You can also set a separate directory where the simulated data
# will eventually be saved to
simulation_dir = './sim_data/'

You can specify the data reduction state of the Mirage outputs. Options are 'raw', 'linear', or 'linear, raw'.

If 'raw' is specified, the output is a completely uncalibrated file, with a filename ending in "uncal.fits".

If 'linear' is specified, the output is a file with linearized signals, ending in "linear.fits". This is equivalent to having been run through the dq_init, saturation flagging, superbias subtraction, reference pixel subtraction, and non-linearity correction steps of the calibration pipeline. Note that this product does not include dark current subtraction.

If 'linear, raw' is specified, both outputs are saved.

In order to fully process the Mirage output with the default steps used by the pipeline, it is **recommended to use the 'raw' output and run the entire calibration pipeline after Mirage has created all the data**.
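The mapping from `datatype` to output files can be summarized in a few lines. The hypothetical helper below simply mirrors the description above: it splits the comma-separated string and returns the filename endings you should expect. It treats the order of the entries as irrelevant, so 'raw, linear' (used in the next cell) and 'linear, raw' give the same result here; Mirage's own parsing may differ in detail.

```python
def expected_output_suffixes(datatype):
    """Map a datatype string such as 'raw', 'linear', or 'linear, raw'
    to the sorted list of expected output filename endings."""
    suffixes = {'raw': 'uncal.fits', 'linear': 'linear.fits'}
    requested = [item.strip() for item in datatype.split(',')]
    unknown = [item for item in requested if item not in suffixes]
    if unknown:
        raise ValueError(f"Unrecognized datatype entries: {unknown}")
    return sorted({suffixes[item] for item in requested})
```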

In [None]:
datatype = 'raw, linear'

In [None]:
# Run the yaml generator
yam = yaml_generator.SimInput(input_xml=xml_filename, pointing_file=pointing_filename,
                              catalogs=cat_dict, cosmic_rays=cosmic_rays,
                              background=background, roll_angle=pav3,
                              dates=dates, reffile_defaults=reffile_defaults,
                              add_ghosts=ghosts, convolve_ghosts_with_psf=convolve_ghosts,
                              verbose=True, output_dir=output_dir,
                              simdata_output_dir=simulation_dir,
                              datatype=datatype)
yam.create_inputs()

In [None]:
yfiles = sorted(glob(os.path.join(output_dir,'jw*.yaml')))

In [None]:
yfiles

<i>Back to the [Table of contents](#toc)</i>

<a id="example_yaml"></a>
## Examine a yaml input file

The yaml input file contains all of the parameters and values that Mirage needs in order to simulate one exposure from one detector. Keep in mind that the yaml generator above is a convenience function for quickly generating the yaml files associated with a particular proposal. If desired, you can always create your own yaml files, or take an existing yaml file and tweak it in order to customize your simulation.

The Mirage documentation also provides an [example of a yaml file](https://mirage-data-simulator.readthedocs.io/en/latest/example_yaml.html), complete with descriptions of all
parameters.

Entries listed as 'config' have default files that are present in the 
config directory of the repository. The scripts are set up to 
automatically find and use these files. The user can replace 'config'
with a filename if they wish to override the default.

In general, if 'None' is placed in a field, then the step that uses
that particular file will be skipped.

Note that the linearized_darkfile entry overrides the dark entry, unless
linearized_darkfile is set to None, in which case the dark entry will be
used.

Use of a valid readout pattern in the readpatt entry will cause the 
simulator to look up the values of nframe and nskip and ignore the 
values given in the yaml file.
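The rules above can be summarized in a few lines of code. This sketch works on a plain dictionary standing in for the loaded yaml data; the key names `dark` and `linearized_darkfile` follow the text, while `resolve_dark()` and `step_enabled()` are hypothetical helpers illustrating the stated behavior, not Mirage's own implementation. Note that `None` written in a yaml file is loaded as the string `'None'`, so both forms are handled.

```python
# Hypothetical helpers illustrating the yaml conventions described above.
def is_none(value):
    # A yaml entry of None is read as the string 'None'
    return value is None or str(value) == 'None'

def resolve_dark(reffiles):
    """Return (kind, filename) for the dark that would be used:
    linearized_darkfile wins unless it is None."""
    lin = reffiles.get('linearized_darkfile')
    if not is_none(lin):
        return 'linearized', lin
    return 'raw', reffiles['dark']

def step_enabled(value):
    """A step whose file entry is None is skipped; 'config' means use the
    default file from the config directory, so the step still runs."""
    return not is_none(value)

print(resolve_dark({'dark': 'raw_dark.fits', 'linearized_darkfile': 'None'}))
```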

Let's take a quick look at one of the yaml files that were created above.

In [None]:
# Choose one of the yaml files just created
yamlfile = './yaml_files/jw98765001001_01101_00001_nrcb4.yaml'

In [None]:
with open(yamlfile) as f:
    yaml_data = yaml.load(f, Loader=yaml.FullLoader)

In [None]:
yaml_data

<i>Back to the [Table of contents](#toc)</i>

---
<a id='run_steps_together'></a>
# Create simulated data

Under the hood, the `Mirage` simulator is broken up into three basic stages:

1. **Creation of a "seed image".**<br>
   This is generally a noiseless countrate image that contains signal
   only from the astronomical sources to be simulated. Currently, the 
   mirage package contains code to produce a seed image starting
   from object catalogs.<br><br>
   
2. **Dark current preparation.**<br>
   The simulated data will be created by adding the simulated sources
   in the seed image to a real dark current exposure. This step
   converts the dark current exposure to the readout pattern
   and subarray size requested by the user.<br><br>
   
3. **Observation generation.**<br>
   This step converts the seed image into an exposure of the requested
   readout pattern and subarray size. It also adds cosmic rays and 
   Poisson noise, as well as other detector effects (IPC, crosstalk, etc).
   This exposure is then added to the dark current exposure from step 2.<br><br>
   
For imaging mode observations, these steps are wrapped by the `imaging_simulator.py` module, as shown below.
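The three stages can be mimicked for a single pixel with a toy model. Everything here is an assumption made for illustration: the numbers are invented, the first group is assumed to already contain one group time of signal, and noise, IPC, crosstalk and non-linearity are ignored. The function names are hypothetical and do not correspond to Mirage's API.

```python
# Toy, single-pixel illustration of the three stages (not Mirage code).
def make_seed(rate_adu_per_s):
    # Stage 1: the seed is a noiseless count rate
    return rate_adu_per_s

def prepare_dark(dark_ramp, ngroups):
    # Stage 2: keep as many dark groups as the requested readout needs
    if len(dark_ramp) < ngroups:
        raise ValueError("dark ramp has too few groups")
    return dark_ramp[:ngroups]

def make_observation(seed_rate, dark_ramp, tgroup, cosmic_ray=None):
    # Stage 3: accumulate the rate into a ramp, add an optional cosmic-ray
    # jump (group_index, amplitude) that persists in all later groups,
    # then add the prepared dark ramp
    ramp = []
    for g, dark in enumerate(dark_ramp):
        signal = seed_rate * tgroup * (g + 1)
        if cosmic_ray is not None and g >= cosmic_ray[0]:
            signal += cosmic_ray[1]
        ramp.append(signal + dark)
    return ramp

seed = make_seed(10.0)
dark = prepare_dark([100.0, 100.5, 101.0, 101.5, 102.0], ngroups=4)
print(make_observation(seed, dark, tgroup=10.0, cosmic_ray=(2, 500.0)))
# [200.0, 300.5, 901.0, 1001.5]
```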

<a id='call_img_sim'></a>
## Call the imaging simulator

The `imaging_simulator.ImgSim` class is a wrapper around the three main steps of the simulator (detailed in the [Running simulator steps independently](#run_steps_independently) section below). This convenience class is useful when creating simulated imaging mode data. WFSS data need to be simulated in a slightly different way; see the WFSS example notebook for details.

In [None]:
# Run all steps of the imaging simulator for the yaml file selected above
img_sim = imaging_simulator.ImgSim()
img_sim.paramfile = yamlfile
img_sim.create()

<a id='examine_output'></a>
## Examine the Output

### Noiseless Seed Image

This image is an intermediate product. It contains only the signal from the astronomical sources and background. No detector effects or cosmic rays are added to this count rate image.

In [None]:
# First, look at the noiseless seed image
show(img_sim.seedimage,'Seed Image', max=50)

In [None]:
# See the galaxy source
show(img_sim.seedimage[975:1075, 975:1075],'Galaxy in Seed Image', max=2000)

In [None]:
# See the extended source
show(img_sim.seedimage[800:900, 820:920],'Extended Source in Seed Image', max=2000)

### Raw (uncal) file

This is the "final" output for Mirage. The uncal file contains raw, uncalibrated data, and is in a format that matches the level 1b data that will be produced by JWST. This file may be run through the JWST calibration pipeline just as if it were real JWST data. Note that Mirage's creation of the raw and linearized output exposures is controlled through the `datatype` parameter in the yaml generator.

Examine the raw output, starting with a single group, which is dominated by noise and detector artifacts.

In [None]:
raw_basename = os.path.basename(yamlfile).replace('.yaml', '_uncal.fits')
raw_file = os.path.join(simulation_dir, raw_basename)

In [None]:
hdulist = fits.open(raw_file)
hdulist.info()

In [None]:
raw_data = hdulist['SCI'].data
raw_header = hdulist[0].header
hdulist.close()

In [None]:
print(raw_data.shape)

In [None]:
show(raw_data[0, -1, :, :], "Final Group", max=15000)

Many of the instrumental artifacts can be removed by looking at the difference between two groups. Raw data values are integers, so first make the data floats before doing the subtraction. Here we will look at the difference between the last and the first groups.
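Why the float conversion matters: raw JWST pixel values are stored as unsigned 16-bit integers, and unsigned arithmetic wraps around modulo 2<sup>16</sup> instead of going negative. The sketch below emulates that wraparound with the `%` operator so that it runs without NumPy; the pixel values are invented.

```python
# Emulate unsigned 16-bit subtraction to show why raw groups are converted
# to floats before differencing.
UINT16_MOD = 2 ** 16

def uint16_difference(a, b):
    """Difference as an unsigned 16-bit array would compute it."""
    return (a - b) % UINT16_MOD

def float_difference(a, b):
    """Difference after converting both values to floats."""
    return float(a) - float(b)

last_group, first_group = 12000, 12005  # noise can make a later group lower
print(uint16_difference(last_group, first_group))  # 65531
print(float_difference(last_group, first_group))   # -5.0
```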

In [None]:
show(1. * raw_data[0, -1, :, :] - 1. * raw_data[0, 0, :, :],
     "Last Minus First Group", max=2000)

This raw data file is now ready to be run through the [JWST calibration pipeline](https://jwst-pipeline.readthedocs.io/en/stable/) from the beginning. If dark current subtraction is not important for you, you can use Mirage's linear output, skip some of the initial steps of the pipeline, and begin by running the [Jump detection](https://jwst-pipeline.readthedocs.io/en/stable/jwst/jump/index.html?highlight=jump) and [ramp fitting](https://jwst-pipeline.readthedocs.io/en/stable/jwst/ramp_fitting/index.html) steps.

### Linearized exposure

Another optional output is a version of the raw exposure above that contains linearized data. In this case, the data are saved in a state equivalent to the output of the linearity correction step of the calibration pipeline. Visual examination of linearized data is often easier than that of raw data, because the linearized data have had bias drifts removed through the use of reference pixels. However, unlike the noiseless seed image, the linearized exposure contains noise, and its signal is stored as full integrations with multiple groups rather than as a single count-rate image, so it will look noisier than the seed image.

In [None]:
linearized_file = os.path.join(simulation_dir, 'jw98765001001_01101_00001_nrcb4_linear.fits')

In [None]:
linearized_data = fits.getdata(linearized_file)

Let's look at the signal difference between the raw file and the linearized file for a pixel within a source.
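The cells below extrapolate a straight line from the first two groups of the raw ramp and compare it with the full ramp. The same bookkeeping is sketched here in plain Python so that the arithmetic is explicit; `group_times()` and `deviation_from_linear()` are hypothetical helpers, and the ramp values and 10 s group time are invented.

```python
# Hypothetical helpers mirroring the comparison made in the cells below.
def group_times(tgroup, ngroups):
    # Same as TGROUP * np.arange(NGROUPS)
    return [tgroup * i for i in range(ngroups)]

def deviation_from_linear(ramp, times):
    """How far each group falls below the line extrapolated from the
    first two groups (negative values indicate non-linearity)."""
    slope = (float(ramp[1]) - float(ramp[0])) / (times[1] - times[0])
    offsets = [float(v) - float(ramp[0]) for v in ramp]  # relative to group 0
    line = [slope * t for t in times]
    return [o - l for o, l in zip(offsets, line)]

ramp = [1000, 2000, 2990, 3960, 4900]
print(deviation_from_linear(ramp, group_times(10.0, 5)))
# [0.0, 0.0, -10.0, -40.0, -100.0]
```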

In [None]:
linearized_data[0, :, 1025, 1025]

In [None]:
raw_data[0, :, 1025, 1025]

In [None]:
times = raw_header['TGROUP'] * np.arange(raw_header['NGROUPS'])

In [None]:
slope = (float(raw_data[0, 1, 1025, 1025]) - float(raw_data[0, 0, 1025, 1025])) / (times[1] - times[0])
straight_line = slope * times

In [None]:
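# Signal in the first group of each ramp; use floats so that the unsigned
# integer raw values cannot wrap around when subtracted
min_lin = float(linearized_data[0, 0, 1025, 1025])
min_raw = float(raw_data[0, 0, 1025, 1025])

f, a = plt.subplots(figsize=(8,8))
a.plot(times, linearized_data[0, :, 1025, 1025] - min_lin, 'o-', color='red', label='Linearized')
a.plot(times, raw_data[0, :, 1025, 1025].astype(float) - min_raw, 'o-', color='blue', label='Raw')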
a.plot(times, straight_line, color='black', linestyle=(0, (5, 10)), label='Extrapolated from raw')
a.set_xlabel('Time (sec)')
a.set_ylabel('Signal (ADU)')
a.legend()
plt.show()

<i>Back to the [Table of contents](#toc)</i>

---
<a id='run_steps_independently'></a>
# Running simulation steps independently

The steps detailed in this section are wrapped by the `imaging_simulator` mentioned above. General users will not need to worry about the details of these three steps.

<a id="indep_seed"></a>
## First generate the "seed image" 

This is generally a 2D noiseless countrate image that contains only simulated astronomical sources.

A seed image is generated based on a `.yaml` file that contains all the necessary parameters for simulating data. For this exercise, use the same yaml file that was used in the [Create Simulated Data](#run_steps_together) section as input.

In [None]:
cat = catalog_seed_image.Catalog_seed()
cat.paramfile = yamlfile
cat.make_seed()

<a id="examine_seed"></a>
### Look at the seed image

In [None]:
show(cat.seedimage,'Seed Image',max=50)

While the seed image makes for a pretty picture, and is useful as a sanity check that your objects contain the correct signals and are at the correct locations, it is far from a complete simulation. It contains no noise (Poisson, readnoise, 1/f noise, etc). Also, there is no WCS information attached to the seed image file, so it cannot be run through the calibration pipeline.
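To illustrate what the seed image leaves out, the sketch below turns a seed count rate into a single noisy measurement by adding Poisson noise and Gaussian read noise. It is a simplification under stated assumptions: the rate is taken to be in electrons per second with a gain of 1, 1/f noise and all other detector effects are ignored, and the Poisson sampler is Knuth's method, which is adequate only for small means. None of this is Mirage code.

```python
import math
import random

def poisson_sample(mean, rng):
    """Knuth's method for drawing a Poisson variate (small means only)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def noisy_counts(rate, exptime, read_noise, rng):
    """Seed count rate -> one noisy measurement of accumulated counts."""
    return poisson_sample(rate * exptime, rng) + rng.gauss(0.0, read_noise)

rng = random.Random(42)
samples = [noisy_counts(rate=0.5, exptime=20.0, read_noise=3.0, rng=rng)
           for _ in range(5000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))  # should be close to rate * exptime = 10
```

Averaged over many draws, the result recovers the noiseless value from the seed image, while any single draw scatters around it.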

In [None]:
seed_image_hdulist = fits.open(cat.seed_file)

In [None]:
seed_image_hdulist.info()

Note that the seed image file contains an extension with a segmentation map. For imaging mode simulations, Mirage does not use this output product at all. For WFSS simulations, the segmentation map controls which pixels are dispersed when creating the seed image. 
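As a toy illustration of what a segmentation map holds, the sketch below labels each pixel with the index of the source that contributes the most signal there (0 means no source), using per-source stamps and a threshold. Mirage builds its own map internally; `build_segmap()` and `pixels_to_disperse()` are hypothetical stand-ins that show the idea, not its algorithm.

```python
# Toy segmentation map (hypothetical helpers, not Mirage code).
def build_segmap(shape, sources, threshold):
    """sources: list of (y0, x0, stamp) with stamp a 2D list of signal."""
    ny, nx = shape
    segmap = [[0] * nx for _ in range(ny)]
    best = [[0.0] * nx for _ in range(ny)]
    for index, (y0, x0, stamp) in enumerate(sources, start=1):
        for dy, row in enumerate(stamp):
            for dx, value in enumerate(row):
                y, x = y0 + dy, x0 + dx
                inside = 0 <= y < ny and 0 <= x < nx
                if inside and value > threshold and value > best[y][x]:
                    best[y][x] = value
                    segmap[y][x] = index
    return segmap

def pixels_to_disperse(segmap):
    """In a WFSS-style simulation, only pixels assigned to a source
    would be dispersed."""
    return [(y, x) for y, row in enumerate(segmap)
            for x, v in enumerate(row) if v]

sources = [(0, 0, [[5, 1], [1, 0]]), (1, 1, [[4, 6], [0, 2]])]
segmap = build_segmap((3, 3), sources, threshold=1.5)
print(segmap)  # [[1, 0, 0], [0, 2, 2], [0, 0, 2]]
```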

In [None]:
show(seed_image_hdulist[2].data,'Segmentation Map', max=400)

In [None]:
seed_image_header = seed_image_hdulist[0].header

In [None]:
seed_image_header

<a id="other_outputs"></a>
### Other output products

#### Catalog of sources present on the detector

In [None]:
img_point_source_cat = os.path.join(simulation_dir, 'jw98765001001_01101_00001_nrcb4_uncal_pointsources.list')

In [None]:
img_point_sources = ascii.read(img_point_source_cat, format='commented_header', header_start=2)

In [None]:
img_point_sources

In [None]:
img_gal_source_cat = os.path.join(simulation_dir, 'jw98765001001_01101_00001_nrcb4_uncal_galaxySources.list')

In [None]:
img_gal_sources = ascii.read(img_gal_source_cat)
img_gal_sources

In [None]:
img_ext_source_cat = os.path.join(simulation_dir, 'jw98765001001_01101_00001_nrcb4_uncal_extendedsources.list')

In [None]:
img_ext_sources = ascii.read(img_ext_source_cat, format='commented_header', header_start=2)
img_ext_sources

#### Catalog of cosmic rays added to the exposure

In [None]:
cr_file = os.path.join(simulation_dir, 'jw98765001001_01101_00001_nrcb4_uncal_cosmicrays.list')

In [None]:
cr_table = ascii.read(cr_file)

In [None]:
cr_table

#### Log files

Log files for completed Mirage runs will be saved in the `mirage_logs` subdirectory under the working directory.

In [None]:
logfiles = sorted(glob('mirage_logs/*.log'))

In [None]:
with open(logfiles[-1]) as obj:
    log = obj.readlines()

In [None]:
log

The log file for the latest Mirage run is also stored in the current working directory as `mirage_latest.log`. If Mirage crashes, this is the log file you should examine, because Mirage copies `mirage_latest.log` into the `mirage_logs` directory only at the completion of the run.

In [None]:
with open('mirage_latest.log') as obj:
    latest_log = obj.readlines()

In [None]:
latest_log
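The cells above pick the archived log by sorting filenames, which works as long as the names sort chronologically. The sketch below instead uses file modification times and prefers `mirage_latest.log` when it exists, following the advice above for crashed runs. `newest_log()` and `log_to_inspect()` are hypothetical helpers; the directory and file names are the ones given in the text.

```python
import os

def newest_log(log_dir, suffix='.log'):
    """Return the most recently modified log file in log_dir, or None."""
    candidates = [os.path.join(log_dir, name) for name in os.listdir(log_dir)
                  if name.endswith(suffix)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)

def log_to_inspect(workdir='.'):
    """Prefer mirage_latest.log (present even if the run crashed),
    otherwise fall back to the newest archived log in mirage_logs."""
    latest = os.path.join(workdir, 'mirage_latest.log')
    if os.path.exists(latest):
        return latest
    log_dir = os.path.join(workdir, 'mirage_logs')
    return newest_log(log_dir) if os.path.isdir(log_dir) else None
```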

<a id="prep_dark"></a>
## Prepare the dark current exposure
This will serve as the base of the simulated data.
This step will linearize the dark current (if it 
is not already), and reorganize it into the 
requested readout pattern and number of groups.
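Reorganizing a dark into a readout pattern amounts to grouping its frames: in a MULTIACCUM-style pattern, each group is the average of `nframe` consecutive frames, after which `nskip` frames are dropped before the next group begins. The sketch below applies that rule to a list of frame values for one pixel. `frames_to_groups()` is a hypothetical helper, and it ignores everything else dark preparation does (linearization, subarray extraction, and so on).

```python
# Hypothetical helper: average nframe frames per group, skip nskip frames
# between groups.
def frames_to_groups(frames, nframe, nskip, ngroups):
    per_group = nframe + nskip
    needed = ngroups * per_group - nskip  # no skip needed after the last group
    if len(frames) < needed:
        raise ValueError(f"need {needed} frames, got {len(frames)}")
    groups = []
    for g in range(ngroups):
        start = g * per_group
        chunk = frames[start:start + nframe]
        groups.append(sum(chunk) / nframe)
    return groups

frames = list(range(12))  # signal in 12 consecutive frames of one pixel
print(frames_to_groups(frames, nframe=2, nskip=1, ngroups=4))
# [0.5, 3.5, 6.5, 9.5]
```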

In [None]:
d = dark_prep.DarkPrep()
d.paramfile = yamlfile
d.prepare()

### Look at the dark current 
For this, we will look at an image of the final group
minus the first group, converted to a count rate.

In [None]:
exptime = d.linDark.header['NGROUPS'] * cat.frametime
diff = (d.linDark.data[0,-1,:,:] - d.linDark.data[0,0,:,:]) / exptime
show(diff,'Dark Current Countrate',max=0.001)

In [None]:
darkfile = 'sim_data/jw98765001001_01101_00001_nrcb4_uncal_linear_dark_prep_object.fits'

In [None]:
dark_header = fits.getheader(darkfile)

In [None]:
dark_header

<a id="final_exposure"></a>
## Create the final exposure
Turn the seed image into an exposure of the
proper readout pattern, and combine it with the
dark current exposure. Cosmic rays and other detector
effects are added.

The output can be either this linearized exposure, or
a 'raw' exposure where the linearized exposure is 
"unlinearized" and the superbias and 
reference pixel signals are added, or the user can 
request both outputs. This is controlled from
within the yaml parameter file.
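"Unlinearizing" means inverting the linearity correction: if the pipeline's correction maps a raw signal to a linear one through a polynomial, creating raw data requires finding the raw value that the polynomial maps to a given linear value. The sketch below does this for a single value with Newton's method. The coefficients are invented, and although Newton iteration is a standard way to invert such a polynomial, this is an illustration rather than Mirage's actual implementation.

```python
# Sketch: invert a polynomial linearity correction with Newton's method.
def apply_linearity(raw, coeffs):
    """Linearized signal = sum(c_i * raw**i)."""
    return sum(c * raw ** i for i, c in enumerate(coeffs))

def unlinearize(linear, coeffs, tol=1e-6, max_iter=50):
    """Find raw such that apply_linearity(raw, coeffs) == linear,
    starting the iteration from the linear value itself."""
    raw = float(linear)
    for _ in range(max_iter):
        residual = apply_linearity(raw, coeffs) - linear
        derivative = sum(i * c * raw ** (i - 1)
                         for i, c in enumerate(coeffs) if i > 0)
        step = residual / derivative
        raw -= step
        if abs(step) < tol:
            return raw
    raise RuntimeError("Newton iteration did not converge")

coeffs = [0.0, 1.0, 2.0e-6]  # made-up coefficients: mild non-linearity
raw = unlinearize(20000.0, coeffs)
print(round(raw, 1), round(apply_linearity(raw, coeffs), 3))
```

With these coefficients, the recovered raw value is smaller than the linear value, as expected for a detector whose raw response falls below linear at high signal.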

In [None]:
obs = obs_generator.Observation()
obs.linDark = d.prepDark
obs.seed = cat.seedimage
obs.segmap = cat.seed_segmap
obs.seedheader = cat.seedinfo
obs.paramfile = yamlfile
obs.create()

### Examine the final output image
Again, we will look at the last group minus the first group.

In [None]:
obs.linear_output

In [None]:
with fits.open(obs.linear_output) as h:
    lindata = h[1].data
    header = h[0].header

In [None]:
exptime = header['EFFINTTM']
diffdata = (lindata[0,-1,:,:] - lindata[0,0,:,:]) / exptime
show(diffdata,'Simulated Data',min=0,max=50)

In [None]:
# Show on a log scale, to bring out the presence of the dark current.
# Noise in the CDS image makes for a lot of pixels with values < 0,
# which makes this kind of an ugly image. Add an offset so that
# everything is positive and the noise is visible
offset = 2.
plt.figure(figsize=(12,12))
plt.imshow(np.log10(diffdata+offset),clim=(0.001,np.log10(50)), origin='lower')
plt.title('Simulated Data')
plt.colorbar().set_label('log$_{10}$(DN/s + offset)')

---
<a id='mult_sims'></a>
## Simulating Multiple Exposures

Each yaml file will simulate an exposure for a single pointing using a single detector. Here, let's simulate the data from all 5 B module detectors for a single pointing. To save time, we'll create only the seed images.

In [None]:
first_pointing_yamls = ['./yaml_files/jw98765001001_01101_00001_nrcb1.yaml',
                        './yaml_files/jw98765001001_01101_00001_nrcb2.yaml',
                        './yaml_files/jw98765001001_01101_00001_nrcb3.yaml',
                        './yaml_files/jw98765001001_01101_00001_nrcb4.yaml',
                        './yaml_files/jw98765001001_01101_00001_nrcb5.yaml']
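Instead of typing the five filenames, they could be selected from the `yfiles` list created earlier by matching the exposure stem and a detector pattern. `select_yamls()` is a hypothetical helper built on the standard-library `fnmatch` module; the default pattern `nrcb?` assumes the B module detectors are named `nrcb1` through `nrcb5`, as in this notebook.

```python
import fnmatch
import os

def select_yamls(yaml_files, exposure_stem, detector_glob='nrcb?'):
    """Return the yaml files for one exposure whose detector name matches
    detector_glob, sorted by name."""
    pattern = f"{exposure_stem}_{detector_glob}.yaml"
    return sorted(f for f in yaml_files
                  if fnmatch.fnmatch(os.path.basename(f), pattern))

# Usage in this notebook would be:
# first_pointing_yamls = select_yamls(yfiles, 'jw98765001001_01101_00001')
```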

In [None]:
# This cell will take a minute or two to run
for yfile in first_pointing_yamls:
    cat = catalog_seed_image.Catalog_seed()
    cat.paramfile = yfile
    cat.make_seed()

-------
If you want to skip running the cell above, but still want to display the results, you can download the seed images in the cells below.

In [None]:
box_links = ['https://stsci.box.com/shared/static/r3kw8jr3l7swllspu4myqg9f4eqmvwjy.fits',
             'https://stsci.box.com/shared/static/evoq9pyu709ssoljt6my3qvu4u7eagwi.fits',
             'https://stsci.box.com/shared/static/t4ma9r2y4thzq4jstk9z99pfu0cpgwy2.fits',
             'https://stsci.box.com/shared/static/m4t472dj9uxgtna1jd551o7a86laurwt.fits',
             'https://stsci.box.com/shared/static/hkpegevuggmz42zhj2x89s6cu93xihto.fits']
deepfield_filenames = ['jw98765001001_01101_00001_nrcb1_uncal_F150W_CLEAR_final_seed_image.fits',
                       'jw98765001001_01101_00001_nrcb2_uncal_F150W_CLEAR_final_seed_image.fits',
                       'jw98765001001_01101_00001_nrcb3_uncal_F150W_CLEAR_final_seed_image.fits',
                       'jw98765001001_01101_00001_nrcb4_uncal_F150W_CLEAR_final_seed_image.fits',
                       'jw98765001001_01101_00001_nrcb5_uncal_F444W_CLEAR_final_seed_image.fits']

In [None]:
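os.makedirs(simulation_dir, exist_ok=True)  # the cells below read seed images from here
for link, filename in zip(box_links, deepfield_filenames):
    urllib.request.urlretrieve(link, os.path.join(simulation_dir, filename))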

------
Let's arrange the seed images following the NIRCam detector layout, and view them all together. Note that seed images do not contain WCS information, so they cannot be arranged by WCS (e.g. in ds9).
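If a single stitched array is preferred over four subplots, the panels can be concatenated directly. This sketch copies the layout used by the `display_sw` function defined below (B3 and B1 on top, B4 and B2 on the bottom) rather than asserting an independent detector geometry, ignores the physical gaps between detectors, and treats row 0 as the bottom row to match `origin='lower'`. `mosaic_2x2()` is a hypothetical helper working on plain 2D lists.

```python
# Hypothetical helper: stitch four equally sized 2D lists into one mosaic.
def mosaic_2x2(b1, b2, b3, b4):
    """Row 0 is the bottom row (origin='lower'), so the bottom pair
    (B4 | B2) comes first, followed by the top pair (B3 | B1)."""
    bottom = [r4 + r2 for r4, r2 in zip(b4, b2)]
    top = [r3 + r1 for r3, r1 in zip(b3, b1)]
    return bottom + top

print(mosaic_2x2([[1]], [[2]], [[3]], [[4]]))  # [[4, 2], [3, 1]]
```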

In [None]:
seed_img_files = []
for yfile in first_pointing_yamls:
    changedir = yfile.replace('./yaml_files', './sim_data')
    detector = changedir.split('_')[-1]
    if 'b5' in detector:
        filtername = 'F444W'
    else:
        filtername = 'F150W'
    seedfile = changedir.replace('.yaml', '_uncal_{}_CLEAR_final_seed_image.fits'.format(filtername))
    seed_img_files.append(seedfile)
seed_img_files

In [None]:
# Read in all of the seed images
b1 = fits.getdata(seed_img_files[0])
b2 = fits.getdata(seed_img_files[1])
b3 = fits.getdata(seed_img_files[2])
b4 = fits.getdata(seed_img_files[3])
b5 = fits.getdata(seed_img_files[4])

In [None]:
def display_sw(one, two, three, four, min=0, max=1000):
    fig, ax = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(10, 10))
    ax[0, 0].imshow(three, clim=(min, max), origin='lower')
    ax[0, 1].imshow(one, clim=(min, max), origin='lower')
    ax[1, 0].imshow(four, clim=(min, max), origin='lower')
    ax[1, 1].imshow(two, clim=(min, max), origin='lower')
    # Hide x labels and tick labels for top plots and y ticks for right plots.
    for a in ax.flat:
        a.label_outer()

In [None]:
display_sw(b1, b2, b3, b4, min=0, max=1)

In [None]:
show(b5, 'LW Channel', max=10)

<i>Back to the [Table of contents](#toc)</i>

-----
<a id='deep_field'></a>
# Simulate deep field exposure

In this section, we create a simulation of a deep-field-like exposure, using the background galaxy catalog produced by the `galaxy_background()` function. Creating and adding galaxies to the seed image takes longer than adding point sources, so this simulation will take longer to run than those above.

In [None]:
deep_field_yaml = './yaml_files/jw98765002001_01101_00001_nrcb5.yaml'

In [None]:
# NOTE: this cell may take a few minutes to run
# If you don't want to wait for it, you can download the
# resulting seed image using the cell below.

deepfield_sim = imaging_simulator.ImgSim()
deepfield_sim.paramfile = deep_field_yaml
deepfield_sim.create()

-----
If you skipped running the imaging simulator above, you can download the seed image for the deep field observation here.

In [None]:
box_link = 'https://stsci.box.com/shared/static/6z1yu5pju1fcbfld4u0e6nf34io2sr7g.fits'
deepfield_file = 'jw98765002001_01101_00001_nrcb5_uncal_F444W_CLEAR_final_seed_image.fits'

In [None]:
urllib.request.urlretrieve(box_link, deepfield_file)

In [None]:
deepfield_sim = catalog_seed_image.Catalog_seed()
deepfield_sim.seedimage = fits.getdata(deepfield_file)

-----

In [None]:
show(deepfield_sim.seedimage,'Seed Image', max=2)

In [None]:
show(deepfield_sim.seedimage[0:500, 250:750],'Seed Image', max=2)

<i>Back to the [Table of contents](#toc)</i>

<a id="calibrate_data"></a>
# 10. Calibrate the data

The "raw" outputs from Mirage are equivalent to Level-1b data that will be returned from JWST. These files contain the suffix "_uncal.fits". 

You can now proceed with calibration using the JWST data calibration pipeline, just as with real data. For imaging mode data such as those produced here, a previous [JWEbbinar contains notebooks](https://github.com/spacetelescope/jwebbinar_prep/tree/main/imaging_mode) showing how to run the calibration pipeline.
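Before calibrating, it can help to list which simulated exposures still need processing. The first stage of the pipeline (Detector1) writes a count-rate product whose name ends in `_rate.fits` in place of `_uncal.fits`. The sketch below assumes those rate files are written back into the simulation directory; `uncal_to_rate()` and `pending_calibration()` are hypothetical helpers.

```python
import os

def uncal_to_rate(uncal_path):
    """Predict the Detector1 rate filename for an uncal file."""
    if not uncal_path.endswith('_uncal.fits'):
        raise ValueError(f"not an uncal file: {uncal_path}")
    return uncal_path[:-len('_uncal.fits')] + '_rate.fits'

def pending_calibration(sim_dir):
    """List uncal files in sim_dir that do not yet have a rate file."""
    uncal = sorted(f for f in os.listdir(sim_dir) if f.endswith('_uncal.fits'))
    return [os.path.join(sim_dir, f) for f in uncal
            if not os.path.exists(os.path.join(sim_dir, uncal_to_rate(f)))]
```

Each file in the returned list could then be passed to the `Detector1Pipeline` from the `jwst` package, as shown in the JWEbbinar notebooks linked above.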

[Pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/) is also available through readthedocs.

<i>Back to the [Table of contents](#toc)</i>