# Creating simulated data from a mosaic image

This notebook demonstrates how to use Mirage to create simulated data from a distortion-free mosaic image. In this case, we will use a mosaic of the GOODS-S region from the [CANDELS survey](https://archive.stsci.edu/prepds/candels/).

For each observation to be simulated, the appropriate area of the mosaic is extracted from the full mosaic, and is resampled in order to introduce the distortion associated with the JWST instrument to be used. In JWST calibration parlance, "resample" is equivalent to ["blotting" in Drizzlepac terminology](https://drizzlepac.readthedocs.io/en/deployment/ablot.html).
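
To make the idea of resampling concrete, here is a minimal pure-Python sketch. It is not Mirage's or Drizzlepac's algorithm: the quadratic `toy_distortion` and the helper names are hypothetical, and flux conservation is ignored. For each pixel of the output (distorted) frame, it looks up the corresponding position in the distortion-free cutout and interpolates bilinearly.

```python
import math


def bilinear_sample(image, x, y):
    """Bilinearly interpolate image (a list of rows) at fractional (x, y).

    Positions outside the image return 0.0.
    """
    ny, nx = len(image), len(image[0])
    if not (0 <= x <= nx - 1 and 0 <= y <= ny - 1):
        return 0.0
    # Clamp so that the upper neighbor always exists
    x0 = min(int(math.floor(x)), nx - 2)
    y0 = min(int(math.floor(y)), ny - 2)
    fx, fy = x - x0, y - y0
    low = image[y0][x0] * (1 - fx) + image[y0][x0 + 1] * fx
    high = image[y0 + 1][x0] * (1 - fx) + image[y0 + 1][x0 + 1] * fx
    return low * (1 - fy) + high * fy


def resample(image, distortion):
    """Build an output image by sampling the input at distorted positions."""
    ny, nx = len(image), len(image[0])
    return [[bilinear_sample(image, *distortion(x, y)) for x in range(nx)]
            for y in range(ny)]


def toy_distortion(x, y, k=0.01, center=2.0):
    """Hypothetical quadratic distortion: output pixel -> input position."""
    return x + k * (x - center) ** 2, y + k * (y - center) ** 2


# A small stand-in for a cutout from the mosaic
mosaic_cutout = [[float(x + 10 * y) for x in range(5)] for y in range(5)]
distorted = resample(mosaic_cutout, toy_distortion)
```

In the real workflow, the mapping between frames comes from the instrument's distortion model rather than a hand-written polynomial.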

This distorted image is then added to the simulated data in one of two ways.

1. If you wish to modify the image from the mosaic in any way, such as by adding objects or scaling the brightness, then the mosaic image can be added to one of Mirage's "extended" source catalogs, along with the additional sources.

2. If you do not wish to modify the cropped mosaic image in any way (other than introducing the appropriate distortion), then the distorted image can be used directly as a seed image, and you only need to run the `dark_prep` and `obs_generator` steps of Mirage to create the final simulated data.

---
## Table of Contents

* [Imports](#imports)
* [Download Data](#download_data)
* [Use Resampled Image in an Extended Catalog](#use_in_cat)
* [Use Resampled Image as a Seed Image](#use_as_seed)

---
<a id='imports'></a>
## Imports

In [None]:
from astropy.io import fits
import matplotlib.pyplot as plt
import numpy as np
import os
import requests
import yaml

In [None]:
from mirage.catalogs.catalog_generator import ExtendedCatalog
from mirage.catalogs.create_catalog import combine_catalogs
from mirage.dark.dark_prep import DarkPrep
from mirage.ramp_generator.obs_generator import Observation
from mirage.imaging_simulator import ImgSim
from mirage.seed_image.fits_seed_image import ImgSeed
from mirage.yaml import yaml_generator

---
<a id='download_data'></a>
## Download data

To run the examples in this notebook, several files need to be downloaded. These include the mosaic image used as the basis of this work, as well as a trio of "stamp images", each of a particular object. The stamp images will be added to the simulated data on top of the mosaic image.

The mosaic image is a high-level science product from the CANDELS survey. The original data are hosted on [MAST](https://archive.stsci.edu/missions/hlsp/candels/goods-s/gs-tot/v1.0/). For this notebook, we have placed a copy of the file alongside the reference files needed for Mirage.

The stamp images contain HST observations of several [3C catalog](https://en.wikipedia.org/wiki/Third_Cambridge_Catalogue_of_Radio_Sources) objects. In this case the original data come from recent [3C observations](https://hz3c.stsci.edu/Observations.html) described by [Hilbert et al.](https://ui.adsabs.harvard.edu/abs/2016ApJS..225...12H/abstract) For each file, the object of interest was extracted from the original data and saved in a separate FITS file. These files were then also placed alongside the Mirage reference files.

In [None]:
stamp_dir = 'imaging_example_data'

In [None]:
base_url = ('https://data.science.stsci.edu/redirect/JWST/jwst-simulations/mirage_reference_files'
            '/example_data_for_notebooks/')

In [None]:
input_images = ['3C305.1_stamp.fits', '3C324_stamp.fits', '3C454.1_stamp.fits',
                'hlsp_candels_hst_wfc3_gs-tot-sect23_f160w_v1.0_drz.fits']

In [None]:
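def download_file(url, file_name, output_directory='./'):
    """Download file into the specified output directory given the URL
    
    Parameters
    ----------
    url : str
        URL to the file to be downloaded

    file_name : str
        Name to give the downloaded file

    output_directory : str
        Directory in which to save the file. Created if it does not exist.
        
    Returns
    -------
    download_filename : str
        Name of the downloaded file, including the output directory
    """
    # Make sure the output directory exists before writing into it
    os.makedirs(output_directory, exist_ok=True)
    download_filename = os.path.join(output_directory, file_name)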

    # Only download the file if it doesn't already exist
    if not os.path.isfile(download_filename):
        print('Downloading: {}'.format(file_name))
        with requests.get(url, stream=True) as response:
            if response.status_code != 200:
                raise RuntimeError("Download failed (HTTP {}) - {}".format(response.status_code, url))
            with open(download_filename, 'wb') as f:
                for chunk in response.iter_content(chunk_size=2048):
                    if chunk:
                        f.write(chunk)
        print('Download of {} complete.'.format(file_name))
    else:
        print('{} already exists. Skipping download.'.format(file_name))
    return download_filename
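
One caveat of the existence check above: if a download is interrupted, the partial file stays on disk and is skipped the next time. A possible way around this, sketched below, is to write into a temporary file and rename it only once all chunks have been written. `write_chunks_atomically` is a hypothetical helper, not part of Mirage or `requests`.

```python
import os
import tempfile


def write_chunks_atomically(chunks, destination):
    """Write an iterable of byte chunks to destination via a temporary file.

    The destination only appears once every chunk has been written.
    """
    directory = os.path.dirname(destination) or '.'
    os.makedirs(directory, exist_ok=True)
    fd, temp_name = tempfile.mkstemp(dir=directory, suffix='.part')
    try:
        with os.fdopen(fd, 'wb') as f:
            for chunk in chunks:
                if chunk:
                    f.write(chunk)
        # Rename into place; replaces any existing file of the same name
        os.replace(temp_name, destination)
    except BaseException:
        # Clean up the partial file, then re-raise the original error
        if os.path.exists(temp_name):
            os.remove(temp_name)
        raise
    return destination

# Possible use inside download_file (assumption, not the notebook's code):
#     write_chunks_atomically(response.iter_content(chunk_size=2048),
#                             download_filename)
```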

In [None]:
stamp_files = []
for filename in input_images:
    full_url = '{}{}'.format(base_url, filename)
    download_file(full_url, filename, output_directory=stamp_dir)
    stamp_files.append(os.path.join(stamp_dir, filename))

In [None]:
# Remove the mosaic file from the list of stamp images
stamp_files.pop()

In [None]:
# The list should now contain only the 3C files
stamp_files
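
Removing the mosaic with `pop()` relies on it being the last entry of `input_images`. As a sketch of an order-independent alternative, the files can be separated by name. The helper `split_stamps_and_mosaic` and the `_drz.fits` suffix test are assumptions made for this illustration.

```python
import os


def split_stamps_and_mosaic(paths, mosaic_tag='_drz.fits'):
    """Separate stamp images from the mosaic using the filename, not list order."""
    mosaics = [p for p in paths if os.path.basename(p).endswith(mosaic_tag)]
    stamps = [p for p in paths if not os.path.basename(p).endswith(mosaic_tag)]
    return stamps, mosaics


example_paths = [os.path.join('imaging_example_data', name) for name in [
    '3C305.1_stamp.fits', '3C324_stamp.fits', '3C454.1_stamp.fits',
    'hlsp_candels_hst_wfc3_gs-tot-sect23_f160w_v1.0_drz.fits']]
stamps, mosaics = split_stamps_and_mosaic(example_paths)
```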

---
<a id='use_in_cat'></a>
## Use the resampled image in an extended source catalog

In this example, the resampled image is placed into a Mirage catalog of "extended" sources. When the **imaging_simulator** is run, this catalog is read in and the resampled image is placed into the seed image. The advantage of this method is that other sources can also be added to the seed image. In this example, we also add the three stamp images of the 3C objects to the seed image by including them in the catalog of extended sources.

### Define the filename of the mosaic image

In [None]:
mosaicfile = os.path.join(stamp_dir, input_images[3])

### Define the xml and pointing files exported from APT

In [None]:
xml_file = 'imaging_example_data/extended_object_test.xml'
pointing_file = xml_file.replace('.xml', '.pointing')

### Run the yaml_generator to create the yaml input files

Define user-inputs to the yaml generator. Note that you can still use a `catalogs` input here to add
point sources or galaxies on top of the mosaic image. The extended source catalog names will be added in a later step. See the Mirage [documentation page on running the yaml_generator](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html) for details on input options.

In [None]:
cr = {'library': 'SUNMAX', 'scale': 1.0}
dates = '2019-5-25'
background = 'low'
pav3 = 0.0

If you wish to add more point sources or galaxies, you can specify those catalogs here.
See the Mirage [yaml_generator documentation](https://mirage-data-simulator.readthedocs.io/en/latest/yaml_generator.html#source-catalogs) for more details.

In [None]:
# catalogs = {'MAIN-TARGET': {'point_source': 'ptsrc.cat',
#                              'galaxy': 'galaxies.cat'
#                             }
#             }

Run the yaml generator

In [None]:
yam = yaml_generator.SimInput(xml_file, pointing_file, verbose=True,
                              output_dir='yamls',
                              cosmic_rays=cr,
                              #catalogs=catalogs,
                              background=background, roll_angle=pav3, dates=dates,
                              simdata_output_dir='simdata',
                              datatype='raw')
yam.use_linearized_darks = True
yam.create_inputs()

### Create a template extended source catalog to use 

For convenience: Get a list of all instruments, apertures, and filters used in the APT file.
When creating the extended source catalog below, you will need to add magnitude columns for
all filters that you wish to create simulated data for.

In [None]:
instruments = yam.info['Instrument']
filter_keywords = ['FilterWheel', 'ShortFilter', 'LongFilter', 'Filter']
pupil_keywords = ['PupilWheel', 'ShortPupil', 'LongPupil']
yam.info

nrc_sw_optics = set([(f, p) for f, p in zip(yam.info['ShortFilter'], yam.info['ShortPupil'])])
nrc_lw_optics = set([(f, p) for f, p in zip(yam.info['LongFilter'], yam.info['LongPupil'])])
niriss_optics = set([(f, p) for f, p in zip(yam.info['FilterWheel'], yam.info['PupilWheel'])])
niriss_wfss_optics = set([(f, p) for f, p in zip(yam.info['Filter'], yam.info['PupilWheel'])])

print('NIRCam filters/pupils used in this proposal: ')
print(nrc_sw_optics)
print(nrc_lw_optics)
print('\nNIRISS filters/pupils used in this proposal: ')
print(niriss_optics)
print(niriss_wfss_optics)
print(('\nBe sure to add magnitude columns to the template catalog '
        'for all filters you are going to simulate.\n'))
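
To see what the cell above computes, here is the same pairing applied to a small, hypothetical `info` dictionary (the filter and pupil values are made up for the illustration). Zipping the filter and pupil columns and collecting the pairs in a set leaves one entry per unique combination.

```python
# Hypothetical stand-in for yam.info, with one entry per exposure
info = {
    'ShortFilter': ['F150W', 'F150W', 'F200W'],
    'ShortPupil': ['CLEAR', 'CLEAR', 'CLEAR'],
}

# Unique (filter, pupil) combinations
sw_optics = set(zip(info['ShortFilter'], info['ShortPupil']))

# Filters that would each need a magnitude column in the catalog
needed_filters = sorted({filt for filt, _ in sw_optics})
print(sw_optics)
print(needed_filters)
```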

Create a template extended source catalog containing **sources other than the mosaic image** that you want to add to the seed image. The resampled mosaic will be added to this template later. Note that you must add magnitude values for these other sources in all filters that are used in the proposal. If you do not have other sources to add, you can provide empty lists, or simply use the other strategy, where the [resampled image is used as the seed image](#use_as_seed).

In [None]:
# In this example we have only 2 filters to worry about
filter1 = 'F150W'
filter2 = 'F444W'

See the Mirage [documentation page on catalog generation](https://mirage-data-simulator.readthedocs.io/en/latest/catalog_creation.html) as well as the [Catalog generation notebook](https://github.com/spacetelescope/mirage/blob/master/examples/Catalog_Generation_Tools.ipynb) for details and options when creating catalogs. 

In [None]:
# Place the other stamp images in a diagonal line, stepping by 0.0017 degrees (about 6.1")
# in both RA and Dec, centered in the B1 detector
# They are also visible in the top right quadrant of the B5 detector
other_stamp_ra = [53.1562247, 53.1545247, 53.1528247]
other_stamp_dec = [-27.8076957, -27.8059957, -27.8042957]
other_stamp_pa = [0., 0., 0.]  # Rotating stamps via pa currently disabled in Mirage
other_stamp_f150w_mags = [19.5, 19.75, 20.]
other_stamp_f444w_mags = [20.5, 20.75, 21.]

# Magnitude values must be strings here because we will be combining them
# with values of 'None' for the resampled image magnitudes
f150w_mags_as_str = [str(element) for element in other_stamp_f150w_mags]
f444w_mags_as_str = [str(element) for element in other_stamp_f444w_mags]

template_extended_catalog_file = 'extended_sources_template.cat'
template_cat = ExtendedCatalog(filenames=stamp_files, ra=other_stamp_ra, dec=other_stamp_dec,
                               position_angle=other_stamp_pa)
template_cat.add_magnitude_column(f150w_mags_as_str, instrument='nircam', filter_name=filter1)
template_cat.add_magnitude_column(f444w_mags_as_str, instrument='nircam', filter_name=filter2)
template_cat.save(template_extended_catalog_file)
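
As a quick check of the stamp placement, the following sketch computes the offsets between adjacent stamp positions using a small-angle approximation. `offsets_arcsec` is a hypothetical helper; for anything beyond a sanity check, a proper coordinate library would be the safer choice.

```python
import math


def offsets_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle offsets, in arcseconds, between two positions in degrees.

    Returns the on-sky RA offset, the Dec offset, and the total separation.
    """
    mean_dec = math.radians(0.5 * (dec1 + dec2))
    d_ra = (ra2 - ra1) * math.cos(mean_dec) * 3600.
    d_dec = (dec2 - dec1) * 3600.
    return d_ra, d_dec, math.hypot(d_ra, d_dec)


ra = [53.1562247, 53.1545247, 53.1528247]
dec = [-27.8076957, -27.8059957, -27.8042957]
for i in range(len(ra) - 1):
    d_ra, d_dec, sep = offsets_arcsec(ra[i], dec[i], ra[i + 1], dec[i + 1])
    print('stamp {} -> {}: dRA={:.2f}", dDec={:.2f}", separation={:.2f}"'
          .format(i, i + 1, d_ra, d_dec, sep))
```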

For each yaml file, create a resampled image from the mosaic, add this image to the template catalog and save as a yaml-specific catalog. Then enter the catalog name in the yaml file so that when the **imaging_simulator** is run, it will use the correct catalog.

Yaml-specific catalogs are needed because the astrometric distortion is different for each detector. Therefore we must create a different resampled image for each detector and for each pointing.

In [None]:
for yfile in yam.yaml_files:
    
    # Read in the yaml file so that we know RA, Dec, PAV3
    # of the exposure
    with open(yfile) as file_obj:
        params = yaml.safe_load(file_obj)
        
    ra = params['Telescope']['ra']
    dec = params['Telescope']['dec']
    pav3 = params['Telescope']['rotation']

    # Define the output files and directories
    sim_data_dir = params['Output']['directory']
    simulated_filename = params['Output']['file']
    crop_file = simulated_filename.replace('.fits', '_cropped_from_mosaic.fits')
    crop_file = os.path.join(sim_data_dir, crop_file)
    blot_file = simulated_filename.replace('.fits', '_blotted_seed_image.fits')
    
    # Crop from the mosaic and resample for the desired detector/aperture
    seed = ImgSeed(paramfile=yfile, mosaicfile=mosaicfile, cropped_file=crop_file,
                   outdir=sim_data_dir, blotted_file=blot_file)
    seed.crop_and_blot()

    # Now add the resampled file to the extended source catalog template and
    # save as a separate catalog file
    
    # Need to add a magnitude entry for each filter/pupil
    mosaic_f150w_mag = ['None']
    mosaic_f444w_mag = ['None']
    
    # Create the catalog containing only the resampled image
    blotted_image_full_path = os.path.join(sim_data_dir, blot_file)
    extended_catalog_file = simulated_filename.replace('.fits', '_extended_sources.cat')
    ext_cat = ExtendedCatalog(filenames=[blotted_image_full_path], ra=[ra], dec=[dec], position_angle=[pav3])
    ext_cat.add_magnitude_column(mosaic_f150w_mag, instrument='nircam', filter_name=filter1)
    ext_cat.add_magnitude_column(mosaic_f444w_mag, instrument='nircam', filter_name=filter2)

    # Combine the resampled image catalog and the template catalog
    combined_cat = combine_catalogs(ext_cat, template_cat)
    combined_cat.save(extended_catalog_file)

    # Now add this extended source catalog to the yaml file
    params['simSignals']['extended'] = extended_catalog_file

    # Save the updated yaml file
    with open(yfile, 'w') as file_obj:
        dump = yaml.dump(params, default_flow_style=False)
        file_obj.write(dump)
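
The loop above derives several filenames from each exposure's output name. The sketch below collects that naming logic in one place so it is easier to see which file ends up where. `derived_filenames` is a hypothetical helper; the suffixes are the ones used in the loop.

```python
import os


def derived_filenames(simulated_filename, sim_data_dir):
    """Return the per-exposure filenames derived from the output filename."""
    crop_file = os.path.join(
        sim_data_dir,
        simulated_filename.replace('.fits', '_cropped_from_mosaic.fits'))
    blot_file = simulated_filename.replace('.fits', '_blotted_seed_image.fits')
    catalog_file = simulated_filename.replace('.fits', '_extended_sources.cat')
    return {'crop': crop_file,
            'blot': blot_file,
            'blot_full_path': os.path.join(sim_data_dir, blot_file),
            'catalog': catalog_file}


names = derived_filenames('jw00042001001_01101_00001_nrcb1_uncal.fits', 'simdata')
for key, value in names.items():
    print('{}: {}'.format(key, value))
```

Note that, as in the loop, the catalog name carries no directory, so it is written relative to the current working directory.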

### Create the simulated data

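To create the simulated data, call the **imaging_simulator** once for each yaml file. Note that the cell below runs only the first yaml file (`yam.yaml_files[0:1]`); remove the slice to simulate every exposure.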

In [None]:
for yfile in yam.yaml_files[0:1]:
    sim = ImgSim(paramfile=yfile)
    sim.create()

### Examine the data

In [None]:
def show(array, title, vmin=0, vmax=1000):
    """Display an image with the given title and display limits"""
    plt.figure(figsize=(12, 12))
    plt.imshow(array, clim=(vmin, vmax), origin='lower')
    plt.title(title)
    plt.colorbar().set_label('DN/s')

First let's look at the seed image, which is noiseless and will make it easier to see the fainter sources.

In [None]:
with fits.open('simdata/jw00042001001_01101_00001_nrcb1_uncal_F150W_seed_image.fits') as h:
    seeddata = h[1].data
    seedheader = h[0].header

Note that the three added stamp images are in a diagonal line going up and to the right through the center of the image.

In [None]:
plt.figure(figsize=(12, 12))
plt.imshow(np.log10(seeddata), clim=(np.log10(0.001), np.log10(5)), origin="lower")
plt.title('Seed Image')
plt.colorbar().set_label('DN/s')

Next look at the simulated data. Noise and instrumental effects combined with the short exposure time will make it much tougher to see the fainter sources. Look at the last group minus the first group, in order to remove bias structure and bring out the brighter sources.

In [None]:
with fits.open('simdata/jw00042001001_01101_00001_nrcb1_uncal.fits') as h:
    lindata = h[1].data
    header = h[0].header

In [None]:
exptime = header['EFFINTTM']
diffdata = (1.* lindata[0,-1,:,:] - 1. * lindata[0,0,:,:]) / exptime
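
The multiplication by `1.` in the cell above converts the groups to floating point before subtracting. The sketch below shows why that matters, assuming the raw groups are stored as unsigned 16-bit integers (typical for raw JWST data, but an assumption here). The ramp values and exposure time are made up for the illustration.

```python
import numpy as np

# Hypothetical ramp with shape (integrations, groups, y, x), stored as uint16.
# In the second pixel, noise makes the last group lower than the first.
ramp = np.array([[[[1000, 1200]],
                  [[1250, 1150]],
                  [[1500, 1100]]]], dtype=np.uint16)
exptime = 10.0  # hypothetical effective integration time in seconds

# Subtracting the unsigned integers directly wraps around for negative results
wrapped = ramp[0, -1] - ramp[0, 0]

# Converting to float first preserves the negative difference
rate = (1. * ramp[0, -1] - 1. * ramp[0, 0]) / exptime

print(wrapped)
print(rate)
```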

In [None]:
# Show on a log scale, to bring out the presence of the dark current
# Noise in the CDS image makes for a lot of pixels with values < 0,
# which makes this kind of an ugly image. Add an offset so that
# everything is positive and the noise is visible
offset = 2.
plt.figure(figsize=(12, 12))
plt.imshow(np.log10(diffdata + offset), clim=(np.log10(2), np.log10(5)), origin="lower")
plt.title('Simulated Data')
plt.colorbar().set_label('DN/s')
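
The display limits used above, `np.log10(2)` and `np.log10(5)`, correspond to rates of 0 and 3 in the units of `diffdata` once the offset of 2 is removed. The hypothetical helpers below sketch the conversion in both directions, which can be handy when choosing limits in terms of the rate itself.

```python
import math


def rates_to_log_clim(rate_min, rate_max, offset):
    """Convert rate limits to limits for a log10(rate + offset) display."""
    return math.log10(rate_min + offset), math.log10(rate_max + offset)


def log_clim_to_rates(log_min, log_max, offset):
    """Convert limits of a log10(rate + offset) display back to rates."""
    return 10 ** log_min - offset, 10 ** log_max - offset


offset = 2.
clim = rates_to_log_clim(0., 3., offset)
print(clim)
print(log_clim_to_rates(clim[0], clim[1], offset))
```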

---
<a id='use_as_seed'></a>
## Use the resampled image as a seed image

In this example, the resampled image is used directly as the seed image, with no modifications. No extra sources are added. With this method, we use only the `dark_prep` and `obs_generator` steps of Mirage. The cropping and blotting of the mosaic image replaces the call to `catalog_seed_image` that would otherwise be made.

### Define the filename containing the mosaic image

In [None]:
mosaicfile = os.path.join(stamp_dir, input_images[3])

### Define the xml and pointing files exported from APT

In [None]:
xml_file = 'imaging_example_data/extended_object_test.xml'
pointing_file = xml_file.replace('.xml', '.pointing')

### Run the yaml_generator to create the yaml input files

Define the user inputs to the yaml generator. In this case you cannot add other catalogs to the seed image since the `catalog_seed_image` function will not be called.

In [None]:
# In this case the resampled image will be the seed image. You cannot add other sources
# to the seed image, so no need to specify other catalogs here.
cr = {'library': 'SUNMAX', 'scale': 1.0}
dates = '2019-5-25'
background = 'low'
pav3 = 0.0

Run the yaml generator

In [None]:
yam = yaml_generator.SimInput(xml_file, pointing_file, verbose=True,
                              output_dir='yamls',
                              cosmic_rays=cr,
                              background=background, roll_angle=pav3, dates=dates,
                              simdata_output_dir='simdata',
                              datatype='raw')
yam.use_linearized_darks = True
yam.create_inputs()

### Create the resampled image, then run the `dark_prep` and `obs_generator` steps

Run these steps for each yaml file produced from the APT file.

In [None]:
for yfile in yam.yaml_files:
    
    # Read in the yaml file so that we know RA, Dec, PAV3
    with open(yfile) as file_obj:
        params = yaml.safe_load(file_obj)
        
    # Define output filenames and directories
    sim_data_dir = params['Output']['directory']
    simulated_filename = params['Output']['file']
    crop_file = simulated_filename.replace('.fits', '_cropped_from_mosaic.fits')
    crop_file = os.path.join(sim_data_dir, crop_file)
    blot_file = simulated_filename.replace('.fits', '_blotted_seed_image.fits')
        
    # Crop from the mosaic and then resample the image
    seed = ImgSeed(paramfile=yfile, mosaicfile=mosaicfile, cropped_file=crop_file,
                   outdir=sim_data_dir, blotted_file=blot_file)
    seed.crop_and_blot()
    
    # Run dark_prep
    dark = DarkPrep()
    dark.paramfile = yfile
    dark.prepare()

    # Run the observation generator
    obs = Observation()
    obs.paramfile = yfile    
    obs.seed = seed.seed_image
    obs.segmap = seed.seed_segmap
    obs.seedheader = seed.seedinfo
    obs.linDark = dark.prepDark
    obs.create()

### Examine the data

Look at the seed image, which is simply the cropped and resampled image from the mosaic.

In [None]:
with fits.open('simdata/jw00042001001_01101_00001_nrcb1_uncal_blotted_seed_image.fits') as hdulist:
    seed = hdulist[1].data

In [None]:
plt.figure(figsize=(12, 12))
plt.imshow(np.log10(seed+2), clim=(np.log10(2), np.log10(2.5)), origin="lower")
plt.title('Seed Image')
plt.colorbar().set_label('DN/s')

Now look at the simulated data in the raw file. Look at the last group minus the first group in order to make the sources easier to see.

In [None]:
with fits.open('simdata/jw00042001001_01101_00001_nrcb1_uncal.fits') as h:
    lindata = h['SCI'].data
    header = h[0].header

In [None]:
exptime = header['EFFINTTM']
diffdata = (1. * lindata[0,-1,:,:] - 1. * lindata[0,0,:,:]) / exptime

In [None]:
# In this case, instrumental effects such as bias differences between the 4 amplifiers
# and 1/f noise dominate, making it hard to see any but the brightest sources.
# Again, this is largely due to the short exposure time.
offset = 2.
plt.figure(figsize=(12, 12))
plt.imshow(np.log10(diffdata + offset), clim=(0.001, 0.5), origin='lower')
plt.title('Simulated Raw Data')
plt.colorbar().set_label('DN/s')