# Synthetic image creation for MOSviz pipeline data

**Motivation**: The synthetic dataset we currently have from the JWST data pipeline team for use in MOSviz contains simulated NIRSpec spectra but no associated NIRCam photometry. We would like test imagery to display in MOSviz alongside these 2D and 1D spectra, and we are unsure whether the pipeline team plans to produce any.

**Goal**: Populate an image of background noise with properly scaled galaxy cutouts sourced from a Hubble Space Telescope image and placed at their analogous locations in the new image. These galaxies' real spectra do not necessarily correspond to those in our dataset, but at this point we care more about the veneer of having photometry to match with our spectra.

**Execution**: We pull our galaxy cutouts and catalog information from the Hubble Deep Field image. ~~ASTRODEEP's image of the [Abell 2774 Parallel](http://astrodeep.u-strasbg.fr/ff/?img=JH140?cm=grayscale) | [MACS J0416.1-2403 Parallel](http://astrodeep.u-strasbg.fr/ff/?ffid=FF_M0416PAR&id=1264&cm=grayscale)~~. We sought to use [Artifactory](https://bytesalad.stsci.edu/ui/repos/tree/General/jwst-pipeline%2Fdev%2Ftruth) to obtain a range of RA/Dec over which to project our synthetic image, but that information was absent. Instead, we project the image onto a manually chosen RA/Dec range and place the cutouts at randomly selected locations within that field of view.

**Issues**:
- We wanted to scrape RA/Dec information from the data pipeline products to get a range of coordinates over which to scale our synthetic image, but the pipeline's data products don't appear to carry usable `"TARG_RA"` or `"TARG_DEC"` values in their headers: every file in an observation reports the same coordinates.
    - The data products also don't appear to have WCS information. We don't strictly need it to achieve this notebook's goals, but it would be convenient to have.
- No observation appears to have both level 2 data in `jwst-pipeline/truth/test_nirspec_mos_spec2` and level 3 data in `jwst-pipeline/truth/test_nirspec_mos_spec3`; every observation is either level 2 only or level 3 only.
- _(Resolved)_ Many of the cutouts from the first couple of field images we tested had intrusions from neighboring galaxies due to crowding. We settled on the Hubble Deep Field as a good source image; had we not, we might have used galaxies modeled with Sersic profiles to get cleaner cutouts to inject into our synthetic image.
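A minimal sketch of the Sersic alternative mentioned above, assuming a hand-rolled NumPy profile rather than anything this notebook actually uses. It evaluates an elliptical Sersic profile on a 100 x 100 stamp (the binned cutout size used later). The `sersic_stamp` name and all parameter values are hypothetical, and the constant b_n comes from the Ciotti & Bertin (1999) asymptotic approximation rather than an exact solution.

```python
import numpy as np

def sersic_stamp(size, r_eff, n, amplitude=1.0, ellip=0.0, theta=0.0):
    """Evaluate an elliptical Sersic profile centered on a size x size grid.

    amplitude is the surface brightness at r_eff; b_n uses the
    Ciotti & Bertin (1999) asymptotic approximation (an assumption).
    """
    b_n = 2 * n - 1 / 3 + 4 / (405 * n) + 46 / (25515 * n**2)
    center = size // 2
    y, x = np.mgrid[0:size, 0:size]            # rows are y, columns are x
    dx, dy = x - center, y - center
    # rotate offsets into the profile's major/minor axis frame
    x_maj = dx * np.cos(theta) + dy * np.sin(theta)
    y_min = -dx * np.sin(theta) + dy * np.cos(theta)
    r = np.hypot(x_maj, y_min / (1 - ellip))    # elliptical radius
    return amplitude * np.exp(-b_n * ((r / r_eff) ** (1 / n) - 1))

# hypothetical parameters, not taken from the HDF catalog
stamp = sersic_stamp(100, r_eff=8, n=1.5, ellip=0.3, theta=0.5)
peak = np.unravel_index(np.argmax(stamp), stamp.shape)
print(stamp.shape, peak)
```

A model stamp like this has no neighbors by construction, which is the point of the alternative; `astropy.modeling.models.Sersic2D` provides the same profile as a model class if fitting is also needed.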

### Import packages

- We use `astropy.io.fits` to read in existing FITS files and write a new one with the synthetic image.
- The objects from `astropy.nddata` help with creating cutouts once we've identified galaxies we'd like to take from the field image.
- The functions from `astropy.stats` compute statistics on image data after clipping away pixels more than a chosen number of standard deviations from the median.
- The objects from `astropy.table` help with reading and modifying tabular data.
- `astropy.wcs.WCS` creates a World Coordinate System object, which maps pixel coordinates in an image to coordinates on the sky and back.
- `glob.glob` lists local files that match a given pattern.
- We use `matplotlib.pyplot` to preview the field image, the cutouts, and various stages of our synthetic image.
- We use `numpy` to facilitate several specialized mathematical and array-related operations.

```python
from astropy.io import fits
from astropy.nddata import block_reduce, Cutout2D
from astropy.stats import sigma_clipped_stats, sigma_clip
from astropy.table import Table, join
from astropy.wcs import WCS
from glob import glob

import matplotlib.pyplot as plt
import numpy as np
```

### Generate galaxy cutouts

The galaxy cutouts come from the Hubble Deep Field image. To retrieve them, we download the image itself and its associated catalog data, search the catalog for the brightest galaxies, then find those galaxies in the image.

```python
# download the image containing sources to be cut out later
image_fits = fits.open('https://archive.stsci.edu/pub/hlsp/hdf/v2/mosaics/x4096/f814_mosaic_v2.fits')
image_header = image_fits[0].header
image_data = image_fits[0].data

image_data.shape
```

```python
# download sources' location and flux information
source_info1 = Table.read('https://archive.stsci.edu/pub/hlsp/hdf/wfpc_hdfn_v2catalog/HDFN_wfpc_v2generic.cat',
                          format='ascii')
source_info2 = Table.read('https://archive.stsci.edu/pub/hlsp/hdf/wfpc_hdfn_v2catalog/HDFN_f814_v2.cat',
                          format='ascii')
sources = join(source_info1, source_info2)

# confirm that both tables contain the same objects in the same order (True)
( (source_info1['NUMBER'] == source_info2['NUMBER']).sum()
 == len(source_info1)
 == len(source_info2) )
```

```python
# sort sources by flux within a 71.1-pixel diameter of the source (aperture 11), brightest first
sources.sort('FLUX_APER_11', reverse=True)

# filter out likely stars and sources with non-positive flux in aperture 8
sources = sources[(sources['CLASS_STAR'] < .5)
                  & (sources['FLUX_APER_8'] > 0)]
```

```python
sources[:5]
```

```python
# convert the sources' WCS locations to in-image pixel values
image_wcs = WCS(image_fits[0].header)
sources_x, sources_y = image_wcs.world_to_pixel_values(sources['ALPHA_J2000'],
                                                       sources['DELTA_J2000'])
```

Note: Depending on the value of `catalog_size`, the following cell can produce a lot of output. Right-click the cell and select "Enable Scrolling for Outputs" to expand it or "Disable Scrolling for Outputs" to condense it.

```python
# save a list of good cutouts for later use
cutout_list = []
first_source = 0
catalog_size = 20
downsample_factor = 2
patch_length = 100

for x, y in list(zip(sources_x, sources_y))[first_source:]:
    # use pixel locations to cut a source from the image; mode='partial' keeps
    # the requested shape and fills any off-image pixels with zeros
    cutout = Cutout2D(image_data, (x, y), patch_length * downsample_factor,
                      mode='partial', fill_value=0).data

    # bin by downsample_factor to increase the field of view per pixel
    cutout = block_reduce(cutout, downsample_factor)

    # skip any cutouts with an edge lying entirely in blank (zero) pixels,
    # i.e. those that extend past the image border
    if (  np.all(cutout[-1] <= 0) or np.all(cutout[0] <= 0)
          or np.all(cutout[:,-1] <= 0) or np.all(cutout[:,0] <= 0)  ):
        continue

    # save and plot the new cutout
    cutout_list.append(cutout)

    plt.imshow(cutout, vmin=-1e-5, vmax=image_data.std(),
               origin='lower', cmap='bone')
    plt.show()

    if len(cutout_list) == catalog_size:
        break
```

We also save image statistics calculated only from pixels within a chosen number of standard deviations (here, 3) of the image's median intensity. The clipped mean and standard deviation are used later to generate the synthetic image's background noise.

```python
clipped_mean, clipped_median, clipped_stddev = sigma_clipped_stats(image_data,
                                                                   sigma=3.)
```

### Extract destination RA/Dec from spectra files

Note that the following cells only run on a connection with access to the STScI VPN. They use copies of JWST pipeline files from May 19, 2020.

To run them elsewhere and/or use more current files, visit the [Artifactory](https://bytesalad.stsci.edu/ui/repos/tree/General/jwst-pipeline%2Fdev%2Ftruth%2Ftest_nirspec_mos_spec3), ensure the `jwst-pipeline/dev/truth/test_nirspec_mos_spec3/` folder is selected in the directory tree, and download all files whose names begin with `jw00626-o030` and end with `nirspec_f170lp-g235m_x1d.fits`. (The original version of the notebook uses six of these Level 3 data products.) Then, change `filepath` in the next cell to the location where you saved the downloaded files. Remember the trailing forward slash.

```python
# view level 3 spectra FITS header information
filepath = '/user/jotor/jwst-pipeline-lvl3/'
x1d_header = fits.getheader(filepath + 'jw00626-o030_s00000_nirspec_f170lp-g235m_x1d.fits')
#s2d_header = fits.getheader(filepath + 'jw00626-o030_s00000_nirspec_f170lp-g235m_s2d.fits')
```

```python
x1d_header['TARG_RA'], x1d_header['TARG_DEC']
```

```python
WCS(x1d_header)
```

Notice that this header lacks WCS information. Additionally, examining all of this observation's file headers reveals that they all have the same RA/Dec, which is not what we expect for its different "pointings."

```python
# search for RA/Dec information from Artifactory observation files
x1d_header_list = [fits.getheader(file)
                   for file in glob(filepath + 'jw00626*x1d.fits')]

ras, decs = np.array([[h['TARG_RA'], h['TARG_DEC']] for h in x1d_header_list]).T
ras, decs
```
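A hedged sketch of the same header check, using in-memory `fits.Header` objects with made-up coordinates in place of the downloaded x1d files. Unlike the cell above, the hypothetical `target_coords` helper skips headers missing either keyword instead of raising `KeyError`, and counting unique pairs shows at a glance whether the files report distinct pointings.

```python
from astropy.io import fits

# stand-in headers with made-up coordinates; real ones would come from
# fits.getheader() on the downloaded x1d files
headers = [fits.Header([('TARG_RA', 189.2), ('TARG_DEC', 62.2)]) for _ in range(3)]
headers.append(fits.Header())  # a header missing both keywords

def target_coords(header_list):
    """Collect (TARG_RA, TARG_DEC) pairs, skipping headers that lack either keyword."""
    return [(h['TARG_RA'], h['TARG_DEC']) for h in header_list
            if 'TARG_RA' in h and 'TARG_DEC' in h]

coords = target_coords(headers)
n_unique = len(set(coords))
print(len(coords), n_unique)
```

A single unique pair across several files, as here, is the symptom described in the text: the headers do not distinguish the observation's pointings.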

Since the headers lack the information we seek, we randomly generate our sources' RA/Dec coordinates within a predetermined patch of sky. We base the patch size on the NIRSpec Micro-Shutter Assembly (MSA) field of view (about 3.6' x 3.4'), approximated as a square roughly 4' (1/15 degree) on a side.

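```python
# draw one random position (in degrees) per saved cutout, so every
# position has a matching cutout even if fewer than catalog_size were kept
np.random.seed(19)
n_sources = len(cutout_list)
ras = np.random.uniform(0, 1/15, n_sources)
decs = np.random.uniform(-1/30, 1/30, n_sources)
```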

### Create synthetic image

We initialize a `numpy` array and fill it with normally distributed background noise based on the clipped image statistics calculated earlier: the noise is centered on the clipped mean, with a standard deviation eight times the clipped standard deviation.

```python
synth_img_size = 1000
synth_image = np.zeros((synth_img_size, synth_img_size))
```

```python
# add noise
synth_image += np.random.normal(loc=clipped_mean, scale=clipped_stddev*8,
                                size=synth_image.shape)
```

```python
imshow_params = {'cmap': 'bone', 'origin': 'lower'}
plt.imshow(synth_image, **imshow_params)
plt.show()
```

### Fill out new WCS object for `synth_image`

Creating a `WCS` object for `synth_image` lets us map its pixels onto a projection of the sky. That projection can then be compared with other FITS images that carry their own `WCS` information.

```python
synth_wcs = WCS(naxis=2)
synth_wcs
```

The next step is to calculate field of view information for `synth_image`.

```python
# find the range of sources in RA and Dec
ra_bounds = np.array([ras.max(), ras.min()])
dec_bounds = np.array([decs.max(), decs.min()])

delta_ra = np.ptp(ras)
delta_dec = np.ptp(decs)
```

```python
# save the maximum on-sky span, RA or Dec; an RA span shrinks by cos(Dec)
# on the sky, so project it before comparing it with the Dec span
ra_span_on_sky = delta_ra * np.cos(np.deg2rad(dec_bounds.mean()))
min_image_fov = max(ra_span_on_sky, delta_dec)

min_image_fov
```
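An RA span must be multiplied by cos(Dec) before it is compared with a Dec span, because lines of constant RA converge toward the poles. A minimal sketch of that comparison, with a hypothetical `on_sky_span` helper and made-up positions; it assumes a small field and ignores RA wraparound at 0/360 degrees.

```python
import numpy as np

def on_sky_span(ras, decs):
    """Largest on-sky extent (degrees) of a set of positions, small-field approximation."""
    ras, decs = np.asarray(ras), np.asarray(decs)
    mean_dec = (decs.max() + decs.min()) / 2
    # project the RA range onto the sky before comparing it with the Dec range
    ra_extent = np.ptp(ras) * np.cos(np.deg2rad(mean_dec))
    return max(ra_extent, np.ptp(decs))

# made-up positions: 1 degree apart in RA at Dec = 60 is only ~0.5 degree on the sky
print(on_sky_span([10.0, 11.0], [59.95, 60.05]))
# near the equator the Dec range dominates here
print(on_sky_span([0.0, 0.01], [0.0, 1.0]))
```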

```python
# convert the field of view (FOV) into a pixel scale, in degrees per pixel
pix_scale = min_image_fov / synth_img_size

# add a buffer to the FOV's borders to prevent clipping sources
pix_scale *= 1.5
pix_scale
```

With those calculations done, the `WCS` object is ready to be filled out.

```python
synth_wcs.wcs.ctype = ['RA---TAN', 'DEC--TAN']

# match the image's center pixel to the FOV's central sky coordinate;
# crpix is 1-based in FITS, so the center of an N-pixel axis is (N + 1) / 2
synth_wcs.wcs.crpix = [(synth_img_size + 1) / 2, (synth_img_size + 1) / 2]
synth_wcs.wcs.crval = [ra_bounds.sum() / 2, dec_bounds.sum() / 2]

# distance (in degrees) traversed by one pixel length in each dimension;
# the negative RA step puts east (increasing RA) on the left
synth_wcs.wcs.cdelt = [-pix_scale, pix_scale]

synth_wcs
```

### Populate `synth_image` with the cutouts

Finally, we convert the sources' sky coordinates to pixel positions and add the cutouts into `synth_image` to complete the mock image.

```python
# convert source RAs/Decs from sky coordinates to 0-based pixel positions;
# world_to_pixel_values returns (x, y), i.e. (column, row)
ras_pix, decs_pix = np.round(synth_wcs.world_to_pixel_values(ras, decs)).astype(int)
ras_pix, decs_pix
```

```python
cutout_half_delta = patch_length // 2

# NumPy indexes images as [row, column] = [y, x], so the Dec (y) slice comes first
for i in range(len(ras_pix)):
    synth_image[decs_pix[i] - cutout_half_delta : decs_pix[i] + cutout_half_delta,
                ras_pix[i] - cutout_half_delta : ras_pix[i] + cutout_half_delta] += cutout_list[i]
```
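NumPy indexes a 2D image as [row, column], that is [y, x], while `world_to_pixel_values` returns (x, y). A minimal sketch of a placement helper that keeps that order straight and refuses stamps that would extend past the border, where slicing would otherwise yield a truncated or empty region and a shape mismatch. The `add_stamp` name and the toy sizes are hypothetical.

```python
import numpy as np

def add_stamp(image, stamp, x, y):
    """Add a square, even-sized stamp centered on integer pixel (x, y), in place.

    The y (row) slice comes first because NumPy indexes 2D arrays as [y, x].
    Returns False and leaves the image untouched if the stamp would cross the border.
    """
    half = stamp.shape[0] // 2
    y0, y1, x0, x1 = y - half, y + half, x - half, x + half
    if y0 < 0 or x0 < 0 or y1 > image.shape[0] or x1 > image.shape[1]:
        return False
    image[y0:y1, x0:x1] += stamp
    return True

# hypothetical 10 x 10 stamp placed at x=30, y=70 in a 100 x 100 image
canvas = np.zeros((100, 100))
placed = add_stamp(canvas, np.ones((10, 10)), x=30, y=70)
rows, cols = np.nonzero(canvas)
print(placed, rows.min(), rows.max(), cols.min(), cols.max())
```

The stamp occupies rows 65 to 74 and columns 25 to 34; swapping the slices would instead put it at rows 25 to 34, mirroring its position across the image diagonal.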

```python
fig, ax = plt.subplots(figsize=(10,10))
ax.imshow(synth_image, vmin=0, vmax=synth_image.std()*3, **imshow_params)

plt.show()
```

Save the image, overwriting the previous file if it already exists in the current directory.

```python
fits.writeto('synthetic_HDF_more.fits', synth_image,
             header=synth_wcs.to_header(), overwrite=True)
```

<p>
    <span style="line-height: 60px;"> <i> Authors: Robel Geda and O. Justin Otor </i> </span>
    <img style="float: right;/* clear: right; */vertical-align: text-bottom;display: inline-block;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px">
</p>

-----