<a id="title_ID"></a>
# JWST Pipeline Validation Notebook: calwebb_image2, background subtraction for MIRI imaging with flight data

<span style="color:red"> **Instruments Affected**</span>: MIRI

### Table of Contents


<div style="text-align: left"> 
    
<br> [Introduction\*](#intro)
<br> [JWST CalWG Algorithm\*](#algorithm)
<br> [Defining Terms](#terms)
<br> [Test Description\*](#description)
<br> [Data Description\*](#data_descr)
<br> [Imports\*](#imports)
<br> [Loading the Data\*](#data_load)
<br> [Run the Pipeline](#pipeline)
<br> [Passing criteria](#testing) 
<br> [About This Notebook\*](#about_ID)
<br>    

</div>

<a id="intro"></a>
# Introduction


This is the validation notebook for the background subtraction step of calwebb_image2. This step takes in a set of science images and a set of background observations. If more than one background observation is given, they are combined into a sigma-clipped mean, which is then subtracted from each of the science images. For more information on the pipeline step, visit the links below.

Step description: https://jwst-pipeline.readthedocs.io/en/latest/jwst/background_step/description.html

Pipeline code: https://github.com/spacetelescope/jwst/tree/master/jwst/background

[Top of Page](#title_ID)

<a id="algorithm"></a>
# JWST CalWG Algorithm

The page describing the algorithm and any details can be found here:

https://outerspace.stsci.edu/display/JWSTCC/Vanilla+Imaging+Background+Subtraction


[Top of Page](#title_ID)

<a id="terms"></a>
# Defining Terms

Here are some common terms that will be used throughout the notebook:

> JWST: James Webb Space Telescope

> MIRI: Mid-Infrared Instrument



[Top of Page](#title_ID)

<a id="description"></a>
# Test Description

This test uses a set of data containing multiple bright point sources and dust around SN 2021axdf. There are four science images at four dithered positions in the F2550W filter, and four background images, also at four dithered positions, in the same filter. The images are processed through calwebb_detector1 (the notebook downloads the resulting rate files) and placed in an association file that is run through calwebb_image2. The association tells the background step which exposures are science observations and which are background observations, so that the step computes a sigma-clipped mean of the background exposures and then subtracts that mean background image from each science observation.

The notebook displays the background, science, averaged background, and background-subtracted images along the way to demonstrate how well the algorithm works. It then runs the rate images through calwebb_image2 and calwebb_image3 to examine the combined F2550W image, which includes all four dithered positions and the background subtraction from calwebb_image2.

For observations with the F2550W filter, it is recommended that background observations also be taken, in order to subtract off the high background that is seen with this filter. This notebook tests that subtraction to ensure that the user can get reasonable data for this filter after the background subtraction.
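
The flow described above can be sketched on synthetic data. This is a minimal illustration, not the pipeline's implementation: the helper `sigma_clipped_mean`, the image size, and all pixel values are made up for the example, and the clipping is assumed to be centered on the median (the default behavior of `astropy.stats.sigma_clip`).

```python
import numpy as np

def sigma_clipped_mean(stack, sigma=3.0, maxiters=3):
    """Mean along axis 0 after iteratively rejecting outliers around the median."""
    work = np.array(stack, dtype=float)  # copy so the inputs are untouched
    for _ in range(maxiters):
        center = np.nanmedian(work, axis=0)
        spread = np.nanstd(work, axis=0)
        with np.errstate(invalid="ignore"):
            outliers = np.abs(work - center) > sigma * spread
        if not outliers.any():
            break
        work[outliers] = np.nan  # rejected samples no longer contribute
    return np.nanmean(work, axis=0)

rng = np.random.default_rng(0)
shape = (32, 32)
sky = 1000.0  # flat background level (arbitrary units)

# Four background exposures, each with one "source" at a different dithered pixel
backgrounds = sky + rng.normal(0.0, 1.0, size=(4,) + shape)
for i, (y, x) in enumerate([(5, 5), (10, 20), (20, 10), (25, 25)]):
    backgrounds[i, y, x] += 500.0

# One science exposure: the same sky plus a target at the center
science = sky + rng.normal(0.0, 1.0, size=shape)
science[16, 16] += 300.0

combined_bkg = sigma_clipped_mean(backgrounds, sigma=2.0, maxiters=3)
subtracted = science - combined_bkg

print("combined background at a source pixel:", combined_bkg[5, 5])
print("median of subtracted image:", np.median(subtracted))
print("target pixel after subtraction:", subtracted[16, 16])
```

In this sketch the source in each background exposure is rejected, so the combined background stays near the sky level and the subtracted science image keeps the target while its background drops to about zero.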

[Top of Page](#title_ID)

<a id="data_descr"></a>
# Data Description


The data used in this test are observations of SN 2021axdf taken with the F2550W filter. There are four science images at four dithered positions and four background images, also at four dithered positions, in the same filter. They were taken as part of Proposal 2754, "Unique Constraints on Early Dust Growth in Core-Collapse Supernovae", a DD proposal to study dust constraints for core-collapse supernovae (CCSNe) before 500 days post-explosion.

[Top of Page](#title_ID)

<a id="tempdir"></a>
# Set up Temporary Directory
The following cell sets up a temporary directory (using Python's `tempfile.TemporaryDirectory()`), and changes the script's active directory into that directory (using Python's `os.chdir()`). This is so that, when the notebook is run through, it will download files to (and create output files in) the temporary directory rather than in the notebook's directory. This makes cleanup significantly easier (since all output files are deleted when the notebook is shut down), and also means that different notebooks in the same directory won't interfere with each other when run by the automated webpage generation process.

If you want the notebook to generate output in the notebook's directory, simply don't run this cell.

If you have a file (or files) kept in the notebook's directory that the notebook needs while running, you can copy it into the temporary directory (for example with `shutil.copy2`) after running the cell below.
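
A minimal sketch of such a copy is shown here. The file name `example_input.txt` and the stand-in directories are hypothetical; in the notebook you would use `orig_dir` as the source directory and `data_dir.name` as the destination, both defined in the setup cell.

```python
import os
import shutil
from tempfile import TemporaryDirectory

# Hypothetical stand-in for the notebook's directory and a file kept there
notebook_dir = TemporaryDirectory()
source_path = os.path.join(notebook_dir.name, "example_input.txt")
with open(source_path, "w") as fh:
    fh.write("placeholder contents\n")

# Temporary working directory, analogous to the one created by the setup cell
work_dir = TemporaryDirectory()

# Copy the file (with its metadata) into the working directory
copied_path = shutil.copy2(source_path, work_dir.name)
print("Copied to", copied_path)
```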

[Top of Page](#title_ID)

In [None]:
#****
#
# Set this variable to False to not use the temporary directory
#
#****
use_tempdir = True

# Create a temporary directory to hold notebook output, and change the working directory to that directory.
from tempfile import TemporaryDirectory
import os
import shutil

if use_tempdir:
    data_dir = TemporaryDirectory()

    # Save original directory
    orig_dir = os.getcwd()

    # Move to new directory
    os.chdir(data_dir.name)

# For info, print out where the script is running
print("Running in {}".format(os.getcwd()))

## If Desired, set up CRDS to use a local cache

By default, the notebook template environment sets up its CRDS cache (the "CRDS_PATH" environment variable) in /grp/crds/cache. However, if the notebook is running on a local machine without a fast and reliable connection to central storage, it makes more sense to put the CRDS cache locally. Currently, the cell below offers several options, and will check the supplied boolean variables one at a time until one matches.

* if `use_local_crds_cache` is False, then the CRDS cache will be kept in /grp/crds/cache
* if `use_local_crds_cache` is True, the CRDS cache will be kept locally
  * if `crds_cache_tempdir` is True, the CRDS cache will be kept in the temporary directory
  * if `crds_cache_notebook_dir` is True, the CRDS cache will be kept in the same directory as the notebook.
  * if `crds_cache_home` is True, the CRDS cache will be kept in $HOME/crds/cache
  * if `crds_cache_custom_dir` is True, the CRDS cache will be kept in whatever is stored in the 
    `crds_cache_dir_name` variable.

If the above cell (creating a temporary directory) is not run, then setting `crds_cache_tempdir` to True will store the CRDS cache in the notebook's directory (the same as setting `crds_cache_notebook_dir` to True).

In [None]:
import os

# Choose CRDS cache location
use_local_crds_cache = True
crds_cache_tempdir = False
crds_cache_notebook_dir = False
crds_cache_home = False
crds_cache_custom_dir = False
crds_cache_dir_name = ""

if use_local_crds_cache:
    if crds_cache_tempdir:
        os.environ['CRDS_PATH'] = os.path.join(os.getcwd(), "crds")
    elif crds_cache_notebook_dir:
        try:
            os.environ['CRDS_PATH'] = os.path.join(orig_dir, "crds")
        except NameError:  # orig_dir is undefined if the temporary-directory cell was not run
            os.environ['CRDS_PATH'] = os.path.join(os.getcwd(), "crds")
    elif crds_cache_home:
        os.environ['CRDS_PATH'] = os.path.join(os.environ['HOME'], 'crds', 'cache')
    elif crds_cache_custom_dir:
        os.environ['CRDS_PATH'] = crds_cache_dir_name

<a id="imports"></a>
# Imports
The following packages are imported and used in this notebook:


* astropy.io.fits for opening FITS files
* glob for finding the output files written by the pipeline
* jwst.datamodels for loading the data into JWST data models
* jwst.pipeline for the calwebb_image2 and calwebb_image3 pipelines
* jwst.associations for building the association files
* jwst.background.BackgroundStep is the pipeline step being tested
* matplotlib.pyplot to generate plots
* numpy for the numerical checks


[Top of Page](#title_ID)

In [None]:

from astropy.io import fits
import glob

from jwst.datamodels import RampModel, ImageModel
from jwst.pipeline import Detector1Pipeline, Image2Pipeline, calwebb_image3
from jwst import associations
from jwst.associations.lib.rules_level2_base import DMSLevel2bBase, DMS_Level3_Base
from jwst.associations import asn_from_list
from jwst.background import BackgroundStep

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

<a id="data_load"></a>
# Loading the Data

Download the data from Box to use in the notebook.

[Top of Page](#title_ID)

### Look at rate images

Display the rate science images and background images to see locations of sources.

In [None]:
from astropy.utils.data import download_file
from pathlib import Path
from shutil import move
from os.path import splitext

def get_box_files(file_list):
    for box_url,file_name in file_list:
        if 'https' not in box_url:
            box_url = 'https://stsci.box.com/shared/static/' + box_url
        downloaded_file = download_file(box_url)
        if Path(file_name).suffix == '':
            ext = splitext(box_url)[1]
            file_name += ext
        move(downloaded_file, file_name)


file_urls = ['https://stsci.box.com/shared/static/j2f1klymtj8yfy3ryqsficy2js4q68xc.fits', 
             'https://stsci.box.com/shared/static/64xsicjfdqnp2y70htwkqfpbp6hn664r.fits',
             'https://stsci.box.com/shared/static/hnmt7i6qomluigg6gxwlf4zvifg98uno.fits',
             'https://stsci.box.com/shared/static/5m6n24qjv2xzyn87dxufwuidful0f805.fits',
             'https://stsci.box.com/shared/static/4x1z2t0ji9je2o5qvisojl3m3nqi8vnl.fits',
             'https://stsci.box.com/shared/static/p5co8n3pjjq7ecv45klaekkryoii3gcv.fits',
             'https://stsci.box.com/shared/static/0czufensqnvz1m05o4jknt9dh7pv6g3t.fits',
             'https://stsci.box.com/shared/static/z00chmebwzw8us8e7q0un4sj6y1yy5p3.fits']
             

files = ['jw02754001001_06101_00001_mirimage_rate.fits', 
         'jw02754001001_06101_00002_mirimage_rate.fits', 
         'jw02754001001_06101_00003_mirimage_rate.fits',
         'jw02754001001_06101_00004_mirimage_rate.fits',
         'jw02754002001_02101_00001_mirimage_rate.fits',
         'jw02754002001_02101_00002_mirimage_rate.fits',
         'jw02754002001_02101_00003_mirimage_rate.fits',
         'jw02754002001_02101_00004_mirimage_rate.fits']
         

box_download_list = [(url,name) for url,name in zip(file_urls,files)]


get_box_files(box_download_list)


print(files)

In [None]:
scislopelist = ['jw02754001001_06101_00001_mirimage_rate.fits', 
         'jw02754001001_06101_00002_mirimage_rate.fits', 
         'jw02754001001_06101_00003_mirimage_rate.fits',
         'jw02754001001_06101_00004_mirimage_rate.fits']

bkgslopelist = ['jw02754002001_02101_00001_mirimage_rate.fits',
         'jw02754002001_02101_00002_mirimage_rate.fits',
         'jw02754002001_02101_00003_mirimage_rate.fits',
         'jw02754002001_02101_00004_mirimage_rate.fits']

In [None]:
# Look at science images

for image in scislopelist:
    im = ImageModel(image)

    plt.figure(figsize=(20,20))
    plt.imshow(im.data, cmap='rainbow', origin='lower', vmin=800,vmax=1200)
    plt.colorbar()
    plt.show()
    print('background region values', im.data[600, 600])

In [None]:
# Look at background images
# Pixel to examine that should have a source in at least one background image
xval = 850
yval = 875

for backimage in bkgslopelist:
    bkgim = ImageModel(backimage)
    print(bkgim.meta.filename)
    selectedstar = bkgim.data[yval, xval] # Choose a pixel that is on a source in at least one image

    plt.figure(figsize=(20,20))
    plt.imshow(bkgim.data, cmap='rainbow', origin='lower', vmin=800,vmax=1200)
    plt.colorbar()
    plt.show()

    print('brightness of selected pixel', selectedstar, '\n')

### Create a Level2 association file of the science and background exposures
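
The structure the background step relies on can be sketched with plain dictionaries: each product holds one science member plus every background member, distinguished by `exptype`. This is only an illustration of the layout. The file names are hypothetical, the `asn_type` value is an assumption, and a real association written by `asn_from_list` carries additional fields.

```python
import json

science = ["sci_00001_rate.fits", "sci_00002_rate.fits"]      # hypothetical names
backgrounds = ["bkg_00001_rate.fits", "bkg_00002_rate.fits"]  # hypothetical names

# One product per science exposure; every product lists all background exposures
products = []
for sci in science:
    members = [{"expname": sci, "exptype": "science"}]
    members += [{"expname": bkg, "exptype": "background"} for bkg in backgrounds]
    products.append({"name": sci.replace("_rate.fits", ""), "members": members})

# Assumed top-level layout; real associations include more metadata
asn_sketch = {"asn_type": "image2", "products": products}
print(json.dumps(asn_sketch, indent=2))
```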

In [None]:
# Create an association file of all of the rate files

#asn_files = [scislopelist[0].meta.filename, scislopelist[1].meta.filename, scislopelist[2].meta.filename,
#            scislopelist[3].meta.filename]
#bgr_files = [bkgslopelist[0].meta.filename, bkgslopelist[1].meta.filename, bkgslopelist[2].meta.filename,
#            bkgslopelist[3].meta.filename]
asn_files = scislopelist
bgr_files = bkgslopelist

asn = asn_from_list.asn_from_list(asn_files, rule=DMSLevel2bBase, meta={'program':'test', 'target':'randomfield', 'asn_pool':'test'})

# Add every background exposure to each science product
for product in asn['products']:
    for bgr_file in bgr_files:
        product['members'].append({'expname': bgr_file, 'exptype': 'background'})
    
# write this out to a json file
with open('imager_bkgsubtest_asn.json', 'w') as fp:
    fp.write(asn.dump()[1])

## Run association file through background subtraction step of calwebb_image2

The default value of sigma for the background subtract step is set to 3, but may need to be adjusted downward to 2 or 1 in order to actually sigma clip the sources in the images. Test this for your data. For this particular data set, 2 is sufficient, but for brighter sources, 1 may be the best option.
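
To see why the threshold matters with only four background exposures, the sketch below works through one pixel. The pixel values are hypothetical, and the calculation assumes the clipping measures deviations from the median in units of the standard deviation of all four samples (the default behavior of `astropy.stats.sigma_clip`). Whether the step does exactly this is an assumption here.

```python
import numpy as np

# Hypothetical values of one pixel in four background exposures;
# the last exposure has a source on this pixel
pixel_values = np.array([1000.0, 1002.0, 998.0, 1500.0])

center = np.median(pixel_values)
spread = np.std(pixel_values)
deviation = np.abs(pixel_values - center) / spread  # deviation in units of sigma

print("deviations (sigma):", np.round(deviation, 2))
for threshold in (3, 2):
    rejected = bool(deviation[-1] > threshold)
    print(f"sigma={threshold}: source pixel rejected -> {rejected}")
```

Because the source itself inflates the standard deviation, the source pixel sits only about 2.3 standard deviations from the median when the other three values are nearly equal (4/√3 in the limit of identical values). Under these assumptions a threshold of 3 leaves it in the average, while a threshold of 2 rejects it.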

In [None]:
pipe2 = Image2Pipeline()
# Set pipeline parameters
pipe2.save_results = True
pipe2.bkg_subtract.sigma = 2  # Set this in order to catch the outliers and leave only background
pipe2.bkg_subtract.maxiters = 3
pipe2.bkg_subtract.save_combined_background = True 

pipe2.assign_wcs.skip = True
pipe2.flat_field.skip = True
pipe2.photom.skip = True
pipe2.resample.skip = True
pipe2.save_bsub = True


pipe2.run('imager_bkgsubtest_asn.json')


### Look at averaged background image

See how well the sigma clipping did at removing the sources from the background image. If the sources in the background image are bright, the value of sigma should be set to 1. If the sources are faint enough, the default value of 3 should be good enough. 

Also look at the value of a specific pixel in the averaged image, one that has a source in at least one of the background images, to see if the flux was adequately removed in the sigma clipping process.

In [None]:
# Requires a pipeline build later than 7.7.1
averaged_backgrounds = glob.glob('*combinedbackground.fits')
print(averaged_backgrounds)

avgbkg = ImageModel(averaged_backgrounds[0])
selectedavgstar = avgbkg.data[yval, xval]  # Choose a pixel location that contains a source in at least one background image

plt.figure(figsize=(20,20))
plt.imshow(avgbkg.data, cmap='rainbow', origin='lower', vmin=800,vmax=1200)
plt.colorbar()
plt.show()

print('brightness of selected pixel', selectedavgstar)

Look at the pixel values in the individual background images at the location of one of the stars to see whether the star flux is rejected by the sigma clipping. In calwebb_image2, sigma was set to 2, which is lower than the default value of 3. This allows more pixels to be rejected as outliers, and should leave only the background values in the final averaged image.
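
One way to turn this visual comparison into a yes/no answer is sketched below. The helper `source_was_rejected` and the pixel values are hypothetical. The idea is that a combined value close to the median of the four inputs indicates the source was clipped, while a value close to their straight average indicates it was not.

```python
import numpy as np

def source_was_rejected(pixel_values, combined_value):
    """Return True if the combined value is closer to the median than to the plain mean."""
    values = np.asarray(pixel_values, dtype=float)
    plain_mean = values.mean()
    typical = np.median(values)
    return bool(abs(combined_value - typical) < abs(combined_value - plain_mean))

# Hypothetical values for one pixel in four background exposures
values = [1000.0, 1002.0, 998.0, 1500.0]

# Combined value as expected when the source is clipped
print(source_was_rejected(values, combined_value=1000.0))
# Combined value equal to the straight average of all four exposures
print(source_was_rejected(values, combined_value=1125.0))
```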

In [None]:
### Check averaging of background images

print(bkgslopelist,'\n')
im1 = ImageModel(bkgslopelist[0])
im2 = ImageModel(bkgslopelist[1])
im3 = ImageModel(bkgslopelist[2])
im4 = ImageModel(bkgslopelist[3])

print('Value in image1 ', im1.data[yval, xval])
print('Value in image2 ', im2.data[yval, xval])
print('Value in image3 ', im3.data[yval, xval])
print('Value in image4 ', im4.data[yval, xval],'\n')

avgvalue = (im1.data[yval, xval]+ im2.data[yval, xval] + im3.data[yval, xval] + im4.data[yval, xval])/4
print('Averaged value = ', avgvalue)
print()

print('Brightness of selected pixel in averaged image', selectedavgstar)

### Look at background subtracted data


In [None]:
# Look at background-subtracted images
subtracted_images = glob.glob('*bsub.fits')

for bkgsubimage in subtracted_images:
    bkgsub = ImageModel(bkgsubimage)

    plt.figure(figsize=(20,20))
    plt.imshow(bkgsub.data, cmap='rainbow', origin='lower', vmin=-10,vmax=5)
    plt.colorbar()
    plt.show()
    print('background region values', bkgsub.data[850, 500])
    try:
        np.testing.assert_allclose(bkgsub.data[850, 500], 0.01, atol=0.8)
    except AssertionError:
        print('Subtracted background value is not near zero')

<a id="testing"></a>
# Passing criteria

Examine the images shown and the pixel values reported throughout the notebook. The test passes if the four background images have been averaged together with their sources rejected by sigma clipping, so that the averaged background image shows a smooth background with the sources removed, and if that averaged background has been subtracted from the science images, leaving background values near 0 in the subtracted images.
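
The "near 0" criterion can also be checked over a region rather than a single pixel. The sketch below is one possible way to do that on a synthetic image. The helper `background_near_zero`, the region bounds, and the noise level are hypothetical, and the tolerance of 0.8 is simply taken from the single-pixel check used earlier in the notebook.

```python
import numpy as np

def background_near_zero(image, region, atol=0.8):
    """Check that the median of a source-free region is within atol of zero."""
    y0, y1, x0, x1 = region
    level = float(np.nanmedian(image[y0:y1, x0:x1]))
    return abs(level) <= atol, level

rng = np.random.default_rng(1)
fake_bsub = rng.normal(0.0, 0.5, size=(100, 100))  # synthetic background-subtracted image
fake_bsub[50, 50] += 200.0                          # a source outside the test region

passed, level = background_near_zero(fake_bsub, region=(10, 30, 10, 30))
print(f"median level = {level:.3f}, passed = {passed}")
```

Using the median of a region makes the check less sensitive to a single noisy or bad pixel than reading one pixel value.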

[Top of Page](#title_ID)

### Notebook extension: Look at combined image

Create a combined image with calwebb_image2 and calwebb_image3 to see what the combined, background-subtracted image looks like.

In [None]:
pipe2 = Image2Pipeline()
# Set pipeline parameters
pipe2.save_results = True
pipe2.bkg_subtract.sigma = 2  # Set this in order to catch the outliers and leave only background
pipe2.bkg_subtract.maxiters = 3
pipe2.bkg_subtract.save_combined_background = True 

pipe2.save_bsub = True

pipe2.run('imager_bkgsubtest_asn.json')

In [None]:
# use asn_from_list to create association table

calfiles = glob.glob('*_cal.fits')
asn = asn_from_list.asn_from_list(calfiles, rule=DMS_Level3_Base, product_name='prop2754_bkgsub_combined.fits')

# dump association table to a .json file for use in image3
with open('prop2754_bkgsub_combined.json', 'w') as fp:
    fp.write(asn.dump()[1])

print(asn) 

### Set up options for skymatch
The skymatch step finds and subtracts background sky levels so that the background in the final combined image is near zero. The matching method used here is 'global+match', which subtracts the background sky.

The default method, 'match', does not subtract the background to near zero; in this case it would leave a negative background value after the background subtraction done in calwebb_image2.


In [None]:
# Run Image3 to combine images

# Put in parameters needed to give better source finding results

# set any specific parameters
# tweakreg parameters to allow data to run
fwhm = 7.312  # Gaussian kernel FWHM of objects expected, default=2.5
minobj = 5  # minimum number of objects needed to match positions for a good fit, default=15
snr = 100 # signal to noise threshold, default=5
sigma = 3 # clipping limit, in sigma units, used when performing fit, default=3
fit_geom = 'shift'  # type of affine transformation to be considered when fitting catalogs, default='general'
use2dhist = False  # boolean indicating whether to use 2D histogram to find initial offset, default=True
matchmeth = 'global+match'
matchdown = True
matchsub = False

pipe3 = calwebb_image3.Image3Pipeline()    
pipe3.tweakreg.kernel_fwhm = fwhm
pipe3.tweakreg.snr_threshold = snr
pipe3.tweakreg.minobj = minobj
pipe3.tweakreg.sigma = sigma
pipe3.tweakreg.fitgeometry = fit_geom
pipe3.tweakreg.use2dhist = use2dhist
pipe3.source_catalog.kernel_fwhm = fwhm
pipe3.source_catalog.snr_threshold = snr
pipe3.skymatch.skymethod = matchmeth
pipe3.skymatch.match_down = matchdown
pipe3.skymatch.subtract = matchsub
pipe3.skymatch.save_results = True
pipe3.outlier_detection.save_results = True
pipe3.resample.save_results = True
pipe3.source_catalog.save_results = True
pipe3.save_results = True

pipe3.run('prop2754_bkgsub_combined.json')

In [None]:
# Read in i2d combined Image
im_i2d = ImageModel('prop2754_bkgsub_combined_i2d.fits') 

#### Sky match methods
* local
* global
* match
* global+match
    
The value found by skymatch is the background level calculated for the overall sky value. If `skymatch.subtract` is True, the subtraction is done in the skymatch step. If it is False, the subtraction is done in the resample step.

If the match option is used, the sky values are normalized to either the lowest or highest value (set by match_down) and delta levels are subtracted for each image.
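
The difference between 'match' and 'global+match' can be illustrated with simple arithmetic. This is a simplified picture based on the description above, not the skymatch algorithm itself: the sky levels are hypothetical, and the real step measures sky differences in the regions where images overlap.

```python
import numpy as np

# Hypothetical sky levels measured in three overlapping images (arbitrary units)
sky = np.array([-0.30, -0.10, 0.20])
match_down = True

# 'match': equalize the images by removing each one's offset from the reference level
reference = sky.min() if match_down else sky.max()
delta = sky - reference
sky_after_match = sky - delta

# 'global+match': additionally remove the common level left after matching
sky_after_global_match = sky_after_match - sky_after_match.min()

print("offsets removed by 'match':", delta)
print("sky after 'match':", sky_after_match)
print("sky after 'global+match':", sky_after_global_match)
```

In this picture 'match' leaves all images at the same, still negative, level, while 'global+match' brings that common level to zero.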

In [None]:
# Get the level of background calculated in the sky_match step
print('Skymatch method used :',im_i2d.meta.background.method)
print('Sky level calculated in skymatch step and subtracted from cal images while being combined.')
print(im_i2d.meta.background.level)

In [None]:
plt.figure(figsize=(20,20))
#plt.imshow(viz2(im_i2d.data), origin='lower')
plt.imshow(im_i2d.data, origin='lower', cmap='rainbow', vmin=-1, vmax=5)
plt.colorbar()


<a id="about_ID"></a>
## About this Notebook
**Author:** M Cracraft, Principal Staff Scientist, INS/MIRI branch
<br>**Updated On:** 01/13/23

[Top of Page](#title_ID)
<img style="float: right;" src="./stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="stsci_pri_combo_mark_horizonal_white_bkgd" width="200px"/> 