<a id="top"></a>
# HSTaXe Cookbook: Spectral Extraction for WFC3/IR 

This notebook contains a step-by-step guide for performing spectral extractions with HSTaXe for G102 (or G141) data from WFC3/IR. <br>
The original source for this notebook is the "cookbook" folder on the [spacetelescope/hstaxe](https://github.com/spacetelescope/hstaxe) GitHub repository. 

***
## Learning Goals
In this tutorial, you will:

- Organize input data
- Run custom background subtraction code
- Set up HSTaXe and prepare data for extraction
- Learn how to handle different types of background subtraction
- Extract 1-D spectra with a simple box extraction

## Table of Contents

[1. Introduction](#intro)<br>
[2. Imports](#import)<br>
[3. Setup](#setup)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.1 Run WFC3 Backsub and Calibrate](#cal)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.2 Verify Matching WCS Information](#wcs)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.3 Drizzling Input Data](#drizzle)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.4 Creating a Catalog with SExtractor](#catalog)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[3.5 Copy Catalog and Rename Mag Column](#copycat)<br>
[4. Running HSTaXe](#aXe)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[4.1. Outputs](#out) <br>
[5. Fluxcube Extraction](#fluxcube)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[5.1 Building the Fluxcube](#fcubeprep) <br>
[6. Conclusions](#conclusions)<br>
[7. About this Notebook](#about)<br>
[8. Citations](#cite)<br>

# 1. Introduction <a id="intro"></a>

[HSTaXe](https://hstaxe.readthedocs.io/en/latest/index.html) is a Python package that provides spectral extraction processes for HST data. **Please be aware that running this notebook requires creating a conda environment from the provided `.yaml` file in this notebook's [github repository](https://github.com/spacetelescope/hstaxe/tree/main/cookbooks).** For more details about creating the necessary environment see the notebook's README file.

Below, we show two workflows for spectral extraction using WFC3/IR G102 grism data. The first workflow performs a basic image-by-image box extraction, while the second uses a flux cube technique. **The example data we use in this notebook (from WFC3 CAL program 16582) are available [here](https://stsci.box.com/s/j2ygj4gaqgzmp0b4xcc1h2rszz6cv9wm).** If you would like to use this notebook with the example data, please download the `example_data` subdirectory from the link above, and store it within the same parent directory as this notebook. Once you have the example data directory, this notebook is intended to run continuously without needing to edit any of the cells.

In addition to the example data, **this notebook also requires configuration files for HSTaXe, which can be downloaded [here](https://stsci.app.box.com/folder/191816748622).** The `WFC3_IR_conf` directory should be stored in the same parent directory as this notebook; later we will copy its contents to the `CONF` subdirectory that we create for HSTaXe.

**When analyzing WFC3/IR grism data, it is strongly advised that you calibrate the raw grism FITS files with the program WFC3 Backsub available [here](https://stsci.app.box.com/folder/198794823506).** In order to run this notebook, please download the entire `WFC3_Backsub` folder from the provided link, which includes `back_sub.py` as well as a folder of reference FITS files called `backsub_data`, and put it in the same parent directory as this notebook.

**If you plan to use your own data with this notebook, please be aware you will be required to create an input source catalog with SExtractor.** Information regarding SExtractor, including installation instructions, is available [here](https://sextractor.readthedocs.io/en/latest/Installing.html). In addition to installing SExtractor, you must run the software with aXe-specific configuration files. **These aXe-SExtractor configuration files can be downloaded [here](https://stsci.box.com/s/3npry36gu7ocfnuxgzwr5syj4i8r7hy8).** Once SExtractor is installed, create a `sextractor` directory in the same parent directory as this notebook, and place the configuration files inside it.

# 2. Imports <a id="import"></a>

For this workflow, we will import the following modules:

- *os*, *glob* and *shutil*, for file handling
- *numpy* for array handling
- *astropy.io.fits* for FITS file handling
- *astropy.table.Table* for table functions
- *ginga.util.zscale* for image display scaling
- *matplotlib.pyplot* for plotting
- *stwcs.updatewcs* for matching grism and direct image WCS
- *astrodrizzle* for creating input image mosaics
- *hstaxe.axetasks* for performing the spectral extraction
- *wfc3tools.calwf3* for calibrating the raw direct images

In [None]:
%matplotlib inline

import os
import shutil
import glob
import numpy as np
from astropy.io import fits, ascii
from astropy.table import Table
import matplotlib.pyplot as plt
from ginga.util import zscale
from stwcs import updatewcs
from drizzlepac import astrodrizzle
from hstaxe import axetasks
from wfc3tools import calwf3

# 3. Setup <a id="setup"></a>

We'll start our basic extraction workflow by organizing our input data. A set of example data is available [here](https://stsci.box.com/s/tpbhvrqtbtwod7tr7uijexttoocy4duj) for tutorial purposes. Note that many of the following steps pre-process data that will only be used for the advanced extraction later in the notebook.

First, we save the working directory for this notebook.

In [None]:
cwd = os.getcwd()
print(f'The current directory is: {cwd}')

Next, we'll create directories for our grism and direct images. **HSTaXe modifies input images in-place, so it is crucial to keep clean copies of them in another location and copy them into these directories.** If running this notebook multiple times, run all the lines in the next cell to clear any existing inputs.

In [1]:
os.chdir(cwd)
if os.path.isdir('g102'):
    shutil.rmtree('g102')
if os.path.isdir('f098m'):
    shutil.rmtree('f098m')
if os.path.isdir('f105w'):
    shutil.rmtree('f105w')
os.mkdir('g102')
os.mkdir('f098m')
os.mkdir('f105w')


Now, copy your images to the input directories.

In [None]:
# src = '/path/to/your/grism/images/*raw.fits'
src = 'example_data/g102/*raw.fits'
dst = 'g102/'
for f in glob.glob(src):
    shutil.copy(f, dst)

# src = '/path/to/your/direct/images/*raw.fits'
src = 'example_data/f098m/*raw.fits'
dst = 'f098m/'
for f in glob.glob(src):
    shutil.copy(f, dst)

# src = '/path/to/your/direct/images/*raw.fits'
src = 'example_data/f105w/*raw.fits'
dst = 'f105w/'
for f in glob.glob(src):
    shutil.copy(f, dst)

## 3.1 Run WFC3 Backsub and Calibrate Data <a id="cal"></a>

If you are working with WFC3/IR grism data (G102 and/or G141), we highly advise that you use the [WFC3 Backsub](https://stsci.app.box.com/folder/198794823506) program to process the RAW files into calibrated FLT files (see the [Introduction](#intro) for download instructions). The G102 and G141 background sky signal is both variable and made up of multiple components, and the current version of the WFC3 calibration pipeline, `calwf3`, does not have the capability to model and remove this dispersed 2D background. WFC3 Backsub is designed specifically to assess the level of each of the three components (zodiacal, 1.083 μm HeI emission, and scattered) and remove the signal during calibration. The `back_sub.py` program still relies on `calwf3` for calibration (e.g. bias correction and dark subtraction), but it employs custom reference files (not available in the Calibration Reference Data System (CRDS)) to measure and remove the multiple sky components before the final "up-the-ramp" fitting occurs in `calwf3`.


WFC3 Backsub was originally written by [Dr. Norbert Pirzkal](https://www.stsci.edu/stsci-research/research-directory/nor-pirzkal) (at STScI) for his scientific work with the [Faint Infrared Grism Survey](https://ui.adsabs.harvard.edu/abs/2017ApJ...846...84P/abstract). The code was originally posted on Dr. Pirzkal's personal GitHub repository; we have since updated some of its syntax and procedures to work with the HSTaXe cookbook environment, and it is now hosted on [STScI's Box](https://stsci.app.box.com/folder/193831769414). A description of the three background components and the methods used in WFC3 Backsub can be found in [WFC3 ISR 2020-04](https://www.stsci.edu/files/live/sites/www/files/home/hst/instrumentation/wfc3/documentation/instrument-science-reports-isrs/_documents/2020/WFC3_IR_2020-04.pdf) (Pirzkal & Ryan). For more information on WFC3 IR calibration as well as the IR variable background please see Chapters [3](https://hst-docs.stsci.edu/wfc3dhb/chapter-3-wfc3-data-calibration/3-3-ir-data-calibration-steps) & [7](https://hst-docs.stsci.edu/wfc3dhb/chapter-7-wfc3-ir-sources-of-error/7-10-time-variable-background) of the [WFC3 Data Handbook](https://hst-docs.stsci.edu/wfc3dhb).

The cells below assume you have downloaded the `WFC3_backsub` directory from the provided Box link and have it in the same parent directory as this notebook.<br>
First, we will copy the `back_sub.py` file and the custom reference files over to the `g102` directory.

In [None]:
# src = '/path/to/your/WFC3_backsub/*.py'
src = 'WFC3_backsub/back_sub.py'
dst = 'g102/'
cl1 = os.system(f"cp {src} {dst}")

# src = '/path/to/your/WFC3_backsub/backsub_data/'
src = 'WFC3_backsub/backsub_data/'
dst = 'g102/backsub_data/'
cl2 = os.system(f"cp -R {src} {dst}")

if cl1 != 0 or cl2 !=0:
    print("backsub code and/or reference file data did not copy correctly")

With the `back_sub.py` program, custom reference files, and raw grism FITS files all in the same directory (`g102`), we can call the code with its few command line arguments: `grism`, `ipppss`, and `grey_flat`.

In [None]:
os.chdir(os.path.join(cwd, 'g102'))

# set arguments for command line call
grism = 'G102'
ipppss = 'All'
grey_flat = True

# create command line call and run backsub
cl_input = f"python back_sub.py '*_raw.fits'  --grism={grism} --ipppss={ipppss} --grey_flat={grey_flat}"
cl = os.system(cl_input)
if cl != 0:
    print("Backsub program did not execute properly.")
    print("Be sure `back_sub.py`, `backsub_data` and raw grism files are all in the same directory.")
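The `os.system` call above passes the command through a shell. As an alternative, here is a minimal sketch that builds the same arguments as a list for Python's `subprocess.run`, which avoids shell quoting issues. The helper name `backsub_command` is hypothetical, and the flags simply mirror the call above; check `back_sub.py` itself for the options it accepts.

```python
import subprocess
import sys


def backsub_command(grism='G102', ipppss='All', grey_flat=True,
                    pattern='*_raw.fits', script='back_sub.py'):
    """Build the back_sub.py argument list (hypothetical helper).

    The glob pattern is passed through literally, as in the quoted
    os.system() call above, so back_sub.py receives the pattern string.
    """
    return [sys.executable, script, pattern,
            f'--grism={grism}', f'--ipppss={ipppss}',
            f'--grey_flat={grey_flat}']


# Hypothetical usage from inside the g102/ directory:
# result = subprocess.run(backsub_command(), check=False)
# if result.returncode != 0:
#     print("Backsub program did not execute properly.")
```

Using `sys.executable` rather than a bare `python` helps ensure the script runs with the same conda environment as the notebook.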

You should now have calibrated FLT grism files inside the `g102` directory. With the grism images properly background subtracted and calibrated, we can move on to calibrating the direct image exposures. For this, all we need to do is call `calwf3` on the RAW files after passing them through [CRDS `bestrefs`](https://hst-crds.stsci.edu/static/users_guide/basic_use.html).

Before running CRDS `bestrefs`, we need to set [CRDS environment variables](https://hst-crds.stsci.edu/docs/cmdline_bestrefs/). We will use `CRDS_PATH` to point to a local cache directory called `crds_cache/`, and the `iref` environment variable to point to the WFC3 reference files within that cache; other instruments use their own variables, e.g., `jref` for ACS. You have the option to permanently add these environment variables to your user profile by adding them to your shell's configuration file. If you are using bash, you would edit the `~/.bash_profile` file with lines such as:
```
export CRDS_PATH="$HOME/crds_cache"
export CRDS_SERVER_URL="https://hst-crds.stsci.edu"
export iref="${CRDS_PATH}/references/hst/iref/"
```
If you have already set up the CRDS environment variables you may skip running the cell below. 

In [None]:
os.environ['CRDS_SERVER_URL'] = 'https://hst-crds.stsci.edu'
os.environ['CRDS_SERVER'] = 'https://hst-crds.stsci.edu'
if 'CRDS_PATH' not in os.environ.keys():
    os.environ['CRDS_PATH'] = os.path.join(os.environ['HOME'],'crds_cache')
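if 'iref' not in os.environ.keys():
    # Build the path in Python: a literal '$HOME' inside an environment variable
    # is not expanded, and this also honors a custom CRDS_PATH
    os.environ['iref'] = os.path.join(os.environ['CRDS_PATH'], 'references', 'hst', 'iref', '')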
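WFC3 headers record reference files with an `iref$` prefix (for example in the `DARKFILE` keyword), and `calwf3` substitutes the directory stored in the `iref` environment variable for that prefix. Below is a minimal sketch for checking how such a header value resolves, assuming the `<variable>$<filename>` convention; `resolve_iref` is a hypothetical helper, not part of CRDS or `wfc3tools`.

```python
import os


def resolve_iref(keyword_value, env=None):
    """Expand a header value such as 'iref$<name>_drk.fits' into a path.

    The text before '$' names an environment variable holding a directory.
    """
    env = os.environ if env is None else env
    if '$' not in keyword_value:
        return keyword_value              # plain path or 'N/A'
    prefix, filename = keyword_value.split('$', 1)
    directory = env[prefix]               # KeyError if the variable is unset
    return os.path.join(directory, filename)


# Hypothetical check on one direct image after running bestrefs:
# darkfile = fits.getval('f098m/<rootname>_raw.fits', 'DARKFILE')
# print(os.path.exists(resolve_iref(darkfile)))
```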

Now that the CRDS environment variables are properly set, we can run the direct RAW files through `bestrefs` and then `calwf3`. <br>
If you have never used `bestrefs` before, the next cell may take a few minutes to download necessary reference file mappings.

In [None]:
# Update the reference-file keywords with CRDS, then calibrate each set of direct images
for filt in ['f098m', 'f105w']:
    os.chdir(os.path.join(cwd, filt))

    raws = glob.glob('*raw.fits')
    for file in raws:
        cl = f"crds bestrefs --files {file} --sync-references=1 --update-bestrefs"
        os.system(cl)

    for file in raws:
        calwf3(file)

## 3.2 Verify WCS Information<a id="wcs"></a>

It is possible that the WCS in the direct and grism images differ. In this section we will use a function to process all the direct and grism images to verify that the WCS information is consistent throughout. If there is any disagreement in WCS information, we call `updatewcs` with the `use_db` keyword set to False, which will roll back all the solutions to the original distortion-corrected WCS. For more information regarding HST WCS and improved absolute astrometry please see [WFC3 Instrument Science Report 2022-06](https://ui.adsabs.harvard.edu/abs/2022wfc..rept....6M/abstract) (Mack et al.). For documentation on `updatewcs` please see [here](https://stwcs.readthedocs.io/en/latest/updatewcs.html).

In [None]:
def check_wcs(images):
    """ A helper function to verify the active world coordinate solutions match and roll them back if they do not 
    
    Parameters
    ----------
    images : list 
        a list of grism and direct images 
        
    Returns
    -------
    None
    """
    
    direct_wcs = []
    grism_wcs = []

    for f in images:
        # get filter for image to distinguish between direct and grism
        filt = fits.getval(f, 'FILTER')
        
        # only use the astrometry database if the file has no WCSCORR extension
        with fits.open(f) as hdul:
            db_bool = 'WCSCORR' not in hdul
        
        try:
            # get the active solution from the file's "SCI" extension
            wcsname = fits.getval(f, 'WCSNAME', ext=('SCI', 1))
            if db_bool:
                updatewcs.updatewcs(f, use_db=db_bool)
        except KeyError:
            updatewcs.updatewcs(f, use_db=db_bool)
            wcsname = fits.getval(f, 'WCSNAME', ext=('SCI', 1))
        
        # separate between direct (F...) and grism (G...) images
        if filt.startswith('G'):
            grism_wcs.append(wcsname)
        elif filt.startswith('F'):
            direct_wcs.append(wcsname)

    # get the number of unique active solutions in the direct and grism images       
    num_wcs_direct = len(set(direct_wcs))
    num_wcs_grism = len(set(grism_wcs))

    # roll back WCS on all files if there is more than one active solution for either
    # direct or grism images, or if the direct solution does not match the grism solution
    if num_wcs_direct > 1 or num_wcs_grism > 1 or set(direct_wcs) != set(grism_wcs):
        for file in images:
            updatewcs.updatewcs(file, use_db=False)
        print('WCS reset complete')

    # do nothing if there is one unique active solution and they match
    else:
        print(f"No WCS update needed. All grism and direct images use WCS: {grism_wcs[0]}.")

In [None]:
os.chdir(cwd)
all_images = glob.glob('f098m/i*_flt.fits')+\
             glob.glob('f105w/i*_flt.fits')+\
             glob.glob('g102/i*_flt.fits')
check_wcs(all_images)

## 3.3 Drizzling the Input Data<a id="drizzle"></a>
The next step is to drizzle the grism images. We'll need a list of the image names to feed to AstroDrizzle. After that, we'll do the same for the direct images, but use the drizzled grism image as a reference, which will ensure proper registration between the data. HSTaXe will use these linked drizzle images to locate spectral traces based on the positions of sources in the direct images.

In [None]:
# Create list file using the FLT images in the grism directory
os.chdir(os.path.join(cwd, 'g102'))

with open('g102.lis', 'w') as lis:
    for f in sorted(os.listdir('.')):
        if f.endswith('flt.fits'):
            lis.write(f + '\n')

!cat g102.lis

In [None]:
# Drizzle grism images. If only using one input image, set blot, median, driz_cr to False
astrodrizzle.AstroDrizzle('@g102.lis', output='g102', mdriztab=True, 
                          preserve=False, skysub=False, final_fillval=None)

In [None]:
# List files for direct images
for filt in ['f098m', 'f105w']:
    os.chdir(os.path.join(cwd, filt))
    with open(f'{filt}.lis', 'w') as lis:
        for f in sorted(os.listdir('.')):
            if f.endswith('flt.fits'):
                lis.write(f + '\n')

    # display the contents of each list file
    with open(f'{filt}.lis') as lis:
        print(f'{filt}.lis:\n{lis.read()}')

Next, drizzle the direct images using the drizzled grism mosaic as a reference to ensure proper registration. For more information please see the `AstroDrizzle` documentation [here](https://drizzlepac.readthedocs.io/en/latest/astrodrizzle.html).

In [None]:
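# Use the drizzled grism mosaic as the output frame for both direct filters
ref = '../g102/g102_drz.fits[1]'

os.chdir(os.path.join(cwd, 'f098m'))
astrodrizzle.AstroDrizzle("@f098m.lis", output="f098m", mdriztab=True, 
                          preserve=False, skysub=False, final_fillval=None,
                          final_wcs=True, final_refimage=ref)

os.chdir(os.path.join(cwd, 'f105w'))
astrodrizzle.AstroDrizzle("@f105w.lis", output="f105w", mdriztab=True, 
                          preserve=False, skysub=False, final_fillval=None,
                          final_wcs=True, final_refimage=ref)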

Your grism and direct images should now be aligned. For the WFC3/IR grisms, there should be very little vertical offset between the positions of sources in the direct image and their counterparts in the grism image. We'll perform a quick visual check here:

In [None]:
os.chdir(cwd)
fig, axs = plt.subplots(1, 2, figsize=(15,10), dpi=100)

d = fits.getdata('f098m/f098m_drz.fits', 1)
z1,z2 = zscale.zscale(d)
im1 = axs[0].imshow(d, origin='lower', cmap='Greys_r', vmin=z1, vmax=z2)
axs[0].set_title('f098m_drz.fits')
fig.colorbar(im1, ax=axs[0], shrink=0.7, pad=0.01)

d = fits.getdata('g102/g102_drz.fits', 1)
z1,z2 = zscale.zscale(d)
im2 = axs[1].imshow(d, origin='lower', cmap='Greys_r', vmin=z1, vmax=z2)
axs[1].set_title('g102_drz.fits')
fig.colorbar(im2, ax=axs[1], shrink=0.7, pad=0.01)

plt.tight_layout()

## 3.4 Creating a Catalog with SExtractor <a id="catalog"></a>

This section is intended for anyone using data other than the `example_data` provided for this notebook. Since we provide a catalog in the `example_data` directory and will not formally run SExtractor here, we explain below how to use the drizzled direct image to create a new catalog with SExtractor. **Please refer to the links in the [Introduction](#intro) section for instructions regarding installing SExtractor and downloading the necessary aXe-SExtractor configuration files.**

HSTaXe expects a highly specific catalog format and does not always give clear error messages when something within the catalog is wrong. If creating a catalog yourself, please follow the next steps carefully:

1. Copy the drizzled direct image into the `sextractor` directory (once created), which contains HSTaXe-appropriate configuration files for SExtractor.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<u>Example code:</u>
```python
shutil.copy('f098m/f098m_drz.fits', 'sextractor/direct_drz.fits')
```
2. With SExtractor installed, run the following command from within the `sextractor` directory that you created:
   
   `sex -c aXe.sex direct_drz.fits[1] -DETECT_THRESH 5 -MAG_ZEROPOINT 26.4525`

    Note that the value for the `DETECT_THRESH` keyword, which sets the detection threshold (by default in units of the background RMS), may be changed as appropriate for your data. `MAG_ZEROPOINT` should also be changed to match the magnitude zeropoint of the direct image filter for your data. The table below lists the pivot wavelengths and zeropoints for the WFC3/IR grism reference filters.
    
| Filter | Pivot Wavelength (nm) | Zeropoint (ABmag) |
|:--------|-----------------------|------------------|
| F098M  |         986.4         |      25.666      |
| F105W  |          1055         |      26.264      |
| F140W  |          1392         |      26.450      |
| F160W  |          1537         |      25.936      |

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<u>Example code:</u>
```python
os.chdir('sextractor')
detect_thresh = 10
cl_input = f'sex -c aXe.sex direct_drz.fits[1] -DETECT_THRESH {detect_thresh} -MAG_ZEROPOINT 25.666'
os.system(cl_input)
``` 
3. At this point you should have created a catalog. The next steps are copying the file into the `f098m` directory and renaming the `MAG_ISO` column; see [Section 3.5](#copycat) for details on renaming the column.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<u>Example code:</u>
```python
os.chdir(cwd)
shutil.copy('sextractor/aXe.cat', 'f098m')
```
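To avoid copying zeropoints by hand, the values from the table above can be kept in a small lookup and used to build both the SExtractor command and the `MAG_F####` column name that HSTaXe needs later. This is a minimal sketch: `IR_DIRECT_FILTERS`, `sextractor_command`, and `mag_column` are hypothetical names, and rounding the pivot wavelength to the nearest nm is an assumption that matches the `MAG_F986` name used for F098M in this notebook.

```python
# Pivot wavelengths (nm) and AB zeropoints, copied from the table above
IR_DIRECT_FILTERS = {
    'F098M': (986.4, 25.666),
    'F105W': (1055, 26.264),
    'F140W': (1392, 26.450),
    'F160W': (1537, 25.936),
}


def sextractor_command(image, filt, detect_thresh=5, config='aXe.sex'):
    """Return the SExtractor command line for a drizzled direct image."""
    _, zeropoint = IR_DIRECT_FILTERS[filt.upper()]
    return (f'sex -c {config} {image}[1] '
            f'-DETECT_THRESH {detect_thresh} -MAG_ZEROPOINT {zeropoint}')


def mag_column(filt):
    """Return the MAG_F#### column name HSTaXe expects for an IR filter."""
    pivot_nm, _ = IR_DIRECT_FILTERS[filt.upper()]
    return f'MAG_F{int(round(pivot_nm))}'
```

For the example data, `os.system(sextractor_command('direct_drz.fits', 'F098M', detect_thresh=10))` reproduces the example command above.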

## 3.5 Copy Catalog and Rename Mag Column <a id="copycat"></a>

The catalog corresponding to the data used in this notebook is included in the `example_data` directory downloaded in the Introduction section. Start by copying the catalog into the `f098m` directory.

In [None]:
# Copy the example catalog to the direct image directory:
os.chdir(cwd)
shutil.copy('example_data/aXe.cat', 'f098m');

In [None]:
cat = Table.read('f098m/aXe.cat', format='ascii.sextractor')
cols_to_show = ['NUMBER', 'X_IMAGE', 'Y_IMAGE', 'A_IMAGE', 'B_IMAGE', 'MAG_ISO', 'MAGERR_ISO']
cat[cols_to_show].show_in_notebook()

Examine the catalog. The "MAG_ISO" column must be renamed to "MAG_F####" for the catalog to be read correctly by HSTaXe, where "####" is the pivot wavelength of the direct image filter in nm for WFC3/IR (e.g. 1392 for F140W) or in Å for WFC3/UVIS (e.g. 4971 for F200LP). Please see WFC3 Instrument Handbook Section 7.5 ["IR Spectral Elements"](https://hst-docs.stsci.edu/wfc3ihb/chapter-7-ir-imaging-with-wfc3/7-5-ir-spectral-elements) for information about IR filter pivot wavelengths. You can also refer to the table above as a quick reference for the pivot wavelengths of the recommended direct image filters for the IR grisms.

Any lines in the catalog containing clearly spurious detections, such as those with magnitudes of 99.0, should also be removed. **Note**: Removing spurious detections is not part of this notebook and will need to be done manually.


Lastly, locate the lines containing the sources whose spectra you want to extract and note their IDs in the NUMBER column. These will be used later to identify the BEAM number for your object in the output files.
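If you would rather script the removal of spurious detections than edit the file by hand, here is a minimal sketch. It assumes an ASCII_HEAD-style SExtractor catalog whose header lines look like `#   2 MAG_ISO ...` (1-based column number, then column name); `drop_spurious` is a hypothetical helper, and it keeps header and blank lines untouched.

```python
def drop_spurious(lines, mag_column='MAG_ISO', bad_mag=99.0):
    """Return catalog lines without sources whose magnitude is >= bad_mag.

    lines : list of str, the lines of a SExtractor ASCII_HEAD catalog
    """
    # find the (0-based) index of the magnitude column from the header
    col_index = None
    for line in lines:
        parts = line.split()
        if line.startswith('#') and len(parts) > 2 and parts[2] == mag_column:
            col_index = int(parts[1]) - 1
            break
    if col_index is None:
        raise ValueError(f'{mag_column} not found in catalog header')

    kept = []
    for line in lines:
        if line.startswith('#') or not line.strip():
            kept.append(line)                    # header or blank line
        elif float(line.split()[col_index]) < bad_mag:
            kept.append(line)                    # plausible detection
    return kept


# Hypothetical usage (write to a new file so the original is preserved):
# with open('your_catalog.cat') as f:
#     cleaned = drop_spurious(f.readlines())
# with open('your_catalog_clean.cat', 'w') as f:
#     f.writelines(cleaned)
```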

To avoid having to edit the column information manually in the SExtractor catalog, there is a helper function below called `edit_catalog_pivot`. It takes the SExtractor catalog, the output file path/name, and the pivot wavelength, renames the `MAG_ISO` column, and writes the result to the output file.

In [None]:
def edit_catalog_pivot(inputfile, outputfile, pivot_wave):
    """ Function to edit the auto-generated sextractor header/column name so aXe will run
        
        Parameters
        ----------
        inputfile : str
            The full path to the input catalog including filename
        outputfile : str
            The full path to the output catalog including filename
        pivot_wave : int or str
            The pivot wavelength of the filter used in the direct image.
            For UVIS use Angstroms (e.g. 4971 for F200LP); for IR use
            nanometers (e.g. 986 for F098M, 1392 for F140W)
            
        Return
        ------
        Nothing. But a file is written to `outputfile`
    """
    # Read in the input catalog
    with open(inputfile, 'r') as f:
        lines = f.readlines()

    with open(outputfile, 'w') as f:
        # Find the mag_iso row and replace with pivot wavelength
        for line in lines:
            line = line.replace('MAG_ISO', f'MAG_F{pivot_wave}')
            f.write(line)

In [None]:
inputfile = 'f098m/aXe.cat'
outputfile = 'f098m/aXe_ir_f098m.cat'
pivot_wave = 986 # nm
edit_catalog_pivot(inputfile, outputfile, pivot_wave)

In [None]:
# Use this cell to filter your catalog for your sources
# In this example, we sort to identify the brightest sources
cat = Table.read(outputfile, format='ascii.sextractor')
cat.sort(f'MAG_F{pivot_wave}')
cols_to_show = ['NUMBER', 'X_IMAGE', 'Y_IMAGE', 'A_IMAGE', 'B_IMAGE', 'THETA_IMAGE', 'FLUX_RADIUS', f'MAG_F{pivot_wave}', 'MAGERR_ISO']
cat[cols_to_show].show_in_notebook()

# 4. Running HSTaXe<a id="aXe"></a>

With the catalog generated and edited, we can now move on to working with HSTaXe. We'll set up a few additional directories and environment variables that point to them, while clearing out any previous data or outputs in these directories. We'll also copy our data into the fresh `DATA` directory.

In [None]:
os.chdir(cwd)

for dirr in ['DATA','CONF','OUTPUT', 'DRIZZLE']:
    if os.path.isdir(dirr):
        shutil.rmtree(dirr)
    os.mkdir(dirr)
    
os.environ['AXE_IMAGE_PATH'] = './DATA/' 
os.environ['AXE_CONFIG_PATH'] = './CONF/'
os.environ['AXE_OUTPUT_PATH'] = './OUTPUT/'
os.environ['AXE_DRIZZLE_PATH'] = './DRIZZLE/'

dsrc = 'f098m/*flt.fits'
gsrc = 'g102/*flt.fits'
csrc = 'WFC3_IR_conf/*'

# Direct and grism FLT images go to DATA; configuration files go to CONF
for files, dirr in [(dsrc, 'DATA'), (gsrc, 'DATA'), (csrc, 'CONF')]:
    for f in glob.glob(files):
        shutil.copy(f, dirr)
    

Next, we define the field-of-view boundaries for the detector. We'll pass this information to the `iolprep` task so that the input object lists (IOLs) it generates include objects whose direct image positions fall outside the chip but whose spectral traces fall onto it.

For WFC3/IR, the [left, right, top, bottom] extensions, in pixels, are [183, 85, 50, 50].

In [None]:
FOV = '183,85,50,50'
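As a conceptual illustration of what these margins mean (not HSTaXe's internal code), the sketch below tests whether a catalog position falls within the 1014×1014-pixel WFC3/IR science array extended by the margins above. `in_extended_fov` is a hypothetical helper, and it assumes 1-based pixel coordinates like SExtractor's `X_IMAGE`/`Y_IMAGE`.

```python
def in_extended_fov(x, y, nx=1014, ny=1014, left=183, right=85,
                    bottom=50, top=50):
    """True if a direct-image position lies within the detector footprint
    extended by the given margins (all values in pixels, 1-based coords)."""
    return (1 - left <= x <= nx + right) and (1 - bottom <= y <= ny + top)
```

For example, a source at x = -150 lies off the left edge of the chip but inside the 183-pixel margin, so it would still be considered, while one at x = -200 would not.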

Now we'll run `iolprep` to generate our IOLs, which are object catalogs for each individual direct image, from the drizzled direct image and its catalog.

In [None]:
os.chdir(cwd)
os.chdir('f098m')

axetasks.iolprep(drizzle_image = 'f098m_drz.fits',
                 input_cat = 'aXe_ir_f098m.cat',
                 dimension_in = FOV)

In [None]:
# Copy the IOLs to the aXe DATA directory
os.chdir(cwd)
for f in glob.glob('f098m/*_?.cat'):
    shutil.copy(f, 'DATA')

The last step before extracting our spectra is to generate a file which contains on each line the names of a grism image, IOL name, and associated direct image. This is best done manually, to ensure that each grism image lines up with the appropriate direct image. For the example data, a file called `example.lis` is provided.

With this list, we run `axeprep`, which prepares the individual images for spectral extraction. This step is also responsible for performing a global background subtraction, if desired, which is controlled by the `backgr` keyword.

Note that the `configs` and `backims` keywords should be matched to the data you have. E.g., if you are working with G102 data with direct images in F098M, the config file should be `G102.F098M.V4.32.conf`. If you are not performing a global background subtraction, `backims` is not required, but should be matched to the correct grism if you are. **Note: we do not currently recommend using HSTaXe to perform global background subtraction. A more reliable method will be added to this notebook in the near future.**

In [None]:
# copy the example list file
os.chdir(cwd)
shutil.copy('example_data/example.lis', '.')
os.rename('example.lis', 'aXe.lis')
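Because HSTaXe does not always give clear error messages, it can help to confirm that every file named in `aXe.lis` is actually present in the `DATA` directory before running `axeprep`. This is a minimal sketch; `missing_axe_inputs` is a hypothetical helper that assumes whitespace-separated names on each line (and, as an assumption, tolerates comma-separated lists within a field).

```python
import os


def missing_axe_inputs(list_path, data_dir='DATA'):
    """Return the file names in an aXe image list that are absent from data_dir."""
    missing = []
    with open(list_path) as f:
        for line in f:
            if line.startswith('#'):
                continue                      # skip comment lines
            for field in line.split():
                for name in field.split(','):
                    if name and not os.path.exists(os.path.join(data_dir, name)):
                        missing.append(name)
    return missing
```

Running `print(missing_axe_inputs('aXe.lis', 'DATA'))` should print an empty list when the DATA directory is complete.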

In [None]:
os.chdir(cwd)

axetasks.axeprep(inlist='aXe.lis',
                 configs='G102.F098M.V4.32.conf',
                 backgr=False,
                 norm=False,
                 mfwhm=3.0)

The last HSTaXe task to run is `axecore`, which performs the actual extraction and generates output files. Again, the configuration file and sky background arguments should be matched to the spectral elements used for your data.

Local background subtraction is also performed by this step, if desired. The following keywords are critical for local background:

* `back`: This argument is the flag to trigger local background subtraction.
* `np`: Defines the number of pixels on either side of the spectral trace (beam) used to calculate the local background from.
* `interp`: Sets the interpolation method for the local background (-1=median, 0=mean, ≥1=nth order polynomial)
* `backfwhm`: The FWHM specifying the width of the background pixel extraction table
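To make the `interp` options concrete, the sketch below applies each rule to the background pixels of a single spectral column and estimates the sky at the trace centre. This is a conceptual illustration under simplifying assumptions, not the code HSTaXe runs internally; `estimate_background` is a hypothetical helper.

```python
import numpy as np


def estimate_background(offsets, values, interp):
    """Estimate the sky at the trace centre (offset 0) from background pixels.

    offsets : cross-dispersion distances (pixels) of the background pixels
    values  : their count rates
    interp  : -1 -> median, 0 -> mean, n >= 1 -> polynomial of order n
    """
    offsets = np.asarray(offsets, dtype=float)
    values = np.asarray(values, dtype=float)
    if interp == -1:
        return float(np.median(values))
    if interp == 0:
        return float(np.mean(values))
    if interp >= 1:
        # fit a polynomial across the trace and evaluate it at the centre
        coeffs = np.polyfit(offsets, values, deg=interp)
        return float(np.polyval(coeffs, 0.0))
    raise ValueError('interp must be -1, 0, or a positive integer')
```

For example, with `offsets=[-6, -5, 4, 8]` and a linear sky `values=[7, 7.5, 12, 14]`, the median gives 9.75 and the mean 10.125, while `interp=1` recovers about 10.0 at the trace centre: with a gradient and unevenly placed background pixels, a low-order fit can be less biased.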

More information on background handling, both global and local, with `HSTaXe` can be found in the documentation [here](https://hstaxe.readthedocs.io/en/latest/hstaxe/description.html#sky-background).

In addition to local background subtraction, HSTaXe is also able to perform a vertical extraction. This method of extraction requires editing the `THETA_IMAGE` column to `-90.0` in the object catalog (aXe.cat) and setting two keywords in `axecore`: `orient=True` and `slitless_geom=False`. Vertical extractions have been used in the past to handle the extreme curvature of the orders. For more information on the vertical extraction method please see [WFC3 ISR 2011-18](https://ui.adsabs.harvard.edu/abs/2011wfc..rept...18R/abstract) (Rothberg et al. 2011).
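A minimal sketch of the catalog edit needed for a vertical extraction, assuming an ASCII_HEAD-style SExtractor catalog whose header lines look like `#   6 THETA_IMAGE ...`. `set_theta` is a hypothetical helper; it rewrites data rows with single-space separators, which assumes HSTaXe only needs whitespace-delimited columns. Write the result to a new file so the original catalog stays intact.

```python
def set_theta(lines, value=-90.0, column='THETA_IMAGE'):
    """Return catalog lines with every source's THETA_IMAGE set to value."""
    col_index = None
    for line in lines:
        parts = line.split()
        if line.startswith('#') and len(parts) > 2 and parts[2] == column:
            col_index = int(parts[1]) - 1     # header numbering is 1-based
            break
    if col_index is None:
        raise ValueError(f'{column} not found in catalog header')

    out = []
    for line in lines:
        if line.startswith('#') or not line.strip():
            out.append(line)                  # keep header and blank lines
            continue
        fields = line.split()
        fields[col_index] = f'{value:.1f}'
        out.append(' '.join(fields) + '\n')
    return out
```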

In [None]:
axetasks.axecore('aXe.lis',
                 'G102.F098M.V4.32.conf',
                 fconfterm=None,
                 extrfwhm=4.,
                 drzfwhm=3.,
                 orient=False,
                 back=False,
                 weights=True,
                 slitless_geom=True,
                 cont_model='gauss',
                 sampling='drizzle',
                 exclude=True)

## 4.1. Outputs<a id="out"></a>

Each grism input file will have several corresponding output files. For each G102 or G141 input FLT file, HSTaXe will create the following in the `OUTPUT/` directory:

- \<ipppssoot>_flt_2.cat          : Object catalog for the FLT file \<ipppssoot>_flt.fits<br>
- \<ipppssoot>_flt_2.OAF          : Aperture file<br>
- \<ipppssoot>_flt_2.PET.fits     : The Pixel Extraction Table, containing all the unbinned information about each spectrum<br>
- \<ipppssoot>_flt_2.STP.fits     : 2D "stamp" images of the extracted spectral traces<br>
- \<ipppssoot>_flt_2.SPC.fits     : 1D extracted spectra<br>
- \<ipppssoot>_flt_2.CONT.fits    : Contamination estimate for each of the spectra<br>
- \<ipppssoot>_flt_2_opt.SPC.fits : Optimally extracted version of the 1D spectra

For now, let's take a look at the STP files, which contain 2D "stamps" of the extracted spectral traces; and the SPC files, which contain our 1D extracted spectra.

We'll need the source IDs (the NUMBER column) from the original source catalog to identify the BEAM number for the object whose spectrum we want. For the example data, the target is the bright planetary nebula in the middle-left of the drizzled direct image. Its number in the example catalog is 19.
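Since the BEAM extension names are keyed on the catalog NUMBER, it can help to list which BEAM extensions an output file actually contains before picking one. A minimal sketch, assuming the `BEAM_<id><order>` naming used in the cells below (e.g. `BEAM_19A`); `beam_names` is a hypothetical helper.

```python
from astropy.io import fits


def beam_names(hdul):
    """Return the names of the BEAM_* extensions in an HSTaXe STP/SPC file."""
    return [hdu.name for hdu in hdul if hdu.name.startswith('BEAM_')]


# Example on the extraction outputs:
# with fits.open(sorted(glob.glob('OUTPUT/*2.SPC.fits'))[0]) as hdul:
#     print(beam_names(hdul))
```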

First, let's examine the stamps from the STP files.

In [None]:
beam = '19'

fig, axes = plt.subplots(3, 1, figsize=(10,6))

for i, f in enumerate(glob.glob('OUTPUT/*STP.fits')):
    
    with fits.open(f) as hdul:
        d = hdul[f'BEAM_{beam}A'].data
    z1,z2 = zscale.zscale(d)
    im = axes[i].imshow(d, origin='lower', vmin=z1, vmax=z2)
    fig.colorbar(im,ax=axes[i],shrink=0.5, pad=0.01, aspect=6)
    axes[i].set_title(os.path.basename(f))
fig.tight_layout()

And now, we can finally look at our extracted spectra from the SPC files. For the example data, we've plotted the expected location of several emission lines that should be present in the planetary nebula spectrum (from Table 2 in [Bohlin et al. 2015](https://www.stsci.edu/files/live/sites/www/files/home/hst/instrumentation/wfc3/documentation/instrument-science-reports-isrs/_documents/2015/WFC3-2015-10.pdf)).

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 6), dpi=120)
ax.grid(alpha=0.5)

for f in sorted(glob.glob('OUTPUT/*2.SPC.fits')):  # skips the *_opt.SPC.fits files
    with fits.open(f) as hdul:
        d = hdul[f'BEAM_{beam}A'].data  # first-order spectrum of our target
        h = hdul[0].header
    wl = d['LAMBDA']
    flux = d['FLUX']
    error = d['FERROR']
    in_range = (wl > 8000) & (wl < 11500)  # keep the G102 wavelength range

    ax.errorbar(wl[in_range], flux[in_range], error[in_range], label=os.path.basename(f))

# Expected emission lines (Table 2 of Bohlin et al. 2015)
for line_wl in [9070.0,    # [S III]
                9535.1,    # [S III] + P epsilon
                10060.2,   # H I P7
                10833.5]:  # He I
    ax.axvline(line_wl, ls=':', c='k')

ax.set_ylabel(r'Flux (erg cm$^{-2}$ s$^{-1}$ $\AA^{-1}$)')
ax.set_title(f"{h['targname']}", size=13)
ax.set_xlabel(r'Wavelength ($\AA$)')
ax.legend(loc='upper left')
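Each exposure above yields its own spectrum. A common way to merge such per-exposure 1D spectra is an inverse-variance weighted mean on a shared wavelength grid. The sketch below shows that technique for arrays such as `d['LAMBDA']`, `d['FLUX']` and `d['FERROR']`; it is not what `aXedrizzle` does (that combines the 2D spectra before extraction), the function name is hypothetical, and linearly interpolating the errors ignores pixel-to-pixel correlations.

```python
import numpy as np


def combine_spectra(spectra, grid):
    """Inverse-variance weighted mean of several 1D spectra on a common grid.

    spectra: iterable of (wavelength, flux, error) arrays, wavelength ascending.
    Each spectrum is linearly interpolated onto `grid`; grid points outside a
    spectrum's coverage, with non-positive error, or with non-finite flux are skipped.
    Returns (mean, sigma); points with no valid input are NaN / inf.
    """
    grid = np.asarray(grid, dtype=float)
    num = np.zeros_like(grid)
    den = np.zeros_like(grid)
    for wl, flux, err in spectra:
        wl, flux, err = (np.asarray(a, dtype=float) for a in (wl, flux, err))
        f = np.interp(grid, wl, flux)
        e = np.interp(grid, wl, err)
        ok = (grid >= wl.min()) & (grid <= wl.max()) & (e > 0) & np.isfinite(f)
        num[ok] += f[ok] / e[ok] ** 2
        den[ok] += 1.0 / e[ok] ** 2
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = num / den
        sigma = 1.0 / np.sqrt(den)
    return mean, sigma


# Toy example: two spectra with equal errors average point by point
wl = np.array([9000.0, 9100.0, 9200.0])
a = (wl, np.array([1.0, 2.0, 3.0]), np.array([0.1, 0.1, 0.1]))
b = (wl, np.array([3.0, 4.0, 5.0]), np.array([0.1, 0.1, 0.1]))
mean, sigma = combine_spectra([a, b], wl)
```

Weighting by the inverse variance lets the exposures with smaller errors dominate the combined flux, and the combined error shrinks as more exposures contribute.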


# 5. Fluxcube Extraction <a id="fluxcube"></a>

This more advanced extraction method produces more accurate contamination estimates and a weighted combination of the individual exposures. We achieve this with the `aXedrizzle` functionality in HSTaXe, which combines spectra taken at the same orientation. We also use the segmentation map generated by SExtractor to determine the shapes of the objects in our images.
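The idea behind a fluxcube is that the direct images in several filters give each pixel a crude spectral energy distribution, which can then be used to model how much light every object spreads into its neighbors' spectra. The sketch below illustrates that idea for a single pixel under stated assumptions: count rates are converted to f_λ through each filter's AB zeropoint and pivot wavelength, and f_λ is interpolated linearly between pivots and held constant outside them (the behavior of `np.interp`). HSTaXe's own treatment may differ, and the helper names and numbers are placeholders.

```python
import numpy as np

C_ANGSTROM_PER_S = 2.99792458e18  # speed of light in Angstroms/s


def countrate_to_flam(countrate, ab_zeropoint, pivot_angstrom):
    """Convert a count rate (e-/s) to f_lambda (erg/cm^2/s/A) via an AB zeropoint."""
    f_nu = countrate * 10 ** (-0.4 * (ab_zeropoint + 48.60))
    return f_nu * C_ANGSTROM_PER_S / pivot_angstrom ** 2


def pixel_sed(wavelengths, pivots, countrates, zeropoints):
    """Linearly interpolate one pixel's f_lambda between filter pivot wavelengths."""
    piv = np.asarray(pivots, dtype=float)
    flam = countrate_to_flam(np.asarray(countrates, dtype=float),
                             np.asarray(zeropoints, dtype=float), piv)
    order = np.argsort(piv)  # np.interp needs increasing x values
    return np.interp(wavelengths, piv[order], flam[order])


# Placeholder values for illustration; the real ones come from cube.lis
pivots = [9864.0, 10552.0]   # Angstroms
zeropoints = [25.7, 26.3]    # AB mag of a 1 e-/s source
countrates = [12.0, 15.0]    # one pixel's e-/s in each direct image
sed = pixel_sed([9000.0, 10200.0, 11000.0], pivots, countrates, zeropoints)
```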

The following steps use the example data from the basic extraction above, plus some additional images taken with a different direct imaging filter. We already did much of the required pre-processing for these images earlier, so we don't need to repeat those steps here. Now, we'll create a directory in which to build our fluxcube and copy into it the drizzled direct and grism images we made in [Section 3.3](#drizzle). We'll also copy the individual G102 grism FLT images and the segmentation map output by SExtractor.

In [None]:
os.chdir(cwd)

# Start from a clean fluxcube working directory
dst = 'flx'
if os.path.isdir(dst):
    shutil.rmtree(dst)
os.mkdir(dst)

# Drizzled grism and direct images from Section 3.3
for d in ['g102', 'f098m', 'f105w']:
    shutil.copy(f'{d}/{d}_drz.fits', dst)

# Individual G102 grism FLT exposures
src = 'g102'
for f in glob.glob(f'{src}/*_flt.fits'):
    shutil.copy(f, dst)

# SExtractor segmentation map
src = 'example_data'
shutil.copy(f'{src}/seg.fits', dst);
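Before building the fluxcube, it can save time to confirm that every input actually landed in the working directory. The sketch below is a hypothetical helper (not part of HSTaXe) that reports which glob patterns match nothing; the patterns are taken from the copy step above.

```python
import glob
import os


def missing_inputs(directory, patterns):
    """Return the glob patterns that match no file inside `directory`."""
    return [p for p in patterns if not glob.glob(os.path.join(directory, p))]


# File names/patterns copied into the fluxcube directory in the step above
REQUIRED = ["g102_drz.fits", "f098m_drz.fits", "f105w_drz.fits", "*_flt.fits", "seg.fits"]
```

Running `missing_inputs('flx', REQUIRED)` from the notebook's parent directory should return an empty list if all the copies succeeded.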

Next, we'll create a file called `cube.lis` that lists the drizzled direct images we're using, along with the pivot wavelength (in nm) and AB magnitude zeropoint of each direct filter.
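The zeropoint written to `cube.lis` is the AB magnitude of a source that produces 1 e⁻/s. Such a source has $f_\lambda = \mathrm{PHOTFLAM}$ at the pivot wavelength, so using $f_\nu = \lambda^2 f_\lambda / c$ and $m_{\mathrm{AB}} = -2.5\log_{10} f_\nu - 48.60$ (with $c$ in Å/s because PHOTPLAM is in Å):

$$
\mathrm{ZP}_{\mathrm{AB}} = -2.5\log_{10}\!\left(\frac{\mathrm{PHOTFLAM}\,\mathrm{PHOTPLAM}^{2}}{c}\right) - 48.60,
\qquad
\lambda_{\mathrm{pivot}}\,[\mathrm{nm}] = \frac{\mathrm{PHOTPLAM}\,[\mathrm{\AA}]}{10}
$$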

In [None]:
os.chdir('flx')
direct_ims = sorted(glob.glob('f*drz.fits'))  # the F098M and F105W drizzled images

c_angstrom = 2.99792458e18  # speed of light in Angstroms/s

with open('cube.lis', 'w') as lis:
    for f in direct_ims:
        photplam = fits.getval(f, 'PHOTPLAM', ext=0)  # pivot wavelength in Angstroms
        photflam = fits.getval(f, 'PHOTFLAM', ext=0)  # inverse sensitivity in erg/cm^2/s/Angstrom per e-/s
        # f_nu = lambda**2 / c * f_lambda, so the AB magnitude of a 1 e-/s source is:
        ab_zeropoint = -48.60 - 2.5*np.log10(photflam * photplam**2 / c_angstrom)
        # cube.lis entry: image name, pivot wavelength in nm, AB zeropoint
        lis.write(f'{f} {photplam/10} {ab_zeropoint}\n')

!cat cube.lis
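If you want to build or check `cube.lis` entries outside the notebook (for example, to compare zeropoints against published values), the same arithmetic can be wrapped in small functions. This is a hypothetical, standard-library-only sketch of the calculation performed in the cell above, plus a reader for the three-column format it writes.

```python
import math

C_ANGSTROM_PER_S = 2.99792458e18  # speed of light in Angstroms/s


def ab_zeropoint(photflam, photplam):
    """AB magnitude of a 1 e-/s source (PHOTFLAM in erg/cm^2/s/A per e-/s, PHOTPLAM in A)."""
    return -48.60 - 2.5 * math.log10(photflam * photplam ** 2 / C_ANGSTROM_PER_S)


def cube_lis_line(filename, photflam, photplam):
    """One cube.lis entry: image name, pivot wavelength in nm, AB zeropoint."""
    return f"{filename} {photplam / 10} {ab_zeropoint(photflam, photplam)}"


def parse_cube_lis(text):
    """Read cube.lis contents back into (filename, pivot_nm, zeropoint) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            name, pivot_nm, zp = line.split()
            entries.append((name, float(pivot_nm), float(zp)))
    return entries
```

For instance, `parse_cube_lis(open('cube.lis').read())` returns one tuple per direct image, which makes it easy to sanity-check that the pivot wavelengths fall near 1 µm and the zeropoints are reasonable.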

# 6. Conclusions <a id="conclusions"></a>

Thank you for walking through this notebook. You should now be able to perform extractions on WFC3/IR spectral data using HSTaXe.

For additional information on the WFC3 grisms, please visit the [grism resources](https://www.stsci.edu/hst/instrumentation/wfc3/documentation/grism-resources) and [grism data analysis](https://www.stsci.edu/hst/instrumentation/wfc3/documentation/grism-resources/grism-data-analysis) webpages.

Cookbooks walking through extraction methods for WFC3/UVIS and ACS/WFC are available on the [HSTaXe GitHub](https://github.com/spacetelescope/HSTaXe). For detailed information on HSTaXe, please visit the [documentation webpage](https://hstaxe.readthedocs.io/en/latest/index.html). Lastly, if you have questions regarding this notebook or using WFC3 data with HSTaXe please contact our WFC3 [Help Desk](https://stsci.service-now.com/hst).


**Congratulations, you have completed the notebook.**

# 7. About this Notebook <a id="about"></a>

**Authors:** Aidan Pidgeon and Benjamin Kuhn, WFC3 Instrument Team

**Special Thanks to:** 
 - Dr. Nor Pirzkal, for creating the original workflow that was adapted into this notebook
 - Ricky O'Steen and Duy Nguyen, for their fantastic work in updating the HSTaXe module
 - Debopam Som, for support in testing the HSTaXe workflow

**Released:** 2023-01-06 <br>
**Last Updated:** 2023-03-14

# 8. Citations <a id="cite"></a>

If you use `astropy`, `drizzlepac`, `matplotlib` or `numpy` for published research, please cite the
authors. Follow these links for more information about citing each library:

* [Citing `astropy`](https://www.astropy.org/acknowledging.html)
* [Citing `drizzlepac`](https://drizzlepac.readthedocs.io/en/latest/LICENSE.html)
* [Citing `matplotlib`](https://matplotlib.org/stable/users/project/citing.html)
* [Citing `numpy`](https://numpy.org/citing-numpy/)

[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/> 