![STSCI banner](https://github.com/STScI-MIRI/MRS-ExampleNB/raw/main/assets/banner1.png)

# Introduction to TSO data products & the JWST TSO pipeline

Author: Sarah Kendrew, Instrument & Calibration Scientist, ESA/STScI MIRI Branch <br>
Last Updated: 28 Nov 2021<br>
Pipeline version: 1.3.3<br>

### Table of contents

1. [Introduction](#intro)<br>
   1.1 [Purpose of this Notebook](#purpose)<br>
   1.2 [Input Simulations](#inputs)<br>
   1.3 [Caveats for Simulated Data](#mirisim)<br>
2. [Setup](#setup)<br>
3. [Retrieving and Inspecting the Uncalibrated Data](#firstlook)<br>
    3.1 [Introduction to JWST datamodels](#datamodels)<br>
    3.2 [Inspecting datamodel metadata](#metadata)<br>
4. [Intro to running the pipeline](#det1)<br>
    4.1 [Running the pipeline: Basics](#pipe_basics)<br>
    4.2 [Retrieving reference files](#crds)<br>
    4.3 [Making changes to the pipeline steps](#pipe_changes)<br>
5. [Progressing Further: Spec2Pipeline()](#spec2pipe)<br>
    5.1 [Introduction to the Stage 2 Pipeline](#spec2_intro)<br>
    5.2 [Running the pipeline step by step](#stepbystep)<br>
6. [Final Steps: Tso3Pipeline()](#tso3pipe)<br>
    6.1 [Association files](#asn)<br>
    6.2 [Running the Tso3 Pipeline](#tso3)<br>
7. [Conclusion](#bye)<br>

# 1.<font color='white'>-</font>Introduction <a class="anchor" id="intro"></a>

## 1.1<font color='white'>-</font>Purpose of this notebook <a class="anchor" id="purpose"></a>

In this notebook we provide a realistic example of running the JWST pipeline on a JWST Time Series Observation (TSO). For the purposes of this tutorial we will work with a simulated observation with the MIRI Low Resolution Spectrometer (LRS). In particular, we focus on aspects of the pipeline that differ from "standard" algorithms and procedures. There will not be enough time to look at every step in detail, but we will demonstrate how to make changes to the pipeline settings (and why you might want to do that) for the best scientific utility. 

Note that the notebook uses JWST Calibration Pipeline version 1.3.3, which is the current version at the time of this JWebbinar. The pipeline will however be further developed and updated post-launch. 

We will start with a simple simulated MIRI LRS observation, created using MIRISim version 2.4.1, which is compatible with pipeline version 1.3.3 (https://wiki.miricle.org/Public/MIRISim_Public). The data are described in more detail below. 


## 1.2<font color='white'>-</font>Input Simulations <a class="anchor" id="inputs"></a>

We used the MIRISim software package (v2.4.1) to generate realistic simulations of a MIRI LRS slitless observation of a simple stellar-type point source. The stellar SED was modelled as a simple black body spectrum with the following parameters:

* Temperature = 6230 K
* Normalised to K = 8.99, or flux of 20 mJy at 2 $\mu$m. 

LRS slitless observations are carried out in the SLITLESSPRISM subarray of the MIRI Imaging detector. The subarray has 416 rows x 72 columns (the left-most 4 columns are reference pixels, i.e. not illuminated), with a sampling of 0.11 arcsec/pix. The single-frame read time for this subarray is 0.159 seconds, and the FASTR1 read mode has an extra reset between integrations. We perform an observation of 100 groups and 10 integrations in a single exposure, giving an exposure time of:

t$_{exp}$ = ((100 + 1) $\times$ 0.159) $\times$ 10 = 160.59 s $\approx$ 2.68 minutes
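
The same arithmetic can be reproduced in a few lines of Python, which makes it easy to try other group and integration counts. This is a minimal sketch: the function name and the `extra_resets` argument are illustrative and not part of the pipeline or of MIRISim.

```python
def exposure_time(ngroups, nints, frame_time, extra_resets=1):
    """Return the exposure time in seconds.

    Each integration is counted as (ngroups + extra_resets) frame times,
    following the formula quoted above for the FASTR1 read mode.
    """
    return (ngroups + extra_resets) * frame_time * nints


# Values for the simulated SLITLESSPRISM exposure used in this notebook
t_exp = exposure_time(ngroups=100, nints=10, frame_time=0.159)
print(f"{t_exp:.2f} s = {t_exp / 60:.2f} min")
```

Setting `extra_resets=0` gives the time without the additional reset between integrations.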

"Real" TSOs will typically have many more integrations than 10, with exposures covering many hours. But for the sake of reducing processing time, we use a shorter exposure here. 



## 1.3<font color='white'>-</font>Caveats for Simulated Data <a class="anchor" id="mirisim"></a>

As noted above, in this notebook we will be processing simulated data created with the MIRISim tool. Like the pipeline, MIRISim is an evolving piece of software, and there are multiple known issues that can cause problems.

**General MIRISim caveats**

- Detector noise properties are not modelled in a fully physically realistic way. It's not recommended to use these simulations "out of the box" for detailed noise investigations. 

- Reference pixels are not treated consistently. The refpix step of the Detector1 pipeline must therefore be turned off to process MIRISim data without artifacts.

- The default detector read mode is FASTR1, which contains a reset between integrations. MIRISim still uses the FAST read mode, which does not include this extra reset. The exposure time recorded in the simulated FITS header is therefore missing this additional reset time. 

**MIRI TSO-specific caveats**

- There is no "TSO" flag in MIRISim; the software does not set the header keyword that the pipeline looks for to recognise whether an exposure is a TSO. We will set this manually in this notebook.

- MIRISim is not able to insert a time-variable signal into a simulated observation. The source flux is assumed to be constant. Additional tools have been developed in the MIRI consortium for this purpose, but for the aims of this tutorial we do not consider this. 

- The MIRI LRS prism has a leak in its transmission that causes some 3-4 $\mu$m flux to contaminate the spectrum around 6-7 $\mu$m. This is not included in the MIRISim models. 




<hr style="border:2px solid gray"> </hr>

## 2.<font color='white'>-</font>Setup <a class="anchor" id="setup"></a>

<div class="alert alert-block alert-warning">
In this section we set up a number of things that are necessary for the pipeline to run successfully:

1. import the necessary python packages
2. specify the directory structure

</div>

First the imports.


In [None]:
# Need to set these environment variables for this notebook to work properly:
%set_env CRDS_PATH $HOME/crds_cache
%set_env CRDS_SERVER_URL https://jwst-crds.stsci.edu

In [None]:
import os
import glob
import shutil

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Rectangle
import astropy.io.fits as fits

from jwst import datamodels
from jwst.pipeline import calwebb_detector1, calwebb_spec2, calwebb_tso3
from jwst.associations.asn_from_list import asn_from_list
from gwcs.wcstools import grid_from_bounding_box

The following set of imports are specifically for the Spec2Pipeline step-by-step section. 

In [None]:
from jwst.assign_wcs import AssignWcsStep
from jwst.srctype import SourceTypeStep
from jwst.flatfield import FlatFieldStep
from jwst.photom import PhotomStep
from jwst.extract_1d import Extract1dStep

from gwcs.wcstools import grid_from_bounding_box

import crds
import json

And a few additional imports for the Tso3Pipeline section too. 

In [None]:
import astropy.io.ascii as ascii

from jwst.associations.lib.rules_level3_base import DMS_Level3_Base
from jwst.associations import asn_from_list

Next we will create an output directory for the new products we produce. 

In [None]:
# set this to any preferred output directory on your system
outdir = 'miri_lrs_output/'
if not os.path.exists(outdir):
    os.mkdir(outdir)

<hr style="border:1px solid gray"> </hr>

## 3.<font color='white'>-</font>Retrieving and Inspecting the Uncalibrated Data <a class="anchor" id="firstlook"></a>

In practice, you will retrieve your observational data from the [MAST archive portal](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html). Here you can query on a host of different parameters - mission, instrument, program ID, date of execution, etc. 

In the archive you will find both uncalibrated and calibrated data. All data received from the observatory is automatically processed by the current pipeline version; this includes ancillary data such as target acquisition images. The steps that are run in the pipeline for a particular data mode can be found in the pipeline documentation for [Stage 1](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_detector1.html#calwebb-detector1), [Stage 2](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec2.html) and [Stage 3](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_tso3.html). 

This automated pipeline processing uses a set of default parameters that the instrument teams determined to be "good" for a typical observation. However, some observers may want to try out different settings or find better settings for their particular science case. We will show later how this works. 

For the purposes of this JWebbinar we provide a simulated observation. 


In [None]:
data_url = "https://stsci.box.com/shared/static/lev1fh6b8iy54n8wazhg3qkknh0acd4o.fits"

You can access this file and examine structure & contents using regular astropy FITS tools, like this. You can see the file has 5 extensions:

* An empty primary extension (with a header)
* SCI: the science data. This is a 4D dataset: NCOLS x NROWS x NGROUPS x NINTS. 
* PIXELDQ: data quality flags. This is a single 2D plane with dimensions NCOLS x NROWS.
* REFOUT: the MIRI reference output values. This is a MIRI specific output. 
* ASDF: the metadata for the datamodels. 

Note that normally the PIXELDQ plane is added in the first pipeline stage; its presence here is a feature of MIRISim. Only the reference columns are flagged; the rest of this array is zeros. 
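
A point that often causes confusion is the axis order. FITS headers list the fastest-varying axis first (columns), whereas the NumPy array returned by astropy is indexed slowest axis first, i.e. `(integration, group, row, column)`. The sketch below uses an empty stand-in array with this exposure's dimensions to illustrate the indexing; the array name is hypothetical.

```python
import numpy as np

# Dimensions of the exposure described in this notebook
nints, ngroups, nrows, ncols = 10, 100, 416, 72

# Stand-in for the SCI array: NumPy indexes the slowest axis first
ramp_cube = np.zeros((nints, ngroups, nrows, ncols), dtype=np.uint16)
print(ramp_cube.shape)

# One group of one integration is a 2D detector image (rows, columns)
frame = ramp_cube[0, 24, :, :]
print(frame.shape)

# The ramp of a single pixel in one integration is 1D along the group axis
pixel_ramp = ramp_cube[0, :, 360, 37]
print(pixel_ramp.shape)
```

Note that a pixel given as (x, y) = (37, 360) is therefore addressed as `[..., 360, 37]`.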

In [None]:
hdu = fits.open(data_url)
hdu.info()

Using matplotlib, we can visualize the data. The first plot below shows groups 25, 50 and 100 of a single integration of the exposure (index 8, i.e. the ninth integration). You can see how the signal builds up with increasing group number. This increase in signal forms the ''ramp'' that the pipeline will be fitting. 

The second plot shows this flux build up in 2 pixels: one in the source spectrum, and one in the background. 
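
To make the idea of "fitting the ramp" concrete, here is a minimal sketch that simulates the ramp of one pixel and recovers its count rate with an ordinary straight-line fit. The count rate, pedestal and noise values are made up for illustration, and the pipeline's own ramp fitting is more sophisticated than this simple least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

frame_time = 0.159   # s, SLITLESSPRISM frame time quoted above
ngroups = 100
true_rate = 50.0     # DN/s, hypothetical pixel count rate
offset = 10000.0     # DN, hypothetical pedestal level
noise = 5.0          # DN, hypothetical per-group scatter

# Time stamp of each group and the simulated up-the-ramp samples
t = frame_time * np.arange(1, ngroups + 1)
ramp = offset + true_rate * t + rng.normal(0.0, noise, size=ngroups)

# Straight-line least-squares fit: the slope is the count rate in DN/s
slope, intercept = np.polyfit(t, ramp, deg=1)
print(f"fitted slope = {slope:.1f} DN/s")
```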

In [None]:
uncal_fits = hdu[1].data
print(np.shape(uncal_fits))

nints = np.shape(uncal_fits)[0]

# identifying pixels in source and background regions
src_px = [37, 360]
bgr_px = [20, 200]
px_labels = ["source + bgr px", "bgr px"]


ql_fig, ax = plt.subplots(ncols=3, nrows=1, figsize=[15, 8])
plt_ints = [24, 49, 99]

for i, pi in enumerate(plt_ints):
    ax[i].imshow(uncal_fits[8, pi, :, :], origin='lower', aspect='equal', interpolation='None')
    ax[i].set_title('Grp {}'.format(pi+1))
    sc1 = ax[i].scatter(src_px[0], src_px[1], marker='x', color='cyan')
    sc2 = ax[i].scatter(bgr_px[0], bgr_px[1], marker='x', color='magenta')

ql_fig.legend([sc1, sc2], px_labels, loc='lower center', fontsize='x-large', ncol=2)

In [None]:
ramp_fig, ax = plt.subplots(figsize=[10, 6])
for i in range(nints):
    ax.plot(uncal_fits[i, :, src_px[1], src_px[0]], 'c-', lw=2)
    ax.plot(uncal_fits[i, :, bgr_px[1], bgr_px[0]], 'm-', lw=2)

leg = [Line2D([0], [0], lw=2, color='c', label='source + bgr px'), 
       Line2D([0], [0], lw=2, color='m', label='bgr px')]   
    
ax.set_xlabel('group no.', fontsize='large')
ax.set_ylabel('DN', fontsize='large')
ax.legend(handles=leg, loc=2, fontsize='large')

### <font color='white'>-</font>3.1 Introduction to JWST datamodels<a class="anchor" id="datamodels"></a> 

The JWST Calibration Pipeline ("the pipeline") provides datamodels for conveniently accessing and working with the data. These datamodels are effectively containers that are optimised for particular JWST data types. More information is available [here](https://jwst-pipeline.readthedocs.io/en/latest/jwst/datamodels/index.html).

You don't need to know what specific model your data corresponds to. The ``datamodels.open`` function checks the relevant header keywords and matches the data against an existing model. 

In the cells below we will explore some useful aspects of the datamodels. 

In [None]:
uncal = datamodels.open(data_url)
print(uncal)
print(uncal.info())

This schema gives us a first look at the model attributes, and how the metadata is packaged in the model. Most importantly, the science data is in the ``uncal.data`` attribute.  If you aren't sure where to find a particular keyword, the function ``uncal.find_fits_keyword()`` can identify it for you. 

In [None]:
print('The number of groups per integration in the exposure is {0}'.format(uncal.ngroups))
print('The number of integrations in the exposure is {0}'.format(uncal.nints))
print('Information on the filter used in this exposure can be found here: {0}'.format(uncal.find_fits_keyword('FILTER')))
print('OK! So the filter used is {0}, which is the LRS double prism'.format(uncal.meta.instrument.filter))

We can recreate the above plot using datamodel syntax:

In [None]:
ql_fig2, ax2 = plt.subplots(ncols=3, nrows=1, figsize=[15, 8])
plt_ints = [24, 49, 99]

for i, pi in enumerate(plt_ints):
    ax2[i].imshow(uncal.data[8, pi, :, :], origin='lower', aspect='equal', interpolation='None')
    ax2[i].set_title('Grp {}'.format(pi+1))
    sc1 = ax2[i].scatter(src_px[0], src_px[1], marker='x', color='cyan')
    sc2 = ax2[i].scatter(bgr_px[0], bgr_px[1], marker='x', color='magenta')

ql_fig2.legend([sc1, sc2], px_labels, loc='lower center', fontsize='x-large', ncol=2)

### <font color='white'>-</font>3.2 Inspecting datamodel metadata<a class="anchor" id="metadata"></a>

The ``meta`` attribute of the model is particularly rich in useful information. We'll show some of its features in the next cell. Not all attributes of the model are correctly populated, as the MIRISim simulation data lacks some of the JWST observatory keywords. 


We show here that the "TSOVISIT" keyword or attribute is not set. This is a MIRISim issue, and we'll show below how we set this manually in the data model to ensure that the pipeline recognizes the exposure as a TSO.

In [None]:
print('Total exposure time is {:.2f} seconds'.format(uncal.meta.exposure.exposure_time))
print('Detector readout pattern was {0} mode'.format(uncal.meta.exposure.readpatt))      
print('The original filename of this RampModel was {0}'.format(uncal.meta.filename))     
print('The most amazing space telescope bar none is {0}'.format(uncal.meta.telescope))   
print('TSOs should have this attribute set to True and instead it reads {0}'.format(uncal.meta.visit.tsovisit))

# Let's change the TSO status
if not uncal.meta.visit.tsovisit:
    uncal.meta.visit.tsovisit = True

print('Now the TSO setting reads {0}'.format(uncal.meta.visit.tsovisit))

### Summary 

<div class="alert alert-block alert-warning">
In this section, we:<br>
* loaded the uncalibrated MIRISim data for a slitless LRS exposure<br>
* showed how to load the data using datamodels<br>
* showed some of the useful features of datamodels and how metadata is organised in the model attributes<br>
</div>
    



<hr style="border:2px solid gray"> </hr>

## 4.<font color='white'>-</font>Intro to Running the Pipeline: CalDetector1() <a class="anchor" id="det1"></a>

In this section we will demonstrate how to run the JWST calibration pipeline on TSO data. When you retrieve your data from the archive, you will have pipeline processed data available for download already; however as part of your research you may well want to rerun the pipeline, or specific steps or stages, to optimise certain settings for your targets.

We will highlight here a few pipeline steps that you may want to skip or modify for TSOs, and show how to do that. 

The pipeline consists of [**3 stages**](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/main.html#pipelines). For TSOs, these are:

* CalDetector1: implements basic detector calibrations and converts ramps to slopes
* CalImage2 or CalSpec2: performs additional instrument- or mode-level calibrations, returning flux calibrated images for imaging data, and flux calibrated images and extracted spectra for spectroscopic data. 
* CalTso3: higher level calibrations, returning more TSO-specific output products

In the next sections we will run our MIRI LRS data through these pipeline stages, highlighting basic functionality, output products and suggested modifications. We will start with an introduction to the basic call sequence for the pipeline. 



### 4.1 <font color='white'>-</font>Running the Pipeline: Basics<a class="anchor" id="pipe_basics"></a>

Full documentation for running the pipeline from within Python can be found on the [main pipeline documentation pages](https://jwst-pipeline.readthedocs.io/en/latest/jwst/introduction.html#running-from-within-python). In addition, materials for other JWebbinars providing more general introductions to the pipeline can be found [here](https://www.stsci.edu/jwst/science-execution/jwebbinars).

In our import statements above we imported the modules ``calwebb_detector1``, ``calwebb_spec2`` and ``calwebb_tso3`` from ``jwst.pipeline``. The most basic call sequence looks as follows:

``det1 = calwebb_detector1.Detector1Pipeline.call(f)`` passing the FITS file to the call sequence; or<br>

``det1 = calwebb_detector1.Detector1Pipeline.call(uncal)`` using the previously loaded datamodel as input. <br>

In this case we will work with the ``uncal`` datamodel, as we made a modification to it in the above section. Let's see what happens. 




In [None]:
det1 = calwebb_detector1.Detector1Pipeline.call(uncal, output_dir=outdir, save_results=True)
print(det1)

# let's look at the output
print(os.listdir(outdir))

If we see no errors, the pipeline successfully ran on our data. The print statement for ``det1`` shows that the pipeline has produced an ImageModel with dimensions (416, 72), which are the correct dimensions for our subarray, corresponding to the output file miri_lrs_tso_100G10I_mirisim241_rate.fits. This is the slope image in DN/s, co-added over all integrations. 

We can see that the output directory contains an additional file, miri_lrs_tso_100G10I_mirisim241_rateints.fits. This is the file that contains the slope images for each integration, which is the product we want to use for TSOs. Let's take a look at the contents of this file. 
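
The difference between the two products comes down to whether the integration axis is kept. The sketch below uses a random stand-in cube with the shape of the ``rateints`` data to show both cases. The unweighted mean and the pixel box are illustrative assumptions only; the pipeline combines integrations with its own weighting.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical stand-in for the rateints data: (nints, nrows, ncols) in DN/s
nints, nrows, ncols = 10, 416, 72
rateints_cube = rng.normal(loc=20.0, scale=0.5, size=(nints, nrows, ncols))

# Collapsing the integration axis gives one 2D image, as in the rate product
# (an unweighted mean is used here purely for illustration)
mean_rate = rateints_cube.mean(axis=0)
print(mean_rate.shape)

# Keeping the integration axis instead gives one value per integration,
# e.g. the summed signal in a small box of pixels: a crude time series
box = rateints_cube[:, 350:370, 32:43]
time_series = box.sum(axis=(1, 2))
print(time_series.shape)
```

For a TSO, the second form is the one of interest, because any time variability is averaged out in the first.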

In [None]:
rifile = glob.glob(outdir+'/*rateints.fits')
print(rifile)

rate_ints = datamodels.open(rifile[0])
print(rate_ints)

sl_fig, axs = plt.subplots(ncols=5, nrows=2, figsize=[15, 10])

for i, aa in enumerate(axs.flat):
    im = aa.imshow(rate_ints.data[i, :, :], origin='lower', aspect='equal', interpolation='None')
    aa.set_title('int = {0}'.format(i+1))

    
sl_fig.subplots_adjust(right=0.9)
cbar_ax = sl_fig.add_axes([0.95, 0.2, 0.02, 0.6])
cbar = sl_fig.colorbar(im, cax=cbar_ax)
cbar.set_label('DN/s', fontsize='x-large')

OK, let's now dig a bit deeper into what the pipeline is doing to learn more about reference files and to prepare for making some changes. The steps the pipeline runs for a particular mode are defined in a [parameter file in ASDF format](https://jwst-pipeline.readthedocs.io/en/latest/jwst/stpipe/config_asdf.html#config-asdf-files) (ASDF stands for Advanced Scientific Data Format). ASDF files can be read with a simple text editor, and the Python package ``asdf`` provides programmatic access.  

There's a handy function called ``get_crds_parameters`` that prints out all the metadata in the model, including information on the provenance of the file, data units, the calibration steps run (the ``cal_step`` entries), the version of the pipeline that was used (``calibration_software_version``), the names of the reference files used for calibration (``ref_file`` entries) and more.

Reference files are used to perform the calibration steps. These files are usually delivered by the instrument teams and the contents can be in the form of images or tables. Examples of reference files used by the pipeline are: 
* flat fields
* flux calibration tables
* wavelength calibration files

and many more.




In [None]:
rate_ints.get_crds_parameters()

### 4.2 <font color='white'>-</font>Retrieving reference files<a class="anchor" id="crds"></a>

Reference files and parameter files for the calibration pipeline live in the **JWST Calibration Reference Data System** - [CRDS](https://jwst-pipeline.readthedocs.io/en/latest/jwst/introduction.html#crds). You can see this acronym referenced throughout the pipeline documentation, and in the metadata for the exposures. 

Each version of the pipeline has an associated CRDS **context**; you can see the context used in our Detector1Pipeline run in ``rate_ints.meta.ref_file.crds.context_used`` as ``jwst_0776.pmap``. Sometimes it may be necessary to change this setting, in order to run the pipeline with older reference files. 
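
One way to pin the context, to our understanding of the CRDS documentation, is the ``CRDS_CONTEXT`` environment variable, set alongside ``CRDS_PATH`` and ``CRDS_SERVER_URL`` before the pipeline is imported. The snippet below is a minimal sketch and simply reuses the context name quoted above.

```python
import os

# Pin the CRDS context before importing crds or jwst in a fresh session.
# "jwst_0776.pmap" is the context quoted above; substitute the one you need.
os.environ["CRDS_CONTEXT"] = "jwst_0776.pmap"

print(os.environ["CRDS_CONTEXT"])
```

If the variable is not set, the pipeline uses the default context for the installed version.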

The CRDS system can be accessed programmatically, or simply via the following URL: https://jwst-crds.stsci.edu, from where you can download reference files with a simple click. In the figures below you can see the web interface for CRDS, and how we can locate the parameter ASDF files for individual pipeline stages and steps. See if you can retrieve the ASDF file used for our Detector1Pipeline run.

To save on time, we have included a new read noise reference file in our working directory (which is actually a copy of the existing one), and will demonstrate how to override the reference file in the pipeline call. 

![Fig 1 CRDS interface](images/crds_interface.png)

![Fig 2 Det1 parameter file](images/crds_miri_det1.png)

OK, we've now seen how to run the pipeline with its default settings, and where to find the information about those settings. Via the datamodel metadata we can see the filenames of any reference files that were used, as well as the CRDS context. We can access these files from a web interface or programmatically. 

Next, let's see how we can make changes to these settings. 

<hr style="border:2px solid gray"> </hr>

###  <font color='white'>-</font>4.3 Making changes to the pipeline steps<a class="anchor" id="pipe_changes"></a>

It will be a pretty common occurrence that you want to investigate the impact of a particular calibration step on your analysis, change a parameter, or try using a different reference file. In this section we'll look at how to do that. 

__First__, we will take the read noise reference file in the working directory, and rerun the ``Detector1Pipeline()`` with the new file. This demonstrates how a reference file can be _overridden_. The read noise reference file is referenced in both the jump step and the ramp fitting step. 

(**Note:** The basic detector calibration reference files were created following very detailed ground testing and often lengthy detector investigations. We show here how to override the read noise file to demonstrate the _method_; we recommend you replace Detector1 reference files with _extreme caution_). 

__Second__, we would like the ``Detector1Pipeline()`` to run the ``lastframe`` step. The MIRI detectors have an odd pull-down effect on the last frame (group) in an integration. This introduces a deviation from the (quasi-) linear response of the detector, and as the effect has an odd-even row dependence, it can impact line measurements, especially if we are looking at line ratios. However, this effect is also stable over time for a given source, so for TSOs we prefer to use the last frame for maximal SNR. But if we consider a situation where the absolute flux calibration _is_ important - for example to match the data with that from another instrument, then we may want to rerun the pipeline to include the step (and _exclude_ the last frame in the ramp fit). 
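
The effect of a low final group on the fitted slope can be illustrated with a toy ramp. This is a sketch only: the 30 DN deficit is a made-up number, not a measured property of the MIRI detectors, and a plain straight-line fit stands in for the pipeline's ramp fitting.

```python
import numpy as np

frame_time = 0.159          # s
ngroups = 100
true_rate = 50.0            # DN/s, hypothetical
pull_down = 30.0            # DN, hypothetical deficit in the final group

t = frame_time * np.arange(1, ngroups + 1)
ramp = true_rate * t        # noiseless linear ramp
ramp[-1] -= pull_down       # last group reads low

# Fit once with every group, once with the last group left out
slope_all, _ = np.polyfit(t, ramp, deg=1)
slope_trim, _ = np.polyfit(t[:-1], ramp[:-1], deg=1)

print(f"with last group:    {slope_all:.3f} DN/s")
print(f"without last group: {slope_trim:.3f} DN/s")
```

Including the last group biases the slope low by a fixed amount, which matters for absolute fluxes but largely cancels in relative measurements if the effect is stable in time.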


We're going to set the ``save_results`` parameter to ``False``, as we're not going to use these output files any further and we don't need to save them. 





In [None]:
# Change 1: identify the new read noise file
new_readnoise = 'miri_new_readnoise_file.fits'


# Bring it all together
d1mod = calwebb_detector1.Detector1Pipeline.call(uncal, save_results=False, 
                                           steps={'jump': {'override_readnoise': new_readnoise},
                                                  'ramp_fit': {'override_readnoise': new_readnoise},
                                                  'lastframe': {'skip': False}})

In [None]:
# Now let's check if our changes were applied
print('Read noise reference file used was {0}'.format(d1mod.meta.ref_file.readnoise.name))
print('The last frame step status: {0}'.format(d1mod.meta.cal_step.lastframe))

### Summary

<div class="alert alert-block alert-warning">
In this section, we ran the ``calwebb_detector1`` pipeline, and learned a few basic operations:<br>
<br>
* running a pipeline stage end-to-end with default settings, using the ``Step.call()`` method<br>
* identifying and visualizing the output products<br>
* showing the list of reference files used in the various calibration steps, and how to retrieve them in CRDS<br>
* making basic modifications to the pipeline call, such as skipping steps or overriding a reference file.<br>
</div>



<hr style="border:2px solid gray"> </hr>

## <font color='white'>-</font>5. Progressing Further: Spec2Pipeline()<a class="anchor" id="spec2pipe"></a>

We have demonstrated some basics of the pipeline using Stage 1 of the Pipeline, which has converted ramps to slopes for each integration in the exposure. In this section we will run our data through Stage 2 of the pipeline, showing the following methods:

* Running and configuring individual steps
* Inspecting the WCS/wavelength information
* Spectral extraction and making modifications


### <font color='white'>-</font>5.1 Introduction to the Stage 2 Pipeline<a class="anchor" id="spec2_intro"></a>

In Stage 2 of the pipeline, the following calibrations are applied to our MIRI LRS data:

1. Assigning a world coordinate system (including wavelengths)
2. Assigning a source type
3. Flat fielding
4. Photometric calibration
5. Spectral extraction

The Stage 2 pipeline has many more steps, but many do not apply to TSOs. For the full list of steps and which instruments and modes use what steps, see the [pipeline documentation pages](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_spec2.html#calwebb-spec2). 

Note that imaging TSOs use the Image2Pipeline(), [described here](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_image2.html).


### <font color='white'>-</font>5.2 Running the pipeline step by step<a class="anchor" id="stepbystep"></a>

In the previous section we called the Stage 1 pipeline as a single step, via ``calwebb_detector1.Detector1Pipeline.call()``. This method works also for Stage 2, using the ``rateints.fits`` file as input. The call sequence is as follows:

``spec2_out = calwebb_spec2.Spec2Pipeline.call(rate_ints, save_results=True, output_dir=outdir)``

However in this section we will demonstrate how to run individual steps one by one, which can make it easier to configure and modify steps.

For this part, we use the second set of imports in the preamble for this notebook.


#### 5.2.1 Assign_wcs

OK, that's all imported. Let's start with the ``assign_wcs`` step. We can accept all default settings for this step. This step makes no modifications to the science data. It only attaches WCS information to the data in a WCS extension, including wavelength, such that every pixel in the detector array has an associated (RA, Dec, wavelength). It also defines a bounding box region over which these values are defined (with NaNs everywhere else). 

**NOTE**. The assignment of wavelength values to pixels assumes that the target is placed at a specific location, i.e. the nominal pointing location for the SLITLESSPRISM subarray. The target acquisition procedure should be capable of placing the target there with an accuracy of < 10 mas, and an image will be taken through the TA filter which you will have access to, to verify the source positioning. For reference, the pixel scale of the MIRI Imager detector is 110 mas/px, so 10 mas is approx. 1/10th of a pixel. 

Any offset of the target from this location will result in a small wavelength offset in the final spectrum. The TA performance estimates represent pre-launch values; TA inaccuracies can also result from inaccurate target coordinates, partial saturation of the target in the TA exposure, or insufficient SNR.  
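
The conversion between an angular offset and a pixel offset quoted above is simple to reproduce. The helper name below is hypothetical. Translating the pixel offset into a wavelength offset would additionally require the dispersion solution, which is not attempted here.

```python
PIXEL_SCALE_MAS = 110.0   # mas per pixel, MIRI Imager value quoted above


def mas_to_pixels(offset_mas, pixel_scale_mas=PIXEL_SCALE_MAS):
    """Convert an angular offset in milliarcseconds to detector pixels."""
    return offset_mas / pixel_scale_mas


# Pre-launch target acquisition accuracy estimate quoted above
offset_px = mas_to_pixels(10.0)
print(f"10 mas = {offset_px:.3f} px")
```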




In [None]:
awcs = AssignWcsStep.call(rate_ints, save_results=True, output_dir=outdir)
print(awcs)

From the ``print`` statement above we can see that the output data is a CubeModel, still with the dimensions of the ``rateints.fits`` file. The step has produced a new output file, ``miri_lrs_tso_100G10I_mirisim241_assignwcsstep.fits``, which we can also access via the step output, ``awcs``. 

Let's look at the bounding box that was defined on the aperture, and how we can retrieve a wavelength map. For this we need to import a function from the ``gwcs`` package that comes with the JWST pipeline. We use this to plot a 2D wavelength map of the subarray. As MIRISim simulated data do not contain realistic coordinates, we do not show the RA, Dec - but the method shown works for this too with on-sky data. 
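
Before using the real WCS, it may help to see the idea on a toy array: build a grid of pixel coordinates inside a bounding box, evaluate a function on it, and leave everything outside as NaN. The box limits, array size and the linear "wavelength" relation below are all invented for illustration, and this is not how ``grid_from_bounding_box`` is implemented internally.

```python
import numpy as np

# Hypothetical bounding box in pixel coordinates: ((x_min, x_max), (y_min, y_max))
bounding_box = ((3.5, 8.5), (1.5, 5.5))
nrows, ncols = 8, 12   # hypothetical (small) detector array

(x_min, x_max), (y_min, y_max) = bounding_box

# Integer pixel centres that fall inside the box
xs = np.arange(int(np.ceil(x_min)), int(np.floor(x_max)) + 1)
ys = np.arange(int(np.ceil(y_min)), int(np.floor(y_max)) + 1)
x, y = np.meshgrid(xs, ys)

# Toy "wavelength" that varies only with row, defined inside the box only
wavelength_map = np.full((nrows, ncols), np.nan)
wavelength_map[y, x] = 5.0 + 0.5 * y

print(x.shape)
print(np.isnan(wavelength_map).sum())
```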

In [None]:
print('Corner pixel coordinates of the bounding box: {0}'.format(awcs.meta.wcs.bounding_box))

bbox_w = awcs.meta.wcs.bounding_box[0][1] - awcs.meta.wcs.bounding_box[0][0]
bbox_ht = awcs.meta.wcs.bounding_box[1][1] - awcs.meta.wcs.bounding_box[1][0]
print('Bounding box height: {0} px; width: {1} px'.format(bbox_ht, bbox_w))

# Now let's calculate the grid
x, y = grid_from_bounding_box(awcs.meta.wcs.bounding_box)
ra, dec, lam = awcs.meta.wcs(x, y)

bbfig, bbax = plt.subplots(ncols=2, nrows=1, figsize=[8, 9])
bbox = Rectangle((awcs.meta.wcs.bounding_box[0][0],awcs.meta.wcs.bounding_box[1][0]), bbox_w, bbox_ht, angle=0.0, ec='r', lw=2, fc='None')

bbax[0].imshow(awcs.data[1, :, :], origin='lower', interpolation='None', aspect='equal')
bbax[0].add_patch(bbox)
bbax[0].set_title('Bounding box placement', fontsize='large')

wim = bbax[1].imshow(lam, origin='lower', aspect='equal', interpolation='None')
bbax[1].set_title('Wavelength map', fontsize='large')
bbfig.colorbar(wim, ax=bbax[1], label='micron')

#### 5.2.2. Source type

The pipeline assigns a source type based on information passed from the APT proposal. If no information is available from APT, then each mode has a default value. For MIRI slitless LRS this is 'POINT'. This step is pretty straightforward, so we will run it without any further modifications. Remember to use the output of the previous step as input; this should be the FITS filename. We can get the filename of the output product of the ``assign_wcs`` step from the datamodel metadata using ``awcs.meta.filename``.

Whether the target is POINT or EXTENDED becomes important later during the flux calibration and spectral extraction steps.

In [None]:
print(awcs.meta.filename)
srctype_input = outdir+awcs.meta.filename
print('Original value of the source type metadata: {0}'.format(awcs.meta.target.source_type))

spec2_src = SourceTypeStep.call(srctype_input, save_results=True, output_dir=outdir)

print('Value after the source type step: {0}'.format(spec2_src.meta.target.source_type))

#### 5.2.3. Flat fielding

In the next step we apply the flat field correction; this corrects for the gain differences between pixels. The gain uniformity of the MIRI detectors is overall very good. We will run this step without further modifications or comments. 



In [None]:
ff_input = outdir+spec2_src.meta.filename
spec2_ff = FlatFieldStep.call(ff_input, save_results=True, output_dir=outdir)

#### 5.2.4 Photometric calibration

In the photometric calibration step ("photom"), we perform the important calibration of converting the units of the science (and error) data from DN/s to MJy/sr. 

**NOTE**. It is important to note that the reference file used for this step provides conversion factors as a function of _wavelength_. As noted above, the assignment of wavelengths to pixels, as performed in the ``assign_wcs`` step assumes a specific location of the target on the detector. This means that if the target is offset from this location, the flux calibration will be affected. 

The pointing stability of the telescope over timescales of many hours, typical of some TSO exposures, is not fully characterized at this pre-launch stage. Any jitter or slow drifting behaviour will result in small errors in the flux calibration of the target, which may introduce noise in the further time series analysis.  

As a result, the value of this step is very much dependent on your science goals and the type of target. When performing relative spectrophotometry only, e.g. detecting an exoplanet transit against an out-of-transit baseline, it may be prudent to skip this step (or at least carefully consider the impact of this issue).

If you decide to skip the step, you can jump from flat fielding to spectral extraction without any issues; the spectra will be extracted in units of DN/s. 
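
To see why a relative measurement is insensitive to the absolute flux calibration, the following minimal sketch computes a transit depth from a toy light curve, once in DN/s and once after multiplying by a constant conversion factor. The light curve, the factor value and the function name `transit_depth` are illustrative assumptions, not pipeline outputs. The cancellation holds only as long as the factor is constant in time; a drifting target would make the effective factor time-dependent.

```python
import numpy as np

def transit_depth(flux, in_transit):
    """Fractional depth: 1 - median(in-transit) / median(out-of-transit)."""
    flux = np.asarray(flux, dtype=float)
    in_transit = np.asarray(in_transit, dtype=bool)
    return 1.0 - np.median(flux[in_transit]) / np.median(flux[~in_transit])

# Toy light curve in DN/s: 100 integrations with a 1% box-shaped transit
n_int = 100
in_transit = np.zeros(n_int, dtype=bool)
in_transit[40:60] = True
flux_dn = np.full(n_int, 5000.0)
flux_dn[in_transit] *= 0.99

# Hypothetical constant conversion factor (the value is arbitrary)
conversion = 3.2e-4
flux_cal = flux_dn * conversion

depth_dn = transit_depth(flux_dn, in_transit)
depth_cal = transit_depth(flux_cal, in_transit)
print(depth_dn, depth_cal)
```

Both depths agree, because the constant factor divides out of the in-transit to out-of-transit ratio.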

In [None]:
# The units of the science array are stored in the datamodel metadata
print('Data units prior to the Photom step: {0}'.format(spec2_ff.meta.bunit_data))
photom_input = outdir+spec2_ff.meta.filename
spec2_ph = PhotomStep.call(photom_input, save_results=True, output_dir=outdir)
print('Data units after the Photom step: {0}'.format(spec2_ph.meta.bunit_data))
print('Output filename: {0}'.format(spec2_ph.meta.filename))
np.shape(spec2_ph.data)

#### 5.2.5. Extract 1D

In the final step of the Stage 2 pipeline, the 2D spectral images are extracted into 1D, flux-calibrated spectra. For TSOs, the output product will contain an extracted spectrum for each integration, so that a spectroscopic time series can be constructed. 

For MIRI LRS, the default extraction method is a fixed-width aperture measuring 11 pixels across. Wavelength dependent aperture correction factors are applied as part of the algorithm. The error arrays are also extracted and the errors combined to give an uncertainty on the science spectrum. 
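
Conceptually, a fixed-width aperture extraction sums the pixels across the aperture at each wavelength and combines the per-pixel errors in quadrature. The sketch below illustrates this on a synthetic image. It is not the pipeline's ``extract_1d`` code and it omits the aperture correction; the array layout (dispersion along rows, aperture centred on column `center`) and the function name `box_extract` are assumptions made for this example.

```python
import numpy as np

def box_extract(image, error, center, width=11):
    """Sum `width` columns centred on `center` for every row (wavelength).

    `width` is assumed to be odd so the aperture is symmetric.
    """
    half = width // 2
    cols = slice(center - half, center + half + 1)
    flux = np.nansum(image[:, cols], axis=1)
    # Independent per-pixel errors add in quadrature
    flux_err = np.sqrt(np.nansum(error[:, cols] ** 2, axis=1))
    return flux, flux_err

# Synthetic 2D spectrum: 50 wavelength rows x 40 spatial columns
image = np.zeros((50, 40))
image[:, 15:26] = 2.0              # 11 columns of constant signal
error = np.full(image.shape, 0.1)  # uniform per-pixel uncertainty

flux, flux_err = box_extract(image, error, center=20)
print(flux[0], flux_err[0])
```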

In [None]:
x1d_input = outdir+spec2_ph.meta.filename
x1d = Extract1dStep.call(x1d_input, save_results=True, output_dir=outdir)
print(x1d)

The datamodel of the ``extract_1d`` output is a MultiSpecModel where the main attribute ``spec`` is a list of SpecModels - one for each integration.
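
For time series analysis it is often convenient to stack the per-integration spectra into a single 2D array with shape (integrations, wavelengths). Here is a minimal sketch, using synthetic tables in place of the real ``spec_table`` attributes. The helper name `stack_flux` is hypothetical, and it assumes every integration shares the same wavelength grid.

```python
import numpy as np

def stack_flux(spec_tables):
    """Return (wavelength, flux_2d) from a list of tables with
    'WAVELENGTH' and 'FLUX' columns, one table per integration."""
    wavelength = np.asarray(spec_tables[0]['WAVELENGTH'], dtype=float)
    flux_2d = np.vstack([np.asarray(t['FLUX'], dtype=float) for t in spec_tables])
    return wavelength, flux_2d

# Synthetic stand-ins for the per-integration spectral tables
dtype = [('WAVELENGTH', 'f8'), ('FLUX', 'f8')]
tables = []
for i in range(3):
    t = np.zeros(5, dtype=dtype)
    t['WAVELENGTH'] = np.linspace(5.0, 12.0, 5)
    t['FLUX'] = 1.0 + i
    tables.append(t)

wave, flux_2d = stack_flux(tables)
print(flux_2d.shape)
```

With the notebook's data, the input list could be built as `[s.spec_table for s in x1d.spec]`.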

When we plot the spectra below, we can see that the spectrum increases at long wavelengths - not exactly what you'd expect for a star. This is the thermal background, which starts to dominate over the target flux beyond 10 micron. So far, the background has not been subtracted from our data. Below we show how to do this as part of the spectral extraction step. 

In [None]:
print(len(x1d.spec))

x1dfig, x1dax = plt.subplots(figsize=[12,4])

for i in range(len(x1d.spec)):
    x1dax.plot(x1d.spec[i].spec_table['WAVELENGTH'], x1d.spec[i].spec_table['FLUX'])
    
x1dax.set_title('Extracted spectra (per integration)')
x1dax.set_xlabel(r'wavelength ($\mu$m)')  # raw string avoids an invalid '\m' escape
x1dax.set_ylabel('flux (Jy)')

If we run the ``get_crds_parameters()`` function on the output of this step, we can see that much more metadata was added to the model. It also includes the reference file used for the extraction, `jwst_miri_extract1d_0004.json`, which we can retrieve from CRDS. We show below how to do this programmatically using the ``crds`` package. 

You can find the full documentation for this package [online here](https://jwst-crds.stsci.edu/static/users_guide/index.html). 

In [None]:
x1d.get_crds_parameters()

In [None]:
x1d_ref_file = (x1d.meta.ref_file.extract1d.name).split('/')[-1]
print(x1d_ref_file)

basename = crds.core.config.pop_crds_uri(x1d_ref_file)
print(basename)

file_path = crds.locate_file(basename, "jwst")
# Use a context manager so the reference file is closed after reading
with open(file_path) as ref_fp:
    x1d_ref = json.load(ref_fp)

In [None]:
x1d_ref

When we look at the contents of this file, we see it contains some settings for both the fixed-slit and slitless modes of the MIRI LRS. The parameters in this file set a very basic extraction scheme, with a fixed extraction aperture of 11 pixels. The Extract1D code in the pipeline does support many more extraction options; these are all described in the [pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/extract_1d/description.html).

In the next section, we will show how to implement a simple background subtraction in the spectral extraction step, by defining a background region in the input json file. The simplest way to edit the settings in this file is to copy the file and edit it in a text editor, following the syntax of the existing text.

We will perform a background subtraction by defining a simple rectangular background region  of the same width as the source extraction aperture, and ask the code to median the values as a function of wavelength. More complex schemes are possible, such as regions defined by a polynomial. 
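
The idea behind this scheme can be sketched in a few lines of NumPy: at each wavelength, take the median of the pixels in the background region, scale it to the number of pixels in the source aperture, and subtract it from the summed source flux. This illustrates the concept on synthetic data and is not the pipeline's implementation; the column ranges and the function name are assumptions.

```python
import numpy as np

def subtract_median_background(image, src_cols, bkg_cols):
    """Sum the source aperture per row and subtract the scaled median background."""
    src = image[:, src_cols]
    bkg = image[:, bkg_cols]
    # Median background level per pixel, as a function of row (wavelength)
    bkg_per_pixel = np.nanmedian(bkg, axis=1)
    n_src = src.shape[1]
    return np.nansum(src, axis=1) - n_src * bkg_per_pixel

# Synthetic image: a background that rises along the dispersion axis, plus a source
n_rows, n_cols = 50, 40
background = np.linspace(1.0, 5.0, n_rows)[:, None] * np.ones((1, n_cols))
image = background.copy()
image[:, 15:26] += 2.0  # source occupies 11 columns

# Background region of the same width as the source aperture
net = subtract_median_background(image, slice(15, 26), slice(28, 39))
print(net[:3])
```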

A new json file with these additional background settings is included in the working directory of this notebook. 

In [None]:
x1d2 = Extract1dStep.call(x1d_input, override_extract1d='jwst_miri_extract1d_slitless_withbgr.json', save_results=True, output_dir=outdir)

In [None]:
x1dfig2, x1dax2 = plt.subplots(figsize=[12, 4])

for i in range(len(x1d2.spec)):
    x1dax2.plot(x1d2.spec[i].spec_table['WAVELENGTH'], x1d2.spec[i].spec_table['FLUX'])
    
x1dax2.set_title('Extracted spectra (per integration): Background subtracted', fontsize=14)
x1dax2.set_xlabel(r'wavelength ($\mu$m)', fontsize='large')  # raw string avoids an invalid '\m' escape
x1dax2.set_ylabel('flux (Jy)', fontsize='large')

#### 5.2.6 Comparison: running the pipeline end to end

We can also run the Spec2Pipeline() end to end. We illustrate this here and show that the output products are the same.

Note that the default output file extensions are different when you run the pipeline step by step, versus end to end in one call. 

* the output of the ``photom`` step will be ``_photomstep.fits`` when run individually, vs. ``_calints.fits`` when run end to end.
* similarly the output of ``extract_1d`` is ``_extract1dstep.fits``, vs. ``_x1dints.fits`` when run end to end.


After the run, we plot the extracted spectra from the 1st integration from both methods together, to show that the result is the same. 
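
A numerical check can complement the visual comparison. The sketch below compares two flux arrays while treating NaNs at matching positions as equal. The arrays here are synthetic, the function name is hypothetical, and the tolerance is an arbitrary choice.

```python
import numpy as np

def spectra_match(flux_a, flux_b, rtol=1e-6):
    """True if both arrays have the same shape and agree within rtol."""
    flux_a = np.asarray(flux_a, dtype=float)
    flux_b = np.asarray(flux_b, dtype=float)
    if flux_a.shape != flux_b.shape:
        return False
    return bool(np.allclose(flux_a, flux_b, rtol=rtol, equal_nan=True))

a = np.array([1.0, 2.0, np.nan, 4.0])
b = a.copy()      # identical spectrum
c = a * 1.01      # spectrum that differs by 1%

same = spectra_match(a, b)
different = spectra_match(a, c)
print(same, different)
```

In the notebook, the same function could be applied to the ``FLUX`` columns of the first integration from the two products.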

In [None]:
spec2_e2e = calwebb_spec2.Spec2Pipeline.call(rate_ints, save_results=True, output_dir=outdir)
print(spec2_e2e)

In [None]:
xf = outdir+'contents_x1dints.fits'

x1d2 = datamodels.open(xf)


x1dfig2, x1dax2 = plt.subplots(figsize=[12, 4])


x1dax2.plot(x1d.spec[0].spec_table['WAVELENGTH'], x1d.spec[0].spec_table['FLUX'], 'b-', label='from extract1dstep.fits')
x1dax2.plot(x1d2.spec[0].spec_table['WAVELENGTH'], x1d2.spec[0].spec_table['FLUX'], 'm-', label='from x1dints.fits')
    
x1dax2.set_title('Extracted spectra (per integration)')
x1dax2.set_xlabel(r'wavelength ($\mu$m)')  # raw string avoids an invalid '\m' escape
x1dax2.set_ylabel('flux (Jy)')

x1dfig2.legend()

### Summary

<div class="alert alert-block alert-warning">

In this section, we've run the Stage 2 pipeline on MIRI slitless LRS data, showing the different pipeline steps. In addition we have shown:<br>
<br>   

* how to run the pipeline step by step and make changes to the step parameters<br>
* how to construct and inspect a 2D wavelength map from the WCS information<br>
* how spectra are extracted based on a json parameters file, and how this file can be overridden with custom settings<br>
* how to perform a simple background subtraction as part of the extract_1d step. <br>
</div>

## 6.<font color='white'>-</font>Final Steps: Tso3Pipeline()<a class="anchor" id="tso3pipe"></a>

The 3rd and final stage of the pipeline performs some extra cleanup, and produces higher-level science products. For "regular" observations, this is where dithered exposures are combined, mosaics are produced, etc. For TSOs, the steps listed below are performed; you can read more about them in the [pipeline documentation](https://jwst-pipeline.readthedocs.io/en/latest/jwst/pipeline/calwebb_tso3.html#calwebb-tso3).

* Outlier detection: this step performs some extra cleaning on the images
* Spectral extraction: the spectral extraction is repeated on the cleaned images
* White light curve creation: produces a white light curve

The most important part of the Tso3 pipeline is that it re-combines lengthy exposures that were segmented by the data management system back into a single product. The spectral extraction will therefore at this stage contain spectra for ALL integrations in the exposure. 

If your exposure was short enough to be captured in a single FITS file, only the white light step will provide significant added value over the Spec2 outputs. 

### 6.1<font color='white'>-</font>Association files<a class="anchor" id="asn"></a>

An important aspect of the Tso3 pipeline is that it does not accept a single file or datamodel as input; instead we have to create and pass an [association file](https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/index.html). This is a type of text file with specific formatting requirements that tells the pipeline which files to process and how they relate to each other. In the case of a segmented exposure, we would list all ``_calints.fits`` files in the association file; for a short exposure that is not segmented, the association file is essentially just a wrapper around the exposure. 

Fortunately there are tools available to help you create this file and pass it to the pipeline. We demonstrate how to do this below, using some additional imports included in the preamble to the notebook. 





In [None]:
#First we create a list of calints.fits files
calints_files = glob.glob(outdir+'*calints.fits')
print(calints_files)

In [None]:
asn = asn_from_list.asn_from_list(calints_files, rule=DMS_Level3_Base, product_name='miri_lrs_tso_stage3.fits')
print(asn)

In [None]:
asn_file = 'miri_lrs_tso_stage3_asn.json'

with open(asn_file, 'w') as fp:
    fp.write(asn.dump()[1])

Take a moment to open this file in a text editor, and look at its contents. 
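
The file can also be inspected from Python. The following sketch lists the member files of each product in an association; it relies only on the `products`, `name`, `members` and `expname` keys of the association format. So that the example runs on its own, it first writes a small stand-in association with hypothetical filenames. With the real file, only the call to `list_members` (a hypothetical helper name) is needed.

```python
import json
import os
import tempfile

def list_members(asn_path):
    """Return {product name: [member expnames]} for an association file."""
    with open(asn_path) as fp:
        asn_dict = json.load(fp)
    return {prod['name']: [m['expname'] for m in prod['members']]
            for prod in asn_dict['products']}

# Stand-in association with hypothetical segment filenames
demo_asn = {
    'products': [{
        'name': 'miri_lrs_tso_stage3',
        'members': [
            {'expname': 'example-seg001_calints.fits', 'exptype': 'science'},
            {'expname': 'example-seg002_calints.fits', 'exptype': 'science'},
        ],
    }]
}
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_asn.json')
with open(demo_path, 'w') as fp:
    json.dump(demo_asn, fp, indent=4)

members = list_members(demo_path)
print(members)
```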


### 6.2<font color='white'>-</font>Running the Tso3 Pipeline<a class="anchor" id="tso3"></a>

Next we will run the Stage 3 pipeline with this file as input. We make 2 modifications:

* we use the extract_1d parameters file created above, to include background subtraction
* we set the limits for the white light creation from 6 to 10 $\mu$m, instead of the default choice of 5-12 $\mu$m
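
As a rough picture of what the white light step does with these limits, the sketch below sums each integration's spectrum over the chosen wavelength range. It runs on synthetic arrays and is not the pipeline's ``white_light`` code; the function name and the inclusive treatment of the limits are assumptions.

```python
import numpy as np

def white_light(wavelength, flux_2d, min_wavelength, max_wavelength):
    """Sum flux over [min_wavelength, max_wavelength] for each integration."""
    wavelength = np.asarray(wavelength, dtype=float)
    flux_2d = np.asarray(flux_2d, dtype=float)
    in_band = (wavelength >= min_wavelength) & (wavelength <= max_wavelength)
    return np.nansum(flux_2d[:, in_band], axis=1)

# Synthetic data: 4 integrations on a grid of 5, 6, ..., 12 micron
wavelength = np.linspace(5.0, 12.0, 8)
flux_2d = np.ones((4, wavelength.size))
flux_2d[2] *= 0.99  # one integration is 1% fainter

curve = white_light(wavelength, flux_2d, 6.0, 10.0)
print(curve)
```

Narrowing the band excludes the long-wavelength region where the thermal background dominates, at the cost of summing fewer pixels.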

In [None]:
tso3 = calwebb_tso3.Tso3Pipeline.call(asn_file, save_results=True, output_dir=outdir, 
                                     steps={'extract_1d': {'override_extract1d' : 'jwst_miri_extract1d_slitless_withbgr.json'},
                                             'white_light': {'min_wavelength': 6.0, 'max_wavelength': 10.}})

In the output folder we can see two new files that were created by this pipeline stage, each with the name prefix ``miri_lrs_tso_stage3``, as we specified when we created the association:

* miri_lrs_tso_stage3_x1dints.fits
* miri_lrs_tso_stage3_whtlt.ecsv

We will leave the inspection of the x1dints product as an offline exercise, as this product has the same structure as the output from the Spec2Pipeline. We will take a closer look at the white light curve. The ecsv format is readable by common astropy I/O tools. 

In [None]:
wtlt_file = glob.glob(outdir+'*whtlt*')
print(wtlt_file)

In [None]:
wtlt = ascii.read(wtlt_file[0])
print(wtlt.colnames)

wtfig, wtax = plt.subplots(figsize=[12, 4])
wtax.plot(wtlt['MJD'], wtlt['whitelight_flux'])
wtax.set_xlabel('MJD')
wtax.set_ylabel('Flux (Jy)')
wtax.set_ylim([0.19, 0.2])

### Summary

<div class="alert alert-block alert-warning">
In this section we ran our simulated observations through Stage 3 of the pipeline, which combines segmented files into one, re-extracts the data and creates a white light curve by summing over wavelengths. We demonstrated:
<br>    
* How to produce an association file for the Tso3 pipeline<br>
* How to customise the white light limit wavelengths<br>
* How to read in and plot a white light curve.<br>
    
</div>

## 7.<font color='white'>-</font>Conclusion<a class="anchor" id="bye"></a>

<div class="alert alert-block alert-info">
This concludes the 2nd part of our JWebbinar focused on TSOs. We hope this was a useful level-zero introduction to how the pipeline handles TSOs, and how and why you can modify it to get the best quality data. <br>
    
In the next portion of the JWebbinar, we will take a deeper dive into some of these steps and calibration procedures, highlighting topics that are particularly relevant for TSO science. <br>
        
As always, if you have any questions or concerns, you can find us through the <a href="https://stsci.service-now.com/jwst" target="_blank">JWST Help Desk</a>. <br>
</div>