### Intro to the Roman Pipeline

#### Outline

- Pipeline installation and setup
- Running the pipeline
- Exercise
- CRDS
- Feedback and discussion

### Roman pipeline installation and setup

The Roman pipeline is a Python package called `romancal`. The code is available on [GitHub](https://github.com/spacetelescope/romancal) and releases are posted on [PyPI](https://pypi.org/project/romancal/). Before running the Roman pipeline, make sure `romancal` and all its dependencies are installed. The latest installation instructions are available in the [README](https://github.com/spacetelescope/romancal/blob/main/README.md) file on GitHub and are summarized below.

Public releases are installed directly from PyPI using 

```
% pip install romancal
```
This installs the pipeline and all its dependencies in the current Python environment. We recommend using conda to manage environments and installing each new release in a new conda environment.

The development version of the pipeline can be installed using the command:

```
% pip install git+https://github.com/spacetelescope/romancal.git
```

This will install the code from the main development branch and all its current dependencies.

In addition, we recommend installing Jupyter and IPython. They are not dependencies, only conveniences.
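A quick way to confirm what is installed in the current environment is to query the package metadata from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of `romancal`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report on the pipeline and the two optional convenience packages.
for name in ("romancal", "jupyter", "ipython"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

Running this in each new conda environment makes it easy to see which release of the pipeline that environment holds.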

The pipeline uses reference files stored in the [Calibration Reference Data System (CRDS)](https://roman-crds.stsci.edu/). There is no public CRDS server for Roman yet, so the examples in this notebook use local reference files, while also showing how to use a server once one is available.

**Pipeline documentation** is available publicly on [readthedocs](https://roman-pipeline.readthedocs.io/en/latest/).

#### CRDS configuration

To run the pipeline outside the STScI network, CRDS must be configured by setting two environment variables:

```
export CRDS_PATH=$HOME/crds_cache
export CRDS_SERVER_URL=https://roman-crds.stsci.edu
```

**The CRDS variables need to be set in the environment before starting the notebook.**
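Whether the two variables are visible to the running session can be checked from Python. This is a small sketch that only inspects the environment; it does not contact the CRDS server, and the helper name `missing_crds_variables` is hypothetical.

```python
import os

# The two variables named in the CRDS configuration above.
REQUIRED_CRDS_VARS = ("CRDS_PATH", "CRDS_SERVER_URL")


def missing_crds_variables(environ=None):
    """Return the names of required CRDS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_CRDS_VARS if not env.get(name)]


missing = missing_crds_variables()
if missing:
    print("CRDS is not fully configured; missing:", ", ".join(missing))
else:
    print("CRDS cache: ", os.environ["CRDS_PATH"])
    print("CRDS server:", os.environ["CRDS_SERVER_URL"])
```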

#### Running the Roman Pipeline

The pipeline can be run on the command line or in a Python session. We'll look first at how to run it in Python. But first, let's download some data.

#### Stages of the Roman pipeline

The Roman calibration pipeline is split into three stages:

- Level 2 pipeline, or Exposure Level Pipeline (ELP)

  This stage runs on individual exposures and applies detector-level corrections to given exposure types (imaging, prism, and grism). The currently implemented steps are listed [here](https://roman-pipeline.readthedocs.io/en/latest/roman/pipeline/exposure_pipeline.html#exposure-pipeline).


- Level 3 pipeline, or High Level Pipeline (HLP)

  The Level 3 pipeline combines individual exposures according to the association rules supplied. This is not implemented yet.


- Level 4 pipeline - generates high level products, like various types of catalogs.

**Steps in a Pipeline stage**

Every stage consists of one or more steps that run in sequence. The `ExposurePipeline` currently defines the following steps:

```
step_defs = {'dq_init': dq_init_step.DQInitStep,
             'saturation': SaturationStep,
             "refpix": RefPixStep,
             "linearity": LinearityStep,
             "dark_current": DarkCurrentStep,
             "rampfit": ramp_fit_step.RampFitStep,
             "assign_wcs": AssignWcsStep,
             "flatfield": FlatFieldStep,
             "photom": PhotomStep,
             "source_detection": SourceDetectionStep,
             "tweakreg": TweakRegStep,
            }

```

#### Data files naming conventions

File names are constructed using the following rules:

**WFI detector Level 1 files**

These are uncalibrated files; the standard suffix is `uncal`:
```
rPPPPPCCAAASSSOOOVVV_ggsaa_eeee_<detector>_uncal.asdf
```    

**WFI detector Level 2 files**

These are pixel-calibrated files; the standard suffix is `cal`:

```
rPPPPPCCAAASSSOOOVVV_ggsaa_eeee_<detector>_cal.asdf
```

`PPPPP`: Program number

`CC`:    Execution plan number

`AAA`:   Pass number (within execution plan)

`SSS`:   Segment Number (within pass) 

`OOO`:   Observation number

`VVV`:   Visit number

`gg`:    Group identifier

`s`:     Sequence identifier (within the group)
         1 for the prime exposure
         > 1 for the parallel exposure
         
`aa`:    Activity Identifier (within the sequence)

`eeee`:  Exposure number (within the visit)
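The naming rules above can be applied mechanically to split a file name into its fields. The sketch below is illustrative only: the regular expression is derived from the field widths listed above, the helper `parse_roman_filename` and the dictionary keys are hypothetical names, and `romancal` may provide its own tools for this.

```python
import re

# One named group per field, in the order given by the naming convention:
# rPPPPPCCAAASSSOOOVVV_ggsaa_eeee_<detector>_<suffix>.asdf
ROMAN_FILENAME = re.compile(
    r"^r(?P<program>\d{5})"
    r"(?P<execution_plan>\d{2})"
    r"(?P<pass_number>\d{3})"
    r"(?P<segment>\d{3})"
    r"(?P<observation>\d{3})"
    r"(?P<visit>\d{3})"
    r"_(?P<group>\d{2})(?P<sequence>\d)(?P<activity>\d{2})"
    r"_(?P<exposure>\d{4})"
    r"_(?P<detector>[A-Za-z0-9]+)"
    r"_(?P<suffix>[A-Za-z0-9]+)\.asdf$"
)


def parse_roman_filename(filename):
    """Return a dict of the name fields (as strings), or raise ValueError."""
    match = ROMAN_FILENAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognized Roman WFI file name: {filename!r}")
    return match.groupdict()


fields = parse_roman_filename("r0000101001001001001_01101_0001_WFI01_uncal.asdf")
for key, value in fields.items():
    print(f"{key:15s} {value}")
```

The fields are kept as strings so that leading zeros survive; convert with `int()` where a number is needed.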



#### Running the ELP pipeline from a Python session

Using `call` is the recommended way to run the pipeline in a Python session. In this case the output data model is returned in memory and can be saved to disk either by passing an option to the `call` method or in a separate command:

In [1]:
from romancal.pipeline import ExposurePipeline
import os
os.environ["WEBBPSF_PATH"] = os.getcwd()+"/../data/webbpsf-data"

In [None]:
# Is there a way to pass a parameter to call?
out = ExposurePipeline.call('../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf',
                            save_results=True)

**Alternatively:**
    
```
out.save(<filename.asdf>)
```

In this mode the call to `ExposurePipeline` saves the product to a file with the same root name and a suffix of **cal.asdf**.
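Parameters for individual steps can also be supplied when calling a pipeline. The sketch below assumes that `romancal` follows the `stpipe` convention, shared with the JWST pipeline, in which `call` accepts a `steps` dictionary keyed by step name; this has not been verified against a specific `romancal` release. The helpers `build_step_params` and `run_elp` are hypothetical, and the step names are taken from `step_defs`.

```python
def build_step_params(skip=(), overrides=None):
    """Build a `steps` dictionary of the form {step_name: {parameter: value}}."""
    params = {}
    for step_name in skip:
        params.setdefault(step_name, {})["skip"] = True
    for step_name, step_overrides in (overrides or {}).items():
        params.setdefault(step_name, {}).update(step_overrides)
    return params


def run_elp(input_file, step_params):
    """Run the Exposure Level Pipeline with per-step parameters (assumed API)."""
    # Imported here so build_step_params can be used without romancal installed.
    from romancal.pipeline import ExposurePipeline

    return ExposurePipeline.call(input_file, save_results=True, steps=step_params)


# Skip linearity and point the flat field step at a local reference file.
step_params = build_step_params(
    skip=["linearity"],
    overrides={"flatfield": {"override_flat": "myflat.asdf"}},
)
print(step_params)

# With romancal installed and the data available:
# out = run_elp("../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf", step_params)
```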

#### Running individual steps

Individual steps can be run in the same way. For example, running `assign_wcs` on the resultant `cal` file.

All steps can be imported from a common namespace `romancal.step`. The input to each step
is a file name or a data model and the output is a data model. Data models are returned in memory when running in Python.

In [None]:
from romancal.step import AssignWcsStep

out_model = AssignWcsStep.call('../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf')

In [None]:
out_model.save('r002_assign_wcs.asdf')

The calls above used reference files in CRDS. **One can pass local reference files to a step or a pipeline.** For example, running assign_wcs with a custom `distortion` file called "new_distortion.asdf" in the current directory:

In [None]:
out_model = AssignWcsStep.call('../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf', override_distortion="new_distortion.asdf")

#### Running from the command line

DMS runs the cal pipeline using the command line interface. The general syntax is

```
% strun romancal.pipeline.ExposurePipeline <input_file>
```

or using an alias

```
% strun roman_elp <input_file>
```

**Running the pipeline with local reference files**

Again, the above call uses reference files stored in CRDS. To use local reference files, pass an override to the relevant step of the pipeline:

```
% strun romancal.pipeline.ExposurePipeline <input_file> --steps.flatfield.override_flat=myflat.asdf
```

or, when running a single step:

```
% strun romancal.step.FlatFieldStep <input_file> --override_flat=myflat.asdf
```

**Skipping a step in the pipeline**

```
% strun romancal.pipeline.ExposurePipeline <input_file> --steps.linearity.skip=True
```
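When many exposures need the same options, the `strun` command line can be assembled in Python and handed to `subprocess`. This is a minimal sketch: the helper `strun_command` is hypothetical, it only builds the argument list, and the options shown are the ones used in the examples above.

```python
import shlex


def strun_command(step_or_alias, input_file, options=None):
    """Return the argument list for an `strun` call with `--name=value` options."""
    cmd = ["strun", step_or_alias, input_file]
    for name, value in (options or {}).items():
        cmd.append(f"--{name}={value}")
    return cmd


cmd = strun_command(
    "roman_elp",
    "r0000101001001001001_01101_0001_WFI01_uncal.asdf",
    {"steps.linearity.skip": True},
)
print(shlex.join(cmd))

# To actually run it (requires romancal on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list, rather than a single string, avoids shell quoting problems with unusual file names.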

**List the parameters for a step**

To display a list of the parameters that are accepted for a given Step class, pass the `-h` parameter, and the name of a Step class or parameter file:

```
% strun -h romancal.step.RampFitStep

usage: strun [-h] [--logcfg LOGCFG] [--verbose] [--debug] [--save-parameters SAVE_PARAMETERS]
             [--disable-crds-steppars] [--pre_hooks] [--post_hooks] [--output_file] [--output_dir]
             [--output_ext] [--output_use_model] [--output_use_index] [--save_results] [--skip]
             [--suffix] [--search_output_file] [--input_dir] [--algorithm] [--save_opt] [--opt_name]
             [--maximum_cores] [--use_ramp_jump_detection] [--threshold_intercept]
             [--threshold_constant] [--override_readnoise] [--override_gain]
             cfg_file_or_class [args ...]

This step fits a straight line to the value of counts vs. time to determine the mean count rate for
each pixel.

positional arguments:
  cfg_file_or_class     The configuration file or Python class to run
  args                  arguments to pass to step

options:
  -h, --help            show this help message and exit
  --logcfg LOGCFG       The logging configuration file to load
  --verbose, -v         Turn on all logging messages
  --debug               When an exception occurs, invoke the Python debugger, pdb
  --save-parameters SAVE_PARAMETERS
                        Save step parameters to specified file.
  --disable-crds-steppars
                        Disable retrieval of step parameter references files from CRDS
  --pre_hooks           [default=list]
  --post_hooks          [default=list]
  --output_file         File to save output to.
  --output_dir          Directory path for output files
  --output_ext          Default type of output [default='.asdf']
  --output_use_model    When saving use `DataModel.meta.filename` [default=False]
  --output_use_index    Append index. [default=True]
  --save_results        Force save results [default=False]
  --skip                Skip this step [default=False]
  --suffix              Default suffix of results [default='rampfit']
  --search_output_file 
                        Use outputfile define in parent step [default=True]
  --input_dir           Input directory
  --algorithm           Algorithm to use to fit. ['ols','ols_cas22', default='ols_cas22']
  --save_opt            Save optional output [default=False]
  --opt_name 
  --maximum_cores       max number of processes to create
                        ['none','quarter','half','all',default='none']
  --use_ramp_jump_detection 
                        Use jump detection during ramp fitting [default=True]
  --threshold_intercept 
                        Override the intercept parameter for the threshold function in the jump
                        detection algorithm.
  --threshold_constant 
                        Override the constant parameter for the threshold function in the jump
                        detection algorithm.
  --override_readnoise 
                        Override the readnoise reference file
  --override_gain       Override the gain reference file
```

#### Calibration Reference Data System (CRDS)

CRDS is a Python library, set of command line programs, and family of web servers used to assign and manage the best reference files that are used to calibrate HST, JWST and Roman data.

The primary function of CRDS is to assign best reference files to datasets so that they can be calibrated based upon CRDS rules.

The CRDS User guide is available on the front page of the CRDS server.

https://roman-crds.stsci.edu/static/users_guide/index.html



**Exercise:**

Run the Roman Exposure Level pipeline on a Level 1 file.

On the command line this is:

```
strun roman_elp r0000101001001001001_01101_0002_WFI01_uncal.asdf --steps.rampfit.save_opt=true

strun romancal.ramp_fitting.RampFitStep r0000101001001001001_01101_0002_WFI01_darkcurrent.asdf --save_opt=true

```