### Intro to the Roman Pipeline

#### Outline

- Pipeline installation and setup
- Running the pipeline
- Exercise
- CRDS
- Feedback and discussion

### Roman pipeline installation and setup

The Roman pipeline is distributed as a Python package called `romancal`. The code is available on [GitHub](https://github.com/spacetelescope/romancal) and releases are posted on [PyPI](https://pypi.org/project/romancal/). Before running the Roman pipeline, make sure `romancal` and all its dependencies are installed. The latest installation instructions are available in the [README](https://github.com/spacetelescope/romancal/blob/main/README.md) file on GitHub and are summarized briefly below.

Public releases are installed directly from PyPI using:

```
% pip install romancal
```
This installs the pipeline and all its dependencies in the current Python environment. We recommend using conda to manage environments and installing each new release in a new conda environment.

The development version of the pipeline can be installed using the command:

```
% pip install git+https://github.com/spacetelescope/romancal.git
```

This will install the code from the main development branch and all its current dependencies.

In addition, we recommend installing Jupyter and IPython. They are not dependencies, just conveniences.

The pipeline uses reference files stored in the [Calibration Reference Data System (CRDS)](https://roman-crds-test.stsci.edu/). There is no public CRDS server for Roman yet, so the examples in this notebook, while showing how to use a server once one is available, use local reference files.

**Pipeline documentation** is available publicly on [readthedocs](https://roman-pipeline.readthedocs.io/en/latest/).

#### CRDS configuration

To run the pipeline outside the STScI network, CRDS must be configured by setting two environment variables:

```
export CRDS_PATH=$HOME/crds_cache
export CRDS_SERVER_URL=https://roman-crds-test.stsci.edu
```
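The same two variables can also be set from inside a Python session or notebook. The following is a minimal sketch: `configure_crds` is a hypothetical helper (not part of `romancal` or `crds`), and it assumes the variables are set before `romancal` is imported, which is usually the safest order.

```python
import os
from pathlib import Path


def configure_crds(cache_dir=None, server_url="https://roman-crds-test.stsci.edu"):
    """Set the two CRDS environment variables for the current process.

    Hypothetical helper: defaults mirror the shell example above
    ($HOME/crds_cache and the Roman test server URL).
    """
    cache = Path(cache_dir) if cache_dir is not None else Path.home() / "crds_cache"
    os.environ["CRDS_PATH"] = str(cache)
    os.environ["CRDS_SERVER_URL"] = server_url
    return cache


# Call this before importing romancal so the settings are in place.
crds_cache = configure_crds()
```

Variables set this way apply only to the running Python process; the `export` form above is still needed for `strun` in a shell.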

#### Running the Roman Pipeline

The pipeline can be run on the command line or in a Python session. We'll look first at how to run it in Python. But first, let's download some data.

#### Stages of the Roman pipeline

The Roman calibration pipeline is split into three stages:

- Level 2 pipeline, or Exposure Level Pipeline (ELP)

  This stage runs on individual exposures and applies detector-level corrections to given exposure types (imaging, prism, and grism). The currently implemented steps are listed [here](https://roman-pipeline.readthedocs.io/en/latest/roman/pipeline/exposure_pipeline.html#exposure-pipeline).


- Level 3 pipeline, or High Level Pipeline (HLP)

  The Level 3 pipeline combines individual exposures according to the association rules supplied. This is not implemented yet.


- Level 4 pipeline - generates high level products, like various types of catalogs.

**Steps in a Pipeline stage**

Every stage consists of one or more steps that run in sequence. The `ExposurePipeline` currently defines the following steps:

```
step_defs = {'dq_init': dq_init_step.DQInitStep,
             'saturation': SaturationStep,
             'linearity': LinearityStep,
             'dark_current': DarkCurrentStep,
             'jump': jump_step.JumpStep,
             'rampfit': ramp_fit_step.RampFitStep,
             'assign_wcs': AssignWcsStep,
             'flatfield': FlatFieldStep,
             'photom': PhotomStep,
            }

```

#### Data files naming conventions

File names are constructed using the following rules:

**WFI detector Level 1 files**

These are uncalibrated files; the standard suffix is `uncal`:
```
rPPPPPCCAAASSSOOOVVV_ggsaa_eeee_<detector>_uncal.asdf
```    

**WFI detector Level 2 files**

These are pixel-calibrated files; the standard suffix is `cal`:

```
rPPPPPCCAAASSSOOOVVV_ggsaa_eeee_<detector>_cal.asdf
```

`PPPPP`: Program number

`CC`:    Execution plan number

`AAA`:   Pass number (within execution plan)

`SSS`:   Segment Number (within pass) 

`OOO`:   Observation number

`VVV`:   Visit number

`gg`:    Group identifier

`s`:     Sequence identifier (within the group)
         1 for the prime exposure
         > 1 for the parallel exposure
         
`aa`:    Activity Identifier (within the sequence)

`eeee`:  Exposure number (within the visit)
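The naming rule above can be checked mechanically. The sketch below splits a file name into its fields with a regular expression. The function name and the dictionary keys are labels chosen here for illustration, and the pattern assumes that each letter in the template stands for exactly one digit and that the detector token is alphanumeric (for example `WFI01`).

```python
import re

# One named group per field of rPPPPPCCAAASSSOOOVVV_ggsaa_eeee_<detector>_<suffix>.asdf
ROMAN_NAME = re.compile(
    r"^r(?P<program>\d{5})(?P<execution_plan>\d{2})(?P<pass_number>\d{3})"
    r"(?P<segment>\d{3})(?P<observation>\d{3})(?P<visit>\d{3})"
    r"_(?P<group>\d{2})(?P<sequence>\d)(?P<activity>\d{2})"
    r"_(?P<exposure>\d{4})"
    r"_(?P<detector>[A-Za-z0-9]+)"
    r"_(?P<suffix>[a-z]+)\.asdf$"
)


def parse_roman_filename(name):
    """Return the fields of a Level 1 or Level 2 file name as a dict of strings."""
    match = ROMAN_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised Roman file name: {name!r}")
    return match.groupdict()


fields = parse_roman_filename("r0000101001001001001_01101_0001_WFI01_uncal.asdf")
# A sequence identifier of "1" marks the prime exposure.
is_prime = fields["sequence"] == "1"
```

Fields are kept as strings so that leading zeros survive; convert with `int()` where a number is needed.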



#### Running the ELP pipeline from a Python session

Using `call` is the recommended way to run the pipeline in a Python session. In this case the output data model is returned in memory and can be saved to disk either by passing an option to the `call` method or in a separate command:

In [1]:
from romancal.pipeline import ExposurePipeline

In [2]:
# Step parameters are passed to `call` through the nested `steps` dictionary
out = ExposurePipeline.call('../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf',
                            steps={"jump":{"rejection_threshold": 600}},
                            save_results=True)

FileNotFoundError: [Errno 2] No such file or directory: '../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf'

**Alternatively:**
    
```
out.save(<filename.asdf>)
```

With `save_results=True`, the call to `ExposurePipeline` saves the product to a file with the same root name as the input and the suffix **cal.asdf**.
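To make the naming rule concrete, here is a small sketch that derives the expected product name from an input name. `expected_cal_name` is a hypothetical helper, not a `romancal` function, and it returns only the file name, since the output directory depends on how the pipeline is configured.

```python
from pathlib import Path


def expected_cal_name(uncal_path):
    """Return the file name expected for the calibrated product.

    Hypothetical helper: keeps the root of the input name and swaps the
    `uncal` suffix for `cal`.
    """
    name = Path(uncal_path).name
    old_suffix = "_uncal.asdf"
    if not name.endswith(old_suffix):
        raise ValueError(f"expected a name ending in {old_suffix!r}, got {name!r}")
    return name[: -len(old_suffix)] + "_cal.asdf"


cal_name = expected_cal_name("../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf")
```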

#### Running individual steps

Individual steps can be run in the same way. For example, the cell below runs `AssignWcsStep` on an input file.

All steps can be imported from a common namespace `romancal.step`. The input to each step
is a file name or a data model and the output is a data model. Data models are returned in memory when running in Python.

In [3]:
from romancal.step import AssignWcsStep

out_model = AssignWcsStep.call('../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf')

2022-06-29 16:55:22,315 - stpipe - INFO - PARS-ASSIGNWCSSTEP: CRDS parameter reference retrieval disabled.
2022-06-29 16:55:22,318 - stpipe.AssignWcsStep - INFO - AssignWcsStep instance created.
2022-06-29 16:55:22,385 - stpipe.AssignWcsStep - INFO - Step AssignWcsStep running with args ('../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf',).
2022-06-29 16:55:22,387 - stpipe.AssignWcsStep - INFO - Step AssignWcsStep parameters are: {'pre_hooks': [], 'post_hooks': [], 'output_file': None, 'output_dir': None, 'output_ext': '.asdf', 'output_use_model': False, 'output_use_index': True, 'save_results': False, 'skip': False, 'suffix': None, 'search_output_file': True, 'input_dir': ''}


FileNotFoundError: [Errno 2] No such file or directory: '../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf'

In [4]:
out_model.save('r002_assign_wcs.asdf')

NameError: name 'out_model' is not defined

The calls above used reference files in CRDS. **One can pass local reference files to a step or a pipeline.**

In [5]:
out_model = AssignWcsStep.call('../data/r0000101001001001001_01101_0001_WFI01_uncal.asdf', 
                               override_distortion='../data/roman_wfi_distortion_0008.asdf')

2022-06-29 16:55:22,566 - stpipe - INFO - PARS-ASSIGNWCSSTEP: CRDS parameter reference retrieval disabled.


ValidationError: Extra value 'override_distortion' in root

#### Running from the command line

DMS runs the cal pipeline using the command line interface. The general syntax is

```
% strun romancal.pipeline.ExposurePipeline <input_file>
```

**Running the pipeline with local reference files**

Again, the above call uses reference files stored in CRDS. To use local reference files, pass an override either to the pipeline or to an individual step:

```
% strun romancal.pipeline.ExposurePipeline <input_file> --steps.flatfield.override_flat=myflat.asdf
```

```
% strun romancal.step.FlatFieldStep <input_file> --override_flat=myflat.asdf
```

**Skipping a step in the pipeline**

```
% strun romancal.pipeline.ExposurePipeline <input_file> --steps.linearity.skip=True
```
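In a Python session, the same kind of setting goes into the nested `steps` dictionary passed to `call`, as in the `ExposurePipeline.call` example earlier. The sketch below builds that dictionary; `build_steps` and `KNOWN_STEPS` are hypothetical names introduced here, and the step names are taken from the `step_defs` listing above.

```python
# Step names as they appear in the `step_defs` mapping of ExposurePipeline.
KNOWN_STEPS = (
    "dq_init", "saturation", "linearity", "dark_current", "jump",
    "rampfit", "assign_wcs", "flatfield", "photom",
)


def _check(name):
    """Reject step names that are not in the list above."""
    if name not in KNOWN_STEPS:
        raise ValueError(f"unknown step name: {name!r}")
    return name


def build_steps(skip=(), params=None):
    """Return a nested dict suitable for the `steps` argument of `call`.

    `skip` lists step names to skip; `params` maps a step name to a dict
    of parameter overrides for that step.
    """
    steps = {}
    for name in skip:
        steps.setdefault(_check(name), {})["skip"] = True
    for name, overrides in (params or {}).items():
        steps.setdefault(_check(name), {}).update(overrides)
    return steps


steps = build_steps(skip=["linearity"], params={"jump": {"rejection_threshold": 600}})
# With romancal installed and a real input file, this would be used as:
#   ExposurePipeline.call("<input_file>", steps=steps)
```

Validating the names up front catches a misspelled step before the pipeline starts.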

**List the parameters for a step**

To display a list of the parameters that are accepted for a given Step class, pass the `-h` parameter, and the name of a Step class or parameter file:

```
strun -h romancal.step.JumpStep

usage: strun [-h] [--logcfg LOGCFG] [--verbose] [--debug]
             [--save-parameters SAVE_PARAMETERS] [--disable-crds-steppars]
             [--pre_hooks] [--post_hooks] [--output_file] [--output_dir]
             [--output_ext] [--output_use_model] [--output_use_index]
             [--save_results] [--skip] [--suffix] [--search_output_file]
             [--input_dir] [--rejection_threshold]
             [--three_group_rejection_threshold]
             [--four_group_rejection_threshold] [--maximum_cores]
             [--flag_4_neighbors] [--max_jump_to_flag_neighbors]
             [--min_jump_to_flag_neighbors] [--override_gain]
             [--override_readnoise]
             cfg_file_or_class [args ...]

JumpStep: Performs CR/jump detection. The 2-point difference method is applied.

positional arguments:
  cfg_file_or_class     The configuration file or Python class to run
  args                  arguments to pass to step

optional arguments:
  -h, --help            show this help message and exit
  --logcfg LOGCFG       The logging configuration file to load
  --verbose, -v         Turn on all logging messages
  --debug               When an exception occurs, invoke the Python debugger, pdb
  --save-parameters SAVE_PARAMETERS
                        Save step parameters to specified file.
  --disable-crds-steppars
                        Disable retrieval of step parameter references files from
                        CRDS
  --pre_hooks 
  --post_hooks 
  --output_file         File to save output to.
  --output_dir          Directory path for output files
  --output_ext          Default type of output
  --output_use_model    When saving use `DataModel.meta.filename`
  --output_use_index    Append index.
  --save_results        Force save results
  --skip                Skip this step
  --suffix              Default suffix for output files
  --search_output_file 
                        Use outputfile define in parent step
  --input_dir           Input directory
  --rejection_threshold 
                        CR sigma rej thresh
  --three_group_rejection_threshold 
                        CR sigma rej thresh
  --four_group_rejection_threshold 
                        CR sigma rej thresh
  --maximum_cores       max number of processes to create
  --flag_4_neighbors    flag the four perpendicular neighbors of each CR
  --max_jump_to_flag_neighbors 
                        maximum jump sigma that will trigger neighbor flagging
  --min_jump_to_flag_neighbors 
                        minimum jump sigma that will trigger neighbor flagging
  --override_gain       Override the gain reference file
  --override_readnoise 
                        Override the readnoise reference file
```

#### Calibration Reference Data System (CRDS)

CRDS is a Python library, set of command line programs, and family of web servers used to assign and manage the best reference files that are used to calibrate HST, JWST and Roman data.

The primary function of CRDS is to assign best reference files to datasets so that they can be calibrated based upon CRDS rules.

Currently, the CRDS User Guide can be found externally on one of the JWST servers, e.g.

https://jwst-crds.stsci.edu/static/users_guide/index.html

**Exercise:**

Run the Roman calibration Level 2 pipeline on a Level 1 file. Use `rejection_threshold=500` for the jump step.

On the command line this is

```
strun romancal.pipeline.ExposurePipeline r0000101001001001001_01101_0001_WFI01_uncal.asdf --steps.jump.rejection_threshold=500  
```
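To script this command, for instance to run it over several files, the argument list can be assembled in Python. This is only a sketch: `strun_command` is a hypothetical helper, and the commented `subprocess.run` line assumes `strun` is on the `PATH` and the input file exists.

```python
import shlex


def strun_command(input_file, rejection_threshold=500):
    """Return the strun argument list for the exercise (hypothetical helper)."""
    return [
        "strun",
        "romancal.pipeline.ExposurePipeline",
        input_file,
        f"--steps.jump.rejection_threshold={rejection_threshold}",
    ]


cmd = strun_command("r0000101001001001001_01101_0001_WFI01_uncal.asdf")
print(shlex.join(cmd))  # the command as it would be typed in a shell

# To actually run it (requires romancal and the input file):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain unusual characters.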